Concordia
#include <hash_generator.hpp>
Public Member Functions
 | HashGenerator (std::string indexPath, boost::shared_ptr< ConcordiaConfig > config) throw (ConcordiaException)
virtual | ~HashGenerator ()
TokenizedSentence | generateHash (const std::string &sentence, bool byWhitespace=false) throw (ConcordiaException)
TokenizedSentence | generateTokens (const std::string &sentence, bool byWhitespace=false) throw (ConcordiaException)
void | serializeWordMap ()
void | clearWordMap ()
Class for generating a sentence hash. The hash is generated from a sentence given as a raw string. The string is first tokenized by SentenceTokenizer, and then each token is coded as an integer according to WordMap. The resulting hash is an instance of TokenizedSentence.
The hashed sentence is used when adding a sentence to the index and during searching.
HashGenerator holds an instance of WordMap, used to code tokens as integers, and an instance of SentenceTokenizer, used to tokenize the sentence string.
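The pipeline described above (tokenize, then code each token through a word map) can be illustrated with a small self-contained sketch. This is not Concordia's implementation: ToyWordMap and toyGenerateHash are hypothetical names, tokenization is plain whitespace splitting, and the code assignment policy (the next free integer for each unseen word) is an assumption made for illustration only.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for WordMap (not the Concordia class):
// each previously unseen word receives the next free integer code.
class ToyWordMap {
public:
    int getCode(const std::string& word) {
        auto it = codes_.find(word);
        if (it != codes_.end()) {
            return it->second;  // known word: reuse its code
        }
        const int code = static_cast<int>(codes_.size());
        codes_.emplace(word, code);
        return code;
    }

private:
    std::unordered_map<std::string, int> codes_;
};

// Hypothetical stand-in for generateHash: split on whitespace,
// then replace every token with its integer code.
std::vector<int> toyGenerateHash(const std::string& sentence, ToyWordMap& wordMap) {
    std::vector<int> hash;
    std::istringstream stream(sentence);
    std::string token;
    while (stream >> token) {
        hash.push_back(wordMap.getCode(token));
    }
    return hash;
}
```

Under these assumptions, hashing "the cat saw the dog" with a fresh ToyWordMap yields the codes 0 1 2 0 3: repeated words share a code, and the same map keeps assigning consistent codes across later sentences.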
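explicit HashGenerator::HashGenerator (std::string indexPath, boost::shared_ptr< ConcordiaConfig > config) throw (ConcordiaException)
Constructor.
indexPath | path to the index directory
config | pointer to the current config object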
virtual HashGenerator::~HashGenerator ()
Destructor.
void HashGenerator::clearWordMap ()
Clears the word map.
TokenizedSentence HashGenerator::generateHash (const std::string & sentence, bool byWhitespace = false) throw (ConcordiaException)
Generates the hash of a sentence.
sentence | sentence to generate the hash from
byWhitespace | whether to tokenize the sentence by whitespace
TokenizedSentence HashGenerator::generateTokens (const std::string & sentence, bool byWhitespace = false) throw (ConcordiaException)
This method acts like generateHash, but performs tokenization only. The resulting TokenizedSentence does not contain token code information.
sentence | sentence to tokenize
byWhitespace | whether to tokenize the sentence by whitespace
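A tokenization-only step with a byWhitespace switch might look like the sketch below. The function name toyGenerateTokens is hypothetical, and the behaviour of the default mode is purely an assumption: this page does not describe the rules SentenceTokenizer applies, so the sketch simply treats runs of alphanumeric characters as tokens. Only the whitespace mode mirrors what the parameter description states.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for generateTokens: returns token strings only,
// with no integer codes attached.
std::vector<std::string> toyGenerateTokens(const std::string& sentence,
                                           bool byWhitespace = false) {
    std::vector<std::string> tokens;
    if (byWhitespace) {
        // Whitespace mode: tokens are maximal runs of non-whitespace characters.
        std::istringstream stream(sentence);
        std::string token;
        while (stream >> token) {
            tokens.push_back(token);
        }
        return tokens;
    }
    // ASSUMED default mode (not taken from Concordia): tokens are runs of
    // alphanumeric characters; every other character acts as a separator.
    std::string current;
    for (unsigned char ch : sentence) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(ch));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

In this sketch, "Hello, world!" gives the tokens "Hello," and "world!" when byWhitespace is true, and "Hello" and "world" otherwise.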
void HashGenerator::serializeWordMap ()
Saves the contents of the current WordMap to disk.
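As a rough illustration of what persisting a word map involves, the sketch below writes a word-to-code table to a stream and reads it back. The on-disk format Concordia actually uses is not described on this page; ToyCodeTable, toySaveWordMap, toyLoadWordMap, and the one-pair-per-line text layout are all hypothetical, and the layout assumes that words contain no whitespace.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical word-to-code table (not the Concordia WordMap class).
using ToyCodeTable = std::map<std::string, int>;

// Writes one "word code" pair per line. Assumes words contain no whitespace.
void toySaveWordMap(const ToyCodeTable& table, std::ostream& out) {
    for (const auto& entry : table) {
        out << entry.first << ' ' << entry.second << '\n';
    }
}

// Reads back pairs written by toySaveWordMap.
ToyCodeTable toyLoadWordMap(std::istream& in) {
    ToyCodeTable table;
    std::string word;
    int code = 0;
    while (in >> word >> code) {
        table[word] = code;
    }
    return table;
}

// Usage sketch: round-trip through an in-memory buffer. Passing a
// std::ofstream and std::ifstream instead would target a file on disk.
bool toyRoundTripMatches(const ToyCodeTable& table) {
    std::stringstream buffer;
    toySaveWordMap(table, buffer);
    return toyLoadWordMap(buffer) == table;
}
```

Writing to generic streams keeps the sketch testable without touching the file system.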