Concordia
HashGenerator Class Reference

#include <hash_generator.hpp>

Public Member Functions

 HashGenerator (std::string indexPath, boost::shared_ptr< ConcordiaConfig > config) throw (ConcordiaException)
 
virtual ~HashGenerator ()
 
TokenizedSentence generateHash (const std::string &sentence, bool byWhitespace=false) throw (ConcordiaException)
 
TokenizedSentence generateTokens (const std::string &sentence, bool byWhitespace=false) throw (ConcordiaException)
 
void serializeWordMap ()
 
void clearWordMap ()
 

Detailed Description

Class for generating a sentence hash. The hash is generated from a sentence given as a raw string. The string is first tokenized by SentenceTokenizer, and then each token is encoded as an integer according to WordMap. The resulting hash is an instance of TokenizedSentence.

The hashed sentence is used when adding a sentence to the index and during searching.

HashGenerator holds an instance of WordMap, used to encode tokens as integers, and an instance of SentenceTokenizer, used to tokenize the sentence string.
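
The tokenize-then-encode idea described above can be illustrated with a minimal, self-contained sketch. This is not Concordia's implementation: SketchWordMap, sketchTokenize and sketchHash are hypothetical stand-ins for WordMap, SentenceTokenizer and generateHash, the tokenizer only splits on whitespace, and the policy of giving each unseen token the next free integer is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for WordMap: each unseen token gets the next free code.
class SketchWordMap {
public:
    int getCode(const std::string& token) {
        std::map<std::string, int>::iterator it = codes_.find(token);
        if (it != codes_.end()) {
            return it->second;  // token already known, reuse its code
        }
        const int code = static_cast<int>(codes_.size());
        codes_[token] = code;
        return code;
    }

private:
    std::map<std::string, int> codes_;
};

// Hypothetical stand-in for SentenceTokenizer: splits on whitespace only.
std::vector<std::string> sketchTokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Encodes every token of the sentence as an integer, in sentence order.
std::vector<int> sketchHash(const std::string& sentence, SketchWordMap& wordMap) {
    const std::vector<std::string> tokens = sketchTokenize(sentence);
    std::vector<int> hash;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        hash.push_back(wordMap.getCode(tokens[i]));
    }
    return hash;
}

// Usage: repeated tokens receive the same code.
void sketchHashDemo() {
    SketchWordMap wordMap;
    const std::vector<int> hash = sketchHash("a b a", wordMap);
    assert(hash.size() == 3 && hash[0] == hash[2]);
}
```

Because the word map is passed by reference and kept between calls, the same token maps to the same code across sentences, which is what makes hashes comparable at indexing and search time.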

Constructor & Destructor Documentation

HashGenerator::HashGenerator (std::string indexPath, boost::shared_ptr< ConcordiaConfig > config) throw (ConcordiaException) [explicit]

Constructor.

Parameters
    indexPath  path to the index directory
    config     pointer to the current config object

HashGenerator::~HashGenerator () [virtual]

Destructor.

Member Function Documentation

void HashGenerator::clearWordMap ( )

Clears the word map.

TokenizedSentence HashGenerator::generateHash (const std::string & sentence, bool byWhitespace = false) throw (ConcordiaException)

Generates hash of a sentence.

Parameters
    sentence      sentence to generate the hash from
    byWhitespace  whether to tokenize the sentence by whitespace
Returns
    tokenized sentence, containing the hash

TokenizedSentence HashGenerator::generateTokens (const std::string & sentence, bool byWhitespace = false) throw (ConcordiaException)

This method acts like generateHash, but only performs tokenization. The resulting TokenizedSentence does not contain token code information.

Parameters
    sentence      sentence to tokenize
    byWhitespace  whether to tokenize the sentence by whitespace
Returns
    tokenized sentence, containing the tokens
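
The difference between the two methods can be sketched as follows. SketchSentence is a hypothetical stand-in for TokenizedSentence, and modelling "no token codes" as an empty vector is an assumption of this sketch, not a statement about how Concordia stores the result. Tokenization here is plain whitespace splitting.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TokenizedSentence.
struct SketchSentence {
    std::vector<std::string> tokens;
    std::vector<int> codes;  // left empty when only tokenization is performed
};

// Tokenization only: fills tokens, leaves codes empty.
SketchSentence sketchGenerateTokens(const std::string& sentence) {
    SketchSentence result;
    std::istringstream in(sentence);
    std::string token;
    while (in >> token) {
        result.tokens.push_back(token);
    }
    return result;
}

// Tokenization followed by encoding of each token through the word map.
SketchSentence sketchGenerateHash(const std::string& sentence,
                                  std::map<std::string, int>& wordMap) {
    SketchSentence result = sketchGenerateTokens(sentence);
    for (std::size_t i = 0; i < result.tokens.size(); ++i) {
        std::map<std::string, int>::iterator it = wordMap.find(result.tokens[i]);
        if (it == wordMap.end()) {
            // Unseen token: assign the next free code (assumed policy).
            const int code = static_cast<int>(wordMap.size());
            it = wordMap.insert(std::make_pair(result.tokens[i], code)).first;
        }
        result.codes.push_back(it->second);
    }
    return result;
}

// Usage: only the hashing variant produces codes.
void sketchTokensDemo() {
    std::map<std::string, int> wordMap;
    assert(sketchGenerateTokens("a b").codes.empty());
    assert(sketchGenerateHash("a b", wordMap).codes.size() == 2);
}
```

In this sketch the tokens-only variant never touches the word map, so it can be used without adding new entries to it.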

void HashGenerator::serializeWordMap ( )

Saves the contents of the current WordMap to disk.
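
What persisting a token-to-code map involves can be sketched with standard streams. The on-disk format Concordia uses is not described here, so the one "token code" pair per line layout below is purely an assumption, as are the names SketchCodes, sketchSave and sketchLoad. The sketch also assumes that tokens contain no whitespace.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the contents of a WordMap.
typedef std::map<std::string, int> SketchCodes;

// Writes one "token code" pair per line. Assumes tokens contain no whitespace.
void sketchSave(const SketchCodes& codes, std::ostream& out) {
    for (SketchCodes::const_iterator it = codes.begin(); it != codes.end(); ++it) {
        out << it->first << ' ' << it->second << '\n';
    }
}

// Reads pairs back until the stream is exhausted or a malformed pair is met.
SketchCodes sketchLoad(std::istream& in) {
    SketchCodes codes;
    std::string token;
    int code;
    while (in >> token >> code) {
        codes[token] = code;
    }
    return codes;
}

// Usage: a round trip through a stream restores the same mapping.
void sketchRoundTripDemo() {
    SketchCodes original;
    original["cat"] = 0;
    original["dog"] = 1;
    std::stringstream buffer;
    sketchSave(original, buffer);
    assert(sketchLoad(buffer) == original);
}
```

Passing a std::ofstream or std::ifstream instead of the std::stringstream writes to or reads from a file, so the same functions cover both testing and actual persistence.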

