Concordia
#include <concordia_index.hpp>
Public Member Functions
ConcordiaIndex (const std::string &hashedIndexFilePath, const std::string &markersFilePath) throw (ConcordiaException) | |
virtual | ~ConcordiaIndex () |
TokenizedSentence | addExample (boost::shared_ptr< HashGenerator > hashGenerator, boost::shared_ptr< std::vector< sauchar_t > > T, boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > markers, const Example &example) |
void | addTokenizedExample (boost::shared_ptr< HashGenerator > hashGenerator, boost::shared_ptr< std::vector< sauchar_t > > T, boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > markers, const TokenizedSentence &tokenizedSentence, const SUFFIX_MARKER_TYPE id) |
void | addAllTokenizedExamples (boost::shared_ptr< HashGenerator > hashGenerator, boost::shared_ptr< std::vector< sauchar_t > > T, boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > markers, const std::vector< TokenizedSentence > &tokenizedSentences, const std::vector< SUFFIX_MARKER_TYPE > &ids) |
std::vector< TokenizedSentence > | addAllExamples (boost::shared_ptr< HashGenerator > hashGenerator, boost::shared_ptr< std::vector< sauchar_t > > T, boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > markers, const std::vector< Example > &examples) |
boost::shared_ptr< std::vector< saidx_t > > | generateSuffixArray (boost::shared_ptr< std::vector< sauchar_t > > T) |
Class for creating and maintaining the index. This class does not hold the index data structures; it operates on them only when they are passed to ConcordiaIndex methods through smart pointers. The class itself stores only the paths to two files, the hashed index and the markers array, which are on-disk backups of the corresponding in-memory data structures.
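The ownership model described above can be sketched as follows. This is a minimal illustration, not the actual ConcordiaIndex implementation: `PathOnlyIndex` and its `append` method are hypothetical names, `std::shared_ptr` is used in place of `boost::shared_ptr` to keep the sketch self-contained, and `unsigned char` stands in for `sauchar_t`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the design described above: the object stores
// only two file paths and mutates data structures owned by the caller.
class PathOnlyIndex {
public:
    PathOnlyIndex(const std::string& hashedIndexFilePath,
                  const std::string& markersFilePath)
        : hashedIndexFilePath_(hashedIndexFilePath),
          markersFilePath_(markersFilePath) {}

    // Appends one code to the caller-owned vector; no copy is kept here.
    void append(std::shared_ptr<std::vector<unsigned char>> T,
                unsigned char code) const {
        T->push_back(code);
    }

    const std::string& hashedIndexFilePath() const { return hashedIndexFilePath_; }
    const std::string& markersFilePath() const { return markersFilePath_; }

private:
    std::string hashedIndexFilePath_;  // path to the hashed index backup
    std::string markersFilePath_;      // path to the markers array backup
};
```

Because the vectors live outside the object, the same structures can be handed to other components (for example, a searcher) without going through the index class.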
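explicit ConcordiaIndex::ConcordiaIndex (const std::string &hashedIndexFilePath, const std::string &markersFilePath) throw (ConcordiaException)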
Constructor.
hashedIndexFilePath | path to the hashed index file |
markersFilePath | path to the markers array |
ConcordiaException |
virtual ConcordiaIndex::~ConcordiaIndex ()
Destructor.
std::vector< TokenizedSentence > ConcordiaIndex::addAllExamples | ( | boost::shared_ptr< HashGenerator > | hashGenerator, |
boost::shared_ptr< std::vector< sauchar_t > > | T, | ||
boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > | markers, | ||
const std::vector< Example > & | examples | ||
) |
Adds multiple examples to the index. Each example is first hashed with the hash generator passed to this method. The hashed examples are then appended to the hashed index and the markers array (both also passed to this method), and the same data is appended to the on-disk versions of these two data structures. The method returns a vector of the tokenized examples.
hashGenerator | hash generator to be used to prepare the hash of the example |
T | RAM-based hash index to be appended to |
markers | RAM-based markers array to be appended to |
examples | vector of examples to be added to index |
ConcordiaException |
void ConcordiaIndex::addAllTokenizedExamples | ( | boost::shared_ptr< HashGenerator > | hashGenerator, |
boost::shared_ptr< std::vector< sauchar_t > > | T, | ||
boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > | markers, | ||
const std::vector< TokenizedSentence > & | tokenizedSentences, | ||
const std::vector< SUFFIX_MARKER_TYPE > & | ids | ||
) |
Adds multiple tokenized examples to the index. The examples are appended to the hashed index and the markers array, and the same data is appended to the on-disk versions of these two data structures.
hashGenerator | hash generator to be used to prepare the hash of the example |
T | RAM-based hash index to be appended to |
markers | RAM-based markers array to be appended to |
tokenizedSentences | vector of tokenized sentences to be added |
ids | vector of ids of the sentences to be added |
ConcordiaException |
TokenizedSentence ConcordiaIndex::addExample | ( | boost::shared_ptr< HashGenerator > | hashGenerator, |
boost::shared_ptr< std::vector< sauchar_t > > | T, | ||
boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > | markers, | ||
const Example & | example | ||
) |
Adds an Example to the index. The example is first hashed with the hash generator passed to this method. The hashed example is then appended to the hashed index and the markers array (both also passed to this method), and the same data is appended to the on-disk versions of these two data structures. The method returns a tokenized version of the example.
hashGenerator | hash generator to be used to prepare the hash of the example |
T | RAM-based hash index to be appended to |
markers | RAM-based markers array to be appended to |
example | example to be added to index |
ConcordiaException |
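The flow described for addExample (hash, append to the in-memory structures, append to the on-disk copies, return the result) can be sketched as below. Everything here is an assumption made for illustration: `ToyHashGenerator` and `addToyExample` are hypothetical, the two `std::ostream` arguments stand in for the backup files, each marker is simply the sentence id, and `std::shared_ptr` replaces `boost::shared_ptr`. The real hashing and marker encoding are not shown in this documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical word-to-code mapper standing in for HashGenerator.
class ToyHashGenerator {
public:
    // Splits the sentence on whitespace and assigns each new word the next code.
    std::vector<std::uint8_t> hash(const std::string& sentence) {
        std::vector<std::uint8_t> codes;
        std::istringstream words(sentence);
        std::string word;
        while (words >> word) {
            auto it = codes_.find(word);
            if (it == codes_.end()) {
                if (codes_.size() >= 255) {
                    throw std::runtime_error("too many distinct words");
                }
                it = codes_.emplace(
                    word, static_cast<std::uint8_t>(codes_.size())).first;
            }
            codes.push_back(it->second);
        }
        return codes;
    }

private:
    std::map<std::string, std::uint8_t> codes_;
};

// Appends the hashed sentence to the caller-owned vectors and, in the same
// pass, to the two backup streams. Returns the codes that were appended.
std::vector<std::uint8_t> addToyExample(
    ToyHashGenerator& generator,
    std::shared_ptr<std::vector<std::uint8_t>> T,
    std::shared_ptr<std::vector<std::uint32_t>> markers,
    const std::string& sentence,
    std::uint32_t id,
    std::ostream& indexBackup,
    std::ostream& markersBackup)
{
    const std::vector<std::uint8_t> codes = generator.hash(sentence);
    for (std::size_t offset = 0; offset < codes.size(); ++offset) {
        const std::uint32_t marker = id;  // assumed encoding: the id only
        T->push_back(codes[offset]);
        markers->push_back(marker);
        indexBackup.write(reinterpret_cast<const char*>(&codes[offset]),
                          sizeof(std::uint8_t));
        markersBackup.write(reinterpret_cast<const char*>(&marker),
                            sizeof(marker));
    }
    return codes;
}
```

Writing to memory and to the backup in the same loop keeps the two copies the same length after every successful call, which is the property the description above relies on.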
void ConcordiaIndex::addTokenizedExample | ( | boost::shared_ptr< HashGenerator > | hashGenerator, |
boost::shared_ptr< std::vector< sauchar_t > > | T, | ||
boost::shared_ptr< std::vector< SUFFIX_MARKER_TYPE > > | markers, | ||
const TokenizedSentence & | tokenizedSentence, | ||
const SUFFIX_MARKER_TYPE | id | ||
) |
Adds a tokenized example to the index. The example is appended to the hashed index and the markers array, and the same data is appended to the on-disk versions of these two data structures.
hashGenerator | hash generator to be used to prepare the hash of the example |
T | RAM-based hash index to be appended to |
markers | RAM-based markers array to be appended to |
tokenizedSentence | tokenized sentence to be added |
id | id of the sentence to be added |
ConcordiaException |
boost::shared_ptr< std::vector< saidx_t > > ConcordiaIndex::generateSuffixArray | ( | boost::shared_ptr< std::vector< sauchar_t > > | T | ) |
Generates a suffix array from the hashed index passed in T.
ConcordiaException |
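To illustrate what a suffix array over the hashed index looks like, here is a naive construction that sorts suffix start positions lexicographically. It is a sketch only: `naiveSuffixArray` is a hypothetical helper, `std::uint8_t` and `std::int32_t` stand in for `sauchar_t` and `saidx_t`, and the comparison-sort approach is chosen for clarity rather than speed. The algorithm used by generateSuffixArray itself is not stated in this documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

// Returns the start positions of all suffixes of T, ordered so that the
// suffixes they denote are in ascending lexicographic order.
std::shared_ptr<std::vector<std::int32_t>> naiveSuffixArray(
    const std::shared_ptr<std::vector<std::uint8_t>>& T)
{
    auto SA = std::make_shared<std::vector<std::int32_t>>(T->size());
    std::iota(SA->begin(), SA->end(), 0);  // 0, 1, ..., n-1
    std::sort(SA->begin(), SA->end(),
              [&T](std::int32_t a, std::int32_t b) {
                  // Compare the suffix starting at a with the one at b.
                  return std::lexicographical_compare(
                      T->begin() + a, T->end(),
                      T->begin() + b, T->end());
              });
    return SA;
}
```

For the input codes {1, 0, 1, 0} the suffixes in sorted order start at positions 3, 1, 2, 0. The worst-case cost of this approach is O(n² log n), so it is suitable only for small inputs.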