Concordia Class Reference

#include <concordia.hpp>

Public Member Functions

 Concordia ()
 
 Concordia (const std::string &indexPath, const std::string &configFilePath) throw (ConcordiaException)
 
virtual ~Concordia ()
 
std::string & getVersion ()
 
TokenizedSentence tokenize (const std::string &sentence, bool byWhitespace=false, bool generateCodes=true) throw (ConcordiaException)
 
std::vector< TokenizedSentence > tokenizeAll (const std::vector< std::string > &sentences, bool byWhitespace=false, bool generateCodes=true) throw (ConcordiaException)
 
TokenizedSentence addExample (const Example &example) throw (ConcordiaException)
 
void addTokenizedExample (const TokenizedSentence &tokenizedSentence, const SUFFIX_MARKER_TYPE id) throw (ConcordiaException)
 
void addAllTokenizedExamples (const std::vector< TokenizedSentence > &tokenizedSentences, const std::vector< SUFFIX_MARKER_TYPE > &ids) throw (ConcordiaException)
 
std::vector< TokenizedSentence > addAllExamples (const std::vector< Example > &examples) throw (ConcordiaException)
 
MatchedPatternFragment simpleSearch (const std::string &pattern, bool byWhitespace=false) throw (ConcordiaException)
 
MatchedPatternFragment lexiconSearch (const std::string &pattern, bool byWhitespace=false) throw (ConcordiaException)
 
std::vector< AnubisSearchResult > anubisSearch (const std::string &pattern) throw (ConcordiaException)
 
boost::shared_ptr< ConcordiaSearchResult > concordiaSearch (const std::string &pattern, bool byWhitespace=false) throw (ConcordiaException)
 
void loadRAMIndexFromDisk () throw (ConcordiaException)
 
void refreshSAfromRAM () throw (ConcordiaException)
 
void clearIndex () throw (ConcordiaException)
 

Detailed Description

The Concordia class is the main access point to the library. It holds references to three of the four main data structures used by Concordia: the hashed index, the markers array and the suffix array. The fourth, the word map, is maintained by the class HashGenerator. Concordia has references to:

Whenever it is necessary, the data structures and tools held by Concordia are passed by smart pointers to methods which carry out specific functionalities.
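
The sharing pattern described above can be sketched as follows. This is only an illustration under stated assumptions: OwnerSketch, SearcherSketch and HashedIndexSketch are hypothetical names, not classes of the library, and the sketch uses std::shared_ptr where the library's own signatures use boost::shared_ptr.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for one of the shared data structures.
typedef std::vector<int> HashedIndexSketch;

// Hypothetical tool: it owns no data and works on whatever it is handed.
class SearcherSketch {
public:
    // Counts occurrences of a code in the shared index without copying the index.
    std::size_t count(const std::shared_ptr<const HashedIndexSketch> &index,
                      int code) const {
        return static_cast<std::size_t>(
            std::count(index->begin(), index->end(), code));
    }
};

// Hypothetical access point: holds the data and passes it on by smart pointer.
class OwnerSketch {
public:
    OwnerSketch() : index_(std::make_shared<HashedIndexSketch>()) {}

    void add(int code) { index_->push_back(code); }

    std::size_t countOccurrences(int code) const {
        return searcher_.count(index_, code);
    }

private:
    std::shared_ptr<HashedIndexSketch> index_;
    SearcherSketch searcher_;
};
```

Passing the pointer rather than the container keeps a single copy of the data alive for as long as any component still refers to it.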

Constructor & Destructor Documentation

Concordia::Concordia ( )

Parameterless constructor.

Concordia::Concordia (const std::string &indexPath, const std::string &configFilePath) throw (ConcordiaException)   [explicit]

Constructor.

Parameters
indexPath       path to the index directory
configFilePath  path to the Concordia configuration file
Exceptions
ConcordiaException

Concordia::~Concordia ()   [virtual]

Destructor.

Member Function Documentation

std::vector< TokenizedSentence > Concordia::addAllExamples (const std::vector< Example > &examples) throw (ConcordiaException)

Adds multiple examples to the index.

Parameters
examples  vector of examples to be added
Returns
vector of tokenized sentence objects, containing information about original word positions
Exceptions
ConcordiaException

void Concordia::addAllTokenizedExamples (const std::vector< TokenizedSentence > &tokenizedSentences, const std::vector< SUFFIX_MARKER_TYPE > &ids) throw (ConcordiaException)

Adds multiple tokenized examples to the index.

Parameters
tokenizedSentences  vector of tokenized sentences to be added
ids                 vector of ids of the sentences to be added
Exceptions
ConcordiaException

TokenizedSentence Concordia::addExample (const Example &example) throw (ConcordiaException)

Adds an Example to the index.

Parameters
example  example to be added
Returns
tokenized sentence object, containing information about original word positions
Exceptions
ConcordiaException

void Concordia::addTokenizedExample (const TokenizedSentence &tokenizedSentence, const SUFFIX_MARKER_TYPE id) throw (ConcordiaException)

Adds a tokenized example to the index.

Parameters
tokenizedSentence  tokenized sentence to be added
id                 id of the sentence to be added
Exceptions
ConcordiaException

std::vector< AnubisSearchResult > Concordia::anubisSearch (const std::string &pattern) throw (ConcordiaException)
Deprecated:
Finds the examples in the index whose resemblance to the pattern is maximal. This method may be very slow; use concordiaSearch instead.
Parameters
pattern  pattern to be searched in the index
Returns
vector of anubis results
Exceptions
ConcordiaException

void Concordia::clearIndex () throw (ConcordiaException)

Clears all the examples from the index.

Exceptions
ConcordiaException

boost::shared_ptr< ConcordiaSearchResult > Concordia::concordiaSearch (const std::string &pattern, bool byWhitespace=false) throw (ConcordiaException)

Performs concordia lookup on the index. This is a unique library functionality, designed to facilitate Computer-Aided Translation. For more info see Concordia searching.

Parameters
pattern       pattern to be searched in the index
byWhitespace  whether to tokenize the pattern by whitespace
Returns
concordia result
Exceptions
ConcordiaException

std::string & Concordia::getVersion ()

Getter for version.

Returns
version of the Concordia library.

MatchedPatternFragment Concordia::lexiconSearch (const std::string &pattern, bool byWhitespace=false) throw (ConcordiaException)

Performs a search intended for lexicons, in the scenario where Concordia is fed a lexicon (glossary) instead of a TM (translation memory). Lexicon search behaves like simple search in that the match must cover the whole pattern; in addition, the match must be the whole example source.

Parameters
pattern       pattern to be searched in the index
byWhitespace  whether to tokenize the pattern by whitespace
Returns
matched pattern fragment containing a vector of occurrences
Exceptions
ConcordiaException
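
The difference between the two matching conditions can be illustrated with a small sketch. It assumes that a sentence is represented as a plain vector of token strings; the Tokens typedef and both function names are hypothetical, and the naive comparison here is not how the library searches its index.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tokenized sentence.
typedef std::vector<std::string> Tokens;

// Simple-search condition: the whole pattern occurs somewhere in the source.
bool matchesSimple(const Tokens &source, const Tokens &pattern) {
    if (pattern.empty()) return false;
    return std::search(source.begin(), source.end(),
                       pattern.begin(), pattern.end()) != source.end();
}

// Lexicon-search condition: the pattern is additionally the whole source.
bool matchesLexicon(const Tokens &source, const Tokens &pattern) {
    return !pattern.empty() && source == pattern;
}
```

With a glossary entry "enhanced suffix array" and the pattern "suffix array", the first condition holds and the second does not; with the entry "suffix array" both hold.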
void Concordia::loadRAMIndexFromDisk () throw (ConcordiaException)

Loads the index files stored on HDD into RAM and generates the suffix array from the RAM-stored data structures. For more info see Concept of HDD and RAM index.

Exceptions
ConcordiaException


void Concordia::refreshSAfromRAM () throw (ConcordiaException)

Generates the suffix array from the RAM-stored data structures. For more info see Concept of HDD and RAM index.

Exceptions
ConcordiaException

MatchedPatternFragment Concordia::simpleSearch (const std::string &pattern, bool byWhitespace=false) throw (ConcordiaException)

Performs a simple substring lookup on the index. For more info see Simple substring lookup.

Parameters
pattern       pattern to be searched in the index
byWhitespace  whether to tokenize the pattern by whitespace
Returns
matched pattern fragment containing a vector of occurrences
Exceptions
ConcordiaException
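
For readers unfamiliar with suffix arrays, the following sketch shows how a substring lookup over a sequence of token codes can be answered with one. It is a generic textbook technique written under assumptions: the Codes typedef and both function names are hypothetical, the construction is a plain comparison sort, and nothing here is taken from the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical representation: one integer code per token.
typedef std::vector<int> Codes;

// Builds a suffix array: start positions of all suffixes, sorted lexicographically.
std::vector<std::size_t> buildSuffixArraySketch(const Codes &text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return std::lexicographical_compare(text.begin() + a, text.end(),
                                            text.begin() + b, text.end());
    });
    return sa;
}

// Returns the start positions of all occurrences of pattern, in ascending order.
std::vector<std::size_t> findOccurrencesSketch(const Codes &text,
                                               const std::vector<std::size_t> &sa,
                                               const Codes &pattern) {
    if (pattern.empty()) return std::vector<std::size_t>();
    const std::size_t m = pattern.size();
    // End of the first m codes of the suffix starting at pos (or end of text).
    auto prefixEnd = [&](std::size_t pos) {
        return text.begin() + std::min(pos + m, text.size());
    };
    // First suffix whose m-code prefix is not smaller than the pattern.
    auto lo = std::lower_bound(sa.begin(), sa.end(), pattern,
        [&](std::size_t pos, const Codes &p) {
            return std::lexicographical_compare(text.begin() + pos, prefixEnd(pos),
                                                p.begin(), p.end());
        });
    // First suffix whose m-code prefix is greater than the pattern.
    auto hi = std::upper_bound(lo, sa.end(), pattern,
        [&](const Codes &p, std::size_t pos) {
            return std::lexicographical_compare(p.begin(), p.end(),
                                                text.begin() + pos, prefixEnd(pos));
        });
    std::vector<std::size_t> hits(lo, hi);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

All suffixes that begin with the pattern are adjacent in the sorted array, so two binary searches delimit the complete set of occurrences.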
TokenizedSentence Concordia::tokenize (const std::string &sentence, bool byWhitespace=false, bool generateCodes=true) throw (ConcordiaException)

Tokenizes the given sentence.

Parameters
sentence       sentence to be tokenized
byWhitespace   whether to tokenize the sentence by whitespace
generateCodes  whether to generate codes for tokens using WordMap
Returns
tokenized sentence object, containing information about original word positions
Exceptions
ConcordiaException
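
To make "information about original word positions" concrete, here is a minimal sketch of whitespace tokenization that records each token's offsets in the input. TokenSketch and tokenizeByWhitespaceSketch are hypothetical names; the actual TokenizedSentence class and the library's tokenization rules are not shown in this reference and may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one token; not the library's TokenizedSentence.
struct TokenSketch {
    std::string text;   // token text
    std::size_t start;  // offset of the first character in the original sentence
    std::size_t end;    // offset one past the last character
};

// Splits a sentence on whitespace and records where each token came from.
std::vector<TokenSketch> tokenizeByWhitespaceSketch(const std::string &sentence) {
    std::vector<TokenSketch> tokens;
    const std::size_t n = sentence.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip the whitespace that precedes the next token.
        while (i < n && std::isspace(static_cast<unsigned char>(sentence[i]))) ++i;
        if (i >= n) break;
        const std::size_t start = i;
        // Consume characters up to the next whitespace.
        while (i < n && !std::isspace(static_cast<unsigned char>(sentence[i]))) ++i;
        tokens.push_back({sentence.substr(start, i - start), start, i});
    }
    return tokens;
}
```

Keeping the offsets lets a caller map a match on tokens back to a character range in the original sentence.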


std::vector< TokenizedSentence > Concordia::tokenizeAll (const std::vector< std::string > &sentences, bool byWhitespace=false, bool generateCodes=true) throw (ConcordiaException)

Tokenizes all the given sentences.

Parameters
sentences      vector of sentences to be tokenized
byWhitespace   whether to tokenize the sentences by whitespace
generateCodes  whether to generate codes for tokens using WordMap
Returns
vector of tokenized sentence objects
Exceptions
ConcordiaException

The documentation for this class was generated from the following files: