WELCOME TO THE CONCORDIA HOME PAGE

Concordia: the Roman goddess of agreement. Concordance searcher: a tool for translators who need their translations to "agree" with a single standard.

Concordia is a C++ library for fast text lookup in large corpora. It uses an index stored in RAM, which takes up approximately 600 MB of memory for a corpus of 2 million sentences. The index is based on a suffix array, supplemented by auxiliary data structures.
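To illustrate the suffix array idea in general terms, here is a minimal sketch of substring lookup over a plain character string. It is not Concordia's implementation or API: the function names buildSuffixArray and findAll are hypothetical, the construction is a naive sort rather than a dedicated suffix sorter, and none of the auxiliary data structures are modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: build a suffix array, i.e. the start offsets of all
// suffixes of `text`, sorted lexicographically. Naive construction for
// clarity; a real index would use a dedicated suffix-sorting library.
std::vector<std::size_t> buildSuffixArray(std::string_view text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [text](std::size_t a, std::size_t b) {
        return text.substr(a) < text.substr(b);
    });
    return sa;
}

// Hypothetical helper: return the start offsets (ascending) of every
// occurrence of `pattern` in `text`, using binary search over the suffix array.
std::vector<std::size_t> findAll(std::string_view text,
                                 const std::vector<std::size_t>& sa,
                                 std::string_view pattern) {
    assert(sa.size() == text.size());
    // Compare only the first pattern.size() characters of each suffix, so all
    // suffixes that start with `pattern` form one contiguous range in `sa`.
    auto lo = std::lower_bound(sa.begin(), sa.end(), pattern,
        [text](std::size_t suffix, std::string_view p) {
            return text.substr(suffix, p.size()) < p;
        });
    auto hi = std::upper_bound(lo, sa.end(), pattern,
        [text](std::string_view p, std::size_t suffix) {
            return p < text.substr(suffix, p.size());
        });
    std::vector<std::size_t> hits(lo, hi);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

Once the array is built, each query costs two binary searches, which is why this family of indexes suits workloads with many lookups against a fixed corpus.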

The results are striking: Concordia performs simple substring lookup at a rate of 5000 queries per second on a personal computer, a speed that no other search library can match.

Moreover, Concordia can perform its own "concordia search": for a given input sentence, it retrieves all substring matches covering that sentence.
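As a rough illustration of what "substring matches covering a sentence" can mean, the sketch below splits the input sentence into words and, for each position, looks for the longest run of words that also appears contiguously in some corpus sentence. This is an assumption-laden toy, not Concordia's algorithm or API: the names Fragment, tokenize, commonRun and coveringFragments are hypothetical, tokenization is plain whitespace splitting, and the corpus is scanned by brute force instead of through an index.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical result type: query words [start, start + length) occur
// contiguously in corpus sentence number `sentenceId`.
struct Fragment {
    std::size_t start;
    std::size_t length;
    std::size_t sentenceId;
};

// Split on whitespace (a stand-in for real tokenization).
Tokens tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    Tokens words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Length of the common run of words starting at q[qi] and s[si].
std::size_t commonRun(const Tokens& q, std::size_t qi,
                      const Tokens& s, std::size_t si) {
    std::size_t n = 0;
    while (qi + n < q.size() && si + n < s.size() && q[qi + n] == s[si + n]) ++n;
    return n;
}

// For each query position, find the longest fragment present in the corpus
// and keep it only if it reaches past the words already covered.
std::vector<Fragment> coveringFragments(const std::vector<std::string>& corpus,
                                        const std::string& query) {
    const Tokens q = tokenize(query);
    std::vector<Tokens> sentences;
    for (const auto& s : corpus) sentences.push_back(tokenize(s));

    std::vector<Fragment> result;
    std::size_t coveredUpTo = 0;  // one past the last query word covered so far
    for (std::size_t qi = 0; qi < q.size(); ++qi) {
        Fragment best{qi, 0, 0};
        for (std::size_t id = 0; id < sentences.size(); ++id) {
            for (std::size_t si = 0; si < sentences[id].size(); ++si) {
                const std::size_t n = commonRun(q, qi, sentences[id], si);
                if (n > best.length) best = {qi, n, id};
            }
        }
        if (best.length > 0 && qi + best.length > coveredUpTo) {
            result.push_back(best);
            coveredUpTo = qi + best.length;
        }
    }
    return result;
}
```

With the corpus {"the cat sat on the mat", "a dog sat on the sofa", "the sofa is red"} and the query "my dog sat on the mat today", this returns two overlapping fragments: "dog sat on the" from the second sentence and "sat on the mat" from the first. The words "my" and "today" stay uncovered because they do not occur in the corpus.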

This project now contains a fully functional Concordia search library. In the near future, it will be extended with concordia-server: a lightweight, robust web server providing corpus search functionality.

Publication

If you are planning to use this software in scientific research, please cite the following paper:

R. Jaworski: "Approximate sentence matching and its applications to corpus-based research", The Future of Information Sciences: e-Institutions, Openness, Accessibility and Preservation, pp. 21-30 (keynote paper), 2015.

BibTeX:

@article{jaworski2015approximate,
  title={Approximate sentence matching and its application in corpus-based research},
  author={Jaworski, Rafa{\l}},
  journal={The future of Information Sciences: e-Institutions,
           Openness, Accessibility and Preservation},
  volume={5},
  pages={21--30},
  year={2015},
  publisher={Department of Information and Communication Sciences,
             Faculty of Humanities and Social Sciences, University of Zagreb}
}

Acknowledgements

Concordia makes use of the following Open Source projects:

libdivsufsort - a lightweight suffix-sorting library
PSI-Toolkit - multi-functional NLP toolkit

CONCORDIA

Have I not translated this before?

We are on SourceForge