README.txt

Apache Solr Language Identifier


Introduction
------------
This module is intended to be used while indexing documents.
It is implemented as an UpdateProcessor to be placed in an UpdateChain.
Its purpose is to identify the language of a document and tag the document with a language code.
The module can optionally map field names to their language-specific counterparts,
e.g. if the input field is "title" and the language is detected as "en", the field is mapped to "title_en".
Language may be detected globally for the document, and/or individually per field.
Language detector implementations are pluggable.
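
As a rough illustration of the setup described above, an update chain in
solrconfig.xml might look like the sketch below. The chain name "langid" and the
field names (title, body, language_s) are hypothetical, and the parameter names
are given as recalled from the module documentation, so verify them against the
wiki page before use.

```xml
<!-- solrconfig.xml (sketch only): an update chain that runs language identification -->
<updateRequestProcessorChain name="langid">
  <!-- LangDetect-based detector; a Tika-based factory is the pluggable alternative -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- Input fields to detect the language from (hypothetical field names) -->
    <str name="langid.fl">title,body</str>
    <!-- Field that receives the detected language code (hypothetical field name) -->
    <str name="langid.langField">language_s</str>
    <!-- Optionally map fields to language-specific names, e.g. "title" to "title_en" -->
    <bool name="langid.map">true</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The language identification processor is listed before RunUpdateProcessorFactory
so that the language tag and any mapped field names are in place before the
document is actually indexed.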

Getting Started
---------------
Please refer to the module documentation at http://wiki.apache.org/solr/LanguageDetection

Dependencies
------------
The Tika detector depends on Tika Core (which is part of the extraction contrib).
The LangDetect detector depends on the LangDetect library.