README.txt

Apache Solr Language Identifier


Introduction
------------
This module is intended to be used while indexing documents.
It is implemented as an UpdateProcessor to be placed in an UpdateChain.
Its purpose is to identify the language of a document and tag the document with a language code.
The module can optionally map field names to their language-specific counterparts,
e.g. if the input field is "title" and the language is detected as "en", it maps to "title_en".
Language may be detected globally for the document, and/or individually per field.
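
As a rough illustration of the field name mapping described above, the
following Java sketch appends the detected language code to each field name.
It is not the module's actual code: the class LangFieldMapper and its methods
are hypothetical names, and the language code is passed in directly instead
of being detected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the langid module.
public class LangFieldMapper {

    // Build the language-specific field name, e.g. ("title", "en") -> "title_en".
    static String mapFieldName(String field, String lang) {
        return field + "_" + lang;
    }

    // Rename every field of a document using one document-level language code.
    // Field order is preserved; values are copied unchanged.
    static Map<String, Object> mapDocument(Map<String, Object> doc, String lang) {
        Map<String, Object> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            mapped.put(mapFieldName(entry.getKey(), lang), entry.getValue());
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Hello world");
        doc.put("body", "Some English text");
        // Assumes the language was already detected as "en".
        System.out.println(mapDocument(doc, "en"));
        // prints {title_en=Hello world, body_en=Some English text}
    }
}
```

Per-field detection would follow the same pattern, with a separate language
code supplied for each field instead of a single one for the whole document.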

The module currently relies on Tika's language identification capabilities.

Getting Started
---------------
Please refer to the module documentation at http://wiki.apache.org/solr/LanguageDetection

Dependencies
------------
This contrib depends on Tika Core, which is part of the extraction contrib.
The easiest approach is thus to install the extraction contrib first and then langid.
Alternatively you can include tika-core manually on your classpath.