lucene/modules/analysis
Dawid Weiss 8c2e3cef8f LUCENE-3820: limiting the amount of input for pattern matching to go past exponential time patterns, even if they happen. A nice catch from Mike too -- un-ignore testNastyPattern and look at processing time go wild with each additional input character...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294797 13f79535-47bb-0310-9956-ffa450edef68
2012-02-28 19:26:05 +00:00
common LUCENE-3820: limiting the amount of input for pattern matching to go past exponential time patterns, even if they happen. A nice catch from Mike too -- un-ignore testNastyPattern and look at processing time go wild with each additional input character... 2012-02-28 19:26:05 +00:00
icu LUCENE-3717: add better offsets testing to BaseTokenStreamTestCase, fix offsets bugs in ThaiWordFilter and ICUTokenizer 2012-01-23 00:08:52 +00:00
kuromoji SOLR-3115: improve japanese stopwords.txt description 2012-02-09 22:17:44 +00:00
morfologik LUCENE-3774: Optimized and streamlined license and notice file validation 2012-02-13 14:12:59 +00:00
phonetic LUCENE-3720: add warning+experimental and disable test 2012-01-26 18:26:07 +00:00
smartcn LUCENE-3765: Trappy behavior with StopFilter/ignoreCase 2012-02-09 19:59:50 +00:00
stempel LUCENE-3765: Trappy behavior with StopFilter/ignoreCase 2012-02-09 19:59:50 +00:00
uima [LUCENE-3731] - refactored analyzeText method to initializeIterator and made it abstract inside BaseUIMATokenizer 2012-02-25 14:14:00 +00:00
CHANGES.txt LUCENE-3820: Wrong trailing index calculation in PatternReplaceCharFilter. 2012-02-27 13:13:10 +00:00
LICENSE.txt LUCENE-2341: integrating morfologik (Polish stemming/ morphosyntactic dictionary) as an analysis module. 2011-06-30 19:12:54 +00:00
NOTICE.txt LUCENE-3305: add Kuromoji Japanese morphological analyzer 2012-01-12 20:10:48 +00:00
README.txt [LUCENE-3731] - Creating the analysis-uima module for UIMA based tokenizers/analyzers 2012-02-14 22:13:34 +00:00
build.xml [LUCENE-3731] - Creating the analysis-uima module for UIMA based tokenizers/analyzers 2012-02-14 22:13:34 +00:00

README.txt

Analysis README file

INTRODUCTION

The Analysis Module provides text analysis capabilities, such as
tokenization, token filtering, and stemming, to Lucene and Solr
applications.

The Lucene web site is at:
  http://lucene.apache.org/

Please join the Lucene-User mailing list by sending a message to:
  java-user-subscribe@lucene.apache.org

FILES

lucene-analyzers-common-XX.jar
  The primary analysis module library, containing general-purpose analysis
  components and support for various languages.

lucene-analyzers-icu-XX.jar
  An add-on analysis library that provides improved Unicode support via
  International Components for Unicode (ICU). Note: this module depends on
  the ICU4J jar file (version 4.6.0 or later).

lucene-analyzers-kuromoji-XX.jar
  An add-on analysis library that provides morphological analysis for
  Japanese.

lucene-analyzers-morfologik-XX.jar
  An add-on analysis library that provides Polish stemming via the
  Morfologik stemming library.

lucene-analyzers-phonetic-XX.jar
  An add-on analysis library that provides phonetic encoders via Apache
  Commons-Codec. Note: this module depends on the commons-codec jar
  file (version 1.4 or later).

lucene-analyzers-smartcn-XX.jar
  An add-on analysis library that provides word segmentation for Simplified
  Chinese.

lucene-analyzers-stempel-XX.jar
  An add-on analysis library that contains a universal algorithmic stemmer,
  including tables for the Polish language.

lucene-analyzers-uima-XX.jar
  An add-on analysis library that contains tokenizers and analyzers that
  use annotations extracted by Apache UIMA to identify tokens, token types,
  and similar attributes.

common/src/java
icu/src/java
kuromoji/src/java
morfologik/src/java
phonetic/src/java
smartcn/src/java
stempel/src/java
uima/src/java
  The source code for the libraries.

common/src/test
icu/src/test
kuromoji/src/test
morfologik/src/test
phonetic/src/test
smartcn/src/test
stempel/src/test
uima/src/test
  Unit tests for the libraries.