diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index 9c412782dcc..bfbea0565bc 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -167,6 +167,10 @@ Documentation
   to the analysis package overview.
   (Benson Margulies via Robert Muir - pull request #12)
 
+* LUCENE-5389: Add more guidance in the analysis documentation
+  package overview.
+  (Benson Margulies via Robert Muir - pull request #14)
+
 ======================= Lucene 4.6.0 =======================
 
 New Features
 
diff --git a/lucene/core/src/java/org/apache/lucene/analysis/package.html b/lucene/core/src/java/org/apache/lucene/analysis/package.html
index 5d5b65aa347..c76666d05f8 100644
--- a/lucene/core/src/java/org/apache/lucene/analysis/package.html
+++ b/lucene/core/src/java/org/apache/lucene/analysis/package.html
@@ -179,7 +179,7 @@ and proximity searches (though sentence identification is not provided by Lucene
However an application might invoke Analysis of any text for testing or for any other purpose, something like:
-Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
+Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
 Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
 TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
@@ -476,6 +476,71 @@ and proximity searches (though sentence identification is not provided by Lucene
+More Requirements for Analysis Component Classes
+Due to the historical development of the API, there are some perhaps
+less than obvious requirements for implementing analysis component
+classes.
+
+Token Stream Lifetime
+The code fragment of the analysis workflow
+protocol above shows a token stream being obtained, used, and then
+left for garbage collection. However, that does not mean that the
+components of that token stream will, in fact, be discarded. The
+default is just the opposite: {@link org.apache.lucene.analysis.Analyzer}
+applies a reuse strategy to the tokenizer and the token filters,
+reusing them across inputs. For each new input, it calls
+{@link org.apache.lucene.analysis.Tokenizer#setReader(java.io.Reader)}
+to set the input. Your components must be prepared for this scenario,
+as described below.
+
+Tokenizer
+
+Your tokenizer's end() implementation must call
+super.end(). It must set a correct final offset into
+the offset attribute, and finish up any other attributes to reflect
+the end of the stream.
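As an illustration of the end() contract just described, here is a minimal sketch of a tokenizer that emits its whole input as a single token and records the final offset. The class name, the fixed-size buffer, and the single-token behavior are hypothetical, for illustration only; the sketch assumes the Lucene 4.x Tokenizer API (constructor taking a Reader, protected `input` field, `correctOffset`).

```java
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical tokenizer showing the end() contract: call super.end(),
// then set the final offset so consumers see the true end of input.
public final class SingleChunkTokenizer extends Tokenizer {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private int finalOffset = 0;
  private boolean done = false;

  public SingleChunkTokenizer(Reader input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (done) {
      return false;
    }
    done = true;
    clearAttributes();
    // Read the entire input as one token (kept deliberately simple).
    char[] buffer = new char[256];
    int length = 0;
    int read;
    while ((read = input.read(buffer, length, buffer.length - length)) != -1) {
      length += read;
      if (length == buffer.length) break; // sketch: ignore longer inputs
    }
    termAtt.copyBuffer(buffer, 0, length);
    offsetAtt.setOffset(correctOffset(0), correctOffset(length));
    finalOffset = correctOffset(length);
    return length > 0;
  }

  @Override
  public void end() throws IOException {
    super.end(); // required by the API contract
    // The final offset reflects the end of the stream.
    offsetAtt.setOffset(finalOffset, finalOffset);
  }

  @Override
  public void reset() throws IOException {
    super.reset(); // required; also clear per-stream state for reuse
    done = false;
    finalOffset = 0;
  }
}
```

Note that reset() clears the per-stream state, which is what makes the instance safe for the reuse strategy described above.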
+
+public class ForwardingTokenizer extends Tokenizer {
+   private Tokenizer delegate;
+   ...
+   {@literal @Override}
+   public void reset() throws IOException {
+      super.reset();
+      delegate.setReader(this.input);
+      delegate.reset();
+   }
+}
+
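To see the reuse strategy in action, a tokenizer can be driven through an Analyzer: the components are created once, and for each new input the analyzer calls setReader() on the cached tokenizer before the consumer calls reset(). This is a sketch assuming the Lucene 4.x API (Version-taking WhitespaceTokenizer constructor, createComponents(String, Reader)); the class name, field name, and sample texts are arbitrary.

```java
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReuseDemo {
  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Called once per thread/field; the same tokenizer instance is
        // then reused for every subsequent input.
        return new TokenStreamComponents(new WhitespaceTokenizer(Version.LUCENE_46, reader));
      }
    };
    for (String text : new String[] {"first input", "second input"}) {
      // The analyzer calls setReader() on the cached tokenizer internally.
      TokenStream ts = analyzer.tokenStream("myfield", text);
      CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
      try {
        ts.reset(); // resets the (reused) stream to the beginning
        while (ts.incrementToken()) {
          System.out.println(termAtt.toString());
        }
        ts.end();
      } finally {
        ts.close();
      }
    }
  }
}
```

Both iterations of the loop go through the same Tokenizer object, which is why overrides of reset() (as in ForwardingTokenizer above) must re-establish any delegate or per-stream state.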
The lucene-test-framework component defines