From 0086a6e644bd1f46a74a72cd8f02c41e5968e39e Mon Sep 17 00:00:00 2001 From: Uwe Schindler Date: Tue, 6 May 2014 22:24:58 +0000 Subject: [PATCH] LUCENE-5640: Refactor Token, add new PackedTokenAttributeImpl, make use of Java 7 MethodHandles in DEFAULT_ATTRIBUTE_FACTORY git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1592914 13f79535-47bb-0310-9956-ffa450edef68 --- lucene/CHANGES.txt | 15 +- .../analysis/core/KeywordTokenizer.java | 1 + .../core/KeywordTokenizerFactory.java | 2 +- .../lucene/analysis/core/LetterTokenizer.java | 3 +- .../analysis/core/LetterTokenizerFactory.java | 2 +- .../analysis/core/LowerCaseTokenizer.java | 3 +- .../core/LowerCaseTokenizerFactory.java | 2 +- .../analysis/core/WhitespaceTokenizer.java | 3 +- .../core/WhitespaceTokenizerFactory.java | 2 +- .../miscellaneous/SingleTokenTokenStream.java | 2 + .../analysis/ngram/EdgeNGramTokenizer.java | 3 +- .../ngram/EdgeNGramTokenizerFactory.java | 2 +- .../ngram/Lucene43NGramTokenizer.java | 3 +- .../lucene/analysis/ngram/NGramTokenizer.java | 3 +- .../analysis/ngram/NGramTokenizerFactory.java | 2 +- .../analysis/path/PathHierarchyTokenizer.java | 5 +- .../path/PathHierarchyTokenizerFactory.java | 2 +- .../path/ReversePathHierarchyTokenizer.java | 5 +- .../analysis/pattern/PatternTokenizer.java | 4 +- .../pattern/PatternTokenizerFactory.java | 2 +- .../analysis/standard/ClassicTokenizer.java | 3 +- .../standard/ClassicTokenizerFactory.java | 2 +- .../analysis/standard/StandardTokenizer.java | 3 +- .../standard/StandardTokenizerFactory.java | 2 +- .../standard/UAX29URLEmailTokenizer.java | 2 +- .../UAX29URLEmailTokenizerFactory.java | 2 +- .../lucene/analysis/th/ThaiTokenizer.java | 4 +- .../analysis/th/ThaiTokenizerFactory.java | 4 +- .../lucene/analysis/util/CharTokenizer.java | 1 + .../util/SegmentingTokenizerBase.java | 5 +- .../analysis/util/TokenizerFactory.java | 7 +- .../wikipedia/WikipediaTokenizer.java | 3 +- .../wikipedia/WikipediaTokenizerFactory.java | 2 +- .../collation/CollationAttributeFactory.java | 24 +- .../lucene/analysis/core/TestFactories.java | 2 +- .../analysis/core/TestRandomChains.java | 2 +- .../icu/segmentation/ICUTokenizer.java | 4 +- .../icu/segmentation/ICUTokenizerFactory.java | 2 +- .../ICUCollationAttributeFactory.java | 25 +- .../lucene/analysis/ja/JapaneseTokenizer.java | 4 +- .../analysis/ja/JapaneseTokenizerFactory.java | 2 +- .../cn/smart/HMMChineseTokenizer.java | 4 +- .../cn/smart/HMMChineseTokenizerFactory.java | 2 +- .../analysis/cn/smart/SentenceTokenizer.java | 1 + .../SmartChineseSentenceTokenizerFactory.java | 2 +- .../analysis/uima/BaseUIMATokenizer.java | 1 + .../uima/UIMAAnnotationsTokenizer.java | 5 +- .../uima/UIMAAnnotationsTokenizerFactory.java | 2 +- .../UIMATypeAwareAnnotationsTokenizer.java | 5 +- ...ATypeAwareAnnotationsTokenizerFactory.java | 2 +- .../lucene/analysis/NumericTokenStream.java | 3 +- .../org/apache/lucene/analysis/Token.java | 525 ++---------------- .../apache/lucene/analysis/TokenStream.java | 8 +- .../org/apache/lucene/analysis/Tokenizer.java | 1 + .../org/apache/lucene/analysis/package.html | 2 +- .../PackedTokenAttributeImpl.java | 206 +++++++ .../apache/lucene/util/AttributeFactory.java | 202 +++++++ .../org/apache/lucene/util/AttributeImpl.java | 11 +- .../apache/lucene/util/AttributeSource.java | 82 +-- .../org/apache/lucene/analysis/TestToken.java | 170 +----- .../TestCharTermAttributeImpl.java | 25 +- .../TestPackedTokenAttributeImpl.java | 96 ++++ .../org/apache/lucene/index/Test2BTerms.java | 4 +- 
.../spatial/prefix/CellTokenStream.java | 8 +- .../analyzing/AnalyzingSuggesterTest.java | 8 +- .../analysis/BaseTokenStreamTestCase.java | 42 +- .../lucene/analysis/MockBytesAnalyzer.java | 7 +- .../analysis/MockBytesAttributeFactory.java | 40 -- .../apache/lucene/analysis/MockTokenizer.java | 4 +- .../analysis/MockUTF16TermAttributeImpl.java | 10 +- .../apache/solr/schema/PreAnalyzedField.java | 1 + .../solr/analysis/MockTokenizerFactory.java | 2 +- 72 files changed, 769 insertions(+), 883 deletions(-) create mode 100644 lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/PackedTokenAttributeImpl.java create mode 100644 lucene/core/src/java/org/apache/lucene/util/AttributeFactory.java create mode 100644 lucene/core/src/test/org/apache/lucene/analysis/tokenattributes/TestPackedTokenAttributeImpl.java delete mode 100644 lucene/test-framework/src/java/org/apache/lucene/analysis/MockBytesAttributeFactory.java diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt index 60ce042330c..82776189756 100644 --- a/lucene/CHANGES.txt +++ b/lucene/CHANGES.txt @@ -100,6 +100,10 @@ Changes in Backwards Compatibility Policy can be used by custom fieldtypes, which don't use the Analyzer, but implement their own TokenStream. (Uwe Schindler, Robert Muir) +* LUCENE-5640: AttributeSource.AttributeFactory was moved to a + top-level class: org.apache.lucene.util.AttributeFactory + (Uwe Schindler, Robert Muir) + API Changes * LUCENE-5582: Deprecate IndexOutput.length (just use @@ -126,6 +130,9 @@ API Changes * LUCENE-5633: Change NoMergePolicy to a singleton with no distinction between compound and non-compound types. (Shai Erera) +* LUCENE-5640: The Token class was deprecated. Since Lucene 2.9, TokenStreams + are using Attributes, Token is no longer used. (Uwe Schindler, Robert Muir) + Optimizations * LUCENE-5603: hunspell stemmer more efficiently strips prefixes @@ -140,9 +147,11 @@ Optimizations * LUCENE-5634: IndexWriter reuses TokenStream instances for String and Numeric fields by default. (Uwe Schindler, Shay Banon, Mike McCandless, Robert Muir) -* LUCENE-5638: TokenStream uses a more performant AttributeFactory by default, - that packs the core attributes into one impl, for faster clearAttributes(), - saveState(), and restoreState(). (Uwe Schindler, Robert Muir) +* LUCENE-5638, LUCENE-5640: TokenStream uses a more performant AttributeFactory + by default, that packs the core attributes into one implementation + (PackedTokenAttributeImpl), for faster clearAttributes(), saveState(), and + restoreState(). In addition, AttributeFactory uses Java 7 MethodHandles for + instantiating Attribute implementations. 
(Uwe Schindler, Robert Muir) * LUCENE-5609: Changed the default NumericField precisionStep from 4 to 8 (for int/float) and 16 (for long/double), for faster indexing diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizer.java index 50014e655a0..876a6160f73 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizer.java @@ -23,6 +23,7 @@ import java.io.Reader; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; /** diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizerFactory.java index 1e31c7f2866..c29bcd50992 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/KeywordTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.core; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.io.Reader; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizer.java index b559b580ea3..e0437b3d467 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizer.java @@ -19,6 +19,7 @@ package org.apache.lucene.analysis.core; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.util.CharTokenizer; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.Version; /** @@ -55,7 +56,7 @@ public class LetterTokenizer extends CharTokenizer { /** * Construct a new LetterTokenizer using a given - * {@link org.apache.lucene.util.AttributeSource.AttributeFactory}. + * {@link org.apache.lucene.util.AttributeFactory}. 
* * @param matchVersion * Lucene version to match See {@link above} diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizerFactory.java index 7cc617ae577..4a06f3127d8 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LetterTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.core; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizer.java index a1645057ae8..d61e1a938d9 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizer.java @@ -21,6 +21,7 @@ import java.io.Reader; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.util.CharTokenizer; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; import org.apache.lucene.util.Version; @@ -60,7 +61,7 @@ public final class LowerCaseTokenizer extends LetterTokenizer { /** * Construct a new LowerCaseTokenizer using a given - * {@link org.apache.lucene.util.AttributeSource.AttributeFactory}. + * {@link org.apache.lucene.util.AttributeFactory}. * * @param matchVersion * Lucene version to match See {@link above} diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizerFactory.java index 2d9cf17d018..4af9a10484c 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseTokenizerFactory.java @@ -20,7 +20,7 @@ package org.apache.lucene.analysis.core; import org.apache.lucene.analysis.util.AbstractAnalysisFactory; import org.apache.lucene.analysis.util.MultiTermAwareComponent; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.util.HashMap; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizer.java index a8c9b1de267..354322c444d 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizer.java @@ -21,6 +21,7 @@ import java.io.Reader; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.util.CharTokenizer; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; import org.apache.lucene.util.Version; @@ -50,7 +51,7 @@ public final class WhitespaceTokenizer extends CharTokenizer { /** * Construct a new WhitespaceTokenizer using a given - * {@link org.apache.lucene.util.AttributeSource.AttributeFactory}. 
+ * {@link org.apache.lucene.util.AttributeFactory}. * * @param * matchVersion Lucene version to match See diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizerFactory.java index 9a75c800051..e23ee869665 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/WhitespaceTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.core; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.io.Reader; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/SingleTokenTokenStream.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/SingleTokenTokenStream.java index 0bdc357af4f..bcf182ecb95 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/SingleTokenTokenStream.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/SingleTokenTokenStream.java @@ -24,7 +24,9 @@ import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; /** * A {@link TokenStream} containing a single token. + * @deprecated Do not use this anymore! */ +@Deprecated public final class SingleTokenTokenStream extends TokenStream { private boolean exhausted = false; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.java index f14591da7a3..8b4d42fc88d 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.java @@ -20,6 +20,7 @@ package org.apache.lucene.analysis.ngram; import java.io.Reader; import org.apache.lucene.analysis.Tokenizer; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.Version; /** @@ -49,7 +50,7 @@ public class EdgeNGramTokenizer extends NGramTokenizer { * Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range * * @param version the Lucene match version - * @param factory {@link org.apache.lucene.util.AttributeSource.AttributeFactory} to use + * @param factory {@link org.apache.lucene.util.AttributeFactory} to use * @param minGram the smallest n-gram to generate * @param maxGram the largest n-gram to generate */ diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizerFactory.java index cf1f30e89b7..2990513f597 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.ngram; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.io.Reader; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/Lucene43NGramTokenizer.java 
b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/Lucene43NGramTokenizer.java index 1eb53399cd3..fa9fcb0caec 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/Lucene43NGramTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/Lucene43NGramTokenizer.java @@ -23,6 +23,7 @@ import java.io.Reader; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; +import org.apache.lucene.util.AttributeFactory; /** * Old broken version of {@link NGramTokenizer}. @@ -54,7 +55,7 @@ public final class Lucene43NGramTokenizer extends Tokenizer { /** * Creates NGramTokenizer with given min and max n-grams. - * @param factory {@link org.apache.lucene.util.AttributeSource.AttributeFactory} to use + * @param factory {@link org.apache.lucene.util.AttributeFactory} to use * @param minGram the smallest n-gram to generate * @param maxGram the largest n-gram to generate */ diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizer.java index 34f44a69072..72c943b1ef9 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizer.java @@ -26,6 +26,7 @@ import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute; import org.apache.lucene.analysis.util.CharacterUtils; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.Version; /** @@ -99,7 +100,7 @@ public class NGramTokenizer extends Tokenizer { /** * Creates NGramTokenizer with given min and max n-grams. 
* @param version the lucene compatibility version - * @param factory {@link org.apache.lucene.util.AttributeSource.AttributeFactory} to use + * @param factory {@link org.apache.lucene.util.AttributeFactory} to use * @param minGram the smallest n-gram to generate * @param maxGram the largest n-gram to generate */ diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizerFactory.java index 14bba0f2d51..7aa4a502cdb 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizerFactory.java @@ -20,7 +20,7 @@ package org.apache.lucene.analysis.ngram; import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.Version; import java.io.Reader; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizer.java index 4c3fc30368a..d61632187f3 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizer.java @@ -17,13 +17,12 @@ package org.apache.lucene.analysis.path; */ import java.io.IOException; -import java.io.Reader; -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; +import org.apache.lucene.util.AttributeFactory; /** * Tokenizer for path-like hierarchies. @@ -69,7 +68,7 @@ public class PathHierarchyTokenizer extends Tokenizer { } public PathHierarchyTokenizer(int bufferSize, char delimiter, char replacement, int skip) { - this(Token.TOKEN_ATTRIBUTE_FACTORY, bufferSize, delimiter, replacement, skip); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY, bufferSize, delimiter, replacement, skip); } public PathHierarchyTokenizer diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.java index 646d2c92886..17c834acca0 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.java @@ -21,7 +21,7 @@ import java.util.Map; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; /** * Factory for {@link PathHierarchyTokenizer}. 
diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/ReversePathHierarchyTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/ReversePathHierarchyTokenizer.java index 0f58f7f3dc8..f8bb6296a63 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/ReversePathHierarchyTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/path/ReversePathHierarchyTokenizer.java @@ -17,15 +17,14 @@ package org.apache.lucene.analysis.path; */ import java.io.IOException; -import java.io.Reader; import java.util.ArrayList; import java.util.List; -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; +import org.apache.lucene.util.AttributeFactory; /** * Tokenizer for domain-like hierarchies. @@ -82,7 +81,7 @@ public class ReversePathHierarchyTokenizer extends Tokenizer { } public ReversePathHierarchyTokenizer( int bufferSize, char delimiter, char replacement, int skip) { - this(Token.TOKEN_ATTRIBUTE_FACTORY, bufferSize, delimiter, replacement, skip); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY, bufferSize, delimiter, replacement, skip); } public ReversePathHierarchyTokenizer (AttributeFactory factory, int bufferSize, char delimiter, char replacement, int skip) { diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java index 6dcf0720c88..0c1c01f9126 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java @@ -22,10 +22,10 @@ import java.io.Reader; import java.util.regex.Matcher; import java.util.regex.Pattern; -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; +import org.apache.lucene.util.AttributeFactory; /** * This tokenizer uses regex pattern matching to construct distinct tokens @@ -67,7 +67,7 @@ public final class PatternTokenizer extends Tokenizer { /** creates a new PatternTokenizer returning tokens from group (-1 for split functionality) */ public PatternTokenizer(Pattern pattern, int group) { - this(Token.TOKEN_ATTRIBUTE_FACTORY, pattern, group); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY, pattern, group); } /** creates a new PatternTokenizer returning tokens from group (-1 for split functionality) */ diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizerFactory.java index 9fc72dfb64a..15ef4c33db5 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizerFactory.java @@ -21,7 +21,7 @@ import java.util.Map; import java.util.regex.Pattern; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; /** * Factory for {@link PatternTokenizer}. 
diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizer.java index 463ed2bc174..eb085894788 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizer.java @@ -25,6 +25,7 @@ import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; import org.apache.lucene.util.Version; @@ -106,7 +107,7 @@ public final class ClassicTokenizer extends Tokenizer { } /** - * Creates a new ClassicTokenizer with a given {@link org.apache.lucene.util.AttributeSource.AttributeFactory} + * Creates a new ClassicTokenizer with a given {@link org.apache.lucene.util.AttributeFactory} */ public ClassicTokenizer(Version matchVersion, AttributeFactory factory) { super(factory); diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerFactory.java index 388ab0227b8..3d73bd7d506 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.standard; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java index e269dfeb8cc..196c0ca1baf 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java @@ -25,6 +25,7 @@ import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; import org.apache.lucene.util.Version; @@ -120,7 +121,7 @@ public final class StandardTokenizer extends Tokenizer { } /** - * Creates a new StandardTokenizer with a given {@link org.apache.lucene.util.AttributeSource.AttributeFactory} + * Creates a new StandardTokenizer with a given {@link org.apache.lucene.util.AttributeFactory} */ public StandardTokenizer(Version matchVersion, AttributeFactory factory) { super(factory); diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerFactory.java index de83b82f71e..bb5248b947b 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerFactory.java 
+++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.standard; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java index 8934bcf98ec..cd1218d8da7 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.java @@ -27,9 +27,9 @@ import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; import org.apache.lucene.util.Version; -import org.apache.lucene.util.AttributeSource.AttributeFactory; /** * This class implements Word Break rules from the Unicode Text Segmentation diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerFactory.java index 60fa4b4dfe3..e1218075aea 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.standard; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.io.Reader; import java.util.Map; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java index b916fbdb44b..6fbfb7821ea 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java @@ -20,11 +20,11 @@ package org.apache.lucene.analysis.th; import java.text.BreakIterator; import java.util.Locale; -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.util.CharArrayIterator; import org.apache.lucene.analysis.util.SegmentingTokenizerBase; +import org.apache.lucene.util.AttributeFactory; /** * Tokenizer that use {@link BreakIterator} to tokenize Thai text. 
@@ -60,7 +60,7 @@ public class ThaiTokenizer extends SegmentingTokenizerBase { /** Creates a new ThaiTokenizer */ public ThaiTokenizer() { - this(Token.TOKEN_ATTRIBUTE_FACTORY); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY); } /** Creates a new ThaiTokenizer, supplying the AttributeFactory */ diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizerFactory.java index 6226f831d3c..05121c3c3c7 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizerFactory.java @@ -21,7 +21,7 @@ import java.util.Map; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource; +import org.apache.lucene.util.AttributeFactory; /** * Factory for {@link ThaiTokenizer}. @@ -43,7 +43,7 @@ public class ThaiTokenizerFactory extends TokenizerFactory { } @Override - public Tokenizer create(AttributeSource.AttributeFactory factory) { + public Tokenizer create(AttributeFactory factory) { return new ThaiTokenizer(factory); } } diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/CharTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/CharTokenizer.java index 46cbc1e4072..bfa40a02af1 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/CharTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/CharTokenizer.java @@ -23,6 +23,7 @@ import java.io.Reader; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; import org.apache.lucene.analysis.util.CharacterUtils; import org.apache.lucene.util.Version; diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java index 00c3e9fd2b6..0020b1db292 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/SegmentingTokenizerBase.java @@ -19,12 +19,11 @@ package org.apache.lucene.analysis.util; import java.io.IOException; import java.io.Reader; - import java.text.BreakIterator; -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; +import org.apache.lucene.util.AttributeFactory; /** * Breaks text into sentences with a {@link BreakIterator} and @@ -63,7 +62,7 @@ public abstract class SegmentingTokenizerBase extends Tokenizer { * be provided to this constructor. 
*/ public SegmentingTokenizerBase(BreakIterator iterator) { - this(Token.TOKEN_ATTRIBUTE_FACTORY, iterator); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY, iterator); } /** diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/TokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/TokenizerFactory.java index d86092af069..0952d9c8f18 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/TokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/util/TokenizerFactory.java @@ -17,11 +17,10 @@ package org.apache.lucene.analysis.util; * limitations under the License. */ -import org.apache.lucene.analysis.Token; +import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.analysis.Tokenizer; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; -import java.io.Reader; import java.util.Map; import java.util.Set; @@ -73,7 +72,7 @@ public abstract class TokenizerFactory extends AbstractAnalysisFactory { /** Creates a TokenStream of the specified input using the default attribute factory. */ public final Tokenizer create() { - return create(Token.TOKEN_ATTRIBUTE_FACTORY); + return create(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY); } /** Creates a TokenStream of the specified input using the given AttributeFactory */ diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.java index b168b8a416a..cd96cd26f2f 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.java @@ -23,6 +23,7 @@ import org.apache.lucene.analysis.tokenattributes.FlagsAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; import java.io.IOException; @@ -145,7 +146,7 @@ public final class WikipediaTokenizer extends Tokenizer { /** * Creates a new instance of the {@link org.apache.lucene.analysis.wikipedia.WikipediaTokenizer}. Attaches the - * input to a the newly created JFlex scanner. Uses the given {@link org.apache.lucene.util.AttributeSource.AttributeFactory}. + * input to a the newly created JFlex scanner. Uses the given {@link org.apache.lucene.util.AttributeFactory}. * * @param tokenOutput One of {@link #TOKENS_ONLY}, {@link #UNTOKENIZED_ONLY}, {@link #BOTH} */ diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerFactory.java index e12ad815b32..b0dbba08fae 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerFactory.java @@ -21,7 +21,7 @@ import java.util.Collections; import java.util.Map; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; /** * Factory for {@link WikipediaTokenizer}. 
diff --git a/lucene/analysis/common/src/java/org/apache/lucene/collation/CollationAttributeFactory.java b/lucene/analysis/common/src/java/org/apache/lucene/collation/CollationAttributeFactory.java index 1db52ad6f1d..64ece614e49 100644 --- a/lucene/analysis/common/src/java/org/apache/lucene/collation/CollationAttributeFactory.java +++ b/lucene/analysis/common/src/java/org/apache/lucene/collation/CollationAttributeFactory.java @@ -19,11 +19,9 @@ package org.apache.lucene.collation; import java.text.Collator; -import org.apache.lucene.analysis.Token; +import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.collation.tokenattributes.CollatedTermAttributeImpl; -import org.apache.lucene.util.Attribute; -import org.apache.lucene.util.AttributeImpl; -import org.apache.lucene.util.AttributeSource; +import org.apache.lucene.util.AttributeFactory; /** *

@@ -69,18 +67,17 @@ import org.apache.lucene.util.AttributeSource; * ICUCollationAttributeFactory on the query side, or vice versa. *

*/ -public class CollationAttributeFactory extends AttributeSource.AttributeFactory { +public class CollationAttributeFactory extends AttributeFactory.StaticImplementationAttributeFactory { private final Collator collator; - private final AttributeSource.AttributeFactory delegate; /** * Create a CollationAttributeFactory, using - * {@link org.apache.lucene.analysis.Token#TOKEN_ATTRIBUTE_FACTORY} as the + * {@link TokenStream#DEFAULT_TOKEN_ATTRIBUTE_FACTORY} as the * factory for all other attributes. * @param collator CollationKey generator */ public CollationAttributeFactory(Collator collator) { - this(Token.TOKEN_ATTRIBUTE_FACTORY, collator); + this(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY, collator); } /** @@ -89,16 +86,13 @@ public class CollationAttributeFactory extends AttributeSource.AttributeFactory * @param delegate Attribute Factory * @param collator CollationKey generator */ - public CollationAttributeFactory(AttributeSource.AttributeFactory delegate, Collator collator) { - this.delegate = delegate; + public CollationAttributeFactory(AttributeFactory delegate, Collator collator) { + super(delegate, CollatedTermAttributeImpl.class); this.collator = collator; } @Override - public AttributeImpl createAttributeInstance( - Class attClass) { - return attClass.isAssignableFrom(CollatedTermAttributeImpl.class) - ? new CollatedTermAttributeImpl(collator) - : delegate.createAttributeInstance(attClass); + public CollatedTermAttributeImpl createInstance() { + return new CollatedTermAttributeImpl(collator); } } diff --git a/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestFactories.java b/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestFactories.java index b597eb043ec..6ba9999f78f 100644 --- a/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestFactories.java +++ b/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestFactories.java @@ -35,7 +35,7 @@ import org.apache.lucene.analysis.util.ResourceLoaderAware; import org.apache.lucene.analysis.util.StringMockResourceLoader; import org.apache.lucene.analysis.util.TokenFilterFactory; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; /** * Sanity check some things about all factories, diff --git a/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java b/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java index f3cc4e03b07..f3972e84982 100644 --- a/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java +++ b/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java @@ -81,8 +81,8 @@ import org.apache.lucene.analysis.synonym.SynonymMap; import org.apache.lucene.analysis.util.CharArrayMap; import org.apache.lucene.analysis.util.CharArraySet; import org.apache.lucene.analysis.wikipedia.WikipediaTokenizer; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; -import org.apache.lucene.util.AttributeSource.AttributeFactory; import org.apache.lucene.util.CharsRef; import org.apache.lucene.util.Rethrow; import org.apache.lucene.util.TestUtil; diff --git a/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizer.java b/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizer.java index 25005fea73c..efaffb6db63 100644 --- 
a/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizer.java +++ b/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizer.java @@ -20,12 +20,12 @@ package org.apache.lucene.analysis.icu.segmentation; import java.io.IOException; import java.io.Reader; -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.icu.tokenattributes.ScriptAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.util.AttributeFactory; import com.ibm.icu.lang.UCharacter; import com.ibm.icu.text.BreakIterator; @@ -80,7 +80,7 @@ public final class ICUTokenizer extends Tokenizer { * @param config Tailored BreakIterator configuration */ public ICUTokenizer(ICUTokenizerConfig config) { - this(Token.TOKEN_ATTRIBUTE_FACTORY, config); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY, config); } /** diff --git a/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizerFactory.java b/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizerFactory.java index 12b5f824e18..0b3ad833caa 100644 --- a/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizerFactory.java +++ b/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizerFactory.java @@ -28,7 +28,7 @@ import java.util.Map; import org.apache.lucene.analysis.util.ResourceLoader; import org.apache.lucene.analysis.util.ResourceLoaderAware; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.IOUtils; import com.ibm.icu.lang.UCharacter; diff --git a/lucene/analysis/icu/src/java/org/apache/lucene/collation/ICUCollationAttributeFactory.java b/lucene/analysis/icu/src/java/org/apache/lucene/collation/ICUCollationAttributeFactory.java index 42fc1c6419c..2f890462686 100644 --- a/lucene/analysis/icu/src/java/org/apache/lucene/collation/ICUCollationAttributeFactory.java +++ b/lucene/analysis/icu/src/java/org/apache/lucene/collation/ICUCollationAttributeFactory.java @@ -17,12 +17,9 @@ package org.apache.lucene.collation; * limitations under the License. */ -import org.apache.lucene.analysis.Token; +import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.collation.tokenattributes.ICUCollatedTermAttributeImpl; -import org.apache.lucene.util.Attribute; -import org.apache.lucene.util.AttributeImpl; -import org.apache.lucene.util.AttributeSource; -import org.apache.lucene.collation.CollationAttributeFactory; // javadoc +import org.apache.lucene.util.AttributeFactory; import com.ibm.icu.text.Collator; @@ -63,18 +60,17 @@ import com.ibm.icu.text.Collator; * java.text.Collator over several languages. *

*/ -public class ICUCollationAttributeFactory extends AttributeSource.AttributeFactory { +public class ICUCollationAttributeFactory extends AttributeFactory.StaticImplementationAttributeFactory { private final Collator collator; - private final AttributeSource.AttributeFactory delegate; /** * Create an ICUCollationAttributeFactory, using - * {@link org.apache.lucene.analysis.Token#TOKEN_ATTRIBUTE_FACTORY} as the + * {@link TokenStream#DEFAULT_TOKEN_ATTRIBUTE_FACTORY} as the * factory for all other attributes. * @param collator CollationKey generator */ public ICUCollationAttributeFactory(Collator collator) { - this(Token.TOKEN_ATTRIBUTE_FACTORY, collator); + this(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY, collator); } /** @@ -83,16 +79,13 @@ public class ICUCollationAttributeFactory extends AttributeSource.AttributeFacto * @param delegate Attribute Factory * @param collator CollationKey generator */ - public ICUCollationAttributeFactory(AttributeSource.AttributeFactory delegate, Collator collator) { - this.delegate = delegate; + public ICUCollationAttributeFactory(AttributeFactory delegate, Collator collator) { + super(delegate, ICUCollatedTermAttributeImpl.class); this.collator = collator; } @Override - public AttributeImpl createAttributeInstance( - Class attClass) { - return attClass.isAssignableFrom(ICUCollatedTermAttributeImpl.class) - ? new ICUCollatedTermAttributeImpl(collator) - : delegate.createAttributeInstance(attClass); + public ICUCollatedTermAttributeImpl createInstance() { + return new ICUCollatedTermAttributeImpl(collator); } } diff --git a/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java b/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java index 4fc6c91c06e..8fe2dd1c664 100644 --- a/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java +++ b/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java @@ -18,7 +18,6 @@ package org.apache.lucene.analysis.ja; */ import java.io.IOException; -import java.io.Reader; import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; @@ -40,6 +39,7 @@ import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute; import org.apache.lucene.analysis.util.RollingCharBuffer; import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.IntsRef; import org.apache.lucene.util.RamUsageEstimator; import org.apache.lucene.util.fst.FST; @@ -195,7 +195,7 @@ public final class JapaneseTokenizer extends Tokenizer { * @param mode tokenization mode. 
*/ public JapaneseTokenizer(UserDictionary userDictionary, boolean discardPunctuation, Mode mode) { - this(org.apache.lucene.analysis.Token.TOKEN_ATTRIBUTE_FACTORY, userDictionary, discardPunctuation, mode); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY, userDictionary, discardPunctuation, mode); } /** diff --git a/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizerFactory.java b/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizerFactory.java index fdae2e964b5..ef164d57a4b 100644 --- a/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizerFactory.java +++ b/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizerFactory.java @@ -30,7 +30,7 @@ import java.util.Map; import org.apache.lucene.analysis.ja.JapaneseTokenizer.Mode; import org.apache.lucene.analysis.ja.dict.UserDictionary; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.IOUtils; import org.apache.lucene.analysis.util.ResourceLoader; import org.apache.lucene.analysis.util.ResourceLoaderAware; diff --git a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizer.java b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizer.java index 0e068c8219e..ba7e8688729 100644 --- a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizer.java +++ b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizer.java @@ -22,12 +22,12 @@ import java.text.BreakIterator; import java.util.Iterator; import java.util.Locale; -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.cn.smart.hhmm.SegToken; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; import org.apache.lucene.analysis.util.SegmentingTokenizerBase; +import org.apache.lucene.util.AttributeFactory; /** * Tokenizer for Chinese or mixed Chinese-English text. 
@@ -48,7 +48,7 @@ public class HMMChineseTokenizer extends SegmentingTokenizerBase { /** Creates a new HMMChineseTokenizer */ public HMMChineseTokenizer() { - this(Token.TOKEN_ATTRIBUTE_FACTORY); + this(DEFAULT_TOKEN_ATTRIBUTE_FACTORY); } /** Creates a new HMMChineseTokenizer, supplying the AttributeFactory */ diff --git a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizerFactory.java b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizerFactory.java index 31f2b45ed02..8d537aa89ce 100644 --- a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizerFactory.java +++ b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizerFactory.java @@ -21,7 +21,7 @@ import java.util.Map; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; /** * Factory for {@link HMMChineseTokenizer} diff --git a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SentenceTokenizer.java b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SentenceTokenizer.java index 64257710f16..7a858954920 100644 --- a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SentenceTokenizer.java +++ b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SentenceTokenizer.java @@ -24,6 +24,7 @@ import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeSource; /** diff --git a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SmartChineseSentenceTokenizerFactory.java b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SmartChineseSentenceTokenizerFactory.java index f94086ed4a3..da844d36933 100644 --- a/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SmartChineseSentenceTokenizerFactory.java +++ b/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/SmartChineseSentenceTokenizerFactory.java @@ -21,7 +21,7 @@ import java.io.Reader; import java.util.Map; import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; /** * Factory for the SmartChineseAnalyzer {@link SentenceTokenizer} diff --git a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java index 3620dbb158a..f1503062810 100644 --- a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java +++ b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java @@ -19,6 +19,7 @@ package org.apache.lucene.analysis.uima; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.uima.ae.AEProviderFactory; +import org.apache.lucene.util.AttributeFactory; import org.apache.uima.analysis_engine.AnalysisEngine; import org.apache.uima.analysis_engine.AnalysisEngineProcessException; import org.apache.uima.cas.CAS; diff --git 
a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java index 7715fe38c57..0b8fc39b28a 100644 --- a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java +++ b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java @@ -17,17 +17,16 @@ package org.apache.lucene.analysis.uima; * limitations under the License. */ -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.uima.analysis_engine.AnalysisEngineProcessException; import org.apache.uima.cas.Type; import org.apache.uima.cas.text.AnnotationFS; import org.apache.uima.resource.ResourceInitializationException; import java.io.IOException; -import java.io.Reader; import java.util.Map; /** @@ -44,7 +43,7 @@ public final class UIMAAnnotationsTokenizer extends BaseUIMATokenizer { private int finalOffset = 0; public UIMAAnnotationsTokenizer(String descriptorPath, String tokenType, Map configurationParameters) { - this(descriptorPath, tokenType, configurationParameters, Token.TOKEN_ATTRIBUTE_FACTORY); + this(descriptorPath, tokenType, configurationParameters, DEFAULT_TOKEN_ATTRIBUTE_FACTORY); } public UIMAAnnotationsTokenizer(String descriptorPath, String tokenType, Map configurationParameters, diff --git a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizerFactory.java b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizerFactory.java index 8a1990c7a15..76bceb7698d 100644 --- a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizerFactory.java +++ b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.uima; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.io.Reader; import java.util.HashMap; diff --git a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java index 045813d42e3..65fd2ef2142 100644 --- a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java +++ b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java @@ -17,11 +17,11 @@ package org.apache.lucene.analysis.uima; * limitations under the License. 
*/ -import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.Tokenizer; import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.uima.analysis_engine.AnalysisEngineProcessException; import org.apache.uima.cas.CASException; import org.apache.uima.cas.FeaturePath; @@ -30,7 +30,6 @@ import org.apache.uima.cas.text.AnnotationFS; import org.apache.uima.resource.ResourceInitializationException; import java.io.IOException; -import java.io.Reader; import java.util.Map; /** @@ -54,7 +53,7 @@ public final class UIMATypeAwareAnnotationsTokenizer extends BaseUIMATokenizer { private int finalOffset = 0; public UIMATypeAwareAnnotationsTokenizer(String descriptorPath, String tokenType, String typeAttributeFeaturePath, Map configurationParameters) { - this(descriptorPath, tokenType, typeAttributeFeaturePath, configurationParameters, Token.TOKEN_ATTRIBUTE_FACTORY); + this(descriptorPath, tokenType, typeAttributeFeaturePath, configurationParameters, DEFAULT_TOKEN_ATTRIBUTE_FACTORY); } public UIMATypeAwareAnnotationsTokenizer(String descriptorPath, String tokenType, String typeAttributeFeaturePath, diff --git a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizerFactory.java b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizerFactory.java index b78788675b9..f3dabed5b8c 100644 --- a/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizerFactory.java +++ b/lucene/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizerFactory.java @@ -18,7 +18,7 @@ package org.apache.lucene.analysis.uima; */ import org.apache.lucene.analysis.util.TokenizerFactory; -import org.apache.lucene.util.AttributeSource.AttributeFactory; +import org.apache.lucene.util.AttributeFactory; import java.io.Reader; import java.util.HashMap; diff --git a/lucene/core/src/java/org/apache/lucene/analysis/NumericTokenStream.java b/lucene/core/src/java/org/apache/lucene/analysis/NumericTokenStream.java index 9b6b8a60c4d..231333b99dc 100644 --- a/lucene/core/src/java/org/apache/lucene/analysis/NumericTokenStream.java +++ b/lucene/core/src/java/org/apache/lucene/analysis/NumericTokenStream.java @@ -28,6 +28,7 @@ import org.apache.lucene.document.LongField; // for javadocs import org.apache.lucene.search.NumericRangeFilter; // for javadocs import org.apache.lucene.search.NumericRangeQuery; import org.apache.lucene.util.Attribute; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeImpl; import org.apache.lucene.util.AttributeReflector; import org.apache.lucene.util.BytesRef; @@ -233,7 +234,7 @@ public final class NumericTokenStream extends TokenStream { /** * Expert: Creates a token stream for numeric values with the specified * precisionStep using the given - * {@link org.apache.lucene.util.AttributeSource.AttributeFactory}. + * {@link org.apache.lucene.util.AttributeFactory}. * The stream is not yet initialized, * before using set a value using the various set???Value() methods. 
*/ diff --git a/lucene/core/src/java/org/apache/lucene/analysis/Token.java b/lucene/core/src/java/org/apache/lucene/analysis/Token.java index 378aae70fe4..c3bfecbdb6b 100644 --- a/lucene/core/src/java/org/apache/lucene/analysis/Token.java +++ b/lucene/core/src/java/org/apache/lucene/analysis/Token.java @@ -17,16 +17,12 @@ package org.apache.lucene.analysis; * limitations under the License. */ -import org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl; -import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.FlagsAttribute; +import org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl; import org.apache.lucene.analysis.tokenattributes.PayloadAttribute; -import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; -import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute; -import org.apache.lucene.analysis.tokenattributes.TypeAttribute; import org.apache.lucene.index.DocsAndPositionsEnum; // for javadoc import org.apache.lucene.util.Attribute; -import org.apache.lucene.util.AttributeSource; +import org.apache.lucene.util.AttributeFactory; import org.apache.lucene.util.AttributeImpl; import org.apache.lucene.util.AttributeReflector; import org.apache.lucene.util.BytesRef; @@ -57,54 +53,7 @@ import org.apache.lucene.util.BytesRef; Even though it is not necessary to use Token anymore, with the new TokenStream API it can be used as convenience class that implements all {@link Attribute}s, which is especially useful to easily switch from the old to the new TokenStream API. - -
-
-  <p>Tokenizers and TokenFilters should try to re-use a Token
-  instance when possible for best performance, by
-  implementing the {@link TokenStream#incrementToken()} API.
-  Failing that, to create a new Token you should first use
-  one of the constructors that starts with null text. To load
-  the token from a char[] use {@link #copyBuffer(char[], int, int)}.
-  To load from a String use {@link #setEmpty} followed by {@link #append(CharSequence)} or {@link #append(CharSequence, int, int)}.
-  Alternatively you can get the Token's termBuffer by calling either {@link #buffer()},
-  if you know that your text is shorter than the capacity of the termBuffer
-  or {@link #resizeBuffer(int)}, if there is any possibility
-  that you may need to grow the buffer. Fill in the characters of your term into this
-  buffer, with {@link String#getChars(int, int, char[], int)} if loading from a string,
-  or with {@link System#arraycopy(Object, int, Object, int, int)}, and finally call {@link #setLength(int)} to
-  set the length of the term text. See LUCENE-969
-  for details.
-  <p>Typical Token reuse patterns:
-  <ul>
-  <li>Copying text from a string (type is reset to {@link #DEFAULT_TYPE} if not specified):
-  <pre>
-    return reusableToken.reinit(string, startOffset, endOffset[, type]);
-  </pre>
-  </li>
-  <li>Copying some text from a string (type is reset to {@link #DEFAULT_TYPE} if not specified):
-  <pre>
-    return reusableToken.reinit(string, 0, string.length(), startOffset, endOffset[, type]);
-  </pre>
-  </li>
-  <li>Copying text from char[] buffer (type is reset to {@link #DEFAULT_TYPE} if not specified):
-  <pre>
-    return reusableToken.reinit(buffer, 0, buffer.length, startOffset, endOffset[, type]);
-  </pre>
-  </li>
-  <li>Copying some text from a char[] buffer (type is reset to {@link #DEFAULT_TYPE} if not specified):
-  <pre>
-    return reusableToken.reinit(buffer, start, end - start, startOffset, endOffset[, type]);
-  </pre>
-  </li>
-  <li>Copying from one one Token to another (type is reset to {@link #DEFAULT_TYPE} if not specified):
-  <pre>
-    return reusableToken.reinit(source.buffer(), 0, source.length(), source.startOffset(), source.endOffset()[, source.type()]);
-  </pre>
-  </li>
-  </ul>
+ A few things to note:
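---
Editor's note: the following is a minimal, hypothetical sketch of the new AttributeFactory.StaticImplementationAttributeFactory pattern that the CollationAttributeFactory and ICUCollationAttributeFactory hunks above migrate to. It is not part of this patch; the class name MyPayloadAttributeFactory and the choice of PayloadAttributeImpl are illustrative assumptions only.

  // Hypothetical example (not part of this patch): a custom factory that supplies one
  // concrete AttributeImpl itself and delegates every other attribute to another factory,
  // mirroring the rewritten CollationAttributeFactory above. PayloadAttributeImpl is used
  // here only as a convenient AttributeImpl with a no-arg constructor.
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.PayloadAttributeImpl;
  import org.apache.lucene.util.AttributeFactory;

  public class MyPayloadAttributeFactory
      extends AttributeFactory.StaticImplementationAttributeFactory<PayloadAttributeImpl> {

    /** Uses the default token attribute factory for all other attributes. */
    public MyPayloadAttributeFactory() {
      this(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY);
    }

    /** All attributes except PayloadAttributeImpl are created by the given delegate. */
    public MyPayloadAttributeFactory(AttributeFactory delegate) {
      super(delegate, PayloadAttributeImpl.class);
    }

    @Override
    public PayloadAttributeImpl createInstance() {
      return new PayloadAttributeImpl();
    }
  }

A Tokenizer constructed with such a factory would resolve PayloadAttribute to this implementation, while all remaining attributes would still come from the delegate (typically the packed default factory introduced by this change).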