Add an example that builds a CharFilter chain in Analyzer

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1378593 13f79535-47bb-0310-9956-ffa450edef68
Robert Muir 2012-08-29 14:56:28 +00:00
parent d447b1ae51
commit 71dd31de24
2 changed files with 28 additions and 0 deletions


@@ -33,6 +33,9 @@ import java.io.Reader;
* You can optionally provide more efficient implementations of additional methods
* like {@link #read()}, {@link #read(char[])}, {@link #read(java.nio.CharBuffer)},
* but this is not required.
* <p>
* For examples and integration with {@link Analyzer}, see the
* {@link org.apache.lucene.analysis Analysis package documentation}.
*/
// the way java.io.FilterReader should work!
public abstract class CharFilter extends Reader {


@@ -817,5 +817,30 @@ As a small hint, this is how the new Attribute class could begin:
...
</pre>
<h4>Adding a CharFilter chain</h4>
Analyzers take Java {@link java.io.Reader}s as input. You could wrap your Readers in {@link java.io.FilterReader}s
to manipulate the content, but this has a big disadvantage: the character offsets reported for tokens might no longer
match your original text.
<p>
{@link org.apache.lucene.analysis.CharFilter} is designed to allow you to pre-process input like a FilterReader would, but also
preserve the original offsets associated with those characters. This way mechanisms like highlighting still work correctly.
CharFilters can be chained.
<p>
Example:
<pre class="prettyprint">
public class MyAnalyzer extends Analyzer {

  {@literal @Override}
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    return new TokenStreamComponents(new MyTokenizer(reader));
  }

  {@literal @Override}
  protected Reader initReader(String fieldName, Reader reader) {
    // Wrap the Reader in a CharFilter chain: FirstCharFilter reads the raw input,
    // and SecondCharFilter reads FirstCharFilter's output.
    return new SecondCharFilter(new FirstCharFilter(reader));
  }
}
</pre>
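<p>
To make the offset problem concrete, here is a minimal, self-contained sketch in plain Java. It does not use Lucene's
CharFilter API. The class name OffsetCorrectionSketch, its methods, and the tag-stripping rule are assumptions made
purely for illustration. It removes anything between '<' and '>' and remembers where each surviving character came
from, so an offset in the filtered text can be mapped back to the original text.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration, not Lucene code: strips <...> tags from a String
 * and maps offsets in the filtered text back to offsets in the original text.
 */
class OffsetCorrectionSketch {
  private final StringBuilder filtered = new StringBuilder();
  // For each character kept in the filtered text, its index in the original input.
  private final List<Integer> originalIndex = new ArrayList<>();
  private final int originalLength;

  OffsetCorrectionSketch(String input) {
    originalLength = input.length();
    boolean inTag = false;
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      if (c == '<') {
        inTag = true;               // start skipping
      } else if (c == '>' && inTag) {
        inTag = false;              // stop skipping, drop the '>' itself
      } else if (!inTag) {
        filtered.append(c);         // keep the character and remember its origin
        originalIndex.add(i);
      }
    }
  }

  /** The text a downstream tokenizer would see. */
  String filteredText() {
    return filtered.toString();
  }

  /** Maps a start offset in the filtered text to the matching offset in the original text. */
  int correctOffset(int offset) {
    return offset < originalIndex.size() ? originalIndex.get(offset) : originalLength;
  }

  public static void main(String[] args) {
    String original = "a <b>bold</b> word";
    OffsetCorrectionSketch sketch = new OffsetCorrectionSketch(original);
    String filtered = sketch.filteredText();

    int start = filtered.indexOf("word");             // offset as seen after filtering
    int correctedStart = sketch.correctOffset(start); // offset in the original text

    System.out.println(filtered);
    System.out.println(start + " -> " + correctedStart);
    System.out.println(original.substring(correctedStart, correctedStart + "word".length()));
  }
}
```

Without the correction step, a highlighter that used the filtered offset against the original text would mark the wrong
characters. The sketch only handles start offsets and keeps the whole input in memory, whereas a real CharFilter works
on a stream, so treat it as an explanation of the idea rather than a template.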
</body>
</html>