Reverting because of:

"Actually, now I'm considering reverting back to the version without a public clear() method. The rationale is that this would be less complex and more consistent with the AnalyzerUtil design (simple methods generating simple anonymous analyzer wrappers). If desired, you can still (re)use a single static "child" analyzer instance. It's cheap and easy to create a new caching analyzer on top of the static analyzer, and to do so before each document. The old one will simply be gc'd."


git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@479749 13f79535-47bb-0310-9956-ffa450edef68
Wolfgang Hoschek 2006-11-27 20:25:32 +00:00
parent ad49369d3d
commit 8ab762aef2
1 changed file with 38 additions and 50 deletions


@@ -203,10 +203,10 @@ public class AnalyzerUtil {
   /**
-   * Analyzer wrapper that caches all tokens generated by the underlying child analyzer's
+   * Returns an analyzer wrapper that caches all tokens generated by the underlying child analyzer's
    * token streams, and delivers those cached tokens on subsequent calls to
-   * <code>tokenStream(String fieldName, Reader reader)</code>,
-   * if the fieldName has been seen before, altogether ignoring the Reader parameter.
+   * <code>tokenStream(String fieldName, Reader reader)</code>
+   * if the fieldName has been seen before, altogether ignoring the Reader parameter on cache lookup.
    * <p>
    * If Analyzer / TokenFilter chains are expensive in terms of I/O or CPU, such caching can
    * help improve performance if the same document is added to multiple Lucene indexes,
@@ -216,35 +216,22 @@ public class AnalyzerUtil {
    * <ul>
    * <li>Caching the tokens of large Lucene documents can lead to out of memory exceptions.</li>
    * <li>The Token instances delivered by the underlying child analyzer must be immutable.</li>
-   * <li>A caching analyzer instance must not be used for more than one document, unless
-   * <code>clear()</code> is called before each new document.</li>
+   * <li>A caching analyzer instance must not be used for more than one document
+   * because the cache is not keyed on the Reader parameter.</li>
    * </ul>
    */
-  public static class TokenCachingAnalyzer extends Analyzer {
-
-    private final Analyzer child;
-    private final HashMap cache = new HashMap();
-
   /**
    * Creates and returns a new caching analyzer that wraps the given underlying child analyzer.
    *
    * @param child
    *            the underlying child analyzer
-   * @return a new caching analyzer
+   * @return a new analyzer
    */
-    public TokenCachingAnalyzer(Analyzer child) {
+  public static Analyzer getTokenCachingAnalyzer(final Analyzer child) {
     if (child == null)
       throw new IllegalArgumentException("child analyzer must not be null");
-      this.child = child;
-    }
+    return new Analyzer() {
-    /**
-     * Removes all cached data.
-     */
-    public void clear() {
-      cache.clear();
-    }
+      private final HashMap cache = new HashMap();
 
       public TokenStream tokenStream(String fieldName, Reader reader) {
         final ArrayList tokens = (ArrayList) cache.get(fieldName);
@@ -271,6 +258,7 @@ public class AnalyzerUtil {
         };
       }
-  }
+    };
+  }
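The usage pattern the commit message recommends (keep one shared, stateless child analyzer; create a cheap throwaway caching wrapper per document and let the old one be gc'd) can be sketched in plain Java without a Lucene dependency. This is an illustrative model only: the "analyzer" is reduced to a function from (fieldName, text) to a token list, and all names here are hypothetical, not Lucene's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class TokenCacheSketch {

    // The shared, stateless "child" analyzer: safe to reuse across documents.
    static final BiFunction<String, String, List<String>> CHILD =
        (field, text) -> Arrays.asList(text.toLowerCase().split("\\s+"));

    // Cheap per-document caching wrapper, analogous in spirit to
    // AnalyzerUtil.getTokenCachingAnalyzer(child): tokens are cached per
    // field name, and the text argument is ignored on a cache hit.
    static BiFunction<String, String, List<String>> caching(
            BiFunction<String, String, List<String>> child) {
        final Map<String, List<String>> cache = new HashMap<>();
        return (field, text) ->
            cache.computeIfAbsent(field, f -> child.apply(f, text));
    }

    public static void main(String[] args) {
        // One fresh caching wrapper per document; the previous one is
        // simply dropped and garbage collected.
        BiFunction<String, String, List<String>> doc = caching(CHILD);
        List<String> first  = doc.apply("title", "Hello World");
        List<String> second = doc.apply("title", "ignored on cache hit");
        System.out.println(first.equals(second)); // prints "true"
        System.out.println(String.join(",", first)); // prints "hello,world"
    }
}
```

Because the cache is keyed only on the field name, reusing one wrapper across two different documents would silently deliver the first document's tokens for the second, which is exactly the pitfall the reverted `clear()` method tried to paper over.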