mirror of https://github.com/apache/lucene.git
nuke some outdated references to contrib
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1328873 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent dce30db692
commit 0bf1f362eb
@@ -376,7 +376,7 @@ LUCENE-1458, LUCENE-2111: Flexible Indexing
   - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
 
 * LUCENE-2514: The option to use a Collator's order (instead of binary order) for
-  sorting and range queries has been moved to contrib/queries.
+  sorting and range queries has been moved to lucene/queries.
 
   The Collated TermRangeQuery/Filter has been moved to SlowCollatedTermRangeQuery/Filter,
   and the collated sorting has been moved to SlowCollatedStringComparator.
@@ -12,8 +12,7 @@ including, but not limited to:
  - Apache Xerces
 
 ICU4J, (under analysis/icu) is licensed under an MIT styles license
-(contrib/icu/lib/ICU-LICENSE.txt) and Copyright (c) 1995-2008
-International Business Machines Corporation and others
+and Copyright (c) 1995-2008 International Business Machines Corporation and others
 
 Some data files (under analysis/icu/src/data) are derived from Unicode data such
 as the Unicode Character Database. See http://unicode.org/copyright.html for more
@@ -79,17 +79,14 @@ algorithm.
 <hr/>
 <h1><a name="collation">Collation</a></h1>
 <p>
-<code>ICUCollationKeyFilter</code>
+<code>ICUCollationKeyAnalyzer</code>
 converts each token into its binary <code>CollationKey</code> using the
-provided <code>Collator</code>, and then encode the <code>CollationKey</code>
-as a String using
-{@link org.apache.lucene.util.IndexableBinaryStringTools}, to allow it to be
+provided <code>Collator</code>, allowing it to be
 stored as an index term.
 </p>
 <p>
-<code>ICUCollationKeyFilter</code> depends on ICU4J 4.4 to produce the
-<code>CollationKey</code>s. <code>icu4j-4.4.jar</code>
-is included in Lucene's Subversion repository at <code>contrib/icu/lib/</code>.
+<code>ICUCollationKeyAnalyzer</code> depends on ICU4J to produce the
+<code>CollationKey</code>s.
 </p>
 
 <h2>Use Cases</h2>
@@ -209,11 +206,11 @@ algorithm.
 </li>
 </ol>
 <p>
-<code>ICUCollationKeyFilter</code> uses ICU4J's <code>Collator</code>, which
+<code>ICUCollationKeyAnalyzer</code> uses ICU4J's <code>Collator</code>, which
 makes its version available, thus allowing collation to be versioned
-independently from the JVM. <code>ICUCollationKeyFilter</code> is also
+independently from the JVM. <code>ICUCollationKeyAnalyzer</code> is also
 significantly faster and generates significantly shorter keys than
-<code>CollationKeyFilter</code>. See
+<code>CollationKeyAnalyzer</code>. See
 <a href="http://site.icu-project.org/charts/collation-icu4j-sun"
 >http://site.icu-project.org/charts/collation-icu4j-sun</a> for key
 generation timing and key length comparisons between ICU4J and
@@ -222,8 +219,8 @@ algorithm.
 <p>
 <code>CollationKey</code>s generated by <code>java.text.Collator</code>s are
 not compatible with those those generated by ICU Collators. Specifically, if
-you use <code>CollationKeyFilter</code> to generate index terms, do not use
-<code>ICUCollationKeyFilter</code> on the query side, or vice versa.
+you use <code>CollationKeyAnalyzer</code> to generate index terms, do not use
+<code>ICUCollationKeyAnalyzer</code> on the query side, or vice versa.
 </p>
 <hr/>
 <h1><a name="normalization">Normalization</a></h1>
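The hunks above describe collation keys: a collator turns a string into a byte sequence whose binary order matches the locale's sort order, which is what gets stored as an index term. A JDK-only sketch (no Lucene or ICU4J needed; the French example words are illustrative assumptions) shows why binary string order alone is not enough:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CollationKeyDemo {
    public static void main(String[] args) {
        // A locale-aware collator: accented letters sort by their base
        // letter, unlike raw UTF-16 code-unit order.
        Collator collator = Collator.getInstance(Locale.FRENCH);

        // Binary order puts "cz" before "côte" ('ô' > 'z' in UTF-16),
        // but the collator sorts "côte" first ('o' < 'z' at primary strength).
        assert "côte".compareTo("cz") > 0;
        assert collator.compare("côte", "cz") < 0;

        // CollationKeys wrap byte sequences whose binary comparison
        // reproduces the collator's order -- comparable to the encoded
        // terms the Lucene analyzers produce.
        List<String> words = new ArrayList<>(List.of("cz", "côte", "cote"));
        words.sort((a, b) -> {
            CollationKey ka = collator.getCollationKey(a);
            CollationKey kb = collator.getCollationKey(b);
            return ka.compareTo(kb);
        });
        System.out.println(words);
    }
}
```

The incompatibility warning above follows directly: each library defines its own key byte format, so keys from `java.text.Collator` and ICU's `Collator` cannot be compared with each other.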
@@ -20,7 +20,7 @@
 # This alg will process the Reuters documents feed to produce a
 # single file that contains all documents, one per line.
 #
-# To use this, first cd to contrib/benchmark and then run:
+# To use this, first cd to benchmark and then run:
 #
 # ant run-task -Dtask.alg=conf/createLineFile.alg
 #
@@ -20,7 +20,7 @@
 # This alg will process the Wikipedia documents feed to produce a
 # single file that contains all documents, one per line.
 #
-# To use this, first cd to contrib/benchmark and then run:
+# To use this, first cd to benchmark and then run:
 #
 # ant run-task -Dtask.alg=conf/extractWikipedia.alg
 #
@@ -24,7 +24,7 @@
 # indexing your documents vs time spent creating the documents.
 #
 # To use this, you must first run the createLineFile.alg, then cd to
-# contrib/benchmark and then run:
+# benchmark and then run:
 #
 # ant run-task -Dtask.alg=conf/indexLineFile.alg
 #
@@ -22,7 +22,7 @@
 # gather baselines for operations like indexing (if reading from the content
 # source takes 'X' time, we cannot index faster).
 #
-# To use this, first cd to contrib/benchmark and then run:
+# To use this, first cd to benchmark and then run:
 #
 # ant run-task -Dtask.alg=conf/readContentSource.alg
 #
@@ -20,7 +20,7 @@
 # This alg reads all tokens out of a document but does not index them.
 # This is useful for benchmarking tokenizers.
 #
-# To use this, cd to contrib/benchmark and then run:
+# To use this, cd to benchmark and then run:
 #
 # ant run-task -Dtask.alg=conf/tokenize.alg
 #
@@ -229,7 +229,7 @@ and proximity searches (though sentence identification is not provided by Lucene
 Tokenizer, and TokenFilter(s) <i>(optional)</i> — or components you
 create, or a combination of existing and newly created components. Before
 pursuing this approach, you may find it worthwhile to explore the
-contrib/analyzers library and/or ask on the
+<a href="{@docRoot}/../analyzers-common/overview-summary.html">analyzers-common</a> library and/or ask on the
 <a href="http://lucene.apache.org/java/docs/mailinglists.html"
 >java-user@lucene.apache.org mailing list</a> first to see if what you
 need already exists. If you are still committed to creating your own
@@ -34,7 +34,7 @@ import org.apache.lucene.util.BytesRef;
  * <p/>
  *
  * This is the same functionality as TermsFilter (from
- * contrib/queries), except this filter requires that the
+ * queries/), except this filter requires that the
  * field contains only a single term for all documents.
  * Because of drastically different implementations, they
  * also have different performance characteristics, as
@@ -28,7 +28,7 @@ import org.apache.lucene.util.Bits;
  * <p/>
  *
  * Technically, this same functionality could be achieved
- * with ChainedFilter (under contrib/misc), however the
+ * with ChainedFilter (under queries/), however the
  * benefit of this class is it never materializes the full
  * bitset for the filter. Instead, the {@link #match}
  * method is invoked on-demand, per docID visited during
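The Javadoc in this hunk contrasts two filter designs: materializing a full bitset up front versus evaluating a match predicate lazily, per docID visited. A stdlib-only sketch of that trade-off (the `maxDoc` size and the `% 7` predicate are illustrative assumptions, not Lucene's API):

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class LazyVsEagerFilter {
    public static void main(String[] args) {
        int maxDoc = 1_000_000;

        // Eager (ChainedFilter-style): build the full bitset up front,
        // paying O(maxDoc) time and space even if few docs are visited.
        BitSet eager = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc % 7 == 0) {
                eager.set(doc);
            }
        }

        // Lazy (the design described above): a match(docID) predicate
        // evaluated on demand, only for the docIDs actually visited.
        IntPredicate lazyMatch = doc -> doc % 7 == 0;

        // Both answer identically for any visited docID, but the lazy
        // form allocates no per-document state.
        int[] visited = {0, 3, 7, 49, 50};
        for (int doc : visited) {
            assert eager.get(doc) == lazyMatch.test(doc);
        }
        System.out.println("eager and lazy filters agree on visited docs");
    }
}
```

The lazy form wins when a query touches only a small fraction of the index; the eager bitset wins when the same filter is reused across many queries, since the predicate is evaluated once per document rather than per visit.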
@@ -38,7 +38,7 @@ import org.apache.lucene.util.automaton.RegExp;
  * <p>
  * The supported syntax is documented in the {@link RegExp} class.
  * Note this might be different than other regular expression implementations.
- * For some alternatives with different syntax, look under contrib/regex
+ * For some alternatives with different syntax, look under the sandbox.
  * </p>
  * <p>
  * Note this query can be slow, as it needs to iterate over many terms. In order
@@ -1,3 +1,3 @@
-contrib/miscellaneous is a home of different Lucene-related classes
+miscellaneous is a home of different Lucene-related classes
 that all belong to org.apache.lucene.misc package, as they are not
 substantial enough to warrant their own package.
@@ -47,7 +47,7 @@ for details.
 
 Steps to build:
 <ul>
-  <li> <tt>cd lucene/contrib/misc/</tt>
+  <li> <tt>cd lucene/misc/</tt>
 
   <li> To compile NativePosixUtil.cpp -> libNativePosixUtil.so, run<tt> ant build-native-unix</tt>.
 
@@ -36,9 +36,9 @@ import org.apache.lucene.document.StringField;
 import org.apache.lucene.document.TextField;
 import org.apache.lucene.index.DocValues;
 
-/** Minimal port of contrib/benchmark's LneDocSource +
+/** Minimal port of benchmark's LneDocSource +
  * DocMaker, so tests can enum docs from a line file created
- * by contrib/benchmark's WriteLineDoc task */
+ * by benchmark's WriteLineDoc task */
 public class LineFileDocs implements Closeable {
 
   private BufferedReader reader;
@@ -49,8 +49,7 @@ including, but not limited to:
  - Apache Xerces
 
 ICU4J, (under analysis/icu) is licensed under an MIT styles license
-(contrib/icu/lib/ICU-LICENSE.txt) and Copyright (c) 1995-2008
-International Business Machines Corporation and others
+and Copyright (c) 1995-2008 International Business Machines Corporation and others
 
 Some data files (under analysis/icu/src/data) are derived from Unicode data such
 as the Unicode Character Database. See http://unicode.org/copyright.html for more