mirror of https://github.com/apache/lucene.git
LUCENE-8279: fix javadocs wrong header levels and accessibility issues
Java 13 adds a new doclint check, in the "accessibility" group, that verifies HTML header nesting levels are sane. Many of ours are incorrect because the html4-style javadocs rendered headers with horrible font sizes, so developers used the wrong header level to work around the styling. This is no issue in trunk (which always generates html5). Java recommends against using such structural tags in javadocs at all, but that is a more involved change: this commit just "shifts" header levels in affected documents so they nest correctly.
parent f5c132be6d
commit f41eabdc5f
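The fix in the diff below is mechanical: each affected heading moves by a fixed offset so it nests under the headings the javadoc tool itself emits (up a level in some files, down in others). As a toy illustration of the upward case only — the class and method names here are mine, not part of the commit, which edits each file by hand:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the commit's manual edit: shift every <h2>..<h6> opening
// and closing tag up one level, e.g. <h3> becomes <h2>.
public class HeadingShift {
    static final Pattern HEADING = Pattern.compile("(</?h)([2-6])(>)");

    static String shiftUp(String html) {
        Matcher m = HEADING.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int level = Integer.parseInt(m.group(2)) - 1; // one level up
            m.appendReplacement(out, m.group(1) + level + m.group(3));
        }
        m.appendTail(out); // copy any trailing text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Mirrors one real change from the diff: h3 -> h2.
        System.out.println(shiftUp("* <h3>Term Dictionary</h3>"));
    }
}
```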
@@ -44,7 +44,7 @@ allprojects {
 )

 opts.addStringOption("-release", "11")
-opts.addBooleanOption('Xdoclint:all,-missing,-accessibility', true)
+opts.addBooleanOption('Xdoclint:all,-missing', true)

 def libName = project.path.startsWith(":lucene") ? "Lucene" : "Solr"
 opts.overview = file("src/java/overview.html").toString()

@@ -73,8 +73,8 @@
  * word.</li>
  * </ul>
  *
- * <h3>Compound word token filters</h3>
- * <h4>HyphenationCompoundWordTokenFilter</h4>
+ * <h2>Compound word token filters</h2>
+ * <h3>HyphenationCompoundWordTokenFilter</h3>
  * The {@link
  * org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter
  * HyphenationCompoundWordTokenFilter} uses hyphenation grammars to find

@@ -82,7 +82,7 @@
  * without a dictionary as well but then produces a lot of "nonword" tokens.
  * The quality of the output tokens is directly connected to the quality of the
  * grammar file you use. For languages like German they are quite good.
- * <h5>Grammar file</h5>
+ * <h4>Grammar file</h4>
  * Unfortunately we cannot bundle the hyphenation grammar files with Lucene
  * because they do not use an ASF compatible license (they use the LaTeX
  * Project Public License instead). You can find the XML based grammar

@@ -99,7 +99,7 @@
  * <a href="http://xmlgraphics.apache.org/fop/">Apache FOP project</a>
  * .
  *
- * <h4>DictionaryCompoundWordTokenFilter</h4>
+ * <h3>DictionaryCompoundWordTokenFilter</h3>
  * The {@link
  * org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
  * DictionaryCompoundWordTokenFilter} uses a dictionary-only approach to

@@ -107,7 +107,7 @@
  * uses the hyphenation grammars. You can use it as a first start to
  * see if your dictionary is good or not because it is much simpler in design.
  *
- * <h3>Dictionary</h3>
+ * <h2>Dictionary</h2>
  * The output quality of both token filters is directly connected to the
  * quality of the dictionary you use. They are language dependent of course.
  * You always should use a dictionary

@@ -118,7 +118,7 @@
  * dictionaries</a>
  * Wiki.
  *
- * <h3>Which variant should I use?</h3>
+ * <h2>Which variant should I use?</h2>
  * This decision matrix should help you:
  * <table style="border: 1px solid">
  * <caption>comparison of dictionary and hyphenation based decompounding</caption>

@@ -138,7 +138,7 @@
  * <td>slow</td>
  * </tr>
  * </table>
- * <h3>Examples</h3>
+ * <h2>Examples</h2>
  * <pre class="prettyprint">
  * public void testHyphenationCompoundWordsDE() throws Exception {
  * String[] dict = { "Rind", "Fleisch", "Draht", "Schere", "Gesetz",

@@ -272,6 +272,7 @@ correct stem.</li>
 <li><b>table size:</b> the size in bytes of the stemmer table.</li>
 </ul>
 <table class="padding2" style="border: 1px solid; border-spacing: 0px; border-collapse: separate">
+<caption>test results for different sizes of training samples</caption>
 <tbody>
 <tr style="background-color: #a0b0c0">
 <th>Training sets</th>

@@ -63,7 +63,7 @@ import org.apache.lucene.util.fst.Util;
  * <p>
  *
  * <a id="Termdictionary"></a>
- * <h3>Term Dictionary</h3>
+ * <h2>Term Dictionary</h2>
  * <p>
  * The .tst contains a list of FSTs, one for each field.
  * The FST maps a term to its corresponding statistics (e.g. docfreq)

@@ -201,10 +201,9 @@
 <!-- We must have the index disabled for releases, as Java 11 includes a javascript search engine with GPL license: -->
 <property name="javadoc.noindex" value="true"/>

-<!---TODO: Fix accessibility (order of H1/H2/H3 headings), see https://issues.apache.org/jira/browse/LUCENE-8729 -->
-<property name="javadoc.doclint.args" value="-Xdoclint:all,-missing,-accessibility"/>
+<property name="javadoc.doclint.args" value="-Xdoclint:all,-missing"/>
 <!---proc:none was added because of LOG4J2-1925 / JDK-8186647 -->
-<property name="javac.doclint.args" value="-Xdoclint:all/protected -Xdoclint:-missing -Xdoclint:-accessibility -proc:none"/>
+<property name="javac.doclint.args" value="-Xdoclint:all/protected -Xdoclint:-missing -proc:none"/>

 <condition property="javadoc.nomodule.args" value="--no-module-directories" else="">
 <or>

@@ -100,7 +100,7 @@ import org.apache.lucene.util.fst.Util;
  * </ul>
  * <p>
  * <a id="Termdictionary"></a>
- * <h3>Term Dictionary</h3>
+ * <h2>Term Dictionary</h2>
  *
  * <p>The .tim file contains the list of terms in each
  * field along with per-term statistics (such as docfreq)

@@ -159,7 +159,7 @@ import org.apache.lucene.util.fst.Util;
  * to child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted </li>
  * </ul>
  * <a id="Termindex"></a>
- * <h3>Term Index</h3>
+ * <h2>Term Index</h2>
  * <p>The .tip file contains an index into the term dictionary, so that it can be
  * accessed randomly. The index is also used to determine
  * when a given term cannot exist on disk (in the .tim file), saving a disk seek.</p>

@@ -18,7 +18,7 @@
 /**
  * Lucene 8.4 file format.
  *
- * <h1>Apache Lucene - Index File Formats</h1>
+ * <h2>Apache Lucene - Index File Formats</h2>
  * <div>
  * <ul>
  * <li><a href="#Introduction">Introduction</a></li>

@@ -42,7 +42,7 @@
  * </ul>
  * </div>
  * <a id="Introduction"></a>
- * <h2>Introduction</h2>
+ * <h3>Introduction</h3>
  * <div>
  * <p>This document defines the index file formats used in this version of Lucene.
  * If you are using a different version of Lucene, please consult the copy of

@@ -52,7 +52,7 @@
  * Lucene file formats.</p>
  * </div>
  * <a id="Definitions"></a>
- * <h2>Definitions</h2>
+ * <h3>Definitions</h3>
  * <div>
  * <p>The fundamental concepts in Lucene are index, document, field and term.</p>
  * <p>An index contains a sequence of documents.</p>

@@ -65,14 +65,14 @@
  * term. Thus terms are represented as a pair: the string naming the field, and the
  * bytes within the field.</p>
  * <a id="Inverted_Indexing"></a>
- * <h3>Inverted Indexing</h3>
+ * <h4>Inverted Indexing</h4>
  * <p>The index stores statistics about terms in order to make term-based search
  * more efficient. Lucene's index falls into the family of indexes known as an
  * <i>inverted index.</i> This is because it can list, for a term, the documents
  * that contain it. This is the inverse of the natural relationship, in which
  * documents list terms.</p>
  * <a id="Types_of_Fields"></a>
- * <h3>Types of Fields</h3>
+ * <h4>Types of Fields</h4>
  * <p>In Lucene, fields may be <i>stored</i>, in which case their text is stored
  * in the index literally, in a non-inverted manner. Fields that are inverted are
  * called <i>indexed</i>. A field may be both stored and indexed.</p>

@@ -83,7 +83,7 @@
  * <p>See the {@link org.apache.lucene.document.Field Field}
  * java docs for more information on Fields.</p>
  * <a id="Segments"></a>
- * <h3>Segments</h3>
+ * <h4>Segments</h4>
  * <p>Lucene indexes may be composed of multiple sub-indexes, or <i>segments</i>.
  * Each segment is a fully independent index, which could be searched separately.
  * Indexes evolve by:</p>

@@ -94,7 +94,7 @@
  * <p>Searches may involve multiple segments and/or multiple indexes, each index
  * potentially composed of a set of segments.</p>
  * <a id="Document_Numbers"></a>
- * <h3>Document Numbers</h3>
+ * <h4>Document Numbers</h4>
  * <p>Internally, Lucene refers to documents by an integer <i>document number</i>.
  * The first document added to an index is numbered zero, and each subsequent
  * document added gets a number one greater than the previous.</p>

@@ -123,7 +123,7 @@
  * </ul>
  * </div>
  * <a id="Overview"></a>
- * <h2>Index Structure Overview</h2>
+ * <h3>Index Structure Overview</h3>
  * <div>
  * <p>Each segment index maintains the following:</p>
  * <ul>

@@ -195,7 +195,7 @@
  * <p>Details on each of these are provided in their linked pages.</p>
  * </div>
  * <a id="File_Naming"></a>
- * <h2>File Naming</h2>
+ * <h3>File Naming</h3>
  * <div>
  * <p>All files belonging to a segment have the same name with varying extensions.
  * The extensions correspond to the different file formats described below. When

@@ -211,7 +211,7 @@
  * represented in alpha-numeric (base 36) form.</p>
  * </div>
  * <a id="file-names"></a>
- * <h2>Summary of File Extensions</h2>
+ * <h3>Summary of File Extensions</h3>
  * <div>
  * <p>The following table summarizes the names and extensions of the files in
  * Lucene:</p>

@@ -317,7 +317,7 @@
  * </table>
  * </div>
  * <a id="Lock_File"></a>
- * <h2>Lock File</h2>
+ * <h3>Lock File</h3>
  * The write lock, which is stored in the index directory by default, is named
  * "write.lock". If the lock directory is different from the index directory then
  * the write lock will be named "XXXX-write.lock" where XXXX is a unique prefix

@@ -325,7 +325,7 @@
  * writer is currently modifying the index (adding or removing documents). This
  * lock file ensures that only one writer is modifying the index at a time.
  * <a id="History"></a>
- * <h2>History</h2>
+ * <h3>History</h3>
  * <p>Compatibility notes are provided in this document, describing how file
  * formats have changed from prior versions:</p>
  * <ul>

@@ -401,7 +401,7 @@
  * performant encoding that is vectorized.</li>
  * </ul>
  * <a id="Limitations"></a>
- * <h2>Limitations</h2>
+ * <h3>Limitations</h3>
  * <div>
  * <p>Lucene uses a Java <code>int</code> to refer to
  * document numbers, and the index file format uses an <code>Int32</code>

@@ -38,7 +38,7 @@ import org.apache.lucene.util.bkd.BKDWriter;
  * points are indexed with datastructures such as <a href="https://en.wikipedia.org/wiki/K-d_tree">KD-trees</a>.
  * These structures are optimized for operations such as <i>range</i>, <i>distance</i>, <i>nearest-neighbor</i>,
  * and <i>point-in-polygon</i> queries.
- * <h1>Basic Point Types</h1>
+ * <h2>Basic Point Types</h2>
  * <table>
  * <caption>Basic point types in Java and Lucene</caption>
  * <tr><th>Java type</th><th>Lucene class</th></tr>

@@ -66,7 +66,7 @@ import org.apache.lucene.util.bkd.BKDWriter;
  * Query query = IntPoint.newRangeQuery("year", 1960, 1980);
  * TopDocs docs = searcher.search(query, ...);
  * </pre>
- * <h1>Geospatial Point Types</h1>
+ * <h2>Geospatial Point Types</h2>
  * Although basic point types such as {@link DoublePoint} support points in multi-dimensional space too, Lucene has
  * specialized classes for location data. These classes are optimized for location data: they are more space-efficient and
  * support special operations such as <i>distance</i> and <i>polygon</i> queries. There are currently two implementations:

@@ -76,7 +76,7 @@ import org.apache.lucene.util.bkd.BKDWriter;
  * <li><a href="{@docRoot}/../spatial3d/org/apache/lucene/spatial3d/Geo3DPoint.html">Geo3DPoint</a>* in <i>lucene-spatial3d</i>: indexes {@code (latitude,longitude)} as {@code (x,y,z)} in three-dimensional space.
  * </ol>
  * * does <b>not</b> support altitude, 3D here means "uses three dimensions under-the-hood"<br>
- * <h1>Advanced usage</h1>
+ * <h2>Advanced usage</h2>
  * Custom structures can be created on top of single- or multi- dimensional basic types, on top of
  * {@link BinaryPoint} for more flexibility, or via custom {@link Field} subclasses.
  *

@@ -34,7 +34,7 @@ import java.util.Arrays;
  * <p><code>document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));</code></p>
  *
  *
- * <h3>Valid Types of Values</h3>
+ * <h2>Valid Types of Values</h2>
  *
  * <p>There are four possible kinds of term values which may be put into
  * sorting fields: Integers, Longs, Floats, or Strings. Unless

@@ -67,14 +67,14 @@ import java.util.Arrays;
  * of term value has higher memory requirements than the other
  * two types.
  *
- * <h3>Object Reuse</h3>
+ * <h2>Object Reuse</h2>
  *
  * <p>One of these objects can be
  * used multiple times and the sort order changed between usages.
  *
  * <p>This class is thread safe.
  *
- * <h3>Memory Usage</h3>
+ * <h2>Memory Usage</h2>
  *
  * <p>Sorting uses of caches of term values maintained by the
  * internal HitQueue(s). The cache is static and contains an integer

@@ -31,7 +31,7 @@
  * <p>
  * The main access point is the {@link org.apache.lucene.util.packed.PackedInts} factory.
  *
- * <h3>In-memory structures</h3>
+ * <h2>In-memory structures</h2>
  *
  * <ul>
  * <li><b>{@link org.apache.lucene.util.packed.PackedInts.Mutable}</b><ul>

@@ -62,7 +62,7 @@
  * </ul></li>
  * </ul>
  *
- * <h3>Disk-based structures</h3>
+ * <h2>Disk-based structures</h2>
  *
  * <ul>
  * <li><b>{@link org.apache.lucene.util.packed.PackedInts.Writer}, {@link org.apache.lucene.util.packed.PackedInts.Reader}, {@link org.apache.lucene.util.packed.PackedInts.ReaderIterator}</b><ul>

@@ -22,7 +22,7 @@
 </head>
 <body>

-<h2>Misc Tools</h2>
+<h1>Misc Tools</h1>

 The misc package has various tools for splitting/merging indices,
 changing norms, finding high freq terms, and others.

@@ -80,7 +80,7 @@ import org.apache.lucene.util.PriorityQueue;
  *
  * Doug
  * </code></pre>
- * <h3>Initial Usage</h3>
+ * <h2>Initial Usage</h2>
  * <p>
  * This class has lots of options to try to make it efficient and flexible.
  * The simplest possible usage is as follows. The bold

@@ -109,7 +109,7 @@ import org.apache.lucene.util.PriorityQueue;
  * <li> call the searcher to find the similar docs
  * </ol>
  * <br>
- * <h3>More Advanced Usage</h3>
+ * <h2>More Advanced Usage</h2>
  * <p>
  * You may want to use {@link #setFieldNames setFieldNames(...)} so you can examine
  * multiple fields (e.g. body and title) for similarity.

@@ -21,7 +21,7 @@
 </title>
 </head>
 <body>
-Apache Lucene QueryParsers.
+<h1>Apache Lucene QueryParsers.</h1>
 <p>
 This module provides a number of queryparsers:
 <ul>

@@ -16,6 +16,6 @@
  */

 /**
- * <h1>Near-real-time replication framework</h1>
+ * Near-real-time replication framework
  */
 package org.apache.lucene.replicator.nrt;

@@ -16,7 +16,7 @@
  */

 /**
- * <h1>Files replication framework</h1>
+ * Files replication framework
  *
  * The
  * <a href="Replicator.html">Replicator</a> allows replicating files between a server and client(s). Producers publish

@@ -43,7 +43,7 @@ import org.apache.lucene.util.fst.FST;
  * </ul>
  * <p>
  * <a id="Completionictionary"></a>
- * <h3>Completion Dictionary</h3>
+ * <h2>Completion Dictionary</h2>
  * <p>The .lkp file contains an FST for each suggest field
  * </p>
  * <ul>

@@ -60,7 +60,7 @@ import org.apache.lucene.util.fst.FST;
  * <li>FST maps all analyzed forms to surface forms of a SuggestField</li>
  * </ul>
  * <a id="Completionindex"></a>
- * <h3>Completion Index</h3>
+ * <h2>Completion Index</h2>
  * <p>The .cmp file contains an index into the completion dictionary, so that it can be
  * accessed randomly.</p>
  * <ul>

@@ -153,7 +153,7 @@ import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
 /**
  * Base class for all Lucene unit tests, Junit3 or Junit4 variant.
  *
- * <h3>Class and instance setup.</h3>
+ * <h2>Class and instance setup.</h2>
  *
  * <p>
  * The preferred way to specify class (suite-level) setup/cleanup is to use

@@ -170,13 +170,13 @@ import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
  * your subclass, make sure you call <code>super.setUp()</code> and
  * <code>super.tearDown()</code>. This is detected and enforced.
  *
- * <h3>Specifying test cases</h3>
+ * <h2>Specifying test cases</h2>
  *
  * <p>
  * Any test method with a <code>testXXX</code> prefix is considered a test case.
  * Any test method annotated with {@link Test} is considered a test case.
  *
- * <h3>Randomized execution and test facilities</h3>
+ * <h2>Randomized execution and test facilities</h2>
  *
  * <p>
  * {@link LuceneTestCase} uses {@link RandomizedRunner} to execute test cases.

@@ -53,7 +53,7 @@ import org.apache.solr.util.SolrPluginUtils;
  * features declared in the feature store of the current reranking model,
  * or a specified feature store. Ex. <code>fl=id,[features store=myStore efi.user_text="ibm"]</code>
  *
- * <h3>Parameters</h3>
+ * <h2>Parameters</h2>
  * <code>store</code> - The feature store to extract features from. If not provided it
  * will default to the features used by your reranking model.<br>
  * <code>efi.*</code> - External feature information variables required by the features

@@ -33,7 +33,7 @@ to easily build their own learning to rank systems and access the rich
 matching features readily available in Solr. It also provides tools to perform
 feature engineering and feature extraction.
 </p>
-<h2> Code structure </h2>
+<h1> Code structure </h1>
 <p>
 A Learning to Rank model is plugged into the ranking through the {@link org.apache.solr.ltr.search.LTRQParserPlugin},
 a {@link org.apache.solr.search.QParserPlugin}. The plugin will

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 /**
- * <h1>Simulated environment for autoscaling.</h1>
+ * Simulated environment for autoscaling.
  *
  * <h2>Goals</h2>
  * <ul>

@@ -79,7 +79,7 @@ import org.apache.lucene.util.NumericUtils;
  * one of the BooleanQuery rewrite methods without changing
  * BooleanQuery's default max clause count.
  *
- * <br><h3>How it works</h3>
+ * <br><h2>How it works</h2>
  *
  * <p>See the publication about <a target="_blank" href="http://www.panfmp.org">panFMP</a>,
  * where this algorithm was described (referred to as <code>TrieRangeQuery</code>):

@@ -111,7 +111,7 @@ import org.apache.lucene.util.NumericUtils;
  * In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records
  * and a uniform value distribution).</p>
  *
- * <h3><a id="precisionStepDesc">Precision Step</a></h3>
+ * <h2><a id="precisionStepDesc">Precision Step</a></h2>
  * <p>You can choose any <code>precisionStep</code> when encoding values.
  * Lower step values mean more precisions and so more terms in index (and index gets larger). The number
  * of indexed terms per value is (those are generated by {@link org.apache.solr.legacy.LegacyNumericTokenStream}):

@@ -54,12 +54,12 @@ import org.apache.solr.search.TermsQParserPlugin;
  * the value of this field is a document list, which is a result of executing subquery using
  * document fields as an input.
  *
- * <h3>Subquery Parameters Shift</h3>
+ * <h2>Subquery Parameters Shift</h2>
  * if subquery is declared as <code>fl=*,foo:[subquery]</code>, subquery parameters
  * are prefixed with the given name and period. eg <br>
  * <code>q=*:*&fl=*,foo:[subquery]&foo.q=to be continued&foo.rows=10&foo.sort=id desc</code>
  *
- * <h3>Document Field As An Input For Subquery Parameters</h3>
+ * <h2>Document Field As An Input For Subquery Parameters</h2>
  *
  * It's necessary to pass some document field value as a parameter for subquery. It's supported via
  * implicit <code>row.<i>fieldname</i></code> parameters, and can be (but might not only) referred via

@@ -70,13 +70,13 @@ import org.apache.solr.search.TermsQParserPlugin;
  * Note, when document field has multiple values they are concatenated with comma by default, it can be changed by
  * <code>foo:[subquery separator=' ']</code> local parameter, this mimics {@link TermsQParserPlugin} to work smoothly with.
  *
- * <h3>Cores And Collections In SolrCloud</h3>
+ * <h2>Cores And Collections In SolrCloud</h2>
  * use <code>foo:[subquery fromIndex=departments]</code> invoke subquery on another core on the same node, it's like
  * {@link JoinQParserPlugin} for non SolrCloud mode. <b>But for SolrCloud</b> just (and only) <b>explicitly specify</b>
  * its' native parameters like <code>collection, shards</code> for subquery, eg<br>
  * <code>q=*:*&fl=*,foo:[subquery]&foo.q=cloud&foo.collection=departments</code>
  *
- * <h3>When used in Real Time Get</h3>
+ * <h2>When used in Real Time Get</h2>
  * <p>
  * When used in the context of a Real Time Get, the <i>values</i> from each document that are used
  * in the qubquery are the "real time" values (possibly from the transaction log), but the query

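The "Subquery Parameters Shift" convention described in the javadoc above (parameters for a subquery named <code>foo</code> are the request parameters prefixed with <code>foo.</code>) can be sketched outside Solr as a simple dictionary filter. This is an illustrative helper, not Solr code; the function name and parameter shapes are assumptions for the example.

```python
# Illustration of the [subquery] parameter-prefix convention: given all
# request parameters, select those belonging to the subquery named `name`
# and strip the "name." prefix. Hypothetical helper, not part of Solr.
def subquery_params(all_params: dict, name: str) -> dict:
    prefix = name + "."
    return {k[len(prefix):]: v
            for k, v in all_params.items()
            if k.startswith(prefix)}

params = {
    "q": "*:*",
    "fl": "*,foo:[subquery]",
    "foo.q": "to be continued",
    "foo.rows": "10",
    "foo.sort": "id desc",
}
print(subquery_params(params, "foo"))
# -> {'q': 'to be continued', 'rows': '10', 'sort': 'id desc'}
```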
@@ -50,7 +50,7 @@ import org.slf4j.LoggerFactory;
  * the final Lucene {@link Document} to be indexed.</p>
  * <p>Fields that are declared in the patterns list but are not present
  * in the current schema will be removed from the input document.</p>
- * <h3>Implementation details</h3>
+ * <h2>Implementation details</h2>
  * <p>This update processor uses {@link PreAnalyzedParser}
  * to parse the original field content (interpreted as a string value), and thus
  * obtain the stored part and the token stream part. Then it creates the "template"

@@ -61,7 +61,7 @@ import org.slf4j.LoggerFactory;
  * field type does not support stored or indexed parts then such parts are silently
  * discarded. Finally the updated "template" {@link Field}-s are added to the resulting
  * {@link SolrInputField}, and the original value of that field is removed.</p>
- * <h3>Example configuration</h3>
+ * <h2>Example configuration</h2>
  * <p>In the example configuration below there are two update chains, one that
  * uses the "simple" parser ({@link SimplePreAnalyzedParser}) and one that uses
  * the "json" parser ({@link JsonPreAnalyzedParser}). Field "nonexistent" will be
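The change across all of these hunks is mechanical: each misused `<hN>` inside a javadoc comment is shifted to the level its nesting actually implies, so Java 13's new doclint accessibility check passes. A rough sketch of that shift as a regex transform (a hypothetical helper for illustration, not the tool used for this commit, which was edited by hand):

```python
import re

# Shift every <hN>...</hN> tag in a javadoc string by `delta` levels
# (default: one level up, so h4 -> h3, h3 -> h2), clamped to h1..h6.
# Hypothetical illustration of the header-level fix, not Lucene tooling.
def shift_headers(javadoc: str, delta: int = -1) -> str:
    def repl(m):
        level = max(1, min(6, int(m.group(2)) + delta))
        return f"<{m.group(1)}h{level}>"
    return re.sub(r"<(/?)h([1-6])>", repl, javadoc)

before = " * <h3>Parameters</h3>\n * <h4>Details</h4>"
print(shift_headers(before))  # <h3> -> <h2>, <h4> -> <h3>
```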