LUCENE-8279: fix wrong javadoc header levels and accessibility issues

Java 13 adds a new doclint check under "accessibility" that verifies
the html header nesting level isn't crazy.

Many headers are incorrect because the html4-style javadocs had horrible
font-sizes, so developers used the wrong header level to work around it.
This is not an issue in trunk (always html5).

Java recommends against using such structured tags at all in javadocs,
but that is a more involved change: this just "shifts" header levels
in documents to be correct.
Robert Muir 2020-02-08 10:00:00 -05:00
parent f5c132be6d
commit f41eabdc5f
23 changed files with 59 additions and 59 deletions
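
For illustration only, a minimal sketch of the kind of shift applied throughout this change, using a hypothetical package rather than one touched by the commit: with html5 javadoc the generated page title is the only <h1>, so the first heading inside a package description starts at <h2>, and sub-sections nest one level below it.

    /**
     * Token frobnication utilities (hypothetical example package).
     *
     * <h2>Overview</h2>            <!-- previously <h3>, picked for its html4 font-size -->
     * General description of the package.
     *
     * <h3>Configuration</h3>       <!-- previously <h4>; nested sections move up one level too -->
     * How to configure the frobnicator.
     */
    package org.example.frob;

Overview pages (overview.html), by contrast, have no generated heading of their own, so their top-level sections become <h1> here rather than <h2>.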

View File

@@ -44,7 +44,7 @@ allprojects {
)
opts.addStringOption("-release", "11")
opts.addBooleanOption('Xdoclint:all,-missing,-accessibility', true)
opts.addBooleanOption('Xdoclint:all,-missing', true)
def libName = project.path.startsWith(":lucene") ? "Lucene" : "Solr"
opts.overview = file("src/java/overview.html").toString()

View File

@@ -73,8 +73,8 @@
* word.</li>
* </ul>
*
* <h3>Compound word token filters</h3>
* <h4>HyphenationCompoundWordTokenFilter</h4>
* <h2>Compound word token filters</h2>
* <h3>HyphenationCompoundWordTokenFilter</h3>
* The {@link
* org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter
* HyphenationCompoundWordTokenFilter} uses hyphenation grammars to find
@@ -82,7 +82,7 @@
* without a dictionary as well but then produces a lot of "nonword" tokens.
* The quality of the output tokens is directly connected to the quality of the
* grammar file you use. For languages like German they are quite good.
* <h5>Grammar file</h5>
* <h4>Grammar file</h4>
* Unfortunately we cannot bundle the hyphenation grammar files with Lucene
* because they do not use an ASF compatible license (they use the LaTeX
* Project Public License instead). You can find the XML based grammar
@@ -99,7 +99,7 @@
* <a href="http://xmlgraphics.apache.org/fop/">Apache FOP project</a>
* .
*
* <h4>DictionaryCompoundWordTokenFilter</h4>
* <h3>DictionaryCompoundWordTokenFilter</h3>
* The {@link
* org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
* DictionaryCompoundWordTokenFilter} uses a dictionary-only approach to
@@ -107,7 +107,7 @@
* uses the hyphenation grammars. You can use it as a first start to
* see if your dictionary is good or not because it is much simpler in design.
*
* <h3>Dictionary</h3>
* <h2>Dictionary</h2>
* The output quality of both token filters is directly connected to the
* quality of the dictionary you use. They are language dependent of course.
* You always should use a dictionary
@@ -118,7 +118,7 @@
* dictionaries</a>
* Wiki.
*
* <h3>Which variant should I use?</h3>
* <h2>Which variant should I use?</h2>
* This decision matrix should help you:
* <table style="border: 1px solid">
* <caption>comparison of dictionary and hyphenation based decompounding</caption>
@@ -138,7 +138,7 @@
* <td>slow</td>
* </tr>
* </table>
* <h3>Examples</h3>
* <h2>Examples</h2>
* <pre class="prettyprint">
* public void testHyphenationCompoundWordsDE() throws Exception {
* String[] dict = { "Rind", "Fleisch", "Draht", "Schere", "Gesetz",

View File

@@ -272,6 +272,7 @@ correct stem.</li>
<li><b>table size:</b> the size in bytes of the stemmer table.</li>
</ul>
<table class="padding2" style="border: 1px solid; border-spacing: 0px; border-collapse: separate">
<caption>test results for different sizes of training samples</caption>
<tbody>
<tr style="background-color: #a0b0c0">
<th>Training sets</th>

View File

@@ -63,7 +63,7 @@ import org.apache.lucene.util.fst.Util;
* <p>
*
* <a id="Termdictionary"></a>
* <h3>Term Dictionary</h3>
* <h2>Term Dictionary</h2>
* <p>
* The .tst contains a list of FSTs, one for each field.
* The FST maps a term to its corresponding statistics (e.g. docfreq)

View File

@@ -201,10 +201,9 @@
<!-- We must have the index disabled for releases, as Java 11 includes a javascript search engine with GPL license: -->
<property name="javadoc.noindex" value="true"/>
<!---TODO: Fix accessibility (order of H1/H2/H3 headings), see https://issues.apache.org/jira/browse/LUCENE-8729 -->
<property name="javadoc.doclint.args" value="-Xdoclint:all,-missing,-accessibility"/>
<property name="javadoc.doclint.args" value="-Xdoclint:all,-missing"/>
<!---proc:none was added because of LOG4J2-1925 / JDK-8186647 -->
<property name="javac.doclint.args" value="-Xdoclint:all/protected -Xdoclint:-missing -Xdoclint:-accessibility -proc:none"/>
<property name="javac.doclint.args" value="-Xdoclint:all/protected -Xdoclint:-missing -proc:none"/>
<condition property="javadoc.nomodule.args" value="--no-module-directories" else="">
<or>

View File

@@ -100,7 +100,7 @@ import org.apache.lucene.util.fst.Util;
* </ul>
* <p>
* <a id="Termdictionary"></a>
* <h3>Term Dictionary</h3>
* <h2>Term Dictionary</h2>
*
* <p>The .tim file contains the list of terms in each
* field along with per-term statistics (such as docfreq)
@@ -159,7 +159,7 @@ import org.apache.lucene.util.fst.Util;
* to child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted </li>
* </ul>
* <a id="Termindex"></a>
* <h3>Term Index</h3>
* <h2>Term Index</h2>
* <p>The .tip file contains an index into the term dictionary, so that it can be
* accessed randomly. The index is also used to determine
* when a given term cannot exist on disk (in the .tim file), saving a disk seek.</p>

View File

@@ -18,7 +18,7 @@
/**
* Lucene 8.4 file format.
*
* <h1>Apache Lucene - Index File Formats</h1>
* <h2>Apache Lucene - Index File Formats</h2>
* <div>
* <ul>
* <li><a href="#Introduction">Introduction</a></li>
@@ -42,7 +42,7 @@
* </ul>
* </div>
* <a id="Introduction"></a>
* <h2>Introduction</h2>
* <h3>Introduction</h3>
* <div>
* <p>This document defines the index file formats used in this version of Lucene.
* If you are using a different version of Lucene, please consult the copy of
@@ -52,7 +52,7 @@
* Lucene file formats.</p>
* </div>
* <a id="Definitions"></a>
* <h2>Definitions</h2>
* <h3>Definitions</h3>
* <div>
* <p>The fundamental concepts in Lucene are index, document, field and term.</p>
* <p>An index contains a sequence of documents.</p>
@@ -65,14 +65,14 @@
* term. Thus terms are represented as a pair: the string naming the field, and the
* bytes within the field.</p>
* <a id="Inverted_Indexing"></a>
* <h3>Inverted Indexing</h3>
* <h4>Inverted Indexing</h4>
* <p>The index stores statistics about terms in order to make term-based search
* more efficient. Lucene's index falls into the family of indexes known as an
* <i>inverted index.</i> This is because it can list, for a term, the documents
* that contain it. This is the inverse of the natural relationship, in which
* documents list terms.</p>
* <a id="Types_of_Fields"></a>
* <h3>Types of Fields</h3>
* <h4>Types of Fields</h4>
* <p>In Lucene, fields may be <i>stored</i>, in which case their text is stored
* in the index literally, in a non-inverted manner. Fields that are inverted are
* called <i>indexed</i>. A field may be both stored and indexed.</p>
@@ -83,7 +83,7 @@
* <p>See the {@link org.apache.lucene.document.Field Field}
* java docs for more information on Fields.</p>
* <a id="Segments"></a>
* <h3>Segments</h3>
* <h4>Segments</h4>
* <p>Lucene indexes may be composed of multiple sub-indexes, or <i>segments</i>.
* Each segment is a fully independent index, which could be searched separately.
* Indexes evolve by:</p>
@@ -94,7 +94,7 @@
* <p>Searches may involve multiple segments and/or multiple indexes, each index
* potentially composed of a set of segments.</p>
* <a id="Document_Numbers"></a>
* <h3>Document Numbers</h3>
* <h4>Document Numbers</h4>
* <p>Internally, Lucene refers to documents by an integer <i>document number</i>.
* The first document added to an index is numbered zero, and each subsequent
* document added gets a number one greater than the previous.</p>
@@ -123,7 +123,7 @@
* </ul>
* </div>
* <a id="Overview"></a>
* <h2>Index Structure Overview</h2>
* <h3>Index Structure Overview</h3>
* <div>
* <p>Each segment index maintains the following:</p>
* <ul>
@@ -195,7 +195,7 @@
* <p>Details on each of these are provided in their linked pages.</p>
* </div>
* <a id="File_Naming"></a>
* <h2>File Naming</h2>
* <h3>File Naming</h3>
* <div>
* <p>All files belonging to a segment have the same name with varying extensions.
* The extensions correspond to the different file formats described below. When
@@ -211,7 +211,7 @@
* represented in alpha-numeric (base 36) form.</p>
* </div>
* <a id="file-names"></a>
* <h2>Summary of File Extensions</h2>
* <h3>Summary of File Extensions</h3>
* <div>
* <p>The following table summarizes the names and extensions of the files in
* Lucene:</p>
@@ -317,7 +317,7 @@
* </table>
* </div>
* <a id="Lock_File"></a>
* <h2>Lock File</h2>
* <h3>Lock File</h3>
* The write lock, which is stored in the index directory by default, is named
* "write.lock". If the lock directory is different from the index directory then
* the write lock will be named "XXXX-write.lock" where XXXX is a unique prefix
@@ -325,7 +325,7 @@
* writer is currently modifying the index (adding or removing documents). This
* lock file ensures that only one writer is modifying the index at a time.
* <a id="History"></a>
* <h2>History</h2>
* <h3>History</h3>
* <p>Compatibility notes are provided in this document, describing how file
* formats have changed from prior versions:</p>
* <ul>
@@ -401,7 +401,7 @@
* performant encoding that is vectorized.</li>
* </ul>
* <a id="Limitations"></a>
* <h2>Limitations</h2>
* <h3>Limitations</h3>
* <div>
* <p>Lucene uses a Java <code>int</code> to refer to
* document numbers, and the index file format uses an <code>Int32</code>

View File

@@ -38,7 +38,7 @@ import org.apache.lucene.util.bkd.BKDWriter;
* points are indexed with datastructures such as <a href="https://en.wikipedia.org/wiki/K-d_tree">KD-trees</a>.
* These structures are optimized for operations such as <i>range</i>, <i>distance</i>, <i>nearest-neighbor</i>,
* and <i>point-in-polygon</i> queries.
* <h1>Basic Point Types</h1>
* <h2>Basic Point Types</h2>
* <table>
* <caption>Basic point types in Java and Lucene</caption>
* <tr><th>Java type</th><th>Lucene class</th></tr>
@@ -66,7 +66,7 @@ import org.apache.lucene.util.bkd.BKDWriter;
* Query query = IntPoint.newRangeQuery("year", 1960, 1980);
* TopDocs docs = searcher.search(query, ...);
* </pre>
* <h1>Geospatial Point Types</h1>
* <h2>Geospatial Point Types</h2>
* Although basic point types such as {@link DoublePoint} support points in multi-dimensional space too, Lucene has
* specialized classes for location data. These classes are optimized for location data: they are more space-efficient and
* support special operations such as <i>distance</i> and <i>polygon</i> queries. There are currently two implementations:
@@ -76,7 +76,7 @@ import org.apache.lucene.util.bkd.BKDWriter;
* <li><a href="{@docRoot}/../spatial3d/org/apache/lucene/spatial3d/Geo3DPoint.html">Geo3DPoint</a>* in <i>lucene-spatial3d</i>: indexes {@code (latitude,longitude)} as {@code (x,y,z)} in three-dimensional space.
* </ol>
* * does <b>not</b> support altitude, 3D here means "uses three dimensions under-the-hood"<br>
* <h1>Advanced usage</h1>
* <h2>Advanced usage</h2>
* Custom structures can be created on top of single- or multi- dimensional basic types, on top of
* {@link BinaryPoint} for more flexibility, or via custom {@link Field} subclasses.
*

View File

@@ -34,7 +34,7 @@ import java.util.Arrays;
* <p><code>document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));</code></p>
*
*
* <h3>Valid Types of Values</h3>
* <h2>Valid Types of Values</h2>
*
* <p>There are four possible kinds of term values which may be put into
* sorting fields: Integers, Longs, Floats, or Strings. Unless
@@ -67,14 +67,14 @@ import java.util.Arrays;
* of term value has higher memory requirements than the other
* two types.
*
* <h3>Object Reuse</h3>
* <h2>Object Reuse</h2>
*
* <p>One of these objects can be
* used multiple times and the sort order changed between usages.
*
* <p>This class is thread safe.
*
* <h3>Memory Usage</h3>
* <h2>Memory Usage</h2>
*
* <p>Sorting uses of caches of term values maintained by the
* internal HitQueue(s). The cache is static and contains an integer

View File

@@ -31,7 +31,7 @@
* <p>
* The main access point is the {@link org.apache.lucene.util.packed.PackedInts} factory.
*
* <h3>In-memory structures</h3>
* <h2>In-memory structures</h2>
*
* <ul>
* <li><b>{@link org.apache.lucene.util.packed.PackedInts.Mutable}</b><ul>
@@ -62,7 +62,7 @@
* </ul></li>
* </ul>
*
* <h3>Disk-based structures</h3>
* <h2>Disk-based structures</h2>
*
* <ul>
* <li><b>{@link org.apache.lucene.util.packed.PackedInts.Writer}, {@link org.apache.lucene.util.packed.PackedInts.Reader}, {@link org.apache.lucene.util.packed.PackedInts.ReaderIterator}</b><ul>

View File

@@ -22,7 +22,7 @@
</head>
<body>
<h2>Misc Tools</h2>
<h1>Misc Tools</h1>
The misc package has various tools for splitting/merging indices,
changing norms, finding high freq terms, and others.

View File

@@ -80,7 +80,7 @@ import org.apache.lucene.util.PriorityQueue;
*
* Doug
* </code></pre>
* <h3>Initial Usage</h3>
* <h2>Initial Usage</h2>
* <p>
* This class has lots of options to try to make it efficient and flexible.
* The simplest possible usage is as follows. The bold
@@ -109,7 +109,7 @@ import org.apache.lucene.util.PriorityQueue;
* <li> call the searcher to find the similar docs
* </ol>
* <br>
* <h3>More Advanced Usage</h3>
* <h2>More Advanced Usage</h2>
* <p>
* You may want to use {@link #setFieldNames setFieldNames(...)} so you can examine
* multiple fields (e.g. body and title) for similarity.

View File

@@ -21,7 +21,7 @@
</title>
</head>
<body>
Apache Lucene QueryParsers.
<h1>Apache Lucene QueryParsers.</h1>
<p>
This module provides a number of queryparsers:
<ul>

View File

@@ -16,6 +16,6 @@
*/
/**
* <h1>Near-real-time replication framework</h1>
* Near-real-time replication framework
*/
package org.apache.lucene.replicator.nrt;

View File

@@ -16,7 +16,7 @@
*/
/**
* <h1>Files replication framework</h1>
* Files replication framework
*
* The
* <a href="Replicator.html">Replicator</a> allows replicating files between a server and client(s). Producers publish

View File

@@ -43,7 +43,7 @@ import org.apache.lucene.util.fst.FST;
* </ul>
* <p>
* <a id="Completionictionary"></a>
* <h3>Completion Dictionary</h3>
* <h2>Completion Dictionary</h2>
* <p>The .lkp file contains an FST for each suggest field
* </p>
* <ul>
@@ -60,7 +60,7 @@ import org.apache.lucene.util.fst.FST;
* <li>FST maps all analyzed forms to surface forms of a SuggestField</li>
* </ul>
* <a id="Completionindex"></a>
* <h3>Completion Index</h3>
* <h2>Completion Index</h2>
* <p>The .cmp file contains an index into the completion dictionary, so that it can be
* accessed randomly.</p>
* <ul>

View File

@@ -153,7 +153,7 @@ import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
/**
* Base class for all Lucene unit tests, Junit3 or Junit4 variant.
*
* <h3>Class and instance setup.</h3>
* <h2>Class and instance setup.</h2>
*
* <p>
* The preferred way to specify class (suite-level) setup/cleanup is to use
@@ -170,13 +170,13 @@ import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
* your subclass, make sure you call <code>super.setUp()</code> and
* <code>super.tearDown()</code>. This is detected and enforced.
*
* <h3>Specifying test cases</h3>
* <h2>Specifying test cases</h2>
*
* <p>
* Any test method with a <code>testXXX</code> prefix is considered a test case.
* Any test method annotated with {@link Test} is considered a test case.
*
* <h3>Randomized execution and test facilities</h3>
* <h2>Randomized execution and test facilities</h2>
*
* <p>
* {@link LuceneTestCase} uses {@link RandomizedRunner} to execute test cases.

View File

@@ -53,7 +53,7 @@ import org.apache.solr.util.SolrPluginUtils;
* features declared in the feature store of the current reranking model,
* or a specified feature store. Ex. <code>fl=id,[features store=myStore efi.user_text="ibm"]</code>
*
* <h3>Parameters</h3>
* <h2>Parameters</h2>
* <code>store</code> - The feature store to extract features from. If not provided it
* will default to the features used by your reranking model.<br>
* <code>efi.*</code> - External feature information variables required by the features

View File

@@ -33,7 +33,7 @@ to easily build their own learning to rank systems and access the rich
matching features readily available in Solr. It also provides tools to perform
feature engineering and feature extraction.
</p>
<h2> Code structure </h2>
<h1> Code structure </h1>
<p>
A Learning to Rank model is plugged into the ranking through the {@link org.apache.solr.ltr.search.LTRQParserPlugin},
a {@link org.apache.solr.search.QParserPlugin}. The plugin will

View File

@@ -15,7 +15,7 @@
* limitations under the License.
*/
/**
* <h1>Simulated environment for autoscaling.</h1>
* Simulated environment for autoscaling.
*
* <h2>Goals</h2>
* <ul>

View File

@@ -79,7 +79,7 @@ import org.apache.lucene.util.NumericUtils;
* one of the BooleanQuery rewrite methods without changing
* BooleanQuery's default max clause count.
*
* <br><h3>How it works</h3>
* <br><h2>How it works</h2>
*
* <p>See the publication about <a target="_blank" href="http://www.panfmp.org">panFMP</a>,
* where this algorithm was described (referred to as <code>TrieRangeQuery</code>):
@@ -111,7 +111,7 @@ import org.apache.lucene.util.NumericUtils;
* In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records
* and a uniform value distribution).</p>
*
* <h3><a id="precisionStepDesc">Precision Step</a></h3>
* <h2><a id="precisionStepDesc">Precision Step</a></h2>
* <p>You can choose any <code>precisionStep</code> when encoding values.
* Lower step values mean more precisions and so more terms in index (and index gets larger). The number
* of indexed terms per value is (those are generated by {@link org.apache.solr.legacy.LegacyNumericTokenStream}):

View File

@@ -54,12 +54,12 @@ import org.apache.solr.search.TermsQParserPlugin;
* the value of this field is a document list, which is a result of executing subquery using
* document fields as an input.
*
* <h3>Subquery Parameters Shift</h3>
* <h2>Subquery Parameters Shift</h2>
* if subquery is declared as <code>fl=*,foo:[subquery]</code>, subquery parameters
* are prefixed with the given name and period. eg <br>
* <code>q=*:*&amp;fl=*,foo:[subquery]&amp;foo.q=to be continued&amp;foo.rows=10&amp;foo.sort=id desc</code>
*
* <h3>Document Field As An Input For Subquery Parameters</h3>
* <h2>Document Field As An Input For Subquery Parameters</h2>
*
* It's necessary to pass some document field value as a parameter for subquery. It's supported via
* implicit <code>row.<i>fieldname</i></code> parameters, and can be (but might not only) referred via
@@ -70,13 +70,13 @@ import org.apache.solr.search.TermsQParserPlugin;
* Note, when document field has multiple values they are concatenated with comma by default, it can be changed by
* <code>foo:[subquery separator=' ']</code> local parameter, this mimics {@link TermsQParserPlugin} to work smoothly with.
*
* <h3>Cores And Collections In SolrCloud</h3>
* <h2>Cores And Collections In SolrCloud</h2>
* use <code>foo:[subquery fromIndex=departments]</code> invoke subquery on another core on the same node, it's like
* {@link JoinQParserPlugin} for non SolrCloud mode. <b>But for SolrCloud</b> just (and only) <b>explicitly specify</b>
* its' native parameters like <code>collection, shards</code> for subquery, eg<br>
* <code>q=*:*&amp;fl=*,foo:[subquery]&amp;foo.q=cloud&amp;foo.collection=departments</code>
*
* <h3>When used in Real Time Get</h3>
* <h2>When used in Real Time Get</h2>
* <p>
* When used in the context of a Real Time Get, the <i>values</i> from each document that are used
* in the qubquery are the "real time" values (possibly from the transaction log), but the query

View File

@@ -50,7 +50,7 @@ import org.slf4j.LoggerFactory;
* the final Lucene {@link Document} to be indexed.</p>
* <p>Fields that are declared in the patterns list but are not present
* in the current schema will be removed from the input document.</p>
* <h3>Implementation details</h3>
* <h2>Implementation details</h2>
* <p>This update processor uses {@link PreAnalyzedParser}
* to parse the original field content (interpreted as a string value), and thus
* obtain the stored part and the token stream part. Then it creates the "template"
@@ -61,7 +61,7 @@ import org.slf4j.LoggerFactory;
* field type does not support stored or indexed parts then such parts are silently
* discarded. Finally the updated "template" {@link Field}-s are added to the resulting
* {@link SolrInputField}, and the original value of that field is removed.</p>
* <h3>Example configuration</h3>
* <h2>Example configuration</h2>
* <p>In the example configuration below there are two update chains, one that
* uses the "simple" parser ({@link SimplePreAnalyzedParser}) and one that uses
* the "json" parser ({@link JsonPreAnalyzedParser}). Field "nonexistent" will be