[DOCS] Add attribute for Lucene analysis links (#51687)

Adds a `lucene-analysis-docs` attribute for the Lucene `/analysis/`
javadocs directory. This should prevent typos and keep the docs DRY.
James Rodewig 2020-01-30 11:22:30 -05:00
parent 2a2a0941af
commit 36b2663e98
24 changed files with 33 additions and 27 deletions

View File

@@ -1,6 +1,8 @@
 [[analysis]]
 = Text analysis
 
+:lucene-analysis-docs: https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis
+
 [partintro]
 --

View File

@@ -8,8 +8,8 @@ Strips all characters after an apostrophe, including the apostrophe itself.
 This filter is included in {es}'s built-in <<turkish-analyzer,Turkish language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
-which was built for the Turkish language.
+{lucene-analysis-docs}/tr/ApostropheFilter.html[ApostropheFilter], which was
+built for the Turkish language.
 [[analysis-apostrophe-tokenfilter-analyze-ex]]
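A quick way to exercise the linked filter is the `_analyze` API. A minimal sketch; the Turkish sample text is an illustrative assumption:

[source,console]
----
// sketch only; the sample text is an illustrative assumption
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["apostrophe"],
  "text": "Istanbul'a veya Istanbul'dan"
}
----

This should return the tokens `[ Istanbul, veya, Istanbul ]`.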

View File

@@ -9,7 +9,7 @@ Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if
 one exists. For example, the filter changes `à` to `a`.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
+{lucene-analysis-docs}/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
 [[analysis-asciifolding-tokenfilter-analyze-ex]]
 ==== Example
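For context, a minimal `_analyze` sketch of the folding behavior (sample text assumed):

[source,console]
----
// illustrative input; `à` and `í` should fold to `a` and `i`
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["asciifolding"],
  "text": "açaí à la carte"
}
----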

View File

@@ -9,7 +9,7 @@ Japanese, and Korean) tokens.
 This filter is included in {es}'s built-in <<cjk-analyzer,CJK language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html[CJKBigramFilter].
+{lucene-analysis-docs}/cjk/CJKBigramFilter.html[CJKBigramFilter].
 [[analysis-cjk-bigram-tokenfilter-analyze-ex]]
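To see the bigram behavior, a hedged `_analyze` sketch; the Japanese sample text is an assumption:

[source,console]
----
// illustrative input; output should be overlapping CJK bigrams such as 東京 and 京都
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["cjk_bigram"],
  "text": "東京都は日本の首都です"
}
----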

View File

@@ -14,7 +14,7 @@ characters
 This filter is included in {es}'s built-in <<cjk-analyzer,CJK language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html[CJKWidthFilter].
+{lucene-analysis-docs}/cjk/CJKWidthFilter.html[CJKWidthFilter].
 NOTE: This token filter can be viewed as a subset of NFKC/NFKD Unicode
 normalization. See the
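A minimal sketch of the width folding, assuming half-width katakana input:

[source,console]
----
// illustrative input; half-width katakana should normalize to full-width
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["cjk_width"],
  "text": "ｼｰｻｲﾄﾞﾗｲﾅｰ"
}
----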

View File

@@ -9,7 +9,7 @@ Performs optional post-processing of terms generated by the
 This filter removes the english possessive (`'s`) from the end of words and
 removes dots from acronyms. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html[ClassicFilter].
+{lucene-analysis-docs}/standard/ClassicFilter.html[ClassicFilter].
 [[analysis-classic-tokenfilter-analyze-ex]]
 ==== Example
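A hedged sketch of that post-processing, paired with the `classic` tokenizer it is designed for (sample text assumed):

[source,console]
----
// illustrative input; `dog's` should lose its possessive and `Q.U.I.C.K.` its dots
GET /_analyze
{
  "tokenizer": "classic",
  "filter": ["classic"],
  "text": "The 2 Q.U.I.C.K. Brown-Foxes jumped over the lazy dog's bone."
}
----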

View File

@@ -16,7 +16,7 @@ You can use the `common_grams` filter in place of the
 completely ignore common words.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html[CommonGramsFilter].
+{lucene-analysis-docs}/commongrams/CommonGramsFilter.html[CommonGramsFilter].
 [[analysis-common-grams-analyze-ex]]
 ==== Example
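A minimal `_analyze` sketch, assuming `is` and `the` as the common words:

[source,console]
----
// illustrative configuration; common words are combined with adjacent terms into bigrams
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    { "type": "common_grams", "common_words": ["is", "the"] }
  ],
  "text": "the quick fox is brown"
}
----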

View File

@@ -8,7 +8,7 @@ Applies a set of token filters to tokens that match conditions in a provided
 predicate script.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html[ConditionalTokenFilter].
+{lucene-analysis-docs}/miscellaneous/ConditionalTokenFilter.html[ConditionalTokenFilter].
 [[analysis-condition-analyze-ex]]
 ==== Example
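A hedged sketch of the conditional behavior, assuming a Painless predicate that matches short tokens:

[source,console]
----
// illustrative script; only tokens shorter than 5 characters are lowercased
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "condition",
      "filter": ["lowercase"],
      "script": { "source": "token.getTerm().length() < 5" }
    }
  ],
  "text": "THE QUICK BROWN FOX"
}
----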

View File

@@ -8,7 +8,7 @@ Converts all digits in the Unicode `Decimal_Number` General Category to `0-9`.
 For example, the filter changes the Bengali numeral `৩` to `3`.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysiscore/DecimalDigitFilter.html[DecimalDigitFilter].
+{lucene-analysis-docs}/core/DecimalDigitFilter.html[DecimalDigitFilter].
 [[analysis-decimal-digit-tokenfilter-analyze-ex]]
 ==== Example
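A minimal sketch with assumed Devanagari sample digits:

[source,console]
----
// illustrative input; non-Latin decimal digits should become 0-9
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": ["decimal_digit"],
  "text": "१-one two-२ ३"
}
----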

View File

@@ -18,7 +18,7 @@ split `the|1 quick|2 fox|3` into the tokens `the`, `quick`, and `fox`
 with respective payloads of `1`, `2`, and `3`.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/payloads/DelimitedPayloadTokenFilter.html[DelimitedPayloadTokenFilter].
+{lucene-analysis-docs}/payloads/DelimitedPayloadTokenFilter.html[DelimitedPayloadTokenFilter].
 [NOTE]
 .Payloads
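A hedged sketch using the default `|` delimiter; the sample text is assumed, and note that `_analyze` does not display the stored payloads:

[source,console]
----
// illustrative input; each token keeps the text before `|`, the rest becomes its payload
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": ["delimited_payload"],
  "text": "the|0 brown|10 fox|5 is|0 quick|10"
}
----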

View File

@@ -17,7 +17,7 @@ Uses a specified list of words and a brute force approach to find subwords in
 compound words. If found, these subwords are included in the token output.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html[DictionaryCompoundWordTokenFilter],
+{lucene-analysis-docs}/compound/DictionaryCompoundWordTokenFilter.html[DictionaryCompoundWordTokenFilter],
 which was built for Germanic languages.
 [[analysis-dict-decomp-tokenfilter-analyze-ex]]

View File

@@ -13,7 +13,7 @@ For example, you can use the `edge_ngram` token filter to change `quick` to
 When not customized, the filter creates 1-character edge n-grams by default.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html[EdgeNGramTokenFilter].
+{lucene-analysis-docs}/ngram/EdgeNGramTokenFilter.html[EdgeNGramTokenFilter].
 [NOTE]
 ====
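A hedged sketch with assumed 1- to 2-character grams:

[source,console]
----
// illustrative settings; each token yields its 1- and 2-character prefixes
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    { "type": "edge_ngram", "min_gram": 1, "max_gram": 2 }
  ],
  "text": "the quick brown fox jumps"
}
----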

View File

@@ -22,7 +22,7 @@ Customized versions of this filter are included in several of {es}'s built-in
 * <<italian-analyzer, Italian analyzer>>
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html[ElisionFilter].
+{lucene-analysis-docs}/util/ElisionFilter.html[ElisionFilter].
 [[analysis-elision-tokenfilter-analyze-ex]]
 ==== Example
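A minimal sketch with an assumed French sample phrase:

[source,console]
----
// illustrative input; the default elision list strips the leading j' from j'examine
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["elision"],
  "text": "j'examine près du wharf"
}
----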

View File

@@ -22,7 +22,7 @@ https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth#fingerprint[OpenRefine
 project].
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene//analysis/miscellaneous/FingerprintFilter.html[FingerprintFilter].
+{lucene-analysis-docs}/miscellaneous/FingerprintFilter.html[FingerprintFilter].
 [[analysis-fingerprint-tokenfilter-analyze-ex]]
 ==== Example
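A hedged sketch of the fingerprinting (sample text assumed):

[source,console]
----
// illustrative input; tokens are sorted, deduplicated, and concatenated into one token
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": ["fingerprint"],
  "text": "zebra jumps over resting resting dog"
}
----

This should produce the single token `dog jumps over resting zebra`.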

View File

@@ -9,7 +9,7 @@ words. These subwords are then checked against the specified word list. Subwords not
 in the list are excluded from the token output.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilter.html[HyphenationCompoundWordTokenFilter],
+{lucene-analysis-docs}/compound/HyphenationCompoundWordTokenFilter.html[HyphenationCompoundWordTokenFilter],
 which was built for Germanic languages.
 [[analysis-hyp-decomp-tokenfilter-analyze-ex]]
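A heavily hedged sketch: the patterns path and word list below are assumptions, and the XML hyphenation patterns file must already exist in the config directory for the request to run:

[source,console]
----
// assumed path and word list; this fails unless the patterns file is present
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "hyphenation_decompounder",
      "hyphenation_patterns_path": "analysis/hyphenation_patterns.xml",
      "word_list": ["Kaffee", "zucker", "tasse"]
    }
  ],
  "text": "Kaffeetasse"
}
----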

View File

@@ -26,7 +26,7 @@ type.
 ====
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/TypeTokenFilter.html[TypeTokenFilter].
+{lucene-analysis-docs}/core/TypeTokenFilter.html[TypeTokenFilter].
 [[analysis-keep-types-tokenfilter-analyze-include-ex]]
 ==== Include example
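A minimal sketch, assuming only numeric (`<NUM>`) tokens should be kept:

[source,console]
----
// illustrative types list; the standard tokenizer tags numerals as <NUM>
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    { "type": "keep_types", "types": ["<NUM>"] }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
----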

View File

@@ -7,7 +7,7 @@
 Keeps only tokens contained in a specified word list.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeepWordFilter.html[KeepWordFilter].
+{lucene-analysis-docs}/miscellaneous/KeepWordFilter.html[KeepWordFilter].
 [NOTE]
 ====
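A hedged sketch with an assumed word list:

[source,console]
----
// illustrative word list; only `fox` and `dog` survive from the sample text
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    { "type": "keep", "keep_words": ["dog", "elephant", "fox"] }
  ],
  "text": "the quick fox jumps over the lazy dog"
}
----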

View File

@@ -9,7 +9,7 @@ For example, you can use the `length` filter to exclude tokens shorter than 2
 characters and tokens longer than 5 characters.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html[LengthFilter].
+{lucene-analysis-docs}/miscellaneous/LengthFilter.html[LengthFilter].
 [TIP]
 ====
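A minimal sketch, assuming a 4-character maximum:

[source,console]
----
// illustrative bounds; tokens longer than 4 characters are dropped
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    { "type": "length", "min": 0, "max": 4 }
  ],
  "text": "the quick brown fox jumps over the lazy dog"
}
----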

View File

@@ -12,7 +12,7 @@ example, the filter can change the token stream `[ one, two, three ]` to
 `[ one ]`.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html[LimitTokenCountFilter].
+{lucene-analysis-docs}/miscellaneous/LimitTokenCountFilter.html[LimitTokenCountFilter].
 [TIP]
 ====
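A minimal sketch relying on the filter's default of one kept token (sample text assumed):

[source,console]
----
// with defaults, only the first token (`quick`) is kept
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["limit"],
  "text": "quick fox jumps over lazy dog"
}
----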

View File

@@ -104,13 +104,17 @@ PUT lowercase_example
 (Optional, string)
 Language-specific lowercase token filter to use. Valid values include:
-`greek`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/el/GreekLowerCaseFilter.html[GreekLowerCaseFilter]
+`greek`::: Uses Lucene's
+{lucene-analysis-docs}/el/GreekLowerCaseFilter.html[GreekLowerCaseFilter]
-`irish`::: Uses Lucene's http://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
+`irish`::: Uses Lucene's
+http://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
-`turkish`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/TurkishLowerCaseFilter.html[TurkishLowerCaseFilter]
+`turkish`::: Uses Lucene's
+{lucene-analysis-docs}/tr/TurkishLowerCaseFilter.html[TurkishLowerCaseFilter]
-If not specified, defaults to Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html[LowerCaseFilter].
+If not specified, defaults to Lucene's
+{lucene-analysis-docs}/core/LowerCaseFilter.html[LowerCaseFilter].
 --
 [[analysis-lowercase-tokenfilter-customize]]
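A hedged sketch of the `language` parameter; the Greek sample text is an assumption:

[source,console]
----
// illustrative input; the greek variant handles cases such as sigma normalization
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    { "type": "lowercase", "language": "greek" }
  ],
  "text": "ΟΔΥΣΣΕΥΣ"
}
----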

View File

@@ -11,7 +11,7 @@ For example, you can use the `ngram` token filter to change `fox` to
 `[ f, fo, o, ox, x ]`.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html[NGramTokenFilter].
+{lucene-analysis-docs}/ngram/NGramTokenFilter.html[NGramTokenFilter].
 [NOTE]
 ====
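A minimal sketch using the default 1- to 2-character grams (sample text assumed):

[source,console]
----
// illustrative input; `fox` yields `f`, `fo`, `o`, `ox`, `x`
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["ngram"],
  "text": "Quick fox"
}
----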

View File

@@ -12,7 +12,7 @@ such as finding words that end in `-ion` or searching file names by their
 extension.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html[ReverseStringFilter].
+{lucene-analysis-docs}/reverse/ReverseStringFilter.html[ReverseStringFilter].
 [[analysis-reverse-tokenfilter-analyze-ex]]
 ==== Example
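A minimal sketch (sample text assumed):

[source,console]
----
// each token is reversed, e.g. `quick` becomes `kciuq`
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["reverse"],
  "text": "quick fox jumps"
}
----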

View File

@@ -11,7 +11,7 @@ For example, you can use the `truncate` filter to shorten all tokens to
 `3` characters or fewer, changing `jumping fox` to `jum fox`.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html[TruncateTokenFilter].
+{lucene-analysis-docs}/miscellaneous/TruncateTokenFilter.html[TruncateTokenFilter].
 [[analysis-truncate-tokenfilter-analyze-ex]]
 ==== Example
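A hedged sketch matching the 3-character example above:

[source,console]
----
// illustrative length; `jumping` is truncated to `jum`
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    { "type": "truncate", "length": 3 }
  ],
  "text": "jumping fox"
}
----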

View File

@@ -8,7 +8,7 @@ Changes token text to uppercase. For example, you can use the `uppercase` filter
 to change `the Lazy DoG` to `THE LAZY DOG`.
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html[UpperCaseFilter].
+{lucene-analysis-docs}/core/UpperCaseFilter.html[UpperCaseFilter].
 [WARNING]
 ====
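A minimal sketch (sample text assumed):

[source,console]
----
// all tokens are uppercased
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["uppercase"],
  "text": "the Quick FoX JUMPs"
}
----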