[DOCS] Add attribute for Lucene analysis links (#51687)
Adds a `lucene-analysis-docs` attribute for the Lucene `/analysis/` javadocs directory. This should prevent typos and keep the docs DRY.
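For context, AsciiDoc attributes are simple text substitutions: define the attribute once in the document header, then reference it as `{name}` anywhere below, and Asciidoctor expands it at build time. A minimal sketch of the before/after pattern this commit applies (using `LowerCaseFilter`, one of the links actually changed):

```
:lucene-analysis-docs: https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis

// Before: the full Javadoc URL repeated on every filter page.
This filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html[LowerCaseFilter].

// After: one attribute reference; {lucene-analysis-docs} expands to the
// directory URL, so the base path lives in exactly one place.
This filter uses Lucene's
{lucene-analysis-docs}/core/LowerCaseFilter.html[LowerCaseFilter].
```

Because the base URL is defined once, a typo in the path can only occur in one place, and future changes to the Lucene Javadoc layout need a single edit.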
Parent: 2a2a0941af
Commit: 36b2663e98
@@ -1,6 +1,8 @@
 [[analysis]]
 = Text analysis
 
+:lucene-analysis-docs: https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis
+
 [partintro]
 --
@@ -8,8 +8,8 @@ Strips all characters after an apostrophe, including the apostrophe itself.
 
 This filter is included in {es}'s built-in <<turkish-analyzer,Turkish language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
-which was built for the Turkish language.
+{lucene-analysis-docs}/tr/ApostropheFilter.html[ApostropheFilter], which was
+built for the Turkish language.
 
 [[analysis-apostrophe-tokenfilter-analyze-ex]]
@@ -9,7 +9,7 @@ Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if
 one exists. For example, the filter changes `à` to `a`.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
+{lucene-analysis-docs}/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
 
 [[analysis-asciifolding-tokenfilter-analyze-ex]]
 ==== Example
@@ -9,7 +9,7 @@ Japanese, and Korean) tokens.
 
 This filter is included in {es}'s built-in <<cjk-analyzer,CJK language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html[CJKBigramFilter].
+{lucene-analysis-docs}/cjk/CJKBigramFilter.html[CJKBigramFilter].
 
 [[analysis-cjk-bigram-tokenfilter-analyze-ex]]
@@ -14,7 +14,7 @@ characters
 
 This filter is included in {es}'s built-in <<cjk-analyzer,CJK language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html[CJKWidthFilter].
+{lucene-analysis-docs}/cjk/CJKWidthFilter.html[CJKWidthFilter].
 
 NOTE: This token filter can be viewed as a subset of NFKC/NFKD Unicode
 normalization. See the
@@ -9,7 +9,7 @@ Performs optional post-processing of terms generated by the
 
 This filter removes the english possessive (`'s`) from the end of words and
 removes dots from acronyms. It uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html[ClassicFilter].
+{lucene-analysis-docs}/standard/ClassicFilter.html[ClassicFilter].
 
 [[analysis-classic-tokenfilter-analyze-ex]]
 ==== Example
@@ -16,7 +16,7 @@ You can use the `common_grams` filter in place of the
 completely ignore common words.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html[CommonGramsFilter].
+{lucene-analysis-docs}/commongrams/CommonGramsFilter.html[CommonGramsFilter].
 
 [[analysis-common-grams-analyze-ex]]
 ==== Example
@@ -8,7 +8,7 @@ Applies a set of token filters to tokens that match conditions in a provided
 predicate script.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html[ConditionalTokenFilter].
+{lucene-analysis-docs}/miscellaneous/ConditionalTokenFilter.html[ConditionalTokenFilter].
 
 [[analysis-condition-analyze-ex]]
 ==== Example
@@ -8,7 +8,7 @@ Converts all digits in the Unicode `Decimal_Number` General Category to `0-9`.
 For example, the filter changes the Bengali numeral `৩` to `3`.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysiscore/DecimalDigitFilter.html[DecimalDigitFilter].
+{lucene-analysis-docs}/core/DecimalDigitFilter.html[DecimalDigitFilter].
 
 [[analysis-decimal-digit-tokenfilter-analyze-ex]]
 ==== Example
@@ -18,7 +18,7 @@ split `the|1 quick|2 fox|3` into the tokens `the`, `quick`, and `fox`
 with respective payloads of `1`, `2`, and `3`.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/payloads/DelimitedPayloadTokenFilter.html[DelimitedPayloadTokenFilter].
+{lucene-analysis-docs}/payloads/DelimitedPayloadTokenFilter.html[DelimitedPayloadTokenFilter].
 
 [NOTE]
 .Payloads
@@ -17,7 +17,7 @@ Uses a specified list of words and a brute force approach to find subwords in
 compound words. If found, these subwords are included in the token output.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html[DictionaryCompoundWordTokenFilter],
+{lucene-analysis-docs}/compound/DictionaryCompoundWordTokenFilter.html[DictionaryCompoundWordTokenFilter],
 which was built for Germanic languages.
 
 [[analysis-dict-decomp-tokenfilter-analyze-ex]]
@@ -13,7 +13,7 @@ For example, you can use the `edge_ngram` token filter to change `quick` to
 When not customized, the filter creates 1-character edge n-grams by default.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html[EdgeNGramTokenFilter].
+{lucene-analysis-docs}/ngram/EdgeNGramTokenFilter.html[EdgeNGramTokenFilter].
 
 [NOTE]
 ====
@@ -22,7 +22,7 @@ Customized versions of this filter are included in several of {es}'s built-in
 * <<italian-analyzer, Italian analyzer>>
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html[ElisionFilter].
+{lucene-analysis-docs}/util/ElisionFilter.html[ElisionFilter].
 
 [[analysis-elision-tokenfilter-analyze-ex]]
 ==== Example
@@ -22,7 +22,7 @@ https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth#fingerprint[OpenRefine
 project].
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene//analysis/miscellaneous/FingerprintFilter.html[FingerprintFilter].
+{lucene-analysis-docs}/miscellaneous/FingerprintFilter.html[FingerprintFilter].
 
 [[analysis-fingerprint-tokenfilter-analyze-ex]]
 ==== Example
@@ -9,7 +9,7 @@ words. These subwords are then checked against the specified word list. Subwords
 in the list are excluded from the token output.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilter.html[HyphenationCompoundWordTokenFilter],
+{lucene-analysis-docs}/compound/HyphenationCompoundWordTokenFilter.html[HyphenationCompoundWordTokenFilter],
 which was built for Germanic languages.
 
 [[analysis-hyp-decomp-tokenfilter-analyze-ex]]
@@ -26,7 +26,7 @@ type.
 ====
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/TypeTokenFilter.html[TypeTokenFilter].
+{lucene-analysis-docs}/core/TypeTokenFilter.html[TypeTokenFilter].
 
 [[analysis-keep-types-tokenfilter-analyze-include-ex]]
 ==== Include example
@@ -7,7 +7,7 @@
 Keeps only tokens contained in a specified word list.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeepWordFilter.html[KeepWordFilter].
+{lucene-analysis-docs}/miscellaneous/KeepWordFilter.html[KeepWordFilter].
 
 [NOTE]
 ====
@@ -9,7 +9,7 @@ For example, you can use the `length` filter to exclude tokens shorter than 2
 characters and tokens longer than 5 characters.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html[LengthFilter].
+{lucene-analysis-docs}/miscellaneous/LengthFilter.html[LengthFilter].
 
 [TIP]
 ====
@@ -12,7 +12,7 @@ example, the filter can change the token stream `[ one, two, three ]` to
 `[ one ]`.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html[LimitTokenCountFilter].
+{lucene-analysis-docs}/miscellaneous/LimitTokenCountFilter.html[LimitTokenCountFilter].
 
 [TIP]
 ====
@@ -104,13 +104,17 @@ PUT lowercase_example
 (Optional, string)
 Language-specific lowercase token filter to use. Valid values include:
 
-`greek`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/el/GreekLowerCaseFilter.html[GreekLowerCaseFilter]
+`greek`::: Uses Lucene's
+{lucene-analysis-docs}/el/GreekLowerCaseFilter.html[GreekLowerCaseFilter]
 
-`irish`::: Uses Lucene's http://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
+`irish`::: Uses Lucene's
+http://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
 
-`turkish`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/TurkishLowerCaseFilter.html[TurkishLowerCaseFilter]
+`turkish`::: Uses Lucene's
+{lucene-analysis-docs}/tr/TurkishLowerCaseFilter.html[TurkishLowerCaseFilter]
 
-If not specified, defaults to Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html[LowerCaseFilter].
+If not specified, defaults to Lucene's
+{lucene-analysis-docs}/core/LowerCaseFilter.html[LowerCaseFilter].
 --
 
 [[analysis-lowercase-tokenfilter-customize]]
@@ -11,7 +11,7 @@ For example, you can use the `ngram` token filter to change `fox` to
 `[ f, fo, o, ox, x ]`.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html[NGramTokenFilter].
+{lucene-analysis-docs}/ngram/NGramTokenFilter.html[NGramTokenFilter].
 
 [NOTE]
 ====
@@ -12,7 +12,7 @@ such as finding words that end in `-ion` or searching file names by their
 extension.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html[ReverseStringFilter].
+{lucene-analysis-docs}/reverse/ReverseStringFilter.html[ReverseStringFilter].
 
 [[analysis-reverse-tokenfilter-analyze-ex]]
 ==== Example
@@ -11,7 +11,7 @@ For example, you can use the `truncate` filter to shorten all tokens to
 `3` characters or fewer, changing `jumping fox` to `jum fox`.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html[TruncateTokenFilter].
+{lucene-analysis-docs}/miscellaneous/TruncateTokenFilter.html[TruncateTokenFilter].
 
 [[analysis-truncate-tokenfilter-analyze-ex]]
 ==== Example
@@ -8,7 +8,7 @@ Changes token text to uppercase. For example, you can use the `uppercase` filter
 to change `the Lazy DoG` to `THE LAZY DOG`.
 
 This filter uses Lucene's
-https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html[UpperCaseFilter].
+{lucene-analysis-docs}/core/UpperCaseFilter.html[UpperCaseFilter].
 
 [WARNING]
 ====