[DOCS] Reformat ASCII folding token filter docs (#48143)
parent 8163e0a9e5
commit 06dc1fbd96
@@ -8,7 +8,7 @@ Strips all characters after an apostrophe, including the apostrophe itself.
 
 This filter is included in {es}'s built-in <<turkish-analyzer,Turkish language
 analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
 which was built for the Turkish language.
 
 
@@ -1,10 +1,83 @@
 [[analysis-asciifolding-tokenfilter]]
-=== ASCII Folding Token Filter
+=== ASCII folding token filter
+++++
+<titleabbrev>ASCII folding</titleabbrev>
+++++
 
-A token filter of type `asciifolding` that converts alphabetic, numeric,
-and symbolic Unicode characters which are not in the first 127 ASCII
-characters (the "Basic Latin" Unicode block) into their ASCII
-equivalents, if one exists. Example:
+Converts alphabetic, numeric, and symbolic characters that are not in the Basic
+Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if
+one exists. For example, the filter changes `à` to `a`.
+
+This filter uses Lucene's
+https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
+
+[[analysis-asciifolding-tokenfilter-analyze-ex]]
+==== Example
+
+The following <<indices-analyze,analyze API>> request uses the `asciifolding`
+filter to drop the diacritical marks in `açaí à la carte`:
+
+[source,console]
+--------------------------------------------------
+GET /_analyze
+{
+"tokenizer" : "standard",
+"filter" : ["asciifolding"],
+"text" : "açaí à la carte"
+}
+--------------------------------------------------
+
+The filter produces the following tokens:
+
+[source,text]
+--------------------------------------------------
+[ acai, a, la, carte ]
+--------------------------------------------------
+
+/////////////////////
+[source,console-result]
+--------------------------------------------------
+{
+"tokens" : [
+{
+"token" : "acai",
+"start_offset" : 0,
+"end_offset" : 4,
+"type" : "<ALPHANUM>",
+"position" : 0
+},
+{
+"token" : "a",
+"start_offset" : 5,
+"end_offset" : 6,
+"type" : "<ALPHANUM>",
+"position" : 1
+},
+{
+"token" : "la",
+"start_offset" : 7,
+"end_offset" : 9,
+"type" : "<ALPHANUM>",
+"position" : 2
+},
+{
+"token" : "carte",
+"start_offset" : 10,
+"end_offset" : 15,
+"type" : "<ALPHANUM>",
+"position" : 3
+}
+]
+}
+--------------------------------------------------
+/////////////////////
+
+[[analysis-asciifolding-tokenfilter-analyzer-ex]]
+==== Add to an analyzer
+
+The following <<indices-create-index,create index API>> request uses the
+`asciifolding` filter to configure a new
+<<analysis-custom-analyzer,custom analyzer>>.
 
 [source,console]
 --------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
 "settings" : {
 "analysis" : {
 "analyzer" : {
-"default" : {
+"standard_asciifolding" : {
 "tokenizer" : "standard",
 "filter" : ["asciifolding"]
 }
@@ -23,9 +96,23 @@ PUT /asciifold_example
 }
 --------------------------------------------------
 
-Accepts `preserve_original` setting which defaults to false but if true
-will keep the original token as well as emit the folded token. For
-example:
+[[analysis-asciifolding-tokenfilter-configure-parms]]
+==== Configurable parameters
+
+`preserve_original`::
+(Optional, boolean)
+If `true`, emit both original tokens and folded tokens.
+Defaults to `false`.
+
+[[analysis-asciifolding-tokenfilter-customize]]
+==== Customize
+
+To customize the `asciifolding` filter, duplicate it to create the basis
+for a new custom token filter. You can modify the filter using its configurable
+parameters.
+
+For example, the following request creates a custom `asciifolding` filter with
+`preserve_original` set to true:
 
 [source,console]
 --------------------------------------------------
@@ -34,7 +121,7 @@ PUT /asciifold_example
 "settings" : {
 "analysis" : {
 "analyzer" : {
-"default" : {
+"standard_asciifolding" : {
 "tokenizer" : "standard",
 "filter" : ["my_ascii_folding"]
 }
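
Reviewer note: the effect of `preserve_original` described in the new docs can be illustrated with a rough Python sketch. This is not Lucene's ASCIIFoldingFilter, which uses its own mapping table. The sketch only approximates folding by Unicode decomposition, so characters that have no decomposition (for example `ø`) pass through unchanged. The function names `fold_to_ascii` and `asciifolding` are hypothetical, and both the order of the emitted tokens and the choice to skip the original when folding changes nothing are assumptions of this sketch.

```python
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Approximate folding: decompose characters, then drop combining marks.

    Characters without a Unicode decomposition are left as they are.
    """
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def asciifolding(tokens, preserve_original=False):
    """Hypothetical stand-in for the filter, applied to pre-split tokens."""
    out = []
    for token in tokens:
        folded = fold_to_ascii(token)
        out.append(folded)
        # Assumption of this sketch: the original is kept only when
        # folding actually changed the token.
        if preserve_original and folded != token:
            out.append(token)
    return out


tokens = "açaí à la carte".split()
print(asciifolding(tokens))
print(asciifolding(tokens, preserve_original=True))
```

With `preserve_original=False` the sketch yields the same four tokens as the documented example (`acai`, `a`, `la`, `carte`); with `preserve_original=True` it additionally keeps `açaí` and `à`.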