[DOCS] Reformat lowercase token filter docs (#49935)

James Rodewig 2019-12-12 09:39:06 -05:00
parent 521933aa11
commit 1186a5dc09
1 changed file with 122 additions and 12 deletions


[[analysis-lowercase-tokenfilter]]
=== Lowercase token filter
++++
<titleabbrev>Lowercase</titleabbrev>
++++

Changes token text to lowercase. For example, you can use the `lowercase` filter
to change `THE Lazy DoG` to `the lazy dog`.

In addition to a default filter, the `lowercase` token filter provides access to
Lucene's language-specific lowercase filters for Greek, Irish, and Turkish.

[[analysis-lowercase-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the default
`lowercase` filter to change `THE Quick FoX JUMPs` to lowercase:

[source,console]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["lowercase"],
  "text" : "THE Quick FoX JUMPs"
}
--------------------------------------------------

The filter produces the following tokens:

[source,text]
--------------------------------------------------
[ the, quick, fox, jumps ]
--------------------------------------------------
/////////////////////
[source,console-result]
--------------------------------------------------
{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "quick",
      "start_offset" : 4,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "fox",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "jumps",
      "start_offset" : 14,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}
--------------------------------------------------
/////////////////////

[[analysis-lowercase-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the
`lowercase` filter to configure a new
<<analysis-custom-analyzer,custom analyzer>>.

[source,console]
--------------------------------------------------
PUT lowercase_example
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "whitespace_lowercase" : {
          "tokenizer" : "whitespace",
          "filter" : ["lowercase"]
        }
      }
    }
  }
}
--------------------------------------------------
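
To try out the new analyzer (a quick check sketched here, not part of the
original example), you can reference it by name in the
<<indices-analyze,analyze API>>:

[source,console]
--------------------------------------------------
GET lowercase_example/_analyze
{
  "analyzer" : "whitespace_lowercase",
  "text" : "THE Quick FoX JUMPs"
}
--------------------------------------------------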

[[analysis-lowercase-tokenfilter-configure-parms]]
==== Configurable parameters

`language`::
+
--
(Optional, string)
Language-specific lowercase token filter to use. Valid values include:

`greek`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/el/GreekLowerCaseFilter.html[GreekLowerCaseFilter]
`irish`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
`turkish`::: Uses Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/TurkishLowerCaseFilter.html[TurkishLowerCaseFilter]

If not specified, defaults to Lucene's https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html[LowerCaseFilter].
--
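
The `language` parameter can also be set on a filter defined inline in an
<<indices-analyze,analyze API>> request. The following sketch (an
illustration, not part of the original page) uses the Turkish filter, which
lowercases the dotless capital `I` to `ı` rather than `i`:

[source,console]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : [
    {
      "type" : "lowercase",
      "language" : "turkish"
    }
  ],
  "text" : "ISPARTA"
}
--------------------------------------------------

With the Turkish filter, the request above produces the token `ısparta`; the
default filter would return `isparta` instead.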

[[analysis-lowercase-tokenfilter-customize]]
==== Customize

To customize the `lowercase` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

For example, the following request creates a custom `lowercase` filter for the
Greek language:

[source,console]
--------------------------------------------------
PUT custom_lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_lowercase_example": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "greek_lowercase_example": {
          "type": "custom",
          "tokenizer": "standard",