[DOCS] Reformat `mapping` charfilter (#57818) (#57885)

Changes:

* Adds title abbreviation
* Adds Lucene link to description
* Adds standard headings
* Simplifies analyze example
* Simplifies analyzer example and adds contextual text
James Rodewig 2020-06-09 12:43:01 -04:00 committed by GitHub
parent c17121428e
commit 24a50eb3af
1 changed file with 105 additions and 127 deletions

[[analysis-mapping-charfilter]]
=== Mapping character filter
++++
<titleabbrev>Mapping</titleabbrev>
++++
The `mapping` character filter accepts a map of keys and values. Whenever it
encounters a string of characters that is the same as a key, it replaces them
with the value associated with that key.
Matching is greedy; the longest pattern matching at a given point wins.
Replacements are allowed to be the empty string.
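This greedy, longest-match-first behavior can be sketched in Python (an illustrative emulation, not Lucene's `MappingCharFilter` implementation; the function name is hypothetical):

```python
def mapping_char_filter(text, mappings):
    """Apply key => value replacements; the longest key matching at a
    given position wins, and values may be the empty string."""
    # Try longer keys first so the longest match wins.
    keys = sorted(mappings, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])  # may be ""
                i += len(key)
                break
        else:
            out.append(text[i])  # no key matches here; keep the character
            i += 1
    return "".join(out)

print(mapping_char_filter("My license plate is ٢٥٠١٥",
                          {"٠": "0", "١": "1", "٢": "2", "٥": "5"}))
# My license plate is 25015
```

Because `:)` sorts ahead of `:`, a map containing both keys replaces the full emoticon rather than the lone colon.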
The `mapping` filter uses Lucene's
{lucene-analysis-docs}/charfilter/MappingCharFilter.html[MappingCharFilter].

[[analysis-mapping-charfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the `mapping` filter
to convert Hindu-Arabic numerals (٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin
equivalents (0123456789), changing the text `My license plate is ٢٥٠١٥` to
`My license plate is 25015`.
[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "٠ => 0",
        "١ => 1",
        "٢ => 2",
        "٣ => 3",
        "٤ => 4",
        "٥ => 5",
        "٦ => 6",
        "٧ => 7",
        "٨ => 8",
        "٩ => 9"
      ]
    }
  ],
  "text": "My license plate is ٢٥٠١٥"
}
----
The filter produces the following text:

[source,text]
----
[ My license plate is 25015 ]
----

////
[source,console-result]
----
{
  "tokens": [
    {
      "token": "My license plate is 25015",
      "start_offset": 0,
      "end_offset": 25,
      "type": "word",
      "position": 0
    }
  ]
}
----
////
[[analysis-mapping-charfilter-configure-parms]]
==== Configurable parameters

`mappings`::
(Required*, array of strings)
Array of mappings, with each element having the form `key => value`.
+
Either this or the `mappings_path` parameter must be specified.

`mappings_path`::
(Required*, string)
Path to a file containing `key => value` mappings.
+
This path must be absolute or relative to the `config` location, and the file
must be UTF-8 encoded. Each mapping in the file must be separated by a line
break.
+
Either this or the `mappings` parameter must be specified.
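For illustration only, a mappings file equivalent to the inline numerals example might look like this (the `analysis/example_mappings.txt` path and filename are hypothetical):

[source,text]
----
٠ => 0
١ => 1
٢ => 2
٣ => 3
٤ => 4
٥ => 5
٦ => 6
٧ => 7
٨ => 8
٩ => 9
----

The filter could then be configured with `"mappings_path": "analysis/example_mappings.txt"` in place of the inline `mappings` array.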
[[analysis-mapping-charfilter-customize]]
==== Customize and add to an analyzer
To customize the `mappings` filter, duplicate it to create the basis for a new
custom character filter. You can modify the filter using its configurable
parameters.
The following <<indices-create-index,create index API>> request
configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
`mappings` filter, `my_mappings_char_filter`.
The `my_mappings_char_filter` filter replaces the `:)` and `:(` emoticons
with a text equivalent.
[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mappings_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      }
    }
  }
}
----
The following <<indices-analyze,analyze API>> request uses the custom
`my_mappings_char_filter` to replace `:(` with `_sad_` in
the text `I'm delighted about it :(`.
[source,console]
----
GET /my_index/_analyze
{
"analyzer": "my_analyzer",
"tokenizer": "keyword",
"char_filter": [ "my_mappings_char_filter" ],
"text": "I'm delighted about it :("
}
----
// TEST[continued]
The filter produces the following text:
[source,text]
----
[ I'm delighted about it _sad_ ]
----
////
[source,console-result]
----
{
"tokens": [
{
"token": "I'm delighted about it _sad_",
"start_offset": 0,
"end_offset": 25,
"type": "word",
"position": 0
}
]
}
----
////
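One detail worth noting in the response above: `end_offset` is `25` even though the returned token is 28 characters long. Offsets reported by the analyze API refer to positions in the original text, before character filtering. A quick Python check of the lengths:

```python
original = "I'm delighted about it :("
token = "I'm delighted about it _sad_"

# end_offset in the response points into the ORIGINAL text:
print(len(original))  # 25 -> matches "end_offset": 25
print(len(token))     # 28 -> the replacement made the token longer
```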