[DOCS] Use keyword tokenizer in word delimiter graph examples (#53384)
In a tip admonition, we recommend using the `keyword` tokenizer with the `word_delimiter_graph` token filter. However, we only use the `whitespace` tokenizer in the example snippets. This updates those snippets to use the `keyword` tokenizer instead. Also corrects several spacing issues for arrays in these docs.
commit a9dd7773d2
parent 7189c57b6c
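Background for the change: the `keyword` tokenizer emits the entire input as a single token, so `word_delimiter_graph` alone decides where splits happen, while the `whitespace` tokenizer pre-splits on spaces before the filter ever runs. A quick way to see this (a minimal sketch, runnable against any development cluster):

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
}
----

This returns the string as one unbroken token; adding `"filter": [ "word_delimiter_graph" ]` then yields the eight tokens shown in the console-result hunk below.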
@@ -40,16 +40,16 @@ hyphens, we recommend using the
 ==== Example
 
 The following <<indices-analyze,analyze API>> request uses the
-`word_delimiter_graph` filter to split `Neil's Super-Duper-XL500--42+AutoCoder`
+`word_delimiter_graph` filter to split `Neil's-Super-Duper-XL500--42+AutoCoder`
 into normalized tokens using the filter's default rules:
 
 [source,console]
 ----
 GET /_analyze
 {
-  "tokenizer": "whitespace",
+  "tokenizer": "keyword",
   "filter": [ "word_delimiter_graph" ],
-  "text": "Neil's Super-Duper-XL500--42+AutoCoder"
+  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
 }
 ----
 
@@ -64,62 +64,62 @@ The filter produces the following tokens:
 [source,console-result]
 ----
 {
-  "tokens" : [
+  "tokens": [
     {
-      "token" : "Neil",
-      "start_offset" : 0,
-      "end_offset" : 4,
-      "type" : "word",
-      "position" : 0
+      "token": "Neil",
+      "start_offset": 0,
+      "end_offset": 4,
+      "type": "word",
+      "position": 0
     },
     {
-      "token" : "Super",
-      "start_offset" : 7,
-      "end_offset" : 12,
-      "type" : "word",
-      "position" : 1
+      "token": "Super",
+      "start_offset": 7,
+      "end_offset": 12,
+      "type": "word",
+      "position": 1
     },
     {
-      "token" : "Duper",
-      "start_offset" : 13,
-      "end_offset" : 18,
-      "type" : "word",
-      "position" : 2
+      "token": "Duper",
+      "start_offset": 13,
+      "end_offset": 18,
+      "type": "word",
+      "position": 2
     },
     {
-      "token" : "XL",
-      "start_offset" : 19,
-      "end_offset" : 21,
-      "type" : "word",
-      "position" : 3
+      "token": "XL",
+      "start_offset": 19,
+      "end_offset": 21,
+      "type": "word",
+      "position": 3
     },
     {
-      "token" : "500",
-      "start_offset" : 21,
-      "end_offset" : 24,
-      "type" : "word",
-      "position" : 4
+      "token": "500",
+      "start_offset": 21,
+      "end_offset": 24,
+      "type": "word",
+      "position": 4
     },
     {
-      "token" : "42",
-      "start_offset" : 26,
-      "end_offset" : 28,
-      "type" : "word",
-      "position" : 5
+      "token": "42",
+      "start_offset": 26,
+      "end_offset": 28,
+      "type": "word",
+      "position": 5
     },
     {
-      "token" : "Auto",
-      "start_offset" : 29,
-      "end_offset" : 33,
-      "type" : "word",
-      "position" : 6
+      "token": "Auto",
+      "start_offset": 29,
+      "end_offset": 33,
+      "type": "word",
+      "position": 6
     },
     {
-      "token" : "Coder",
-      "start_offset" : 33,
-      "end_offset" : 38,
-      "type" : "word",
-      "position" : 7
+      "token": "Coder",
+      "start_offset": 33,
+      "end_offset": 38,
+      "type": "word",
+      "position": 7
     }
   ]
 }
@@ -141,7 +141,7 @@ PUT /my_index
     "analysis": {
       "analyzer": {
         "my_analyzer": {
-          "tokenizer": "whitespace",
+          "tokenizer": "keyword",
           "filter": [ "word_delimiter_graph" ]
         }
       }
@@ -189,7 +189,7 @@ could produce tokens with illegal offsets.
 (Optional, boolean)
 If `true`, the filter produces catenated tokens for chains of alphanumeric
 characters separated by non-alphabetic delimiters. For example:
-`super-duper-xl-500` -> [**`superduperxl500`**, `super`, `duper`, `xl`, `500` ].
+`super-duper-xl-500` -> [ **`superduperxl500`**, `super`, `duper`, `xl`, `500` ].
 Defaults to `false`.
 
 [WARNING]
@@ -215,7 +215,7 @@ you plan to use these queries.
 (Optional, boolean)
 If `true`, the filter produces catenated tokens for chains of numeric characters
 separated by non-alphabetic delimiters. For example: `01-02-03` ->
-[**`010203`**, `01`, `02`, `03` ]. Defaults to `false`.
+[ **`010203`**, `01`, `02`, `03` ]. Defaults to `false`.
 
 [WARNING]
 ====
@@ -240,7 +240,7 @@ you plan to use these queries.
 (Optional, boolean)
 If `true`, the filter produces catenated tokens for chains of alphabetical
 characters separated by non-alphabetic delimiters. For example: `super-duper-xl`
--> [**`superduperxl`**, `super`, `duper`, `xl`]. Defaults to `false`.
+-> [ **`superduperxl`**, `super`, `duper`, `xl` ]. Defaults to `false`.
 
 [WARNING]
 ====
@@ -277,8 +277,8 @@ Defaults to `true`.
 (Optional, boolean)
 If `true`, the filter includes the original version of any split tokens in the
 output. This original version includes non-alphanumeric delimiters. For example:
-`super-duper-xl-500` -> [**`super-duper-xl-500`**, `super`, `duper`, `xl`, `500`
-]. Defaults to `false`.
+`super-duper-xl-500` -> [ **`super-duper-xl-500`**, `super`, `duper`, `xl`,
+`500` ]. Defaults to `false`.
 
 [WARNING]
 ====
@@ -309,7 +309,7 @@ break.
 `split_on_case_change`::
 (Optional, boolean)
 If `true`, the filter splits tokens at letter case transitions. For example:
-`camelCase` -> [ `camel`, `Case`]. Defaults to `true`.
+`camelCase` -> [ `camel`, `Case` ]. Defaults to `true`.
 
 `split_on_numerics`::
 (Optional, boolean)
@@ -319,7 +319,7 @@ If `true`, the filter splits tokens at letter-number transitions. For example:
 `stem_english_possessive`::
 (Optional, boolean)
 If `true`, the filter removes the English possessive (`'s`) from the end of each
-token. For example: `O'Neil's` -> `[ `O`, `Neil` ]. Defaults to `true`.
+token. For example: `O'Neil's` -> [ `O`, `Neil` ]. Defaults to `true`.
 
 `type_table`::
 +
@@ -332,7 +332,7 @@ those characters.
 For example, the following array maps the plus (`+`) and hyphen (`-`) characters
 as alphanumeric, which means they won't be treated as delimiters:
 
-`["+ => ALPHA", "- => ALPHA"]`
+`[ "+ => ALPHA", "- => ALPHA" ]`
 
 Supported types include:
 
@@ -408,7 +408,7 @@ PUT /my_index
     "analysis": {
       "analyzer": {
         "my_analyzer": {
-          "tokenizer": "whitespace",
+          "tokenizer": "keyword",
           "filter": [ "my_custom_word_delimiter_graph_filter" ]
         }
       },
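As a usage sketch of the updated snippet (assuming `my_index` was created with the `my_analyzer` definition from the hunk above):

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
}
----

The exact tokens returned depend on the parameters of `my_custom_word_delimiter_graph_filter`, which are defined elsewhere in the snippet.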