[DOCS] Use keyword tokenizer in word delimiter graph examples (#53384)

In a tip admonition, we recommend using the `keyword` tokenizer with the
`word_delimiter_graph` token filter. However, we only use the
`whitespace` tokenizer in the example snippets. This updates those
snippets to use the `keyword` tokenizer instead.

Also corrects several spacing issues for arrays in these docs.
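For reference, the pairing the tip recommends (the `keyword` tokenizer feeding `word_delimiter_graph`) can be tried directly with the analyze API; this simply restates the updated snippet below. With the `keyword` tokenizer the whole input reaches the filter as a single token, so the output reflects only the filter's own rules:

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [ "word_delimiter_graph" ],
  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
}
----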
James Rodewig 2020-03-11 04:45:26 -04:00
parent 7189c57b6c
commit a9dd7773d2
1 changed file with 54 additions and 54 deletions


@@ -40,16 +40,16 @@ hyphens, we recommend using the
 ==== Example

 The following <<indices-analyze,analyze API>> request uses the
-`word_delimiter_graph` filter to split `Neil's Super-Duper-XL500--42+AutoCoder`
+`word_delimiter_graph` filter to split `Neil's-Super-Duper-XL500--42+AutoCoder`
 into normalized tokens using the filter's default rules:

 [source,console]
 ----
 GET /_analyze
 {
-  "tokenizer": "whitespace",
+  "tokenizer": "keyword",
   "filter": [ "word_delimiter_graph" ],
-  "text": "Neil's Super-Duper-XL500--42+AutoCoder"
+  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
 }
 ----
@@ -64,62 +64,62 @@ The filter produces the following tokens:

 [source,console-result]
 ----
 {
-  "tokens" : [
+  "tokens": [
     {
-      "token" : "Neil",
-      "start_offset" : 0,
-      "end_offset" : 4,
-      "type" : "word",
-      "position" : 0
+      "token": "Neil",
+      "start_offset": 0,
+      "end_offset": 4,
+      "type": "word",
+      "position": 0
     },
     {
-      "token" : "Super",
-      "start_offset" : 7,
-      "end_offset" : 12,
-      "type" : "word",
-      "position" : 1
+      "token": "Super",
+      "start_offset": 7,
+      "end_offset": 12,
+      "type": "word",
+      "position": 1
     },
     {
-      "token" : "Duper",
-      "start_offset" : 13,
-      "end_offset" : 18,
-      "type" : "word",
-      "position" : 2
+      "token": "Duper",
+      "start_offset": 13,
+      "end_offset": 18,
+      "type": "word",
+      "position": 2
     },
     {
-      "token" : "XL",
-      "start_offset" : 19,
-      "end_offset" : 21,
-      "type" : "word",
-      "position" : 3
+      "token": "XL",
+      "start_offset": 19,
+      "end_offset": 21,
+      "type": "word",
+      "position": 3
     },
     {
-      "token" : "500",
-      "start_offset" : 21,
-      "end_offset" : 24,
-      "type" : "word",
-      "position" : 4
+      "token": "500",
+      "start_offset": 21,
+      "end_offset": 24,
+      "type": "word",
+      "position": 4
     },
     {
-      "token" : "42",
-      "start_offset" : 26,
-      "end_offset" : 28,
-      "type" : "word",
-      "position" : 5
+      "token": "42",
+      "start_offset": 26,
+      "end_offset": 28,
+      "type": "word",
+      "position": 5
     },
     {
-      "token" : "Auto",
-      "start_offset" : 29,
-      "end_offset" : 33,
-      "type" : "word",
-      "position" : 6
+      "token": "Auto",
+      "start_offset": 29,
+      "end_offset": 33,
+      "type": "word",
+      "position": 6
     },
     {
-      "token" : "Coder",
-      "start_offset" : 33,
-      "end_offset" : 38,
-      "type" : "word",
-      "position" : 7
+      "token": "Coder",
+      "start_offset": 33,
+      "end_offset": 38,
+      "type": "word",
+      "position": 7
     }
   ]
 }
@@ -141,7 +141,7 @@ PUT /my_index
     "analysis": {
       "analyzer": {
         "my_analyzer": {
-          "tokenizer": "whitespace",
+          "tokenizer": "keyword",
           "filter": [ "word_delimiter_graph" ]
         }
       }
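Once an index exists with that analyzer, the index-scoped analyze API can exercise it. A minimal sketch, reusing the `my_index` and `my_analyzer` names from the snippet above:

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Neil's-Super-Duper-XL500--42+AutoCoder"
}
----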
@@ -189,7 +189,7 @@ could produce tokens with illegal offsets.
 (Optional, boolean)
 If `true`, the filter produces catenated tokens for chains of alphanumeric
 characters separated by non-alphabetic delimiters. For example:
-`super-duper-xl-500` -> [**`superduperxl500`**, `super`, `duper`, `xl`, `500` ].
+`super-duper-xl-500` -> [ **`superduperxl500`**, `super`, `duper`, `xl`, `500` ].
 Defaults to `false`.

 [WARNING]
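As a sketch of how `catenate_all` is set, the analyze API accepts an inline filter definition; this request is illustrative and not part of the commit, but it should reproduce the token list described above:

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    { "type": "word_delimiter_graph", "catenate_all": true }
  ],
  "text": "super-duper-xl-500"
}
----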
@@ -215,7 +215,7 @@ you plan to use these queries.
 (Optional, boolean)
 If `true`, the filter produces catenated tokens for chains of numeric characters
 separated by non-alphabetic delimiters. For example: `01-02-03` ->
-[**`010203`**, `01`, `02`, `03` ]. Defaults to `false`.
+[ **`010203`**, `01`, `02`, `03` ]. Defaults to `false`.

 [WARNING]
 ====
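The same inline-definition pattern applies to `catenate_numbers`; a minimal, illustrative request using the `01-02-03` example from the text:

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    { "type": "word_delimiter_graph", "catenate_numbers": true }
  ],
  "text": "01-02-03"
}
----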
@@ -240,7 +240,7 @@ you plan to use these queries.
 (Optional, boolean)
 If `true`, the filter produces catenated tokens for chains of alphabetical
 characters separated by non-alphabetic delimiters. For example: `super-duper-xl`
--> [**`superduperxl`**, `super`, `duper`, `xl`]. Defaults to `false`.
+-> [ **`superduperxl`**, `super`, `duper`, `xl` ]. Defaults to `false`.

 [WARNING]
 ====
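And likewise for `catenate_words`, again as an illustrative sketch rather than part of this change:

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    { "type": "word_delimiter_graph", "catenate_words": true }
  ],
  "text": "super-duper-xl"
}
----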
@@ -277,8 +277,8 @@ Defaults to `true`.
 (Optional, boolean)
 If `true`, the filter includes the original version of any split tokens in the
 output. This original version includes non-alphanumeric delimiters. For example:
-`super-duper-xl-500` -> [**`super-duper-xl-500`**, `super`, `duper`, `xl`, `500`
-]. Defaults to `false`.
+`super-duper-xl-500` -> [ **`super-duper-xl-500`**, `super`, `duper`, `xl`,
+`500` ]. Defaults to `false`.

 [WARNING]
 ====
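`preserve_original` can be exercised the same way; this illustrative sketch should emit the original `super-duper-xl-500` token alongside the split tokens, as described above:

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    { "type": "word_delimiter_graph", "preserve_original": true }
  ],
  "text": "super-duper-xl-500"
}
----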
@@ -309,7 +309,7 @@ break.
 `split_on_case_change`::
 (Optional, boolean)
 If `true`, the filter splits tokens at letter case transitions. For example:
-`camelCase` -> [ `camel`, `Case`]. Defaults to `true`.
+`camelCase` -> [ `camel`, `Case` ]. Defaults to `true`.

 `split_on_numerics`::
 (Optional, boolean)
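To see `split_on_case_change` in isolation, it can be disabled in an inline definition; assuming no other delimiters in the input, `camelCase` should then come through as a single token (an illustrative sketch, not part of the commit):

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    { "type": "word_delimiter_graph", "split_on_case_change": false }
  ],
  "text": "camelCase"
}
----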
@@ -319,7 +319,7 @@ If `true`, the filter splits tokens at letter-number transitions. For example:
 `stem_english_possessive`::
 (Optional, boolean)
 If `true`, the filter removes the English possessive (`'s`) from the end of each
-token. For example: `O'Neil's` -> `[ `O`, `Neil` ]. Defaults to `true`.
+token. For example: `O'Neil's` -> [ `O`, `Neil` ]. Defaults to `true`.

 `type_table`::
 +
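A quick, illustrative way to check the `stem_english_possessive` behavior described above, with the option stated explicitly at its default:

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    { "type": "word_delimiter_graph", "stem_english_possessive": true }
  ],
  "text": "O'Neil's"
}
----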
@@ -332,7 +332,7 @@ those characters.
 For example, the following array maps the plus (`+`) and hyphen (`-`) characters
 as alphanumeric, which means they won't be treated as delimiters:

-`["+ => ALPHA", "- => ALPHA"]`
+`[ "+ => ALPHA", "- => ALPHA" ]`

 Supported types include:
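A sketch of that `type_table` in use, passed inline to the analyze API so that `+` and `-` are typed as alphanumeric and therefore not used as split points (illustrative only, not part of the commit):

[source,console]
----
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "type_table": [ "+ => ALPHA", "- => ALPHA" ]
    }
  ],
  "text": "XL500--42+AutoCoder"
}
----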
@@ -408,7 +408,7 @@ PUT /my_index
     "analysis": {
       "analyzer": {
         "my_analyzer": {
-          "tokenizer": "whitespace",
+          "tokenizer": "keyword",
           "filter": [ "my_custom_word_delimiter_graph_filter" ]
         }
       },