[DOCS] Reformat unique token filter docs (#50748)

* Updates the description
* Adds analyze, custom analyzer, and custom filter snippets
* Adds parameter documentation
James Rodewig 2020-01-28 10:33:45 -05:00
parent 550254ec7f
commit 70e4ae3381
1 changed files with 144 additions and 4 deletions


@@ -4,7 +4,147 @@
<titleabbrev>Unique</titleabbrev>
++++
The `unique` token filter can be used to only index unique tokens during
analysis. By default it is applied on all the token stream. If
`only_on_same_position` is set to `true`, it will only remove duplicate
tokens on the same position.
Removes duplicate tokens from a stream. For example, you can use the `unique`
filter to change `the lazy lazy dog` to `the lazy dog`.
If the `only_on_same_position` parameter is set to `true`, the `unique` filter
removes only duplicate tokens _in the same position_.
[NOTE]
====
When `only_on_same_position` is `true`, the `unique` filter works the same as
the <<analysis-remove-duplicates-tokenfilter,`remove_duplicates`>> filter.
====
[[analysis-unique-tokenfilter-analyze-ex]]
==== Example
The following <<indices-analyze,analyze API>> request uses the `unique` filter
to remove duplicate tokens from `the quick fox jumps the lazy fox`:
[source,console]
--------------------------------------------------
GET _analyze
{
"tokenizer" : "whitespace",
"filter" : ["unique"],
"text" : "the quick fox jumps the lazy fox"
}
--------------------------------------------------
The filter removes duplicated tokens for `the` and `fox`, producing the
following output:
[source,text]
--------------------------------------------------
[ the, quick, fox, jumps, lazy ]
--------------------------------------------------
/////////////////////
[source,console-result]
--------------------------------------------------
{
"tokens" : [
{
"token" : "the",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "quick",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 1
},
{
"token" : "fox",
"start_offset" : 10,
"end_offset" : 13,
"type" : "word",
"position" : 2
},
{
"token" : "jumps",
"start_offset" : 14,
"end_offset" : 19,
"type" : "word",
"position" : 3
},
{
"token" : "lazy",
"start_offset" : 24,
"end_offset" : 28,
"type" : "word",
"position" : 4
}
]
}
--------------------------------------------------
/////////////////////
[[analysis-unique-tokenfilter-analyzer-ex]]
==== Add to an analyzer
The following <<indices-create-index,create index API>> request uses the
`unique` filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.
[source,console]
--------------------------------------------------
PUT custom_unique_example
{
"settings" : {
"analysis" : {
"analyzer" : {
"standard_unique" : {
"tokenizer" : "standard",
"filter" : ["unique"]
}
}
}
}
}
--------------------------------------------------
[[analysis-unique-tokenfilter-configure-parms]]
==== Configurable parameters
`only_on_same_position`::
(Optional, boolean)
If `true`, only remove duplicate tokens in the same position.
Defaults to `false`.
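As a rough illustration of this parameter, the following
<<indices-analyze,analyze API>> request defines the `unique` filter inline with
`only_on_same_position` set to `true`. This is a sketch; the sample text is an
assumption chosen for illustration. Because the `whitespace` tokenizer assigns
each token its own position, the request should return all seven tokens,
including the repeated `the` and `fox`.
[source,console]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : [
    {
      "type" : "unique",
      "only_on_same_position" : true
    }
  ],
  "text" : "the quick fox jumps the lazy fox"
}
--------------------------------------------------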
[[analysis-unique-tokenfilter-customize]]
==== Customize
To customize the `unique` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.
For example, the following request creates a custom `unique` filter with
`only_on_same_position` set to `true`.
[source,console]
--------------------------------------------------
PUT letter_unique_pos_example
{
"settings": {
"analysis": {
"analyzer": {
"letter_unique_pos": {
"tokenizer": "letter",
"filter": [ "unique_pos" ]
}
},
"filter": {
"unique_pos": {
"type": "unique",
"only_on_same_position": true
}
}
}
}
}
--------------------------------------------------
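To try the custom filter, you could send an <<indices-analyze,analyze API>>
request to the new index and name its analyzer. The sample text below is an
assumption chosen for illustration and is not part of the original example.
[source,console]
--------------------------------------------------
GET letter_unique_pos_example/_analyze
{
  "analyzer" : "letter_unique_pos",
  "text" : "the quick fox jumps the lazy fox"
}
--------------------------------------------------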