[DOCS] Reformat condition token filter (#48775)

James Rodewig 2019-11-11 08:49:01 -05:00
parent eb0d8f3383
commit dd92830801
1 changed file with 120 additions and 60 deletions

@@ -1,88 +1,148 @@
[[analysis-condition-tokenfilter]] [[analysis-condition-tokenfilter]]
=== Conditional Token Filter === Conditional token filter
++++
<titleabbrev>Conditional</titleabbrev>
++++
The conditional token filter takes a predicate script and a list of subfilters, and Applies a set of token filters to tokens that match conditions in a provided
only applies the subfilters to the current token if it matches the predicate. predicate script.
[float] This filter uses Lucene's
=== Options https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html[ConditionalTokenFilter].
[horizontal]
filter:: a chain of token filters to apply to the current token if the predicate
matches. These can be any token filters defined elsewhere in the index mappings.
script:: a predicate script that determines whether or not the filters will be applied [[analysis-condition-analyze-ex]]
to the current token. Note that only inline scripts are supported ==== Example
[float] The following <<indices-analyze,analyze API>> request uses the `condition`
=== Settings example filter to match tokens with fewer than 5 characters in `THE QUICK BROWN FOX`.
It then applies the <<analysis-lowercase-tokenfilter,`lowercase`>> filter to
You can set it up like: those matching tokens, converting them to lowercase.
[source,console] [source,console]
-------------------------------------------------- --------------------------------------------------
PUT /condition_example GET /_analyze
{ {
"settings" : { "tokenizer": "standard",
"analysis" : { "filter": [
"analyzer" : { {
"my_analyzer" : { "type": "condition",
"tokenizer" : "standard", "filter": [ "lowercase" ],
"filter" : [ "my_condition" ] "script": {
} "source": "token.getTerm().length() < 5"
}, }
"filter" : {
"my_condition" : {
"type" : "condition",
"filter" : [ "lowercase" ],
"script" : {
"source" : "token.getTerm().length() < 5" <1>
}
}
}
}
} }
],
"text": "THE QUICK BROWN FOX"
} }
-------------------------------------------------- --------------------------------------------------
<1> This will only apply the lowercase filter to terms that are less than 5 The filter produces the following tokens:
characters in length
And test it like: [source,text]
[source,console]
-------------------------------------------------- --------------------------------------------------
POST /condition_example/_analyze [ the, QUICK, BROWN, fox ]
{
"analyzer" : "my_analyzer",
"text" : "What Flapdoodle"
}
-------------------------------------------------- --------------------------------------------------
// TEST[continued]
And it'd respond:
/////////////////////
[source,console-result] [source,console-result]
-------------------------------------------------- --------------------------------------------------
{ {
"tokens": [ "tokens" : [
{ {
"token": "what", <1> "token" : "the",
"start_offset": 0, "start_offset" : 0,
"end_offset": 4, "end_offset" : 3,
"type": "<ALPHANUM>", "type" : "<ALPHANUM>",
"position": 0 "position" : 0
}, },
{ {
"token": "Flapdoodle", <2> "token" : "QUICK",
"start_offset": 5, "start_offset" : 4,
"end_offset": 15, "end_offset" : 9,
"type": "<ALPHANUM>", "type" : "<ALPHANUM>",
"position": 1 "position" : 1
},
{
"token" : "BROWN",
"start_offset" : 10,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "fox",
"start_offset" : 16,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 3
} }
] ]
} }
-------------------------------------------------- --------------------------------------------------
/////////////////////
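As a rough illustration of the behavior in this example, the conditional logic can be sketched outside Elasticsearch in plain Python. This is not how Lucene implements `ConditionalTokenFilter`. The `condition_filter` helper is hypothetical, and `str.split` stands in for the `standard` tokenizer, which is an assumption that only holds for simple input like this.

```python
def condition_filter(tokens, predicate, subfilters):
    """Apply each subfilter, in order, only to tokens that match the predicate."""
    result = []
    for token in tokens:
        if predicate(token):
            for subfilter in subfilters:
                token = subfilter(token)
        result.append(token)
    return result


# str.split() is a stand-in for the `standard` tokenizer (assumption).
tokens = "THE QUICK BROWN FOX".split()

# Mirrors the predicate script `token.getTerm().length() < 5`
# combined with the `lowercase` subfilter.
filtered = condition_filter(tokens, lambda term: len(term) < 5, [str.lower])
print(filtered)
```

Tokens that fail the predicate pass through unchanged, which is why `QUICK` and `BROWN` keep their original case.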
<1> The term `What` has been lowercased, because it is only 4 characters long [[analysis-condition-tokenfilter-configure-parms]]
<2> The term `Flapdoodle` has been left in its original case, because it doesn't pass ==== Configurable parameters
the predicate
`filter`::
+
--
(Required, array of token filters)
Array of token filters. If a token matches the predicate script in the `script`
parameter, these filters are applied to the token in the order provided.
These filters can include custom token filters defined in the index settings.
--
`script`::
+
--
(Required, <<modules-scripting-using,script object>>)
Predicate script used to apply token filters. If a token
matches this script, the filters in the `filter` parameter are applied to the
token.
For valid parameters, see <<_script_parameters>>. Only inline scripts are
supported. Painless scripts are executed in the
{painless}/painless-analysis-predicate-context.html[analysis predicate context]
and require a `token` property.
--
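Because the `filter` array is applied in the order provided, reordering it can change the result. The following Python sketch shows the idea with two stand-in functions. Both `truncate3` and `reverse` are hypothetical helpers written for this illustration, not Elasticsearch APIs.

```python
from functools import reduce


def apply_in_order(term, subfilters):
    """Feed the term through each subfilter, left to right."""
    return reduce(lambda value, subfilter: subfilter(value), subfilters, term)


def truncate3(term):
    # Hypothetical stand-in for a filter that keeps the first 3 characters.
    return term[:3]


def reverse(term):
    # Hypothetical stand-in for a filter that reverses the term.
    return term[::-1]


truncate_then_reverse = apply_in_order("QUICK", [truncate3, reverse])
reverse_then_truncate = apply_in_order("QUICK", [reverse, truncate3])
print(truncate_then_reverse, reverse_then_truncate)
```

The same two subfilters yield different terms depending on their order in the list.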
[[analysis-condition-tokenfilter-customize]]
==== Customize and add to an analyzer
To customize the `condition` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.
For example, the following <<indices-create-index,create index API>> request
uses a custom `condition` filter to configure a new
<<analysis-custom-analyzer,custom analyzer>>. The custom `condition` filter
matches the first token in a stream. It then reverses that matching token using
the <<analysis-reverse-tokenfilter,`reverse`>> filter.
[source,console]
--------------------------------------------------
PUT /palindrome_list
{
"settings": {
"analysis": {
"analyzer": {
"whitespace_reverse_first_token": {
"tokenizer": "whitespace",
"filter": [ "reverse_first_token" ]
}
},
"filter": {
"reverse_first_token": {
"type": "condition",
"filter": [ "reverse" ],
"script": {
"source": "token.getPosition() === 0"
}
}
}
}
}
}
--------------------------------------------------
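To make the position-based predicate concrete, here is a minimal Python sketch of what the custom analyzer is described as doing: split on whitespace, then reverse only the token at position 0. The `Token` class, the helper functions, and the sample text are assumptions for illustration and are not part of Elasticsearch.

```python
from dataclasses import dataclass


@dataclass
class Token:
    term: str
    position: int


def whitespace_tokenize(text):
    """Stand-in for the `whitespace` tokenizer: positions start at 0."""
    return [Token(term, position) for position, term in enumerate(text.split())]


def reverse_first_token(tokens):
    """Mirror the predicate `token.getPosition() === 0` with a `reverse` subfilter."""
    return [
        Token(token.term[::-1], token.position) if token.position == 0 else token
        for token in tokens
    ]


analyzed = reverse_first_token(whitespace_tokenize("drawer kayak"))
terms = [token.term for token in analyzed]
print(terms)
```

Only the first token is reversed. Later tokens keep their original text because their position is not 0.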