[DOCS] Reformat `predicate_token_filter` tokenfilter (#57705) (#59714)

James Rodewig 2020-07-16 13:35:09 -04:00 committed by GitHub
parent f2b530dbdb
commit da85a40e7e
1 changed file with 112 additions and 63 deletions


@@ -4,76 +4,125 @@
<titleabbrev>Predicate script</titleabbrev>
++++
Removes tokens that don't match a provided predicate script. The filter supports
inline {painless}/index.html[Painless] scripts only. Scripts are evaluated in
the {painless}/painless-analysis-predicate-context.html[analysis predicate
context].

[[analysis-predicatefilter-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the
`predicate_token_filter` filter to only output tokens longer than three
characters from `the fox jumps the lazy dog`.

[source,console]
----
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": """
        token.term.length() > 3
        """
      }
    }
  ],
  "text": "the fox jumps the lazy dog"
}
----

The filter produces the following tokens.

[source,text]
----
[ jumps, lazy ]
----

The API response contains the position and offsets of each output token. Note
the `predicate_token_filter` filter does not change the tokens' original
positions or offsets.

.*Response*
[%collapsible]
====
[source,console-result]
----
{
  "tokens" : [
    {
      "token" : "jumps",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "lazy",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "word",
      "position" : 4
    }
  ]
}
----
====

[[analysis-predicatefilter-tokenfilter-configure-parms]]
==== Configurable parameters

`script`::
(Required, <<modules-scripting-using,script object>>)
Script containing a condition used to filter incoming tokens. Only tokens that
match this script are included in the output.
+
This parameter supports inline {painless}/index.html[Painless] scripts only. The
script is evaluated in the
{painless}/painless-analysis-predicate-context.html[analysis predicate context].
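
The analysis predicate context also exposes other token properties a script can
test, such as `token.position` and `token.type`. As a minimal sketch (the
sample text and predicate here are illustrative, not part of the filter's
reference examples), the following request keeps only tokens at even positions:

[source,console]
----
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": "token.position % 2 == 0"
      }
    }
  ],
  "text": "the fox jumps the lazy dog"
}
----

With whitespace tokenization, this outputs `the`, `jumps`, and `lazy`: the
tokens at positions `0`, `2`, and `4` of the incoming stream.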

[[analysis-predicatefilter-tokenfilter-customize]]
==== Customize and add to an analyzer

To customize the `predicate_token_filter` filter, duplicate it to create the
basis for a new custom token filter. You can modify the filter using its
configurable parameters.

The following <<indices-create-index,create index API>> request
configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
`predicate_token_filter` filter, `my_script_filter`.

The `my_script_filter` filter removes tokens of any type other than
`ALPHANUM`.

[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_script_filter"
          ]
        }
      },
      "filter": {
        "my_script_filter": {
          "type": "predicate_token_filter",
          "script": {
            "source": """
            token.type.contains("ALPHANUM")
            """
          }
        }
      }
    }
  }
}
----
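
To verify the analyzer, you can submit an <<indices-analyze,analyze API>>
request to the new index. This is a sketch using arbitrary sample text:

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "wifi costs 42 dollars"
}
----
// TEST[continued]

The `standard` tokenizer assigns `42` the type `<NUM>`, so `my_script_filter`
removes it; the `<ALPHANUM>` tokens `wifi`, `costs`, and `dollars` remain.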