<titleabbrev>Predicate script</titleabbrev>
++++

Removes tokens that don't match a provided predicate script. The filter supports
inline {painless}/index.html[Painless] scripts only. Scripts are evaluated in
the {painless}/painless-analysis-predicate-context.html[analysis predicate
context].

[[analysis-predicatefilter-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the
`predicate_token_filter` filter to only output tokens longer than three
characters from `the fox jumps the lazy dog`.

[source,console]
----
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": """
          token.term.length() > 3
        """
      }
    }
  ],
  "text": "the fox jumps the lazy dog"
}
----

The filter produces the following tokens.

[source,text]
----
[ jumps, lazy ]
----
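The behavior above can be sketched outside Elasticsearch. The following is a minimal plain-Python model (not Elasticsearch or Painless code; the helper names are invented for illustration) of a whitespace tokenizer feeding a predicate filter: tokens failing the predicate are dropped, and the survivors keep the position and offset attributes the tokenizer assigned them.

```python
# Sketch of predicate token filtering: drop tokens that fail a predicate,
# leaving the surviving tokens' positions and offsets untouched.

def whitespace_tokenize(text):
    """Emit (token, start_offset, end_offset, position) dicts,
    roughly like the `whitespace` tokenizer."""
    tokens = []
    position = 0
    offset = 0
    for word in text.split(" "):
        if word:
            tokens.append({
                "token": word,
                "start_offset": offset,
                "end_offset": offset + len(word),
                "position": position,
            })
            position += 1
        offset += len(word) + 1  # +1 for the consumed space
    return tokens


def predicate_token_filter(tokens, predicate):
    # Keep only matching tokens; attributes are not rewritten.
    return [t for t in tokens if predicate(t)]


tokens = whitespace_tokenize("the fox jumps the lazy dog")
out = predicate_token_filter(tokens, lambda t: len(t["token"]) > 3)
print([t["token"] for t in out])  # ['jumps', 'lazy']
```

Note that `jumps` keeps position `2` and offsets `8`/`13` even though the two tokens before it were removed, matching the API response below.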

The API response contains the position and offsets of each output token. Note
the `predicate_token_filter` filter does not change the tokens' original
positions or offsets.

.*Response*
[%collapsible]
====
[source,console-result]
----
{
  "tokens" : [
    {
      "token" : "jumps",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "lazy",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "word",
      "position" : 4
    }
  ]
}
----
====

[[analysis-predicatefilter-tokenfilter-configure-parms]]
==== Configurable parameters

`script`::
(Required, <<modules-scripting-using,script object>>)
Script containing a condition used to filter incoming tokens. Only tokens that
match this script are included in the output.
+
This parameter supports inline {painless}/index.html[Painless] scripts only. The
script is evaluated in the
{painless}/painless-analysis-predicate-context.html[analysis predicate context].

[[analysis-predicatefilter-tokenfilter-customize]]
==== Customize and add to an analyzer

To customize the `predicate_token_filter` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

The following <<indices-create-index,create index API>> request
configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
`predicate_token_filter` filter, `my_script_filter`.

The `my_script_filter` filter removes tokens of any type other than
`ALPHANUM`.

[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_script_filter"
          ]
        }
      },
      "filter": {
        "my_script_filter": {
          "type": "predicate_token_filter",
          "script": {
            "source": """
              token.type.contains("ALPHANUM")
            """
          }
        }
      }
    }
  }
}
----
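
To try out the custom analyzer, you can follow the create index request with an
<<indices-analyze,analyze API>> call, as an earlier revision of this example
did. The request below is a sketch that assumes the `my_index` request above
has already run; because the `standard` tokenizer types a token like `123` as
`<NUM>` rather than `<ALPHANUM>`, the filter should drop it from the output.

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "What Flapdoodle 123"
}
----
// TEST[continued]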