<titleabbrev>Predicate script</titleabbrev>
++++

Removes tokens that don't match a provided predicate script. The filter supports
inline {painless}/index.html[Painless] scripts only. Scripts are evaluated in
the {painless}/painless-analysis-predicate-context.html[analysis predicate
context].

[[analysis-predicatefilter-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the
`predicate_token_filter` filter to only output tokens longer than three
characters from `the fox jumps the lazy dog`.

[source,console]
----
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": """
          token.term.length() > 3
        """
      }
    }
  ],
  "text": "the fox jumps the lazy dog"
}
----

The filter produces the following tokens.

[source,text]
----
[ jumps, lazy ]
----
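The behavior above can be sketched outside Elasticsearch. The following is a minimal plain-Python model (not Elasticsearch or Painless code; the helper names are invented for illustration) of a whitespace tokenizer feeding a predicate filter: tokens failing the predicate are dropped, and the survivors keep the position and offset attributes the tokenizer assigned them.

```python
# Sketch of predicate token filtering: drop tokens that fail a predicate,
# leaving the surviving tokens' positions and offsets untouched.

def whitespace_tokenize(text):
    """Emit (token, start_offset, end_offset, position) dicts,
    roughly like the `whitespace` tokenizer."""
    tokens = []
    position = 0
    offset = 0
    for word in text.split(" "):
        if word:
            tokens.append({
                "token": word,
                "start_offset": offset,
                "end_offset": offset + len(word),
                "position": position,
            })
            position += 1
        offset += len(word) + 1  # +1 for the consumed space
    return tokens


def predicate_token_filter(tokens, predicate):
    # Keep only matching tokens; attributes are not rewritten.
    return [t for t in tokens if predicate(t)]


tokens = whitespace_tokenize("the fox jumps the lazy dog")
out = predicate_token_filter(tokens, lambda t: len(t["token"]) > 3)
print([t["token"] for t in out])  # ['jumps', 'lazy']
```

Note that `jumps` keeps position `2` and offsets `8`/`13` even though the two tokens before it were removed, matching the API response below.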

The API response contains the position and offsets of each output token. Note
the `predicate_token_filter` filter does not change the tokens' original
positions or offsets.

.*Response*
[%collapsible]
====
[source,console-result]
----
{
  "tokens" : [
    {
      "token" : "jumps",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "lazy",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "word",
      "position" : 4
    }
  ]
}
----
====

[[analysis-predicatefilter-tokenfilter-configure-parms]]
==== Configurable parameters

`script`::
(Required, <<modules-scripting-using,script object>>)
Script containing a condition used to filter incoming tokens. Only tokens that
match this script are included in the output.
+
This parameter supports inline {painless}/index.html[Painless] scripts only. The
script is evaluated in the
{painless}/painless-analysis-predicate-context.html[analysis predicate context].

[[analysis-predicatefilter-tokenfilter-customize]]
==== Customize and add to an analyzer

To customize the `predicate_token_filter` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

The following <<indices-create-index,create index API>> request
configures a new <<analysis-custom-analyzer,custom analyzer>> using a custom
`predicate_token_filter` filter, `my_script_filter`.

The `my_script_filter` filter removes tokens of any type other than
`ALPHANUM`.

[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_script_filter"
          ]
        }
      },
      "filter": {
        "my_script_filter": {
          "type": "predicate_token_filter",
          "script": {
            "source": """
              token.type.contains("ALPHANUM")
            """
          }
        }
      }
    }
  }
}
----
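
To try out the custom analyzer, you can follow the create index request with an
<<indices-analyze,analyze API>> call, as an earlier revision of this example
did. The request below is a sketch that assumes the `my_index` request above
has already run; because the `standard` tokenizer types a token like `123` as
`<NUM>` rather than `<ALPHANUM>`, the filter should drop it from the output.

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "What Flapdoodle 123"
}
----
// TEST[continued]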