[DOCS] Incorporated feedback on the highlighting changes.

2017-07-12 16:36:07 -07:00 · 2017-07-12 16:36:07 -07:00 · ded9f55263
parent 70b2897bdf
commit ded9f55263
1 changed files with 406 additions and 366 deletions
--- a/docs/reference/search/request/highlighting.asciidoc
+++ b/docs/reference/search/request/highlighting.asciidoc
@@ -35,20 +35,24 @@ GET /_search
 // CONSOLE
 // TEST[setup:twitter]
-{es} supports three highlighters:
+{es} supports three highlighters: `unified`, `plain`, and `fvh` (fast vector
+highlighter). You can specify the highlighter `type` you want to use
+for each field.
 [[unified-highlighter]]
-* The `unified` highlighter uses the Lucene Unified Highlighter. This
+==== Unified highlighter
+The `unified` highlighter uses the Lucene Unified Highlighter. This
 highlighter breaks the text into sentences and uses the BM25 algorithm to score
 individual sentences as if they were documents in the corpus. It also supports
 accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the
 default highlighter.
 [[plain-highlighter]]
-* The `plain` highlighter uses the standard Lucene highlighter. It attempts to
+==== Plain highlighter
+The `plain` highlighter uses the standard Lucene highlighter. It attempts to
 reflect the query matching logic in terms of understanding word importance and
 any word positioning criteria in phrase queries.
-+
+
 [WARNING]
 The `plain` highlighter works best for highlighting simple query matches in a
 single field. To accurately reflect query logic, it creates a tiny in-memory
@@ -59,20 +63,23 @@ If you want to highlight a lot of fields in a lot of documents with complex
 queries, we recommend using one of the other highlighters.
 [[fast-vector-highlighter]]
-* The `fvh` highlighter uses the Lucene Fast Vector highlighter.
+==== Fast vector highlighter
+The `fvh` highlighter uses the Lucene Fast Vector highlighter.
 This highlighter can be used on fields with `term_vector` set to
 `with_positions_offsets` in the mapping. The fast vector highlighter:
-** Is faster especially for large fields (> `1MB`)
+* Is faster especially for large fields (> `1MB`)
-** Can be customized with  a <<boundary-scanners,`boundary_scanner`>>. 
+* Can be customized with  a <<boundary-scanners,`boundary_scanner`>>. 
-** Requires setting `term_vector` to `with_positions_offsets` which
+* Requires setting `term_vector` to `with_positions_offsets` which
  increases the size of the index
-** Can combine matches from multiple fields into one result.  See
+* Can combine matches from multiple fields into one result.  See
  `matched_fields`
-** Can assign different weights to matches at different positions allowing
+* Can assign different weights to matches at different positions allowing
  for things like phrase matches being sorted above term matches when
  highlighting a Boosting Query that boosts phrase matches over term matches
 [[offsets-strategy]]
 ==== Offsets Strategy
 To create meaningful search snippets from the terms being queried,
 the highlighter needs to know the start and end character offsets of each word
 in the original text. These offsets can be obtained from:
@@ -99,9 +106,6 @@ Lucene's query execution planner to get access to low-level match information on
 the current document. This is repeated for every field and every document that
 needs highlighting. The `plain` highlighter always uses plain highlighting.
-You can specify the highlighter `type` you want to use
-for each field.
 [[highlighting-settings]]
 ==== Highlighting Settings
@@ -118,11 +122,10 @@ boundary_scanner:: Specifies how to break the highlighted fragments: `chars`,
 `sentence`, or `word`. Only valid for the `unified` and `fvh` highlighters.
 Defaults to `sentence` for the `unified` highlighter. Defaults to `chars` for
 the `fvh` highlighter.
-+
+`chars`::: Use the characters specified by `boundary_chars` as highlighting
-* `chars` Use the characters specified by `boundary_chars` as highlighting
 boundaries.  The `boundary_max_scan` setting controls how far to scan for
 boundary characters. Only valid for the `fvh` highlighter.
-* `sentence` Break highlighted fragments at the next sentence boundary, as
+`sentence`::: Break highlighted fragments at the next sentence boundary, as
 determined by Java's 
 https://docs.oracle.com/javase/8/docs/api/java/text/BreakIterator.html[BreakIterator].
 You can specify the locale to use with `boundary_scanner_locale`.
@@ -131,7 +134,7 @@ NOTE: When used with the `unified` highlighter, the `sentence` scanner splits
 sentences bigger than `fragment_size` at the first word boundary next to
 `fragment_size`. You can set `fragment_size` to 0 to never split any sentence.
-* `word` Break highlighted fragments at the next word boundary, as determined
+`word`::: Break highlighted fragments at the next word boundary, as determined
 by Java's https://docs.oracle.com/javase/8/docs/api/java/text/BreakIterator.html[BreakIterator].
 You can specify the locale to use with `boundary_scanner_locale`.
@@ -156,9 +159,9 @@ stored separately. Defaults to `false`.
 fragmenter:: Specifies how text should be broken up in highlight
 snippets: `simple` or `span`. Only valid for the `plain` highlighter.
 Defaults to `span`.
-+
+
-* `simple` Breaks up text into same-sized fragments.
+`simple`::: Breaks up text into same-sized fragments.
-* `span` Breaks up text into same-sized fragments, but tried to avoid
+`span`::: Breaks up text into same-sized fragments, but tries to avoid
 breaking up text between highlighted terms. This is helpful when you're
 querying for phrases. Default.
@@ -207,7 +210,7 @@ Defaults to 256.
 pre_tags:: Use in conjunction with `post_tags` to define the HTML tags
 to use for the highlighted text. By default, highlighted text is wrapped
-in `<em>` and </em>` tags. Specify as an array of strings.
+in `<em>` and `</em>` tags. Specify as an array of strings.
 post_tags:: Use in conjunction with `pre_tags` to define the HTML tags
 to use for the highlighted text. By default, highlighted text is wrapped
@@ -229,7 +232,6 @@ schema defines the following `pre_tags` and defines `post_tags` as
 <em class="hlt10">
 --------------------------------------------------
 [[highlighter-type]]
 type:: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to
 `unified`.
@@ -237,50 +239,120 @@ type:: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to
 [[highlighting-examples]]
 ==== Highlighting Examples
-Here is an example of setting the `comment` field in the index mapping to allow for
+* <<override-global-settings, Override global settings>>
-highlighting using the postings:
+* <<specify-highlight-query, Specify a highlight query>>
 * <<set-highlighter-type, Set highlighter type>>
 * <<configure-tags, Configure highlighting tags>>
 * <<highlight-source, Highlight source>>
 * <<highlight-all, Highlight all fields>>
 * <<matched-fields, Combine matches on multiple fields>>
 * <<explicit-field-order, Explicitly order highlighted fields>>
 * <<control-highlighted-frags, Control highlighted fragments>>
 * <<highlight-postings-list, Highlight using the postings list>>
 * <<specify-fragmenter, Specify a fragmenter for the plain highlighter>>
 [[override-global-settings]]
 [float]
 === Override global settings
 You can specify highlighter settings globally and selectively override them for
 individual fields.
 [source,js]
 --------------------------------------------------
-PUT /example
+GET /_search
 {
-  "mappings": {
+    "query" : {
-    "doc" : {
+        "match": { "user": "kimchy" }
-      "properties": {
+    },
-        "comment" : {
+    "highlight" : {
-          "type": "text",
+        "number_of_fragments" : 3,
-          "index_options" : "offsets"
+        "fragment_size" : 150,
        "fields" : {
            "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
            "blog.title" : { "number_of_fragments" : 0 },
            "blog.author" : { "number_of_fragments" : 0 },
            "blog.comment" : { "number_of_fragments" : 5, "order" : "score" }
        }
      }
    }
  }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
-Here is an example of setting the `comment` field to allow for
+[float]
-highlighting using the `term_vectors` (this will cause the index to be bigger):
+[[specify-highlight-query]]
 === Specify a highlight query
 You can specify a `highlight_query` to take additional information into account
 when highlighting. For example, the following query includes both the search
 query and rescore query in the `highlight_query`. Without the `highlight_query`,
 highlighting would only take the search query into account.
 [source,js]
 --------------------------------------------------
-PUT /example
+GET /_search
 {
-  "mappings": {
+    "stored_fields": [ "_id" ],
-    "doc" : {
+    "query" : {
-      "properties": {
+        "match": {
-        "comment" : {
+            "comment": {
-          "type": "text",
+                "query": "foo bar"
-          "term_vector" : "with_positions_offsets"
+            }
        }
    },
    "rescore": {
        "window_size": 50,
        "query": {
            "rescore_query" : {
                "match_phrase": {
                    "comment": {
                        "query": "foo bar",
                        "slop": 1
                    }
                }
            },
            "rescore_query_weight" : 10
        }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "comment" : {
                "fragment_size" : 150,
                "number_of_fragments" : 3,
                "highlight_query": {
                    "bool": {
                        "must": {
                            "match": {
                                "comment": {
                                    "query": "foo bar"
                                }
                            }
                        },
                        "should": {
                            "match_phrase": {
                                "comment": {
                                    "query": "foo bar",
                                    "slop": 1,
                                    "boost": 10.0
                                }
                            }
                        },
                        "minimum_should_match": 0
                    }
                }
            }
        }
      }
    }
  }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
-
+[float]
-===== Force highlighter type
+[[set-highlighter-type]]
 === Set highlighter type
 The `type` field allows to force a specific highlighter type.
 The allowed values are: `unified`, `plain` and `fvh`.
@@ -303,30 +375,9 @@ GET /_search
 // CONSOLE
 // TEST[setup:twitter]
-===== Force highlighting on source
+[[configure-tags]]
-
+[float]
-Forces the highlighting to highlight fields based on the source even if fields
+=== Configure highlighting tags
 are stored separately. Defaults to `false`.
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {"force_source" : true}
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 [[tags]]
 ===== Configure highlighting tags
 By default, the highlighting will wrap highlighted text in `<em>` and
 `</em>`. This can be controlled by setting `pre_tags` and `post_tags`,
@@ -393,13 +444,12 @@ GET /_search
 // CONSOLE
 // TEST[setup:twitter]
 [float]
 [[highlight-source]]
 === Highlight on source
-===== Controlling highlighted fragments
+Forces the highlighting to highlight fields based on the source even if fields
-
+are stored separately. Defaults to `false`.
 Each field highlighted can control the size of the highlighted fragment
 in characters (defaults to `100`), and the maximum number of fragments
 to return (defaults to `5`).
 For example:
 [source,js]
 --------------------------------------------------
@@ -410,7 +460,7 @@ GET /_search
    },
    "highlight" : {
        "fields" : {
-            "comment" : {"fragment_size" : 150, "number_of_fragments" : 3}
+            "comment" : {"force_source" : true}
        }
    }
 }
@@ -418,294 +468,10 @@ GET /_search
 // CONSOLE
 // TEST[setup:twitter]
 On top of this it is possible to specify that highlighted fragments need
 to be sorted by score:
-[source,js]
+[[highlight-all]]
--------------------------------------------------
+[float]
-GET /_search
+=== Highlight in all fields
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "comment" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 If the `number_of_fragments` value is set to `0` then no fragments are
 produced, instead the whole content of the field is returned, and of
 course it is highlighted. This can be very handy if short texts (like
 document title or address) need to be highlighted but no fragmentation
 is required. Note that `fragment_size` is ignored in this case.
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "_all" : {},
            "blog.title" : {"number_of_fragments" : 0}
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 When using `fvh` one can use `fragment_offset`
 parameter to control the margin to start highlighting from.
 In the case where there is no matching fragment to highlight, the default is
 to not return anything. Instead, we can return a snippet of text from the
 beginning of the field by setting `no_match_size` (default `0`) to the length
 of the text that you want returned. The actual length may be shorter or longer than
 specified as it tries to break on a word boundary.
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {
                "fragment_size" : 150,
                "number_of_fragments" : 3,
                "no_match_size": 150
            }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 ===== Specifying a fragmenter for the plain highlighter
 When using the `plain` highlighter, you can choose between the `simple` and
 `span` fragmenters:
 [source,js]
 --------------------------------------------------
 GET twitter/tweet/_search
 {
    "query" : {
        "match_phrase": { "message": "number 1" }
    },
    "highlight" : {
        "fields" : {
            "message" : {
                "type": "plain",
                "fragment_size" : 15,
                "number_of_fragments" : 3,
                "fragmenter": "simple"
            }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 Response:
 [source,js]
 --------------------------------------------------
 {
    ...
    "hits": {
        "total": 1,
        "max_score": 1.601195,
        "hits": [
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "1",
                "_score": 1.601195,
                "_source": {
                    "user": "test",
                    "message": "some message with the number 1",
                    "date": "2009-11-15T14:12:12",
                    "likes": 1
                },
                "highlight": {
                    "message": [
                        " with the <em>number</em>",
                        " <em>1</em>"
                    ]
                }
            }
        ]
    }
 }
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,/]
 [source,js]
 --------------------------------------------------
 GET twitter/tweet/_search
 {
    "query" : {
        "match_phrase": { "message": "number 1" }
    },
    "highlight" : {
        "fields" : {
            "message" : {
                "type": "plain",
                "fragment_size" : 15,
                "number_of_fragments" : 3,
                "fragmenter": "span"
            }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 Response:
 [source,js]
 --------------------------------------------------
 {
    ...
    "hits": {
        "total": 1,
        "max_score": 1.601195,
        "hits": [
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "1",
                "_score": 1.601195,
                "_source": {
                    "user": "test",
                    "message": "some message with the number 1",
                    "date": "2009-11-15T14:12:12",
                    "likes": 1
                },
                "highlight": {
                    "message": [
                        "some message with the <em>number</em> <em>1</em>"
                    ]
                }
            }
        ]
    }
 }
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,/]
 If the `number_of_fragments` option is set to `0`,
 `NullFragmenter` is used which does not fragment the text at all.
 This is useful for highlighting the entire contents of a document or field.
 ===== Specifying a highlight query
 Here is an example of including both the search
 query and the rescore query in `highlight_query`.
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "stored_fields": [ "_id" ],
    "query" : {
        "match": {
            "comment": {
                "query": "foo bar"
            }
        }
    },
    "rescore": {
        "window_size": 50,
        "query": {
            "rescore_query" : {
                "match_phrase": {
                    "comment": {
                        "query": "foo bar",
                        "slop": 1
                    }
                }
            },
            "rescore_query_weight" : 10
        }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "comment" : {
                "fragment_size" : 150,
                "number_of_fragments" : 3,
                "highlight_query": {
                    "bool": {
                        "must": {
                            "match": {
                                "comment": {
                                    "query": "foo bar"
                                }
                            }
                        },
                        "should": {
                            "match_phrase": {
                                "comment": {
                                    "query": "foo bar",
                                    "slop": 1,
                                    "boost": 10.0
                                }
                            }
                        },
                        "minimum_should_match": 0
                    }
                }
            }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 [[overriding-global-settings]]
 ===== Overriding global settings
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "number_of_fragments" : 3,
        "fragment_size" : 150,
        "fields" : {
            "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
            "blog.title" : { "number_of_fragments" : 0 },
            "blog.author" : { "number_of_fragments" : 0 },
            "blog.comment" : { "number_of_fragments" : 5, "order" : "score" }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 [[field-match]]
 ===== Highlighting in all fields
 By default, only fields that contain a query match are highlighted. Set
 `require_field_match` to `false` to highlight all fields.
@@ -729,7 +495,8 @@ GET /_search
 // TEST[setup:twitter]
 [[matched-fields]]
-===== Combining matches on multiple fields
+[float]
 === Combine matches on multiple fields
 WARNING: This is only supported by the `fvh` highlighter
@@ -865,7 +632,8 @@ to
 [[explicit-field-order]]
-===== Explicitly ordering highlighted fields
+[float]
 === Explicitly order highlighted fields
 Elasticsearch highlights the fields in the order that they are sent, but per the
 JSON spec, objects are unordered.  If you need to be explicit about the order
 in which fields are highlighted specify the `fields` as an array:
@@ -887,3 +655,275 @@ GET /_search
 None of the highlighters built into Elasticsearch care about the order that the
 fields are highlighted but a plugin might.
 [float]
 [[control-highlighted-frags]]
 === Control highlighted fragments
 Each field highlighted can control the size of the highlighted fragment
 in characters (defaults to `100`), and the maximum number of fragments
 to return (defaults to `5`).
 For example:
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 On top of this it is possible to specify that highlighted fragments need
 to be sorted by score:
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "comment" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 If the `number_of_fragments` value is set to `0` then no fragments are
 produced, instead the whole content of the field is returned, and of
 course it is highlighted. This can be very handy if short texts (like
 document title or address) need to be highlighted but no fragmentation
 is required. Note that `fragment_size` is ignored in this case.
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "_all" : {},
            "blog.title" : {"number_of_fragments" : 0}
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 When using `fvh` one can use `fragment_offset`
 parameter to control the margin to start highlighting from.
 In the case where there is no matching fragment to highlight, the default is
 to not return anything. Instead, we can return a snippet of text from the
 beginning of the field by setting `no_match_size` (default `0`) to the length
 of the text that you want returned. The actual length may be shorter or longer than
 specified as it tries to break on a word boundary.
 [source,js]
 --------------------------------------------------
 GET /_search
 {
    "query" : {
        "match": { "user": "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {
                "fragment_size" : 150,
                "number_of_fragments" : 3,
                "no_match_size": 150
            }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 [float]
 [[highlight-postings-list]]
 === Highlight using the postings list
 Here is an example of setting the `comment` field in the index mapping to
 allow for highlighting using the postings:
 [source,js]
 --------------------------------------------------
 PUT /example
 {
  "mappings": {
    "doc" : {
      "properties": {
        "comment" : {
          "type": "text",
          "index_options" : "offsets"
        }
      }
    }
  }
 }
 --------------------------------------------------
 // CONSOLE
 Here is an example of setting the `comment` field to allow for
 highlighting using the `term_vectors` (this will cause the index to be bigger):
 [source,js]
 --------------------------------------------------
 PUT /example
 {
  "mappings": {
    "doc" : {
      "properties": {
        "comment" : {
          "type": "text",
          "term_vector" : "with_positions_offsets"
        }
      }
    }
  }
 }
 --------------------------------------------------
 // CONSOLE
 [float]
 [[specify-fragmenter]]
 === Specify a fragmenter for the plain highlighter
 When using the `plain` highlighter, you can choose between the `simple` and
 `span` fragmenters:
 [source,js]
 --------------------------------------------------
 GET twitter/tweet/_search
 {
    "query" : {
        "match_phrase": { "message": "number 1" }
    },
    "highlight" : {
        "fields" : {
            "message" : {
                "type": "plain",
                "fragment_size" : 15,
                "number_of_fragments" : 3,
                "fragmenter": "simple"
            }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 Response:
 [source,js]
 --------------------------------------------------
 {
    ...
    "hits": {
        "total": 1,
        "max_score": 1.601195,
        "hits": [
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "1",
                "_score": 1.601195,
                "_source": {
                    "user": "test",
                    "message": "some message with the number 1",
                    "date": "2009-11-15T14:12:12",
                    "likes": 1
                },
                "highlight": {
                    "message": [
                        " with the <em>number</em>",
                        " <em>1</em>"
                    ]
                }
            }
        ]
    }
 }
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,/]
 [source,js]
 --------------------------------------------------
 GET twitter/tweet/_search
 {
    "query" : {
        "match_phrase": { "message": "number 1" }
    },
    "highlight" : {
        "fields" : {
            "message" : {
                "type": "plain",
                "fragment_size" : 15,
                "number_of_fragments" : 3,
                "fragmenter": "span"
            }
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
 Response:
 [source,js]
 --------------------------------------------------
 {
    ...
    "hits": {
        "total": 1,
        "max_score": 1.601195,
        "hits": [
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "1",
                "_score": 1.601195,
                "_source": {
                    "user": "test",
                    "message": "some message with the number 1",
                    "date": "2009-11-15T14:12:12",
                    "likes": 1
                },
                "highlight": {
                    "message": [
                        "some message with the <em>number</em> <em>1</em>"
                    ]
                }
            }
        ]
    }
 }
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,/]
 If the `number_of_fragments` option is set to `0`,
 `NullFragmenter` is used which does not fragment the text at all.
 This is useful for highlighting the entire contents of a document or field.
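
Note on the "Override global settings" example in this change: the sketch below illustrates the override rule the section describes, where settings placed directly under `highlight` act as defaults and each entry under `fields` can replace them. It is a minimal Python illustration, not Elasticsearch's implementation. The helper name `effective_highlight_settings` is hypothetical, and it assumes `fields` is given as an object rather than the array form used for explicit ordering.

```python
def effective_highlight_settings(highlight):
    """Merge global highlight settings into each field's own settings.

    Hypothetical helper for illustration only: keys set on a field win
    over keys set at the top level of the `highlight` object.
    """
    # Everything except "fields" is treated as a global default.
    global_settings = {k: v for k, v in highlight.items() if k != "fields"}
    return {
        field: {**global_settings, **overrides}
        for field, overrides in highlight.get("fields", {}).items()
    }


# Same settings as the request body in the "Override global settings" example.
highlight = {
    "number_of_fragments": 3,
    "fragment_size": 150,
    "fields": {
        "_all": {"pre_tags": ["<em>"], "post_tags": ["</em>"]},
        "blog.title": {"number_of_fragments": 0},
        "blog.author": {"number_of_fragments": 0},
        "blog.comment": {"number_of_fragments": 5, "order": "score"},
    },
}

effective = effective_highlight_settings(highlight)
```

Under this reading, `blog.title` keeps the global `fragment_size` of 150 but overrides `number_of_fragments` with 0, while `_all` inherits both global values and only adds its tags.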