[[search-request-highlighting]]
=== Highlighting

Allows you to highlight search results on one or more fields. The
implementation uses either the Lucene `fast-vector-highlighter` or
`highlighter`. The following is an example of the search request body:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------

In the above case, the `content` field will be highlighted for each
search hit. Each search hit will contain an additional element, called
`highlight`, which includes the highlighted fields and the highlighted
fragments.
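
As a rough illustration of that extra element, an abridged response
might look like the following. The index name, type name, document id,
score, and fragment text are hypothetical placeholders, and other
response fields are omitted:

[source,js]
--------------------------------------------------
{
    "hits" : {
        "hits" : [
            {
                "_index" : "my_index",
                "_type" : "type_name",
                "_id" : "1",
                "_score" : 0.5,
                "highlight" : {
                    "content" : [
                        "some text containing a <em>matched</em> term"
                    ]
                }
            }
        ]
    }
}
--------------------------------------------------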

Highlighting requires the actual content of the field. If the field in
question is stored (has `store` set to `yes` in the mapping), the stored
value is used; otherwise, the `_source` is loaded and the relevant field
is extracted from it.

If `term_vector` information is provided by setting `term_vector` to
`with_positions_offsets` in the mapping then the fast vector
highlighter will be used instead of the plain highlighter.  The fast vector highlighter:

* Is faster, especially for large fields (> `1MB`)
* Can be customized with `boundary_chars`, `boundary_max_scan`, and
  `fragment_offset` (see below)
* Requires setting `term_vector` to `with_positions_offsets`, which
  increases the size of the index

Here is an example of setting the `content` field to allow for
highlighting using the fast vector highlighter on it (this will cause
the index to be bigger):

[source,js]
--------------------------------------------------
{
    "type_name" : {
        "content" : {"term_vector" : "with_positions_offsets"}
    }
}
--------------------------------------------------

The field name supports wildcard notation. For example, using
`comment_*` causes all fields that match the expression to be
highlighted.
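
A minimal sketch of a request that uses a wildcard field name, assuming
the index contains fields whose names start with `comment_`:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "comment_*" : {}
        }
    }
}
--------------------------------------------------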

[[tags]]
==== Highlighting Tags

By default, the highlighting will wrap highlighted text in `<em>` and
`</em>`. This can be controlled by setting `pre_tags` and `post_tags`,
for example:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "_all" : {}
        }
    }
}
--------------------------------------------------

You can specify one or more tags; when several are given, they are
ordered by "importance". There are also built-in tag schemas; currently
the only one is `styled`, which uses the following `pre_tags`:

[source,js]
--------------------------------------------------
<em class="hlt1">, <em class="hlt2">, <em class="hlt3">,
<em class="hlt4">, <em class="hlt5">, <em class="hlt6">,
<em class="hlt7">, <em class="hlt8">, <em class="hlt9">,
<em class="hlt10">
--------------------------------------------------

The corresponding post tag is `</em>`. If you can think of other useful
built-in tag schemas, send an email to the mailing list or open an
issue. Here is an example of switching tag schemas:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "tags_schema" : "styled",
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------

An `encoder` parameter can be used to define how highlighted text will
be encoded. It can be either `default` (no encoding) or `html` (escapes
HTML in the text, which is useful if you use HTML highlighting tags).
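
For illustration, a request that asks for HTML escaping could look like
this. Setting `encoder` at the top level of `highlight` is an
assumption of this sketch:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "encoder" : "html",
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------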

==== Highlighted Fragments

Each highlighted field can control the size of the highlighted fragment
in characters (defaults to `100`) and the maximum number of fragments
to return (defaults to `5`). For example:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
}
--------------------------------------------------

In addition, highlighted fragments can be ordered by score:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "order" : "score",
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
}
--------------------------------------------------

If `number_of_fragments` is set to `0`, no fragments are produced;
instead, the whole content of the field is returned, highlighted. This
is handy when short texts (such as a document title or address) need to
be highlighted but no fragmentation is required. Note that
`fragment_size` is ignored in this case.

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "_all" : {},
            "bio.title" : {"number_of_fragments" : 0}
        }
    }
}
--------------------------------------------------

When using the `fast-vector-highlighter`, you can use the
`fragment_offset` parameter to control the margin from which
highlighting starts.
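
A sketch of how this might be written, assuming `content` is mapped
with `term_vector` set to `with_positions_offsets` and that
`fragment_offset` is given at the field level. The value `10` is an
arbitrary illustration, not a recommended setting:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {"fragment_offset" : 10}
        }
    }
}
--------------------------------------------------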

==== Highlight query

It is also possible to highlight against a query other than the search
query by setting `highlight_query`. This is especially useful if you
use a rescore query, because rescore queries are not taken into account
by highlighting by default. Elasticsearch does not validate that
`highlight_query` contains the search query in any way, so it is
possible to define it such that legitimate query results are not
highlighted at all. Generally, it is better to include the search query
in the `highlight_query`. Here is an example of including both the
search query and the rescore query in `highlight_query`:

[source,js]
--------------------------------------------------
{
    "fields": [ "_id" ],
    "query" : {
        "match": {
            "content": {
                "query": "foo bar"
            }
        }
    },
    "rescore": {
        "window_size": 50,
        "query": {
            "rescore_query" : {
                "match_phrase": {
                    "content": {
                        "query": "foo bar",
                        "phrase_slop": 1
                    }
                }
            },
            "rescore_query_weight" : 10
        }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "content" : {
                "fragment_size" : 150,
                "number_of_fragments" : 3,
                "highlight_query": {
                    "bool": {
                        "must": {
                            "match": {
                                "content": {
                                    "query": "foo bar"
                                }
                            }
                        },
                        "should": {
                            "match_phrase": {
                                "content": {
                                    "query": "foo bar",
                                    "phrase_slop": 1,
                                    "boost": 10.0
                                }
                            }
                        },
                        "minimum_should_match": 0
                    }
                }
            }
        }
    }
}
--------------------------------------------------

Note that the score of a text fragment in this case is calculated by
the Lucene highlighting framework. For implementation details, see the
`ScoreOrderFragmentsBuilder.java` class.

[[highlighting-settings]]
==== Global Settings

Highlighting settings can be set on a global level and then overridden
at the field level.

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "number_of_fragments" : 3,
        "fragment_size" : 150,
        "tags_schema" : "styled",
        "fields" : {
            "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
            "bio.title" : { "number_of_fragments" : 0 },
            "bio.author" : { "number_of_fragments" : 0 },
            "bio.content" : { "number_of_fragments" : 5, "order" : "score" }
        }
    }
}
--------------------------------------------------

[[field-match]]
==== Require Field Match

`require_field_match` can be set to `true`, which causes a field to be
highlighted only if the query matched that field. `false` means that
terms are highlighted on all requested fields, regardless of whether
the query matches specifically on them.
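
As a sketch, the request below queries only the `content` field but
asks for highlighting on two fields. With `require_field_match` set to
`true`, only `content` would be expected to carry highlights. Placing
the setting at the top level of `highlight` is an assumption of this
example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match" : {
            "content" : {
                "query" : "foo bar"
            }
        }
    },
    "highlight" : {
        "require_field_match" : true,
        "fields" : {
            "content" : {},
            "bio.title" : {}
        }
    }
}
--------------------------------------------------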

[[boundary-characters]]
==== Boundary Characters

When highlighting a field that is mapped with term vectors,
`boundary_chars` can be configured to define what constitutes a boundary
for highlighting. It is a single string containing each boundary
character. It defaults to `.,!? \t\n`.

`boundary_max_scan` controls how far to look for boundary characters,
and defaults to `20`.
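
A sketch of both settings on a single field, assuming `content` is
mapped with `term_vector` set to `with_positions_offsets`. The chosen
characters and the scan distance of `40` are arbitrary illustrations:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {
                "boundary_chars" : ".,!?",
                "boundary_max_scan" : 40
            }
        }
    }
}
--------------------------------------------------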