[[search-request-highlighting]]
=== Highlighting

Highlights search results on one or more fields. The implementation uses either the Lucene `fast-vector-highlighter` or `highlighter`. The search request body:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------

In the above case, the `content` field will be highlighted for each search hit. Each hit gains an additional element, called `highlight`, which contains the highlighted fields and the highlighted fragments.

In order to perform highlighting, the actual content of the field is required. If the field in question is stored (has `store` set to `yes` in the mapping), the stored value is used. Otherwise, the actual `_source` is loaded and the relevant field is extracted from it.

If `term_vector` information is provided by setting `term_vector` to `with_positions_offsets` in the mapping, the fast vector highlighter is used instead of the plain highlighter. The fast vector highlighter:

* Is faster, especially for large fields (> `1MB`)
* Can be customized with `boundary_chars`, `boundary_max_scan`, and `fragment_offset` (see below)
* Requires setting `term_vector` to `with_positions_offsets`, which increases the size of the index

Here is an example of setting up the `content` field so that it can be highlighted with the fast vector highlighter (this will cause the index to be bigger):

[source,js]
--------------------------------------------------
{
    "type_name" : {
        "content" : {"term_vector" : "with_positions_offsets"}
    }
}
--------------------------------------------------

The field name supports wildcard notation. For example, using `comment_*` causes all fields that match the expression to be highlighted.

[[tags]]
==== Highlighting Tags

By default, the highlighting will wrap highlighted text in `<em>` and `</em>`.
This can be controlled by setting `pre_tags` and `post_tags`, for example:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "_all" : {}
        }
    }
}
--------------------------------------------------

One or more tags can be given, and they are listed in order of "importance". There are also built-in "tag" schemas. Currently there is a single schema called `styled`, with `pre_tags` of:

[source,js]
--------------------------------------------------
<em class="hlt1">, <em class="hlt2">, <em class="hlt3">,
<em class="hlt4">, <em class="hlt5">, <em class="hlt6">,
<em class="hlt7">, <em class="hlt8">, <em class="hlt9">,
<em class="hlt10">
--------------------------------------------------

And a post tag of `</em>`. If you can think of other useful built-in tag schemas, send an email to the mailing list or open an issue. Here is an example of switching tag schemas:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "tags_schema" : "styled",
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------

An `encoder` parameter can be used to define how highlighted text will be encoded. It can be either `default` (no encoding) or `html` (escapes HTML, which is useful if you use HTML highlighting tags).

==== Highlighted Fragments

Each highlighted field can control the size of the highlighted fragment in characters (defaults to `100`), and the maximum number of fragments to return (defaults to `5`).
For example:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
}
--------------------------------------------------

On top of this, it is possible to specify that highlighted fragments are ordered by score:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "order" : "score",
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
}
--------------------------------------------------

It is also possible to highlight against a query other than the search query by setting `highlight_query`. This is especially useful if you use a rescore query, because rescore queries are not taken into account by highlighting by default. Elasticsearch does not validate that `highlight_query` contains the search query in any way, so it is possible to define it such that legitimate query results aren't highlighted at all. Generally it is better to include the search query in the `highlight_query`. Here is an example of including both the search query and the rescore query in `highlight_query`.

[source,js]
--------------------------------------------------
{
    "fields": [ "_id" ],
    "query" : {
        "match": {
            "content": {
                "query": "foo bar"
            }
        }
    },
    "rescore": {
        "window_size": 50,
        "query": {
            "rescore_query" : {
                "match_phrase": {
                    "content": {
                        "query": "foo bar",
                        "phrase_slop": 1
                    }
                }
            },
            "rescore_query_weight" : 10
        }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "content" : {
                "fragment_size" : 150,
                "number_of_fragments" : 3,
                "highlight_query": {
                    "bool": {
                        "must": {
                            "match": {
                                "content": {
                                    "query": "foo bar"
                                }
                            }
                        },
                        "should": {
                            "match_phrase": {
                                "content": {
                                    "query": "foo bar",
                                    "phrase_slop": 1,
                                    "boost": 10.0
                                }
                            }
                        },
                        "minimum_should_match": 0
                    }
                }
            }
        }
    }
}
--------------------------------------------------

Note that the score of a text fragment in this case is calculated by the Lucene highlighting framework.
For implementation details you can check the `ScoreOrderFragmentsBuilder.java` class.

If the `number_of_fragments` value is set to `0` then no fragments are produced. Instead, the whole content of the field is returned, with highlighting applied. This can be very handy if short texts (like a document title or address) need to be highlighted but no fragmentation is required. Note that `fragment_size` is ignored in this case.

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "_all" : {},
            "bio.title" : {"number_of_fragments" : 0}
        }
    }
}
--------------------------------------------------

When using `fast-vector-highlighter`, the `fragment_offset` parameter can be used to control the margin to start highlighting from.

[[highlighting-settings]]
==== Global Settings

Highlighting settings can be set on a global level and then overridden at the field level.

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "number_of_fragments" : 3,
        "fragment_size" : 150,
        "tags_schema" : "styled",
        "fields" : {
            "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
            "bio.title" : { "number_of_fragments" : 0 },
            "bio.author" : { "number_of_fragments" : 0 },
            "bio.content" : { "number_of_fragments" : 5, "order" : "score" }
        }
    }
}
--------------------------------------------------

[[field-match]]
==== Require Field Match

`require_field_match` can be set to `true`, which will cause a field to be highlighted only if a query matched that field. `false` means that terms are highlighted on all requested fields, regardless of whether the query matches specifically on them.

[[boundary-characters]]
==== Boundary Characters

When highlighting a field that is mapped with term vectors, `boundary_chars` can be configured to define what constitutes a boundary for highlighting. It is a single string containing each boundary character. It defaults to `.,!? \t\n`.
The `boundary_max_scan` setting controls how far to look for boundary characters, and defaults to `20`.
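
As a sketch of how these two settings might be combined, the request below highlights a `content` field and overrides both boundary settings on it. It assumes that `content` is mapped with `term_vector` set to `with_positions_offsets`, as in the mapping example earlier in this section, and that both settings are accepted at the field level like the other per-field options. The values `.,!?` and `40` are arbitrary illustrations, not recommendations:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {
                "boundary_chars" : ".,!?",
                "boundary_max_scan" : 40
            }
        }
    }
}
--------------------------------------------------

With these illustrative values, whitespace is no longer treated as a boundary, and the scan for one of the listed characters extends up to `40` characters instead of the default `20`.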