[[search-request-highlighting]]
=== Highlighting

Allows highlighting search results on one or more fields. The
implementation uses either the Lucene `fast-vector-highlighter` or the
plain `highlighter`. The search request body:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------

In the above case, the `content` field will be highlighted for each
search hit. Each search hit will contain an additional element, called
`highlight`, which includes the highlighted fields and the highlighted
fragments.
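
For illustration, an abbreviated search hit with highlighting might look roughly like the following. The document ID and the fragment text are hypothetical, and the other hit fields are elided with `...`:

[source,js]
--------------------------------------------------
{
    "hits" : {
        "hits" : [
            {
                "_id" : "1",
                ...
                "highlight" : {
                    "content" : ["... the <em>quick</em> brown fox ..."]
                }
            }
        ]
    }
}
--------------------------------------------------

Each field listed under `highlight` maps to an array of highlighted fragments.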

In order to perform highlighting, the actual content of the field is
required. If the field in question is stored (has `store` set to `yes`
in the mapping), the stored value is used. Otherwise, the actual
`_source` is loaded and the relevant field is extracted from it.
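
A minimal mapping sketch for a stored field follows. It assumes a hypothetical type `type_name` and a hypothetical field `title`, and it uses the `"store" : "yes"` syntax described above:

[source,js]
--------------------------------------------------
{
    "type_name" : {
        "properties" : {
            "title" : {"type" : "string", "store" : "yes"}
        }
    }
}
--------------------------------------------------

Without `store`, the `title` value would be extracted from `_source` instead.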

If `term_vector` information is provided by setting `term_vector` to
`with_positions_offsets` in the mapping, the fast vector highlighter
will be used instead of the plain highlighter. The fast vector highlighter:

* Is faster, especially for large fields (> `1MB`)
* Can be customized with `boundary_chars`, `boundary_max_scan`, and
`fragment_offset` (see below)
* Requires setting `term_vector` to `with_positions_offsets`, which
increases the size of the index

Here is an example of setting the `content` field to allow for
highlighting using the fast vector highlighter on it (this will cause
the index to be bigger):

[source,js]
--------------------------------------------------
{
    "type_name" : {
        "properties" : {
            "content" : {"type" : "string", "term_vector" : "with_positions_offsets"}
        }
    }
}
--------------------------------------------------

Since `0.20.2`, field names support wildcard notation. For example,
using `comment_*` will cause all fields that match the expression to be
highlighted.
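
A request sketch using the wildcard form follows. The pattern `comment_*` comes from the text above, and which fields it actually matches depends on your mapping:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "comment_*" : {}
        }
    }
}
--------------------------------------------------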

==== Highlighting Tags

By default, the highlighting will wrap highlighted text in `<em>` and
`</em>`. This can be controlled by setting `pre_tags` and `post_tags`,
for example:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "_all" : {}
        }
    }
}
--------------------------------------------------

There can be one or more tags, and their order determines their
"importance". There are also built-in "tag" schemas; currently there is
a single schema, called `styled`, with `pre_tags` of:

[source,js]
--------------------------------------------------
<em class="hlt1">, <em class="hlt2">, <em class="hlt3">,
<em class="hlt4">, <em class="hlt5">, <em class="hlt6">,
<em class="hlt7">, <em class="hlt8">, <em class="hlt9">,
<em class="hlt10">
--------------------------------------------------

The `styled` schema uses `</em>` as its post tag. If you think of other
useful built-in tag schemas, send an email to the mailing list or open
an issue. Here is an example of switching tag schemas:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "tags_schema" : "styled",
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------

An `encoder` parameter can be used to define how highlighted text will
be encoded. It can be either `default` (no encoding) or `html` (escapes
HTML in the highlighted text, which is useful if you use HTML
highlighting tags).
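
The following sketch combines HTML highlighting tags with the `html` encoder, so that any HTML already present in the field text is escaped. The `<b>` tags are an illustrative choice, not a default:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "encoder" : "html",
        "pre_tags" : ["<b>"],
        "post_tags" : ["</b>"],
        "fields" : {
            "content" : {}
        }
    }
}
--------------------------------------------------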

==== Highlighted Fragments

Each highlighted field can control the size of the highlighted fragment
in characters (defaults to `100`) and the maximum number of fragments
to return (defaults to `5`). For example:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
}
--------------------------------------------------

On top of this, it is possible to specify that highlighted fragments are
ordered by score:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "order" : "score",
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
}
--------------------------------------------------

Note that in this case the score of each text fragment is calculated by
the Lucene highlighting framework. For implementation details, see the
`ScoreOrderFragmentsBuilder.java` class.

If `number_of_fragments` is set to `0`, no fragments are produced.
Instead, the whole content of the field is returned, highlighted. This
can be very handy if short texts (like a document title or address)
need to be highlighted but no fragmentation is required. Note that
`fragment_size` is ignored in this case.

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "_all" : {},
            "bio.title" : {"number_of_fragments" : 0}
        }
    }
}
--------------------------------------------------

When using the `fast-vector-highlighter`, the `fragment_offset`
parameter can be used to control the margin to start highlighting from.
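
The following sketch sets `fragment_offset` on a field mapped with `with_positions_offsets` term vectors, such as `content` in the mapping example above. The value `10` is an arbitrary illustration, not a recommended setting:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {"fragment_offset" : 10}
        }
    }
}
--------------------------------------------------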

==== Global Settings

Highlighting settings can be set on a global level and then overridden
at the field level.

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "number_of_fragments" : 3,
        "fragment_size" : 150,
        "tags_schema" : "styled",
        "fields" : {
            "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
            "bio.title" : { "number_of_fragments" : 0 },
            "bio.author" : { "number_of_fragments" : 0 },
            "bio.content" : { "number_of_fragments" : 5, "order" : "score" }
        }
    }
}
--------------------------------------------------

==== Require Field Match

`require_field_match` can be set to `true`, which will cause a field to
be highlighted only if a query matched that field. `false` means that
terms are highlighted on all requested fields, regardless of whether the
query matched on them specifically.
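
The following sketch enables `require_field_match` at the global level. The `match` query on a hypothetical `title` field is illustrative only:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match" : {"title" : "elasticsearch"}
    },
    "highlight" : {
        "require_field_match" : true,
        "fields" : {
            "title" : {},
            "content" : {}
        }
    }
}
--------------------------------------------------

Here only `title` is highlighted, because the query matched only that field. With `false`, matching terms found in `content` would be highlighted as well.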

==== Boundary Characters

When highlighting a field that is mapped with term vectors,
`boundary_chars` can be configured to define what constitutes a boundary
for highlighting. It is a single string containing each boundary
character. It defaults to `.,!? \t\n`.

The `boundary_max_scan` parameter controls how far to look for boundary
characters, and defaults to `20`.
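
The following sketch sets both parameters explicitly on a term-vector-mapped field such as `content`. The values shown simply restate the defaults described above; in the JSON string, `\t` and `\n` are the tab and newline escapes:

[source,js]
--------------------------------------------------
{
    "query" : {...},
    "highlight" : {
        "fields" : {
            "content" : {
                "boundary_chars" : ".,!? \t\n",
                "boundary_max_scan" : 20
            }
        }
    }
}
--------------------------------------------------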