mirror of https://github.com/honeymoose/OpenSearch.git (synced 2025-03-24 17:09:48 +00:00)

[DOCS] Reorganized the highlighting topic so it's less confusing.

parent e165c405ac
commit b5e81132cf
@ -1,9 +1,22 @@
[[search-request-highlighting]]
=== Highlighting

Highlighters allow you to produce highlighted snippets from one or more fields
in your search results.
The following is an example of the search request body:
Highlighters enable you to get highlighted snippets from one or more fields
in your search results so you can show users where the query matches are.
When you request highlights, the response contains an additional `highlight`
element for each search hit that includes the highlighted fields and the
highlighted fragments.

Highlighting requires the actual content of a field. If the field is not
stored (the mapping does not set `store` to `true`), the actual `_source` is
loaded and the relevant field is extracted from `_source`.

NOTE: The `_all` field cannot be extracted from `_source`, so it can only
be used for highlighting if it is explicitly stored.

For example, to get highlights for the `content` field in each search hit
using the default highlighter, include a `highlight` object in
the request body that specifies the `content` field:

[source,js]
--------------------------------------------------
@ -22,63 +35,207 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

In the above case, the `comment` field will be highlighted for each
search hit (there will be another element in each search hit, called
`highlight`, which includes the highlighted fields and the highlighted
fragments).

[NOTE]
==================================
In order to perform highlighting, the actual content of the field is
required. If the field in question is stored (has `store` set to `true`
in the mapping), it will be used; otherwise, the actual `_source` will
be loaded and the relevant field will be extracted from it.

The `_all` field cannot be extracted from `_source`, so it can only
be used for highlighting if it is mapped to have `store` set to `true`.
==================================

The field name supports wildcard notation. For example, using `comment_*`
will cause all <<text,text>> and <<keyword,keyword>> fields that match the expression to be highlighted.
Note that all other fields will not be highlighted. If you use a custom mapper and want to
highlight on a field anyway, you have to provide the field name explicitly.
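
For illustration, the following request highlights every field whose name matches `comment_*`. It is a sketch that assumes the index has one or more (hypothetical) `text` fields whose names start with `comment_`; the `multi_match` query accepts the same wildcard pattern, so the query and the highlighter target the same set of fields.

[source,js]
--------------------------------------------------
GET /_search
{
    "query" : {
        "multi_match" : {
            "query" : "kimchy",
            "fields" : [ "comment_*" ]
        }
    },
    "highlight" : {
        "fields" : {
            "comment_*" : {}
        }
    }
}
--------------------------------------------------

Because `require_field_match` defaults to `true`, only the `comment_*` fields that actually contain a query match return highlighted fragments.
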
{es} supports three highlighters:

[[unified-highlighter]]
==== Unified Highlighter
* The `unified` highlighter uses the Lucene Unified Highlighter. This
highlighter breaks the text into sentences and uses the BM25 algorithm to score
individual sentences as if they were documents in the corpus. It also supports
accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the
default highlighter.

The unified highlighter (which is used by default if no highlighter type is specified)
uses the Lucene Unified Highlighter.
This highlighter breaks the text into sentences and scores individual sentences as
if they were documents in this corpus, using the BM25 algorithm.
It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting.
[[plain-highlighter]]
* The `plain` highlighter uses the standard Lucene highlighter. It attempts to
reflect the query matching logic in terms of understanding word importance and
any word positioning criteria in phrase queries.
+
[WARNING]
The `plain` highlighter works best for highlighting simple query matches in a
single field. To accurately reflect query logic, it creates a tiny in-memory
index and re-runs the original query criteria through Lucene's query execution
planner to get access to low-level match information for the current document.
This is repeated for every field and every document that needs to be highlighted.
If you want to highlight a lot of fields in a lot of documents with complex
queries, we recommend using one of the other highlighters.

===== Offsets Strategy
[[fast-vector-highlighter]]
* The `fvh` highlighter uses the Lucene Fast Vector highlighter.
This highlighter can be used on fields with `term_vector` set to
`with_positions_offsets` in the mapping. The fast vector highlighter:

In order to create meaningful search snippets from the terms being queried,
a highlighter needs to know the start and end character offsets of each word
in the original text.
These offsets can be obtained from:
** Is faster, especially for large fields (> `1MB`)
** Can be customized with a <<boundary-scanners,`boundary_scanner`>>.
** Requires setting `term_vector` to `with_positions_offsets`, which
increases the size of the index
** Can combine matches from multiple fields into one result. See
`matched_fields`
** Can assign different weights to matches at different positions, allowing
for things like phrase matches being sorted above term matches when
highlighting a Boosting Query that boosts phrase matches over term matches

* The postings list (fields mapped as `"index_options": "offsets"`).
* Term vectors (fields mapped as `"term_vector": "with_positions_offsets"`).
* The original field, by reanalyzing the text on-the-fly.
To create meaningful search snippets from the terms being queried,
the highlighter needs to know the start and end character offsets of each word
in the original text. These offsets can be obtained from:

====== Plain highlighting
* The postings list. If `index_options` is set to `offsets` in the mapping,
the `unified` highlighter uses this information to highlight documents without
re-analyzing the text. It re-runs the original query directly on the postings
and extracts the matching offsets from the index, limiting the collection to
the highlighted documents. This is important if you have large fields because
it doesn't require reanalyzing the text to be highlighted. It also requires less
disk space than using term vectors.

This mode is picked when there is no other alternative.
* Term vectors. If `term_vector` information is provided by setting
`term_vector` to `with_positions_offsets` in the mapping, the `unified`
highlighter automatically uses the `term_vector` to highlight the field.
Term vector highlighting is faster for highlighting multi-term queries like
`prefix` or `wildcard` because it can access the dictionary of terms for
each document, but it can be slower than using the postings list. The `fvh`
highlighter always uses term vectors.

* Plain highlighting. This mode is used when there is no other alternative.
It creates a tiny in-memory index and re-runs the original query criteria through
Lucene's query execution planner to get access to low-level match information on the current document.
This is repeated for every field and every document that needs highlighting.
Lucene's query execution planner to get access to low-level match information on
the current document. This is repeated for every field and every document that
needs highlighting. The `plain` highlighter always uses plain highlighting.

====== Postings
You can specify the highlighter `type` you want to use
for each field.

If `index_options` is set to `offsets` in the mapping, the `unified` highlighter
will use this information to highlight documents without re-analyzing the text.
It re-runs the original query directly on the postings and extracts the matching offsets
directly from the index, limiting the collection to the highlighted documents.
This mode is faster on large fields since it doesn't require reanalyzing the text to be highlighted,
and it requires less disk space than the term vectors needed for fast vector
highlighting.
[[highlighting-settings]]
==== Highlighting Settings

Highlighting settings can be set on a global level and overridden at
the field level.

boundary_chars:: A string that contains each boundary character.
Defaults to `.,!? \t\n`.

boundary_max_scan:: How far to scan for boundary characters. Defaults to `20`.

[[boundary-scanners]]
boundary_scanner:: Specifies how to break the highlighted fragments: `chars`,
`sentence`, or `word`. Only valid for the `unified` and `fvh` highlighters.
Defaults to `sentence` for the `unified` highlighter. Defaults to `chars` for
the `fvh` highlighter.
+
* `chars` Use the characters specified by `boundary_chars` as highlighting
boundaries. The `boundary_max_scan` setting controls how far to scan for
boundary characters. Only valid for the `fvh` highlighter.
* `sentence` Break highlighted fragments at the next sentence boundary, as
determined by Java's
https://docs.oracle.com/javase/8/docs/api/java/text/BreakIterator.html[BreakIterator].
You can specify the locale to use with `boundary_scanner_locale`.
+
NOTE: When used with the `unified` highlighter, the `sentence` scanner splits
sentences bigger than `fragment_size` at the first word boundary next to
`fragment_size`. You can set `fragment_size` to 0 to never split any sentence.

* `word` Break highlighted fragments at the next word boundary, as determined
by Java's https://docs.oracle.com/javase/8/docs/api/java/text/BreakIterator.html[BreakIterator].
You can specify the locale to use with `boundary_scanner_locale`.

boundary_scanner_locale:: Controls which locale is used to search for sentence
and word boundaries.

encoder:: Indicates if the highlighted text should be HTML encoded:
`default` (no encoding) or `html` (escapes HTML in the highlighted text; use
this if you use HTML highlighting tags).

fields:: Specifies the fields to retrieve highlights for. You can use wildcards
to specify fields. For example, you could specify `comment_*` to
get highlights for all <<text,text>> and <<keyword,keyword>> fields
that start with `comment_`.
+
NOTE: Only text and keyword fields are highlighted when you use wildcards.
If you use a custom mapper and want to highlight on a field anyway, you
must explicitly specify that field name.

force_source:: Highlight based on the source even if the field is
stored separately. Defaults to `false`.

fragmenter:: Specifies how text should be broken up in highlight
snippets: `simple` or `span`. Only valid for the `plain` highlighter.
Defaults to `span`.
+
* `simple` Breaks up text into same-sized fragments.
* `span` Breaks up text into same-sized fragments, but tries to avoid
breaking up text between highlighted terms. This is helpful when you're
querying for phrases. Default.

fragment_offset:: Controls the margin from which you want to start
highlighting. Only valid when using the `fvh` highlighter.

fragment_size:: The size of the highlighted fragment in characters. Defaults
to 100.

highlight_query:: Highlight matches for a query other than the search
query. This is especially useful if you use a rescore query because
those are not taken into account by highlighting by default.
+
IMPORTANT: {es} does not validate that `highlight_query` contains
the search query in any way, so it is possible to define it so that
legitimate query results are not highlighted. Generally, you should
include the search query as part of the `highlight_query`.

matched_fields:: Combine matches on multiple fields to highlight a single field.
This is most intuitive for multifields that analyze the same string in different
ways. All `matched_fields` must have `term_vector` set to
`with_positions_offsets`, but only the field to which
the matches are combined is loaded, so only that field benefits from having
`store` set to `true`. Only valid for the `fvh` highlighter.

no_match_size:: The amount of text you want to return from the beginning
of the field if there are no matching fragments to highlight. Defaults
to 0 (nothing is returned).

number_of_fragments:: The maximum number of fragments to return. If the
number of fragments is set to 0, no fragments are returned. Instead,
the entire field contents are highlighted and returned. This can be
handy when you need to highlight short texts such as a title or
address, but fragmentation is not required. If `number_of_fragments`
is 0, `fragment_size` is ignored. Defaults to 5.

order:: Sorts highlighted fragments by score when set to `score`. Only valid for
the `unified` highlighter.

phrase_limit:: Controls the number of matching phrases in a document that are
considered. Prevents the `fvh` highlighter from analyzing too many phrases
and consuming too much memory. When using `matched_fields`, `phrase_limit`
phrases per matched field are considered. Raising the limit increases query
time and consumes more memory. Only supported by the `fvh` highlighter.
Defaults to 256.

pre_tags:: Use in conjunction with `post_tags` to define the HTML tags
to use for the highlighted text. By default, highlighted text is wrapped
in `<em>` and `</em>` tags. Specify as an array of strings.

post_tags:: Use in conjunction with `pre_tags` to define the HTML tags
to use for the highlighted text. By default, highlighted text is wrapped
in `<em>` and `</em>` tags. Specify as an array of strings.

require_field_match:: By default, only fields that contain a query match are
highlighted. Set `require_field_match` to `false` to highlight all fields.
Defaults to `true`.

tags_schema:: Set to `styled` to use the built-in tag schema. The `styled`
schema defines the following `pre_tags` and defines `post_tags` as
`</em>`.
+
[source,html]
--------------------------------------------------
<em class="hlt1">, <em class="hlt2">, <em class="hlt3">,
<em class="hlt4">, <em class="hlt5">, <em class="hlt6">,
<em class="hlt7">, <em class="hlt8">, <em class="hlt9">,
<em class="hlt10">
--------------------------------------------------

[[highlighter-type]]
type:: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to
`unified`.
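
As a sketch of how several of these settings combine on one field, the request below uses the `fvh` highlighter with the `chars` boundary scanner. It assumes the `comment` field is mapped with `term_vector` set to `with_positions_offsets`, as the `fvh` highlighter requires. The `boundary_chars` and `boundary_max_scan` values simply restate the defaults listed above, and the `fragment_size` and `number_of_fragments` values are arbitrary.

[source,js]
--------------------------------------------------
GET /_search
{
    "query" : {
        "match" : { "comment" : "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {
                "type" : "fvh",
                "boundary_scanner" : "chars",
                "boundary_chars" : ".,!? \t\n",
                "boundary_max_scan" : 20,
                "fragment_size" : 150,
                "number_of_fragments" : 3
            }
        }
    }
}
--------------------------------------------------
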

[[highlighting-examples]]
==== Highlighting Examples

Here is an example of setting the `comment` field in the index mapping to allow for
highlighting using the postings:
@ -101,15 +258,6 @@ PUT /example
--------------------------------------------------
// CONSOLE

====== Term Vectors

If `term_vector` information is provided by setting `term_vector` to
`with_positions_offsets` in the mapping, then the `unified` highlighter
will automatically use the `term_vector` to highlight the field.
The `term_vector` highlighting is faster to highlight multi-term queries like
`prefix` or `wildcard` because it can access the dictionary of terms for each document,
but it is also usually more costly than using the `postings` directly.

Here is an example of setting the `comment` field to allow for
highlighting using term vectors (this will cause the index to be bigger):
@ -131,59 +279,8 @@ PUT /example
--------------------------------------------------
// CONSOLE

[[plain-highlighter]]
==== Plain highlighter

This highlighter of type `plain` uses the standard Lucene highlighter.
It tries hard to reflect the query matching logic in terms of understanding word importance and any word positioning criteria in phrase queries.

[WARNING]
If you want to highlight a lot of fields in a lot of documents with complex queries, this highlighter will not be fast.
In its efforts to accurately reflect query logic, it creates a tiny in-memory index and re-runs the original query criteria through
Lucene's query execution planner to get access to low-level match information on the current document.
This is repeated for every field and every document that needs highlighting. If this presents a performance issue in your system, consider using an alternative highlighter.

[[fast-vector-highlighter]]
==== Fast vector highlighter

This highlighter of type `fvh` uses the Lucene Fast Vector highlighter.
This highlighter can be used on fields with `term_vector` set to
`with_positions_offsets` in the mapping.
The fast vector highlighter:

* Is faster, especially for large fields (> `1MB`)
* Can be customized with `boundary_scanner` (see <<boundary-scanners,below>>)
* Requires setting `term_vector` to `with_positions_offsets`, which
increases the size of the index
* Can combine matches from multiple fields into one result. See
`matched_fields`
* Can assign different weights to matches at different positions, allowing
for things like phrase matches being sorted above term matches when
highlighting a Boosting Query that boosts phrase matches over term matches

Here is an example of setting the `comment` field to allow for
highlighting using the fast vector highlighter on it (this will cause
the index to be bigger):

[source,js]
--------------------------------------------------
PUT /example
{
    "mappings": {
        "doc" : {
            "properties": {
                "comment" : {
                    "type": "text",
                    "term_vector" : "with_positions_offsets"
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

==== Force highlighter type
===== Force highlighter type

The `type` field allows you to force a specific highlighter type.
The allowed values are: `unified`, `plain` and `fvh`.
@ -206,10 +303,10 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

==== Force highlighting on source
===== Force highlighting on source

Forces the highlighting to highlight fields based on the source even if fields are
stored separately. Defaults to `false`.
Forces the highlighting to highlight fields based on the source even if fields
are stored separately. Defaults to `false`.

[source,js]
--------------------------------------------------
@ -229,7 +326,7 @@ GET /_search
// TEST[setup:twitter]

[[tags]]
==== Highlighting Tags
===== Configure highlighting tags

By default, the highlighting will wrap highlighted text in `<em>` and
`</em>`. This can be controlled by setting `pre_tags` and `post_tags`,
@ -254,8 +351,8 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

Using the fast vector highlighter there can be more tags, and the "importance"
is ordered.
When using the fast vector highlighter, you can specify additional tags and the
"importance" is ordered.

[source,js]
--------------------------------------------------
@ -276,20 +373,7 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

There are also built-in "tag" schemas, currently a single schema
called `styled` with the following `pre_tags`:

[source,html]
--------------------------------------------------
<em class="hlt1">, <em class="hlt2">, <em class="hlt3">,
<em class="hlt4">, <em class="hlt5">, <em class="hlt6">,
<em class="hlt7">, <em class="hlt8">, <em class="hlt9">,
<em class="hlt10">
--------------------------------------------------

and `</em>` as `post_tags`. If you think of other useful built-in tag
schemas, send an email to the mailing list or open an issue. Here
is an example of switching tag schemas:
You can also use the built-in `styled` tag schema:

[source,js]
--------------------------------------------------
@ -309,13 +393,8 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

==== Encoder

An `encoder` parameter can be used to define how highlighted text will
be encoded. It can be either `default` (no encoding) or `html` (escapes
HTML in the highlighted text, which is useful if you use HTML highlighting tags).
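
For example, a request along the following lines sets `encoder` to `html` at the top level of the `highlight` object. The `comment` field and the `kimchy` query are placeholders. With this setting, HTML special characters in the field text are escaped, while the `<em>` highlighting tags themselves are left intact.

[source,js]
--------------------------------------------------
GET /_search
{
    "query" : {
        "match" : { "comment" : "kimchy" }
    },
    "highlight" : {
        "encoder" : "html",
        "fields" : {
            "comment" : {}
        }
    }
}
--------------------------------------------------
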

==== Highlighted Fragments
===== Controlling highlighted fragments

Each field highlighted can control the size of the highlighted fragment
in characters (defaults to `100`), and the maximum number of fragments
@ -414,17 +493,10 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

==== Fragmenter
===== Specifying a fragmenter for the plain highlighter

WARNING: This option is not supported by the `unified` highlighter.

The fragmenter controls how text should be broken up in highlight snippets.
However, this option is applicable only to the plain highlighter.
There are two options:

[horizontal]
`simple`:: Breaks up text into same-sized fragments.
`span`:: Same as the simple fragmenter, but tries not to break up text between highlighted terms (this is applicable when using phrase-like queries). This is the default.
When using the `plain` highlighter, you can choose between the `simple` and
`span` fragmenters:

[source,js]
--------------------------------------------------
@ -539,19 +611,13 @@ Response:
If the `number_of_fragments` option is set to `0`,
`NullFragmenter` is used which does not fragment the text at all.
This is useful for highlighting the entire content of a document or field.
This is useful for highlighting the entire contents of a document or field.
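
As a sketch, the request below returns the whole, highlighted contents of a short field instead of fragments. The `title` field is hypothetical; any short field, such as a title or an address, fits this pattern.

[source,js]
--------------------------------------------------
GET /_search
{
    "query" : {
        "match" : { "title" : "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "title" : { "number_of_fragments" : 0 }
        }
    }
}
--------------------------------------------------

Because `number_of_fragments` is `0`, any `fragment_size` set for this field would be ignored.
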

==== Highlight query
===== Specifying a highlight query

It is also possible to highlight against a query other than the search
query by setting `highlight_query`. This is especially useful if you
use a rescore query because those are not taken into account by
highlighting by default. Elasticsearch does not validate that
`highlight_query` contains the search query in any way so it is possible
to define it so legitimate query results aren't highlighted at all.
Generally it is better to include the search query in the
`highlight_query`. Here is an example of including both the search
Here is an example of including both the search
query and the rescore query in `highlight_query`.

[source,js]
--------------------------------------------------
GET /_search
@ -613,11 +679,8 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

[[highlighting-settings]]
==== Global Settings

Highlighting settings can be set on a global level and then overridden
at the field level.
[[overriding-global-settings]]
===== Overriding global settings

[source,js]
--------------------------------------------------
@ -642,12 +705,10 @@ GET /_search
// TEST[setup:twitter]

[[field-match]]
==== Require Field Match
===== Highlighting in all fields

`require_field_match` can be set to `false`, which will cause any field to
be highlighted regardless of whether the query matched specifically on them.
The default behaviour is `true`, meaning that only fields that hold a query
match will be highlighted.
By default, only fields that contain a query match are highlighted. Set
`require_field_match` to `false` to highlight all fields.

[source,js]
--------------------------------------------------
@ -667,43 +728,19 @@ GET /_search
// CONSOLE
// TEST[setup:twitter]

[[boundary-scanners]]
==== Boundary Scanners

When highlighting a field using the unified highlighter or the fast vector highlighter,
you can specify how to break the highlighted fragments using `boundary_scanner`, which accepts
the following values:

* `chars` (default mode for the FVH): lets you configure which characters (`boundary_chars`)
constitute a boundary for highlighting. It's a single string with each boundary
character defined in it (defaults to `.,!? \t\n`). It also lets you configure
the `boundary_max_scan` to control how far to look for boundary characters
(defaults to `20`). Works only with the Fast Vector Highlighter.

* `sentence` and `word`: use Java's https://docs.oracle.com/javase/8/docs/api/java/text/BreakIterator.html[BreakIterator]
to break the highlighted fragments at the next _sentence_ or _word_ boundary.
You can further specify `boundary_scanner_locale` to control which Locale is used
to search the text for these boundaries.

[NOTE]
When used with the `unified` highlighter, the `sentence` scanner splits sentences
larger than `fragment_size` at the first word boundary next to `fragment_size`.
You can set `fragment_size` to 0 to never split any sentence.
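
A minimal sketch of the `sentence` scanner with the `unified` highlighter follows. The `comment` field is a placeholder, and the locale is assumed to be given as a language tag such as `en-US`. `fragment_size` is set to `0` so that, per the note above, no sentence is split.

[source,js]
--------------------------------------------------
GET /_search
{
    "query" : {
        "match" : { "comment" : "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {
                "type" : "unified",
                "boundary_scanner" : "sentence",
                "boundary_scanner_locale" : "en-US",
                "fragment_size" : 0
            }
        }
    }
}
--------------------------------------------------
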

[[matched-fields]]
==== Matched Fields
===== Combining matches on multiple fields

WARNING: This is only supported by the `fvh` highlighter.

The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using `matched_fields`. This is most
intuitive for multifields that analyze the same string in different
ways. All `matched_fields` must have `term_vector` set to
`with_positions_offsets` but only the field to which the matches are
combined is loaded so only that field would benefit from having
highlight a single field. This is most intuitive for multifields that
analyze the same string in different ways. All `matched_fields` must have
`term_vector` set to `with_positions_offsets` but only the field to which
the matches are combined is loaded so only that field would benefit from having
`store` set to `true`.

In the following examples `comment` is analyzed by the `english`
In the following examples, `comment` is analyzed by the `english`
analyzer and `comment.plain` is analyzed by the `standard` analyzer.

[source,js]
@ -826,26 +863,13 @@ to
// NOTCONSOLE
===================================================================

[[phrase-limit]]
==== Phrase Limit

WARNING: This is only supported by the `fvh` highlighter.

The fast vector highlighter has a `phrase_limit` parameter that prevents
it from analyzing too many phrases and consuming too much memory. It defaults
to 256, so only the first 256 matching phrases in the document are
considered. You can raise the limit with the `phrase_limit` parameter, but
keep in mind that scoring more phrases consumes more time and memory.

If you use `matched_fields`, keep in mind that `phrase_limit` phrases per
matched field are considered.
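
For example, the following request raises the limit for a phrase query. It is a sketch that assumes `comment` is mapped with `term_vector` set to `with_positions_offsets`; the value `512` is arbitrary and only illustrates trading extra time and memory for more phrases being considered.

[source,js]
--------------------------------------------------
GET /_search
{
    "query" : {
        "match_phrase" : { "comment" : "quick brown fox" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {
                "type" : "fvh",
                "phrase_limit" : 512
            }
        }
    }
}
--------------------------------------------------
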

[float]
[[explicit-field-order]]
=== Field Highlight Order
Elasticsearch highlights the fields in the order that they are sent. Per the
JSON spec, objects are unordered, but if you need to be explicit about the order
that fields are highlighted, you can use an array for `fields` like this:
===== Explicitly ordering highlighted fields
Elasticsearch highlights the fields in the order that they are sent, but per the
JSON spec, objects are unordered. If you need to be explicit about the order
in which fields are highlighted, specify the `fields` as an array:

[source,js]
--------------------------------------------------
GET /_search
@ -862,4 +886,4 @@ GET /_search
// TEST[setup:twitter]

None of the highlighters built into Elasticsearch care about the order that the
fields are highlighted but a plugin may.
fields are highlighted but a plugin might.