OpenSearch/docs/reference/search
Luca Cavanna 48ac9747a8 Added third highlighter type based on lucene postings highlighter
Requires field index_options set to "offsets" in order to store positions and offsets in the postings list.
Considerably faster than the plain highlighter since it doesn't need to reanalyze the text to be highlighted: the larger the documents, the greater the performance gain should be.
Requires less disk space than term_vectors, which the fast_vector_highlighter needs.
Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Works well with natural-language text, less so if the text contains HTML markup, for instance.
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm.
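
The sentence scoring described above can be sketched roughly as follows. This is an illustration only, not the Lucene implementation: it uses the textbook BM25 formula with assumed defaults (k1=1.2, b=0.75), a naive regex sentence splitter in place of a BreakIterator, and hypothetical helper names (split_sentences, tokenize, score_sentences).

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive stand-in for a BreakIterator: split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased word tokens; a real analyzer would do much more.
    return re.findall(r"\w+", sentence.lower())


def score_sentences(text, query_terms, k1=1.2, b=0.75):
    """Rank sentences of one document, treating the document as the corpus."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n
    # "Document" frequency: the number of sentences that contain each term.
    df = {term: sum(1 for t in tokenized if term in t) for term in query_terms}
    scored = []
    for sentence, tokens in zip(sentences, tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, sentence))
    # Highest-scoring sentences first; ties keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


text = "The quick brown fox jumps. A lazy dog sleeps all day. The fox is quick."
for score, sentence in score_sentences(text, ["fox"]):
    print(round(score, 3), sentence)
```

Shorter sentences that contain a query term rank above longer ones with the same term frequency, which is the length normalization BM25 applies to ordinary documents.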

Uses a forked version of the lucene postings highlighter to support:
- per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value
- manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same

The lucene postings highlighter api is quite different from the existing highlighters api, the main difference being that it can highlight multiple fields in multiple docs with a single call, ensuring sequential IO.
It is introduced in elasticsearch in this first round as a compromise that leaves the current highlight api, which works per document and per field, unchanged. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents.

Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding.
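
As a rough illustration of how these options might be combined, the sketch below builds a field mapping with index_options set to "offsets" and a search request body that selects the postings highlighter. The type name "article", the field name "body" and the option values are hypothetical; the key spellings (pre_tags, post_tags, encoder, the "postings" type value) are assumed to follow the existing highlight api and should be checked against the highlighting docs.

```python
import json

# Hypothetical mapping: the "body" field stores positions and offsets
# in the postings list, as the postings highlighter requires.
mapping = {
    "article": {
        "properties": {
            "body": {
                "type": "string",
                "index_options": "offsets",
            }
        }
    }
}

# Hypothetical search request using the options listed above.
search_request = {
    "query": {"match": {"body": "postings highlighter"}},
    "highlight": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "order": "score",             # best-scoring sentences first
        "encoder": "html",            # html-encode the snippet text
        "require_field_match": True,
        "fields": {
            "body": {
                "type": "postings",   # assumed name of the new highlighter type
                "number_of_fragments": 3,  # 0 would highlight the whole field
                "no_match_size": 150,
            }
        },
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_request, indent=2))
```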

Closes #3704
2013-10-24 23:38:00 +02:00
facets introduced support for "shard_size" for terms & terms_stats facets. The "shard_size" is the number of term entries each shard will send back to the coordinating node. Setting "shard_size" > "size" increases accuracy, both in the counts associated with each term and in the terms that are actually returned to the user. The higher "shard_size" is, the more expensive the processing becomes, since bigger queues are maintained at the shard level and larger lists are streamed back from the shards. 2013-10-02 22:02:00 +02:00
request Added third highlighter type based on lucene postings highlighter 2013-10-24 23:38:00 +02:00
suggesters phrase_len is not called phrase_length 2013-10-18 09:29:53 -04:00
count.asciidoc [DOCS] Added a few clarifications to the docs from the issues list 2013-09-04 23:20:55 +02:00
explain.asciidoc [DOCS] Reorganised common API conventions 2013-10-13 16:46:56 +02:00
facets.asciidoc Fix markup 2013-10-21 16:11:09 +02:00
more-like-this.asciidoc Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00
multi-search.asciidoc [DOCS] Removed outdated new/deprecated version notices 2013-09-03 21:28:31 +02:00
percolate.asciidoc Added initial documentation for the redesigned percolator. 2013-10-16 14:12:19 +02:00
request-body.asciidoc Added docs for named queries. 2013-09-16 11:17:01 +02:00
search.asciidoc [DOCS] Reorganised common API conventions 2013-10-13 16:46:56 +02:00
suggesters.asciidoc Add more anchor links to documentation 2013-09-30 13:13:16 -06:00
termvectors.asciidoc [DOCS] Added docs for term vectors 2013-09-04 23:20:54 +02:00
uri-request.asciidoc [DOCS] Added pages explaining lucene query parser syntax and regular expression syntax 2013-10-07 14:42:49 +02:00
validate.asciidoc Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00