Commit Graph

278 Commits

Author SHA1 Message Date
Nik Everett 8bd9e34e39 Stop FVH from throwing away some query boosts
The FVH was throwing away some boosts on queries stopping a number of
ways to boost phrase matches to the top of the list of fragments from
working.

The plain highlighter also doesn't work for this but that is because it
doesn't support the concept of the same term having a different score at
different positions.

Also update documentation claiming that FHV is nicer for weighing terms
found by query combinations.

Closes #4351
2014-01-08 11:51:48 +01:00
Nik Everett 522d620eb6 Use FHV's phraseLimit
This prevents poisoning the FVH with documents that contain TONS of matches
which take tons of memory and time to highlight.

Closes #4645
2014-01-08 11:27:58 +01:00
Simon Willnauer fa16969360 Cleanup comments and class names s/ElasticSearch/Elasticsearch
* Clean up s/ElasticSearch/Elasticsearch on docs/*
 * Clean up s/ElasticSearch/Elasticsearch on src/* bin/* & pom.xml
 * Clean up s/ElasticSearch/Elasticsearch on NOTICE.txt and README.textile

Closes #4634
2014-01-07 11:21:51 +01:00
Martijn van Groningen f1bf585089 The `fields` option should always return an array for json document fields and single valued field for metadata fields.
Also the `fields` option can only be used to fetch leaf fields, trying to do fetch object fields will return in a client error.

Closes #4542
2014-01-03 17:29:12 +01:00
Clinton Gormley 34b9b16233 [DOCS] Fixed some bad link refs 2013-12-16 18:07:33 +01:00
Martijn van Groningen 23d2b1ea7b Renamed top level `filter` to `post_filter`.
Closes #4119
2013-12-16 17:10:14 +01:00
Martijn van Groningen 10e2528cce Added the `force_source` option to highlighting that enforces to use of the _source even if there are stored fields.
The percolator uses this option to deal with the fact that the MemoryIndex doesn't support stored fields,
this is possible b/c the _source of the document being percolated is always present.

Closes #4348
2013-12-13 13:39:53 +01:00
Nik Everett 8e34057bc0 Add support for combining fields to the FVH
The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using `matched_fields`.  This is most
intuitive for multifields that analyze the same string in different
ways.  Example:
{
    "query": {
        "query_string": {
            "query": "content.plain:running scissors",
            "fields": ["content"]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "content": {
                "matched_fields": ["content", "content.plain"],
                "type" : "fvh"
            }
        }
    }
}

Closes #3750
2013-12-03 11:10:01 +01:00
Conrad Pankoff 87246af256 [DOCS] Fixed typos and corrected grammar 2013-12-02 10:08:26 +01:00
Clinton Gormley bc393b6d79 Changed the minScore comparator from > to >=
Closes #4303
2013-11-29 20:29:20 +01:00
Boaz Leskes c63d8c4fb5 [Docs] Added _source filtering to documentation
Relates to #3301
2013-11-26 19:16:24 +01:00
Clinton Gormley 3465e69e83 [DOCS] Changed all store:yes/no to store:true/false
which is how this setting is stored internally
2013-11-07 16:57:18 +01:00
Luca Cavanna 48ac9747a8 Added third highlighter type based on lucene postings highlighter
Requires field index_options set to "offsets" in order to store positions and offsets in the postings list.
Considerably faster than the plain highlighter since it doesn't require to reanalyze the text to be highlighted: the larger the documents the better the performance gain should be.
Requires less disk space than term_vectors, needed for the fast_vector_highlighter.
Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Plays really well with natural text, not quite the same if the text contains html markup for instance.
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm.

Uses forked version of lucene postings highlighter to support:
- per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value
- manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same

The lucene postings highlighter api is  quite different compared to the existing highlighters api, the main difference being that it allows to highlight multiple fields in multiple docs with a single call, ensuring sequential IO.
The way it is introduced in elasticsearch in this first round is a compromise trying not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents.

Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding.

Closes #3704
2013-10-24 23:38:00 +02:00
Luca Cavanna e981e411d7 [DOCS] rephrased docs for highlight no_match_size parameter
(removed 0.90.6 coming tag as it's needed only in 0.90 branch)
2013-10-24 14:38:32 +02:00
Nik Everett 14a709f563 Highlighting can return excerpt with no highlights
You can configure the highlighting api to return an excerpt of a field
even if there wasn't a match on the field.

The FVH makes excerpts from the beginning of the string to the first
boundary character after the requested length or the boundary_max_scan,
whichever comes first.  The Plain highlighter makes excerpts from the
beginning of the string to the end of the last token before the requested
length.

Closes #1171
2013-10-24 14:38:32 +02:00
Clinton Gormley b2d82d7e75 [DOCS] Reorganised the highlight_query docs and added a version flag 2013-10-18 18:03:31 +02:00
Clinton Gormley ba1b4886e3 [DOCS] Moved "named filters/queries" up one level 2013-10-10 11:23:08 +02:00
Clinton Gormley 7a53d41446 [DOCS] Changed capitalization of operator in rescore query 2013-10-05 17:18:15 +02:00
Nik Everett 6b000d8c6d Support specifing score query on highlight.
This is useful if you want to highlight terms not in the search query or
you want sort highlighted snippets based on another query.

Closes #3630
2013-10-02 15:46:24 -04:00
Lee Hinman ba40aa374e Uniquify anchor links to fix asciidoc/docbook generation 2013-09-30 15:32:00 -06:00
Lee Hinman 0442b737be Add more anchor links to documentation
Related to #3679
2013-09-30 13:13:16 -06:00
Clinton Gormley 85bba668f7 [DOCS] Tidied up various doc formatting errors 2013-09-16 16:13:01 +02:00
Martijn van Groningen f6f4b5014f Added docs for named queries.
Relates to #3581
2013-09-16 11:17:01 +02:00
Martijn van Groningen 8ddb809f98 If all scroll ids should be removed then the `_all` value should be used instead of not specifying any scroll ids. 2013-09-12 10:41:38 +02:00
Martijn van Groningen 0efa78710b Added clear scroll api.
The clear scroll api allows clear all resources associated with a `scroll_id` by deleting the `scroll_id` and its associated SearchContext.

Closes #3657
2013-09-10 21:17:34 +02:00
Clinton Gormley 6d667e5d41 [DOCS] Missing sort values now works for all field types 2013-09-04 23:20:55 +02:00
Clinton Gormley 393c28bee4 [DOCS] Removed outdated new/deprecated version notices 2013-09-03 21:28:31 +02:00
Clinton Gormley 822043347e Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00