Jim Ferenczi 8250aa4267 Remove the postings highlighter and make unified the default highlighter choice (#25028)
This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves
exactly like the `unified` highlighter when index_options is set to `offsets`:
https://issues.apache.org/jira/browse/LUCENE-7815

It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided).
The strategy used internally by this highlighter remain the same as before, it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text.
Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such.
There are few features that the `unified` highlighter is not able to handle which is why the other highlighters (`plain` and `fvh`) are still available.
I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features have been added to the `unified`.
2017-06-09 14:09:57 +02:00

111 lines
4.5 KiB
Plaintext

[[breaking_60_search_changes]]
=== Search and Query DSL changes
==== Changes to queries
* The `collect_payloads` parameter of the `span_near` query has been removed. Payloads will be
loaded when needed.
* Queries on boolean fields now strictly parse boolean-like values. This means
only the strings `"true"` and `"false"` will be parsed into their boolean
counterparts. Other strings will cause an error to be thrown.
* The `in` query (a synonym for the `terms` query) has been removed
* The `geo_bbox` query (a synonym for the `geo_bounding_box` query) has been removed
* The `mlt` query (a synonym for the `more_like_this` query) has been removed
* The `fuzzy_match` and `match_fuzzy` query (synonyma for the `match` query) have been removed
* The `terms` query now always returns scores equal to `1` and is not subject to
`indices.query.bool.max_clause_count` anymore.
* The deprecated `indices` query has been removed.
* Support for empty query objects (`{ }`) has been removed from the query DSL.
An error is thrown whenever an empty query object is provided.
* The deprecated `minimum_number_should_match` parameter in the `bool` query has
been removed, use `minimum_should_match` instead.
* The `query_string` query now correctly parses the maximum number of
states allowed when
"https://en.wikipedia.org/wiki/Powerset_construction#Complexity[determinizing]"
a regex as `max_determinized_states` instead of the typo
`max_determined_states`.
* The `query_string` query no longer accepts `enable_position_increment`, use
`enable_position_increments` instead.
* For `geo_distance` queries, sorting, and aggregations the `sloppy_arc` option
has been removed from the `distance_type` parameter.
* The `geo_distance_range` query, which was deprecated in 5.0, has been removed.
* The `optimize_bbox` parameter has been removed from `geo_distance` queries.
* The `ignore_malformed` and `coerce` parameters have been removed from
`geo_bounding_box`, `geo_polygon`, and `geo_distance` queries.
* The `disable_coord` parameter of the `bool` and `common_terms` queries has
been removed. If provided, it will be ignored and issue a deprecation warning.
* The `template` query has been removed. This query was deprecated since 5.0
==== Search shards API
The search shards API no longer accepts the `type` url parameter, which didn't
have any effect in previous versions.
==== Changes to the Profile API
* The `"time"` field showing human readable timing output has been replaced by the `"time_in_nanos"`
field which displays the elapsed time in nanoseconds. The `"time"` field can be turned on by adding
`"?human=true"` to the request url. It will display a rounded, human readable time value.
==== Scoring changes
===== Query normalization
Query normalization has been removed. This means that the TF-IDF similarity no
longer tries to make scores comparable across queries and that boosts are now
integrated into scores as simple multiplicative factors.
Other similarities are not affected as they did not normalize scores and
already integrated boosts into scores as multiplicative factors.
See https://issues.apache.org/jira/browse/LUCENE-7347[`LUCENE-7347`] for more
information.
===== Coordination factors
Coordination factors have been removed from the scoring formula. This means that
boolean queries no longer score based on the number of matching clauses.
Instead, they always return the sum of the scores of the matching clauses.
As a consequence, use of the TF-IDF similarity is now discouraged as this was
an important component of the quality of the scores that this similarity
produces. BM25 is recommended instead.
See https://issues.apache.org/jira/browse/LUCENE-7347[`LUCENE-7347`] for more
information.
==== Fielddata on _uid
Fielddata on `_uid` is deprecated. It is possible to switch to `_id` instead
but the only reason why it has not been deprecated too is because it is used
for the `random_score` function. If you really need access to the id of
documents for sorting, aggregations or search scripts, the recommandation is
to duplicate the id as a field in the document.
==== Highlighers
The `unified` highlighter is the new default choice for highlighter.
The offset strategy for each field is picked internally by this highlighter depending on the
type of the field (`index_options`).
It is still possible to force the highlighter to `fvh` or `plain` types.
The `postings` highlighter has been removed from Lucene and Elasticsearch.
The `unified` highlighter outputs the same highlighting when `index_options` is set
to `offsets`.