From dc62255ddda987b028e3b00d17825d3e3d6e0afa Mon Sep 17 00:00:00 2001
From: Adrien Grand
Date: Mon, 30 Jan 2017 11:08:49 +0100
Subject: [PATCH] Document upcoming scoring changes. (#22806)

---
 .../migration/migrate_6_0/search.asciidoc | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/reference/migration/migrate_6_0/search.asciidoc b/docs/reference/migration/migrate_6_0/search.asciidoc
index bd9022dbaaf..85bc8417486 100644
--- a/docs/reference/migration/migrate_6_0/search.asciidoc
+++ b/docs/reference/migration/migrate_6_0/search.asciidoc
@@ -45,3 +45,30 @@ have any effect in previous versions.
 * The `"time"` field showing human readable timing output has been replaced by the `"time_in_nanos"`
   field which displays the elapsed time in nanoseconds. The `"time"` field can be turned on by adding
   `"?human=true"` to the request url. It will display a rounded, human readable time value.
+
+==== Scoring changes
+
+==== Query normalization
+
+Query normalization has been removed. This means that the TF-IDF similarity no
+longer tries to make scores comparable across queries and that boosts are now
+integrated into scores as simple multiplicative factors.
+
+Other similarities are not affected as they did not normalize scores and
+already integrated boosts into scores as multiplicative factors.
+
+See https://issues.apache.org/jira/browse/LUCENE-7347[`LUCENE-7347`] for more
+information.
+
+==== Coordination factors
+
+Coordination factors have been removed from the scoring formula. This means that
+boolean queries no longer score based on the number of matching clauses.
+Instead, they always return the sum of the scores of the matching clauses.
+
+As a consequence, use of the TF-IDF similarity is now discouraged as this was
+an important component of the quality of the scores that this similarity
+produces. BM25 is recommended instead.
+
+See https://issues.apache.org/jira/browse/LUCENE-7347[`LUCENE-7347`] for more
+information.
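
The effect of removing coordination factors can be illustrated with a small sketch. It is not Lucene or Elasticsearch code. It assumes a simplified coordination factor equal to the number of matching clauses divided by the total number of clauses, and it ignores query normalization and boosts. The function names and the per-clause scores are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def sum_of_matches(clause_scores: Sequence[Optional[float]]) -> float:
    """Scoring after the change: the plain sum of the matching clauses' scores."""
    return sum(s for s in clause_scores if s is not None)


def coord_scaled_sum(clause_scores: Sequence[Optional[float]]) -> float:
    """Simplified pre-change behaviour: the sum scaled by matching / total clauses.

    Assumption: the coordination factor is modelled as a plain ratio.
    """
    matching = sum(1 for s in clause_scores if s is not None)
    if matching == 0:
        return 0.0
    return sum_of_matches(clause_scores) * matching / len(clause_scores)


# Hypothetical per-clause scores for one document; None marks a clause
# that did not match the document.
scores = [1.0, 0.5, None]

old_score = coord_scaled_sum(scores)  # sum of 1.5 scaled by 2 matching / 3 total
new_score = sum_of_matches(scores)    # 1.0 + 0.5, with no scaling
```

Under these assumptions, a document matching two of three clauses used to be penalized relative to one matching all three. After the change, the same document simply receives the sum of its two clause scores.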