[[similarity]]
=== `similarity`

Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default `BM25`, such as `TF/IDF`.

Similarities are mostly useful for <<text,`text`>> fields, but can also apply
to other field types.

Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
<<index-modules-similarity,similarity module>>.

The only similarities which can be used out of the box, without any further
configuration, are:

`BM25`::
The {wikipedia}/Okapi_BM25[Okapi BM25 algorithm]. The
algorithm used by default in {es} and Lucene.

`classic`::
deprecated:[7.0.0]
The https://en.wikipedia.org/wiki/Tf%E2%80%93idf[TF/IDF algorithm], the former
default in {es} and Lucene.

`boolean`::
A simple boolean similarity, which is used when full-text ranking is not needed
and the score should only be based on whether the query terms match or not.
Boolean similarity gives terms a score equal to their query boost.

The `similarity` can be set on the field level when a field is first created,
as follows:

[source,console]
--------------------------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "default_field": { <1>
        "type": "text"
      },
      "boolean_sim_field": {
        "type": "text",
        "similarity": "boolean" <2>
      }
    }
  }
}
--------------------------------------------------

<1> The `default_field` uses the `BM25` similarity.
<2> The `boolean_sim_field` uses the `boolean` similarity.
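
To observe the difference between the two fields, one could index a document
and run a `match` query against the `boolean` field. The following is only a
sketch: it assumes the `my-index-000001` index from the mapping example, and
the document ID, field values, and boost are made up for illustration. Because
the `boolean` similarity gives each matching term a score equal to its query
boost, the hit is expected to be scored from the boost (`2` here) rather than
from term statistics.

[source,console]
--------------------------------------------------
# Illustrative document; the ID and text are assumptions.
PUT my-index-000001/_doc/1?refresh
{
  "default_field": "quick brown fox",
  "boolean_sim_field": "quick brown fox"
}

# Query the field that uses the boolean similarity.
GET my-index-000001/_search
{
  "query": {
    "match": {
      "boolean_sim_field": {
        "query": "quick",
        "boost": 2
      }
    }
  }
}
--------------------------------------------------

Running the same query against `default_field` would instead produce a `BM25`
score that depends on term frequency, document length, and index statistics.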
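
As a minimal sketch of the custom similarities mentioned earlier, the request
below defines a tuned `BM25` similarity in the index settings and references it
by name from a field mapping. The index name `my-index-000002`, the similarity
name `my_bm25`, the field name `title`, and the `k1` and `b` values are all
hypothetical and chosen only for illustration. See the
<<index-modules-similarity,similarity module>> for the options each built-in
similarity supports.

[source,console]
--------------------------------------------------
# Hypothetical names and parameter values, for illustration only.
PUT my-index-000002
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": {
          "type": "BM25",
          "k1": 1.3,
          "b": 0.6
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "similarity": "my_bm25"
      }
    }
  }
}
--------------------------------------------------

Defining the similarity under a name of its own keeps the tuning in one place,
so several fields can refer to the same configuration.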