OpenSearch/docs/reference/mapping/params/similarity.asciidoc

57 lines
1.8 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[similarity]]
=== `similarity`
Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default `BM25`, such as `TF/IDF`.
Similarities are mostly useful for <<text,`text`>> fields, but can also apply
to other field types.
Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about this expert options, see the
<<index-modules-similarity,similarity module>>.
The only similarities which can be used out of the box, without any further
configuration are:
`BM25`::
The Okapi BM25 algorithm. The algorithm used by default in Elasticsearch and Lucene.
See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
for more information.
`classic`::
The TF/IDF algorithm which used to be the default in Elasticsearch and
Lucene. See {defguide}/practical-scoring-function.html[Lucenes Practical Scoring Function]
for more information.
`boolean`::
A simple boolean similarity, which is used when full-text ranking is not needed
and the score should only be based on whether the query terms match or not.
Boolean similarity gives terms a score equal to their query boost.
The `similarity` can be set on the field level when a field is first created,
as follows:
[source,console]
--------------------------------------------------
PUT my_index
{
"mappings": {
"properties": {
"default_field": { <1>
"type": "text"
},
"boolean_sim_field": {
"type": "text",
"similarity": "boolean" <2>
}
}
}
}
--------------------------------------------------
<1> The `default_field` uses the `BM25` similarity.
<2> The `boolean_sim_field` uses the `boolean` similarity.