
[[similarity]]
=== `similarity`

Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default `BM25`, such as `TF/IDF`.

Similarities are mostly useful for <<text,`text`>> fields, but can also apply
to other field types.

Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
<<index-modules-similarity,similarity module>>.
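
As a minimal sketch, assuming the index-level `similarity` settings described in
the similarity module, a tuned `BM25` similarity could be defined in the index
settings and then referenced by name from a field mapping. The index name
`my_custom_index`, the similarity name `my_bm25`, the field `title`, and the
chosen `b` value are illustrative only:

[source,js]
--------------------------------------------------
PUT my_custom_index
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": { <1>
          "type": "BM25",
          "k1": 1.2,
          "b": 0.5
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "similarity": "my_bm25" <2>
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> A custom similarity based on `BM25`. `k1` keeps its default of `1.2`, while
`b` is lowered from its default of `0.75` to weaken field-length normalization.
<2> The `title` field refers to the custom similarity by name.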

The only similarities that can be used out of the box, without any further
configuration, are:

`BM25`::
        The Okapi BM25 algorithm, which is the default in Elasticsearch and Lucene.
        See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
        for more information.

`classic`::
        The TF/IDF algorithm, which used to be the default in Elasticsearch and
        Lucene. See {defguide}/practical-scoring-function.html[Lucene’s Practical Scoring Function]
        for more information.

`boolean`::
        A simple boolean similarity, for use when full-text ranking is not needed
        and the score should be based only on whether the query terms match.
        Boolean similarity gives terms a score equal to their query boost.


The `similarity` can be set on the field level when a field is first created,
as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "default_field": { <1>
          "type": "text"
        },
        "boolean_sim_field": {
          "type": "text",
          "similarity": "boolean" <2>
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> The `default_field` uses the `BM25` similarity.
<2> The `boolean_sim_field` uses the `boolean` similarity.
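
As a sketch of what the `boolean` similarity means at query time, the search
below targets the `boolean_sim_field` defined above, assuming some documents
have already been indexed. The query text and the `boost` value are
illustrative. Because boolean similarity gives each matching term a score equal
to its query boost, a matching document should be scored by which query terms
it contains, not by term frequency or field length. Adding `"explain": true` to
the request body is one way to inspect the per-term scores.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "boolean_sim_field": {
        "query": "quick brown fox",
        "boost": 2
      }
    }
  }
}
--------------------------------------------------
// CONSOLE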