OpenSearch/docs/reference/mapping/params/similarity.asciidoc

[[similarity]]
=== `similarity`

Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default TF/IDF, such as `BM25`.

Similarities are mostly useful for <<text,`text`>> fields, but can also apply
to other field types.

Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
<<index-modules-similarity,similarity module>>.
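
As a minimal sketch of what such a custom similarity might look like, the
request below defines a tuned `BM25` variant in the index settings and
references it from a field. The names `my_bm25` and `tuned_field` are
hypothetical, and the `k1` and `b` values are purely illustrative; refer to the
similarity module for the authoritative list of parameters.

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": { <1>
          "type": "BM25",
          "k1": 1.2,
          "b": 0.75
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "tuned_field": {
          "type": "text",
          "similarity": "my_bm25" <2>
        }
      }
    }
  }
}
--------------------------------------------------
<1> `my_bm25` is a hypothetical name for a custom similarity based on the built-in `BM25` type.
<2> The `tuned_field` (also a hypothetical name) refers to the custom similarity by name.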

The only similarities that can be used out of the box, without any further
configuration, are:

`classic`::
        The default TF/IDF algorithm used by Elasticsearch and
        Lucene. See {defguide}/practical-scoring-function.html[Lucene’s Practical Scoring Function]
        for more information.

`BM25`::
        The Okapi BM25 algorithm.
        See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
        for more information.


The `similarity` can be set on the field level when a field is first created,
as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field": { <1>
          "type": "text"
        },
        "bm25_field": {
          "type": "text",
          "similarity": "BM25" <2>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `default_field` uses the `classic` similarity (i.e. TF/IDF).
<2> The `bm25_field` uses the `BM25` similarity.