OpenSearch/docs/reference/mapping/params/similarity.asciidoc

[[similarity]]
=== `similarity`

Elasticsearch allows you to configure a scoring algorithm, or _similarity_, per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default `BM25`, such as TF/IDF (`classic`).

Similarities are mostly useful for <<text,`text`>> fields, but can also apply
to other field types.

Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
<<index-modules-similarity,similarity module>>.
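
To give a rough sense of what tuning a similarity's parameters means, here is a
minimal Python sketch of the textbook Okapi BM25 score for a single term. The
function name `bm25_term_score` and its arguments are hypothetical and chosen
only for illustration. `k1` (term-frequency saturation) and `b` (length
normalization) are the two BM25 parameters, shown with the commonly cited
defaults of `1.2` and `0.75`. This is not Lucene's implementation, which
differs in details.

```python
import math

def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score of one term in one document (illustrative only)."""
    # Inverse document frequency: rarer terms contribute more.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores document length, b=1 applies it fully.
    norm = 1 - b + b * doc_len / avg_doc_len
    # Term-frequency saturation: a larger k1 lets repeated terms keep adding score.
    return idf * freq * (k1 + 1) / (freq + k1 * norm)

# With b=0, document length no longer affects the score.
short = bm25_term_score(2, 50, 100, 1000, 10, b=0.0)
long_ = bm25_term_score(2, 500, 100, 1000, 10, b=0.0)
print(short == long_)
```

Under these assumptions, lowering `b` makes long and short documents score more
alike, while raising `k1` rewards repeated occurrences of a term for longer
before the score levels off.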

The following similarities can be used out of the box, without any further
configuration:

`BM25`::
        The Okapi BM25 algorithm, which is used by default in Elasticsearch and Lucene.
        See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
        for more information.

`classic`::
        The TF/IDF algorithm, which used to be the default in Elasticsearch and
        Lucene. See {defguide}/practical-scoring-function.html[Lucene’s Practical Scoring Function]
        for more information.

`boolean`::
        A simple boolean similarity, which is used when full-text ranking is not needed
        and the score should only be based on whether the query terms match or not.
        Boolean similarity gives terms a score equal to their query boost.
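
The behavior of the `boolean` similarity can be sketched in a few lines of
Python. The function `boolean_score` and its inputs are hypothetical, and the
sketch assumes that per-term scores are simply added together, as a disjunctive
query would typically do. It is an illustration of the idea, not the Lucene
implementation.

```python
def boolean_score(query_boosts, doc_terms):
    """Sum the boosts of the query terms that occur in the document."""
    # Term frequency and document length are deliberately ignored.
    present = set(doc_terms)
    return sum(boost for term, boost in query_boosts.items() if term in present)

doc = ["quick", "brown", "fox", "fox", "fox"]
print(boolean_score({"fox": 2.0, "dog": 1.0}, doc))  # 2.0
```

Note that `fox` contributes its boost once, no matter how often it appears in
the document.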


The `similarity` can be set at the field level when a field is first created,
as follows:

[source,js]
--------------------------------------------------
PUT my_index?include_type_name=true
{
  "mappings": {
    "_doc": {
      "properties": {
        "default_field": { <1>
          "type": "text"
        },
        "boolean_sim_field": {
          "type": "text",
          "similarity": "boolean" <2>
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> The `default_field` uses the `BM25` similarity.
<2> The `boolean_sim_field` uses the `boolean` similarity.
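
If the request body is generated programmatically, a small helper can keep the
per-field choice explicit. The following Python sketch uses only the standard
library and does not send anything to a cluster. The helper name
`text_field_mappings` and the `BUILT_IN` set are assumptions made for this
example. The check against `BUILT_IN` covers only the names listed in this
section, so it would need relaxing for custom similarities defined in the index
settings.

```python
import json

# The built-in similarity names listed in this section.
BUILT_IN = {"BM25", "classic", "boolean"}

def text_field_mappings(similarities):
    """Build a create-index body for text fields; None means "use the default"."""
    properties = {}
    for field, similarity in similarities.items():
        if similarity is not None and similarity not in BUILT_IN:
            raise ValueError(f"unknown built-in similarity: {similarity!r}")
        field_mapping = {"type": "text"}
        if similarity is not None:
            field_mapping["similarity"] = similarity
        properties[field] = field_mapping
    # Mirrors the typed `_doc` layout of the request above.
    return {"mappings": {"_doc": {"properties": properties}}}

body = text_field_mappings({"default_field": None, "boolean_sim_field": "boolean"})
print(json.dumps(body, indent=2))
```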