[DOCS] Reformat distance feature query (#44916)

James Rodewig 2019-07-29 08:34:50 -04:00
parent ad129f7947
commit 652f943f30
1 changed files with 135 additions and 86 deletions

@ -4,81 +4,38 @@
<titleabbrev>Distance feature</titleabbrev> <titleabbrev>Distance feature</titleabbrev>
++++ ++++
The `distance_feature` query is a specialized query that only works Boosts the <<query-filter-context, relevance score>> of documents closer to a
on <<date, `date`>>, <<date_nanos, `date_nanos`>> or <<geo-point,`geo_point`>> provided `origin` date or point. For example, you can use this query to give
fields. Its goal is to boost documents' scores based on proximity more weight to documents closer to a certain date or location.
to some given origin. For example, use this query if you want to
give more weight to documents with dates closer to a certain date,
or to documents with locations closer to a certain location.
This query is called `distance_feature` query, because it dynamically You can use the `distance_feature` query to find the nearest neighbors to a
calculates distances between the given origin and documents' field values, location. You can also use the query in a <<query-dsl-bool-query,`bool`>>
and use these distances as features to boost the documents' scores. search's `should` filter to add boosted relevance scores to the `bool` query's
scores.
`distance_feature` query is typically used on its own to find the nearest
neighbors to a given point, or put in a `should` clause of a
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.
Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is not set to `true`.
==== Syntax of distance_feature query
`distance_feature` query has the following syntax:
[source,js]
--------------------------------------------------
"distance_feature": {
"field": <field>,
"origin": <origin>,
"pivot": <pivot>,
"boost" : <boost>
}
--------------------------------------------------
// NOTCONSOLE
[horizontal]
`field`::
Required parameter. Defines the name of the field on which to calculate
distances. Must be a field of the type `date`, `date_nanos` or `geo_point`,
and must be indexed (`"index": true`, which is the default) and has
<<doc-values, doc values>> (`"doc_values": true`, which is the default).
`origin`::
Required parameter. Defines a point of origin used for calculating
distances. Must be a date for date and date_nanos fields,
and a geo-point for geo_point fields. Date math (for example `now-1h`) is
supported for a date origin.
`pivot`::
Required parameter. Defines the distance from origin at which the computed
score will equal to a half of the `boost` parameter. Must be
a `number+date unit` ("1h", "10d",...) for date and date_nanos fields,
and a `number + geo unit` ("1km", "12m",...) for geo fields.
`boost`::
Optional parameter with a default value of `1`. Defines the factor by which
to multiply the score. Must be a non-negative float number.
The `distance_feature` query computes a document's score as following: [[distance-feature-query-ex-request]]
==== Example request
`score = boost * pivot / (pivot + distance)` [[distance-feature-index-setup]]
===== Index setup
To use the `distance_feature` query, your index must include a <<date, `date`>>,
<<date_nanos, `date_nanos`>> or <<geo-point,`geo_point`>> field.
where `distance` is the absolute difference between the origin and To see how you can set up an index for the `distance_feature` query, try the
a document's field value. following example.
==== Example using distance_feature query . Create an `items` index with the following field mapping:
+
--
Let's look at an example. We index several documents containing * `name`, a <<keyword,`keyword`>> field
information about sales items, such as name, production date, * `production_date`, a <<date, `date`>> field
and location. * `location`, a <<geo-point,`geo_point`>> field
[source,js] [source,js]
-------------------------------------------------- ----
PUT items PUT /items
{ {
"mappings": { "mappings": {
"properties": { "properties": {
@ -94,15 +51,24 @@ PUT items
} }
} }
} }
----
// CONSOLE
// TESTSETUP
--
PUT items/_doc/1 . Index several documents to this index.
+
--
[source,js]
----
PUT /items/_doc/1?refresh
{ {
"name" : "chocolate", "name" : "chocolate",
"production_date": "2018-02-01", "production_date": "2018-02-01",
"location": [-71.34, 41.12] "location": [-71.34, 41.12]
} }
PUT items/_doc/2 PUT /items/_doc/2?refresh
{ {
"name" : "chocolate", "name" : "chocolate",
"production_date": "2018-01-01", "production_date": "2018-01-01",
@ -110,24 +76,29 @@ PUT items/_doc/2
} }
PUT items/_doc/3 PUT /items/_doc/3?refresh
{ {
"name" : "chocolate", "name" : "chocolate",
"production_date": "2017-12-01", "production_date": "2017-12-01",
"location": [-71.3, 41.12] "location": [-71.3, 41.12]
} }
----
POST items/_refresh
--------------------------------------------------
// CONSOLE // CONSOLE
--
We look for all chocolate items, but we also want chocolates
that are produced recently (closer to the date `now`) [[distance-feature-query-ex-query]]
to be ranked higher. ===== Example queries
[[distance-feature-query-date-ex]]
====== Boost documents based on date
The following `bool` search returns documents with a `name` value of
`chocolate`. The search also uses the `distance_feature` query to increase the
relevance score of documents with a `production_date` value closer to `now`.
[source,js] [source,js]
-------------------------------------------------- ----
GET items/_search GET /items/_search
{ {
"query": { "query": {
"bool": { "bool": {
@ -146,17 +117,18 @@ GET items/_search
} }
} }
} }
-------------------------------------------------- ----
// CONSOLE // CONSOLE
// TEST[continued]
We can look for all chocolate items, but we also want chocolates [[distance-feature-query-distance-ex]]
that are produced locally (closer to our geo origin) ====== Boost documents based on location
come first in the result list. The following `bool` search returns documents with a `name` value of
`chocolate`. The search also uses the `distance_feature` query to increase the
relevance score of documents with a `location` value closer to `[-71.3, 41.15]`.
[source,js] [source,js]
-------------------------------------------------- ----
GET items/_search GET /items/_search
{ {
"query": { "query": {
"bool": { "bool": {
@ -175,6 +147,83 @@ GET items/_search
} }
} }
} }
-------------------------------------------------- ----
// CONSOLE // CONSOLE
// TEST[continued]
[[distance-feature-top-level-params]]
==== Top-level parameters for `distance_feature`
`field`::
(Required, string) Name of the field used to calculate distances. This field
must meet the following criteria:
* Be a <<date, `date`>>, <<date_nanos, `date_nanos`>> or
<<geo-point,`geo_point`>> field
* Have an <<mapping-index,`index`>> mapping parameter value of `true`, which is
the default
* Have an <<doc-values,`doc_values`>> mapping parameter value of `true`, which
is the default
`origin`::
+
--
(Required, string) Date or point of origin used to calculate distances.
If the `field` value is a <<date, `date`>> or <<date_nanos, `date_nanos`>>
field, the `origin` value must be a <<date-format-pattern,date>>.
<<date-math,Date Math>>, such as `now-1h`, is supported.
If the `field` value is a <<geo-point,`geo_point`>> field, the `origin` value
must be a geopoint.
--
`pivot`::
+
--
(Required, <<time-units,time unit>> or <<distance-units,distance unit>>)
Distance from the `origin` at which relevance scores receive half of the `boost`
value.
If the `field` value is a <<date, `date`>> or <<date_nanos, `date_nanos`>>
field, the `pivot` value must be a <<time-units,time unit>>, such as `1h` or
`10d`.
If the `field` value is a <<geo-point,`geo_point`>> field, the `pivot` value
must be a <<distance-units,distance unit>>, such as `1km` or `12m`.
--
`boost`::
+
--
(Optional, float) Floating point number used to multiply the
<<query-filter-context, relevance score>> of matching documents. This value
cannot be negative. Defaults to `1.0`.
--
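As an illustration of how these parameters fit together, the following Python sketch assembles a `distance_feature` clause as a plain dictionary. The helper name `distance_feature_clause` is hypothetical and not part of any client library. The `pivot` values reuse the format examples given above (`10d`, `1km`) and are not taken from the example requests.

```python
def distance_feature_clause(field, origin, pivot, boost=None):
    """Build a distance_feature clause (hypothetical helper, for illustration only)."""
    clause = {"field": field, "origin": origin, "pivot": pivot}
    if boost is not None:
        # The parameter description says boost cannot be negative.
        if boost < 0:
            raise ValueError("boost cannot be negative")
        clause["boost"] = boost
    return {"distance_feature": clause}


# Date field: origin is a date or date math, pivot is a time unit.
by_date = distance_feature_clause("production_date", "now", "10d")

# geo_point field: origin is a geopoint, pivot is a distance unit.
by_location = distance_feature_clause("location", [-71.3, 41.15], "1km", boost=2.0)

# Used on its own, the clause is the whole query of a search body.
search_body = {"query": by_location}
```

Leaving `boost` out of the dictionary when it is not supplied lets the documented default of `1.0` apply.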
[[distance-feature-notes]]
==== Notes
[[distance-feature-calculation]]
===== How the `distance_feature` query calculates relevance scores
The `distance_feature` query dynamically calculates the distance between the
`origin` value and a document's field values. It then uses this distance as a
feature to boost the <<query-filter-context, relevance score>> of closer
documents.
The `distance_feature` query calculates a document's <<query-filter-context,
relevance score>> as follows:
```
relevance score = boost * pivot / (pivot + distance)
```
The `distance` is the absolute difference between the `origin` value and a
document's field value.
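To make the formula concrete, here is a minimal Python sketch that evaluates it directly. The function name `distance_feature_score` is hypothetical, and the sketch assumes `distance` and `pivot` are already expressed in the same unit (for example, days or kilometers). The check that `pivot` is positive is an assumption of this sketch, not something stated above.

```python
def distance_feature_score(distance, pivot, boost=1.0):
    """Evaluate boost * pivot / (pivot + distance) for one document."""
    if boost < 0:
        raise ValueError("boost cannot be negative")
    if pivot <= 0:
        # Assumption of this sketch: a non-positive pivot is rejected.
        raise ValueError("pivot must be positive")
    # distance is the absolute difference between origin and the field value.
    return boost * pivot / (pivot + abs(distance))


at_origin = distance_feature_score(0, pivot=7)              # 1.0, the full boost
at_pivot = distance_feature_score(7, pivot=7)               # 0.5, half the boost
far_away = distance_feature_score(21, pivot=7, boost=2.0)   # 0.5, a quarter of the boost
```

A document at the `origin` receives the full `boost`, a document exactly one `pivot` away receives half of it, and the score keeps falling toward zero as the distance grows.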
[[distance-feature-skip-hits]]
===== Skip non-competitive hits
Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<query-filter-context, relevance scores>>, the
`distance_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`.