[DOCS] Reformat distance feature query (#44916)

James Rodewig 2019-07-29 08:34:50 -04:00
parent ad129f7947
commit 652f943f30
1 changed files with 135 additions and 86 deletions

@@ -4,81 +4,38 @@
<titleabbrev>Distance feature</titleabbrev>
++++
The `distance_feature` query is a specialized query that only works
on <<date, `date`>>, <<date_nanos, `date_nanos`>> or <<geo-point,`geo_point`>>
fields. Its goal is to boost documents' scores based on proximity
to some given origin. For example, use this query if you want to
give more weight to documents with dates closer to a certain date,
or to documents with locations closer to a certain location.
Boosts the <<query-filter-context, relevance score>> of documents closer to a
provided `origin` date or point. For example, you can use this query to give
more weight to documents closer to a certain date or location.
This query is called the `distance_feature` query because it dynamically
calculates distances between the given origin and documents' field values,
and uses these distances as features to boost the documents' scores.
`distance_feature` query is typically used on its own to find the nearest
neighbors to a given point, or put in a `should` clause of a
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.
Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is not set to `true`.
==== Syntax of distance_feature query
`distance_feature` query has the following syntax:
[source,js]
--------------------------------------------------
"distance_feature": {
"field": <field>,
"origin": <origin>,
"pivot": <pivot>,
"boost" : <boost>
}
--------------------------------------------------
// NOTCONSOLE
[horizontal]
`field`::
Required parameter. Defines the name of the field on which to calculate
distances. Must be a field of the type `date`, `date_nanos` or `geo_point`,
and must be indexed (`"index": true`, which is the default) and have
<<doc-values, doc values>> (`"doc_values": true`, which is the default).
`origin`::
Required parameter. Defines a point of origin used for calculating
distances. Must be a date for date and date_nanos fields,
and a geo-point for geo_point fields. Date math (for example `now-1h`) is
supported for a date origin.
`pivot`::
Required parameter. Defines the distance from origin at which the computed
score equals half of the `boost` parameter. Must be
a `number + date unit` ("1h", "10d",...) for date and date_nanos fields,
and a `number + geo unit` ("1km", "12m",...) for geo fields.
`boost`::
Optional parameter with a default value of `1`. Defines the factor by which
to multiply the score. Must be a non-negative float number.
You can use the `distance_feature` query to find the nearest neighbors to a
location. You can also use the query in a <<query-dsl-bool-query,`bool`>>
search's `should` clause to add boosted relevance scores to the `bool` query's
scores.
The `distance_feature` query computes a document's score as follows:
[[distance-feature-query-ex-request]]
==== Example request
`score = boost * pivot / (pivot + distance)`
[[distance-feature-index-setup]]
===== Index setup
To use the `distance_feature` query, your index must include a <<date, `date`>>,
<<date_nanos, `date_nanos`>> or <<geo-point,`geo_point`>> field.
where `distance` is the absolute difference between the origin and
a document's field value.
To see how you can set up an index for the `distance_feature` query, try the
following example.
==== Example using distance_feature query
. Create an `items` index with the following field mapping:
+
--
Let's look at an example. We index several documents containing
information about sales items, such as name, production date,
and location.
* `name`, a <<keyword,`keyword`>> field
* `production_date`, a <<date, `date`>> field
* `location`, a <<geo-point,`geo_point`>> field
[source,js]
--------------------------------------------------
PUT items
----
PUT /items
{
"mappings": {
"properties": {
@@ -94,15 +51,24 @@ PUT items
}
}
}
----
// CONSOLE
// TESTSETUP
--
PUT items/_doc/1
. Index several documents to this index.
+
--
[source,js]
----
PUT /items/_doc/1?refresh
{
"name" : "chocolate",
"production_date": "2018-02-01",
"location": [-71.34, 41.12]
}
PUT items/_doc/2
PUT /items/_doc/2?refresh
{
"name" : "chocolate",
"production_date": "2018-01-01",
@@ -110,24 +76,29 @@ PUT items/_doc/2
}
PUT items/_doc/3
PUT /items/_doc/3?refresh
{
"name" : "chocolate",
"production_date": "2017-12-01",
"location": [-71.3, 41.12]
}
POST items/_refresh
--------------------------------------------------
----
// CONSOLE
--
We look for all chocolate items, but we also want chocolates
that are produced recently (closer to the date `now`)
to be ranked higher.
[[distance-feature-query-ex-query]]
===== Example queries
[[distance-feature-query-date-ex]]
====== Boost documents based on date
The following `bool` search returns documents with a `name` value of
`chocolate`. The search also uses the `distance_feature` query to increase the
relevance score of documents with a `production_date` value closer to `now`.
[source,js]
--------------------------------------------------
GET items/_search
----
GET /items/_search
{
"query": {
"bool": {
@@ -146,17 +117,18 @@ GET items/_search
}
}
}
--------------------------------------------------
----
// CONSOLE
// TEST[continued]
We can look for all chocolate items, but we also want chocolates
that are produced locally (closer to our geo origin)
come first in the result list.
[[distance-feature-query-distance-ex]]
====== Boost documents based on location
The following `bool` search returns documents with a `name` value of
`chocolate`. The search also uses the `distance_feature` query to increase the
relevance score of documents with a `location` value closer to `[-71.3, 41.15]`.
[source,js]
--------------------------------------------------
GET items/_search
----
GET /items/_search
{
"query": {
"bool": {
@@ -175,6 +147,83 @@ GET items/_search
}
}
}
--------------------------------------------------
----
// CONSOLE
// TEST[continued]
[[distance-feature-top-level-params]]
==== Top-level parameters for `distance_feature`
`field`::
(Required, string) Name of the field used to calculate distances. This field
must meet the following criteria:
* Be a <<date, `date`>>, <<date_nanos, `date_nanos`>> or
<<geo-point,`geo_point`>> field
* Have an <<mapping-index,`index`>> mapping parameter value of `true`, which is
the default
* Have an <<doc-values,`doc_values`>> mapping parameter value of `true`, which
is the default
`origin`::
+
--
(Required, string) Date or point of origin used to calculate distances.
If the `field` value is a <<date, `date`>> or <<date_nanos, `date_nanos`>>
field, the `origin` value must be a <<date-format-pattern,date>>.
<<date-math,Date Math>>, such as `now-1h`, is supported.
If the `field` value is a <<geo-point,`geo_point`>> field, the `origin` value
must be a geopoint.
--
`pivot`::
+
--
(Required, <<time-units,time unit>> or <<distance-units,distance unit>>)
Distance from the `origin` at which relevance scores receive half of the `boost`
value.
If the `field` value is a <<date, `date`>> or <<date_nanos, `date_nanos`>>
field, the `pivot` value must be a <<time-units,time unit>>, such as `1h` or
`10d`.
If the `field` value is a <<geo-point,`geo_point`>> field, the `pivot` value
must be a <<distance-units,distance unit>>, such as `1km` or `12m`.
--
`boost`::
+
--
(Optional, float) Floating point number used to multiply the
<<query-filter-context, relevance score>> of matching documents. This value
cannot be negative. Defaults to `1.0`.
--
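To show how the four top-level parameters fit together, here is a minimal Python sketch that builds a search body with a `distance_feature` query inside a `bool` query's `should` clause. The helper name `chocolate_query`, the `must`/`match` clause, and the pivot values `10d` and `1km` are illustrative assumptions and are not taken from this commit. The field names and origins come from the examples above.

```python
import json


def chocolate_query(field, origin, pivot, boost=1.0):
    """Hypothetical helper: match `name: chocolate`, then boost by distance.

    The `must`/`match` clause is an assumption made for this sketch.
    """
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": "chocolate"}},
                "should": {
                    "distance_feature": {
                        "field": field,    # date, date_nanos or geo_point field
                        "origin": origin,  # date (or date math) or geopoint
                        "pivot": pivot,    # time unit or distance unit
                        "boost": boost,    # optional, cannot be negative
                    }
                },
            }
        }
    }


# Date field: origin is date math, pivot is a time unit (assumed value).
date_body = chocolate_query("production_date", "now", "10d")

# geo_point field: origin is a geopoint, pivot is a distance unit (assumed value).
geo_body = chocolate_query("location", [-71.3, 41.15], "1km")

print(json.dumps(date_body, indent=2))
```

The printed JSON could be sent as the body of a `GET /items/_search` request. Only the `field`, `origin` and `pivot` values change between the date and location variants.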
[[distance-feature-notes]]
==== Notes
[[distance-feature-calculation]]
===== How the `distance_feature` query calculates relevance scores
The `distance_feature` query dynamically calculates the distance between the
`origin` value and a document's field values. It then uses this distance as a
feature to boost the <<query-filter-context, relevance score>> of closer
documents.
The `distance_feature` query calculates a document's <<query-filter-context,
relevance score>> as follows:
```
relevance score = boost * pivot / (pivot + distance)
```
The `distance` is the absolute difference between the `origin` value and a
document's field value.
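As a worked illustration of the formula above, the following Python sketch evaluates it for a few distances. The function name `distance_feature_score` and the sample values are hypothetical. The code only mirrors the formula and is not Elasticsearch's implementation.

```python
def distance_feature_score(boost: float, pivot: float, distance: float) -> float:
    """Illustrative only: boost * pivot / (pivot + distance).

    `pivot` and `distance` must be expressed in the same unit
    (for example, days or meters).
    """
    if boost < 0:
        raise ValueError("boost cannot be negative")
    if pivot <= 0:
        # Assumption of this sketch, to avoid dividing by zero.
        raise ValueError("pivot must be positive")
    # `distance` is the absolute difference between origin and field value.
    return boost * pivot / (pivot + abs(distance))


# Hypothetical values: a pivot of 10 days and the default boost of 1.0.
for days in (0, 10, 30):
    print(days, distance_feature_score(boost=1.0, pivot=10, distance=days))
```

With these values, a document at the origin scores the full `boost` (1.0), a document exactly one `pivot` away scores half of it (0.5), and a document three pivots away scores a quarter (0.25).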
[[distance-feature-skip-hits]]
===== Skip non-competitive hits
Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<query-filter-context, relevance scores>>, the
`distance_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`.