[[search-explain]]
== Explain API

The explain API computes a score explanation for a query and a specific
document. This can give useful feedback on whether a document matches or
does not match a specific query.

Note that a single index must be provided to the `index` parameter.

[float]
=== Usage

Full query example:

[source,js]
--------------------------------------------------
GET /twitter/_doc/0/_explain
{
  "query" : {
    "match" : { "message" : "elasticsearch" }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]

This will yield the following result:

[source,js]
--------------------------------------------------
{
   "_index":"twitter",
   "_type":"_doc",
   "_id":"0",
   "matched":true,
   "explanation":{
      "value":1.6943597,
      "description":"weight(message:elasticsearch in 0) [PerFieldSimilarity], result of:",
      "details":[
         {
            "value":1.6943597,
            "description":"score(freq=1.0), product of:",
            "details":[
               {
                  "value":2.2,
                  "description":"scaling factor, k1 + 1",
                  "details":[]
               },
               {
                  "value":1.3862944,
                  "description":"idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details":[
                     {
                        "value":1,
                        "description":"n, number of documents containing term",
                        "details":[]
                     },
                     {
                        "value":5,
                        "description":"N, total number of documents with field",
                        "details":[]
                     }
                  ]
               },
               {
                  "value":0.5555555,
                  "description":"tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details":[
                     {
                        "value":1.0,
                        "description":"freq, occurrences of term within document",
                        "details":[]
                     },
                     {
                        "value":1.2,
                        "description":"k1, term saturation parameter",
                        "details":[]
                     },
                     {
                        "value":0.75,
                        "description":"b, length normalization parameter",
                        "details":[]
                     },
                     {
                        "value":3.0,
                        "description":"dl, length of field",
                        "details":[]
                     },
                     {
                        "value":5.4,
                        "description":"avgdl, average length of field",
                        "details":[]
                     }
                  ]
               }
            ]
         }
      ]
   }
}
--------------------------------------------------
// TESTRESPONSE
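
The arithmetic behind the example response can be checked by hand. The following Python sketch is not part of the API; it only copies the leaf values from the response above and applies the formulas quoted in the `description` fields. The function name `bm25_parts` is hypothetical and chosen for illustration.

```python
import math

# Leaf values copied from the "details" entries of the example response.
k1 = 1.2      # term saturation parameter
b = 0.75      # length normalization parameter
freq = 1.0    # occurrences of the term within the document
dl = 3.0      # length of the field in this document
avgdl = 5.4   # average length of the field
n = 1         # number of documents containing the term
N = 5         # total number of documents with the field


def bm25_parts(freq, k1, b, dl, avgdl, n, N):
    """Return (scaling, idf, tf) using the formulas quoted in the response.

    Hypothetical helper for illustration only.
    """
    scaling = k1 + 1
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return scaling, idf, tf


scaling, idf, tf = bm25_parts(freq, k1, b, dl, avgdl, n, N)

# The top-level score is described as the "product of" the three parts.
score = scaling * idf * tf
print(f"scaling={scaling} idf={idf} tf={tf} score={score}")
```

The computed values should agree with the `value` fields in the response to about six decimal places. The response reports single-precision floats, so the last printed digit can differ.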

There is also a simpler way of specifying the query via the `q`
parameter. The specified `q` parameter value is then parsed as if the
`query_string` query was used. Example usage of the `q` parameter in the
explain API:

[source,js]
--------------------------------------------------
GET /twitter/_doc/0/_explain?q=message:search
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]

This will yield the same result as the previous request.

[float]
=== All parameters:

[horizontal]
`_source`::

Set to `true` to retrieve the `_source` of the document explained. You can also
retrieve part of the document by using `_source_include` and `_source_exclude` (see <<get-source-filtering,Get API>> for more details).

`stored_fields`::
Controls which stored fields are returned as part of the
document explained.

`routing`::
Controls the routing, in case routing was used
during indexing.

`parent`::
Same effect as setting the routing parameter.

`preference`::
Controls which shard the explain request is executed on.

`source`::
Allows the data of the request to be put in the query
string of the URL.

`q`::
The query string (maps to the `query_string` query).

`df`::
The default field to use when no field prefix is defined within
the query.

`analyzer`::
The analyzer name to be used when analyzing the query
string. Defaults to the default search analyzer.

`analyze_wildcard`::
Whether wildcard and prefix queries should be analyzed.
Defaults to `false`.

`lenient`::
If set to `true`, format-based failures (such as
providing text to a numeric field) are ignored. Defaults to `false`.

`default_operator`::
The default operator to be used; can be `AND` or
`OR`. Defaults to `OR`.
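
Several of the parameters listed above are passed in the URL query string. As a rough sketch, assuming the `/{index}/_doc/{id}/_explain` path used in the earlier examples, the request path can be assembled with Python's standard library. The helper name `explain_url` is hypothetical and not part of any Elasticsearch client; the code only builds a string and does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def explain_url(index, doc_id, **params):
    """Build the path and query string for an explain request.

    Hypothetical helper for illustration. Boolean values are lower-cased
    so that they read as true/false, matching the parameter descriptions.
    """
    path = f"/{quote(index, safe='')}/_doc/{quote(str(doc_id), safe='')}/_explain"
    cleaned = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    # urlencode percent-encodes reserved characters such as ':' in values.
    query = urlencode(cleaned)
    return f"{path}?{query}" if query else path


url = explain_url(
    "twitter",
    0,
    q="elasticsearch",
    df="message",
    default_operator="AND",
    lenient=True,
)
print(url)
```

Here `df` supplies the field for the unprefixed term in `q`, so the query string itself does not need a `message:` prefix. Note that `urlencode` escapes the colon in a value such as `message:search` as `%3A`.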