OpenSearch/docs/reference
Adrien Grand 8d6a41f671 Nested queries should avoid adding unnecessary filters when possible. (#23079)
When nested objects are present in the mappings, many queries get deoptimized
due to the need to exclude documents that are not in the right space. For
instance, a filter is applied to all queries that prevents them from matching
non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all
child queries of `nested` queries in order to make sure that the child query
only matches child documents (`_type:__nested_path`), which is required by
`ToParentBlockJoinQuery` (the Lucene query behind Elasticsearch's `nested`
queries).

These additional filters slow down `nested` queries. In 1.7 and earlier, the
cost was somewhat amortized by the fact that we cached filters very
aggressively. However, this has proven to be a significant source of slowdowns
since 2.0 for users of `nested` mappings and queries, see #20797.

This change makes the filtering a bit smarter. For instance, if the query is a
`match_all` query, then we need to exclude nested docs. However, if the query
is `foo: bar` then it may only match root documents since `foo` is a top-level
field, so no additional filtering is required.
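
The decision described above can be sketched as a toy model. This is not the
Elasticsearch implementation: the function names and the representation of a
query as a single field name (or `None` for a field-less query such as
`match_all`) are assumptions made for illustration, and settings such as
`include_in_parent` are ignored.

```python
# Toy model only; all names here are hypothetical, not Elasticsearch APIs.
def may_match_nested_docs(field, nested_paths):
    """Return True if a query on `field` could match a nested (non-root) doc.

    `field` is None for a query that targets no field, such as match_all.
    """
    if field is None:
        # A field-less query matches every Lucene document, nested or not.
        return True
    # A field that lives under a nested path is indexed on nested documents.
    return any(field.startswith(path + ".") for path in nested_paths)


def needs_root_filter(field, nested_paths):
    # The root-only filter is needed only when nested docs could match.
    return may_match_nested_docs(field, nested_paths)


nested_paths = {"nested"}
print(needs_root_filter(None, nested_paths))          # match_all
print(needs_root_filter("foo", nested_paths))         # top-level field
print(needs_root_filter("nested.foo", nested_paths))  # field under a nested path
```

In this model only the `foo` query skips the extra filter, which mirrors the
`foo: bar` case described above.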

Another improvement is to use a `FILTER` clause on all types rather than a
`MUST_NOT` clause on all nested paths when possible since `FILTER` clauses
are more efficient.

Here are some examples of queries and how they get rewritten:

```
"match_all": {}
```

This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and
`ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})`
with this change. The automaton is the complement of `_type:__*` so it matches
the same documents, but is faster since it is now a positive clause. Simplistic
performance testing on a 10M index where each root document has 5 nested
documents on average gave a latency of 420ms on master and 90ms with this change
applied.
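
The equivalence between the two forms can be illustrated with a small sketch.
It models documents as dictionaries and queries as set operations, which is an
assumption made for illustration and not how Lucene evaluates them; the sample
documents and function names are hypothetical.

```python
# Hypothetical sample index: nested docs carry a `_type` starting with "__".
docs = [
    {"id": 0, "_type": "__nested"},
    {"id": 1, "_type": "__nested"},
    {"id": 2, "_type": "doc"},
    {"id": 3, "_type": "__nested"},
    {"id": 4, "_type": "doc"},
]


def must_not_form(docs):
    """Model of `+*:* -_type:__*`: match everything, then subtract."""
    everything = {d["id"] for d in docs}
    excluded = {d["id"] for d in docs if d["_type"].startswith("__")}
    return everything - excluded


def positive_form(docs):
    """Model of a single positive clause matching the complement of `__*`."""
    return {d["id"] for d in docs if not d["_type"].startswith("__")}


# Both forms select the same root documents.
print(must_not_form(docs) == positive_form(docs))
```

The sketch only shows that both forms select the same documents; the latency
difference reported above comes from how the clauses are executed, which this
model does not capture.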

```
"term": {
  "foo": {
    "value": "0"
  }
}
```

This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on
master and `foo:0` with this change: we do not need to filter nested docs out
since the query cannot match nested docs. While doing performance testing in
the same conditions as above, response times went from 250ms to 50ms.

```
"nested": {
  "path": "nested",
  "query": {
    "term": {
      "nested.foo": {
        "value": "0"
      }
    }
  }
}
```

This query is rewritten to
`+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0`
on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The
top-level filter (`-_type:__*`) could be removed since `nested` queries only
match documents of the parent space. The child filter (`#_type:__nested`) could
be removed as well: the child query may only match nested docs because the
`nested` object has both `include_in_parent` and `include_in_root` set to
`false`. While doing performance testing in the same conditions as above,
response times went from 850ms to 270ms.
2017-02-14 16:05:19 +01:00
..
aggregations Use `typed_keys` parameter to prefix suggester names by type in search responses (#23080) 2017-02-10 10:53:38 +01:00
analysis Consolify docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc. (#23050) 2017-02-13 11:00:12 +01:00
cat Fix duplicates from search.query (#22701) 2017-01-20 18:45:10 +01:00
cluster Docs: Consoleify cluster and indices settings docs (#23030) 2017-02-10 14:57:43 -08:00
docs Fixed bad asciidoc in delete-by-query 2017-02-09 20:14:56 +01:00
how-to Improve wording in recipes docs 2017-01-17 21:00:36 -05:00
images Docs: clarify calculation of sigma and lambda in function_score (#20267) 2016-09-02 14:41:07 +02:00
index-modules Restores the original default format of search slow log 2016-12-09 12:38:28 -05:00
indices Docs: Consoleify cluster and indices settings docs (#23030) 2017-02-10 14:57:43 -08:00
ingest Link directly to the attachments in arrays section 2016-12-22 20:52:08 +08:00
mapping Disallow include_in_all for 6.0+ indices 2017-02-07 19:31:51 -07:00
migration Disallow include_in_all for 6.0+ indices 2017-02-07 19:31:51 -07:00
modules Add a note about `cluster.routing.allocation.node_concurrent_recoveries` (#23160) 2017-02-14 14:14:41 +02:00
painless-api-reference Expose multi-valued dates to scripts and document painless's date functions (#22875) 2017-02-01 21:57:07 -05:00
query-dsl Add note about min_score filtering efficiency (#23109) 2017-02-13 12:15:01 +01:00
search Nested queries should avoid adding unnecessary filters when possible. (#23079) 2017-02-14 16:05:19 +01:00
setup Adding `ansible-elasticsearch` to list of CM tools (#23058) 2017-02-09 21:14:30 +01:00
testing Update docs after test-framework moved 2016-11-02 09:51:55 -04:00
aggregations.asciidoc `value_type` is useful regardless of scripting. (#22160) 2016-12-22 14:35:12 +01:00
analysis.asciidoc Add the ability to set an analyzer on keyword fields. (#21919) 2016-12-30 09:36:10 +01:00
api-conventions.asciidoc Optionally require a valid content type for all rest requests with content (#22691) 2017-02-02 14:07:13 -05:00
cat.asciidoc Allows multiple patterns to be specified for index templates (#21009) 2016-11-10 18:00:30 -05:00
cluster.asciidoc Convert more docs to CONSOLE 2016-09-21 09:36:21 -04:00
docs.asciidoc Inclusion of link to Multi Delete (#22619) 2017-01-16 12:58:59 +01:00
getting-started.asciidoc Replaced absolute URLs in docs with attributes 2017-02-04 12:05:03 +01:00
glossary.asciidoc Improve glossary to not refer to types as "like a table" (#17704) 2016-04-13 14:29:47 +02:00
how-to.asciidoc Correct grammar in list in how-to docs 2017-01-17 20:57:22 -05:00
index-modules.asciidoc Allow an index to be partitioned with custom routing (#22274) 2017-01-18 08:51:23 +01:00
index.asciidoc Centralised doc versions in docs/Versions.asciidoc 2017-02-04 11:16:19 +01:00
indices.asciidoc Removed the upgrade API docs 2016-10-11 12:21:46 +02:00
ingest.asciidoc Renamed all AUTOSENSE snippets to CONSOLE (#18210) 2016-05-09 15:42:23 +02:00
mapping.asciidoc Disallow include_in_all for 6.0+ indices 2017-02-07 19:31:51 -07:00
modules.asciidoc Docs: Cross-cluster search doc wasn't being included 2017-01-18 10:02:51 +01:00
painless-api-reference.asciidoc Generate reference links for painless API (#22775) 2017-01-26 10:39:19 -05:00
query-dsl.asciidoc Fixed broken xrefs to query-dsl-not-query, which has been removed. 2015-10-20 13:01:37 -07:00
redirects.asciidoc Update redirects.asciidoc (#23148) 2017-02-13 16:23:25 +01:00
release-notes.asciidoc Remove links to release notes 2016-09-08 18:07:39 +02:00
search.asciidoc percolator: remove deprecated percolate and mpercolate apis 2017-01-10 11:18:27 +01:00
setup.asciidoc Docs: Add setup section for the keystore tool and secure settings (#22838) 2017-01-30 14:56:45 -08:00
testing.asciidoc [DOCS] Test framework documentation 2013-12-02 18:01:45 +01:00