OpenSearch/docs
Adrien Grand 8d6a41f671 Nested queries should avoid adding unnecessary filters when possible. (#23079)
When nested objects are present in the mappings, many queries get deoptimized
due to the need to exclude documents that are not in the right space. For
instance, a filter is applied to all queries that prevents them from matching
non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all
child queries of `nested` queries in order to make sure that the child query
only matches child documents (`_type:__nested_path`), which is required by
`ToParentBlockJoinQuery` (the Lucene query behing Elasticsearch's `nested`
queries).

These additional filters slow down `nested` queries. In 1.7-, the cost was
somehow amortized by the fact that we cached filters very aggressively. However,
this has proven to be a significant source of slow downs since 2.0 for users
of `nested` mappings and queries, see #20797.

This change makes the filtering a bit smarter. For instance if the query is a
`match_all` query, then we need to exclude nested docs. However, if the query
is `foo: bar` then it may only match root documents since `foo` is a top-level
field, so no additional filtering is required.

Another improvement is to use a `FILTER` clause on all types rather than a
`MUST_NOT` clause on all nested paths when possible since `FILTER` clauses
are more efficient.

Here are some examples of queries and how they get rewritten:

```
"match_all": {}
```

This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and
`ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})`
with this change. The automaton is the complement of `_type:__*` so it matches
the same documents, but is faster since it is now a positive clause. Simplistic
performance testing on a 10M index where each root document has 5 nested
documents on average gave a latency of 420ms on master and 90ms with this change
applied.

```
"term": {
  "foo": {
    "value": "0"
  }
}
```

This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on
master and `foo:0` with this change: we do not need to filter nested docs out
since the query cannot match nested docs. While doing performance testing in
the same conditions as above, response times went from 250ms to 50ms.

```
"nested": {
  "path": "nested",
  "query": {
    "term": {
      "nested.foo": {
        "value": "0"
      }
    }
  }
}
```

This query is rewritten to
`+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0`
on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The
top-level filter (`-_type:__*`) could be removed since `nested` queries only
match documents of the parent space, as well as the child filter
(`#_type:__nested`) since the child query may only match nested docs since the
`nested` object has both `include_in_parent` and `include_in_root` set to
`false`. While doing performance testing in the same conditions as above,
response times went from 850ms to 270ms.
2017-02-14 16:05:19 +01:00
..
community-clients Add a preface title 2017-02-04 21:23:41 +01:00
groovy-api Use Versions.asciidoc for groovy docs too 2017-02-04 11:42:45 +01:00
java-api Centralised doc versions in docs/Versions.asciidoc 2017-02-04 11:16:19 +01:00
java-rest Centralised doc versions in docs/Versions.asciidoc 2017-02-04 11:16:19 +01:00
perl Updated copyright years to include 2016 (#17808) 2016-04-18 12:39:23 +02:00
plugins Replaced absolute URLs in docs with attributes 2017-02-04 12:05:03 +01:00
python Remove most of the need for `// NOTCONSOLE` 2016-09-06 10:32:54 -04:00
reference Nested queries should avoid adding unnecessary filters when possible. (#23079) 2017-02-14 16:05:19 +01:00
resiliency Update resiliency page for the release of v5 (#21177) 2016-10-28 18:46:54 +02:00
ruby Updated copyright years to include 2016 (#17808) 2016-04-18 12:39:23 +02:00
src/test CONSOLE-ify min and max aggregation docs 2017-01-20 15:33:00 -05:00
README.asciidoc Fix confusing section in docs/README 2017-02-09 16:17:14 -05:00
Versions.asciidoc Defguide link should not use {branch} 2017-02-04 21:21:46 +01:00
build.gradle Docs: Consoleify cluster and indices settings docs (#23030) 2017-02-10 14:57:43 -08:00

README.asciidoc

The Elasticsearch docs are in AsciiDoc format and can be built using the
Elasticsearch documentation build process.

See: https://github.com/elastic/docs

Snippets marked with `// CONSOLE` are automatically annotated with "VIEW IN
CONSOLE" in the documentation and are automatically tested by the command
`gradle :docs:check`. To test just the docs from a single page, use e.g.
`gradle :docs:check -Dtests.method=*rollover*`.

By default `// CONSOLE` snippet runs as its own isolated
test. You can manipulate the test execution in the following ways:

* `// TEST`: Explicitly marks a snippet as a test. Snippets marked this way
are tests even if they don't have `// CONSOLE`.
  * `// TEST[s/foo/bar/]`: Replace `foo` with `bar` in the test. This should be
  used sparingly because it makes the test "lie". Sometimes, though, you can use
  it to make the tests more clear.
  * `// TEST[catch:foo]`: Used to expect errors in the requests. Replace `foo`
  with `request` to expect a 400 error, for example. If the snippet contains
  multiple requests then only the last request will expect the error.
  * `// TEST[continued]`: Continue the test started in the last snippet. Between
  tests the nodes are cleaned: indexes are removed, etc. This will prevent that.
  This is really useful when you have text and snippets that work together to
  tell the story of some use case because it merges the snippets (and thus the
  use case) into one big test.
  * `// TEST[skip:reason]`: Skip this test. Replace `reason` with the actual
  reason to skip the test. Snippets without `// TEST` or `// CONSOLE` aren't
  considered tests anyway but this is useful for explicitly documenting the
  reason why the test shouldn't be run.
  * `// TEST[setup:name]`: Run some setup code before running the snippet. This
  is useful for creating and populating indexes used in the snippet. The setup
  code is defined in `docs/build.gradle`.
  * `// TEST[warning:some warning]`: Expect the response to include a `Warning`
  header. If the response doesn't include a `Warning` header with the exact
  text then the test fails. If the response includes `Warning` headers that
  aren't expected then the test fails.
* `// TESTRESPONSE`: Matches this snippet against the body of the response of
  the last test. If the response is JSON then order is ignored. If you add
  `// TEST[continued]` to the snippet after `// TESTRESPONSE` it will continue
  in the same test, allowing you to interleve requests with responses to check.
  * `// TESTRESPONSE[s/foo/bar/]`: Substitutions. See `// TEST[s/foo/bar]`.
  * `// TESTRESPONSE[_cat]`: Add substitutions for testing `_cat` responses. Use
  this after all other substitutions so it doesn't make other substitutions
  difficult.
* `// TESTSETUP`: Marks this snippet as the "setup" for all other snippets in
  this file. This is a somewhat natural way of structuring documentation. You
  say "this is the data we use to explain this feature" then you add the
  snippet that you mark `// TESTSETUP` and then every snippet will turn into
  a test that runs the setup snippet first. See the "painless" docs for a file
  that puts this to good use. This is fairly similar to `// TEST[setup:name]`
  but rather than the setup defined in `docs/build.gradle` the setup is defined
  right in the documentation file.

Any place you can use json you can use elements like `$body.path.to.thing`
which is replaced on the fly with the contents of the thing at `path.to.thing`
in the last response.