mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-18 10:54:54 +00:00
When nested objects are present in the mappings, many queries get deoptimized due to the need to exclude documents that are not in the right space. For instance, a filter is applied to all queries that prevents them from matching non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all child queries of `nested` queries in order to make sure that the child query only matches child documents (`_type:__nested_path`), which is required by `ToParentBlockJoinQuery` (the Lucene query behing Elasticsearch's `nested` queries). These additional filters slow down `nested` queries. In 1.7-, the cost was somehow amortized by the fact that we cached filters very aggressively. However, this has proven to be a significant source of slow downs since 2.0 for users of `nested` mappings and queries, see #20797. This change makes the filtering a bit smarter. For instance if the query is a `match_all` query, then we need to exclude nested docs. However, if the query is `foo: bar` then it may only match root documents since `foo` is a top-level field, so no additional filtering is required. Another improvement is to use a `FILTER` clause on all types rather than a `MUST_NOT` clause on all nested paths when possible since `FILTER` clauses are more efficient. Here are some examples of queries and how they get rewritten: ``` "match_all": {} ``` This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and `ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})` with this change. The automaton is the complement of `_type:__*` so it matches the same documents, but is faster since it is now a positive clause. Simplistic performance testing on a 10M index where each root document has 5 nested documents on average gave a latency of 420ms on master and 90ms with this change applied. ``` "term": { "foo": { "value": "0" } } ``` This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on master and `foo:0` with this change: we do not need to filter nested docs out since the query cannot match nested docs. While doing performance testing in the same conditions as above, response times went from 250ms to 50ms. ``` "nested": { "path": "nested", "query": { "term": { "nested.foo": { "value": "0" } } } } ``` This query is rewritten to `+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0` on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The top-level filter (`-_type:__*`) could be removed since `nested` queries only match documents of the parent space, as well as the child filter (`#_type:__nested`) since the child query may only match nested docs since the `nested` object has both `include_in_parent` and `include_in_root` set to `false`. While doing performance testing in the same conditions as above, response times went from 850ms to 270ms.
The Elasticsearch docs are in AsciiDoc format and can be built using the Elasticsearch documentation build process. See: https://github.com/elastic/docs Snippets marked with `// CONSOLE` are automatically annotated with "VIEW IN CONSOLE" in the documentation and are automatically tested by the command `gradle :docs:check`. To test just the docs from a single page, use e.g. `gradle :docs:check -Dtests.method=*rollover*`. By default `// CONSOLE` snippet runs as its own isolated test. You can manipulate the test execution in the following ways: * `// TEST`: Explicitly marks a snippet as a test. Snippets marked this way are tests even if they don't have `// CONSOLE`. * `// TEST[s/foo/bar/]`: Replace `foo` with `bar` in the test. This should be used sparingly because it makes the test "lie". Sometimes, though, you can use it to make the tests more clear. * `// TEST[catch:foo]`: Used to expect errors in the requests. Replace `foo` with `request` to expect a 400 error, for example. If the snippet contains multiple requests then only the last request will expect the error. * `// TEST[continued]`: Continue the test started in the last snippet. Between tests the nodes are cleaned: indexes are removed, etc. This will prevent that. This is really useful when you have text and snippets that work together to tell the story of some use case because it merges the snippets (and thus the use case) into one big test. * `// TEST[skip:reason]`: Skip this test. Replace `reason` with the actual reason to skip the test. Snippets without `// TEST` or `// CONSOLE` aren't considered tests anyway but this is useful for explicitly documenting the reason why the test shouldn't be run. * `// TEST[setup:name]`: Run some setup code before running the snippet. This is useful for creating and populating indexes used in the snippet. The setup code is defined in `docs/build.gradle`. * `// TEST[warning:some warning]`: Expect the response to include a `Warning` header. If the response doesn't include a `Warning` header with the exact text then the test fails. If the response includes `Warning` headers that aren't expected then the test fails. * `// TESTRESPONSE`: Matches this snippet against the body of the response of the last test. If the response is JSON then order is ignored. If you add `// TEST[continued]` to the snippet after `// TESTRESPONSE` it will continue in the same test, allowing you to interleve requests with responses to check. * `// TESTRESPONSE[s/foo/bar/]`: Substitutions. See `// TEST[s/foo/bar]`. * `// TESTRESPONSE[_cat]`: Add substitutions for testing `_cat` responses. Use this after all other substitutions so it doesn't make other substitutions difficult. * `// TESTSETUP`: Marks this snippet as the "setup" for all other snippets in this file. This is a somewhat natural way of structuring documentation. You say "this is the data we use to explain this feature" then you add the snippet that you mark `// TESTSETUP` and then every snippet will turn into a test that runs the setup snippet first. See the "painless" docs for a file that puts this to good use. This is fairly similar to `// TEST[setup:name]` but rather than the setup defined in `docs/build.gradle` the setup is defined right in the documentation file. Any place you can use json you can use elements like `$body.path.to.thing` which is replaced on the fly with the contents of the thing at `path.to.thing` in the last response.