OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	8d6a41f671	Nested queries should avoid adding unnecessary filters when possible. (#23079 ) When nested objects are present in the mappings, many queries get deoptimized due to the need to exclude documents that are not in the right space. For instance, a filter is applied to all queries that prevents them from matching non-root documents (`+: -_type:__`). Moreover, a filter is applied to all child queries of `nested` queries in order to make sure that the child query only matches child documents (`_type:__nested_path`), which is required by `ToParentBlockJoinQuery` (the Lucene query behing Elasticsearch's `nested` queries). These additional filters slow down `nested` queries. In 1.7-, the cost was somehow amortized by the fact that we cached filters very aggressively. However, this has proven to be a significant source of slow downs since 2.0 for users of `nested` mappings and queries, see #20797. This change makes the filtering a bit smarter. For instance if the query is a `match_all` query, then we need to exclude nested docs. However, if the query is `foo: bar` then it may only match root documents since `foo` is a top-level field, so no additional filtering is required. Another improvement is to use a `FILTER` clause on all types rather than a `MUST_NOT` clause on all nested paths when possible since `FILTER` clauses are more efficient. Here are some examples of queries and how they get rewritten: ``` "match_all": {} ``` This query gets rewritten to `ConstantScore(+:* -_type:__)` on master and `ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})` with this change. The automaton is the complement of `_type:__` so it matches the same documents, but is faster since it is now a positive clause. Simplistic performance testing on a 10M index where each root document has 5 nested documents on average gave a latency of 420ms on master and 90ms with this change applied. ``` "term": { "foo": { "value": "0" } } ``` This query is rewritten to `+foo:0 #(ConstantScore(+: -_type:__))^0.0` on master and `foo:0` with this change: we do not need to filter nested docs out since the query cannot match nested docs. While doing performance testing in the same conditions as above, response times went from 250ms to 50ms. ``` "nested": { "path": "nested", "query": { "term": { "nested.foo": { "value": "0" } } } } ``` This query is rewritten to `+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+:* -_type:__))^0.0` on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The top-level filter (`-_type:__`) could be removed since `nested` queries only match documents of the parent space, as well as the child filter (`#_type:__nested`) since the child query may only match nested docs since the `nested` object has both `include_in_parent` and `include_in_root` set to `false`. While doing performance testing in the same conditions as above, response times went from 850ms to 270ms.	2017-02-14 16:05:19 +01:00
Jim Ferenczi	d791ddf704	Upgrade to lucene-6.4.0-snapshot-ec38570 (#21853 ) Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true Adapt the expectations of some tests to the new format of the Lucene explain output	2016-11-29 18:40:31 +01:00
Adrien Grand	52de0645fb	Remove `lowercase_expanded_terms` and `locale` from query-parser options. (#20208 ) Lucene 6.2 introduces the new `Analyzer.normalize` API, which allows to apply only character-level normalization such as lowercasing or accent folding, which is exactly what is needed to process queries that operate on partial terms such as `prefix`, `wildcard` or `fuzzy` queries. As a consequence, the `lowercase_expanded_terms` option is not necessary anymore. Furthermore, the `locale` option was only needed in order to know how to perform the lowercasing, so this one can be removed as well. Closes #9978	2016-11-02 14:25:08 +01:00
Jim Ferenczi	d0bbe89c16	Optimize query with types filter in the URL (t/t/_search) (#20979 ) This change adds a TypesQuery that checks if the disjunction of types should be rewritten to a MatchAllDocs query. The check is done only if the number of terms is below a threshold (16 by default and configurable via max_boolean_clause).	2016-10-20 12:33:32 +02:00
Nik Everett	2d568ece2d	CONSOLEify some search docs * search/search.asciidoc * search/request-body.asciidoc * search/explain.asciidoc * search/search-shards.asciidoc	2016-09-14 13:06:37 -04:00
Jim Ferenczi	1764ec56b3	Fixed naming inconsistency for fields/stored_fields in the APIs (#20166 ) This change replaces the fields parameter with stored_fields when it makes sense. This is dictated by the renaming we made in #18943 for the search API. The following list of endpoint has been changed to use `stored_fields` instead of `fields`: * get * mget * explain The documentation and the rest API spec has been updated to cope with the changes for the following APIs: * delete_by_query * get * mget * explain The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering): * update: the fields are extracted from the _source directly. * bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields. Some APIs still have the `fields` parameter for various reasons: * cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed. * indices.clear_cache: used to indicate which fielddata fields should be cleared. * indices.get_field_mapping: used to filter fields in the mapping. * indices.stats: get stats on fields (stored or not stored). * termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise. * mtermvectors: * nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all. Fixes #20155	2016-09-13 20:54:41 +02:00
Isabel Drost-Fromm	672ffb6e4d	Revert "Add console to docs for inner hits, explain, and friends"	2016-08-01 14:09:54 +02:00
Isabel Drost-Fromm	2a148a5b25	Update to current format.	2016-07-27 14:30:14 +02:00
Isabel Drost-Fromm	88c5a574a6	Add CONSOLE to explain	2016-05-19 12:17:13 +02:00
Lee Hinman	6aec68cd29	Revert "[QUERY] Remove lowercase_expanded_terms and locale options" This reverts commit `d1f7bd97cb`. Ryan pointed out that this needs to work with the multi term query, so additional analysis and tests should be added.	2015-03-13 13:51:44 -06:00
Lee Hinman	d1f7bd97cb	[QUERY] Remove lowercase_expanded_terms and locale options The analysis chain should be used instead of relying on this, as it is confusing when dealing with different per-field analysers. The `locale` option was only used for `lowercase_expanded_terms`, which, once removed, is no longer needed, so it was removed as well. Fixes #9978 Relates to #9973	2015-03-13 13:17:27 -06:00
Clinton Gormley	cb00d4a542	Docs: Removed all the added/deprecated tags from 1.x	2014-09-26 21:04:42 +02:00
Luca Cavanna	4126ae2631	[DOCS] updated json responses after #4310 and #4480 - Removed "ok": true from response examples - Added "created" flag to index response examples - Replaced exists flag with found in delete response examples	2014-01-16 12:01:39 +01:00
Boaz Leskes	c63d8c4fb5	[Docs] Added _source filtering to documentation Relates to #3301	2013-11-26 19:16:24 +01:00
Clinton Gormley	9a062e465c	[DOCS] Reorganised common API conventions	2013-10-13 16:46:56 +02:00
Clinton Gormley	393c28bee4	[DOCS] Removed outdated new/deprecated version notices	2013-09-03 21:28:31 +02:00
Clinton Gormley	822043347e	Migrated documentation into the main repo	2013-08-29 01:24:34 +02:00

17 Commits