OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	8d6a41f671	Nested queries should avoid adding unnecessary filters when possible. (#23079 ) When nested objects are present in the mappings, many queries get deoptimized due to the need to exclude documents that are not in the right space. For instance, a filter is applied to all queries that prevents them from matching non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all child queries of `nested` queries in order to make sure that the child query only matches child documents (`_type:__nested_path`), which is required by `ToParentBlockJoinQuery` (the Lucene query behind Elasticsearch's `nested` queries). These additional filters slow down `nested` queries. In 1.7-, the cost was somehow amortized by the fact that we cached filters very aggressively. However, this has proven to be a significant source of slowdowns since 2.0 for users of `nested` mappings and queries, see #20797. This change makes the filtering a bit smarter. For instance if the query is a `match_all` query, then we need to exclude nested docs. However, if the query is `foo: bar` then it may only match root documents since `foo` is a top-level field, so no additional filtering is required. Another improvement is to use a `FILTER` clause on all types rather than a `MUST_NOT` clause on all nested paths when possible since `FILTER` clauses are more efficient. Here are some examples of queries and how they get rewritten: ``` "match_all": {} ``` This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and `ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})` with this change. The automaton is the complement of `_type:__*` so it matches the same documents, but is faster since it is now a positive clause. Simplistic performance testing on a 10M index where each root document has 5 nested documents on average gave a latency of 420ms on master and 90ms with this change applied. ``` "term": { "foo": { "value": "0" } } ``` This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on master and `foo:0` with this change: we do not need to filter nested docs out since the query cannot match nested docs. While doing performance testing in the same conditions as above, response times went from 250ms to 50ms. ``` "nested": { "path": "nested", "query": { "term": { "nested.foo": { "value": "0" } } } } ``` This query is rewritten to `+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0` on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The top-level filter (`-_type:__*`) could be removed since `nested` queries only match documents of the parent space, as well as the child filter (`#_type:__nested`) since the child query may only match nested docs since the `nested` object has both `include_in_parent` and `include_in_root` set to `false`. While doing performance testing in the same conditions as above, response times went from 850ms to 270ms.	2017-02-14 16:05:19 +01:00
Boaz Leskes	70a3ac1767	Add a note about `cluster.routing.allocation.node_concurrent_recoveries` (#23160 ) Closes #23152	2017-02-14 14:14:41 +02:00
Loek van Gool	214a3536aa	Update redirects.asciidoc (#23148 )	2017-02-13 16:23:25 +01:00
Giuseppe	ecbeffcb1e	Add note about min_score filtering efficiency (#23109 ) * Add note about min_score filtering efficiency * Reword to mention 'HAVING' * Remove reference to HAVING	2017-02-13 12:15:01 +01:00
Adrien Grand	f3509b8003	Consolify docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc. (#23050 )	2017-02-13 11:00:12 +01:00
Ryan Ernst	c91848e6a7	Docs: Consoleify cluster and indices settings docs (#23030 ) relates #23001	2017-02-10 14:57:43 -08:00
Tanguy Leroux	e2e5937455	Use `typed_keys` parameter to prefix suggester names by type in search responses (#23080 ) This pull request reuses the typed_keys parameter added in #22965, but this time it applies it to suggesters. When set to true, the suggester names in the search response will be prefixed with a prefix that reflects their type.	2017-02-10 10:53:38 +01:00
Tanguy Leroux	63ea6f7168	[Docs] Remove unnecessary // TEST[continued] in search-template doc It has been explained in `e39b96f257`	2017-02-10 10:08:24 +01:00
Clinton Gormley	d43417ef47	Docs: Deleted redundant word in scripting	2017-02-09 22:02:42 +01:00
Jonathan D Strootman	cb35b3785a	Adding `ansible-elasticsearch` to list of CM tools (#23058 )	2017-02-09 21:14:30 +01:00
Clinton Gormley	78d3028bb7	Fixed bad asciidoc in delete-by-query	2017-02-09 20:14:56 +01:00
Jim Ferenczi	94087b3274	Removes ExpandCollapseSearchResponseListener, search response listeners and blocking calls This change removes the SearchResponseListener that was used by the ExpandCollapseSearchResponseListener to expand collapsed hits. The removal of SearchResponseListener is not a breaking change because it was never released. This change also replaces the blocking call in ExpandCollapseSearchResponseListener with a single asynchronous multi search request. The parallelism of the expand request can be set via CollapseBuilder#max_concurrent_group_searches Closes #23048	2017-02-09 18:06:10 +01:00
Tanguy Leroux	3553522328	Add parameter to prefix aggs name with type in search responses (#22965 ) This pull request adds a new parameter to the REST Search API named `typed_keys`. When set to true, the aggregation names in the search response will be prefixed with a prefix that reflects the internal type of the aggregation. Here is a simple example: ``` GET /_search?typed_keys { "aggs": { "tweets_per_user": { "terms": { "field": "user" } } }, "size": 0 } ``` And the response: ``` { "aggs": { "sterms:tweets_per_user": { ... } } } ``` This parameter is intended to make life easier for REST clients that could parse back the prefix and could detect the type of the aggregation to parse. It could also be implemented for suggesters.	2017-02-09 11:19:04 +01:00
Tanguy Leroux	832952cb29	[Docs] Fix consoleify search-template.asciidoc It does not reproduce well, hopefully this will fix the failure on DELETE _search/template/<templatename>.	2017-02-08 21:23:38 +01:00
Igor Motov	1fc4fa5729	Docs: CONSOLEify native script docs Relates #23001	2017-02-08 13:30:39 -05:00
Jay Modi	7f3769c745	Remove ldjson support and document ndjson for bulk/msearch (#23049 ) This commit removes support for the `application/x-ldjson` Content-Type header as this was only used in the first draft of the spec and had very little uptake. Additionally, the docs for bulk and msearch have been updated to specifically call out ndjson and mention that the newline character may be preceded by a carriage return. Finally, the bulk request handling of the carriage return has been improved to remove this character from the source. Closes #23025	2017-02-08 11:55:50 -05:00
Clinton Gormley	40f40d7676	Docs: Fix termvectors by removing example blocks with embedded CONSOLE tests	2017-02-08 17:12:40 +01:00
Tanguy Leroux	477d1aa8bf	[Docs] Consoleify multi-search and search-template docs (#23047 ) Relates #23001	2017-02-08 17:05:22 +01:00
Christoph Büscher	e177d2ca40	Docs: CONSOLEify termvectors.asciidoc (#23046 )	2017-02-08 16:06:11 +01:00
Daniel Mitterdorfer	88c5627a1b	CONSOLEify multi-termvectors docs Relates #23001	2017-02-08 11:47:42 +01:00
Yannick Welsch	9154686623	Remove legacy primary shard allocation mode based on versions (#23016 ) Elasticsearch v5.0.0 uses allocation IDs to safely allocate primary shards whereas prior versions of ES used a version-based mode instead. Elasticsearch v5 still has support for version-based primary shard allocation as it needs to be able to load 2.x shards. ES v6 can drop the legacy support.	2017-02-08 10:00:55 +01:00
Lee Hinman	b3c27a7fdd	Disallow include_in_all for 6.0+ indices Since `_all` is now deprecated and cannot be set for new indices, we should also disallow any field that has the `include_in_all` parameter set. Resolves #22923	2017-02-07 19:31:51 -07:00
Tim Brooks	ad4bfa2307	Docs: CONSOLEify transport docs (#23027 ) This is related to #23001.	2017-02-07 20:06:28 -06:00
Jordan Robinson	693b0017af	Small typo fix in Windows service documentation This commit removes a duplicate definite article in the Windows service documentation. Relates #23028	2017-02-07 17:25:46 -05:00
Nik Everett	0e98c9107a	Docs: CONSOLEify some more docs These need to be CONSOLEified now because we're starting to require Content-Type headers and they didn't have any. * cluster/reroute: Marked as CONSOLE but skipped because the docs build runs with a single node. * docs/bulk: Marked as NOTCONSOLE because the snippets describe either examples or `curl` commands. Fixed the `curl` command to include the `Content-Type` header. * query-dsl/terms-query: Marked as CONSOLE. * search/request/rescore: Marked as CONSOLE. Fixed deprecated syntax. Relates #23001 Relates #18160	2017-02-07 16:49:01 -05:00
Nik Everett	0c011cb290	Docs: CONSOLEify histogram aggregation docs This adds the `COPY AS CURL` and `VIEW IN CONSOLE` links to the docs and causes the snippets to be tested during Elasticsearch's build. Relates to #18160	2017-02-07 16:09:32 -05:00
Nik Everett	245aa0404a	Docs: CONSOLEify sum aggregation docs This adds the `COPY AS CURL` and `VIEW IN CONSOLE` buttons to the docs and makes the build execute the snippets as part of `docs:check`. Relates to #18160	2017-02-07 14:18:54 -05:00
Jim Ferenczi	bbf62e3472	CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001	2017-02-07 20:15:09 +01:00
Nik Everett	274ee30d34	Docs: CONSOLEify the avg aggregation docs This creates the `COPY AS CURL` and `VIEW IN CONSOLE` buttons and makes the build test the examples. Relates to #18160	2017-02-07 13:48:27 -05:00
Nik Everett	a2ed676862	Docs: Explain painless's method dispatch (#23021 ) Painless uses Ruby-like method dispatch (receiver type, method name, and arity) rather than Java-like (receiver type, method name, and argument compile time types) or Groovy-like method dispatch (receiver type, method name, and argument run time types). We do this for mostly good reasons but we never documented it. Relates to #22720	2017-02-07 12:09:22 -05:00
Clinton Gormley	f5e7c25e24	Update normalizers.asciidoc analyzers -> normalizers	2017-02-07 12:09:39 +01:00
Simon Willnauer	dc659feeb4	Add a setting to disable remote cluster connections on a node (#23005 ) Today either all nodes in the cluster connect to remote clusters or only nodes that have remote clusters configured in their node config. To allow global remote cluster configuration but restrict connections to a set of nodes in the cluster, this change adds a new setting `search.remote.connect` (defaults to `true`) that allows disabling remote cluster connections on a per-node basis.	2017-02-07 09:59:24 +01:00
Nik Everett	0d6e622242	Make dates be ReadableDateTimes in scripts (#22948 ) Instead of longs. If you want millis since epoch you can call doc.date_field.value.millis. Relates to #22875	2017-02-06 16:44:56 -05:00
Nicholas Knize	bc884c1e7b	[Docs] Remove ignore_malformed from Geo Query DSL docs This commit removes the ignore_malformed parameter from the Geo Query DSL documentation.	2017-02-06 14:27:15 -06:00
javanna	b9cf6333bd	[TEST] fix typo in cross cluster search docs	2017-02-05 15:56:45 +01:00
Clinton Gormley	e181a020a9	Replaced absolute URLs in docs with attributes	2017-02-04 12:05:03 +01:00
Clinton Gormley	c1be26f2e1	Centralised doc versions in docs/Versions.asciidoc	2017-02-04 11:16:19 +01:00
Lee Hinman	39e7c30912	Change certain replica failures not to fail the replica shard This changes the way that replica failures are handled such that not all failures will cause the replica shard to be failed or marked as stale. In some cases such as refresh operations, or global checkpoint syncs, it is "okay" for the operation to fail without the shard being failed (because no data is out of sync). In these cases, instead of failing the shard we should simply fail the operation, and, in the event it is a user-facing operation, return a 5xx response code including the shard-specific failures. This was accomplished by having two forms of the `Replicas` proxy, one that is for non-write operations that does not fail the shard, and one that is for write operations that will fail the shard when an operation fails. Relates to #10708	2017-02-03 14:39:46 -07:00
Nik Everett	70e3cce904	Fix name of `enable_position_increments` (#22895 ) It was accidentally renamed `enabled_position_increment` in the cleanups for 5.0. This adds `enable_position_increment` as a deprecated alias so it will continue to work.	2017-02-03 16:28:27 -05:00
Nicholas Knize	b1a6b227e1	Remove deprecated geo query parameters, and GeoPointDistanceRangeQuery This commit removes the following queries and parameters (which were deprecated in 5.0): * GeoPointDistanceRangeQuery * coerce, and ignore_malformed for GeoBoundingBoxQuery, GeoDistanceQuery, GeoPolygonQuery, and GeoDistanceSort	2017-02-03 10:08:00 -06:00
Nicholas Knize	f1e1975882	[DOCS] Add sloppy_arc and factor removal to 6.0 migration docs	2017-02-03 09:49:12 -06:00
AlexNodex	fb8bdbc57a	Update typo in date (#22955 ) your example has yyy and it should be yyyy	2017-02-03 13:16:17 +01:00
Clinton Gormley	8ace37e214	Fix asciidoc in stored fields	2017-02-03 10:18:01 +01:00
Jim Ferenczi	4876448e39	Consilify get-field-mapping docs (#22936 ) This change also removes the reference to the difference between full name and index name. They are always the same since 2.x and `name` no longer refers to `author.name` automatically. A simple pattern must be used instead. Remove redundant code that checks the field name twice.	2017-02-03 10:04:31 +01:00
Jun Ohtani	7ea457955d	Merge pull request #22879 from johtani/fix_documentation_error_in_date_histogram [Doc]Not support "M" time unit in offset param	2017-02-03 16:40:08 +09:00
Jay Modi	7520a107be	Optionally require a valid content type for all rest requests with content (#22691 ) This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated. In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header. See #19388	2017-02-02 14:07:13 -05:00
Nicholas Knize	b41d5747f0	Reduce GeoDistance insanity GeoDistance query, sort, and scripts make use of a crazy GeoDistance enum for handling 4 different ways of computing geo distance: SLOPPY_ARC, ARC, FACTOR, and PLANE. Only two of these are necessary: ARC, PLANE. This commit removes SLOPPY_ARC, and FACTOR and cleans up the way Geo distance is computed.	2017-02-02 12:39:42 -06:00
Nik Everett	dacc150934	Expose multi-valued dates to scripts and document painless's date functions (#22875 ) Implemented by wrapping an array of reused `ModuleDateTime`s that we grow when needed. The `ModuleDateTime`s are reused when we move to the next document. Also improves the error message returned when attempting to modify the `ScriptdocValues`, removes a couple of allocations, and documents that the date functions are available in Painless. Relates to #22162	2017-02-01 21:57:07 -05:00
Jack Conradson	3d2626c4c6	Change Namespace for Stored Script to Only Use Id (#22206 ) Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings. The new behavior is the following: When a user specifies a stored script, that script will be stored under both the new namespace and old namespace. Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0]. When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace. Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified. When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'	2017-01-31 13:27:02 -08:00
Jim Ferenczi	f6d38d480a	Integrate UnifiedHighlighter (#21621 ) * Integrate UnifiedHighlighter This change integrates the Lucene highlighter called "unified" in the list of supported highlighters for ES. This highlighter can extract offsets from either postings, term vectors, or via re-analyzing text. The best strategy is picked automatically at query time and depends on the field and the query to highlight.	2017-01-31 19:06:03 +01:00
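
The stored-script namespace behavior described in commit 3d2626c4c6 can be illustrated with a small Python model. This is only a sketch of the scripts map as the commit message describes it, not the Elasticsearch implementation: `put_script` and `delete_script` are hypothetical names, and the handling of a delete that specifies a `lang` matching the current id-only entry is an assumption.

```python
from typing import Dict, Optional, Tuple

# Keys are "id" (new namespace) or "id#lang" (deprecated namespace).
# Values are (lang, data) pairs so that the id-only entry remembers its lang.
Scripts = Dict[str, Tuple[str, str]]


def put_script(scripts: Scripts, script_id: str, lang: str, data: str) -> None:
    """Store the script under both the new and the old namespace."""
    scripts[script_id] = (lang, data)
    scripts[f"{script_id}#{lang}"] = (lang, data)


def delete_script(scripts: Scripts, script_id: str, lang: Optional[str] = None) -> None:
    """Delete by id only (new namespace) or by id and lang (old namespace)."""
    if lang is None:
        # Remove the id-only entry and the old-namespace entry for its lang.
        current = scripts.pop(script_id, None)
        if current is not None:
            scripts.pop(f"{script_id}#{current[0]}", None)
    else:
        scripts.pop(f"{script_id}#{lang}", None)
        # Assumption: the id-only entry is also removed when it holds this lang.
        current = scripts.get(script_id)
        if current is not None and current[0] == lang:
            del scripts[script_id]


scripts: Scripts = {}
put_script(scripts, "A", "L0", "D0")   # keys: A, A#L0
put_script(scripts, "A", "L1", "D1")   # keys: A (now D1), A#L1, A#L0
delete_script(scripts, "A")            # only A#L0 is left
delete_script(scripts, "A", "L0")      # the deprecated namespace removes the last entry
```

The walk-through at the bottom follows the example in the commit message: after the id-only delete, the `A#L0` entry survives and has to be removed through the deprecated (id, lang) form.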


3460 Commits
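
Commit 7f3769c745 notes that in ndjson bodies for bulk and msearch, the newline character may be preceded by a carriage return, which is removed from the source. A minimal Python sketch of that line handling follows; `parse_ndjson` is a hypothetical helper written for illustration, not part of any Elasticsearch client, and the sample body is made up.

```python
import json
from typing import Any, List


def parse_ndjson(body: str) -> List[Any]:
    """Parse newline-delimited JSON, tolerating CRLF line endings."""
    docs = []
    for line in body.split("\n"):
        # The newline may be preceded by a carriage return; drop it.
        line = line.rstrip("\r")
        if line:  # skip the empty piece after the final newline
            docs.append(json.loads(line))
    return docs


# Hypothetical bulk-style body mixing CRLF and LF line endings.
body = '{"index": {"_id": "1"}}\r\n{"field": "value"}\n'
print(parse_ndjson(body))
```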