OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	8d6a41f671	Nested queries should avoid adding unnecessary filters when possible. (#23079 ) When nested objects are present in the mappings, many queries get deoptimized due to the need to exclude documents that are not in the right space. For instance, a filter is applied to all queries that prevents them from matching non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all child queries of `nested` queries in order to make sure that the child query only matches child documents (`_type:__nested_path`), which is required by `ToParentBlockJoinQuery` (the Lucene query behind Elasticsearch's `nested` queries). These additional filters slow down `nested` queries. In 1.7-, the cost was somehow amortized by the fact that we cached filters very aggressively. However, this has proven to be a significant source of slowdowns since 2.0 for users of `nested` mappings and queries, see #20797. This change makes the filtering a bit smarter. For instance if the query is a `match_all` query, then we need to exclude nested docs. However, if the query is `foo: bar` then it may only match root documents since `foo` is a top-level field, so no additional filtering is required. Another improvement is to use a `FILTER` clause on all types rather than a `MUST_NOT` clause on all nested paths when possible since `FILTER` clauses are more efficient. Here are some examples of queries and how they get rewritten: ``` "match_all": {} ``` This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and `ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})` with this change. The automaton is the complement of `_type:__*` so it matches the same documents, but is faster since it is now a positive clause. Simplistic performance testing on a 10M index where each root document has 5 nested documents on average gave a latency of 420ms on master and 90ms with this change applied. ``` "term": { "foo": { "value": "0" } } ``` This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on master and `foo:0` with this change: we do not need to filter nested docs out since the query cannot match nested docs. While doing performance testing in the same conditions as above, response times went from 250ms to 50ms. ``` "nested": { "path": "nested", "query": { "term": { "nested.foo": { "value": "0" } } } } ``` This query is rewritten to `+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0` on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The top-level filter (`-_type:__*`) could be removed since `nested` queries only match documents of the parent space. The child filter (`#_type:__nested`) could be removed as well, because the child query may only match nested docs given that the `nested` object has both `include_in_parent` and `include_in_root` set to `false`. While doing performance testing in the same conditions as above, response times went from 850ms to 270ms.	2017-02-14 16:05:19 +01:00
Tanguy Leroux	e2e5937455	Use `typed_keys` parameter to prefix suggester names by type in search responses (#23080 ) This pull request reuses the typed_keys parameter added in #22965, but this time it applies it to suggesters. When set to true, the suggester names in the search response will be prefixed with a prefix that reflects their type.	2017-02-10 10:53:38 +01:00
Tanguy Leroux	63ea6f7168	[Docs] Remove unnecessary // TEST[continued] in search-template doc It has been explained in `e39b96f257`	2017-02-10 10:08:24 +01:00
Jim Ferenczi	94087b3274	Removes ExpandCollapseSearchResponseListener, search response listeners and blocking calls This change removes the SearchResponseListener that was used by the ExpandCollapseSearchResponseListener to expand collapsed hits. The removal of SearchResponseListener is not a breaking change because it was never released. This change also replaces the blocking call in ExpandCollapseSearchResponseListener with a single asynchronous multi search request. The parallelism of the expand request can be set via CollapseBuilder#max_concurrent_group_searches Closes #23048	2017-02-09 18:06:10 +01:00
Tanguy Leroux	832952cb29	[Docs] Fix consoleify search-template.asciidoc It does not reproduce well, hopefully this will fix the failure on DELETE _search/template/<templatename>.	2017-02-08 21:23:38 +01:00
Jay Modi	7f3769c745	Remove ldjson support and document ndjson for bulk/msearch (#23049 ) This commit removes support for the `application/x-ldjson` Content-Type header as this was only used in the first draft of the spec and had very little uptake. Additionally, the docs for bulk and msearch have been updated to specifically call out ndjson and mention that the newline character may be preceded by a carriage return. Finally, the bulk request handling of the carriage return has been improved to remove this character from the source. Closes #23025	2017-02-08 11:55:50 -05:00
Tanguy Leroux	477d1aa8bf	[Docs] Consoleify multi-search and search-template docs (#23047 ) Relates #23001	2017-02-08 17:05:22 +01:00
Nik Everett	0e98c9107a	Docs: CONSOLEify some more docs These need to be CONSOLEified now because we're starting to require Content-Type headers and they didn't have any. * cluster/reroute: Marked as CONSOLE but skipped because the docs build runs with a single node. * docs/bulk: Marked as NOTCONSOLE because the snippets describe either examples or `curl` commands. Fixed the `curl` command to include the `Content-Type` header. * query-dsl/terms-query: Marked as CONSOLE. * search/request/rescore: Marked as CONSOLE. Fixed deprecated syntax. Relates #23001 Relates #18160	2017-02-07 16:49:01 -05:00
Jim Ferenczi	bbf62e3472	CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001	2017-02-07 20:15:09 +01:00
Clinton Gormley	e181a020a9	Replaced absolute URLs in docs with attributes	2017-02-04 12:05:03 +01:00
Clinton Gormley	8ace37e214	Fix asciidoc in stored fields	2017-02-03 10:18:01 +01:00
Nicholas Knize	b41d5747f0	Reduce GeoDistance insanity GeoDistance query, sort, and scripts make use of a crazy GeoDistance enum for handling 4 different ways of computing geo distance: SLOPPY_ARC, ARC, FACTOR, and PLANE. Only two of these are necessary: ARC, PLANE. This commit removes SLOPPY_ARC, and FACTOR and cleans up the way Geo distance is computed.	2017-02-02 12:39:42 -06:00
Jim Ferenczi	f6d38d480a	Integrate UnifiedHighlighter (#21621 ) * Integrate UnifiedHighlighter This change integrates the Lucene highlighter called "unified" in the list of supported highlighters for ES. This highlighter can extract offsets from either postings, term vectors, or via re-analyzing text. The best strategy is picked automatically at query time and depends on the field and the query to highlight.	2017-01-31 19:06:03 +01:00
Jim Ferenczi	e7e871acdd	Fix link to keyword and numeric type	2017-01-30 13:57:28 +01:00
Clinton Gormley	938f5194ef	Include field-collapsing docs in request-body search	2017-01-30 11:47:12 +01:00
Jim Ferenczi	e48bc2eed7	Add field collapsing for search request (#22337 ) * Add top hits collapsing to search request The field collapsing is done with a custom top docs collector that "collapses" search hits with the same field value. The distributed aspect is resolved using the two passes that the regular search uses. The first pass "collapses" the top hits, then the coordinating node merges/collapses the top hits from each shard. ``` GET _search { "collapse": { "field": "category" } } ``` This change also adds an ExpandCollapseSearchResponseListener that intercepts the search response and expands collapsed hits using the CollapseBuilder#innerHit options. The retrieval of each inner_hits is done by sending a query to all shards filtered by the collapse key. ``` GET _search { "collapse": { "field": "category", "inner_hits": { "size": 2 } } } ```	2017-01-23 16:33:51 +01:00
Christoph Büscher	9ed867ea83	[DOCS] Fix inconsistent formatting for fieldnames in profile.asciidoc	2017-01-18 10:41:22 +01:00
Clinton Gormley	401438819e	Docs: Fix the first highlighting example to work Closes #22642	2017-01-17 12:20:03 +01:00
Christoph Büscher	2791c69960	Update profile.asciidoc Making the "Human readable output" section a note instead of an own section.	2017-01-16 16:19:07 +01:00
Christoph Büscher	49a49da3f5	[Docs] Fix section title in profile.asciidoc	2017-01-16 14:53:06 +01:00
Christoph Büscher	59a48ffc41	ProfileResult and CollectorResult should print machine readable timing information (#22561 ) Currently both ProfileResult and CollectorResult print the time field in a human readable string format (e.g. "time": "55.20315000ms"). When trying to parse this back to a long value, for example to use in the planned high level java rest client, we can lose precision because of conversion and rounding issues. This change adds a new additional field (`time_in_nanos`) to the profile response to be able to get the original time value in nanoseconds back. The old `time` field is only printed when the `?human=true` flag in the url is set. This follows the behaviour of all other stats-related apis. Also the format of the `time` field is slightly changed. Instead of always formatting the output as a 10-digit ms value, by using the `XContentBuilder#timeValueField()` method we now print the value using the largest time unit present (e.g. "s", "ms", "micros").	2017-01-16 14:27:55 +01:00
maciejkula	b4c8c21553	State default sort order on missing values Closes #19099	2017-01-13 17:05:13 +01:00
Lee Hinman	2db01b6127	Merge remote-tracking branch 'dakrone/disable-all-by-default'	2017-01-12 10:17:51 -07:00
Tanguy Leroux	df703dce0a	[DOC] Document {{url}} mustache function (#22549 ) This function introduced in #20838 wasn't documented at all. Related to #22459	2017-01-12 14:57:03 +01:00
Lee Hinman	7a18bb50fc	Disable _all by default This change disables the _all meta field by default. Now that we have the "all-fields" method of query execution, we can save both indexing time and disk space by disabling it. _all can no longer be configured for indices created after 6.0. Relates to #20925 and #21341 Resolves #19784	2017-01-11 16:47:13 -07:00
Martijn van Groningen	cb2333dacd	percolator: remove deprecated percolate and mpercolate apis	2017-01-10 11:18:27 +01:00
Masaru Hasegawa	a0185c83a7	Merge pull request #21393 from masaruh/alias_boost Resolve index names in indices_boost	2016-12-16 15:07:51 +09:00
Areek Zillur	cdd5fbe3a1	Deprecate _suggest endpoint in favour of _search (#20305 ) * Replace _suggest endpoint to _search in docs In 5.0, the _suggest endpoint is just sugar for _search with suggestions specified. Users should move away from using the _suggest endpoint, as it is marked as deprecated in 5.x and will be removed in 6.0 * update docs to use _search endpoint instead of _suggest * Add deprecation logging to RestSuggestAction * Use search endpoint instead of suggest endpoint in rest tests	2016-12-14 21:49:53 -05:00
Masaru Hasegawa	3df2a086d4	Resolve index names in indices_boost This change allows specifying alias/wildcard expression in indices_boost. And added another format for specifying indices_boost. It accepts array of index name and boost pair. If an index is included in multiple aliases/wildcard expressions, the first match will be used. With new format, old format is marked as deprecated. Closes #4756	2016-12-11 21:41:49 +09:00
Jim Ferenczi	b42ca6bcc9	Include unindexed field in FieldStats response (#21821 ) * Include unindexed field in FieldStats response This change adds non-searchable fields to the FieldStats response. These fields do not have min/max information but they can be aggregatable. Fields that are only stored in _source (store:no, index:no, doc_values:no) will still be missing since they do not have any useful information to show. Indices and clients must be at least on V_5_2_0 to see this change.	2016-12-06 13:32:57 +01:00
Jim Ferenczi	d791ddf704	Upgrade to lucene-6.4.0-snapshot-ec38570 (#21853 ) Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true Adapt the expectations of some tests to the new format of the Lucene explain output	2016-11-29 18:40:31 +01:00
Adrin Jalali	eec05ec208	then -> than (#21829 )	2016-11-28 17:04:56 +01:00
Adrin Jalali	953928b2c5	typo fix (it self -> itself) (#21781 ) * typo fix. * apply "stored field value" * replaced "whereas" with "on the contrary"	2016-11-24 17:11:43 +01:00
Adrin Jalali	0871073f9b	clarification on geo distance sorting (#21779 ) * clarification on geo distance sorting * applying the suggested change	2016-11-24 16:06:10 +01:00
Luca Cavanna	db5a72774b	Add indices and filter information to search shards api output (#21738 ) Add indices and filter information to search shards api output The search shards api returns info about which shards are going to be hit by executing a search with provided parameters: indices, routing, preference. Indices can also be aliases, which can also hold filters. The output includes an array of shards and a summary of all the nodes the shards are allocated on. This commit adds a new indices section to the search shards output that includes one entry per index, where each index can be associated with an optional filter in case the index was hit through a filtered alias. This is relevant since we have moved parsing of alias filters to the coordinating node. Relates to #20916	2016-11-22 23:00:25 +01:00
Luca Cavanna	db8b2dceea	Remove ignored type parameter in search_shards api (#21688 ) The `type` parameter has always been accepted by the search_shards api, probably to make the api and its urls the same as search. Truth is that the type never had any effect, it's been ignored from day one while accepting it may make users think that we actually do something with it. This commit removes support for the type parameter from the REST layer and the Java API. Backwards compatibility is maintained on the transport layer though. The new added serialization test also uncovered a bug in the java API where the `ClusterSearchShardsRequest` could be created with no arguments, but the indices were required to be not null otherwise the request couldn't be serialized as `writeTo` would throw NPE. Fixed by setting a default value (empty array) for indices.	2016-11-22 17:22:33 +01:00
Lee Hinman	11da09e9bc	Allow overriding all-field leniency when `lenient` option is specified As part of #20925 and #21341 we added an "all-fields" mode to the `query_string` and `simple_query_string`. This would expand the query to all fields and automatically set `lenient` to true. However, we should still allow a user to override the `lenient` flag to whichever value they desire, should they add it in the request. This commit does that.	2016-11-21 21:32:25 -07:00
Luca Wintergerst	277f4b8d24	fix two errors in suggester docs The first change referred to an example of the 2.4 documentation. I removed the no longer relevant parts. We should consider adding a little more here. The second change was just then->than in the suggest_mode popular section	2016-11-18 12:05:49 +01:00
Nik Everett	593d47efe2	Make it clear _suggest doesn't support source filtering (#21268 ) We plan to deprecate `_suggest` during 5.0 so it isn't worth fixing it to support the `_source` parameter for `_source` filtering. But we should fix the docs so they are accurate. Since this removes the last non-`// CONSOLE` line in `completion-suggest.asciidoc` this also removes it from the list of files that have non-`// CONSOLE` docs. Closes #20482	2016-11-06 20:15:45 -05:00
Adrien Grand	52de0645fb	Remove `lowercase_expanded_terms` and `locale` from query-parser options. (#20208 ) Lucene 6.2 introduces the new `Analyzer.normalize` API, which allows to apply only character-level normalization such as lowercasing or accent folding, which is exactly what is needed to process queries that operate on partial terms such as `prefix`, `wildcard` or `fuzzy` queries. As a consequence, the `lowercase_expanded_terms` option is not necessary anymore. Furthermore, the `locale` option was only needed in order to know how to perform the lowercasing, so this one can be removed as well. Closes #9978	2016-11-02 14:25:08 +01:00
Craig Squire	1f1daf59bc	Documentation updates for scroll API size parameter (#21229 ) * Document size parameter for scroll API * Fix size parameter behavior description for scroll	2016-11-01 15:55:09 -04:00
Igor Motov	17ad88d539	Makes search action cancelable by task management API Long running searches now can be cancelled using standard task cancellation mechanism.	2016-10-25 12:27:34 -10:00
Jim Ferenczi	d0bbe89c16	Optimize query with types filter in the URL (t/t/_search) (#20979 ) This change adds a TypesQuery that checks if the disjunction of types should be rewritten to a MatchAllDocs query. The check is done only if the number of terms is below a threshold (16 by default and configurable via max_boolean_clause).	2016-10-20 12:33:32 +02:00
Joshua Rich	cdb156e691	Merge pull request #20794 from joshuar/doc/fix_highlighter_ambiguities [DOCS] Use a better name for fields in examples to avoid ambiguity	2016-10-18 14:23:27 +11:00
Adrien Grand	7a403f640b	Clarify some docs about geo-distance sorting. (#20735 ) This also improves formatting a bit.	2016-10-07 15:26:34 +02:00
Jason Tedor	d01a62908a	Change separator for shards preference The shards preference on a search request enables specifying a list of shards to hit, and then a secondary preference (e.g., "_primary") can be added. Today, the separator between the shards list and the secondary preference is ';'. Unfortunately, this is also a valid separator for URL query parameters. This means that a preference like "_shards:0;_primary" will be parsed into two URL parameters: "_shards:0" and "_primary". With the recent change to strict URL parsing, the second parameter will be rejected, since "_primary" is not a valid URL parameter on a search request. This means that this feature has never worked (unless the ';' is escaped, but no one does that because our docs do not do that, and there was no indication from Elasticsearch that this did not work). This commit changes the separator to '|'. Relates #20786	2016-10-07 07:17:01 -05:00
Joshua Rich	e06a40ccbd	[DOCS] Use a better name for fields in examples to avoid ambiguity Previously, this doc was using a field called "content". This is confusing, especially when the doc starts talking about the content of the content field. This change makes the field name "comment" which is less ambiguous and also changes some related field names in the doc to make a consistent example theme of editing docs around blog posts.	2016-10-07 14:46:55 +11:00
Nik Everett	41d6529d06	CONSOLEify scroll docs This causes the snippets to be tested during the build and gives helpful links to the reader to open the docs in console or copy them as curl commands. Relates to #18160	2016-10-05 11:21:54 -04:00
Simon Willnauer	74184cb1b0	Stabilize tests in phrase-suggest.asciidoc	2016-09-29 11:13:17 +02:00
Nik Everett	560fba1b28	Document that sliced scroll works for reindex Surprise! You can use sliced scroll to easily parallelize reindex and friends. They support it because they use the same infrastructure as a regular search to parse the search request. While we would like to make an "automatic" option for parallelizing reindex, this manual option works right now and is pretty convenient!	2016-09-26 05:27:44 +02:00
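
The manual slicing described in the last entry (560fba1b28) can be sketched as follows. This is a minimal illustration, not code from the commit: the helper name `sliced_reindex_bodies` and the index names are hypothetical. It assumes the `slice` object (`id`, `max`) is placed under `source` in each reindex request body, as in a sliced scroll search, and it only builds the bodies. Sending each one to `_reindex` in parallel is left to the client.

```python
import json


def sliced_reindex_bodies(source_index, dest_index, num_slices):
    """Build one reindex request body per slice (hypothetical helper).

    Each body carries a different slice id, so the requests cover
    disjoint parts of the source index and can be sent in parallel.
    """
    if num_slices < 2:
        # With a single slice there is nothing to parallelize;
        # a plain reindex request does the same job.
        raise ValueError("num_slices must be at least 2")
    return [
        {
            "source": {
                "index": source_index,
                # Assumed placement: same `slice` object as in sliced scroll.
                "slice": {"id": slice_id, "max": num_slices},
            },
            "dest": {"index": dest_index},
        }
        for slice_id in range(num_slices)
    ]


if __name__ == "__main__":
    # Index names are placeholders for illustration only.
    for body in sliced_reindex_bodies("old_index", "new_index", 2):
        print(json.dumps(body))
```

Each printed body would be the payload of a separate `POST _reindex` request.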

591 Commits