OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	e81804cfa4	Add a shard filter search phase to pre-filter shards based on query rewriting (#25658 ) Today if we search across a large amount of shards we hit every shard. Yet, it's quite common to search across an index pattern for time based indices but filtering will exclude all results outside a certain time range ie. `now-3d`. While the search can potentially hit hundreds of shards the majority of the shards might yield 0 results since there is no document that is within this date range. Kibana for instance does this regularly but used `_field_stats` to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and its upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results. This change adds a pre-filter phase for searches that can, if the number of shards is higher than the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.	2017-07-12 22:19:20 +02:00
James Baiera	847378a43b	Add another parent value option to join documentation (#25609 ) Indexing a join field on a document requires a value of type "object" and two sub fields "name" and "parent". The "parent" field is only required on child documents, but the "name" field which denotes the name of the relation is always needed. Previously, only the short-hand version of the join field was documented. This adds documentation for the long-hand join field data, and explicitly points out that just specifying the name of the relation for the field value is a convenience shortcut.	2017-07-11 15:36:59 -04:00
Martijn van Groningen	d0f9f425bd	parent/child: Removed ParentJoinFieldSubFetchPhase	2017-07-06 13:15:02 +02:00
Adrien Grand	26de905f1e	Fix the documentation to state that the `_id` field is indexed. (#25540 )	2017-07-05 16:09:31 +02:00
Clinton Gormley	0170e0e8d3	Remove usage of multi-types from the docs and added a page explaining type removal (#25543 ) Closes #25401	2017-07-05 12:30:19 +02:00
Martijn van Groningen	9ce9c21b83	docs: added percolator script query limitation	2017-06-28 17:10:30 +02:00
Nathan Taylor	645bb9d0fb	Docs: Removed duplicated line in mapping docs	2017-06-21 10:47:19 +02:00
Jim Ferenczi	afada69ea9	[Docs] more fix for the parent-join docs	2017-06-16 12:49:16 +02:00
Jim Ferenczi	664193185e	[Docs] Fix cross reference for parent-join field	2017-06-16 11:53:16 +02:00
Jim Ferenczi	ccb3c9aae7	Add documentation for the new parent-join field (#25227 ) * Add documentation for the new parent-join field This commit adds the docs for the new parent-join field. It explains how to define, index and query this new field. Relates #20257	2017-06-16 11:13:23 +02:00
Russ Cam	f6821c41d8	Add half_float and scaled float (#22988 ) to numeric datatypes (cherry picked from commit 67ea06145a80d5ec52ba55d1f2e1e8287e1882b1)	2017-06-13 09:54:44 +10:00
Ryan Ernst	a03b6c2fa5	Scripting: Change keys for inline/stored scripts to source/id (#25127 ) This commit adds back "id" as the key within a script to specify a stored script (which with file scripts now gone is no longer ambiguous). It also adds "source" as a replacement for "code". This is in an attempt to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.	2017-06-09 08:29:25 -07:00
Jim Ferenczi	8250aa4267	Remove the postings highlighter and make unified the default highlighter choice (#25028 ) This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves exactly like the `unified` highlighter when index_options is set to `offsets`: https://issues.apache.org/jira/browse/LUCENE-7815 It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided). The strategy used internally by this highlighter remains the same as before, it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text. Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such. There are a few features that the `unified` highlighter is not able to handle which is why the other highlighters (`plain` and `fvh`) are still available. I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features has been added to the `unified`.	2017-06-09 14:09:57 +02:00
Andrey Groshev	e4fd8485ce	Made the same length of opening and closing lines (#23583 )	2017-06-09 00:50:43 -07:00
Jim Ferenczi	ad905924ae	update docs that claim that classic is the default similarity	2017-06-09 09:22:48 +02:00
Adrien Grand	ebf806d38f	Reorganize docs of global ordinals. (#24982 ) Currently global ordinals are documented under `fielddata`. It moves them to their own file since they also work with doc values and fielddata is on the way out. Closes #23101	2017-06-01 16:47:44 +02:00
markharwood	b7197f5e21	SignificantText aggregation - like significant_terms, but for text (#24432 ) * SignificantText aggregation - like significant_terms but doesn’t require fielddata=true, recommended for use with `sampler` agg to limit expense of tokenizing docs and takes optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results. Closes #23674	2017-05-24 13:46:43 +01:00
Adrien Grand	a72eaa8e0f	Identify documents by their `_id`. (#24460 ) Now that indices have a single type by default, we can move to the next step and identify documents using their `_id` rather than the `_uid`. One notable change in this commit is that I made deletions implicitly create types. This helps with the live version map in the case that documents are deleted before the first type is introduced. Otherwise there would be no way to differentiate `DELETE index/foo/1` followed by `PUT index/foo/1` from `DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are different if versioning is involved.	2017-05-09 16:33:52 +02:00
Nicholas Knize	0c4eb0a029	Add new ip_range field type This commit adds support for indexing and searching a new ip_range field type. Both IPv4 and IPv6 formats are supported. Tests are updated and docs are added.	2017-05-05 09:43:42 -05:00
Nik Everett	a01f846226	CONSOLEify a few more docs Adds CONSOLE to cross-cluster-search docs but skips them for testing because we don't have a second cluster set up. This gets us the `VIEW IN CONSOLE` and `COPY AS CURL` links and makes sure that they are valid yaml (not json, technically) but doesn't get testing. Which is better than we had before. Adds CONSOLE to the dynamic templates docs and ingest-node docs. The ingest-node docs contain a ton of non-console snippets. We might want to convert them to full examples later, but that can be a separate thing. Relates to #18160	2017-05-04 21:01:14 -04:00
Adrien Grand	1be2800120	Only allow one type on 7.0 indices (#24317 ) This adds the `index.mapping.single_type` setting, which enforces that indices have at most one type when it is true. The default value is true for 6.0+ indices and false for old indices. Relates #15613	2017-04-27 08:43:20 +02:00
Danilo Akamine	0adaf9fb4c	Drop `search_analyzer` parameter from keyword.asciidoc (#24221 ) `search_analyzer` isn't supported by `keyword` fields so this removes it from the documentation for them.	2017-04-25 12:49:50 -04:00
Nik Everett	e429d66956	CONSOLEify some more docs Relates to #18160	2017-04-24 16:08:19 -04:00
Fabien Baligand	4a45579506	token_count type : add an option to count tokens (fix #23227 ) (#24175 ) Add option "enable_position_increments" with default value true. If option is set to false, indexed value is the number of tokens (not position increments count)	2017-04-21 00:53:28 +02:00
Loek van Gool	e11d892562	Update field-names-field.asciidoc (#24178 ) fix typo in field name	2017-04-19 11:57:37 +02:00
Martijn van Groningen	3d9671a668	[PERCOLATOR] Allowing range queries with now ranges inside percolator queries. Before now ranges were forbidden, because the percolator query itself could get cached and then the percolator queries with now ranges that should no longer match, incorrectly will continue to match. By disabling caching when the `percolator` is being used, the percolator can now correctly support range queries with now based ranges. I think this is the right tradeoff. The percolator query is likely to not be the same between search requests and disabling range queries with now ranges really prevented people from using the percolator for their use cases. Also fixed an issue that existed in the percolator fieldmapper, it was unable to find forbidden queries inside `dismax` queries. Closes #23859	2017-04-07 08:44:43 +02:00
Lee Hinman	b6b9ef8e26	[DOCS] Remove line about eager loading global ordinals Fielddata can no longer be configured to be loaded eagerly (it only accepts `true` and `false`), so this line is a little misleading because it talks about a procedure we can no longer do.	2017-04-03 12:56:21 -06:00
Nik Everett	653f50973a	CONSOLEify geo-shape docs `CONSOLE`ify geo-shape type and geo-shape query docs. Relates to #18160	2017-03-31 09:11:54 -04:00
Nik Everett	5f91241f57	CONSOLEify geo aggregation docs Turns the top example in each of the geo aggregation docs into a working example that can be opened in CONSOLE. Subsequent examples can all also be opened in console and will work after you've run the first example. All examples are tested as part of the build.	2017-03-30 21:28:52 -04:00
Ali Beyad	8359dd05c9	Adds boolean similarity to Elasticsearch (#23637 ) This commit adds the boolean similarity scoring from Lucene to Elasticsearch. The boolean similarity provides a means to specify that a field should not be scored with typical full-text ranking algorithms, but rather just whether the query terms match the document or not. Boolean similarity scores a query term equal to its query boost only. Boolean similarity is available as a default similarity option and thus a field can be specified to have boolean similarity by declaring in its mapping: "similarity": "boolean" Closes #6731	2017-03-28 10:17:23 -04:00
Martijn van Groningen	b116b8f0cb	[DOCS] Update the docs about the fact that global ordinals for _parent field are loaded eagerly instead of lazily by default. Relates to #8053	2017-03-22 10:39:39 +01:00
Lee Hinman	b3c27a7fdd	Disallow include_in_all for 6.0+ indices Since `_all` is now deprecated and cannot be set for new indices, we should also disallow any field that has the `include_in_all` parameter set. Resolves #22923	2017-02-07 19:31:51 -07:00
AlexNodex	fb8bdbc57a	Update typo in date (#22955 ) your example has yyy and it should be yyyy	2017-02-03 13:16:17 +01:00
Clinton Gormley	19ce039d2d	Update type-field.asciidoc Wildcard type names are not supported	2017-01-27 17:50:28 +01:00
Yannick Welsch	881993de3a	[Docs] Remove outdated info about enabling/disabling doc_values (#22694 )	2017-01-19 17:33:40 +01:00
Daniel Mitterdorfer	aece89d6a1	Make boolean conversion strict (#22200 ) This PR removes all leniency in the conversion of Strings to booleans: "true" is converted to the boolean value `true`, "false" is converted to the boolean value `false`. Everything else raises an error.	2017-01-19 07:59:18 +01:00
Scott Somerville	372812da98	Allow an index to be partitioned with custom routing (#22274 ) This change makes it possible for custom routing values to go to a subset of shards rather than just a single shard. This enables the ability to utilize the spatial locality that custom routing can provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering from a hot shard. This is ideal for large multi-tenant indices with custom routing that suffer from one or both of the following: - The big tenants cannot fit into a single shard or there are so many of them that they will likely end up on the same shard - Tenants often have a surge in write traffic and a single shard cannot process it fast enough Beyond that, this should also be useful for use cases where most queries are done under the context of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize the number of shards to check per query. While a similar solution can be achieved with multiple concrete indices or aliases per value today, those approaches break down for high cardinality fields. A partitioned index enforces that mappings have routing required, that the partition size does not change when shrinking an index (the partitions will shrink proportionally), and rejects mappings that have parent/child relationships. Closes #21585	2017-01-18 08:51:23 +01:00
Alex	a0c83c4511	Minor doc changes to clarify mapping index param for string type (#22652 ) * Grammatical correction * Add note for legacy string mapping type * Update truncate token filter to not mention the keyword tokenizer The advice predates the existence of the keyword field Closes #22650	2017-01-17 16:43:11 +01:00
Lee Hinman	7a18bb50fc	Disable _all by default This change disables the _all meta field by default. Now that we have the "all-fields" method of query execution, we can save both indexing time and disk space by disabling it. _all can no longer be configured for indices created after 6.0. Relates to #20925 and #21341 Resolves #19784	2017-01-11 16:47:13 -07:00
Nik Everett	75d5b3d9eb	Fix parent_id example in docs And fix some indentation I noticed while looking up the query.	2017-01-10 10:01:31 -05:00
Clinton Gormley	cb7952e71d	Docs: Parent field is no longer indexed and should use parent_id instead of term query Closes #22517	2017-01-10 13:48:07 +01:00
Jason Veatch	20f90178fe	Docs: Detail on false/strict dynamic mapping setting (#22451 ) Reference: https://www.elastic.co/guide/en/elasticsearch/guide/master/dynamic-mapping.html	2017-01-05 14:36:18 -05:00
Adrien Grand	3f805d68cb	Add the ability to set an analyzer on keyword fields. (#21919 ) This adds a new `normalizer` property to `keyword` fields that pre-processes the field value prior to indexing, but without altering the `_source`. Note that only the normalization components that work on a per-character basis are applied, so for instance stemming filters will be ignored while lowercasing or ascii folding will be applied. Closes #18064	2016-12-30 09:36:10 +01:00
Adrien Grand	84edf36f11	Make `-0` compare less than `+0` consistently. (#22173 ) Our `float`/`double` fields generally assume that `-0` compares less than `+0`, except when bounds are exclusive: an exclusive lower bound on `-0` excludes `+0` and an exclusive upper bound on `+0` excludes `-0`. Closes #22167	2016-12-21 16:51:45 +01:00
Adrien Grand	9524c81af9	Document the `locale` option of the `date` field. (#22050 ) This also adds another level of protection against using the default locale. Relates to https://discuss.elastic.co/t/mapping-for-12h-date-format/68433/3.	2016-12-09 09:45:53 +01:00
Nicholas Knize	af1ab68b64	Add RangeFieldMapper for numeric and date range types Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range. Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support. When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.	2016-11-29 10:10:14 -06:00
Clinton Gormley	5555e85619	Document that the PUT mapping API with the _default_ type overwrites instead of merging Closes #8215	2016-11-26 12:43:56 +01:00
Clinton Gormley	a4e88bb64a	Fixed bad asciidoc in boolean mapping docs	2016-11-15 17:50:23 +00:00
Lee Hinman	96122aa518	Be strict when parsing values searching for booleans (#21555 ) This changes only the query parsing behavior to be strict when searching on boolean values. We continue to accept the variety of values during index time, but searches will only be parsed using `"true"` or `"false"`. Resolves #21545	2016-11-15 10:36:57 -07:00
Alexander Lin	0219a211d3	Allows multiple patterns to be specified for index templates (#21009 ) * Allows for an array of index template patterns to be provided to an index template, and rename the field from 'template' to 'index_pattern'. Closes #20690	2016-11-10 18:00:30 -05:00


315 Commits