OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	e81804cfa4	Add a shard filter search phase to pre-filter shards based on query rewriting (#25658) Today if we search across a large number of shards we hit every shard. Yet, it's quite common to search across an index pattern for time-based indices where filtering will exclude all results outside a certain time range, i.e. `now-3d`. While the search can potentially hit hundreds of shards, the majority of the shards might yield 0 results since there is no document within this date range. Kibana for instance does this regularly but used `_field_stats` to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and its upcoming removal, a single dashboard in Kibana can potentially turn into searches hitting hundreds or thousands of shards, and that can easily cause search rejections even though most of the requests are very likely super cheap and only need a query rewrite to terminate early with 0 results. This change adds a pre-filter phase for searches that can, if the number of shards is higher than the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the Kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further, it's completely transparent to the user and improves scalability of Elasticsearch in general on large clusters.	2017-07-12 22:19:20 +02:00
Adrien Grand	8c869e2a0b	More advice around search speed and disk usage. (#25252) It adds notes about: - how preference can help optimize cache usage - the fact that too many replicas can hurt search performance due to lower utilization of the filesystem cache - how index sorting can improve _source compression - how always putting fields in the same order in documents can improve _source compression	2017-06-16 11:23:40 +02:00
Adrien Grand	0c117145f6	Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222) This snapshot has faster range queries on range fields (LUCENE-7828), more accurate norms (LUCENE-7730) and the ability to use fake term frequencies (LUCENE-7854).	2017-06-15 09:52:07 +02:00
Adrien Grand	bbdf50f6bd	Docs: More search speed advice. (#24802)	2017-06-01 17:23:22 +02:00
Glen Smith	a590a22ea3	Add note and link to 'tune for disk usage' (#23252) * Add note and link to 'tune for disk usage' * Changed formatting as suggested Thanks, @clintongormley!	2017-02-20 20:31:19 +01:00
Elijah	3b92179e09	Improve wording in recipes docs This commit improves some of the wording in the recipes docs. Relates #22661	2017-01-17 21:00:36 -05:00
Elijah	297b1b7d9a	Capitalize "Elasticsearch" in indexing speed docs This commit fixes the capitalization of "Elasticsearch" in the indexing speed docs. Relates #22659	2017-01-17 12:33:01 -05:00
Adrien Grand	52408fc389	Add a recommendation against large documents to the docs. (#21652)	2016-11-21 15:01:36 +01:00
Adrien Grand	68b0e395b2	Add recommendations about getting consistent scores despite shards and replicas. (#21167) This is a topic that has triggered many questions recently so it would be good to have these recommendations documented.	2016-11-02 10:50:38 +01:00
Adrien Grand	9cbbddb6dc	Add support for `quote_field_suffix` to `simple_query_string`. (#21060) Closes #18641	2016-10-28 09:11:57 +02:00
Pascal Borreli	fcb01deb34	Fixed typos (#20843)	2016-10-10 14:51:47 -06:00
Adrien Grand	cdc27b75b8	Add more information to the how-to docs. #20297 - use auto-generated ids for indexing #20211 - use rounded dates in queries #20115	2016-09-02 14:28:47 +02:00
Adrien Grand	398d70b567	Add `scaled_float`. #19264 This is an attempt to revive #15939 motivated by elastic/beats#1941. Half-floats are a pretty bad option for storing percentages. They would likely require 2 bytes all the time while they don't need more than one byte. So this PR exposes a new `scaled_float` type that requires a `scaling_factor` and internally indexes `value*scaling_factor` in a long field. Compared to the original PR it exposes a lower-level API so that the trade-offs are clearer and avoids any reference to fixed precision that might imply that this type is more accurate (actually it is *less* accurate). In addition to being more space-efficient for some use-cases that Beats is interested in, this is also faster than `half_float` unless we can improve the efficiency of decoding half-float bits (which is currently done using software) or until Java gets first-class support for half-floats.	2016-07-18 12:36:23 +02:00
Jason Tedor	c05f818160	Fix casing of "Elasticsearch" in how-to docs	2016-07-07 12:33:27 -04:00
Adrien Grand	873661df17	Fix typo.	2016-07-07 17:49:01 +02:00
Adrien Grand	f295a218a0	Add notes about sparsity.	2016-07-07 17:47:19 +02:00
Tanguy Leroux	453a4b9647	Fix documentation typo in How-To docs	2016-06-27 14:49:37 +02:00
Adrien Grand	fbad3af352	Add a how-to section to the docs. #18998 This moves the "Performance Considerations for Elasticsearch Indexing" blog post to the reference guide and adds similar recommendations for tuning disk usage and search speed.	2016-06-24 10:58:33 +02:00

18 Commits