68 Commits

Author SHA1 Message Date
Simon Willnauer e81804cfa4 Add a shard filter search phase to pre-filter shards based on query rewriting (#25658)
Today if we search across a large number of shards we hit every shard. Yet it's quite
common to search across an index pattern for time-based indices while a filter excludes
all results outside a certain time range, e.g. `now-3d`. While the search can potentially hit
hundreds of shards, the majority of the shards might yield 0 results since there is no document
that is within this date range. Kibana, for instance, does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now, with the deprecation of `_field_stats` and its upcoming removal, a single dashboard in Kibana can potentially turn into searches hitting hundreds or thousands of shards, and that can easily cause search rejections even though most of the requests are very likely super cheap and only need a query rewrite to terminate early with 0 results.

This change adds a pre-filter phase for searches that can, if the number of shards is higher than the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the Kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further, it's completely transparent to the user and improves the scalability of Elasticsearch in general on large clusters.
2017-07-12 22:19:20 +02:00
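
The pre-filter idea in e81804cfa4 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Elasticsearch implementation: the `Shard` class, its `min_ts`/`max_ts` fields, and the functions `can_match` and `shards_to_search` are hypothetical names invented here, and a plain timestamp-range overlap test stands in for query rewriting. Only the `pre_filter_shard_size` default of 128 comes from the commit message.

```python
from dataclasses import dataclass

PRE_FILTER_SHARD_SIZE = 128  # default threshold named in the commit message


@dataclass(frozen=True)
class Shard:
    # Hypothetical per-shard metadata: min/max timestamp of the indexed documents.
    name: str
    min_ts: int
    max_ts: int


def can_match(shard, query_from, query_to):
    """Cheap check. False means no document can match; True may be a false positive."""
    return shard.max_ts >= query_from and shard.min_ts <= query_to


def shards_to_search(shards, query_from, query_to, threshold=PRE_FILTER_SHARD_SIZE):
    # At or below the threshold the extra pre-filter round trip is skipped.
    if len(shards) <= threshold:
        return list(shards)
    # Above the threshold, keep only shards that could possibly hold a match.
    return [s for s in shards if can_match(s, query_from, query_to)]


DAY = 86400
# 200 daily shards, each covering exactly one day of timestamps.
daily = [Shard(f"logs-{d}", d * DAY, d * DAY + DAY - 1) for d in range(200)]
# A query over the last three days only needs the last three shards.
hit = shards_to_search(daily, 197 * DAY, 200 * DAY - 1)
```

In this toy setup a query over the last three days is narrowed from 200 shards to 3, while a search over 100 shards skips the pre-filter and goes to all of them.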
Adrien Grand 8c869e2a0b More advice around search speed and disk usage. (#25252)
It adds notes about:
 - how preference can help optimize cache usage
 - the fact that too many replicas can hurt search performance due to lower
   utilization of the filesystem cache
 - how index sorting can improve _source compression
 - how always putting fields in the same order in documents can improve _source
   compression
2017-06-16 11:23:40 +02:00
Adrien Grand 0c117145f6 Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222)
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
2017-06-15 09:52:07 +02:00
Adrien Grand bbdf50f6bd Docs: More search speed advice. (#24802) 2017-06-01 17:23:22 +02:00
Glen Smith a590a22ea3 Add note and link to 'tune for disk usage' (#23252)
* Add note and link to 'tune for disk usage'

* Changed formatting as suggested

Thanks, @clintongormley!
2017-02-20 20:31:19 +01:00
Elijah 3b92179e09 Improve wording in recipes docs
This commit improves some of the wording in the recipes docs.

Relates #22661
2017-01-17 21:00:36 -05:00
Elijah 297b1b7d9a Capitalize "Elasticsearch" in indexing speed docs
This commit fixes the capitalization of "Elasticsearch" in the indexing
speed docs.

Relates #22659
2017-01-17 12:33:01 -05:00
Adrien Grand 52408fc389 Add a recommendation against large documents to the docs. (#21652) 2016-11-21 15:01:36 +01:00
Adrien Grand 68b0e395b2 Add recommendations about getting consistent scores despite shards and replicas. (#21167)
This is a topic that has triggered many questions recently so it would be good
to have these recommendations documented.
2016-11-02 10:50:38 +01:00
Adrien Grand 9cbbddb6dc Add support for `quote_field_suffix` to `simple_query_string`. (#21060)
Closes #18641
2016-10-28 09:11:57 +02:00
Pascal Borreli fcb01deb34 Fixed typos (#20843) 2016-10-10 14:51:47 -06:00
Adrien Grand cdc27b75b8 Add more information to the how-to docs. #20297
 - use auto-generated ids for indexing #20211
 - use rounded dates in queries #20115
2016-09-02 14:28:47 +02:00
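
The "rounded dates" recommendation in cdc27b75b8 can be sketched as follows. The commit message itself gives no detail, so this rests on an assumption about the motivation: a range whose bounds are rounded (for example to the minute) stays identical across requests issued within that minute, which makes the query easier to cache. The function `rounded_range` and its `lookback` parameter are hypothetical and not part of any Elasticsearch API.

```python
from datetime import datetime, timedelta, timezone


def rounded_range(now, lookback=timedelta(hours=1)):
    """Return (start, end) with `now` rounded down to the minute.

    Hypothetical helper: requests issued within the same minute produce
    identical bounds, so an identical query is sent each time.
    """
    end = now.replace(second=0, microsecond=0)
    return end - lookback, end


first = rounded_range(datetime(2016, 9, 2, 14, 28, 5, tzinfo=timezone.utc))
second = rounded_range(datetime(2016, 9, 2, 14, 28, 47, tzinfo=timezone.utc))
# Both calls fall in the same minute, so the ranges are equal.
same = first == second
```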
Adrien Grand 398d70b567 Add `scaled_float`. #19264
This is an attempt to revive #15939, motivated by elastic/beats#1941.
Half-floats are a pretty bad option for storing percentages. They would likely
require 2 bytes all the time while they don't need more than one byte.

So this PR exposes a new `scaled_float` type that requires a `scaling_factor`
and internally indexes `value*scaling_factor` in a long field. Compared to the
original PR it exposes a lower-level API so that the trade-offs are clearer and
avoids any reference to fixed precision that might imply that this type is more
accurate (actually it is *less* accurate).

In addition to being more space-efficient for some use-cases that Beats is
interested in, this is also faster than `half_float` unless we can improve the
efficiency of decoding half-float bits (which is currently done in software)
or until Java gets first-class support for half-floats.
2016-07-18 12:36:23 +02:00
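
The encoding described in 398d70b567 (indexing `value*scaling_factor` in a long field) can be sketched in a few lines. This is an illustration under assumptions, not the actual field mapper: the function names `encode` and `decode` are hypothetical, and rounding to the nearest integer is assumed here because the commit message does not say how the product is converted to a long.

```python
def encode(value: float, scaling_factor: float) -> int:
    # Assumption: the scaled value is rounded to the nearest integer before
    # being stored. Python ints are unbounded; a real long is 64 bits.
    return round(value * scaling_factor)


def decode(stored: int, scaling_factor: float) -> float:
    # Reading the value back divides by the same scaling factor.
    return stored / scaling_factor


# With a scaling factor of 100, a ratio keeps only two decimal places.
stored = encode(0.4273, 100)
restored = decode(stored, 100)
```

Here `0.4273` is stored as the integer `43` and read back as `0.43`, which shows why the commit message calls the type less accurate rather than fixed-precision: anything finer than `1/scaling_factor` is lost.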
Jason Tedor c05f818160 Fix casing of "Elasticsearch" in how-to docs 2016-07-07 12:33:27 -04:00
Adrien Grand 873661df17 Fix typo. 2016-07-07 17:49:01 +02:00
Adrien Grand f295a218a0 Add notes about sparsity. 2016-07-07 17:47:19 +02:00
Tanguy Leroux 453a4b9647 Fix documentation typo in How-To docs 2016-06-27 14:49:37 +02:00
Adrien Grand fbad3af352 Add a how-to section to the docs. #18998
This moves the "Performance Considerations for Elasticsearch Indexing" blog post
to the reference guide and adds similar recommendations for tuning disk usage
and search speed.
2016-06-24 10:58:33 +02:00