OpenSearch/docs
Adrien Grand ce11e0ee6d Filter cache: add a `_cache: auto` option and make it the default.
Until now, every filter could be cached using the `_cache` flag, which could be
set to `true` or `false`, and the default depended on the type of the filter.
For instance, `script` filters are not cached by default while `terms` filters
are. For some filters the default is more complicated: e.g., date range filters
are cached unless they use `now` in a non-rounded fashion.

This commit adds a third option, `auto`, which becomes the default for
all filters. Every filter now returns a cache wrapper, and the decision is
made at caching time, per segment. Here is the default logic:
 - if there is already a cache entry for this filter in the current segment,
   then return the cache entry.
 - else if the doc id set cannot iterate (e.g., a script filter) then do not cache.
 - else if the doc id set is already cacheable and it has been used twice or
   more in the last 1000 filters then cache it.
 - else if the filter is costly (e.g., multi-term) and has been used twice or more
   in the last 1000 filters then cache it.
 - else if the doc id set is not cacheable and it has been used 5 times or more
   in the last 1000 filters, then load it into a cacheable set and cache it.
 - else return the uncached set.
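The per-segment rules above can be sketched as a single decision function. This is a minimal Python sketch, not the actual Elasticsearch (Java) implementation: the `FilterInfo` fields and the returned labels are hypothetical names chosen for illustration, and `recent_uses` is assumed to be the number of times the filter appeared among the last 1000 filters.

```python
from dataclasses import dataclass


# Hypothetical per-segment inputs to the `auto` policy; these names are
# illustrative only and do not correspond to real Elasticsearch classes.
@dataclass
class FilterInfo:
    has_cache_entry: bool  # already cached for the current segment?
    can_iterate: bool      # False for e.g. script filters
    cacheable_set: bool    # doc id set is already cacheable as-is
    costly: bool           # e.g. multi-term filters
    recent_uses: int       # uses among the last 1000 filters


def auto_cache_decision(f: FilterInfo) -> str:
    """Apply the default rules in order and return the chosen action."""
    if f.has_cache_entry:
        return "use_cache_entry"
    if not f.can_iterate:
        return "do_not_cache"
    if f.cacheable_set and f.recent_uses >= 2:
        return "cache"
    if f.costly and f.recent_uses >= 2:
        return "cache"
    if not f.cacheable_set and f.recent_uses >= 5:
        # Materialize into a cacheable set first, then cache it.
        return "load_into_cacheable_set_and_cache"
    return "uncached"
```

Because the rules are checked in order, a filter whose doc id set cannot iterate is never cached, however often it is reused.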

For instance, geo-distance and script filters will use this new default but
will not be cached, because their doc id sets cannot be iterated.

Similarly, date range filters will always use this default, but those that use
`now` in a non-rounded fashion are very unlikely to be reused, so in practice
they won't be cached.
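Why an unrounded `now` defeats reuse can be shown with a little date arithmetic. The sketch below assumes a hypothetical filter whose lower bound is "now minus one day"; `resolve_lower_bound` is an illustrative helper, not an Elasticsearch API. Two requests five seconds apart resolve to different bounds unless the bound is rounded down to the day.

```python
from datetime import datetime, timedelta


def resolve_lower_bound(now: datetime, round_to_day: bool) -> datetime:
    """Resolve a hypothetical `now - 1 day` bound, optionally rounded down to the day."""
    bound = now - timedelta(days=1)
    if round_to_day:
        bound = bound.replace(hour=0, minute=0, second=0, microsecond=0)
    return bound


t1 = datetime(2014, 12, 18, 15, 51, 36)
t2 = t1 + timedelta(seconds=5)  # the same query, sent five seconds later

# Unrounded: each request resolves to a different bound, so a different filter.
unrounded = {resolve_lower_bound(t, False) for t in (t1, t2)}
# Rounded to the day: both requests resolve to the same bound.
rounded = {resolve_lower_bound(t, True) for t in (t1, t2)}
print(len(unrounded), len(rounded))  # 2 1
```

Since each distinct bound is effectively a distinct filter, the unrounded variant rarely accumulates enough uses to cross any caching threshold.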

Filters such as `terms` and `range` produce cacheable doc id sets with good
iterators, so they will be cached as soon as they have been used twice.

Filters that don't produce cacheable doc id sets, such as the `term` filter,
must be used 5 times before being cached. This ensures that we don't spend
CPU iterating over all documents matching such filters unless there is good
evidence of reuse.
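The "used N times in the last 1000 filters" bookkeeping can be sketched with a bounded window plus a counter. The `UsageHistory` class and the `THRESHOLD` table are hypothetical; only the window size (1000) and the thresholds (2 for cacheable sets such as `terms`, 5 for non-cacheable ones such as `term`) come from the text above.

```python
from collections import Counter, deque


class UsageHistory:
    """Count how often each filter key appears among the last `size` filters used."""

    def __init__(self, size: int = 1000):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def record(self, key) -> int:
        """Record one use of `key` and return its count within the window."""
        self.window.append(key)
        self.counts[key] += 1
        if len(self.window) > self.size:
            evicted = self.window.popleft()  # oldest use falls out of the window
            self.counts[evicted] -= 1
            if self.counts[evicted] == 0:
                del self.counts[evicted]
        return self.counts[key]


# Hypothetical thresholds mirroring the text: 2 uses for a cacheable set
# (e.g. `terms`), 5 uses for a non-cacheable one (e.g. `term`).
THRESHOLD = {"terms": 2, "term": 5}

history = UsageHistory()
first_cached_at = {}
for i in range(1, 6):
    for kind in ("terms", "term"):
        if history.record(kind) >= THRESHOLD[kind] and kind not in first_cached_at:
            first_cached_at[kind] = i
print(first_cached_at)  # {'terms': 2, 'term': 5}
```

Uses older than the window stop counting, so a filter that is reused only rarely never reaches its threshold.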

One last interesting point is that this change also applies to compound
filters: if you keep repeating the same `bool` filter with the same underlying
clauses, it will be cached on its own, whereas previously it was never cached
by default.
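For a compound filter to be cached on its own, two independently built `bool` filters with the same clauses must be recognized as the same filter. A minimal sketch, assuming value-based equality and hashing of the whole filter tree (modeled here with hypothetical frozen dataclasses `TermFilter` and `BoolFilter`), shows repeated uses accumulating under a single key:

```python
from dataclasses import dataclass


# Hypothetical filter classes: frozen dataclasses get value-based
# __eq__ and __hash__, so identical trees compare and hash equal.
@dataclass(frozen=True)
class TermFilter:
    field: str
    value: str


@dataclass(frozen=True)
class BoolFilter:
    must: tuple = ()
    must_not: tuple = ()


# Two requests building the same bool filter independently.
f1 = BoolFilter(must=(TermFilter("status", "active"),), must_not=(TermFilter("tag", "spam"),))
f2 = BoolFilter(must=(TermFilter("status", "active"),), must_not=(TermFilter("tag", "spam"),))

usage = {}
for f in (f1, f2):
    usage[f] = usage.get(f, 0) + 1
print(usage[f1])  # 2
```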

`_cache: true` has been changed to cache only on large segments, to avoid
polluting the cache, since small segments should not be the bottleneck anyway.
`_cache: false` keeps its existing semantics.
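The three `_cache` values can be summarized as a small dispatch. This is a sketch only: the commit does not say what counts as a "large" segment, so `LARGE_SEGMENT_MIN_DOCS` is a hypothetical cut-off, and `auto_says_cache` stands in for the outcome of the `auto` policy described above.

```python
# Hypothetical cut-off: the commit only says "large segments", so this
# number is an illustrative assumption, not an Elasticsearch setting.
LARGE_SEGMENT_MIN_DOCS = 10_000


def should_use_cache(option: str, segment_max_doc: int, auto_says_cache: bool) -> bool:
    """Sketch of how each `_cache` value decides caching for one segment."""
    if option == "false":
        return False  # never cached, same semantics as before
    if option == "true":
        return segment_max_doc >= LARGE_SEGMENT_MIN_DOCS  # large segments only
    if option == "auto":
        return auto_says_cache  # defer to the usage-based policy
    raise ValueError(f"unknown _cache value: {option!r}")
```

Under this reading, `_cache: true` no longer forces caching everywhere; it only skips the usage checks on segments big enough to matter.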

Close #8449
2014-12-18 15:51:36 +01:00
community Docs: Adding REST ACL plugin 2014-12-15 14:09:22 +01:00
groovy-api Updated groovy docs to point to the new groovy repo 2014-05-14 12:08:02 +02:00
java-api doc: transport sniff only adds data nodes 2014-12-17 11:29:01 +00:00
javascript added doc page for the JavaScript client, and listed it in the clients list. 2013-12-17 15:26:29 -07:00
perl Docs: Updated Perl client page to mention async client 2014-10-29 14:48:56 +01:00
python [DOCS] adding a note on python client versioning schema 2014-02-11 03:43:53 +01:00
reference Filter cache: add a `_cache: auto` option and make it the default. 2014-12-18 15:51:36 +01:00
resiliency Docs: minor update to resiliency page 2014-11-05 15:00:53 +01:00
river [DOCS] Fixed typo 2013-10-05 17:10:30 +02:00
ruby [DOC] Added comprehensive documentation for the Ruby and Rails integrations 2014-07-10 11:21:27 +02:00
README.md [DOCS] various docs fixes 2014-01-23 10:52:13 +01:00

README.md

The Elasticsearch docs are in AsciiDoc format and can be built using the Elasticsearch documentation build process.

See: https://github.com/elasticsearch/docs