[Docs] Improve documentation of the new caching policy for filters.

Adrien Grand 2014-12-22 17:14:47 +01:00
parent 391b5f3f5e
commit fb6c3b7c29
1 changed files with 59 additions and 20 deletions

@ -15,37 +15,76 @@ filter does not require a lot of memory, and will cause other queries
executing against the same filter (same parameters) to be blazingly
fast.
Some filters already produce a result that is easily cacheable, and the
difference between caching and not caching them is the act of placing
the result in the cache or not. These filters, which include the
<<query-dsl-term-filter,term>>,
However, the cost of caching is not the same for all filters. For
instance, some filters are already fast out of the box, so caching them could
add significant overhead, while other filters produce results that are already
cacheable, so caching them is just a matter of putting the result in the
cache.
The default caching policy, `_cache: auto`, tracks the 1000 most recently
used filters on a per-index basis and makes decisions based on their
frequency.
[float]
==== Filters that directly read the index structure
Some filters can directly read the index structure and potentially jump
over large sequences of documents that are not worth evaluating (for
instance when these documents do not match the query). Caching these
filters introduces overhead given that all documents that the filter
matches need to be consumed in order to be loaded into the cache.
These filters, which include the <<query-dsl-term-filter,term>> and
<<query-dsl-term-query,query>> filters, are only cached after they
appear 5 times or more in the history of the 1000 most recently used
filters.
[float]
==== Filters that produce results that are already cacheable
Some filters produce results that are already cacheable, and the difference
between caching and not caching them is the act of placing the result in
the cache or not. These filters, which include the
<<query-dsl-terms-filter,terms>>,
<<query-dsl-prefix-filter,prefix>>, and
<<query-dsl-range-filter,range>> filters, are by
default cached and are recommended to use (compared to the equivalent
query version) when the same filter (same parameters) will be used
across multiple different queries (for example, a range filter with age
higher than 10).
<<query-dsl-range-filter,range>> filters, are cached by default after they
appear twice or more in the history of the 1000 most recently used filters.
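
The frequency-based decisions described so far can be illustrated with a minimal Python sketch. It is not the actual Elasticsearch implementation: the class `FilterUsageHistory`, the function `should_cache`, and the category names `skipping` and `cacheable` are hypothetical, and the sketch assumes that the current use of a filter counts toward its frequency. Only the history size (1000) and the thresholds (5 and 2) come from the text.

```python
from collections import Counter, deque


class FilterUsageHistory:
    """Hypothetical model of a per-index history of recently used filters."""

    def __init__(self, history_size=1000):
        self.history_size = history_size
        self._recent = deque()      # filter keys, oldest first
        self._counts = Counter()    # occurrences of each key within the history

    def record(self, filter_key):
        """Add one use of a filter, evicting the oldest use if the history is full."""
        self._recent.append(filter_key)
        self._counts[filter_key] += 1
        if len(self._recent) > self.history_size:
            evicted = self._recent.popleft()
            self._counts[evicted] -= 1
            if self._counts[evicted] == 0:
                del self._counts[evicted]

    def frequency(self, filter_key):
        """Number of times the filter appears in the current history."""
        return self._counts[filter_key]


# Thresholds from the text: filters that read the index structure directly
# need 5 occurrences, filters whose results are already cacheable need 2.
MIN_FREQUENCY = {"skipping": 5, "cacheable": 2}


def should_cache(history, filter_key, kind):
    """Record a use of the filter and decide whether it is worth caching."""
    history.record(filter_key)
    return history.frequency(filter_key) >= MIN_FREQUENCY[kind]
```

With this model, a range filter would be cached on its second use and a term filter on its fifth, provided the earlier uses are still among the 1000 most recent ones.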
Other filters, usually already working with the field data loaded into
memory, are not cached by default. Those filters are already very fast,
and the process of caching them requires extra processing in order to
allow the filter result to be used with different queries than the one
executed. These filters, including the geo,
and <<query-dsl-script-filter,script>> filters
are not cached by default.
[float]
==== Computational filters
The last type of filters are those working with other filters. The
Some filters need to run some computation in order to figure out whether
a given document matches a filter. These filters, which include the geo and
<<query-dsl-script-filter,script>> filters, but also the
<<query-dsl-terms-filter,terms>> and <<query-dsl-range-filter,range>>
filters when using the `fielddata` execution mode, are never cached by default,
as caching would require evaluating the filter on all documents in your indices,
whereas it can otherwise be evaluated only on documents that match the query.
[float]
==== Compound filters
The last type of filter works with other filters, and includes
the <<query-dsl-bool-filter,bool>>,
<<query-dsl-and-filter,and>>,
<<query-dsl-not-filter,not>> and
<<query-dsl-or-filter,or>> filters are not
cached as they basically just manipulate the internal filters.
<<query-dsl-or-filter,or>> filters.
There is no general rule about these filters. Depending on the filters that
they wrap, they will sometimes return a filter that dynamically evaluates the
sub filters and sometimes evaluate the sub filters eagerly in order to return
a result that is already cacheable, so depending on the case, these filters
will be cached after they appear 2+ or 5+ times in the history of the
1000 most recently used filters.
[float]
==== Overriding the default behaviour
All filters accept a `_cache` element that explicitly controls
caching. It accepts 3 values: `true` in order to cache the filter, `false`
to make sure that the filter will not be cached, and `auto`, which is the
default and will decide on whether to cache the filter based on the cost
to cache the filter and how often the filter has been used.
to cache it and how often it has been used as explained above.
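
As an illustration, the following Python sketch builds filter clauses with an explicit `_cache` setting. The helper `with_cache` and the field names `user` and `age` are hypothetical, and placing `_cache` alongside the field inside the filter body is an assumption about the request format rather than something this page specifies.

```python
import json


def with_cache(filter_type, body, cache="auto"):
    """Return a filter clause carrying an explicit `_cache` setting.

    `cache` must be True, False or "auto", the three values described above.
    """
    if not (cache is True or cache is False or cache == "auto"):
        raise ValueError("_cache must be True, False or 'auto'")
    clause = dict(body)          # copy so the caller's dict is left untouched
    clause["_cache"] = cache
    return {filter_type: clause}


# Never cache this term filter, regardless of how often it is used.
never_cached = with_cache("term", {"user": "kimchy"}, cache=False)

# Always cache this range filter, even on first use.
always_cached = with_cache("range", {"age": {"gt": 10}}, cache=True)

print(json.dumps(never_cached, indent=2))
print(json.dumps(always_cached, indent=2))
```

Leaving `cache` at its default produces `"_cache": "auto"`, which corresponds to the frequency-based policy.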
Filters also accept a `_cache_key`, which will be used as the
caching key for that filter. This can be handy when using very large