diff --git a/docs/reference/query-dsl/filters.asciidoc b/docs/reference/query-dsl/filters.asciidoc
index f30dcb38c78..0c78dd21934 100644
--- a/docs/reference/query-dsl/filters.asciidoc
+++ b/docs/reference/query-dsl/filters.asciidoc
@@ -15,37 +15,76 @@ filter does not require a lot of memory, and will cause other queries
 executing against the same filter (same parameters) to be blazingly
 fast.
 
-Some filters already produce a result that is easily cacheable, and the
-difference between caching and not caching them is the act of placing
-the result in the cache or not. These filters, which include the
-<>,
+However the cost of caching is not the same for all filters. For
+instance some filters are already fast out of the box while caching could
+add significant overhead, and some filters produce results that are already
+cacheable so caching them is just a matter of putting the result in the
+cache.
+
+The default caching policy, `_cache: auto`, tracks the 1000 most recently
+used filters on a per-index basis and makes decisions based on their
+frequency.
+
+[float]
+==== Filters that read directly the index structure
+
+Some filters can directly read the index structure and potentially jump
+over large sequences of documents that are not worth evaluating (for
+instance when these documents do not match the query). Caching these
+filters introduces overhead given that all documents that the filter
+matches need to be consumed in order to be loaded into the cache.
+
+These filters, which include the <> and
+<> filters, are only cached after they
+appear 5 times or more in the history of the 1000 most recently used
+filters.
+
+[float]
+==== Filters that produce results that are already cacheable
+
+Some filters produce results that are already cacheable, and the difference
+between caching and not caching them is the act of placing the result in
+the cache or not. These filters, which include the
 <>,
 <>, and
-<> filters, are by
-default cached and are recommended to use (compared to the equivalent
-query version) when the same filter (same parameters) will be used
-across multiple different queries (for example, a range filter with age
-higher than 10).
+<> filters, are by default cached after they
+appear twice or more in the history of the most 1000 recently used filters.
 
-Other filters, usually already working with the field data loaded into
-memory, are not cached by default. Those filters are already very fast,
-and the process of caching them requires extra processing in order to
-allow the filter result to be used with different queries than the one
-executed. These filters, including the geo,
-and <> filters
-are not cached by default.
+[float]
+==== Computational filters
 
-The last type of filters are those working with other filters. The
+Some filters need to run some computation in order to figure out whether
+a given document matches a filter. These filters, which include the geo and
+<> filters, but also the
+<> and <>
+filters when using the `fielddata` execution mode are never cached by default,
+as it would require to evaluate the filter on all documents in your indices
+while they can otherwise be only evaluated on documents that match the query.
+
+[float]
+==== Compound filters
+
+The last type of filters are those working with other filters, and includes
+the <>,
 <>,
 <> and
-<> filters are not
-cached as they basically just manipulate the internal filters.
+<> filters.
+
+There is no general rule about these filters. Depending on the filters that
+they wrap, they will sometimes return a filter that dynamically evaluates the
+sub filters and sometimes evaluate the sub filters eagerly in order to return
+a result that is already cacheable, so depending on the case, these filters
+will be cached after they appear 2+ or 5+ times in the history of the most
+1000 recently used filters.
+
+[float]
+==== Overriding the default behaviour
 
 All filters allow to set `_cache` element on them to explicitly control
 caching. It accepts 3 values: `true` in order to cache the filter, `false`
 to make sure that the filter will not be cached, and `auto`, which is the
 default and will decide on whether to cache the filter based on the cost
-to cache the filter and how often the filter has been used.
+to cache it and how often it has been used as explained above.
 
 Filters also allow to set `_cache_key` which will be used as the caching
 key for that filter. This can be handy when using very large