Documented the query cache module

Related to #7161 and #7167
This commit is contained in:
Clinton Gormley 2014-08-06 11:54:51 +02:00
parent 6f09eb1b06
commit e7f1aa4f4f
6 changed files with 225 additions and 43 deletions


@ -72,6 +72,8 @@ include::index-modules/translog.asciidoc[]
include::index-modules/cache.asciidoc[]
include::index-modules/query-cache.asciidoc[]
include::index-modules/fielddata.asciidoc[]
include::index-modules/codec.asciidoc[]


@ -0,0 +1,145 @@
[[index-modules-shard-query-cache]]
== Shard query cache
coming[1.4.0]
When a search request is run against an index or against many indices, each
involved shard executes the search locally and returns its local results to
the _coordinating node_, which combines these shard-level results into a
``global'' result set.
The shard-level query cache module caches the local results on each shard.
This allows frequently used (and potentially heavy) search requests to return
results almost instantly. The query cache is a very good fit for the logging
use case, where only the most recent index is being actively updated --
results from older indices will be served directly from the cache.
[IMPORTANT]
==================================
For now, the query cache will only cache the results of search requests
where <<count,`?search_type=count`>> is used, so it will not cache `hits`,
but it will cache `hits.total`, <<search-aggregations,aggregations>>, and
<<search-suggesters,suggestions>>.
Queries that use `now` (see <<date-math>>) cannot be cached.
==================================
[float]
=== Cache invalidation
The cache is smart -- it keeps the same _near real-time_ promise as uncached
search.
Cached results are invalidated automatically whenever the shard refreshes, but
only if the data in the shard has actually changed. In other words, you will
always get the same results from the cache as you would for an uncached search
request.
The longer the refresh interval, the longer that cached entries will remain
valid. If the cache is full, the least recently used cache keys will be
evicted.
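Conceptually, this least-recently-used eviction can be sketched in a few lines. This is a simplified illustration of the LRU policy only, not the actual cache implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch: when full, the least recently used key is evicted."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)  # most recently used goes to the end
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used entry
```
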
The cache can be expired manually with the <<indices-clearcache,`clear-cache` API>>:
[source,json]
------------------------
curl -XPOST 'localhost:9200/kimchy,elasticsearch/_cache/clear?query_cache=true'
------------------------
[float]
=== Enabling caching by default
The cache is not enabled by default, but can be enabled when creating a new
index as follows:
[source,json]
-----------------------------
curl -XPUT localhost:9200/my_index -d'
{
"settings": {
"index.cache.query.enable": true
}
}
'
-----------------------------
It can also be enabled or disabled dynamically on an existing index with the
<<indices-update-settings,`update-settings`>> API:
[source,json]
-----------------------------
curl -XPUT localhost:9200/my_index/_settings -d'
{ "index.cache.query.enable": true }
'
-----------------------------
[float]
=== Enabling caching per request
The `query_cache` query-string parameter can be used to enable or disable
caching on a *per-query* basis. If set, it overrides the index-level setting:
[source,json]
-----------------------------
curl 'localhost:9200/my_index/_search?search_type=count&query_cache=true' -d'
{
"aggs": {
"popular_colors": {
"terms": {
"field": "colors"
}
}
}
}
'
-----------------------------
IMPORTANT: If your query uses a script whose result is not deterministic (e.g.
it uses a random function or references the current time) you should set the
`query_cache` flag to `false` to disable caching for that request.
[float]
=== Cache key
The whole JSON body is used as the cache key. This means that if the JSON
changes -- for instance if keys are output in a different order -- then the
cache key will not be recognised.
TIP: Most JSON libraries support a _canonical_ mode which ensures that JSON
keys are always emitted in the same order. This canonical mode can be used in
the application to ensure that a request is always serialized in the same way.
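To illustrate why canonical serialization matters, the following sketch (using Python's standard `json` module; the request bodies are just examples) shows two logically identical requests producing different cache keys until their keys are sorted:

```python
import json

# Two logically identical search bodies, with keys in a different order:
request_a = {"aggs": {"popular_colors": {"terms": {"field": "colors"}}}, "size": 0}
request_b = {"size": 0, "aggs": {"popular_colors": {"terms": {"field": "colors"}}}}

# Default serialization preserves insertion order, so the bodies differ
# byte-for-byte and would produce two distinct cache keys:
default_a = json.dumps(request_a)
default_b = json.dumps(request_b)

# Canonical serialization (sorted keys) yields identical output, so both
# requests map to the same cache entry:
canonical_a = json.dumps(request_a, sort_keys=True)
canonical_b = json.dumps(request_b, sort_keys=True)
```
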
[float]
=== Cache settings
The cache is managed at the node level, and has a default maximum size of `1%`
of the heap. This can be changed in the `config/elasticsearch.yml` file with:
[source,yaml]
--------------------------------
indices.cache.query.size: 2%
--------------------------------
Also, you can use the +indices.cache.query.expire+ setting to specify a TTL
for cached results, but there should be no reason to do so. Remember that
stale results are automatically invalidated when the index is refreshed. This
setting is provided for completeness' sake only.
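If you do want a TTL, it takes the same `config/elasticsearch.yml` form as the size setting above; the `10m` value here is purely illustrative:

[source,yaml]
--------------------------------
indices.cache.query.expire: 10m
--------------------------------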
[float]
=== Monitoring cache usage
The size of the cache (in bytes) and the number of evictions can be viewed
by index, with the <<indices-stats,`indices-stats`>> API:
[source,json]
------------------------
curl -XPOST 'localhost:9200/_stats/query_cache?pretty&human'
------------------------
or by node with the <<cluster-nodes-stats,`nodes-stats`>> API:
[source,json]
------------------------
curl -XPOST 'localhost:9200/_nodes/stats/indices/query_cache?pretty&human'
------------------------


@ -9,9 +9,9 @@ associated with one or more indices.
$ curl -XPOST 'http://localhost:9200/twitter/_cache/clear'
--------------------------------------------------
The API, by default, will clear all caches. Specific caches can be cleaned
explicitly by setting `filter`, `field_data`, `query_cache` coming[1.4.0],
or `id_cache` to `true`.
All caches relating to a specific field(s) can also be cleared by
specifying the `fields` parameter with a comma delimited list of the


@ -39,20 +39,32 @@ specified as well in the URI. Those stats can be any of:
groups). The `groups` parameter accepts a comma separated list of group names.
Use `_all` to return statistics for all groups.
`completion`:: Completion suggest statistics.
`fielddata`:: Fielddata statistics.
`flush`:: Flush statistics.
`merge`:: Merge statistics.
`query_cache`:: <<index-modules-shard-query-cache,Shard query cache>> statistics. coming[1.4.0]
`refresh`:: Refresh statistics.
`suggest`:: Suggest statistics.
`warmer`:: Warmer statistics.
Some statistics allow per-field granularity, which accepts a comma-separated
list of included fields. By default all fields are included:
[horizontal]
`fields`::
List of fields to be included in the statistics. This is used as the
default list unless a more specific field list is provided (see below).
`completion_fields`::
List of fields to be included in the Completion Suggest statistics.
`fielddata_fields`::
List of fields to be included in the Fielddata statistics.
Here are some samples:


@ -125,6 +125,18 @@ aggregated for the buckets created by their "parent" bucket aggregation.
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
define a fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
[float]
=== Caching heavy aggregations
coming[1.4.0]
Frequently used aggregations (e.g. for display on the home page of a website)
can be cached for faster responses. These cached results are the same results
that would be returned by an uncached aggregation -- you will never get stale
results.
See <<index-modules-shard-query-cache>> for more details.
include::aggregations/metrics.asciidoc[]


@ -46,39 +46,50 @@ And here is a sample response:
[float] [float]
=== Parameters === Parameters
[cols="<,<",options="header",] [horizontal]
|======================================================================= `timeout`::
|Name |Description
|`timeout` |A search timeout, bounding the search request to be executed
within the specified time value and bail with the hits accumulated up to
that point when expired. Defaults to no timeout. See <<time-units>>.
|`from` |The starting from index of the hits to return. Defaults to `0`. A search timeout, bounding the search request to be executed within the
specified time value and bail with the hits accumulated up to that point
when expired. Defaults to no timeout. See <<time-units>>.
|`size` |The number of hits to return. Defaults to `10`. `from`::
|`search_type` |The type of the search operation to perform. Can be The starting from index of the hits to return. Defaults to `0`.
`dfs_query_then_fetch`, `dfs_query_and_fetch`, `query_then_fetch`,
`query_and_fetch`. Defaults to `query_then_fetch`. See
<<search-request-search-type,_Search Type_>> for
more details on the different types of search that can be performed.
|coming[1.4.0] `terminate_after` |The maximum number of documents to collect for `size`::
each shard, upon reaching which the query execution will terminate early.
If set, the response will have a boolean field `terminated_early` to
indicate whether the query execution has actually terminated_early.
Defaults to no terminate_after.
|=======================================================================
Out of the above, the `search_type` is the one that can not be passed The number of hits to return. Defaults to `10`.
within the search request body, and in order to set it, it must be
passed as a request REST parameter.
The rest of the search request should be passed within the body itself. `search_type`::
The body content can also be passed as a REST parameter named `source`.
Both HTTP GET and HTTP POST can be used to execute search with body. The type of the search operation to perform. Can be
Since not all clients support GET with body, POST is allowed as well. `dfs_query_then_fetch`, `dfs_query_and_fetch`, `query_then_fetch`,
`query_and_fetch`. Defaults to `query_then_fetch`. See
<<search-request-search-type,_Search Type_>> for more.
`query_cache`::
coming[1.4.0] Set to `true` or `false` to enable or disable the caching
of search results for requests where `?search_type=count`, ie
aggregations and suggestions. See <<index-modules-shard-query-cache>>.
`terminate_after`::
coming[1.4.0] The maximum number of documents to collect for each shard,
upon reaching which the query execution will terminate early. If set, the
response will have a boolean field `terminated_early` to indicate whether
the query execution has actually terminated_early. Defaults to no
terminate_after.
Out of the above, the `search_type` and the `query_cache` must be passed as
query-string parameters. The rest of the search request should be passed
within the body itself. The body content can also be passed as a REST
parameter named `source`.
Both HTTP GET and HTTP POST can be used to execute search with body. Since not
all clients support GET with body, POST is allowed as well.
include::request/query.asciidoc[] include::request/query.asciidoc[]