OpenSearch/docs/reference/frozen-indices.asciidoc

109 lines
5.3 KiB
Plaintext
Raw Normal View History

[role="xpack"]
[testenv="basic"]
[[frozen-indices]]
= Frozen Indices
[partintro]
--
{es} indices keep some data structures in memory to allow you to search them
efficiently and to index into them. If you have a lot of indices then the
memory required for these data structures can add up to a significant amount.
For indices that are searched frequently it is better to keep these structures
in memory because it takes time to rebuild them. However, you might access some
of your indices so rarely that you would prefer to release the corresponding
memory and rebuild these data structures on each search.
For example, if you are using time-based indices to store log messages or time
series data then it is likely that older indices are searched much less often
than the more recent ones. Older indices also receive no indexing requests.
Furthermore, it is usually the case that searches of older indices are for
performing longer-term analyses for which a slower response is acceptable.
If you have such indices then they are good candidates for becoming _frozen
indices_. {es} builds the transient data structures of each shard of a frozen
index each time that shard is searched, and discards these data structures as
soon as the search is complete. Because {es} does not maintain these transient
data structures in memory, frozen indices consume much less heap than normal
indices. This allows for a much higher disk-to-heap ratio than would otherwise
be possible.
Searches performed on frozen indices use the small, dedicated,
<<search-throttled,`search_throttled` threadpool>> to control the number of
concurrent searches that hit frozen shards on each node. This limits the amount
of extra memory required for the transient data structures corresponding to
frozen shards, which consequently protects nodes against excessive memory
consumption.
Frozen indices are read-only: you cannot index into them.
Searches on frozen indices are expected to execute slowly. Frozen indices are
not intended for high search load. It is possible that a search of a frozen
index may take seconds or minutes to complete, even if the same searches
completed in milliseconds when the indices were not frozen.
--
== Best Practices
Since frozen indices provide a much higher disk to heap ratio at the expense of search latency, it is advisable to allocate frozen indices to
dedicated nodes to prevent searches on frozen indices influencing traffic on low latency nodes. There is significant overhead in loading
data structures on demand which can cause page faults and garbage collections, which further slow down query execution.
Since indices that are eligible for freezing are unlikely to change in the future, disk space can be optimized as described in <<tune-for-disk-usage>>.
It's highly recommended to <<indices-forcemerge,`_forcemerge`>> your indices prior to freezing to ensure that each shard has only a single
segment on disk. This not only provides much better compression but also simplifies the data structures needed to service aggregation
or sorted search requests.
[source,js]
--------------------------------------------------
POST /twitter/_forcemerge?max_num_segments=1
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]
== Searching a frozen index
Frozen indices are throttled in order to limit memory consumptions per node. The number of concurrently loaded frozen indices per node is
limited by the number of threads in the <<search-throttled>> threadpool, which is `1` by default.
Search requests will not be executed against frozen indices by default, even if a frozen index is named explicitly. This is
to prevent accidental slowdowns by targeting a frozen index by mistake. To include frozen indices a search request must be executed with
the query parameter `ignore_throttled=false`.
[source,js]
--------------------------------------------------
GET /twitter/_search?q=user:kimchy&ignore_throttled=false
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]
[IMPORTANT]
================================
While frozen indices are slow to search, they can be pre-filtered efficiently. The request parameter `pre_filter_shard_size` specifies
a threshold that, when exceeded, will enforce a round-trip to pre-filter search shards that cannot possibly match.
This filter phase can limit the number of shards significantly. For instance, if a date range filter is applied, then all indices (frozen or unfrozen) that do not contain documents within the date range can be skipped efficiently.
The default value for `pre_filter_shard_size` is `128` but it's recommended to set it to `1` when searching frozen indices. There is no
significant overhead associated with this pre-filter phase.
================================
== Monitoring frozen indices
Frozen indices are ordinary indices that use search throttling and a memory efficient shard implementation. For API's like the
`<<cat-indices>>` frozen indicies may identified by an index's `search.throttled` property (`sth`).
[source,js]
--------------------------------------------------
GET /_cat/indices/twitter?v&h=i,sth
--------------------------------------------------
// CONSOLE
// TEST[s/^/PUT twitter\nPOST twitter\/_freeze\n/]
The response looks like:
[source,txt]
--------------------------------------------------
i sth
twitter true
--------------------------------------------------
// TESTRESPONSE[_cat]