Add guide for using ILM with existing indices (#43667)
In many cases, including migration from previous versions of data shippers (e.g. Beats), it is useful to use ILM to manage historical indices, which are no longer being written to. This commit adds a guide which gives an example of how to do that.
parent 87fb487299
commit d0d808776f
@ -252,7 +252,7 @@ POST _reindex
// CONSOLE
// TEST[setup:twitter]

[[reindex-scripts]]
Like `_update_by_query`, `_reindex` supports a script that modifies the
document. Unlike `_update_by_query`, the script is allowed to modify the
document's metadata. This example bumps the version of the source document:

@ -0,0 +1,416 @@
[role="xpack"]
[testenv="basic"]
[[ilm-with-existing-indices]]
== Using {ilm-init} with existing indices

While it is recommended to use {ilm-init} to manage the index lifecycle from
start to finish, it may be useful to use {ilm-init} with existing indices,
particularly when transitioning from an alternative method of managing the index
lifecycle such as Curator, or when migrating from daily indices to
rollover-based indices. Such use cases are fully supported, but there are some
configuration differences from when {ilm-init} can manage the complete index
lifecycle.

This section describes strategies for applying {ilm-init} to existing periodic
indices while migrating to fully {ilm-init}-managed indices. There are a few
ways to do this, each with different tradeoffs. As an example, we'll
walk through a use case of a very simple logging index with just a field for the
log message and a timestamp.

First, we need to create a template for these indices:

[source,js]
-----------------------
PUT _template/mylogs_template
{
  "index_patterns": [
    "mylogs-*"
  ],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      },
      "@timestamp": {
        "type": "date"
      }
    }
  }
}
-----------------------
// CONSOLE
// TEST

And we'll ingest a few documents to create a few daily indices:

[source,js]
-----------------------
POST mylogs-pre-ilm-2019.06.24/_doc
{
  "@timestamp": "2019-06-24T10:34:00",
  "message": "this is one log message"
}
-----------------------
// CONSOLE
// TEST[continued]

[source,js]
-----------------------
POST mylogs-pre-ilm-2019.06.25/_doc
{
  "@timestamp": "2019-06-25T17:42:00",
  "message": "this is another log message"
}
-----------------------
// CONSOLE
// TEST[continued]

Now that we have these indices, we'll look at a few different ways of migrating
these indices to {ilm-init}.

[[ilm-with-existing-periodic-indices]]
=== Managing existing periodic indices with {ilm-init}

NOTE: The examples in this section assume daily indices as set up in
<<ilm-with-existing-indices,the previous section>>.

The simplest way to manage existing indices while transitioning to fully
{ilm-init}-managed indices is to allow all new indices to be fully managed by
{ilm-init} before attaching {ilm-init} policies to existing indices. To do this,
all new documents should be directed to {ilm-init}-managed indices - if you are
using Beats or Logstash data shippers, upgrading all of those shippers to
version 7.0.0 or higher will take care of that part for you. If you are not
using Beats or Logstash, you may need to set up {ilm-init} for new indices
yourself as demonstrated in the
<<getting-started-index-lifecycle-management,getting started guide>>.

NOTE: If you are using Beats through Logstash, you may need to change your
Logstash output configuration and invoke the Beats setup to use {ilm-init} for
new data.

Once all new documents are being written to fully {ilm-init}-managed indices, it
is easy to add an {ilm-init} policy to existing indices. However, there are two
things to keep in mind when doing this, and a trick that makes those two things
much easier to handle.

The two biggest things to keep in mind are:

1. Existing periodic indices shouldn't use policies with rollover, because
rollover is used to manage where new data goes. Since existing indices should no
longer be receiving new documents, there is no point to using rollover for them.

2. {ilm-init} policies attached to existing indices will compare the `min_age`
for each phase to the original creation date of the index, and so might proceed
through multiple phases immediately.

The first one is the most important, because it makes it difficult to use the
same policy for new and existing periodic indices. But that's easy to solve
with one simple trick: Create a second policy for existing indices, in addition
to the one for new indices. {ilm-init} policies are cheap to create, so don't be
afraid to have more than one. Modifying a policy designed for new indices to be
used on existing indices is generally very simple: just remove the `rollover`
action.

For example, suppose you created the following policy, which uses every phase,
for your new indices:
[source,js]
-----------------------
PUT _ilm/policy/mylogs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25GB"
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "7d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
-----------------------
// CONSOLE
// TEST[continued]

You can create a policy for pre-existing indices by removing the `rollover`
action, and in this case, the `hot` phase is now empty so we can remove that
too:

[source,js]
-----------------------
PUT _ilm/policy/mylogs_policy_existing
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "7d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
-----------------------
// CONSOLE
// TEST[continued]

Creating a separate policy for existing indices also lets you use different
`min_age` values. You may want to use higher values to prevent many indices from
running through the policy at once, which may be important if your policy
includes potentially resource-intensive operations like force merge.

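Before choosing `min_age` values for the existing-index policy, it can help to
see how old the existing indices already are. One possible way to check,
sketched here with the cat indices API and the `mylogs-pre-ilm*` pattern used
in this example:

[source,js]
-----------------------
GET _cat/indices/mylogs-pre-ilm*?v&h=index,creation.date.string&s=index
-----------------------
// CONSOLE
// TEST[skip:illustrative sketch]

Because `min_age` is compared to the creation date, any index listed here that
is already more than 30 days old would reach the `delete` phase of
`mylogs_policy_existing` shortly after the policy is attached.
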
You can configure the lifecycle for many indices at once by using wildcards in
the index name when calling the <<indices-update-settings,Update Settings API>>
to set the policy name, but be careful that you don't include any indices that
you don't want to change the policy for:

[source,js]
-----------------------
PUT mylogs-pre-ilm*/_settings <1>
{
  "index": {
    "lifecycle": {
      "name": "mylogs_policy_existing"
    }
  }
}
-----------------------
// CONSOLE
// TEST[continued]

<1> This pattern will match all indices with names that start with
`mylogs-pre-ilm`

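To confirm that the policy was attached, and to see which phase each index has
reached, the {ilm-init} explain API can be called with the same index pattern.
A minimal sketch, assuming the index names used in this example:

[source,js]
-----------------------
GET mylogs-pre-ilm*/_ilm/explain
-----------------------
// CONSOLE
// TEST[skip:illustrative sketch]

For each matching index, the response reports whether the index is managed,
the name of its policy, and its current phase, action, and step.
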
Once all pre-{ilm-init} indices have aged out and been deleted, the policy for
older periodic indices can be deleted.

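With the policy name used in this example, that cleanup might look like the
following. Note that the request fails while any index is still managed by
the policy, so it only succeeds after the last pre-{ilm-init} index is gone:

[source,js]
-----------------------
DELETE _ilm/policy/mylogs_policy_existing
-----------------------
// CONSOLE
// TEST[skip:illustrative sketch]
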
[[ilm-reindexing-into-rollover]]
=== Reindexing via {ilm-init}

NOTE: The examples in this section assume daily indices as set up in
<<ilm-with-existing-indices,the previous section>>.

In some cases, it may be useful to reindex data into {ilm-init}-managed indices.
This is more complex than simply attaching policies to existing indices as
described in <<ilm-with-existing-periodic-indices,the previous section>>, and
requires pausing indexing during the reindexing process. However, this technique
may be useful in cases where periodic indices were created with very small
amounts of data, leading to excessive shard counts, or for indices which grow
steadily over time but have not been broken up into time-series indices, leading
to shards which are much too large. Both situations can cause significant
performance problems.

Before getting started with reindexing data, the new index structure should be
set up. For this section, we'll be using the same setup described in
<<ilm-with-existing-indices,{ilm-init} with existing indices>>.

First, we'll set up a policy with rollover, and can include any additional
phases required. For simplicity, we'll just use rollover:

[source,js]
-----------------------
PUT _ilm/policy/mylogs_condensed_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50GB"
          }
        }
      }
    }
  }
}
-----------------------
// CONSOLE
// TEST[continued]

And now we'll update the index template for our indices to include the relevant
{ilm-init} settings:

[source,js]
-----------------------
PUT _template/mylogs_template
{
  "index_patterns": [
    "ilm-mylogs-*" <1>
  ],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index": {
      "lifecycle": {
        "name": "mylogs_condensed_policy", <2>
        "rollover_alias": "mylogs" <3>
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      },
      "@timestamp": {
        "type": "date"
      }
    }
  }
}
-----------------------
// CONSOLE
// TEST[continued]
<1> The new index pattern adds a prefix to the old one. This keeps the old
pattern from matching the new indices, which makes reindexing easier later
<2> The name of the policy we defined above
<3> The name of the alias we'll use to write to and query

And create the first index with the alias specified in the `rollover_alias`
setting in the index template:

[source,js]
-----------------------
PUT ilm-mylogs-000001
{
  "aliases": {
    "mylogs": {
      "is_write_index": true
    }
  }
}
-----------------------
// CONSOLE
// TEST[continued]

All new documents should be indexed via the `mylogs` alias at this point. Any
documents written to the old indices while the reindex is running may be missed
by the reindex, leaving them in the old indices but absent from the new ones.

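As a small illustration of writing through the alias, the request below
indexes one document into `mylogs`. The timestamp and message are made-up
sample values. Because `ilm-mylogs-000001` was created with `is_write_index`
set to `true`, the document lands in whichever index is the current write
index for the alias:

[source,js]
-----------------------
POST mylogs/_doc
{
  "@timestamp": "2019-06-26T09:15:00",
  "message": "this log message is written through the alias"
}
-----------------------
// CONSOLE
// TEST[skip:illustrative sketch]
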
NOTE: If you do not want to mix new data and old data in the new
{ilm-init}-managed indices, indexing of new data should be paused entirely while
the reindex completes. Mixing old and new data within one index is safe, but
keep in mind that the indices with mixed data should be retained in their
entirety until you are ready to delete both the old and new data.

By default, {ilm-init} only checks rollover conditions every 10 minutes. Under
normal indexing load, this usually works well, but during reindexing, indices
can grow very quickly. We'll need to set the poll interval to something
shorter to ensure that the new indices don't grow too large while waiting for
the rollover check:

[source,js]
-----------------------
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m" <1>
  }
}
-----------------------
// CONSOLE
// TEST[skip:don't want to overwrite this setting for other tests]
<1> This tells {ilm-init} to check for rollover conditions every minute

We're now ready to reindex our data using the <<docs-reindex,reindex API>>. If
you have a timestamp or date field in your documents, as in this example, it may
be useful to specify that the documents should be sorted by that field - this
will mean that all documents in `ilm-mylogs-000001` come before all documents in
`ilm-mylogs-000002`, and so on. However, if this is not a requirement, omitting
the sort will allow the data to be reindexed more quickly.

IMPORTANT: If your data uses document IDs generated by means other than
Elasticsearch's automatic ID generation, you may need to do additional
processing to ensure that the document IDs don't conflict during the reindex, as
documents will retain their original IDs. One way to do this is to use a
<<reindex-scripts,script>> in the reindex call to append the original index name
to the document ID.

[source,js]
-----------------------
POST _reindex
{
  "source": {
    "index": "mylogs-*", <1>
    "sort": { "@timestamp": "desc" }
  },
  "dest": {
    "index": "mylogs", <2>
    "op_type": "create" <3>
  }
}
-----------------------
// CONSOLE
// TEST[continued]
<1> This index pattern matches our existing indices. Using the prefix for
the new indices makes using this index pattern much easier.
<2> The alias set up above
<3> This option will cause the reindex to abort if it encounters multiple
documents with the same ID. This is optional, but recommended to prevent
accidentally overwriting documents if two documents from different indices
have the same ID.

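If your documents do carry custom IDs that might collide across the daily
indices, the script-based approach mentioned above could look roughly like the
following variant of the reindex request. This is a sketch, not a tested
recipe: it assumes that `ctx._index` holds the name of the source index when
the script runs, and the `-` separator is an arbitrary choice.

[source,js]
-----------------------
POST _reindex
{
  "source": {
    "index": "mylogs-*"
  },
  "dest": {
    "index": "mylogs",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._id = ctx._id + '-' + ctx._index"
  }
}
-----------------------
// CONSOLE
// TEST[skip:illustrative sketch]

This request would be used instead of the one above, not in addition to it.
Keep in mind that any client that looks documents up by ID would then need to
use the new, suffixed IDs.
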
Once this completes, indexing new data can be resumed, as long as all new
documents are indexed into the alias used above. All data, existing and new, can
be queried using that alias as well. We should also be sure to set the
{ilm-init} poll interval back to its default value, because keeping it set too
low can cause unnecessary load on the current master node:

[source,js]
-----------------------
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": null
  }
}
-----------------------
// CONSOLE
// TEST[skip:don't want to overwrite this setting for other tests]

All of the reindexed data should now be accessible via the alias set up above,
in this case `mylogs`. Once you have verified that all the data has been
reindexed and is available in the new indices, the existing indices can be
safely removed.

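One simple way to do that verification, sketched here with the index names from
this example, is to compare document counts before deleting anything. The
counts should match only if no new documents were indexed through the alias
in the meantime, and the wildcard delete is rejected on clusters where
`action.destructive_requires_name` is set to `true`:

[source,js]
-----------------------
GET mylogs-pre-ilm*/_count

GET mylogs/_count

DELETE mylogs-pre-ilm*
-----------------------
// CONSOLE
// TEST[skip:illustrative sketch]
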
@ -84,4 +84,6 @@ include::ilm-and-snapshots.asciidoc[]
include::start-stop-ilm.asciidoc[]

include::ilm-with-existing-indices.asciidoc[]

include::getting-started-slm.asciidoc[]