412 lines
13 KiB
Plaintext
412 lines
13 KiB
Plaintext
[role="xpack"]
|
|
[testenv="basic"]
|
|
[[ilm-with-existing-indices]]
|
|
== Using {ilm-init} with existing indices
|
|
|
|
While it is recommended to use {ilm-init} to manage the index lifecycle from
|
|
start to finish, it may be useful to use {ilm-init} with existing indices,
|
|
particularly when transitioning from an alternative method of managing the index
|
|
lifecycle such as Curator, or when migrating from daily indices to
|
|
rollover-based indices. Such use cases are fully supported, but there are some
|
|
configuration differences from when {ilm-init} can manage the complete index
|
|
lifecycle.
|
|
|
|
This section describes strategies to leverage {ilm-init} for existing periodic
|
|
indices when migrating to fully {ilm-init}-manged indices, which can be done in
|
|
a few different ways, each providing different tradeoffs. As an example, we'll
|
|
walk through a use case of a very simple logging index with just a field for the
|
|
log message and a timestamp.
|
|
|
|
First, we need to create a template for these indices:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT _template/mylogs_template
|
|
{
|
|
"index_patterns": [
|
|
"mylogs-*"
|
|
],
|
|
"settings": {
|
|
"number_of_shards": 1,
|
|
"number_of_replicas": 1
|
|
},
|
|
"mappings": {
|
|
"properties": {
|
|
"message": {
|
|
"type": "text"
|
|
},
|
|
"@timestamp": {
|
|
"type": "date"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
-----------------------
|
|
|
|
And we'll ingest a few documents to create a few daily indices:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
POST mylogs-pre-ilm-2019.06.24/_doc
|
|
{
|
|
"@timestamp": "2019-06-24T10:34:00",
|
|
"message": "this is one log message"
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
[source,console]
|
|
-----------------------
|
|
POST mylogs-pre-ilm-2019.06.25/_doc
|
|
{
|
|
"@timestamp": "2019-06-25T17:42:00",
|
|
"message": "this is another log message"
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
Now that we have these indices, we'll look at a few different ways of migrating
|
|
these indices to ILM.
|
|
|
|
[[ilm-with-existing-periodic-indices]]
|
|
=== Managing existing periodic indices with {ilm-init}
|
|
|
|
NOTE: The examples in this section assume daily indices as set up in
|
|
<<ilm-with-existing-indices,the previous section>>.
|
|
|
|
The simplest way to manage existing indices while transitioning to fully
|
|
{ilm-init}-managed indices is to allow all new indices to be fully managed by
|
|
{ilm-init} before attaching {ilm-init} policies to existing indices. To do this,
|
|
all new documents should be directed to {ilm-init}-managed indices - if you are
|
|
using Beats or Logstash data shippers, upgrading all of those shippers to
|
|
version 7.0.0 or higher will take care of that part for you. If you are not
|
|
using Beats or Logstash, you may need to set up ILM for new indices yourself as
|
|
demonstrated in the <<getting-started-index-lifecycle-management,getting started
|
|
guide>>.
|
|
|
|
NOTE: If you are using Beats through Logstash, you may need to change your
|
|
Logstash output configuration and invoke the Beats setup to use ILM for new
|
|
data.
|
|
|
|
Once all new documents are being written to fully {ilm-init}-managed indices, it
|
|
is easy to add an {ilm-init} policy to existing indices. However, there are two
|
|
things to keep in mind when doing this, and a trick that makes those two things
|
|
much easier to handle.
|
|
|
|
The two biggest things to keep in mind are:
|
|
|
|
1. Existing periodic indices shouldn't use policies with rollover, because
|
|
rollover is used to manage where new data goes. Since existing indices should no
|
|
longer be receiving new documents, there is no point to using rollover for them.
|
|
|
|
2. {ilm-init} policies attached to existing indices will compare the `min_age`
|
|
for each phase to the original creation date of the index, and so might proceed
|
|
through multiple phases immediately.
|
|
|
|
The first one is the most important, because it makes it difficult to use the
|
|
same policy for new and existing periodic indices. But that's easy to solve
|
|
with one simple trick: Create a second policy for existing indices, in addition
|
|
to the one for new indices. {ilm-init} policies are cheap to create, so don't be
|
|
afraid to have more than one. Modifying a policy designed for new indices to be
|
|
used on existing indices is generally very simple: just remove the `rollover`
|
|
action.
|
|
|
|
For example, if you created a policy for your new indices with each phase
|
|
like so:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT _ilm/policy/mylogs_policy
|
|
{
|
|
"policy": {
|
|
"phases": {
|
|
"hot": {
|
|
"actions": {
|
|
"rollover": {
|
|
"max_size": "25GB"
|
|
}
|
|
}
|
|
},
|
|
"warm": {
|
|
"min_age": "1d",
|
|
"actions": {
|
|
"forcemerge": {
|
|
"max_num_segments": 1
|
|
}
|
|
}
|
|
},
|
|
"cold": {
|
|
"min_age": "7d",
|
|
"actions": {
|
|
"freeze": {}
|
|
}
|
|
},
|
|
"delete": {
|
|
"min_age": "30d",
|
|
"actions": {
|
|
"delete": {}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
You can create a policy for pre-existing indices by removing the `rollover`
|
|
action, and in this case, the `hot` phase is now empty so we can remove that
|
|
too:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT _ilm/policy/mylogs_policy_existing
|
|
{
|
|
"policy": {
|
|
"phases": {
|
|
"warm": {
|
|
"min_age": "1d",
|
|
"actions": {
|
|
"forcemerge": {
|
|
"max_num_segments": 1
|
|
}
|
|
}
|
|
},
|
|
"cold": {
|
|
"min_age": "7d",
|
|
"actions": {
|
|
"freeze": {}
|
|
}
|
|
},
|
|
"delete": {
|
|
"min_age": "30d",
|
|
"actions": {
|
|
"delete": {}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
Creating a separate policy for existing indices will also allow using different
|
|
`min_age` values. You may want to use higher values to prevent many indices from
|
|
running through the policy at once, which may be important if your policy
|
|
includes potentially resource-intensive operations like force merge.
|
|
|
|
You can configure the lifecycle for many indices at once by using wildcards in
|
|
the index name when calling the <<indices-update-settings,Update Settings API>>
|
|
to set the policy name, but be careful that you don't include any indices that
|
|
you don't want to change the policy for:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT mylogs-pre-ilm*/_settings <1>
|
|
{
|
|
"index": {
|
|
"lifecycle": {
|
|
"name": "mylogs_policy_existing"
|
|
}
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
<1> This pattern will match all indices with names that start with
|
|
`mylogs-pre-ilm`
|
|
|
|
Once all pre-{ilm-init} indices have aged out and been deleted, the policy for
|
|
older periodic indices can be deleted.
|
|
|
|
[[ilm-reindexing-into-rollover]]
|
|
=== Reindexing via {ilm-init}
|
|
|
|
NOTE: The examples in this section assume daily indices as set up in
|
|
<<ilm-with-existing-indices,the previous section>>.
|
|
|
|
In some cases, it may be useful to reindex data into {ilm-init}-managed indices.
|
|
This is more complex than simply attaching policies to existing indices as
|
|
described in <<ilm-with-existing-periodic-indices,the previous section>>, and
|
|
requires pausing indexing during the reindexing process. However, this technique
|
|
may be useful in cases where periodic indices were created with very small
|
|
amounts of data leading to excessive shard counts, or for indices which grow
|
|
steadily over time, but have not been broken up into time-series indices leading
|
|
to shards which are much too large, situations that cause significant
|
|
performance problems.
|
|
|
|
Before getting started with reindexing data, the new index structure should be
|
|
set up. For this section, we'll be using the same setup described in
|
|
<<ilm-with-existing-indices,{ilm-imit} with existing indices>>.
|
|
|
|
First, we'll set up a policy with rollover, and can include any additional
|
|
phases required. For simplicity, we'll just use rollover:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT _ilm/policy/sample_policy
|
|
{
|
|
"policy": {
|
|
"phases": {
|
|
"hot": {
|
|
"actions": {
|
|
"rollover": {
|
|
"max_age": "7d",
|
|
"max_size": "50G"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
And now we'll update the index template for our indices to include the relevant
|
|
{ilm-init} settings:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT _template/mylogs_template
|
|
{
|
|
"index_patterns": [
|
|
"ilm-mylogs-*" <1>
|
|
],
|
|
"settings": {
|
|
"number_of_shards": 1,
|
|
"number_of_replicas": 1,
|
|
"index": {
|
|
"lifecycle": {
|
|
"name": "mylogs_condensed_policy", <2>
|
|
"rollover_alias": "mylogs" <3>
|
|
}
|
|
}
|
|
},
|
|
"mappings": {
|
|
"properties": {
|
|
"message": {
|
|
"type": "text"
|
|
},
|
|
"@timestamp": {
|
|
"type": "date"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
<1> The new index pattern has a prefix compared to the old one, this will
|
|
make it easier to reindex later
|
|
<2> The name of the policy we defined above
|
|
<3> The name of the alias we'll use to write to and query
|
|
|
|
And create the first index with the alias specified in the `rollover_alias`
|
|
setting in the index template:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT ilm-mylogs-000001
|
|
{
|
|
"aliases": {
|
|
"mylogs": {
|
|
"is_write_index": true
|
|
}
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
All new documents should be indexed via the `mylogs` alias at this point. Adding
|
|
new data to the old indices during the reindexing process can cause data to be
|
|
added to the old indices, but not be reindexed into the new indices.
|
|
|
|
NOTE: If you do not want to mix new data and old data in the new ILM-managed
|
|
indices, indexing of new data should be paused entirely while the reindex
|
|
completes. Mixing old and new data within one index is safe, but keep in mind
|
|
that the indices with mixed data should be retained in their entirety until you
|
|
are ready to delete both the old and new data.
|
|
|
|
By default, {ilm-init} only checks rollover conditions every 10 minutes. Under
|
|
normal indexing load, this usually works well, but during reindexing, indices
|
|
can grow very, very quickly. We'll need to set the poll interval to something
|
|
shorter to ensure that the new indices don't grow too large while waiting for
|
|
the rollover check:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"transient": {
|
|
"indices.lifecycle.poll_interval": "1m" <1>
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[skip:don't want to overwrite this setting for other tests]
|
|
|
|
<1> This tells ILM to check for rollover conditions every minute
|
|
|
|
We're now ready to reindex our data using the <<docs-reindex,reindex API>>. If
|
|
you have a timestamp or date field in your documents, as in this example, it may
|
|
be useful to specify that the documents should be sorted by that field - this
|
|
will mean that all documents in `ilm-mylogs-000001` come before all documents in
|
|
`ilm-mylogs-000002`, and so on. However, if this is not a requirement, omitting
|
|
the sort will allow the data to be reindexed more quickly.
|
|
|
|
NOTE: Sorting in reindex is deprecated, see
|
|
<<docs-reindex-api-request-body,reindex request body>>. Instead use timestamp
|
|
ranges to partition data in separate reindex runs.
|
|
|
|
IMPORTANT: If your data uses document IDs generated by means other than
|
|
Elasticsearch's automatic ID generation, you may need to do additional
|
|
processing to ensure that the document IDs don't conflict during the reindex, as
|
|
documents will retain their original IDs. One way to do this is to use a
|
|
<<reindex-scripts,script>> in the reindex call to append the original index name
|
|
to the document ID.
|
|
|
|
[source,console]
|
|
-----------------------
|
|
POST _reindex
|
|
{
|
|
"source": {
|
|
"index": "mylogs-*", <1>
|
|
"sort": { "@timestamp": "desc" }
|
|
},
|
|
"dest": {
|
|
"index": "mylogs", <2>
|
|
"op_type": "create" <3>
|
|
}
|
|
}
|
|
-----------------------
|
|
// TEST[continued]
|
|
|
|
<1> This index pattern matches our existing indices. Using the prefix for
|
|
the new indices makes using this index pattern much easier.
|
|
<2> The alias set up above
|
|
<3> This option will cause the reindex to abort if it encounters multiple
|
|
documents with the same ID. This is optional, but recommended to prevent
|
|
accidentally overwriting documents if two documents from different indices
|
|
have the same ID.
|
|
|
|
Once this completes, indexing new data can be resumed, as long as all new
|
|
documents are indexed into the alias used above. All data, existing and new, can
|
|
be queried using that alias as well. We should also be sure to set the
|
|
{ilm-init} poll interval back to its default value, because keeping it set too
|
|
low can cause unnecessary load on the current master node:
|
|
|
|
[source,console]
|
|
-----------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"transient": {
|
|
"indices.lifecycle.poll_interval": null
|
|
}
|
|
}
|
|
|
|
-----------------------
|
|
// TEST[skip:don't want to overwrite this setting for other tests]
|
|
|
|
All of the reindexed data should now be accessible via the alias set up above,
|
|
in this case `mylogs`. Once you have verified that all the data has been
|
|
reindexed and is available in the new indices, the existing indices can be
|
|
safely removed.
|