OpenSearch/docs/reference/ilm/ilm-with-existing-indices.a...

203 lines
6.6 KiB
Plaintext

[role="xpack"]
[testenv="basic"]
[[ilm-with-existing-indices]]
== Manage existing indices
If you've been using Curator or some other mechanism to manage periodic indices,
you have a couple options when migrating to {ilm-init}:
* Set up your index templates to use an {ilm-init} policy to manage your new indices.
Once {ilm-init} is managing your current write index, you can apply an appropriate policy to your old indices.
* Reindex into an {ilm-init}-managed index.
NOTE: Starting in Curator version 5.7, Curator ignores {ilm-init} managed indices.
[discrete]
[[ilm-existing-indices-apply]]
=== Apply policies to existing time series indices
The simplest way to transition to managing your periodic indices with {ilm-init} is
to <<apply-policy-template, configure an index template>> to apply a lifecycle policy to new indices.
Once the index you are writing to is being managed by {ilm-init},
you can <<apply-policy-multiple, manually apply a policy>> to your older indices.
Define a separate policy for your older indices that omits the rollover action.
Rollover is used to manage where new data goes, so isn't applicable.
Keep in mind that policies applied to existing indices compare the `min_age` for each phase to
the original creation date of the index, and might proceed through multiple phases immediately.
If your policy performs resource-intensive operations like force merge,
you don't want to have a lot of indices performing those operations all at once
when you switch over to {ilm-init}.
You can specify different `min_age` values in the policy you use for existing indices,
or set <<index-lifecycle-origination-date, `index.lifecycle.origination_date`>>
to control how the index age is calculated.
Once all pre-{ilm-init} indices have been aged out and removed,
you can delete the policy you used to manage them.
NOTE: If you are using {beats} or {ls}, enabling {ilm-init} in version 7.0 and onward
sets up {ilm-init} to manage new indices automatically.
If you are using {beats} through {ls},
you might need to change your {ls} output configuration and invoke the {beats} setup
to use {ilm-init} for new data.
[discrete]
[[ilm-existing-indices-reindex]]
=== Reindex into a managed index
An alternative to <<ilm-with-existing-periodic-indices,applying policies to existing indices>> is to
reindex your data into an {ilm-init}-managed index.
You might want to do this if creating periodic indices with very small amounts of data
has led to excessive shard counts, or if continually indexing into the same index has led to large shards
and performance issues.
First, you need to set up the new {ilm-init}-managed index:
. Update your index template to include the necessary {ilm-init} settings.
. Bootstrap an initial index as the write index.
. Stop writing to the old indices and index new documents using the alias that points to bootstrapped index.
To reindex into the managed index:
. Pause indexing new documents if you do not want to mix new and old data in the {ilm-init}-managed index.
Mixing old and new data in one index is safe,
but a combined index needs to be retained until you are ready to delete the new data.
. Reduce the {ilm-init} poll interval to ensure that the index doesn't
grow too large while waiting for the rollover check.
By default, {ilm-init} checks to see what actions need to be taken every 10 minutes.
+
--
[source,console]
-----------------------
PUT _cluster/settings
{
"transient": {
"indices.lifecycle.poll_interval": "1m" <1>
}
}
-----------------------
// TEST[skip:don't want to overwrite this setting for other tests]
<1> Check once a minute to see if {ilm-init} actions such as rollover need to be performed.
--
. Reindex your data using the <<docs-reindex,reindex API>>.
If you want to partition the data in the order in which it was originally indexed,
you can run separate reindex requests.
+
--
IMPORTANT: Documents retain their original IDs. If you don't use automatically generated document IDs,
and are reindexing from multiple source indices, you might need to do additional processing to
ensure that document IDs don't conflict. One way to do this is to use a
<<reindex-scripts,script>> in the reindex call to append the original index name
to the document ID.
//////////////////////////
[source,console]
-----------------------
PUT _index_template/mylogs_template
{
"index_patterns": [
"mylogs-*"
],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index": {
"lifecycle": {
"name": "mylogs_condensed_policy", <2>
"rollover_alias": "mylogs" <3>
}
}
},
"mappings": {
"properties": {
"message": {
"type": "text"
},
"@timestamp": {
"type": "date"
}
}
}
}
}
-----------------------
[source,console]
-----------------------
POST mylogs-pre-ilm-2019.06.24/_doc
{
"@timestamp": "2019-06-24T10:34:00",
"message": "this is one log message"
}
-----------------------
// TEST[continued]
[source,console]
-----------------------
POST mylogs-pre-ilm-2019.06.25/_doc
{
"@timestamp": "2019-06-25T17:42:00",
"message": "this is another log message"
}
-----------------------
// TEST[continued]
[source,console]
--------------------------------------------------
DELETE _index_template/mylogs_template
--------------------------------------------------
// TEST[continued]
//////////////////////////
[source,console]
-----------------------
POST _reindex
{
"source": {
"index": "mylogs-*" <1>
},
"dest": {
"index": "mylogs", <2>
"op_type": "create" <3>
}
}
-----------------------
// TEST[continued]
<1> Matches your existing indices. Using the prefix for
the new indices makes using this index pattern much easier.
<2> The alias that points to your bootstrapped index.
<3> Halts reindexing if multiple documents have the same ID.
This is recommended to prevent accidentally overwriting documents
if documents in different source indices have the same ID.
--
. When reindexing is complete, set the {ilm-init} poll interval back to its default value to
prevent unnecessary load on the master node:
+
[source,console]
-----------------------
PUT _cluster/settings
{
"transient": {
"indices.lifecycle.poll_interval": null
}
}
-----------------------
// TEST[skip:don't want to overwrite this setting for other tests]
. Resume indexing new data using the same alias.
+
Querying using this alias will now search your new data and all of the reindexed data.
. Once you have verified that all of the reindexed data is available in the new managed indices,
you can safely remove the old indices.