---
layout: default
title: Reindex data
nav_order: 16
---

# Reindex data

After creating an index, you might need to make an extensive change such as adding a new field to every document or combining multiple indices to form a new one. Rather than deleting your index, making the change offline, and then indexing your data all over again, you can use the `reindex` operation.

With the `reindex` operation, you can copy all or a subset of documents that you select through a query to another index. Reindex is a `POST` operation. In its most basic form, you specify a source index and a destination index.

Reindexing can be an expensive operation depending on the size of your source index. We recommend you disable replicas in your destination index by setting `number_of_replicas` to `0` and re-enable them once the reindex process is complete.
{: .note }

---

#### Table of contents
1. TOC
{:toc}

---

## Reindex all documents

You can copy all documents from one index to another.

You first need to create a destination index with your desired field mappings and settings or you can copy the ones from your source index:

```json
PUT destination
{
   "mappings":{
      "Add in your desired mappings"
   },
   "settings":{
      "Add in your desired settings"
   }
}
```

This `reindex` command copies all the documents from a source index to a destination index:

```json
POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination"
   }
}
```

If the destination index is not already created, the `reindex` operation creates a new destination index with default configurations.

## Reindex from a remote cluster

You can copy documents from an index in a remote cluster. Use the `remote` option to specify the remote hostname and the required login credentials.
This command reaches out to a remote cluster, logs in with the username and password, and copies all the documents from the source index in that remote cluster to the destination index in your local cluster:

```json
POST _reindex
{
   "source":{
      "remote":{
         "host":"https://:9200",
         "username":"YOUR_USERNAME",
         "password":"YOUR_PASSWORD"
      },
      "index":"source"
   },
   "dest":{
      "index":"destination"
   }
}
```

You can specify the following options:

Options | Valid values | Description | Required
:--- | :--- | :--- | :---
`host` | String | The REST endpoint of the remote cluster. | Yes
`username` | String | The username to log into the remote cluster. | No
`password` | String | The password to log into the remote cluster. | No
`socket_timeout` | Time Unit | The wait time for socket reads (default 30s). | No
`connect_timeout` | Time Unit | The wait time for remote connection timeouts (default 30s). | No

## Reindex a subset of documents

You can copy a specific set of documents that match a search query.

This command copies only a subset of documents matched by a query operation to the destination index:

```json
POST _reindex
{
   "source":{
      "index":"source",
      "query":{
         "match":{
            "field_name":"text"
         }
      }
   },
   "dest":{
      "index":"destination"
   }
}
```

For a list of all query operations, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/).

## Combine one or more indices

You can combine documents from one or more indices by adding the source indices as a list.

This command copies all documents from two source indices to one destination index:

```json
POST _reindex
{
   "source":{
      "index":[
         "source_1",
         "source_2"
      ]
   },
   "dest":{
      "index":"destination"
   }
}
```

Make sure the number of shards for your source and destination indices is the same.

## Reindex only unique documents

You can copy only documents missing from a destination index by setting the `op_type` option to `create`. In this case, if a document with the same ID already exists, the operation ignores the one from the source index.
To ignore all version conflicts of documents, set the `conflicts` option to `proceed`.

```json
POST _reindex
{
   "conflicts":"proceed",
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination",
      "op_type":"create"
   }
}
```

## Reindex sorted documents

You can copy certain documents after sorting specific fields in the document.

This command copies the last 10 documents based on the `timestamp` field:

```json
POST _reindex
{
   "size":10,
   "source":{
      "index":"source",
      "sort":{
         "timestamp":"desc"
      }
   },
   "dest":{
      "index":"destination"
   }
}
```

## Transform documents during reindexing

You can transform your data during the reindexing process using the `script` option. We recommend Painless for scripting in OpenSearch.

This command runs the source index through a Painless script that increments a `number` field inside an `account` object before copying it to the destination index:

```json
POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination"
   },
   "script":{
      "lang":"painless",
      "source":"ctx._source.account.number++"
   }
}
```

You can also specify an ingest pipeline to transform your data during the reindexing process.

You first have to create a pipeline with `processors` defined. A number of different `processors` are available to use in your ingest pipeline.

Here's a sample ingest pipeline that defines a `split` processor that splits a `text` field on whitespace and stores the result in a new `word` field. The `script` processor runs a Painless script that finds the length of the `word` field and stores it in a new `word_count` field. The `remove` processor removes the `test` field.

```json
PUT _ingest/pipeline/pipeline-test
{
   "description": "Splits the text field into a list. Computes the length of the 'word' field and stores it in a new 'word_count' field. Removes the 'test' field.",
   "processors": [
      {
         "split": {
            "field": "text",
            "separator": "\\s+",
            "target_field": "word"
         }
      },
      {
         "script": {
            "lang": "painless",
            "source": "ctx.word_count = ctx.word.length"
         }
      },
      {
         "remove": {
            "field": "test"
         }
      }
   ]
}
```

After creating a pipeline, you can use the `reindex` operation:

```json
POST _reindex
{
   "source": {
      "index": "source"
   },
   "dest": {
      "index": "destination",
      "pipeline": "pipeline-test"
   }
}
```

## Update documents in the current index

To update the data in your current index itself without copying it to a different index, use the `update_by_query` operation.

The `update_by_query` operation is a `POST` operation that you can perform on a single index at a time.

```json
POST /_update_by_query
```

If you run this command with no parameters, it increments the version number for all documents in the index.

## Source index options

You can specify the following options for your source index:

Option | Valid values | Description | Required
:--- | :--- | :--- | :---
`index` | String | The name of the source index. You can provide multiple source indices as a list. | Yes
`max_docs` | Integer | The maximum number of documents to reindex. | No
`query` | Object | The search query to use for the reindex operation. | No
`size` | Integer | The number of documents to reindex. | No
`slice` | String | Specify manual or automatic slicing to parallelize reindexing. | No
`sort` | List | Sort specific fields in the document before reindexing. | No

## Destination index options

You can specify the following options for your destination index:

Option | Valid values | Description | Required
:--- | :--- | :--- | :---
`index` | String | The name of the destination index. | Yes
`version_type` | Enum | The version type for the indexing operation. Valid values: internal, external, external_gt, external_gte. | No
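
As an illustration of how the destination options fit into a request, the following sketch sets `version_type` to `external`. With this setting, the reindex operation is expected to preserve document versions from the source, create documents that are missing from the destination, and overwrite destination documents only when the source holds a newer version. The index names `source` and `destination` are the same placeholders used throughout this page, not real indices:

```json
POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination",
      "version_type":"external"
   }
}
```

Destination documents with an equal or newer version produce version conflicts. To skip them instead of aborting the operation, you can combine this request with the `"conflicts":"proceed"` option.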