opensearch-docs-cn/docs/opensearch/reindex-data.md

---
layout: default
title: Reindex data
parent: OpenSearch
nav_order: 6
---

# Reindex data

After creating an index, you might need to make an extensive change such as adding a new field to every document or combining multiple indices to form a new one. Rather than deleting your index, making the change offline, and then indexing your data all over again, you can use the `reindex` operation.

With the `reindex` operation, you can copy all or a subset of documents that you select through a query to another index. Reindex is a `POST` operation. In its most basic form, you specify a source index and a destination index.

Reindexing can be an expensive operation depending on the size of your source index. We recommend you disable replicas in your destination index by setting `number_of_replicas` to `0` and re-enable them once the reindex process is complete.
{: .note }

---

#### Table of contents
1. TOC
{:toc}


---

## Reindex all documents

You can copy all documents from one index to another.

You first need to create a destination index with your desired field mappings and settings or you can copy the ones from your source index:

```json
PUT destination
{
   "mappings":{
      "Add in your desired mappings"
   },
   "settings":{
      "Add in your desired settings"
   }
}
```

This `reindex` command copies all the documents from a source index to a destination index:

```json
POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination"
   }
}
```

If the destination index is not already created, the `reindex` operation creates a new destination index with default configurations.

## Reindex from a remote cluster

You can copy documents from an index in a remote cluster. Use the `remote` option to specify the remote hostname and the required login credentials.

This command reaches out to a remote cluster, logs in with the username and password, and copies all the documents from the source index in that remote cluster to the destination index in your local cluster:

```json
POST _reindex
{
   "source":{
      "remote":{
         "host":"https://<REST_endpoint_of_remote_cluster>:9200",
         "username":"YOUR_USERNAME",
         "password":"YOUR_PASSWORD"
      }
   },
   "dest":{
      "index":"destination"
   }
}
```

You can specify the following options:

Options | Valid values | Description | Required
:--- | :--- | :---
`host` | String | The REST endpoint of the remote cluster. | Yes
`username` | String | The username to log into the remote cluster. | No
`password` | String | The password to log into the remote cluster. | No
`socket_timeout` | Time Unit | The wait time for socket reads (default 30s). | No
`connect_timeout` | Time Unit | The wait time for remote connection timeouts (default 30s). | No


## Reindex a subset of documents

You can copy a specific set of documents that match a search query.

This command copies only a subset of documents matched by a query operation to the destination index:

```json
POST _reindex
{
   "source":{
      "index":"source",
      "query": {
        "match": {
           "field_name": "text"
         }
      }
   },
   "dest":{
      "index":"destination"
   }
}
```

For a list of all query operations, see [Full-text queries](../full-text/).

## Combine one or more indices

You can combine documents from one or more indices by adding the source indices as a list.

This command copies all documents from two source indices to one destination index:

```json
POST _reindex
{
   "source":{
      "index":[
         "source_1",
         "source_2"
      ]
   },
   "dest":{
      "index":"destination"
   }
}
```
Make sure the number of shards for your source and destination indices are the same.

## Reindex only unique documents

You can copy only documents missing from a destination index by setting the `op_type` option to `create`.
In this case, if a document with the same ID already exists, the operation ignores the one from the source index.
To ignore all version conflicts of documents, set the `conflicts` option to `proceed`.

```json
POST _reindex
{
   "conflicts":"proceed",
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination",
      "op_type":"create"
   }
}
```

## Reindex sorted documents

You can copy certain documents after sorting specific fields in the document.

This command copies the last 10 documents based on the `timestamp` field:

```json
POST _reindex
{
   "size":10,
   "source":{
      "index":"source",
      "sort":{
         "timestamp":"desc"
      }
   },
   "dest":{
      "index":"destination"
   }
}
```

## Transform documents during reindexing

You can transform your data during the reindexing process using the `script` option.
We recommend Painless for scripting in OpenSearch.

This command runs the source index through a Painless script that increments a `number` field inside an `account` object before copying it to the destination index:

```json
POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination"
   },
   "script":{
      "lang":"painless",
      "source":"ctx._account.number++"
   }
}
```

You can also specify an ingest pipeline to transform your data during the reindexing process.

You would first have to create a pipeline with `processors` defined. You have a number of different `processors` available to use in your ingest pipeline.

Here's a sample ingest pipeline that defines a `split` processor that splits a `text` field based on a `space` separator and stores it in a new `word` field. The `script` processor is a Painless script that finds the length of the `word` field and stores it in a new `word_count` field. The `remove` processor removes the `test` field.

```json
PUT _ingest/pipeline/pipeline-test
{
"description": "Splits the text field into a list. Computes the length of the 'word' field and stores it in a new 'word_count' field. Removes the 'test' field.",
"processors": [
 {
   "split": {
     "field": "text",
     "separator": "\\s+",
     "target_field": "word"
   },
 }
 {
   "script": {
     "lang": "painless",
     "source": "ctx.word_count = ctx.word.length"
   }
 },
 {
   "remove": {
     "field": "test"
   }
 }
]
}
```

After creating a pipeline, you can use the `reindex` operation:

```json
POST _reindex
{
  "source": {
    "index": "source",
  },
  "dest": {
    "index": "destination",
    "pipeline": "pipeline-test"
  }
}
```

## Update documents in the current index

To update the data in your current index itself without copying it to a different index, use the `update_by_query` operation.

The `update_by_query` operation is `POST` operation that you can perform on a single index at a time.

```json
POST <index_name>/_update_by_query
```

If you run this command with no parameters, it increments the version number for all documents in the index.

## Source index options

You can specify the following options for your source index:

Option | Valid values | Description | Required
:--- | :--- | :---
`index` | String | The name of the source index. You can provide multiple source indices as a list. | Yes
`max_docs` | Integer | The maximum number of documents to reindex. | No
`query` | Object | The search query to use for the reindex operation. | No
`size` | Integer | The number of documents to reindex. | No
`slice` | String | Specify manual or automatic slicing to parallelize reindexing. | No
`sort` | List | Sort specific fields in the document before reindexing. | No

## Destination index options

You can specify the following options for your destination index:

Option | Valid values | Description | Required
:--- | :--- | :---
`index` | String | The name of the destination index. | Yes
`version_type` | Enum | The version type for the indexing operation. Valid values: internal, external, external_gt, external_gte. | No
Initial documentation cut 2021-05-05 10:09:47 -07:00			`---`
			`layout: default`
OpenSearch chapter fixes 2021-05-05 14:21:50 -07:00			`title: Reindex data`
Initial documentation cut 2021-05-05 10:09:47 -07:00			`parent: OpenSearch`
			`nav_order: 6`
			`---`

			`# Reindex data`

OpenSearch chapter fixes 2021-05-05 14:21:50 -07:00			After creating an index, you might need to make an extensive change such as adding a new field to every document or combining multiple indices to form a new one. Rather than deleting your index, making the change offline, and then indexing your data all over again, you can use the `reindex` operation.
Initial documentation cut 2021-05-05 10:09:47 -07:00
			With the `reindex` operation, you can copy all or a subset of documents that you select through a query to another index. Reindex is a `POST` operation. In its most basic form, you specify a source index and a destination index.

			Reindexing can be an expensive operation depending on the size of your source index. We recommend you disable replicas in your destination index by setting `number_of_replicas` to `0` and re-enable them once the reindex process is complete.
			`{: .note }`

			`---`

			`#### Table of contents`
			`1. TOC`
			`{:toc}`


			`---`

			`## Reindex all documents`

			`You can copy all documents from one index to another.`

			`You first need to create a destination index with your desired field mappings and settings or you can copy the ones from your source index:`

			```json
			`PUT destination`
			`{`
			`"mappings":{`
			`"Add in your desired mappings"`
			`},`
			`"settings":{`
			`"Add in your desired settings"`
			`}`
			`}`
			```

			This `reindex` command copies all the documents from a source index to a destination index:

			```json
			`POST _reindex`
			`{`
			`"source":{`
			`"index":"source"`
			`},`
			`"dest":{`
			`"index":"destination"`
			`}`
			`}`
			```

			If the destination index is not already created, the `reindex` operation creates a new destination index with default configurations.

			`## Reindex from a remote cluster`

			You can copy documents from an index in a remote cluster. Use the `remote` option to specify the remote hostname and the required login credentials.

			`This command reaches out to a remote cluster, logs in with the username and password, and copies all the documents from the source index in that remote cluster to the destination index in your local cluster:`

			```json
			`POST _reindex`
			`{`
			`"source":{`
			`"remote":{`
			`"host":"https://<REST_endpoint_of_remote_cluster>:9200",`
			`"username":"YOUR_USERNAME",`
			`"password":"YOUR_PASSWORD"`
			`}`
			`},`
			`"dest":{`
			`"index":"destination"`
			`}`
			`}`
			```

			`You can specify the following options:`

			`Options \| Valid values \| Description \| Required`
			`:--- \| :--- \| :---`
			`host` \| String \| The REST endpoint of the remote cluster. \| Yes
OpenSearch chapter fixes 2021-05-05 14:21:50 -07:00			`username` \| String \| The username to log into the remote cluster. \| No
			`password` \| String \| The password to log into the remote cluster. \| No
Initial documentation cut 2021-05-05 10:09:47 -07:00			`socket_timeout` \| Time Unit \| The wait time for socket reads (default 30s). \| No
			`connect_timeout` \| Time Unit \| The wait time for remote connection timeouts (default 30s). \| No


			`## Reindex a subset of documents`

OpenSearch chapter fixes 2021-05-05 14:21:50 -07:00			`You can copy a specific set of documents that match a search query.`
Initial documentation cut 2021-05-05 10:09:47 -07:00
			`This command copies only a subset of documents matched by a query operation to the destination index:`

			```json
			`POST _reindex`
			`{`
			`"source":{`
			`"index":"source",`
			`"query": {`
			`"match": {`
			`"field_name": "text"`
			`}`
			`}`
			`},`
			`"dest":{`
			`"index":"destination"`
			`}`
			`}`
			```

			`For a list of all query operations, see [Full-text queries](../full-text/).`

			`## Combine one or more indices`

			`You can combine documents from one or more indices by adding the source indices as a list.`

			`This command copies all documents from two source indices to one destination index:`

			```json
			`POST _reindex`
			`{`
			`"source":{`
			`"index":[`
			`"source_1",`
			`"source_2"`
			`]`
			`},`
			`"dest":{`
			`"index":"destination"`
			`}`
			`}`
			```
			`Make sure the number of shards for your source and destination indices are the same.`

			`## Reindex only unique documents`

			You can copy only documents missing from a destination index by setting the `op_type` option to `create`.
			`In this case, if a document with the same ID already exists, the operation ignores the one from the source index.`
			To ignore all version conflicts of documents, set the `conflicts` option to `proceed`.

			```json
			`POST _reindex`
			`{`
			`"conflicts":"proceed",`
			`"source":{`
			`"index":"source"`
			`},`
			`"dest":{`
			`"index":"destination",`
			`"op_type":"create"`
			`}`
			`}`
			```

			`## Reindex sorted documents`

			`You can copy certain documents after sorting specific fields in the document.`

			This command copies the last 10 documents based on the `timestamp` field:

			```json
			`POST _reindex`
			`{`
			`"size":10,`
			`"source":{`
			`"index":"source",`
			`"sort":{`
			`"timestamp":"desc"`
			`}`
			`},`
			`"dest":{`
			`"index":"destination"`
			`}`
			`}`
			```

			`## Transform documents during reindexing`

			You can transform your data during the reindexing process using the `script` option.
			`We recommend Painless for scripting in OpenSearch.`

			This command runs the source index through a Painless script that increments a `number` field inside an `account` object before copying it to the destination index:

			```json
			`POST _reindex`
			`{`
			`"source":{`
			`"index":"source"`
			`},`
			`"dest":{`
			`"index":"destination"`
			`},`
			`"script":{`
			`"lang":"painless",`
			`"source":"ctx._account.number++"`
			`}`
			`}`
			```

			`You can also specify an ingest pipeline to transform your data during the reindexing process.`

			You would first have to create a pipeline with `processors` defined. You have a number of different `processors` available to use in your ingest pipeline.

			Here's a sample ingest pipeline that defines a `split` processor that splits a `text` field based on a `space` separator and stores it in a new `word` field. The `script` processor is a Painless script that finds the length of the `word` field and stores it in a new `word_count` field. The `remove` processor removes the `test` field.

			```json
			`PUT _ingest/pipeline/pipeline-test`
			`{`
			`"description": "Splits the text field into a list. Computes the length of the 'word' field and stores it in a new 'word_count' field. Removes the 'test' field.",`
			`"processors": [`
			`{`
			`"split": {`
			`"field": "text",`
			`"separator": "\\s+",`
			`"target_field": "word"`
			`},`
			`}`
			`{`
			`"script": {`
			`"lang": "painless",`
			`"source": "ctx.word_count = ctx.word.length"`
			`}`
			`},`
			`{`
			`"remove": {`
			`"field": "test"`
			`}`
			`}`
			`]`
			`}`
			```

			After creating a pipeline, you can use the `reindex` operation:

			```json
			`POST _reindex`
			`{`
			`"source": {`
			`"index": "source",`
			`},`
			`"dest": {`
			`"index": "destination",`
			`"pipeline": "pipeline-test"`
			`}`
			`}`
			```

OpenSearch chapter fixes 2021-05-05 14:21:50 -07:00			`## Update documents in the current index`
Initial documentation cut 2021-05-05 10:09:47 -07:00
OpenSearch chapter fixes 2021-05-05 14:21:50 -07:00			To update the data in your current index itself without copying it to a different index, use the `update_by_query` operation.
Initial documentation cut 2021-05-05 10:09:47 -07:00
			The `update_by_query` operation is `POST` operation that you can perform on a single index at a time.

			```json
			`POST <index_name>/_update_by_query`
			```

			`If you run this command with no parameters, it increments the version number for all documents in the index.`

			`## Source index options`

			`You can specify the following options for your source index:`

			`Option \| Valid values \| Description \| Required`
			`:--- \| :--- \| :---`
			`index` \| String \| The name of the source index. You can provide multiple source indices as a list. \| Yes
			`max_docs` \| Integer \| The maximum number of documents to reindex. \| No
			`query` \| Object \| The search query to use for the reindex operation. \| No
			`size` \| Integer \| The number of documents to reindex. \| No
			`slice` \| String \| Specify manual or automatic slicing to parallelize reindexing. \| No
			`sort` \| List \| Sort specific fields in the document before reindexing. \| No

			`## Destination index options`

			`You can specify the following options for your destination index:`

			`Option \| Valid values \| Description \| Required`
			`:--- \| :--- \| :---`
			`index` \| String \| The name of the destination index. \| Yes
			`version_type` \| Enum \| The version type for the indexing operation. Valid values: internal, external, external_gt, external_gte. \| No