kolchfa-aws 5abc22147c
Refactor the Query DSL section (#2904)
* for query dsl index page rewrites for proper index page

Signed-off-by: alicejw <alicejw@amazon.com>

* fix formatting in table

Signed-off-by: alicejw <alicejw@amazon.com>

* update query table intro

Signed-off-by: alicejw <alicejw@amazon.com>

* rmv proprietary from overview

Signed-off-by: alicejw <alicejw@amazon.com>

* awkward sentence fix

Signed-off-by: alicejw <alicejw@amazon.com>

* to add list of all query categories

Signed-off-by: alicejw <alicejw@amazon.com>

* for query category descriptions

Signed-off-by: alicejw <alicejw@amazon.com>

* remove commented note

Signed-off-by: alicejw <alicejw@amazon.com>

* update term-level query page

Signed-off-by: alicejw <alicejw@amazon.com>

* for clarity about term and full-text query use cases

Signed-off-by: alicejw <alicejw@amazon.com>

* for parallel bullet list of queries

Signed-off-by: alicejw <alicejw@amazon.com>

* remove redundant word

Signed-off-by: alicejw <alicejw@amazon.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* for tech review feedback

Signed-off-by: alicejw <alicejw@amazon.com>

* for entire list of query types we support, even though we don't have document topic pages for them yet.

Signed-off-by: alicejw <alicejw@amazon.com>

* to include full list of query types we support

Signed-off-by: alicejw <alicejw@amazon.com>

* change Boolean to  type for consistency in the section

Signed-off-by: alicejw <alicejw@amazon.com>

* update query type category list title

Signed-off-by: alicejw <alicejw@amazon.com>

* for compound query type definitions

Signed-off-by: alicejw <alicejw@amazon.com>

* for additional descriptions

Signed-off-by: alicejw <alicejw@amazon.com>

* for query context descriptions

Signed-off-by: alicejw <alicejw@amazon.com>

* for additional edits to query descriptions list

Signed-off-by: alicejw <alicejw@amazon.com>

* create span query category page and update bullet list on index to cross-reference to it.

Signed-off-by: alicejw <alicejw@amazon.com>

* add pages for geo and shape query category, and add cross-references

Signed-off-by: alicejw <alicejw@amazon.com>

* remove regex it is part of term-level queries

Signed-off-by: alicejw <alicejw@amazon.com>

* for bullet list granular edits

Signed-off-by: alicejw <alicejw@amazon.com>

* put bullet list in alphabetical order

Signed-off-by: alicejw <alicejw@amazon.com>

* for doc review updates

Signed-off-by: alicejw <alicejw@amazon.com>

* reword for reviewer feedback

Signed-off-by: alicejw <alicejw@amazon.com>

* small rewording

Signed-off-by: alicejw <alicejw@amazon.com>

* typo space

Signed-off-by: alicejw <alicejw@amazon.com>

* put topics in alphabetical order in left nav

Signed-off-by: alicejw <alicejw@amazon.com>

* additional reviewer's comment

Signed-off-by: alicejw <alicejw@amazon.com>

* for second doc reviewer's feedback updates

Signed-off-by: alicejw <alicejw@amazon.com>

* for doc reviewer comment that was hidden

Signed-off-by: alicejw <alicejw@amazon.com>

* Update _opensearch/query-dsl/geo-and-shape.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/span-query.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/span-query.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* updates from third doc review for tech accuracy requested by editorial

Signed-off-by: alicejw <alicejw@amazon.com>

* create compound query sub-page to move descriptions to make bullet list parallel

Signed-off-by: alicejw <alicejw@amazon.com>

* fix compound query page title

Signed-off-by: alicejw <alicejw@amazon.com>

* add fuzzy query definition

Signed-off-by: alicejw <alicejw@amazon.com>

* for editorial feedback updates

Signed-off-by: alicejw <alicejw@amazon.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Refactor Query DSL section

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Adds doc review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Fix typo

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Changed periods to colons when introducing code blocks

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: alicejw <alicejw@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: alicejw <alicejw@amazon.com>
Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com>
2023-02-15 17:12:50 -05:00

7.3 KiB

layout title nav_order
default Reindex data 16

Reindex data

After creating an index, you might need to make an extensive change such as adding a new field to every document or combining multiple indexes to form a new one. Rather than deleting your index, making the change offline, and then indexing your data again, you can use the reindex operation.

With the reindex operation, you can copy all or a subset of documents that you select through a query to another index. Reindex is a POST operation. In its most basic form, you specify a source index and a destination index.

Reindexing can be an expensive operation depending on the size of your source index. We recommend you disable replicas in your destination index by setting number_of_replicas to 0 and re-enable them once the reindex process is complete. {: .note }


Table of contents

  1. TOC {:toc}

Reindex all documents

You can copy all documents from one index to another.

You first need to create a destination index with your desired field mappings and settings or you can copy the ones from your source index:

PUT destination
{
   "mappings":{
      "Add in your desired mappings"
   },
   "settings":{
      "Add in your desired settings"
   }
}

This reindex command copies all the documents from a source index to a destination index:

POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination"
   }
}

If the destination index is not already created, the reindex operation creates a new destination index with default configurations.

Reindex from a remote cluster

You can copy documents from an index in a remote cluster. Use the remote option to specify the remote hostname and the required login credentials.

This command reaches out to a remote cluster, logs in with the username and password, and copies all the documents from the source index in that remote cluster to the destination index in your local cluster:

POST _reindex
{
   "source":{
      "remote":{
         "host":"https://<REST_endpoint_of_remote_cluster>:9200",
         "username":"YOUR_USERNAME",
         "password":"YOUR_PASSWORD"
      },
      "index": "source"
   },
   "dest":{
      "index":"destination"
   }
}

You can specify the following options:

Options | Valid values | Description | Required :--- | :--- | :--- host | String | The REST endpoint of the remote cluster. | Yes username | String | The username to log into the remote cluster. | No password | String | The password to log into the remote cluster. | No socket_timeout | Time Unit | The wait time for socket reads (default 30s). | No connect_timeout | Time Unit | The wait time for remote connection timeouts (default 30s). | No

Reindex a subset of documents

You can copy a specific set of documents that match a search query.

This command copies only a subset of documents matched by a query operation to the destination index:

POST _reindex
{
   "source":{
      "index":"source",
      "query": {
        "match": {
           "field_name": "text"
         }
      }
   },
   "dest":{
      "index":"destination"
   }
}

For a list of all query operations, see Full-text queries.

Combine one or more indexes

You can combine documents from one or more indexes by adding the source indexes as a list.

This command copies all documents from two source indexes to one destination index:

POST _reindex
{
   "source":{
      "index":[
         "source_1",
         "source_2"
      ]
   },
   "dest":{
      "index":"destination"
   }
}

Make sure the number of shards for your source and destination indexes is the same.

Reindex only unique documents

You can copy only documents missing from a destination index by setting the op_type option to create. In this case, if a document with the same ID already exists, the operation ignores the one from the source index. To ignore all version conflicts of documents, set the conflicts option to proceed.

POST _reindex
{
   "conflicts":"proceed",
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination",
      "op_type":"create"
   }
}

Transform documents during reindexing

You can transform your data during the reindexing process using the script option. We recommend Painless for scripting in OpenSearch.

This command runs the source index through a Painless script that increments a number field inside an account object before copying it to the destination index:

POST _reindex
{
   "source":{
      "index":"source"
   },
   "dest":{
      "index":"destination"
   },
   "script":{
      "lang":"painless",
      "source":"ctx._account.number++"
   }
}

You can also specify an ingest pipeline to transform your data during the reindexing process.

You would first have to create a pipeline with processors defined. You have a number of different processors available to use in your ingest pipeline.

Here's a sample ingest pipeline that defines a split processor that splits a text field based on a space separator and stores it in a new word field. The script processor is a Painless script that finds the length of the word field and stores it in a new word_count field. The remove processor removes the test field.

PUT _ingest/pipeline/pipeline-test
{
"description": "Splits the text field into a list. Computes the length of the 'word' field and stores it in a new 'word_count' field. Removes the 'test' field.",
"processors": [
 {
   "split": {
     "field": "text",
     "separator": "\\s+",
     "target_field": "word"
   },
 }
 {
   "script": {
     "lang": "painless",
     "source": "ctx.word_count = ctx.word.length"
   }
 },
 {
   "remove": {
     "field": "test"
   }
 }
]
}

After creating a pipeline, you can use the reindex operation:

POST _reindex
{
  "source": {
    "index": "source",
  },
  "dest": {
    "index": "destination",
    "pipeline": "pipeline-test"
  }
}

Update documents in the current index

To update the data in your current index itself without copying it to a different index, use the update_by_query operation.

The update_by_query operation is POST operation that you can perform on a single index at a time.

POST <index_name>/_update_by_query

If you run this command with no parameters, it increments the version number for all documents in the index.

Source index options

You can specify the following options for your source index:

Option | Valid values | Description | Required :--- | :--- | :--- index | String | The name of the source index. You can provide multiple source indexes as a list. | Yes max_docs | Integer | The maximum number of documents to reindex. | No query | Object | The search query to use for the reindex operation. | No size | Integer | The number of documents to reindex. | No slice | String | Specify manual or automatic slicing to parallelize reindexing. | No

Destination index options

You can specify the following options for your destination index:

Option | Valid values | Description | Required :--- | :--- | :--- index | String | The name of the destination index. | Yes version_type | Enum | The version type for the indexing operation. Valid values: internal, external, external_gt, external_gte. | No