Merge pull request #1 from snyder114/initial-fixes

OpenSearch chapter fixes
Andrew Etter 2021-05-05 18:50:26 -07:00 committed by GitHub
commit 9513e3622e
15 changed files with 101 additions and 115 deletions


@@ -1,6 +1,6 @@
---
layout: default
title: Boolean Queries
title: Boolean queries
parent: OpenSearch
nav_order: 11
---


@@ -1,6 +1,6 @@
---
layout: default
title: Cluster Formation
title: Cluster formation
parent: OpenSearch
nav_order: 2
---
@@ -13,17 +13,17 @@ OpenSearch can operate as a single-node or multi-node cluster. The steps to conf
To create and deploy an OpenSearch cluster according to your requirements, it's important to understand how node discovery and cluster formation work and what settings govern them.
There are many ways that you can design a cluster. The following illustration shows a basic architecture.
There are many ways to design a cluster. The following illustration shows a basic architecture:
![multi-node cluster architecture diagram](../../images/cluster.png)
This is a four-node cluster that has one dedicated master node, one dedicated coordinating node, and two data nodes that are master-eligible and also used for ingesting data.
The following table provides brief descriptions of the node types.
The following table provides brief descriptions of the node types:
Node type | Description | Best practices for production
:--- | :--- | :---
`Master` | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indices, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated master nodes in three different zones is the right approach for almost all production use cases. This makes sure your cluster never loses quorum. Two nodes will be idle for most of the time except when one node goes down or needs some maintenance.
`Master` | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indices, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated master nodes in three different zones is the right approach for almost all production use cases. This configuration ensures your cluster never loses quorum. Two nodes will be idle for most of the time except when one node goes down or needs some maintenance.
`Master-eligible` | Elects one node among them as the master node through a voting process. | For production clusters, make sure you have dedicated master nodes. To dedicate a node to a single type, mark all other node types as false on that node; in this case, also mark all the other nodes as not master-eligible.
`Data` | Stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards. These are the worker nodes of your cluster and need more disk space than any other node type. | As you add data nodes, keep them balanced between zones. For example, if you have three zones, add data nodes in multiples of three, one for each zone. We recommend using storage and RAM-heavy nodes.
`Ingest` | Preprocesses data before storing it in the cluster. Runs an ingest pipeline that transforms your data before adding it to an index. | If you plan to ingest a lot of data and run complex ingest pipelines, we recommend you use dedicated ingest nodes. You can also optionally offload your indexing from the data nodes so that your data nodes are used exclusively for searching and aggregating.
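For illustration, a dedicated master node's `opensearch.yml` might combine these flags as follows (a minimal sketch, not this chapter's own example; the steps below cover the full per-node configuration):
```yml
# Dedicated master node: enable the master role, disable all other node types
node.master: true
node.data: false
node.ingest: false
```
On every node that should not be master-eligible, you would instead set `node.master: false`.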
@@ -37,11 +37,9 @@ This page demonstrates how to work with the different node types. It assumes tha
## Prerequisites
Before you get started, you must install and configure OpenSearch on all of your nodes. For information about the available options, see [Install and Configure](../../install/).
Before you get started, you must install and configure OpenSearch on all of your nodes. For information about the available options, see [Install and configure OpenSearch](../../install/).
After you are done, use SSH to connect to each node, and then open the `config/opensearch.yml` file.
You can set all configurations for your cluster in this file.
After you're done, use SSH to connect to each node, then open the `config/opensearch.yml` file. You can set all configurations for your cluster in this file.
## Step 1: Name a cluster
@@ -132,7 +130,7 @@ node.ingest: false
## Step 3: Bind a cluster to specific IP addresses
`network_host` defines the IP address that's used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6:
`network_host` defines the IP address used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6:
```yml
network.host: [_local_, _site_]
@@ -154,7 +152,7 @@ Now that you've configured the network hosts, you need to configure the discover
Zen Discovery is the built-in, default mechanism that uses [unicast](https://en.wikipedia.org/wiki/Unicast) to find other nodes in the cluster.
You can generally just add all of your master-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other master-eligible nodes, determines which one is the master, and asks to join the cluster.
You can generally just add all your master-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other master-eligible nodes, determines which one is the master, and asks to join the cluster.
For example, for `opensearch-master` the line looks something like this:
@@ -165,7 +163,7 @@ discovery.seed_hosts: ["<private IP of opensearch-d1>", "<private IP of opensear
## Step 5: Start the cluster
After you set the configurations, start OpenSearch on all nodes.
After you set the configurations, start OpenSearch on all nodes:
```bash
sudo systemctl start opensearch.service
@@ -220,9 +218,9 @@ PUT _cluster/settings
}
```
You can either use `persistent` or `transient` settings. We recommend the `persistent` setting because it persists through a cluster reboot. Transient settings do not persist through a cluster reboot.
You can either use `persistent` or `transient` settings. We recommend the `persistent` setting because it persists through a cluster reboot. Transient settings don't persist through a cluster reboot.
Shard allocation awareness attempts to separate primary and replica shards across multiple zones. But, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone.
Shard allocation awareness attempts to separate primary and replica shards across multiple zones. However, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone.
Another option is to require that primary and replica shards are never allocated to the same zone. This is called forced awareness.
@@ -238,7 +236,7 @@ PUT _cluster/settings
}
```
Now, if a data node fails, forced awareness does not allocate the replicas to a node in the same zone. Instead, the cluster enters a yellow state and only allocates the replicas when nodes in another zone come online.
Now, if a data node fails, forced awareness doesn't allocate the replicas to a node in the same zone. Instead, the cluster enters a yellow state and only allocates the replicas when nodes in another zone come online.
In our two-zone architecture, we can use allocation awareness if `opensearch-d1` and `opensearch-d2` are less than 50% utilized, so that each of them has the storage capacity to allocate replicas in the same zone.
If that is not the case, and `opensearch-d1` and `opensearch-d2` do not have the capacity to contain all primary and replica shards, we can use forced awareness. This approach helps to make sure that, in the event of a failure, OpenSearch doesn't overload your last remaining zone and lock up your cluster due to lack of storage.


@@ -1,6 +1,6 @@
---
layout: default
title: Full-Text Queries
title: Full-text queries
parent: OpenSearch
nav_order: 10
---
@@ -421,10 +421,10 @@ Option | Valid values | Description
`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). <br /><br />If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common Terms](#common-terms) queries and `operator` in this table.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common terms](#common-terms) queries and `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common Terms](#common-terms) queries.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common terms](#common-terms) queries.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
`phrase_slop` | `0` (default) or a positive integer | See `slop`.
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
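To see how several of these options interact, here is a hedged sketch of a `match` query (the index and field names are placeholders, not from this chapter):
```json
GET my-index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid rises",
        "operator": "or",
        "fuzziness": "AUTO",
        "fuzzy_transpositions": true,
        "prefix_length": 0,
        "max_expansions": 50
      }
    }
  }
}
```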


@@ -1,11 +1,11 @@
---
layout: default
title: Index Aliases
title: Index aliases
parent: OpenSearch
nav_order: 4
---
# Index alias
# Index aliases
An alias is a virtual index name that can point to one or more indices.
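For example, this request creates an alias and points it at an index (a sketch; `my-index` and `my-alias` are placeholder names):
```json
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my-index",
        "alias": "my-alias"
      }
    }
  ]
}
```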


@@ -1,6 +1,6 @@
---
layout: default
title: Index Data
title: Index data
parent: OpenSearch
nav_order: 3
---
@@ -16,9 +16,9 @@ For situations in which new data arrives incrementally (for example, customer or
Before you can search data, you must *index* it. Indexing is the method by which search engines organize data for fast retrieval. The resulting structure is called, fittingly, an index.
In OpenSearch, the basic unit of data is a JSON *document*. Within an index, OpenSearch identifies each document using a unique *ID*.
In OpenSearch, the basic unit of data is a JSON *document*. Within an index, OpenSearch identifies each document using a unique ID.
A request to the index API looks like the following:
A request to the index API looks like this:
```json
PUT <index>/_doc/<id>
@@ -31,7 +31,6 @@ A request to the `_bulk` API looks a little different, because you specify the i
POST _bulk
{ "index": { "_index": "<index>", "_id": "<id>" } }
{ "A JSON": "document" }
```
Bulk data must conform to a specific format, which requires a newline character (`\n`) at the end of every line, including the last line. This is the basic format:
@@ -41,10 +40,9 @@ Action and metadata\n
Optional document\n
Action and metadata\n
Optional document\n
```
The document is optional, because `delete` actions do not require a document. The other actions (`index`, `create`, and `update`) all require a document. If you specifically want the action to fail if the document already exists, use the `create` action instead of the `index` action.
The document is optional, because `delete` actions don't require a document. The other actions (`index`, `create`, and `update`) all require a document. If you specifically want the action to fail if the document already exists, use the `create` action instead of the `index` action.
{: .note }
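For example, this bulk `create` action fails if a document with ID `1` already exists in the `movies` index (a sketch reusing this page's example index and document):
```json
POST _bulk
{ "create": { "_index": "movies", "_id": "1" } }
{ "title": "Spirited Away" }
```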
To index bulk data using the `curl` command, navigate to the folder where you have your file saved and run the following command:
@@ -55,14 +53,14 @@ curl -H "Content-Type: application/x-ndjson" -POST https://localhost:9200/data/_
If any one of the actions in the `_bulk` API fails, OpenSearch continues to execute the other actions. Examine the `items` array in the response to figure out what went wrong. The entries in the `items` array are in the same order as the actions specified in the request.
OpenSearch features automatic index creation when you add a document to an index that doesn't already exist. It also features automatic ID generation if you don't specify an ID in the request. This simple example automatically creates the movies index, indexes the document, and assigns it a unique ID:
OpenSearch automatically creates an index when you add a document to an index that doesn't already exist. It also automatically generates an ID if you don't specify an ID in the request. This simple example automatically creates the movies index, indexes the document, and assigns it a unique ID:
```json
POST movies/_doc
{ "title": "Spirited Away" }
```
Automatic ID generation has a clear downside: because the indexing request didn't specify a document ID, you can't easily update the document at a later time. Also, if you run this request 10 times, OpenSearch indexes this document as 10 different documents with unique IDs. To specify an ID of 1, use the following request, and note the use of PUT instead of POST:
Automatic ID generation has a clear downside: because the indexing request didn't specify a document ID, you can't easily update the document at a later time. Also, if you run this request 10 times, OpenSearch indexes this document as 10 different documents with unique IDs. To specify an ID of 1, use the following request (note the use of PUT instead of POST):
```json
PUT movies/_doc/1
@@ -83,7 +81,7 @@ PUT more-movies
OpenSearch indices have the following naming restrictions:
- All letters must be lowercase.
- Index names can't begin with `_` (underscore) or `-` (hyphen).
- Index names can't begin with underscores (`_`) or hyphens (`-`).
- Index names can't contain spaces, commas, or the following characters:
`:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, or `<`


@@ -1,17 +1,14 @@
---
layout: default
title: Index Templates
title: Index templates
parent: OpenSearch
nav_order: 5
---
# Index template
# Index templates
Index templates let you initialize new indices with predefined mappings and settings. For example, if you continuously index log data, you can define an index template so that all of these indices have the same number of shards and replicas.
OpenSearch switched from `_template` to `_index_template` in version 7.8. Use `_template` for older versions of OpenSearch.
{: .note }
---
#### Table of contents
@@ -21,7 +18,7 @@ OpenSearch switched from `_template` to `_index_template` in version 7.8. Use `_
---
## Create template
## Create a template
To create an index template, use a POST request:
@@ -110,7 +107,7 @@ GET logs-2020-01-01
Any additional indices that match this pattern---`logs-2020-01-02`, `logs-2020-01-03`, and so on---will inherit the same mappings and settings.
## Retrieve template
## Retrieve a template
To list all index templates:
@@ -148,7 +145,7 @@ You can create multiple index templates for your indices. If the index name matc
The settings from the more recently created index templates override the settings of older index templates. So, you can first define a few common settings in a generic template that can act as a catch-all and then add more specialized settings as required.
An even better approach is to explicitly specify template priority using the `order` parameter. OpenSearch applies templates with lower priority numbers first and then overrides them with templates that have higher priority numbers.
An even better approach is to explicitly specify template priority using the `order` parameter. OpenSearch applies templates with lower priority numbers first and then overrides them with templates with higher priority numbers.
For example, say you have the following two templates that both match the `logs-2020-01-02` index and there's a conflict in the `number_of_shards` field:
@@ -188,9 +185,9 @@ PUT _index_template/template-02
Because `template-02` has a higher `priority` value, it takes precedence over `template-01`. The `logs-2020-01-02` index would have a `number_of_shards` value of 3.
## Delete template
## Delete a template
You can delete an index template using its name, as shown in the following command:
You can delete an index template using its name:
```json
DELETE _index_template/daily_logs
@@ -198,9 +195,9 @@ DELETE _index_template/daily_logs
## Index template options
You can specify the options shown in the following table:
You can specify the following template options:
Option | Type | Description | Required
:--- | :--- | :--- | :---
`priority` | `Number` | Specify the priority of the index template. | No
`create` | `Boolean` | Specify whether this index template should replace an existing one. | No
`priority` | `Number` | The priority of the index template. | No
`create` | `Boolean` | Whether this index template should replace an existing one. | No
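As a sketch of how these options combine (the `daily_logs` name echoes the template deleted above; passing `create` as a `?create=true` query parameter is an assumption based on the composable template API, so verify it against your version):
```json
PUT _index_template/daily_logs?create=true
{
  "index_patterns": ["logs-*"],
  "priority": 5,
  "template": {
    "settings": {
      "number_of_shards": 2
    }
  }
}
```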


@@ -21,7 +21,7 @@ Its distributed design means that you interact with OpenSearch *clusters*. Each
You can run OpenSearch locally on a laptop---its system requirements are minimal---but you can also scale a single cluster to hundreds of powerful machines in a data center.
In a single node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. For more information on setting node types, see [Cluster Formation](cluster/).
In a single node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. For more information on setting node types, see [Cluster formation](cluster/).
## Indices and documents
@@ -55,10 +55,6 @@ Indices also contain mappings and settings:
- A *mapping* is the collection of *fields* that documents in the index have. In this case, those fields are `title` and `release_date`.
- Settings include data like the index name, creation date, and number of shards.
Older versions of OpenSearch used arbitrary document *types*, but indices created in current versions of OpenSearch should use a single type named `_doc`. Store different document types in different indices.
{: .note }
## Primary and replica shards
OpenSearch splits indices into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually.
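A hedged sketch of setting the shard count when creating an index (the name and counts are placeholders; the primary shard count is fixed at creation time):
```json
PUT my-index
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 1
    }
  }
}
```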


@@ -155,7 +155,7 @@ PUT <some-index>/_settings
}
```
In this example, OpenSearch logs indexing operations that take 15 seconds or longer at the WARN level and operations that take between 10 and 14.*x* seconds at the INFO level. If you set a threshold to 0 seconds, OpenSearch logs all operations, which can be useful for testing that slow logs are indeed enabled.
In this example, OpenSearch logs indexing operations that take 15 seconds or longer at the WARN level and operations that take between 10 and 14.*x* seconds at the INFO level. If you set a threshold to 0 seconds, OpenSearch logs all operations, which can be useful for testing whether slow logs are indeed enabled.
- `reformat` specifies whether to log the document `_source` field as a single line (`true`) or let it span multiple lines (`false`).
- `source` is the number of characters of the document `_source` field to log.
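Both settings can be updated together with the same `_settings` call shown above (a sketch; the values are placeholders):
```json
PUT <some-index>/_settings
{
  "index.indexing.slowlog.reformat": true,
  "index.indexing.slowlog.source": "1000"
}
```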


@@ -1,13 +1,13 @@
---
layout: default
title: Reindex Data
title: Reindex data
parent: OpenSearch
nav_order: 6
---
# Reindex data
After creating an index, if you need to make an extensive change such as adding a new field to every document or combining multiple indices to form a new one, rather than deleting your index, making the change offline, and then indexing your data all over again, you can use the `reindex` operation.
After creating an index, you might need to make an extensive change such as adding a new field to every document or combining multiple indices to form a new one. Rather than deleting your index, making the change offline, and then indexing your data all over again, you can use the `reindex` operation.
With the `reindex` operation, you can copy all or a subset of documents that you select through a query to another index. Reindex is a `POST` operation. In its most basic form, you specify a source index and a destination index.
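A minimal sketch of that basic form, with placeholder index names:
```json
POST _reindex
{
  "source": {
    "index": "source-index"
  },
  "dest": {
    "index": "destination-index"
  }
}
```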
@@ -84,15 +84,15 @@ You can specify the following options:
Options | Valid values | Description | Required
:--- | :--- | :--- | :---
`host` | String | The REST endpoint of the remote cluster. | Yes
`username` | String | The username to login to the remote cluster. | No
`password` | String | The password to login to the remote cluster. | No
`username` | String | The username to log into the remote cluster. | No
`password` | String | The password to log into the remote cluster. | No
`socket_timeout` | Time Unit | The wait time for socket reads (default 30s). | No
`connect_timeout` | Time Unit | The wait time for remote connection timeouts (default 30s). | No
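These options go inside the `remote` object of the `source` (a sketch; the host, credentials, and index names are placeholders):
```json
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://remote-cluster:9200",
      "username": "admin",
      "password": "<password>",
      "socket_timeout": "60s"
    },
    "index": "source-index"
  },
  "dest": {
    "index": "destination-index"
  }
}
```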
## Reindex a subset of documents
You can copy only a specific set of documents that match a search query.
You can copy a specific set of documents that match a search query.
This command copies only a subset of documents matched by a query operation to the destination index:
@@ -250,9 +250,9 @@ POST _reindex
}
```
## Update documents in current index
## Update documents in the current index
To update your data in your current index itself without copying it to a different index, use the `update_by_query` operation.
To update the data in your current index itself without copying it to a different index, use the `update_by_query` operation.
The `update_by_query` operation is a `POST` operation that you can perform on a single index at a time.
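A hedged sketch of the operation (the index, field names, and script are placeholders):
```json
POST my-index/_update_by_query
{
  "query": {
    "term": {
      "status": "pending"
    }
  },
  "script": {
    "source": "ctx._source.status = 'active'",
    "lang": "painless"
  }
}
```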


@@ -1,6 +1,6 @@
---
layout: default
title: Search Templates
title: Search templates
parent: OpenSearch
nav_order: 11
---
@@ -20,7 +20,7 @@ Search templates use the Mustache language. For a list of all syntax options, se
A search template has two components: the query and the parameters. Parameters are user-inputted values that get placed into variables. Variables are represented with double braces in Mustache notation. When encountering a variable like `{% raw %}{{var}}{% endraw %}` in the query, OpenSearch goes to the `params` section, looks for a parameter called `var`, and replaces it with the specified value.
You can code your application to ask your user what they want to search for and then plug in that value in the `params` object at runtime.
You can code your application to ask your user what they want to search for and then plug that value into the `params` object at runtime.
This command defines a search template to find a play by its name. The `{% raw %}{{play_name}}{% endraw %}` in the query is replaced by the value `Henry IV`:
@@ -69,7 +69,7 @@ GET _search/template
}
```
To improve the search experience, you can define defaults so that the user doesn't have to specify every possible parameter. If the parameter is not defined in the `params` section, OpenSearch uses the default value.
To improve the search experience, you can define defaults so the user doesn't have to specify every possible parameter. If the parameter is not defined in the `params` section, OpenSearch uses the default value.
The syntax for defining the default value for a variable `var` is as follows:


@@ -1,6 +1,6 @@
---
layout: default
title: Take and Restore Snapshots
title: Take and restore snapshots
parent: OpenSearch
nav_order: 30
---
@@ -85,12 +85,12 @@ You probably only need to specify `location`, but the following table summarizes
Setting | Description
:--- | :---
location | The shared file system for snapshots. Required.
chunk_size | Breaks large files into chunks during snapshot operations (e.g. `64mb`, `1gb`), which is important for cloud storage providers and far less important for shared file systems. Default is `null` (unlimited). Optional.
compress | Whether to compress metadata files. This setting does not affect data files, which might already be compressed, depending on your index settings. Default is `false`. Optional.
max_restore_bytes_per_sec | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional.
max_snapshot_bytes_per_sec | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
readonly | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
`location` | The shared file system for snapshots. Required.
`chunk_size` | Breaks large files into chunks during snapshot operations (e.g. `64mb`, `1gb`), which is important for cloud storage providers and far less important for shared file systems. Default is `null` (unlimited). Optional.
`compress` | Whether to compress metadata files. This setting does not affect data files, which might already be compressed, depending on your index settings. Default is `false`. Optional.
`max_restore_bytes_per_sec` | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional.
`max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
`readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
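Registering a shared file system repository with a couple of these settings might look like the following sketch (the repository name and path are placeholders):
```json
PUT _snapshot/my-fs-repository
{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots",
    "compress": true
  }
}
```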
### Amazon S3
@@ -200,18 +200,18 @@ You probably don't need to specify anything but `bucket` and `base_path`, but th
Setting | Description
:--- | :---
base_path | The path within the bucket where you want to store snapshots (e.g. `my/snapshot/directory`). Optional. If not specified, snapshots are stored in the bucket root.
bucket | Name of the S3 bucket. Required.
buffer_size | The threshold beyond which chunks (of `chunk_size`) should be broken into pieces (of `buffer_size`) and sent to S3 using a different API. Default is the smaller of two values: 100 MB or 5% of the Java heap. Valid values are between `5mb` and `5gb`. We don't recommend changing this option.
canned_acl | S3 has several [canned ACLs](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) that the `repository-s3` plugin can add to objects as it creates them in S3. Default is `private`. Optional.
chunk_size | Breaks files into chunks during snapshot operations (e.g. `64mb`, `1gb`), which is important for cloud storage providers and far less important for shared file systems. Default is `1gb`. Optional.
client | When specifying client settings (e.g. `s3.client.default.access_key`), you can use a string other than `default` (e.g. `s3.client.backup-role.access_key`). If you used an alternate name, change this value to match. Default and recommended value is `default`. Optional.
compress | Whether to compress metadata files. This setting does not affect data files, which might already be compressed, depending on your index settings. Default is `false`. Optional.
max_restore_bytes_per_sec | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional.
max_snapshot_bytes_per_sec | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
readonly | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
server_side_encryption | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting Data Using Server-Side Encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is false. Optional.
storage_class | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshots files. Default is `standard`. Do not use the `glacier` and `deep_archive` storage classes. Optional.
`base_path` | The path within the bucket where you want to store snapshots (e.g. `my/snapshot/directory`). Optional. If not specified, snapshots are stored in the bucket root.
`bucket` | Name of the S3 bucket. Required.
`buffer_size` | The threshold beyond which chunks (of `chunk_size`) should be broken into pieces (of `buffer_size`) and sent to S3 using a different API. Default is the smaller of two values: 100 MB or 5% of the Java heap. Valid values are between `5mb` and `5gb`. We don't recommend changing this option.
`canned_acl` | S3 has several [canned ACLs](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) that the `repository-s3` plugin can add to objects as it creates them in S3. Default is `private`. Optional.
`chunk_size` | Breaks files into chunks during snapshot operations (e.g. `64mb`, `1gb`), which is important for cloud storage providers and far less important for shared file systems. Default is `1gb`. Optional.
`client` | When specifying client settings (e.g. `s3.client.default.access_key`), you can use a string other than `default` (e.g. `s3.client.backup-role.access_key`). If you used an alternate name, change this value to match. Default and recommended value is `default`. Optional.
`compress` | Whether to compress metadata files. This setting does not affect data files, which might already be compressed, depending on your index settings. Default is `false`. Optional.
`max_restore_bytes_per_sec` | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional.
`max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
`readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
`server_side_encryption` | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is false. Optional.
`storage_class` | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshots files. Default is `standard`. Do not use the `glacier` and `deep_archive` storage classes. Optional.
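A corresponding sketch for S3 (the bucket and path are placeholders):
```json
PUT _snapshot/my-s3-repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-s3-bucket",
    "base_path": "my/snapshot/directory"
  }
}
```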
## Take snapshots
@@ -241,10 +241,10 @@ PUT _snapshot/my-repository/2
Setting | Description
:--- | :---
indices | The indices you want to include in the snapshot. You can use `,` to create a list of indices, `*` to specify an index pattern, and `-` to exclude certain indices. Don't put spaces between items. Default is all indices.
ignore_unavailable | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the snapshot. Default is false.
include_global_state | Whether to include cluster state in the snapshot. Default is true.
partial | Whether to allow partial snapshots. Default is false, which fails the entire snapshot if one or more shards fails to store.
`indices` | The indices you want to include in the snapshot. You can use `,` to create a list of indices, `*` to specify an index pattern, and `-` to exclude certain indices. Don't put spaces between items. Default is all indices.
`ignore_unavailable` | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the snapshot. Default is false.
`include_global_state` | Whether to include cluster state in the snapshot. Default is true.
`partial` | Whether to allow partial snapshots. Default is false, which fails the entire snapshot if one or more shards fails to store.
If you request the snapshot immediately after taking it, you might see something like this:
@@ -333,15 +333,15 @@ POST _snapshot/my-repository/2/_restore
Setting | Description
:--- | :---
indices | The indices you want to restore. You can use `,` to create a list of indices, `*` to specify an index pattern, and `-` to exclude certain indices. Don't put spaces between items. Default is all indices.
ignore_unavailable | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the restore operation. Default is false.
include_global_state | Whether to restore the cluster state. Default is false.
include_aliases | Whether to restore aliases alongside their associated indices. Default is true.
partial | Whether to allow the restoration of partial snapshots. Default is false.
rename_pattern | If you want to rename indices as you restore them, use this option to specify a regular expression that matches all indices you want to restore. Use capture groups (`()`) to reuse portions of the index name.
rename_replacement | If you want to rename indices as you restore them, use this option to specify the replacement pattern. Use `$0` to include the entire matching index name, `$1` to include the content of the first capture group, etc.
index_settings | If you want to change index settings on restore, specify them here.
ignore_index_settings | Rather than explicitly specifying new settings with `index_settings`, you can ignore certain index settings in the snapshot and use the cluster defaults on restore.
`indices` | The indices you want to restore. You can use `,` to create a list of indices, `*` to specify an index pattern, and `-` to exclude certain indices. Don't put spaces between items. Default is all indices.
`ignore_unavailable` | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the restore operation. Default is false.
`include_global_state` | Whether to restore the cluster state. Default is false.
`include_aliases` | Whether to restore aliases alongside their associated indices. Default is true.
`partial` | Whether to allow the restoration of partial snapshots. Default is false.
`rename_pattern` | If you want to rename indices as you restore them, use this option to specify a regular expression that matches all indices you want to restore. Use capture groups (`()`) to reuse portions of the index name.
`rename_replacement` | If you want to rename indices as you restore them, use this option to specify the replacement pattern. Use `$0` to include the entire matching index name, `$1` to include the content of the first capture group, etc.
`index_settings` | If you want to change index settings on restore, specify them here.
`ignore_index_settings` | Rather than explicitly specifying new settings with `index_settings`, you can ignore certain index settings in the snapshot and use the cluster defaults on restore.
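For example, `rename_pattern` and `rename_replacement` work together; this sketch restores every matching index under a `restored-` prefix (the index names are placeholders):
```json
POST _snapshot/my-repository/2/_restore
{
  "indices": "my-index*",
  "rename_pattern": "my-index(.+)",
  "rename_replacement": "restored-my-index$1"
}
```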
### Conflicts and compatibility
@@ -356,10 +356,7 @@ We recommend ceasing write requests to a cluster before restoring from a snapsho
1. A write request to the now-deleted alias creates a new index with the same name as the alias.
1. The alias from the snapshot fails to restore due to a naming conflict with the new index.
Snapshots are only forward-compatible by one major version. For example, you can't restore snapshots taken on a 2.x cluster to a 1.x cluster or a 6.x cluster, but you *can* restore them on a 2.x or 5.x cluster.
If you have an old snapshot, you can sometimes restore it into an intermediate cluster, reindex all indices, take a new snapshot, and repeat until you arrive at your desired version, but you might find it easier to just manually index your data on the new cluster.
Snapshots are only forward-compatible by one major version. If you have an old snapshot, you can sometimes restore it into an intermediate cluster, reindex all indices, take a new snapshot, and repeat until you arrive at your desired version, but you might find it easier to just manually index your data on the new cluster.
## Security plugin considerations


@@ -16,7 +16,7 @@ The following request returns information about all of your tasks:
GET _tasks
```
By including a task ID, you can get information that's specific to a particular task. Note that a task ID consists of a node's identifying string and the task's numerical ID. For example, if your node's identifying string is `nodestring` and the task's numerical ID is `1234`, then your task ID is `nodestring:1234`. You can find this information by running the `tasks` operation.
By including a task ID, you can get information specific to a particular task. Note that a task ID consists of a node's identifying string and the task's numerical ID. For example, if your node's identifying string is `nodestring` and the task's numerical ID is `1234`, then your task ID is `nodestring:1234`. You can find this information by running the `tasks` operation:
```
GET _tasks/<task_id>
@@ -80,16 +80,16 @@ You can also use the following parameters with your query.
Parameter | Data type | Description
:--- | :--- | :---
nodes | List | A comma-separated list of node IDs or names to limit the returned information. Use `_local` to return information from the node you're connecting to, specify the node name to get information from specific nodes, or keep the parameter empty to get information from all nodes.
actions | List | A comma-separated list of actions that should be returned. Keep empty to return all.
detailed | Boolean | Returns detailed task information. (Default: false)
parent_task_id | String | Returns tasks with a specified parent task ID (node_id:task_number). Keep empty or set to -1 to return all.
wait_for_completion | Boolean | Waits for the matching tasks to complete. (Default: false)
group_by | Enum | Groups tasks by parent/child relationships or nodes. (Default: nodes)
timeout | Time | An explicit operation timeout. (Default: 30 seconds)
master_timeout | Time | The time to wait for a connection to the primary node. (Default: 30 seconds)
`nodes` | List | A comma-separated list of node IDs or names to limit the returned information. Use `_local` to return information from the node you're connecting to, specify the node name to get information from specific nodes, or keep the parameter empty to get information from all nodes.
`actions` | List | A comma-separated list of actions that should be returned. Keep empty to return all.
`detailed` | Boolean | Returns detailed task information. (Default: false)
`parent_task_id` | String | Returns tasks with a specified parent task ID (node_id:task_number). Keep empty or set to -1 to return all.
`wait_for_completion` | Boolean | Waits for the matching tasks to complete. (Default: false)
`group_by` | Enum | Groups tasks by parent/child relationships or nodes. (Default: nodes)
`timeout` | Time | An explicit operation timeout. (Default: 30 seconds)
`master_timeout` | Time | The time to wait for a connection to the primary node. (Default: 30 seconds)
For example, this request returns tasks currently running on a node named `opensearch-node1`.
For example, this request returns tasks currently running on a node named `opensearch-node1`:
**Sample Request**
@@ -225,7 +225,7 @@ content-length: 768
}
}
```
This operation supports the same parameters as the `tasks` operation. The following example shows how you can associate `X-Opaque-Id` with specific tasks.
This operation supports the same parameters as the `tasks` operation. The following example shows how you can associate `X-Opaque-Id` with specific tasks:
```bash
curl -i -H "X-Opaque-Id: 123456" "https://localhost:9200/_tasks?nodes=opensearch-node1" -u 'admin:admin' --insecure


@@ -1,6 +1,6 @@
---
layout: default
title: Term-Level Queries
title: Term-level queries
parent: OpenSearch
nav_order: 9
---
@@ -9,7 +9,7 @@ nav_order: 9
OpenSearch supports two types of queries when you search for data: term-level queries and full-text queries.
The following table shows the differences between them:
The following table describes the differences between them:
| | Term-level queries | Full-text queries
:--- | :--- | :---


@@ -1,6 +1,6 @@
---
layout: default
title: Supported Units
title: Supported units
parent: OpenSearch
nav_order: 90
---


@@ -1,11 +1,11 @@
---
layout: default
title: Search Experience
title: Search experience
parent: OpenSearch
nav_order: 12
---
# Search Experience
# Search experience
Expectations from search engines have evolved over the years. Just returning relevant results quickly is no longer enough for most users. OpenSearch includes many features that enhance the user's search experience as follows:
@@ -25,7 +25,7 @@ Autocomplete shows suggestions to users while they type.
For example, if a user types "pop," OpenSearch provides suggestions like "popcorn" or "popsicles." These suggestions preempt your user's intention and lead them to a possible search term more quickly.
OpenSearch allows you to design autocomplete that updates with each keystroke, provides a few relevant suggestions, and tolerates typos.
OpenSearch lets you design autocomplete that updates with each keystroke, provides a few relevant suggestions, and tolerates typos.
Implement autocomplete using one of three methods:
@@ -33,7 +33,7 @@ Implement autocomplete using one of three methods:
- Edge N-gram matching
- Completion suggesters
These methods are described below.
These methods are described in the following sections.
### Prefix matching
@@ -866,9 +866,9 @@ To close all open scroll contexts:
DELETE _search/scroll/_all
```
The `scroll` operation corresponds to a specific timestamp. It does not consider documents added after that timestamp as potential results.
The `scroll` operation corresponds to a specific timestamp. It doesn't consider documents added after that timestamp as potential results.
Because open search contexts consume a lot of memory, we suggest you do not use the `scroll` operation for frequent user queries that don't need the search context open. Instead, use the `sort` parameter with the `search_after` parameter to scroll responses for user queries.
Because open search contexts consume a lot of memory, we suggest you don't use the `scroll` operation for frequent user queries that don't need the search context open. Instead, use the `sort` parameter with the `search_after` parameter to scroll responses for user queries.
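A hedged sketch of that approach, assuming the Shakespeare sample mapping used elsewhere in this chapter (`line_id` is a numeric field; the `search_after` value is the sort value of the last hit from the previous page):
```json
GET shakespeare/_search
{
  "size": 10,
  "query": {
    "match": {
      "play_name": "Hamlet"
    }
  },
  "sort": [
    { "line_id": "asc" }
  ],
  "search_after": [42]
}
```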
## Sort results
@@ -928,7 +928,7 @@ GET shakespeare/_search
}
```
You can continue to sort by any number of field values to get the results in just the right order. It doesn't have to be a numerical value, you can also sort by date or timestamp fields:
You can continue to sort by any number of field values to get the results in just the right order. It doesn't have to be a numerical value---you can also sort by date or timestamp fields:
```json
"sort": [