[OSCI][DOCs] Replacing 'indices' terms for 'indexes' terms ONLY for description texts (#5353)

* Fixing documentation for Wildcard in term-level queries section for Query DSL

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* replacing 'indices' term for 'indexes' term ONLY for description texts (not variables, links or properties)

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* Update creating-custom-workloads.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* updating changes suggested by Naarcha-AWS

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* updating changes suggested by Naarcha-AWS

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* Rename _benchmark/workloads/index.md to _benchmark/workloads/reference/index.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Rename _benchmark/workloads/indices.md to _benchmark/workloads/reference/indices.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
This commit is contained in:
Samuel Valdes Gutierrez 2023-10-31 15:18:45 +00:00 committed by GitHub
parent f9e0c02fdf
commit 7012af124a
16 changed files with 266 additions and 116 deletions

---

@@ -53,7 +53,7 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe
Parameter | Type | Description
:--- | :--- | :---
local | Boolean | Whether to return information from the local node only instead of from the master node. Default is false.
-expand_wildcards | Enum | Expands wildcard expressions to concrete indices. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`.
+expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`.
## Response

---

@@ -12,7 +12,7 @@ redirect_from:
**Introduced 1.0**
{: .label .label-purple }
-The CAT indices operation lists information related to indexes, that is, how much disk space they are using, how many shards they have, their health status, and so on.
+The CAT indexes operation lists information related to indexes, that is, how much disk space they are using, how many shards they have, their health status, and so on.
## Example
@@ -44,7 +44,7 @@ GET _cat/indices
## URL parameters
-All CAT indices URL parameters are optional.
+All CAT indexes URL parameters are optional.
In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:

---

@@ -2,7 +2,7 @@
layout: default
title: Count
nav_order: 21
redirect_from:
- /opensearch/rest-api/count/
---
@@ -61,7 +61,7 @@ GET _count
```
{% include copy-curl.html %}
-Alternatively, you could use the [cat indices]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-count/) APIs to see the number of documents per index or data stream.
+Alternatively, you could use the [cat indexes]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-count/) APIs to see the number of documents per index or data stream.
{: .note }

---

@@ -57,7 +57,7 @@ refresh | Enum | Whether to refresh the affected shards after performing the ind
require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
routing | String | Routes the request to the specified shard.
timeout | Time | How long to wait for the request to return. Default `1m`.
-type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indices.
+type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes.
wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed.
{% comment %}_source | List | asdf
_source_excludes | list | asdf
@@ -114,7 +114,7 @@ All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }
```
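The bulk body shown above is NDJSON: one action line, optionally followed by one payload line. A minimal Python sketch of assembling such a body (the helper name is ours; the index and document values are the ones from the example):

```python
import json

def bulk_body(actions):
    """Serialize (metadata, payload) pairs into an NDJSON bulk body.

    The Bulk API expects each action line followed by its optional
    payload line, and the body must end with a newline.
    """
    lines = []
    for meta, payload in actions:
        lines.append(json.dumps(meta))
        if payload is not None:
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"

body = bulk_body([
    ({"update": {"_index": "movies", "_id": "tt0816711"}},
     {"doc": {"title": "World War Z"}}),
])
print(body)
```

The returned string can be sent as the request body of `POST _bulk` with a client of your choice.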
It can also include a script or upsert for more complex document updates.
- Script
@@ -122,7 +122,7 @@ All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "script" : { "source": "ctx._source.title = \"World War Z\"" } }
```
- Upsert
```json
{ "update": { "_index": "movies", "_id": "tt0816711" } }

---

@@ -3,7 +3,7 @@ layout: default
title: Delete by query
parent: Document APIs
nav_order: 40
redirect_from:
- /opensearch/rest-api/document-apis/delete-by-query/
---
@@ -39,16 +39,16 @@ All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :--- | :---
-&lt;index&gt; | String | Name or list of the data streams, indices, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indices.
+&lt;index&gt; | String | Name or list of the data streams, indexes, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indexes.
-allow_no_indices | Boolean | Whether to ignore wildcards that dont match any indices. Default is `true`.
+allow_no_indices | Boolean | Whether to ignore wildcards that dont match any indexes. Default is `true`.
analyzer | String | The analyzer to use in the query string.
analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
conflicts | String | Indicates to OpenSearch what should happen if the delete by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.
default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
df | String | The default field in case a field prefix is not provided in the query string.
-expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`.
+expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
from | Integer | The starting index to search from. Default is 0.
-ignore_unavailable | Boolean | Specifies whether to include missing or closed indices in the response. Default is false.
+ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response. Default is false.
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
max_docs | Integer | How many documents the delete by query operation should process at most. Default is all documents.
preference | String | Specifies which shard or node OpenSearch should perform the delete by query operation on.
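The URL parameters in the table above are passed as a query string on the `_delete_by_query` endpoint. A minimal sketch of assembling such a request URL in Python (the helper name, host, and parameter values are illustrative):

```python
from urllib.parse import urlencode

def delete_by_query_url(base, index, **params):
    # Assemble the request path plus any of the optional URL
    # parameters; `base`, `index`, and the values are caller-supplied.
    url = f"{base}/{index}/_delete_by_query"
    return f"{url}?{urlencode(params)}" if params else url

url = delete_by_query_url("http://localhost:9200", "my-logs",
                          conflicts="proceed", max_docs=500)
print(url)
```

`urlencode` percent-escapes values where needed, which matters for comma-separated parameters such as `expand_wildcards`.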

---

@@ -81,14 +81,14 @@ POST _bulk
```
-## List all indices
+## List all indexes
```
GET _cat/indices?v&expand_wildcards=all
```
-## Open or close all indices that match a pattern
+## Open or close all indexes that match a pattern
```
POST my-logs*/_open
@@ -96,7 +96,7 @@ POST my-logs*/_close
```
-## Delete all indices that match a pattern
+## Delete all indexes that match a pattern
```
DELETE my-logs*
@@ -119,7 +119,7 @@ GET _cat/aliases?v
```
-## Search an index or all indices that match a pattern
+## Search an index or all indexes that match a pattern
```
GET my-logs/_search?q=test

---

@@ -94,8 +94,8 @@ A workload usually includes the following elements:
- [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/): Defines the relevant indexes and index templates used for the workload.
- [corpora]({{site.url}}{{site.baseurl}}/benchmark/workloads/corpora/): Defines all document corpora used for the workload.
- `schedule`: Defines operations and the order in which the operations run inline. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations.
- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized.
### Indices
@@ -105,9 +105,9 @@ To create an index, specify its `name`. To add definitions to your index, use th
The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters:
- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name.
- `document-count`: The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents.
- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
- `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.
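The `document-count` and `uncompressed-bytes` values above can be derived mechanically from a JSON-lines source file. A minimal Python sketch (the helper name and file name are illustrative, and a tiny synthetic corpus stands in for real data):

```python
import json
import os
import tempfile

def corpus_stats(path):
    """Derive document-count and uncompressed-bytes from a JSON-lines corpus."""
    with open(path) as f:
        count = sum(1 for _ in f)  # one document per line
    return {"document-count": count,
            "uncompressed-bytes": os.path.getsize(path)}

# Tiny synthetic corpus for illustration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movies-documents.json")
    with open(path, "w") as f:
        for title in ["Back to the Future", "Avengers: Endgame"]:
            f.write(json.dumps({"title": title}) + "\n")
    stats = corpus_stats(path)
print(stats)
```

This mirrors what `wc -l` and `stat` report for the same file on the command line.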
### Operations
@@ -116,7 +116,7 @@ The `operations` element lists the OpenSearch API operations performed by the wo
### Schedule
The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`:
```json
"schedule": [

---

@@ -8,44 +8,53 @@ redirect_from: /benchmark/creating-custom-workloads/
# Creating custom workloads
OpenSearch Benchmark includes a set of [workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) that you can use to benchmark data from your cluster. Additionally, if you want to create a workload that is tailored to your own data, you can create a custom workload using one of the following options:
-- [Creating a workload from an existing cluster](#creating-a-workload-from-an-existing-cluster)
-- [Creating a workload without an existing cluster](#creating-a-workload-without-an-existing-cluster)
+- [Creating custom workloads](#creating-custom-workloads)
+- [Creating a workload from an existing cluster](#creating-a-workload-from-an-existing-cluster)
+- [Prerequisites](#prerequisites)
+- [Customizing the workload](#customizing-the-workload)
+- [Creating a workload without an existing cluster](#creating-a-workload-without-an-existing-cluster)
+- [Invoking your custom workload](#invoking-your-custom-workload)
+- [Advanced options](#advanced-options)
+- [Test mode](#test-mode)
+- [Adding variance to test procedures](#adding-variance-to-test-procedures)
+- [Separate operations and test procedures](#separate-operations-and-test-procedures)
+- [Next steps](#next-steps)
## Creating a workload from an existing cluster
If you already have an OpenSearch cluster with indexed data, use the following steps to create a custom workload for your cluster.
### Prerequisites
Before creating a custom workload, make sure you have the following prerequisites:
- An OpenSearch cluster with an index that contains 1000 or more documents. If your cluster's index does not contain at least 1000 documents, the workload can still run tests, however, you cannot run workloads using `--test-mode`.
- You must have the correct permissions to access your OpenSearch cluster. For more information about cluster permissions, see [Permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/).
### Customizing the workload
To begin creating a custom workload, use the `opensearch-benchmark create-workload` command.
```
opensearch-benchmark create-workload \
--workload="<WORKLOAD NAME>" \
--target-hosts="<CLUSTER ENDPOINT>" \
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \
---indices="<INDICES TO GENERATE WORKLOAD FROM>" \
+--indices="<INDEXES TO GENERATE WORKLOAD FROM>" \
--output-path="<LOCAL DIRECTORY PATH TO STORE WORKLOAD>"
```
Replace the following options in the preceding example with information specific to your existing cluster:
- `--workload`: A custom name for your custom workload.
- `--target-hosts`: A comma-separated list of host:port pairs from which the cluster extracts data.
- `--client-options`: The basic authentication client options that OpenSearch Benchmark uses to access the cluster.
- `--indices`: One or more indexes inside your OpenSearch cluster that contain data.
- `--output-path`: The directory in which OpenSearch Benchmark creates the workload and its configuration files.
The following example response creates a workload named `movies` from a cluster with an index named `movies-info`. The `movies-info` index contains over 2,000 documents.
```
____ _____ __ ____ __ __
@@ -68,13 +77,13 @@ Extracting documents for index [movies]... 2000/2000 docs [10
-------------------------------
```
As part of workload creation, OpenSearch Benchmark generates the following files. You can access them in the directory specified by the `--output-path` option.
- `workload.json`: Contains general workload specifications.
- `<index>.json`: Contains mappings and settings for the extracted indexes.
- `<index>-documents.json`: Contains the sources of every document from the extracted indexes. Any sources suffixed with `-1k` encompass only a fraction of the document corpus of the workload and are only used when running the workload in test mode.
By default, OpenSearch Benchmark does not contain a reference to generate queries. Because you have the best understanding of your data, we recommend adding a query to `workload.json` that matches your index's specifications. Use the following `match_all` query as an example of a query added to your workload:
```json
{
@@ -100,17 +109,17 @@ If you want to create a custom workload but do not have an existing OpenSearch c
To build a workload with source files, create a directory for your workload and perform the following steps:
1. Build a `<index>-documents.json` file that contains rows of documents that comprise the document corpora of the workload and houses all data to be ingested and queried into the cluster. The following example shows the first few rows of a `movies-documents.json` file that contains rows of documents about famous movies:
```json
# First few rows of movies-documents.json
{"title": "Back to the Future", "director": "Robert Zemeckis", "revenue": "$212,259,762 USD", "rating": "8.5 out of 10", "image_url": "https://imdb.com/images/32"}
{"title": "Avengers: Endgame", "director": "Anthony and Joe Russo", "revenue": "$2,800,000,000 USD", "rating": "8.4 out of 10", "image_url": "https://imdb.com/images/2"}
{"title": "The Grand Budapest Hotel", "director": "Wes Anderson", "revenue": "$173,000,000 USD", "rating": "8.1 out of 10", "image_url": "https://imdb.com/images/65"}
{"title": "The Godfather: Part II", "director": "Francis Ford Coppola", "revenue": "$48,000,000 USD", "rating": "9 out of 10", "image_url": "https://imdb.com/images/7"}
```
2. In the same directory, build a `index.json` file. The workload uses this file as a reference for data mappings and index settings for the documents contained in `<index>-documents.json`. The following example creates mappings and settings specific to the `movie-documents.json` data from the previous step:
```json
{
@@ -140,21 +149,21 @@ To build a workload with source files, create a directory for your workload and
}
```
3. Next, build a `workload.json` file that provides a high-level overview of your workload and determines how your workload runs benchmark tests. The `workload.json` file contains the following sections:
- `indices`: Defines the name of the index to be created in your OpenSearch cluster using the mappings from the workload's `index.json` file created in the previous step.
- `corpora`: Defines the corpora and the source file, including the:
- `document-count`: The number of documents in `<index>-documents.json`. To get an accurate number of documents, run `wc -l <index>-documents.json`.
- `uncompressed-bytes`: The number of bytes inside the index. To get an accurate number of bytes, run `stat -f %z <index>-documents.json` on macOS or `stat -c %s <index>-documents.json` on GNU/Linux. Alternatively, run `ls -lrt | grep <index>-documents.json`.
- `schedule`: Defines the sequence of operations and available test procedures for the workload.
The following example `workload.json` file provides the entry point for the `movies` workload. The `indices` section creates an index called `movies`. The corpora section refers to the source file created in step one, `movie-documents.json`, and provides the document count and the amount of uncompressed bytes. Lastly, the schedule section defines a few operations the workload performs when invoked, including:
- Deleting any current index named `movies`.
- Creating an index named `movies` based on data from `movie-documents.json` and the mappings from `index.json`.
- Verifying that the cluster is in good health and can ingest the new index.
- Ingesting the data corpora from `workload.json` into the cluster.
- Querying the results.
```json
{
@@ -230,15 +239,25 @@ To build a workload with source files, create a directory for your workload and
}
```
-4. For all the workload files created, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing `--workload-path` with a path to your workload directory:
-```
-opensearch-benchmark list workloads --workload-path=</path/to/workload/>
-```
+The corpora section refers to the source file created in step one, `movie-documents.json`, and provides the document count and the amount of uncompressed bytes. Lastly, the schedule section defines a few operations the workload performs when invoked, including:
+- Deleting any current index named `movies`.
+- Creating an index named `movies` based on data from `movie-documents.json` and the mappings from `index.json`.
+- Verifying that the cluster is in good health and can ingest the new index.
+- Ingesting the data corpora from `workload.json` into the cluster.
+- Querying the results.
+For all the workload files created, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing `--workload-path` with a path to your workload directory:
+```
+opensearch-benchmark list workloads --workload-path=</path/to/workload/>
+```
## Invoking your custom workload
Use the `opensearch-benchmark execute-test` command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. Replace `--workload-path` with the path to your custom workload, `--target-host` with the `host:port` pairs for your cluster, and `--client-options` with any authorization options required to access the cluster.
```
opensearch-benchmark execute_test \
@ -256,7 +275,7 @@ You can enhance your custom workload's functionality with the following advanced
### Test mode
If you want to run the test in test mode to make sure your workload operates as intended, add the `--test-mode` option to the `execute-test` command. Test mode ingests only the first 1000 documents from each index provided and runs query operations against them.
To use test mode, create a `<index>-documents-1k.json` file that contains the first 1000 documents from `<index>-documents.json` using the following command:
@ -277,13 +296,13 @@ opensearch-benchmark execute_test \
### Adding variance to test procedures
After using your custom workload several times, you might want to use the same workload but perform the workload's operations in a different order. Instead of creating a new workload or reorganizing the procedures directly, you can provide test procedures to vary workload operations.
To add variance to your workload operations, go to your `workload.json` file and replace the `schedule` section with a `test_procedures` array, as shown in the following example. Each item in the array contains the following:
- `name`: The name of the test procedure.
- `default`: When set to `true`, OpenSearch Benchmark defaults to the test procedure specified as `default` in the workload if no other test procedures are specified.
- `schedule`: All the operations the test procedure will run.
```json
@ -347,11 +366,11 @@ To add variance to your workload operations, go to your `workload.json` file and
### Separate operations and test procedures
If you want to make your `workload.json` file more readable, you can separate your operations and test procedures into different directories and reference the path to each in `workload.json`. To separate operations and procedures, perform the following steps:
1. Add all test procedures to a single file. You can give the file any name. Because the `movies` workload in the preceding example contains an index task and queries, this step names the test procedures file `index-and-query.json`.
2. Add all operations to a file named `operations.json`.
3. Reference the new files in `workloads.json` by adding the following syntax, replacing `parts` with the relative path to each file, as shown in the following example:
```json
"operations": [
@ -365,11 +384,5 @@ If you want to make your `workload.json` file more readable, you can separate yo
## Next steps
- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
- To show a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.
@ -0,0 +1,109 @@
---
layout: default
title: Workload reference
nav_order: 60
has_children: true
---
# OpenSearch Benchmark workload reference
A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following:
- One or more data streams that are ingested into indexes
- A set of queries and operations that are invoked as part of the benchmark
This section provides a list of options and examples you can use when customizing or using a workload.
For more information about what comprises a workload, see [Anatomy of a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/concepts#anatomy-of-a-workload).
## Workload examples
If you want to try certain workloads before creating your own, use the following examples.
### Running unthrottled
In the following example, OpenSearch Benchmark runs an unthrottled bulk index operation for 1 hour against the `movies` index:
```json
{
"description": "Tutorial benchmark for OpenSearch Benchmark",
"indices": [
{
"name": "movies",
"body": "index.json"
}
],
"corpora": [
{
"name": "movies",
"documents": [
{
"source-file": "movies-documents.json",
"document-count": 11658903, # Fetch document count from command line
"uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
}
]
}
],
"schedule": [
{
"operation": "bulk",
"warmup-time-period": 120,
"time-period": 3600,
"clients": 8
}
]
}
```
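As the comments in the example note, the `document-count` and `uncompressed-bytes` values are fetched from the command line rather than guessed. A minimal sketch of how to compute them, assuming a JSON Lines corpus where each line is one document (the file name and sample documents here are illustrative):

```python
import json
import os

# Illustrative corpus: three one-line JSON documents.
with open("movies-documents.json", "w") as f:
    for i in range(3):
        f.write(json.dumps({"title": f"Movie {i}"}) + "\n")

# document-count: one document per line in the corpus file.
with open("movies-documents.json") as f:
    document_count = sum(1 for _ in f)

# uncompressed-bytes: size of the uncompressed corpus file on disk.
uncompressed_bytes = os.path.getsize("movies-documents.json")

print(document_count, uncompressed_bytes)
```

The shell equivalents are `wc -l <file>` for the document count and `wc -c <file>` for the uncompressed byte size.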
### Workload with a single task
The following workload runs a benchmark with a single task: a `match_all` query. Because no `clients` are indicated, only one client is used. According to the `schedule`, the workload runs the `match_all` query at 10 operations per second with 1 client, uses 100 iterations to warm up, and uses the next 100 iterations to measure the benchmark:
```json
{
"description": "Tutorial benchmark for OpenSearch Benchmark",
"indices": [
{
"name": "movies",
"body": "index.json"
}
],
"corpora": [
{
"name": "movies",
"documents": [
{
"source-file": "movies-documents.json",
"document-count": 11658903, # Fetch document count from command line
"uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
}
]
}
],
  "schedule": [
    {
      "operation": {
        "operation-type": "search",
        "index": "_all",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "warmup-iterations": 100,
      "iterations": 100,
      "target-throughput": 10
    }
  ]
}
```
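As a sanity check on the schedule above: with a `target-throughput` of 10 operations per second and 100 measured iterations on a single client, the measurement phase lasts roughly the number of iterations divided by the throughput:

```python
# Rough duration of the measurement phase for the schedule above.
iterations = 100
target_throughput = 10  # operations per second

measurement_seconds = iterations / target_throughput
print(measurement_seconds)  # → 10.0 (the 100 warmup iterations run before this)
```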
## Next steps
- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.
@ -0,0 +1,30 @@
---
layout: default
title: indices
parent: Workload reference
nav_order: 65
---
# indices
The `indices` element contains a list of all indexes used in the workload.
## Example
```json
"indices": [
{
"name": "geonames",
"body": "geonames-index.json",
}
]
```
## Configuration options
Use the following options with `indices`:
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`name` | Yes | String | The name of the index.
`body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API.
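The file referenced by `body` contains a standard Create Index API request body. A minimal sketch of what `geonames-index.json` might contain (the settings and field names are illustrative, not part of the workload specification):

```json
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "population": { "type": "long" }
    }
  }
}
```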
@ -28,13 +28,13 @@ Snapshots have two main uses:
## Creating a repository
Before you create an SM policy, set up a repository for snapshots.
1. From the OpenSearch Dashboards main menu, select **Management** > **Snapshot Management**.
2. In the left panel, under **Snapshot Management**, select **Repositories**.
3. Choose the **Create Repository** button.
4. Enter the repository name, type, and location.
5. (Optional) Select **Advanced Settings** and enter additional settings for this repository as a JSON object.
#### Example
```json
{
@ -87,7 +87,7 @@ You can view, edit, or delete an SM policy on the policy details page.
1. From the OpenSearch Dashboards main menu, select **Management** > **Snapshot Management**.
1. In the left panel, under **Snapshot Management**, select **Snapshot Policies**.
1. Click on the **Policy name** of the policy you want to view, edit, or delete. <br>
The policy settings, snapshot schedule, snapshot retention period, notifications, and last creation and deletion are displayed in the policy details page. <br> If a snapshot creation or deletion fails, you can view information about the failure in the **Last Creation/Deletion** section. To view the failure message, click on the **cause** in the **Info** column.
1. To edit or delete the SM policy, select the **Edit** or **Delete** button.
## Enable, disable, or delete SM policies
@ -131,7 +131,7 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps
1. From the OpenSearch Dashboards main menu, select **Management** > **Snapshot Management**.
1. In the left panel, under **Snapshot Management**, select **Snapshots**. The **Snapshots** tab is selected by default.
1. Select the checkbox next to the snapshot you want to restore. An example is shown in the following image:
<img src="{{site.url}}{{site.baseurl}}/images/restore-snapshot/restore-snapshot-main.png" alt="Snapshots">{: .img-fluid}
{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/star-icon.png" class="inline-icon" alt="star icon"/>{:/} **Note:** You can only restore snapshots with the status of `Success` or `Partial`. The status of the snapshot is displayed in the **Snapshot status** column.
@ -142,7 +142,7 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps
<img src="{{site.url}}{{site.baseurl}}/images/restore-snapshot/restore-snapshot.png" alt="Restore Snapshot" width="450">
For more information about the options in the **Restore snapshot** flyout, see [Restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#restore-snapshots).
**Ignoring missing indexes**
@ -154,20 +154,20 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps
&emsp;&#x2022; Select the **Customize index settings** checkbox to provide new values for the specified index settings. All newly restored indexes will use these values instead of the ones in the snapshot. <br>
&emsp;&#x2022; Select the **Ignore index settings** checkbox to specify the settings in the snapshot to ignore. All newly restored indexes will use the cluster defaults for these settings.
The examples in the following image set `index.number_of_replicas` to `0`, `index.auto_expand_replicas` to `true`, and `index.refresh_interval` and `index.max_script_fields` to the cluster default values for all newly restored indexes.
<img src="{{site.url}}{{site.baseurl}}/images/restore-snapshot/restore-snapshot-custom.png" alt="Custom settings" width="450">
For more information about index settings, see [Index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/).
For a list of settings that you cannot change or ignore, see [Restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#restore-snapshots).
After choosing the options, select the **Restore snapshot** button.
1. (Optional) To monitor the restore progress, select **View restore activities** in the confirmation dialog. You can also monitor the restore progress at any time by selecting the **Restore activities in progress** tab, as shown in the following image.
<img src="{{site.url}}{{site.baseurl}}/images/restore-snapshot/restore-snapshot-activities.png" alt="Restore Activities">{: .img-fluid}
You can view the percentage of the job that has been completed in the **Status** column. Once the snapshot restore is complete, the **Status** changes to `Completed (100%)`.
{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/star-icon.png" class="inline-icon" alt="star icon"/>{:/} **Note:** The **Restore activities in progress** panel is not persistent. It displays only the progress of the current restore operation. If multiple restore operations are running, the panel displays the most recent one.
{: .note purple}
@ -178,4 +178,3 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps
After the restore operation is complete, the restored indexes are listed in the **Indices** panel. To view the indexes, in the left panel, under **Index Management**, choose **Indices**.
<img src="{{site.url}}{{site.baseurl}}/images/restore-snapshot/restore-snapshot-indices-panel.png" alt="View Indices">{: .img-fluid}
@ -1,6 +1,6 @@
---
layout: default
title: opensearch
parent: Sources
grand_parent: Pipelines
nav_order: 30
@ -39,7 +39,7 @@ opensearch-source-pipeline:
  include:
    - index_name_regex: "test-index-.*"
  exclude:
    - index_name_regex: "\..*"
scheduling:
  interval: "PT1H"
  index_read_count: 2
@ -103,15 +103,15 @@ Option | Required | Type | Description
`aws` | No | Object | The AWS configuration. For more information, see [aws](#aws).
`acknowledgments` | No | Boolean | When `true`, enables the `opensearch` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/#end-to-end-acknowledgments) when events are received by OpenSearch sinks. Default is `false`.
`connection` | No | Object | The connection configuration. For more information, see [Connection](#connection).
`indices` | No | Object | The configuration for filtering which indexes are processed. Defaults to all indexes, including system indexes. For more information, see [indexes](#indices).
`scheduling` | No | Object | The scheduling configuration. For more information, see [Scheduling](#scheduling).
`search_options` | No | Object | A list of search options performed by the source. For more information, see [Search options](#search_options).
### Scheduling
The `scheduling` configuration allows the user to configure how indexes are reprocessed in the source based on the `index_read_count` and recount time `interval`.
For example, setting `index_read_count` to `3` with an `interval` of `1h` will result in all indexes being reprocessed 3 times, 1 hour apart. By default, indexes will only be processed once.
Use the following options under the `scheduling` configuration.
@ -119,12 +119,12 @@ Option | Required | Type | Description
:--- | :--- |:----------------| :---
`index_read_count` | No | Integer | The number of times each index will be processed. Default is `1`.
`interval` | No | String | The interval that determines the amount of time between reprocessing. Supports ISO 8601 notation strings, such as "PT20.345S" or "PT15M", as well as simple notation strings for seconds ("60s") and milliseconds ("1500ms"). Defaults to `8h`.
`start_time` | No | String | The time when processing should begin. The source will not start processing until this time. The string must be in ISO 8601 format, such as `2007-12-03T10:15:30.00Z`. The default option starts processing immediately.
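Putting these options together, a sketch of a `scheduling` block that reprocesses all indexes three times, one hour apart, starting at a fixed time (the values are illustrative):

```yaml
scheduling:
  index_read_count: 3
  interval: "1h"
  start_time: "2007-12-03T10:15:30.00Z"
```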
### indices
The following options help the `opensearch` source determine which indexes are processed from the source cluster using regex patterns. An index will only be processed if it matches one of the `index_name_regex` patterns under the `include` setting and does not match any of the patterns under the `exclude` setting.
Option | Required | Type | Description
@ -137,7 +137,7 @@ Use the following setting under the `include` and `exclude` options to indicate
Option | Required | Type | Description
:--- |:----|:-----------------| :---
`index_name_regex` | Yes | Regex string | The regex pattern to match indexes against.
### search_options
@ -145,13 +145,13 @@ Use the following settings under the `search_options` configuration.
Option | Required | Type | Description
:--- |:---------|:--------| :---
`batch_size` | No | Integer | The number of documents to read while paginating from OpenSearch. Default is `1000`.
`search_context_type` | No | Enum | An override for the type of search/pagination to use on indexes. Can be [point_in_time]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#point-in-time-with-search_after), [scroll]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#scroll-search), or `none`. The `none` option will use the [search_after]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#the-search_after-parameter) parameter. For more information, see [Default Search Behavior](#default-search-behavior).
### Default search behavior
By default, the `opensearch` source will look up the cluster version and distribution to determine which `search_context_type` to use. For versions and distributions that support [Point in Time](https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#point-in-time-with-search_after), `point_in_time` will be used.
If `point_in_time` is not supported by the cluster, then [scroll](https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#scroll-search) will be used. For Amazon OpenSearch Serverless collections, [search_after](https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#the-search_after-parameter) will be used because neither `point_in_time` nor `scroll` are supported by collections.
### Connection
@ -43,14 +43,14 @@ Name | Description
indices_all | Grants all permissions on the index. Equates to `indices:*`.
get | Grants permissions to use `get` and `mget` actions only.
read | Grants read permissions such as search, get field mappings, `get`, and `mget`.
write | Grants permissions to create and update documents within *existing indices*. To create new indexes, see `create_index`.
delete | Grants permissions to delete documents.
crud | Combines the `read`, `write`, and `delete` action groups. Included in the `data_access` action group.
search | Grants permissions to search documents. Includes `suggest`.
suggest | Grants permissions to use the suggest API. Included in the `read` action group.
create_index | Grants permissions to create indexes and mappings.
indices_monitor | Grants permissions to execute all index monitoring actions (e.g. recovery, segments info, index stats, and status).
index | A more limited version of the `write` action group.
data_access | Combines the `crud` action group with `indices:data/*`.
manage_aliases | Grants permissions to manage aliases.
manage | Grants all monitoring and administration permissions for indexes.
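As an illustration of how these action groups are applied, a hypothetical role definition might grant document-level read/write access plus index creation on a set of indexes (the role name and index pattern are examples, not part of the shipped configuration):

```yml
# Hypothetical role: read, write, and delete documents in movie indexes,
# and create new ones. Uses the crud and create_index action groups above.
movies_editor:
  index_permissions:
    - index_patterns:
        - "movies*"
      allowed_actions:
        - "crud"
        - "create_index"
```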
@ -93,10 +93,10 @@ System index permissions also work with the wildcard to include all variations o
* Specifying the full name of a system index limits access to only that index: `.opendistro-alerting-config`.
* Specifying a partial name for a system index along with the wildcard provides access to all system indexes that begin with that name: `.opendistro-anomaly-detector*`.
* Although not recommended---given the wide-reaching access granted by this role definition---using `*` for the index pattern along with `system:admin/system_index` as an allowed action grants access to all system indexes.
Entering the wildcard `*` by itself under `allowed_actions` does not automatically grant access to system indexes. The allowed action `system:admin/system_index` must be explicitly added.
{: .note }
The following example shows a role that grants access to all system indexes:
```yml
@ -474,4 +474,4 @@ Allowing access to these endpoints has the potential to trigger operational chan
- restapi:admin/rolesmapping
- restapi:admin/ssl/certs/info
- restapi:admin/ssl/certs/reload
- restapi:admin/tenants
@ -136,9 +136,9 @@ _meta:
```
## Manage OpenSearch Dashboards indexes
The open source version of OpenSearch Dashboards saves all objects to a single index: `.kibana`. The Security plugin uses this index for the global tenant, but separate indexes for every other tenant. Each user also has a private tenant, so you might see a large number of indexes that follow two patterns:
```
.kibana_<hash>_<tenant_name>
@@ -149,4 +149,3 @@ The Security plugin scrubs these index names of special characters, so they migh
{: .tip }
To back up your OpenSearch Dashboards data, [take a snapshot]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/) of all tenant indexes using an index pattern such as `.kibana*`.
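The backup step above can be sketched as a snapshot request. This is a minimal example; the repository name `my-repository` and snapshot name `dashboards-backup` are placeholders that assume you have already registered a snapshot repository:

```json
PUT _snapshot/my-repository/dashboards-backup
{
  "indices": ".kibana*",
  "include_global_state": false
}
```

Because the `indices` pattern is `.kibana*`, the snapshot captures the global tenant index along with every per-tenant index.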
@@ -13,7 +13,7 @@ redirect_from:
# Introduction to OpenSearch
OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results.
Unsurprisingly, people often use search engines like OpenSearch as the backend for a search application---think [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#What_software_is_used_to_run_Wikipedia?) or an online store. It offers excellent performance and can scale up and down as the needs of the application grow or shrink.
@@ -29,9 +29,9 @@ You can run OpenSearch locally on a laptop---its system requirements are minimal
In a single node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. For more information on setting node types, see [Cluster formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/).
## Indexes and documents
OpenSearch organizes data into *indexes*. Each index is a collection of JSON *documents*. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to [JSON](https://www.json.org/). A simple JSON document for a movie might look like this:
```json
{
@@ -55,14 +55,14 @@ When you add the document to an index, OpenSearch adds some metadata, such as th
}
```
Indexes also contain mappings and settings:
- A *mapping* is the collection of *fields* that documents in the index have. In this case, those fields are `title` and `release_date`.
- Settings include data like the index name, creation date, and number of shards.
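The mapping and settings described above can be supplied together when an index is created. The following is a minimal sketch using the create index API; the index name `movies` is a placeholder, and the field types shown are one reasonable choice for the `title` and `release_date` fields:

```json
PUT movies
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "release_date": { "type": "date" }
    }
  }
}
```

If you index a document without creating the index first, OpenSearch infers a mapping for you, but defining it explicitly gives you control over each field's type.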
## Primary and replica shards
OpenSearch splits indexes into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually.
By default, OpenSearch creates a *replica* shard for each *primary* shard. If you split your index into ten shards, for example, OpenSearch also creates ten replica shards. These replica shards act as backups in the event of a node failure---OpenSearch distributes replica shards to different nodes than their corresponding primary shards---but they also improve the speed and rate at which the cluster can process search requests. You might specify more than one replica per index for a search-heavy workload.
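The ten-shard example above corresponds to the following index settings. The index name `my-large-index` is a placeholder; setting `number_of_replicas` to `2` illustrates requesting more than one replica per primary shard for a search-heavy workload:

```json
PUT my-large-index
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 2
    }
  }
}
```

Note that `number_of_shards` is fixed at index creation, whereas `number_of_replicas` can be updated later through the index settings API.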
@@ -93,4 +93,4 @@ To delete the document:
DELETE https://<host>:<port>/<index-name>/_doc/<document-id>
```
You can change most OpenSearch settings using the REST API, modify indexes, check the health of the cluster, get statistics---almost everything.
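For example, the same request pattern covers checking cluster health and updating a cluster-wide setting; the `transient` setting shown here is one illustrative choice:

```json
GET _cluster/health

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
```

Transient settings reset when the cluster restarts; use `persistent` instead of `transient` for changes that should survive a restart.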