Add operations and schedule sections (#5809)

* Add operations and schedule sections

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add bulk operation documentation

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add operations

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Place operations under workloads. Add remaining operations.

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Fix link

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update test-procedures.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _benchmark/reference/workloads/operations.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update operations.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update operations.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _benchmark/reference/workloads/test-procedures.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update test-procedures.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Change nav order

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

---------

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Naarcha-AWS 2023-12-20 12:35:58 -06:00 committed by GitHub
parent b6965a64c7
commit e71a3c63eb
2 changed files with 515 additions and 0 deletions


@ -0,0 +1,324 @@
---
layout: default
title: operations
parent: Workload reference
grand_parent: OpenSearch Benchmark Reference
nav_order: 100
---
# operations
The `operations` element contains a list of all available operations for specifying a schedule.
## bulk
The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.
### Usage
The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents:
```yml
{
"name": "index-append",
"operation-type": "bulk",
"bulk-size": 5000
}
```
### Split documents among clients
When you have multiple `clients`, OpenSearch Benchmark splits each document based on the set number of clients. Having multiple `clients` parallelizes the bulk index operations but doesn't preserve the ingestion order of each document. For example, if `clients` is set to `2`, one client indexes the document starting from the beginning, while the other client indexes the document starting from the middle.
If there are multiple documents or corpora, OpenSearch Benchmark attempts to index all documents in parallel in two ways:
1. Each client starts at a different point in the corpus. For example, in a workload with 2 corpora and 5 clients, clients 1, 3, and 5 begin with the first corpus, whereas clients 2 and 4 start with the second corpus.
2. Each client is assigned to multiple documents. Client 1 starts with the first split of the first document of the first corpus. Then it moves to the first split of the first document of the second corpus, and so on.
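
As a minimal sketch of the two-client case described above, the following schedule task runs the `index-append` operation from the usage example with `clients` set to `2`, so that each client indexes a different part of the corpus. The task is illustrative and assumes that `index-append` is defined in the workload's `operations` element:

```yml
{
  "operation": "index-append",
  "clients": 2
}
```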
### Configuration options
Use the following options to customize the `bulk` operation.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`bulk-size` | Yes | Number | Specifies the number of documents to be ingested in the bulk request.
`ingest-percentage` | No | Range [0, 100] | Defines the portion of the document corpus to be indexed. Valid values are numbers between 0 and 100.
`corpora` | No | List | Defines which document corpus names should be targeted by the bulk operation. Only needed if the `corpora` section contains more than one document corpus and you don't want to index all of them during the bulk request.
`indices` | No | List | Defines which indexes should be used in the bulk index operation. OpenSearch Benchmark only selects document files that have a matching `target-index`.
`batch-size` | No | Number | Defines how many documents OpenSearch Benchmark reads simultaneously. This is an expert setting and is only meant to avoid accidental bottlenecks for very small bulk sizes. If you want to benchmark with a `bulk-size` of `1`, you should set a higher `batch-size`.
`pipeline` | No | String | Defines which existing ingest pipeline to use.
`conflicts` | No | String | Defines the type of index conflicts to simulate. If not specified, none are simulated. Valid values are `sequential`, which replaces a document ID with a sequentially increasing document ID, and `random`, which replaces a document ID with a random document ID.
`conflict-probability` | No | Percentage | Defines how many of the documents are replaced when a conflict exists. Combining `conflicts=sequential` and `conflict-probability=0` makes OpenSearch Benchmark generate the index ID itself instead of using OpenSearch's automatic ID generation. Valid values are numbers between 0 and 100. Default is `25%`.
`on-conflict` | No | String | Determines whether OpenSearch should use the `index` or `update` action for ID conflicts. Default is `index`, which indexes the conflicting document again and replaces the existing document with the same ID.
`recency` | No | Number | Uses a number between `0` and `1` to indicate recency. A recency closer to `1` biases conflicting IDs toward more recent IDs. A recency closer to `0` considers all IDs for ID conflicts.
`detailed-results` | No | Boolean | Records more detailed [metadata](#metadata) for bulk requests. As OpenSearch Benchmark analyzes the corresponding bulk response in more detail, additional overhead may be incurred, which can skew measurement results. This property must be set to `true` so that OpenSearch Benchmark logs individual bulk request failures.
`timeout` | No | Duration | Defines the amount of time (in minutes) that OpenSearch waits per action until completing the processing of the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. Default is `1m`.
`refresh` | No | String | Controls OpenSearch refresh behavior for bulk requests that use the `refresh` bulk API query parameter. Valid values are `true`, which refreshes target shards in the background; `wait_for`, which blocks bulk requests until affected shards have been refreshed; and `false`, which uses the default refresh behavior.
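
The following sketch combines several of these options in one `bulk` operation. The operation name and all values are hypothetical and chosen only for illustration. The operation indexes half of the corpus, simulates random ID conflicts for 25% of the documents, and records detailed results:

```yml
{
  "name": "index-with-conflicts",
  "operation-type": "bulk",
  "bulk-size": 1000,
  "ingest-percentage": 50,
  "conflicts": "random",
  "conflict-probability": 25,
  "on-conflict": "update",
  "detailed-results": true
}
```

Because `detailed-results` adds analysis overhead, consider enabling it only when you need per-item failure information.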
### Metadata
The `bulk` operation always returns the following metadata:
- `index`: The name of the affected index. If an index cannot be derived, it returns `null`.
- `weight`: An operation-agnostic representation of the bulk size, denoted by `unit`.
- `unit`: The unit used to interpret `weight`.
- `success`: A Boolean indicating whether the `bulk` request succeeded.
- `success-count`: The number of successfully processed bulk items for the request. This value is determined when there are errors or when the `bulk-size` has been specified in the documents.
- `error-count`: The number of failed bulk items for the request.
- `took`: The value of the `took` property in the bulk response.
If `detailed-results` is `true`, the following metadata is returned:
- `ops`: A nested document with the operation name as its key, such as `index`, `update`, or `delete`, and various counts as values. `item-count` contains the total number of items for this key. Additionally, OpenSearch Benchmark returns a separate counter for each result, for example, a result for the number of created items or the number of deleted items.
- `shards_histogram`: An array of hashes, each of which has two keys. The `item-count` key contains the number of items to which a shard distribution applies. The `shards` key contains a hash with the actual distribution of `total`, `successful`, and `failed` shards.
- `bulk-request-size-bytes`: The total size of the bulk request body, in bytes.
- `total-document-size-bytes`: The total size of all documents within the bulk request body, in bytes.
## create-index
The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:
- Creating all indexes specified in the workload's `indices` section
- Creating one specific index defined within the operation itself
### Usage
The following example creates all indexes defined in the `indices` section of the workload. It uses all of the index settings defined in the workload but overrides the number of shards:
```yml
{
"name": "create-all-indices",
"operation-type": "create-index",
"settings": {
"index.number_of_shards": 1
},
"request-params": {
"wait_for_active_shards": "true"
}
}
```
The following example creates a new index with all index settings specified in the operation body:
```yml
{
"name": "create-an-index",
"operation-type": "create-index",
"index": "people",
"body": {
"settings": {
"index.number_of_shards": 1
},
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
}
```
### Configuration options
Use the following options when creating all indexes from the `indices` section of a workload.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`settings` | No | Array | Specifies additional index settings to be merged with the index settings specified in the `indices` section of the workload.
`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state.
Use the following options when creating a single index in the operation.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`index` | Yes | String | The index name.
`body` | No | Request body | The request body for the Create Index API. For more information, see [Create Index API](/api-reference/index-apis/create-index/).
`request-params` | No | List of settings | Contains any request parameters allowed by the Create Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state.
### Metadata
The `create-index` operation returns the following metadata:
- `weight`: The number of indexes created by the operation.
- `unit`: Always `ops`, indicating the number of operations inside the workload.
- `success`: A Boolean indicating whether the operation has succeeded.
## delete-index
The `delete-index` operation runs the [Delete Index API](/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.
### Usage
The following example deletes all indexes found in the `indices` section of the workload:
```yml
{
"name": "delete-all-indices",
"operation-type": "delete-index"
}
```
The following example deletes all `logs-*` indexes:
```yml
{
"name": "delete-logs",
"operation-type": "delete-index",
"index": "logs-*",
"only-if-exists": false,
"request-params": {
"expand_wildcards": "all",
"allow_no_indices": "true",
"ignore_unavailable": "true"
}
}
```
### Configuration options
Use the following options when deleting all indexes indicated in the `indices` section of the workload.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`only-if-exists` | No | Boolean | Decides whether an existing index should be deleted. Default is `true`.
`request-params` | No | List of settings | Contains any request parameters allowed by the Delete Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state.
Use the following options if you want to delete one or more indexes based on the pattern indicated in the `index` option.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`index` | Yes | String | The index or indexes that you want to delete.
`only-if-exists` | No | Boolean | Decides whether an index should be deleted when the index exists. Default is `true`.
`request-params` | No | List of settings | Contains any request parameters allowed by the Delete Index API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state.
### Metadata
The `delete-index` operation returns the following metadata:
- `weight`: The number of indexes deleted by the operation.
- `unit`: Always `ops`, indicating the number of operations inside the workload.
- `success`: A Boolean indicating whether the operation has succeeded.
## cluster-health
The `cluster-health` operation runs the [Cluster Health API](/api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.
### Usage
The following example creates a `cluster-health` operation that checks for a `green` health status on any `logs-*` indexes:
```yml
{
"name": "check-cluster-green",
"operation-type": "cluster-health",
"index": "logs-*",
"request-params": {
"wait_for_status": "green",
"wait_for_no_relocating_shards": "true"
},
"retry-until-success": true
}
```
### Configuration options
Use the following options with the `cluster-health` operation.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`index` | Yes | String | The index or indexes you want to assess.
`request-params` | No | List of settings | Contains any request parameters allowed by the Cluster Health API. OpenSearch Benchmark does not attempt to serialize the parameters and passes them in their current state.
### Metadata
The `cluster-health` operation returns the following metadata:
- `weight`: The number of indexes the `cluster-health` operation assesses. Always `1`, since the operation runs once per index.
- `unit`: Always `ops`, indicating the number of operations inside the workload.
- `success`: A Boolean indicating whether the operation has succeeded.
- `cluster-status`: The current cluster status.
- `relocating-shards`: The number of shards currently relocating to a different node.
## refresh
The `refresh` operation runs the Refresh API. The operation returns no metadata.
### Usage
The following example refreshes all `logs-*` indexes:
```yml
{
"name": "refresh",
"operation-type": "refresh",
"index": "logs-*"
}
```
### Configuration options
The `refresh` operation uses the following options.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`index` | No | String | The names of the indexes or data streams to refresh.
## search
The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.
### Usage
The following example runs a `match_all` query inside the `search` operation:
```yml
{
"name": "default",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
},
"request-params": {
"_source_include": "some_field",
"analyze_wildcard": "false"
}
}
```
### Configuration options
The `search` operation uses the following options.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`index` | No | String | The indexes or data streams targeted by the query. This option is needed only when the `indices` section contains two or more indexes. Otherwise, OpenSearch Benchmark automatically derives the index or data stream to use. Specify `"index": "_all"` to query against all indexes in the workload.
`cache` | No | Boolean | Specifies whether to use the query request cache. OpenSearch Benchmark defines no value. The default depends on the benchmark candidate settings and the OpenSearch version.
`request-params` | No | List of settings | Contains any request parameters allowed by the Search API.
`body` | Yes | Request body | Indicates which query and query parameters to use.
`detailed-results` | No | Boolean | Records more detailed metadata about queries. When set to `true`, additional overhead may be incurred, which can skew measurement results. This option does not work with `scroll` queries.
`results-per-page` | No | Integer | Specifies the number of documents to retrieve per page. This maps to the Search API `size` parameter and can be used for scroll and non-scroll searches. Default is `10`.
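
As a sketch of how these options fit together, the following `search` operation queries all indexes in the workload, disables the request cache, retrieves 100 documents per page, and records detailed results. The operation name and values are hypothetical:

```yml
{
  "name": "paged-match-all",
  "operation-type": "search",
  "index": "_all",
  "cache": false,
  "body": {
    "query": {
      "match_all": {}
    }
  },
  "results-per-page": 100,
  "detailed-results": true
}
```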
### Metadata
The following metadata is always returned:
- `weight`: The “weight” of an operation. Always `1` for regular queries and the number of retrieved pages for scroll queries.
- `unit`: The unit used to interpret weight, which is `ops` for regular queries and `pages` for scroll queries.
- `success`: A Boolean indicating whether the query has succeeded.
If `detailed-results` is set to `true`, the following metadata is also returned:
- `hits`: The total number of hits for the query.
- `hits_relation`: Whether the number of hits is accurate (`eq`) or a lower bound of the actual hit count (`gte`).
- `timed_out`: Whether the query has timed out. For scroll queries, this flag is `true` if the flag was `true` for any of the queries issued.
- `took`: The value of the `took` property in the query response. For scroll queries, the value is the sum of all `took` values in all query responses.


@ -0,0 +1,191 @@
---
layout: default
title: test_procedures
parent: Workload reference
grand_parent: OpenSearch Benchmark Reference
nav_order: 110
---
# test_procedures
If your workload only defines one benchmarking scenario, specify the schedule at the top level. Use the `test-procedures` element to specify additional properties, such as a name or description. A test procedure is like a benchmarking scenario. If you have multiple test procedures, you can define a variety of benchmarking scenarios.
The following table lists test procedures for the benchmarking scenarios in this dataset. A test procedure can reference all operations that are defined in the operations section.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`name` | Yes | String | The name of the test procedure. When naming the test procedure, do not use spaces; this ensures that the name can be easily entered on the command line.
`description` | No | String | Describes the test procedure in a human-readable format.
`user-info` | No | String | Outputs a message at the start of the test to notify you about important test-related information, for example, deprecations.
`default` | No | Boolean | When set to `true`, selects the default test procedure if you did not specify a test procedure on the command line. If the workload only defines one test procedure, it is implicitly selected as the default. Otherwise, you must define `"default": true` on exactly one test procedure.
[`schedule`](#schedule) | Yes | Array | Defines the order in which workload tasks are run.
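
The following sketch shows how two test procedures might be defined with the parameters above. The top-level key is written as `test_procedures` to match this page's title, which is an assumption because the text above also spells the element `test-procedures`. The procedure names and descriptions are hypothetical, and the `index-append` and `match-all-query` operations are assumed to be defined in the workload's `operations` element:

```yml
{
  "test_procedures": [
    {
      "name": "append-only",
      "description": "Indexes all documents without simulating ID conflicts",
      "default": true,
      "schedule": [
        {
          "operation": "index-append",
          "clients": 2
        }
      ]
    },
    {
      "name": "search-only",
      "description": "Runs a match_all query against existing indexes",
      "schedule": [
        {
          "operation": "match-all-query",
          "clients": 4,
          "iterations": 1000
        }
      ]
    }
  ]
}
```

Because more than one test procedure is defined, exactly one of them sets `"default": true`.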
## schedule
The `schedule` element contains a list of tasks, which are operations supported by OpenSearch Benchmark, that are run by the workload during the benchmark test.
### Usage
The `schedule` element defines tasks using the methods described in this section.
#### Using the operations element
The following example defines a `force-merge` task and a `match-all-query` task using the `operations` element. The `force-merge` operation does not use any parameters, so only the `name` and `operation-type` are needed. The `match-all-query` operation requires a query `body` and an `operation-type`.
Operations defined in the `operations` element can be reused in the schedule more than once:
```yml
{
"operations": [
{
"name": "force-merge",
"operation-type": "force-merge"
},
{
"name": "match-all-query",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
}
],
"schedule": [
{
"operation": "force-merge",
"clients": 1
},
{
"operation": "match-all-query",
"clients": 4,
"warmup-iterations": 1000,
"iterations": 1000,
"target-throughput": 100
}
]
}
```
#### Defining operations inline
If you don't want to reuse an operation in the schedule, you can define operations inside the `schedule` element, as shown in the following example:
```yml
{
"schedule": [
{
"operation": {
"name": "force-merge",
"operation-type": "force-merge"
},
"clients": 1
},
{
"operation": {
"name": "match-all-query",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
"clients": 4,
"warmup-iterations": 1000,
"iterations": 1000,
"target-throughput": 100
}
]
}
```
### Task options
Each task contains the following options.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`operation` | Yes | List | Either refers to the name of an operation, defined in the `operations` element, or includes the entire operation inline.
`name` | No | String | Specifies a unique name for the task when multiple tasks use the same operation.
`tags` | No | String | Unique identifiers that can be used to filter tasks.
`clients` | No | Integer | Specifies the number of clients that will run the task concurrently. Default is `1`.
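
As an illustrative sketch, the following schedule runs the same operation in two tasks, so each task is given a unique `name`. The task names and tag values are hypothetical, and `match-all-query` is assumed to be defined in the `operations` element:

```yml
{
  "schedule": [
    {
      "name": "match-all-single-client",
      "operation": "match-all-query",
      "tags": "baseline",
      "clients": 1
    },
    {
      "name": "match-all-four-clients",
      "operation": "match-all-query",
      "tags": "concurrent",
      "clients": 4
    }
  ]
}
```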
### Target options
OpenSearch Benchmark supports the following target options when running a task.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`target-throughput` | No | Integer | Defines the benchmark mode. When not defined, OpenSearch Benchmark assumes that it is a throughput benchmark and runs the task as fast as possible. This is useful for batch operations, where achieving better throughput is preferred over better latency. When defined, the target specifies the number of requests per second across all clients. For example, if you specify `target-throughput: 1000` with 8 clients, each client issues 125 (= 1000 / 8) requests per second.
`target-interval` | No | Interval | Defines an interval of 1 divided by the `target-throughput` (in seconds) when the `target-throughput` is less than 1 operation per second. Define either `target-throughput` or `target-interval` but not both, otherwise OpenSearch Benchmark raises an error.
`ignore-response-error-level` | No | Boolean | Controls whether to ignore errors encountered during the task when a benchmark is run with the `on-error=abort` command flag.
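
The following sketch shows both ways of setting a target rate. The first task targets 1,000 requests per second across 8 clients, which works out to 125 requests per second for each client. The second task uses `target-interval` to issue one request every 30 seconds, which corresponds to a throughput of 1/30 operations per second. The task name and values are hypothetical, and `match-all-query` is assumed to be defined in the `operations` element:

```yml
{
  "schedule": [
    {
      "operation": "match-all-query",
      "clients": 8,
      "target-throughput": 1000
    },
    {
      "name": "slow-match-all",
      "operation": "match-all-query",
      "clients": 1,
      "iterations": 10,
      "target-interval": 30
    }
  ]
}
```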
### Iteration-based options
Iteration-based options determine the number of times that an operation should run. They can also define the number of iterative runs when tasks are run in [parallel](#parallel-tasks). To configure an iteration-based schedule, use the following options.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`iterations` | No | Integer | Specifies the number of times that a client should execute an operation. All iterations are included in the measured results. Default is `1`.
`warmup-iterations` | No | Integer | Specifies the number of times that a client should execute an operation in order to warm up the benchmark candidate. The `warmup-iterations` do not appear in the measurement results. Default is `0`.
### Parallel tasks
The `parallel` element concurrently runs tasks wrapped inside the element.
When running tasks in parallel, each task requires the `clients` option in order to ensure that clients inside your benchmark are reserved for that task. If the `clients` option is instead specified on the `parallel` element itself rather than on an individual task, the benchmark uses that number of clients for all tasks.
#### Usage
In the following example, `parallel-task-1` and `parallel-task-2` execute a `bulk` operation concurrently:
```yml
{
"name": "parallel-any",
"description": "Workload completed-by property",
"schedule": [
{
"parallel": {
"tasks": [
{
"name": "parallel-task-1",
"operation": {
"operation-type": "bulk",
"bulk-size": 1000
},
"clients": 8
},
{
"name": "parallel-task-2",
"operation": {
"operation-type": "bulk",
"bulk-size": 500
},
"clients": 8
}
]
}
}
]
}
```
#### Options
The `parallel` element supports all `schedule` parameters, in addition to the following options.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`tasks` | Yes | Array | Defines a list of tasks that should be executed concurrently.
`completed-by` | No | String | Allows you to define the name of one task in the task list or the value `any`. If `completed-by` is set to the name of one task in the list, the `parallel-task` structure is considered to be complete once that specific task has been completed. If `completed-by` is set to `any`, the `parallel-task` structure is considered to be complete when any one of the tasks in the list has been completed. If `completed-by` is not explicitly defined, the `parallel-task` structure is considered to be complete as soon as all of the tasks in the list have been completed.
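
As a sketch of the `completed-by` option, the following schedule considers the `parallel` element complete as soon as `parallel-task-1` finishes, even if `parallel-task-2` is still running. Placing `completed-by` next to `tasks` inside the `parallel` element is an assumption based on the options listed above, and the task names are illustrative:

```yml
{
  "schedule": [
    {
      "parallel": {
        "completed-by": "parallel-task-1",
        "tasks": [
          {
            "name": "parallel-task-1",
            "operation": {
              "operation-type": "bulk",
              "bulk-size": 1000
            },
            "clients": 8
          },
          {
            "name": "parallel-task-2",
            "operation": {
              "operation-type": "bulk",
              "bulk-size": 500
            },
            "clients": 8
          }
        ]
      }
    }
  ]
}
```

Setting `"completed-by": "any"` instead would end the `parallel` element when either task finishes first.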
### Time-based options
Time-based options determine the duration of time, in seconds, for which operations should run. This is ideal for batch-style operations, which may require an additional warmup period.
To configure a time-based schedule, use the following options.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`time-period` | No | Integer | Specifies the time period, in seconds, that OpenSearch Benchmark considers for measurement. This is not required for bulk indexing because OpenSearch Benchmark bulk indexes all documents and naturally measures all samples after the specified `warmup-time-period`.
`ramp-up-time-period` | No | Integer | Specifies the time period, in seconds, during which OpenSearch Benchmark gradually adds clients and reaches the total number of clients specified for the operation.
`warmup-time-period` | No | Integer | Specifies the amount of time, in seconds, to warm up the benchmark candidate. None of the response data captured during the warmup period appears in the measurement results.
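
The following sketch shows a time-based task. Clients are added gradually over the first 60 seconds, samples from the first 120 seconds are treated as warmup and excluded from the results, and the following 300 seconds are measured. All values are hypothetical, and `match-all-query` is assumed to be defined in the `operations` element:

```yml
{
  "schedule": [
    {
      "operation": "match-all-query",
      "clients": 4,
      "ramp-up-time-period": 60,
      "warmup-time-period": 120,
      "time-period": 300,
      "target-throughput": 100
    }
  ]
}
```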