Add additional technical feedback to workloads (#4879)

* Add additional technical feedback to workloads

TO DO: Add Running Tasks in Parallel section.

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Fix indices table.

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Fix corpora page

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _benchmark/workloads/corpora.md

Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
This commit is contained in:
Naarcha-AWS 2023-08-28 18:09:04 -05:00 committed by GitHub
parent f04a887a60
commit dc21de0f80
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 23 additions and 19 deletions

View File

@ -5,6 +5,8 @@ parent: Workload reference
nav_order: 70
---
# corpora
The `corpora` element contains all the document corpora used by the workload. You can use document corpora across workloads by copying and pasting any corpora definitions.
## Example
@ -32,23 +34,23 @@ Use the following options with `corpora`.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
| `name` | Yes | String | The name of the document corpus. Because OpenSearch Benchmark uses this name in its directories, use only lowercase names without white spaces. |
| `documents` | Yes | JSON array | An array of document files. |
| `meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus. |
`name` | Yes | String | The name of the document corpus. Because OpenSearch Benchmark uses this name in its directories, use only lowercase names without white spaces.
`documents` | Yes | JSON array | An array of document files.
`meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus.
Each entry in the `documents` array consists of the following options.
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
| `source-file` | Yes | String | The file name containing the corresponding documents for the workload. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name. |
| `document-count` | Yes | Integer | The number of documents in the `source-file`, which determines which client indices correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents. |
| `base-url` | No | String | An http(s), Amazon Simple Storage Service (Amazon S3), or Google Cloud Storage URL that points to the root path where OpenSearch Benchmark can obtain the corresponding source file. |
| `source-format` | No | String | Defines the format OpenSearch Benchmark uses to interpret the data file specified in `source-file`. Only `bulk` is supported. |
| `compressed-bytes` | No | Integer | The size, in bytes, of the compressed source file, indicating how much data OpenSearch Benchmark downloads. |
| `uncompressed-bytes` | No | Integer | The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs. |
| `target-index` | No | String | Defines the name of the index that the `bulk` operation should target. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element. The value of `target-index` is ignored when the `includes-action-and-meta-data` setting is `true`. |
| `target-type` | No | String | Defines the document type of the target index targeted in bulk operations. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element and the index has only one type. The value of `target-type` is ignored when the `includes-action-and-meta-data` setting is `true`. |
| `includes-action-and-meta-data` | No | Boolean | When set to `true`, indicates that the document's file already contains an `action` line and a `meta-data` line. When `false`, indicates that the document's file contains only documents. Default is `false`. |
| `meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus. |
`source-file` | Yes | String | The file name containing the corresponding documents for the workload. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name.
`document-count` | Yes | Integer | The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents.
`base-url` | No | String | An http(s), Amazon Simple Storage Service (Amazon S3), or Google Cloud Storage URL that points to the root path where OpenSearch Benchmark can obtain the corresponding source file.
`source-format` | No | String | Defines the format OpenSearch Benchmark uses to interpret the data file specified in `source-file`. Only `bulk` is supported.
`compressed-bytes` | No | Integer | The size, in bytes, of the compressed source file, indicating how much data OpenSearch Benchmark downloads.
`uncompressed-bytes` | No | Integer | The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
`target-index` | No | String | Defines the name of the index that the `bulk` operation should target. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element. The value of `target-index` is ignored when the `includes-action-and-meta-data` setting is `true`.
`target-type` | No | String | Defines the document type of the target index targeted in bulk operations. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element and the index has only one type. The value of `target-type` is ignored when the `includes-action-and-meta-data` setting is `true`.
`includes-action-and-meta-data` | No | Boolean | When set to `true`, indicates that the document's file already contains an `action` line and a `meta-data` line. When `false`, indicates that the document's file contains only documents. Default is `false`.
`meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus.

View File

@ -152,10 +152,10 @@ According to this schedule, the actions will run in the following order:
2. The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the status of the cluster's health is `green`.
- The `bulk` operation runs the `bulk` API to index `5000` documents simultaneously.
- Before benchmarking, the workload waits until the specified `warmup-time-period` passes. In this example, the warmup period is `120` seconds.
5. The `clients` option defines the number of clients that will run the remaining actions in the schedule concurrently.
5. The `clients` field defines the number of clients that will run the remaining actions in the schedule concurrently.
6. The `search` runs a `match_all` query to match all documents after they have been indexed by the `bulk` API using the 8 clients specified.
- The `iterations` option indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
- Lastly, the `target-throughput` option defines the number of requests per second each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12 requests per second.
- The `iterations` field indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
- Lastly, the `target-throughput` field defines the number of requests per second each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12 requests per second.
## More workload examples
@ -247,4 +247,4 @@ The following workload runs a benchmark with a single task: a `match_all` query.
## Next steps
- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.
- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.

View File

@ -5,6 +5,8 @@ parent: Workload reference
nav_order: 65
---
# indices
The `indices` element contains a list of all indices used in the workload.
## Example
@ -24,5 +26,5 @@ Use the following options with `indices`:
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
| `name` | Yes | String | The name of the index template. |
| `body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API. |
`name` | Yes | String | The name of the index template.
`body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API.