From dc21de0f80ed94b0f48b9b14f5edb53ad76489fd Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Mon, 28 Aug 2023 18:09:04 -0500
Subject: [PATCH] Add additional technical feedback to workloads (#4879)

* Add additional technical feedback to workloads

TO DO: Add Running Tasks in Parallel section.

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Fix indices table.

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Fix corpora page

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _benchmark/workloads/corpora.md

Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
---
 _benchmark/workloads/corpora.md | 28 +++++++++++++++-------------
 _benchmark/workloads/index.md   |  8 ++++----
 _benchmark/workloads/indices.md |  6 ++++--
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/_benchmark/workloads/corpora.md b/_benchmark/workloads/corpora.md
index 61ecb715..930baa5c 100644
--- a/_benchmark/workloads/corpora.md
+++ b/_benchmark/workloads/corpora.md
@@ -5,6 +5,8 @@ parent: Workload reference
 nav_order: 70
 ---
 
+# corpora
+
 The `corpora` element contains all the document corpora used by the workload. You can use document corpora across workloads by copying and pasting any corpora definitions.
 
 ## Example
@@ -32,23 +34,23 @@ Use the following options with `corpora`.
 
 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
-| `name` | Yes | String | The name of the document corpus. Because OpenSearch Benchmark uses this name in its directories, use only lowercase names without white spaces. |
-| `documents` | Yes | JSON array | An array of document files. |
-| `meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus. |
+`name` | Yes | String | The name of the document corpus. Because OpenSearch Benchmark uses this name in its directories, use only lowercase names without white spaces.
+`documents` | Yes | JSON array | An array of document files.
+`meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus.
 
 Each entry in the `documents` array consists of the following options.
 
 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
-| `source-file` | Yes | String | The file name containing the corresponding documents for the workload. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name. |
-| `document-count` | Yes | Integer | The number of documents in the `source-file`, which determines which client indices correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents. |
-| `base-url` | No | String | An http(s), Amazon Simple Storage Service (Amazon S3), or Google Cloud Storage URL that points to the root path where OpenSearch Benchmark can obtain the corresponding source file. |
-| `source-format` | No | String | Defines the format OpenSearch Benchmark uses to interpret the data file specified in `source-file`. Only `bulk` is supported. |
-| `compressed-bytes` | No | Integer | The size, in bytes, of the compressed source file, indicating how much data OpenSearch Benchmark downloads. |
-| `uncompressed-bytes` | No | Integer | The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs. |
-| `target-index` | No | String | Defines the name of the index that the `bulk` operation should target. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element. The value of `target-index` is ignored when the `includes-action-and-meta-data` setting is `true`. |
-| `target-type` | No | String | Defines the document type of the target index targeted in bulk operations. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element and the index has only one type. The value of `target-type` is ignored when the `includes-action-and-meta-data` setting is `true`. |
-| `includes-action-and-meta-data` | No | Boolean | When set to `true`, indicates that the document's file already contains an `action` line and a `meta-data` line. When `false`, indicates that the document's file contains only documents. Default is `false`. |
-| `meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus. |
+`source-file` | Yes | String | The file name containing the corresponding documents for the workload. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name.
+`document-count` | Yes | Integer | The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents.
+`base-url` | No | String | An http(s), Amazon Simple Storage Service (Amazon S3), or Google Cloud Storage URL that points to the root path where OpenSearch Benchmark can obtain the corresponding source file.
+`source-format` | No | String | Defines the format OpenSearch Benchmark uses to interpret the data file specified in `source-file`. Only `bulk` is supported.
+`compressed-bytes` | No | Integer | The size, in bytes, of the compressed source file, indicating how much data OpenSearch Benchmark downloads.
+`uncompressed-bytes` | No | Integer | The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
+`target-index` | No | String | Defines the name of the index that the `bulk` operation should target. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element. The value of `target-index` is ignored when the `includes-action-and-meta-data` setting is `true`.
+`target-type` | No | String | Defines the document type of the target index targeted in bulk operations. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element and the index has only one type. The value of `target-type` is ignored when the `includes-action-and-meta-data` setting is `true`.
+`includes-action-and-meta-data` | No | Boolean | When set to `true`, indicates that the document's file already contains an `action` line and a `meta-data` line. When `false`, indicates that the document's file contains only documents. Default is `false`.
+`meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus.
 
 
diff --git a/_benchmark/workloads/index.md b/_benchmark/workloads/index.md
index 3ce6ba83..771e9830 100644
--- a/_benchmark/workloads/index.md
+++ b/_benchmark/workloads/index.md
@@ -152,10 +152,10 @@ According to this schedule, the actions will run in the following order:
 2. The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the status of the cluster's health is `green`.
    - The `bulk` operation runs the `bulk` API to index `5000` documents simultaneously.
    - Before benchmarking, the workload waits until the specified `warmup-time-period` passes. In this example, the warmup period is `120` seconds.
-5. The `clients` option defines the number of clients that will run the remaining actions in the schedule concurrently.
+5. The `clients` field defines the number of clients that will run the remaining actions in the schedule concurrently.
 6. The `search` runs a `match_all` query to match all documents after they have been indexed by the `bulk` API using the 8 clients specified.
-   - The `iterations` option indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
-   - Lastly, the `target-throughput` option defines the number of requests per second each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12 requests per second.
+   - The `iterations` field indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
+   - Lastly, the `target-throughput` field defines the number of requests per second each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12 requests per second.
 
 ## More workload examples
 
@@ -247,4 +247,4 @@ The following workload runs a benchmark with a single task: a `match_all` query.
 ## Next steps
 
 - For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
-- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.
\ No newline at end of file
+- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.
diff --git a/_benchmark/workloads/indices.md b/_benchmark/workloads/indices.md
index 988401e3..1aae3e53 100644
--- a/_benchmark/workloads/indices.md
+++ b/_benchmark/workloads/indices.md
@@ -5,6 +5,8 @@ parent: Workload reference
 nav_order: 65
 ---
 
+# indices
+
 The `indices` element contains a list of all indices used in the workload.
 
 ## Example
@@ -24,5 +26,5 @@ Use the following options with `indices`:
 
 Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
-| `name` | Yes | String | The name of the index template. |
-| `body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API. |
+`name` | Yes | String | The name of the index template.
+`body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API.
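
Note on the arithmetic in the `index.md` and `corpora.md` hunks: the sketch below works through the two divisions the patched text describes. It is only an illustration and not OpenSearch Benchmark code. The function names `per_client_rate` and `client_slice` are hypothetical, and it assumes that `target-throughput` is shared evenly across clients (as the "100 requests divided by 8 clients" example implies) and that each client receives one contiguous, near-equal share of the corpus.

```python
def per_client_rate(target_throughput: float, clients: int) -> float:
    """Requests per second per client, assuming the target is split evenly."""
    if clients <= 0:
        raise ValueError("clients must be a positive integer")
    return target_throughput / clients


def client_slice(document_count: int, clients: int, client_index: int) -> range:
    """Document offsets for one client, assuming contiguous near-equal shares.

    This partitioning scheme is an assumption made for illustration only.
    """
    if clients <= 0:
        raise ValueError("clients must be a positive integer")
    if not 0 <= client_index < clients:
        raise ValueError("client_index must be in [0, clients)")
    # Integer arithmetic spreads any remainder across the clients.
    start = client_index * document_count // clients
    end = (client_index + 1) * document_count // clients
    return range(start, end)


if __name__ == "__main__":
    # Values taken from the hunks: 100 requests per second, 8 clients, 5000 documents.
    print(per_client_rate(100, 8))    # 12.5
    print(client_slice(5000, 8, 0))   # range(0, 625)
```

Under these assumptions, 100 requests per second across 8 clients comes to 12.5 requests per second per client, so the "12 requests per second" in the context line reads as a rounded-down figure.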