Add Running a Workload (#6287)

* Add Running a Workload draft

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update running-workloads.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _benchmark/user-guide/running-workloads.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Fix link

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add additional missing link

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add running workloads

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Add numbered steps

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
This commit is contained in:
Naarcha-AWS 2024-02-13 14:28:04 -06:00 committed by GitHub
parent c87bd64a57
commit de38df1048
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 169 additions and 1 deletions

View File

@ -11,7 +11,7 @@ Before using OpenSearch Benchmark, familiarize yourself with the following conce
## Core concepts and definitions
- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For more information about the elements of a workload, see [Anatomy of a workload](({{site.url}}{{site.baseurl}}/benchmark/understanding-workloads/anatomy-of-a-workload/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).
- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For more information about the elements of a workload, see [Anatomy of a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).
- **Pipeline**: A series of steps occurring before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
- `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.

View File

@ -0,0 +1,168 @@
---
layout: default
title: Running a workload
nav_order: 9
parent: User guide
---
# Running a workload
Once you have a complete understanding of the various components of an OpenSearch Benchmark [workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload/), you can run your first workload.
## Step 1: Find the workload name
To learn more about the standard workloads included with OpenSearch Benchmark, use the following command:
```
opensearch-benchmark list workloads
```
{% include copy.html %}
A list of all workloads supported by OpenSearch Benchmark appears. Review the list and select the workload that's most similar to your cluster's use case.
## Step 2: Running the test
After you've selected the workload, you can invoke the workload using the `opensearch-benchmark execute-test` command. Replace `--target-host` with the `host:port` pairs for your cluster and `--client-options` with any authorization options required to access the cluster. The following example runs the `nyc_taxis` workload on a localhost for testing purposes.
If you want to run a test on an external cluster, see [Running the workload on your own cluster](#running-a-workload-on-an-external-cluster).
```bash
opensearch-benchmark execute-test --pipeline=benchmark-only --workload=nyc_taxis --target-host=https://localhost:9200 --client-options=basic_auth_user:admin,basic_auth_password:admin,verify_certs:false
```
{% include copy.html %}
Results from the test appear in the directory set by the `--output-path` option in the `execute-test` command.
### Test mode
If you want to run the test in test mode to make sure that your workload operates as intended, add the `--test-mode` option to the `execute-test` command. Test mode ingests only the first 1,000 documents from each index provided and runs query operations against them.
## Step 3: Validate the test
After running an OpenSearch Benchmark test, take the following steps to verify that it has run properly:
1. Note the number of documents in the OpenSearch or OpenSearch Dashboards index that you plan to run the benchmark against.
2. In the results returned by OpenSearch Benchmark, compare the `workload.json` file for your specific workload and verify that the document count matches the number of documents. For example, based on the [nyc_taxis](https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/nyc_taxis/workload.json#L20) `workload.json` file, you should expect to see `165346692` documents in your cluster.
## Expected results
OSB returns the following response once the benchmark completes:
```bash
------------------------------------------------------
_______ __ _____
/ ____(_)___ ____ _/ / / ___/_________ ________
/ /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \
/ __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/
/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/
------------------------------------------------------
| Metric | Task | Value | Unit |
|---------------------------------------------------------------:|-------------------------------------------:|------------:|-------:|
| Cumulative indexing time of primary shards | | 0.02655 | min |
| Min cumulative indexing time across primary shards | | 0 | min |
| Median cumulative indexing time across primary shards | | 0.00176667 | min |
| Max cumulative indexing time across primary shards | | 0.0140333 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0.0102333 | min |
| Cumulative merge count of primary shards | | 3 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0 | min |
| Max cumulative merge time across primary shards | | 0.0102333 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0.0709333 | min |
| Cumulative refresh count of primary shards | | 118 | |
| Min cumulative refresh time across primary shards | | 0 | min |
| Median cumulative refresh time across primary shards | | 0.00186667 | min |
| Max cumulative refresh time across primary shards | | 0.0511667 | min |
| Cumulative flush time of primary shards | | 0.00963333 | min |
| Cumulative flush count of primary shards | | 4 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0.00398333 | min |
| Total Young Gen GC time | | 0 | s |
| Total Young Gen GC count | | 0 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.000485923 | GB |
| Translog size | | 2.01873e-05 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 32 | |
| Min Throughput | index | 3008.97 | docs/s |
| Mean Throughput | index | 3008.97 | docs/s |
| Median Throughput | index | 3008.97 | docs/s |
| Max Throughput | index | 3008.97 | docs/s |
| 50th percentile latency | index | 351.059 | ms |
| 100th percentile latency | index | 365.058 | ms |
| 50th percentile service time | index | 351.059 | ms |
| 100th percentile service time | index | 365.058 | ms |
| error rate | index | 0 | % |
| Min Throughput | wait-until-merges-finish | 28.41 | ops/s |
| Mean Throughput | wait-until-merges-finish | 28.41 | ops/s |
| Median Throughput | wait-until-merges-finish | 28.41 | ops/s |
| Max Throughput | wait-until-merges-finish | 28.41 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 34.7088 | ms |
| 100th percentile service time | wait-until-merges-finish | 34.7088 | ms |
| error rate | wait-until-merges-finish | 0 | % |
| Min Throughput | percolator_with_content_president_bush | 36.09 | ops/s |
| Mean Throughput | percolator_with_content_president_bush | 36.09 | ops/s |
| Median Throughput | percolator_with_content_president_bush | 36.09 | ops/s |
| Max Throughput | percolator_with_content_president_bush | 36.09 | ops/s |
| 100th percentile latency | percolator_with_content_president_bush | 35.9822 | ms |
| 100th percentile service time | percolator_with_content_president_bush | 7.93048 | ms |
| error rate | percolator_with_content_president_bush | 0 | % |
[...]
| Min Throughput | percolator_with_content_ignore_me | 16.1 | ops/s |
| Mean Throughput | percolator_with_content_ignore_me | 16.1 | ops/s |
| Median Throughput | percolator_with_content_ignore_me | 16.1 | ops/s |
| Max Throughput | percolator_with_content_ignore_me | 16.1 | ops/s |
| 100th percentile latency | percolator_with_content_ignore_me | 131.798 | ms |
| 100th percentile service time | percolator_with_content_ignore_me | 69.5237 | ms |
| error rate | percolator_with_content_ignore_me | 0 | % |
| Min Throughput | percolator_no_score_with_content_ignore_me | 29.37 | ops/s |
| Mean Throughput | percolator_no_score_with_content_ignore_me | 29.37 | ops/s |
| Median Throughput | percolator_no_score_with_content_ignore_me | 29.37 | ops/s |
| Max Throughput | percolator_no_score_with_content_ignore_me | 29.37 | ops/s |
| 100th percentile latency | percolator_no_score_with_content_ignore_me | 45.5703 | ms |
| 100th percentile service time | percolator_no_score_with_content_ignore_me | 11.316 | ms |
| error rate | percolator_no_score_with_content_ignore_me | 0 | % |
--------------------------------
[INFO] SUCCESS (took 18 seconds)
--------------------------------
```
## Running a workload on an external cluster
Now that you're familiar with running OpenSearch Benchmark on a local cluster, you can run it on your external cluster, as described in the following steps:
1. Replace `https://localhost:9200` with your target cluster endpoint. This could be a Uniform Resource Identifier (URI), such as `https://search.mydomain.com`, or a `HOST:PORT` specification.
2. If the cluster is configured with basic authentication, replace the username and password in the command line with the appropriate credentials.
3. Remove the `verify_certs:false` directive if you are not specifying `localhost` as your target cluster. This directive is necessary solely for clusters without SSL certificates.
4. If you are using a `HOST:PORT`specification and plan to use SSL or TLS, either specify `https://` or add the `use_ssl:true` directive to the `--client-options` string option.
5. Remove the `--test-mode` flag to run the full workload rather than an abbreviated test.
You can copy the following command template to use it in your own terminal:
```bash
opensearch-benchmark execute-test --pipeline=benchmark-only --workload=nyc_taxis --target-host=<OpenSearch Cluster Endpoint> --client-options=basic_auth_user:admin,basic_auth_password:admin
```
{% include copy.html %}