251 lines
9.1 KiB
Markdown
251 lines
9.1 KiB
Markdown
---
|
|
layout: default
|
|
title: Workload reference
|
|
nav_order: 60
|
|
has_children: true
|
|
---
|
|
|
|
# OpenSearch Benchmark workload reference
|
|
|
|
A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following:
|
|
|
|
- One or more data streams that are ingested into indices
|
|
- A set of queries and operations that are invoked as part of the benchmark
|
|
|
|
## Anatomy of a workload
|
|
|
|
The following example workload shows all of the essential elements needed to create a workload.json file. You can run this workload in your own benchmark configuration in order to understand how all of the elements work together:
|
|
|
|
```json
|
|
{
|
|
"description": "Tutorial benchmark for OpenSearch Benchmark",
|
|
"indices": [
|
|
{
|
|
"name": "movies",
|
|
"body": "index.json"
|
|
}
|
|
],
|
|
"corpora": [
|
|
{
|
|
"name": "movies",
|
|
"documents": [
|
|
{
|
|
"source-file": "movies-documents.json",
|
|
"document-count": 11658903, # Fetch document count from command line
|
|
"uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
|
|
}
|
|
]
|
|
}
|
|
],
|
|
"schedule": [
|
|
{
|
|
"operation": {
|
|
"operation-type": "create-index"
|
|
}
|
|
},
|
|
{
|
|
"operation": {
|
|
"operation-type": "cluster-health",
|
|
"request-params": {
|
|
"wait_for_status": "green"
|
|
},
|
|
"retry-until-success": true
|
|
}
|
|
},
|
|
{
|
|
"operation": {
|
|
"operation-type": "bulk",
|
|
"bulk-size": 5000
|
|
},
|
|
"warmup-time-period": 120,
|
|
"clients": 8
|
|
},
|
|
{
|
|
"operation": {
|
|
"name": "query-match-all",
|
|
"operation-type": "search",
|
|
"body": {
|
|
"query": {
|
|
"match_all": {}
|
|
}
|
|
}
|
|
},
|
|
"iterations": 1000,
|
|
"target-throughput": 100
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
A workload usually consists of the following elements:
|
|
|
|
- [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/): Defines the relevant indices and index templates used for the workload.
|
|
- [corpora]({{site.url}}{{site.baseurl}}/benchmark/workloads/corpora/): Defines all document corpora used for the workload.
|
|
- `schedule`: Defines operations and in what order the operations run in-line. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations.
|
|
- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized.
|
|
|
|
### Indices
|
|
|
|
To create an index, specify its `name`. To add definitions to your index, use the `body` option and point it to the JSON file containing the index definitions. For more information, see [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/). For more information, see [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/).
|
|
|
|
### Corpora
|
|
|
|
The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters:
|
|
|
|
- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name.
|
|
- `document-count`: The number of documents in the `source-file`, which determines which client indices correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents.
|
|
- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
|
|
- `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.
|
|
|
|
### Operations
|
|
|
|
The `operations` element lists the OpenSearch API operations performed by the workload. For example, you can set an operation to `create-index`, which creates an index in the test cluster that OpenSearch Benchmark can write documents into. Operations are usually listed inside of `schedule`.
|
|
|
|
### Schedule
|
|
|
|
The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`:
|
|
|
|
```json
|
|
"schedule": [
|
|
{
|
|
"operation": {
|
|
"operation-type": "create-index"
|
|
}
|
|
},
|
|
{
|
|
"operation": {
|
|
"operation-type": "cluster-health",
|
|
"request-params": {
|
|
"wait_for_status": "green"
|
|
},
|
|
"retry-until-success": true
|
|
}
|
|
},
|
|
{
|
|
"operation": {
|
|
"operation-type": "bulk",
|
|
"bulk-size": 5000
|
|
},
|
|
"warmup-time-period": 120,
|
|
"clients": 8
|
|
},
|
|
{
|
|
"operation": {
|
|
"name": "query-match-all",
|
|
"operation-type": "search",
|
|
"body": {
|
|
"query": {
|
|
"match_all": {}
|
|
}
|
|
}
|
|
},
|
|
"iterations": 1000,
|
|
"target-throughput": 100
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
According to this schedule, the actions will run in the following order:
|
|
|
|
1. The `create-index` operation creates an index. The index remains empty until the `bulk` operation adds documents with benchmarked data.
|
|
2. The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the status of the cluster's health is `green`.
|
|
- The `bulk` operation runs the `bulk` API to index `5000` documents simultaneously.
|
|
- Before benchmarking, the workload waits until the specified `warmup-time-period` passes. In this example, the warmup period is `120` seconds.
|
|
5. The `clients` field defines the number of clients that will run the remaining actions in the schedule concurrently.
|
|
6. The `search` runs a `match_all` query to match all documents after they have been indexed by the `bulk` API using the 8 clients specified.
|
|
- The `iterations` field indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
|
|
- Lastly, the `target-throughput` field defines the number of requests per second each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12 requests per second.
|
|
|
|
|
|
## More workload examples
|
|
|
|
If you want to try certain workloads before creating your own, use the following examples.
|
|
|
|
### Running unthrottled
|
|
|
|
In the following example, OpenSearch Benchmark runs an unthrottled bulk index operation for 1 hour against the `movies` index:
|
|
|
|
```json
|
|
{
|
|
"description": "Tutorial benchmark for OpenSearch Benchmark",
|
|
"indices": [
|
|
{
|
|
"name": "movies",
|
|
"body": "index.json"
|
|
}
|
|
],
|
|
"corpora": [
|
|
{
|
|
"name": "movies",
|
|
"documents": [
|
|
{
|
|
"source-file": "movies-documents.json",
|
|
"document-count": 11658903, # Fetch document count from command line
|
|
"uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
|
|
}
|
|
]
|
|
}
|
|
],
|
|
"schedule": [
|
|
{
|
|
"operation": "bulk",
|
|
"warmup-time-period": 120,
|
|
"time-period": 3600,
|
|
"clients": 8
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### Workload with a single task
|
|
|
|
The following workload runs a benchmark with a single task: a `match_all` query. Because no `clients` are indicated, only one client is used. According to the `schedule`, the workload runs the `match_all` query at 10 operations per second with 1 client, uses 100 iterations to warm up, and uses the next 100 iterations to measure the benchmark:
|
|
|
|
```json
|
|
{
|
|
"description": "Tutorial benchmark for OpenSearch Benchmark",
|
|
"indices": [
|
|
{
|
|
"name": "movies",
|
|
"body": "index.json"
|
|
}
|
|
],
|
|
"corpora": [
|
|
{
|
|
"name": "movies",
|
|
"documents": [
|
|
{
|
|
"source-file": "movies-documents.json",
|
|
"document-count": 11658903, # Fetch document count from command line
|
|
"uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
|
|
}
|
|
]
|
|
}
|
|
],
|
|
{
|
|
"schedule": [
|
|
{
|
|
"operation": {
|
|
"operation-type": "search",
|
|
"index": "_all",
|
|
"body": {
|
|
"query": {
|
|
"match_all": {}
|
|
}
|
|
}
|
|
},
|
|
"warmup-iterations": 100,
|
|
"iterations": 100,
|
|
"target-throughput": 10
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
## Next steps
|
|
|
|
- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
|
|
- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.
|