[OSCI][DOCs] Replacing 'indices' terms for 'indexes' terms ONLY for description texts (#5353)

* Fixing documentation for Wildcard in term-level queries section for Query DSL

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* replacing 'indices' term for 'indexes' term ONLY for description texts (not variables, links or properties)

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* Update creating-custom-workloads.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* updating changes suggested by Naarcha-AWS

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* updating changes suggested by Naarcha-AWS

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>

* Rename _benchmark/workloads/index.md to _benchmark/workloads/reference/index.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Rename _benchmark/workloads/indices.md to _benchmark/workloads/reference/indices.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Samuel Valdes Gutierrez <valdesgutierrez@gmail.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Samuel Valdes Gutierrez 2023-10-31 15:18:45 +00:00 committed by GitHub
parent f9e0c02fdf
commit 7012af124a
16 changed files with 266 additions and 116 deletions


@ -53,7 +53,7 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe
Parameter | Type | Description
:--- | :--- | :---
local | Boolean | Whether to return information from the local node only instead of from the master node. Default is false.
expand_wildcards | Enum | Expands wildcard expressions to concrete indices. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`.
expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`.
## Response


@ -12,7 +12,7 @@ redirect_from:
**Introduced 1.0**
{: .label .label-purple }
The CAT indices operation lists information related to indexes, that is, how much disk space they are using, how many shards they have, their health status, and so on.
The CAT indexes operation lists information related to indexes, that is, how much disk space they are using, how many shards they have, their health status, and so on.
## Example
@ -44,7 +44,7 @@ GET _cat/indices
## URL parameters
All CAT indices URL parameters are optional.
All CAT indexes URL parameters are optional.
In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters:


@ -61,7 +61,7 @@ GET _count
```
{% include copy-curl.html %}
Alternatively, you could use the [cat indices]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-count/) APIs to see the number of documents per index or data stream.
Alternatively, you could use the [cat indexes]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-count/) APIs to see the number of documents per index or data stream.
{: .note }


@ -57,7 +57,7 @@ refresh | Enum | Whether to refresh the affected shards after performing the ind
require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
routing | String | Routes the request to the specified shard.
timeout | Time | How long to wait for the request to return. Default `1m`.
type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indices.
type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes.
wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed.
{% comment %}_source | List | asdf
_source_excludes | list | asdf


@ -39,16 +39,16 @@ All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
&lt;index&gt; | String | Name or list of the data streams, indices, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indices.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`.
&lt;index&gt; | String | Name or list of the data streams, indexes, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indexes.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`.
analyzer | String | The analyzer to use in the query string.
analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
conflicts | String | Indicates to OpenSearch what should happen if the delete by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.
default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
df | String | The default field in case a field prefix is not provided in the query string.
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`.
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
from | Integer | The starting index to search from. Default is 0.
ignore_unavailable | Boolean | Specifies whether to include missing or closed indices in the response. Default is false.
ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response. Default is false.
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
max_docs | Integer | How many documents the delete by query operation should process at most. Default is all documents.
preference | String | Specifies which shard or node OpenSearch should perform the delete by query operation on.
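
As a rough sketch of how these parameters combine in a request, the following Python snippet builds the path and query string for a `POST <index>/_delete_by_query` call. The helper name `delete_by_query_path` and the `my-logs*` index pattern are hypothetical and used only for illustration; the parameter names come from the table above.

```python
from urllib.parse import quote, urlencode


def delete_by_query_path(index, **params):
    """Build the request path for POST <index>/_delete_by_query.

    `index` may be a name, a comma-separated list, or a wildcard pattern.
    Hypothetical helper for illustration; it only assembles a string.
    """
    # Query-string Booleans are written in lowercase (true/false)
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    # Keep wildcards and commas readable instead of percent-encoding them
    path = f"/{quote(index, safe='*,')}/_delete_by_query"
    query = urlencode(normalized, safe=",")
    return f"{path}?{query}" if query else path


request_path = delete_by_query_path(
    "my-logs*",                      # hypothetical index pattern
    expand_wildcards="open,hidden",  # comma-separated values are allowed
    conflicts="proceed",
    allow_no_indices=True,
)
print(request_path)
```

Sending the request still requires an HTTP client and a request body containing the query, which this sketch leaves out.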


@ -81,14 +81,14 @@ POST _bulk
```
## List all indices
## List all indexes
```
GET _cat/indices?v&expand_wildcards=all
```
## Open or close all indices that match a pattern
## Open or close all indexes that match a pattern
```
POST my-logs*/_open
@ -96,7 +96,7 @@ POST my-logs*/_close
```
## Delete all indices that match a pattern
## Delete all indexes that match a pattern
```
DELETE my-logs*
@ -119,7 +119,7 @@ GET _cat/aliases?v
```
## Search an index or all indices that match a pattern
## Search an index or all indexes that match a pattern
```
GET my-logs/_search?q=test


@ -10,8 +10,17 @@ redirect_from: /benchmark/creating-custom-workloads/
OpenSearch Benchmark includes a set of [workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) that you can use to benchmark data from your cluster. Additionally, if you want to create a workload that is tailored to your own data, you can create a custom workload using one of the following options:
- [Creating custom workloads](#creating-custom-workloads)
- [Creating a workload from an existing cluster](#creating-a-workload-from-an-existing-cluster)
- [Prerequisites](#prerequisites)
- [Customizing the workload](#customizing-the-workload)
- [Creating a workload without an existing cluster](#creating-a-workload-without-an-existing-cluster)
- [Invoking your custom workload](#invoking-your-custom-workload)
- [Advanced options](#advanced-options)
- [Test mode](#test-mode)
- [Adding variance to test procedures](#adding-variance-to-test-procedures)
- [Separate operations and test procedures](#separate-operations-and-test-procedures)
- [Next steps](#next-steps)
## Creating a workload from an existing cluster
@ -33,7 +42,7 @@ opensearch-benchmark create-workload \
--workload="<WORKLOAD NAME>" \
--target-hosts="<CLUSTER ENDPOINT>" \
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \
--indices="<INDICES TO GENERATE WORKLOAD FROM>" \
--indices="<INDEXES TO GENERATE WORKLOAD FROM>" \
--output-path="<LOCAL DIRECTORY PATH TO STORE WORKLOAD>"
```
@ -230,7 +239,17 @@ To build a workload with source files, create a directory for your workload and
}
```
4. For all the workload files created, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing `--workload-path` with a path to your workload directory:
The corpora section refers to the source file created in step one, `movie-documents.json`, and provides the document count and the number of uncompressed bytes. Lastly, the schedule section defines a few operations the workload performs when invoked, including:
- Deleting any current index named `movies`.
- Creating an index named `movies` based on data from `movie-documents.json` and the mappings from `index.json`.
- Verifying that the cluster is in good health and can ingest the new index.
- Ingesting the data corpora from `workload.json` into the cluster.
- Querying the results.
For all the workload files created, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing `--workload-path` with a path to your workload directory:
```
opensearch-benchmark list workloads --workload-path=</path/to/workload/>
@ -367,9 +386,3 @@ If you want to make your `workload.json` file more readable, you can separate yo
- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
- To show a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.


@ -0,0 +1,109 @@
---
layout: default
title: Workload reference
nav_order: 60
has_children: true
---
# OpenSearch Benchmark workload reference
A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following:
- One or more data streams that are ingested into indexes
- A set of queries and operations that are invoked as part of the benchmark
This section provides a list of options and examples you can use when customizing or using a workload.
For more information about what comprises a workload, see [Anatomy of a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/concepts#anatomy-of-a-workload).
## Workload examples
If you want to try certain workloads before creating your own, use the following examples.
### Running unthrottled
In the following example, OpenSearch Benchmark runs an unthrottled bulk index operation for 1 hour against the `movies` index. The `document-count` and `uncompressed-bytes` values describe the source file; fetch both from the command line for your own data:
```json
{
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [
    {
      "name": "movies",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "source-file": "movies-documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": "bulk",
      "warmup-time-period": 120,
      "time-period": 3600,
      "clients": 8
    }
  ]
}
```
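
The `document-count` and `uncompressed-bytes` values in the corpora section can also be computed programmatically. The following Python sketch assumes the source file contains one JSON document per line; the `corpus_stats` helper and the temporary sample file are hypothetical and exist only for demonstration.

```python
import os
import tempfile


def corpus_stats(path):
    """Return (document_count, uncompressed_bytes) for a source file.

    Assumes newline-delimited JSON, that is, one document per line.
    """
    with open(path, "rb") as source:
        # Count non-empty lines; each one is treated as a document
        document_count = sum(1 for line in source if line.strip())
    return document_count, os.path.getsize(path)


# Demonstration on a small temporary file instead of a real corpus
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'{"title": "A"}\n{"title": "B"}\n')
    sample_path = tmp.name

sample_count, sample_bytes = corpus_stats(sample_path)
os.remove(sample_path)
print(sample_count, sample_bytes)
```

For a real workload, pass the path to your own source file and copy the two returned numbers into the corpora section.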
### Workload with a single task
The following workload runs a benchmark with a single task: a `match_all` query. Because no `clients` value is specified, only one client is used. According to the `schedule`, the workload runs the `match_all` query at 10 operations per second with 1 client, uses 100 iterations to warm up, and uses the next 100 iterations to measure the benchmark. As in any corpora section, fetch the `document-count` and `uncompressed-bytes` values for your own source file from the command line:
```json
{
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [
    {
      "name": "movies",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "source-file": "movies-documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "search",
        "index": "_all",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "warmup-iterations": 100,
      "iterations": 100,
      "target-throughput": 10
    }
  ]
}
```
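
To get a feel for how long such a throttled task runs, the following Python sketch estimates the warmup and measurement durations from the schedule values. It assumes that the cluster sustains the target throughput for the whole run; the `estimate_durations` helper is hypothetical and not part of OpenSearch Benchmark.

```python
def estimate_durations(task):
    """Estimate (warmup_seconds, measured_seconds) for an iteration-based task.

    Assumes the target throughput (operations per second) is actually reached.
    """
    rate = task["target-throughput"]
    warmup_seconds = task["warmup-iterations"] / rate
    measured_seconds = task["iterations"] / rate
    return warmup_seconds, measured_seconds


# Values taken from the schedule in the single-task example
task = {"warmup-iterations": 100, "iterations": 100, "target-throughput": 10}
warmup_seconds, measured_seconds = estimate_durations(task)
print(warmup_seconds, measured_seconds)
```

Under that assumption, the task spends about 10 seconds warming up and about 10 seconds measuring. If the cluster cannot keep up with the target throughput, the actual run takes longer.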
## Next steps
- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository.


@ -0,0 +1,30 @@
---
layout: default
title: indices
parent: Workload reference
nav_order: 65
---
# indices
The `indices` element contains a list of all indexes used in the workload.
## Example
```json
"indices": [
  {
    "name": "geonames",
    "body": "geonames-index.json"
  }
]
```
## Configuration options
Use the following options with `indices`:
Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`name` | Yes | String | The name of the index.
`body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API.
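
The requirements in the table can be checked before running a workload. The following Python sketch validates a parsed `indices` list against them; the `validate_indices` helper is hypothetical and is not provided by OpenSearch Benchmark.

```python
def validate_indices(indices):
    """Return a list of problems found in a workload's `indices` element."""
    problems = []
    for position, entry in enumerate(indices):
        name = entry.get("name")
        # `name` is required and must be a non-empty string
        if not isinstance(name, str) or not name:
            problems.append(f"indices[{position}]: 'name' is required")
        # `body` is optional, but must be a file name string when present
        if "body" in entry and not isinstance(entry["body"], str):
            problems.append(f"indices[{position}]: 'body' must be a string")
    return problems


valid = [{"name": "geonames", "body": "geonames-index.json"}]
invalid = [{"body": "geonames-index.json"}, {"name": "movies", "body": 42}]

print(validate_indices(valid))
print(validate_indices(invalid))
```

An empty list means that every entry satisfies the table; otherwise, each message points to the offending entry by position.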


@ -178,4 +178,3 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps
After the restore operation is complete, the restored indexes are listed in the **Indices** panel. To view the indexes, in the left panel, under **Index Management**, choose **Indices**.
<img src="{{site.url}}{{site.baseurl}}/images/restore-snapshot/restore-snapshot-indices-panel.png" alt="View Indices">{: .img-fluid}


@ -103,7 +103,7 @@ Option | Required | Type | Description
`aws` | No | Object | The AWS configuration. For more information, see [aws](#aws).
`acknowledgments` | No | Boolean | When `true`, enables the `opensearch` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/#end-to-end-acknowledgments) when events are received by OpenSearch sinks. Default is `false`.
`connection` | No | Object | The connection configuration. For more information, see [Connection](#connection).
`indices` | No | Object | The configuration for filtering which indexes are processed. Defaults to all indexes, including system indexes. For more information, see [Indices](#indices).
`indices` | No | Object | The configuration for filtering which indexes are processed. Defaults to all indexes, including system indexes. For more information, see [Indexes](#indices).
`scheduling` | No | Object | The scheduling configuration. For more information, see [Scheduling](#scheduling).
`search_options` | No | Object | A list of search options performed by the source. For more information, see [Search options](#search_options).


@ -43,14 +43,14 @@ Name | Description
indices_all | Grants all permissions on the index. Equates to `indices:*`.
get | Grants permissions to use `get` and `mget` actions only.
read | Grants read permissions such as search, get field mappings, `get`, and `mget`.
write | Grants permissions to create and update documents within *existing indices*. To create new indices, see `create_index`.
write | Grants permissions to create and update documents within *existing indexes*. To create new indexes, see `create_index`.
delete | Grants permissions to delete documents.
crud | Combines the `read`, `write`, and `delete` action groups. Included in the `data_access` action group.
search | Grants permissions to search documents. Includes `suggest`.
suggest | Grants permissions to use the suggest API. Included in the `read` action group.
create_index | Grants permissions to create indices and mappings.
create_index | Grants permissions to create indexes and mappings.
indices_monitor | Grants permissions to execute all index monitoring actions (e.g. recovery, segments info, index stats, and status).
index | A more limited version of the `write` action group.
data_access | Combines the `crud` action group with `indices:data/*`.
manage_aliases | Grants permissions to manage aliases.
manage | Grants all monitoring and administration permissions for indices.
manage | Grants all monitoring and administration permissions for indexes.


@ -136,9 +136,9 @@ _meta:
```
## Manage OpenSearch Dashboards indices
## Manage OpenSearch Dashboards indexes
The open source version of OpenSearch Dashboards saves all objects to a single index: `.kibana`. The Security plugin uses this index for the global tenant, but separate indices for every other tenant. Each user also has a private tenant, so you might see a large number of indices that follow two patterns:
The open source version of OpenSearch Dashboards saves all objects to a single index: `.kibana`. The Security plugin uses this index for the global tenant, but separate indexes for every other tenant. Each user also has a private tenant, so you might see a large number of indexes that follow two patterns:
```
.kibana_<hash>_<tenant_name>
@ -149,4 +149,3 @@ The Security plugin scrubs these index names of special characters, so they migh
{: .tip }
To back up your OpenSearch Dashboards data, [take a snapshot]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/) of all tenant indexes using an index pattern such as `.kibana*`.
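
To preview which indexes a pattern such as `.kibana*` would cover, you can test it against a list of index names. The following Python sketch uses the standard library `fnmatch` module as a stand-in for wildcard matching, which is a reasonable approximation for patterns that contain only `*`. The index names, including the hash and tenant values, are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical index names; the hashes and tenant names are invented
index_names = [
    ".kibana",
    ".kibana_92668751_admin",
    ".kibana_1234567_finance",
    "my-logs-2023",
]


def matching_indexes(names, pattern=".kibana*"):
    """Return the names that match a simple `*` wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


tenant_indexes = matching_indexes(index_names)
print(tenant_indexes)
```

In this sample, the global tenant index and both tenant indexes match, while the unrelated log index does not.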


@ -13,7 +13,7 @@ redirect_from:
# Introduction to OpenSearch
OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indices, boost fields, rank results by score, sort results by field, and aggregate results.
OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results.
Unsurprisingly, people often use search engines like OpenSearch as the backend for a search application---think [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#What_software_is_used_to_run_Wikipedia?) or an online store. It offers excellent performance and can scale up and down as the needs of the application grow or shrink.
@ -29,9 +29,9 @@ You can run OpenSearch locally on a laptop---its system requirements are minimal
In a single-node cluster, such as one running on a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. For more information on setting node types, see [Cluster formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/).
## Indices and documents
## Indexes and documents
OpenSearch organizes data into *indices*. Each index is a collection of JSON *documents*. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to [JSON](https://www.json.org/). A simple JSON document for a movie might look like this:
OpenSearch organizes data into *indexes*. Each index is a collection of JSON *documents*. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to [JSON](https://www.json.org/). A simple JSON document for a movie might look like this:
```json
{
@ -55,14 +55,14 @@ When you add the document to an index, OpenSearch adds some metadata, such as th
}
```
Indices also contain mappings and settings:
Indexes also contain mappings and settings:
- A *mapping* is the collection of *fields* that documents in the index have. In this case, those fields are `title` and `release_date`.
- Settings include data like the index name, creation date, and number of shards.
## Primary and replica shards
OpenSearch splits indices into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually.
OpenSearch splits indexes into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually.
By default, OpenSearch creates a *replica* shard for each *primary* shard. If you split your index into ten shards, for example, OpenSearch also creates ten replica shards. These replica shards act as backups in the event of a node failure---OpenSearch distributes replica shards to different nodes than their corresponding primary shards---but they also improve the speed and rate at which the cluster can process search requests. You might specify more than one replica per index for a search-heavy workload.
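
The arithmetic in the preceding example can be written out as a short Python sketch. It assumes that data is spread evenly across primary shards and that each replica is the same size as its primary; the `shard_layout` helper is hypothetical.

```python
def shard_layout(index_size_gb, primary_shards, replicas_per_primary=1):
    """Estimate shard sizes and counts for an evenly distributed index."""
    shard_size_gb = index_size_gb / primary_shards
    replica_shards = primary_shards * replicas_per_primary
    total_shards = primary_shards + replica_shards
    return {
        "shard_size_gb": shard_size_gb,
        "replica_shards": replica_shards,
        "total_shards": total_shards,
        # Replicas are full copies, so they add to the storage footprint
        "total_storage_gb": shard_size_gb * total_shards,
    }


# The 400 GB index from the text, split into ten primary shards
layout = shard_layout(index_size_gb=400, primary_shards=10)
print(layout)
```

With the default of one replica per primary shard, the ten 40 GB primaries are accompanied by ten replicas, for 20 shards and roughly 800 GB of storage across the cluster.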
@ -93,4 +93,4 @@ To delete the document:
DELETE https://<host>:<port>/<index-name>/_doc/<document-id>
```
You can change most OpenSearch settings using the REST API, modify indices, check the health of the cluster, get statistics---almost everything.
You can change most OpenSearch settings using the REST API, modify indexes, check the health of the cluster, get statistics---almost everything.