Merge branch 'main' into notifications

This commit is contained in:
Keith Chan 2022-05-24 09:32:21 -07:00 committed by GitHub
commit d7f42451e4
213 changed files with 8757 additions and 1471 deletions

.github/CODEOWNERS
View File

@ -0,0 +1,2 @@
# This should match the owning team set up in https://github.com/orgs/opensearch-project/teams
* @opensearch-project/docs

.github/workflows/dco.yml
View File

@ -0,0 +1,18 @@
name: Developer Certificate of Origin Check
on: [pull_request]
jobs:
check:
runs-on: ubuntu-latest
steps:
- name: Get PR Commits
id: 'get-pr-commits'
uses: tim-actions/get-pr-commits@v1.1.0
with:
token: ${{ secrets.GITHUB_TOKEN }}
- name: DCO Check
uses: tim-actions/dco@v1.1.0
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}

View File

@ -1,6 +1,6 @@
# Contributing Guidelines
Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
Thank you for your interest in contributing to the OpenSearch documentation! Whether it's a bug report, new feature, correction, or additional
documentation, we greatly value feedback and contributions from our community.
Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
@ -9,7 +9,7 @@ information to effectively respond to your bug report or contribution.
## Reporting Bugs/Feature Requests
We welcome you to use the GitHub issue tracker to report bugs or suggest features.
Use the GitHub issue tracker to report bugs or suggest features.
When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:

View File

@ -26,10 +26,9 @@ Community contributions remain essential in keeping this documentation comprehen
If you encounter problems or have questions when contributing to the documentation, these people can help:
- [aetter](https://github.com/aetter)
- [ashwinkumar12345](https://github.com/ashwinkumar12345)
- [keithhc2](https://github.com/keithhc2)
- [snyder114](https://github.com/snyder114)
- [Naarcha-AWS](https://github.com/Naarcha-AWS)
- [lizsnyder](https://github.com/lizsnyder)
## How the website works
@ -77,6 +76,8 @@ If you just need to fix a typo or add a sentence, this web-based method works we
1. Make your changes.
1. Sign off on the commit by including the text "Signed-off-by: <GitHub Username> <your-email-here>" in the optional description. Be sure to use an email that's added to your GitHub account.
1. Choose **Create a new branch for this commit and start a pull request** and **Commit changes**.
@ -94,7 +95,7 @@ If you want to add a few paragraphs across multiple files and are comfortable wi
1. Edit the Markdown files in `/docs`.
1. Commit, push your changes to your fork, and submit a pull request.
1. Commit, [sign off](https://github.com/src-d/guide/blob/9171d013c648236c39faabcad8598be3c0cf8f56/developer-community/fix-DCO.md#how-to-prevent-missing-sign-offs-in-the-future), push your changes to your fork, and submit a pull request.
### Major changes
@ -143,7 +144,7 @@ If you're making major changes to the documentation and need to see the rendered
1. When you save a file, marvel as Jekyll automatically rebuilds the site and refreshes your web browser. This process can take anywhere from 10 to 30 seconds.
1. When you're happy with how everything looks, commit, push your changes to your fork, and submit a pull request.
1. When you're happy with how everything looks, commit, [sign off](https://github.com/src-d/guide/blob/9171d013c648236c39faabcad8598be3c0cf8f56/developer-community/fix-DCO.md#how-to-prevent-missing-sign-offs-in-the-future), push your changes to your fork, and submit a pull request.
## Writing tips

View File

@ -1,7 +1,7 @@
---
layout: default
title: Agents and ingestion tools
nav_order: 100
nav_order: 140
has_children: false
has_toc: false
redirect_from:
@ -12,9 +12,16 @@ redirect_from:
Historically, many popular agents and ingestion tools have worked with Elasticsearch OSS, such as Beats, Logstash, Fluentd, FluentBit, and OpenTelemetry. OpenSearch aims to continue to support a broad set of agents and ingestion tools, but not all have been tested or have explicitly added OpenSearch compatibility.
As an intermediate compatibility solution, OpenSearch has a setting that instructs the cluster to return version 7.10.2 rather than its actual version.
Previously, an intermediate compatibility solution was available. OpenSearch had a setting that instructed the cluster to return version 7.10.2 rather than its actual version.
If you use clients that include a version check, such as recent versions of Logstash OSS or Filebeat OSS, enable the setting:
The override main response setting `compatibility.override_main_response_version` is deprecated in OpenSearch 1.x and removed in OpenSearch 2.0.0. This setting is no longer supported for compatibility with legacy clients.
{: .note}
<!--
{: .note}
If you use clients that include a version check, such as versions of Logstash OSS or Filebeat OSS between 7.x - 7.12.x, enable the setting:
```json
PUT _cluster/settings
@ -32,7 +39,12 @@ PUT _cluster/settings
```yml
compatibility.override_main_response_version: true
```
-->
Logstash OSS 8.0 introduces a breaking change where all plugins run in ECS compatibility mode by default. If you use a compatible [OSS client](#compatibility-matrices), you must override the default value to maintain legacy behavior:
```yml
ecs_compatibility => disabled
```
## Downloads
@ -58,13 +70,13 @@ Some users report compatibility issues with ingest pipelines on these versions o
### Compatibility Matrix for Logstash
| | Logstash OSS 7.x to 7.11.x | Logstash OSS 7.12.x\* | Logstash 7.13.x without OpenSearch output plugin | Logstash 7.13.x with OpenSearch output plugin |
| :---| :--- | :--- | :--- | :--- |
| Elasticsearch OSS 7.x to 7.9.x | *Yes* | *Yes* | *No* | *Yes* |
| Elasticsearch OSS 7.10.2 | *Yes* | *Yes* | *No* | *Yes* |
| ODFE 1.x to 1.12 | *Yes* | *Yes* | *No* | *Yes* |
| ODFE 1.13 | *Yes* | *Yes* | *No* | *Yes* |
| OpenSearch 1.0 | Yes via version setting | Yes via version setting | *No* | *Yes* |
| | Logstash OSS 7.x to 7.11.x | Logstash OSS 7.12.x\* | Logstash 7.13.x-7.16.x without OpenSearch output plugin | Logstash 7.13.x-7.16.x with OpenSearch output plugin | Logstash 8.x+ with OpenSearch output plugin |
| :---| :--- | :--- | :--- | :--- | :--- |
| Elasticsearch OSS 7.x to 7.9.x | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
| Elasticsearch OSS 7.10.2 | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
| ODFE 1.x to 1.12 | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
| ODFE 1.13 | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
| OpenSearch 1.x | Yes via version setting | Yes via version setting | *No* | *Yes* | Yes, with the Elastic Common Schema setting |
\* Most current compatible version with Elasticsearch OSS.
@ -77,7 +89,7 @@ Some users report compatibility issues with ingest pipelines on these versions o
| Elasticsearch OSS 7.10.2 | *Yes* | *Yes* | No |
| ODFE 1.x to 1.12 | *Yes* | *Yes* | No |
| ODFE 1.13 | *Yes* | *Yes* | No |
| OpenSearch 1.0 | Yes via version setting | Yes via version setting | No |
| OpenSearch 1.x | Yes via version setting | Yes via version setting | No |
| Logstash OSS 7.x to 7.11.x | *Yes* | *Yes* | *Yes* |
| Logstash OSS 7.12.x\* | *Yes* | *Yes* | *Yes* |
| Logstash 7.13.x with OpenSearch output plugin | *Yes* | *Yes* | *Yes* |

View File

@ -0,0 +1,371 @@
---
layout: default
title: Configuration reference
parent: Data Prepper
nav_order: 3
---
# Data Prepper configuration reference
This page lists all supported Data Prepper server options, sources, buffers, processors, and sinks, along with their associated options. For example configuration files, see [Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/pipelines/).
## Data Prepper server options
Option | Required | Type | Description
:--- | :--- | :--- | :---
ssl | No | Boolean | Indicates whether TLS should be used for server APIs. Defaults to true.
keyStoreFilePath | No | String | Path to a .jks or .p12 keystore file. Required if ssl is true.
keyStorePassword | No | String | Password for keystore. Optional, defaults to empty string.
privateKeyPassword | No | String | Password for private key within keystore. Optional, defaults to empty string.
serverPort | No | Integer | Port number to use for server APIs. Defaults to `4900`.
metricRegistries | No | List | Metrics registries for publishing the generated metrics. Currently supports Prometheus and CloudWatch. Defaults to Prometheus.
## General pipeline options
Option | Required | Type | Description
:--- | :--- | :--- | :---
workers | No | Integer | Essentially the number of application threads. As a starting point for your use case, try setting this value to the number of CPU cores on the machine. Default is 1.
delay | No | Integer | Amount of time in milliseconds workers wait between buffer read attempts. Default is 3,000.
## Sources
Sources define where your data comes from.
### otel_trace_source
Source for the OpenTelemetry Collector.
Option | Required | Type | Description
:--- | :--- | :--- | :---
port | No | Integer | The port OTel trace source is running on. Default is `21890`.
request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
unframed_requests | No | Boolean | Enables requests not framed using the gRPC wire protocol.
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. The ACM certificate takes precedence over S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
authentication | No | Object| An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
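For illustration only, the following sketch shows how a few of these options might appear in a pipeline definition. The pipeline name and certificate paths are placeholders, not values from this reference.
```yml
entry-pipeline:
  source:
    otel_trace_source:
      ssl: true
      # Placeholder certificate and key paths; substitute your own.
      sslKeyCertChainFile: "config/demo-data-prepper.crt"
      sslKeyFile: "config/demo-data-prepper.key"
      health_check_service: true
  sink:
    - stdout:
```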
### http_source
This is a source plugin that supports the HTTP protocol. It currently supports only the JSON UTF-8 codec for incoming requests, for example, `[{"key1": "value1"}, {"key2": "value2"}]`.
Option | Required | Type | Description
:--- | :--- | :--- | :---
port | No | Integer | The port the source is running on. Default is `2021`. Valid options are between `0` and `65535`.
request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
max_pending_requests | No | Integer | The maximum number of allowed tasks in ScheduledThreadPool work queue. Default is `1024`.
authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
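As a sketch only, an `http_source` with basic authentication might be configured as follows. The pipeline name and credentials are placeholders.
```yml
log-pipeline:
  source:
    http:
      port: 2021
      request_timeout: 10000
      authentication:
        http_basic:
          # Placeholder credentials; don't hard-code real credentials.
          username: my-user
          password: my-password
  sink:
    - stdout:
```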
### file
Source for flat file input.
Option | Required | Type | Description
:--- | :--- | :--- | :---
path | Yes | String | Path to the input file (e.g. `logs/my-log.log`).
format | No | String | Format of each line in the file. Valid options are `json` or `plain`. Default is `plain`.
record_type | No | String | The record type to store. Valid options are `string` or `event`. Default is `string`. If you would like to use the file source for log analytics use cases like grok, set this option to `event`.
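For example, a file source that reads each line as a JSON event might look like the following sketch; the pipeline name and path are placeholders.
```yml
file-pipeline:
  source:
    file:
      path: "logs/my-log.log"   # placeholder path
      format: "json"
      record_type: "event"
  sink:
    - stdout:
```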
### pipeline
Source for reading from another pipeline.
Option | Required | Type | Description
:--- | :--- | :--- | :---
name | Yes | String | Name of the pipeline to read from.
### stdin
Source for console input. Can be useful for testing. No options.
## Buffers
Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory-based (better performance) or disk-based (larger).
### bounded_blocking
The default buffer. Memory-based.
Option | Required | Type | Description
:--- | :--- | :--- | :---
buffer_size | No | Integer | The maximum number of records the buffer accepts. Default is 512.
batch_size | No | Integer | The maximum number of records the buffer drains after each read. Default is 8.
## Processors
Processors perform some action on your data: filter, transform, enrich, etc.
Prior to Data Prepper 1.3, Processors were named Preppers. Starting in Data Prepper 1.3, the term Prepper is deprecated in favor of Processor. Data Prepper will continue to support the term "Prepper" until version 2.0, when it will be removed.
{: .note }
### otel_trace_raw_prepper
Converts OpenTelemetry data to OpenSearch-compatible JSON documents.
Option | Required | Type | Description
:--- | :--- | :--- | :---
root_span_flush_delay | No | Integer | Represents the time interval in seconds to flush all the root spans in the processor together with their descendants. Default is 30.
trace_flush_interval | No | Integer | Represents the time interval in seconds to flush all the descendant spans without any root span. Default is 180.
### service_map_stateful
Uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards.
Option | Required | Type | Description
:--- | :--- | :--- | :---
window_duration | No | Integer | Represents the fixed time window in seconds to evaluate service-map relationships. Default is 180.
### peer_forwarder
Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment.
Option | Required | Type | Description
:--- | :--- | :--- | :---
time_out | No | Integer | Forwarded request timeout in seconds. Defaults to 3 seconds.
span_agg_count | No | Integer | Batch size for number of spans per request. Defaults to 48.
target_port | No | Integer | The destination port to forward requests to. Defaults to `21890`.
discovery_mode | No | String | Peer discovery mode to be used. Allowable values are `static`, `dns`, and `aws_cloud_map`. Defaults to `static`.
static_endpoints | No | List | List containing string endpoints of all Data Prepper instances.
domain_name | No | String | Single domain name to query DNS against. Typically used by creating multiple DNS A Records for the same domain.
ssl | No | Boolean | Indicates whether TLS should be used. Default is true.
awsCloudMapNamespaceName | Conditionally | String | Name of your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
awsCloudMapServiceName | Conditionally | String | Service name within your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
sslKeyCertChainFile | Conditionally | String | Represents the SSL certificate chain file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to `true`.
useAcmCertForSSL | No | Boolean | Enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
awsRegion | Conditionally | String | Represents the AWS region to use ACM, S3, or CloudMap. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. The ACM certificate takes precedence over S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
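As an illustrative fragment of a trace pipeline (the endpoint hostnames are placeholders), a statically configured `peer_forwarder` might look like this:
```yml
processor:
  - peer_forwarder:
      discovery_mode: "static"
      # Placeholder hostnames of the other Data Prepper instances.
      static_endpoints: ["data-prepper-node-1", "data-prepper-node-2"]
      ssl: false
```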
### string_converter
Converts string to uppercase or lowercase. Mostly useful as an example if you want to develop your own processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
upper_case | No | Boolean | Whether to convert to uppercase (`true`) or lowercase (`false`).
### aggregate
Groups events together based on the keys provided and performs an action on each group.
Option | Required | Type | Description
:--- | :--- | :--- | :---
identification_keys | Yes | List | An unordered list of keys by which to group events. Events with the same values for these keys are put into the same group. If an event does not contain one of the `identification_keys`, then the value of that key is considered to be equal to `null`. At least one identification key is required (for example, `["sourceIp", "destinationIp", "port"]`).
action | Yes | AggregateAction | The action to be performed for each group. One of the available Aggregate Actions must be provided or you can create custom aggregate actions. `remove_duplicates` and `put_all` are available actions. For more information, see [creating custom aggregate actions](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#creating-new-aggregate-actions).
group_duration | No | String | The amount of time that a group should exist before it is concluded automatically. Supports ISO_8601 notation strings ("PT20.345S", "PT15M", etc.) as well as simple notation for seconds (`"60s"`) and milliseconds (`"1500ms"`). Default value is `180s`.
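For illustration, the following fragment groups events by two hypothetical keys and combines each group with the `put_all` action; the keys are placeholders for fields in your own events.
```yml
processor:
  - aggregate:
      identification_keys: ["sourceIp", "destinationIp"]
      action:
        put_all:
      group_duration: "180s"
```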
### date
Adds a default timestamp to the event or parses timestamp fields and converts them to ISO 8601 format, which can be used as the event timestamp.
Option | Required | Type | Description
:--- | :--- | :--- | :---
match | Conditionally | List | A list of `key` and `patterns`, where `patterns` is a list. The `match` list can have exactly one `key` and `patterns`. There is no default value. This option cannot be defined at the same time as `from_time_received`. Include multiple date processors in your pipeline if both options should be used.
from_time_received | Conditionally | Boolean | A Boolean that adds a default timestamp to the event data from the event metadata, which is the time when the source receives the event. Default value is `false`. This option cannot be defined at the same time as `match`. Include multiple date processors in your pipeline if both options should be used.
destination | No | String | The field used to store the timestamp parsed by the date processor. It can be used with both `match` and `from_time_received`. Default value is `@timestamp`.
source_timezone | No | String | The time zone used to parse dates. It is used when the zone or offset cannot be extracted from the value. If the zone or offset is part of the value, then the time zone is ignored. Find all available time zones in [the list of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List) under the "TZ database name" column.
destination_timezone | No | String | The time zone used for storing the timestamp in the `destination` field. The available time zone values are the same as for `source_timezone`.
locale | No | String | The locale used for parsing dates, most commonly for parsing month names (`MMM`). It can have language, country, and variant fields using IETF BCP 47 or the string representation of a [Locale](https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html) object, for example, `en-US` for IETF BCP 47 and `en_US` for the string representation. A full list of locale fields, which includes language, country, and variant, can be found in [the language subtag registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). Default value is `Locale.ROOT`.
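As a sketch, a date processor that parses an Apache-style timestamp might look like the following; the `logdate` key and the pattern are assumptions about your data.
```yml
processor:
  - date:
      match:
        - key: logdate          # hypothetical field name
          patterns: ["dd/MMM/yyyy:HH:mm:ss Z"]
      destination: "@timestamp"
      source_timezone: "America/Los_Angeles"
```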
### drop_events
Drops all the events that are passed into this processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
drop_when | Yes | String | Accepts a Data Prepper Expression string following the [Data Prepper Expression Syntax](https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md). Configuring `drop_events` with `drop_when: true` drops all the events received.
handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so it doesn't get sent to OpenSearch. Available options are `drop`, `drop_silently`, `skip`, `skip_silently`. For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
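For example, a `drop_events` processor that discards debug-level events might look like the following sketch; the `loglevel` field is hypothetical.
```yml
processor:
  - drop_events:
      drop_when: '/loglevel == "DEBUG"'
      handle_failed_events: "drop"
```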
### grok_prepper
Takes unstructured data and uses pattern matching to structure it and extract important keys, making the data more structured and queryable.
Option | Required | Type | Description
:--- | :--- | :--- | :---
match | No | Map | Specifies which keys to match specific patterns against. Default is an empty body.
keep_empty_captures | No | Boolean | Enables preserving `null` captures. Default value is `false`.
named_captures_only | No | Boolean | Specifies whether to keep only named captures. Default value is `true`.
break_on_match | No | Boolean | Specifies whether to match all patterns or stop once the first successful match is found. Default is `true`.
keys_to_overwrite | No | List | Specifies which existing keys are to be overwritten if there is a capture with the same key value. Default is `[]`.
pattern_definitions | No | Map | Allows for custom pattern use inline. Default value is an empty body.
patterns_directories | No | List | Specifies the paths of directories that contain custom pattern files. Default value is an empty list.
pattern_files_glob | No | String | Specifies which pattern files to use from the directories specified for `patterns_directories`. Default is `*`.
target_key | No | String | Specifies a parent level key to store all captures. Default value is `null`.
timeout_millis | No | Integer | The maximum amount of time, in milliseconds, that matching should take. Setting this to `0` disables the timeout. Default value is `30,000`.
### key_value
Takes in a field and parses it into key/value pairs.
Option | Required | Type | Description
:--- | :--- | :--- | :---
source | No | String | The key in the event that is parsed. Default value is `message`.
destination | No | String | The key to output the parsed source to. Writing to this key overwrites the key's value if it already exists. Default value is `parsed_message`.
field_delimiter_regex | Conditionally | String | A regex specifying the delimiter between key/value pairs. Special regex characters such as `[` and `]` must be escaped using `\\`. This option cannot be defined at the same time as `field_split_characters`.
field_split_characters | Conditionally | String | A string of characters to split between key/value pairs. Special regex characters such as `[` and `]` must be escaped using `\\`. Default value is `&`. This option cannot be defined at the same time as `field_delimiter_regex`.
key_value_delimiter_regex| Conditionally | String | A regex specifying the delimiter between a key and a value. Special regex characters such as `[` and `]` must be escaped using `\\`. There is no default value. This option cannot be defined at the same time as `value_split_characters`.
value_split_characters | Conditionally | String | A string of characters to split between keys and values. Special regex characters such as `[` and `]` must be escaped using `\\`. Default value is `=`. This option cannot be defined at the same time as `key_value_delimiter_regex`.
non_match_value | No | String | When a key/value cannot be successfully split, the key/value is placed in the key field and the specified value is placed in the value field. Default value is `null`.
prefix | No | String | A prefix given to all keys. Default value is an empty string.
delete_key_regex | No | String | A regex used to delete characters from the key. Special regex characters such as `[` and `]` must be escaped using `\\`. There is no default value.
delete_value_regex | No | String | A regex used to delete characters from the value. Special regex characters such as `[` and `]` must be escaped using `\\`. There is no default value.
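The following fragment is a minimal sketch that parses a query-string-style `message` field into key/value pairs, with the default split characters shown explicitly:
```yml
processor:
  - key_value:
      source: "message"
      destination: "parsed_message"
      field_split_characters: "&"
      value_split_characters: "="
```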
### add_entries
Adds entries to an event. `add_entries` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of entries to be added to the event. Valid entries are `key`, `value`, and `overwrite_if_key_exists`.
key | N/A | N/A | Key of the new entry to be added.
value | N/A | N/A | Value of the new entry to be added. Valid data types are strings, booleans, numbers, null, nested objects, and arrays containing the aforementioned data types.
overwrite_if_key_exists | No | Boolean | If true, the existing value gets overwritten if the key already exists within the event. Default is `false`.
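For illustration, an `add_entries` processor that tags every event might look like the following sketch; the key and value are placeholders.
```yml
processor:
  - add_entries:
      entries:
        - key: "environment"          # placeholder key
          value: "production"         # placeholder value
          overwrite_if_key_exists: true
```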
### copy_values
Copy values within an event. `copy_values` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of entries to be copied. Valid values are `from_key`, `to_key`, and `overwrite_if_key_exists`.
from_key | N/A | N/A | The key of the entry to be copied.
to_key | N/A | N/A | The key of the new entry to be added.
overwrite_if_to_key_exists | No | Boolean | If true, the existing value gets overwritten if the key already exists within the event. Default is `false`.
### delete_entries
Delete entries in an event. `delete_entries` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
with_keys | Yes | List | An array of keys of the entries to be deleted.
### rename_keys
Rename keys in an event. `rename_keys` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of entries. Valid values are `from_key`, `to_key`, and `overwrite_if_key_exists`. Renaming occurs in the order defined.
from_key | N/A | N/A | The key of the entry to be renamed.
to_key | N/A | N/A | The new key of the entry.
overwrite_if_to_key_exists | No | Boolean | If true, the existing value gets overwritten if `to_key` already exists in the event.
### substitute_string
Matches a key's value against a regular expression and replaces all matches with a replacement string. `substitute_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of entries. Valid values are `source`, `from`, and `to`.
source | N/A | N/A | The key to modify.
from | N/A | N/A | The Regex String to be replaced. Special regex characters such as `[` and `]` must be escaped using `\\` when using double quotes and `\ ` when using single quotes. See [Java Patterns](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/regex/Pattern.html) for more information.
to | N/A | N/A | The String to be substituted for each match of `from`.
### split_string
Splits a field into an array using a delimiter character. `split_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of entries. Valid values are `source`, `delimiter`, and `delimiter_regex`.
source | N/A | N/A | The key to split.
delimiter | No | N/A | The separator character responsible for the split. Cannot be defined at the same time as `delimiter_regex`. At least `delimiter` or `delimiter_regex` must be defined.
delimiter_regex | No | N/A | The regex string responsible for the split. Cannot be defined at the same time as `delimiter`. At least `delimiter` or `delimiter_regex` must be defined.
### uppercase_string
Converts a string to its uppercase counterpart. `uppercase_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
with_keys | Yes | List | A list of keys to convert to uppercase.
### lowercase_string
Converts a string to its lowercase counterpart. `lowercase_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
with_keys | Yes | List | A list of keys to convert to lowercase.
### trim_string
Strips whitespace from the beginning and end of a key. `trim_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
Option | Required | Type | Description
:--- | :--- | :--- | :---
with_keys | Yes | List | A list of keys to trim the whitespace from.
## Sinks
Sinks define where Data Prepper writes your data to.
### opensearch
Sink for an OpenSearch cluster.
Option | Required | Type | Description
:--- | :--- | :--- | :---
hosts | Yes | List | List of OpenSearch hosts to write to (e.g. `["https://localhost:9200", "https://remote-cluster:9200"]`).
cert | No | String | Path to the security certificate (e.g. `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin.
username | No | String | Username for HTTP basic authentication.
password | No | String | Password for HTTP basic authentication.
aws_sigv4 | No | Boolean | Default is `false`. Whether to use IAM signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credentials`, etc.).
aws_region | No | String | AWS region (e.g. `"us-east-1"`) for the domain if you are connecting to Amazon OpenSearch Service.
aws_sts_role_arn | No | String | The IAM role that the sink plugin assumes to sign requests to Amazon OpenSearch Service. If not provided, the plugin uses the default credentials.
socket_timeout | No | Integer | The timeout in milliseconds for waiting for data (or, put differently, the maximum period of inactivity between two consecutive data packets). A timeout value of zero is interpreted as an infinite timeout. If this timeout value is negative or not set, the underlying Apache HttpClient relies on operating system settings for managing socket timeouts.
connect_timeout | No | Integer | The timeout in milliseconds used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout. If this timeout value is negative or not set, the underlying Apache HttpClient relies on operating system settings for managing connection timeouts.
insecure | No | Boolean | Whether to disable SSL certificate verification. If set to `true`, CA certificate verification is disabled and insecure HTTP requests are sent. Default is `false`.
proxy | No | String | The address of a [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is "&lt;host name or IP&gt;:&lt;port&gt;". Examples: "example.com:8100", "http://example.com:8100", "112.112.112.112:8100". Port number cannot be omitted.
trace_analytics_raw | No | Boolean | Deprecated in favor of `index_type`. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics OpenSearch Dashboards plugin. Default is `false`.
trace_analytics_service_map | No | Boolean | Deprecated in favor of `index_type`. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics OpenSearch Dashboards plugin. Default is `false`.
index | No | String | Name of the index to export to. Only required if you don't use the `trace-analytics-raw` or `trace-analytics-service-map` presets. In other words, this parameter is applicable and required only if `index_type` is explicitly `custom` or defaults to `custom`.
index_type | No | String | This index type instructs the Sink plugin what type of data it is handling. Valid values: `custom`, `trace-analytics-raw`, `trace-analytics-service-map`. Default is `custom`.
template_file | No | String | Path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (e.g. `/your/local/template-file.json`) if you do not use the `trace_analytics_raw` or `trace_analytics_service_map` presets. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
document_id_field | No | String | The field from the source data to use for the OpenSearch document ID (e.g. `"my-field"`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
dlq_file | No | String | The path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster.
bulk_size | No | Integer (long) | The maximum size (in MiB) of bulk requests to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually. Default is 5.
ism_policy_file | No | String | The absolute file path for an ISM (Index State Management) policy JSON file. This policy file is effective only when there is no built-in policy file for the index type. For example, the `custom` index type is currently the only one without a built-in policy file, so it uses the policy file provided through this parameter. For more information, see [ISM policies]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/).
number_of_shards | No | Integer | The number of primary shards that an index should have on the destination OpenSearch server. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built-in. If this parameter is set, it overrides the value in the index template file. For more information, see [create index]({{site.url}}{{site.baseurl}}/opensearch/rest-api/index-apis/create-index/).
number_of_replicas | No | Integer | The number of replica shards each primary shard should have on the destination OpenSearch server. For example, if you have 4 primary shards and set `number_of_replicas` to 3, the index has 12 replica shards. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built-in. If this parameter is set, it overrides the value in the index template file. For more information, see [create index]({{site.url}}{{site.baseurl}}/opensearch/rest-api/index-apis/create-index/).
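As a sketch (the host, credentials, index name, and file paths are placeholders), an `opensearch` sink that writes to a custom index might look like this:
```yml
sink:
  - opensearch:
      hosts: ["https://localhost:9200"]
      cert: "config/root-ca.pem"             # placeholder certificate path
      username: "admin"                      # demo credentials only
      password: "admin"
      index: "application_logs"              # placeholder index name
      bulk_size: 5
      dlq_file: "/usr/share/data-prepper/dlq-file"
```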
### file
Sink for flat file output.
Option | Required | Type | Description
:--- | :--- | :--- | :---
path | Yes | String | Path for the output file (e.g. `logs/my-transformed-log.log`).
### pipeline
Sink for writing to another pipeline.
Option | Required | Type | Description
:--- | :--- | :--- | :---
name | Yes | String | Name of the pipeline to write to.
### stdout
Sink for console output. Can be useful for testing. No options.

View File

@ -0,0 +1,63 @@
---
layout: default
title: Get Started
parent: Data Prepper
nav_order: 1
---
# Get started with Data Prepper
Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
## 1. Install Data Prepper
To use the Docker image, pull it like any other image:
```bash
docker pull opensearchproject/data-prepper:latest
```
## 2. Define a pipeline
Create a Data Prepper pipeline file, `pipelines.yaml`, with the following configuration:
```yml
simple-sample-pipeline:
workers: 2
delay: "5000"
source:
random:
sink:
- stdout:
```
## 3. Start Data Prepper
Run the following command with your pipeline configuration YAML.
```bash
docker run --name data-prepper \
-v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
opensearchproject/opensearch-data-prepper:latest
```
The sample pipeline configuration above demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`). For more examples and details on more advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/clients/data-prepper/pipelines).
After starting Data Prepper, you should see log output and some UUIDs after a few seconds:
```yml
2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
2021-09-30T20:19:44,681 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:45,183 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:45,687 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:46,191 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:46,694 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:47,200 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:49,181 [simple-test-pipeline-processor-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - simple-test-pipeline Worker: Processing 6 records from buffer
07dc0d37-da2c-447e-a8df-64792095fb72
5ac9b10a-1d21-4306-851a-6fb12f797010
99040c79-e97b-4f1d-a70b-409286f2a671
5319a842-c028-4c17-a613-3ef101bd2bdd
e51e700e-5cab-4f6d-879a-1c3235a77d18
b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90
```

View File

@ -0,0 +1,15 @@
---
layout: default
title: Data Prepper
nav_order: 120
has_children: true
has_toc: false
---
# Data Prepper
Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization.
Data Prepper lets users build custom pipelines to improve the operational view of applications. Two common uses for Data Prepper are trace and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/observability-plugin/trace/index/) can help you visualize the flow of events and identify performance problems, and [log analytics]({{site.url}}{{site.baseurl}}/observability-plugin/log-analytics/) can improve searching and analyzing, and provide insights into your application.
To get started building your own custom pipelines with Data Prepper, see the [Get Started]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/) guide.

View File

@ -0,0 +1,153 @@
---
layout: default
title: Pipelines
parent: Data Prepper
nav_order: 2
---
# Pipelines
![Data Prepper Pipeline]({{site.url}}{{site.baseurl}}/images/data-prepper-pipeline.png)
To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks. For example:
```yml
simple-sample-pipeline:
workers: 2 # the number of workers
delay: 5000 # in milliseconds, how long workers wait between read attempts
source:
random:
buffer:
bounded_blocking:
buffer_size: 1024 # max number of records the buffer accepts
batch_size: 256 # max number of records the buffer drains after each read
processor:
- string_converter:
upper_case: true
sink:
- stdout:
```
- Sources define where your data comes from. In this case, the source is a random UUID generator (`random`).
- Buffers store data as it passes through the pipeline.
By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.
- Preppers perform some action on your data: filter, transform, enrich, etc.
You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `string_converter` prepper transforms the strings by making them uppercase.
- Sinks define where your data goes. In this case, the sink is stdout.
## Examples
This section provides some pipeline examples that you can use to start creating your own pipelines. For more information, see the [Data Prepper configuration reference]({{site.url}}{{site.baseurl}}/clients/data-prepper/data-prepper-reference/).
The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.
### Log ingestion pipeline
The following example demonstrates how to use the HTTP source and Grok prepper plugins to process unstructured log data.
```yml
log-pipeline:
source:
http:
ssl: false
processor:
- grok:
match:
log: [ "%{COMMONAPACHELOG}" ]
sink:
- opensearch:
hosts: [ "https://opensearch:9200" ]
insecure: true
username: admin
password: admin
index: apache_logs
```
This example uses weak security. We strongly recommend securing all plugins which open external ports in production environments.
{: .note}
### Trace Analytics pipeline
The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index trace data and service map documents for the dashboards plugin.
```yml
entry-pipeline:
delay: "100"
source:
otel_trace_source:
ssl: false
sink:
- pipeline:
name: "raw-pipeline"
- pipeline:
name: "service-map-pipeline"
raw-pipeline:
source:
pipeline:
name: "entry-pipeline"
prepper:
- otel_trace_raw_prepper:
sink:
- opensearch:
hosts: ["https://localhost:9200"]
insecure: true
username: admin
password: admin
trace_analytics_raw: true
service-map-pipeline:
delay: "100"
source:
pipeline:
name: "entry-pipeline"
prepper:
- service_map_stateful:
sink:
- opensearch:
hosts: ["https://localhost:9200"]
insecure: true
username: admin
password: admin
trace_analytics_service_map: true
```
## Migrating from Logstash
Data Prepper supports Logstash configuration files for a limited set of plugins. Simply use the Logstash configuration file to run Data Prepper.
```bash
docker run --name data-prepper \
-v /full/path/to/logstash.conf:/usr/share/data-prepper/pipelines.conf \
opensearchproject/opensearch-data-prepper:latest
```
This feature is limited by the feature parity of Data Prepper. As of the Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported:
- HTTP Input plugin
- Grok Filter plugin
- Elasticsearch Output plugin
- Amazon Elasticsearch Output plugin
## Configure the Data Prepper server
Data Prepper itself provides administrative HTTP endpoints such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port that has these endpoints has a TLS configuration and is specified by a separate YAML file. By default, these endpoints are secured by Data Prepper docker images. We strongly recommend providing your own configuration file for securing production environments. Here is an example `data-prepper-config.yaml`:
```yml
ssl: true
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
keyStorePassword: "password"
privateKeyPassword: "other_password"
serverPort: 1234
```
To configure the Data Prepper server, run Data Prepper with the additional YAML file.
```bash
docker run --name data-prepper -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
    -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
    opensearchproject/opensearch-data-prepper:latest
```

View File

@ -68,7 +68,7 @@ func main() {
}`)
// Create an index with non-default settings.
res := opensearchapi.CreateRequest{
res := opensearchapi.IndicesCreateRequest{
Index: IndexName,
Body: mapping,
}

View File

@ -19,18 +19,24 @@ For example, a 1.0.0 client works with an OpenSearch 1.1.0 cluster, but might no
* [OpenSearch Python client]({{site.url}}{{site.baseurl}}/clients/python/)
* [OpenSearch JavaScript (Node.js) client]({{site.url}}{{site.baseurl}}/clients/javascript/)
* [OpenSearch Go client]({{site.url}}{{site.baseurl}}/clients/go/)
* [OpenSearch PHP client]({{site.url}}{{site.baseurl}}/clients/php/)
## Legacy clients
Most clients that work with Elasticsearch OSS 7.10.2 *should* work with OpenSearch, but the latest versions of those clients might include license or version checks that artificially break compatibility. This page includes recommendations around which versions of those clients to use for best compatibility with OpenSearch.
{: .note}
OpenSearch 2.0.0 no longer supports compatibility with legacy clients. Due to breaking changes with REST APIs, some features are not supported when using OpenSearch 1.x clients to connect to OpenSearch 2.0.
Client | Recommended version
:--- | :---
[Java low-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-client/7.13.4/jar) | 7.13.4
[Java high-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.4/jar) | 7.13.4
[Python Elasticsearch client](https://pypi.org/project/elasticsearch/7.13.4/) | 7.13.4
[Elasticsearch Node.js client](https://www.npmjs.com/package/@elastic/elasticsearch/v/7.13.0) | 7.13.0
[Ruby Elasticsearch client](https://rubygems.org/gems/elasticsearch/versions/7.13.0) | 7.13.0
If you test a legacy client and verify that it works, please [submit a PR](https://github.com/opensearch-project/documentation-website/pulls) and add it to this table.

View File

@ -6,9 +6,6 @@ nav_order: 60
# Java high-level REST client
Although the OpenSearch Java high-level REST client is still usable, we recommend that you use the [OpenSearch Java client]({{site.url}}{{site.baseurl}}/clients/java/), which replaces the existing Java high-level REST client.
{: .note}
The OpenSearch Java high-level REST client lets you interact with your OpenSearch clusters and indices through Java methods and data structures rather than HTTP methods and JSON.
## Setup
@ -25,8 +22,24 @@ To start using the OpenSearch Java high-level REST client, ensure that you have
You can now start your OpenSearch cluster. The OpenSearch 1.x high-level REST client works with the 1.x versions of OpenSearch.
## Security
Before using the REST client in your Java application, you must configure the application's truststore to connect to the security plugin. If you are using self-signed certificates or demo configurations, you can use the following command to create a custom truststore and add in root authority certificates.
If you're using certificates from a trusted Certificate Authority (CA), you don't need to configure the truststore.
```bash
keytool -import -file <path-to-cert> -alias <alias-to-call-cert> -keystore <truststore-name>
```
You can now point your Java client to the truststore and set basic authentication credentials that can access a secure cluster (refer to the sample code below on how to do so).
If you run into issues when configuring security, see [common issues]({{site.url}}{{site.baseurl}}/troubleshoot/index) and [troubleshoot TLS]({{site.url}}{{site.baseurl}}/troubleshoot/tls).
## Sample code
This code example uses basic credentials that come with the default OpenSearch configuration. If you're using the OpenSearch Java high-level REST client with your own OpenSearch cluster, be sure to change the code to use your own credentials.
```java
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
@ -62,7 +75,7 @@ public class RESTClientSample {
System.setProperty("javax.net.ssl.trustStorePassword", "password-to-keystore");
//Establish credentials to use basic authentication.
//Only for demo purposes. Do not specify your credentials in code.
//Only for demo purposes. Don't specify your credentials in code.
final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY,

View File

@ -6,15 +6,23 @@ nav_order: 65
# Java client
The OpenSearch Java client is currently in its beta phase, so we recommend that you use the [OpenSearch Java high-level REST client]({{site.url}}{{site.baseurl}}/clients/java-rest-high-level).
{: .note}
The OpenSearch Java client allows you to interact with your OpenSearch clusters through Java methods and data structures rather than HTTP methods and raw JSON.
For example, you can submit requests to your cluster using objects to create indices, add data to documents, or complete some other operation using the client's built-in methods.
## Setup
## Install the client
To start using the OpenSearch Java client, ensure that you have the following dependency in your project's `pom.xml` file:
To start using the OpenSearch Java client, ensure that you have the following dependencies in your project's `pom.xml` file:
```
<dependency>
<groupId>org.opensearch.client</groupId>
<artifactId>opensearch-rest-client</artifactId>
<version>{{site.opensearch_version}}</version>
</dependency>
<dependency>
<groupId>org.opensearch.client</groupId>
<artifactId>opensearch-java</artifactId>
@ -33,9 +41,21 @@ dependencies {
You can now start your OpenSearch cluster.
The following example uses credentials that come with the default OpenSearch configuration. If you're using the OpenSearch Java client with your own OpenSearch cluster, be sure to change the code to use your own credentials.
## Security
## Sample code
Before using the REST client in your Java application, you must configure the application's truststore to connect to the security plugin. If you are using self-signed certificates or demo configurations, you can use the following command to create a custom truststore and add in root authority certificates.
If you're using certificates from a trusted Certificate Authority (CA), you don't need to configure the truststore.
```bash
keytool -import -file <path-to-cert> -alias <alias-to-call-cert> -keystore <truststore-name>
```
You can now point your Java client to the truststore and set basic authentication credentials that can access a secure cluster (refer to the sample code below on how to do so).
If you run into issues when configuring security, see [common issues]({{site.url}}{{site.baseurl}}/troubleshoot/index) and [troubleshoot TLS]({{site.url}}{{site.baseurl}}/troubleshoot/tls).
## Sample data
This section uses a class called `IndexData`, which is a simple Java class that stores basic data and methods. For your own OpenSearch cluster, you might find that you need a more robust class to store your data.
@ -74,7 +94,11 @@ static class IndexData {
}
```
### OpenSearch client example
## Initialize the client with SSL and TLS enabled
This code example uses basic credentials that come with the default OpenSearch configuration. If you're using the Java client with your own OpenSearch cluster, be sure to change the code to use your own credentials.
The following sample code initializes a client with SSL and TLS enabled:
```java
import org.apache.http.HttpHost;
@ -85,20 +109,134 @@ import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.opensearch.client.RestClient;
import org.opensearch.client.RestClientBuilder;
import org.opensearch.clients.base.RestClientTransport;
import org.opensearch.clients.base.Transport;
import org.opensearch.clients.json.jackson.JacksonJsonpMapper;
import org.opensearch.clients.opensearch.OpenSearchClient;
import org.opensearch.clients.opensearch._global.IndexRequest;
import org.opensearch.clients.opensearch._global.IndexResponse;
import org.opensearch.clients.opensearch._global.SearchResponse;
import org.opensearch.clients.opensearch.indices.*;
import org.opensearch.clients.opensearch.indices.put_settings.IndexSettingsBody;
import org.opensearch.client.base.RestClientTransport;
import org.opensearch.client.base.Transport;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch._global.IndexRequest;
import org.opensearch.client.opensearch._global.IndexResponse;
import org.opensearch.client.opensearch._global.SearchResponse;
import org.opensearch.client.opensearch.indices.*;
import org.opensearch.client.opensearch.indices.put_settings.IndexSettingsBody;
import java.io.IOException;
public class OpenSearchClientExample {
public static void main(String[] args) {
RestClient restClient = null;
try{
System.setProperty("javax.net.ssl.trustStore", "/full/path/to/keystore");
System.setProperty("javax.net.ssl.trustStorePassword", "password-to-keystore");
//Only for demo purposes. Don't specify your credentials in code.
final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY,
new UsernamePasswordCredentials("admin", "admin"));
//Initialize the client with SSL and TLS enabled
restClient = RestClient.builder(new HttpHost("localhost", 9200, "https")).
setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
@Override
public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
}
}).build();
Transport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
OpenSearchClient client = new OpenSearchClient(transport);
}
}
}
```
## OpenSearch client examples
This section has sample code that shows you how to create an index with non-default settings, add a document to the index, search for the document, delete the document, and finally delete the index.
### Create an index with non-default settings
```java
String index = "sample-index";
CreateRequest createIndexRequest = new CreateRequest.Builder().index(index).build();
client.indices().create(createIndexRequest);
IndexSettings indexSettings = new IndexSettings.Builder().autoExpandReplicas("0-all").build();
IndexSettingsBody settingsBody = new IndexSettingsBody.Builder().settings(indexSettings).build();
PutSettingsRequest putSettingsRequest = new PutSettingsRequest.Builder().index(index).value(settingsBody).build();
client.indices().putSettings(putSettingsRequest);
```
### Index data
```java
IndexData indexData = new IndexData("first_name", "Bruce");
IndexRequest<IndexData> indexRequest = new IndexRequest.Builder<IndexData>().index(index).id("1").value(indexData).build();
client.index(indexRequest);
```
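The Java examples in this section reference an `IndexData` class that isn't shown here. A minimal sketch of what such a class might look like follows; the field names are assumptions, and the shape is simply a plain Java object with a no-argument constructor and accessors so that the Jackson mapper configured on the transport can serialize and deserialize it:
```java
// A minimal sketch of the IndexData document class assumed by these examples.
// Jackson (used by JacksonJsonpMapper) maps documents to and from plain Java
// objects that expose a no-argument constructor and getters/setters.
public class IndexData {
    private String firstName;
    private String lastName;

    public IndexData() {
    }

    public IndexData(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "IndexData{first name='" + firstName + "', last name='" + lastName + "'}";
    }
}
```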
### Search for the document
```java
SearchResponse<IndexData> searchResponse = client.search(s -> s.index(index), IndexData.class);
for (int i = 0; i< searchResponse.hits().hits().size(); i++) {
System.out.println(searchResponse.hits().hits().get(i).source());
}
```
### Delete the document
The following sample code deletes a document whose ID is 1.
```java
client.delete(b -> b.index(index).id("1"));
```
### Delete the index
```java
DeleteRequest deleteRequest = new DeleteRequest.Builder().index(index).build();
DeleteResponse deleteResponse = client.indices().delete(deleteRequest);
} catch (IOException e){
System.out.println(e.toString());
} finally {
try {
if (restClient != null) {
restClient.close();
}
} catch (IOException e) {
System.out.println(e.toString());
}
}
}
}
```
## Complete code sample
```java
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.opensearch.client.RestClient;
import org.opensearch.client.RestClientBuilder;
import org.opensearch.client.base.RestClientTransport;
import org.opensearch.client.base.Transport;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch._global.IndexRequest;
import org.opensearch.client.opensearch._global.IndexResponse;
import org.opensearch.client.opensearch._global.SearchResponse;
import org.opensearch.client.opensearch.indices.*;
import org.opensearch.client.opensearch.indices.put_settings.IndexSettingsBody;
import java.io.IOException;
public class OpenSearchClientExample {
public static void main(String[] args) {
RestClient restClient = null;
try{
System.setProperty("javax.net.ssl.trustStore", "/full/path/to/keystore");
System.setProperty("javax.net.ssl.trustStorePassword", "password-to-keystore");
@ -109,7 +247,7 @@ public class OpenSearchClientExample {
new UsernamePasswordCredentials("admin", "admin"));
//Initialize the client with SSL and TLS enabled
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "https")).
restClient = RestClient.builder(new HttpHost("localhost", 9200, "https")).
setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
@Override
public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
@ -148,13 +286,12 @@ public class OpenSearchClientExample {
DeleteRequest deleteRequest = new DeleteRequest.Builder().index(index).build();
DeleteResponse deleteResponse = client.indices().delete(deleteRequest);
restClient.close();
} catch (IOException e){
System.out.println(e.toString());
} finally {
try {
if (client != null) {
client.close();
if (restClient != null) {
restClient.close();
}
} catch (IOException e) {
System.out.println(e.toString());

View File

@ -1,7 +1,7 @@
---
layout: default
title: JavaScript client
nav_order: 90
nav_order: 100
---
# JavaScript client

View File

@ -212,7 +212,7 @@ useragent {
}
```
Start Logstah and send an HTTP request.
Start Logstash and send an HTTP request.
You can see a field named `ua` with a number of keys including the browser name and version, the OS, and the device.

View File

@ -67,10 +67,10 @@ If you're migrating from an existing Logstash installation, you can install the
2. Navigate to the downloaded folder in the terminal and extract the files:
```bash
tar -zxvf logstash-oss-with-opensearch-output-plugin-7.13.2-linux-x64.tar.gz
tar -zxvf logstash-oss-with-opensearch-output-plugin-7.16.2-linux-x64.tar.gz
```
3. Navigate to the `logstash-7.13.2` directory.
3. Navigate to the `logstash-7.16.2` directory.
- You can add your pipeline configurations to the `config` directory. Logstash saves any data from the plugins in the `data` directory. The `bin` directory contains the binaries for starting Logstash and managing plugins.
### Docker
@ -78,7 +78,7 @@ If you're migrating from an existing Logstash installation, you can install the
1. Pull the Logstash oss package with the OpenSearch output plugin image:
```
docker pull opensearchproject/logstash-oss-with-opensearch-output-plugin:7.13.2
docker pull opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2
```
1. Create a Docker network:
@ -90,13 +90,13 @@ If you're migrating from an existing Logstash installation, you can install the
1. Start OpenSearch with this network:
```
docker run -p 9200:9200 -p 9600:9600 --name opensearch --net test -e "discovery.type=single-node" opensearchproject/opensearch:1.0.0
docker run -p 9200:9200 -p 9600:9600 --name opensearch --net test -e "discovery.type=single-node" opensearchproject/opensearch:1.2.0
```
1. Start Logstash:
```
docker run -it --rm --name logstash --net test opensearchproject/logstash-oss-with-opensearch-output-plugin:7.13.2 -e 'input { stdin { } } output {
docker run -it --rm --name logstash --net test opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2 -e 'input { stdin { } } output {
opensearch {
hosts => ["https://opensearch:9200"]
index => "opensearch-logstash-docker-%{+YYYY.MM.dd}"

View File

@ -0,0 +1,51 @@
---
layout: default
title: Read from OpenSearch
parent: Logstash
nav_order: 220
---
# Read from OpenSearch
Just as you can ship Logstash events to an OpenSearch cluster using the [OpenSearch output plugin](https://github.com/opensearch-project/logstash-output-opensearch), you can perform read operations on an OpenSearch cluster and load data into Logstash using the [OpenSearch input plugin](https://github.com/opensearch-project/logstash-input-opensearch).
The OpenSearch input plugin reads the results of a search query performed on an OpenSearch cluster and loads them into Logstash. This lets you replay test logs, reindex, and perform other operations based on the loaded data. You can schedule ingestions to run periodically by using
[cron expressions](https://opensearch.org/docs/latest/monitoring-plugins/alerting/cron/), or manually load data into Logstash by running the query once.
## OpenSearch input plugin
To run the OpenSearch input plugin, add the configuration to the `pipeline.conf` file within your Logstash `config` folder. The following example runs a `match_all` query filter and loads data once.
```yml
input {
opensearch {
hosts => "https://hostname:port"
user => "admin"
password => "admin"
index => "logstash-logs-%{+YYYY.MM.dd}"
query => "{ "query": { "match_all": {}} }"
}
}
filter {
}
output {
}
```
To ingest data according to a schedule, use a cron expression that specifies the schedule you want. For example, to load data every minute, add `schedule => "* * * * *"` to the input section of your `pipeline.conf` file.
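As a sketch, the input configuration above with a one-minute schedule added looks like the following; the hostname, credentials, and index name are placeholders, as before:
```yml
input {
  opensearch {
    hosts => "https://hostname:port"
    user => "admin"
    password => "admin"
    index => "logstash-logs-%{+YYYY.MM.dd}"
    query => '{ "query": { "match_all": {} } }'
    schedule => "* * * * *"
  }
}
```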
Like the output plugin, after adding your configuration to the `pipeline.conf` file, start Logstash by providing the path to this file:
```bash
$ bin/logstash -f config/pipeline.conf --config.reload.automatic
```
`config/pipeline.conf` is a relative path to the `pipeline.conf` file. You can use an absolute path as well.
Adding `stdout{}` to the `output{}` section of your `pipeline.conf` file prints the query results to the console.
To reindex the data into an OpenSearch domain, add the destination domain configuration in the `output{}` section, as shown in the [OpenSearch output plugin documentation](https://opensearch.org/docs/latest/clients/logstash/ship-to-opensearch/#opensearch-output-plugin) and in the sketch that follows.
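A sketch of an `output{}` section that both prints the results to the console and reindexes them into a destination domain might look like the following; the hostname, credentials, and destination index name are placeholders:
```yml
output {
  # Print the query results to the console.
  stdout {}
  # Reindex the results into a destination OpenSearch domain.
  opensearch {
    hosts => ["https://hostname:port"]
    auth_type => {
      type => 'basic'
      user => 'admin'
      password => 'admin'
    }
    index => "reindexed-logstash-logs-%{+YYYY.MM.dd}"
  }
}
```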

View File

@ -75,3 +75,96 @@ output {
health | status | index | uuid | pri | rep | docs.count | docs.deleted | store.size | pri.store.size
green | open | logstash-logs-2021.07.01 | iuh648LYSnmQrkGf70pplA | 1 | 1 | 1 | 0 | 10.3kb | 5.1kb
```
## Adding different authentication mechanisms in the output plugin
### Using `auth_type` to support different authentication mechanisms
In addition to the existing authentication mechanisms, you can specify other authentication mechanisms in the configuration by using the `auth_type` setting.
Example configuration for basic authentication:
```yml
output {
opensearch {
hosts => ["https://hostname:port"]
auth_type => {
type => 'basic'
user => 'admin'
password => 'admin'
}
index => "logstash-logs-%{+YYYY.MM.dd}"
}
}
```
### Parameters inside `auth_type`
- `type` (string): The type of authentication to use.
- Add the credentials required for that authentication type, such as `user` and `password` for `basic` authentication.
- Add any other parameters required by that authentication mechanism, such as `region` for `aws_iam` authentication.
## Configuration for AWS IAM authentication
To run the Logstash output OpenSearch plugin with `aws_iam` authentication, add a configuration like the following example:
```yml
output {
opensearch {
hosts => ["https://hostname:port"]
auth_type => {
type => 'aws_iam'
aws_access_key_id => 'ACCESS_KEY'
aws_secret_access_key => 'SECRET_KEY'
region => 'us-west-2'
}
index => "logstash-logs-%{+YYYY.MM.dd}"
}
}
```
### Required parameters
- `hosts` (array of strings): The Amazon OpenSearch Service domain endpoint and port number.
- `auth_type` (JSON object): Holds the other parameters required for authentication.
  - `type` (string): `aws_iam`.
  - `aws_access_key_id` (string): The AWS access key.
  - `aws_secret_access_key` (string): The AWS secret access key.
  - `region` (string, default `us-east-1`): The region in which the domain is located.
  - To pass other optional parameters, such as `profile` or `session_token`, add them inside `auth_type`.
- `port` (string): Amazon OpenSearch Service listens on port 443 for HTTPS.
- `protocol` (string): The protocol used to connect to Amazon OpenSearch Service, which is `https`.
### Optional parameters
- The credential resolution logic can be described as follows:
  - `aws_access_key_id` and `aws_secret_access_key` passed in the configuration
  - Environment variables: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (recommended because they are recognized by all AWS SDKs and the AWS CLI except for .NET), or `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` (recognized only by the Java SDK)
  - The credential profiles file at the default location (`~/.aws/credentials`) shared by all AWS SDKs and the AWS CLI
  - Instance profile credentials delivered through the Amazon EC2 metadata service
- `template` (path): The path to your own template, if you want to use one. If not set, the included template is used.
- `template_name` (string, default `logstash`): Defines how the template is named inside OpenSearch.
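For example, a sketch of an output configuration that points to a custom template might look like the following; the file path and template name are placeholders:
```yml
output {
  opensearch {
    hosts => ["https://hostname:port"]
    auth_type => {
      type => 'basic'
      user => 'admin'
      password => 'admin'
    }
    index => "logstash-logs-%{+YYYY.MM.dd}"
    template => "/path/to/custom-template.json"
    template_name => "custom-logstash"
  }
}
```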
## Data streams
The OpenSearch output plugin can store both time series datasets (such as logs, events, and metrics) and non-time series data in OpenSearch.
Data streams are the recommended way to index time series datasets (such as logs, metrics, and events) into OpenSearch.
To learn more about data streams, see the [data streams documentation](https://opensearch.org/docs/latest/opensearch/data-streams/).
You can ingest data into a data stream through Logstash. Create the data stream, then specify the name of the data stream and set the `action` to `create` in the output configuration. The following sample configuration shows how:
```yml
output {
opensearch {
hosts => ["https://hostname:port"]
auth_type => {
type => 'basic'
user => 'admin'
password => 'admin'
}
index => "my-data-stream"
action => "create"
}
}
```
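The data stream itself must exist before Logstash writes to it. As a sketch (the index template name, pattern, and data stream name are placeholders), you might create it with requests like the following:
```json
PUT _index_template/my-data-stream-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {}
}

PUT _data_stream/my-data-stream
```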

100
_clients/php.md Normal file
View File

@ -0,0 +1,100 @@
---
layout: default
title: PHP client
nav_order: 90
---
# PHP client
The OpenSearch PHP client provides a safer and easier way to interact with your OpenSearch cluster. Rather than using OpenSearch from the browser and potentially exposing your data to the public, you can build an OpenSearch client that takes care of sending requests to your cluster.
The client contains a library of APIs that let you perform different operations on your cluster and return a standard response body. The example here demonstrates some basic operations like creating an index, adding documents, and searching your data.
## Setup
To add the client to your project, install it using [composer](https://getcomposer.org/):
```bash
composer require opensearch-project/opensearch-php
```
To install a specific major version of the client, run the following command:
```bash
composer require opensearch-project/opensearch-php:<version>
```
Then require the autoload file from Composer in your code:
```php
require __DIR__ . '/vendor/autoload.php';
```
## Sample code
```php
<?php
require __DIR__ . '/vendor/autoload.php';
$client = (new \OpenSearch\ClientBuilder())
->setHosts(['https://localhost:9200'])
->setBasicAuthentication('admin', 'admin') // For testing only. Don't store credentials in code.
->setSSLVerification(false) // For testing only. Use certificate for validation
->build();
$indexName = 'test-index-name';
// Print OpenSearch version information on console.
var_dump($client->info());
// Create an index with non-default settings.
$client->indices()->create([
'index' => $indexName,
'body' => [
'settings' => [
'index' => [
'number_of_shards' => 4
]
]
]
]);
$client->create([
'index' => $indexName,
'id' => 1,
'body' => [
'title' => 'Moneyball',
'director' => 'Bennett Miller',
'year' => 2011
]
]);
// Search for it
var_dump(
$client->search([
'index' => $indexName,
'body' => [
'size' => 5,
'query' => [
'multi_match' => [
'query' => 'miller',
'fields' => ['title^2', 'director']
]
]
]
])
);
// Delete a single document
$client->delete([
'index' => $indexName,
'id' => 1,
]);
// Delete index
$client->indices()->delete([
'index' => $indexName
]);
```
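The client exposes the other document APIs in the same array-based style. For example, a small sketch of retrieving the document by ID before deleting it, assuming the same `$client` and `$indexName` as in the sample above:
```php
// Retrieve the document by ID.
var_dump(
    $client->get([
        'index' => $indexName,
        'id' => 1,
    ])
);
```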

View File

@ -5,9 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
permalink: /:path/
opensearch_version: 1.1.0
opensearch_major_minor_version: 1.1
lucene_version: 8_9_0
opensearch_version: 2.0.0-rc1
opensearch_dashboards_version: 2.0.0-rc1
opensearch_major_minor_version: 2.0-rc1
lucene_version: 9_1_0
# Build settings
markdown: kramdown
@ -48,6 +49,12 @@ collections:
replication-plugin:
permalink: /:collection/:path/
output: true
observability-plugin:
permalink: /:collection/:path/
output: true
ml-commons-plugin:
permalink: /:collection/:path/
output: true
monitoring-plugins:
permalink: /:collection/:path/
output: true
@ -90,6 +97,12 @@ just_the_docs:
replication-plugin:
name: Replication plugin
nav_fold: true
observability-plugin:
name: Observability plugin
nav_fold: true
ml-commons-plugin:
name: ML Commons plugin
nav_fold: true
monitoring-plugins:
name: Monitoring plugins
nav_fold: true
@ -107,8 +120,15 @@ just_the_docs:
# Enable or disable the site search
# Supports true (default) or false
search_enabled: true
# By default, just-the-docs enables its JSON file-based search. We also have an OpenSearch-driven search functionality.
# To disable any search from appearing, both `search_enabled` and `use_custom_search` need to be false.
# To use the OpenSearch-driven search, `search_enabled` has to be false and `use_custom_search` needs to be true.
# If `search_enabled` is true, irrespective of the value of `use_custom_search`, the JSON file-based search appears.
#
# `search_enabled` defaults to true
# `use_custom_search` defaults to false
search_enabled: false
use_custom_search: true
search:
# Split pages into sections that can be searched individually

126
_dashboards/branding.md Normal file
View File

@ -0,0 +1,126 @@
---
layout: default
title: Custom branding
nav_order: 200
---
# OpenSearch Dashboards custom branding
Introduced 1.2
{: .label .label-purple }
By default, OpenSearch Dashboards uses the OpenSearch logo, but if you want to use custom branding elements such as the favicon or main Dashboards logo, you can do so by editing `opensearch_dashboards.yml` or by including a custom `opensearch_dashboards.yml` file when you start your OpenSearch cluster.
For example, if you're using Docker to start your OpenSearch cluster, include the following lines in the `opensearch-dashboards` section of your `docker-compose.yml` file:
```
volumes:
- ./opensearch_dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml
```
Doing so replaces the Docker image's default `opensearch_dashboards.yml` with your custom `opensearch_dashboards.yml` file, so be sure to include your desired settings as well. For example, if you want to configure TLS for OpenSearch Dashboards, see [Configure TLS for OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/install/tls).
Re-launch OpenSearch Dashboards, and OpenSearch Dashboards now uses your custom elements.
## Branding elements
The following elements in OpenSearch Dashboards are customizable:
![OpenSearch customizable branding elements]({{site.url}}{{site.baseurl}}/images/dashboards-branding-labels.png)
Setting | Corresponding branding element
:--- | :---
logo | Header logo. See #1 in the image.
mark | OpenSearch Dashboards mark. See #2 in the image.
loadingLogo | Loading logo used when OpenSearch Dashboards is starting. See #3 in the image.
faviconUrl | Website icon. Loads next to the application title. See #4 in the image.
applicationTitle | The application's title. See #5 in the image.
To start using your own branding elements in OpenSearch Dashboards, first uncomment this section of `opensearch_dashboards.yml`:
```yml
# opensearchDashboards.branding:
# logo:
# defaultUrl: ""
# darkModeUrl: ""
# mark:
# defaultUrl: ""
# darkModeUrl: ""
# loadingLogo:
# defaultUrl: ""
# darkModeUrl: ""
# faviconUrl: ""
# applicationTitle: ""
```
Add the URLs you want to use as branding elements to the appropriate setting. Valid image types are `SVG`, `PNG`, and `GIF`.
Customization of dark mode Dashboards is also available, but you first must supply a valid link to `defaultUrl` and then link to your preferred image with `darkModeUrl`. If you don't provide a `darkModeUrl` link, Dashboards uses the provided `defaultUrl` element for dark mode. You don't have to customize all branding elements; it's perfectly valid to change just the logo or any other single element. Leave unchanged elements commented out.
The following example demonstrates how to use `SVG` files as logos but leaves the other elements as defaults.
```yml
logo:
defaultUrl: "https://example.com/validUrl.svg"
darkModeUrl: "https://example.com/validDarkModeUrl.svg"
# mark:
# defaultUrl: ""
# darkModeUrl: ""
# loadingLogo:
# defaultUrl: ""
# darkModeUrl: ""
# faviconUrl: ""
applicationTitle: "My custom application"
```
We recommend linking to images that are hosted on a web server, but if you really want to use locally hosted images, save your images inside `assets`, and then configure `opensearch_dashboards.yml` to use the correct paths. You can access locally stored images through the `ui/assets` folder.
The following example assumes that Dashboards uses its default port of 5601 and demonstrates how to link to locally stored images.
```yml
logo:
defaultUrl: "https://localhost:5601/ui/assets/my-own-image.svg"
darkModeUrl: "https://localhost:5601/ui/assets/dark-mode-my-own-image.svg"
mark:
defaultUrl: "https://localhost:5601/ui/assets/my-own-image2.svg"
darkModeUrl: "https://localhost:5601/ui/assets/dark-mode-my-own-image2.svg"
# loadingLogo:
# defaultUrl: ""
# darkModeUrl: ""
# faviconUrl: ""
applicationTitle: "My custom application"
```
## Sample configuration
The following configuration enables the security plugin and SSL within OpenSearch Dashboards and uses custom branding elements to replace the OpenSearch logo and application title.
```yml
server.host: "0"
opensearch.hosts: ["https://localhost:9200"]
opensearch.ssl.verificationMode: none
opensearch.username: "kibanaserver"
opensearch.password: "kibanaserver"
opensearch.requestHeadersAllowlist: [ authorization,securitytenant ]
#server.ssl.enabled: true
#server.ssl.certificate: /path/to/your/server/certificate
#server.ssl.key: /path/to/your/server/key
opensearch_security.multitenancy.enabled: true
opensearch_security.multitenancy.tenants.preferred: ["Private", "Global"]
opensearch_security.readonly_mode.roles: ["kibana_read_only"]
# Use this setting if you are running opensearch-dashboards without https
opensearch_security.cookie.secure: false
opensearchDashboards.branding:
logo:
defaultUrl: "https://example.com/sample.svg"
darkModeUrl: "https://example.com/dark-mode-sample.svg"
# mark:
# defaultUrl: ""
# darkModeUrl: ""
# loadingLogo:
# defaultUrl: ""
# darkModeUrl: ""
# faviconUrl: ""
applicationTitle: "Just some testing"
```

View File

@ -1,7 +1,6 @@
---
layout: default
title: Browser compatibility
parent: OpenSearch Dashboards
nav_order: 3
---

View File

@ -6,7 +6,7 @@ nav_order: 99
# Dashboards Query Language
Similar to the [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) that lets you use the HTTP request body to search for data, you can use the Dashbaords Query Language (DQL) in OpenSearch Dashboards to search for data and visualizations.
Similar to the [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) that lets you use the HTTP request body to search for data, you can use the Dashboards Query Language (DQL) in OpenSearch Dashboards to search for data and visualizations.
For example, if you want to see all visualizations of visits to a host based in the US, enter `geo.dest:US` into the search field, and Dashboards refreshes to display all related data.

View File

@ -9,7 +9,7 @@ nav_order: 1
You *can* start OpenSearch Dashboards using `docker run` after [creating a Docker network](https://docs.docker.com/engine/reference/commandline/network_create/) and starting OpenSearch, but the process of connecting OpenSearch Dashboards to OpenSearch is significantly easier with a Docker Compose file.
1. Run `docker pull opensearchproject/opensearch-dashboards:{{site.opensearch_version}}`.
1. Run `docker pull opensearchproject/opensearch-dashboards:{{site.opensearch_dashboards_version}}`.
1. Create a [`docker-compose.yml`](https://docs.docker.com/compose/compose-file/) file appropriate for your environment. A sample file that includes OpenSearch Dashboards is available on the OpenSearch [Docker installation page]({{site.url}}{{site.baseurl}}/opensearch/install/docker#sample-docker-compose-file).

View File

@ -28,6 +28,48 @@ If you don't want to use the all-in-one installation options, you can install th
</tr>
</thead>
<tbody>
<tr>
<td>1.3.1</td>
<td>
<pre>alertingDashboards 1.3.1.0
anomalyDetectionDashboards 1.3.1.0
ganttChartDashboards 1.3.1.0
indexManagementDashboards 1.3.1.0
observabilityDashboards 1.3.1.0
queryWorkbenchDashboards 1.3.1.0
reportsDashboards 1.3.1.0
securityDashboards 1.3.1.0
</pre>
</td>
</tr>
<tr>
<td>1.3.0</td>
<td>
<pre>alertingDashboards 1.3.0.0
anomalyDetectionDashboards 1.3.0.0
ganttChartDashboards 1.3.0.0
indexManagementDashboards 1.3.0.0
observabilityDashboards 1.3.0.0
queryWorkbenchDashboards 1.3.0.0
reportsDashboards 1.3.0.0
securityDashboards 1.3.0.0
</pre>
</td>
</tr>
<tr>
<td>1.2.0</td>
<td>
<pre>alertingDashboards 1.2.0.0
anomalyDetectionDashboards 1.2.0.0
ganttChartDashboards 1.2.0.0
indexManagementDashboards 1.2.0.0
observabilityDashboards 1.2.0.0
queryWorkbenchDashboards 1.2.0.0
reportsDashboards 1.2.0.0
securityDashboards 1.2.0.0
</pre>
</td>
</tr>
<tr>
<td>1.1.0</td>
<td>

View File

@ -25,7 +25,7 @@ opensearch.hosts: ["https://localhost:9200"]
opensearch.ssl.verificationMode: full
opensearch.username: "kibanaserver"
opensearch.password: "kibanaserver"
opensearch.requestHeadersWhitelist: [ authorization,securitytenant ]
opensearch.requestHeadersAllowlist: [ authorization,securitytenant ]
server.ssl.enabled: true
server.ssl.certificate: /usr/share/opensearch-dashboards/config/client-cert.pem
server.ssl.key: /usr/share/opensearch-dashboards/config/client-cert-key.pem

View File

@ -13,12 +13,12 @@ CSV reports have a non-configurable 10,000 row limit. They have no explicit size
{: .tip }
## Create reports from Discovery, Visualize, or Dashboard
## Create reports from Discovery, Visualize, Dashboard, or Notebooks
Quickly generate an on-demand report from the current view.
1. From the top menu bar, choose **Reporting**.
1. For dashboards or visualizations, choose **Download PDF** or **Download PNG**. From the Discover page, choose **Download CSV**.
1. For dashboards, visualizations, or notebooks, choose **Download PDF** or **Download PNG**. If you're creating a report from the Discover page, choose **Generate CSV**.
Reports generate asynchronously in the background and might take a few minutes, depending on the size of the report. A notification appears when your report is ready to download.
@ -32,11 +32,11 @@ Definitions let you generate reports on a periodic schedule.
1. From the navigation panel, choose **Reporting**.
1. Choose **Create**.
1. Under **Report settings**, enter a name and optional description for your report.
1. Choose the **Report Source** (i.e. the page from which the report is generated). You can generate reports from the **Dashboard**, **Visualize**, or **Discover** pages.
1. Select your dashboard, visualization, or saved search. Then choose a time range for the report.
1. Choose the **Report source** (i.e. the page from which the report is generated). You can generate reports from the **Dashboard**, **Visualize**, **Discover** (saved search), or **Notebooks** pages.
1. Select your dashboard, visualization, saved search, or notebook. Then choose a time range for the report.
1. Choose an appropriate file format for the report.
1. (Optional) Add a header or footer to the report. Headers and footers are only available for dashboard or visualization reports.
1. Under **Report trigger**, choose either **On-demand** or **Schedule**.
1. (Optional) Add a header or footer to the report. Headers and footers are only available for dashboard, visualization, and notebook reports.
1. Under **Report trigger**, choose either **On demand** or **Schedule**.
For scheduled reports, select either **Recurring** or **Cron based**. You can receive reports daily or at some other time interval. Cron expressions give you even more flexibility. See [Cron expression reference]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/cron/) for more information.
@ -55,3 +55,7 @@ This problem can occur for two reasons:
- You don't have the correct version of `headless-chrome` to match the operating system on which OpenSearch Dashboards is running. Download the correct version [here](https://github.com/opensearch-project/dashboards-reports/releases/tag/chromium-1.12.0.0).
- You're missing additional dependencies. Install the required dependencies for your operating system from the [additional libraries](https://github.com/opensearch-project/dashboards-reports/blob/main/dashboards-reports/rendering-engine/headless-chrome/README.md#additional-libaries) section.
### Characters not loading in reports
You might encounter an issue where UTF-8 encoded characters look fine in your browser, but they don't load in your generated reports because you're missing the required font dependencies. Install the [font dependencies](https://github.com/opensearch-project/dashboards-reports#missing-font-dependencies), and then generate your reports again.

View File

@ -0,0 +1,44 @@
---
layout: default
title: Search telemetry
nav_order: 30
---
# About search telemetry
You can use search telemetry to analyze search request performance by success or failure in OpenSearch Dashboards. OpenSearch stores telemetry data in the `.kibana_1` index.
Because there are thousands of concurrent search requests from OpenSearch Dashboards, the heavy traffic can cause significant load in an OpenSearch cluster.
OpenSearch clusters perform better with search telemetry turned off.
{: .tip }
## Turn on search telemetry
Search usage telemetry is turned off by default. To turn it on, you need to set `data.search.usageTelemetry.enabled` to `true` in the `opensearch_dashboards.yml` file.
You can find the [OpenSearch Dashboards YAML file](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/config/opensearch_dashboards.yml) in the opensearch-project repository on GitHub.
Turning on telemetry in the `opensearch_dashboards.yml` file overrides the default search telemetry setting of `false` in the [Data plugin configuration file](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/src/plugins/data/config.ts).
{: .note }
### Turn search telemetry on or off
The following table shows the `data.search.usageTelemetry.enabled` values you can set in `opensearch_dashboards.yml` to turn search telemetry on or off.
OpenSearch Dashboards YAML value | Search telemetry status: on or off
:--- | :---
`true` | On
`false` | Off
`none` | Off
#### Sample opensearch_dashboards.yml with telemetry enabled
This OpenSearch Dashboards YAML file excerpt shows the telemetry setting set to `true` to turn on search telemetry:
```yml
# Set the value of this setting to false to suppress
# search usage telemetry to reduce the load of the OpenSearch cluster.
data.search.usageTelemetry.enabled: true
```

View File

@ -1 +1 @@
message: "🌡️ [OpenSearch 1.1.0 arrived October 5 with cross-cluster replication, bucket-level alerting, and much, much more. Grab it here!](/downloads.html)"
message: "This is a pre-release version of OpenSearch 2.0.0. Feel free to try it out and provide feedback. If you are looking for the most recent production-ready release, see the [1.x line](https://opensearch.org/lines/1x.html)"

View File

@ -26,6 +26,9 @@ columns:
-
title: FAQ
url: '/faq/'
-
title: Testimonials
url: '/testimonials.html'
-
title: 'Brand Guidelines'
url: '/brand.html'
@ -45,5 +48,5 @@ columns:
# title: 'Facebook'
# url: 'http://www.facebook.com/opensearch'
-
title: 'E-mail'
url: 'mailto:opensearch@amazon.com'
title: 'Connect'
url: 'https://opensearch.org/connect.html'

View File

@ -1,6 +1,11 @@
{
"current": "1.1",
"past": [
"current": "2.0",
"all": [
"2.0",
"1.3",
"1.2",
"1.1",
"1.0"
]
],
"latest": "1.3"
}

View File

@ -0,0 +1,22 @@
---
layout: default
title: Settings
parent: Index rollups
nav_order: 30
---
# Index rollup settings
We don't recommend changing these settings; the defaults should work well for most use cases.
All settings are available using the OpenSearch `_cluster/settings` operation. None require a restart, and all can be marked `persistent` or `transient`.
Setting | Default | Description
:--- | :--- | :---
`plugins.rollup.search.backoff_millis` | 1000 milliseconds | The backoff time between retries for failed rollup jobs.
`plugins.rollup.search.backoff_count` | 5 | How many retries the plugin should attempt for failed rollup jobs.
`plugins.rollup.search.search_all_jobs` | false | Whether OpenSearch should return all jobs that match all specified search terms. If disabled, OpenSearch returns just one, as opposed to all, of the jobs that matches the search terms.
`plugins.rollup.dashboards.enabled` | true | Whether rollups are enabled in OpenSearch Dashboards.
`plugins.rollup.enabled` | true | Whether the rollup plugin is enabled.
`plugins.ingest.backoff_millis` | 1000 milliseconds | The backoff time between data ingestions for rollup jobs.
`plugins.ingest.backoff_count` | 5 | How many retries the plugin should attempt for failed ingestions.
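For example, a request like the following sketch persistently updates one of the settings from the preceding table (the value shown is simply the default):
```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.rollup.search.backoff_count": 5
  }
}
```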

View File

@ -9,16 +9,16 @@ has_toc: false
# Index transforms
Whereas index rollup jobs let you reduce data granularity by rolling up old data into condensed indices, transform jobs let you create a different, summarized view of your data centered around certain fields, so you can visualize or analyze the data in different ways.
Whereas index rollup jobs let you reduce data granularity by rolling up old data into condensed indexes, transform jobs let you create a different, summarized view of your data centered around certain fields, so you can visualize or analyze the data in different ways.
For example, suppose that you have airline data thats scattered across multiple fields and categories, and you want to view a summary of the data thats organized by airline, quarter, and then price. You can use a transform job to create a new, summarized index thats organized by those specific categories.
You can use transform jobs in two ways:
1. Use the OpenSearch Dashboards UI to specify the index you want to transform and any optional data filters you want to use to filter the original index. Then select the fields you want to transform and the aggregations to use in the transformation. Finally, define a schedule for your job to follow.
2. Use the transforms API to specify all the details about your job: the index you want to transform, target groups you want the transformed index to have, any aggregations you want to use to group columns, and a schedule for your job to follow.
1. Use the transforms API to specify all the details about your job: the index you want to transform, target groups you want the transformed index to have, any aggregations you want to use to group columns, and a schedule for your job to follow.
OpenSearch Dashboards provides a detailed summary of the jobs you created and their relevant information, such as associated indices and job statuses. You can review and edit your jobs details and selections before creation, and even preview a transformed indexs data as youre choosing which fields to transform. However, you can also use the REST API to create transform jobs and preview transform job results, but you must know all of the necessary settings and parameters to submit them as part of the HTTP request body. Submitting your transform job configurations as JSON scripts offers you more portability, allowing you to share and replicate your transform jobs, which is harder to do using OpenSearch Dashboards.
OpenSearch Dashboards provides a detailed summary of the jobs you created and their relevant information, such as associated indexes and job statuses. You can review and edit your jobs details and selections before creation, and even preview a transformed indexs data as youre choosing which fields to transform. However, you can also use the REST API to create transform jobs and preview transform job results, but you must know all of the necessary settings and parameters to submit them as part of the HTTP request body. Submitting your transform job configurations as JSON scripts offers you more portability, allowing you to share and replicate your transform jobs, which is harder to do using OpenSearch Dashboards.
Your use cases will help you decide which method to use to create transform jobs.
@ -26,7 +26,7 @@ Your use cases will help you decide which method to use to create transform jobs
If you don't have any data in your cluster, you can use the sample flight data within OpenSearch Dashboards to try out transform jobs. Otherwise, after launching OpenSearch Dashboards, choose **Index Management**. Select **Transform Jobs**, and choose **Create Transform Job**.
### Step 1: Choose indices
### Step 1: Choose indexes
1. In the **Job name and description** section, specify a name and an optional description for your job.
2. In the **Indices** section, select the source and target index. You can either select an existing target index or create a new one by entering a name for your new index. If you want to transform just a subset of your source index, choose **Edit data filter**, and use the OpenSearch query DSL to specify a subset of your source index. For more information about the OpenSearch query DSL, see [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/).
@ -34,7 +34,7 @@ If you don't have any data in your cluster, you can use the sample flight data w
### Step 2: Select fields to transform
After specifying the indices, you can select the fields you want to use in your transform job, as well as whether to use groupings or aggregations.
After specifying the indexes, you can select the fields you want to use in your transform job, as well as whether to use groupings or aggregations.
You can use groupings to place your data into separate buckets in your transformed index. For example, if you want to group all of the airport destinations within the sample flight data, you can group the `DestAirportID` field into a target field of `DestAirportID_terms` field, and you can find the grouped airport IDs in your transformed index after the transform job finishes.
@ -44,16 +44,17 @@ On the other hand, aggregations let you perform simple calculations. For example
Currently, transform jobs support histogram, date_histogram, and terms groupings. For more information about groupings, see [Bucket Aggregations]({{site.url}}{{site.baseurl}}/opensearch/bucket-agg/). In terms of aggregations, you can select from `sum`, `avg`, `max`, `min`, `value_count`, `percentiles`, and `scripted_metric`. For more information about aggregations, see [Metric Aggregations]({{site.url}}{{site.baseurl}}/opensearch/metric-agg/).
2. Repeat step 1 for any other fields that you want to transform.
3. After selecting the fields that you want to transform and verifying the transformation, choose **Next**.
1. Repeat step 1 for any other fields that you want to transform.
1. After selecting the fields that you want to transform and verifying the transformation, choose **Next**.
### Step 3: Specify a schedule
You can configure transform jobs to run once or multiple times on a schedule. Transform jobs are enabled by default.
1. For **transformation execution frequency**, select **Define by fixed interval** and specify a **transform interval**.
2. Under **Advanced**, specify an optional amount for **Pages per execution**. A larger number means more data is processed in each search request, but also uses more memory and causes higher latency. Exceeding allowed memory limits can cause exceptions and errors to occur.
3. Choose **Next**.
1. Choose whether the job should be **continuous**. Continuous jobs execute at each **transform execution interval** and incrementally transform newly modified buckets, which can include new data added to the source indexes. Non-continuous jobs execute only once.
1. For **transformation execution interval**, specify a transform interval in minutes, hours, or days. This interval dictates how often continuous jobs execute; non-continuous jobs execute once after the interval elapses.
1. Under **Advanced**, specify an optional amount for **Pages per execution**. A larger number means more data is processed in each search request, but also uses more memory and causes higher latency. Exceeding allowed memory limits can cause exceptions and errors to occur.
1. Choose **Next**.
### Step 4: Review and confirm details
@ -103,7 +104,6 @@ GET finished_flight_job/_search
"hits" : [
{
"_index" : "finished_flight_job",
"_type" : "_doc",
"_id" : "dSNKGb8U3OJOmC4RqVCi1Q",
"_score" : 3.845883,
"_source" : {
@ -115,7 +115,6 @@ GET finished_flight_job/_search
},
{
"_index" : "finished_flight_job",
"_type" : "_doc",
"_id" : "_D7oqOy7drx9E-MG96U5RA",
"_score" : 3.845883,
"_source" : {
@ -127,7 +126,6 @@ GET finished_flight_job/_search
},
{
"_index" : "finished_flight_job",
"_type" : "_doc",
"_id" : "YuZ8tOt1OsBA54e84WuAEw",
"_score" : 3.6988301,
"_source" : {
@ -139,7 +137,6 @@ GET finished_flight_job/_search
},
{
"_index" : "finished_flight_job",
"_type" : "_doc",
"_id" : "W_-e7bVmH6eu8veJeK8ZxQ",
"_score" : 3.6988301,
"_source" : {

View File

@ -28,6 +28,7 @@ PUT _plugins/_transform/<transform_id>
{
"transform": {
"enabled": true,
"continuous": true,
"schedule": {
"interval": {
"period": 1,
@ -78,6 +79,7 @@ PUT _plugins/_transform/<transform_id>
"transform": {
"transform_id": "sample",
"schema_version": 7,
"continuous": true,
"schedule": {
"interval": {
"start_time": 1621467964243,
@ -128,17 +130,18 @@ You can specify the following options in the HTTP request body:
Option | Data Type | Description | Required
:--- | :--- | :--- | :---
enabled | Boolean | If true, the transform job is enabled at creation. | No
schedule | JSON | The schedule the transform job runs on. | Yes
continuous | Boolean | Specifies whether the transform job should be continuous. Continuous jobs execute every time they are scheduled according to the `schedule` field and run based on newly transformed buckets, as well as any new data added to source indexes. Non-continuous jobs execute only once. Default is false. | No
schedule | Object | The schedule the transform job runs on. | Yes
start_time | Integer | The Unix epoch time of the transform job's start time. | Yes
description | String | Describes the transform job. | No
metadata_id | String | Any metadata to be associated with the transform job. | No
source_index | String | The source index whose data to transform. | Yes
target_index | String | The target index the newly transformed data is added into. You can create a new index or update an existing one. | Yes
data_selection_query | JSON | The query DSL to use to filter a subset of the source index for the transform job. See [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl) for more information. | Yes
page_size | Integer | The number of fields to transform at a time. Higher number means higher performance but requires more memory and can cause higher latency. (Default: 1) | Yes
data_selection_query | Object | The query DSL to use to filter a subset of the source index for the transform job. See [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl) for more information. | Yes
page_size | Integer | The number of buckets IM processes and indexes concurrently. Higher number means better performance but requires more memory. If your machine runs out of memory, IM automatically adjusts this field and retries until the operation succeeds. | Yes
groups | Array | Specifies the grouping(s) to use in the transform job. Supported groups are `terms`, `histogram`, and `date_histogram`. For more information, see [Bucket Aggregations]({{site.url}}{{site.baseurl}}/opensearch/bucket-agg). | Yes if not using aggregations
source_field | String | The field(s) to transform | Yes
aggregations | JSON | The aggregations to use in the transform job. Supported aggregations are: `sum`, `max`, `min`, `value_count`, `avg`, `scripted_metric`, and `percentiles`. For more information, see [Metric Aggregations]({{site.url}}{{site.baseurl}}/opensearch/metric-agg). | Yes if not using groups
aggregations | Object | The aggregations to use in the transform job. Supported aggregations are: `sum`, `max`, `min`, `value_count`, `avg`, `scripted_metric`, and `percentiles`. For more information, see [Metric Aggregations]({{site.url}}{{site.baseurl}}/opensearch/metric-agg). | Yes if not using groups
## Update a transform job
Introduced 1.0
@ -250,12 +253,23 @@ PUT _plugins/_transform/<transform_id>
}
```
The `Update` operation supports the following URL parameters:
The Update operation supports the following URL parameters:
Parameter | Description | Required
:---| :--- | :---
`if_seq_no` | Only perform the transform operation if the last operation that changed the transform job has the specified sequence number. | No
`if_primary_term` | Only perform the transform operation if the last operation that changed the transform job has the specified sequence term. | No
`if_seq_no` | Only perform the transform operation if the last operation that changed the transform job has the specified sequence number. | Yes
`if_primary_term` | Only perform the transform operation if the last operation that changed the transform job has the specified sequence term. | Yes
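Both parameters are passed as query parameters in the request path. A sketch of the request line, where the transform ID and values are placeholders and the request body is the updated transform configuration shown earlier:
```json
PUT _plugins/_transform/sample?if_seq_no=13&if_primary_term=1
```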
You can update the following fields:
Option | Data Type | Description
:--- | :--- | :---
schedule | Object | The schedule the transform job runs on. Contains the fields `interval.start_time`, `interval.period`, and `interval.unit`.
start_time | Integer | The Unix epoch start time of the transform job.
period | Integer | How often to execute the transform job.
unit | String | The unit of time associated with the execution period. Available options are `Minutes`, `Hours`, and `Days`.
description | String | Describes the transform job.
page_size | Integer | The number of buckets IM processes and indexes concurrently. Higher number means better performance but requires more memory. If your machine runs out of memory, IM automatically adjusts this field and retries until the operation succeeds.
## Get a transform job's details
Introduced 1.0
@ -280,6 +294,7 @@ GET _plugins/_transform/<transform_id>
"transform": {
"transform_id": "sample",
"schema_version": 7,
"continuous": true,
"schedule": {
"interval": {
"start_time": 1621467964243,
@ -347,6 +362,7 @@ GET _plugins/_transform/
"transform": {
"transform_id": "sample",
"schema_version": 7,
"continuous": true,
"schedule": {
"interval": {
"start_time": 1621467964243,
@ -399,11 +415,11 @@ You can specify these options as the `GET` API operations URL parameters to f
Parameter | Description | Required
:--- | :--- | :---
from | The starting index to search from. (Default: 0) | No
size | Specifies the amount of results to return (Default: 10) | No
from | The starting index to search from. Default is 0. | No
size | Specifies the amount of results to return. Default is 10. | No
search |The search term to use to filter results. | No
sortField | The field to sort results with. | No
sortDirection | Specifies the direction to sort results in. Can be `ASC` or `DESC`. (Default: ASC) | No
sortDirection | Specifies the direction to sort results in. Can be `ASC` or `DESC`. Default is ASC. | No
For example, this request returns two results starting from the eighth index.
@ -584,6 +600,12 @@ GET _plugins/_transform/<transform_id>/_explain
"sample": {
"metadata_id": "PzmjweME5xbgkenl9UpsYw",
"transform_metadata": {
"continuous_stats": {
"last_timestamp": 1621883525672,
"documents_behind": {
"sample_index": 72
}
},
"transform_id": "sample",
"last_updated_at": 1621883525873,
"status": "finished",
@ -691,7 +713,7 @@ POST _plugins/_transform/_preview
Introduced 1.0
{: .label .label-purple }
Deletes a transform job. This operation does not delete the source or target indices.
Deletes a transform job. This operation does not delete the source or target indexes.
**Sample Request**
@ -709,7 +731,6 @@ DELETE _plugins/_transform/<transform_id>
{
"delete": {
"_index": ".opensearch-ism-config",
"_type": "_doc",
"_id": "sample",
"_version": 4,
"result": "deleted",

View File

@ -7,7 +7,7 @@ nav_order: 20
# ISM API
Use the index state management operations to programmatically work with policies and managed indices.
Use the index state management operations to programmatically work with policies and managed indexes.
---
@ -163,8 +163,8 @@ POST _plugins/_ism/add/index_1
}
```
If you use a wildcard `*` while adding a policy to an index, the ISM plugin interprets `*` as all indices, including system indices like `.opendistro-security`, which stores users, roles, and tenants. A delete action in your policy might accidentally delete all user roles and tenants in your cluster.
Don't use the broad `*` wildcard, and instead add a prefix, such as `my-logs*`, when specifying indices with the `_ism/add` API.
If you use a wildcard `*` while adding a policy to an index, the ISM plugin interprets `*` as all indexes, including system indexes like `.opendistro-security`, which stores users, roles, and tenants. A delete action in your policy might accidentally delete all user roles and tenants in your cluster.
Don't use the broad `*` wildcard, and instead add a prefix, such as `my-logs*`, when specifying indexes with the `_ism/add` API.
{: .warning }
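For example, a sketch of adding a policy with a prefixed pattern instead of the broad wildcard, where the pattern and policy ID are placeholders:
```json
POST _plugins/_ism/add/my-logs*
{
  "policy_id": "policy_1"
}
```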
---
@ -176,6 +176,8 @@ Introduced 1.0
Updates a policy. Use the `seq_no` and `primary_term` parameters to update an existing policy. If these numbers don't match the existing policy or the policy doesn't exist, ISM throws an error.
It's possible that the policy currently applied to your index isn't the most up-to-date policy available. To see what policy is currently applied to your index, see [Explain index]({{site.url}}{{site.baseurl}}/im-plugin/ism/api/#explain-index). To get the most up-to-date version of a policy, see [Get policy]({{site.url}}{{site.baseurl}}/im-plugin/ism/api/#get-policy).
#### Request
```json
@ -390,9 +392,9 @@ POST _plugins/_ism/remove/index_1
Introduced 1.0
{: .label .label-purple }
Updates the managed index policy to a new policy (or to a new version of the policy). You can use an index pattern to update multiple indices at once. When updating multiple indices, you might want to include a state filter to only affect certain managed indices. The change policy filters out all the existing managed indices and only applies the change to the ones in the state that you specify. You can also explicitly specify the state that the managed index transitions to after the change policy takes effect.
Updates the managed index policy to a new policy (or to a new version of the policy). You can use an index pattern to update multiple indexes at once. When updating multiple indexes, you might want to include a state filter to only affect certain managed indexes. The change policy filters out all the existing managed indexes and only applies the change to the ones in the state that you specify. You can also explicitly specify the state that the managed index transitions to after the change policy takes effect.
A policy change is an asynchronous background process. The changes are queued and are not executed immediately by the background process. This delay in execution protects the currently running managed indices from being put into a broken state. If the policy you are changing to has only some small configuration changes, then the change takes place immediately. For example, if the policy changes the `min_index_age` parameter in a rollover condition from `1000d` to `100d`, this change takes place immediately in its next execution. If the change modifies the state, actions, or the order of actions of the current state the index is in, then the change happens at the end of its current state before transitioning to a new state.
A policy change is an asynchronous background process. The changes are queued and are not executed immediately by the background process. This delay in execution protects the currently running managed indexes from being put into a broken state. If the policy you are changing to has only some small configuration changes, then the change takes place immediately. For example, if the policy changes the `min_index_age` parameter in a rollover condition from `1000d` to `100d`, this change takes place immediately in its next execution. If the change modifies the state, actions, or the order of actions of the current state the index is in, then the change happens at the end of its current state before transitioning to a new state.
In this example, the policy applied on the `index_1` index is changed to `policy_1`, which could either be a completely new policy or an updated version of its existing policy. The process only applies the change if the index is currently in the `searches` state. After this change in policy takes place, `index_1` transitions to the `delete` state.
@ -428,7 +430,7 @@ POST _plugins/_ism/change_policy/index_1
Introduced 1.0
{: .label .label-purple }
Retries the failed action for an index. For the retry call to succeed, ISM must manage the index, and the index must be in a failed state. You can use index patterns (`*`) to retry multiple failed indices.
Retries the failed action for an index. For the retry call to succeed, ISM must manage the index, and the index must be in a failed state. You can use index patterns (`*`) to retry multiple failed indexes.
#### Request
@ -456,7 +458,7 @@ POST _plugins/_ism/retry/index_1
Introduced 1.0
{: .label .label-purple }
Gets the current state of the index. You can use index patterns to get the status of multiple indices.
Gets the current state of the index. You can use index patterns to get the status of multiple indexes.
#### Request
@ -475,6 +477,40 @@ GET _plugins/_ism/explain/index_1
}
```
Optionally, you can add the `show_policy` parameter to your request's path to get the policy that is currently applied to your index, which is useful for seeing whether the policy applied to your index is the latest one. To get the most up-to-date policy, see [Get Policy API]({{site.url}}{{site.baseurl}}/im-plugin/ism/api/#get-policy).
#### Request
```json
GET _plugins/_ism/explain/index_1?show_policy=true
```
#### Sample response
```json
{
"index_1": {
"index.plugins.index_state_management.policy_id": "sample-policy",
"index.opendistro.index_state_management.policy_id": "sample-policy",
"index": "index_1",
"index_uuid": "gCFlS_zcTdih8xyxf3jQ-A",
"policy_id": "sample-policy",
"enabled": true,
"policy": {
"policy_id": "sample-policy",
"description": "ingesting logs",
"last_updated_time": 1647284980148,
"schema_version": 13,
"error_notification": null,
"default_state": "ingest",
"states": [...],
"ism_template": null
}
},
"total_managed_indices": 1
}
```
The `plugins.index_state_management.policy_id` setting is deprecated starting from ODFE version 1.13.0. We retain this field in the response API for consistency.
---
@ -497,7 +533,6 @@ DELETE _plugins/_ism/policies/policy_1
```json
{
"_index": ".opendistro-ism-config",
"_type": "_doc",
"_id": "policy_1",
"_version": 3,
"result": "deleted",

View File

@ -12,7 +12,7 @@ has_toc: false
OpenSearch Dashboards
{: .label .label-yellow :}
If you analyze time-series data, you likely prioritize new data over old data. You might periodically perform certain operations on older indices, such as reducing replica count or deleting them.
If you analyze time-series data, you likely prioritize new data over old data. You might periodically perform certain operations on older indexes, such as reducing replica count or deleting them.
Index State Management (ISM) is a plugin that lets you automate these periodic, administrative operations by triggering them based on changes in the index age, index size, or number of documents. Using the ISM plugin, you can define *policies* that automatically handle index rollovers or deletions to fit your use case.
@ -39,8 +39,8 @@ You can use the visual editor or JSON editor to create policies. Compared to the
2. Choose **Create policy**.
3. Choose **Visual editor**.
4. In the **Policy info** section, enter a policy ID and an optional description.
5. In the **Error notification** section, set up an optional error notification that gets sent whenever a policy execution fails. For more information, see [Error notifications]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies#error-notifications).
6. In **ISM templates**, enter any ISM template patterns to automatically apply this policy to existing and future indices. For example, if you specify a template of `sample-index*`, the ISM plugin automatically applies this policy to any indices whose names start with `sample-index`.
5. In the **Error notification** section, set up an optional error notification that gets sent whenever a policy execution fails. For more information, see [Error notifications]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies#error-notifications). If you're using auto rollovers in your policy, we recommend setting up error notifications, which notify you of unexpectedly large indexes if rollovers fail.
6. In **ISM templates**, enter any ISM template patterns to automatically apply this policy to future indexes. For example, if you specify a template of `sample-index*`, the ISM plugin automatically applies this policy to any indexes whose names start with `sample-index`. Your pattern cannot contain any of the following characters: `:`, `"`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, and `<`.
7. In **States**, add any states you want to include in the policy. Each state has [actions]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/#actions) the plugin executes when the index enters a certain state, and [transitions]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/#transitions), which have conditions that, when met, transition the index into a destination state. The first state you create in a policy is automatically set as the initial state. Each policy must have at least one state, but actions and transitions are optional.
8. Choose **Create**.
@ -54,10 +54,10 @@ You can use the visual editor or JSON editor to create policies. Compared to the
5. In the **Define policy** section, enter your policy.
6. Choose **Create**.
After you create a policy, your next step is to attach it to an index or indices.
After you create a policy, your next step is to attach it to an index or indexes.
You can set up an `ism_template` in the policy so when an index that matches the ISM template pattern is created, the plugin automatically attaches the policy to the index.
The following example demonstrates how to create a policy that automatically gets attached to all indices whose names start with `index_name-`.
The following example demonstrates how to create a policy that automatically gets attached to all indexes whose names start with `index_name-`.
```json
PUT _plugins/_ism/policies/policy_id
@ -76,7 +76,7 @@ PUT _plugins/_ism/policies/policy_id
If you have more than one template that matches an index pattern, ISM uses the priority value to determine which template to apply.
For an example ISM template policy, see [Sample policy with ISM template]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies#sample-policy-with-ism-template).
For an example ISM template policy, see [Sample policy with ISM template for auto rollover]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies#sample-policy-with-ism-template-for-auto-rollover).
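For example, assuming two policies contain the following templates (the patterns and priorities are illustrative), an index named `log-dev-000001` matches both patterns, and ISM attaches the policy whose template has the higher priority (here, 100):
```json
"ism_template": {
  "index_patterns": ["log*"],
  "priority": 50
}
```
```json
"ism_template": {
  "index_patterns": ["log-dev*"],
  "priority": 100
}
```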
Older versions of the plugin include the `policy_id` in an index template, so when an index is created that matches the index template pattern, the index will have the policy attached to it:
@ -94,13 +94,13 @@ PUT _index_template/<template_name>
}
```
The `opendistro.index_state_management.policy_id` setting is deprecated. You can continue to automatically manage newly created indices with the ISM template field.
The `opendistro.index_state_management.policy_id` setting is deprecated. You can continue to automatically manage newly created indexes with the ISM template field.
{: .note }
### Step 2: Attach policies to indices
### Step 2: Attach policies to indexes
1. Choose **Indices**.
2. Choose the index or indices that you want to attach your policy to.
1. Choose **Indexes**.
2. Choose the index or indexes that you want to attach your policy to.
3. Choose **Apply policy**.
4. From the **Policy ID** menu, choose the policy that you created.
You can see a preview of your policy.
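If you prefer the API over OpenSearch Dashboards, the ISM plugin also lets you attach a policy with the add policy operation; a minimal sketch, assuming a policy ID of `policy_1` and an index named `index_1`:
```json
POST _plugins/_ism/add/index_1
{
  "policy_id": "policy_1"
}
```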
@ -112,13 +112,13 @@ After you attach a policy to an index, ISM creates a job that runs every 5 minut
ISM does not run jobs if the cluster state is red.
### Step 3: Manage indices
### Step 3: Manage indexes
1. Choose **Managed Indices**.
2. To change your policy, see [Change Policy]({{site.url}}{{site.baseurl}}/im-plugin/ism/managedindices#change-policy).
1. Choose **Managed indexes**.
2. To change your policy, see [Change Policy]({{site.url}}{{site.baseurl}}/im-plugin/ism/managedindexes#change-policy).
3. To attach a rollover alias to your index, select your policy and choose **Add rollover alias**.
Make sure that the alias that you enter already exists. For more information about the rollover operation, see [rollover]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies#rollover).
4. To remove a policy, choose your policy, and then choose **Remove policy**.
5. To retry a policy, choose your policy, and then choose **Retry policy**.
For information about managing your policies, see [Managed Indices]({{site.url}}{{site.baseurl}}/im-plugin/ism/managedindices/).
For information about managing your policies, see [Managed indexes]({{site.url}}{{site.baseurl}}/im-plugin/ism/managedindexes/).

View File

@ -10,7 +10,7 @@ has_children: false
Policies are JSON documents that define the following:
- The *states* that an index can be in, including the default state for new indices. For example, you might name your states "hot," "warm," "delete," and so on. For more information, see [States](#states).
- The *states* that an index can be in, including the default state for new indexes. For example, you might name your states "hot," "warm," "delete," and so on. For more information, see [States](#states).
- Any *actions* that you want the plugin to take when an index enters a state, such as performing a rollover. For more information, see [Actions](#actions).
- The conditions that must be met for an index to move into a new state, known as *transitions*. For example, if an index is more than eight weeks old, you might want to move it to the "delete" state. For more information, see [Transitions](#transitions).
@ -57,7 +57,9 @@ Field | Description | Type | Required
Actions are the steps that the policy sequentially executes on entering a specific state.
They are executed in the order in which they are defined.
ISM executes actions in the order in which they are defined. For example, if you define actions [A,B,C,D], ISM executes action A, and then goes into a sleep period based on the cluster setting `plugins.index_state_management.job_interval`. Once the sleep period ends, ISM continues to execute the remaining actions. However, if ISM cannot successfully execute action A, the operation ends, and actions B, C, and D do not get executed.
Optionally, you can define an action's timeout period, which, if exceeded, forcibly fails the action. For example, if timeout is set to `1d`, and ISM has not completed the action within one day, even after retries, the action fails.
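For example, the following sketch pairs a `rollover` action with an explicit `timeout` and a `retry` block; the values are illustrative:
```json
"actions": [
  {
    "timeout": "1d",
    "retry": {
      "count": 3,
      "backoff": "exponential",
      "delay": "10m"
    },
    "rollover": {
      "min_index_age": "7d"
    }
  }
]
```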
This table lists the parameters that you can define for an action.
@ -97,6 +99,7 @@ ISM supports the following operations:
- [read_only](#read_only)
- [read_write](#read_write)
- [replica_count](#replica_count)
- [shrink](#shrink)
- [close](#close)
- [open](#open)
- [delete](#delete)
@ -160,6 +163,59 @@ Parameter | Description | Type | Required
For information about setting replicas, see [Primary and replica shards]({{site.url}}{{site.baseurl}}/opensearch#primary-and-replica-shards).
### shrink
Allows you to reduce the number of primary shards in your indexes. With this action, you can specify:
- The number of primary shards that the target index should contain.
- A max shard size for the primary shards in the target index.
- A percentage by which to shrink the number of primary shards in the target index.
```json
"shrink": {
"num_new_shards": 1,
"target_index_name_template": {
"source": "{{ctx.index}}_shrunken"
},
"aliases": [
"my-alias": {}
],
"force_unsafe": false
}
```
Parameter | Description | Type | Example | Required
:--- | :--- | :--- | :--- | :---
`num_new_shards` | The maximum number of primary shards in the shrunken index. | integer | `5` | Yes, but it cannot be used with `max_shard_size` or `percentage_of_source_shards`
`max_shard_size` | The maximum size in bytes of a shard for the target index. | keyword | `5gb` | Yes, but it cannot be used with `num_new_shards` or `percentage_of_source_shards`
`percentage_of_source_shards` | The percentage of the number of original primary shards to shrink to. This parameter indicates the minimum percentage to use when shrinking the number of primary shards. Must be between 0.0 and 1.0, exclusive. | Percentage | `0.5` | Yes, but it cannot be used with `max_shard_size` or `num_new_shards`
`target_index_name_template` | The name of the shrunken index. Accepts strings and the Mustache variables `{{ctx.index}}` and `{{ctx.indexUuid}}`. | `string` or Mustache template | `{"source": "{{ctx.index}}_shrunken"}` | No
`aliases` | Aliases to add to the new index. Must be an array of [alias objects]({{site.url}}{{site.baseurl}}/opensearch/rest-api/alias/). | array of alias objects | `[{"my-alias": {}}]` | No
`force_unsafe` | If true, executes the shrink action even if there are no replicas. | boolean | `false` | No
If you want to add `aliases` to the action, the parameter must include an array of [alias objects]({{site.url}}{{site.baseurl}}/opensearch/rest-api/alias/). For example:
```json
"aliases": [
{
"my-alias": {}
},
{
"my-second-alias": {
"is_write_index": false,
"filter": {
"multi_match": {
"query": "QUEEN",
"fields": ["speaker", "text_entry"]
}
},
"index_routing" : "1",
"search_routing" : "1"
}
}
]
```
### close
Closes the managed index.
@ -170,7 +226,7 @@ Closes the managed index.
}
```
Closed indices remain on disk, but consume no CPU or memory. You can't read from, write to, or search closed indices.
Closed indexes remain on disk, but consume no CPU or memory. You can't read from, write to, or search closed indexes.
Closing an index is a good option if you need to retain data for longer than you need to actively search it and have sufficient disk space on your data nodes. If you need to search the data again, reopening a closed index is simpler than restoring an index from a snapshot.
@ -198,14 +254,17 @@ Deletes a managed index.
Rolls an alias over to a new index when the managed index meets one of the rollover conditions.
**Important**: ISM checks the conditions for operations on **every execution of the policy** based on the **set interval**, _not_ continuously. The rollover is performed if the value **has reached** or _exceeded_ the configured limit **when the check is performed**. For example, with `min_size` configured to a value of 100 GiB, ISM might check the index at 99 GiB and not perform the rollover. However, if the index has grown past the limit (for example, 105 GiB) by the next check, the operation is performed.
The index format must match the pattern `^.*-\d+$`. For example: `logs-000001`.
Set `index.plugins.index_state_management.rollover_alias` as the alias to roll over.
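A sketch of creating the first index in a rollover series with the alias configured; the index and alias names are illustrative:
```json
PUT logs-000001
{
  "settings": {
    "index.plugins.index_state_management.rollover_alias": "logs"
  },
  "aliases": {
    "logs": {
      "is_write_index": true
    }
  }
}
```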
Parameter | Description | Type | Example | Required
:--- | :--- | :--- | :--- | :---
`min_size` | The minimum size of the total primary shard storage (not counting replicas) required to roll over the index. For example, if you set `min_size` to 100 GiB and your index has 5 primary shards and 5 replica shards of 20 GiB each, the total size of the primaries is 100 GiB, so the rollover occurs. ISM doesn't check indices continually, so it doesn't roll over indices at exactly 100 GiB. Instead, if an index is continuously growing, ISM might check it at 99 GiB, not perform the rollover, check again when the shards reach 105 GiB, and then perform the operation. | `string` | `20gb` or `5mb` | No
`min_doc_count` | The minimum number of documents required to roll over the index. | `number` | `2000000` | No
`min_index_age` | The minimum age required to roll over the index. Index age is the time between its creation and the present. | `string` | `5d` or `7h` | No
`min_size` | The minimum size of the total primary shard storage (not counting replicas) required to roll over the index. For example, if you set `min_size` to 100 GiB and your index has 5 primary shards and 5 replica shards of 20 GiB each, the total size of all primary shards is 100 GiB, so the rollover occurs. See **Important** note above. | `string` | `20gb` or `5mb` | No
`min_primary_shard_size` | The minimum storage size of a **single primary shard** required to roll over the index. For example, if you set `min_primary_shard_size` to 30 GiB and **one of** the primary shards in the index has a size greater than the condition, the rollover occurs. See **Important** note above. | `string` | `20gb` or `5mb` | No
`min_doc_count` | The minimum number of documents required to roll over the index. See **Important** note above. | `number` | `2000000` | No
`min_index_age` | The minimum age required to roll over the index. Index age is the time between its creation and the present. See **Important** note above. | `string` | `5d` or `7h` | No
```json
{
@ -215,6 +274,14 @@ Parameter | Description | Type | Example | Required
}
```
```json
{
"rollover": {
"min_primary_shard_size": "30gb"
}
}
```
```json
{
"rollover": {
@ -307,27 +374,27 @@ Parameter | Description | Type
### snapshot
Backup your clusters indices and state. For more information about snapshots, see [Take and restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshot-restore/).
Back up your cluster's indexes and state. For more information about snapshots, see [Take and restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshot-restore/).
The `snapshot` operation has the following parameters:
Parameter | Description | Type | Required | Default
:--- | :--- | :--- | :--- | :---
`repository` | The repository name that you register through the native snapshot API operations. | `string` | Yes | -
`snapshot` | The name of the snapshot. | `string` | Yes | -
`snapshot` | The name of the snapshot. Accepts strings and the Mustache variables `{{ctx.index}}` and `{{ctx.indexUuid}}`. If the Mustache variables are invalid, then the snapshot name defaults to the index's name. | `string` or Mustache template | Yes | -
```json
{
"snapshot": {
"repository": "my_backup",
"snapshot": "my_snapshot"
"snapshot": "{{ctx.indexUuid}}"
}
}
```
### index_priority
Set the priority for the index in a specific state. Unallocated shards of indices are recovered in the order of their priority, whenever possible. The indices with higher priority values are recovered first followed by the indices with lower priority values.
Set the priority for the index in a specific state. Unallocated shards of indexes are recovered in the order of their priority, whenever possible. The indexes with higher priority values are recovered first, followed by the indexes with lower priority values.
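A minimal sketch of the action; the priority value is illustrative:
```json
{
  "index_priority": {
    "priority": 50
  }
}
```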
The `index_priority` operation has the following parameter:
@ -375,7 +442,7 @@ Parameter | Description | Type | Required
Transitions define the conditions that need to be met for a state to change. After all actions in the current state are completed, the policy starts checking the conditions for transitions.
Transitions are evaluated in the order in which they are defined. For example, if the conditions for the first transition are met, then this transition takes place and the rest of the transitions are dismissed.
ISM evaluates transitions in the order in which they are defined. For example, if you define transitions [A,B,C,D], ISM iterates through the list until it finds a transition that evaluates to `true`. It then stops and sets the next state to the one defined in that transition. On its next execution, ISM dismisses the rest of the transitions and starts in that new state.
If you don't specify any conditions in a transition and leave it empty, then it's assumed to be the equivalent of always true. This means that the policy transitions the index to this state the moment it checks.
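For example, the following sketch always transitions the index to a hypothetical `delete` state as soon as the policy checks it, because no conditions are specified:
```json
"transitions": [
  {
    "state_name": "delete"
  }
]
```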
@ -392,7 +459,7 @@ Parameter | Description | Type | Required
:--- | :--- |:--- |:--- |
`min_index_age` | The minimum age of the index required to transition. | `string` | No
`min_doc_count` | The minimum document count of the index required to transition. | `number` | No
`min_size` | The minimum size of the index required to transition. | `string` | No
`min_size` | The minimum size of the total primary shard storage (not counting replicas) required to transition. For example, if you set `min_size` to 100 GiB and your index has 5 primary shards and 5 replica shards of 20 GiB each, the total size of all primary shards is 100 GiB, so your index is transitioned to the next state. | `string` | No
`cron` | The `cron` job that triggers the transition if no other transition happens first. | `object` | No
`cron.cron.expression` | The `cron` expression that triggers the transition. | `string` | Yes
`cron.cron.timezone` | The timezone that triggers the transition. | `string` | Yes
@ -412,7 +479,7 @@ The following example transitions the index to a `cold` state after a period of
ISM checks the conditions on every execution of the policy based on the set interval.
This example uses the `cron` condition to transition indices every Saturday at 5:00 PT:
This example uses the `cron` condition to transition indexes every Saturday at 5:00 PT:
```json
"transitions": [
@ -533,10 +600,12 @@ The destination system **must** return a response otherwise the `error_notificat
You can use the same options for `ctx` variables as the [notification](#notification) operation.
## Sample policy with ISM template
## Sample policy with ISM template for auto rollover
The following sample template policy is for a rollover use case.
If you want to skip rollovers for an index, set `index.plugins.index_state_management.rollover_skip` to `true` in the settings of that index.
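For example, a sketch of setting this on a single index; the index name is illustrative:
```json
PUT log-000001/_settings
{
  "index.plugins.index_state_management.rollover_skip": true
}
```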
1. Create a policy with an `ism_template` field:
```json
@ -612,7 +681,7 @@ The following sample template policy is for a rollover use case.
## Example policy
The following example policy implements a `hot`, `warm`, and `delete` workflow. You can use this policy as a template to prioritize resources to your indices based on their levels of activity.
The following example policy implements a `hot`, `warm`, and `delete` workflow. You can use this policy as a template to prioritize resources to your indexes based on their levels of activity.
In this case, an index is initially in a `hot` state. After a day, it changes to a `warm` state, where the number of replicas increases to 5 to improve the read performance.
@ -630,7 +699,8 @@ After 30 days, the policy moves this index into a `delete` state. The service se
"actions": [
{
"rollover": {
"min_index_age": "1d"
"min_index_age": "1d",
"min_primary_shard_size": "30gb"
}
}
],
@ -678,7 +748,11 @@ After 30 days, the policy moves this index into a `delete` state. The service se
}
]
}
]
],
"ism_template": {
"index_patterns": ["log*"],
"priority": 100
}
}
}
```

View File

@ -17,6 +17,7 @@ Setting | Default | Description
:--- | :--- | :---
`plugins.index_state_management.enabled` | True | Specifies whether ISM is enabled or not.
`plugins.index_state_management.job_interval` | 5 minutes | The interval at which the managed index jobs are run.
`plugins.index_state_management.jitter` | 0.6 | A randomized delay added to a job's base run time to prevent a surge of activity from all indices at the same time. A value of 0.6 means a delay of 0-60% of the job interval is added to the base interval. For example, with a base interval of 30 minutes, a value of 0.6 adds anywhere from 0 to 18 minutes to the job interval. The maximum is 1, which adds up to 100% of the interval. This maximum cannot exceed `plugins.jobscheduler.jitter_limit`, which also defaults to 0.6. For example, if `plugins.index_state_management.jitter` is set to 0.8, ISM uses the `plugins.jobscheduler.jitter_limit` of 0.6 instead. See the example following this table for updating this setting dynamically.
`plugins.index_state_management.coordinator.sweep_period` | 10 minutes | How often the routine background sweep is run.
`plugins.index_state_management.coordinator.backoff_millis` | 50 milliseconds | The backoff time between retries for failures in the `ManagedIndexCoordinator` (such as when we update managed indices).
`plugins.index_state_management.coordinator.backoff_count` | 2 | The count of retries for failures in the `ManagedIndexCoordinator`.
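The ISM settings above can generally be updated through the cluster settings API; a minimal sketch (values are illustrative):
```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.index_state_management.job_interval": 10,
    "plugins.index_state_management.jitter": 0.5
  }
}
```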

View File

@ -29,8 +29,8 @@
</svg></a>
<p class="copyright">&copy; {{ 'now' | date: "%Y" }}
<a href="https://aws.amazon.com/"> Amazon Web Services</a> and individual contributors. OpenSearch is a
<a href="/trademark-usage.html">registered trademark</a> of Amazon Web Services.</a> <br /><br />
<a href="https://opensearch.org/"> OpenSearch</a> contributors. OpenSearch is a
<a href="/trademark-usage.html">registered trademark</a> of <a href="https://aws.amazon.com/"> Amazon Web Services</a>. <br /><br />
© 2005-2021
<a href="https://www.djangoproject.com/foundation/"> Django Software

View File

@ -57,7 +57,6 @@ layout: table_wrappers
</a>
</div>
<nav role="navigation" aria-label="Main" id="site-nav" class="site-nav">
{% assign past_versions = site.data.versions.past | join: ";" %}
<div class="version-wrapper">
<version-selector selected="{{ site.data.versions.current }}"></version-selector>
</div>
@ -105,13 +104,24 @@ layout: table_wrappers
<div class="copy-banner">
<div class="container">
<h1><a href="#">Documentation</a></h1>
{% if site.search_enabled != false %}
{% if site.search_enabled != false or site.use_custom_search == true %}
<div class="search">
<div class="search-input-wrap">
<input type="text" id="search-input" class="search-input" tabindex="0" placeholder="Search..." aria-label="Search {{ site.title }}" autocomplete="off">
{% assign docs_version = site.data.versions.current %}
{% if site.baseurl == '/docs/latest' %}
{% assign docs_version = "latest" %}
{% endif %}
<input type="text" id="search-input" class="search-input"
tabindex="0" placeholder="Search..." aria-label="Search {{ site.title }}"
data-docs-version="{{ docs_version }}" autocomplete="off">
<div class="search-spinner"><i></i></div>
<label for="search-input" class="search-label"><svg viewBox="0 0 24 24" class="search-icon"><use xlink:href="#svg-search"></use></svg></label>
</div>
{% if site.search_enabled != false %}
<div id="search-results" class="search-results"></div>
{% elsif site.use_custom_search == true %}
<div id="search-results" class="search-results custom-search-results"></div>
{% endif %}
</div>
{% endif %}
</div>
@ -213,5 +223,8 @@ layout: table_wrappers
</script>
{% endif %}
<script src="{{ '/assets/js/header-nav.js' | relative_url }}"></script>
{% if site.search_enabled == false and site.use_custom_search == true %}
<script src="{{ '/assets/js/search.js' | relative_url }}"></script>
{% endif %}
</body>
</html>

736
_ml-commons-plugin/api.md Normal file
View File

@ -0,0 +1,736 @@
---
layout: default
title: API
has_children: false
nav_order: 99
---
# ML Commons API
---
#### Table of contents
- TOC
{:toc}
---
The Machine Learning (ML) Commons API lets you train ML algorithms synchronously and asynchronously, make predictions with a trained model, and train and predict with the same data set.
To train a model through the API, three inputs are required.
- Algorithm name: Must be one of the [FunctionName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java) values. This determines which algorithm the ML engine runs. To add a new function, see [How To Add a New Function](https://github.com/opensearch-project/ml-commons/blob/main/docs/how-to-add-new-function.md).
- Model hyperparameters: Adjust these parameters to improve model training.
- Input data: The data that trains the ML model or that the model uses for predictions. You can provide data in two ways: query against your index or use a data frame.
## Train model
Training can occur both synchronously and asynchronously.
### Request
The following examples use the kmeans algorithm to train index data.
**Train with kmeans synchronously**
```json
POST /_plugins/_ml/_train/kmeans
{
"parameters": {
"centroids": 3,
"iterations": 10,
"distance_type": "COSINE"
},
"input_query": {
"_source": ["petal_length_in_cm", "petal_width_in_cm"],
"size": 10000
},
"input_index": [
"iris_data"
]
}
```
**Train with kmeans asynchronously**
```json
POST /_plugins/_ml/_train/kmeans?async=true
{
"parameters": {
"centroids": 3,
"iterations": 10,
"distance_type": "COSINE"
},
"input_query": {
"_source": ["petal_length_in_cm", "petal_width_in_cm"],
"size": 10000
},
"input_index": [
"iris_data"
]
}
```
### Response
**Synchronously**
For synchronous responses, the API returns the model_id, which can be used to get or delete a model.
```json
{
"model_id" : "lblVmX8BO5w8y8RaYYvN",
"status" : "COMPLETED"
}
```
**Asynchronously**
For asynchronous responses, the API returns the task_id, which can be used to get or delete a task.
```json
{
"task_id" : "lrlamX8BO5w8y8Ra2otd",
"status" : "CREATED"
}
```
## Get model information
You can retrieve information on your model using the model_id.
```json
GET /_plugins/_ml/models/<model-id>
```
The API returns information on the model, the algorithm used, and the content found within the model.
```json
{
"name" : "KMEANS",
"algorithm" : "KMEANS",
"version" : 1,
"content" : ""
}
```
## Search model
Use this command to search for models you've already created.
```json
POST /_plugins/_ml/models/_search
{query}
```
### Example: Query all models
```json
POST /_plugins/_ml/models/_search
{
"query": {
"match_all": {}
},
"size": 1000
}
```
### Example: Query models with algorithm "FIT_RCF"
```json
POST /_plugins/_ml/models/_search
{
"query": {
"term": {
"algorithm": {
"value": "FIT_RCF"
}
}
}
}
```
### Response
```json
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.4159138,
"hits" : [
{
"_index" : ".plugins-ml-model",
"_id" : "-QkKJX8BvytMh9aUeuLD",
"_version" : 1,
"_seq_no" : 12,
"_primary_term" : 15,
"_score" : 2.4159138,
"_source" : {
"name" : "FIT_RCF",
"version" : 1,
"content" : "xxx",
"algorithm" : "FIT_RCF"
}
},
{
"_index" : ".plugins-ml-model",
"_id" : "OxkvHn8BNJ65KnIpck8x",
"_version" : 1,
"_seq_no" : 2,
"_primary_term" : 8,
"_score" : 2.4159138,
"_source" : {
"name" : "FIT_RCF",
"version" : 1,
"content" : "xxx",
"algorithm" : "FIT_RCF"
}
}
]
}
}
```
## Delete model
Deletes a model based on the `model_id`.
```json
DELETE /_plugins/_ml/models/<model_id>
```
The API returns the following:
```json
{
"_index" : ".plugins-ml-model",
"_id" : "MzcIJX8BA7mbufL6DOwl",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 27,
"_primary_term" : 18
}
```
## Predict
ML commons can predict new data with your trained model either from indexed data or a data frame. The model_id is required to use the Predict API.
```json
POST /_plugins/_ml/_predict/<algorithm_name>/<model_id>
```
### Request
```json
POST /_plugins/_ml/_predict/kmeans/<model-id>
{
"input_query": {
"_source": ["petal_length_in_cm", "petal_width_in_cm"],
"size": 10000
},
"input_index": [
"iris_data"
]
}
```
### Response
```json
{
"status" : "COMPLETED",
"prediction_result" : {
"column_metas" : [
{
"name" : "ClusterID",
"column_type" : "INTEGER"
}
],
"rows" : [
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 1
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 1
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 0
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 0
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 0
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 0
}
]
}
]
}
```
## Train and predict
Use this operation to train a model and immediately predict against the same training data set. It can only be used with unsupervised learning models and the following algorithms:
- BATCH_RCF
- FIT_RCF
- kmeans
### Example: Train and predict with indexed data
```json
POST /_plugins/_ml/_train_predict/kmeans
{
"parameters": {
"centroids": 2,
"iterations": 10,
"distance_type": "COSINE"
},
"input_query": {
"query": {
"bool": {
"filter": [
{
"range": {
"k1": {
"gte": 0
}
}
}
]
}
},
"size": 10
},
"input_index": [
"test_data"
]
}
```
### Example: Train and predict with data directly
```json
POST /_plugins/_ml/_train_predict/kmeans
{
"parameters": {
"centroids": 2,
"iterations": 1,
"distance_type": "EUCLIDEAN"
},
"input_data": {
"column_metas": [
{
"name": "k1",
"column_type": "DOUBLE"
},
{
"name": "k2",
"column_type": "DOUBLE"
}
],
"rows": [
{
"values": [
{
"column_type": "DOUBLE",
"value": 1.00
},
{
"column_type": "DOUBLE",
"value": 2.00
}
]
},
{
"values": [
{
"column_type": "DOUBLE",
"value": 1.00
},
{
"column_type": "DOUBLE",
"value": 4.00
}
]
},
{
"values": [
{
"column_type": "DOUBLE",
"value": 1.00
},
{
"column_type": "DOUBLE",
"value": 0.00
}
]
},
{
"values": [
{
"column_type": "DOUBLE",
"value": 10.00
},
{
"column_type": "DOUBLE",
"value": 2.00
}
]
},
{
"values": [
{
"column_type": "DOUBLE",
"value": 10.00
},
{
"column_type": "DOUBLE",
"value": 4.00
}
]
},
{
"values": [
{
"column_type": "DOUBLE",
"value": 10.00
},
{
"column_type": "DOUBLE",
"value": 0.00
}
]
}
]
}
}
```
### Response
```json
{
"status" : "COMPLETED",
"prediction_result" : {
"column_metas" : [
{
"name" : "ClusterID",
"column_type" : "INTEGER"
}
],
"rows" : [
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 1
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 1
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 1
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 0
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 0
}
]
},
{
"values" : [
{
"column_type" : "INTEGER",
"value" : 0
}
]
}
]
}
}
```
## Get task information
You can retrieve information about a task using the task_id.
```json
GET /_plugins/_ml/tasks/<task_id>
```
The response includes information about the task.
```json
{
"model_id" : "l7lamX8BO5w8y8Ra2oty",
"task_type" : "TRAINING",
"function_name" : "KMEANS",
"state" : "COMPLETED",
"input_type" : "SEARCH_QUERY",
"worker_node" : "54xOe0w8Qjyze00UuLDfdA",
"create_time" : 1647545342556,
"last_update_time" : 1647545342587,
"is_async" : true
}
```
## Search task
Search tasks based on parameters indicated in the request body.
```json
GET /_plugins/_ml/tasks/_search
{query body}
```
### Example: Search for tasks whose `function_name` is `KMEANS`
```json
GET /_plugins/_ml/tasks/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"function_name": "KMEANS"
}
}
]
}
}
}
```
### Response
```json
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : ".plugins-ml-task",
"_id" : "_wnLJ38BvytMh9aUi-Ia",
"_version" : 4,
"_seq_no" : 29,
"_primary_term" : 4,
"_score" : 0.0,
"_source" : {
"last_update_time" : 1645640125267,
"create_time" : 1645640125209,
"is_async" : true,
"function_name" : "KMEANS",
"input_type" : "SEARCH_QUERY",
"worker_node" : "jjqFrlW7QWmni1tRnb_7Dg",
"state" : "COMPLETED",
"model_id" : "AAnLJ38BvytMh9aUi-M2",
"task_type" : "TRAINING"
}
},
{
"_index" : ".plugins-ml-task",
"_id" : "wwRRLX8BydmmU1x6I-AI",
"_version" : 3,
"_seq_no" : 38,
"_primary_term" : 7,
"_score" : 0.0,
"_source" : {
"last_update_time" : 1645732766656,
"create_time" : 1645732766472,
"is_async" : true,
"function_name" : "KMEANS",
"input_type" : "SEARCH_QUERY",
"worker_node" : "A_IiqoloTDK01uZvCjREaA",
"state" : "COMPLETED",
"model_id" : "xARRLX8BydmmU1x6I-CG",
"task_type" : "TRAINING"
}
}
]
}
}
```
## Delete task
Delete a task based on the task_id.
```json
DELETE /_plugins/_ml/tasks/{task_id}
```
The API returns the following:
```json
{
"_index" : ".plugins-ml-task",
"_id" : "xQRYLX8BydmmU1x6nuD3",
"_version" : 4,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 42,
"_primary_term" : 7
}
```
## Stats
Get statistics related to the number of tasks.
To receive all stats, use:
```json
GET /_plugins/_ml/stats
```
To receive stats for a specific node, use:
```json
GET /_plugins/_ml/<nodeId>/stats/
```
To receive stats for a specific node and return a specified stat, use:
```json
GET /_plugins/_ml/<nodeId>/stats/<stat>
```
To receive information on a specific stat from all nodes, use:
```json
GET /_plugins/_ml/stats/<stat>
```
### Example: Get all stats
```json
GET /_plugins/_ml/stats
```
### Response
```json
{
"zbduvgCCSOeu6cfbQhTpnQ" : {
"ml_executing_task_count" : 0
},
"54xOe0w8Qjyze00UuLDfdA" : {
"ml_executing_task_count" : 0
},
"UJiykI7bTKiCpR-rqLYHyw" : {
"ml_executing_task_count" : 0
},
"zj2_NgIbTP-StNlGZJlxdg" : {
"ml_executing_task_count" : 0
},
"jjqFrlW7QWmni1tRnb_7Dg" : {
"ml_executing_task_count" : 0
},
"3pSSjl5PSVqzv5-hBdFqyA" : {
"ml_executing_task_count" : 0
},
"A_IiqoloTDK01uZvCjREaA" : {
"ml_executing_task_count" : 0
}
}
```

View File

@ -0,0 +1,25 @@
---
layout: default
title: About ML Commons
nav_order: 1
has_children: false
has_toc: false
---
# ML Commons plugin
ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitor ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features.
Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands.
Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-model) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely.
If you don't want to save a model, you can use the [Train and Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-and-predict) API to train and predict in a single call without having to evaluate the model's performance.
## Permissions
There are two reserved user roles that can use the ML Commons plugin.
- `ml_full_access`: Full access to all ML features, including starting new ML tasks and reading or deleting models.
- `ml_readonly_access`: Can only read ML tasks, trained models, and statistics relevant to the model's cluster. Cannot start or delete ML tasks or models.
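A sketch of mapping an existing user to one of these roles through the Security plugin's role mapping API; the user name `ml_user1` is hypothetical:
```json
PUT _plugins/_security/api/rolesmapping/ml_full_access
{
  "users": ["ml_user1"]
}
```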

View File

@ -24,7 +24,7 @@ Introduced 1.0
Creates an anomaly detector.
This command creates a single-entity detector named `test-detector` that finds anomalies based on the sum of the `value` field:
This command creates a single-entity detector named `test-detector` that finds anomalies based on the sum of the `value` field and stores the result in a custom `opensearch-ad-plugin-result-test` index:
#### Request
@ -76,7 +76,8 @@ POST _plugins/_anomaly_detection/detectors
"interval": 1,
"unit": "Minutes"
}
}
},
"result_index" : "opensearch-ad-plugin-result-test"
}
```
@ -321,7 +322,7 @@ You can specify the following options.
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`name` | The name of the detector. | `string` | Yes
`description` | A description of the detector. | `string` | Yes
`description` | A description of the detector. | `string` | No
`time_field` | The name of the time field. | `string` | Yes
`indices` | A list of indices to use as the data source. | `list` | Yes
`feature_attributes` | Specify a `feature_name`, set the `enabled` parameter to `true`, and specify an aggregation query. | `list` | Yes
@ -332,6 +333,203 @@ Options | Description | Type | Required
---
## Validate detector
Introduced 1.2
{: .label .label-purple }
Returns whether the detector configuration has any issues that might prevent OpenSearch from creating the detector.
You can use the validate detector API operation to identify issues in your detector configuration before creating the detector.
The request body consists of the detector configuration and follows the same format as the request body of the [create detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#create-anomaly-detector).
You have the following validation options:
- Only validate against the detector configuration and find any issues that would completely block detector creation:
```
POST _plugins/_anomaly_detection/detectors/_validate
POST _plugins/_anomaly_detection/detectors/_validate/detector
```
- Validate against the source data to see how likely the detector is to complete model training.
```
POST _plugins/_anomaly_detection/detectors/_validate/model
```
Responses from this API operation return either blocking issues as detector type responses or a response indicating a field that could be revised to increase the likelihood of model training completing successfully. Model type issues don't need to be fixed for detector creation to succeed, but the detector would likely not train successfully if they aren't addressed.
#### Request
```json
POST _plugins/_anomaly_detection/detectors/_validate
POST _plugins/_anomaly_detection/detectors/_validate/detector
{
"name": "test-detector",
"description": "Test detector",
"time_field": "timestamp",
"indices": [
"server_log*"
],
"feature_attributes": [
{
"feature_name": "test",
"feature_enabled": true,
"aggregation_query": {
"test": {
"sum": {
"field": "value"
}
}
}
}
],
"filter_query": {
"bool": {
"filter": [
{
"range": {
"value": {
"gt": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"detection_interval": {
"period": {
"interval": 1,
"unit": "Minutes"
}
},
"window_delay": {
"period": {
"interval": 1,
"unit": "Minutes"
}
}
}
```
If the validate detector API doesn't find any issues in the detector configuration, it returns an empty response:
#### Sample response
```json
{}
```
If the validate detector API finds an issue, it returns a message explaining what's wrong with the configuration. In this example, the feature query aggregates over a field that doesn't exist in the data source:
#### Sample response
```json
{
"detector": {
"feature_attributes": {
"message": "Feature has invalid query returning empty aggregated data: average_total_rev",
"sub_issues": {
"average_total_rev": "Feature has invalid query returning empty aggregated data"
}
}
}
}
```
The following request validates against the source data to see if model training might succeed. In this example, the data is ingested at a rate of every 5 minutes, and the detector interval is set to 1 minute.
```json
POST _plugins/_anomaly_detection/detectors/_validate/model
{
"name": "test-detector",
"description": "Test detector",
"time_field": "timestamp",
"indices": [
"server_log*"
],
"feature_attributes": [
{
"feature_name": "test",
"feature_enabled": true,
"aggregation_query": {
"test": {
"sum": {
"field": "value"
}
}
}
}
],
"filter_query": {
"bool": {
"filter": [
{
"range": {
"value": {
"gt": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"detection_interval": {
"period": {
"interval": 1,
"unit": "Minutes"
}
},
"window_delay": {
"period": {
"interval": 1,
"unit": "Minutes"
}
}
}
```
If the validate detector API finds areas of improvement with your configuration, it returns a response with suggestions about how you can change your configuration to improve model training.
#### Sample responses
In this example, the validate detector API returns a response indicating that changing the detector interval length to at least four minutes can increase the chances of successful model training.
```json
{
"model": {
"detection_interval": {
"message": "The selected detector interval might collect sparse data. Consider changing interval length to: 4",
"suggested_value": {
"period": {
"interval": 4,
"unit": "Minutes"
}
}
}
}
}
```
Another response might indicate that you can change `filter_query` (data filter) because the currently filtered data is too sparse for the model to train correctly, which can happen because the index is also ingesting data that falls outside the chosen filter. Using another `filter_query` can make your data more dense.
```json
{
"model": {
"filter_query": {
"message": "Data is too sparse after data filter is applied. Consider changing the data filter"
}
}
}
```
---
## Get detector
Introduced 1.0
{: .label .label-purple }
@ -1004,7 +1202,6 @@ DELETE _plugins/_anomaly_detection/detectors/<detectorId>
```json
{
"_index": ".opensearch-anomaly-detectors",
"_type": "_doc",
"_id": "70TxTXwBjd8s6RK4j1Pj",
"_version": 2,
"result": "deleted",
@ -1594,7 +1791,6 @@ POST _plugins/_anomaly_detection/detectors/_search
"hits": [
{
"_index": ".opensearch-anomaly-detectors",
"_type": "_doc",
"_id": "Zi5zTXwBwf_U8gjUTfJG",
"_version": 1,
"_seq_no": 1,
@ -1726,7 +1922,6 @@ POST _plugins/_anomaly_detection/detectors/tasks/_search
"hits": [
{
"_index": ".opensearch-anomaly-detection-state",
"_type": "_doc",
"_id": "fm-RTXwBYwCbWecgB753",
"_version": 34,
"_seq_no": 928,
@ -1974,20 +2169,46 @@ Introduced 1.0
Returns all results for a search query.
To search anomaly results for `grade` greater than 0 for real-time analysis:
You have the following search options:
- To search only the default result index, simply use the search API:
```json
POST _plugins/_anomaly_detection/detectors/results/_search/
```
- To search both the custom result index and default result index, you can either add the custom result index to the search API:
```json
POST _plugins/_anomaly_detection/detectors/results/_search/<custom_result_index>
```
Or, add the custom result index and set the `only_query_custom_result_index` parameter to `false`:
```json
POST _plugins/_anomaly_detection/detectors/results/_search/<custom_result_index>?only_query_custom_result_index=false
```
- To search only the custom result index, add the custom result index to the search API and set the `only_query_custom_result_index` parameter to `true`:
```json
POST _plugins/_anomaly_detection/detectors/results/_search/<custom_result_index>?only_query_custom_result_index=true
```
The following example searches anomaly results for grade greater than 0 for real-time analysis:
#### Request
```json
GET _plugins/_anomaly_detection/detectors/results/_search
POST _plugins/_anomaly_detection/detectors/results/_search
GET _plugins/_anomaly_detection/detectors/results/_search/opensearch-ad-plugin-result-test
POST _plugins/_anomaly_detection/detectors/results/_search/opensearch-ad-plugin-result-test
{
"query": {
"bool": {
"filter": [
{
"term": {
"detector_id": "Zi5zTXwBwf_U8gjUTfJG"
"detector_id": "EWy02nwBm38sXcF2AiFJ"
}
},
{
@ -2010,8 +2231,14 @@ POST _plugins/_anomaly_detection/detectors/results/_search
}
```
If you specify the custom result index, as in this example, the search results API searches both the default result indices and custom result indices.
If you don't specify the custom result index and you just use the `_plugins/_anomaly_detection/detectors/results/_search` URL, the anomaly detection plugin searches only the default result indices.
Real-time detection doesn't persist the task ID in the anomaly result, so the task ID will be null.
For information about the response body fields, see [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping/#response-body-fields).
#### Sample response
```json
@ -2033,16 +2260,15 @@ Real-time detection doesn't persist the task ID in the anomaly result, so the ta
"hits": [
{
"_index": ".opensearch-anomaly-results-history-2021.10.04-1",
"_type": "_doc",
"_id": "686KTXwB6HknB84SMr6G",
"_version": 1,
"_seq_no": 103622,
"_primary_term": 1,
"_score": 0,
"_source": {
"detector_id": "Zi5zTXwBwf_U8gjUTfJG",
"detector_id": "EWy02nwBm38sXcF2AiFJ",
"confidence": 0.918886275269358,
"model_id": "Zi5zTXwBwf_U8gjUTfJG_entity_error16",
"model_id": "EWy02nwBm38sXcF2AiFJ_entity_error16",
"schema_version": 4,
"anomaly_score": 1.1093755891885446,
"execution_start_time": 1633388475001,
@ -2055,6 +2281,23 @@ Real-time detection doesn't persist the task ID in the anomaly result, so the ta
"data": 0.532
}
],
"relevant_attribution": [
{
"feature_id": "ZS5zTXwBwf_U8gjUTfIn",
"data": 1.0
}
],
"expected_values": [
{
"likelihood": 1,
"value_list": [
{
"feature_id": "ZS5zTXwBwf_U8gjUTfIn",
"data": 2
}
]
}
],
"execution_end_time": 1633388475014,
"user": {
"backend_roles": [
@ -2142,7 +2385,6 @@ POST _plugins/_anomaly_detection/detectors/results/_search
"hits": [
{
"_index": ".opensearch-anomaly-results-history-2021.10.04-1",
"_type": "_doc",
"_id": "VRyRTXwBDx7vzPBV8jYC",
"_version": 1,
"_seq_no": 149657,
@ -2195,6 +2437,73 @@ POST _plugins/_anomaly_detection/detectors/results/_search
---
## Search top anomalies
Introduced 1.2
{: .label .label-purple }
Returns the top anomaly results for a high-cardinality detector, bucketed by categorical field values.
You can pass a `historical` boolean parameter to specify whether you want to analyze real-time or historical results.
#### Request
```json
GET _plugins/_anomaly_detection/detectors/<detectorId>/results/_topAnomalies?historical=false
{
"size": 3,
"category_field": [
"ip"
],
"order": "severity",
"task_id": "example-task-id",
"start_time_ms": 123456789000,
"end_time_ms": 987654321000
}
```
#### Sample response
```json
{
"buckets": [
{
"key": {
"ip": "1.2.3.4"
},
"doc_count": 10,
"max_anomaly_grade": 0.8
},
{
"key": {
"ip": "5.6.7.8"
},
"doc_count": 12,
"max_anomaly_grade": 0.6
},
{
"key": {
"ip": "9.10.11.12"
},
"doc_count": 3,
"max_anomaly_grade": 0.5
}
]
}
```
You can specify the following options.
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`size` | Specify the number of top buckets that you want to see. Default is 10. The maximum number is 10,000. | `integer` | No
`category_field` | Specify the set of category fields that you want to aggregate on. Defaults to all category fields for the detector. | `list` | No
`order` | Specify `severity` (anomaly grade) or `occurrence` (number of anomalies). Default is `severity`. | `string` | No
`task_id` | Specify a historical task ID to see results only from that specific task. Use only when `historical=true`; otherwise, the anomaly detection plugin ignores this parameter. | `string` | No
`start_time_ms` | Specify the time to start analyzing results, in Epoch milliseconds. | `long` | Yes
`end_time_ms` | Specify the time to end analyzing results, in Epoch milliseconds. | `long` | Yes
---
## Get detector stats
Introduced 1.0
{: .label .label-purple }
@ -2907,6 +3216,10 @@ Introduced 1.1
Deletes the results of a detector based on a query.
The delete detector results API only deletes anomaly result documents in the default result index. It doesn't support deleting anomaly result documents stored in any custom result indices.
You need to manually delete anomaly result documents that you don't need from custom result indices.
#### Request
```json

View File

@ -25,20 +25,41 @@ To first test with sample streaming data, you can try out one of the preconfigur
A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
1. Choose **Create detector**.
1. Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
1. For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indices.
1. (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
1. Select the **Timestamp field** in your index.
1. For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
- The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
The shorter you set this interval, the fewer data points the detector aggregates.
The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals.
- We recommend setting the detector interval based on your actual data. If it's too long it might delay the results, and if it's too short it might miss some data. It also won't have a sufficient number of consecutive data points for the shingle process.
1. (Optional) To add extra processing time for data collection, specify a **Window delay** value. This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay.
Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute.
Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00.
Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Add in the detector details.
- Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
1. Specify the data source.
- For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indices.
- (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
1. Specify a timestamp.
- Select the **Timestamp field** in your index.
1. Define operation settings.
- For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
- The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
The shorter you set this interval, the fewer data points the detector aggregates.
The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals.
- We recommend setting the detector interval based on your actual data. If it's too long it might delay the results, and if it's too short it might miss some data. It also won't have a sufficient number of consecutive data points for the shingle process.
- (Optional) To add extra processing time for data collection, specify a **Window delay** value.
- This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Specify custom result index.
- If you want to store the anomaly detection results in your own index, choose **Enable custom result index** and specify the custom index to store the result. The anomaly detection plugin adds an `opensearch-ad-plugin-result-` prefix to the index name that you input. For example, if you input `abc` as the result index name, the final index name is `opensearch-ad-plugin-result-abc`.
You can use the dash “-” sign to separate the namespace to manage custom result index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the result index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area.
{: .note }
- If the custom index you specify doesn't already exist, the anomaly detection plugin creates this index when you create the detector and start your real-time or historical analysis.
- If the custom index already exists, the plugin checks whether the index mapping of the custom index matches the anomaly result file. Make sure the custom index has a valid mapping, as shown here: [anomaly-results.json](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json).
- To use the custom result index option, you need the following permissions (a role sketch granting them on a result index pattern follows these steps):
- `indices:admin/create` - If the custom index already exists, you don't need this.
- `indices:data/write/index` - You need the `write` permission for the anomaly detection plugin to write results into the custom index for a single-entity detector.
- `indices:data/read/search` - You need the `search` permission because the anomaly detection plugin needs to search custom result indices to show results on the anomaly detection UI.
- `indices:data/write/delete` - Because the detector might generate a large number of anomaly results, you need the `delete` permission to delete old data and save disk space.
- `indices:data/write/bulk*` - You need the `bulk*` permission because the anomaly detection plugin uses the bulk API to write results into the custom index.
- Managing the custom result index:
- The anomaly detection dashboard queries all detectors' results from all custom result indices. Having too many custom result indices might impact the performance of the anomaly detection plugin.
- You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to rollover old result indices. You can also manually delete or archive any old result indices. We recommend reusing a custom result index for multiple detectors.
1. Choose **Next**.
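The following sketch shows a Security plugin role that grants the permissions listed in the custom result index step, scoped to a namespaced result index pattern; the role name and pattern are illustrative:
```json
PUT _plugins/_security/api/roles/ad_result_financial_us
{
  "index_permissions": [
    {
      "index_patterns": ["opensearch-ad-plugin-result-financial-us-*"],
      "allowed_actions": [
        "indices:admin/create",
        "indices:data/write/index",
        "indices:data/read/search",
        "indices:data/write/delete",
        "indices:data/write/bulk*"
      ]
    }
  ]
}
```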
After you define the detector, the next step is to configure the model.
@ -110,7 +131,12 @@ We recommend experimenting with historical analysis with different feature sets
## Step 4: Review and create
Review your model configuration and select **Create detector**.
Review your detector settings and model configurations to make sure that they're valid and then select **Create detector**.
![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/review_ad.png)
If you see any validation errors, edit the settings to fix the errors and then return to this page.
{: .note }
## Step 5: Observe the results
@ -139,6 +165,12 @@ If you set the category field, you see an additional **Heat map** chart. The hea
Choose and drag over the anomaly line chart to zoom in and see a more detailed view of an anomaly.
{: .note }
If you have set multiple category fields, you can select a subset of fields to filter and sort the fields by. Selecting a subset of fields lets you see the top values of one field that share a common value with another field.
For example, if you have a detector with the category fields `ip` and `endpoint`, you can select `endpoint` in the **View by** dropdown menu. Then, select a specific cell to overlay the top 20 values of `ip` on the charts. The anomaly detection plugin selects the top `ip` by default. You can see a maximum of 5 individual time-series values at the same time.
![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/multi-category-ad.png)
## Step 6: Set up alerts
Under **Real-time results**, choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/).

View File

@ -0,0 +1,319 @@
---
layout: default
title: Anomaly result mapping
parent: Anomaly detection
nav_order: 6
---
# Anomaly result mapping
If you enabled a custom result index, the anomaly detection plugin stores the results in your own index.
If the anomaly detector doesn't detect an anomaly, the result has the following format:
```json
{
"detector_id": "kzcZ43wBgEQAbjDnhzGF",
"schema_version": 5,
"data_start_time": 1635898161367,
"data_end_time": 1635898221367,
"feature_data": [
{
"feature_id": "processing_bytes_max",
"feature_name": "processing bytes max",
"data": 2322
},
{
"feature_id": "processing_bytes_avg",
"feature_name": "processing bytes avg",
"data": 1718.6666666666667
},
{
"feature_id": "processing_bytes_min",
"feature_name": "processing bytes min",
"data": 1375
},
{
"feature_id": "processing_bytes_sum",
"feature_name": "processing bytes sum",
"data": 5156
},
{
"feature_id": "processing_time_max",
"feature_name": "processing time max",
"data": 31198
}
],
"execution_start_time": 1635898231577,
"execution_end_time": 1635898231622,
"anomaly_score": 1.8124904404395776,
"anomaly_grade": 0,
"confidence": 0.9802940756605277,
"entity": [
{
"name": "process_name",
"value": "process_3"
}
],
"model_id": "kzcZ43wBgEQAbjDnhzGF_entity_process_3",
"threshold": 1.2368549346675202
}
```
## Response body fields
Field | Description
:--- | :---
`detector_id` | A unique ID for identifying a detector.
`schema_version` | The mapping version of the result index.
`data_start_time` | The start of the detection range of the aggregated data.
`data_end_time` | The end of the detection range of the aggregated data.
`feature_data` | An array of the aggregated data points between the `data_start_time` and `data_end_time`.
`execution_start_time` | The actual start time of the detector for a specific run that produces the anomaly result. This start time includes the window delay parameter that you can set to delay data collection. Window delay is the difference between the `execution_start_time` and `data_start_time`.
`execution_end_time` | The actual end time of the detector for a specific run that produces the anomaly result.
`anomaly_score` | Indicates relative severity of an anomaly. The higher the score, the more anomalous a data point is.
`anomaly_grade` | A normalized version of the `anomaly_score` on a scale between 0 and 1.
`confidence` | The probability of the accuracy of the `anomaly_score`. The closer this number is to 1, the higher the accuracy. During the probation period of a running detector, the confidence is low (< 0.9) because of its exposure to limited data.
`entity` | An entity is a combination of specific category field values. It includes the name and value of the category field. In the previous example, `process_name` is the category field and one of the processes, such as `process_3`, is the field's value. The `entity` field is only present for a high-cardinality detector (where you've selected a category field).
`model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity.
`threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold.
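Because the plugin writes results to a regular OpenSearch index, you can retrieve them with the standard search API. The following is a minimal sketch that returns the ten most recent anomalous results; the index name `my-custom-ad-results` is a placeholder for your own custom result index:
```json
GET my-custom-ad-results/_search
{
  "size": 10,
  "sort": [
    { "data_end_time": "desc" }
  ],
  "query": {
    "range": {
      "anomaly_grade": { "gt": 0 }
    }
  }
}
```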
If an anomaly detector detects an anomaly, the result has the following format:
```json
{
"detector_id": "fylE53wBc9MCt6q12tKp",
"schema_version": 0,
"data_start_time": 1635927900000,
"data_end_time": 1635927960000,
"feature_data": [
{
"feature_id": "processing_bytes_max",
"feature_name": "processing bytes max",
"data": 2291
},
{
"feature_id": "processing_bytes_avg",
"feature_name": "processing bytes avg",
"data": 1677.3333333333333
},
{
"feature_id": "processing_bytes_min",
"feature_name": "processing bytes min",
"data": 1054
},
{
"feature_id": "processing_bytes_sum",
"feature_name": "processing bytes sum",
"data": 5032
},
{
"feature_id": "processing_time_max",
"feature_name": "processing time max",
"data": 11422
}
],
"anomaly_score": 1.1986675882872033,
"anomaly_grade": 0.26806225550178464,
"confidence": 0.9607519742565531,
"entity": [
{
"name": "process_name",
"value": "process_3"
}
],
"approx_anomaly_start_time": 1635927900000,
"relevant_attribution": [
{
"feature_id": "processing_bytes_max",
"data": 0.03628638020431366
},
{
"feature_id": "processing_bytes_avg",
"data": 0.03384479053991436
},
{
"feature_id": "processing_bytes_min",
"data": 0.058812549572819096
},
{
"feature_id": "processing_bytes_sum",
"data": 0.10154576265526988
},
{
"feature_id": "processing_time_max",
"data": 0.7695105170276828
}
],
"expected_values": [
{
"likelihood": 1,
"value_list": [
{
"feature_id": "processing_bytes_max",
"data": 2291
},
{
"feature_id": "processing_bytes_avg",
"data": 1677.3333333333333
},
{
"feature_id": "processing_bytes_min",
"data": 1054
},
{
"feature_id": "processing_bytes_sum",
"data": 6062
},
{
"feature_id": "processing_time_max",
"data": 23379
}
]
}
],
"threshold": 1.0993584705913992,
"execution_end_time": 1635898427895,
"execution_start_time": 1635898427803
}
```
You can see the following additional fields:
Field | Description
:--- | :---
`relevant_attribution` | Represents the contribution of each input variable. The sum of the attributions is normalized to 1.
`expected_values` | The expected value for each feature.
At times, the detector might detect an anomaly late.
Let's say the detector sees a random mix of the triples {1, 2, 3} and {2, 4, 5} that correspond to `slow weeks` and `busy weeks`, respectively. For example: 1, 2, 3, 1, 2, 3, 2, 4, 5, 1, 2, 3, 2, 4, 5, and so on.
If the detector comes across the pattern {2, 2, X} and has yet to see X, the detector infers that the pattern is anomalous, but it can't determine at this point which of the 2's is the cause. If X = 3, the detector knows it's the first 2 in that unfinished triple, and if X = 5, it's the second 2. If it's the first 2, the detector detects the anomaly late.
If a detector detects an anomaly late, the result has the following additional fields:
Field | Description
:--- | :---
`past_values` | The actual input that triggered an anomaly. If `past_values` is null, the attributions or expected values are from the current input. If `past_values` is not null, the attributions or expected values are from a past input (for example, the previous two steps of the data [1,2,3]).
`approx_anomaly_start_time` | The approximate time of the actual input that triggered the anomaly. This field helps you understand when a detector flags an anomaly. Neither single-stream nor high-cardinality detectors query previous anomaly results, because these queries are expensive operations. The cost is especially high for high-cardinality detectors that might have many entities. If the data is not continuous, the accuracy of this field is low, and the actual time at which the detector detects an anomaly can be earlier.
The following example shows a result for an anomaly that the detector detected late:
```json
{
"detector_id": "kzcZ43wBgEQAbjDnhzGF",
"confidence": 0.9746820962328963,
"relevant_attribution": [
{
"feature_id": "deny_max1",
"data": 0.07339452532666227
},
{
"feature_id": "deny_avg",
"data": 0.04934972719948845
},
{
"feature_id": "deny_min",
"data": 0.01803003656061806
},
{
"feature_id": "deny_sum",
"data": 0.14804918212089874
},
{
"feature_id": "accept_max5",
"data": 0.7111765287923325
}
],
"task_id": "9Dck43wBgEQAbjDn4zEe",
"threshold": 1,
"model_id": "kzcZ43wBgEQAbjDnhzGF_entity_app_0",
"schema_version": 5,
"anomaly_score": 1.141419389056506,
"execution_start_time": 1635898427803,
"past_values": [
{
"feature_id": "processing_bytes_max",
"data": 905
},
{
"feature_id": "processing_bytes_avg",
"data": 479
},
{
"feature_id": "processing_bytes_min",
"data": 128
},
{
"feature_id": "processing_bytes_sum",
"data": 1437
},
{
"feature_id": "processing_time_max",
"data": 8440
}
],
"data_end_time": 1635883920000,
"data_start_time": 1635883860000,
"feature_data": [
{
"feature_id": "processing_bytes_max",
"feature_name": "processing bytes max",
"data": 1360
},
{
"feature_id": "processing_bytes_avg",
"feature_name": "processing bytes avg",
"data": 990
},
{
"feature_id": "processing_bytes_min",
"feature_name": "processing bytes min",
"data": 608
},
{
"feature_id": "processing_bytes_sum",
"feature_name": "processing bytes sum",
"data": 2970
},
{
"feature_id": "processing_time_max",
"feature_name": "processing time max",
"data": 9670
}
],
"expected_values": [
{
"likelihood": 1,
"value_list": [
{
"feature_id": "processing_bytes_max",
"data": 905
},
{
"feature_id": "processing_bytes_avg",
"data": 479
},
{
"feature_id": "processing_bytes_min",
"data": 128
},
{
"feature_id": "processing_bytes_sum",
"data": 4847
},
{
"feature_id": "processing_time_max",
"data": 15713
}
]
}
],
"execution_end_time": 1635898427895,
"anomaly_grade": 0.5514172746375128,
"entity": [
{
"name": "process_name",
"value": "process_3"
}
],
"approx_anomaly_start_time": 1635883620000
}
```

View File

@ -9,7 +9,6 @@ nav_order: 15
Use the alerting API to programmatically manage monitors and alerts.
---
#### Table of contents
@ -509,12 +508,84 @@ POST _plugins/_alerting/monitors
}
}
```
## Create a document-level monitor
Introduced 2.0
{: .label .label-purple }
You can programmatically create a document-level monitor that functions the same as a per document monitor in OpenSearch Dashboards. A document-level monitor runs a query that checks whether individual documents in an index should trigger an alert notification.
You can combine multiple queries in a per document monitor by adding a tag to each query and then setting the trigger condition to that tag.
For more information about document-level monitors, see [Monitor types]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/#monitor-types).
#### Sample request
The following sample shows how to create a per document monitor:
```json
POST _plugins/_alerting/monitors
{
"type": "monitor",
"monitor_type": "doc_level_monitor",
"name": "iad-monitor",
"enabled": true,
"createdBy": "chip",
"schedule": {
"period": {
"interval": 1,
"unit": "MINUTES"
}
},
"inputs": [
{
"doc_level_input": {
"description": "windows-powershell",
"indices": [
"test-logs"
],
"queries": [
{
"id": "sigma-123",
"query": "region:\"us-west-2\"",
"tags": [
"MITRE:8500"
]
},
{
"id": "sigma-456",
"query": "region:\"us-east-1\"",
"tags": [
"MITRE:8600"
]
}
]
}
}
],
"triggers": [ { "document_level_trigger": {
"name": "test-trigger",
"severity": "1",
"condition": {
"script": {
"source": "'sigma-123' && !'sigma-456'",
"lang": "painless"
}
},
"actions": []
}}]
}
```
### Limitations
If you run a document-level query while the index is being reindexed, the results will not include the reindexed documents. You need to run the query again after reindexing completes to get the updated results.
{: .tip}
## Update monitor
Introduced 1.0
{: .label .label-purple }
When you update a monitor, include the current version number as a parameter. OpenSearch increments the version number automatically (see the sample response).
When updating a monitor, you can optionally include `seq_no` and `primary_term` as URL parameters. If these numbers don't match the existing monitor or the monitor doesn't exist, the alerting plugin throws an error. OpenSearch increments the version number and the sequence number automatically (see the sample response).
#### Request
@ -570,6 +641,58 @@ PUT _plugins/_alerting/monitors/<monitor_id>
}],
"last_update_time": 1551466639295
}
PUT _plugins/_alerting/monitors/<monitor_id>?if_seq_no=3&if_primary_term=1
{
"type": "monitor",
"name": "test-monitor",
"enabled": true,
"enabled_time": 1551466220455,
"schedule": {
"period": {
"interval": 1,
"unit": "MINUTES"
}
},
"inputs": [{
"search": {
"indices": [
"*"
],
"query": {
"query": {
"match_all": {
"boost": 1
}
}
}
}
}],
"triggers": [{
"id": "StaeOmkBC25HCRGmL_y-",
"name": "test-trigger",
"severity": "1",
"condition": {
"script": {
"source": "return true",
"lang": "painless"
}
},
"actions": [{
"name": "test-action",
"destination_id": "RtaaOmkBC25HCRGm0fxi",
"subject_template": {
"source": "My Message Subject",
"lang": "mustache"
},
"message_template": {
"source": "This is my message body.",
"lang": "mustache"
}
}]
}],
"last_update_time": 1551466639295
}
```
#### Sample response
@ -578,6 +701,8 @@ PUT _plugins/_alerting/monitors/<monitor_id>
{
"_id": "Q9aXOmkBC25HCRGmzfw-",
"_version": 4,
"_seq_no": 4,
"_primary_term": 1,
"monitor": {
"type": "monitor",
"name": "test-monitor",
@ -650,6 +775,8 @@ GET _plugins/_alerting/monitors/<monitor_id>
{
"_id": "Q9aXOmkBC25HCRGmzfw-",
"_version": 3,
"_seq_no": 3,
"_primary_term": 1,
"monitor": {
"type": "monitor",
"name": "test-monitor",
@ -912,7 +1039,6 @@ DELETE _plugins/_alerting/monitors/<monitor_id>
```json
{
"_index": ".opensearch-scheduled-jobs",
"_type": "_doc",
"_id": "OYAHOmgBl3cmwnqZl_yH",
"_version": 2,
"result": "deleted",
@ -1223,7 +1349,9 @@ POST _plugins/_alerting/destinations
```json
{
"_id": "nO-yFmkB8NzS6aXjJdiI",
"_version": 1,
"_version" : 1,
"_seq_no" : 3,
"_primary_term" : 1,
"destination": {
"type": "slack",
"name": "my-destination",
@ -1242,6 +1370,8 @@ POST _plugins/_alerting/destinations
Introduced 1.0
{: .label .label-purple }
When updating a destination, you can optionally include `seq_no` and `primary_term` as URL parameters. If these numbers don't match the existing destination or the destination doesn't exist, the alerting plugin throws an error. OpenSearch increments the version number and the sequence number automatically (see the sample response).
#### Request
```json
@ -1253,6 +1383,15 @@ PUT _plugins/_alerting/destinations/<destination-id>
"url": "http://www.example.com"
}
}
PUT _plugins/_alerting/destinations/<destination-id>?if_seq_no=3&if_primary_term=1
{
"name": "my-updated-destination",
"type": "slack",
"slack": {
"url": "http://www.example.com"
}
}
```
#### Sample response
@ -1260,7 +1399,9 @@ PUT _plugins/_alerting/destinations/<destination-id>
```json
{
"_id": "pe-1FmkB8NzS6aXjqvVY",
"_version": 4,
"_version" : 2,
"_seq_no" : 4,
"_primary_term" : 1,
"destination": {
"type": "slack",
"name": "my-updated-destination",
@ -1438,6 +1579,8 @@ POST _plugins/_alerting/destinations/email_accounts
Introduced 1.0
{: .label .label-purple }
When updating an email account, you can optionally include `seq_no` and `primary_term` as URL parameters. If these numbers don't match the existing email account or the email account doesn't exist, the alerting plugin throws an error. OpenSearch increments the version number and the sequence number automatically (see the sample response).
#### Request
```json
PUT _plugins/_alerting/destinations/email_accounts/<email_account_id>
@ -1448,6 +1591,15 @@ PUT _plugins/_alerting/destinations/email_accounts/<email_account_id>
"port": 465,
"method": "ssl"
}
PUT _plugins/_alerting/destinations/email_accounts/<email_account_id>?if_seq_no=18&if_primary_term=2
{
"name": "example_account",
"email": "example@email.com",
"host": "smtp.email.com",
"port": 465,
"method": "ssl"
}
```
#### Sample response
```json
@ -1636,6 +1788,8 @@ POST _plugins/_alerting/destinations/email_groups
Introduced 1.0
{: .label .label-purple }
When updating an email group, you can optionally include `seq_no` and `primary_term` as URL parameters. If these numbers don't match the existing email group or the email group doesn't exist, the alerting plugin throws an error. OpenSearch increments the version number and the sequence number automatically (see the sample response).
#### Request
```json
@ -1646,6 +1800,14 @@ PUT _plugins/_alerting/destinations/email_groups/<email_group_id>
"email": "example@email.com"
}]
}
PUT _plugins/_alerting/destinations/email_groups/<email_group_id>?if_seq_no=16&if_primary_term=2
{
"name": "example_email_group",
"emails": [{
"email": "example@email.com"
}]
}
```
#### Sample response

View File

@ -11,7 +11,11 @@ redirect_from:
OpenSearch Dashboards
{: .label .label-yellow :}
The alerting feature notifies you when data from one or more OpenSearch indices meets certain conditions. For example, you might want to notify a [Slack](https://slack.com/) channel if your application logs more than five HTTP 503 errors in one hour, or you might want to page a developer if no new documents have been indexed in the past 20 minutes.
You can use the Alerting plugin in OpenSearch Dashboards to monitor your data and create alert notifications that trigger when conditions occur in one or more indexes.
You create a monitor with trigger conditions that generate various alert notifications through the message channel you select as a destination. Notifications can be sent to email, Slack, or Amazon Chime.
The monitor you create notifies you when data from one or more OpenSearch indexes meets certain conditions. For example, you might want to notify a [Slack](https://slack.com/) channel if your application logs more than five HTTP 503 errors in one hour, or you might want to page a developer if no new documents have been indexed in the past 20 minutes.
To get started, choose **Alerting** in OpenSearch Dashboards.

View File

@ -12,31 +12,50 @@ has_children: false
- TOC
{:toc}
---
## Monitor types
The OpenSearch Dashboards Alerting plugin provides four monitor types:
* **per query** - This monitor runs a query and generates alert notifications based on criteria that match.
* **per bucket** - This monitor runs a query that evaluates trigger criteria based on aggregated values in the dataset.
* **per cluster metrics** - This monitor runs API requests on the cluster to monitor its health.
* **per document** - This monitor runs a query (or multiple queries combined by a tag) that returns individual documents that match the alert notification trigger condition.
### More about per document monitors
The per query and per bucket monitors can only take a single query with one trigger condition. Per document monitors allow you to combine multiple query trigger conditions by adding a tag to the queries. Then you can add the tag as a single trigger condition instead of specifying a single query. The Alerting plugin processes the trigger conditions from all queries as a logical OR operation, so if any of the query conditions are met, it generates the alert notification.
The Alerting plugin also creates a list of document findings data that contains metadata about which document matches each query. Security analytics can use the document findings data to keep track of and analyze the query data separately from the alert processes.
The following metadata is provided for each document finding entry:
* **Document** - The document ID and index name.
* **Query** - The query name that matched the document.
* **Time found** - The timestamp that indicates when the document was found during the runtime.
## Key terms
Term | Definition
:--- | :---
Monitor | A job that runs on a defined schedule and queries OpenSearch indices. The results of these queries are then used as input for one or more *triggers*.
Monitor | A job that runs on a defined schedule and queries OpenSearch indexes. The results of these queries are then used as input for one or more *triggers*.
Trigger | Conditions that, if met, generate *alerts*.
Tag | A label that can be applied to multiple queries to combine them with the logical OR operation in a per document monitor. You can't use tags with other monitor types.
Alert | An event associated with a trigger. When an alert is created, the trigger performs *actions*, which can include sending a notification.
Action | The information that you want the monitor to send out after being triggered. Actions have a *destination*, a message subject, and a message body.
Channel | A notification channel to use in an action. See [notifications]({{site.url}}{{site.baseurl}}/notifications-plugin/index) for more information.
---
## Create monitors
## Create a monitor
1. Choose **Alerting**, **Monitors**, **Create monitor**.
1. Specify a name for the monitor.
1. Choose either **Per query monitor** or **Per bucket monitor**.
1. Choose either **Per query monitor**, **Per bucket monitor**, **Per cluster metrics monitor**, or **Per document monitor**.
Whereas query-level monitors run your specified query and then check whether the query's results triggers any alerts, bucket-level monitors let you select fields to create buckets and categorize your results into those buckets. The alerting plugin runs each bucket's unique results against a script you define later, so you have finer control over which results should trigger alerts. Each of those buckets can trigger an alert, but query-level monitors can only trigger one alert at a time.
Per query monitors run your specified query and then check whether the query's results trigger any alerts. Per bucket monitors let you select the fields used to create buckets and categorize your results into those buckets. The Alerting plugin runs each bucket's unique results against a script you define later, so you have finer control over which results should trigger alerts. Each of those buckets can trigger an alert, but per query monitors can only trigger one alert at a time.
1. Define the monitor in one of three ways: visually, using a query, or using an anomaly detector.
1. Decide how you want to define your query and triggers. You can use any of the following methods: visual editor, query editor, or anomaly detector.
- Visual definition works well for monitors that you can define as "some value is above or below some threshold for some amount of time."
@ -91,7 +110,7 @@ Whereas query-level monitors run your specified query and then check whether the
To define a monitor visually, choose **Visual editor**. Then choose a source index, a timeframe, an aggregation (for example, `count()` or `average()`), a data filter if you want to monitor a subset of your source index, and a group-by field if you want to include an aggregation field in your query. At least one group-by field is required if you're defining a bucket-level monitor. Visual definition works well for most monitors.
If you use the security plugin, you can only choose indices that you have permission to access. For details, see [Alerting security]({{site.url}}{{site.baseurl}}/security-plugin/).
If you use the Security plugin, you can only choose indexes that you have permission to access. For details, see [Alerting security]({{site.url}}{{site.baseurl}}/security-plugin/).
To use a query, choose **Extraction query editor**, add your query (using [the OpenSearch query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/)), and test it using the **Run** button.
@ -112,12 +131,11 @@ Whereas query-level monitors run your specified query and then check whether the
**Note**: Anomaly detection is available only if you are defining a per query monitor.
{: .note}
1. Choose a frequency and timezone for your monitor. Note that you can only pick a timezone if you choose Daily, Weekly, Monthly, or [custom cron expression]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/cron/) for frequency.
1. Choose how frequently to run your monitor. You can run it either by time intervals (minutes, hours, or days) or on a schedule. If you run it on a daily, weekly, or monthly schedule or according to a [custom cron expression]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/cron/), you also need to provide the time zone (see the cron schedule sketch after these steps).
1. Add a trigger to your monitor.
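If you later work with the monitor through the alerting API, a cron-based schedule looks roughly like the following sketch; the expression (daily at 12:00) and time zone are placeholder values:
```json
"schedule": {
  "cron": {
    "expression": "0 12 * * *",
    "timezone": "America/Los_Angeles"
  }
}
```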
---
## Create triggers
Steps to create a trigger differ depending on whether you chose **Visual editor**, **Extraction query editor**, or **Anomaly detector** when you created the monitor.
@ -134,6 +152,15 @@ The line moves up and down as you increase and decrease the threshold. Once this
Bucket-level monitors also require you to specify a threshold and value for your aggregation and timeframe, but you can use a maximum of five conditions to better refine your trigger. Optionally, you can also use a keyword filter to filter for a specific field in your index.
Document-level monitors provide the added option to use tags that represent multiple queries connected by the logical OR operator.
To create a multiple query combination trigger, follow these steps:
1. Create a per document monitor with more than one query.
2. Create the first query with a field, an operator, and a value. For example, set the query to search the `region` field with either the "is" or "is not" operator, and set the value to "us-west-2".
3. Select **Add Tag** and give the tag a name.
4. Create the second query and add the same tag to it.
5. Create the trigger condition and specify the tag name. This creates a combination trigger that checks two queries that both contain the same tag. The monitor checks both queries with a logical OR operation, and if either query's conditions are met, it generates the alert notification.
### Extraction query
@ -230,8 +257,8 @@ Variable | Data Type | Description
`ctx.monitor.schedule` | Object | Contains a schedule of how often or when the monitor should run.
`ctx.monitor.schedule.period.interval` | Integer | The interval at which the monitor runs.
`ctx.monitor.schedule.period.unit` | String | The interval's unit of time.
`ctx.monitor.inputs` | Array | An array that contains the indices and definition used to create the monitor.
`ctx.monitor.inputs.search.indices` | Array | An array that contains the indices the monitor observes.
`ctx.monitor.inputs` | Array | An array that contains the indexes and definition used to create the monitor.
`ctx.monitor.inputs.search.indices` | Array | An array that contains the indexes the monitor observes.
`ctx.monitor.inputs.search.query` | N/A | The definition used to define the monitor.
#### Trigger variables
@ -354,3 +381,112 @@ Acknowledged | Someone has acknowledged the alert, but not fixed the root cause.
Completed | The alert is no longer ongoing. Alerts enter this state after the corresponding trigger evaluates to false.
Error | An error occurred while executing the trigger---usually the result of a a bad trigger or destination.
Deleted | Someone deleted the monitor or trigger associated with this alert while the alert was ongoing.
---
## Create cluster metrics monitor
In addition to monitoring conditions for indexes, the Alerting plugin allows you to monitor conditions for clusters. You can set alerts based on cluster metrics to watch for the following conditions:
- The health of your cluster reaches a status of yellow or red
- Cluster-level metrics, such as CPU usage and JVM memory usage, reach specified thresholds
- Node-level metrics, such as available disk space, JVM memory usage, and CPU usage, reach a specified threshold
- The total number of documents stored reaches a specified amount
To create a cluster metrics monitor:
1. Select **Alerting** > **Monitors** > **Create monitor**.
2. Select the **Per cluster metrics monitor** option.
3. In the Query section, pick the **Request type** from the dropdown.
4. (Optional) If you want to filter the API response to use only certain path parameters, enter those parameters under **Query parameters**. Most APIs that can be used to monitor cluster status support path parameters as described in their documentation (e.g., comma-separated lists of index names).
5. In the Triggers section, indicate which conditions trigger an alert. The trigger condition auto-populates a Painless `ctx` variable. For example, a cluster metrics monitor watching cluster stats uses the trigger condition `ctx.results[0].indices.count <= 0`, which triggers an alert based on the number of indexes returned by the query. For more specificity, add any additional Painless conditions supported by the API. To see an example of the condition response, select **Preview condition response**.
6. In the Actions section, indicate how you want your users to be notified when a trigger condition is met.
7. Select **Create**. Your new monitor appears in the **Monitors** list.
### Supported APIs
Trigger conditions use responses from the following APIs. Most APIs that can be used to monitor cluster status support path parameters as described in their documentation (e.g., comma-separated lists of index names). However, they do not support query parameters.
1. [_cluster/health]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cluster-health/)
2. [_cluster/stats]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cluster-stats/)
3. [_cluster/settings]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cluster-settings/)
4. [_nodes/stats]({{site.url}}{{site.baseurl}}/opensearch/popular-api/#get-node-statistics)
5. [_cat/pending_tasks]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-pending-tasks/)
6. [_cat/recovery]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-recovery/)
7. [_cat/snapshots]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-snapshots/)
8. [_cat/tasks]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-tasks/)
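For example, if you choose the cluster health request type and enter a comma-separated list of index names under **Query parameters**, the monitor effectively runs the following request (the index names here are hypothetical):
```json
GET _cluster/health/logs-2022.05.23,logs-2022.05.24
```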
### Restrict API fields
If you want to hide fields in the API response that you do not want exposed for alerting, reconfigure the [supported_json_payloads.json](https://github.com/opensearch-project/alerting/blob/main/alerting/src/main/resources/org/opensearch/alerting/settings/supported_json_payloads.json) file inside the Alerting plugin. The file functions as an allow list for the API fields you want to use in an alert. By default, all APIs and their parameters can be used for monitors and trigger conditions.
However, you can modify the file so that cluster metrics monitors can only be created for the APIs referenced in it. Furthermore, only fields referenced in the supported files can be used in trigger conditions. The following example `supported_json_payloads.json` allows a cluster metrics monitor to be created for the `_cluster/stats` API, with trigger conditions for the `indices.shards.total` and `indices.shards.index.shards.min` fields.
```json
"/_cluster/stats": {
"indices": [
"shards.total",
"shards.index.shards.min"
]
}
```
### Painless triggers
Painless scripts define triggers for cluster metrics monitors, similar to query- or bucket-level monitors that are defined using the extraction query definition option. Painless scripts are composed of at least one statement and any additional functions you wish to execute.
The cluster metrics monitor supports up to **ten** triggers.
In the following example, a JSON object creates a trigger that sends an alert when the cluster health is yellow. `script` points `source` to the Painless script `ctx.results[0].status == \"yellow\"`.
```json
{
"name": "Cluster Health Monitor",
"type": "monitor",
"monitor_type": "query_level_monitor",
"enabled": true,
"schedule": {
"period": {
"unit": "MINUTES",
"interval": 1
}
},
"inputs": [
{
"uri": {
"api_type": "CLUSTER_HEALTH",
"path": "_cluster/health/",
"path_params": "",
"url": "http://localhost:9200/_cluster/health/"
}
}
],
"triggers": [
{
"query_level_trigger": {
"id": "Tf_L_nwBti6R6Bm-18qC",
"name": "Yellow status trigger",
"severity": "1",
"condition": {
"script": {
"source": "ctx.results[0].status == \"yellow\"",
"lang": "painless"
}
},
"actions": []
}
}
]
}
```
See [trigger variables](#trigger-variables) for more painless ctx options.
### Limitations
Currently, the cluster metrics monitor has the following limitations:
- You cannot create monitors for remote clusters.
- The OpenSearch cluster must be in a state where an index's conditions can be monitored and actions can be executed against the index.
- Removing resource permissions from a user will not prevent that user's preexisting monitors for that resource from executing.
- Users with permissions to create monitors are not blocked from creating monitors for resources for which they do not have permissions; however, those monitors will not execute.

View File

@ -31,6 +31,12 @@ Later, the user `psantos` wants to edit the monitor to run every two hours, but
After making the change, the monitor now runs with the same permissions as `psantos`, including any [document-level security]({{site.url}}{{site.baseurl}}/security-plugin/access-control/document-level-security/) queries, [excluded fields]({{site.url}}{{site.baseurl}}/security-plugin/access-control/field-level-security/), and [masked fields]({{site.url}}{{site.baseurl}}/security-plugin/access-control/field-masking/). If you use an extraction query to define your monitor, use the **Run** button to ensure that the response includes the fields you need.
Once a monitor is created, the Alerting plugin will continue executing the monitor, even if the user who created the monitor has their permissions removed. Only a user with the correct cluster permissions can manually disable or delete a monitor to stop it from executing (see the example role after this list):
- Disable a monitor: `cluster:admin/opendistro/alerting/monitor/write`
- Delete a monitor: `cluster:admin/opendistro/alerting/monitor/delete`
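The following is a minimal sketch of a role that grants these two permissions using the Security plugin's REST API; the role name is hypothetical, and a real role typically includes additional cluster and index permissions:
```json
PUT _plugins/_security/api/roles/manage_alerting_monitors
{
  "cluster_permissions": [
    "cluster:admin/opendistro/alerting/monitor/write",
    "cluster:admin/opendistro/alerting/monitor/delete"
  ]
}
```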
If your monitor's trigger has notifications configured, the Alerting plugin continues to send out notifications regardless of destination type. To stop notifications, a user must manually delete them in the trigger's actions.
## (Advanced) Limit access by backend role
@ -55,7 +61,6 @@ Now when users view alerting resources in OpenSearch Dashboards (or make REST AP
If `jdoe` creates a monitor, `jroe` can see and modify it, but `psantos` can't. If that monitor generates an alert, the situation is the same: `jroe` can see and acknowledge it, but `psantos` can't. If `psantos` creates a destination, `jdoe` and `jroe` can't see or modify it.
<!-- ## (Advanced) Limit access by individual
If you only want users to be able to see and modify their own monitors and destinations, duplicate the `alerting_full_access` role and add the following [DLS query]({{site.url}}{{site.baseurl}}/security-plugin/access-control/document-level-security/) to it:

View File

@ -116,8 +116,6 @@ GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
{
"Disk_Utilization": "%",
"Cache_Request_Hit": "count",
"TermVectors_Memory": "B",
"Segments_Memory": "B",
"HTTP_RequestDocs": "count",
"Net_TCP_Lost": "segments/flow",
"Refresh_Time": "ms",
@ -138,7 +136,6 @@ GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
"Merge_CurrentEvent": "count",
"Indexing_Buffer": "B",
"Bitset_Memory": "B",
"Norms_Memory": "B",
"Net_PacketDropRate4": "packets/s",
"Heap_Committed": "B",
"Net_PacketDropRate6": "packets/s",
@ -163,7 +160,6 @@ GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
"Flush_Event": "count",
"Net_TCP_RxQ": "segments/flow",
"Refresh_Event": "count",
"Points_Memory": "B",
"Flush_Time": "ms",
"Heap_Init": "B",
"CPU_Utilization": "cores",
@ -175,8 +171,6 @@ GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
"Net_TCP_SendCWND": "B/flow",
"Cache_Request_Eviction": "count",
"Segments_Total": "count",
"Terms_Memory": "B",
"DocValues_Memory": "B",
"Heap_Used": "B",
"Cache_FieldData_Eviction": "count",
"IO_TotalSyscallRate": "count/s",
@ -184,7 +178,6 @@ GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
"Net_Throughput": "B/s",
"Paging_RSS": "pages",
"Indexing_ThrottleTime": "ms",
"StoredFields_Memory": "B",
"IndexWriter_Memory": "B",
"Master_PendingQueueSize": "count",
"Net_TCP_SSThresh": "B/flow",

View File

@ -11,7 +11,7 @@ redirect_from:
Performance Analyzer is an agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics, independent of the Java Virtual Machine (JVM). PerfTop is the default command line interface (CLI) for displaying those metrics.
To download PerfTop, see [Download](https://opensearch.org/downloads.html) on the OpenSearch website.
To download PerfTop, see [Download](https://github.com/opensearch-project/perftop/releases) on the PerfTop release page.
You can also install it using [npm](https://www.npmjs.com/):
@ -88,7 +88,7 @@ mount -o remount /dev/shm
Performance Analyzer supports encryption in transit for requests. It currently does *not* support client or server authentication for requests. To enable encryption in transit, edit `performance-analyzer.properties` in your `$OPENSEARCH_HOME` directory:
```bash
vi $OPENSEARCH_HOME/plugins/opensearch-performance-analyzer/pa_config/performance-analyzer.properties
vi $OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties
```
Change the following lines to configure encryption in transit. Note that `certificate-file-path` must be a certificate for the server, not a root CA:

View File

@ -26,7 +26,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<tr>
<td>CPU_Utilization
</td>
<td rowspan="18">ShardID, IndexName, Operation, ShardRole
<td rowspan="20">ShardID, IndexName, Operation, ShardRole
</td>
<td>CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds.
</td>
@ -121,6 +121,18 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>The total number of times that the associated thread(s) blocked to enter or reenter a monitor (i.e. the number of times a thread has been in the blocked state).
</td>
</tr>
<tr>
<td>Thread_Waited_Time
</td>
<td>Average time (seconds) that the associated thread(s) waited to enter or reenter a monitor in WAITING or TIMED_WAITING state.
</td>
</tr>
<tr>
<td>Thread_Waited_Event
</td>
<td>The total number of times that the associated thread(s) waited to enter or reenter a monitor (i.e. the number of times a thread has been in the WAITING or TIMED_WAITING state).
</td>
</tr>
<tr>
<td>ShardEvents
</td>
@ -249,48 +261,6 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>The number of segments.
</td>
</tr>
<tr>
<td>Segments_Memory
</td>
<td>Estimated memory usage of segments in bytes.
</td>
</tr>
<tr>
<td>Terms_Memory
</td>
<td>Estimated memory usage of terms dictionaries in bytes.
</td>
</tr>
<tr>
<td>StoredFields_Memory
</td>
<td>Estimated memory usage of stored fields in bytes.
</td>
</tr>
<tr>
<td>TermVectors_Memory
</td>
<td>Estimated memory usage of term vectors in bytes.
</td>
</tr>
<tr>
<td>Norms_Memory
</td>
<td>Estimated memory usage of norms (normalization factors) in bytes.
</td>
</tr>
<tr>
<td>Points_Memory
</td>
<td>Estimated memory usage of points in bytes.
</td>
</tr>
<tr>
<td>DocValues_Memory
</td>
<td>Estimated memory usage of doc values in bytes.
</td>
</tr>
<tr>
<td>IndexWriter_Memory
</td>
@ -315,6 +285,38 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>Estimated disk usage of the shard in bytes.
</td>
</tr>
<tr>
<td>Indexing_Pressure_Current_Limits
</td>
<td rowspan="5">ShardID, IndexName, IndexingStage
</td>
<td>Total heap size (in bytes) that is available for utilization by a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Current_Bytes
</td>
<td>Total heap size (in bytes) occupied by a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Last_Successful_Timestamp
</td>
<td>Timestamp of a request that was successful for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Rejection_Count
</td>
<td>Total rejections performed by OpenSearch for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Average_Window_Throughput
</td>
<td>Average throughput of the last n requests (The value of n is determined by `shard_indexing_pressure.secondary_parameter.throughput.request_size_window` setting) for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Latency
</td>
@ -424,7 +426,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
</td>
<td rowspan="5">Direction
</td>
<td>The total number of IPv4 datagrams transmitted/received from/by interfaces per second, including those transmitted or received in error
<td>The total number of IPv4 datagrams transmitted/received from/by interfaces per second, including those transmitted or received in error.
</td>
</tr>
<tr>
@ -454,7 +456,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<tr>
<td>ThreadPool_QueueSize
</td>
<td rowspan="4">ThreadPoolType
<td rowspan="6">ThreadPoolType
</td>
<td>The size of the task queue.
</td>
@ -477,10 +479,22 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>The approximate number of threads that are actively executing tasks.
</td>
</tr>
<tr>
<td>ThreadPool_QueueLatency
</td>
<td>The latency of the task queue.
</td>
</tr>
<tr>
<td>ThreadPool_QueueCapacity
</td>
<td>The current capacity of the task queue.
</td>
</tr>
<tr>
<td>Master_PendingQueueSize
</td>
<td>N/A
<td>Master_PendingTaskType
</td>
<td>The current number of pending tasks in the cluster state update thread. Each node has a cluster state update thread that submits cluster state update tasks (create index, update mapping, allocate shard, fail shard, etc.).
</td>
@ -533,6 +547,108 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>The time (milliseconds) that a master task has been executed.
</td>
</tr>
<tr>
<td>Cache_MaxSize
</td>
<td>CacheType
</td>
<td>The max size of the cache in bytes.
</td>
</tr>
<tr>
<td>AdmissionControl_RejectionCount (WIP)
</td>
<td rowspan="3">ControllerName
</td>
<td>Total rejections performed by a Controller of Admission Control.
</td>
</tr>
<tr>
<td>AdmissionControl_CurrentValue (WIP)
</td>
<td>Current value for Controller of Admission Control.
</td>
</tr>
<tr>
<td>AdmissionControl_ThresholdValue (WIP)
</td>
<td>Threshold value for Controller of Admission Control.
</td>
</tr>
<tr>
<td>Data_RetryingPendingTasksCount (WIP)
</td>
<td rowspan="2"> NodeID
</td>
<td>The number of throttled pending tasks on which the data node is actively performing retries. This is an absolute metric at that point in time.
</td>
</tr>
<tr>
<td>Master_ThrottledPendingTasksCount (WIP)
</td>
<td>The total number of pending tasks that were throttled by the master node. This is a cumulative metric, so look at the max aggregation.
</td>
</tr>
<tr>
<td>Election_Term (WIP)
</td>
<td rowspan="5">N/A
</td>
<td>Monotonically increasing number with every master election.
</td>
</tr>
<tr>
<td>PublishClusterState_Latency (WIP)
</td>
<td>The time taken by quorum of nodes to publish new cluster state. This metric is available for current master.
</td>
</tr>
<tr>
<td>PublishClusterState_Failure (WIP)
</td>
<td>The number of times publish new cluster state action failed on master node.
</td>
</tr>
<tr>
<td>ClusterApplierService_Latency (WIP)
</td>
<td>The time taken by each node to apply cluster state sent by master.
</td>
</tr>
<tr>
<td>ClusterApplierService_Failure (WIP)
</td>
<td>The number of times apply cluster state action failed on each node.
</td>
</tr>
<tr>
<td>Shard_State (WIP)
</td>
<td>IndexName, NodeName, ShardType, ShardID
</td>
<td>The state of each shard - whether it is STARTED, UNASSIGNED, RELOCATING etc.
</td>
</tr>
<tr>
<td>LeaderCheck_Latency (WIP)
</td>
<td rowspan="4">WIP
</td>
<td rowspan="4">WIP
</td>
</tr>
<tr>
<td>FollowerCheck_Failure (WIP)
</td>
</tr>
<tr>
<td>LeaderCheck_Failure (WIP)
</td>
</tr>
<tr>
<td>FollowerCheck_Latency (WIP)
</td>
</tr>
</tbody>
</table>
@ -558,3 +674,4 @@ MasterTaskInsertOrder | The order in which the task was inserted (e.g. `3691`).
MasterTaskPriority | Priority of the task (e.g. `URGENT`). OpenSearch executes higher priority tasks before lower priority ones, regardless of `insert_order`.
MasterTaskType | `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, `Update snapshot state`
MasterTaskMetadata | Metadata for the task (if any).
CacheType | `Field_Data_Cache`, `Shard_Request_Cache`, `Node_Query_Cache`

View File

@ -1,195 +0,0 @@
---
layout: default
title: Configuration reference
parent: Trace analytics
nav_order: 25
---
# Data Prepper configuration reference
This page lists all supported Data Prepper sources, buffers, preppers, and sinks, along with their associated options. For example configuration files, see [Data Prepper]({{site.url}}{{site.baseurl}}/monitoring-plugins/trace/data-prepper/).
## Data Prepper server options
Option | Required | Description
:--- | :--- | :---
ssl | No | Boolean, indicating whether TLS should be used for server APIs. Defaults to true.
keyStoreFilePath | No | String, path to a .jks or .p12 keystore file. Required if ssl is true.
keyStorePassword | No | String, password for keystore. Optional, defaults to empty string.
privateKeyPassword | No | String, password for private key within keystore. Optional, defaults to empty string.
serverPort | No | Integer, port number to use for server APIs. Defaults to 4900
## General pipeline options
Option | Required | Description
:--- | :--- | :---
workers | No | Integer, default 1. Essentially the number of application threads. As a starting point for your use case, try setting this value to the number of CPU cores on the machine.
delay | No | Integer (milliseconds), default 3,000. How long workers wait between buffer read attempts.
## Sources
Sources define where your data comes from.
### otel_trace_source
Source for the OpenTelemetry Collector.
Option | Required | Description
:--- | :--- | :---
port | No | Integer, the port OTel trace source is running on. Default is `21890`.
request_timeout | No | Integer, the request timeout in milliseconds. Default is `10_000`.
health_check_service | No | Boolean, enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
proto_reflection_service | No | Boolean, enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
unframed_requests | No | Boolean, enable requests not framed using the gRPC wire protocol.
thread_count | No | Integer, the number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer, the maximum allowed number of open connections. Default is `500`.
ssl | No | Boolean, enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
sslKeyCertChainFile | Conditionally | String, file-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
sslKeyFile | Conditionally | String, file-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
useAcmCertForSSL | No | Boolean, enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
acmCertificateArn | Conditionally | String, represents the ACM certificate ARN. The ACM certificate takes precedence over S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
awsRegion | Conditionally | String, represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
### file
Source for flat file input.
Option | Required | Description
:--- | :--- | :---
path | Yes | String, path to the input file (e.g. `logs/my-log.log`).
### pipeline
Source for reading from another pipeline.
Option | Required | Description
:--- | :--- | :---
name | Yes | String, name of the pipeline to read from.
### stdin
Source for console input. Can be useful for testing. No options.
## Buffers
Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory-based (better performance) or disk-based (larger).
### bounded_blocking
The default buffer. Memory-based.
Option | Required | Description
:--- | :--- | :---
buffer_size | No | Integer, default 512. The maximum number of records the buffer accepts.
batch_size | No | Integer, default 8. The maximum number of records the buffer drains after each read.
## Preppers
Preppers perform some action on your data: filter, transform, enrich, etc.
### otel_trace_raw_prepper
Converts OpenTelemetry data to OpenSearch-compatible JSON documents.
Option | Required | Description
:--- | :--- | :---
root_span_flush_delay | No | Integer, representing the time interval in seconds to flush all the root spans in the prepper together with their descendants. Defaults to 30.
trace_flush_interval | No | Integer, representing the time interval in seconds to flush all the descendant spans without any root span. Defaults to 180.
### service_map_stateful
Uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards.
Option | Required | Description
:--- | :--- | :---
window_duration | No | Integer, representing the fixed time window in seconds to evaluate service-map relationships. Defaults to 180.
### peer_forwarder
Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment.
Option | Required | Description
:--- | :--- | :---
time_out | No | Integer, forwarded request timeout in seconds. Defaults to 3 seconds.
span_agg_count | No | Integer, batch size for number of spans per request. Defaults to 48.
target_port | No | Integer, the destination port to forward requests to. Defaults to `21890`.
discovery_mode | No | String, peer discovery mode to be used. Allowable values are `static`, `dns`, and `aws_cloud_map`. Defaults to `static`.
static_endpoints | No | List, containing string endpoints of all Data Prepper instances.
domain_name | No | String, single domain name to query DNS against. Typically used by creating multiple DNS A Records for the same domain.
ssl | No | Boolean, indicating whether TLS should be used. Default is true.
awsCloudMapNamespaceName | Conditionally | String, name of your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
awsCloudMapServiceName | Conditionally | String, service name within your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
sslKeyCertChainFile | Conditionally | String, represents the SSL certificate chain file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to `true`.
useAcmCertForSSL | No | Boolean, enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
awsRegion | Conditionally | String, represents the AWS region to use ACM, S3, or CloudMap. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
acmCertificateArn | Conditionally | String, represents the ACM certificate ARN. The ACM certificate takes precedence over S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
### string_converter
Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own prepper.
Option | Required | Description
:--- | :--- | :---
upper_case | No | Boolean, whether to convert to uppercase (`true`) or lowercase (`false`).
## Sinks
Sinks define where Data Prepper writes your data to.
### opensearch
Sink for an OpenSearch cluster.
Option | Required | Description
:--- | :--- | :---
hosts | Yes | List of OpenSearch hosts to write to (e.g. `["https://localhost:9200", "https://remote-cluster:9200"]`).
cert | No | String, path to the security certificate (e.g. `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin.
username | No | String, username for HTTP basic authentication.
password | No | String, password for HTTP basic authentication.
aws_sigv4 | No | Boolean, whether to use IAM signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credential`, etc.).
aws_region | No | String, AWS region (e.g. `"us-east-1"`) for the domain if you are connecting to Amazon OpenSearch Service.
aws_sts_role | No | String, IAM role which the sink plugin will assume to sign request to Amazon OpenSearch Service. If not provided the plugin will use the default credentials.
trace_analytics_raw | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics OpenSearch Dashboards plugin.
trace_analytics_service_map | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics OpenSearch Dashboards plugin.
index | No | String, name of the index to export to. Only required if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
template_file | No | String, the path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (e.g. `/your/local/template-file.json` if you do not use the `trace_analytics_raw` or `trace_analytics_service_map`. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
document_id_field | No | String, the field from the source data to use for the OpenSearch document ID (e.g. `"my-field"`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
dlq_file | No | String, the path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster.
bulk_size | No | Integer (long), default 5. The maximum size (in MiB) of bulk requests to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually.
### file
Sink for flat file output.
Option | Required | Description
:--- | :--- | :---
path | Yes | String, path for the output file (e.g. `logs/my-transformed-log.log`).
### pipeline
Sink for writing to another pipeline.
Option | Required | Description
:--- | :--- | :---
name | Yes | String, name of the pipeline to write to.
### stdout
Sink for console output. Can be useful for testing. No options.

View File

@ -1,135 +0,0 @@
---
layout: default
title: Data Prepper
parent: Trace analytics
nav_order: 20
---
# Data Prepper
Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
## Install Data Prepper
To use the Docker image, pull it like any other image:
```bash
docker pull opensearchproject/data-prepper:latest
```
Otherwise, [download](https://opensearch.org/downloads.html) the appropriate archive for your operating system and unzip it.
## Configure pipelines
To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks:
```yml
sample-pipeline:
workers: 4 # the number of workers
delay: 100 # in milliseconds, how long workers wait between read attempts
source:
otel_trace_source:
ssl: true
sslKeyCertChainFile: "config/demo-data-prepper.crt"
sslKeyFile: "config/demo-data-prepper.key"
buffer:
bounded_blocking:
buffer_size: 1024 # max number of records the buffer accepts
batch_size: 256 # max number of records the buffer drains after each read
prepper:
- otel_trace_raw_prepper:
sink:
- opensearch:
hosts: ["https:localhost:9200"]
cert: "config/root-ca.pem"
username: "ta-user"
password: "ta-password"
trace_analytics_raw: true
```
- Sources define where your data comes from. In this case, the source is the OpenTelemetry Collector (`otel_trace_source`) with some optional SSL settings.
- Buffers store data as it passes through the pipeline.
By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.
- Preppers perform some action on your data: filter, transform, enrich, etc.
You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `otel_trace_raw_prepper` prepper converts OpenTelemetry data into OpenSearch-compatible JSON documents.
- Sinks define where your data goes. In this case, the sink is an OpenSearch cluster.
Pipelines can act as the source for other pipelines. In the following example, a pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks:
```yml
entry-pipeline:
delay: "100"
source:
otel_trace_source:
ssl: true
sslKeyCertChainFile: "config/demo-data-prepper.crt"
sslKeyFile: "config/demo-data-prepper.key"
sink:
- pipeline:
name: "raw-pipeline"
- pipeline:
name: "service-map-pipeline"
raw-pipeline:
source:
pipeline:
name: "entry-pipeline"
prepper:
- otel_trace_raw_prepper:
sink:
- opensearch:
hosts: ["https://localhost:9200" ]
cert: "config/root-ca.pem"
username: "ta-user"
password: "ta-password"
trace_analytics_raw: true
service-map-pipeline:
delay: "100"
source:
pipeline:
name: "entry-pipeline"
prepper:
- service_map_stateful:
sink:
- opensearch:
hosts: ["https://localhost:9200"]
cert: "config/root-ca.pem"
username: "ta-user"
password: "ta-password"
trace_analytics_service_map: true
```
To learn more, see the [Data Prepper configuration reference]({{site.url}}{{site.baseurl}}/monitoring-plugins/trace/data-prepper-reference/).
## Configure the Data Prepper server
Data Prepper itself provides administrative HTTP endpoints such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port which serves these endpoints, as well as TLS configuration, is specified by a separate YAML file. Example:
```yml
ssl: true
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
keyStorePassword: "password"
privateKeyPassword: "other_password"
serverPort: 1234
```
## Start Data Prepper
**Docker**
```bash
docker run --name data-prepper --expose 21890 -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/opensearch-data-prepper:latest
```
**macOS and Linux**
```bash
./data-prepper-tar-install.sh config/pipelines.yaml config/data-prepper-config.yaml
```
For production workloads, you likely want to run Data Prepper on a dedicated machine, which makes connectivity a concern. Data Prepper uses port 21890 and must be able to connect to both the OpenTelemetry Collector and the OpenSearch cluster. In the [sample applications](https://github.com/opensearch-project/Data-Prepper/tree/main/examples), you can see that all components use the same Docker network and expose the appropriate ports.

View File

@ -0,0 +1,65 @@
---
layout: default
title: Application analytics
nav_order: 80
---
# Application analytics
You can use application analytics to create custom observability applications that show the availability status of your systems. In an application, you can combine log events with trace and metric data into a single view of overall system health, letting you quickly pivot between logs, traces, and metrics to dig into the source of any issues.
## Get started with application analytics
To get started, choose **Observability** in OpenSearch Dashboards, and then choose **Application analytics**.
### Create an application
1. Choose **Create application**.
2. Enter a name for your application and optionally add a description.
3. Do at least one of the following:
- Use [PPL]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) to specify the base query.
You can't change the base query after the application is created.
{: .note }
- Select services & entities from the dropdown or the service map.
- Select trace groups from the dropdown or the table.
4. Choose **Create**.
### Create a visualization
1. Choose the **Log Events** tab.
1. Use [PPL]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) to build upon your base query.
1. Choose the **Visualizations** tab to see your visualizations.
1. Expand the **Save** dropdown menu, enter a name for your visualization, then choose **Save**.
To see your visualizations, choose the **Panel** tab.
### Configure availability
Availability is the status of your application determined by availability levels set on a [time series metric]({{site.url}}{{site.baseurl}}/observability-plugin/app-analytics/#time-series-metric).
To create an availability level, you must configure the following:
- color: The color of the availability badge on the home page.
- name: The text in the availability badge on the home page.
- expression: Comparison operator to determine the availability.
- value: Value to use when calculating availability.
![Configuring availability]({{site.url}}{{site.baseurl}}/images/app_availability_level.gif)
By default, application analytics shows results from the last 24 hours of your data. To see data from a different time frame, use the date and time selector.
#### Time series metric
A time series metric is any line chart visualization whose query aggregates over a timestamp field. You can then use PPL to define arbitrary conditions on your logs and create a visualization over time.
##### Example
```
source = <index_name> | ... | ... | stats ... by span(<timestamp_field>, 1h)
```
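As a concrete illustration (the index and field names here are assumptions based on the Dashboards sample web logs data), the following query counts `503` responses per hour, which renders naturally as a line chart:
```
source = opensearch_dashboards_sample_data_logs | where response = '503' | stats count() by span(timestamp, 1h)
```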
Choose **Line** in visualization configurations to create a time series metric.
![Changing visualization to line chart]({{site.url}}{{site.baseurl}}/images/visualization-line-type.gif)
View File
@ -0,0 +1,57 @@
---
layout: default
title: Event analytics
nav_order: 10
---
# Event analytics
Event analytics in observability is where you can use [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) (PPL) queries to build and view different visualizations of your data.
## Get started with event analytics
To get started, choose **Observability** in OpenSearch Dashboards, and then choose **Event analytics**. If you want to start exploring without adding any of your own data, choose **Add sample Events Data**, and Dashboards adds some sample visualizations you can interact with.
## Build a query
To generate custom visualizations, you must first specify a PPL query. OpenSearch Dashboards then automatically creates a visualization based on the results of your query.
For example, the following PPL query returns a count of how many host addresses are currently in your data.
```
source = opensearch_dashboards_sample_data_logs | fields host | stats count()
```
By default, Dashboards shows results from the last 15 minutes of your data. To see data from a different timeframe, use the date and time selector.
For more information about building PPL queries, see [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index).
## Save a visualization
After Dashboards generates a visualization, you must save it if you want to return to it at a later time or if you want to add it to an [operational panel]({{site.url}}{{site.baseurl}}/observability-plugin/operational-panels).
To save a visualization, expand the save dropdown menu next to **Refresh**, enter a name for your visualization, then choose **Save**. You can reopen any saved visualizations on the event analytics page.
## View logs
The following are methods you can use to view logs.
### Trace log correlation
If you regularly track events across applications, you can correlate logs and traces. To view the correlation, you have to index the traces according to OpenTelemetry standards (similar to trace analytics). After you add a `TraceId` field to your logs, you can view the correlated trace information in the event explorer log details. This method lets you correlate logs and traces that correspond to the same execution context.
![Trace Log Correlation]({{site.url}}{{site.baseurl}}/images/trace_log_correlation.gif)
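For example, a correlated log document might look something like the following sketch. Everything except the trace ID is illustrative, and the exact field name must match what your log schema uses:
```json
{
  "time": "2021-11-04T15:07:25-05:00",
  "message": "GET /search/tag/list HTTP/1.0 200",
  "traceId": "0af7651916cd43dd8448eb211c80319c"
}
```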
### View surrounding events
If you want to know more about a log event you're looking at, you can select **View surrounding events** to get a bigger picture of what was happening around the time of interest.
![Surrounding Events]({{site.url}}{{site.baseurl}}/images/surrounding_events.gif)
### Live tail
If you prefer watching events happen live, you can configure an interval so that event analytics automatically refreshes the content. Live tail lets you stream logs live to OpenSearch observability event analytics based on the provided PPL query and offers rich functionality such as filters. This improves your debugging experience and lets you monitor your logs in real time without having to refresh manually.
You can also choose and switch between intervals to dictate how often live tail streams live logs. This feature is similar to the CLI's `tail -f` command in that it retrieves only the most recent live logs, potentially skipping a large portion of older ones. Live tail also shows the total count of live logs that OpenSearch received during the live stream, which you can use to better understand the incoming traffic.
![Live Tail]({{site.url}}{{site.baseurl}}/images/live_tail.gif)
View File
@ -0,0 +1,28 @@
---
layout: default
title: About Observability
nav_order: 1
has_children: false
redirect_from:
- /observability-plugin/
- /observability-plugin/
---
# About Observability
OpenSearch Dashboards
{: .label .label-yellow :}
Observability is a collection of plugins and applications that let you visualize data-driven events by using Piped Processing Language (PPL) to explore, discover, and query data stored in OpenSearch.
Your experience of exploring data might differ, but if you're new to exploring data to create visualizations, we recommend trying a workflow like the following:
1. Explore data over a certain timeframe using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index).
2. Use [event analytics]({{site.url}}{{site.baseurl}}/observability-plugin/event-analytics) to turn data-driven events into visualizations.
![Sample Event Analytics View]({{site.url}}{{site.baseurl}}/images/event-analytics.png)
3. Create [operational panels]({{site.url}}{{site.baseurl}}/observability-plugin/operational-panels) and add visualizations to compare data the way you like.
![Sample Operational Panel View]({{site.url}}{{site.baseurl}}/images/operational-panel.png)
4. Use [log analytics]({{site.url}}{{site.baseurl}}/observability-plugin/log-analytics) to transform unstructured log data.
5. Use [trace analytics]({{site.url}}{{site.baseurl}}/observability-plugin/trace/index) to create traces and dive deep into your data.
![Sample Trace Analytics View]({{site.url}}{{site.baseurl}}/images/observability-trace.png)
6. Leverage [notebooks]({{site.url}}{{site.baseurl}}/observability-plugin/notebooks) to combine different visualizations and code blocks that you can share with team members.
![Sample Notebooks View]({{site.url}}{{site.baseurl}}/images/notebooks.png)
View File
@ -0,0 +1,93 @@
---
layout: default
title: Log analytics
nav_order: 70
---
# Log Ingestion
Log ingestion provides a way to transform unstructured log data into structured data and ingest it into OpenSearch. Structured log data allows for improved queries and filtering based on the data format when searching logs for an event.
## Get started with log ingestion
OpenSearch Log Ingestion consists of three components---[Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/index/), [OpenSearch]({{site.url}}{{site.baseurl}}/), and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/)---that fit into the OpenSearch ecosystem. The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.
### Basic flow of data
![Log data flow diagram from a distributed application to OpenSearch]({{site.url}}{{site.baseurl}}/images/la.png)
1. Log ingestion relies on you adding log collection to your application's environment to gather and send log data.
(In the [example](#example) below, [Fluent Bit](https://docs.fluentbit.io/manual/) is used as a log collector that collects log data from a file and sends it to Data Prepper.)
2. [Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/index/) receives the log data, transforms it into a structured format, and indexes it on an OpenSearch cluster.
3. The data can then be explored through OpenSearch search queries or the **Discover** page in OpenSearch Dashboards.
### Example
This example mimics the writing of log entries to a log file that are then processed by Data Prepper and stored in OpenSearch.
Download or clone the [Data Prepper repository](https://github.com/opensearch-project/data-prepper). Then navigate to `examples/log-ingestion/` and open `docker-compose.yml` in a text editor. This file contains a container for:
- [Fluent Bit](https://docs.fluentbit.io/manual/) (`fluent-bit`)
- Data Prepper (`data-prepper`)
- A single-node OpenSearch cluster (`opensearch`)
- OpenSearch Dashboards (`opensearch-dashboards`).
Close the file and run `docker-compose up --build` to start the containers.
After the containers start, your ingestion pipeline is set up and ready to ingest log data. The `fluent-bit` container is configured to read log data from `test.log`. Run the following command to generate log data to send to the log ingestion pipeline.
```
echo '63.173.168.120 - - [04/Nov/2021:15:07:25 -0500] "GET /search/tag/list HTTP/1.0" 200 5003' >> test.log
```
Fluent Bit will collect the log data and send it to Data Prepper:
```
[2021/12/02 15:35:41] [ info] [output:http:http.0] data-prepper:2021, HTTP status=200
200 OK
```
Data Prepper will process the log and index it:
```
2021-12-02T15:35:44,499 [log-pipeline-processor-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - log-pipeline Worker: Processing 1 records from buffer
```
This should result in a single document being written to the OpenSearch cluster in the `apache_logs` index, as defined in the `log_pipeline.yaml` file.
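If you want to inspect or adapt that pipeline, its contents look roughly like the following sketch. Key names such as `processor` versus `prepper`, the grok pattern, and the sink options vary between Data Prepper releases, so treat this as an approximation rather than the authoritative file:
```yml
log-pipeline:
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: [ "https://opensearch:9200" ]
        insecure: true
        username: admin
        password: admin
        index: apache_logs
```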
Run the following command to see one of the raw documents in the OpenSearch cluster:
```bash
curl -X GET -u 'admin:admin' -k 'https://localhost:9200/apache_logs/_search?pretty&size=1'
```
The response should show the parsed log data:
```
"hits" : [
{
"_index" : "apache_logs",
"_type" : "_doc",
"_id" : "yGrJe30BgI2EWNKtDZ1g",
"_score" : 1.0,
"_source" : {
"date" : 1.638459307042312E9,
"log" : "63.173.168.120 - - [04/Nov/2021:15:07:25 -0500] \"GET /search/tag/list HTTP/1.0\" 200 5003",
"request" : "/search/tag/list",
"auth" : "-",
"ident" : "-",
"response" : "200",
"bytes" : "5003",
"clientip" : "63.173.168.120",
"verb" : "GET",
"httpversion" : "1.0",
"timestamp" : "04/Nov/2021:15:07:25 -0500"
}
}
]
```
The same data can be viewed in OpenSearch Dashboards by visiting the **Discover** page and searching the `apache_logs` index. Remember, you must create an index pattern in OpenSearch Dashboards if this is your first time searching for the index.
View File
@ -0,0 +1,56 @@
---
layout: default
title: Observability security
nav_order: 5
has_children: false
---
# Observability security
You can use the security plugin with Observability in OpenSearch to limit non-admin users to specific actions. For example, you might want some users to only view visualizations, notebooks, and other Observability objects, while others can create and modify them.
## Basic permissions
The security plugin has two built-in roles that cover most Observability use cases: `observability_full_access` and `observability_read_access`. For descriptions of each, see [Predefined roles]({{site.url}}{{site.baseurl}}/security-plugin/access-control/users-roles#predefined-roles). If you don't see these predefined roles in OpenSearch Dashboards, you can create them with the following commands:
```json
PUT _plugins/_security/api/roles/observability_read_access
{
"cluster_permissions": [
"cluster:admin/opensearch/observability/get"
]
}
```
```json
PUT _plugins/_security/api/roles/observability_full_access
{
"cluster_permissions": [
"cluster:admin/opensearch/observability/*"
]
}
```
If these roles don't meet your needs, mix and match individual Observability [permissions]({{site.url}}{{site.baseurl}}/security-plugin/access-control/permissions/) to suit your use case. For example, the `cluster:admin/opensearch/observability/create` permission lets you create Observability objects (visualizations, operational panels, notebooks, etc.).
The following is an example role that provides access to Observability:
```json
PUT _plugins/_security/api/roles/observability_permissions
{
"cluster_permissions": [
"cluster:admin/opensearch/observability/create",
"cluster:admin/opensearch/observability/update",
"cluster:admin/opensearch/observability/delete",
"cluster:admin/opensearch/observability/get"
],
"index_permissions": [{
"index_patterns": [".opensearch-observability"],
"allowed_actions": ["write", "read", "search"]
}],
"tenant_permissions": [{
"tenant_patterns": ["global_tenant"],
"allowed_actions": ["opensearch_dashboards_all_write"]
}]
}
```
View File
@ -0,0 +1,25 @@
---
layout: default
title: Operational panels
nav_order: 30
---
# Operational panels
Operational panels in OpenSearch Dashboards are collections of visualizations generated using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) (PPL) queries.
## Get started with operational panels
If you want to start using operational panels without adding any data, expand the **Action** menu, choose **Add samples**, and Dashboards adds a set of operational panels with saved visualizations for you to explore.
## Create an operational panel
To create an operational panel and add visualizations:
1. From the **Add Visualization** dropdown menu, choose **Select Existing Visualization** or **Create New Visualization**, which takes you to the [event analytics]({{site.url}}{{site.baseurl}}/observability-plugin/event-analytics) explorer, where you can use PPL to create visualizations.
1. If you're adding an existing visualization, choose it from the dropdown menu.
1. Choose **Add**.
![Sample operational panel]({{site.url}}{{site.baseurl}}/images/operational-panel.png)
To search for a particular visualization in your operational panels, use PPL queries to search for data you've already added to your panel.
View File
@ -57,7 +57,7 @@ search source=accounts;
To get all documents from the `accounts` index that have either `account_number` equal to 1 or have `gender` as `F`:
```sql
search source=accounts account_number=1 or gender="F";
search source=accounts account_number=1 or gender=\"F\";
```
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
@ -79,8 +79,8 @@ Field | Description | Type | Required | Default
:--- | :--- |:--- |:--- |:---
`int` | Retain the specified number of duplicate events for each combination. The number must be greater than 0. If you do not specify a number, only the first occurring event is kept and all other duplicates are removed from the results. | `string` | No | 1
`keepempty` | If true, keep the document if any field in the field list has a null value or a field missing. | `nested list of objects` | No | False
`consecutive` | If true, remove only consecutive events with duplicate combinations of values. | No | False | -
`field-list` | Specify a comma-delimited field list. At least one field is required. | Yes | - | -
`consecutive` | If true, remove only consecutive events with duplicate combinations of values. | `Boolean` | No | False
`field-list` | Specify a comma-delimited field list. At least one field is required. | `String` or comma-separated list of strings | Yes | -
*Example 1*: Dedup by one field
@ -151,6 +151,10 @@ search source=accounts | dedup gender consecutive=true | fields account_number,
13 | F
18 | M
### Limitations
The `dedup` command is not rewritten to OpenSearch DSL; it is only executed on the coordinating node.
## eval
The `eval` command evaluates an expression and appends its result to the search result.
@ -211,14 +215,19 @@ search source=accounts | eval doubleAge = age * 2, ddAge = doubleAge * 2 | field
| 28 | 56 | 112
| 33 | 66 | 132
### Limitations
The `eval` command is not rewritten to OpenSearch DSL; it is only executed on the coordinating node.
## fields
Use the `field` command to keep or remove fields from a search result.
Use the `fields` command to keep or remove fields from a search result.
### Syntax
```sql
field [+|-] <field-list>
fields [+|-] <field-list>
```
Field | Description | Required | Default
@ -256,6 +265,80 @@ search source=accounts | fields account_number, firstname, lastname | fields - a
| Nanette | Bates
| Dale | Adams
## parse
Use the `parse` command to parse a text field using a regular expression and append the result to the search result.
### Syntax
```sql
parse <field> <regular-expression>
```
Field | Description | Required
:--- | :--- |:---
field | A text field. | Yes
regular-expression | The regular expression used to extract new fields from the given text field. If a new field name already exists, it replaces the original field. | Yes
The regular expression is used to match the whole text field of each document with the Java regex engine. Each named capture group in the expression becomes a new `STRING` field.
*Example 1*: Create new field
The example shows how to create a new field, `host`, for each document. `host` will be the hostname after `@` in the `email` field. Parsing a null field returns an empty string.
```sql
os> source=accounts | parse email '.+@(?<host>.+)' | fields email, host ;
fetched rows / total rows = 4/4
```
| email | host
:--- | :--- |
| amberduke@pyrami.com | pyrami.com
| hattiebond@netagy.com | netagy.com
| null | null
| daleadams@boink.com | boink.com
*Example 2*: Override the existing field
The example shows how to override the existing `address` field with the street number removed.
```sql
os> source=accounts | parse address '\d+ (?<address>.+)' | fields address ;
fetched rows / total rows = 4/4
```
| address
:--- |
| Holmes Lane
| Bristol Street
| Madison Street
| Hutchinson Court
*Example 3*: Filter and sort by a cast parsed field
The example shows how to filter and sort street numbers that are higher than 500 in the `address` field.
```sql
os> source=accounts | parse address '(?<streetNumber>\d+) (?<street>.+)' | where cast(streetNumber as int) > 500 | sort num(streetNumber) | fields streetNumber, street ;
fetched rows / total rows = 3/3
```
| streetNumber | street
:--- | :--- |
| 671 | Bristol Street
| 789 | Madison Street
| 880 | Holmes Lane
### Limitations
A few limitations exist when using the `parse` command:
- Fields defined by `parse` cannot be parsed again. For example, `source=accounts | parse address '\d+ (?<street>.+)' | parse street '\w+ (?<road>\w+)' ;` will fail to return any expressions.
- Fields defined by `parse` cannot be overridden with other commands. For example, in `source=accounts | parse address '\d+ (?<street>.+)' | eval street='1' | where street='1' ;`, the `where` clause will not match any documents because `street` cannot be overridden.
- The text field used by `parse` cannot be overridden. For example, in `source=accounts | parse address '\d+ (?<street>.+)' | eval address='1' ;`, `street` will not be parsed because `address` is overridden.
- Fields defined by `parse` cannot be filtered or sorted after they are used in the `stats` command. For example, in `source=accounts | parse email '.+@(?<host>.+)' | stats avg(age) by host | where host=pyrami.com ;`, the `where` clause cannot filter on the parsed `host` field.
## rename
Use the `rename` command to rename one or more fields in the search result.
@ -301,6 +384,10 @@ search source=accounts | rename account_number as an, employer as emp | fields a
| 13 | Quility
| 18 | null
### Limitations
The `rename` command is not rewritten to OpenSearch DSL; it is only executed on the coordinating node.
## sort
Use the `sort` command to sort search results by a specified field.
@ -498,7 +585,7 @@ Field | Description | Required
To get all documents from the `accounts` index where `account_number` is 1 or gender is `F`:
```sql
search source=accounts | where account_number=1 or gender="F" | fields account_number, gender;
search source=accounts | where account_number=1 or gender=\"F\" | fields account_number, gender;
```
| account_number | gender
@ -547,6 +634,10 @@ search source=accounts | fields firstname, age | head 2;
| Amber | 32
| Hattie | 36
### Limitations
The `head` command is not rewritten to OpenSearch DSL; it is only executed on the coordinating node.
## rare
Use the `rare` command to find the least common values of all fields in a field list.
@ -590,6 +681,10 @@ search source=accounts | rare age by gender;
| M | 32
| M | 33
### Limitations
The `rare` command is not rewritten to OpenSearch DSL; it is only executed on the coordinating node.
## top {#top-command}
Use the `top` command to find the most common values of all fields in the field list.
@ -643,3 +738,177 @@ search source=accounts | top 1 age by gender;
:--- | :--- |
| F | 28
| M | 32
### Limitations
The `top` command is not rewritten to OpenSearch DSL; it is only executed on the coordinating node.
## match
Use the `match` command to search documents that match a `string`, `number`, `date`, or `boolean` value for a given field.
### Syntax
```sql
match(field_expression, query_expression[, option=<option_value>]*)
```
You can specify the following options:
- `analyzer`
- `auto_generate_synonyms_phrase`
- `fuzziness`
- `max_expansions`
- `prefix_length`
- `fuzzy_transpositions`
- `fuzzy_rewrite`
- `lenient`
- `operator`
- `minimum_should_match`
- `zero_terms_query`
- `boost`
*Example 1*: Search the `message` field:
```json
GET my_index/_search
{
"query": {
"match": {
"message": "this is a test"
}
}
}
```
PPL query:
```sql
search source=my_index | match field=message query="this is a test"
```
*Example 2*: Search the `message` field with the `operator` parameter:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "this is a test",
"operator": "and"
}
}
}
}
```
PPL query:
```sql
search source=my_index | match field=message query="this is a test" operator=and
```
*Example 3*: Search the `message` field with the `operator` and `zero_terms_query` parameters:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "to be or not to be",
"operator": "and",
"zero_terms_query": "all"
}
}
}
}
```
PPL query:
```ppl
search source=my_index | where match(message, "this is a test", operator=and, zero_terms_query=all)
```
## ad
The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin on the search result returned by a PPL command. Based on the input, the plugin uses two types of RCF algorithms: fixed in time RCF for processing time-series data and batch RCF for processing non-time-series data.
### Fixed In Time RCF For Time-series Data Command Syntax
```sql
ad <shingle_size> <time_decay> <time_field>
```
Field | Description | Required
:--- | :--- |:---
`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No
`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No
`time_field` | Specifies the time field for RCF to use as time-series data. Must be either a long value, such as a timestamp in milliseconds, or a string value in "yyyy-MM-dd HH:mm:ss". | Yes
### Batch RCF for Non-time-series Data Command Syntax
```sql
ad <shingle_size> <time_decay>
```
Field | Description | Required
:--- | :--- |:---
`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No
`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No
*Example 1*: Detecting events in New York City from taxi ridership data with time-series data
The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
PPL query:
```sql
os> source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0
```
value | timestamp | score | anomaly_grade
:--- | :--- |:--- | :---
10844.0 | 1404172800000 | 0.0 | 0.0
*Example 2*: Detecting events in New York City from taxi ridership data with non-time-series data
PPL query:
```sql
os> source=nyc_taxi | fields value | AD | where value=10844.0
```
value | score | anomalous
:--- | :--- |:---
| 10844.0 | 0.0 | false
## kmeans
The `kmeans` command applies the ML Commons plugin's kmeans algorithm to the search results of the provided PPL command.
### Syntax
```sql
kmeans <cluster-number>
```
For `cluster-number`, enter the number of clusters you want to group your data points into.
*Example*
The example shows how to classify three Iris species (Iris setosa, Iris virginica, and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
PPL query:
```sql
os> source=iris_data | fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm | kmeans 3
```
sepal_length_in_cm | sepal_width_in_cm | petal_length_in_cm | petal_width_in_cm | ClusterID
:--- | :--- |:--- | :--- | :---
| 5.1 | 3.5 | 1.4 | 0.2 | 1
| 5.6 | 3.0 | 4.1 | 1.3 | 0
| 6.7 | 2.5 | 5.8 | 1.8 | 2
View File
@ -1,7 +1,7 @@
---
layout: default
title: Piped processing language
nav_order: 42
nav_order: 40
has_children: true
has_toc: false
redirect_from:
View File
@ -9,7 +9,6 @@ nav_order: 1
OpenSearch Trace Analytics consists of two components---Data Prepper and the Trace Analytics OpenSearch Dashboards plugin---that fit into the OpenTelemetry and OpenSearch ecosystems. The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.
## Basic flow of data
![Data flow diagram from a distributed application to OpenSearch]({{site.url}}{{site.baseurl}}/images/ta.svg)
@ -20,10 +19,9 @@ OpenSearch Trace Analytics consists of two components---Data Prepper and the Tra
1. The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/) receives data from the application and formats it into OpenTelemetry data.
1. [Data Prepper]({{site.url}}{{site.baseurl}}/monitoring-plugins/trace/data-prepper/) processes the OpenTelemetry data, transforms it for use in OpenSearch, and indexes it on an OpenSearch cluster.
1. The [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/trace/ta-dashboards/) displays the data in near real-time as a series of charts and tables, with an emphasis on service architecture, latency, error rate, and throughput.
1. [Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/index/) processes the OpenTelemetry data, transforms it for use in OpenSearch, and indexes it on an OpenSearch cluster.
1. The [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/) displays the data in near real-time as a series of charts and tables, with an emphasis on service architecture, latency, error rate, and throughput.
## Jaeger HotROD
@ -80,4 +78,4 @@ curl -X GET -u 'admin:admin' -k 'https://localhost:9200/otel-v1-apm-span-000001/
Navigate to `http://localhost:5601` in a web browser and choose **Trace Analytics**. You can see the results of your single click in the Jaeger HotROD web interface: the number of traces per API and HTTP method, latency trends, a color-coded map of the service architecture, and a list of trace IDs that you can use to drill down on individual operations.
If you don't see your trace, adjust the timeframe in OpenSearch Dashboards. For more information on using the plugin, see [OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/trace/ta-dashboards/).
If you don't see your trace, adjust the timeframe in OpenSearch Dashboards. For more information on using the plugin, see [OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/).
View File
@ -1,11 +1,9 @@
---
layout: default
title: Trace analytics
nav_order: 48
nav_order: 60
has_children: true
has_toc: false
redirect_from:
- /monitoring-plugins/trace/
---
# Trace Analytics
View File
@ -709,7 +709,19 @@ GET opensearch_dashboards_sample_data_logs/_search
}
}
```
If you add a document with a malformed field to an index that has `ignore_malformed` set to `false` in its mappings (the default), OpenSearch rejects the entire document. You can set `ignore_malformed` to `true` to specify that OpenSearch should ignore malformed fields instead.
```json
...
"mappings": {
"properties": {
"ips": {
"type": "ip_range",
"ignore_malformed": true
}
}
}
```
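A complete, hypothetical request that creates an index with this mapping might look like the following (the index name is a placeholder):
```json
PUT test-index
{
  "mappings": {
    "properties": {
      "ips": {
        "type": "ip_range",
        "ignore_malformed": true
      }
    }
  }
}
```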
## filter, filters
A `filter` aggregation is a query clause, exactly like a search query such as `match`, `term`, or `range`. You can use the `filter` aggregation to narrow down the entire set of documents to a specific set before creating buckets.
View File
@ -12,23 +12,26 @@ OpenSearch can operate as a single-node or multi-node cluster. The steps to conf
To create and deploy an OpenSearch cluster according to your requirements, it's important to understand how node discovery and cluster formation work and what settings govern them.
There are many ways to design a cluster. The following illustration shows a basic architecture:
There are many ways to design a cluster. The following illustration shows a basic architecture: a four-node cluster with one dedicated cluster manager node, one dedicated coordinating node, and two data nodes that are cluster manager eligible and are also used for ingesting data.
The nomenclature recently changed for the master node; it is now called the cluster manager node.
{: .note }
![multi-node cluster architecture diagram]({{site.url}}{{site.baseurl}}/images/cluster.png)
This is a four-node cluster that has one dedicated master node, one dedicated coordinating node, and two data nodes that are master-eligible and also used for ingesting data.
### Nodes
The following table provides brief descriptions of the node types:
Node type | Description | Best practices for production
:--- | :--- | :-- |
`Master` | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indices, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated master nodes in three different zones is the right approach for almost all production use cases. This configuration ensures your cluster never loses quorum. Two nodes will be idle for most of the time except when one node goes down or needs some maintenance.
`Master-eligible` | Elects one node among them as the master node through a voting process. | For production clusters, make sure you have dedicated master nodes. The way to achieve a dedicated node type is to mark all other node types as false. In this case, you have to mark all the other nodes as not master-eligible.
`Data` | Stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards. These are the worker nodes of your cluster and need more disk space than any other node type. | As you add data nodes, keep them balanced between zones. For example, if you have three zones, add data nodes in multiples of three, one for each zone. We recommend using storage and RAM-heavy nodes.
`Ingest` | Preprocesses data before storing it in the cluster. Runs an ingest pipeline that transforms your data before adding it to an index. | If you plan to ingest a lot of data and run complex ingest pipelines, we recommend you use dedicated ingest nodes. You can also optionally offload your indexing from the data nodes so that your data nodes are used exclusively for searching and aggregating.
`Coordinating` | Delegates client requests to the shards on the data nodes, collects and aggregates the results into one final result, and sends this result back to the client. | A couple of dedicated coordinating-only nodes is appropriate to prevent bottlenecks for search-heavy workloads. We recommend using CPUs with as many cores as you can.
Cluster manager | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indexes, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated cluster manager nodes in three different zones is the right approach for almost all production use cases. This configuration ensures your cluster never loses quorum. Two nodes will be idle for most of the time except when one node goes down or needs some maintenance.
Cluster manager eligible | Elects one node among them as the cluster manager node through a voting process. | For production clusters, make sure you have dedicated cluster manager nodes. The way to achieve a dedicated node type is to mark all other node types as false. In this case, you have to mark all the other nodes as not cluster manager eligible.
Data | Stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards. These are the worker nodes of your cluster and need more disk space than any other node type. | As you add data nodes, keep them balanced between zones. For example, if you have three zones, add data nodes in multiples of three, one for each zone. We recommend using storage and RAM-heavy nodes.
Ingest | Pre-processes data before storing it in the cluster. Runs an ingest pipeline that transforms your data before adding it to an index. | If you plan to ingest a lot of data and run complex ingest pipelines, we recommend you use dedicated ingest nodes. You can also optionally offload your indexing from the data nodes so that your data nodes are used exclusively for searching and aggregating.
Coordinating | Delegates client requests to the shards on the data nodes, collects and aggregates the results into one final result, and sends this result back to the client. | A couple of dedicated coordinating-only nodes is appropriate to prevent bottlenecks for search-heavy workloads. We recommend using CPUs with as many cores as you can.
By default, each node is a master-eligible, data, ingest, and coordinating node. Deciding on the number of nodes, assigning node types, and choosing the hardware for each node type depends on your use case. You must take into account factors like the amount of time you want to hold on to your data, the average size of your documents, your typical workload (indexing, searches, aggregations), your expected price-performance ratio, your risk tolerance, and so on.
By default, each node is a cluster-manager-eligible, data, ingest, and coordinating node. Deciding on the number of nodes, assigning node types, and choosing the hardware for each node type depends on your use case. You must take into account factors like the amount of time you want to hold on to your data, the average size of your documents, your typical workload (indexing, searches, aggregations), your expected price-performance ratio, your risk tolerance, and so on.
After you assess all these requirements, we recommend you use a benchmark testing tool like Rally to provision a small sample cluster and run tests with varying workloads and configurations. Compare and analyze the system and query metrics for these tests to design an optimum architecture. To get started with Rally, see the [Rally documentation](https://esrally.readthedocs.io/en/stable/).
@ -58,36 +61,22 @@ cluster.name: opensearch-cluster
Make the same change on all the nodes to make sure that they'll join to form a cluster.
## Step 2: Set node attributes for each node in a cluster
After you name the cluster, set node attributes for each node in your cluster.
#### Cluster manager node
#### Master node
Give your master node a name. If you don't specify a name, OpenSearch assigns a machine-generated name that makes the node difficult to monitor and troubleshoot.
Give your cluster manager node a name. If you don't specify a name, OpenSearch assigns a machine-generated name that makes the node difficult to monitor and troubleshoot.
```yml
node.name: opensearch-master
node.name: opensearch-cluster_manager
```
You can also explicitly specify that this node is a master node. This is already true by default, but adding it makes it easier to identify the master node:
You can also explicitly specify that this node is a cluster manager node, even though it is already set to true by default. Set the node role to `cluster_manager` to make it easier to identify the cluster manager node.
```yml
node.master: true
```
Then make the node a dedicated master that wont perform double-duty as a data node:
```yml
node.data: false
```
Specify that this node will not be used for ingesting data:
```yml
node.ingest: false
node.roles: [ cluster_manager ]
```
#### Data nodes
@ -97,16 +86,15 @@ Change the name of two nodes to `opensearch-d1` and `opensearch-d2`, respectivel
```yml
node.name: opensearch-d1
```
```yml
node.name: opensearch-d2
```
You can make them master-eligible data nodes that will also be used for ingesting data:
You can make them cluster-manager-eligible data nodes that will also be used for ingesting data:
```yml
node.master: true
node.data: true
node.ingest: true
node.roles: [ data, ingest ]
```
You can also specify any other attributes that you'd like to set for the data nodes.
@ -119,12 +107,10 @@ Change the name of the coordinating node to `opensearch-c1`:
node.name: opensearch-c1
```
Every node is a coordinating node by default, so to make this node a dedicated coordinating node, set `node.master`, `node.data`, and `node.ingest` to `false`:
Every node is a coordinating node by default, so to make this node a dedicated coordinating node, set `node.roles` to an empty list:
```yml
node.master: false
node.data: false
node.ingest: false
node.roles: []
```
## Step 3: Bind a cluster to specific IP addresses
@ -141,25 +127,22 @@ To form a multi-node cluster, specify the IP address of the node:
network.host: <IP address of the node>
```
Make sure to configure these settings on all of your nodes.
## Step 4: Configure discovery hosts for a cluster
Now that you've configured the network hosts, you need to configure the discovery hosts.
Zen Discovery is the built-in, default mechanism that uses [unicast](https://en.wikipedia.org/wiki/Unicast) to find other nodes in the cluster.
You can generally just add all your master-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other master-eligible nodes, determines which one is the master, and asks to join the cluster.
You can generally just add all of your cluster-manager-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other cluster-manager-eligible nodes, determines which one is the cluster manager, and asks to join the cluster.
For example, for `opensearch-master` the line looks something like this:
For example, for `opensearch-cluster_manager` the line looks something like this:
```yml
discovery.seed_hosts: ["<private IP of opensearch-d1>", "<private IP of opensearch-d2>", "<private IP of opensearch-c1>"]
```
## Step 5: Start the cluster
After you set the configurations, start OpenSearch on all nodes:
@ -181,8 +164,8 @@ curl -XGET https://<private-ip>:9200/_cat/nodes?v -u 'admin:admin' --insecure
```
```
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
x.x.x.x 13 61 0 0.02 0.04 0.05 mi * opensearch-master
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role cluster_manager name
x.x.x.x 13 61 0 0.02 0.04 0.05 mi * opensearch-cluster_manager
x.x.x.x 16 60 0 0.06 0.05 0.05 md - opensearch-d1
x.x.x.x 34 38 0 0.12 0.07 0.06 md - opensearch-d2
x.x.x.x 23 38 0 0.12 0.07 0.06 md - opensearch-c1
@ -190,7 +173,6 @@ x.x.x.x 23 38 0 0.12 0.07 0.06 md - o
To better understand and monitor your cluster, use the [cat API]({{site.url}}{{site.baseurl}}/opensearch/catapis/).
## (Advanced) Step 6: Configure shard allocation awareness or forced awareness
If your nodes are spread across several geographical zones, you can configure shard allocation awareness to allocate all replica shards to a zone thats different from their primary shard.
@ -202,6 +184,7 @@ To configure shard allocation awareness, add zone attributes to `opensearch-d1`
```yml
node.attr.zone: zoneA
```
```yml
node.attr.zone: zoneB
```
@ -242,7 +225,6 @@ If that is not the case, and `opensearch-d1` and `opensearch-d2` do not have the
Choosing allocation awareness or forced awareness depends on how much space you might need in each zone to balance your primary and replica shards.
## (Advanced) Step 7: Set up a hot-warm architecture
You can design a hot-warm architecture where you first index your data to hot nodes---fast and expensive---and after a certain period of time move them to warm nodes---slow and cheap.
@ -256,6 +238,7 @@ To configure a hot-warm storage architecture, add `temp` attributes to `opensear
```yml
node.attr.temp: hot
```
```yml
node.attr.temp: warm
```
@ -326,7 +309,6 @@ A popular approach is to configure your [index templates]({{site.url}}{{site.bas
You can then use the [Index State Management (ISM)]({{site.url}}{{site.baseurl}}/im-plugin/) plugin to periodically check the age of an index and specify actions to take on it. For example, when the index reaches a specific age, change the `index.routing.allocation.require.temp` setting to `warm` to automatically move your data from hot nodes to warm nodes.
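The manual equivalent of that step, shown here with a hypothetical index name, is a settings update like the following; an ISM policy simply automates it on a schedule:
```json
PUT logs-2020-01-01/_settings
{
  "index.routing.allocation.require.temp": "warm"
}
```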
## Next steps
If you are using the security plugin, the previous request to `_cat/nodes?v` might have failed with an initialization error. For full guidance around using the security plugin, see [Security configuration]({{site.url}}{{site.baseurl}}/security-plugin/configuration/index/).
View File
@ -74,7 +74,6 @@ You don't mark settings in `opensearch.yml` as persistent or transient, and sett
```yml
cluster.name: my-application
action.auto_create_index: true
compatibility.override_main_response_version: true
```
The demo configuration includes a number of settings for the security plugin that you should modify before using OpenSearch for a production workload. To learn more, see [Security]({{site.url}}{{site.baseurl}}/security-plugin/).
View File
@ -262,4 +262,4 @@ You can use wildcards to delete more than one data stream.
We recommend deleting data from a data stream using an ISM policy.
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/) and [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/) and [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.
View File
@ -6,13 +6,13 @@ nav_order: 12
# Index aliases
An alias is a virtual index name that can point to one or more indices.
An alias is a virtual index name that can point to one or more indexes.
If your data is spread across multiple indices, rather than keeping track of which indices to query, you can create an alias and query it instead.
If your data is spread across multiple indexes, rather than keeping track of which indexes to query, you can create an alias and query it instead.
For example, if youre storing logs into indices based on the month and you frequently query the logs for the previous two months, you can create a `last_2_months` alias and update the indices it points to each month.
For example, if youre storing logs into indexes based on the month and you frequently query the logs for the previous two months, you can create a `last_2_months` alias and update the indexes it points to each month.
Because you can change the indices an alias points to at any time, referring to indices using aliases in your applications allows you to reindex your data without any downtime.
Because you can change the indexes an alias points to at any time, referring to indexes using aliases in your applications allows you to reindex your data without any downtime.
---
@ -63,7 +63,7 @@ To check if `alias1` refers to `index-1`, run the following command:
GET alias1
```
## Add or remove indices
## Add or remove indexes
You can perform multiple actions in the same `_aliases` operation.
For example, the following command removes `index-1` and adds `index-2` to `alias1`:
@ -90,7 +90,7 @@ POST _aliases
The `add` and `remove` actions occur atomically, which means that at no point will `alias1` point to both `index-1` and `index-2`.
You can also add indices based on an index pattern:
You can also add indexes based on an index pattern:
```json
POST _aliases
@ -108,7 +108,7 @@ POST _aliases
## Manage aliases
To list the mapping of aliases to indices, run the following command:
To list the mapping of aliases to indexes, run the following command:
```json
GET _cat/aliases?v
@ -121,7 +121,7 @@ alias index filter routing.index routing.search
alias1 index-1 * - -
```
To check which indices an alias points to, run the following command:
To check which indexes an alias points to, run the following command:
```json
GET _alias/alias1
@ -166,7 +166,7 @@ PUT index-1
## Create filtered aliases
You can create a filtered alias to access a subset of documents or fields from the underlying indices.
You can create a filtered alias to access a subset of documents or fields from the underlying indexes.
This command adds only a specific timestamp field to `alias1`:
View File
@ -68,16 +68,16 @@ PUT movies/_doc/1
Because you must specify an ID, if you run this command 10 times, you still have just one document indexed with the `_version` field incremented to 10.
Indices default to one primary shard and one replica. If you want to specify non-default settings, create the index before adding documents:
Indexes default to one primary shard and one replica. If you want to specify non-default settings, create the index before adding documents:
```json
PUT more-movies
{ "settings": { "number_of_shards": 6, "number_of_replicas": 2 } }
```
## Naming restrictions for indices
## Naming restrictions for indexes
OpenSearch indices have the following naming restrictions:
OpenSearch indexes have the following naming restrictions:
- All letters must be lowercase.
- Index names can't begin with underscores (`_`) or hyphens (`-`).
View File
@ -6,7 +6,7 @@ nav_order: 15
# Index templates
Index templates let you initialize new indices with predefined mappings and settings. For example, if you continuously index log data, you can define an index template so that all of these indices have the same number of shards and replicas.
Index templates let you initialize new indexes with predefined mappings and settings. For example, if you continuously index log data, you can define an index template so that all of these indexes have the same number of shards and replicas.
### Create a template
@ -16,7 +16,7 @@ To create an index template, use a POST request:
POST _index_template
```
This command creates a template named `daily_logs` and applies it to any new index whose name matches the regular expression `logs-2020-01-*` and also adds it to the `my_logs` alias:
This command creates a template named `daily_logs` and applies it to any new index whose name matches the pattern `logs-2020-01-*` and also adds it to the `my_logs` alias:
```json
PUT _index_template/daily_logs
@ -95,7 +95,9 @@ GET logs-2020-01-01
}
```
Any additional indices that match this pattern---`logs-2020-01-02`, `logs-2020-01-03`, and so on---will inherit the same mappings and settings.
Any additional indexes that match this pattern---`logs-2020-01-02`, `logs-2020-01-03`, and so on---will inherit the same mappings and settings.
Index patterns cannot contain any of the following characters: `:`, `"`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, and `<`.
### Retrieve a template
@ -111,12 +113,6 @@ To find a template by its name:
GET _index_template/daily_logs
```
To get a list of all your templates:
```json
GET _index_template/daily_logs
```
To get a list of all templates that match a pattern:
```json
@ -131,7 +127,7 @@ HEAD _index_template/<name>
### Configure multiple templates
You can create multiple index templates for your indices. If the index name matches more than one template, OpenSearch merges all mappings and settings from all matching templates and applies them to the index.
You can create multiple index templates for your indexes. If the index name matches more than one template, OpenSearch merges all mappings and settings from all matching templates and applies them to the index.
The settings from the more recently created index templates override the settings of older index templates. So, you can first define a few common settings in a generic template that can act as a catch-all and then add more specialized settings as required.
View File
@ -0,0 +1,66 @@
---
layout: default
title: Ansible playbook
parent: Install OpenSearch
nav_order: 60
---
# Ansible playbook
You can use an Ansible playbook to install and configure a production-ready OpenSearch cluster along with OpenSearch Dashboards.
The Ansible playbook only supports deployment of OpenSearch and OpenSearch Dashboards to CentOS 7 hosts.
{: .note }
## Prerequisites
Make sure you have [Ansible](https://www.ansible.com/) and [Java 8](https://www.java.com/en/download/manual.jsp) installed.
## Configuration
1. Clone the OpenSearch [ansible-playbook](https://github.com/opensearch-project/ansible-playbook) repository:
```bash
git clone https://github.com/opensearch-project/ansible-playbook
```
2. Configure the node properties in the `inventories/opensearch/hosts` file:
```bash
ansible_host=<Public IP address> ansible_user=root ip=<Private IP address / 0.0.0.0>
```
where:
- `ansible_host` is the IP address of the target node that you want the Ansible playbook to install OpenSearch and OpenSearch Dashboards on.
- `ip` is the IP address that you want OpenSearch and OpenSearch Dashboards to bind to. You can specify the private IP of the target node, `localhost`, or `0.0.0.0`.
3. You can modify the default configuration values in the `inventories/opensearch/group_vars/all/all.yml` file. For example, you can increase the Java memory heap size:
```yml
xms_value: 8
xmx_value: 8
```
Make sure you have direct SSH access to the target node as the root user.
{: .note }
## Run OpenSearch and OpenSearch Dashboards using Ansible playbook
1. Run the Ansible playbook with root privileges:
```bash
ansible-playbook -i inventories/opensearch/hosts opensearch.yml --extra-vars "admin_password=Test@123 kibanaserver_password=Test@6789"
```
You can set the passwords for reserved users (`admin` and `kibanaserver`) using the `admin_password` and `kibanaserver_password` variables.
2. After the deployment process is complete, you can access OpenSearch and OpenSearch Dashboards with the username `admin` and the password that you set for the `admin_password` variable.
If you bind `ip` to a private IP address or localhost, make sure you're logged in to the server that the playbook deployed to before you try to access OpenSearch and OpenSearch Dashboards:
```bash
curl https://localhost:9200 -u 'admin:Test@123' --insecure
```
If you bind `ip` to 0.0.0.0, then replace `localhost` with the public IP or the private IP (if it's in the same network).
View File
@ -8,12 +8,12 @@ nav_order: 2
# Operating system and JVM compatibility
- We recommend installing OpenSearch on RHEL- or Debian-based Linux distributions that use [systemd](https://en.wikipedia.org/wiki/Systemd), such as CentOS, Amazon Linux 2, and Ubuntu (LTS). OpenSearch should work on many Linux distributions, but we only test a handful.
- The OpenSearch tarball ships with a compatible version of Java in the `jdk` directory. To find its version, run `./jdk/bin/java -version`. For example, the OpenSearch 1.0.0 tarball ships with Java 15 (non-LTS).
- The OpenSearch tarball ships with a compatible version of Java in the `jdk` directory. To find its version, run `./jdk/bin/java -version`. For example, the OpenSearch 1.0.0 tarball ships with Java 15.0.1+9 (non-LTS), while OpenSearch 1.3.0 includes Java 11.0.14.1+1 (LTS).
- OpenSearch 1.0 to 1.2.4 is built and tested with Java 15, while OpenSearch 1.3.0 is built and tested with Java 8, 11 and 14.
{% comment %}`./jdk/bin/java -version` doesn't work on macOS with zsh at the moment, and I have no idea why. Maybe we need a macOS artifact. Regardless, the command works on Amazon Linux 2 with bash and presumably other distros. - aetter{% endcomment %}
To use a different Java installation, set the `OPENSEARCH_JAVA_HOME` environment variable to the Java install location. We recommend Java 11 (LTS), but OpenSearch also works with Java 8.
To use a different Java installation, set the `OPENSEARCH_JAVA_HOME` or `JAVA_HOME` environment variable to the Java install location. We recommend Java 11 (LTS), but OpenSearch also works with Java 8.
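For example, on a Linux host you might point OpenSearch at an existing JDK 11 installation before starting it; the path below is an assumption, so substitute your own install location:
```bash
# Assumed path; substitute your own Java 11 install location
export OPENSEARCH_JAVA_HOME=/usr/lib/jvm/java-11-openjdk
```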
OpenSearch version | Compatible Java versions | Recommended operating systems
:--- | :--- | :---
1.x | 8, 11 | Red Hat Enterprise Linux 7, 8; CentOS 7, 8; Amazon Linux 2; Ubuntu 16.04, 18.04, 20.04
1.0 - 1.2.x | 11, 15 | Red Hat Enterprise Linux 7, 8; CentOS 7, 8; Amazon Linux 2; Ubuntu 16.04, 18.04, 20.04
1.3.x | 8, 11, 14 | Red Hat Enterprise Linux 7, 8; CentOS 7, 8; Amazon Linux 2; Ubuntu 16.04, 18.04, 20.04
View File
@ -24,7 +24,7 @@ services:
- cluster.name=opensearch-cluster
- node.name=opensearch-node1
- discovery.seed_hosts=opensearch-node1,opensearch-node2
- cluster.initial_master_nodes=opensearch-node1,opensearch-node2
- cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
- bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
- network.host=0.0.0.0 # required if not using the demo security configuration
@ -43,11 +43,11 @@ services:
- ./admin.pem:/usr/share/opensearch/config/admin.pem
- ./admin-key.pem:/usr/share/opensearch/config/admin-key.pem
- ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml
- ./internal_users.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/internal_users.yml
- ./roles_mapping.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/roles_mapping.yml
- ./tenants.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/tenants.yml
- ./roles.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/roles.yml
- ./action_groups.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/action_groups.yml
- ./internal_users.yml:/usr/share/opensearch/config/opensearch-security/internal_users.yml
- ./roles_mapping.yml:/usr/share/opensearch/config/opensearch-security/roles_mapping.yml
- ./tenants.yml:/usr/share/opensearch/config/opensearch-security/tenants.yml
- ./roles.yml:/usr/share/opensearch/config/opensearch-security/roles.yml
- ./action_groups.yml:/usr/share/opensearch/config/opensearch-security/action_groups.yml
ports:
- 9200:9200
- 9600:9600 # required for Performance Analyzer
@ -60,7 +60,7 @@ services:
- cluster.name=opensearch-cluster
- node.name=opensearch-node2
- discovery.seed_hosts=opensearch-node1,opensearch-node2
- cluster.initial_master_nodes=opensearch-node1,opensearch-node2
- cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
- bootstrap.memory_lock=true
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
- network.host=0.0.0.0
@ -79,15 +79,15 @@ services:
- ./admin.pem:/usr/share/opensearch/config/admin.pem
- ./admin-key.pem:/usr/share/opensearch/config/admin-key.pem
- ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml
- ./internal_users.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/internal_users.yml
- ./roles_mapping.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/roles_mapping.yml
- ./tenants.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/tenants.yml
- ./roles.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/roles.yml
- ./action_groups.yml:/usr/share/opensearch/plugins/opensearch-security/securityconfig/action_groups.yml
- ./internal_users.yml:/usr/share/opensearch/config/opensearch-security/internal_users.yml
- ./roles_mapping.yml:/usr/share/opensearch/config/opensearch-security/roles_mapping.yml
- ./tenants.yml:/usr/share/opensearch/config/opensearch-security/tenants.yml
- ./roles.yml:/usr/share/opensearch/config/opensearch-security/roles.yml
- ./action_groups.yml:/usr/share/opensearch/config/opensearch-security/action_groups.yml
networks:
- opensearch-net
opensearch-dashboards
image: opensearchproject/opensearch-dashboards:{{site.opensearch_version}}
opensearch-dashboards:
image: opensearchproject/opensearch-dashboards:{{site.opensearch_dashboards_version}}
container_name: opensearch-dashboards
ports:
- 5601:5601
@ -133,7 +133,7 @@ opendistro_security.audit.config.disabled_rest_categories: NONE
opendistro_security.audit.config.disabled_transport_categories: NONE
```
Use this same override process to specify new [authentication settings]({{site.url}}{{site.baseurl}}/security-plugin/configuration/configuration/) in `/usr/share/opensearch/plugins/opensearch-security/securityconfig/config.yml`, as well as new default [internal users, roles, mappings, action groups, and tenants]({{site.url}}{{site.baseurl}}/security-plugin/configuration/yaml/).
Use this same override process to specify new [authentication settings]({{site.url}}{{site.baseurl}}/security-plugin/configuration/configuration/) in `/usr/share/opensearch/config/opensearch-security/config.yml`, as well as new default [internal users, roles, mappings, action groups, and tenants]({{site.url}}{{site.baseurl}}/security-plugin/configuration/yaml/).
To start the cluster, run `docker-compose up`.
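Once the containers are up, you can check that the cluster responds. This is only a sketch: the `admin:admin` credentials assume the demo configuration, so substitute the credentials from your own `internal_users.yml` if you changed them, and `-k` skips verification of the demo TLS certificate.

```bash
curl -k -u admin:admin https://localhost:9200
```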


@ -10,8 +10,8 @@ nav_order: 3
You can pull the OpenSearch Docker image just like any other image:
```bash
docker pull opensearchproject/opensearch:{{site.opensearch_version}}
docker pull opensearchproject/opensearch-dashboards:{{site.opensearch_version}}
docker pull opensearchproject/opensearch:latest
docker pull opensearchproject/opensearch-dashboards:latest
```
To check available versions, see [Docker Hub](https://hub.docker.com/u/opensearchproject).
@ -77,6 +77,8 @@ To stop the cluster and delete all data volumes, run:
docker-compose down -v
```
If you're running your cluster in a production environment, be sure to refer to [Important settings]({{site.url}}{{site.baseurl}}/opensearch/install/important-settings) when configuring your machine and cluster.
{: .note}
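For example, one host-level requirement covered there is raising the memory map limit on Linux, shown here only as an illustration; see the linked page for the authoritative list.

```bash
# Increase the number of memory-mapped areas available to OpenSearch
sudo sysctl -w vm.max_map_count=262144
```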
#### Sample Docker Compose file
@ -131,7 +133,7 @@ services:
networks:
- opensearch-net
opensearch-dashboards:
image: opensearchproject/opensearch-dashboards:{{site.opensearch_version}}
image: opensearchproject/opensearch-dashboards:{{site.opensearch_dashboards_version}}
container_name: opensearch-dashboards
ports:
- 5601:5601
@ -150,7 +152,7 @@ networks:
opensearch-net:
```
If you override `opensearch_dashboards.yml` settings using environment variables, as seen above, use all uppercase letters and periods in place of underscores (e.g. for `opensearch.hosts`, use `OPENSEARCH_HOSTS`).
If you override `opensearch_dashboards.yml` settings using environment variables, as seen above, use all uppercase letters and underscores in place of periods (e.g. for `opensearch.hosts`, use `OPENSEARCH_HOSTS`).
{: .note}
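For example, a Dashboards service entry in `docker-compose.yml` might override `opensearch.hosts` like this; the host name below is only a placeholder and should match your OpenSearch container name.

```yml
opensearch-dashboards:
  environment:
    - 'OPENSEARCH_HOSTS=["https://opensearch-node1:9200"]'
```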
@ -178,7 +180,7 @@ services:
volumes:
- opensearch-data2:/usr/share/opensearch/data
- ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml
opensearch-dashboards
opensearch-dashboards:
volumes:
- ./custom-opensearch_dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml
```
@ -220,7 +222,7 @@ You can also configure `docker-compose.yml` and `opensearch.yml` [to take your o
docker ps # Look up the container id
docker exec -it <container-id> /bin/bash
# Inside container
cd plugins/opensearch_performance_analyzer/pa_config/
cd config/opensearch-performance-analyzer/
vi performance-analyzer.properties
```
@ -322,3 +324,63 @@ In this case, `opensearch.yml` is a "vanilla" version of the file with no plugin
cluster.name: "docker-cluster"
network.host: 0.0.0.0
```
## Sample Docker Compose file for development
You can use this sample file to set up a development environment.
This sample file starts one OpenSearch node and a container for OpenSearch Dashboards with the security plugin disabled.
```yml
version: '3'
services:
opensearch-node1:
image: opensearchproject/opensearch:{{site.opensearch_version}}
container_name: opensearch-node1
environment:
- cluster.name=opensearch-cluster
- node.name=opensearch-node1
- bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
- "DISABLE_INSTALL_DEMO_CONFIG=true" # disables execution of install_demo_configuration.sh bundled with security plugin, which installs demo certificates and security configurations to OpenSearch
- "DISABLE_SECURITY_PLUGIN=true" # disables security plugin entirely in OpenSearch by setting plugins.security.disabled: true in opensearch.yml
- "discovery.type=single-node" # disables bootstrap checks that are enabled when network.host is set to a non-loopback address
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
hard: 65536
volumes:
- opensearch-data1:/usr/share/opensearch/data
ports:
- 9200:9200
- 9600:9600 # required for Performance Analyzer
networks:
- opensearch-net
opensearch-dashboards:
image: opensearchproject/opensearch-dashboards:{{site.opensearch_dashboards_version}}
container_name: opensearch-dashboards
ports:
- 5601:5601
expose:
- "5601"
environment:
- 'OPENSEARCH_HOSTS=["http://opensearch-node1:9200"]'
- "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true" # disables security dashboards plugin in OpenSearch Dashboards
networks:
- opensearch-net
volumes:
opensearch-data1:
networks:
opensearch-net:
```
The environment variable `"DISABLE_SECURITY_DASHBOARDS_PLUGIN=true"` disables the security dashboards plugin in OpenSearch Dashboards by removing the security dashboards plugin folder, removing all related settings in the `opensearch_dashboards.yml` file, and setting the `opensearch.hosts` entry protocol from HTTPS to HTTP.
You can't reverse this step as the security dashboards plugin is removed in the process.
To re-enable security for OpenSearch Dashboards, start a new container and set `DISABLE_SECURITY_DASHBOARDS_PLUGIN` to false or leave it unset.
{: .note}
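To try the development environment, you can bring it up and query the node over plain HTTP, since the security plugin is disabled. This is a minimal sketch run from the directory containing the Compose file.

```bash
docker-compose up -d        # start OpenSearch and OpenSearch Dashboards in the background
curl http://localhost:9200  # no TLS or credentials needed because security is disabled
```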


@ -20,7 +20,7 @@ Resource | Description
The specification in the default Helm chart supports many standard use cases and setups. You can modify the default chart to configure your desired specifications and set Transport Layer Security (TLS) and role-based access control (RBAC).
For information about the default configuration, steps to configure security, and configurable parameters, see the
[README](https://github.com/opensearch-project/helm-charts/tree/main/charts).
[README](https://github.com/opensearch-project/helm-charts/blob/main/README.md).
The instructions here assume you have a Kubernetes cluster with Helm preinstalled. See the [Kubernetes documentation](https://kubernetes.io/docs/setup/) for steps to configure a Kubernetes cluster and the [Helm documentation](https://helm.sh/docs/intro/install/) to install Helm.
{: .note }


@ -25,7 +25,7 @@ The [sample docker-compose.yml]({{site.url}}{{site.baseurl}}/opensearch/install/
- `bootstrap.memory_lock=true`
Disbles swapping (along with `memlock`). Swapping can dramatically decrease performance and stability, so you should ensure it is disabled on production clusters.
Disables swapping (along with `memlock`). Swapping can dramatically decrease performance and stability, so you should ensure it is disabled on production clusters.
- `OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m`


@ -5,9 +5,9 @@ parent: Install OpenSearch
nav_order: 90
---
# Standalone OpenSearch plugin installation
# OpenSearch plugin installation
If you don't want to use the all-in-one OpenSearch installation options, you can install the individual plugins on a compatible OpenSearch cluster, just like any other plugin.
You can install individual plugins on an OpenSearch cluster.
---
@ -19,7 +19,42 @@ If you don't want to use the all-in-one OpenSearch installation options, you can
---
## Plugin compatibility
## Install a plugin
The install command takes a plugin ID, which may be any of the following:
- An official OpenSearch plugin name
- Maven coordinates to a plugin zip file
- A URL to a plugin zip file
If you're installing an official OpenSearch plugin, use:
```
bin/opensearch-plugin install <plugin-name>
```
For a plugin installed via zip, use:
```
bin/opensearch-plugin install <name|Zip File|Url>
```
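If you install from Maven coordinates, the plugin ID typically takes the `groupId:artifactId:version` form; the coordinates below are purely hypothetical.

```
bin/opensearch-plugin install org.example:example-plugin:1.0.0
```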
Restart your OpenSearch node after installing a plugin.
## Remove a plugin
```
bin/opensearch-plugin remove <plugin-name>
```
Restart your OpenSearch node after removing a plugin.
## List installed plugins
To check which plugins are currently installed:
```
bin/opensearch-plugin list
```
## Available plugins
<table>
<thead style="text-align: left">
@ -30,9 +65,155 @@ If you don't want to use the all-in-one OpenSearch installation options, you can
</thead>
<tbody>
<tr>
<td>1.1.0</td>
<td>2.0.0.0-rc1</td>
<td>
<pre>opensearch-alerting 1.1.0.0
<pre>opensearch-alerting 2.0.0.0-rc1
opensearch-anomaly-detection 2.0.0.0-rc1
opensearch-asynchronous-search 2.0.0.0-rc1
opensearch-cross-cluster-replication 2.0.0.0-rc1
opensearch-index-management 2.0.0.0-rc1
opensearch-job-scheduler 2.0.0.0-rc1
opensearch-knn 2.0.0.0-rc1
opensearch-ml 2.0.0.0-rc1
opensearch-observability 2.0.0.0-rc1
opensearch-performance-analyzer 2.0.0.0-rc1
opensearch-reports-scheduler 2.0.0.0-rc1
opensearch-security 2.0.0.0-rc1
opensearch-sql 2.0.0.0-rc1
</pre>
</td>
</tr>
<tr>
<td>1.3.1</td>
<td>
<pre>opensearch-alerting 1.3.1.0
opensearch-anomaly-detection 1.3.1.0
opensearch-asynchronous-search 1.3.1.0
opensearch-cross-cluster-replication 1.3.1.0
opensearch-index-management 1.3.1.0
opensearch-job-scheduler 1.3.1.0
opensearch-knn 1.3.1.0
opensearch-ml 1.3.1.0
opensearch-observability 1.3.1.0
opensearch-performance-analyzer 1.3.1.0
opensearch-reports-scheduler 1.3.1.0
opensearch-security 1.3.1.0
opensearch-sql 1.3.1.0
</pre>
</td>
</tr>
<tr>
<td>1.3.0</td>
<td>
<pre>opensearch-alerting 1.3.0.0
opensearch-anomaly-detection 1.3.0.0
opensearch-asynchronous-search 1.3.0.0
opensearch-cross-cluster-replication 1.3.0.0
opensearch-index-management 1.3.0.0
opensearch-job-scheduler 1.3.0.0
opensearch-knn 1.3.0.0
opensearch-ml 1.3.0.0
opensearch-observability 1.3.0.0
opensearch-performance-analyzer 1.3.0.0
opensearch-reports-scheduler 1.3.0.0
opensearch-security 1.3.0.0
opensearch-sql 1.3.0.0
</pre>
</td>
</tr>
<tr>
<td>1.2.4</td>
<td>
<pre>opensearch-alerting 1.2.4.0
opensearch-anomaly-detection 1.2.4.0
opensearch-asynchronous-search 1.2.4.0
opensearch-cross-cluster-replication 1.2.4.0
opensearch-index-management 1.2.4.0
opensearch-job-scheduler 1.2.4.0
opensearch-knn 1.2.4.0
opensearch-observability 1.2.4.0
opensearch-performance-analyzer 1.2.4.0
opensearch-reports-scheduler 1.2.4.0
opensearch-security 1.2.4.0
opensearch-sql 1.2.4.0
</pre>
</td>
</tr>
<tr>
<td>1.2.3</td>
<td>
<pre>opensearch-alerting 1.2.3.0
opensearch-anomaly-detection 1.2.3.0
opensearch-asynchronous-search 1.2.3.0
opensearch-cross-cluster-replication 1.2.3.0
opensearch-index-management 1.2.3.0
opensearch-job-scheduler 1.2.3.0
opensearch-knn 1.2.3.0
opensearch-observability 1.2.3.0
opensearch-performance-analyzer 1.2.3.0
opensearch-reports-scheduler 1.2.3.0
opensearch-security 1.2.3.0
opensearch-sql 1.2.3.0
</pre>
</td>
</tr>
<tr>
<td>1.2.2</td>
<td>
<pre>opensearch-alerting 1.2.2.0
opensearch-anomaly-detection 1.2.2.0
opensearch-asynchronous-search 1.2.2.0
opensearch-cross-cluster-replication 1.2.2.0
opensearch-index-management 1.2.2.0
opensearch-job-scheduler 1.2.2.0
opensearch-knn 1.2.2.0
opensearch-observability 1.2.2.0
opensearch-performance-analyzer 1.2.2.0
opensearch-reports-scheduler 1.2.2.0
opensearch-security 1.2.2.0
opensearch-sql 1.2.2.0
</pre>
</td>
</tr>
<tr>
<td>1.2.1</td>
<td>
<pre>opensearch-alerting 1.2.1.0
opensearch-anomaly-detection 1.2.1.0
opensearch-asynchronous-search 1.2.1.0
opensearch-cross-cluster-replication 1.2.1.0
opensearch-index-management 1.2.1.0
opensearch-job-scheduler 1.2.1.0
opensearch-knn 1.2.1.0
opensearch-observability 1.2.1.0
opensearch-performance-analyzer 1.2.1.0
opensearch-reports-scheduler 1.2.1.0
opensearch-security 1.2.1.0
opensearch-sql 1.2.1.0
</pre>
</td>
</tr>
<tr>
<td>1.2.0</td>
<td>
<pre>opensearch-alerting 1.2.0.0
opensearch-anomaly-detection 1.2.0.0
opensearch-asynchronous-search 1.2.0.0
opensearch-cross-cluster-replication 1.2.0.0
opensearch-index-management 1.2.0.0
opensearch-job-scheduler 1.2.0.0
opensearch-knn 1.2.0.0
opensearch-observability 1.2.0.0
opensearch-performance-analyzer 1.2.0.0
opensearch-reports-scheduler 1.2.0.0
opensearch-security 1.2.0.0
opensearch-sql 1.2.0.0
</pre>
</td>
</tr>
<tr>
<td>1.1.0</td>
<td>
<pre>opensearch-alerting 1.1.0.0
opensearch-anomaly-detection 1.1.0.0
opensearch-asynchronous-search 1.1.0.0
opensearch-cross-cluster-replication 1.1.0.0
@ -83,217 +264,3 @@ opensearch-sql 1.0.0.0
</tr>
</tbody>
</table>
To install plugins manually, you must have the exact version of OpenSearch installed, down to the minor version.
{% comment %}
To get a list of available OpenSearch versions on CentOS 7 and Amazon Linux 2, run the following command:
```bash
sudo yum list opensearch-oss --showduplicates
```
Then you can specify the version that you need:
```bash
sudo yum install opensearch-oss-6.7.1
```
## Install plugins
Navigate to the OpenSearch home directory (most likely, it is `/usr/share/opensearch`), and run the install command for each plugin.
### Security
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-security/opensearch-security-{{site.opensearch_major_minor_version}}.1.0.zip
```
After installing the security plugin, you can run `sudo sh /usr/share/opensearch/plugins/opensearch-security/tools/install_demo_configuration.sh` to quickly get started with demo certificates. Otherwise, you must configure it manually and run [securityadmin.sh]({{site.url}}{{site.baseurl}}/security-plugin/configuration/security-admin/).
The security plugin has a corresponding [OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/opensearch-dashboards/install/plugins) that you probably want to install as well.
### Job scheduler
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-job-scheduler/opensearch-job-scheduler-{{site.opensearch_major_minor_version}}.0.0.zip
```
### Alerting
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-alerting/opensearch-alerting-{{site.opensearch_major_minor_version}}.1.0.zip
```
To install Alerting, you must first install the Job Scheduler plugin. Alerting has a corresponding [OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/opensearch-dashboards/install/plugins/) that you probably want to install as well.
### SQL
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-sql/opensearch-sql-{{site.opensearch_major_minor_version}}.2.0.zip
```
### Reports scheduler
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-reports-scheduler/opensearch-reports-scheduler-{{site.opensearch_major_minor_version}}.0.0.zip
```
### Index State Management
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-index-management/opensearch-index-management-{{site.opensearch_major_minor_version}}.2.0.zip
```
To install Index State Management, you must first install the Job Scheduler plugin. ISM has a corresponding [OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/opensearch-dashboards/install/plugins/) that you probably want to install as well.
### k-NN
k-NN is only available as part of the all-in-one installs: Docker, RPM, and Debian.
### Anomaly detection
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-anomaly-detection/opensearch-anomaly-detection-{{site.opensearch_major_minor_version}}.0.0.zip
```
### Asynchronous search
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-asynchronous-search/opensearch-asynchronous-search-{{site.opensearch_major_minor_version}}.0.1.zip
```
### Performance Analyzer
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/performance-analyzer/opensearch-performance-analyzer-{{site.opensearch_major_minor_version}}.0.0.zip
```
Performance Analyzer requires some manual configuration after installing the plugin:
1. Create `/usr/lib/systemd/system/opensearch-performance-analyzer.service` based on [this file](https://github.com/opensearch-project/performance-analyzer/blob/master/packaging/opensearch-performance-analyzer.service).
1. Make the CLI executable:
```bash
sudo chmod +x /usr/share/opensearch/bin/performance-analyzer-agent-cli
```
1. Run the appropriate `postinst` script for your Linux distribution:
```bash
# Debian-based distros
sudo sh /usr/share/opensearch/plugins/opensearch-performance-analyzer/install/deb/postinst.sh 1
# RPM distros
sudo sh /usr/share/opensearch/plugins/opensearch-performance-analyzer/install/rpm/postinst.sh 1
```
1. Make Performance Analyzer accessible outside of the host machine
```bash
cd /usr/share/opensearch # navigate to the OpenSearch home directory
cd plugins/opensearch_performance_analyzer/pa_config/
vi performance-analyzer.properties
```
Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`:
```bash
# ======================== OpenSearch performance analyzer plugin config =========================
# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
# WebService bind host; default to all interfaces
webservice-bind-host = 0.0.0.0
# Metrics data location
metrics-location = /dev/shm/performanceanalyzer/
# Metrics deletion interval (minutes) for metrics data.
# Interval should be between 1 to 60.
metrics-deletion-interval = 1
# If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
# metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
# the files and wouldn't like for them to be cleaned up.
cleanup-metrics-db-files = true
# WebService exposed by App's port
webservice-listener-port = 9600
# Metric DB File Prefix Path location
metrics-db-file-prefix-path = /tmp/metricsdb_
https-enabled = false
#Setup the correct path for certificates
certificate-file-path = specify_path
private-key-file-path = specify_path
# Plugin Stats Metadata file name, expected to be in the same location
plugin-stats-metadata = plugin-stats-metadata
# Agent Stats Metadata file name, expected to be in the same location
agent-stats-metadata = agent-stats-metadata
```
1. Start the OpenSearch service:
```bash
sudo systemctl start opensearch.service
```
1. Send a test request:
```bash
curl -XGET "localhost:9600/_plugins/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg,max&dim=ShardID&nodes=all"
```
{% endcomment %}
## List installed plugins
To check your installed plugins:
```bash
sudo bin/opensearch-plugin list
```
## Remove plugins
If you are removing Performance Analyzer, see below. Otherwise, you can remove the plugin with a single command:
```bash
sudo bin/opensearch-plugin remove <plugin-name>
```
Then restart OpenSearch on the node:
```bash
sudo systemctl restart opensearch.service
```
## Update plugins
OpenSearch doesn't update plugins. Instead, you have to remove and reinstall them:
```bash
sudo bin/opensearch-plugin remove <plugin-name>
sudo bin/opensearch-plugin install <plugin-name>
```


@ -59,7 +59,7 @@ In a tarball installation, Performance Analyzer collects data when it is enabled
```bash
cd /usr/share/opensearch # navigate to the OpenSearch home directory
cd plugins/opensearch_performance_analyzer/pa_config/
cd config/opensearch-performance-analyzer/
vi performance-analyzer.properties
```
@ -114,7 +114,7 @@ In a tarball installation, Performance Analyzer collects data when it is enabled
1. Launch the agent CLI:
```bash
OPENSEARCH_HOME="$PWD" ./bin/performance-analyzer-agent-cli
OPENSEARCH_HOME="$PWD" OPENSEARCH_PATH_CONF="$PWD/config" ./bin/performance-analyzer-agent-cli
```
1. In a separate window, enable the Performance Analyzer plugin:

_opensearch/mappings.md Normal file

@ -0,0 +1,105 @@
---
layout: default
title: Mapping
nav_order: 13
---
# About Mappings
You can define how documents and their fields are stored and indexed by creating a mapping.
If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings.
For example, if you want to indicate that `year` should be of type `text` instead of an `integer` and `age` should be an `integer`, you can do so with explicit mappings. With dynamic mapping, OpenSearch might interpret both `year` and `age` as integers.
This section provides an example of how to create an index mapping and how to add a document to it that has its `ip_range` validated.
#### Table of contents
1. TOC
{:toc}
---
## Dynamic mapping
When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping.
#### Dynamic mapping types
Type | Description
:--- | :---
null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values.
boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false`.
float | A single-precision 32-bit floating point number.
double | A double-precision 64-bit floating point number.
integer | A signed 32-bit number.
object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values.
text | A string sequence of characters that represent full-text values.
keyword | A string sequence of structured characters, such as an email address or ZIP code.
date detection string | Enabled by default. If new string fields match a date format, the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date.
numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, and `unsigned_long`. Default is disabled.
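As a quick sketch of dynamic mapping in action, you can index a document into an index that doesn't exist yet and let OpenSearch create the mapping for you. The index and field names below are only examples, and the exact inferred types depend on your dynamic mapping settings.

```json
PUT movies-dynamic/_doc/1
{
  "title": "The Wind Rises",
  "year": 2013
}
```

You can then retrieve the generated mapping with `GET movies-dynamic/_mapping`.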
## Explicit mapping
If you know exactly what your field data types need to be, you can specify them in your request body when creating your index. For example, the following request creates an index named `sample-index1` with explicit mappings:
```json
PUT sample-index1
{
"mappings": {
"properties": {
"year": { "type" : "text" },
"age": { "type" : "integer" },
"director":{ "type" : "text" }
}
}
}
```
### Response
```json
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "sample-index1"
}
```
---
## Mapping example usage
The following example shows how to create a mapping that tells OpenSearch to ignore malformed IP addresses that don't conform to the `ip_range` data type instead of rejecting the whole document. You accomplish this by setting the `ignore_malformed` parameter to `true`.
### Create an index with an ip_range mapping
To create an index, use a PUT request:
```json
PUT _index_ip
{
"mappings": {
"dynamic_templates": [
{
"ip_range": {
"match": "*ip_range",
"mapping": {
"type": "ip_range",
"ignore_malformed": true
}
}
}
]
}
}
```
You can add a document to your index that has an IP range specified:
```json
PUT _index_ip/_doc/<id>
{
"source_ip_range": "192.168.1.1/32"
}
```
This document indexes without error. Because `ignore_malformed` is set to `true`, even a malformed `source_ip_range` value would not cause the request to be rejected; the malformed value simply isn't indexed.
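For instance, a document like the following, with a value that doesn't parse as an IP range (deliberately invalid, for illustration), would still be indexed:

```json
PUT _index_ip/_doc/<id>
{
  "source_ip_range": "not-an-ip-range"
}
```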


@ -14,7 +14,7 @@ You can use pipeline aggregations to compute complex statistical and mathematica
## Pipeline aggregation syntax
A pipeline aggregation uses the the `buckets_path` property to access the results of other aggregations.
A pipeline aggregation uses the `buckets_path` property to access the results of other aggregations.
The `buckets_path` property has a specific syntax:
```


@ -113,7 +113,6 @@ GET shakespeare/_search
"hits": [
{
"_index": "shakespeare",
"_type": "_doc",
"_id": "88020",
"_score": 11.356054,
"_source": {


@ -21,7 +21,7 @@ This page lists all full-text query types and common options. Given the sheer nu
## Match
Creates a [boolean query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
Creates a [boolean query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
The most basic form of the query provides only a field (`title`) and a term (`wind`):
@ -126,7 +126,7 @@ GET _search
## Match boolean prefix
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -164,7 +164,7 @@ GET _search
## Match phrase
Creates a [phrase query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
Creates a [phrase query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
```json
GET _search
@ -198,7 +198,7 @@ GET _search
## Match phrase prefix
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -410,7 +410,7 @@ Option | Valid values | Description
`allow_leading_wildcard` | Boolean | Whether `*` and `?` are allowed as the first character of a search term. The default is true.
`analyze_wildcard` | Boolean | Whether OpenSearch should attempt to analyze wildcard terms. Some analyzers do a poor job at this task, so the default is false.
`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, <language>, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (e.g. "an," "but," "this") from the query string.
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. The default is 1.0.
`cutoff_frequency` | Between `0.0` and `1.0` or a positive integer | This value lets you define high and low frequency terms based on number of occurrences in the index. Numbers between 0 and 1 are treated as a percentage. For example, 0.10 is 10%. This value means that if a word occurs within the search field in more than 10% of the documents on the shard, OpenSearch considers the word "high frequency" and deemphasizes it when calculating search score.<br /><br />Because this setting is *per shard*, testing its impact on search results can be challenging unless a cluster has many documents.
`enable_position_increments` | Boolean | When true, result queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. The default is true.
@ -420,7 +420,7 @@ Option | Valid values | Description
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). <br /><br />If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common terms](#common-terms) queries and `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common terms](#common-terms) queries.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
@ -428,8 +428,8 @@ Option | Valid values | Description
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
`quote_field_suffix` | String | This option lets you search different fields depending on whether terms are wrapped in quotes. For example, if `quote_field_suffix` is `".exact"` and you search for `"lightly"` (in quotes) in the `title` field, OpenSearch searches the `title.exact` field. This second field might use a different type (e.g. `keyword` rather than `text`) or a different analyzer. The default is null.
`rewrite` | `constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N` | Determines how OpenSearch rewrites and scores multi-term queries. The default is `constant_score`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`tie_breaker` | `0.0` (default) to `1.0` | Changes the way OpenSearch scores searches. For example, a `type` of `best_fields` typically uses the highest score from any one field. If you specify a `tie_breaker` value between 0.0 and 1.0, the score changes to highest score + `tie_breaker` * score for all other matching fields. If you specify a value of 1.0, OpenSearch adds together the scores for all matching fields (effectively defeating the purpose of `best_fields`).
`time_zone` | UTC offset | The time zone to use (e.g. `-08:00`) if the query string contains a date range (e.g. `"query": "wind rises release_date[2012-01-01 TO 2014-01-01]"`). The default is `UTC`.
`type` | `best_fields, most_fields, cross-fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`.
`type` | `best_fields, most_fields, cross_fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`.
`zero_terms_query` | `none, all` | If the analyzer removes all terms from a query string, whether to match no documents (default) or all documents. For example, the `stop` analyzer removes all terms from the string "an but this."


@ -41,7 +41,6 @@ GET _search?q=speaker:queen
"hits": [
{
"_index": "new_shakespeare",
"_type": "_doc",
"_id": "28559",
"_score": 4.4368687,
"_source": {
@ -90,7 +89,6 @@ With query DSL, however, you can include an HTTP request body to look for result
"hits": [
{
"_index": "new_shakespeare",
"_type": "_doc",
"_id": "100763",
"_score": 7.8623476,
"_source": {
@ -105,7 +103,6 @@ With query DSL, however, you can include an HTTP request body to look for result
},
{
"_index": "shakespeare",
"_type": "_doc",
"_id": "28559",
"_score": 5.8923807,
"_source": {


@ -95,7 +95,6 @@ The search query “To be, or not to be” is analyzed and tokenized into an arr
"hits" : [
{
"_index" : "shakespeare",
"_type" : "_doc",
"_id" : "34229",
"_score" : 17.419369,
"_source" : {
@ -110,7 +109,6 @@ The search query “To be, or not to be” is analyzed and tokenized into an arr
},
{
"_index" : "shakespeare",
"_type" : "_doc",
"_id" : "109930",
"_score" : 14.883024,
"_source" : {
@ -125,7 +123,6 @@ The search query “To be, or not to be” is analyzed and tokenized into an arr
},
{
"_index" : "shakespeare",
"_type" : "_doc",
"_id" : "103117",
"_score" : 14.782743,
"_source" : {
@ -180,7 +177,6 @@ GET shakespeare/_search
"hits" : [
{
"_index" : "shakespeare",
"_type" : "_doc",
"_id" : "32700",
"_score" : 4.2540946,
"_source" : {
@ -195,7 +191,6 @@ GET shakespeare/_search
},
{
"_index" : "shakespeare",
"_type" : "_doc",
"_id" : "32702",
"_score" : 4.2540946,
"_source" : {
@ -210,7 +205,6 @@ GET shakespeare/_search
},
{
"_index" : "shakespeare",
"_type" : "_doc",
"_id" : "32709",
"_score" : 4.2540946,
"_source" : {
@ -448,5 +442,5 @@ GET shakespeare/_search
A few important notes:
- Regular expressions are applied to the terms in the field (i.e. tokens), not the entire field.
- Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see [the Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/index.html).
- Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see [the Lucene documentation](https://lucene.apache.org/core/8_9_0/core/index.html).
- `regexp` queries can be expensive operations and require the `search.allow_expensive_queries` setting to be set to `true`. Before making frequent `regexp` queries, test their impact on cluster performance and examine alternative queries for achieving similar results.
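For reference, a minimal `regexp` query against the `shakespeare` index used throughout these examples might look like the following; the field name and pattern are only illustrative.

```json
GET shakespeare/_search
{
  "query": {
    "regexp": {
      "play_name": "[a-zA-Z]amlet"
    }
  }
}
```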


@ -70,7 +70,8 @@ POST _reindex
"host":"https://<REST_endpoint_of_remote_cluster>:9200",
"username":"YOUR_USERNAME",
"password":"YOUR_PASSWORD"
}
},
"index": "source"
},
"dest":{
"index":"destination"

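For reference, the corrected remote reindex request in full might look like this, with the endpoint, credentials, and index names as placeholders:

```json
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://<REST_endpoint_of_remote_cluster>:9200",
      "username": "YOUR_USERNAME",
      "password": "YOUR_PASSWORD"
    },
    "index": "source"
  },
  "dest": {
    "index": "destination"
  }
}
```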

@ -2,7 +2,7 @@
layout: default
title: Alias
parent: REST API reference
nav_order: 5
nav_order: 20
---
# Alias
@ -67,7 +67,7 @@ aliases | Array | Array of alias names. | Yes if you don't supply an `alias` fie
filter | Object | A filter to use with the alias, so the alias points to a filtered part of the index. | No
is_hidden | Boolean | Specifies whether the alias should be hidden from results that include wildcard expressions | No
must_exist | Boolean | Specifies whether the alias to remove must exist. | No
is_write_index | Boolean | Specifies whether the index should be a write index. An alias can only have one write index at a time. | No
is_write_index | Boolean | Specifies whether the index should be a write index. An alias can only have one write index at a time. If a write request is submitted to an alias that links to multiple indexes, OpenSearch executes the request only on the write index. | No
routing | String | Used to assign a custom value to a shard for specific operations. | No
index_routing | String | Assigns a custom value to a shard only for index operations. | No
search_routing | String | Assigns a custom value to a shard only for search operations. | No
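For example, a request that designates one index as the write index for an alias might look like the following; the index and alias names are placeholders.

```json
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-2022-05",
        "alias": "logs",
        "is_write_index": true
      }
    }
  ]
}
```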


@ -1,44 +1,44 @@
---
layout: default
title: cat master
title: CAT cluster manager
parent: CAT
grand_parent: REST API reference
nav_order: 30
has_children: false
---
# cat master
# CAT cluster_manager
Introduced 1.0
{: .label .label-purple }
The cat master operation lists information that helps identify the elected master node.
The cat cluster manager operation lists information that helps identify the elected cluster manager node.
## Example
```
GET _cat/master?v
GET _cat/cluster_manager?v
```
## Path and HTTP methods
```
GET _cat/master
GET _cat/cluster_manager
```
## URL parameters
All cat master URL parameters are optional.
All cat cluster manager URL parameters are optional.
In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/index#common-url-parameters), you can specify the following parameters:
Parameter | Type | Description
:--- | :--- | :---
master_timeout | Time | The amount of time to wait for a connection to the master node. Default is 30 seconds.
cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds.
## Response
```json
id | host | ip | node
ZaIkkUd4TEiAihqJGkp5CA | 172.18.0.3 | 172.18.0.3 | odfe-node2
ZaIkkUd4TEiAihqJGkp5CA | 172.18.0.3 | 172.18.0.3 | opensearch-node2
```


@ -13,7 +13,7 @@ Introduced 1.0
The cat nodes operation lists node-level information, including node roles and load metrics.
A few important node metrics are `pid`, `name`, `master`, `ip`, `port`, `version`, `build`, `jdk`, along with `disk`, `heap`, `ram`, and `file_desc`.
A few important node metrics are `pid`, `name`, `cluster_manager`, `ip`, `port`, `version`, `build`, `jdk`, along with `disk`, `heap`, `ram`, and `file_desc`.
## Example
@ -37,8 +37,8 @@ Parameter | Type | Description
:--- | :--- | :---
bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
full_id | Boolean | If true, return the full node ID. If false, return the shortened node ID. Defaults to false.
local | Boolean | Whether to return information from the local node only instead of from the master node. Default is false.
master_timeout | Time | The amount of time to wait for a connection to the master node. Default is 30 seconds.
local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false.
cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds.
time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
include_unloaded_segments | Boolean | Whether to include information from segments not loaded into memory. Default is false.
@ -46,7 +46,9 @@ include_unloaded_segments | Boolean | Whether to include information from segmen
## Response
```json
ip | heap.percent | ram.percent | cpu load_1m | load_5m | load_15m | node.role | master | name
172.18.0.3 | 31 | 97 | 3 | 0.03 | 0.10 | 0.14 dimr | * | odfe-node2
172.18.0.4 | 45 | 97 | 3 | 0.19 | 0.14 | 0.15 dimr | - | odfe-node1
ip | heap.percent | ram.percent | cpu load_1m | load_5m | load_15m | node.role | cluster_manager | name
172.18.0.3 | 31 | 97 | 3 | 0.03 | 0.10 | 0.14 dimr | * | opensearch-node2
172.18.0.4 | 45 | 97 | 3 | 0.19 | 0.14 | 0.15 dimr | - | opensearch-node1
```


@ -33,8 +33,8 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/opensear
Parameter | Type | Description
:--- | :--- | :---
local | Boolean | Whether to return information from the local node only instead of from the master node. Default is false.
master_timeout | Time | The amount of time to wait for a connection to the master node. Default is 30 seconds.
local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false.
cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds.
time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).

Some files were not shown because too many files have changed in this diff.