Merge branch 'main' into java-client

Keith Chan 2021-10-21 16:47:30 -07:00 committed by GitHub
commit 50bc532349
110 changed files with 6030 additions and 2299 deletions


@ -196,17 +196,17 @@ If you're making major changes to the documentation and need to see the rendered
## New releases
1. Branch.
1. Change the `opensearch_version` and `opensearch_major_version` variables in `_config.yml`.
1. Change the `opensearch_version`, `opensearch_major_minor_version`, and `lucene_version` variables in `_config.yml`.
1. Start up a new cluster using the updated Docker Compose file in `docs/install/docker.md`.
1. Update the version table in `version-history.md`.
Use `curl -XGET https://localhost:9200 -u admin:admin -k` to verify the OpenSearch version.
Use `curl -XGET https://localhost:9200 -u admin:admin -k` to verify the OpenSearch and Lucene versions.
1. Update the plugin compatibility table in `docs/install/plugin.md`.
1. Update the plugin compatibility table in `_opensearch/install/plugin.md`.
Use `curl -XGET https://localhost:9200/_cat/plugins -u admin:admin -k` to get the correct version strings.
1. Update the plugin compatibility table in `docs/opensearch-dashboards/plugins.md`.
1. Update the plugin compatibility table in `_dashboards/install/plugins.md`.
Use `docker ps` to find the ID for the OpenSearch Dashboards node. Then use `docker exec -it <opensearch-dashboards-node-id> /bin/bash` to get shell access. Finally, run `./bin/opensearch-dashboards-plugin list` to get the plugins and version strings.


@ -27,12 +27,18 @@ PUT _cluster/settings
}
```
As with [any other setting]({{site.url}}{{site.baseurl}}/opensearch/configuration/), an alternative is to add the following line to `opensearch.yml` on each node and then restart the node:
```yml
compatibility.override_main_response_version: true
```
## Downloads
You can download the OpenSearch output plugin for Logstash from [OpenSearch downloads](https://opensearch.org/downloads.html). The Logstash output plugin is compatible with OpenSearch and Elasticsearch OSS (7.10.2 or lower).
These versions of Beats offer the best compatibility with OpenSearch. For more information, see the [compatibility matrices](#compatibility-matrices).
These are the latest versions of Beats OSS with OpenSearch compatibility. For more information, see the [compatibility matrices](#compatibility-matrices).
- [Filebeat OSS 7.12.1](https://www.elastic.co/downloads/past-releases/filebeat-oss-7-12-1)
- [Metricbeat OSS 7.12.1](https://www.elastic.co/downloads/past-releases/metricbeat-oss-7-12-1)
@ -41,7 +47,7 @@ These versions of Beats offer the best compatibility with OpenSearch. For more i
- [Winlogbeat OSS 7.12.1](https://www.elastic.co/downloads/past-releases/winlogbeat-oss-7-12-1)
- [Auditbeat OSS 7.12.1](https://elastic.co/downloads/past-releases/auditbeat-oss-7-12-1)
Some users report compatibility issues with ingest pipelines on these versions of Beats. If you use ingest pipelines with OpenSearch, consider using the 7.10.2 versions of Beats OSS instead.
Some users report compatibility issues with ingest pipelines on these versions of Beats. If you use ingest pipelines with OpenSearch, consider using the 7.10.2 versions of Beats instead.
{: .note }

_clients/go.md (new file, 145 lines)

@ -0,0 +1,145 @@
---
layout: default
title: Go client
nav_order: 80
---
# Go client
The OpenSearch Go client lets you connect your Go application with the data in your OpenSearch cluster.
## Setup
If you're creating a new project:
```go
go mod init
```
To add the client to your project, download it like any other module:
```go
go get github.com/opensearch-project/opensearch-go
```
## Sample code
This sample code creates a client, adds an index with non-default settings, inserts a document, searches for the document, deletes the document, and finally deletes the index:
```go
package main
import (
"os"
"context"
"crypto/tls"
"fmt"
opensearch "github.com/opensearch-project/opensearch-go"
opensearchapi "github.com/opensearch-project/opensearch-go/opensearchapi"
"net/http"
"strings"
)
const IndexName = "go-test-index1"
func main() {
// Initialize the client with SSL/TLS enabled.
client, err := opensearch.NewClient(opensearch.Config{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
Addresses: []string{"https://localhost:9200"},
Username: "admin", // For testing only. Don't store credentials in code.
Password: "admin",
})
if err != nil {
fmt.Println("cannot initialize", err)
os.Exit(1)
}
// Print OpenSearch version information on console.
fmt.Println(client.Info())
// Define index mapping.
mapping := strings.NewReader(`{
  "settings": {
    "index": {
      "number_of_shards": 4
    }
  }
}`)
// Create an index with non-default settings and execute the request.
createIndex := opensearchapi.IndicesCreateRequest{
Index: IndexName,
Body:  mapping,
}
createIndexResponse, err := createIndex.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to create index ", err)
os.Exit(1)
}
fmt.Println("creating index", createIndexResponse)
// Add a document to the index.
document := strings.NewReader(`{
"title": "Moneyball",
"director": "Bennett Miller",
"year": "2011"
}`)
docId := "1"
req := opensearchapi.IndexRequest{
Index: IndexName,
DocumentID: docId,
Body: document,
}
insertResponse, err := req.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to insert document ", err)
os.Exit(1)
}
fmt.Println(insertResponse)
// Search for the document.
content := strings.NewReader(`{
"size": 5,
"query": {
"multi_match": {
"query": "miller",
"fields": ["title^2", "director"]
}
}
}`)
search := opensearchapi.SearchRequest{
Body: content,
}
searchResponse, err := search.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to search document ", err)
os.Exit(1)
}
fmt.Println(searchResponse)
// Delete the document.
delete := opensearchapi.DeleteRequest{
Index: IndexName,
DocumentID: docId,
}
deleteResponse, err := delete.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to delete document ", err)
os.Exit(1)
}
fmt.Println("deleting document")
fmt.Println(deleteResponse)
// Delete previously created index.
deleteIndex := opensearchapi.IndicesDeleteRequest{
Index: []string{IndexName},
}
deleteIndexResponse, err := deleteIndex.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to delete index ", err)
os.Exit(1)
}
fmt.Println("deleting index", deleteIndexResponse)
}
```

_clients/grafana.md (new file, 10 lines)

@ -0,0 +1,10 @@
---
layout: default
title: Grafana
nav_order: 150
has_children: false
---
# Grafana support
Grafana has a data source plugin that lets you explore and visualize your OpenSearch data. For information on getting started with the plugin, see the [Grafana overview page](https://grafana.com/grafana/plugins/grafana-opensearch-datasource/).


@ -9,6 +9,20 @@ redirect_from:
# OpenSearch client compatibility
OpenSearch provides clients for several popular programming languages, with more coming. In general, clients are compatible with clusters running the same major version of OpenSearch (`major.minor.patch`).
For example, a 1.0.0 client works with an OpenSearch 1.1.0 cluster, but might not support any non-breaking API changes in OpenSearch 1.1.0. A 1.2.0 client works with the same cluster, but might allow you to pass unsupported options in certain functions. We recommend using the same version for both, but if your tests pass after a cluster upgrade, you don't necessarily need to upgrade your clients immediately.
{% comment %}
* [OpenSearch Java client]({{site.url}}{{site.baseurl}}/clients/java/)
{% endcomment %}
* [OpenSearch Python client]({{site.url}}{{site.baseurl}}/clients/python/)
* [OpenSearch JavaScript (Node.js) client]({{site.url}}{{site.baseurl}}/clients/javascript/)
* [OpenSearch Go client]({{site.url}}{{site.baseurl}}/clients/go/)
## Legacy clients
Most clients that work with Elasticsearch OSS 7.10.2 *should* work with OpenSearch, but the latest versions of those clients might include license or version checks that artificially break compatibility. This page includes recommendations around which versions of those clients to use for best compatibility with OpenSearch.
Client | Recommended version
@ -18,7 +32,7 @@ Client | Recommended version
[Python Elasticsearch client](https://pypi.org/project/elasticsearch/7.13.4/) | 7.13.4
[Elasticsearch Node.js client](https://www.npmjs.com/package/@elastic/elasticsearch/v/7.13.0) | 7.13.0
Clients exist for a wide variety of languages, so if you test a client and verify that it works, please [submit a PR](https://github.com/opensearch-project/documentation-website/pulls) and add it to this table.
If you test a legacy client and verify that it works, please [submit a PR](https://github.com/opensearch-project/documentation-website/pulls) and add it to this table.
{% comment %}

_clients/javascript.md (new file, 141 lines)

@ -0,0 +1,141 @@
---
layout: default
title: JavaScript client
nav_order: 90
---
# JavaScript client
The OpenSearch JavaScript client provides a safer and easier way to interact with your OpenSearch cluster. Rather than using OpenSearch from the browser and potentially exposing your data to the public, you can build an OpenSearch client that takes care of sending requests to your cluster.
The client contains a library of APIs that let you perform different operations on your cluster and return a standard response body. The example here demonstrates some basic operations like creating an index, adding documents, and searching your data.
## Setup
To add the client to your project, install it from [npm](https://www.npmjs.com):
```bash
npm install @opensearch-project/opensearch
```
To install a specific major version of the client, run the following command:
```bash
npm install @opensearch-project/opensearch@<version>
```
If you prefer to add the client manually or just want to examine the source code, see [opensearch-js](https://github.com/opensearch-project/opensearch-js) on GitHub.
Then require the client:
```javascript
const { Client } = require("@opensearch-project/opensearch");
```
## Sample code
```javascript
"use strict";
var host = "localhost";
var protocol = "https";
var port = 9200;
var auth = "admin:admin"; // For testing only. Don't store credentials in code.
var ca_certs_path = "/full/path/to/root-ca.pem";
// Optional client certificates if you don't want to use HTTP basic authentication.
// var client_cert_path = '/full/path/to/client.pem'
// var client_key_path = '/full/path/to/client-key.pem'
// Create a client with SSL/TLS enabled.
var { Client } = require("@opensearch-project/opensearch");
var fs = require("fs");
var client = new Client({
node: protocol + "://" + auth + "@" + host + ":" + port,
ssl: {
ca: fs.readFileSync(ca_certs_path),
// You can turn off certificate verification (rejectUnauthorized: false) if you're using self-signed certificates with a hostname mismatch.
// cert: fs.readFileSync(client_cert_path),
// key: fs.readFileSync(client_key_path)
},
});
async function search() {
// Create an index with non-default settings.
var index_name = "books";
var settings = {
settings: {
index: {
number_of_shards: 4,
number_of_replicas: 3,
},
},
};
var response = await client.indices.create({
index: index_name,
body: settings,
});
console.log("Creating index:");
console.log(response.body);
// Add a document to the index.
var document = {
title: "The Outsider",
author: "Stephen King",
year: "2018",
genre: "Crime fiction",
};
var id = "1";
var response = await client.index({
id: id,
index: index_name,
body: document,
refresh: true,
});
console.log("Adding document:");
console.log(response.body);
// Search for the document.
var query = {
query: {
match: {
title: {
query: "The Outsider",
},
},
},
};
var response = await client.search({
index: index_name,
body: query,
});
console.log("Search results:");
console.log(response.body.hits);
// Delete the document.
var response = await client.delete({
index: index_name,
id: id,
});
console.log("Deleting document:");
console.log(response.body);
// Delete the index.
var response = await client.indices.delete({
index: index_name,
});
console.log("Deleting index:");
console.log(response.body);
}
search().catch(console.log);
```


@ -57,6 +57,9 @@ The OpenSearch Logstash plugin has two installation options at this time: Linux
Make sure you have [Java Development Kit (JDK)](https://www.oracle.com/java/technologies/javase-downloads.html) version 8 or 11 installed.
If you're migrating from an existing Logstash installation, you can install the [OpenSearch output plugin](https://rubygems.org/gems/logstash-output-opensearch/) manually and [update pipeline.conf](https://opensearch.org/docs/latest/clients/logstash/ship-to-opensearch/). We include this plugin by default in our tarball and Docker downloads.
{: .note }
### Tarball
1. Download the Logstash tarball from [OpenSearch downloads](https://opensearch.org/downloads.html).

_clients/python.md (new file, 128 lines)

@ -0,0 +1,128 @@
---
layout: default
title: Python client
nav_order: 70
---
# Python client
The OpenSearch Python client provides a more natural syntax for interacting with your cluster. Rather than sending HTTP requests to a given URL, you can create an OpenSearch client for your cluster and call the client's built-in functions.
{% comment %}
`opensearch-py` is the lower-level of the two Python clients. If you want a general client for assorted operations, it's a great choice. If you want a higher-level client strictly for indexing and search operations, consider [opensearch-dsl-py]({{site.url}}{{site.baseurl}}/clients/python-dsl/).
{% endcomment %}
## Setup
To add the client to your project, install it using [pip](https://pip.pypa.io/):
```bash
pip install opensearch-py
```
Then import it like any other module:
```python
from opensearchpy import OpenSearch
```
If you prefer to add the client manually or just want to examine the source code, see [opensearch-py on GitHub](https://github.com/opensearch-project/opensearch-py).
## Sample code
```python
from opensearchpy import OpenSearch
host = 'localhost'
port = 9200
auth = ('admin', 'admin') # For testing only. Don't store credentials in code.
ca_certs_path = '/full/path/to/root-ca.pem' # Provide a CA bundle if you use intermediate CAs with your root CA.
# Optional client certificates if you don't want to use HTTP basic authentication.
# client_cert_path = '/full/path/to/client.pem'
# client_key_path = '/full/path/to/client-key.pem'
# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
hosts = [{'host': host, 'port': port}],
http_compress = True, # enables gzip compression for request bodies
http_auth = auth,
# client_cert = client_cert_path,
# client_key = client_key_path,
use_ssl = True,
verify_certs = True,
ssl_assert_hostname = False,
ssl_show_warn = False,
ca_certs = ca_certs_path
)
# Create an index with non-default settings.
index_name = 'python-test-index'
index_body = {
'settings': {
'index': {
'number_of_shards': 4
}
}
}
response = client.indices.create(index_name, body=index_body)
print('\nCreating index:')
print(response)
# Add a document to the index.
document = {
'title': 'Moneyball',
'director': 'Bennett Miller',
'year': '2011'
}
id = '1'
response = client.index(
index = index_name,
body = document,
id = id,
refresh = True
)
print('\nAdding document:')
print(response)
# Search for the document.
q = 'miller'
query = {
'size': 5,
'query': {
'multi_match': {
'query': q,
'fields': ['title^2', 'director']
}
}
}
response = client.search(
body = query,
index = index_name
)
print('\nSearch results:')
print(response)
# Delete the document.
response = client.delete(
index = index_name,
id = id
)
print('\nDeleting document:')
print(response)
# Delete the index.
response = client.indices.delete(
index = index_name
)
print('\nDeleting index:')
print(response)
```


@ -1,12 +1,13 @@
title: OpenSearch documentation
description: >- # this means to ignore newlines until "baseurl:"
Documentation for OpenSearch, the Apache 2.0 search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more.
baseurl: "/docs" # the subpath of your site, e.g. /blog
baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
permalink: /:path/
opensearch_version: 1.0.0
opensearch_major_minor_version: 1.0
opensearch_version: 1.1.0
opensearch_major_minor_version: 1.1
lucene_version: 8_9_0
# Build settings
markdown: kramdown
@ -44,6 +45,9 @@ collections:
im-plugin:
permalink: /:collection/:path/
output: true
replication-plugin:
permalink: /:collection/:path/
output: true
monitoring-plugins:
permalink: /:collection/:path/
output: true
@ -80,6 +84,9 @@ just_the_docs:
im-plugin:
name: Index management plugin
nav_fold: true
replication-plugin:
name: Replication plugin
nav_fold: true
monitoring-plugins:
name: Monitoring plugins
nav_fold: true

_dashboards/dql.md (new file, 142 lines)

@ -0,0 +1,142 @@
---
layout: default
title: Dashboards query language
nav_order: 99
---
# Dashboards Query Language
Similar to the [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) that lets you use the HTTP request body to search for data, you can use the Dashboards Query Language (DQL) in OpenSearch Dashboards to search for data and visualizations.
For example, if you want to see all visualizations of visits to a host based in the US, enter `geo.dest:US` into the search field, and Dashboards refreshes to display all related data.
Just like the query DSL, DQL has a handful of query types, so use whichever best fits your use case.
This section uses the OpenSearch Dashboards sample web log data. To add sample data in Dashboards, log in to OpenSearch Dashboards, choose **Home**, **Add sample data**, and then **Add data**.
---
#### Table of contents
1. TOC
{:toc}
---
## Terms query
The most basic query is to just specify the term you're searching for.
```
host:www.example.com
```
To access an object's nested field, list the complete path to the field separated by periods. For example, to retrieve the `lat` field in the `coordinates` object:
```
coordinates.lat:43.7102
```
DQL also supports leading and trailing wildcards, so you can search for any terms that match your pattern.
```
host.keyword:*.example.com/*
```
To check if a field exists or has any data, use a wildcard to see if Dashboards returns any results.
```
host.keyword:*
```
## Boolean query
To mix and match, or even combine, multiple queries for more refined results, you can use the boolean operators `and`, `or`, and `not`. DQL is not case sensitive, so `AND` and `and` are the same.
```
host.keyword:www.example.com and response.keyword:200
```
The following example demonstrates how to use multiple operators in one query.
```
geo.dest:US or response.keyword:200 and host.keyword:www.example.com
```
Remember that boolean operators follow the logical precedence order of `not`, `and`, and `or`, so in an expression like the previous example, `response.keyword:200 and host.keyword:www.example.com` is evaluated first, and Dashboards then combines that result with `geo.dest:US` using `or`.
To avoid confusion, we recommend using parentheses to make the evaluation order explicit. If you want to evaluate `geo.dest:US or response.keyword:200` first, your expression becomes:
```
(geo.dest:US or response.keyword:200) and host.keyword:www.example.com
```
## Date and range queries
DQL also supports inequality operators for numeric values.
```
bytes >= 15 and memory < 15
```
Similarly, you can use the same operators to find dates before or after a specified date: `>` returns dates after the specified date, and `<` returns dates before it.
```
@timestamp > "2020-12-14T09:35:33"
```
## Nested field query
If you have a document with nested fields, you have to specify which parts of the document you want to retrieve.
Suppose that you have the following document:
```json
{
"superheroes":[
{
"hero-name": "Superman",
"real-identity": "Clark Kent",
"age": 28
},
{
"hero-name": "Batman",
"real-identity": "Bruce Wayne",
"age": 26
},
{
"hero-name": "Flash",
"real-identity": "Barry Allen",
"age": 28
},
{
"hero-name": "Robin",
"real-identity": "Dick Grayson",
"age": 15
}
]
}
```
The following example demonstrates how to use DQL to retrieve a specific field.
```
superheroes: {hero-name: Superman}
```
If you want to retrieve multiple objects from your document, just specify all of the fields you want to retrieve.
```
superheroes: {hero-name: Superman} and superheroes: {hero-name: Batman}
```
The previous boolean and range queries still work, so you can submit a more refined query.
```
superheroes: {hero-name: Superman and age < 50}
```
If your document has an object nested within another object, you can still retrieve data by specifying all of the levels.
```
justice-league.superheroes: {hero-name:Superman}
```


@ -5,9 +5,12 @@ nav_order: 1
has_children: false
has_toc: false
redirect_from:
- /docs/opensearch-dashboards/
- /dashboards/
---
{%- comment -%}The `/docs/opensearch-dashboards/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# OpenSearch Dashboards
OpenSearch Dashboards is the default visualization tool for data in OpenSearch. It also serves as a user interface for many of the OpenSearch plugins, including security, alerting, Index State Management, SQL, and more.


@ -20,7 +20,7 @@ Resource | Description
The specification in the default Helm chart supports many standard use cases and setups. You can modify the default chart to configure your desired specifications and set Transport Layer Security (TLS) and role-based access control (RBAC).
For information about the default configuration, steps to configure security, and configurable parameters, see the
[README](https://github.com/opensearch-project/opensearch-devops/blob/main/Helm/README.md).
[README](https://github.com/opensearch-project/helm-charts/tree/main/charts).
The instructions here assume you have a Kubernetes cluster with Helm preinstalled. See the [Kubernetes documentation](https://kubernetes.io/docs/setup/) for steps to configure a Kubernetes cluster and the [Helm documentation](https://helm.sh/docs/intro/install/) to install Helm.
{: .note }


@ -28,6 +28,36 @@ If you don't want to use the all-in-one installation options, you can install th
</tr>
</thead>
<tbody>
<tr>
<td>1.1.0</td>
<td>
<pre>alertingDashboards 1.1.0.0
anomalyDetectionDashboards 1.1.0.0
ganttChartDashboards 1.1.0.0
indexManagementDashboards 1.1.0.0
notebooksDashboards 1.1.0.0
queryWorkbenchDashboards 1.1.0.0
reportsDashboards 1.1.0.0
securityDashboards 1.1.0.0
traceAnalyticsDashboards 1.1.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.1</td>
<td>
<pre>alertingDashboards 1.0.0.0
anomalyDetectionDashboards 1.0.0.0
ganttChartDashboards 1.0.0.0
indexManagementDashboards 1.0.1.0
notebooksDashboards 1.0.0.0
queryWorkbenchDashboards 1.0.0.0
reportsDashboards 1.0.1.0
securityDashboards 1.0.1.0
traceAnalyticsDashboards 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0</td>
<td>
@ -40,36 +70,6 @@ queryWorkbenchDashboards 1.0.0.0
reportsDashboards 1.0.0.0
securityDashboards 1.0.0.0
traceAnalyticsDashboards 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0-rc1</td>
<td>
<pre>alertingDashboards 1.0.0.0-rc1
anomalyDetectionDashboards 1.0.0.0-rc1
ganttChartDashboards 1.0.0.0-rc1
indexManagementDashboards 1.0.0.0-rc1
notebooksDashboards 1.0.0.0-rc1
queryWorkbenchDashboards 1.0.0.0-rc1
reportsDashboards 1.0.0.0-rc1
securityDashboards 1.0.0.0-rc1
traceAnalyticsDashboards 1.0.0.0-rc1
</pre>
</td>
</tr>
<tr>
<td>1.0.0-beta1</td>
<td>
<pre>alertingDashboards 1.0.0.0-beta1
anomalyDetectionDashboards 1.0.0.0-beta1
ganttChartDashboards 1.0.0.0-beta1
indexManagementDashboards 1.0.0.0-beta1
notebooksDashboards 1.0.0.0-beta1
queryWorkbenchDashboards 1.0.0.0-beta1
reportsDashboards 1.0.0.0-beta1
securityDashboards 1.0.0.0-beta1
traceAnalyticsDashboards 1.0.0.0-beta1
</pre>
</td>
</tr>


@ -14,9 +14,10 @@ nav_order: 30
```bash
# x64
tar -zxf opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz
cd opensearch-dashboards{% comment %}# ARM64
cd opensearch-dashboards
# ARM64
tar -zxf opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz
cd opensearch-dashboards{% endcomment %}
cd opensearch-dashboards
```
1. If desired, modify `config/opensearch_dashboards.yml`.
@ -26,5 +27,3 @@ nav_order: 30
```bash
./bin/opensearch-dashboards
```
1. See the [OpenSearch Dashboards documentation]({{site.url}}{{site.baseurl}}/dashboards/index/).


@ -2,8 +2,12 @@
layout: default
title: WMS map server
nav_order: 5
redirect_from:
- /docs/opensearch-dashboards/maptiles/
---
{%- comment -%}The `/docs/opensearch-dashboards/maptiles/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Configure WMS map server
OpenSearch Dashboards includes default map tiles, but if you need more specialized maps, you can configure OpenSearch Dashboards to use a WMS map server:


@ -1 +0,0 @@
message: "🔥 [OpenSearch 1.0 released on July 12th! Get it now!](/downloads.html)"

_data/alert.yml (new file, 1 line)

@ -0,0 +1 @@
message: "🌡️ [OpenSearch 1.1.0 arrived October 5 with cross-cluster replication, bucket-level alerting, and much, much more. Grab it here!](/downloads.html)"

_data/versions.json (new file, 6 lines)

@ -0,0 +1,6 @@
{
"current": "1.1",
"past": [
"1.0"
]
}


@ -0,0 +1,7 @@
---
layout: default
title: Dashboards developer guide
nav_order: 2
permalink: /dashboards-developer-guide/
redirect_to: https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/DEVELOPER_GUIDE.md
---


@ -90,36 +90,36 @@ You can specify the following options.
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`source_index` | The name of the detector. | `string` | Yes
`target_index` | Specify the target index that the rolled up data is ingested into. You could either create a new target index or use an existing index. The target index cannot be a combination of raw and rolled up data. | `string` | Yes
`schedule` | Schedule of the index rollup job which can be an interval or a cron expression. | `object` | Yes
`schedule.interval` | Specify the frequency of execution of the rollup job. | `object` | No
`schedule.interval.start_time` | Start time of the interval. | `timestamp` | Yes
`schedule.interval.period` | Define the interval period. | `string` | Yes
`schedule.interval.unit` | Specify the time unit of the interval. | `string` | Yes
`schedule.interval.cron` | Optionally, specify a cron expression to define therollup frequency. | `list` | No
`schedule.interval.cron.expression` | Specify a Unix cron expression. | `string` | Yes
`schedule.interval.cron.timezone` | Specify timezones as defined by the IANA Time Zone Database. Defaults to UTC. | `string` | No
`description` | Optionally, describe the rollup job. | `string` | No
`enabled` | When true, the index rollup job is scheduled. Default is true. | `boolean` | Yes
`continuous` | Specify whether or not the index rollup job continuously rolls up data forever or just executes over the current data set once and stops. Default is false. | `boolean` | Yes
`error_notification` | Set up a Mustache message template sent for error notifications. For example, if an index rollup job fails, the system sends a message to a Slack channel. | `object` | No
`page_size` | Specify the number of buckets to paginate through at a time while rolling up. | `number` | Yes
`delay` | Specify time value to delay execution of the index rollup job. | `time_unit` | No
`dimensions` | Specify aggregations to create dimensions for the roll up time window. | `object` | Yes
`dimensions.date_histogram` | Specify either fixed_interval or calendar_interval, but not both. Either one limits what you can query in the target index. | `object` | No
`dimensions.date_histogram.fixed_interval` | Specify the fixed interval for aggregations in milliseconds, seconds, minutes, hours, or days. | `string` | No
`dimensions.date_histogram.calendar_interval` | Specify the calendar interval for aggregations in minutes, hours, days, weeks, months, quarters, or years. | `string` | No
`dimensions.date_histogram.field` | Specify the date field used in date histogram aggregation. | `string` | No
`dimensions.date_histogram.timezone` | Specify the timezones as defined by the IANA Time Zone Database. The default is UTC. | `string` | No
`dimensions.terms` | Specify the term aggregations that you want to roll up. | `object` | No
`dimensions.terms.fields` | Specify terms aggregation for compatible fields. | `object` | No
`dimensions.histogram` | Specify the histogram aggregations that you want to roll up. | `object` | No
`dimensions.histogram.field` | Add a field for histogram aggregations. | `string` | Yes
`dimensions.histogram.interval` | Specify the histogram aggregation interval for the field. | `long` | Yes
`dimensions.metrics` | Specify a list of objects that represent the fields and metrics that you want to calculate. | `nested object` | No
`dimensions.metrics.field` | Specify the field that you want to perform metric aggregations on. | `string` | No
`dimensions.metrics.field.metrics` | Specify the metric aggregations you want to calculate for the field. | `multiple strings` | No
`source_index` | The name of the source index whose data you want to roll up. | String | Yes
`target_index` | Specify the target index that the rolled up data is ingested into. You could either create a new target index or use an existing index. The target index cannot be a combination of raw and rolled up data. | String | Yes
`schedule` | Schedule of the index rollup job which can be an interval or a cron expression. | Object | Yes
`schedule.interval` | Specify the frequency of execution of the rollup job. | Object | No
`schedule.interval.start_time` | Start time of the interval. | Timestamp | Yes
`schedule.interval.period` | Define the interval period. | String | Yes
`schedule.interval.unit` | Specify the time unit of the interval. | String | Yes
`schedule.interval.cron` | Optionally, specify a cron expression to define the rollup frequency. | List | No
`schedule.interval.cron.expression` | Specify a Unix cron expression. | String | Yes
`schedule.interval.cron.timezone` | Specify timezones as defined by the IANA Time Zone Database. Defaults to UTC. | String | No
`description` | Optionally, describe the rollup job. | String | No
`enabled` | When true, the index rollup job is scheduled. Default is true. | Boolean | Yes
`continuous` | Specify whether or not the index rollup job continuously rolls up data forever or just executes over the current data set once and stops. Default is false. | Boolean | Yes
`error_notification` | Set up a Mustache message template sent for error notifications. For example, if an index rollup job fails, the system sends a message to a Slack channel. | Object | No
`page_size` | Specify the number of buckets to paginate through at a time while rolling up. | Number | Yes
`delay` | The number of milliseconds to delay execution of the index rollup job. | Long | No
`dimensions` | Specify aggregations to create dimensions for the roll up time window. | Object | Yes
`dimensions.date_histogram` | Specify either fixed_interval or calendar_interval, but not both. Either one limits what you can query in the target index. | Object | No
`dimensions.date_histogram.fixed_interval` | Specify the fixed interval for aggregations in milliseconds, seconds, minutes, hours, or days. | String | No
`dimensions.date_histogram.calendar_interval` | Specify the calendar interval for aggregations in minutes, hours, days, weeks, months, quarters, or years. | String | No
`dimensions.date_histogram.field` | Specify the date field used in date histogram aggregation. | String | No
`dimensions.date_histogram.timezone` | Specify the timezones as defined by the IANA Time Zone Database. The default is UTC. | String | No
`dimensions.terms` | Specify the term aggregations that you want to roll up. | Object | No
`dimensions.terms.fields` | Specify terms aggregation for compatible fields. | Object | No
`dimensions.histogram` | Specify the histogram aggregations that you want to roll up. | Object | No
`dimensions.histogram.field` | Add a field for histogram aggregations. | String | Yes
`dimensions.histogram.interval` | Specify the histogram aggregation interval for the field. | Long | Yes
`dimensions.metrics` | Specify a list of objects that represent the fields and metrics that you want to calculate. | Nested object | No
`dimensions.metrics.field` | Specify the field that you want to perform metric aggregations on. | String | No
`dimensions.metrics.field.metrics` | Specify the metric aggregations you want to calculate for the field. | Multiple strings | No
#### Sample response


@ -29,7 +29,7 @@ If you don't have any data in your cluster, you can use the sample flight data w
### Step 1: Choose indices
1. In the **Job name and description** section, specify a name and an optional description for your job.
2. In the **Indices** section, select the source and target index. You can either select an existing target index or create a new one by entering a name for your new index. If you want to transform just a subset of your source index, choose **Add Data Filter**, and use the OpenSearch query DSL to specify a subset of your source index. For more information about the OpenSearch query DSL, see [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/).
2. In the **Indices** section, select the source and target index. You can either select an existing target index or create a new one by entering a name for your new index. If you want to transform just a subset of your source index, choose **Edit data filter**, and use the OpenSearch query DSL to specify a subset of your source index. For more information about the OpenSearch query DSL, see [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/).
3. Choose **Next**.
### Step 2: Select fields to transform


@ -2,7 +2,7 @@
layout: default
title: ISM API
parent: Index State Management
nav_order: 5
nav_order: 20
---
# ISM API


@ -31,14 +31,21 @@ To get started, choose **Index Management** in OpenSearch Dashboards.
A policy is a set of rules that describes how an index should be managed. For information about creating a policy, see [Policies]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/).
You can use the JSON editor or visual editor to create policies. Compared to the JSON editor, the visual editor offers a more structured way of defining policies by separating the process into creating error notifications, defining ISM templates, and adding states. We recommend using the visual editor if you want to see pre-defined fields, such as which actions you can assign to a state or under what conditions a state can transition into a destination state.
#### JSON editor
1. Choose the **Index Policies** tab.
2. Choose **Create policy**.
3. In the **Name policy** section, enter a policy ID.
4. In the **Define policy** section, enter your policy.
5. Choose **Create**.
3. Choose **JSON editor**.
4. In the **Name policy** section, enter a policy ID.
5. In the **Define policy** section, enter your policy.
6. Choose **Create**.
After you create a policy, your next step is to attach this policy to an index or indices.
You can set up an `ism_template` in the policy so when you create an index that matches the ISM template pattern, the index will have this policy attached to it:
After you create a policy, your next step is to attach it to an index or indices.
You can set up an `ism_template` in the policy so when an index that matches the ISM template pattern is created, the plugin automatically attaches the policy to the index.
The following example demonstrates how to create a policy that automatically gets attached to all indices whose names start with `index_name-`.
```json
PUT _plugins/_ism/policies/policy_id
@ -55,6 +62,8 @@ PUT _plugins/_ism/policies/policy_id
}
```
If you have more than one template that matches an index pattern, ISM uses the priority value to determine which template to apply.
For an example ISM template policy, see [Sample policy with ISM template]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies#sample-policy-with-ism-template).
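As a sketch of what this looks like inside the policy body (the index pattern and priority value here are placeholders), the `ism_template` section can list both the patterns it applies to and a priority:
```json
"ism_template": {
  "index_patterns": ["index_name-*"],
  "priority": 100
}
```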
Older versions of the plugin include the `policy_id` in an index template, so when an index is created that matches the index template pattern, the index will have the policy attached to it:
@ -89,6 +98,7 @@ Make sure that the alias that you enter already exists. For more information abo
After you attach a policy to an index, ISM creates a job that runs every 5 minutes by default to perform policy actions, check conditions, and transition the index into different states. To change the default time interval for this job, see [Settings]({{site.url}}{{site.baseurl}}/im-plugin/ism/settings/).
ISM does not run jobs if the cluster state is red.
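For example, assuming the dynamic cluster setting `plugins.index_state_management.job_interval` (in minutes) controls this job in your version, you could change the interval as follows; confirm the setting name on the Settings page:
```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.index_state_management.job_interval": 10
  }
}
```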
### Step 3: Manage indices


@ -347,7 +347,7 @@ Parameter | Description | Type | Required | Default
### allocation
Allocate the index to a node with a specific attribute.
Allocate the index to a node with a specific attribute set [like this]({{site.url}}{{site.baseurl}}/opensearch/cluster/#advanced-step-7-set-up-a-hot-warm-architecture).
For example, setting `require` to `warm` moves your data only to "warm" nodes.
The `allocation` operation has the following parameters:
@ -363,7 +363,7 @@ Parameter | Description | Type | Required
"actions": [
{
"allocation": {
"require": { "box_type": "warm" }
"require": { "temp": "warm" }
}
}
]
@ -558,9 +558,11 @@ The following sample template policy is for a rollover use case.
PUT _index_template/ism_rollover
{
"index_patterns": ["log*"],
"settings": {
"template": {
"settings": {
"plugins.index_state_management.rollover_alias": "log"
}
}
}
}
```
@ -586,6 +588,12 @@ The following sample template policy is for a rollover use case.
}
```
5. Verify if the policy is attached to the `log-000001` index:
```json
GET _plugins/_ism/explain/log-000001?pretty
```
## Example policy
The following example policy implements a `hot`, `warm`, and `delete` workflow. You can use this policy as a template to prioritize resources to your indices based on their levels of activity.


@ -1,7 +1,7 @@
---
layout: default
title: Refresh search analyzer
nav_order: 40
nav_order: 50
has_children: false
redirect_from: /im-plugin/refresh-analyzer/
has_toc: false

_im-plugin/security.md (new file, 41 lines)

@ -0,0 +1,41 @@
---
layout: default
title: Index management security
nav_order: 40
has_children: false
---
# Index management security
Using the security plugin with index management lets you limit non-admin users to certain actions. For example, you might want to set up your security such that a group of users can only read ISM policies, while others can create, delete, or change policies.
All index management data are protected as system indices, and only a super admin or an admin with a Transport Layer Security (TLS) certificate can access system indices. For more information, see [System indices]({{site.url}}{{site.baseurl}}/security-plugin/configuration/system-indices).
## Basic permissions
The security plugin comes with one role that offers full access to index management: `index_management_full_access`. For a description of the role's permissions, see [Predefined roles]({{site.url}}{{site.baseurl}}/security-plugin/access-control/users-roles#predefined-roles).
With security enabled, users not only need the correct index management permissions, but they also need permissions to execute actions on the indices involved. For example, if a user wants to use the REST API to attach a policy that executes a rollup job to an index named `system-logs`, they would need the permissions to attach a policy and execute a rollup job, as well as access to `system-logs`.
Finally, with the exceptions of Create Policy, Get Policy, and Delete Policy, users also need the `indices:admin/opensearch/ism/managedindex` permission to execute [ISM APIs]({{site.url}}{{site.baseurl}}/im-plugin/ism/api).
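As a rough sketch, a custom role for such a user might combine those pieces using the security plugin's Create Role API. The role name, index pattern, and the exact list of allowed actions here are placeholders to adapt to your setup:
```json
PUT _plugins/_security/api/roles/ism_rollup_role
{
  "index_permissions": [
    {
      "index_patterns": ["system-logs*"],
      "allowed_actions": [
        "indices:admin/opensearch/ism/managedindex",
        "read",
        "write"
      ]
    }
  ]
}
```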
## (Advanced) Limit access by backend role
You can use backend roles to configure fine-grained access to index management policies and actions. For example, users of different departments in an organization might view different policies depending on what roles and permissions they are assigned.
First, ensure your users have the appropriate [backend roles]({{site.url}}{{site.baseurl}}/security-plugin/access-control/index/). Backend roles usually come from an [LDAP server]({{site.url}}{{site.baseurl}}/security-plugin/configuration/ldap/) or [SAML provider]({{site.url}}{{site.baseurl}}/security-plugin/configuration/saml/). However, if you use the internal user database, you can use the REST API to [add them manually]({{site.url}}{{site.baseurl}}/security-plugin/access-control/api#create-user).
Use the REST API to enable the following setting:
```json
PUT _cluster/settings
{
"transient": {
"plugins.index_management.filter_by_backend_roles": "true"
}
}
```
With security enabled, only users who share at least one backend role can see and execute the policies and actions relevant to their roles.
For example, consider a scenario with three users: `John` and `Jill`, who have the backend role `helpdesk_staff`, and `Jane`, who has the backend role `phone_operator`. `John` wants to create a policy that performs a rollup job on an index named `airline_data`, so he needs a backend role with permissions to access that index, create the relevant policies, and execute the relevant actions. Because `Jill` shares the `helpdesk_staff` backend role, she can access the same index, policy, and job. `Jane`, however, cannot access or edit those resources or actions.
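As a sketch, assigning the `helpdesk_staff` backend role to an internal user might look like the following request (the username and password are placeholders):
```json
PUT _plugins/_security/api/internalusers/john
{
  "password": "replace-with-a-strong-password",
  "backend_roles": ["helpdesk_staff"]
}
```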


@ -6,3 +6,9 @@
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3.0.1/es5/tex-mml-chtml.js"></script>
{% endif %}
{% if jekyll.environment == "development" %}
<script src="{{ '/assets/js/version-selector.js' | relative_url }}"></script>
{% else %}
<script src="{{ '/docs/latest/assets/js/version-selector.js' }}"></script>
{% endif %}


@ -57,6 +57,10 @@ layout: table_wrappers
</a>
</div>
<nav role="navigation" aria-label="Main" id="site-nav" class="site-nav">
{% assign past_versions = site.data.versions.past | join: ";" %}
<div class="version-wrapper">
<version-selector selected="{{ site.data.versions.current }}"></version-selector>
</div>
{% assign pages_top_size = site.html_pages
| where_exp:"item", "item.title != nil"
| where_exp:"item", "item.parent == nil"

File diff suppressed because it is too large.


@ -17,24 +17,19 @@ Anomaly detection automatically detects anomalies in your OpenSearch data in ne
You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.
To use the anomaly detection plugin, your computer needs to have more than one CPU core.
{: .note }
## Get started with Anomaly Detection
To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
To first test with sample streaming data, choose **Sample Detectors** and try out one of the preconfigured detectors.
To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets.
### Step 1: Create a detector
## Step 1: Define a detector
A detector is an individual anomaly detection task. You can create multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
1. Choose **Create Detector**.
1. Choose **Create detector**.
1. Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
1. For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indices.
1. (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
1. Select the **Timestamp field** in your index.
1. (Optional) For **Data filter**, filter the index you chose as the data source. From the **Filter type** menu, choose **Visual filter**, and then design your filter query by selecting **Fields**, **Operator**, and **Value**, or choose **Custom Expression** and add your own JSON filter query.
1. For **Detector operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
1. For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
- The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
The shorter you set this interval, the fewer data points the detector aggregates.
The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals.
@ -44,48 +39,53 @@ Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute.
Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00.
Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Choose **Create**.
1. Choose **Next**.
After you create the detector, the next step is to add features to it.
After you define the detector, the next step is to configure the model.
### Step 2: Add features to your detector
## Step 2: Configure the model
#### Add features to your detector
A feature is the field in your index that you want to check for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.
A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. We recommend experimenting with a historical detector with different feature sets and checking the precision before moving on to real-time detectors. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting.
A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting.
{: .note }
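For example, assuming you want to allow up to 10 features, you can update this dynamic setting with the cluster settings API (the value shown is only illustrative):
```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.anomaly_detection.max_anomaly_features": 10
  }
}
```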
1. On the **Model configuration** page, enter the **Feature name**.
1. For **Find anomalies based on**, choose the method to find anomalies. For **Field Value** menu, choose the **field** and the **aggregation method**. Or choose **Custom expression**, and add your own JSON aggregation query.
1. On the **Configure Model** page, enter the **Feature name** and check **Enable feature**.
1. For **Find anomalies based on**, choose the method to find anomalies. For **Field Value**, choose the **aggregation method**. Or choose **Custom expression**, and add your own JSON aggregation query.
1. Select a field.
#### (Optional) Set a category field for high cardinality
#### (Optional) Set category fields for high cardinality
You can categorize anomalies based on a keyword or IP field type.
The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues.
To set a category field, choose **Enable a category field** and select a field.
To set a category field, choose **Enable a category field** and select a field. You can't change the category fields after you create the detector.
Only a certain number of unique entities are supported in the category field. Use the following equation to calculate the recommended total number of entities supported in a cluster:
```
(data nodes * heap size * anomaly detection maximum memory percentage) / (entity size of a detector)
(data nodes * heap size * anomaly detection maximum memory percentage) / (entity model size of a detector)
```
To get the entity model size of a detector, use the [profile detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting.
This formula provides a good starting point, but make sure to test with a representative workload.
{: .note }
For example, for a cluster with 3 data nodes, each with 8G of JVM heap size, a maximum memory percentage of 10% (default), and the entity size of the detector as 1MB: the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1M ) * 3 = 2429.
For example, for a cluster with three data nodes, each with 8 GB of JVM heap size, a maximum memory percentage of 10% (default), and the entity model size of the detector as 1MB: the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1 MB ) * 3 = 2429.
#### Set a window size
If the actual total number of unique entities is higher than the number you calculate (in this case, 2,429), the anomaly detector makes its best effort to model the extra entities. The detector prioritizes entities that occur more often and are more recent.
Set the number of aggregation intervals from your data stream to consider in a detection window. It's best to choose this value based on your actual data to see which one leads to the best results for your use case.
#### (Advanced settings) Set a shingle size
Based on experiments performed on a wide variety of one-dimensional data streams, we recommend using a window size between 1 and 16. The default window size is 8. If you set the category field for high cardinality, the default window size is 1.
Set the number of aggregation intervals from your data stream to consider in a detection window. It's best to choose this value based on your actual data to see which one leads to the best results for your use case.
If you expect missing values in your data or if you want to base the anomalies on the current interval, choose 1. If your data is continuously ingested and you want to base the anomalies on multiple intervals, choose a larger window size.
The anomaly detector expects the shingle size to be between 1 and 60. The default shingle size is 8. We recommend that you don't choose 1 unless you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also false positives. Larger values might be useful for ignoring noise in a signal.
#### Preview sample anomalies
@ -94,12 +94,27 @@ For sample previews, the anomaly detection plugin selects a small number of data
Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results.
1. Choose **Save and start detector**.
1. Choose between automatically starting the detector (recommended) or manually starting the detector at a later time.
1. Choose **Preview sample anomalies**.
- If you don't see any sample anomaly result, check the detector interval and make sure you have more than 400 data points for some entities during the preview date range.
1. Choose **Next**.
### Step 3: Observe the results
## Step 3: Set up detector jobs
Choose the **Anomaly results** tab. You need to wait for some time to see the anomaly results. If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies.
To start a real-time detector to find anomalies in your data in near real-time, check **Start real-time detector automatically (recommended)**.
Alternatively, if you want to perform historical analysis and find patterns in long historical data windows (weeks or months), check **Run historical analysis detection** and select a date range (at least 128 detection intervals).
Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.
We recommend experimenting with historical analysis with different feature sets and checking the precision before moving on to real-time detectors.
## Step 4: Review and create
Review your model configuration and select **Create detector**.
## Step 5: Observe the results
Choose the **Real-time results** or **Historical analysis** tab. For real-time results, you need to wait for some time to see the anomaly results. If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies.
A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner.
Use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation to make sure you have sufficient data points.
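For example, the following request returns the detector profile (the detector ID is a placeholder):
```json
GET _plugins/_anomaly_detection/detectors/<detector_id>/_profile
```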
@ -108,12 +123,12 @@ If you see the detector pending in "initialization" for longer than a day, aggre
![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/ad.png)
Analize anomalies with the following visualizations:
Analyze anomalies with the following visualizations:
- **Live anomalies** - displays live anomaly results for the last 60 intervals. For example, if the interval is 10, it shows results for the last 600 minutes. The chart refreshes every 30 seconds.
- **Anomaly history** - plots the anomaly grade with the corresponding measure of confidence.
- **Feature breakdown** - plots the features based on the aggregation method. You can vary the date-time range of the detector.
- **Anomaly history** (for historical analysis) / **Anomaly overview** (for real-time results) - plots the anomaly grade with the corresponding measure of confidence.
- **Anomaly occurrence** - shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each detected anomaly.
- **Feature breakdown** - plots the features based on the aggregation method. You can vary the date-time range of the detector.
`Anomaly grade` is a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of 0 represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly.
@ -121,47 +136,26 @@ Analize anomalies with the following visualizations:
If you set the category field, you see an additional **Heat map** chart. The heat map correlates results for anomalous entities. This chart is empty until you select an anomalous entity. You also see the anomaly and feature line chart for the time period of the anomaly (`anomaly_grade` > 0).
Choose a filled rectangle to see a more detailed view of the anomaly.
Choose and drag over the anomaly line chart to zoom in and see a more detailed view of an anomaly.
{: .note }
### Step 4: Set up alerts
## Step 6: Set up alerts
Choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/).
Under **Real-time results**, choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/).
If you stop or delete a detector, make sure to delete any monitors associated with it.
### Step 5: Adjust the model
## Step 7: Adjust the model
To see all the configuration settings for a detector, choose the **Detector configuration** tab.
1. To make any changes to the detector configuration, or fine tune the time interval to minimize any false positives, go to the **Detector configuration** section and choose **Edit**.
- You need to stop the detector to change its configuration. Confirm that you want to stop the detector and proceed.
- You need to stop real-time and historical analysis to change its configuration. Confirm that you want to stop the detector and proceed.
1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**.
- Choose between automatically starting the detector (recommended) or manually starting the detector at a later time.
### Step 6: Analyze historical data
## Step 8: Manage your detectors
Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.
To start, stop, or delete a detector, go to the **Detectors** page.
To use a historical detector, you need to specify a date range that has data present in at least 1,000 detection intervals.
{: .note }
1. Choose **Historical detectors** and **Create historical detector**.
1. Enter the **Name** of the detector and a brief **Description**.
1. For **Data source**, choose the index to use as the data source. You can optionally use index patterns to choose multiple indices.
1. For **Time range**, select a time range for historical analysis.
1. For **Detector settings**, choose to use the settings of an existing detector. Or choose the **Timestamp field** in your index, add individual features to the detector, and set the detector interval.
1. (Optional) Choose to run the historical detector automatically after creating it.
1. Choose **Create**.
- You can stop the historical detector even before it completes.
### Step 7: Manage your detectors
To change or delete a detector, go to the **Detector details** page.
1. To make changes to your detector, choose the detector name.
1. Choose **Actions** and **Edit detector**.
- You need to stop the detector to change its configuration. Confirm that you want to stop the detector and proceed.
1. Make your changes and choose **Save changes**.
To delete your detector, choose **Actions** and **Delete detector**. In the pop-up box, type `delete` to confirm and choose **Delete**.
1. Choose the detector name.
2. Choose **Actions** and select **Start real-time detectors**, **Stop real-time detectors**, or **Delete detectors**.

View File

@ -24,19 +24,24 @@ PUT _cluster/settings
Setting | Default | Description
:--- | :--- | :---
`plugins.anomaly_detection.enabled` | True | Whether the anomaly detection plugin is enabled or not. If disabled, all detectors immediately stop running.
`plugins.anomaly_detection.max_anomaly_detectors` | 1,000 | The maximum number of non-high cardinality detectors (no category field) users can create.
`plugins.anomaly_detection.max_multi_entity_anomaly_detectors` | 10 | The maximum number of high cardinality detectors (with category field) in a cluster.
`plugins.anomaly_detection.max_anomaly_features` | 5 | The maximum number of features for a detector.
`plugins.anomaly_detection.ad_result_history_rollover_period` | 12h | How often the rollover condition is checked. If `true`, the plugin rolls over the result index to a new index.
`plugins.anomaly_detection.ad_result_history_max_docs` | 250000000 | The maximum number of documents in one result index. The plugin only counts refreshed documents in the primary shards.
`plugins.anomaly_detection.ad_result_history_retention_period` | 30d | The maximum age of the result index. If its age exceeds the threshold, the plugin deletes the rolled over result index. If the cluster has only one result index, the plugin keeps the index even if it's older than its configured retention period.
`plugins.anomaly_detection.max_entities_per_query` | 1,000 | The maximum unique values per detection interval for high cardinality detectors. By default, if the category field has more than 1,000 unique values in a detector interval, the plugin selects the top 1,000 values and orders them by `doc_count`.
`plugins.anomaly_detection.max_entities_for_preview` | 30 | The maximum unique category field values displayed with the preview operation for high cardinality detectors. If the category field has more than 30 unique values, the plugin selects the top 30 values and orders them by `doc_count`.
`plugins.anomaly_detection.max_primary_shards` | 10 | The maximum number of primary shards an anomaly detection index can have.
`plugins.anomaly_detection.filter_by_backend_roles` | False | When you enable the security plugin and set this to `true`, the plugin filters results based on the user's backend role(s).
`plugins.anomaly_detection.max_cache_miss_handling_per_second` | 100 | High cardinality detectors use a cache to store active models. In the event of a cache miss, the cache gets the models from the model checkpoint index. Use this setting to limit the rate of fetching models. Because the thread pool for a GET operation has a queue of 1,000, we recommend setting this value below 1,000.
`plugins.anomaly_detection.max_batch_task_per_node` | 2 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks and you're not sure if the data nodes are capable of running more historical detectors, add more data nodes instead of changing this setting to a higher value.
`plugins.anomaly_detection.max_old_ad_task_docs_per_detector` | 10 | You can run the same historical detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster.
`plugins.anomaly_detection.batch_task_piece_size` | 1000 | The date range for a historical task is split into smaller pieces and the anomaly detection plugin runs the task piece by piece. Each piece contains 1,000 detection intervals by default. For example, if detector interval is 1 minute and one piece is 1000 minutes, the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000.
`plugins.anomaly_detection.batch_task_piece_interval_seconds` | 5 | Add a time interval between historical detector tasks. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds.
plugins.anomaly_detection.enabled | True | Whether the anomaly detection plugin is enabled or not. If disabled, all detectors immediately stop running.
plugins.anomaly_detection.max_anomaly_detectors | 1,000 | The maximum number of non-high cardinality detectors (no category field) users can create.
plugins.anomaly_detection.max_multi_entity_anomaly_detectors | 10 | The maximum number of high cardinality detectors (with category field) in a cluster.
plugins.anomaly_detection.max_anomaly_features | 5 | The maximum number of features for a detector.
plugins.anomaly_detection.ad_result_history_rollover_period | 12h | How often the rollover condition is checked. If the condition is met, the anomaly detection plugin rolls over the result index to a new index.
plugins.anomaly_detection.ad_result_history_max_docs_per_shard | 1,350,000,000 | The maximum number of documents in a single shard of the result index. The anomaly detection plugin only counts the refreshed documents in the primary shards.
plugins.anomaly_detection.max_entities_per_query | 1,000,000 | The maximum unique values per detection interval for high cardinality detectors. By default, if the category field(s) have more than the configured unique values in a detector interval, the anomaly detection plugin orders them by the natural ordering of categorical values (for example, entity `ab` comes before `bc`) and then selects the top values.
plugins.anomaly_detection.max_entities_for_preview | 5 | The maximum unique category field values displayed with the preview operation for high cardinality detectors. By default, if the category field(s) have more than the configured unique values in a detector interval, the anomaly detection plugin orders them by the natural ordering of categorical values (for example, entity `ab` comes before `bc`) and then selects the top values.
plugins.anomaly_detection.max_primary_shards | 10 | The maximum number of primary shards an anomaly detection index can have.
plugins.anomaly_detection.filter_by_backend_roles | False | When you enable the security plugin and set this to `true`, the anomaly detection plugin filters results based on the user's backend role(s).
plugins.anomaly_detection.max_batch_task_per_node | 10 | Starting a historical analysis triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1,000. If the data nodes can't support all batch tasks and you're not sure if the data nodes are capable of running more historical analyses, add more data nodes instead of changing this setting to a higher value. Increasing this value might put more load on each data node.
plugins.anomaly_detection.max_old_ad_task_docs_per_detector | 1 | You can run historical analysis for the same detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster.
plugins.anomaly_detection.batch_task_piece_size | 1,000 | The date range for a historical task is split into smaller pieces and the anomaly detection plugin runs the task piece by piece. Each piece contains 1,000 detection intervals by default. For example, if the detector interval is 1 minute and one piece is 1,000 minutes, the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000.
plugins.anomaly_detection.batch_task_piece_interval_seconds | 5 | Add a time interval between two pieces of the same historical analysis task. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds.
plugins.anomaly_detection.max_top_entities_for_historical_analysis | 1,000 | The maximum number of top entities that you run for a high cardinality detector's historical analysis. The range is from 1 to 10,000.
plugins.anomaly_detection.max_running_entities_per_detector_for_historical_analysis | 10 | The number of entity tasks that you can run in parallel for a high cardinality detector's historical analysis. The task slots available on your cluster also impact how many entities run in parallel. If a cluster has 3 data nodes, each data node has 10 task slots by default. Say you already have two high cardinality detectors and each of them runs 10 entities. If you start a single-entity detector that takes 1 task slot, the number of task slots available is 10 * 3 - 10 * 2 - 1 = 9. If you now start a new high cardinality detector, the detector can only run 9 entities in parallel and not 10. You can tune this value from 1 to 1,000 based on your cluster's capability. If you set a higher value, the anomaly detection plugin runs historical analysis faster but also consumes more resources.
plugins.anomaly_detection.max_cached_deleted_tasks | 1,000 | You can rerun historical analysis for a single detector as many times as you like. The anomaly detection plugin only keeps a limited number of old tasks, by default 1 old task. If you run historical analysis three times for a detector, the oldest task is deleted. Because historical analysis generates a number of anomaly results in a short span of time, it's necessary to clean up anomaly results for a deleted task. With this setting, you can configure the maximum number of deleted tasks to cache. The plugin cleans up a task's results when it's deleted. If the plugin fails to do this cleanup, it adds the task's results into a cache and an hourly cron job performs the cleanup. You can use this setting to limit how many old tasks are put into the cache to avoid a DDoS attack. After an hour, if you still find an old task's results in the cache, use the [delete detector results API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#delete-detector-results) to delete the task's results manually. You can tune this setting from 1 to 10,000.
plugins.anomaly_detection.delete_anomaly_result_when_delete_detector | False | Whether the anomaly detection plugin deletes the anomaly results when you delete a detector. If you want to save some disk space, especially if you have high cardinality detectors generating a lot of results, set this setting to `true`. Alternatively, you can use the [delete detector results API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#delete-detector-results) to manually delete the results.
plugins.anomaly_detection.dedicated_cache_size | 10 | If the real-time analysis of a high cardinality detector starts successfully, the anomaly detection plugin guarantees keeping the models of 10 entities (dynamically adjustable via this setting) in memory per node. If the number of entities exceeds this limit, the plugin puts the extra entities' models in a memory space shared by all detectors. The actual number of entities varies based on the memory that you have available and the frequencies of the entities. If you'd like the plugin to guarantee keeping more entities' models in memory and your cluster has sufficient memory, you can increase this setting's value.
plugins.anomaly_detection.max_concurrent_preview | 2 | The maximum number of concurrent previews. You can use this setting to limit resource usage.
plugins.anomaly_detection.model_max_size_percent | 0.1 | The upper bound of the memory percentage for a model.
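As a hedged example, you can adjust one of these settings with the cluster settings API, assuming the setting you change is dynamic; the values below are arbitrary illustrations, not recommendations:

```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.anomaly_detection.max_entities_for_preview": 10,
    "plugins.anomaly_detection.batch_task_piece_interval_seconds": 10
  }
}
```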

View File

@ -19,10 +19,12 @@ Use the alerting API to programmatically manage monitors and alerts.
---
## Create monitor
## Create query-level monitor
Introduced 1.0
{: .label .label-purple }
Query-level monitors run the query and check whether the results should trigger any alerts. As such, query-level monitors can only trigger one alert at a time. For more information about query-level monitors versus bucket-level monitors, see [Create monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/#create-monitors).
#### Request
```json
@ -30,6 +32,7 @@ POST _plugins/_alerting/monitors
{
"type": "monitor",
"name": "test-monitor",
"monitor_type": "query_level_monitor",
"enabled": true,
"schedule": {
"period": {
@ -166,7 +169,7 @@ If you use a custom webhook for your destination and need to embed JSON in the m
},
"throttle_enabled": false,
"subject_template": {
"source": "TheSubject",
"source": "Subject",
"lang": "mustache"
}
}]
@ -186,6 +189,7 @@ The following example creates a monitor that runs at 12:10 PM Pacific Time on th
{
"type": "monitor",
"name": "test-monitor",
"monitor_type": "query_level_monitor",
"enabled": true,
"schedule": {
"cron" : {
@ -228,7 +232,7 @@ The following example creates a monitor that runs at 12:10 PM Pacific Time on th
"name": "test-action",
"destination_id": "ld7912sBlQ5JUWWFThoW",
"message_template": {
"source": "This is my message body."
"source": "This is a message body."
},
"throttle_enabled": true,
"throttle": {
@ -236,7 +240,7 @@ The following example creates a monitor that runs at 12:10 PM Pacific Time on th
"unit": "MINUTES"
},
"subject_template": {
"source": "TheSubject"
"source": "Subject"
}
}]
}]
@ -247,6 +251,265 @@ For a full list of timezone names, refer to [Wikipedia](https://en.wikipedia.org
---
## Create bucket-level monitor
Bucket-level monitors categorize results into buckets separated by fields. The monitor then runs your script with each bucket's results and evaluates whether to trigger an alert. For more information about bucket-level monitors versus query-level monitors, see [Create monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/#create-monitors).
```json
POST _plugins/_alerting/monitors
{
"type": "monitor",
"name": "test-bucket-level-monitor",
"monitor_type": "bucket_level_monitor",
"enabled": true,
"schedule": {
"period": {
"interval": 1,
"unit": "MINUTES"
}
},
"inputs": [
{
"search": {
"indices": [
"movies"
],
"query": {
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"order_date": {
"from": "{{period_end}}||-1h",
"to": "{{period_end}}",
"include_lower": true,
"include_upper": true,
"format": "epoch_millis"
}
}
}
]
}
},
"aggregations": {
"composite_agg": {
"composite": {
"sources": [
{
"user": {
"terms": {
"field": "user"
}
}
}
]
},
"aggregations": {
"avg_products_base_price": {
"avg": {
"field": "products.base_price"
}
}
}
}
}
}
}
}
],
"triggers": [
{
"bucket_level_trigger": {
"name": "test-trigger",
"severity": "1",
"condition": {
"buckets_path": {
"_count": "_count",
"avg_products_base_price": "avg_products_base_price"
},
"parent_bucket_path": "composite_agg",
"script": {
"source": "params._count > 50 || params.avg_products_base_price < 35",
"lang": "painless"
}
},
"actions": [
{
"name": "test-action",
"destination_id": "E4o5hnsB6KjPKmHtpfCA",
"message_template": {
"source": """Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue. - Trigger: {{ctx.trigger.name}} - Severity: {{ctx.trigger.severity}} - Period start: {{ctx.periodStart}} - Period end: {{ctx.periodEnd}} - Deduped Alerts: {{ctx.dedupedAlerts}} * {{id}} : {{bucket_keys}} {{ctx.dedupedAlerts}} - New Alerts: {{ctx.newAlerts}} * {{id}} : {{bucket_keys}} {{ctx.newAlerts}} - Completed Alerts: {{ctx.completedAlerts}} * {{id}} : {{bucket_keys}} {{ctx.completedAlerts}}""",
"lang": "mustache"
},
"throttle_enabled": false,
"throttle": {
"value": 10,
"unit": "MINUTES"
},
"action_execution_policy": {
"action_execution_scope": {
"per_alert": {
"actionable_alerts": [
"DEDUPED",
"NEW"
]
}
}
},
"subject_template": {
"source": "The Subject",
"lang": "mustache"
}
}
]
}
}
]
}
```
#### Sample response
```json
{
"_id" : "Dfxr63sBwex6DxEhHV5N",
"_version" : 1,
"_seq_no" : 3,
"_primary_term" : 1,
"monitor" : {
"type" : "monitor",
"schema_version" : 4,
"name" : "test-bucket-level-monitor",
"monitor_type" : "bucket_level_monitor",
"user" : {
"name" : "",
"backend_roles" : [ ],
"roles" : [ ],
"custom_attribute_names" : [ ],
"user_requested_tenant" : null
},
"enabled" : true,
"enabled_time" : 1631742270785,
"schedule" : {
"period" : {
"interval" : 1,
"unit" : "MINUTES"
}
},
"inputs" : [
{
"search" : {
"indices" : [
"opensearch_dashboards_sample_data_flights"
],
"query" : {
"size" : 0,
"query" : {
"bool" : {
"filter" : [
{
"range" : {
"order_date" : {
"from" : "{{period_end}}||-1h",
"to" : "{{period_end}}",
"include_lower" : true,
"include_upper" : true,
"format" : "epoch_millis",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"aggregations" : {
"composite_agg" : {
"composite" : {
"size" : 10,
"sources" : [
{
"user" : {
"terms" : {
"field" : "user",
"missing_bucket" : false,
"order" : "asc"
}
}
}
]
},
"aggregations" : {
"avg_products_base_price" : {
"avg" : {
"field" : "products.base_price"
}
}
}
}
}
}
}
}
],
"triggers" : [
{
"bucket_level_trigger" : {
"id" : "C_xr63sBwex6DxEhHV5B",
"name" : "test-trigger",
"severity" : "1",
"condition" : {
"buckets_path" : {
"_count" : "_count",
"avg_products_base_price" : "avg_products_base_price"
},
"parent_bucket_path" : "composite_agg",
"script" : {
"source" : "params._count > 50 || params.avg_products_base_price < 35",
"lang" : "painless"
},
"gap_policy" : "skip"
},
"actions" : [
{
"id" : "DPxr63sBwex6DxEhHV5B",
"name" : "test-action",
"destination_id" : "E4o5hnsB6KjPKmHtpfCA",
"message_template" : {
"source" : "Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue. - Trigger: {{ctx.trigger.name}} - Severity: {{ctx.trigger.severity}} - Period start: {{ctx.periodStart}} - Period end: {{ctx.periodEnd}} - Deduped Alerts: {{ctx.dedupedAlerts}} * {{id}} : {{bucket_keys}} {{ctx.dedupedAlerts}} - New Alerts: {{ctx.newAlerts}} * {{id}} : {{bucket_keys}} {{ctx.newAlerts}} - Completed Alerts: {{ctx.completedAlerts}} * {{id}} : {{bucket_keys}} {{ctx.completedAlerts}}",
"lang" : "mustache"
},
"throttle_enabled" : false,
"subject_template" : {
"source" : "The Subject",
"lang" : "mustache"
},
"throttle" : {
"value" : 10,
"unit" : "MINUTES"
},
"action_execution_policy" : {
"action_execution_scope" : {
"per_alert" : {
"actionable_alerts" : [
"DEDUPED",
"NEW"
]
}
}
}
}
]
}
}
],
"last_update_time" : 1631742270785
}
}
```
## Update monitor
Introduced 1.0
{: .label .label-purple }

View File

@ -19,11 +19,11 @@ has_children: false
Term | Definition
:--- | :---
Monitor | A job that runs on a defined schedule and queries OpenSearch. The results of these queries are then used as input for one or more *triggers*.
Monitor | A job that runs on a defined schedule and queries OpenSearch indices. The results of these queries are then used as input for one or more *triggers*.
Trigger | Conditions that, if met, generate *alerts*.
Alert | An event associated with a trigger. When an alert is created, the trigger performs *actions*, which can include sending a notification.
Action | The information that you want the monitor to send out after being triggered. Actions have a *destination*, a message subject, and a message body.
Destination | A reusable location for an action, such as Amazon Chime, Slack, or a webhook URL.
Destination | A reusable location for an action. Supported locations are Amazon Chime, Email, Slack, or custom webhook.
---
@ -34,9 +34,9 @@ Destination | A reusable location for an action, such as Amazon Chime, Slack, or
1. Specify a name for the destination so that you can identify it later.
1. For **Type**, choose Slack, Amazon Chime, custom webhook, or [email](#email-as-a-destination).
For Email type, refer to [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. For more information about webhooks, see the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Amazon Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html).
For Email, refer to the [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. See the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Amazon Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html) to learn more about webhooks.
For custom webhooks, you must specify more information: parameters and headers. For example, if your endpoint requires basic authentication, you might need to add a header with a key of `Authorization` and a value of `Basic <Base64-encoded-credential-string>`. You might also need to change `Content-Type` to whatever your webhook requires. Popular values are `application/json`, `application/xml`, and `text/plain`.
If you're using custom webhooks, you must specify more information: parameters and headers. For example, if your endpoint requires basic authentication, you might need to add a header with a key of `Authorization` and a value of `Basic <Base64-encoded-credential-string>`. You might also need to change `Content-Type` to whatever your webhook requires. Popular values are `application/json`, `application/xml`, and `text/plain`.
This information is stored in plain text in the OpenSearch cluster. We will improve this design in the future, but for now, the encoded credentials (which are neither encrypted nor hashed) might be visible to other OpenSearch users.
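If you prefer to script the destination instead of using the UI, a custom webhook with an `Authorization` header might look roughly like the following sketch. The endpoint path and the `url` and `header_params` field names are assumptions to verify against the alerting API reference, and the URL and credential string are placeholders:

```json
POST _plugins/_alerting/destinations
{
  "name": "my-custom-webhook",
  "type": "custom_webhook",
  "custom_webhook": {
    "url": "https://example.com/alerts",
    "header_params": {
      "Content-Type": "application/json",
      "Authorization": "Basic <Base64-encoded-credential-string>"
    }
  }
}
```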
@ -55,7 +55,7 @@ To configure a sender email, do the following:
1. After you choose **Email** as the destination type, choose **Manage senders**.
1. Choose **Add sender**, **New sender** and enter a unique name.
1. Enter the email address, SMTP host (e.g. `smtp.gmail.com` for a Gmail account), and the port.
1. Choose an encryption method, or use the default value of **None**. However, most email providers require SSL or TLS, which requires a username and password in OpenSearch keystore. Refer to [Authenticate sender account](#authenticate-sender-account) to learn more.
1. Choose an encryption method, or use the default value of **None**. However, most email providers require SSL or TLS, which require a username and password in OpenSearch keystore. Refer to [Authenticate sender account](#authenticate-sender-account) to learn more.
1. Choose **Save** to save the configuration and create the sender. You can create a sender even before you add your credentials to the OpenSearch keystore. However, you must [authenticate each sender account](#authenticate-sender-account) before you use the destination to send your alert.
You can reuse senders across many different destinations, but each destination only supports one sender.
@ -82,7 +82,7 @@ If your email provider requires SSL or TLS, you must authenticate each sender ac
./bin/opensearch-keystore add plugins.alerting.destination.email.<sender_name>.password
```
**Note**: Keystore settings are node-specific. You must run these commands on each node.
Note: Keystore settings are node-specific. You must run these commands on each node.
{: .note}
To change or update your credentials (after you've added them to the keystore on every node), call the reload API to automatically update those credentials without restarting OpenSearch:
@ -101,20 +101,9 @@ POST _nodes/reload_secure_settings
1. Choose **Alerting**, **Monitors**, **Create monitor**.
1. Specify a name for the monitor.
1. Choose either **Per query monitor** or **Per bucket monitor**.
The anomaly detection option is for pairing with the anomaly detection plugin. See [Anomaly Detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/).
For anomaly detector, choose an appropriate schedule for the monitor based on the detector interval. Otherwise, the alerting monitor might miss reading the results.
For example, assume you set the monitor interval and the detector interval as 5 minutes, and you start the detector at 12:00. If an anomaly is detected at 12:05, it might be available at 12:06 because of the delay between writing the anomaly and it being available for queries. The monitor reads the anomaly results between 12:00 and 12:05, so it does not get the anomaly results available at 12:06.
To avoid this issue, make sure the alerting monitor is at least twice the detector interval.
When you create a monitor using OpenSearch Dashboards, the anomaly detector plugin generates a default monitor schedule that's twice the detector interval.
Whenever you update a detector's interval, make sure to update the associated monitor interval as well, as the anomaly detection plugin does not do this automatically.
1. Choose one or more indices. You can also use `*` as a wildcard to specify an index pattern.
If you use the security plugin, you can only choose indices that you have permission to access. For details, see [Alerting security]({{site.url}}{{site.baseurl}}/security-plugin/).
Whereas query-level monitors run your specified query and then check whether the query's results trigger any alerts, bucket-level monitors let you select fields to create buckets and categorize your results into those buckets. The alerting plugin runs each bucket's unique results against a script you define later, so you have finer control over which results should trigger alerts. Each of those buckets can trigger an alert, but query-level monitors can only trigger one alert at a time.
1. Define the monitor in one of three ways: visually, using a query, or using an anomaly detector.
@ -167,39 +156,57 @@ Whenever you update a detectors interval, make sure to update the associated
}
```
"Start" and "end" refer to the interval at which the monitor runs. See [Available variables](#available-variables).
"Start" and "end" refer to the interval at which the monitor runs. See [Available variables](#available-variables).
To define a monitor visually, choose **Visual editor**. Then choose a source index, a timeframe, an aggregation (for example, `count()` or `average()`), a data filter if you want to monitor a subset of your source index, and a group-by field if you want to include an aggregation field in your query. At least one group-by field is required if you're defining a bucket-level monitor. Visual definition works well for most monitors.
1. To define a monitor visually, choose **Define using visual graph**. Then choose an aggregation (for example, `count()` or `average()`), a set of documents, and a timeframe. Visual definition works well for most monitors.
If you use the security plugin, you can only choose indices that you have permission to access. For details, see [Alerting security]({{site.url}}{{site.baseurl}}/security-plugin/).
To use a query, choose **Define using extraction query**, add your query (using [the OpenSearch query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/)), and test it using the **Run** button.
To use a query, choose **Extraction query editor**, add your query (using [the OpenSearch query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/)), and test it using the **Run** button.
The monitor makes this query to OpenSearch as often as the schedule dictates; check the **Query Performance** section and make sure you're comfortable with the performance implications.
The monitor makes this query to OpenSearch as often as the schedule dictates; check the **Query Performance** section and make sure you're comfortable with the performance implications.
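As a rough sketch, an extraction query often filters on a timestamp field relative to `{{period_end}}`; the `@timestamp` field name and the one-hour window here are assumptions you would adapt to your own index:

```json
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "{{period_end}}||-1h",
        "lte": "{{period_end}}",
        "format": "epoch_millis"
      }
    }
  }
}
```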
To use an anomaly detector, choose **Anomaly detector** and select your **Detector**.
The anomaly detection option is for pairing with the anomaly detection plugin. See [Anomaly Detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/).
For anomaly detector, choose an appropriate schedule for the monitor based on the detector interval. Otherwise, the alerting monitor might miss reading the results.
For example, assume you set the monitor interval and the detector interval as 5 minutes, and you start the detector at 12:00. If an anomaly is detected at 12:05, it might be available at 12:06 because of the delay between writing the anomaly and it being available for queries. The monitor reads the anomaly results between 12:00 and 12:05, so it does not get the anomaly results available at 12:06.
To avoid this issue, make sure the alerting monitor interval is at least twice the detector interval.
When you create a monitor using OpenSearch Dashboards, the anomaly detector plugin generates a default monitor schedule that's twice the detector interval.
Whenever you update a detector's interval, make sure to update the associated monitor interval as well, as the anomaly detection plugin does not do this automatically.
**Note**: Anomaly detection is available only if you are defining a per query monitor.
{: .note}
To use an anomaly detector, choose **Define using Anomaly detector** and select your **Detector**.
1. Choose a frequency and timezone for your monitor. Note that you can only pick a timezone if you choose Daily, Weekly, Monthly, or [custom cron expression]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/cron/) for frequency.
1. Choose **Create**.
1. Add a trigger to your monitor.
---
## Create triggers
The next step in creating a monitor is to create a trigger. These steps differ depending on whether you chose **Define using visual graph** or **Define using extraction query** or **Define using Anomaly detector** when you created the monitor.
Steps to create a trigger differ depending on whether you chose **Visual editor**, **Extraction query editor**, or **Anomaly detector** when you created the monitor.
Either way, you begin by specifying a name and severity level for the trigger. Severity levels help you manage alerts. A trigger with a high severity level (e.g. 1) might page a specific individual, whereas a trigger with a low severity level might message a chat room.
You begin by specifying a name and severity level for the trigger. Severity levels help you manage alerts. A trigger with a high severity level (e.g. 1) might page a specific individual, whereas a trigger with a low severity level might message a chat room.
Remember that query-level monitors run your trigger's script just once against the query's results, but bucket-level monitors execute your trigger's script on each bucket, so you should create a trigger that best fits the monitor you chose. If you want to execute multiple scripts, you must create multiple triggers.
### Visual graph
### Visual editor
For **Trigger condition**, specify a threshold for the aggregation and timeframe you chose earlier, such as "is below 1,000" or "is exactly 10."
For a query-level monitor's **Trigger condition**, specify a threshold for the aggregation and timeframe you chose earlier, such as "is below 1,000" or "is exactly 10."
The line moves up and down as you increase and decrease the threshold. Once this line is crossed, the trigger evaluates to true.
Bucket-level monitors also require you to specify a threshold and value for your aggregation and timeframe, but you can use a maximum of five conditions to better refine your trigger. Optionally, you can also use a keyword filter to filter for a specific field in your index.
### Extraction query
For **Trigger condition**, specify a Painless script that returns true or false. Painless is the default OpenSearch scripting language and has a syntax similar to Groovy.
If you're using a query-level monitor, specify a Painless script that returns true or false. Painless is the default OpenSearch scripting language and has a syntax similar to Groovy.
Trigger condition scripts revolve around the `ctx.results[0]` variable, which corresponds to the extraction query response. For example, your script might reference `ctx.results[0].hits.total.value` or `ctx.results[0].hits.hits[i]._source.error_code`.
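For instance, a query-level trigger condition might be a short Painless expression like the following sketch, where the threshold of 100 hits is an arbitrary example:

```json
{
  "script": {
    "source": "ctx.results[0].hits.total.value > 100",
    "lang": "painless"
  }
}
```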
@ -208,6 +215,27 @@ A return value of true means the trigger condition has been met, and the trigger
The **Info** link next to **Trigger condition** contains a useful summary of the variables and results available to your query.
{: .tip }
Bucket-level monitors require you to specify more information in your trigger condition. At a minimum, you must have the following fields:
- `buckets_path`, which maps variable names to metrics to use in your script.
- `parent_bucket_path`, which is a path to a multi-bucket aggregation. The path can include single-bucket aggregations, but the last aggregation must be multi-bucket. For example, if you have a pipeline such as `agg1>agg2>agg3`, `agg1` and `agg2` are single-bucket aggregations, but `agg3` must be a multi-bucket aggregation.
- `script`, which is the script that OpenSearch runs to evaluate whether to trigger any alerts.
For example, you might have a script that looks like the following:
```json
{
"buckets_path": {
"count_var": "_count"
},
"parent_bucket_path": "composite_agg",
"script": {
"source": "params.count_var > 5"
}
}
```
After mapping the `count_var` variable to the `_count` metric, you can use `count_var` in your script and reference `_count` data. Finally, `composite_agg` is a path to a multi-bucket aggregation.
### Anomaly detector
@ -264,11 +292,11 @@ Below are some variables you can include in your message using Mustache template
Variable | Data Type | Description
:--- | :--- | :---
`ctx.monitor` | JSON | Includes `ctx.monitor.name`, `ctx.monitor.type`, `ctx.monitor.enabled`, `ctx.monitor.enabled_time`, `ctx.monitor.schedule`, `ctx.monitor.inputs`, `triggers` and `ctx.monitor.last_update_time`.
`ctx.monitor.user` | JSON | Includes information about the user who created the monitor. Includes `ctx.monitor.user.backend_roles` and `ctx.monitor.user.roles`, which are arrays that contain the backend roles and roles assigned to the user. See [alerting security]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/security/) for more information.
`ctx.monitor` | Object | Includes `ctx.monitor.name`, `ctx.monitor.type`, `ctx.monitor.enabled`, `ctx.monitor.enabled_time`, `ctx.monitor.schedule`, `ctx.monitor.inputs`, `triggers` and `ctx.monitor.last_update_time`.
`ctx.monitor.user` | Object | Includes information about the user who created the monitor. Includes `ctx.monitor.user.backend_roles` and `ctx.monitor.user.roles`, which are arrays that contain the backend roles and roles assigned to the user. See [alerting security]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/security/) for more information.
`ctx.monitor.enabled` | Boolean | Whether the monitor is enabled.
`ctx.monitor.enabled_time` | Milliseconds | Unix epoch time of when the monitor was last enabled.
`ctx.monitor.schedule` | JSON | Contains a schedule of how often or when the monitor should run.
`ctx.monitor.schedule` | Object | Contains a schedule of how often or when the monitor should run.
`ctx.monitor.schedule.period.interval` | Integer | The interval at which the monitor runs.
`ctx.monitor.schedule.period.unit` | String | The interval's unit of time.
`ctx.monitor.inputs` | Array | An array that contains the indices and definition used to create the monitor.
@ -282,7 +310,7 @@ Variable | Data Type | Description
`ctx.trigger.id` | String | The trigger's ID.
`ctx.trigger.name` | String | The trigger's name.
`ctx.trigger.severity` | String | The trigger's severity.
`ctx.trigger.condition`| JSON | Contains the Painless script used when creating the monitor.
`ctx.trigger.condition`| Object | Contains the Painless script used when creating the monitor.
`ctx.trigger.condition.script.source` | String | The script used to define the trigger condition.
`ctx.trigger.condition.script.lang` | String | The language used to define the script. Must be painless.
`ctx.trigger.actions`| Array | An array with one element that contains information about the action the monitor needs to trigger.
@ -309,7 +337,13 @@ Variable | Data Type | Description
`ctx.periodStart` | String | Unix timestamp for the beginning of the period during which the alert triggered. For example, if a monitor runs every ten minutes, a period might begin at 10:40 and end at 10:50.
`ctx.periodEnd` | String | The end of the period during which the alert triggered.
`ctx.error` | String | The error message if the trigger was unable to retrieve results or unable to evaluate the trigger, typically due to a compile error or null pointer exception. Null otherwise.
`ctx.alert` | JSON | The current, active alert (if it exists). Includes `ctx.alert.id`, `ctx.alert.version`, and `ctx.alert.isAcknowledged`. Null if no alert is active.
`ctx.alert` | Object | The current, active alert (if it exists). Includes `ctx.alert.id`, `ctx.alert.version`, and `ctx.alert.isAcknowledged`. Null if no alert is active. Only available with query-level monitors.
`ctx.dedupedAlerts` | Object | Alerts that have already been triggered. OpenSearch keeps the existing alert to prevent the plugin from creating an endless number of duplicate alerts. Only available with bucket-level monitors.
`ctx.newAlerts` | Object | Newly created alerts. Only available with bucket-level monitors.
`ctx.completedAlerts` | Object | Alerts that are no longer ongoing. Only available with bucket-level monitors.
`bucket_keys` | String | Comma-separated list of the monitor's bucket key values. Available only for `ctx.dedupedAlerts`, `ctx.newAlerts`, and `ctx.completedAlerts`. Accessed through `ctx.dedupedAlerts[0].bucket_keys`.
`parent_bucket_path` | String | The parent bucket path of the bucket that triggered the alert. Accessed through `ctx.dedupedAlerts[0].parent_bucket_path`.
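To illustrate, a Mustache message template can combine several of these variables; this is only a sketch, and the wording is arbitrary:

```json
{
  "message_template": {
    "source": "Monitor {{ctx.monitor.name}} fired trigger {{ctx.trigger.name}} (severity {{ctx.trigger.severity}}) between {{ctx.periodStart}} and {{ctx.periodEnd}}.",
    "lang": "mustache"
  }
}
```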
---
@ -334,6 +368,7 @@ If you don't want to receive notifications for alerts, you don't have to add act
```
In this case, the message content must conform to the `Content-Type` header in the [custom webhook](#create-destinations).
1. If you're using a bucket-level monitor, you can choose whether the monitor should perform an action for each execution or for each alert.
1. (Optional) Use action throttling to limit the number of notifications you receive within a given span of time.

View File

@ -1,6 +1,6 @@
---
layout: default
title: Create Dashboards
title: Create PerfTop Dashboards
parent: Performance Analyzer
nav_order: 2
---

View File

@ -48,11 +48,11 @@ Otherwise, just specify the OpenSearch endpoint:
./opensearch-perf-top-macos --dashboard dashboards/<dashboard>.json --endpoint my-cluster.my-domain.com
```
PerfTop has four pre-built dashboards in the `dashboards` directory, but you can also [create your own]({{site.url}}{{site.baseurl}}/dashboards/).
PerfTop has four pre-built dashboards in the `dashboards` directory, but you can also [create your own]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/dashboards/).
You can also load the pre-built dashboards (ClusterOverview, ClusterNetworkMemoryAnalysis, ClusterThreadAnalysis, or NodeAnalysis) without the JSON files, such as `--dashboard ClusterThreadAnalysis`.
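For example, a command along these lines loads a pre-built dashboard; the `opensearch-perf-top-linux` binary name and the `localhost:9600` endpoint are assumptions based on the macOS example above:

```bash
./opensearch-perf-top-linux --dashboard ClusterOverview --endpoint localhost:9600
```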
PerfTop has no interactivity. Start the application, monitor the dashboard, and press esc, q, or Ctrl + C to quit.
PerfTop has no interactivity. Start the application, monitor the dashboard, and press Esc, Q, or Ctrl + C to quit.
{: .note }

View File

@ -40,9 +40,19 @@ Source for the OpenTelemetry Collector.
Option | Required | Description
:--- | :--- | :---
ssl | No | Boolean, whether to connect to the OpenTelemetry Collector over SSL.
sslKeyCertChainFile | No | String, path to the security certificate (e.g. `"config/demo-data-prepper.crt"`.
sslKeyFile | No | String, path to the security certificate key (e.g. `"config/demo-data-prepper.key"`).
port | No | Integer, the port OTel trace source is running on. Default is `21890`.
request_timeout | No | Integer, the request timeout in milliseconds. Default is `10_000`.
health_check_service | No | Boolean, enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
proto_reflection_service | No | Boolean, enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
unframed_requests | No | Boolean, enables requests not framed using the gRPC wire protocol.
thread_count | No | Integer, the number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer, the maximum allowed number of open connections. Default is `500`.
ssl | No | Boolean, enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
sslKeyCertChainFile | Conditionally | String, file-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
sslKeyFile | Conditionally | String, file-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
useAcmCertForSSL | No | Boolean, enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
acmCertificateArn | Conditionally | String, represents the ACM certificate ARN. The ACM certificate takes precedence over S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
awsRegion | Conditionally | String, represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
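Putting a few of these options together, an `otel_trace_source` entry in a pipeline configuration might look like the following sketch; the pipeline name and certificate paths are placeholders:

```yml
entry-pipeline:
  source:
    otel_trace_source:
      ssl: true
      sslKeyCertChainFile: "config/demo-data-prepper.crt"
      sslKeyFile: "config/demo-data-prepper.key"
      health_check_service: true
```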
### file
@ -114,11 +124,17 @@ Option | Required | Description
:--- | :--- | :---
time_out | No | Integer, forwarded request timeout in seconds. Defaults to 3 seconds.
span_agg_count | No | Integer, batch size for number of spans per request. Defaults to 48.
discovery_mode | No | String, peer discovery mode to be used. Allowable values are `static` and `dns`. Defaults to `static`.
target_port | No | Integer, the destination port to forward requests to. Defaults to `21890`.
discovery_mode | No | String, peer discovery mode to be used. Allowable values are `static`, `dns`, and `aws_cloud_map`. Defaults to `static`.
static_endpoints | No | List, containing string endpoints of all Data Prepper instances.
domain_name | No | String, single domain name to query DNS against. Typically used by creating multiple DNS A Records for the same domain.
ssl | No | Boolean, indicating whether TLS should be used. Default is true.
sslKeyCertChainFile | No | String, path to the security certificate
awsCloudMapNamespaceName | Conditionally | String, name of your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
awsCloudMapServiceName | Conditionally | String, service name within your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
sslKeyCertChainFile | Conditionally | String, represents the SSL certificate chain file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to `true`.
useAcmCertForSSL | No | Boolean, enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
awsRegion | Conditionally | String, represents the AWS region to use ACM, S3, or CloudMap. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
acmCertificateArn | Conditionally | String, represents the ACM certificate ARN. The ACM certificate takes precedence over S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
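As a sketch, a `peer_forwarder` prepper using DNS discovery might be configured like this fragment of a pipeline definition; the domain name and certificate path are placeholders:

```yml
prepper:
  - peer_forwarder:
      discovery_mode: "dns"
      domain_name: "data-prepper.example.com"
      ssl: true
      sslKeyCertChainFile: "config/demo-data-prepper.crt"
```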
### string_converter
@ -144,8 +160,9 @@ hosts | Yes | List of OpenSearch hosts to write to (e.g. `["https://localhost:92
cert | No | String, path to the security certificate (e.g. `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin.
username | No | String, username for HTTP basic authentication.
password | No | String, password for HTTP basic authentication.
aws_sigv4 | No | Boolean, whether to use IAM signing to connect to an Amazon ES cluster. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credential`, etc.).
aws_region | No | String, AWS region for the cluster (e.g. `"us-east-1"`) if you are connecting to Amazon ES.
aws_sigv4 | No | Boolean, whether to use IAM signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credential`, etc.).
aws_region | No | String, AWS region (e.g. `"us-east-1"`) for the domain if you are connecting to Amazon OpenSearch Service.
aws_sts_role | No | String, IAM role that the sink plugin assumes to sign requests to Amazon OpenSearch Service. If not provided, the plugin uses the default credentials.
trace_analytics_raw | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics OpenSearch Dashboards plugin.
trace_analytics_service_map | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics OpenSearch Dashboards plugin.
index | No | String, name of the index to export to. Only required if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
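For instance, an `opensearch` sink for raw trace data might look like the following fragment of a pipeline definition; the host, credentials, and certificate path are placeholders:

```yml
sink:
  - opensearch:
      hosts: ["https://localhost:9200"]
      cert: "config/root-ca.pem"
      username: "admin"
      password: "admin"
      trace_analytics_raw: true
```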

View File

@ -65,4 +65,16 @@ PUT _cluster/settings
You can find `opensearch.yml` in `/usr/share/opensearch/config/opensearch.yml` (Docker) or `/etc/opensearch/opensearch.yml` (most Linux distributions) on each node.
You can set the `OPENSEARCH_PATH_CONF` environment variable (for example, `OPENSEARCH_PATH_CONF=/etc/opensearch`) to change the config directory location. This variable is sourced from `/etc/default/opensearch` (Debian package) and `/etc/sysconfig/opensearch` (RPM package).
If you set your customized `OPENSEARCH_PATH_CONF` variable, be aware that other default environment variables will not be loaded.
You don't mark settings in `opensearch.yml` as persistent or transient, and settings use the flat form:
```yml
cluster.name: my-application
action.auto_create_index: true
compatibility.override_main_response_version: true
```
The demo configuration includes a number of settings for the security plugin that you should modify before using OpenSearch for a production workload. To learn more, see [Security]({{site.url}}{{site.baseurl}}/security-plugin/).

View File

@ -8,39 +8,13 @@ nav_order: 13
If you're ingesting continuously generated time-series data such as logs, events, and metrics into OpenSearch, you're likely in a scenario where the number of documents grows rapidly and you don't need to update older documents.
A typical workflow to manage time-series data involves multiple steps such as creating a rollover index alias, defining a write index, and defining common mappings and settings for the backing indices.
A typical workflow to manage time-series data involves multiple steps, such as creating a rollover index alias, defining a write index, and defining common mappings and settings for the backing indices.
Data streams simplify this bootstrapping process and enforce a setup that best suits time-series data, such as being designed primarily for append-only data, and ensuring that each document has a timestamp field.
Data streams simplify this process and enforce a setup that best suits time-series data, such as being designed primarily for append-only data and ensuring that each document has a timestamp field.
A data stream is internally composed of multiple backing indices. Search requests are routed to all the backing indices, while indexing requests are routed to the latest write index. You can use [ISM]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) policies to automatically handle rollovers or deletion of indices in a data stream, based on your use case.
A data stream is internally composed of multiple backing indices. Search requests are routed to all the backing indices, while indexing requests are routed to the latest write index. [ISM]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) policies let you automatically handle index rollovers or deletions.
## About data streams
A data stream consists of one or more hidden auto-generated backing indices. These backing indices are named using the following convention:
```
.ds-<data-stream-name>-<generation-id>
```
For example, `.ds-logs-redis-000003`, where `generation-id` is a six-digit, zero-padded integer that acts as a cumulative count of the data stream's rollovers, starting at `000001`.
The most recently created backing index is the data stream's write index. You can't add documents directly to any of the backing indices. You can only add them via the data stream handle:
![data stream indexing diagram]({{site.url}}{{site.baseurl}}/images/data_stream_indexing.png)
The data stream routes search requests to all of its backing indices. It uses the timestamp field to intelligently route search requests to the right set of indices and shards:
![data stream indexing diagram]({{site.url}}{{site.baseurl}}/images/data_stream_searching.png)
The following operations are not supported on the write index because they might hinder the indexing operation:
- close
- clone
- delete
- shrink
- split
## Get started with data streams
### Step 1: Create an index template

View File

@ -1,60 +0,0 @@
---
layout: default
title: Date math support for index names
nav_order: 92
---
# Date math support for index names
Date math is shorthand arithmetic for finding relative dates.
If you're indexing time-series data with the dates mapped in the index names, you can use date math in your queries to filter index names and limit the number of searched indices.
## Date math syntax
The date math syntax for an index name is as follows:
```
<static_name{date_math_expr{date_format|time_zone}}>
```
- `static_name`: The unchanged or static portion of the index name. To use the characters `{` and `}` in the static part of an index name, escape them with a backslash `\`.
- `date_math_expr`: The changing or dynamic portion of the index name that's computed by the date math expression. For example, `now+1h` adds one hour, `now-1d` subtracts one day, and `now/d` rounds down to the nearest day, where `now` represents the current timestamp.
- `date_format`: (Optional) Specify the format for the computed date. The default value is `YYYY.MM.dd`. Make sure that you're using the correct small or capital letters in the date format. For example, `mm` denotes minute of hour, while `MM` denotes month of year. Similarly, `hh` denotes the hour in the `1-12` range in combination with AM/PM, while `HH` denotes the hour in the `0-23` 24-hour range.
- `time_zone`: (Optional) Specify the timezone offset. The default value is UTC. For example, the UTC time offset for PST is `-08:00`.
## Date math example
You must enclose date math index names within angle brackets.
If today is 22nd March, 2024:
- `<logstash-{now/d}>` resolves to `logstash-2024.03.22`
- `<logstash-{now/M}>` resolves to `logstash-2024.03.01`
- `<logstash-{now/M{YYYY.MM}}>` resolves to `logstash-2024.03`
- `<logstash-{now/M-1M{YYYY.MM}}>` resolves to `logstash-2024.02`
- `<logstash-{now/d{yyyy.MM.dd|+12:00}}>` resolves to `logstash-2024.03.23`
You need to encode all special characters in URI format:
Special characters | URI format
:--- | :---
`<` | %3C
`>` | %3E
`/` | %2F
`{` | %7B
`}` | %7D
`|` | %7C
`+` | %2B
`:` | %3A
`,` | %2C
`\` | %5C
If you are searching for errors in your daily logs with the default Logstash index name format `logstash-YYYY.MM.dd`, you can use date math to restrict the search to indices of the past three days:
```
# GET <logstash-{now/d-2d}>,<logstash-{now/d-1d}>,<logstash-{now/d}>/_search
GET %3Clogstash-%7Bnow%2Fd-2d%7D%3E%2C%3Clogstash-%7Bnow%2Fd-1d%7D%3E%2C%3Clogstash-%7Bnow%2Fd%7D%3E/_search
```
This date math expression is evaluated at runtime.

View File

@ -195,7 +195,7 @@ You can use composable index templates to overcome these challenges. Composable
You can combine component templates to compose an index template.
Settings and mappings that you specify directly in the [create index]({{site.url}}{{site.baseurl}}/opensearch/rest-api/create-index/) request override any settings or mappings specified in an index template and its component templates.
Settings and mappings that you specify directly in the [create index]({{site.url}}{{site.baseurl}}/opensearch/rest-api/index-apis/create-index/) request override any settings or mappings specified in an index template and its component templates.
{: .note }
### Create a component template
@ -328,71 +328,6 @@ GET logs-2020-01-01
}
```
### Simulate multi-component templates
For index templates composed of multiple component templates, you can simulate applying a new template to verify whether the settings are applied as you expect.
To simulate the settings that would be applied to a specific index name:
```json
POST _index_template/_simulate_index/<index_name>
```
To simulate the settings that would be applied from an existing template:
```json
POST _index_template/_simulate/<index_template>
```
You can also specify a template definition in the simulate request:
```json
POST _index_template/_simulate
{
"index_patterns": [
"logs-2020-01-*"
],
"template": {
"settings" : {
"index.number_of_shards" : 3
}
},
"composed_of": ["component_template_1", "component_template_2"]
}
```
The `_simulate` API returns the final settings, mappings, and aliases that will be applied to indices that match the index pattern. You can also see any overlapping templates whose configuration is superseded by the simulated template body or higher priority templates:
```json
{
"template" : {
"settings" : {
"index" : {
"number_of_shards" : "3"
}
},
"mappings" : {
"properties" : {
"@timestamp" : {
"type" : "date"
},
"ip_address" : {
"type" : "ip"
}
}
},
"aliases" : { }
},
"overlapping" : [
{
"name" : "daily_logs",
"index_patterns" : [
"logs-2020-01-*"
]
}
]
}
```
## Index template options

View File

@ -5,9 +5,12 @@ nav_order: 1
has_children: false
has_toc: false
redirect_from:
- /docs/opensearch/
- /opensearch/
---
{%- comment -%}The `/docs/opensearch/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Introduction to OpenSearch
OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indices, boost fields, rank results by score, sort results by field, and aggregate results.

View File

@ -129,8 +129,8 @@ plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
cluster.routing.allocation.disk.threshold_enabled: false
plugins.security.audit.config.disabled_rest_categories: NONE
plugins.security.audit.config.disabled_transport_categories: NONE
opendistro_security.audit.config.disabled_rest_categories: NONE
opendistro_security.audit.config.disabled_transport_categories: NONE
```
Use this same override process to specify new [authentication settings]({{site.url}}{{site.baseurl}}/security-plugin/configuration/configuration/) in `/usr/share/opensearch/plugins/opensearch-security/securityconfig/config.yml`, as well as new default [internal users, roles, mappings, action groups, and tenants]({{site.url}}{{site.baseurl}}/security-plugin/configuration/yaml/).

View File

@ -20,7 +20,7 @@ Resource | Description
The specification in the default Helm chart supports many standard use cases and setups. You can modify the default chart to configure your desired specifications and set Transport Layer Security (TLS) and role-based access control (RBAC).
For information about the default configuration, steps to configure security, and configurable parameters, see the
[README](https://github.com/opensearch-project/opensearch-devops/blob/main/Helm/README.md).
[README](https://github.com/opensearch-project/helm-charts/tree/main/charts).
The instructions here assume you have a Kubernetes cluster with Helm preinstalled. See the [Kubernetes documentation](https://kubernetes.io/docs/setup/) for steps to configure a Kubernetes cluster and the [Helm documentation](https://helm.sh/docs/intro/install/) to install Helm.
{: .note }
@ -31,16 +31,42 @@ The default Helm chart deploys a three-node cluster. We recommend that you have
## Install OpenSearch using Helm
1. Clone the [opensearch-devops](https://github.com/opensearch-project/opensearch-devops/) repository:
1. Add `opensearch` [helm-charts](https://github.com/opensearch-project/helm-charts) repository to Helm:
```bash
git clone https://github.com/opensearch-project/opensearch-devops.git
helm repo add opensearch https://opensearch-project.github.io/helm-charts/
```
1. Update the available charts locally from charts repositories:
```bash
helm repo update
```
1. To search for the OpenSearch-related Helm charts:
```bash
helm search repo opensearch
```
```bash
NAME                               CHART VERSION   APP VERSION   DESCRIPTION
opensearch/opensearch              1.0.7           1.0.0         A Helm chart for OpenSearch
opensearch/opensearch-dashboards   1.0.4           1.0.0         A Helm chart for OpenSearch Dashboards
```
1. Deploy OpenSearch:
```bash
helm install my-deployment opensearch/opensearch
```
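Once the chart is deployed, a quick sanity check might look like the following sketch. The pod name `opensearch-cluster-master-0` is the chart's default but can differ in your deployment, and the demo `admin:admin` credentials are assumptions.
```bash
# Confirm that the OpenSearch pods reach the Running state.
kubectl get pods

# Forward the HTTP port of one OpenSearch pod and send a test request.
# Substitute the pod name reported by `kubectl get pods` if it differs.
kubectl port-forward opensearch-cluster-master-0 9200:9200 &
curl -k -u admin:admin https://localhost:9200
```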
You can also build the `opensearch-1.0.0.tgz` file manually:
1. Change to the `opensearch` directory:
```bash
cd Helm/opensearch
cd charts/opensearch
```
1. Package the Helm chart:

View File

@ -29,6 +29,41 @@ If you don't want to use the all-in-one OpenSearch installation options, you can
</tr>
</thead>
<tbody>
<tr>
<td>1.1.0</td>
<td>
<pre>opensearch-alerting 1.1.0.0
opensearch-anomaly-detection 1.1.0.0
opensearch-asynchronous-search 1.1.0.0
opensearch-cross-cluster-replication 1.1.0.0
opensearch-index-management 1.1.0.0
opensearch-job-scheduler 1.1.0.0
opensearch-knn 1.1.0.0
opensearch-notebooks 1.1.0.0
opensearch-performance-analyzer 1.1.0.0
opensearch-reports-scheduler 1.1.0.0
opensearch-security 1.1.0.0
opensearch-sql 1.1.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.1</td>
<td>
<pre>opensearch-alerting 1.0.0.0
opensearch-anomaly-detection 1.0.0.0
opensearch-asynchronous-search 1.0.0.0
opensearch-index-management 1.0.1.0
opensearch-job-scheduler 1.0.0.0
opensearch-knn 1.0.0.0
opensearch-notebooks 1.0.0.0
opensearch-performance-analyzer 1.0.1.0
opensearch-reports-scheduler 1.0.0.0
opensearch-security 1.0.1.0
opensearch-sql 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0</td>
<td>
@ -43,39 +78,6 @@ opensearch-performance-analyzer 1.0.0.0
opensearch-reports-scheduler 1.0.0.0
opensearch-security 1.0.0.0
opensearch-sql 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0-rc1</td>
<td>
<pre>opensearch-alerting 1.0.0.0-rc1
opensearch-anomaly-detection 1.0.0.0-rc1
opensearch-asynchronous-search 1.0.0.0-rc1
opensearch-index-management 1.0.0.0-rc1
opensearch-job-scheduler 1.0.0.0-rc1
opensearch-knn 1.0.0.0-rc1
opensearch-notebooks 1.0.0.0-rc1
opensearch-performance-analyzer 1.0.0.0-rc1
opensearch-reports-scheduler 1.0.0.0-rc1
opensearch-security 1.0.0.0-rc1
opensearch-sql 1.0.0.0-rc1
</pre>
</td>
</tr>
<tr>
<td>1.0.0-beta1</td>
<td>
<pre>opensearch-alerting 1.0.0.0-beta1
opensearch-anomaly-detection 1.0.0.0-beta1
opensearch-asynchronous-search 1.0.0.0-beta1
opensearch-index-management 1.0.0.0-beta1
opensearch-job-scheduler 1.0.0.0-beta1
opensearch-knn 1.0.0.0-beta1
opensearch-performance-analyzer 1.0.0.0-beta1
opensearch-reports-scheduler 1.0.0.0-beta1
opensearch-security 1.0.0.0-beta1
opensearch-sql 1.0.0.0-beta1
</pre>
</td>
</tr>

View File

@ -18,17 +18,20 @@ The tarball supports most Linux distributions, including CentOS 7, Amazon Linux
```bash
# x64
tar -zxf opensearch-{{site.opensearch_version}}-linux-x64.tar.gz
cd opensearch-{{site.opensearch_version}}{% comment %}# ARM64
cd opensearch-{{site.opensearch_version}}
# ARM64
tar -zxf opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz
cd opensearch-{{site.opensearch_version}}{% endcomment %}
cd opensearch-{{site.opensearch_version}}
```
1. Run OpenSearch:
1. Run OpenSearch with the demo security configuration:
```bash
./opensearch-tar-install.sh
```
If you configure the security plugin for production use (or disable it), you can run OpenSearch using `./bin/opensearch`.
1. Open a second terminal session, and send requests to the server to verify that OpenSearch is up and running:
```bash
@ -111,7 +114,7 @@ In a tarball installation, Performance Analyzer collects data when it is enabled
1. Launch the agent CLI:
```bash
ES_HOME="$PWD" ./bin/performance-analyzer-agent-cli
OPENSEARCH_HOME="$PWD" ./bin/performance-analyzer-agent-cli
```
1. In a separate window, enable the Performance Analyzer plugin:
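A minimal sketch of such a request, assuming the Performance Analyzer cluster config endpoint at `_plugins/_performanceanalyzer/cluster/config` and a node listening on `localhost:9200`:
```bash
# Enable the Performance Analyzer plugin.
# If the security plugin is enabled, use https:// and add `-k -u admin:admin`.
curl -XPOST "localhost:9200/_plugins/_performanceanalyzer/cluster/config" \
  -H 'Content-Type: application/json' \
  -d '{"enabled": true}'
```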

View File

@ -88,8 +88,8 @@ GET opensearch_dashboards_sample_data_ecommerce/_search
}
```
The cardinality count is approximate.
If you had tens of thousands of products in your store, an accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn't scale well because it requires more memory and causes high latency.
Cardinality count is approximate.
If you have tens of thousands of products in your hypothetical store, an accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn't scale well; it requires huge amounts of memory and can cause high latencies.
You can control the trade-off between memory and accuracy with the `precision_threshold` setting. This setting defines the threshold below which counts are expected to be close to accurate. Above this value, counts might become a bit less accurate. The default value of `precision_threshold` is 3,000. The maximum supported value is 40,000.
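As an illustration, the aggregation shown at the start of this section could raise the threshold as in the following curl sketch. The index comes from the sample e-commerce data, while the field name, cluster address, and demo credentials are assumptions.
```bash
# Raise precision_threshold so cardinality counts stay close to exact
# for up to roughly 10,000 distinct values, at the cost of more memory.
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X GET "https://localhost:9200/opensearch_dashboards_sample_data_ecommerce/_search?size=0" \
  -d '{
    "aggs": {
      "unique_products": {
        "cardinality": {
          "field": "products.product_name.keyword",
          "precision_threshold": 10000
        }
      }
    }
  }'
```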

View File

@ -21,7 +21,7 @@ This page lists all full-text query types and common options. Given the sheer nu
## Match
Creates a [boolean query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
Creates a [boolean query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
The most basic form of the query provides only a field (`title`) and a term (`wind`):
@ -126,7 +126,7 @@ GET _search
## Match boolean prefix
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -164,7 +164,7 @@ GET _search
## Match phrase
Creates a [phrase query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
Creates a [phrase query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
```json
GET _search
@ -198,7 +198,7 @@ GET _search
## Match phrase prefix
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -410,7 +410,7 @@ Option | Valid values | Description
`allow_leading_wildcard` | Boolean | Whether `*` and `?` are allowed as the first character of a search term. The default is true.
`analyze_wildcard` | Boolean | Whether OpenSearch should attempt to analyze wildcard terms. Some analyzers do a poor job at this task, so the default is false.
`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, <language>, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (e.g. "an," "but," "this") from the query string.
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. The default is 1.0.
`cutoff_frequency` | Between `0.0` and `1.0` or a positive integer | This value lets you define high and low frequency terms based on number of occurrences in the index. Numbers between 0 and 1 are treated as a percentage. For example, 0.10 is 10%. This value means that if a word occurs within the search field in more than 10% of the documents on the shard, OpenSearch considers the word "high frequency" and deemphasizes it when calculating search score.<br /><br />Because this setting is *per shard*, testing its impact on search results can be challenging unless a cluster has many documents.
`enable_position_increments` | Boolean | When true, result queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. The default is true.
@ -420,7 +420,7 @@ Option | Valid values | Description
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). <br /><br />If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common terms](#common-terms) queries and `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common terms](#common-terms) queries.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
@ -428,7 +428,7 @@ Option | Valid values | Description
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
`quote_field_suffix` | String | This option lets you search different fields depending on whether terms are wrapped in quotes. For example, if `quote_field_suffix` is `".exact"` and you search for `"lightly"` (in quotes) in the `title` field, OpenSearch searches the `title.exact` field. This second field might use a different type (e.g. `keyword` rather than `text`) or a different analyzer. The default is null.
`rewrite` | `constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N` | Determines how OpenSearch rewrites and scores multi-term queries. The default is `constant_score`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`tie_breaker` | `0.0` (default) to `1.0` | Changes the way OpenSearch scores searches. For example, a `type` of `best_fields` typically uses the highest score from any one field. If you specify a `tie_breaker` value between 0.0 and 1.0, the score changes to highest score + `tie_breaker` * score for all other matching fields. If you specify a value of 1.0, OpenSearch adds together the scores for all matching fields (effectively defeating the purpose of `best_fields`).
`time_zone` | UTC offset | The time zone to use (e.g. `-08:00`) if the query string contains a date range (e.g. `"query": "wind rises release_date[2012-01-01 TO 2014-01-01]"`). The default is `UTC`.
`type` | `best_fields, most_fields, cross-fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`.

View File

@ -5,8 +5,11 @@ nav_order: 27
has_children: true
redirect_from:
- /opensearch/query-dsl/
- /docs/opensearch/query-dsl/
---
{%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Query DSL
While you can use HTTP request parameters to perform simple searches, you can also use the OpenSearch query domain-specific language (DSL), which provides a wider range of search options. The query DSL uses the HTTP request body, so you can more easily customize your queries to get the exact results that you want.
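As a quick illustration, a sketch of a request-body search sent with curl. The `shakespeare` index and `play_name` field match examples used elsewhere in these docs, while the cluster address and demo credentials are assumptions.
```bash
# Send the search query in the HTTP request body instead of URL parameters.
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X GET "https://localhost:9200/shakespeare/_search" \
  -d '{"query": {"match": {"play_name": "Hamlet"}}}'
```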

View File

@ -430,7 +430,7 @@ Wildcard queries tend to be slow because they need to iterate over a lot of term
## Regex
Use the `regex` query to search for terms that match a regular expression.
Use the `regexp` query to search for terms that match a regular expression.
This regular expression matches any single uppercase or lowercase letter:
@ -439,12 +439,14 @@ GET shakespeare/_search
{
"query": {
"regexp": {
"play_name": "H[a-zA-Z]+mlet"
"play_name": "[a-zA-Z]amlet"
}
}
}
```
Regular expressions are applied to the terms in the field and not the entire value of the field.
A few important notes:
The efficiency of your regular expression depends a lot on the patterns you write. Make sure that you write `regex` queries with either a prefix or suffix to improve performance.
- Regular expressions are applied to the terms in the field (i.e. tokens), not the entire field.
- Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see [the Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/index.html).
- `regexp` queries can be expensive operations and require the `search.allow_expensive_queries` setting to be set to `true`. Before making frequent `regexp` queries, test their impact on cluster performance and examine alternative queries for achieving similar results.
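Because that last point depends on a dynamic cluster setting, here is a sketch of enabling it; the cluster address and demo credentials are assumptions.
```bash
# Allow expensive query types such as regexp queries; set to false to block them.
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X PUT "https://localhost:9200/_cluster/settings" \
  -d '{"transient": {"search.allow_expensive_queries": true}}'
```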

View File

@ -156,28 +156,6 @@ POST _reindex
}
```
## Reindex sorted documents
You can copy certain documents after sorting specific fields in the document.
This command copies the last 10 documents based on the `timestamp` field:
```json
POST _reindex
{
"size":10,
"source":{
"index":"source",
"sort":{
"timestamp":"desc"
}
},
"dest":{
"index":"destination"
}
}
```
## Transform documents during reindexing
You can transform your data during the reindexing process using the `script` option.
@ -272,7 +250,6 @@ Option | Valid values | Description | Required
`query` | Object | The search query to use for the reindex operation. | No
`size` | Integer | The number of documents to reindex. | No
`slice` | String | Specify manual or automatic slicing to parallelize reindexing. | No
`sort` | List | Sort specific fields in the document before reindexing. | No
## Destination index options

View File

@ -0,0 +1,103 @@
---
layout: default
title: Count
parent: REST API reference
nav_order: 150
---
# Count
Introduced 1.0
{: .label .label-purple }
The count API gives you quick access to the number of documents that match a query.
You can also use it to check the document count of an index, data stream, or cluster.
## Example
To see the number of documents that match a query:
```json
GET opensearch_dashboards_sample_data_logs/_count
{
"query": {
"term": {
"response": "200"
}
}
}
```
The following call to the search API produces equivalent results:
```json
GET opensearch_dashboards_sample_data_logs/_search
{
"query": {
"term": {
"response": "200"
}
},
"size": 0,
"track_total_hits": true
}
```
To see the number of documents in an index:
```json
GET opensearch_dashboards_sample_data_logs/_count
```
To check for the number of documents in a [data stream]({{site.url}}{{site.baseurl}}/opensearch/data-streams/), replace the index name with the data stream name.
To see the number of documents in your cluster:
```json
GET _count
```
Alternatively, you could use the [cat indices]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-count/) APIs to see the number of documents per index or data stream.
{: .note }
## Path and HTTP methods
```
GET <target>/_count
POST <target>/_count
```
## URL parameters
All count parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
`allow_no_indices` | Boolean | If false, the request returns an error if any wildcard expression or index alias targets any closed or missing indices. Default is false.
`analyzer` | String | The analyzer to use in the query string.
`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
`default_operator` | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
`df` | String | The default field in case a field prefix is not provided in the query string.
`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`.
`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indices in the response. Default is false.
`lenient` | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
`min_score` | Float | Include only documents with a minimum `_score` value in the result.
`routing` | String | Value used to route the operation to a specific shard.
`preference` | String | Specifies which shard or node OpenSearch should perform the count operation on.
`terminate_after` | Integer | The maximum number of documents OpenSearch should process before terminating the request.
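For example, a sketch that combines the body query from the earlier example with two of the URL parameters above; the cluster address and demo credentials are assumptions.
```bash
# Count matching documents, skipping unavailable indices and requiring a minimum score.
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X GET "https://localhost:9200/opensearch_dashboards_sample_data_logs/_count?ignore_unavailable=true&min_score=0.1" \
  -d '{"query": {"term": {"response": "200"}}}'
```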
## Response
```json
{
"count" : 14074,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
```

View File

@ -32,7 +32,7 @@ POST _bulk
```
POST _bulk
POST {index}/_bulk
POST <index>/_bulk
```
Specifying the index in the path means you don't need to include it in the [request body]({{site.url}}{{site.baseurl}}/opensearch/rest-api/document-apis/bulk/#request-body).
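A minimal sketch of a bulk request that relies on the index in the path; the `movies` index and the document itself are assumptions.
```bash
# Index one document through the bulk API. The action line omits the index name
# because it is already given in the path. Note the newline-delimited body
# and the trailing newline that the bulk API requires.
curl -k -u admin:admin -H 'Content-Type: application/x-ndjson' \
  -X POST "https://localhost:9200/movies/_bulk" \
  --data-binary $'{ "index": { "_id": "1" } }\n{ "title": "The Wind Rises" }\n'
```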

View File

@ -28,7 +28,7 @@ POST sample-index1/_delete_by_query
## Path and HTTP methods
```
POST <target>/_delete_by_query
POST <index>/_delete_by_query
```
## URL parameters
@ -37,8 +37,8 @@ All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :--- | :---
&lt;index&gt; | String | Name of the data streams, indices, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indices.
allow_no_indices | Boolean | False indicates to OpenSearch the request should return an error if any wildcard expression or index alias targets only missing or closed indices. Default is true.
&lt;index&gt; | String | Name or list of the data streams, indices, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indices.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`.
analyzer | String | The analyzer to use in the query string.
analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
conflicts | String | Indicates to OpenSearch what should happen if the delete by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.
@ -47,19 +47,18 @@ df | String | The default field in case a field prefix is not provided in the qu
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`.
from | Integer | The starting index to search from. Default is 0.
ignore_unavailable | Boolean | Specifies whether to include missing or closed indices in the response. Default is false.
lenient | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, querying a text field for an integer). Default is false.
max_docs | Integer | Maximum amount of documents the operation should process. Default is all documents.
preference | String | Specifies the shard or node OpenSearch should perform the operation on.
q | String | Query in the Lucene query string syntax.
request_cache | Boolean | Specifies whether OpenSearch should use the request cache for the request. Default is whether it's enabled in the index's settings.
refresh | Boolean | Specifies whether OpenSearch should refresh all of the shards involved in the delete request once the operation finishes. Default is false.
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
max_docs | Integer | How many documents the delete by query operation should process at most. Default is all documents.
preference | String | Specifies which shard or node OpenSearch should perform the delete by query operation on.
q | String | Lucene query string's query.
request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether it's enabled in the index's settings.
refresh | Boolean | If true, OpenSearch refreshes shards to make the delete by query operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
requests_per_second | Integer | Specifies the request's throttling in sub-requests per second. Default is -1, which means no throttling.
routing | String | Value used to route the operation to a specific shard.
scroll | Time | Amount of time the search context should be open.
scroll_size | Integer | Size of the scroll request of the operation. Default is 1000.
scroll_size | Integer | Size of the operation's scroll requests. Default is 1000.
search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It's usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It's usually slower but more accurate. Default is `query_then_fetch`.
search_timeout | Time | Amount of time until timeout for the search request. Default is no timeout.
slices | Integer | Number of sub-tasks OpenSearch should divide this task into. Default is 1, which means OpenSearch should not divide this task.
search_timeout | Time | How long to wait until OpenSearch deems the request timed out. Default is no timeout.
sort | String | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
_source | String | Specifies whether to include the `_source` field in the response.
_source_excludes | String | A comma-separated list of source fields to exclude from the response.

View File

@ -21,20 +21,20 @@ DELETE /sample-index1/_doc/1
## Path and HTTP methods
```
DELETE /<index-name>/_doc/<_id>
DELETE /<index>/_doc/<_id>
```
## URL parameters
Parameter | Type | Description | Required
:--- | :--- | :--- | :---
&lt;index-name&gt; | String | The index to delete from. | Yes
&lt;index&gt; | String | The index to delete from. | Yes
&lt;_id&gt; | String | The ID of the document to delete. | Yes
if_seq_no | Integer | Only perform the delete operation if the document's sequence number matches the specified number. | No
if_primary_term | Integer | Only perform the delete operation if the document has the specified primary term. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is false. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the delete operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. | No
routing | String | Value used to route the operation to a specific shard. | No
timeout | Time | How long to wait for a response from the cluster. | No
timeout | Time | How long to wait for a response from the cluster. Default is `1m`. | No
version | Integer | The version of the document to delete, which must match the last updated version of the document. | No
version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to delete version 3 of a document, use `/_doc/1?version=3&version_type=external`. | No
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the delete request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No
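For instance, a sketch of deleting the document from the earlier example while waiting for a refresh; the cluster address and demo credentials are assumptions.
```bash
# Delete document 1 and wait for a refresh so the deletion is visible to searches.
curl -k -u admin:admin -X DELETE \
  "https://localhost:9200/sample-index1/_doc/1?refresh=wait_for"
```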

View File

@ -37,9 +37,9 @@ Parameter | Type | Description
:--- | :--- | :---
preference | String | Specifies a preference of which shard to retrieve results from. Available options are `_local`, which tells the operation to retrieve results from a locally allocated shard replica, and a custom string value assigned to a specific shard replica. By default, OpenSearch executes get document operations on random shards.
realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is true.
refresh | Boolean | If true, OpenSearch refreshes shards to make the operation visible to searching. Default is false.
refresh | Boolean | If true, OpenSearch refreshes shards to make the get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
routing | String | A value used to route the operation to a specific shard.
stored_fields | Boolean | If true, the operation retrieves document fields stored in the index rather than the document's `_source`. Default is false.
stored_fields | Boolean | Whether the get operation should retrieve fields stored in the index. Default is false.
_source | String | Whether to include the `_source` field in the response body. Default is true.
_source_excludes | String | A comma-separated list of source fields to exclude in the query response.
_source_includes | String | A comma-separated list of source fields to include in the query response.
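For example, a sketch of retrieving only part of a document's source; the index name, document ID, field names, and credentials are assumptions.
```bash
# Fetch document 1 but return only the selected source fields.
curl -k -u admin:admin \
  "https://localhost:9200/sample-index1/_doc/1?_source_includes=title,genre"
```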

View File

@ -39,11 +39,11 @@ Parameter | Type | Description | Required
:--- | :--- | :--- | :---
&lt;index&gt; | String | Name of the index. | Yes
&lt;_id&gt; | String | A unique identifier to attach to the document. To automatically generate an ID, use `POST <target>/doc` in your request instead of PUT. | No
if_seq_no | Integer | Only perform the operation if the document has the specified sequence number. | No
if_primary_term | Integer | Only perform the operation if the document has the specified primary term. | No
if_seq_no | Integer | Only perform the index operation if the document has the specified sequence number. | No
if_primary_term | Integer | Only perform the index operation if the document has the specified primary term. | No
op_type | Enum | Specifies the type of operation to complete with the document. Valid values are `create` (create the index if it doesn't exist) and `index`. If a document ID is included in the request, then the default is `index`. Otherwise, the default is `create`. | No
pipeline | String | ID used to route the indexing operation to a certain pipeline. | No
routing | String | Value used to assign operations to specific shards. | No
pipeline | String | Route the index operation to a certain pipeline. | No
routing | String | Value used to assign the index operation to a specific shard. | No
timeout | Time | How long to wait for a response from the cluster. Default is `1m`. | No
version | Integer | The document's version number. | No
version_type | Enum | Assigns a specific type to the document. Valid options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to index version 3 of a document, use `/_doc/1?version=3&version_type=external`. | No
@ -86,7 +86,7 @@ _index | The name of the index.
_type | The document's type. OpenSearch supports only one type, which is `_doc`.
_id | The document's ID.
_version | The document's version.
_result | The result of the index operation.
result | The result of the index operation.
_shards | Detailed information about the cluster's shards.
total | The total number of shards.
successful | The number of shards OpenSearch successfully added the document to.

View File

@ -16,7 +16,6 @@ The multi-get operation allows you to execute multiple GET operations in one req
```json
GET _mget
{
"docs": [
{
@ -69,11 +68,11 @@ All multi-get URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :--- | :---
&lt;index&gt; | String | Name of the index to retrieve documents from.
preference | String | The node or shard that OpenSearch should perform the operation on. Default is random.
preference | String | Specifies the nodes or shards OpenSearch should execute the multi-get operation on. Default is random.
realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is `true`.
refresh | Boolean | If true, OpenSearch refreshes shards to make the operation visible to searching. Default is `false`.
routing | String | A value used to route the operation to a specific shard.
stored_fields | Boolean | If true, the operation retrieves document fields stored in the index rather than the document's `_source`. Default is `false`.
refresh | Boolean | If true, OpenSearch refreshes shards to make the multi-get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
routing | String | Value used to route the multi-get operation to a specific shard.
stored_fields | Boolean | Specifies whether OpenSearch should retrieve document fields from the index instead of the document's `_source`. Default is `false`.
_source | String | Whether to include the `_source` field in the query response. Default is `true`.
_source_excludes | String | A comma-separated list of source fields to exclude in the query response.
_source_includes | String | A comma-separated list of source fields to include in the query response.

View File

@ -0,0 +1,121 @@
---
layout: default
title: Reindex
parent: Document APIs
grand_parent: REST API reference
nav_order: 60
---
# Reindex document
Introduced 1.0
{: .label .label-purple}
The reindex API operation lets you copy all or a subset of your data from a source index into a destination index.
## Example
```json
POST /_reindex
{
"source":{
"index":"my-source-index"
},
"dest":{
"index":"my-destination-index"
}
}
```
## Path and HTTP methods
```
POST /_reindex
```
## URL parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
refresh | Boolean | If true, OpenSearch refreshes shards to make the reindex operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
timeout | Time | How long to wait for a response from the cluster. Default is `30s`.
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the reindex request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed.
wait_for_completion | Boolean | Waits for the matching tasks to complete. Default is `false`.
requests_per_second | Integer | Specifies the request's throttling in sub-requests per second. Default is -1, which means no throttling.
require_alias | Boolean | Whether the destination index must be an index alias. Default is false.
scroll | Time | How long to keep the search context open. Default is `5m`.
slices | Integer | Number of sub-tasks OpenSearch should divide this task into. Default is 1, which means OpenSearch should not divide this task. Setting this parameter to `auto` indicates to OpenSearch that it should automatically decide how many slices to split the task into.
max_docs | Integer | How many documents the reindex operation should process at most. Default is all documents.
## Request body
Your request body must contain the names of the source index and destination index. All other fields are optional.
Field | Description
:--- | :---
conflicts | Indicates to OpenSearch what should happen if the reindex operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is abort.
source | Information about the source index to include. Valid fields are `index`, `max_docs`, `query`, `remote`, `size`, `slice`, and `_source`.
index | The name of the source index to copy data from.
max_docs | The maximum number of documents to reindex.
query | The search query to use for the reindex operation.
remote | Information about a remote OpenSearch cluster to copy data from. Valid fields are `host`, `username`, `password`, `socket_timeout`, and `connect_timeout`.
host | Host URL of the OpenSearch cluster to copy data from.
username | Username to authenticate with the remote cluster.
password | Password to authenticate with the remote cluster.
socket_timeout | The wait time for socket reads. Default is 30s.
connect_timeout | The wait time for remote connection timeouts. Default is 30s.
size | The number of documents to reindex.
slice | Whether to manually or automatically slice the reindex operation so it executes in parallel.
_source | Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is true.
id | The ID to associate with manual slicing.
max | Maximum number of slices.
dest | Information about the destination index. Valid values are `index`, `version_type`, and `op_type`.
index | Name of the destination index.
version_type | The indexing operation's version type. Valid values are `internal`, `external`, `external_gt` (retrieve the document if the specified version number is greater than the document's current version), and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version).
op_type | Whether to copy over documents that are missing in the destination index. Valid values are `create` (ignore documents with the same ID from the source index) and `index` (copy everything from the source index).
script | A script that OpenSearch uses to apply transformations to the data during the reindex operation.
source | The actual script that OpenSearch runs.
lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`.
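Putting several of these fields together, the following sketch reindexes from a remote cluster. The hosts, credentials, and index names are assumptions, and the remote address must also be added to the local cluster's reindex allow list.
```bash
# Copy documents from an index on a remote cluster into a local index,
# keeping only documents that are missing from the destination (op_type: create).
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X POST "https://localhost:9200/_reindex" \
  -d '{
    "source": {
      "index": "my-source-index",
      "remote": {
        "host": "https://remote-host:9200",
        "username": "admin",
        "password": "admin",
        "socket_timeout": "60s"
      },
      "query": { "match_all": {} }
    },
    "dest": {
      "index": "my-destination-index",
      "op_type": "create"
    }
  }'
```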
## Response
```json
{
"took": 28829,
"timed_out": false,
"total": 111396,
"updated": 0,
"created": 111396,
"deleted": 0,
"batches": 112,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1.0,
"throttled_until_millis": 0,
"failures": []
}
```
## Response body fields
Field | Description
:--- | :---
took | How long the operation took in milliseconds.
timed_out | Whether the operation timed out.
total | The total number of documents processed.
updated | The number of documents updated in the destination index.
created | The number of documents created in the destination index.
deleted | The number of documents deleted.
batches | Number of scroll responses.
version_conflicts | Number of version conflicts.
noops | How many documents OpenSearch ignored during the operation.
retries | Number of bulk and search retry requests.
throttled_millis | Number of throttled milliseconds during the request.
requests_per_second | Number of requests executed per second during the operation.
throttled_until_millis | The amount of time until OpenSearch executes the next throttled request.
failures | Any failures that occurred during the operation.

View File

@ -44,31 +44,31 @@ All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :--- | :---
&lt;target-index&gt; | String | Comma-separated list of indices to update. To update all indices, use * or omit this parameter.
allow_no_indices | String | If false, the request returns an error if wildcard expressions match closed or missing indices. Default is true.
&lt;index&gt; | String | Comma-separated list of indices to update. To update all indices, use * or omit this parameter.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`.
analyzer | String | Analyzer to use in the query string.
analyze_wildcard | Boolean | Whether the operation should include wildcard and prefix queries in the analysis. Default is false.
conflicts | String | Specifies whether the operation should continue if the request runs into version conflicts. Valid options are `abort` and `proceed`. Default is `abort`.
default_operator | String | The default operator the string query should use. Valid options are `AND` and `OR`. Default is `OR`.
df | String | The default field when the query string does not have a field prefix.
analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is false.
conflicts | String | Indicates to OpenSearch what should happen if the update by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.
default_operator | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`.
df | String | The default field if a field prefix is not provided in the query string.
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`.
from | Integer | The starting index to search from. Default is 0.
ignore_unavailable | Boolean | Whether to exclude missing or closed indices in the response. Default is false.
lenient | Boolean | Whether OpenSearch should ignore format-based query failures (for example, querying an integer field for a string). Default is false.
max_docs | Integer | Maximum number of documents the request should process. Default is all documents.
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
max_docs | Integer | How many documents the update by query operation should process at most. Default is all documents.
pipeline | String | ID of the pipeline to use to process documents.
preference | String | The node or shard OpenSearch should perform the operation on.
q | String | Query in the Lucene query string syntax.
request_cache | Boolean | Whether OpenSearch should use the request cache for the operation. Default is whether it's enabled in the index's settings.
refresh | Boolean | Specifies whether OpenSearch should refresh shards involved in the operation to make the operation visible to searching.
preference | String | Specifies which shard or node OpenSearch should perform the update by query operation on.
q | String | Lucene query string's query.
request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether it's enabled in the index's settings.
refresh | Boolean | If true, OpenSearch refreshes shards to make the update by query operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
requests_per_second | Integer | Specifies the request's throttling in sub-requests per second. Default is -1, which means no throttling.
routing | String | Value used to route the operation to a specific shard.
routing | String | Value used to route the update by query operation to a specific shard.
scroll | Time | How long to keep the search context open.
scroll_size | Integer | Size of the operation's scroll request. Default is 1000.
search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It's usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It's usually slower but more accurate. Default is `query_then_fetch`.
search_timeout | Time | Amount of time until timeout for the search request. Default is no timeout.
search_timeout | Time | How long to wait until OpenSearch deems the request timed out. Default is no timeout.
slices | Integer | Number of sub-tasks OpenSearch should divide this task into. Default is 1, which means OpenSearch should not divide this task.
sort | String | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
sort | List | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
_source | String | Whether to include the `_source` field in the response.
_source_excludes | String | A comma-separated list of source fields to exclude from the response.
_source_includes | String | A comma-separated list of source fields to include in the response.
@ -80,7 +80,7 @@ wait_for_active_shards | String | The number of shards that must be active befor
## Request body
To update your indices and documents by query, you must include a [query]({{site.baseurl}}{{site.url}}/opensearch/query-dsl/index) and a script in the request body that OpenSearch can run to update your documents. If you don't specify a query, then every document in the index gets updated.
To update your indices and documents by query, you must include a [query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) and a script in the request body that OpenSearch can run to update your documents. If you don't specify a query, then every document in the index gets updated.
```json
{

View File

@ -30,11 +30,7 @@ POST /sample-index1/_update/1
POST /test-index1/_update/1
{
"script" : {
"source": "ctx._source.oldValue += params.newValue",
"lang": "painless",
"params" : {
"newValue" : 10
}
"source": "ctx._source.secret_identity = \"Batman\""
}
}
```
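The same endpoint also accepts a partial document instead of a script. A sketch with curl, where the cluster address, credentials, and field value are assumptions:
```bash
# Update document 1 by merging a partial document into its existing source.
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X POST "https://localhost:9200/test-index1/_update/1" \
  -d '{"doc": {"secret_identity": "Bruce Wayne"}}'
```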
@ -42,22 +38,22 @@ POST /test-index1/_update/1
## Path and HTTP methods
```
POST /<index-name>/_update/<_id>
POST /<index>/_update/<_id>
```
## URL parameters
Parameter | Type | Description | Required
:--- | :--- | :--- | :---
&lt;index-name&gt; | String | Name of the index. | Yes
&lt;index&gt; | String | Name of the index. | Yes
&lt;_id&gt; | String | The ID of the document to update. | Yes
if_seq_no | Integer | Only perform the update operation if the document's sequence number matches the specified number. | No
if_primary_term | Integer | Only perform the delete operation if the document has the specified primary term. | No
if_primary_term | Integer | Perform the update operation if the document has the specified primary term. | No
lang | String | Language of the script. Default is `painless`. | No
require_alias | Boolean | Specifies whether the destination must be an index alias. Default is false. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is false. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. | No
retry_on_conflict | Integer | The amount of times OpenSearch should retry the operation if there's a document conflict. Default is 0. | No
routing | String | Value used to route the operation to a specific shard. | No
routing | String | Value to route the update operation to a specific shard. | No
_source | List | Whether to include the `_source` field in the response body. Default is true. | No
_source_excludes | List | A comma-separated list of source fields to exclude in the query response. | No
_source_includes | List | A comma-separated list of source fields to include in the query response. | No

View File

@ -11,7 +11,7 @@ Introduced 1.0
Wondering why a specific document ranks higher (or lower) for a query? You can use the explain API for an explanation of how the relevance score (`_score`) is calculated for every result.
OpenSearch uses a probabilistic ranking framework called [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) to calculate relevance scores. Okapi BM25 is based on the original [TF/IDF](http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/package-summary.html#scoring) framework used by Apache Lucene.
OpenSearch uses a probabilistic ranking framework called [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) to calculate relevance scores. Okapi BM25 is based on the original [TF/IDF](http://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/package-summary.html#scoring) framework used by Apache Lucene.
The explain API is an expensive operation in terms of both resources and time. On production clusters, we recommend using it sparingly for the purpose of troubleshooting.
{: .warning }

View File

@ -1,8 +1,9 @@
---
layout: default
title: Create index
parent: REST API reference
nav_order: 3
parent: Index APIs
grand_parent: REST API reference
nav_order: 1
---
# Create index
@ -83,9 +84,8 @@ index.number_of_routing_shards | The number of routing shards used to split an i
index.shard.check_on_startup | Whether the index's shards should be checked for corruption. Available options are `false` (do not check for corruption), `checksum` (check for physical corruption), and `true` (check for both physical and logical corruption). Default is `false`.
index.codec | The compression type to use to compress stored data. Available values are `best_compression` and `default`.
index.routing_partition_size | The number of shards a custom routing value can go to. Routing helps an imbalanced cluster by relocating values to a subset of shards rather than just a single shard. To enable, set this value to greater than 1 but less than `index.number_of_shards`. Default is 1.
index.soft_deletes_enabled | Whether to retain a history of operations performed on the index.
index.soft_deletes.retention_lease.period | The maximum amount of time to retain a shard's history of operations. Default is `12h`.
index.load_fixed_bitset_filters_eagerly | Indicates whether cached filters should be pre-loaded for nested queries. Available options are `true` and `false`. Default is `true`.
index.load_fixed_bitset_filters_eagerly | Whether OpenSearch should pre-load cached filters. Available options are `true` and `false`. Default is `true`.
index.hidden | Whether the index should be hidden. Hidden indices are not returned as part of queries that have wildcards. Available options are `true` and `false`. Default is `false`.
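For illustration, a sketch of setting a couple of these static settings when an index is created; the index name, setting values, cluster address, and demo credentials are assumptions.
```bash
# Static settings can only be set at index creation time or on a closed index.
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X PUT "https://localhost:9200/sample-index1" \
  -d '{
    "settings": {
      "index.number_of_shards": 2,
      "index.codec": "best_compression"
    }
  }'
```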
#### Dynamic index Settings
@ -93,28 +93,29 @@ index.hidden | Whether the index should be hidden. Hidden indices are not return
Setting | Description
:--- | :---
index.number_of_replicas | The number of replica shards each primary shard should have. For example, if you have 4 primary shards and set `index.number_of_replicas` to 3, the index has 12 replica shards. Default is 1.
index.auto_expand_replicas | Automatically expands the number of replicas based on the number of data nodes in the cluster. Specify a lower bound and upper limit (for example, 0-9), or `all` for the upper limit. For example, if you have 5 data nodes and set `index.auto_expand_replicas` to 0-3, then the cluster does not automatically add another replica shard. However, if you set this value to `0-all` and add 2 more nodes for a total of 7, the cluster will expand to have 6 replica shards. Default is disabled.
index.auto_expand_replicas | Whether the cluster should automatically add replica shards based on the number of data nodes. Specify a lower bound and upper limit (for example, 0-9), or `all` for the upper limit. For example, if you have 5 data nodes and set `index.auto_expand_replicas` to 0-3, then the cluster does not automatically add another replica shard. However, if you set this value to `0-all` and add 2 more nodes for a total of 7, the cluster will expand to have 6 replica shards. Default is disabled.
index.search.idle.after | Amount of time a shard should wait for a search or get request until it goes idle. Default is `30s`.
index.refresh_interval | How often the index should refresh, which publishes its most recent changes and makes them available for searching. Can be set to `-1` to disable refreshing. Default is `1s`.
index.max_result_window | The maximum value of `from` + `size` for searches to the index. `from` is the starting index to search from, and `size` is the number of results to return. Default: 10000.
index.max_inner_result_window | The maximum value of `from` + `size` for inner hits definitions and top hits aggregations to the index. `from` is the starting index to search from, and `size` is the number of top hits to return. Default is 100.
index.max_inner_result_window | Maximum value of `from` + `size` that returns nested search hits and the most relevant documents aggregated during the query. `from` is the starting index to search from, and `size` is the number of top hits to return. Default is 100.
index.max_rescore_window | The maximum value of `window_size` for rescore requests to the index. Rescore requests reorder the index's documents and return a new score, which can be more precise. Default is the same as `index.max_inner_result_window` (10000 by default).
index.max_docvalue_fields_search | Maximum number of `docvalue_fields` allowed in a query. Default is 100.
index.max_script_fields | Maximum number of `script_fields` allowed in a query. Default is 32.
index.max_ngram_diff | Maximum allowed difference between `min_gram` and `max_gram` values for `NGramTokenizer` and `NGramTokenFilter`. Default is 1.
index.max_shingle_diff | Maximum allowed difference between `max_shingle_size` and `min_shingle_size` for the `shingle` token filter. Default is 3.
index.max_refresh_listeners | Maximum number of refresh listeners available on the index's shards.
index.max_ngram_diff | Maximum difference between `min_gram` and `max_gram` values for `NGramTokenizer` and `NGramTokenFilter` fields. Default is 1.
index.max_shingle_diff | Maximum difference between `max_shingle_size` and `min_shingle_size` to feed into the `shingle` token filter. Default is 3.
index.max_refresh_listeners | Maximum number of refresh listeners each shard is allowed to have.
index.analyze.max_token_count | Maximum number of tokens that the `_analyze` API operation can return. Default is 10000.
index.highlight.max_analyzed_offset | The maximum number of characters that will be analyzed in a highlight request. Default is 1000000.
index.max_terms_count | The maximum number of terms that can be used in a terms query. Default is 65536.
index.highlight.max_analyzed_offset | The number of characters a highlight request can analyze. Default is 1000000.
index.max_terms_count | The maximum number of terms a terms query can accept. Default is 65536.
index.max_regex_length | The maximum character length of a regular expression that can be used in a regexp query. Default is 1000.
index.query.default_field | A field or list of fields that OpenSearch uses in queries in case a field isn't specified in the parameters.
index.routing.allocation.enable | Specifies options for the index's shard allocation. Available options are `all` (allow shard allocation for all shards), `primaries` (allow shard allocation only for primary shards), `new_primaries` (allow shard allocation only for new primary shards), and `none` (do not allow shard allocation). Default is `all`.
index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow shard rebalancing for all shards), `primaries` (allow shard rebalancing only for primary shards), `replicas` (allow shard rebalancing only for replicas), and `none` (do not allow shard rebalancing). Default is `all`.
index.routing.allocation.enable | Specifies options for the index's shard allocation. Available options are `all` (allow allocation for all shards), `primaries` (allow allocation only for primary shards), `new_primaries` (allow allocation only for new primary shards), and `none` (do not allow allocation). Default is `all`.
index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow rebalancing for all shards), `primaries` (allow rebalancing only for primary shards), `replicas` (allow rebalancing only for replicas), and `none` (do not allow rebalancing). Default is `all`.
index.gc_deletes | Amount of time to retain a deleted document's version number. Default is `60s`.
index.default_pipeline | The default ingest node pipeline for the index. If the default pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
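Because these settings are dynamic, you can change them on an existing index with the `_settings` API. The following sketch assumes an index named `sample-index`:

```json
PUT /sample-index/_settings
{
  "index": {
    "number_of_replicas": 2,
    "refresh_interval": "30s"
  }
}
```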
### Mappings
Mappings define how a document and its fields are stored and indexed. If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings.
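For example, a create index request with explicit mappings might look like the following sketch, where the field names and types are placeholders:

```json
PUT /sample-index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "year": { "type": "integer" }
    }
  }
}
```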

View File

@ -0,0 +1,45 @@
---
layout: default
title: Delete index
parent: Index APIs
grand_parent: REST API reference
nav_order: 5
---
# Delete index
Introduced 1.0
{: .label .label-purple }
If you no longer need an index, you can use the delete index API operation to delete it.
## Example
```json
DELETE /sample-index
```
## Path and HTTP methods
```
DELETE /<index-name>
```
## URL parameters
All parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is true.
expand_wildcards | String | Expands wildcard expressions to different indices. Combine multiple values with commas. Available values are all (match all indices), open (match open indices), closed (match closed indices), hidden (match hidden indices), and none (do not accept wildcard expressions), which must be used with open, closed, or both. Default is open.
ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indices in the response.
master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`.
timeout | Time | How long to wait for the response to return. Default is `30s`.
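For example, the following sketch deletes all indices that match `sample-index*`, expanding both open and closed indices and ignoring any that are unavailable (the index pattern is a placeholder):

```json
DELETE /sample-index*?expand_wildcards=open,closed&ignore_unavailable=true
```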
## Response
```json
{
"acknowledged": true
}
```

View File

@ -0,0 +1,76 @@
---
layout: default
title: Get index
parent: Index APIs
grand_parent: REST API reference
nav_order: 10
---
# Get index
Introduced 1.0
{: .label .label-purple }
You can use the get index API operation to return information about an index.
## Example
```json
GET /sample-index
```
## Path and HTTP methods
```
GET /<index-name>
```
## URL parameters
All parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is true.
expand_wildcards | String | Expands wildcard expressions to different indices. Combine multiple values with commas. Available values are all (match all indices), open (match open indices), closed (match closed indices), hidden (match hidden indices), and none (do not accept wildcard expressions), which must be used with open, closed, or both. Default is open.
flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of "index": { "creation_date": "123456789" } is "index.creation_date": "123456789".
include_defaults | Boolean | Whether to include default settings as part of the response. This parameter is useful for identifying the names and current values of settings you want to update.
ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indices in the response.
local | Boolean | Whether to return information from only the local node instead of from the master node. Default is false.
master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`.
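For example, the following sketch returns `sample-index1` with its settings in flat form and includes the default settings in the response:

```json
GET /sample-index1?flat_settings=true&include_defaults=true
```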
## Response
```json
{
"sample-index1": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"creation_date": "1633044652108",
"number_of_shards": "2",
"number_of_replicas": "1",
"uuid": "XcXA0aZ5S0aiqx3i1Ce95w",
"version": {
"created": "135217827"
},
"provided_name": "sample-index1"
}
}
}
}
```
## Response body fields
Field | Description
:--- | :---
aliases | Any aliases associated with the index.
mappings | Any mappings in the index.
settings | The index's settings.
creation_date | The Unix epoch time when the index was created.
number_of_shards | How many shards the index has.
number_of_replicas | How many replicas the index has.
uuid | The index's UUID.
created | The version of OpenSearch in which the index was created.
provided_name | The name of the index.

View File

@ -0,0 +1,16 @@
---
layout: default
title: Index APIs
parent: REST API reference
has_children: true
nav_order: 3
redirect_from:
- /opensearch/rest-api/index-apis/
---
# Index APIs
The index API operations let you interact with the indices in your cluster. Using these operations, you can create, delete, and close indices, as well as complete other index-related tasks.
If you use the security plugin, make sure you have the appropriate permissions.
{: .note }

View File

@ -1,63 +1,94 @@
---
layout: default
title: Multi search
title: Multi-search
parent: REST API reference
nav_order: 130
---
# Multi search
# Multi-search
Introduced 1.0
{: .label .label-purple }
The multi-search operation lets you bundle multiple search requests and send them to your OpenSearch cluster in a single request. This operation executes searches in parallel, so you get back the response more quickly as compared to independent search requests. It also executes each request independently, so the failure of one request doesn't affect the others.
As the name suggests, the multi-search operation lets you bundle multiple search requests into a single request. OpenSearch then executes the searches in parallel, so you get back the response more quickly compared to sending one request per search. OpenSearch executes each search independently, so the failure of one doesn't affect the others.
The multi-search request body follows this pattern:
```
header\n
body\n
header\n
body\n
```
OpenSearch uses newline characters to parse multi-search requests and requires that each request ends with a newline character.
## Example
```json
GET _msearch
{"index":"opensearch_dashboards_sample_data_logs"}
{"query":{"match_all":{}},"from":0,"size":10}
{"index":"opensearch_dashboards_sample_data_ecommerce","search_type":"dfs_query_then_fetch"}
{"query":{"match_all":{}}}
{ "index": "opensearch_dashboards_sample_data_logs"}
{ "query": { "match_all": {} }, "from": 0, "size": 10}
{ "index": "opensearch_dashboards_sample_data_ecommerce", "search_type": "dfs_query_then_fetch"}
{ "query": { "match_all": {} } }
```
## Path and HTTP methods
```
GET <target>/_msearch
GET _msearch
GET <indices>/_msearch
POST _msearch
POST <indices>/_msearch
```
## URL parameters
All multi-search URL parameters are optional.
## Request body
Parameter | Type | Description
The multi-search request body follows this pattern:
```
Metadata\n
Query\n
Metadata\n
Query\n
```
- Metadata lines include options, such as which indices to search and the type of search.
- Query lines use the [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/).
Just like the [bulk]({{site.url}}{{site.baseurl}}/opensearch/rest-api/document-apis/bulk/) operation, the JSON doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse multi-search requests and requires that the request body end with a newline character.
## URL parameters and metadata options
All multi-search URL parameters are optional. Some can also be applied per-search as part of each metadata line.
Parameter | Type | Description | Supported in metadata line
:--- | :--- | :--- | :---
`allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`.
`ccs_minimize_roundtrips` | Boolean | If true, network roundtrips between the local node and remote clusters are minimized for cross-cluster search requests. Default is `true`.
`expand_wildcards` | Enum | Expands wildcard expressions to concrete indices. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`.
`ignore_unavailable` | Boolean | If an index from the indices list doesn't exist, whether to ignore it rather than fail the query. Default is `false`.
`max_concurrent_searches` | Integer | Maximum number of searches executed in parallel. Default is `max(1, (number of data nodes * min(search thread pool size, 10)))`.
`max_concurrent_shard_requests` | Integer | Maximum number of concurrent shard requests that each sub-search request executes per node. Default is 5. If you have an environment where a very low number of concurrent search requests is expected, a higher value of this parameter might improve performance.
`pre_filter_shard_size` | Integer | Defines a threshold that enforces a round-trip to pre-filter search shards that cannot possibly match. This filter phase can limit the number of searched shards significantly. For instance, if a date range filter is applied, then all indices that don't contain documents within that date range are skipped. Default is 128.
`rest_total_hits_as_int` | String | Whether the `hits.total` property is returned as an integer or an object. Default is `false`.
`search_type` | String | Whether global term and document frequencies are used when calculating the relevance score. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It's usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It's usually slower but more accurate. Default is `query_then_fetch`.
`typed_keys` | Boolean | Whether aggregation names are prefixed by their internal types in the response. Default is `false`.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`. | Yes
ccs_minimize_roundtrips | Boolean | Whether OpenSearch should try to minimize the number of network round trips between the coordinating node and remote clusters (only applicable to cross-cluster search requests). Default is `true`. | No
expand_wildcards | Enum | Expands wildcard expressions to concrete indices. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. | Yes
ignore_unavailable | Boolean | If an index from the indices list doesn't exist, whether to ignore it rather than fail the query. Default is `false`. | Yes
max_concurrent_searches | Integer | The maximum number of concurrent searches. The default depends on your node count and search thread pool size. Higher values can improve performance, but risk overloading the cluster. | No
max_concurrent_shard_requests | Integer | Maximum number of concurrent shard requests that each search executes per node. Default is 5. Higher values can improve performance, but risk overloading the cluster. | No
pre_filter_shard_size | Integer | Default is 128. | No
rest_total_hits_as_int | String | Whether the `hits.total` property is returned as an integer (`true`) or an object (`false`). Default is `false`. | No
search_type | String | Affects relevance score. Valid options are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using term and document frequencies for the shard (faster, less accurate), whereas `dfs_query_then_fetch` uses term and document frequencies across all shards (slower, more accurate). Default is `query_then_fetch`. | Yes
typed_keys | Boolean | Whether to prefix aggregation names with their internal types in the response. Default is `false`. | No
{% comment %}Regarding `pre_filter_shard_size`: The description from the REST API specification is unintelligible---to me, anyway. I wasn't able to learn anything from reading the source code, either, so I've included the default value and nothing else in the table above. - aetter
From the REST API specification: A threshold that enforces a pre-filter round trip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if for instance a shard can not match any documents based on its rewrite method ie. if date filters are mandatory to match but the shard bounds and the query are disjoint.{% endcomment %}
## Metadata-only options
Some options can't be applied as URL parameters to the entire request. Instead, you can apply them per-search as part of each metadata line. All are optional.
Option | Type | Description
:--- | :--- | :---
index | String, string array | If you don't specify an index or multiple indices as part of the URL (or want to override the URL value for an individual search), you can include it here. Examples include `"logs-*"` and `["my-store", "sample_data_ecommerce"]`.
preference | String | The nodes or shards that you'd like to perform the search on. This setting can be useful for testing, but in most situations, the default behavior provides the best search latencies. Options include `_local`, `_only_local`, `_prefer_nodes`, `_only_nodes`, and `_shards`. These last three options accept a list of nodes or shards. Examples include `"_only_nodes:data-node1,data-node2"` and `"_shards:0,1"`.
request_cache | Boolean | Whether to cache results, which can improve latency for repeat searches. Default is to use the `index.requests.cache.enable` setting for the index (which defaults to `true` for new indices).
routing | String | Comma-separated custom routing values (for example, `"routing": "value1,value2,value3"`).
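As a sketch, a multi-search request that sets some of these options in the metadata line might look like the following, where the index names and routing values are placeholders:

```json
GET _msearch
{ "index": ["my-store", "sample_data_ecommerce"], "preference": "_local", "routing": "value1,value2" }
{ "query": { "match_all": {} }, "size": 5 }
```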
## Response
You get back the responses in an array form, where the search response for each search request matches its order in the original multi-search request.
OpenSearch returns an array with the results of each search in the same order as the multi-search request.
```json
{

View File

@ -0,0 +1,40 @@
---
layout: default
title: Remote cluster information
parent: REST API reference
nav_order: 25
---
# Remote cluster information
Introduced 1.0
{: .label .label-purple }
This operation provides connection information for any remote OpenSearch clusters that you've configured for the local cluster, such as the remote cluster alias, connection mode (`sniff` or `proxy`), IP addresses for seed nodes, and timeout settings.
The response is more comprehensive and useful than a call to `_cluster/settings`, which only includes the cluster alias and seed nodes.
## Path and HTTP methods
```
GET _remote/info
```
## Response
```json
{
"opensearch-cluster2": {
"connected": true,
"mode": "sniff",
"seeds": [
"172.28.0.2:9300"
],
"num_nodes_connected": 1,
"max_connections_per_cluster": 3,
"initial_connect_timeout": "30s",
"skip_unavailable": false
}
}
```

View File

@ -98,7 +98,7 @@ Parameter | Type | Description
:--- | :--- | :---
scroll | Time | Specifies the amount of time the search context is maintained.
scroll_id | String | The scroll ID for the search.
rest_total_hits_as_int | Boolean | Whether the `hits.total` property is returned as an integer or an object. Default is false.
rest_total_hits_as_int | Boolean | Whether the `hits.total` property is returned as an integer (`true`) or an object (`false`). Default is `false`.
## Response

View File

@ -6,6 +6,8 @@ nav_order: 6
---
# Update mapping
Introduced 1.0
{: .label .label-purple }
If you want to update an index's mappings to add or update field types after index creation, you can do so with the update mapping API operation.
@ -47,12 +49,12 @@ All update mapping parameters are optional.
Parameter | Data Type | Description
:--- | :--- | :---
allow_no_indices | Boolean | If false, the request returns an error if any wildcard expression or index alias targets any closed or missing indices. Defaults to false.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`.
expand_wildcards | String | Expands wildcard expressions to different indices. Combine multiple values with commas. Available values are `all` (match all indices), `open` (match open indices), `closed` (match closed indices), `hidden` (match hidden indices), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`.
ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indices in the response.
master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`.
timeout | Time | How long to wait for the response to return. Default is `30s`.
write_index_only | Boolean | If true, the specified mappings are applied only to the write index.
write_index_only | Boolean | Whether OpenSearch should apply mapping updates only to the write index.
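For example, to apply a mapping update only to the write index behind an alias, you might send a request like the following sketch (`sample-alias` is a placeholder):

```json
PUT /sample-alias/_mapping?write_index_only=true
{
  "properties": {
    "year": { "type": "integer" }
  }
}
```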
## Request body

View File

@ -19,68 +19,53 @@ require "pathname"
module Jekyll::LinkChecker
##
# The collection that will get stored as the output
@urls = {}
##
# Pattern to identify documents that should be excluded based on their URL
@excluded_paths = /(\.(css|js|json|map|xml|txt|yml)$|\/version-selector\.tpl$)/i.freeze
@excluded_paths = /(\.(css|js|json|map|xml|txt|yml)$)/i.freeze
##
# Pattern to identify certain HTML tags whose content should be excluded from indexing
@href_matcher = /<a[^>]+href=(['"])(.+?)\1/im.freeze
##
# Pattern to check for external URLs
@external_matcher = /^https?:\/\//.freeze
##
# List of domains to ignore
@ignored_domains = %w[localhost]
##
# Pattern of local paths to ignore
@ignored_paths = /(^\/javadocs\/)/.freeze
##
# Pattern to exclude when adding the `index.html` suffix to paths
@need_no_suffix = /\.(?!html)[^\/]+$/.freeze
# Valid response codes for successful links
@success_codes = %w[200 302]
##
# Questionable response codes for successful links
@questionable_codes = %w[301 403 429]
##
# Holds the list of failures
@failures = []
##
# Driven by environment variables, it indicates a need to check external links
@check_external_links
##
# Driven by environment variables, it indicates the need to fail the build for dead links
@should_build_fatally
##
# Initializes the singleton by recording the site
# return [void]
def self.init(site)
@site = site
@urls = {}
@failures = []
end
##
# Processes a Document or Page and adds the links to a collection
# It also checks for anchors to parts of the same page/doc
# It also checks for anchors that link to parts of the same page/doc
# return [void]
def self.process(page)
return if @excluded_paths.match(page.path)
@ -98,9 +83,8 @@ module Jekyll::LinkChecker
end
end
##
# Saves the collection as a JSON file
# Verifies the validity of all the destinations gathered in @urls
# return [void]
def self.verify(site)
if ENV.key?('JEKYLL_CHECK_EXTERNAL_LINKS')
@check_external_links = true
@ -132,9 +116,9 @@ module Jekyll::LinkChecker
end
end
##
# Check if URL is accessible
# Check if an internal or external URL is accessible
# @param url [String] the url to check
# @return [Boolean]
def self.check(url)
match = @base_url_matcher.match(url)
unless match.nil?
@ -149,9 +133,9 @@ module Jekyll::LinkChecker
return self.check_internal(url)
end
##
# Check if an external URL is accessible by making a HEAD call
# @param url [String] the url to check
# @return [Boolean]
def self.check_external(url)
uri = URI(url)
return true if @ignored_domains.include? uri.host
@ -172,61 +156,62 @@ module Jekyll::LinkChecker
end
end
##
# Check if an internal link is accessible
# @param url [String] the url to check
# @return [Boolean]
def self.check_internal(url)
return true if @ignored_paths =~ url
path, hash = url.split('#')
unless path.end_with? 'index.html'
path << '/' unless path.end_with? '/'
path << 'index.html' unless path.end_with? 'index.html'
if @need_no_suffix =~ path
filename = File.join(@site.config["destination"], path)
return File.file?(filename)
else
unless path.end_with? 'index.html'
path << '/' unless path.end_with? '/'
path << 'index.html' unless path.end_with? 'index.html'
end
filename = File.join(@site.config["destination"], path)
return false unless File.file?(filename)
content = File.read(filename)
unless content.include? "<title>Redirecting"
return true if hash.nil? || hash.empty?
return !(content =~ /<[a-z0-9-]+[^>]+id="#{hash}"/i).nil?
end
match = content.match(@href_matcher)
if match.nil?
puts "LinkChecker: [Warning] Cannot check #{url} due to an unfollowable redirect"
return true
end
redirect = match[2]
redirect << '#' + hash unless hash.nil? || hash.empty?
return self.check(redirect)
end
filename = File.join(@site.config["destination"], path)
return false unless File.file?(filename)
content = File.read(filename)
unless content.include? "<title>Redirecting"
return true if hash.nil? || hash.empty?
return !(content =~ /<[a-z0-9-]+[^>]+id="#{hash}"/i).nil?
end
match = content.match(@href_matcher)
if match.nil?
puts "LinkChecker: [Warning] Cannot check #{url} due to an unfollowable redirect"
return true
end
redirect = match[2]
redirect << '#' + hash unless hash.nil? || hash.empty?
return self.check(redirect)
end
end
# Before any Document or Page is processed, initialize the LinkChecker
Jekyll::Hooks.register :site, :pre_render do |site|
Jekyll::LinkChecker.init(site)
end
# Process a Page as soon as its content is ready
Jekyll::Hooks.register :pages, :post_convert do |page|
Jekyll::LinkChecker.process(page)
end
# Process a Document as soon as its content is ready
Jekyll::Hooks.register :documents, :post_convert do |document|
Jekyll::LinkChecker.process(document)
end
# Verify gathered links after Jekyll is done writing all its stuff
Jekyll::Hooks.register :site, :post_write do |site|
Jekyll::LinkChecker.verify(site)
end

391
_replication-plugin/api.md Normal file
View File

@ -0,0 +1,391 @@
---
layout: default
title: API
nav_order: 50
---
# Cross-cluster replication API
Use these replication operations to programmatically manage cross-cluster replication.
#### Table of contents
- TOC
{:toc}
## Start replication
Introduced 1.1
{: .label .label-purple }
Initiate replication of an index from the leader cluster to the follower cluster. Send this request to the follower cluster.
#### Request
```json
PUT /_plugins/_replication/<follower-index>/_start
{
"leader_alias":"<connection-alias-name>",
"leader_index":"<index-name>",
"use_roles":{
"leader_cluster_role":"<role-name>",
"follower_cluster_role":"<role-name>"
}
}
```
Specify the following options:
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`leader_alias` | The name of the cross-cluster connection. You define this alias when you [set up a cross-cluster connection]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/#set-up-a-cross-cluster-connection). | `string` | Yes
`leader_index` | The index on the leader cluster that you want to replicate. | `string` | Yes
`use_roles` | The roles to use for all subsequent backend replication tasks between the indices. Specify a `leader_cluster_role` and `follower_cluster_role`. See [Map the leader and follower cluster roles]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/#map-the-leader-and-follower-cluster-roles). | `string` | If security plugin is enabled
#### Sample response
```json
{
"acknowledged": true
}
```
## Stop replication
Introduced 1.1
{: .label .label-purple }
Terminates replication and converts the follower index to a standard index. Send this request to the follower cluster.
#### Request
```json
POST /_plugins/_replication/<follower-index>/_stop
{}
```
#### Sample response
```json
{
"acknowledged": true
}
```
## Pause replication
Introduced 1.1
{: .label .label-purple }
Pauses replication of the leader index. Send this request to the follower cluster.
#### Request
```json
POST /_plugins/_replication/<follower-index>/_pause
{}
```
You can't resume replication after it's been paused for more than 12 hours. You must [stop replication]({{site.url}}{{site.baseurl}}/replication-plugin/api/#stop-replication), delete the follower index, and restart replication of the leader.
#### Sample response
```json
{
"acknowledged": true
}
```
## Resume replication
Introduced 1.1
{: .label .label-purple }
Resumes replication of the leader index. Send this request to the follower cluster.
#### Request
```json
POST /_plugins/_replication/<follower-index>/_resume
{}
```
#### Sample response
```json
{
"acknowledged": true
}
```
## Get replication status
Introduced 1.1
{: .label .label-purple }
Gets the status of index replication. Possible statuses are `SYNCING`, `BOOTSTRAPING`, `PAUSED`, and `REPLICATION NOT IN PROGRESS`. Use the syncing details to measure replication lag. Send this request to the follower cluster.
#### Request
```json
GET /_plugins/_replication/<follower-index>/_status
```
#### Sample response
```json
{
"status" : "SYNCING",
"reason" : "User initiated",
"leader_alias" : "my-connection-name",
"leader_index" : "leader-01",
"follower_index" : "follower-01",
"syncing_details" : {
"leader_checkpoint" : 19,
"follower_checkpoint" : 19,
"seq_no" : 0
}
}
```
To include shard replication details in the response, add the `&verbose=true` parameter.
The leader and follower checkpoint values begin as negative integers and reflect the shard count (-1 for one shard, -5 for five shards, and so on). The values increment toward positive integers with each change that you make. For example, when you make a change on the leader index, the `leader_checkpoint` becomes `0`. The `follower_checkpoint` is initially still `-1` until the follower index pulls the change from the leader, at which point it increments to `0`. If the values are the same, it means the indices are fully synced.
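For example, the following sketch requests verbose status for the follower index used elsewhere in this documentation:

```json
GET /_plugins/_replication/follower-01/_status?verbose=true
```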
## Get leader cluster stats
Introduced 1.1
{: .label .label-purple }
Gets information about replicated leader indices on a specified cluster.
#### Request
```json
GET /_plugins/_replication/leader_stats
```
#### Sample response
```json
{
"num_replicated_indices": 2,
"operations_read": 15,
"translog_size_bytes": 1355,
"operations_read_lucene": 0,
"operations_read_translog": 15,
"total_read_time_lucene_millis": 0,
"total_read_time_translog_millis": 659,
"bytes_read": 1000,
"index_stats":{
"leader-index-1":{
"operations_read": 7,
"translog_size_bytes": 639,
"operations_read_lucene": 0,
"operations_read_translog": 7,
"total_read_time_lucene_millis": 0,
"total_read_time_translog_millis": 353,
"bytes_read":466
},
"leader-index-2":{
"operations_read": 8,
"translog_size_bytes": 716,
"operations_read_lucene": 0,
"operations_read_translog": 8,
"total_read_time_lucene_millis": 0,
"total_read_time_translog_millis": 306,
"bytes_read": 534
}
}
}
```
## Get follower cluster stats
Introduced 1.1
{: .label .label-purple }
Gets information about follower (syncing) indices on a specified cluster.
#### Request
```json
GET /_plugins/_replication/follower_stats
```
#### Sample response
```json
{
"num_syncing_indices": 2,
"num_bootstrapping_indices": 0,
"num_paused_indices": 0,
"num_failed_indices": 0,
"num_shard_tasks": 2,
"num_index_tasks": 2,
"operations_written": 3,
"operations_read": 3,
"failed_read_requests": 0,
"throttled_read_requests": 0,
"failed_write_requests": 0,
"throttled_write_requests": 0,
"follower_checkpoint": 1,
"leader_checkpoint": 1,
"total_write_time_millis": 2290,
"index_stats":{
"follower-index-1":{
"operations_written": 2,
"operations_read": 2,
"failed_read_requests": 0,
"throttled_read_requests": 0,
"failed_write_requests": 0,
"throttled_write_requests": 0,
"follower_checkpoint": 1,
"leader_checkpoint": 1,
"total_write_time_millis": 1355
},
"follower-index-2":{
"operations_written": 1,
"operations_read": 1,
"failed_read_requests": 0,
"throttled_read_requests": 0,
"failed_write_requests": 0,
"throttled_write_requests": 0,
"follower_checkpoint": 0,
"leader_checkpoint": 0,
"total_write_time_millis": 935
}
}
}
```
## Get auto-follow stats
Introduced 1.1
{: .label .label-purple }
Gets information about auto-follow activity and any replication rules configured on the specified cluster.
#### Request
```json
GET /_plugins/_replication/autofollow_stats
```
#### Sample response
```json
{
"num_success_start_replication": 2,
"num_failed_start_replication": 0,
"num_failed_leader_calls": 0,
"failed_indices":[
],
"autofollow_stats":[
{
"name":"my-replication-rule",
"pattern":"movies*",
"num_success_start_replication": 2,
"num_failed_start_replication": 0,
"num_failed_leader_calls": 0,
"failed_indices":[
]
}
]
}
```
## Update settings
Introduced 1.1
{: .label .label-purple }
Updates settings on the follower index.
#### Request
```json
PUT /_plugins/_replication/<follower-index>/_update
{
"settings":{
"index.number_of_shards": 4,
"index.number_of_replicas": 2
}
}
```
#### Sample response
```json
{
"acknowledged": true
}
```
## Create replication rule
Introduced 1.1
{: .label .label-purple }
Automatically starts replication on indices matching a specified pattern. If a new index on the leader cluster matches the pattern, OpenSearch automatically creates a follower index and begins replication. You can also use this API to update existing replication rules.
Send this request to the follower cluster.
Make sure to note the names of all auto-follow patterns after you create them. The replication plugin currently does not include an API operation to retrieve a list of existing patterns.
{: .tip }
#### Request
```json
POST /_plugins/_replication/_autofollow
{
"leader_alias" : "<connection-alias-name>",
"name": "<auto-follow-pattern-name>",
"pattern": "<pattern>",
"use_roles":{
"leader_cluster_role": "<role-name>",
"follower_cluster_role": "<role-name>"
}
}
```
Specify the following options:
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`leader_alias` | The name of the cross-cluster connection. You define this alias when you [set up a cross-cluster connection]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/#set-up-a-cross-cluster-connection). | `string` | Yes
`name` | A name for the auto-follow pattern. | `string` | Yes
`pattern` | An array of index patterns to match against indices in the specified leader cluster. Supports wildcard characters. For example, `leader-*`. | `string` | Yes
`use_roles` | The roles to use for all subsequent backend replication tasks between the indices. Specify a `leader_cluster_role` and `follower_cluster_role`. See [Map the leader and follower cluster roles]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/#map-the-leader-and-follower-cluster-roles). | `string` | If security plugin is enabled
#### Sample response
```json
{
"acknowledged": true
}
```
## Delete replication rule
Introduced 1.1
{: .label .label-purple }
Deletes the specified replication rule. This operation prevents any new indices from being replicated but does not stop existing replication that the rule has already initiated.
Send this request to the follower cluster.
#### Request
```json
DELETE /_plugins/_replication/_autofollow
{
"leader_alias" : "<connection-alias-name>",
"name": "<auto-follow-pattern-name>",
}
```
Specify the following options:
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`leader_alias` | The name of the cross-cluster connection. You define this alias when you [set up a cross-cluster connection]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/#set-up-a-cross-cluster-connection). | `string` | Yes
`name` | The name of the pattern. | `string` | Yes
#### Sample response
```json
{
"acknowledged": true
}
```

View File

@ -0,0 +1,104 @@
---
layout: default
title: Auto-follow
nav_order: 20
has_children: false
---
# Auto-follow for cross-cluster replication
Auto-follow lets you automatically replicate indices created on the leader cluster based on matching patterns. When you create an index on the leader cluster with a name that matches a specified pattern (for example, `index-01*`), a corresponding follower index is automatically created on the follower cluster.
You can configure multiple replication rules for a single cluster. The patterns currently only support wildcard matching.
## Prerequisites
You need to [set up a cross-cluster connection]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/#set-up-a-cross-cluster-connection) between two clusters before you can enable auto-follow.
## Permissions
If the security plugin is enabled, non-admin users need to be mapped to the appropriate permissions in order to perform replication actions. For index and cluster-level permissions requirements, see [Cross-cluster replication permissions]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/).
## Get started with auto-follow
Replication rules are a collection of patterns that you create against a single remote cluster. When you create a replication rule, it automatically starts replicating any *new* indices that match the pattern, but does not replicate matching indices that were previously created.
Create a replication rule on the follower cluster:
```bash
curl -XPOST -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/_autofollow?pretty' -d '
{
"leader_alias" : "my-connection-alias",
"name": "my-replication-rule",
"pattern": "movies*",
"use_roles":{
"leader_cluster_role": "all_access",
"follower_cluster_role": "all_access"
}
}'
```
If the security plugin is disabled, you can leave out the `use_roles` parameter. If it's enabled, however, you need to specify the leader and follower cluster roles that OpenSearch uses to authenticate requests. This example uses `all_access` for simplicity, but we recommend creating a replication user on each cluster and [mapping it accordingly]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/#map-the-leader-and-follower-cluster-roles).
{: .tip }
To test the rule, create a matching index on the leader cluster:
```bash
curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9201/movies-0001?pretty'
```
And confirm its replica shows up on the follower cluster:
```bash
curl -XGET -u 'admin:admin' -k 'https://localhost:9200/_cat/indices?v'
```
It might take several seconds for the index to appear.
```bash
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open movies-0001 kHOxYYHxRMeszLjTD9rvSQ 1 1 0 0 208b 208b
```
## Retrieve replication rules
To retrieve a list of existing replication rules configured on a cluster, send the following request:
```bash
curl -XGET -u 'admin:admin' -k 'https://localhost:9200/_plugins/_replication/autofollow_stats'
{
"num_success_start_replication": 1,
"num_failed_start_replication": 0,
"num_failed_leader_calls": 0,
"failed_indices":[
],
"autofollow_stats":[
{
"name":"my-replication-rule",
"pattern":"movies*",
"num_success_start_replication": 1,
"num_failed_start_replication": 0,
"num_failed_leader_calls": 0,
"failed_indices":[
]
}
]
}
```
## Delete a replication rule
When you delete a replication rule, OpenSearch stops replicating *new* indices that match the pattern, but existing indices that the rule previously created continue to replicate. If you need to stop existing replication activity, use the [stop replication API operation]({{site.url}}{{site.baseurl}}/replication-plugin/api/#stop-replication).
```bash
curl -XDELETE -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/_autofollow?pretty' -d '
{
"leader_alias" : "my-conection-alias",
"name": "my-replication-rule"
}'
```

View File

@ -0,0 +1,273 @@
---
layout: default
title: Get started
nav_order: 10
---
# Get started with cross-cluster replication
With cross-cluster replication, you index data to a leader index, and OpenSearch replicates that data to one or more read-only follower indices. All subsequent operations on the leader are replicated on the follower, such as creating, updating, or deleting documents.
## Prerequisites
Cross-cluster replication has the following prerequisites:
- Both the leader and follower cluster must have the replication plugin installed.
- If you've overridden `node.roles` in `opensearch.yml` on the remote cluster, make sure it also includes the `remote_cluster_client` role:
```yaml
node.roles: [<other_roles>, remote_cluster_client]
```
## Permissions
Make sure the security plugin is either enabled on both clusters or disabled on both clusters. If you disabled the security plugin, you can skip this section. However, we strongly recommend enabling the security plugin in production scenarios.
If the security plugin is enabled, non-admin users need to be mapped to the appropriate permissions in order to perform replication actions. For index and cluster-level permissions requirements, see [Cross-cluster replication permissions]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/).
In addition, add the following setting to `opensearch.yml` on the leader cluster so it allows connections from the follower cluster:
```yml
plugins.security.nodes_dn_dynamic_config_enabled: true
```
## Example setup
Save this sample file as `docker-compose.yml` and run `docker-compose up` to start two single-node clusters on the same network:
```yml
version: '3'
services:
replication-node1:
image: opensearchproject/opensearch:{{site.opensearch_version}}
container_name: replication-node1
environment:
- cluster.name=leader-cluster
- discovery.type=single-node
- bootstrap.memory_lock=true
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- opensearch-data2:/usr/share/opensearch/data
ports:
- 9201:9200
- 9700:9600 # required for Performance Analyzer
networks:
- opensearch-net
replication-node2:
image: opensearchproject/opensearch:{{site.opensearch_version}}
container_name: replication-node2
environment:
- cluster.name=follower-cluster
- discovery.type=single-node
- bootstrap.memory_lock=true
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- opensearch-data1:/usr/share/opensearch/data
ports:
- 9200:9200
- 9600:9600 # required for Performance Analyzer
networks:
- opensearch-net
volumes:
opensearch-data1:
opensearch-data2:
networks:
opensearch-net:
```
After the clusters start, verify the names of each:
```bash
curl -XGET -u 'admin:admin' -k 'https://localhost:9201'
{
"cluster_name" : "leader-cluster",
...
}
curl -XGET -u 'admin:admin' -k 'https://localhost:9200'
{
"cluster_name" : "follower-cluster",
...
}
```
For this example, use port 9201 (`replication-node1`) as the leader and port 9200 (`replication-node2`) as the follower cluster.
To get the IP address for the leader cluster, first identify its container ID:
```bash
docker ps
CONTAINER ID IMAGE PORTS NAMES
3b8cdc698be5 opensearchproject/opensearch:{{site.opensearch_version}} 0.0.0.0:9200->9200/tcp, 0.0.0.0:9600->9600/tcp, 9300/tcp replication-node2
731f5e8b0f4b opensearchproject/opensearch:{{site.opensearch_version}} 9300/tcp, 0.0.0.0:9201->9200/tcp, 0.0.0.0:9700->9600/tcp replication-node1
```
Then get that container's IP address:
```bash
docker inspect --format='{% raw %}{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}{% endraw %}' 731f5e8b0f4b
172.22.0.3
```
## Set up a cross-cluster connection
Cross-cluster replication follows a "pull" model, so most changes occur on the follower cluster, not the leader cluster.
On the follower cluster, add the IP address (with port 9300) for each seed node. Because this is a single-node cluster, you only have one seed node. Provide a descriptive name for the connection, which you'll use in the request to start replication:
```bash
curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_cluster/settings?pretty' -d '
{
"persistent": {
"cluster": {
"remote": {
"my-connection-alias": {
"seeds": ["172.22.0.3:9300"]
}
}
}
}
}'
```
## Start replication
To get started, create an index called `leader-01` on the leader cluster:
```bash
curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9201/leader-01?pretty'
```
Then start replication from the follower cluster. In the request body, provide the connection name and leader index that you want to replicate, along with the security roles you want to use:
```bash
curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/follower-01/_start?pretty' -d '
{
"leader_alias": "my-connection-alias",
"leader_index": "leader-01",
"use_roles":{
"leader_cluster_role": "all_access",
"follower_cluster_role": "all_access"
}
}'
```
If the security plugin is disabled, omit the `use_roles` parameter. If it's enabled, however, you must specify the leader and follower cluster roles that OpenSearch will use to authenticate the request. This example uses `all_access` for simplicity, but we recommend creating a replication user on each cluster and [mapping it accordingly]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/#map-the-leader-and-follower-cluster-roles).
{: .tip }
This command creates an identical read-only index named `follower-01` on the local cluster that continuously stays updated with changes to the `leader-01` index on the remote cluster. Starting replication creates a follower index from scratch; you can't convert an existing index to a follower index.
## Confirm replication
After replication starts, get the status:
```bash
curl -XGET -k -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/follower-01/_status?pretty'
{
"status" : "SYNCING",
"reason" : "User initiated",
"leader_alias" : "my-connection-alias",
"leader_index" : "leader-01",
"follower_index" : "follower-01",
"syncing_details" : {
"leader_checkpoint" : -1,
"follower_checkpoint" : -1,
"seq_no" : 0
}
}
```
Possible statuses are `SYNCING`, `BOOTSTRAPING`, `PAUSED`, and `REPLICATION NOT IN PROGRESS`.
The leader and follower checkpoint values begin as negative numbers and reflect the shard count (-1 for one shard, -5 for five shards, and so on). The values increment with each change and illustrate how many updates the follower is behind the leader. If the indices are fully synced, the values are the same.
To confirm that replication is actually happening, add a document to the leader index:
```bash
curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9201/leader-01/_doc/1?pretty' -d '{"The Shining": "Stephen King"}'
```
Then validate the replicated content on the follower index:
```bash
curl -XGET -k -u 'admin:admin' 'https://localhost:9200/follower-01/_search?pretty'
{
...
"hits": [{
"_index": "follower-01",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"The Shining": "Stephen King"
}
}]
}
```
## Pause and resume replication
You can temporarily pause replication of an index if you need to remediate issues or reduce load on the leader cluster:
```bash
curl -XPOST -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/follower-01/_pause?pretty' -d '{}'
```
To confirm replication is paused, get the status:
```bash
curl -XGET -k -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/follower-01/_status?pretty'
{
"status" : "PAUSED",
"reason" : "User initiated",
"leader_alias" : "my-connection-alias",
"leader_index" : "leader-01",
"follower_index" : "follower-01"
}
```
When you're done making changes, resume replication:
```bash
curl -XPOST -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/follower-01/_resume?pretty' -d '{}'
```
When replication resumes, the follower index picks up any changes that were made to the leader index while replication was paused.
Note that you can't resume replication after it's been paused for more than 12 hours. You must [stop replication]({{site.url}}{{site.baseurl}}/replication-plugin/api/#stop-replication), delete the follower index, and restart replication of the leader.
## Stop replication
Terminate replication of a specified index from the follower cluster:
```bash
curl -XPOST -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/follower-01/_stop?pretty' -d '{}'
```
When you stop replication, the follower index un-follows the leader and becomes a standard index that you can write to. You can't restart replication after stopping it.
Get the status to confirm that the index is no longer being replicated:
```bash
curl -XGET -k -u 'admin:admin' 'https://localhost:9200/_plugins/_replication/follower-01/_status?pretty'
{
"status" : "REPLICATION NOT IN PROGRESS"
}
```
You can further confirm that replication is stopped by making modifications to the leader index and confirming they don't show up on the follower index.

View File

@ -0,0 +1,23 @@
---
layout: default
title: Cross-cluster replication
nav_order: 1
has_children: false
redirect_from:
- /replication-plugin/
---
# Cross-cluster replication
The cross-cluster replication plugin lets you replicate indices, mappings, and metadata from one OpenSearch cluster to another. Cross-cluster replication has the following benefits:
- By replicating your indices, you ensure that you can continue to handle search requests in the event of an outage.
- Replicating data across geographically distant data centers minimizes the distance between the data and the application server, reducing expensive latencies.
- You can replicate data from multiple smaller clusters to a centralized reporting cluster, which is useful when it's inefficient to query across a large network.
Replication follows an active-passive model where the follower index (where the data is replicated) pulls data from the leader (remote) index.
The replication plugin supports replication of indices using wildcard pattern matching and provides commands to pause, resume, and stop replication. Once replication starts on an index, it initiates persistent background tasks on all primary shards on the follower cluster, which continuously poll corresponding shards from the leader cluster for updates.
You can use the replication plugin with the security plugin to encrypt cross-cluster traffic with node-to-node encryption and control access to replication activities.
To start, see [Get started with cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/).

View File

@ -0,0 +1,78 @@
---
layout: default
title: Permissions
nav_order: 30
---
# Cross-cluster replication permissions
You can use the [security plugin]({{site.url}}{{site.baseurl}}/security-plugin/index/) with cross-cluster replication to limit users to certain actions. For example, you might want certain users to only perform replication activity on the leader or follower cluster.
Because cross-cluster replication involves multiple clusters, it's possible that clusters might have different security configurations. The following configurations are supported:
- Security plugin fully enabled on both clusters
- Security plugin enabled only for TLS on both clusters (`plugins.security.ssl_only`)
- Security plugin absent or disabled on both clusters (not recommended)
Enable node-to-node encryption on both the leader and the follower cluster to ensure that replication traffic between the clusters is encrypted.
## Basic permissions
In order for non-admin users to perform replication activities, they must be mapped to the appropriate permissions.
The security plugin has two built-in roles that cover most replication use cases: `cross_cluster_replication_leader_full_access`, which provides replication permissions on the leader cluster, and `cross_cluster_replication_follower_full_access`, which provides replication permissions on the follower cluster. For descriptions of each, see [Predefined roles]({{site.url}}{{site.baseurl}}/security-plugin/access-control/users-roles#predefined-roles).
If you don't want to use the default roles, you can combine individual replication [permissions]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/#replication-permissions) to meet your needs. Most permissions correspond to specific REST API operations. For example, the `indices:admin/plugins/replication/index/pause` permission lets you pause replication.
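As a sketch, a custom follower cluster role built from individual replication permissions might be created with the security plugin's roles API as follows; the role name and index pattern are placeholders:

```json
PUT _plugins/_security/api/roles/custom-replication-follower
{
  "cluster_permissions": [
    "cluster:admin/plugins/replication/autofollow/update"
  ],
  "index_permissions": [{
    "index_patterns": ["follower-01*"],
    "allowed_actions": [
      "indices:admin/plugins/replication/index/start",
      "indices:admin/plugins/replication/index/pause",
      "indices:admin/plugins/replication/index/resume",
      "indices:admin/plugins/replication/index/stop",
      "indices:admin/plugins/replication/index/status_check"
    ]
  }]
}
```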
## Map the leader and follower cluster roles
The [start replication]({{site.url}}{{site.baseurl}}/replication-plugin/api/#start-replication) and [create replication rule]({{site.url}}{{site.baseurl}}/replication-plugin/api/#create-replication-rule) operations are special cases. They involve background processes on the leader and follower clusters that must be associated with roles. When you perform one of these actions, you must explicitly pass the `leader_cluster_role` and
`follower_cluster_role` in the request, which OpenSearch then uses in all backend replication tasks.
To enable non-admins to start replication and create replication rules, create an identical user on each cluster (for example, `replication_user`) and map them to the `cross_cluster_replication_leader_full_access` role on the remote cluster and `cross_cluster_replication_follower_full_access` on the follower cluster. For instructions, see [Map users to roles]({{site.url}}{{site.baseurl}}/security-plugin/access-control/users-roles/#map-users-to-roles).
Then add those roles to the request, and sign it with the appropriate credentials:
```bash
curl -XPUT -k -H 'Content-Type: application/json' -u 'replication_user:password' 'https://localhost:9200/_plugins/_replication/follower-01/_start?pretty' -d '
{
"leader_alias": "leader-cluster",
"leader_index": "leader-01",
"use_roles":{
"leader_cluster_role": "cross_cluster_replication_leader_full_access",
"follower_cluster_role": "cross_cluster_replication_follower_full_access"
}
}'
```
You can create your own, custom leader and follower cluster roles using individual permissions, but we recommend using the default roles, which are a good fit for most use cases.
## Replication permissions
The following sections list the available index and cluster-level permissions for cross-cluster replication.
### Follower cluster
The security plugin supports these permissions for the follower cluster:
```
indices:admin/plugins/replication/index/setup/validate
indices:admin/plugins/replication/index/start
indices:admin/plugins/replication/index/pause
indices:admin/plugins/replication/index/resume
indices:admin/plugins/replication/index/stop
indices:admin/plugins/replication/index/update
indices:admin/plugins/replication/index/status_check
indices:data/write/plugins/replication/changes
cluster:admin/plugins/replication/autofollow/update
```
### Leader cluster
The security plugin supports these permissions for the leader cluster:
```
indices:admin/plugins/replication/validate
indices:data/read/plugins/replication/file_chunk
indices:data/read/plugins/replication/changes
```

View File

@ -0,0 +1,34 @@
---
layout: default
title: Settings
nav_order: 40
---
# Replication settings
The replication plugin adds several settings to the standard OpenSearch cluster settings.
The settings are dynamic, so you can change the default behavior of the plugin without restarting your cluster.
You can mark settings as `persistent` or `transient`.
For example, to update how often the follower cluster polls the leader cluster for updates:
```json
PUT _cluster/settings
{
"persistent": {
"plugins.replication.follower.metadata_sync_interval": "30s"
}
}
```
These settings manage the resources consumed by remote recoveries. We don't recommend changing these settings; the defaults should work well for most use cases.
Setting | Default | Description
:--- | :--- | :---
`plugins.replication.follower.index.recovery.chunk_size` | 10MB | The chunk size requested by the follower cluster during file transfer. Specify the chunk size as a value and unit, for example, 10MB, 5KB. See [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
`plugins.replication.follower.index.recovery.max_concurrent_file_chunks` | 4 | The number of file chunk requests that can be sent in parallel for each recovery.
`plugins.replication.follower.index.ops_batch_size` | 5000 | The number of operations that can be fetched at a time during the syncing phase of replication.
`plugins.replication.follower.concurrent_readers_per_shard` | 2 | The number of concurrent requests from the follower cluster per shard during the syncing phase of replication.
`plugins.replication.autofollow.fetch_poll_interval` | 30s | How often auto-follow tasks poll the leader cluster for new matching indices.
`plugins.replication.follower.metadata_sync_interval` | 60s | How often the follower cluster polls the leader cluster for updated index metadata.

View File

@ -1,3 +1,5 @@
$content-width: 740px;
//
// Brand colors
//

View File

@ -54,13 +54,13 @@ body {
code {
@include monospace;
font-size: 0.75rem;
font-size: 0.8rem;
-webkit-font-smoothing: subpixel-antialiased;
-moz-osx-font-smoothing: auto;
}
.site-nav {
padding-top: 2rem;
padding-top: 1rem;
}
.nav-category {
@ -108,6 +108,13 @@ code {
}
}
.site-footer {
display: none;
@include mq(lg) {
display: block;
}
}
.external-arrow {
position: relative;
top: 0.125rem;
@ -389,7 +396,7 @@ html {
body {
@include serif;
@include font-size(18);
@include font-size(16);
background: $background-lightest;
color: $text;
line-height: 1.6;
@ -962,6 +969,7 @@ main {
line-height: 1.3;
padding: 1px 0 6px;
margin: .45em 0 .35em;
letter-spacing: -1px;
@include mq(md) {
@include font-size(32, true);
@ -975,6 +983,7 @@ main {
a {
font-weight: 300;
background: none;
color: $text-link-alternate;
&:hover, :active {
background: none;
@ -1082,6 +1091,11 @@ main {
@include sans-serif;
@include warning-stripes;
/* To match the website */
font-size: 1.125rem;
text-decoration-thickness: 0.5px;
text-underline-offset: 1px;
a {
color: $text;
@ -1102,3 +1116,17 @@ main {
}
}
}
.version-wrapper {
text-align: center;
margin-bottom: 1rem;
}
version-selector {
z-index: 1;
font-size: .9rem;
--normal-bg: linear-gradient(#{lighten($blue-300, 5%)}, #{darken($blue-300, 2%)});
--hover-bg: linear-gradient(#{lighten($blue-300, 2%)}, #{darken($blue-300, 4%)});
--link-color: #{$blue-300};
}

View File

@ -194,12 +194,16 @@ For asynchronous searches with `keep_on_completion` as `true` and a sufficiently
Introduced 1.0
{: .label .label-purple }
You can use the DELETE API operation to delete any ongoing asynchronous search by its ID. If the search is still running, it's canceled. If the search is complete, the saved search results are deleted.
To delete an asynchronous search:
```json
```
DELETE _plugins/_asynchronous_search/<ID>?pretty
```
- If the search is still running, OpenSearch cancels it.
- If the search is complete, OpenSearch deletes the saved results.
#### Sample response
```json

View File

@ -11,7 +11,7 @@ has_math: true
The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the Hierarchical Navigable Small World (HNSW) algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is preferred.
The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.
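One way to see this behavior is with a `post_filter`, which runs against the hits that the k-NN query has already returned. The following request is only a sketch against an index like `my-knn-index-1` shown below; the `price` field and its values are assumptions:

```json
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [2, 3],
        "k": 2
      }
    }
  },
  "post_filter": {
    "range": {
      "price": {
        "gte": 5,
        "lte": 10
      }
    }
  }
}
```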
@ -36,7 +36,7 @@ PUT my-knn-index-1
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 4,
"dimension": 2,
"method": {
"name": "hnsw",
"space_type": "l2",

View File

@ -1,7 +1,7 @@
---
layout: default
title: API
parent: Access Control
parent: Access control
nav_order: 90
---
@ -1159,6 +1159,12 @@ Introduced 1.0
Updates the existing configuration using the REST API. This operation can easily break your existing configuration, so we recommend using `securityadmin.sh` instead, which is far safer. See [Access control for the API](#access-control-for-the-api) for how to enable this operation.
Before you can execute the operation, you must first add the following line to `opensearch.yml`:
```yml
plugins.security.unsupported.restapi.allow_securityconfig_modification: true
```
#### Request
```json
@ -1179,6 +1185,106 @@ PATCH _plugins/_security/api/securityconfig
}
```
---
## Distinguished names
These REST APIs let a super admin add, retrieve, update, or delete the distinguished names in an allow list that enables communication between clusters and/or nodes.
Before you can use the REST API to configure the allow list, you must first add the following line to `opensearch.yml`:
```yml
plugins.security.nodes_dn_dynamic_config_enabled: true
```
### Get distinguished names
Retrieves all distinguished names in the allow list.
#### Request
```
GET _plugins/_security/api/nodesdn
```
#### Sample response
```json
{
"cluster1": {
"nodes_dn": [
"CN=cluster1.example.com"
]
}
}
```
To get the distinguished names from a specific cluster's or node's allow list, include the cluster's name in the request path.
#### Request
```
GET _plugins/_security/api/nodesdn/<cluster-name>
```
#### Sample response
```json
{
"cluster3": {
"nodes_dn": [
"CN=cluster3.example.com"
]
}
}
```
### Update distinguished names
Adds or updates the specified distinguished names in the cluster's or node's allow list.
#### Request
```json
PUT _plugins/_security/api/nodesdn/<cluster-name>
{
"nodes_dn": [
"CN=cluster3.example.com"
]
}
```
#### Sample response
```json
{
"status": "CREATED",
"message": "'cluster3' created."
}
```
### Delete distinguished names
Deletes all distinguished names in the specified cluster's or node's allow list.
#### Request
```
DELETE _plugins/_security/api/nodesdn/<cluster-name>
```
#### Sample response
```json
{
"status": "OK",
"message": "'cluster3' deleted."
}
```
---
@ -1188,101 +1294,38 @@ PATCH _plugins/_security/api/securityconfig
Introduced 1.0
{: .label .label-purple }
Retrieves the current security plugin configuration in JSON format.
Retrieves the cluster's security certificates.
#### Request
```
GET _plugins/_security/api/securityconfig
```
### Update configuration
Introduced 1.0
{: .label .label-purple }
Creates or updates the existing configuration using the REST API rather than `securityadmin.sh`. This operation can easily break your existing configuration, so we recommend using `securityadmin.sh` instead. See [Access control for the API](#access-control-for-the-api) for how to enable this operation.
#### Request
```json
PUT _plugins/_security/api/securityconfig/config
{
"dynamic": {
"filtered_alias_mode": "warn",
"disable_rest_auth": false,
"disable_intertransport_auth": false,
"respect_request_indices_options": false,
"opensearch-dashboards": {
"multitenancy_enabled": true,
"server_username": "kibanaserver",
"index": ".opensearch-dashboards"
},
"http": {
"anonymous_auth_enabled": false
},
"authc": {
"basic_internal_auth_domain": {
"http_enabled": true,
"transport_enabled": true,
"order": 0,
"http_authenticator": {
"challenge": true,
"type": "basic",
"config": {}
},
"authentication_backend": {
"type": "intern",
"config": {}
},
"description": "Authenticate via HTTP Basic against internal users database"
}
},
"auth_failure_listeners": {},
"do_not_fail_on_forbidden": false,
"multi_rolespan_enabled": true,
"hosts_resolver_mode": "ip-only",
"do_not_fail_on_forbidden_empty": false
}
}
GET _opendistro/_security/api/ssl/certs
```
#### Sample response
```json
{
"status": "OK",
"message": "'config' updated."
"http_certificates_list": [
{
"issuer_dn": "CN=Example Com Inc. Root CA,OU=Example Com Inc. Root CA,O=Example Com Inc.,DC=example,DC=com",
"subject_dn": "CN=node-0.example.com,OU=node,O=node,L=test,DC=de",
"san": "[[8, 1.2.3.4.5.5], [2, node-0.example.com]",
"not_before": "2018-04-22T03:43:47Z",
"not_after": "2028-04-19T03:43:47Z"
}
],
"transport_certificates_list": [
{
"issuer_dn": "CN=Example Com Inc. Root CA,OU=Example Com Inc. Root CA,O=Example Com Inc.,DC=example,DC=com",
"subject_dn": "CN=node-0.example.com,OU=node,O=node,L=test,DC=de",
"san": "[[8, 1.2.3.4.5.5], [2, node-0.example.com]",
"not_before": "2018-04-22T03:43:47Z",
"not_after": "2028-04-19T03:43:47Z"
}
]
}
```
### Patch configuration
Introduced 1.0
{: .label .label-purple }
Updates the existing configuration using the REST API rather than `securityadmin.sh`. This operation can easily break your existing configuration, so we recommend using `securityadmin.sh` instead. See [Access control for the API](#access-control-for-the-api) for how to enable this operation.
#### Request
```json
PATCH _plugins/_security/api/securityconfig
[
{
"op": "replace", "path": "/config/dynamic/authc/basic_internal_auth_domain/transport_enabled", "value": "true"
}
]
```
#### Sample response
```json
{
"status": "OK",
"message": "Resource updated."
}
```
---
## Cache

View File

@ -1,7 +1,7 @@
---
layout: default
title: Cross-Cluster Search
parent: Access Control
title: Cross-cluster search
parent: Access control
nav_order: 40
---
@ -65,11 +65,11 @@ Save this file as `docker-compose.yml` and run `docker-compose up` to start two
```yml
version: '3'
services:
opensearch-node1:
opensearch-ccs-node1:
image: opensearchproject/opensearch:{{site.opensearch_version}}
container_name: opensearch-node1
container_name: opensearch-ccs-node1
environment:
- cluster.name=opensearch-cluster1
- cluster.name=opensearch-ccs-cluster1
- discovery.type=single-node
- bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
@ -85,11 +85,11 @@ services:
networks:
- opensearch-net
opensearch-node2:
opensearch-ccs-node2:
image: opensearchproject/opensearch:{{site.opensearch_version}}
container_name: opensearch-node2
container_name: opensearch-ccs-node2
environment:
- cluster.name=opensearch-cluster2
- cluster.name=opensearch-ccs-cluster2
- discovery.type=single-node
- bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
- "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
@ -118,26 +118,26 @@ After the clusters start, verify the names of each:
```json
curl -XGET -u 'admin:admin' -k 'https://localhost:9200'
{
"cluster_name" : "opensearch-cluster1",
"cluster_name" : "opensearch-ccs-cluster1",
...
}
curl -XGET -u 'admin:admin' -k 'https://localhost:9250'
{
"cluster_name" : "opensearch-cluster2",
"cluster_name" : "opensearch-ccs-cluster2",
...
}
```
Both clusters run on `localhost`, so the important identifier is the port number. In this case, use port 9200 (`opensearch-node1`) as the remote cluster, and port 9250 (`opensearch-node2`) as the coordinating cluster.
Both clusters run on `localhost`, so the important identifier is the port number. In this case, use port 9200 (`opensearch-ccs-node1`) as the remote cluster, and port 9250 (`opensearch-ccs-node2`) as the coordinating cluster.
To get the IP address for the remote cluster, first identify its container ID:
```bash
docker ps
CONTAINER ID IMAGE PORTS NAMES
6fe89ebc5a8e opensearchproject/opensearch:{{site.opensearch_version}} 0.0.0.0:9200->9200/tcp, 0.0.0.0:9600->9600/tcp, 9300/tcp opensearch-node1
2da08b6c54d8 opensearchproject/opensearch:{{site.opensearch_version}} 9300/tcp, 0.0.0.0:9250->9200/tcp, 0.0.0.0:9700->9600/tcp opensearch-node2
6fe89ebc5a8e opensearchproject/opensearch:{{site.opensearch_version}} 0.0.0.0:9200->9200/tcp, 0.0.0.0:9600->9600/tcp, 9300/tcp opensearch-ccs-node1
2da08b6c54d8 opensearchproject/opensearch:{{site.opensearch_version}} 9300/tcp, 0.0.0.0:9250->9200/tcp, 0.0.0.0:9700->9600/tcp opensearch-ccs-node2
```
Then get that container's IP address:
@ -154,7 +154,7 @@ curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' 'https://loca
{
"persistent": {
"search.remote": {
"opensearch-cluster1": {
"opensearch-ccs-cluster1": {
"seeds": ["172.31.0.3:9300"]
}
}
@ -171,11 +171,11 @@ curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://loca
At this point, cross-cluster search works. You can test it using the `admin` user:
```bash
curl -XGET -k -u 'admin:admin' 'https://localhost:9250/opensearch-cluster1:books/_search?pretty'
curl -XGET -k -u 'admin:admin' 'https://localhost:9250/opensearch-ccs-cluster1:books/_search?pretty'
{
...
"hits": [{
"_index": "opensearch-cluster1:books",
"_index": "opensearch-ccs-cluster1:books",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
@ -196,7 +196,7 @@ curl -XPUT -k -u 'admin:admin' 'https://localhost:9250/_plugins/_security/api/in
Then run the same search as before with `booksuser`:
```json
curl -XGET -k -u booksuser:password 'https://localhost:9250/opensearch-cluster1:books/_search?pretty'
curl -XGET -k -u booksuser:password 'https://localhost:9250/opensearch-ccs-cluster1:books/_search?pretty'
{
"error" : {
"root_cause" : [
@ -225,11 +225,11 @@ Both clusters must have the user, but only the remote cluster needs the role and
Finally, repeat the search:
```bash
curl -XGET -k -u booksuser:password 'https://localhost:9250/opensearch-cluster1:books/_search?pretty'
curl -XGET -k -u booksuser:password 'https://localhost:9250/opensearch-ccs-cluster1:books/_search?pretty'
{
...
"hits": [{
"_index": "opensearch-cluster1:books",
"_index": "opensearch-ccs-cluster1:books",
"_type": "_doc",
"_id": "1",
"_score": 1.0,

View File

@ -1,7 +1,7 @@
---
layout: default
title: Default Action Groups
parent: Access Control
title: Default action groups
parent: Access control
nav_order: 51
---

View File

@ -1,7 +1,7 @@
---
layout: default
title: Document-Level Security
parent: Access Control
title: Document-level security
parent: Access control
nav_order: 10
---

View File

@ -1,7 +1,7 @@
---
layout: default
title: Field-Level Security
parent: Access Control
title: Field-level security
parent: Access control
nav_order: 11
---

View File

@ -1,7 +1,7 @@
---
layout: default
title: Field Masking
parent: Access Control
title: Field masking
parent: Access control
nav_order: 12
---

View File

@ -1,7 +1,7 @@
---
layout: default
title: User Impersonation
parent: Access Control
title: User impersonation
parent: Access control
nav_order: 20
---
@ -41,8 +41,8 @@ plugins.security.authcz.impersonation_dn:
## Impersonating Users
To impersonate another user, submit a request to the system with the HTTP header `opensearch_security_impersonate_as` set to the name of the user to be impersonated. A good test is to make a GET request to the `_plugins/_security/authinfo` URI:
To impersonate another user, submit a request to the system with the HTTP header `opendistro_security_impersonate_as` set to the name of the user to be impersonated. A good test is to make a GET request to the `_plugins/_security/authinfo` URI:
```bash
curl -XGET -u 'admin:admin' -k -H "opensearch_security_impersonate_as: user_1" https://localhost:9200/_plugins/_security/authinfo?pretty
curl -XGET -u 'admin:admin' -k -H "opendistro_security_impersonate_as: user_1" https://localhost:9200/_plugins/_security/authinfo?pretty
```

View File

@ -1,6 +1,6 @@
---
layout: default
title: Access Control
title: Access control
nav_order: 10
has_children: true
has_toc: false

View File

@ -1,7 +1,7 @@
---
layout: default
title: OpenSearch Dashboards Multi-Tenancy
parent: Access Control
title: OpenSearch Dashboards multi-tenancy
parent: Access control
nav_order: 30
---
@ -47,21 +47,21 @@ Setting | Description
opensearch.username: kibanaserver
opensearch.password: kibanaserver
opensearch.requestHeadersWhitelist: ["securitytenant","Authorization"]
plugins.security.multitenancy.enabled: true
plugins.security.multitenancy.tenants.enable_global: true
plugins.security.multitenancy.tenants.enable_private: true
plugins.security.multitenancy.tenants.preferred: ["Private", "Global"]
plugins.security.multitenancy.enable_filter: false
opensearch_security.multitenancy.enabled: true
opensearch_security.multitenancy.tenants.enable_global: true
opensearch_security.multitenancy.tenants.enable_private: true
opensearch_security.multitenancy.tenants.preferred: ["Private", "Global"]
opensearch_security.multitenancy.enable_filter: false
```
Setting | Description
:--- | :---
`opensearch.requestHeadersWhitelist` | OpenSearch Dashboards requires that you whitelist all HTTP headers that it passes to OpenSearch. Multi-tenancy uses a specific header, `securitytenant`, that must be present with the standard `Authorization` header. If the `securitytenant` header is not whitelisted, OpenSearch Dashboards starts with a red status.
`plugins.security.multitenancy.enabled` | Enables or disables multi-tenancy in OpenSearch Dashboards. Default is true.
`plugins.security.multitenancy.tenants.enable_global` | Enables or disables the global tenant. Default is true.
`plugins.security.multitenancy.tenants.enable_private` | Enables or disables the private tenant. Default is true.
`plugins.security.multitenancy.tenants.preferred` | Lets you change ordering in the **Tenants** tab of OpenSearch Dashboards. By default, the list starts with global and private (if enabled) and then proceeds alphabetically. You can add tenants here to move them to the top of the list.
`plugins.security.multitenancy.enable_filter` | If you have many tenants, you can add a search bar to the top of the list. Default is false.
`opensearch_security.multitenancy.enabled` | Enables or disables multi-tenancy in OpenSearch Dashboards. Default is true.
`opensearch_security.multitenancy.tenants.enable_global` | Enables or disables the global tenant. Default is true.
`opensearch_security.multitenancy.tenants.enable_private` | Enables or disables the private tenant. Default is true.
`opensearch_security.multitenancy.tenants.preferred` | Lets you change ordering in the **Tenants** tab of OpenSearch Dashboards. By default, the list starts with global and private (if enabled) and then proceeds alphabetically. You can add tenants here to move them to the top of the list.
`opensearch_security.multitenancy.enable_filter` | If you have many tenants, you can add a search bar to the top of the list. Default is false.
## Add tenants

View File

@ -1,64 +1,136 @@
---
layout: default
title: Permissions
parent: Access Control
parent: Access control
nav_order: 50
---
# Permissions
This page is a complete list of available permissions in the security plugin. Each permission controls access to a data type or API.
Each permission in the security plugin controls access to some action that the OpenSearch cluster can perform, such as indexing a document or checking cluster health.
Rather than creating new action groups from individual permissions, you can often achieve your desired security posture using some combination of the default action groups. To learn more, see [Default Action Groups]({{site.url}}{{site.baseurl}}/security-plugin/access-control/default-action-groups/).
Most permissions are self-describing. For example, `cluster:admin/ingest/pipeline/get` lets you retrieve information about ingest pipelines. _In many cases_, a permission correlates to a specific REST API operation, such as `GET _ingest/pipeline`.
Despite this correlation, permissions do **not** directly map to REST API operations. Operations such as `POST _bulk` and `GET _msearch` can access many indices and perform many actions in a single request. Even a simple request, such as `GET _cat/nodes`, performs several actions in order to generate its response.
In short, controlling access to the REST API is insufficient. Instead, the security plugin controls access to the underlying OpenSearch actions.
For example, consider the following `_bulk` request:
```json
POST _bulk
{ "delete": { "_index": "test-index", "_id": "tt2229499" } }
{ "index": { "_index": "test-index", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "test-index", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "test-index", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }
```
For this request to succeed, you must have the following permissions for `test-index`:
- indices:data/write/bulk*
- indices:data/write/delete
- indices:data/write/index
- indices:data/write/update
These permissions also allow you to add, update, or delete documents (e.g. `PUT test-index/_doc/tt0816711`), because they govern the underlying OpenSearch actions of indexing and deleting documents rather than a specific API path and HTTP method.
## Test permissions
If you want a user to have the absolute minimum set of permissions necessary to perform some function---the [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege)---the best way is to send representative requests to your cluster as a new test user. In the case of a permissions error, the security plugin is very explicit about which permissions are missing. Consider this request and response:
```json
GET _cat/shards?v
{
"error": {
"root_cause": [{
"type": "security_exception",
"reason": "no permissions for [indices:monitor/stats] and User [name=test-user, backend_roles=[], requestedTenant=null]"
}]
},
"status": 403
}
```
[Create a user and a role]({{site.url}}{{site.baseurl}}/security-plugin/access-control/users-roles/), map the role to the user, and start sending signed requests using curl, Postman, or any other client. Then gradually add permissions to the role as you encounter errors. Even after you resolve one permissions error, the same request might generate new errors; the plugin only returns the first error it encounters, so keep trying until the request succeeds.
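For example, to resolve the error above, you could grant the missing permission to a test role through the REST API. This is only a sketch; the role name and index pattern are placeholders, and a `PUT` replaces the entire role definition, so include any permissions the role already needs:

```json
PUT _plugins/_security/api/roles/test-role
{
  "index_permissions": [{
    "index_patterns": ["*"],
    "allowed_actions": ["indices:monitor/stats"]
  }]
}
```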
Rather than individual permissions, you can often achieve your desired security posture using a combination of the default action groups. See [Default action groups]({{site.url}}{{site.baseurl}}/security-plugin/access-control/default-action-groups/) for descriptions of the permissions that each group grants.
{: .tip }
## Cluster
## Cluster permissions
These permissions are for the cluster and can't be applied granularly. For example, you either have permissions to take snapshots (`cluster:admin/snapshot/create`) or you don't. You can't have permissions to take snapshots only for certain indices.
- cluster:admin/ingest/pipeline/delete
- cluster:admin/ingest/pipeline/get
- cluster:admin/ingest/pipeline/put
- cluster:admin/ingest/pipeline/simulate
- cluster:admin/ingest/processor/grok/get
- cluster:admin/opensearch/ad/detector/delete
- cluster:admin/opensearch/ad/detector/jobmanagement
- cluster:admin/opensearch/ad/detector/run
- cluster:admin/opensearch/ad/detector/search
- cluster:admin/opensearch/ad/detector/stats
- cluster:admin/opensearch/ad/detector/write
- cluster:admin/opensearch/ad/detectors/get
- cluster:admin/opensearch/ad/result/search
- cluster:admin/opensearch/alerting/alerts/ack
- cluster:admin/opensearch/alerting/alerts/get
- cluster:admin/opensearch/alerting/destination/delete
- cluster:admin/opensearch/alerting/destination/email_account/delete
- cluster:admin/opensearch/alerting/destination/email_account/get
- cluster:admin/opensearch/alerting/destination/email_account/search
- cluster:admin/opensearch/alerting/destination/email_account/write
- cluster:admin/opensearch/alerting/destination/email_group/delete
- cluster:admin/opensearch/alerting/destination/email_group/get
- cluster:admin/opensearch/alerting/destination/email_group/search
- cluster:admin/opensearch/alerting/destination/email_group/write
- cluster:admin/opensearch/alerting/destination/get
- cluster:admin/opensearch/alerting/destination/write
- cluster:admin/opensearch/alerting/monitor/delete
- cluster:admin/opensearch/alerting/monitor/execute
- cluster:admin/opensearch/alerting/monitor/get
- cluster:admin/opensearch/alerting/monitor/search
- cluster:admin/opensearch/alerting/monitor/write
- cluster:admin/opensearch/asynchronous_search/stats
- cluster:admin/opensearch/asynchronous_search/delete
- cluster:admin/opensearch/asynchronous_search/get
- cluster:admin/opensearch/asynchronous_search/submit
- cluster:admin/opensearch/reports/definition/create
- cluster:admin/opensearch/reports/definition/delete
- cluster:admin/opensearch/reports/definition/get
- cluster:admin/opensearch/reports/definition/list
- cluster:admin/opensearch/reports/definition/on_demand
- cluster:admin/opensearch/reports/definition/update
- cluster:admin/opensearch/reports/instance/get
- cluster:admin/opensearch/reports/instance/list
- cluster:admin/opensearch/reports/menu/download
- cluster:admin/opendistro/ad/detector/delete
- cluster:admin/opendistro/ad/detector/info
- cluster:admin/opendistro/ad/detector/jobmanagement
- cluster:admin/opendistro/ad/detector/preview
- cluster:admin/opendistro/ad/detector/run
- cluster:admin/opendistro/ad/detector/search
- cluster:admin/opendistro/ad/detector/stats
- cluster:admin/opendistro/ad/detector/write
- cluster:admin/opendistro/ad/detectors/get
- cluster:admin/opendistro/ad/result/search
- cluster:admin/opendistro/ad/tasks/search
- cluster:admin/opendistro/alerting/alerts/ack (acknowledge)
- cluster:admin/opendistro/alerting/alerts/get
- cluster:admin/opendistro/alerting/destination/delete
- cluster:admin/opendistro/alerting/destination/email_account/delete
- cluster:admin/opendistro/alerting/destination/email_account/get
- cluster:admin/opendistro/alerting/destination/email_account/search
- cluster:admin/opendistro/alerting/destination/email_account/write
- cluster:admin/opendistro/alerting/destination/email_group/delete
- cluster:admin/opendistro/alerting/destination/email_group/get
- cluster:admin/opendistro/alerting/destination/email_group/search
- cluster:admin/opendistro/alerting/destination/email_group/write
- cluster:admin/opendistro/alerting/destination/get
- cluster:admin/opendistro/alerting/destination/write
- cluster:admin/opendistro/alerting/monitor/delete
- cluster:admin/opendistro/alerting/monitor/execute
- cluster:admin/opendistro/alerting/monitor/get
- cluster:admin/opendistro/alerting/monitor/search
- cluster:admin/opendistro/alerting/monitor/write
- cluster:admin/opendistro/asynchronous_search/stats
- cluster:admin/opendistro/asynchronous_search/delete
- cluster:admin/opendistro/asynchronous_search/get
- cluster:admin/opendistro/asynchronous_search/submit
- cluster:admin/opendistro/ism/managedindex/add
- cluster:admin/opendistro/ism/managedindex/change
- cluster:admin/opendistro/ism/managedindex/remove
- cluster:admin/opendistro/ism/managedindex/explain
- cluster:admin/opendistro/ism/managedindex/retry
- cluster:admin/opendistro/ism/policy/write
- cluster:admin/opendistro/ism/policy/get
- cluster:admin/opendistro/ism/policy/search
- cluster:admin/opendistro/ism/policy/delete
- cluster:admin/opendistro/rollup/index
- cluster:admin/opendistro/rollup/get
- cluster:admin/opendistro/rollup/search
- cluster:admin/opendistro/rollup/delete
- cluster:admin/opendistro/rollup/start
- cluster:admin/opendistro/rollup/stop
- cluster:admin/opendistro/rollup/explain
- cluster:admin/opendistro/reports/definition/create
- cluster:admin/opendistro/reports/definition/update
- cluster:admin/opendistro/reports/definition/on_demand
- cluster:admin/opendistro/reports/definition/delete
- cluster:admin/opendistro/reports/definition/get
- cluster:admin/opendistro/reports/definition/list
- cluster:admin/opendistro/reports/instance/list
- cluster:admin/opendistro/reports/instance/get
- cluster:admin/opendistro/reports/menu/download
- cluster:admin/plugins/replication/autofollow/update
- cluster:admin/reindex/rethrottle
- cluster:admin/repository/delete
- cluster:admin/repository/get
@ -94,7 +166,9 @@ Rather than creating new action groups from individual permissions, you can ofte
- cluster:monitor/tasks/list
## Indices
## Index permissions
These permissions apply to an index or index pattern. You might want a user to have read access to all indices (i.e. `*`), but write access to only a few (e.g. `web-logs` and `product-catalog`).
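As a sketch of that scenario (the role name is hypothetical, and the individual actions could also be replaced with default action groups):

```json
PUT _plugins/_security/api/roles/reader_with_limited_writes
{
  "index_permissions": [
    {
      "index_patterns": ["*"],
      "allowed_actions": ["indices:data/read/search", "indices:data/read/get"]
    },
    {
      "index_patterns": ["web-logs", "product-catalog"],
      "allowed_actions": ["indices:data/write/index", "indices:data/write/bulk*", "indices:data/write/update"]
    }
  ]
}
```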
- indices:admin/aliases
- indices:admin/aliases/exists
@ -102,18 +176,34 @@ Rather than creating new action groups from individual permissions, you can ofte
- indices:admin/analyze
- indices:admin/cache/clear
- indices:admin/close
- indices:admin/create
- indices:admin/delete
- indices:admin/close*
- indices:admin/create (create indices)
- indices:admin/data_stream/create
- indices:admin/data_stream/delete
- indices:admin/data_stream/get
- indices:admin/delete (delete indices)
- indices:admin/exists
- indices:admin/flush
- indices:admin/flush*
- indices:admin/forcemerge
- indices:admin/get
- indices:admin/get (retrieve index and mapping)
- indices:admin/index_template/delete
- indices:admin/index_template/get
- indices:admin/index_template/put
- indices:admin/index_template/simulate
- indices:admin/index_template/simulate_index
- indices:admin/mapping/put
- indices:admin/mappings/fields/get
- indices:admin/mappings/fields/get*
- indices:admin/mappings/get
- indices:admin/open
- indices:admin/plugins/replication/index/setup/validate
- indices:admin/plugins/replication/index/start
- indices:admin/plugins/replication/index/pause
- indices:admin/plugins/replication/index/resume
- indices:admin/plugins/replication/index/stop
- indices:admin/plugins/replication/index/update
- indices:admin/plugins/replication/index/status_check
- indices:admin/refresh
- indices:admin/refresh*
- indices:admin/resolve/index
@ -137,22 +227,26 @@ Rather than creating new action groups from individual permissions, you can ofte
- indices:data/read/mget*
- indices:data/read/msearch
- indices:data/read/msearch/template
- indices:data/read/mtv
- indices:data/read/mtv (multi-term vectors)
- indices:data/read/mtv*
- indices:data/read/plugins/replication/file_chunk
- indices:data/read/plugins/replication/changes
- indices:data/read/scroll
- indices:data/read/scroll/clear
- indices:data/read/search
- indices:data/read/search*
- indices:data/read/search/template
- indices:data/read/tv
- indices:data/read/tv (term vectors)
- indices:data/write/bulk
- indices:data/write/bulk*
- indices:data/write/delete
- indices:data/write/delete (delete documents)
- indices:data/write/delete/byquery
- indices:data/write/index
- indices:data/write/plugins/replication/changes
- indices:data/write/index (add documents to existing indices)
- indices:data/write/reindex
- indices:data/write/update
- indices:data/write/update/byquery
- indices:monitor/data_stream/stats
- indices:monitor/recovery
- indices:monitor/segments
- indices:monitor/settings/get

View File

@ -1,7 +1,7 @@
---
layout: default
title: Users and Roles
parent: Access Control
title: Users and roles
parent: Access control
nav_order: 1
---
@ -109,7 +109,9 @@ Role | Description
`anomaly_full_access` | Grants full permissions to all anomaly detection actions.
`anomaly_read_access` | Grants permissions to view detectors, but not create, modify, or delete detectors.
`all_access` | Grants full access to the cluster: all cluster-wide operations, write to all indices, write to all tenants.
`kibana_read_only` | A special role that prevents users from making changes to visualizations, dashboards, and other OpenSearch Dashboards objects. See `plugins.security.readonly_mode.roles` in `opensearch_dashboards.yml`. Pair with the `kibana_user` role.
`cross_cluster_replication_follower_full_access` | Grants full access to perform cross-cluster replication actions on the follower cluster.
`cross_cluster_replication_leader_full_access` | Grants full access to perform cross-cluster replication actions on the leader cluster.
`kibana_read_only` | A special role that prevents users from making changes to visualizations, dashboards, and other OpenSearch Dashboards objects. See `opensearch_security.readonly_mode.roles` in `opensearch_dashboards.yml`. Pair with the `kibana_user` role.
`kibana_user` | Grants permissions to use OpenSearch Dashboards: cluster-wide searches, index monitoring, and write to various OpenSearch Dashboards indices.
`logstash` | Grants permissions for Logstash to interact with the cluster: cluster-wide searches, cluster monitoring, and write to the various Logstash indices.
`manage_snapshots` | Grants permissions to manage snapshot repositories, take snapshots, and restore snapshots.
@ -121,6 +123,7 @@ Role | Description
`reports_full_access` | Grants full permissions to reports.
`asynchronous_search_full_access` | Grants full permissions to all asynchronous search actions.
`asynchronous_search_read_access` | Grants permissions to view asynchronous searches, but not to submit, modify, or delete async searches.
`index_management_full_access` | Grants full permissions to all index management actions, including ISM, transforms, and rollups.
For more detailed summaries of the permissions for each role, reference their action groups against the descriptions in [Default action groups]({{site.url}}{{site.baseurl}}/security-plugin/access-control/default-action-groups/).

View File

@ -53,10 +53,10 @@ The `authc` section has the following format:
http_enabled: <true|false>
transport_enabled: <true|false>
order: <integer>
http_authenticator:
...
authentication_backend:
...
http_authenticator:
...
authentication_backend:
...
```
An entry in the `authc` section is called an *authentication domain*. It specifies where to get the user credentials and against which backend they should be authenticated.

View File

@ -42,10 +42,10 @@ You can optionally add the `-aes256` option to encrypt the key using the AES-256
Next, use the key to generate a self-signed certificate for the root CA:
```bash
openssl req -new -x509 -sha256 -key root-ca-key.pem -out root-ca.pem -days 30
openssl req -new -x509 -sha256 -key root-ca-key.pem -out root-ca.pem -days 730
```
Change `-days 30` to 3650 (10 years) or some other number to set a non-default expiration date. The default value of 30 days is best for testing purposes.
The default `-days` value of 30 is only useful for testing purposes. This sample command specifies 730 (two years) for the certificate expiration date, but use whatever value makes sense for your organization.
- The `-x509` option specifies that you want a self-signed certificate rather than a certificate request.
- The `-sha256` option sets the hash algorithm to SHA-256. SHA-256 is the default in later versions of OpenSSL, but earlier versions might use SHA-1.
@ -78,7 +78,7 @@ Follow the prompts to fill in the details. You don't need to specify a challenge
Finally, generate the certificate itself:
```bash
openssl x509 -req -in admin.csr -CA root-ca.pem -CAkey root-ca-key.pem -CAcreateserial -sha256 -out admin.pem -days 30
openssl x509 -req -in admin.csr -CA root-ca.pem -CAkey root-ca-key.pem -CAcreateserial -sha256 -out admin.pem -days 730
```
Just like the root certificate, use the `-days` option to specify an expiration date of longer than 30 days.
@ -91,7 +91,7 @@ Follow the steps in [Generate an admin certificate](#generate-an-admin-certifica
If you generate node certificates and have `plugins.security.ssl.transport.enforce_hostname_verification` set to `true` (default), be sure to specify a common name (CN) for the certificate that matches the hostname of the intended node. If you want to use the same node certificate on all nodes (not recommended), set hostname verification to `false`. For more information, see [Configure TLS certificates]({{site.url}}{{site.baseurl}}/security-plugin/configuration/tls#advanced-hostname-verification-and-dns-lookup).
### Sample script
## Sample script
If you already know the certificate details and don't want to specify them interactively, use the `-subj` option in your `root-ca.pem` and CSR commands. This script creates a root certificate, admin certificate, two node certificates, and a client certificate, all with expiration dates of two years (730 days):
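The full script is not shown here; as one illustrative piece, the root CA step might pass the distinguished name with `-subj` as follows (the field values are placeholders):

```bash
openssl req -new -x509 -sha256 -key root-ca-key.pem \
  -subj "/C=US/ST=STATE/L=CITY/O=ORG/OU=UNIT/CN=root.example.com" \
  -out root-ca.pem -days 730
```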

View File

@ -11,7 +11,6 @@ Active Directory and LDAP can be used for both authentication and authorization
In most cases, you want to configure both authentication and authorization. You can also use authentication only and map the users retrieved from LDAP directly to security plugin roles.
{% comment %}
## Docker example
@ -38,7 +37,7 @@ We provide a fully functional example that can help you understand how to use an
1. Index a document as `psantos`:
```bash
curl -XPUT https://localhost:9200/new-index/_doc/1 -H 'Content-Type: application/json' -d '{"title": "Spirited Away"}' -u psantos:password -k
curl -XPUT 'https://localhost:9200/new-index/_doc/1' -H 'Content-Type: application/json' -d '{"title": "Spirited Away"}' -u 'psantos:password' -k
```
If you try the same request as `jroe`, it fails. The `Developers` group is mapped to the `readall`, `manage_snapshots`, and `kibana_user` roles and has no write permissions.
@ -46,14 +45,13 @@ We provide a fully functional example that can help you understand how to use an
1. Search for the document as `jroe`:
```bash
curl -XGET https://localhost:9200/new-index/_search?pretty -u jroe:password -k
curl -XGET 'https://localhost:9200/new-index/_search?pretty' -u 'jroe:password' -k
```
This request succeeds, because the `Developers` group is mapped to the `readall` role.
1. If you want to examine the contents of the various containers, run `docker ps` to find the container ID and then `docker exec -it <container-id> /bin/bash`.
{% endcomment %}
## Connection settings

View File

@ -201,7 +201,7 @@ opensearch.requestHeadersWhitelist: ["securitytenant","Authorization","x-forward
You must also enable the authentication type in `opensearch_dashboards.yml`:
```yml
plugins.security.auth.type: "proxy"
plugins.security.proxycache.user_header: "x-proxy-user"
plugins.security.proxycache.roles_header: "x-proxy-roles"
opensearch_security.auth.type: "proxy"
opensearch_security.proxycache.user_header: "x-proxy-user"
opensearch_security.proxycache.roles_header: "x-proxy-roles"
```

View File

@ -11,7 +11,6 @@ The security plugin supports user authentication through SAML single sign-on. Th
This profile is meant for use with web browsers. It is not a general-purpose way of authenticating users against the security plugin, so its primary use case is to support OpenSearch Dashboards single sign-on.
{% comment %}
## Docker example
@ -35,7 +34,6 @@ We provide a fully functional example that can help you understand how to use SA
In particular, you might find it helpful to review the contents of the `/var/www/simplesamlphp/config/` and `/var/www/simplesamlphp/metadata/` directories.
{% endcomment %}
## Activating SAML
@ -300,13 +298,13 @@ authc:
Because most of the SAML-specific configuration is done in the security plugin, just activate SAML in your `opensearch_dashboards.yml` by adding the following:
```
plugins.security.auth.type: "saml"
```yml
opensearch_security.auth.type: "saml"
```
In addition, the OpenSearch Dashboards endpoint for validating the SAML assertions must be whitelisted:
```
```yml
server.xsrf.whitelist: ["/_plugins/_security/saml/acs"]
```

View File

@ -47,6 +47,21 @@ If you use the `-f` argument rather than `-cd`, you can load a single YAML file
-key ../../../config/kirk-key.pem
```
To resolve all environment variables before applying the security configurations, use the `-rev` parameter.
```bash
./securityadmin.sh -cd ../securityconfig/ \
-rev \
-cacert ../../../root-ca.pem \
-cert ../../../kirk.pem \
-key ../../../kirk.key.pem
```
Here's an example of an environment variable in the `config.yml` file:
```yml
password: ${env.LDAP_PASSWORD}
```
## Configure the admin certificate

View File

@ -126,7 +126,7 @@ plugins.security.restapi.password_validation_error_message: "Password must be mi
## whitelist.yml
You can use `whitelist.yml` to allow list any endpoints and HTTP requests. If enabled, all users except the SuperAdmin are allowed access to only the specified endpoints and HTTP requests, and all other HTTP requests associated with the endpoint are denied. For example, if GET `_cluster/settings` is allow listed, users cannot submit PUT requests to `_cluster/settings` to update cluster settings.
You can use `whitelist.yml` to add any endpoints and HTTP requests to a list of allowed endpoints and requests. If enabled, all users except the super admin are allowed access to only the specified endpoints and HTTP requests, and all other HTTP requests associated with the endpoint are denied. For example, if GET `_cluster/settings` is added to the allow list, users cannot submit PUT requests to `_cluster/settings` to update cluster settings.
Note that while you can configure access to endpoints this way, for most cases, it is still best to configure permissions using the security plugin's users and roles, which have more granular settings.
@ -165,7 +165,7 @@ requests:
- PUT
```
You can also allow list custom indices. `whitelist.yml` doesn't support wildcards, so you must manually specify all of the indices you want to allow list.
You can also add custom indices to the allow list. `whitelist.yml` doesn't support wildcards, so you must manually specify all of the indices you want to add.
```yml
requests: # Only allow GET requests to /sample-index1/_doc/1 and /sample-index2/_doc/1
@ -315,6 +315,10 @@ _meta:
## tenants.yml
You can use this file to specify and add any number of OpenSearch Dashboards tenants to your OpenSearch cluster. For more information about tenants, see [OpenSearch Dashboards multi-tenancy]({{site.url}}{{site.baseurl}}/security-plugin/access-control/multi-tenancy).
As with the other YAML files, we recommend using `tenants.yml` only to add the tenants you must have in your cluster, and then using OpenSearch Dashboards or the [REST API]({{site.url}}{{site.baseurl}}/security-plugin/access-control/api/#tenants) if you need to further configure or create any other tenants.
```yml
---
_meta:
@ -325,9 +329,12 @@ admin_tenant:
description: "Demo tenant for admin user"
```
## nodes_dn.yml
`nodes_dn.yml` lets you add certificates' [distinguished names (DNs)]({{site.url}}{{site.baseurl}}/security-plugin/configuration/generate-certificates/#add-distinguished-names-to-opensearchyml) to an allow list to enable communication between any number of nodes and/or clusters. For example, a node that has the DN `CN=node1.example.com` in its allow list accepts communication from any other node or certificate that uses that DN.
The DNs get indexed into a [system index]({{site.url}}{{site.baseurl}}/security-plugin/configuration/system-indices) that only a super admin or an admin with a Transport Layer Security (TLS) certificate can access. If you want to programmatically add DNs to your allow lists, use the [REST API]({{site.url}}{{site.baseurl}}/security-plugin/access-control/api/#distinguished-names).
```yml
---
_meta:

View File

@ -11,16 +11,47 @@ redirect_from: /troubleshoot/
This page contains a list of common issues and workarounds.
## Java error during startup
You might see `[ERROR][c.a.o.s.s.t.OpenSearchSecuritySSLNettyTransport] [opensearch-node1] SSL Problem Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)` when starting OpenSearch. This problem is a [known issue with Java](https://bugs.openjdk.java.net/browse/JDK-8221218) and doesn't affect the operation of the cluster.
## OpenSearch Dashboards fails to start
If you encounter the error `FATAL Error: Request Timeout after 30000ms` during startup, try running OpenSearch Dashboards on a more powerful machine. We recommend four CPU cores and 8 GB of RAM.
## Requests to OpenSearch Dashboards fail with "Request must contain a osd-xsrf header"
If you run legacy Kibana OSS scripts against OpenSearch Dashboards---for example, curl commands that import saved objects from a file---they might fail with the following error:
```json
{"status": 400, "body": "Request must contain a osd-xsrf header."}
```
In this case, your scripts likely include the `"kbn-xsrf: true"` header. Switch it to the `osd-xsrf: true` header:
```
curl -XPOST -u 'admin:admin' 'https://DASHBOARDS_ENDPOINT/api/saved_objects/_import' -H 'osd-xsrf:true' --form file=@export.ndjson
```
## Multi-tenancy issues in OpenSearch Dashboards
If you're testing multiple users in OpenSearch Dashboards and encounter unexpected changes in tenant, use Google Chrome in an Incognito window or Firefox in a Private window.
## Expired certificates
If your certificates have expired, you might receive the following error or something similar:
```
ERROR org.opensearch.security.ssl.transport.SecuritySSLNettyTransport - Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Thu Sep 16 11:27:55 PDT 2021
```
To check the expiration date for a certificate, run this command:
```bash
openssl x509 -enddate -noout -in <certificate>
```
## Encryption at rest
The operating system for each OpenSearch node handles encryption of data at rest. To enable encryption at rest in most Linux distributions, use the `cryptsetup` command:
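The following is only a minimal sketch; the device path (`/dev/sdb1`), mapper name, and mount point are assumptions, and the exact steps vary by distribution:

```bash
# Format the partition with LUKS encryption (this destroys any existing data on the device)
cryptsetup luksFormat --type luks2 /dev/sdb1

# Open the encrypted partition under a mapper name
cryptsetup open /dev/sdb1 opensearch-data

# Create a filesystem on the mapped device and mount it where OpenSearch stores its data
mkfs.ext4 /dev/mapper/opensearch-data
mount /dev/mapper/opensearch-data /var/lib/opensearch
```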
@ -85,8 +116,3 @@ The security plugin blocks the update by script operation (`POST <index>/_update
## Illegal reflective access operation in logs
This is a known issue with Performance Analyzer that shouldn't affect functionality.
## Multi-tenancy issues in OpenSearch Dashboards
If you're testing multiple users in OpenSearch Dashboards and encounter unexpected changes in tenant, use Google Chrome in an Incognito window or Firefox in a Private window.

View File

@ -144,6 +144,12 @@ If you are upgrading an Open Distro for Elasticsearch cluster, we recommend firs
1. Port your settings from `elasticsearch.yml` to `opensearch.yml`. Most settings use the same names. At a minimum, specify `cluster.name`, `node.name`, `discovery.seed_hosts`, and `cluster.initial_master_nodes`.
1. (Optional) If you're actively connecting to the cluster with legacy clients that check for a particular version number, such as Logstash OSS, add a [compatibility setting]({{site.url}}{{site.baseurl}}/clients/agents-and-ingestion-tools/) to `opensearch.yml`:
```yml
compatibility.override_main_response_version: true
```
1. (Optional) Add your certificates to your `config` directory, add them to `opensearch.yml`, and initialize the security plugin.
1. Start OpenSearch on the node (rolling) or all nodes (cluster restart).
@ -173,3 +179,116 @@ If you are upgrading an Open Distro for Elasticsearch cluster, we recommend firs
}
}
```
## Upgrade tool
The `opensearch-upgrade` tool lets you automate some of the steps in [Upgrade to OpenSearch]({{site.url}}{{site.baseurl}}/upgrade-to/upgrade-to/#upgrade-to-opensearch), eliminating the need for error-prone manual operations.
The `opensearch-upgrade` tool performs the following functions:
- Imports any existing configurations and applies them to the new installation of OpenSearch.
- Installs any existing core plugins.
### Limitations
The `opensearch-upgrade` tool doesn't perform an end-to-end upgrade:
- You need to run the tool on each node of the cluster individually as part of the upgrade process.
- The tool doesn't provide a rollback option after you've upgraded a node, so make sure you follow best practices and take backups.
- You must install all community plugins (if available) manually.
- The tool validates keystore settings only at service start-up time, so you must manually remove any unsupported settings before the service can start.
### Using the upgrade tool
To perform a rolling upgrade using the [OpenSearch tarball]({{site.url}}{{site.baseurl}}/opensearch/install/tar/) distribution:
Check [Upgrade paths]({{site.url}}{{site.baseurl}}/upgrade-to/upgrade-to/#upgrade-paths) to make sure that the version you're upgrading to is supported and to determine whether you need to upgrade to a supported Elasticsearch OSS version first.
{: .note }
1. Disable shard allocation to prevent Elasticsearch OSS from replicating shards as you shut down nodes:
```json
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.enable": "primaries"
}
}
```
1. On any one of the nodes, download and extract the OpenSearch tarball to a new directory.
1. Make sure the following environment variables are set:
- `ES_HOME` - Path to the existing Elasticsearch installation home.
```bash
export ES_HOME=/home/workspace/upgrade-demo/node1/elasticsearch-7.10.2
```
- `ES_PATH_CONF` - Path to the existing Elasticsearch config directory.
```bash
export ES_PATH_CONF=/home/workspace/upgrade-demo/node1/os-config
```
- `OPENSEARCH_HOME` - Path to the OpenSearch installation home.
```bash
export OPENSEARCH_HOME=/home/workspace/upgrade-demo/node1/opensearch-1.0.0
```
- `OPENSEARCH_PATH_CONF` - Path to the OpenSearch config directory.
```bash
export OPENSEARCH_PATH_CONF=/home/workspace/upgrade-demo/node1/opensearch-config
```
1. The `opensearch-upgrade` tool is in the `bin` directory of the distribution. Run the following command from the distribution home:
Make sure you run this tool as the same user running the current Elasticsearch service.
{: .note }
```bash
./bin/opensearch-upgrade
```
1. Stop Elasticsearch OSS on the node.
On Linux distributions that use systemd, use this command:
```bash
sudo systemctl stop elasticsearch.service
```
For tarball installations, find the process ID (`ps aux`) and kill it (`kill <pid>`).
1. Start OpenSearch on the node:
```bash
./bin/opensearch -d
```
1. Repeat steps 2--6 until all nodes are using the new version.
1. After all nodes are using the new version, re-enable shard allocation:
```json
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}
```
### How it works
Behind the scenes, the `opensearch-upgrade` tool performs the following tasks in sequence:
1. Looks for a valid Elasticsearch installation on the current node. After it finds the installation, it reads the `elasticsearch.yml` file to get the endpoint details and connects to the locally running Elasticsearch service. If the tool can't find an Elasticsearch installation, it tries to get the path from the `ES_HOME` location.
1. Verifies if the existing version of Elasticsearch is compatible with the OpenSearch version. It prints a summary of the information gathered to the console and prompts you for a confirmation to proceed.
1. Imports the settings from the `elasticsearch.yml` config file into the `opensearch.yml` config file.
1. Copies across any custom JVM options from the `$ES_PATH_CONF/jvm.options.d` directory into the `$OPENSEARCH_PATH_CONF/jvm.options.d` directory. Similarly, it also imports the logging configurations from the `$ES_PATH_CONF/log4j2.properties` file into the `$OPENSEARCH_PATH_CONF/log4j2.properties` file.
1. Installs the core plugins that you've currently installed in the `$ES_HOME/plugins` directory. You must install all other third-party community plugins manually.
1. Imports the secure settings from the `elasticsearch.keystore` file (if any) into the `opensearch.keystore` file. If the keystore file is password protected, the `opensearch-upgrade` tool prompts you to enter the password.

Some files were not shown because too many files have changed in this diff Show More