Merge branch 'main' into configuration-env-change

This commit is contained in:
Andrew Etter 2021-10-05 15:18:14 -07:00 committed by GitHub
commit eb52f210db
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
198 changed files with 4655 additions and 799 deletions


@ -21,11 +21,9 @@ gem "jekyll-redirect-from", "~> 0.16"
# gem 'github-pages', group: :jekyll_plugins
# If you have any plugins, put them here!
# group :jekyll_plugins do
# # gem "jekyll-feed", "~> 0.6"
# gem "jekyll-remote-theme"
# gem "jekyll-redirect-from"
# end
group :jekyll_plugins do
gem "jekyll-sitemap"
end
# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]

2
NOTICE

@ -1 +1 @@
Copyright 2021 OpenSearch contributors.
Copyright OpenSearch contributors.


@ -1,3 +1,5 @@
<img src="https://opensearch.org/assets/img/opensearch-logo-themed.svg" height="64px">
# OpenSearch documentation
This repository contains the documentation for OpenSearch, the search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more. You can find the rendered documentation at [opensearch.org/docs](https://opensearch.org/docs).
@ -131,15 +133,15 @@ If you're making major changes to the documentation and need to see the rendered
sh build.sh
```
1. If the build script doesn't automatically open your web browser (it should), open [http://localhost:4000/](http://localhost:4000/).
1. If the build script doesn't automatically open your web browser (it should), open [http://localhost:4000/docs/](http://localhost:4000/docs/).
1. Create a new branch.
1. Edit the Markdown files in `/docs`.
1. Edit the Markdown files in each collection (e.g. `_security-plugin/`).
If you're a web developer, you can customize `_layouts/default.html` and `_sass/custom/custom.scss`.
1. When you save a file, marvel as Jekyll automatically rebuilds the site and refreshes your web browser. This process takes roughly 30 seconds.
1. When you save a file, marvel as Jekyll automatically rebuilds the site and refreshes your web browser. This process can take anywhere from 10-30 seconds.
1. When you're happy with how everything looks, commit, push your changes to your fork, and submit a pull request.
@ -194,7 +196,7 @@ If you're making major changes to the documentation and need to see the rendered
## New releases
1. Branch.
1. Change the `opensearch_version` and `opensearch_major_version` variables in `_config.yml`.
1. Change the `opensearch_version` and `opensearch_major_minor_version` variables in `_config.yml`.
1. Start up a new cluster using the updated Docker Compose file in `docs/install/docker.md`.
1. Update the version table in `version-history.md`.
@ -223,7 +225,7 @@ This documentation uses a modified version of the [just-the-docs](https://github
## Get started
New
{: .label .label-green :}
{: .label .label-green }
```
* Labels come in default (blue), green, purple, yellow, and red.
@ -236,6 +238,27 @@ These classes can help with readability, but should be used *sparingly*. Each ad
Besides, standard Markdown elements suffice for most documentation.
## Labels for APIs
Each API operation has a label indicating when it was introduced. For most operations, this label is 1.0:
```
## Get roles
Introduced 1.0
{: .label .label-purple }
```
If we introduce a breaking change to an operation, add an additional label with a link to the release note for that breaking change:
```
## Get roles
Introduced 1.0
{: .label .label-purple }
[Last breaking change 2.0](https://example.com)
{: .label .label-red }
```
## Math
If you want to use the sorts of pretty formulas that [MathJax](https://www.mathjax.org) allows, add `has_math: true` to the Jekyll page metadata. Then insert LaTeX math into HTML tags with the rest of your Markdown content:
@ -271,4 +294,4 @@ This project is licensed under the Apache-2.0 License.
## Copyright
Copyright 2021 OpenSearch contributors.
Copyright OpenSearch contributors.


@ -27,14 +27,19 @@ PUT _cluster/settings
}
```
For a longer term solution, we plan to create an OpenSearch output plugin for Logstash. This plugin *does not exist yet*, but we've included it in the compatibility matrices below based on its expected behavior.
[Just like any other setting]({{site.url}}{{site.baseurl}}/opensearch/configuration/), the alternative is to add the following line to `opensearch.yml` on each node and then restart the node:
```yml
compatibility.override_main_response_version: true
```
## Downloads
These versions of Logstash and Beats offer the best compatibility with OpenSearch. For more information, see the [compatibility matrices](#compatibility-matrices).
You can download the OpenSearch output plugin for Logstash from [OpenSearch downloads](https://opensearch.org/downloads.html). The Logstash output plugin is compatible with OpenSearch and Elasticsearch OSS (7.10.2 or lower).
These are the latest versions of Beats OSS with OpenSearch compatibility. For more information, see the [compatibility matrices](#compatibility-matrices).
- [Logstash OSS 7.12.1](https://www.elastic.co/downloads/past-releases/logstash-oss-7-12-1)
- [Filebeat OSS 7.12.1](https://www.elastic.co/downloads/past-releases/filebeat-oss-7-12-1)
- [Metricbeat OSS 7.12.1](https://www.elastic.co/downloads/past-releases/metricbeat-oss-7-12-1)
- [Packetbeat OSS 7.12.1](https://www.elastic.co/downloads/past-releases/packetbeat-oss-7-12-1)
@ -42,6 +47,9 @@ These versions of Logstash and Beats offer the best compatibility with OpenSearc
- [Winlogbeat OSS 7.12.1](https://www.elastic.co/downloads/past-releases/winlogbeat-oss-7-12-1)
- [Auditbeat OSS 7.12.1](https://elastic.co/downloads/past-releases/auditbeat-oss-7-12-1)
Some users report compatibility issues with ingest pipelines on these versions of Beats. If you use ingest pipelines with OpenSearch, consider using the 7.10.2 versions of Beats instead.
{: .note }
## Compatibility Matrices
@ -50,28 +58,26 @@ These versions of Logstash and Beats offer the best compatibility with OpenSearc
### Compatibility Matrix for Logstash
| | Logstash OSS 7.x to 7.11.x | Logstash OSS 7.12.x\* | Logstash 7.13.x without OpenSearch output plugin | Logstash 7.13.x with OpenSearch output plugin\*\* |
| | Logstash OSS 7.x to 7.11.x | Logstash OSS 7.12.x\* | Logstash 7.13.x without OpenSearch output plugin | Logstash 7.13.x with OpenSearch output plugin |
| :---| :--- | :--- | :--- | :--- |
| Elasticsearch OSS v7.x to v7.9.x | *Yes* | *Yes* | *No* | *Yes* |
| Elasticsearch OSS v7.10.2 | *Yes* | *Yes* | *No* | *Yes* |
| ODFE OSS v1.x to 1.12 | *Yes* | *Yes* | *No* | *Yes* |
| Elasticsearch OSS 7.x to 7.9.x | *Yes* | *Yes* | *No* | *Yes* |
| Elasticsearch OSS 7.10.2 | *Yes* | *Yes* | *No* | *Yes* |
| ODFE 1.x to 1.12 | *Yes* | *Yes* | *No* | *Yes* |
| ODFE 1.13 | *Yes* | *Yes* | *No* | *Yes* |
| OpenSearch 1.0 | [Yes via version setting](https://github.com/opensearch-project/OpenSearch/issues/693) | [Yes via version setting](https://github.com/opensearch-project/OpenSearch/issues/693) | *No* | *Yes* |
| OpenSearch 1.0 | Yes via version setting | Yes via version setting | *No* | *Yes* |
\* Most current compatible version with Elasticsearch OSS.
\*\* Planning to build.
### Compatibility Matrix for Beats
| | Beats OSS 7.x to 7.11.x\*\* | Beats OSS 7.12.x\* | Beats 7.13.x |
| :--- | :--- | :--- | :--- |
| Elasticsearch OSS v7.x to v7.9.x | *Yes* | *Yes* | No |
| Elasticsearch OSS v7.10.2 | *Yes* | *Yes* | No |
| ODFE OSS v1.x to 1.12 | *Yes* | *Yes* | No |
| Elasticsearch OSS 7.x to 7.9.x | *Yes* | *Yes* | No |
| Elasticsearch OSS 7.10.2 | *Yes* | *Yes* | No |
| ODFE 1.x to 1.12 | *Yes* | *Yes* | No |
| ODFE 1.13 | *Yes* | *Yes* | No |
| OpenSearch 1.0 | [Yes via version setting](https://github.com/opensearch-project/OpenSearch/issues/693) | [Yes via version setting](https://github.com/opensearch-project/OpenSearch/issues/693) | No |
| OpenSearch 1.0 | Yes via version setting | Yes via version setting | No |
| Logstash OSS 7.x to 7.11.x | *Yes* | *Yes* | *Yes* |
| Logstash OSS 7.12.x\* | *Yes* | *Yes* | *Yes* |
| Logstash 7.13.x with OpenSearch output plugin | *Yes* | *Yes* | *Yes* |


@ -3,9 +3,6 @@ layout: default
title: OpenSearch CLI
nav_order: 52
has_children: false
redirect_from:
- /docs/odfe-cli/
- /docs/cli/
---
# OpenSearch CLI

145
_clients/go.md Normal file

@ -0,0 +1,145 @@
---
layout: default
title: Go client
nav_order: 80
---
# Go client
The OpenSearch Go client lets you connect your Go application with the data in your OpenSearch cluster.
## Setup
If you're creating a new project:
```go
go mod init
```
To add the client to your project, install it like any other module:
```go
go get github.com/opensearch-project/opensearch-go
```
## Sample code
This sample code creates a client, adds an index with non-default settings, inserts a document, searches for the document, deletes the document, and finally deletes the index:
```go
package main
import (
"os"
"context"
"crypto/tls"
"fmt"
opensearch "github.com/opensearch-project/opensearch-go"
opensearchapi "github.com/opensearch-project/opensearch-go/opensearchapi"
"net/http"
"strings"
)
const IndexName = "go-test-index1"
func main() {
// Initialize the client with SSL/TLS enabled.
client, err := opensearch.NewClient(opensearch.Config{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
Addresses: []string{"https://localhost:9200"},
Username: "admin", // For testing only. Don't store credentials in code.
Password: "admin",
})
if err != nil {
fmt.Println("cannot initialize", err)
os.Exit(1)
}
// Print OpenSearch version information on console.
fmt.Println(client.Info())
// Define the index settings.
settings := strings.NewReader(`{
"settings": {
"index": {
"number_of_shards": 4
}
}
}`)
// Create an index with non-default settings.
createIndex := opensearchapi.IndicesCreateRequest{
Index: IndexName,
Body: settings,
}
createIndexResponse, err := createIndex.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to create index ", err)
os.Exit(1)
}
fmt.Println("creating index", createIndexResponse)
// Add a document to the index.
document := strings.NewReader(`{
"title": "Moneyball",
"director": "Bennett Miller",
"year": "2011"
}`)
docId := "1"
req := opensearchapi.IndexRequest{
Index: IndexName,
DocumentID: docId,
Body: document,
}
insertResponse, err := req.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to insert document ", err)
os.Exit(1)
}
fmt.Println(insertResponse)
// Search for the document.
content := strings.NewReader(`{
"size": 5,
"query": {
"multi_match": {
"query": "miller",
"fields": ["title^2", "director"]
}
}
}`)
search := opensearchapi.SearchRequest{
Body: content,
}
searchResponse, err := search.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to search document ", err)
os.Exit(1)
}
fmt.Println(searchResponse)
// Delete the document.
delete := opensearchapi.DeleteRequest{
Index: IndexName,
DocumentID: docId,
}
deleteResponse, err := delete.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to delete document ", err)
os.Exit(1)
}
fmt.Println("deleting document")
fmt.Println(deleteResponse)
// Delete previously created index.
deleteIndex := opensearchapi.IndicesDeleteRequest{
Index: []string{IndexName},
}
deleteIndexResponse, err := deleteIndex.Do(context.Background(), client)
if err != nil {
fmt.Println("failed to delete index ", err)
os.Exit(1)
}
fmt.Println("deleting index", deleteIndexResponse)
}
```

10
_clients/grafana.md Normal file

@ -0,0 +1,10 @@
---
layout: default
title: Grafana
nav_order: 150
has_children: false
---
# Grafana support
Grafana has a data source plugin that lets you explore and visualize your OpenSearch data. For information on getting started with the plugin, see the [Grafana overview page](https://grafana.com/grafana/plugins/grafana-opensearch-datasource/).

96
_clients/index.md Normal file

@ -0,0 +1,96 @@
---
layout: default
title: Compatibility
nav_order: 1
has_children: false
redirect_from:
- /clients/
---
# OpenSearch client compatibility
OpenSearch provides clients for several popular programming languages, with more coming. In general, clients are compatible with clusters running the same major version of OpenSearch (`major.minor.patch`).
For example, a 1.0.0 client works with an OpenSearch 1.1.0 cluster, but might not support any non-breaking API changes in OpenSearch 1.1.0. A 1.2.0 client works with the same cluster, but might allow you to pass unsupported options in certain functions. We recommend using the same version for both, but if your tests pass after a cluster upgrade, you don't necessarily need to upgrade your clients immediately.
{% comment %}
* [OpenSearch Java client]({{site.url}}{{site.baseurl}}/clients/java/)
{% endcomment %}
* [OpenSearch Python client]({{site.url}}{{site.baseurl}}/clients/python/)
* [OpenSearch JavaScript (Node.js) client]({{site.url}}{{site.baseurl}}/clients/javascript/)
* [OpenSearch Go client]({{site.url}}{{site.baseurl}}/clients/go/)
## Legacy clients
Most clients that work with Elasticsearch OSS 7.10.2 *should* work with OpenSearch, but the latest versions of those clients might include license or version checks that artificially break compatibility. This page includes recommendations around which versions of those clients to use for best compatibility with OpenSearch.
Client | Recommended version
:--- | :---
[Java low-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-client/7.13.4/jar) | 7.13.4
[Java high-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.4/jar) | 7.13.4
[Python Elasticsearch client](https://pypi.org/project/elasticsearch/7.13.4/) | 7.13.4
[Elasticsearch Node.js client](https://www.npmjs.com/package/@elastic/elasticsearch/v/7.13.0) | 7.13.0
If you test a legacy client and verify that it works, please [submit a PR](https://github.com/opensearch-project/documentation-website/pulls) and add it to this table.
{% comment %}
## Python 3 test code
This code indexes a single document and is equivalent to `PUT /python-test-index1/_doc/1`.
```python
from elasticsearch import Elasticsearch
host = 'localhost'
port = 9200
# For testing only. Do not store credentials in code.
auth = ('admin', 'admin')
es = Elasticsearch(
hosts = [{'host': host, 'port': port}],
http_auth = auth,
use_ssl = True,
verify_certs = False
)
document = {
"title": "Moneyball",
"director": "Bennett Miller",
"year": "2011"
}
response = es.index(index='python-test-index1', id='1', body=document, refresh=True)
print(response)
```
## Node.js test code
This code is equivalent to `GET /`.
```js
const { Client } = require('@elastic/elasticsearch')
const client = new Client({
node: 'https://localhost:9200',
auth: {
// For testing only. Don't store credentials in code.
username: 'admin',
password: 'admin'
},
ssl: {
// ca: fs.readFileSync('./cacert.pem'),
rejectUnauthorized: false
}
})
async function run () {
const { body } = await client.info();
console.log(body);
}
run().catch(console.log)
```
{% endcomment %}


@ -1,10 +1,10 @@
---
layout: default
title: Java high-level REST client
nav_order: 97
title: Elasticsearch OSS Java high-level REST client
nav_order: 60
---
# Java high-level REST client
# Elasticsearch OSS Java high-level REST client
The Elasticsearch OSS Java high-level REST client allows you to interact with your OpenSearch clusters and indices through Java methods and data structures rather than HTTP methods and JSON.
@ -22,7 +22,7 @@ To start using the Elasticsearch OSS Java high-level REST client, ensure that yo
</dependency>
```
You can now start your OpenSearch cluster. The 7.10.2 high-level REST client works with the 1.x versions of OpenSearch.
You can now start your OpenSearch cluster. The 7.10.2 Elasticsearch OSS high-level REST client works with the 1.x versions of OpenSearch.
## Sample code
@ -79,7 +79,7 @@ public class RESTClientSample {
RestHighLevelClient client = new RestHighLevelClient(builder);
//Create a non-default index with custom settings and mappings.
CreateIndexRequest createIndexRequest = new CreateIndexRequest("test-index");
CreateIndexRequest createIndexRequest = new CreateIndexRequest("custom-index");
createIndexRequest.settings(Settings.builder() //Specify in the settings how many shards you want in the index.
.put("index.number_of_shards", 4)
@ -93,7 +93,7 @@ public class RESTClientSample {
HashMap<String, Object> mapping = new HashMap<String, Object>();
mapping.put("properties", ageMapping);
createIndexRequest.mapping(mapping);
CreateIndexResponse createIndexResponse = client.indices().create(createIndexRequest, RequestOptions.DEFAULT
CreateIndexResponse createIndexResponse = client.indices().create(createIndexRequest, RequestOptions.DEFAULT);
//Adding data to the index.
IndexRequest request = new IndexRequest("custom-index"); //Add a document to the custom-index we created.

141
_clients/javascript.md Normal file

@ -0,0 +1,141 @@
---
layout: default
title: JavaScript client
nav_order: 90
---
# JavaScript client
The OpenSearch JavaScript client provides a safer and easier way to interact with your OpenSearch cluster. Rather than using OpenSearch from the browser and potentially exposing your data to the public, you can build an OpenSearch client that takes care of sending requests to your cluster.
The client contains a library of APIs that let you perform different operations on your cluster and return a standard response body. The example here demonstrates some basic operations like creating an index, adding documents, and searching your data.
## Setup
To add the client to your project, install it from [npm](https://www.npmjs.com):
```bash
npm install @opensearch-project/opensearch
```
To install a specific major version of the client, run the following command:
```bash
npm install @opensearch-project/opensearch@<version>
```
If you prefer to add the client manually or just want to examine the source code, see [opensearch-js](https://github.com/opensearch-project/opensearch-js) on GitHub.
Then require the client:
```javascript
const { Client } = require("@opensearch-project/opensearch");
```
## Sample code
```javascript
"use strict";
var host = "localhost";
var protocol = "https";
var port = 9200;
var auth = "admin:admin"; // For testing only. Don't store credentials in code.
var ca_certs_path = "/full/path/to/root-ca.pem";
// Optional client certificates if you don't want to use HTTP basic authentication.
// var client_cert_path = '/full/path/to/client.pem'
// var client_key_path = '/full/path/to/client-key.pem'
// Create a client with SSL/TLS enabled.
var { Client } = require("@opensearch-project/opensearch");
var fs = require("fs");
var client = new Client({
node: protocol + "://" + auth + "@" + host + ":" + port,
ssl: {
ca: fs.readFileSync(ca_certs_path),
// You can turn off certificate verification (rejectUnauthorized: false) if you're using self-signed certificates with a hostname mismatch.
// cert: fs.readFileSync(client_cert_path),
// key: fs.readFileSync(client_key_path)
},
});
async function search() {
// Create an index with non-default settings.
var index_name = "books";
var settings = {
settings: {
index: {
number_of_shards: 4,
number_of_replicas: 3,
},
},
};
var response = await client.indices.create({
index: index_name,
body: settings,
});
console.log("Creating index:");
console.log(response.body);
// Add a document to the index.
var document = {
title: "The Outsider",
author: "Stephen King",
year: "2018",
genre: "Crime fiction",
};
var id = "1";
var response = await client.index({
id: id,
index: index_name,
body: document,
refresh: true,
});
console.log("Adding document:");
console.log(response.body);
// Search for the document.
var query = {
query: {
match: {
title: {
query: "The Outsider",
},
},
},
};
var response = await client.search({
index: index_name,
body: query,
});
console.log("Search results:");
console.log(response.body.hits);
// Delete the document.
var response = await client.delete({
index: index_name,
id: id,
});
console.log("Deleting document:");
console.log(response.body);
// Delete the index.
var response = await client.indices.delete({
index: index_name,
});
console.log("Deleting index:");
console.log(response.body);
}
search().catch(console.log);
```


@ -0,0 +1,246 @@
---
layout: default
title: Advanced configurations
parent: Logstash
nav_order: 230
---
# Advanced configurations
This section describes how to set up advanced configuration options, like referencing field values and conditional statements, for Logstash.
## Referencing field values
To access a field, refer to it by name, for example `request`.
You can also surround the field name with square brackets (`[request]`), which makes it more explicit that you're referring to a field.
For example, if you have the following event:
```bash
{
"request": "/products/view/123",
"verb": "GET",
"response": 200,
"headers": {
"request_path" => "/"
}
}
```
To access the `request` field, use `request` or, more explicitly, `[request]`.
To reference nested fields, use the square bracket syntax and specify the path to the field, with each level enclosed in square brackets: `[headers][request_path]`.
You can also reference fields using the `sprintf` format, also known as string expansion: add a `%` sign and wrap the field reference in curly brackets, for example `%{request}`.
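Here's a minimal sketch that uses string expansion in an output, assuming the event has a `verb` field like the sample event above:
```yml
output {
  stdout {
    # Prints a line like "Request method: GET" for each event.
    codec => line { format => "Request method: %{verb}" }
  }
}
```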
You need to reference field values when using conditional statements.
For example, you can make the file name dynamic so that it contains the type of the processed events: either `access` or `error`. The `type` option is mainly used to conditionally apply filter plugins based on the type of event being processed.
Let's add a `type` option and specify a value of `access`.
```yml
input {
file {
path => ""
start_position => "beginning"
type => "access"
}
http {
type => "access"
}
}
filter {
mutate {
remove_field => ["host"]
}
}
output {
stdout {
codec => rubydebug
}
file {
path => "%{[type]}.log"
}
}
```
Start Logstash and send an HTTP request. The processed event is output in the terminal. The event now includes a field named `type`.
You'll see the `access.log` file created within the Logstash directory.
## Conditional statements
You can use conditional statements to control the flow of code execution based on some conditions.
Syntax:
```yml
if EXPR {
...
} else if EXPR {
...
} else {
...
}
```
`EXPR` is any valid Logstash syntax that evaluates to a boolean value.
For example, you can check if an event type is set to `access` or `error` and perform some action based on that:
```yml
if [type] == "access" {
...
} else if [type] == "error" {
file { .. }
} else {
...
}
```
You can compare a field value to some arbitrary value:
```yml
if [headers][content_length] >= 1000 {
...
}
```
You can use regular expressions:
```yml
if [some_field] =~ /[0-9]+/ {
  # some_field contains one or more digits
}
```
You can use arrays:
```yml
if [some_field] in ["one", "two", "three"] {
# some_field is either "one", "two", or "three"
}
```
You can use boolean operators:
```yml
if [type] == "access" or [type] == "error" {
...
}
```
## Formatting dates
You can use the `sprintf` format or string expansion to format dates.
For example, you might want the current date to be part of the filename.
To format the date, add a plus sign in curly brackets followed by the date format - `%{+yyyy-MM-dd}`.
```yml
file {
path => "%{[type]}_%{+yyyy_MM_dd}.log"
}
```
This date comes from the `@timestamp` field, which holds the time and date of the event.
Send a request to the pipeline and verify that the output file name contains the event's date.
You can embed the date in other outputs as well, for example into the index name in OpenSearch.
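Here's a sketch of such an output (the host, credentials, and index prefix are placeholders):
```yml
output {
  opensearch {
    hosts => "https://localhost:9200"
    user => "admin"
    password => "admin"
    # Events from September 20, 2017 go to an index named web-logs-2017.09.20.
    index => "web-logs-%{+YYYY.MM.dd}"
    ssl_certificate_verification => false
  }
}
```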
## Sending time information
You can set the time of events.
Logstash already sets the time when the event is received by the input plugin within the @timestamp field.
In some scenarios, you might need to use a different timestamp.
For example, say you have an eCommerce store and you process the orders daily at midnight. When Logstash receives the events at midnight, it sets the timestamp to the current time.
But you want it to be the time the order was placed, not the time Logstash received the event.
Let's change the event timestamp to the date the request was received by the web server. You can do this with the `date` filter plugin.
The `date` filter parses a `date` or `datetime` value from a field and uses the result as the event timestamp.
Add the `date` plugin at the bottom of the `filter` block:
```yml
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
```
`timestamp` is the field that the `grok` pattern creates.
`Z` is the time zone, that is, the UTC offset.
Start Logstash and send an HTTP request.
You can see that the filename contains the date of the request instead of the present date.
If date parsing fails, the `date` filter adds a tag named `_dateparsefailure` to the `tags` field.
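If you need to handle those failures, here's a sketch that routes events tagged with `_dateparsefailure` to a separate file (the output path is an assumption):
```yml
output {
  if "_dateparsefailure" in [tags] {
    file {
      path => "date-parse-failures.log"
    }
  }
}
```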
After you have set the @timestamp field to a new value, you don't really need the other `timestamp` field anymore. You can remove it with the `remove_field` option.
```yml
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
remove_field => [ "timestamp" ]
}
```
## Parsing user agents
The user agent is the last part of a log entry that consists of the name of the browser, the browser version, and the OS of the device.
Users might be using a wide range of browsers, devices, and operating systems, so parsing this information manually is hard.
You can't easily use `grok` patterns here because a `grok` pattern matches the user agent string as a whole and doesn't figure out, for instance, which browser the visitor used.
Logstash ships with a file of regular expressions for this purpose, which makes it easy to extract user agent information that you can send to OpenSearch and run aggregations on.
To do this, add a `source` option that contains the name of the field. In this case, that's the `agent` field.
By default, the `useragent` plugin adds a number of fields at the top level of the event.
Because that can get confusing, add an option named `target` with a value of `ua` (short for user agent). This nests the fields within an object named `ua`, which keeps things more organized.
```yml
useragent {
source => "agent"
target => "ua"
}
```
Start Logstash and send an HTTP request.
You can see a field named `ua` with a number of keys, including the browser name and version, the OS, and the device.
You could use OpenSearch Dashboards to create a pie chart that shows how many visitors are on mobile devices and how many are desktop users. Or you could get statistics on which browser versions are popular.
## Enriching geographical data
You can take an IP address and perform a geographical lookup to resolve the user's location using the `geoip` filter.
The `geoip` filter plugin ships with a database called GeoLite2, which is provided by a company named MaxMind. GeoLite2 is a popular source of geographical data, and it's available for free.
Add the `geoip` plugin at the bottom of the `else` block.
The value of the `source` option is the name of the field containing the IP address. In this case, that's `clientip`, which you can make available using a `grok` pattern.
```yml
geoip {
source => "clientip"
}
```
Start Logstash and send an HTTP request.
Within the terminal, you see a new field named `geoip` that contains information such as the timezone, country, continent, city, postal code, and the latitude / longitude pair.
If you only need the country name for instance, include an option named `fields` with an array of the field names that you want the `geoip` plugin to return.
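Here's a minimal sketch that keeps only the country name (`country_name` is a standard `geoip` field name):
```yml
geoip {
  source => "clientip"
  # Only keep the country name from the lookup results.
  fields => ["country_name"]
}
```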
Some fields, such as the city name and region, are not always available because translating IP addresses into geographical locations is generally not that accurate. If the `geoip` plugin fails to look up the geographical location, it adds a tag named `_geoip_lookup_failure`.
You can use the `geoip` plugin with the OpenSearch output because the `location` object within the `geoip` object is a standard format for representing geospatial data in JSON. This is the same format that OpenSearch uses for its `geo_point` data type.
You can use the powerful geospatial queries of OpenSearch for working with geographical data.


@ -0,0 +1,157 @@
---
layout: default
title: Common filter plugins
parent: Logstash
nav_order: 220
---
# Common filter plugins
This page contains a list of common filter plugins.
## mutate
You can use the `mutate` filter to change the data type of a field. For example, you can use the `mutate` filter if you're sending events to OpenSearch and you need to change the data type of a field to match any existing mappings.
To convert the `quantity` field from a `string` type to an `integer` type:
```yml
input {
http {
host => "127.0.0.1"
port => 8080
}
}
filter {
mutate {
convert => {"quantity" => "integer"}
}
}
output {
file {
path => "output.txt"
}
}
```
#### Sample output
You can see that the type of the `quantity` field is changed from a `string` to an `integer`.
```yml
{
"quantity" => 3,
"host" => "127.0.0.1",
"@timestamp" => 2021-05-23T19:02:08.026Z,
"amount" => 10,
"@version" => "1",
"headers" => {
"request_path" => "/",
"connection" => "keep-alive",
"content_length" => "41",
"http_user_agent" => "PostmanRuntime/7.26.8",
"request_method" => "PUT",
"cache_control" => "no-cache",
"http_accept" => "*/*",
"content_type" => "application/json",
"http_version" => "HTTP/1.1",
"http_host" => "127.0.0.1:8080",
"accept_encoding" => "gzip, deflate, br",
"postman_token" => "ffd1cdcb-7a1d-4d63-90f8-0f2773069205"
}
}
```
Other data types you can convert to are `float`, `string`, and `boolean` values. If you pass in an array, the `mutate` filter converts all the elements in the array. If you pass a `string` like "world" to cast to an `integer` type, the result is 0 and Logstash continues processing events.
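For example, a sketch that converts several fields in one `mutate` filter (the field names are illustrative):
```yml
filter {
  mutate {
    convert => {
      "quantity" => "integer"
      "amount" => "float"
      "in_stock" => "boolean"
    }
  }
}
```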
Logstash supports a few common options for all filter plugins:
Option | Description
:--- | :---
`add_field` | Adds one or more fields to the event.
`remove_field` | Removes one or more fields from the event.
`add_tag` | Adds one or more tags to the event. You can use tags to perform conditional processing on events depending on which tags they contain.
`remove_tag` | Removes one or more tags from the event.
For example, you can remove the `host` field from the event:
```yml
input {
http {
host => "127.0.0.1"
port => 8080
}
}
filter {
mutate {
remove_field => ["host"]
}
}
output {
file {
path => "output.txt"
}
}
```
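Similarly, here's a sketch that uses `add_field` and `add_tag` together (the field value and tag name are illustrative):
```yml
filter {
  mutate {
    # Attach a static field and a tag that later stages can match on.
    add_field => { "environment" => "production" }
    add_tag => ["mutated"]
  }
}
```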
## grok
With the `grok` filter, you can parse unstructured data and structure it into fields. The `grok` filter uses text patterns to match text in your logs. You can think of text patterns as variables containing regular expressions.
The format of a text pattern is as follows:
```bash
%{SYNTAX:SEMANTIC}
```
`SYNTAX` is the format a piece of text should be in for the pattern to match. You can enter any of `grok`'s predefined patterns. For example, you can use the email identifier to match an email address from a given piece of text.
`SEMANTIC` is an arbitrary name for the matched text. For example, if you're using the email identifier syntax, you can name it “email.”
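Here's a minimal sketch that captures an email address into a field named `email` (`EMAILADDRESS` is one of `grok`'s predefined patterns):
```yml
filter {
  grok {
    match => { "message" => "%{EMAILADDRESS:email}" }
  }
}
```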
The following request consists of the IP address of the visitor, name of the visitor, the timestamp of the request, the HTTP verb and URL, the HTTP status code, and the number of bytes:
```bash
184.252.108.229 - joe [20/Sep/2017:13:22:22 +0200] GET /products/view/123 200 12798
```
To split this request into different fields:
```yml
filter {
  grok {
    match => { "message" => "%{IP:ip_address} - %{USER:identity} \[%{HTTPDATE:reg_ts}\] %{WORD:http_verb} %{URIPATHPARAM:req_path} %{INT:http_status:int} %{INT:num_bytes:int}" }
  }
}
```
where:
- `IP`: matches the IP address field.
- `USER`: matches the user name.
- `WORD`: matches the HTTP verb.
- `URIPATHPARAM`: matches the URI path.
- `INT`: matches the HTTP status field.
- `INT`: matches the number of bytes.
This is what the event looks like after the `grok` filter breaks it down into individual fields:
```yml
ip_address: 184.252.108.229
identity: joe
reg_ts: 20/Sep/2017:13:22:22 +0200
http_verb: GET
req_path: /products/view/123
http_status: 200
num_bytes: 12798
```
For common log formats, you can use the predefined [Logstash patterns](https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/ecs-v1). You can make any adjustments to the results with the `mutate` filter.


@ -0,0 +1,40 @@
---
layout: default
title: Logstash execution model
parent: Logstash
nav_order: 210
---
# Logstash execution model
Here's a brief introduction to how Logstash processes events internally.
## Handling events concurrently
You can configure Logstash to have a number of inputs listening for events. Each input runs in its own thread to avoid inputs blocking each other. If you have two incoming events at the same time, Logstash handles both events concurrently.
After receiving an event and possibly applying an input codec, Logstash sends the event to a work queue. Pipeline workers or batchers perform the rest of the work involving filters and outputs along with any codec used at the output. Each pipeline worker also runs within its own thread meaning that Logstash processes multiple events simultaneously.
## Processing events in batches
A pipeline worker consumes events from the work queue in batches to optimize the throughput of the pipeline as a whole.
One reason why Logstash works in batches is that some code needs to be executed regardless of how many events are processed at a time within the pipeline worker. Instead of executing that code 100 times for 100 events, it's more efficient to execute it once for a batch of 100 events.
Another reason is that a few output plugins group together events as batches. For example, if you send 100 requests to OpenSearch, the OpenSearch output plugin uses the bulk API to send a single request that groups together the 100 requests.
Logstash determines the batch size by two configuration options---a number representing the maximum batch size and the batch delay. The batch delay is how long Logstash waits before processing the unprocessed batch of events.
If you set the maximum batch size to 50 and the batch delay to 100 ms, Logstash processes a batch when there are either 50 unprocessed events in the work queue or when 100 milliseconds have elapsed.
The reason that a batch is processed even if the maximum batch size isn't reached is to reduce the delay in processing and to continue to process events in a timely manner. This works well for pipelines that process a low volume of events.
Imagine that you've got a pipeline that processes error logs from web servers and pushes them to OpenSearch. You're using OpenSearch Dashboards to analyze the error logs. Because you're possibly dealing with a fairly low number of events, it might take a long time to reach 50 events. Logstash processes the events before reaching this threshold because otherwise there would be a long delay before the errors appear in OpenSearch Dashboards.
The default batch size and batch delay work for most cases. You don't need to change the defaults unless you need to fine-tune performance.
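These two thresholds map to settings in `logstash.yml`; here's a minimal sketch using the example values from this section (shown as an illustration, not as recommended values):
```yml
pipeline.batch.size: 50
# The batch delay is specified in milliseconds.
pipeline.batch.delay: 100
```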
## Optimizing based on CPU cores
The number of pipeline workers is proportional to the number of CPU cores on the node.
If you have 5 workers running on a server with 2 CPU cores, the 5 workers can't process events concurrently. On the other hand, running only 5 workers on a server with 10 CPU cores limits the throughput of the Logstash instance.
Instead of running a fixed number of workers, which results in poor performance in some cases, Logstash examines the number of CPU cores of the instance and selects the number of pipeline workers to optimize its performance for the platform on which it's running. For instance, your local development machine might not have the same processing power as a production server, so you don't need to manually configure Logstash for different machines.
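If you do need to override the automatically selected value, the worker count is also a setting in `logstash.yml` (the value here is just an example):
```yml
# Assumed example: pin the pipeline to two worker threads.
pipeline.workers: 2
```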

358
_clients/logstash/index.md Normal file

@ -0,0 +1,358 @@
---
layout: default
title: Logstash
nav_order: 200
has_children: true
has_toc: true
redirect_from:
- /clients/logstash/
---
# Logstash
Logstash is a real-time event processing engine. It's part of the OpenSearch stack, which includes OpenSearch, Beats, and OpenSearch Dashboards.
You can send events to Logstash from many different sources. Logstash processes the events and sends them to one or more destinations. For example, you can send access logs from a web server to Logstash. Logstash extracts useful information from each log and sends it to a destination like OpenSearch.
Sending events to Logstash lets you decouple event processing from your app. Your app only needs to send events to Logstash and doesn't need to know anything about what happens to the events afterwards.
The open-source community originally built Logstash for processing log data, but now you can process any type of event, including events in XML or JSON format.
## Structure of a pipeline
The way that Logstash works is that you configure a pipeline that has three phases---inputs, filters, and outputs.
Each phase uses one or more plugins. Logstash has over 200 built-in plugins, so chances are you'll find what you need. Apart from the built-in plugins, you can use plugins from the community or even write your own.
The structure of a pipeline is as follows:
```yml
input {
input_plugin => {}
}
filter {
filter_plugin => {}
}
output {
output_plugin => {}
}
```
where:
* `input` receives events like logs from multiple sources simultaneously. Logstash supports a number of input plugins for TCP/UDP, files, syslog, Microsoft Windows EventLogs, stdin, HTTP, and so on. You can also use an open source collection of input tools called Beats to gather events. The input plugin sends the events to a filter.
* `filter` parses and enriches the events in one way or the other. Logstash has a large collection of filter plugins that modify events and pass them on to an output. For example, a `grok` filter parses unstructured events into fields and a `mutate` filter changes fields. Filters are executed sequentially.
* `output` ships the filtered events to one or more destinations. Logstash supports a wide range of output plugins for destinations like OpenSearch, TCP/UDP, emails, files, stdout, HTTP, Nagios, and so on.
Both the input and output phases support codecs to process events as they enter or exit the pipeline.
Some of the popular codecs are `json` and `multiline`. The `json` codec processes data that's in JSON format, and the `multiline` codec merges multiple-line events into a single line.
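For example, here's a sketch of the `multiline` codec on a file input that joins indented continuation lines into the previous event (the path and pattern are assumptions about the log format):
```yml
input {
  file {
    path => "/var/log/app/app.log"
    codec => multiline {
      # Lines that start with whitespace belong to the previous event.
      pattern => "^\s"
      what => "previous"
    }
  }
}
```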
You can also write conditional statements within pipeline configurations to perform certain actions if certain criteria are met.
## Install Logstash
The OpenSearch Logstash plugin has two installation options at this time: Linux (ARM64/X64) and Docker (ARM64/X64).
Make sure you have [Java Development Kit (JDK)](https://www.oracle.com/java/technologies/javase-downloads.html) version 8 or 11 installed.
### Tarball
1. Download the Logstash tarball from [OpenSearch downloads](https://opensearch.org/downloads.html).
2. Navigate to the downloaded folder in the terminal and extract the files:
```bash
tar -zxvf logstash-oss-with-opensearch-output-plugin-7.13.2-linux-x64.tar.gz
```
3. Navigate to the `logstash-7.13.2` directory.
- You can add your pipeline configurations to the `config` directory. Logstash saves any data from the plugins in the `data` directory. The `bin` directory contains the binaries for starting Logstash and managing plugins.
### Docker
1. Pull the Logstash OSS package with the OpenSearch output plugin image:
```
docker pull opensearchproject/logstash-oss-with-opensearch-output-plugin:7.13.2
```
1. Create a Docker network:
```
docker network create test
```
1. Start OpenSearch with this network:
```
docker run -p 9200:9200 -p 9600:9600 --name opensearch --net test -e "discovery.type=single-node" opensearchproject/opensearch:1.0.0
```
1. Start Logstash:
```
docker run -it --rm --name logstash --net test opensearchproject/logstash-oss-with-opensearch-output-plugin:7.13.2 -e 'input { stdin { } } output {
opensearch {
hosts => ["https://opensearch:9200"]
index => "opensearch-logstash-docker-%{+YYYY.MM.dd}"
user => "admin"
password => "admin"
ssl => true
ssl_certificate_verification => false
}
}'
```
## Process text from the terminal
You can define a pipeline that listens for events on `stdin` and outputs events on `stdout`. `stdin` and `stdout` refer to the terminal in which you're running Logstash.
To enter some text in the terminal and see the event data in the output:
1. Use the `-e` argument to pass a pipeline configuration directly to the Logstash binary. In this case, `stdin` is the input plugin and `stdout` is the output plugin:
```bash
bin/logstash -e "input { stdin { } } output { stdout { } }"
```
Add the `--debug` flag to see more detailed output.
2. Enter "hello world" in your terminal. Logstash processes the text and outputs it back to the terminal:
```yml
{
"message" => "hello world",
"host" => "a483e711a548.ant.amazon.com",
"@timestamp" => 2021-05-30T05:15:56.816Z,
"@version" => "1"
}
```
The `message` field contains your raw input. The `host` field is an IP address when you don't run Logstash locally. `@timestamp` shows the date and time when the event is processed. Logstash uses the `@version` field for internal processing.
3. Press `Ctrl + C` to shut down Logstash.
### Troubleshooting
If you already have a Logstash process running, you'll get an error. To fix this issue:
1. Delete the `.lock` file from the `data` directory:
```bash
cd data
rm -rf .lock
```
2. Restart Logstash.
## Process JSON or HTTP input and output it to a file
To define a pipeline that handles JSON requests:
1. Open the `config/pipeline.conf` file in any text editor you like. You can create a pipeline configuration file with any extension; the `.conf` extension is a Logstash convention. Add the `json` codec to accept JSON as the input and the `file` plugin to output the processed events to a `.txt` file:
```yml
input {
stdin {
codec => json
}
}
output {
file {
path => "output.txt"
}
}
```
To process inputs from a file, add an input file to the `events-data` directory and then pass its path to the `file` plugin at the input:
```yml
input {
file {
path => "events-data/input_data.log"
}
}
```
2. Start Logstash:
```bash
$ bin/logstash -f config/pipeline.conf
```
`config/pipeline.conf` is a relative path to the `pipeline.conf` file. You can use an absolute path as well.
3. Add a JSON object in the terminal:
```json
{ "amount": 10, "quantity": 2}
```
The pipeline only handles a single line of input. If you paste some JSON that spans multiple lines, you'll get an error.
4. Check that the fields from the JSON object are added to the `output.txt` file:
```json
$ cat output.txt
{
"@version": "1",
"@timestamp": "2021-05-30T05:52:52.421Z",
"host": "a483e711a548.ant.amazon.com",
"amount": 10,
"quantity": 2
}
```
If you type in some invalid JSON as the input, you'll see a JSON parsing error. Logstash doesn't discard the invalid JSON because you still might want to do something with it. For example, you can trigger an email or send a notification to a Slack channel.
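For example, a sketch that writes events tagged with a JSON parsing error to a separate file (`_jsonparsefailure` is the tag the `json` codec typically adds on failure; the paths are assumptions):
```yml
output {
  if "_jsonparsefailure" in [tags] {
    file {
      path => "invalid-json.log"
    }
  } else {
    file {
      path => "output.txt"
    }
  }
}
```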
To define a pipeline that handles HTTP requests:
1. Use the `http` plugin to send events to Logstash through HTTP:
```yml
input {
http {
host => "127.0.0.1"
port => 8080
}
}
output {
file {
path => "output.txt"
}
}
```
If you don't specify any options, the `http` plugin binds to `localhost` and listens on port 8080.
2. Start Logstash:
```bash
$ bin/logstash -f config/pipeline.conf
```
3. Use Postman to send an HTTP request. Set the `Content-Type` HTTP header to `application/json`:
```json
PUT 127.0.0.1:8080
{
"amount": 10,
"quantity": 2
}
```
Or, you can use the `curl` command:
```bash
curl -XPUT -H "Content-Type: application/json" -d '{"amount": 7, "quantity": 3 }' http://localhost:8080
```
Even though we haven't added the `json` codec to the input, the pipeline configuration still works because the `http` plugin automatically applies the appropriate codec based on the `Content-Type` header.
If you specify a value of `application/json`, Logstash parses the request body as JSON.
The `headers` field contains the HTTP headers that Logstash receives:
```json
{
"host": "127.0.0.1",
"quantity": "3",
"amount": 10,
"@timestamp": "2021-05-30T06:05:48.135Z",
"headers": {
"http_version": "HTTP/1.1",
"request_method": "PUT",
"http_user_agent": "PostmanRuntime/7.26.8",
"connection": "keep-alive",
"postman_token": "c6cd29cf-1b37-4420-8db3-9faec66b9e7e",
"http_host": "127.0.0.1:8080",
"cache_control": "no-cache",
"request_path": "/",
"content_type": "application/json",
"http_accept": "*/*",
"content_length": "41",
"accept_encoding": "gzip, deflate, br"
},
"@version": "1"
}
```
## Automatically reload the pipeline configuration
You can configure Logstash to detect any changes to the pipeline configuration file or the input log file and automatically reload the configuration.
The `stdin` plugin doesn't support automatic reloading.
{: .note }
1. Add an option named `start_position` with a value of `beginning` to the input plugin:
```yml
input {
file {
path => "/Users/<user>/Desktop/logstash7-12.1/events-data/input_file.log"
start_position => "beginning"
}
}
```
Logstash only processes any new events added to the input file and ignores the ones that it has already processed to avoid processing the same event more than once on restart.
Logstash records its progress in a file that's referred to as a `sinceDB` file. Logstash creates a `sinceDB` file for each file that it watches for changes.
2. Open the `sinceDB` file to check how much of the input file has been processed:
```bash
cd data/plugins/inputs/file/
ls -al
-rw-r--r-- 1 user staff 0 Jun 13 10:50 .sincedb_9e484f2a9e6c0d1bdfe6f23ac107ffc5
cat .sincedb_9e484f2a9e6c0d1bdfe6f23ac107ffc5
51575938 1 4 7727
```
The last number in the `sinceDB` file (7727) is the byte offset of the last known event processed.
3. To process the input file from the beginning, delete the `sinceDB` file:
```bash
rm .sincedb_*
```
4. Start Logstash with the `--config.reload.automatic` argument:
```bash
bin/logstash -f config/pipeline.conf --config.reload.automatic
```
The `reload` option only reloads if you add a new line at the end of the pipeline configuration file.
Sample output:
```yml
{
"message" => "216.243.171.38 - - [20/Sep/2017:19:11:52 +0200] \"GET /products/view/123 HTTP/1.1\" 200 12798 \"https://codingexplained.com/products\" \"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)\"",
"@version" => "1",
"host" => "a483e711a548.ant.amazon.com",
"path" => "/Users/kumarjao/Desktop/odfe1/logstash-7.12.1/events-data/input_file.log",
"@timestamp" => 2021-06-13T18:03:30.423Z
}
{
"message" => "91.59.108.75 - - [20/Sep/2017:20:11:43 +0200] \"GET /js/main.js HTTP/1.1\" 200 588 \"https://codingexplained.com/products/view/863\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0\"",
"@version" => "1",
"host" => "a483e711a548.ant.amazon.com",
"path" => "/Users/kumarjao/Desktop/odfe1/logstash-7.12.1/events-data/input_file.log",
"@timestamp" => 2021-06-13T18:03:30.424Z
}
```
5. Add a new line to the input file.
- Logstash immediately detects the change and processes the new line as an event.
6. Make a change to the `pipeline.conf` file.
- Logstash immediately detects the change and reloads the modified pipeline.


@ -0,0 +1,77 @@
---
layout: default
title: Ship events to OpenSearch
parent: Logstash
nav_order: 220
---
# Ship events to OpenSearch
You can ship Logstash events to an OpenSearch cluster and then visualize your events with OpenSearch Dashboards.
Make sure you have [Logstash]({{site.url}}{{site.baseurl}}/clients/logstash/index/#install-logstash), [OpenSearch]({{site.url}}{{site.baseurl}}/opensearch/install/index/), and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/install/index/).
{: .note }
## OpenSearch output plugin
To run the OpenSearch output plugin, add the following configuration in your `pipeline.conf` file:
```yml
output {
opensearch {
hosts => "https://localhost:9200"
user => "admin"
password => "admin"
index => "logstash-logs-%{+YYYY.MM.dd}"
ssl_certificate_verification => false
}
}
```
## Sample walkthrough
1. Open the `config/pipeline.conf` file and add in the following configuration:
```yml
input {
stdin {
codec => json
}
}
output {
opensearch {
hosts => "https://localhost:9200"
user => "admin"
password => "admin"
index => "logstash-logs-%{+YYYY.MM.dd}"
ssl_certificate_verification => false
}
}
```
This Logstash pipeline accepts JSON input through the terminal and ships the events to an OpenSearch cluster running locally. Logstash writes the events to an index with the `logstash-logs-%{+YYYY.MM.dd}` naming convention.
2. Start Logstash:
```bash
$ bin/logstash -f config/pipeline.conf --config.reload.automatic
```
`config/pipeline.conf` is a relative path to the `pipeline.conf` file. You can use an absolute path as well.
3. Add a JSON object in the terminal:
```json
{ "amount": 10, "quantity": 2}
```
4. Start OpenSearch Dashboards and choose **Dev Tools**:
```json
GET _cat/indices?v
health | status | index | uuid | pri | rep | docs.count | docs.deleted | store.size | pri.store.size
green | open | logstash-logs-2021.07.01 | iuh648LYSnmQrkGf70pplA | 1 | 1 | 1 | 0 | 10.3kb | 5.1kb
```

128
_clients/python.md Normal file

@ -0,0 +1,128 @@
---
layout: default
title: Python client
nav_order: 70
---
# Python client
The OpenSearch Python client provides a more natural syntax for interacting with your cluster. Rather than sending HTTP requests to a given URL, you can create an OpenSearch client for your cluster and call the client's built-in functions.
{% comment %}
`opensearch-py` is the lower-level of the two Python clients. If you want a general client for assorted operations, it's a great choice. If you want a higher-level client strictly for indexing and search operations, consider [opensearch-dsl-py]({{site.url}}{{site.baseurl}}/clients/python-dsl/).
{% endcomment %}
## Setup
To add the client to your project, install it using [pip](https://pip.pypa.io/):
```bash
pip install opensearch-py
```
Then import it like any other module:
```python
from opensearchpy import OpenSearch
```
If you prefer to add the client manually or just want to examine the source code, see [opensearch-py on GitHub](https://github.com/opensearch-project/opensearch-py).
## Sample code
```python
from opensearchpy import OpenSearch
host = 'localhost'
port = 9200
auth = ('admin', 'admin') # For testing only. Don't store credentials in code.
ca_certs_path = '/full/path/to/root-ca.pem' # Provide a CA bundle if you use intermediate CAs with your root CA.
# Optional client certificates if you don't want to use HTTP basic authentication.
# client_cert_path = '/full/path/to/client.pem'
# client_key_path = '/full/path/to/client-key.pem'
# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
hosts = [{'host': host, 'port': port}],
http_compress = True, # enables gzip compression for request bodies
http_auth = auth,
# client_cert = client_cert_path,
# client_key = client_key_path,
use_ssl = True,
verify_certs = True,
ssl_assert_hostname = False,
ssl_show_warn = False,
ca_certs = ca_certs_path
)
# Create an index with non-default settings.
index_name = 'python-test-index'
index_body = {
'settings': {
'index': {
'number_of_shards': 4
}
}
}
response = client.indices.create(index_name, body=index_body)
print('\nCreating index:')
print(response)
# Add a document to the index.
document = {
'title': 'Moneyball',
'director': 'Bennett Miller',
'year': '2011'
}
id = '1'
response = client.index(
index = index_name,
body = document,
id = id,
refresh = True
)
print('\nAdding document:')
print(response)
# Search for the document.
q = 'miller'
query = {
'size': 5,
'query': {
'multi_match': {
'query': q,
'fields': ['title^2', 'director']
}
}
}
response = client.search(
body = query,
index = index_name
)
print('\nSearch results:')
print(response)
# Delete the document.
response = client.delete(
index = index_name,
id = id
)
print('\nDeleting document:')
print(response)
# Delete the index.
response = client.indices.delete(
index = index_name
)
print('\nDeleting index:')
print(response)
```


@ -5,8 +5,9 @@ baseurl: "/docs" # the subpath of your site, e.g. /blog
url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
permalink: /:path/
opensearch_version: 1.0.0
opensearch_version: 1.0.1
opensearch_major_minor_version: 1.0
lucene_version: 8_8_2
# Build settings
markdown: kramdown
@ -137,6 +138,7 @@ footer_content:
plugins:
- jekyll-remote-theme
- jekyll-redirect-from
- jekyll-sitemap
# Exclude from processing.
# The following items will not be processed, by default. Create a custom list


@ -6,10 +6,11 @@ has_children: false
has_toc: false
redirect_from:
- /docs/opensearch-dashboards/
- /opensearch-dashboards/
- /dashboards/
---
{%- comment -%}The `/docs/opensearch-dashboards/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# OpenSearch Dashboards
OpenSearch Dashboards is the default visualization tool for data in OpenSearch. It also serves as a user interface for many of the OpenSearch plugins, including security, alerting, Index State Management, SQL, and more.

135
_dashboards/install/helm.md Normal file

@ -0,0 +1,135 @@
---
layout: default
title: Helm
parent: Install OpenSearch Dashboards
nav_order: 35
---
# Run OpenSearch Dashboards using Helm
Helm is a package manager that allows you to easily install and manage OpenSearch Dashboards in a Kubernetes cluster. You can define your OpenSearch configurations in a YAML file and use Helm to deploy your applications in a version-controlled and reproducible way.
The Helm chart contains the resources described in the following table.
Resource | Description
:--- | :---
`Chart.yaml` | Information about the chart.
`values.yaml` | Default configuration values for the chart.
`templates` | Templates that combine with values to generate the Kubernetes manifest files.
The specification in the default Helm chart supports many standard use cases and setups. You can modify the default chart to configure your desired specifications and set Transport Layer Security (TLS) and role-based access control (RBAC).
For information about the default configuration, steps to configure security, and configurable parameters, see the
[README](https://github.com/opensearch-project/helm-charts/tree/main/charts).
The instructions here assume you have a Kubernetes cluster with Helm preinstalled. See the [Kubernetes documentation](https://kubernetes.io/docs/setup/) for steps to configure a Kubernetes cluster and the [Helm documentation](https://helm.sh/docs/intro/install/) to install Helm.
{: .note }
## Prerequisites
Before you get started, you must first use [Helm to install OpenSearch]({{site.url}}{{site.baseurl}}/opensearch/install/helm/).
Make sure that you can send requests to your OpenSearch pod:
```json
$ curl -XGET https://localhost:9200 -u 'admin:admin' --insecure
{
"name" : "opensearch-cluster-master-1",
"cluster_name" : "opensearch-cluster",
"cluster_uuid" : "hP2gq5bPS3SLp8Z7wXm8YQ",
"version" : {
"distribution" : "opensearch",
"number" : "1.0.0",
"build_type" : "tar",
"build_hash" : "34550c5b17124ddc59458ef774f6b43a086522e3",
"build_date" : "2021-07-02T23:22:21.383695Z",
"build_snapshot" : false,
"lucene_version" : "8.8.2",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "The OpenSearch Project: https://opensearch.org/"
}
```
## Install OpenSearch Dashboards using Helm
1. Change to the `opensearch-dashboards` directory:
```bash
cd opensearch-dashboards
```
1. Package the Helm chart:
```bash
helm package .
```
1. Deploy OpenSearch Dashboards:
```bash
helm install --generate-name opensearch-dashboards-1.0.0.tgz
```
The output shows you the specifications instantiated from the install.
To customize the deployment, pass in the values that you want to override with a custom YAML file:
```bash
helm install --values=customvalues.yaml opensearch-dashboards-1.0.0.tgz
```
#### Sample output
```yaml
NAME: opensearch-dashboards-1-1629223356
LAST DEPLOYED: Tue Aug 17 18:02:37 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=opensearch-dashboards,app.kubernetes.io/instance=opensearch-dashboards-1-1629223356" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace default $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl --namespace default port-forward $POD_NAME 8080:$CONTAINER_PORT
```
To make sure your OpenSearch Dashboards pod is up and running, run the following command:
```bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
opensearch-cluster-master-0 1/1 Running 0 4m35s
opensearch-cluster-master-1 1/1 Running 0 4m35s
opensearch-cluster-master-2 1/1 Running 0 4m35s
opensearch-dashboards-1-1629223356-758bd8747f-8www5 1/1 Running 0 66s
```
To set up port forwarding to access OpenSearch Dashboards, exit the OpenSearch shell and run the following command:
```bash
$ kubectl port-forward deployment/opensearch-dashboards-1-1629223356 5601
```
You can now access OpenSearch Dashboards from your browser at: http://localhost:5601.
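As a quick check that OpenSearch Dashboards is responding through the port forward, you can query its status endpoint (a sketch; it assumes the port forward from the previous step is still running, and the exact response code depends on your security configuration):
```bash
# Prints only the HTTP status code; any code (for example 200, 302, or 401) means Dashboards answered
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5601/api/status
```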
## Uninstall using Helm
To identify the OpenSearch Dashboards deployment that you want to delete:
```bash
$ helm list
NAME                                NAMESPACE  REVISION  UPDATED                                  STATUS    CHART                        APP VERSION
opensearch-1-1629223146             default    1         2021-08-17 17:59:07.664498239 +0000 UTC  deployed  opensearch-1.0.0             1.0.0
opensearch-dashboards-1-1629223356  default    1         2021-08-17 18:02:37.600796946 +0000 UTC  deployed  opensearch-dashboards-1.0.0  1.0.0
```
To delete or uninstall a deployment, run the following command:
```bash
helm delete opensearch-dashboards-1-1629223356
```
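To confirm that the pod was removed, you can list pods with the chart's label (a quick sketch; the label matches the one shown in the install output earlier on this page):
```bash
kubectl get pods --namespace default -l app.kubernetes.io/name=opensearch-dashboards
```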

View File

@ -9,4 +9,4 @@ redirect_from:
# Install and configure OpenSearch Dashboards
OpenSearch Dashboards has two installation options at this time: Docker images and tarballs.
OpenSearch Dashboards has three installation options at this time: Docker images, tarballs, and Helm charts.

View File

@ -28,6 +28,21 @@ If you don't want to use the all-in-one installation options, you can install th
</tr>
</thead>
<tbody>
<tr>
<td>1.0.1</td>
<td>
<pre>alertingDashboards 1.0.0.0
anomalyDetectionDashboards 1.0.0.0
ganttChartDashboards 1.0.0.0
indexManagementDashboards 1.0.1.0
notebooksDashboards 1.0.0.0
queryWorkbenchDashboards 1.0.0.0
reportsDashboards 1.0.1.0
securityDashboards 1.0.1.0
traceAnalyticsDashboards 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0</td>
<td>
@ -40,36 +55,6 @@ queryWorkbenchDashboards 1.0.0.0
reportsDashboards 1.0.0.0
securityDashboards 1.0.0.0
traceAnalyticsDashboards 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0-rc1</td>
<td>
<pre>alertingDashboards 1.0.0.0-rc1
anomalyDetectionDashboards 1.0.0.0-rc1
ganttChartDashboards 1.0.0.0-rc1
indexManagementDashboards 1.0.0.0-rc1
notebooksDashboards 1.0.0.0-rc1
queryWorkbenchDashboards 1.0.0.0-rc1
reportsDashboards 1.0.0.0-rc1
securityDashboards 1.0.0.0-rc1
traceAnalyticsDashboards 1.0.0.0-rc1
</pre>
</td>
</tr>
<tr>
<td>1.0.0-beta1</td>
<td>
<pre>alertingDashboards 1.0.0.0-beta1
anomalyDetectionDashboards 1.0.0.0-beta1
ganttChartDashboards 1.0.0.0-beta1
indexManagementDashboards 1.0.0.0-beta1
notebooksDashboards 1.0.0.0-beta1
queryWorkbenchDashboards 1.0.0.0-beta1
reportsDashboards 1.0.0.0-beta1
securityDashboards 1.0.0.0-beta1
traceAnalyticsDashboards 1.0.0.0-beta1
</pre>
</td>
</tr>

View File

@ -27,4 +27,4 @@ nav_order: 30
./bin/opensearch-dashboards
```
1. See the [OpenSearch Dashboards documentation]({{site.url}}{{site.baseurl}}/opensearch-dashboards/).
1. See the [OpenSearch Dashboards documentation]({{site.url}}{{site.baseurl}}/dashboards/index/).

View File

@ -12,9 +12,9 @@ By default, for ease of testing and getting started, OpenSearch Dashboards runs
Setting | Description
:--- | :---
opensearch.ssl.verificationMode | This setting is for communications between OpenSearch and OpenSearch Dashboards. Valid values are `full`, `certificate`, or `none`. We recommend `full` if you enable TLS, which enables hostname verification. `certificate` just checks the certificate, not the hostname, and `none` performs no checks (suitable for HTTP). Default is `full`.
opensearch.ssl.certificateAuthorities | If `opensearch.ssl.verificationMode` is `full` or `certificate`, specify the full path (e.g. `[ "/usr/share/opensearch-dashboards-1.0.0/config/root-ca.pem" ]` to the certificate authority for your OpenSearch cluster.
opensearch.ssl.certificateAuthorities | If `opensearch.ssl.verificationMode` is `full` or `certificate`, specify the full path to one or more CA certificates that comprise a trusted chain for your OpenSearch cluster. For example, you might need to include a root CA _and_ an intermediate CA if you used the intermediate CA to issue your admin, client, and node certificates.
server.ssl.enabled | This setting is for communications between OpenSearch Dashboards and the web browser. Set to true for HTTPS, false for HTTP.
server.ssl.certificate | If `server.ssl.enabled` is true, specify the full path (e.g. `/usr/share/opensearch-dashboards-1.0.0/config/my-client-cert.pem` to a valid client certificate for your OpenSearch cluster. You can [generate your own]({{site.url}}{{site.baseurl}}/security-plugin/configuration/generate-certificates/) or get one from a certificate authority.
server.ssl.certificate | If `server.ssl.enabled` is true, specify the full path to a valid client certificate for your OpenSearch cluster. You can [generate your own]({{site.url}}{{site.baseurl}}/security-plugin/configuration/generate-certificates/) or get one from a certificate authority.
server.ssl.key | If `server.ssl.enabled` is true, specify the full path (e.g. `/usr/share/opensearch-dashboards-1.0.0/config/my-client-cert-key.pem`) to the key for your client certificate. You can [generate your own]({{site.url}}{{site.baseurl}}/security-plugin/configuration/generate-certificates/) or get one from a certificate authority.
opensearch_security.cookie.secure | If you enable TLS for OpenSearch Dashboards, change this setting to `true`. For HTTP, set it to `false`.
@ -27,9 +27,9 @@ opensearch.username: "kibanaserver"
opensearch.password: "kibanaserver"
opensearch.requestHeadersWhitelist: [ authorization,securitytenant ]
server.ssl.enabled: true
server.ssl.certificate: /usr/share/opensearch-1.0.0/config/client-cert.pem
server.ssl.key: /usr/share/opensearch-1.0.0/config/client-cert-key.pem
opensearch.ssl.certificateAuthorities: [ "/usr/share/opensearch-1.0.0/config/root-ca.pem" ]
server.ssl.certificate: /usr/share/opensearch-dashboards/config/client-cert.pem
server.ssl.key: /usr/share/opensearch-dashboards/config/client-cert-key.pem
opensearch.ssl.certificateAuthorities: [ "/usr/share/opensearch-dashboards/config/root-ca.pem", "/usr/share/opensearch-dashboards/config/intermediate-ca.pem" ]
opensearch_security.multitenancy.enabled: true
opensearch_security.multitenancy.tenants.preferred: ["Private", "Global"]
opensearch_security.readonly_mode.roles: ["kibana_read_only"]

View File

@ -2,8 +2,12 @@
layout: default
title: WMS map server
nav_order: 5
redirect_from:
- /docs/opensearch-dashboards/maptiles/
---
{%- comment -%}The `/docs/opensearch-dashboards/maptiles/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Configure WMS map server
OpenSearch Dashboards includes default map tiles, but if you need more specialized maps, you can configure OpenSearch Dashboards to use a WMS map server:

View File

@ -2,7 +2,7 @@
layout: default
title: Notebooks
nav_order: 50
redirect_from: /docs/notebooks/
redirect_from: /notebooks/
has_children: false
---

6
_data/versions.json Normal file
View File

@ -0,0 +1,6 @@
{
"current": "1.1",
"past": [
"1.0"
]
}

View File

@ -0,0 +1,7 @@
---
layout: default
title: Dashboards developer guide
nav_order: 2
permalink: /dashboards-developer-guide/
redirect_to: https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/DEVELOPER_GUIDE.md
---

View File

@ -3,5 +3,5 @@ layout: default
title: Javadoc
nav_order: 1
permalink: /javadoc/
redirect_to: https://opensearch.org/docs/javadocs/
redirect_to: https://opensearch.org/javadocs/
---

View File

@ -0,0 +1,6 @@
---
layout: default
nav_exclude: true
permalink: /javadocs/
redirect_to: https://opensearch.org/javadocs/
---

View File

@ -3,7 +3,7 @@ layout: default
title: Index rollups
nav_order: 35
has_children: true
redirect_from: /docs/ism/index-rollups/
redirect_from: /im-plugin/index-rollups/
has_toc: false
---

View File

@ -2,7 +2,6 @@
layout: default
title: Index rollups API
parent: Index rollups
redirect_from: /docs/ism/rollup-api/
nav_order: 9
---
@ -20,6 +19,8 @@ Use the index rollup operations to programmatically work with index rollup jobs.
---
## Create or update an index rollup job
Introduced 1.0
{: .label .label-purple }
Creates or updates an index rollup job.
You must provide the `seq_no` and `primary_term` parameters.
@ -89,36 +90,36 @@ You can specify the following options.
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`source_index` | The name of the detector. | `string` | Yes
`target_index` | Specify the target index that the rolled up data is ingested into. You could either create a new target index or use an existing index. The target index cannot be a combination of raw and rolled up data. | `string` | Yes
`schedule` | Schedule of the index rollup job which can be an interval or a cron expression. | `object` | Yes
`schedule.interval` | Specify the frequency of execution of the rollup job. | `object` | No
`schedule.interval.start_time` | Start time of the interval. | `timestamp` | Yes
`schedule.interval.period` | Define the interval period. | `string` | Yes
`schedule.interval.unit` | Specify the time unit of the interval. | `string` | Yes
`schedule.interval.cron` | Optionally, specify a cron expression to define therollup frequency. | `list` | No
`schedule.interval.cron.expression` | Specify a Unix cron expression. | `string` | Yes
`schedule.interval.cron.timezone` | Specify timezones as defined by the IANA Time Zone Database. Defaults to UTC. | `string` | No
`description` | Optionally, describe the rollup job. | `string` | No
`enabled` | When true, the index rollup job is scheduled. Default is true. | `boolean` | Yes
`continuous` | Specify whether or not the index rollup job continuously rolls up data forever or just executes over the current data set once and stops. Default is false. | `boolean` | Yes
`error_notification` | Set up a Mustache message template sent for error notifications. For example, if an index rollup job fails, the system sends a message to a Slack channel. | `object` | No
`page_size` | Specify the number of buckets to paginate through at a time while rolling up. | `number` | Yes
`delay` | Specify time value to delay execution of the index rollup job. | `time_unit` | No
`dimensions` | Specify aggregations to create dimensions for the roll up time window. | `object` | Yes
`dimensions.date_histogram` | Specify either fixed_interval or calendar_interval, but not both. Either one limits what you can query in the target index. | `object` | No
`dimensions.date_histogram.fixed_interval` | Specify the fixed interval for aggregations in milliseconds, seconds, minutes, hours, or days. | `string` | No
`dimensions.date_histogram.calendar_interval` | Specify the calendar interval for aggregations in minutes, hours, days, weeks, months, quarters, or years. | `string` | No
`dimensions.date_histogram.field` | Specify the date field used in date histogram aggregation. | `string` | No
`dimensions.date_histogram.timezone` | Specify the timezones as defined by the IANA Time Zone Database. The default is UTC. | `string` | No
`dimensions.terms` | Specify the term aggregations that you want to roll up. | `object` | No
`dimensions.terms.fields` | Specify terms aggregation for compatible fields. | `object` | No
`dimensions.histogram` | Specify the histogram aggregations that you want to roll up. | `object` | No
`dimensions.histogram.field` | Add a field for histogram aggregations. | `string` | Yes
`dimensions.histogram.interval` | Specify the histogram aggregation interval for the field. | `long` | Yes
`dimensions.metrics` | Specify a list of objects that represent the fields and metrics that you want to calculate. | `nested object` | No
`dimensions.metrics.field` | Specify the field that you want to perform metric aggregations on. | `string` | No
`dimensions.metrics.field.metrics` | Specify the metric aggregations you want to calculate for the field. | `multiple strings` | No
`source_index` | The name of the source index to roll up. | String | Yes
`target_index` | Specify the target index that the rolled up data is ingested into. You could either create a new target index or use an existing index. The target index cannot be a combination of raw and rolled up data. | String | Yes
`schedule` | Schedule of the index rollup job which can be an interval or a cron expression. | Object | Yes
`schedule.interval` | Specify the frequency of execution of the rollup job. | Object | No
`schedule.interval.start_time` | Start time of the interval. | Timestamp | Yes
`schedule.interval.period` | Define the interval period. | String | Yes
`schedule.interval.unit` | Specify the time unit of the interval. | String | Yes
`schedule.interval.cron` | Optionally, specify a cron expression to define the rollup frequency. | List | No
`schedule.interval.cron.expression` | Specify a Unix cron expression. | String | Yes
`schedule.interval.cron.timezone` | Specify timezones as defined by the IANA Time Zone Database. Defaults to UTC. | String | No
`description` | Optionally, describe the rollup job. | String | No
`enabled` | When true, the index rollup job is scheduled. Default is true. | Boolean | Yes
`continuous` | Specify whether or not the index rollup job continuously rolls up data forever or just executes over the current data set once and stops. Default is false. | Boolean | Yes
`error_notification` | Set up a Mustache message template sent for error notifications. For example, if an index rollup job fails, the system sends a message to a Slack channel. | Object | No
`page_size` | Specify the number of buckets to paginate through at a time while rolling up. | Number | Yes
`delay` | The number of milliseconds to delay execution of the index rollup job. | Long | No
`dimensions` | Specify aggregations to create dimensions for the roll up time window. | Object | Yes
`dimensions.date_histogram` | Specify either fixed_interval or calendar_interval, but not both. Either one limits what you can query in the target index. | Object | No
`dimensions.date_histogram.fixed_interval` | Specify the fixed interval for aggregations in milliseconds, seconds, minutes, hours, or days. | String | No
`dimensions.date_histogram.calendar_interval` | Specify the calendar interval for aggregations in minutes, hours, days, weeks, months, quarters, or years. | String | No
`dimensions.date_histogram.field` | Specify the date field used in date histogram aggregation. | String | No
`dimensions.date_histogram.timezone` | Specify the timezones as defined by the IANA Time Zone Database. The default is UTC. | String | No
`dimensions.terms` | Specify the term aggregations that you want to roll up. | Object | No
`dimensions.terms.fields` | Specify terms aggregation for compatible fields. | Object | No
`dimensions.histogram` | Specify the histogram aggregations that you want to roll up. | Object | No
`dimensions.histogram.field` | Add a field for histogram aggregations. | String | Yes
`dimensions.histogram.interval` | Specify the histogram aggregation interval for the field. | Long | Yes
`dimensions.metrics` | Specify a list of objects that represent the fields and metrics that you want to calculate. | Nested object | No
`dimensions.metrics.field` | Specify the field that you want to perform metric aggregations on. | String | No
`dimensions.metrics.field.metrics` | Specify the metric aggregations you want to calculate for the field. | Multiple strings | No
#### Sample response
@ -134,6 +135,8 @@ Options | Description | Type | Required
## Get an index rollup job
Introduced 1.0
{: .label .label-purple }
Returns all information about an index rollup job based on the `rollup_id`.
@ -159,6 +162,8 @@ GET _plugins/_rollup/jobs/<rollup_id>
---
## Delete an index rollup job
Introduced 1.0
{: .label .label-purple }
Deletes an index rollup job based on the `rollup_id`.
@ -178,6 +183,8 @@ DELETE _plugins/_rollup/jobs/<rollup_id>
## Start or stop an index rollup job
Introduced 1.0
{: .label .label-purple }
Start or stop an index rollup job.
@ -199,6 +206,8 @@ POST _plugins/_rollup/jobs/<rollup_id>/_stop
---
## Explain an index rollup job
Introduced 1.0
{: .label .label-purple }
Returns detailed metadata information about the index rollup job and its current progress.

View File

@ -3,7 +3,7 @@ layout: default
title: Index transforms
nav_order: 20
has_children: true
redirect_from: /docs/im/index-transforms/
redirect_from: /im-plugin/index-transforms/
has_toc: false
---

View File

@ -15,6 +15,8 @@ Aside from using OpenSearch Dashboards, you can also use the REST API to create,
{:toc}
## Create a transform job
Introduced 1.0
{: .label .label-purple }
Creates a transform job.
@ -139,6 +141,8 @@ source_field | String | The field(s) to transform | Yes
aggregations | JSON | The aggregations to use in the transform job. Supported aggregations are: `sum`, `max`, `min`, `value_count`, `avg`, `scripted_metric`, and `percentiles`. For more information, see [Metric Aggregations]({{site.url}}{{site.baseurl}}/opensearch/metric-agg). | Yes if not using groups
## Update a transform job
Introduced 1.0
{: .label .label-purple }
Updates a transform job if `transform_id` already exists.
@ -254,6 +258,8 @@ Parameter | Description | Required
`if_primary_term` | Only perform the transform operation if the last operation that changed the transform job has the specified sequence term. | No
## Get a transform job's details
Introduced 1.0
{: .label .label-purple }
Returns a transform job's details.
@ -520,6 +526,8 @@ GET _plugins/_transform?size=2&from=8
```
## Start a transform job
Introduced 1.0
{: .label .label-purple }
Transform jobs created using the API are automatically enabled, but if you ever need to enable a job, you can use the `start` API operation.
@ -538,6 +546,8 @@ POST _plugins/_transform/<transform_id>/_start
```
## Stop a transform job
Introduced 1.0
{: .label .label-purple }
Stops/disables a transform job.
@ -556,6 +566,8 @@ POST _plugins/_transform/<transform_id>/_stop
```
## Get the status of a transform job
Introduced 1.0
{: .label .label-purple }
Returns the status and metadata of a transform job.
@ -589,6 +601,8 @@ GET _plugins/_transform/<transform_id>/_explain
```
## Preview a transform job's results
Introduced 1.0
{: .label .label-purple }
Returns a preview of what a transformed index would look like.
@ -674,6 +688,8 @@ POST _plugins/_transform/_preview
```
## Delete a transform job
Introduced 1.0
{: .label .label-purple }
Deletes a transform job. This operation does not delete the source or target indices.

View File

@ -2,7 +2,6 @@
layout: default
title: ISM API
parent: Index State Management
redirect_from: /docs/ism/api/
nav_order: 5
---
@ -21,6 +20,8 @@ Use the index state management operations to programmatically work with policies
## Create policy
Introduced 1.0
{: .label .label-purple }
Creates a policy.
@ -138,6 +139,8 @@ PUT _plugins/_ism/policies/policy_1
---
## Add policy
Introduced 1.0
{: .label .label-purple }
Adds a policy to an index. This operation does not change the policy if the index already has one.
@ -168,6 +171,8 @@ Don't use the broad `*` wildcard, and instead add a prefix, such as `my-logs*`,
## Update policy
Introduced 1.0
{: .label .label-purple }
Updates a policy. Use the `seq_no` and `primary_term` parameters to update an existing policy. If these numbers don't match the existing policy or the policy doesn't exist, ISM throws an error.
@ -285,6 +290,8 @@ PUT _plugins/_ism/policies/policy_1?if_seq_no=7&if_primary_term=1
---
## Get policy
Introduced 1.0
{: .label .label-purple }
Gets the policy by `policy_id`.
@ -355,6 +362,8 @@ GET _plugins/_ism/policies/policy_1
---
## Remove policy from index
Introduced 1.0
{: .label .label-purple }
Removes any ISM policy from the index.
@ -378,6 +387,8 @@ POST _plugins/_ism/remove/index_1
---
## Update managed index policy
Introduced 1.0
{: .label .label-purple }
Updates the managed index policy to a new policy (or to a new version of the policy). You can use an index pattern to update multiple indices at once. When updating multiple indices, you might want to include a state filter to only affect certain managed indices. The change policy filters out all the existing managed indices and only applies the change to the ones in the state that you specify. You can also explicitly specify the state that the managed index transitions to after the change policy takes effect.
@ -414,6 +425,8 @@ POST _plugins/_ism/change_policy/index_1
---
## Retry failed index
Introduced 1.0
{: .label .label-purple }
Retries the failed action for an index. For the retry call to succeed, ISM must manage the index, and the index must be in a failed state. You can use index patterns (`*`) to retry multiple failed indices.
@ -440,6 +453,8 @@ POST _plugins/_ism/retry/index_1
---
## Explain index
Introduced 1.0
{: .label .label-purple }
Gets the current state of the index. You can use index patterns to get the status of multiple indices.
@ -465,6 +480,8 @@ The `plugins.index_state_management.policy_id` setting is deprecated starting fr
---
## Delete policy
Introduced 1.0
{: .label .label-purple }
Deletes the policy by `policy_id`.

View File

@ -4,8 +4,7 @@ title: Index State Management
nav_order: 3
has_children: true
redirect_from:
- /docs/im/ism/
- /docs/ism/
- /im-plugin/ism/
has_toc: false
---
@ -56,6 +55,8 @@ PUT _plugins/_ism/policies/policy_id
}
```
If you have more than one template that matches an index pattern, ISM uses the priority value to determine which template to apply.
For an example ISM template policy, see [Sample policy with ISM template]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies#sample-policy-with-ism-template).
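For instance, the following sketch creates two hypothetical policies whose ISM templates overlap. An index named `log-app-000001` matches both patterns, so ISM applies the policy with the higher priority. The policy IDs, patterns, and priorities are placeholders, and the single no-op state is only there to keep the requests complete:
```bash
curl -XPUT "https://localhost:9200/_plugins/_ism/policies/generic_logs" -u 'admin:admin' --insecure \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "description": "Fallback policy for all log indices",
    "default_state": "hot",
    "states": [{ "name": "hot", "actions": [], "transitions": [] }],
    "ism_template": { "index_patterns": ["log*"], "priority": 50 }
  }
}'

curl -XPUT "https://localhost:9200/_plugins/_ism/policies/app_logs" -u 'admin:admin' --insecure \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "description": "Higher-priority policy for application log indices",
    "default_state": "hot",
    "states": [{ "name": "hot", "actions": [], "transitions": [] }],
    "ism_template": { "index_patterns": ["log-app*"], "priority": 100 }
  }
}'
```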
Older versions of the plugin include the `policy_id` in an index template, so when an index is created that matches the index template pattern, the index will have the policy attached to it:
@ -90,7 +91,7 @@ Make sure that the alias that you enter already exists. For more information abo
After you attach a policy to an index, ISM creates a job that runs every 5 minutes by default to perform policy actions, check conditions, and transition the index into different states. To change the default time interval for this job, see [Settings]({{site.url}}{{site.baseurl}}/im-plugin/ism/settings/).
If you want to use an OpenSearch operation to create an index with a policy already attached to it, see [create index]({{site.url}}{{site.baseurl}}/im-plugin/ism/api#create-index).
ISM does not run jobs if the cluster state is red.
### Step 3: Manage indices

View File

@ -3,7 +3,6 @@ layout: default
title: Managed Indices
nav_order: 3
parent: Index State Management
redirect_from: /docs/ism/managedindices/
has_children: false
---

View File

@ -3,7 +3,6 @@ layout: default
title: Policies
nav_order: 1
parent: Index State Management
redirect_from: /docs/ism/policies/
has_children: false
---
@ -94,7 +93,7 @@ For a list of available unit types, see [Supported units]({{site.url}}{{site.bas
ISM supports the following operations:
- [force_merge](#forcemerge)
- [force_merge](#force_merge)
- [read_only](#read_only)
- [read_write](#read_write)
- [replica_count](#replica_count)
@ -348,7 +347,7 @@ Parameter | Description | Type | Required | Default
### allocation
Allocate the index to a node with a specific attribute.
Allocate the index to a node with a specific attribute set [like this]({{site.url}}{{site.baseurl}}/opensearch/cluster/#advanced-step-7-set-up-a-hot-warm-architecture).
For example, setting `require` to `{ "temp": "warm" }` moves your data only to "warm" nodes.
The `allocation` operation has the following parameters:
@ -364,7 +363,7 @@ Parameter | Description | Type | Required
"actions": [
{
"allocation": {
"require": { "box_type": "warm" }
"require": { "temp": "warm" }
}
}
]
@ -559,9 +558,11 @@ The following sample template policy is for a rollover use case.
PUT _index_template/ism_rollover
{
"index_patterns": ["log*"],
"settings": {
"template": {
"settings": {
"plugins.index_state_management.rollover_alias": "log"
}
}
}
}
```
@ -587,6 +588,12 @@ The following sample template policy is for a rollover use case.
}
```
5. Verify if the policy is attached to the `log-000001` index:
```json
GET _plugins/_ism/explain/log-000001?pretty
```
## Example policy
The following example policy implements a `hot`, `warm`, and `delete` workflow. You can use this policy as a template to prioritize resources to your indices based on their levels of activity.

View File

@ -2,7 +2,6 @@
layout: default
title: Settings
parent: Index State Management
redirect_from: /docs/ism/settings/
nav_order: 4
---

View File

@ -3,7 +3,7 @@ layout: default
title: Refresh search analyzer
nav_order: 40
has_children: false
redirect_from: /docs/ism/refresh-analyzer/
redirect_from: /im-plugin/refresh-analyzer/
has_toc: false
---

View File

@ -6,3 +6,9 @@
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3.0.1/es5/tex-mml-chtml.js"></script>
{% endif %}
{% if jekyll.environment == "development" %}
<script src="{{ '/assets/js/version-selector.js' | relative_url }}"></script>
{% else %}
<script src="{{ '/docs/latest/assets/js/version-selector.js' }}"></script>
{% endif %}

View File

@ -57,6 +57,10 @@ layout: table_wrappers
</a>
</div>
<nav role="navigation" aria-label="Main" id="site-nav" class="site-nav">
{% assign past_versions = site.data.versions.past | join: ";" %}
<div class="version-wrapper">
<version-selector selected="{{ site.data.versions.current }}"></version-selector>
</div>
{% assign pages_top_size = site.html_pages
| where_exp:"item", "item.title != nil"
| where_exp:"item", "item.parent == nil"

View File

@ -19,6 +19,8 @@ Use these anomaly detection operations to programmatically create and manage det
---
## Create anomaly detector
Introduced 1.0
{: .label .label-purple }
Creates an anomaly detector.
@ -306,6 +308,8 @@ Options | Description | Type | Required
---
## Preview detector
Introduced 1.0
{: .label .label-purple }
Passes a date range to the anomaly detector to return any anomalies within that date range.
@ -445,6 +449,8 @@ If you specify a category field, each result is associated with an entity:
---
## Start detector job
Introduced 1.0
{: .label .label-purple }
Starts a real-time or historical anomaly detector job.
@ -470,6 +476,8 @@ POST _plugins/_anomaly_detection/detectors/<detectorId>/_start
---
## Stop detector job
Introduced 1.0
{: .label .label-purple }
Stops a real-time or historical anomaly detector job.
@ -488,6 +496,8 @@ Stopped detector: m4ccEnIBTXsGi3mvMt9p
---
## Search detector result
Introduced 1.0
{: .label .label-purple }
Returns all results for a search query.
@ -933,6 +943,8 @@ GET _plugins/_anomaly_detection/detectors/results/_search
---
## Delete detector
Introduced 1.0
{: .label .label-purple }
Deletes a detector based on the `detector_id`.
To delete a detector, you need to first stop the detector.
@ -968,6 +980,8 @@ DELETE _plugins/_anomaly_detection/detectors/<detectorId>
---
## Update detector
Introduced 1.0
{: .label .label-purple }
Updates a detector with any changes, including the description or adding or removing of features.
To update a detector, you need to first stop the detector.
@ -1139,6 +1153,8 @@ PUT _plugins/_anomaly_detection/detectors/<detectorId>
---
## Get detector
Introduced 1.0
{: .label .label-purple }
Returns all information about a detector based on the `detector_id`.
@ -1485,6 +1501,8 @@ GET _plugins/_anomaly_detection/detectors/<detectorId>?task=true
---
## Search detector
Introduced 1.0
{: .label .label-purple }
Returns all anomaly detectors for a search query.
@ -1591,6 +1609,8 @@ Sample Input:
---
## Get detector stats
Introduced 1.0
{: .label .label-purple }
Provides information about how the plugin is performing.
@ -1691,6 +1711,8 @@ Historical detectors contain additional fields:
---
## Create monitor
Introduced 1.0
{: .label .label-purple }
Create a monitor to set up alerts for the detector.
@ -1909,6 +1931,8 @@ POST _plugins/_alerting/monitors
---
## Profile detector
Introduced 1.0
{: .label .label-purple }
Returns information related to the current state of the detector and memory usage, including current errors and shingle size, to help troubleshoot the detector.

View File

@ -13,7 +13,7 @@ An anomaly in OpenSearch is any unusual behavior change in your time-series data
It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.
Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://pdfs.semanticscholar.org/8bba/52e9797f2e2cc9a823dbd12514d02f29c8b9.pdf?_ga=2.56302955.1913766445.1574109076-1059151610.1574109076).
Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://api.semanticscholar.org/CorpusID:927435).
You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.
@ -79,13 +79,11 @@ This formula provides a good starting point, but make sure to test with a repres
For example, for a cluster with 3 data nodes, each with 8G of JVM heap size, a maximum memory percentage of 10% (default), and the entity size of the detector as 1MB: the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1M ) * 3 = 2429.
#### Set a window size
#### Set a shingle size
Set the number of aggregation intervals from your data stream to consider in a detection window. It's best to choose this value based on your actual data to see which one leads to the best results for your use case.
Set the number of aggregation intervals from your data stream to consider in a detection window. It's best to choose this value based on your actual data to see which one leads to the best results for your use case.
Based on experiments performed on a wide variety of one-dimensional data streams, we recommend using a window size between 1 and 16. The default window size is 8. If you set the category field for high cardinality, the default window size is 1.
If you expect missing values in your data or if you want to base the anomalies on the current interval, choose 1. If your data is continuously ingested and you want to base the anomalies on multiple intervals, choose a larger window size.
The anomaly detector expects the shingle size to be in the range of 1 to 60. The default shingle size is 8. We recommend that you don't choose 1 unless you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also false positives. Larger values might be useful for ignoring noise in a signal.
#### Preview sample anomalies

View File

@ -20,6 +20,8 @@ Use the alerting API to programmatically manage monitors and alerts.
---
## Create monitor
Introduced 1.0
{: .label .label-purple }
#### Request
@ -246,6 +248,8 @@ For a full list of timezone names, refer to [Wikipedia](https://en.wikipedia.org
---
## Update monitor
Introduced 1.0
{: .label .label-purple }
When you update a monitor, include the current version number as a parameter. OpenSearch increments the version number automatically (see the sample response).
@ -368,6 +372,8 @@ PUT _plugins/_alerting/monitors/<monitor_id>
---
## Get monitor
Introduced 1.0
{: .label .label-purple }
#### Request
@ -438,6 +444,8 @@ GET _plugins/_alerting/monitors/<monitor_id>
---
## Monitor stats
Introduced 1.0
{: .label .label-purple }
Returns statistics about the alerting feature. Use `_plugins/_alerting/stats` to find node IDs and metrics. Then you can drill down using those values.
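For example, you might start with the full stats response and then drill down into a specific node and metric (a sketch using the demo `admin:admin` credentials on a local cluster):
```bash
# List alerting stats to find node IDs and metric names
curl -XGET "https://localhost:9200/_plugins/_alerting/stats" -u 'admin:admin' --insecure
```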
@ -627,6 +635,8 @@ GET _plugins/_alerting/<node-id>/stats/<metric>
---
## Delete monitor
Introduced 1.0
{: .label .label-purple }
#### Request
@ -658,6 +668,8 @@ DELETE _plugins/_alerting/monitors/<monitor_id>
---
## Search monitors
Introduced 1.0
{: .label .label-purple }
#### Request
@ -755,6 +767,8 @@ GET _plugins/_alerting/monitors/_search
---
## Run monitor
Introduced 1.0
{: .label .label-purple }
You can add the optional `?dryrun=true` parameter to the URL to show the results of a run without actions sending any message.
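For example, a dry run might look like the following sketch (the monitor ID is hypothetical; demo `admin:admin` credentials assumed):
```bash
curl -XPOST "https://localhost:9200/_plugins/_alerting/monitors/eVQm0XcB8KdqGr9N4wpN/_execute?dryrun=true" \
  -u 'admin:admin' --insecure
```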
@ -787,6 +801,8 @@ POST _plugins/_alerting/monitors/<monitor_id>/_execute
---
## Get alerts
Introduced 1.0
{: .label .label-purple }
Returns an array of all alerts.
@ -855,8 +871,10 @@ GET _plugins/_alerting/monitors/alerts
---
## Acknowledge alert
Introduced 1.0
{: .label .label-purple }
[After getting your alerts](#get-alerts/), you can acknowledge any number of active alerts in one call. If the alert is already in an ERROR, COMPLETED, or ACKNOWLEDGED state, it appears in the `failed` array.
[After getting your alerts](#get-alerts), you can acknowledge any number of active alerts in one call. If the alert is already in an ERROR, COMPLETED, or ACKNOWLEDGED state, it appears in the `failed` array.
#### Request
@ -882,6 +900,8 @@ POST _plugins/_alerting/monitors/<monitor-id>/_acknowledge/alerts
---
## Create destination
Introduced 1.0
{: .label .label-purple }
#### Requests
@ -956,6 +976,8 @@ POST _plugins/_alerting/destinations
---
## Update destination
Introduced 1.0
{: .label .label-purple }
#### Request
@ -991,6 +1013,8 @@ PUT _plugins/_alerting/destinations/<destination-id>
---
## Get destination
Introduced 1.0
{: .label .label-purple }
Retrieve one destination.
@ -1036,6 +1060,8 @@ GET _plugins/_alerting/destinations/<destination-id>
---
## Get destinations
Introduced 1.0
{: .label .label-purple }
Retrieve all destinations.
@ -1081,6 +1107,8 @@ GET _plugins/_alerting/destinations
---
## Delete destination
Introduced 1.0
{: .label .label-purple }
#### Request
@ -1110,6 +1138,8 @@ DELETE _plugins/_alerting/destinations/<destination-id>
---
## Create email account
Introduced 1.0
{: .label .label-purple }
#### Request
```json
@ -1142,6 +1172,8 @@ POST _plugins/_alerting/destinations/email_accounts
```
## Update email account
Introduced 1.0
{: .label .label-purple }
#### Request
```json
@ -1173,6 +1205,8 @@ PUT _plugins/_alerting/destinations/email_accounts/<email_account_id>
```
## Get email account
Introduced 1.0
{: .label .label-purple }
#### Request
```json
@ -1204,6 +1238,8 @@ GET _plugins/_alerting/destinations/email_accounts/<email_account_id>
```
## Delete email account
Introduced 1.0
{: .label .label-purple }
#### Request
```
@ -1230,6 +1266,8 @@ DELETE _plugins/_alerting/destinations/email_accounts/<email_account_id>
```
## Search email account
Introduced 1.0
{: .label .label-purple }
#### Request
@ -1296,6 +1334,8 @@ POST _plugins/_alerting/destinations/email_accounts/_search
---
## Create email group
Introduced 1.0
{: .label .label-purple }
#### Request
@ -1330,6 +1370,8 @@ POST _plugins/_alerting/destinations/email_groups
```
## Update email group
Introduced 1.0
{: .label .label-purple }
#### Request
@ -1363,6 +1405,8 @@ PUT _plugins/_alerting/destinations/email_groups/<email_group_id>
```
## Get email group
Introduced 1.0
{: .label .label-purple }
#### Request
```json
@ -1395,6 +1439,8 @@ GET _plugins/_alerting/destinations/email_groups/<email_group_id>
```
## Delete email group
Introduced 1.0
{: .label .label-purple }
#### Request
```
@ -1421,6 +1467,8 @@ DELETE _plugins/_alerting/destinations/email_groups/<email_group_id>
```
## Search email group
Introduced 1.0
{: .label .label-purple }
#### Request

View File

@ -4,9 +4,6 @@ title: Cron
nav_order: 20
parent: Alerting
has_children: false
redirect_from:
- /alerting/cron/
- /docs/alerting/cron/
---
# Cron expression reference

View File

@ -34,7 +34,7 @@ Destination | A reusable location for an action, such as Amazon Chime, Slack, or
1. Specify a name for the destination so that you can identify it later.
1. For **Type**, choose Slack, Amazon Chime, custom webhook, or [email](#email-as-a-destination).
For Email type, refer to [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. For more information about webhooks, see the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html).
For Email type, refer to [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. For more information about webhooks, see the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Amazon Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html).
For custom webhooks, you must specify more information: parameters and headers. For example, if your endpoint requires basic authentication, you might need to add a header with a key of `Authorization` and a value of `Basic <Base64-encoded-credential-string>`. You might also need to change `Content-Type` to whatever your webhook requires. Popular values are `application/json`, `application/xml`, and `text/plain`.
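For example, you can generate the Base64-encoded credential string for a basic authentication header like this (the username and password here are placeholders):
```bash
echo -n 'webhook-user:webhook-password' | base64
```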
@ -296,7 +296,7 @@ Variable | Data Type | Description
`ctx.trigger.actions.destination_id`| String | The alert destination's ID.
`ctx.trigger.actions.message_template.source` | String | The message to send in the alert.
`ctx.trigger.actions.message_template.lang` | String | The scripting language used to define the message. Must be Mustache.
`ctx.trigger.actions.throttle_enabled` | Boolean | Whether throttling is enabled for this trigger. See [adding actions](#add-actions/) for more information about throttling.
`ctx.trigger.actions.throttle_enabled` | Boolean | Whether throttling is enabled for this trigger. See [adding actions](#add-actions) for more information about throttling.
`ctx.trigger.actions.subject_template.source` | String | The message's subject in the alert.
`ctx.trigger.actions.subject_template.lang` | String | The scripting language used to define the subject. Must be Mustache.

View File

@ -6,6 +6,8 @@ nav_order: 1
---
# Performance Analyzer API
Introduced 1.0
{: .label .label-purple }
Performance Analyzer uses a single HTTP method and URI for most requests:

View File

@ -1,6 +1,6 @@
---
layout: default
title: Create Dashboards
title: Create PerfTop Dashboards
parent: Performance Analyzer
nav_order: 2
---

View File

@ -48,11 +48,11 @@ Otherwise, just specify the OpenSearch endpoint:
./opensearch-perf-top-macos --dashboard dashboards/<dashboard>.json --endpoint my-cluster.my-domain.com
```
PerfTop has four pre-built dashboards in the `dashboards` directory, but you can also [create your own]({{site.url}}{{site.baseurl}}/dashboards/).
PerfTop has four pre-built dashboards in the `dashboards` directory, but you can also [create your own]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/dashboards/).
You can also load the pre-built dashboards (ClusterOverview, ClusterNetworkMemoryAnalysis, ClusterThreadAnalysis, or NodeAnalysis) without the JSON files, such as `--dashboard ClusterThreadAnalysis`.
PerfTop has no interactivity. Start the application, monitor the dashboard, and press esc, q, or Ctrl + C to quit.
PerfTop has no interactivity. Start the application, monitor the dashboard, and press Esc, Q, or Ctrl + C to quit.
{: .note }

View File

@ -660,7 +660,7 @@ GET opensearch_dashboards_sample_data_logs/_search
```
The `ip_range` aggregation is for IP addresses.
It works on `ip` type fields. You can define the IP ranges and masks in the [CIDR](http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) notation.
It works on `ip` type fields. You can define the IP ranges and masks in the [CIDR](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) notation.
```json
GET opensearch_dashboards_sample_data_logs/_search
@ -1026,7 +1026,7 @@ GET opensearch_dashboards_sample_data_logs/_search
The `geohash_grid` aggregation buckets documents for geographical analysis. It organizes a geographical region into a grid of smaller regions of different sizes or precisions. Lower values of precision represent larger geographical areas and higher values represent smaller, more precise geographical areas.
The number of results returned by a query might be far too many to display each geo point individually on a map. The `geohash_grid` aggregation buckets nearby geo points together by calculating the Geohash for each point, at the level of precision that you define (between 1 to 12; the default is 5). To learn more about Geohash, see [Wikipedia](http://en.wikipedia.org/wiki/Geohash).
The number of results returned by a query might be far too many to display each geo point individually on a map. The `geohash_grid` aggregation buckets nearby geo points together by calculating the Geohash for each point, at the level of precision that you define (between 1 to 12; the default is 5). To learn more about Geohash, see [Wikipedia](https://en.wikipedia.org/wiki/Geohash).
The web logs example data is spread over a large geographical area, so you can use a lower precision value. You can zoom in on this map by increasing the precision value:

View File

@ -329,10 +329,4 @@ You can then use the [Index State Management (ISM)]({{site.url}}{{site.baseurl}}
## Next steps
If you are using the security plugin, the previous request to `_cat/nodes?v` might have failed with an initialization error. To initialize the plugin, run `opensearch/plugins/opensearch-security/tools/securityadmin.sh`. A sample command that uses the demo certificates might look like this:
```bash
sudo ./securityadmin.sh -cd {{site.url}}{{site.baseurl}}/securityconfig/ -icl -nhnv -cacert /etc/opensearch/root-ca.pem -cert /etc/opensearch/kirk.pem -key /etc/opensearch/kirk-key.pem -h <private-ip>
```
For full guidance around configuration options, see [Security configuration]({{site.url}}{{site.baseurl}}/security-plugin/configuration/index/).
If you are using the security plugin, the previous request to `_cat/nodes?v` might have failed with an initialization error. For full guidance around using the security plugin, see [Security configuration]({{site.url}}{{site.baseurl}}/security-plugin/configuration/index/).

View File

@ -69,4 +69,12 @@ You can edit the `OPENSEARCH_PATH_CONF=/etc/opensearch` to change the config dir
If you set your customized `OPENSEARCH_PATH_CONF` variable, be aware that other default environment variables will not be loaded.
You don't mark settings in `opensearch.yml` as persistent or transient, and settings use the flat form:
```yml
cluster.name: my-application
action.auto_create_index: true
compatibility.override_main_response_version: true
```
The demo configuration includes a number of settings for the security plugin that you should modify before using OpenSearch for a production workload. To learn more, see [Security]({{site.url}}{{site.baseurl}}/security-plugin/).

View File

@ -6,50 +6,14 @@ nav_order: 13
# Data streams
If you're ingesting continuously generated time-series data such as logs, events, and metrics into OpenSearch, you're likely in a scenario where:
If you're ingesting continuously generated time-series data such as logs, events, and metrics into OpenSearch, you're likely in a scenario where the number of documents grows rapidly and you don't need to update older documents.
- You're ingesting documents that grow rapidly.
- You don't need to update older documents.
- Your searches generally target the newer documents.
A typical workflow to manage time-series data involves multiple steps, such as creating a rollover index alias, defining a write index, and defining common mappings and settings for the backing indices.
A typical workflow to manage time-series data is as follows:
Data streams simplify this process and enforce a setup that best suits time-series data, such as being designed primarily for append-only data and ensuring that each document has a timestamp field.
- To split your data into an index for each day, use the rollover operation.
- To perform searches on a virtual index name that gets expanded to the underlying indices, create an [index alias]({{site.url}}{{site.baseurl}}/opensearch/index-alias/).
- To perform a write operation on an index alias, configure the latest index as the write index.
- To configure new indices, extract common mappings and settings into an [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/).
A data stream is internally composed of multiple backing indices. Search requests are routed to all the backing indices, while indexing requests are routed to the latest write index. [ISM]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) policies let you automatically handle index rollovers or deletions.
Even after you perform all these operations, you're still not enforcing the best practices when dealing with time-series data. For example, you can modify the indices directly. You're able to ingest documents without a timestamp field, which might result in slower queries.
Data streams abstract the complexity and enforce the best practices for managing time-series data.
With data streams, you can store append-only time-series data across multiple indices with a single endpoint for ingesting and searching data. We recommend using data streams in place of index aliases for time-series data.
## About data streams
A data stream consists of one or more hidden auto-generated backing indices. These backing indices are named using the following convention:
```
.ds-<data-stream-name>-<generation-id>
```
For example, `.ds-logs-redis-000003`, where generation-id is a six-digit, zero-padded integer that acts as a cumulative count of the data stream's rollovers, starting at `000001`.
The most recently created backing index is the data stream's write index. You can't add documents directly to any of the backing indices. You can only add them via the data stream handle:
![data stream indexing diagram]({{site.url}}{{site.baseurl}}/images/data_stream_indexing.png)
The data stream routes search requests to all of its backing indices. It uses the timestamp field to intelligently route search requests to the right set of indices and shards:
![data stream indexing diagram]({{site.url}}{{site.baseurl}}/images/data_stream_searching.png)
The following operations are not supported on the write index because they might hinder the indexing operation:
- close
- clone
- delete
- shrink
- split
## Get started with data streams
@ -264,7 +228,7 @@ POST logs-redis/_rollover
If you now perform a `GET` operation on the `logs-redis` data stream, you see that the generation ID is incremented from 1 to 2.
You can also set up an [Index State Management (ISM) policy]({{site.url}}{{site.baseurl}}/ism/policies/) to automate the rollover process for the data stream.
You can also set up an [Index State Management (ISM) policy]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/) to automate the rollover process for the data stream.
The ISM policy is applied to the backing indices at the time of their creation. When you associate a policy to a data stream, it only affects the future backing indices of that data stream.
You also don't need to provide the `rollover_alias` setting, because the ISM policy infers this information from the backing index.
@ -276,7 +240,7 @@ To manage data streams from OpenSearch Dashboards, open **OpenSearch Dashboards*
You see a toggle switch for data streams that you can use to show or hide indices belonging to a data stream.
When you enable this switch, you see a data stream multi-select dropdown menu that you can use for filtering data streams.
You also see a data stream column that shows you the name of the parent data stream the index is contained in.
You also see a data stream column that shows you the name of the data stream the index is contained in.
![data stream toggle]({{site.url}}{{site.baseurl}}/images/data_streams_toggle.png)
@ -298,4 +262,4 @@ You can use wildcards to delete more than one data stream.
We recommend deleting data from a data stream using an ISM policy.
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/async/index/) and [SQL]({{site.url}}{{site.baseurl}}/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/) and [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.

View File

@ -328,71 +328,6 @@ GET logs-2020-01-01
}
```
### Simulate multi-component templates
For index templates composed of multiple component templates, you can simulate applying a new template to verify whether the settings are applied as you expect.
To simulate the settings that would be applied to a specific index name:
```json
POST _index_template/_simulate_index/<index_name>
```
To simulate the settings that would be applied from an existing template:
```json
POST _index_template/_simulate/<index_template>
```
You can also specify a template definition in the simulate request:
```json
POST _index_template/_simulate
{
"index_patterns": [
"logs-2020-01-*"
],
"template": {
"settings" : {
"index.number_of_shards" : 3
}
},
"composed_of": ["component_template_1", "component_template_2"]
}
```
The `_simulate` API returns the final settings, mappings, and aliases that will be applied to indices that match the index pattern. You can also see any overlapping templates whose configuration is superseded by the simulated template body or higher priority templates:
```json
{
"template" : {
"settings" : {
"index" : {
"number_of_shards" : "3"
}
},
"mappings" : {
"properties" : {
"@timestamp" : {
"type" : "date"
},
"ip_address" : {
"type" : "ip"
}
}
},
"aliases" : { }
},
"overlapping" : [
{
"name" : "daily_logs",
"index_patterns" : [
"logs-2020-01-*"
]
}
]
}
```
## Index template options

View File

@ -5,9 +5,12 @@ nav_order: 1
has_children: false
has_toc: false
redirect_from:
- /docs/opensearch/
- /opensearch/
---
{%- comment -%}The `/docs/opensearch/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Introduction to OpenSearch
OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indices, boost fields, rank results by score, sort results by field, and aggregate results.

133
_opensearch/install/helm.md Normal file
View File

@ -0,0 +1,133 @@
---
layout: default
title: Helm
parent: Install OpenSearch
nav_order: 6
---
# Run OpenSearch using Helm
Helm is a package manager that allows you to easily install and manage OpenSearch in a Kubernetes cluster. You can define your OpenSearch configurations in a YAML file and use Helm to deploy your applications in a version-controlled and reproducible way.
The Helm chart contains the resources described in the following table.
Resource | Description
:--- | :---
`Chart.yaml` | Information about the chart.
`values.yaml` | Default configuration values for the chart.
`templates` | Templates that combine with values to generate the Kubernetes manifest files.
The specification in the default Helm chart supports many standard use cases and setups. You can modify the default chart to configure your desired specifications and set Transport Layer Security (TLS) and role-based access control (RBAC).
For information about the default configuration, steps to configure security, and configurable parameters, see the
[README](https://github.com/opensearch-project/helm-charts/tree/main/charts).
The instructions here assume you have a Kubernetes cluster with Helm preinstalled. See the [Kubernetes documentation](https://kubernetes.io/docs/setup/) for steps to configure a Kubernetes cluster and the [Helm documentation](https://helm.sh/docs/intro/install/) to install Helm.
{: .note }
## Prerequisites
The default Helm chart deploys a three-node cluster. We recommend that you have at least 8 GiB of memory available for this deployment. You can expect the deployment to fail if, say, you have less than 4 GiB of memory available.
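One way to check how much memory your Kubernetes nodes can allocate is to inspect the `Allocatable` section of the node descriptions (a rough check; the output format can vary by Kubernetes version):
```bash
kubectl describe nodes | grep -A 5 "Allocatable:"
```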
## Install OpenSearch using Helm
1. Clone the [helm-charts](https://github.com/opensearch-project/helm-charts) repository:
```bash
git clone https://github.com/opensearch-project/helm-charts
```
1. Change to the `opensearch` directory:
```bash
cd helm-charts/charts/opensearch
```
1. Package the Helm chart:
```bash
helm package .
```
1. Deploy OpenSearch:
```bash
helm install --generate-name opensearch-1.0.0.tgz
```
The output shows you the specifications instantiated from the install.
To customize the deployment, pass in the values that you want to override with a custom YAML file:
```bash
helm install --values=customvalues.yaml opensearch-1.0.0.tgz
```
#### Sample output
```yaml
NAME: opensearch-1-1629223146
LAST DEPLOYED: Tue Aug 17 17:59:07 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Watch all cluster members come up.
$ kubectl get pods --namespace=default -l app=opensearch-cluster-master -w
```
To make sure your OpenSearch pod is up and running, run the following command:
```bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
opensearch-cluster-master-0 1/1 Running 0 3m56s
opensearch-cluster-master-1 1/1 Running 0 3m56s
opensearch-cluster-master-2 1/1 Running 0 3m56s
```
To access the OpenSearch shell:
```bash
$ kubectl exec -it opensearch-cluster-master-0 -- /bin/bash
```
You can send requests to the pod to verify that OpenSearch is up and running:
```json
$ curl -XGET https://localhost:9200 -u 'admin:admin' --insecure
{
"name" : "opensearch-cluster-master-1",
"cluster_name" : "opensearch-cluster",
"cluster_uuid" : "hP2gq5bPS3SLp8Z7wXm8YQ",
"version" : {
"distribution" : "opensearch",
"number" : "1.0.0",
"build_type" : "tar",
"build_hash" : "34550c5b17124ddc59458ef774f6b43a086522e3",
"build_date" : "2021-07-02T23:22:21.383695Z",
"build_snapshot" : false,
"lucene_version" : "8.8.2",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "The OpenSearch Project: https://opensearch.org/"
}
```
## Uninstall using Helm
To identify the OpenSearch deployment that you want to delete:
```bash
$ helm list
NAME                     NAMESPACE  REVISION  UPDATED                                  STATUS    CHART             APP VERSION
opensearch-1-1629223146  default    1         2021-08-17 17:59:07.664498239 +0000 UTC  deployed  opensearch-1.0.0  1.0.0
```
To delete or uninstall a deployment, run the following command:
```bash
helm delete opensearch-1-1629223146
```
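To confirm that the OpenSearch pods were removed, you can list pods with the label used earlier on this page (assuming the default namespace):
```bash
kubectl get pods --namespace=default -l app=opensearch-cluster-master
```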
For steps to install OpenSearch Dashboards, see [Helm to install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/install/helm/).

View File

@ -7,13 +7,13 @@ nav_order: 70
# Important settings
For production workloads, make sure the [Linux setting](https://www.kernel.org/doc/Documentation/sysctl/vm.txt) `vm.max_map_count` is set to at least 262144. On the OpenSearch Docker image, this setting is the default. To check, start a Bash session in the container and run:
For production workloads, make sure the [Linux setting](https://www.kernel.org/doc/Documentation/sysctl/vm.txt) `vm.max_map_count` is set to at least 262144. Even if you use the Docker image, set this value on the *host machine*. To check the current value, run this command:
```bash
cat /proc/sys/vm/max_map_count
```
To increase this value, you have to modify the Docker image. For other install types, add this setting to the host machine's `/etc/sysctl.conf` file with the following line:
To increase the value, add the following line to `/etc/sysctl.conf`:
```
vm.max_map_count=262144
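# To apply the change immediately without a reboot, you can also run:
#   sudo sysctl -w vm.max_map_count=262144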

View File

@ -3,7 +3,6 @@ layout: default
title: Install OpenSearch
nav_order: 2
redirect_from:
- /docs/install/
- /opensearch/install/
has_children: true
---

View File

@ -29,6 +29,23 @@ If you don't want to use the all-in-one OpenSearch installation options, you can
</tr>
</thead>
<tbody>
<tr>
<td>1.0.1</td>
<td>
<pre>opensearch-alerting 1.0.0.0
opensearch-anomaly-detection 1.0.0.0
opensearch-asynchronous-search 1.0.0.0
opensearch-index-management 1.0.1.0
opensearch-job-scheduler 1.0.0.0
opensearch-knn 1.0.0.0
opensearch-notebooks 1.0.0.0
opensearch-performance-analyzer 1.0.1.0
opensearch-reports-scheduler 1.0.0.0
opensearch-security 1.0.1.0
opensearch-sql 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0</td>
<td>
@ -43,39 +60,6 @@ opensearch-performance-analyzer 1.0.0.0
opensearch-reports-scheduler 1.0.0.0
opensearch-security 1.0.0.0
opensearch-sql 1.0.0.0
</pre>
</td>
</tr>
<tr>
<td>1.0.0-rc1</td>
<td>
<pre>opensearch-alerting 1.0.0.0-rc1
opensearch-anomaly-detection 1.0.0.0-rc1
opensearch-asynchronous-search 1.0.0.0-rc1
opensearch-index-management 1.0.0.0-rc1
opensearch-job-scheduler 1.0.0.0-rc1
opensearch-knn 1.0.0.0-rc1
opensearch-notebooks 1.0.0.0-rc1
opensearch-performance-analyzer 1.0.0.0-rc1
opensearch-reports-scheduler 1.0.0.0-rc1
opensearch-security 1.0.0.0-rc1
opensearch-sql 1.0.0.0-rc1
</pre>
</td>
</tr>
<tr>
<td>1.0.0-beta1</td>
<td>
<pre>opensearch-alerting 1.0.0.0-beta1
opensearch-anomaly-detection 1.0.0.0-beta1
opensearch-asynchronous-search 1.0.0.0-beta1
opensearch-index-management 1.0.0.0-beta1
opensearch-job-scheduler 1.0.0.0-beta1
opensearch-knn 1.0.0.0-beta1
opensearch-performance-analyzer 1.0.0.0-beta1
opensearch-reports-scheduler 1.0.0.0-beta1
opensearch-security 1.0.0.0-beta1
opensearch-sql 1.0.0.0-beta1
</pre>
</td>
</tr>

View File

@ -111,7 +111,7 @@ In a tarball installation, Performance Analyzer collects data when it is enabled
1. Launch the agent CLI:
```bash
ES_HOME="$PWD" ./bin/performance-analyzer-agent-cli
OPENSEARCH_HOME="$PWD" ./bin/performance-analyzer-agent-cli
```
1. In a separate window, enable the Performance Analyzer plugin:

View File

@ -88,8 +88,8 @@ GET opensearch_dashboards_sample_data_ecommerce/_search
}
```
The cardinality count is approximate.
If you had tens of thousands of products in your store, an accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn't scale well because it requires more memory and causes high latency.
Cardinality count is approximate.
If you have tens of thousands of products in your hypothetical store, an accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn't scale well; it requires huge amounts of memory and can cause high latencies.
You can control the trade-off between memory and accuracy with the `precision_threshold` setting. This setting defines the threshold below which counts are expected to be close to accurate. Above this value, counts might become a bit less accurate. The default value of `precision_threshold` is 3,000. The maximum supported value is 40,000.
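As a sketch of how you might raise the threshold for the sample e-commerce data (the `products.product_id` field is an assumption about the sample mapping; substitute any field whose distinct values you want to count):
```json
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "unique_products": {
      "cardinality": {
        "field": "products.product_id",
        "precision_threshold": 10000
      }
    }
  }
}
```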

View File

@ -3,7 +3,6 @@ layout: default
title: Boolean queries
parent: Query DSL
nav_order: 45
redirect_from: /docs/opensearch/bool/
---
# Boolean queries

View File

@ -3,9 +3,6 @@ layout: default
title: Full-text queries
parent: Query DSL
nav_order: 40
redirect_from:
- /docs/opensearch/full-text/
- /opensearch/full-text/
---
# Full-text queries
@ -24,7 +21,7 @@ This page lists all full-text query types and common options. Given the sheer nu
## Match
Creates a [boolean query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
Creates a [boolean query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
The most basic form of the query provides only a field (`title`) and a term (`wind`):
@ -129,7 +126,7 @@ GET _search
## Match boolean prefix
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -167,7 +164,7 @@ GET _search
## Match phrase
Creates a [phrase query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
Creates a [phrase query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
```json
GET _search
@ -201,7 +198,7 @@ GET _search
## Match phrase prefix
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -413,7 +410,7 @@ Option | Valid values | Description
`allow_leading_wildcard` | Boolean | Whether `*` and `?` are allowed as the first character of a search term. The default is true.
`analyze_wildcard` | Boolean | Whether OpenSearch should attempt to analyze wildcard terms. Some analyzers do a poor job at this task, so the default is false.
`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, <language>, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (e.g. "an," "but," "this") from the query string.
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. The default is 1.0.
`cutoff_frequency` | Between `0.0` and `1.0` or a positive integer | This value lets you define high and low frequency terms based on number of occurrences in the index. Numbers between 0 and 1 are treated as a percentage. For example, 0.10 is 10%. This value means that if a word occurs within the search field in more than 10% of the documents on the shard, OpenSearch considers the word "high frequency" and deemphasizes it when calculating search score.<br /><br />Because this setting is *per shard*, testing its impact on search results can be challenging unless a cluster has many documents.
`enable_position_increments` | Boolean | When true, result queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. The default is true.
@ -423,7 +420,7 @@ Option | Valid values | Description
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). <br /><br />If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common terms](#common-terms) queries and `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common terms](#common-terms) queries.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
@ -431,7 +428,7 @@ Option | Valid values | Description
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
`quote_field_suffix` | String | This option lets you search different fields depending on whether terms are wrapped in quotes. For example, if `quote_field_suffix` is `".exact"` and you search for `"lightly"` (in quotes) in the `title` field, OpenSearch searches the `title.exact` field. This second field might use a different type (e.g. `keyword` rather than `text`) or a different analyzer. The default is null.
`rewrite` | `constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N` | Determines how OpenSearch rewrites and scores multi-term queries. The default is `constant_score`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`tie_breaker` | `0.0` (default) to `1.0` | Changes the way OpenSearch scores searches. For example, a `type` of `best_fields` typically uses the highest score from any one field. If you specify a `tie_breaker` value between 0.0 and 1.0, the score changes to highest score + `tie_breaker` * score for all other matching fields. If you specify a value of 1.0, OpenSearch adds together the scores for all matching fields (effectively defeating the purpose of `best_fields`).
`time_zone` | UTC offset | The time zone to use (e.g. `-08:00`) if the query string contains a date range (e.g. `"query": "wind rises release_date[2012-01-01 TO 2014-01-01]"`). The default is `UTC`.
`type` | `best_fields, most_fields, cross-fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`.

View File

@ -5,8 +5,11 @@ nav_order: 27
has_children: true
redirect_from:
- /opensearch/query-dsl/
- /docs/opensearch/query-dsl/
---
{%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Query DSL
While you can use HTTP request parameters to perform simple searches, you can also use the OpenSearch query domain-specific language (DSL), which provides a wider range of search options. The query DSL uses the HTTP request body, so you can more easily customize your queries to get the exact results that you want.
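For example, the simplest request-body query matches all documents in an index (a minimal sketch; the index name is a placeholder):
```json
GET my-index/_search
{
  "query": {
    "match_all": {}
  }
}
```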

View File

@ -3,7 +3,6 @@ layout: default
title: Term-level queries
parent: Query DSL
nav_order: 30
redirect_from: /docs/opensearch/term/
---
# Term-level queries
@ -145,7 +144,7 @@ The search query “To be, or not to be” is analyzed and tokenized into an arr
...
```
For a list of all full-text queries, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/full-text/).
For a list of all full-text queries, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/).
If you want to query for an exact term like “HAMLET” in the speaker field and don't need the results to be sorted by relevance scores, a term-level query is more efficient:
@ -431,7 +430,7 @@ Wildcard queries tend to be slow because they need to iterate over a lot of term
## Regex
Use the `regex` query to search for terms that match a regular expression.
Use the `regexp` query to search for terms that match a regular expression.
This regular expression matches any single uppercase or lowercase letter:
@ -440,12 +439,14 @@ GET shakespeare/_search
{
"query": {
"regexp": {
"play_name": "H[a-zA-Z]+mlet"
"play_name": "[a-zA-Z]amlet"
}
}
}
```
Regular expressions are applied to the terms in the field and not the entire value of the field.
A few important notes:
The efficiency of your regular expression depends a lot on the patterns you write. Make sure that you write `regex` queries with either a prefix or suffix to improve performance.
- Regular expressions are applied to the terms in the field (i.e. tokens), not the entire field.
- Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see [the Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/index.html).
- `regexp` queries can be expensive operations and require the `search.allow_expensive_queries` setting to be set to `true`. Before making frequent `regexp` queries, test their impact on cluster performance and examine alternative queries for achieving similar results.
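If you decide to allow them, one way to enable the setting cluster-wide is through the cluster settings API. This is a sketch; use `persistent` instead of `transient` if the change should survive a restart:
```json
PUT _cluster/settings
{
  "transient": {
    "search.allow_expensive_queries": true
  }
}
```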

View File

@ -156,28 +156,6 @@ POST _reindex
}
```
## Reindex sorted documents
You can copy certain documents after sorting specific fields in the document.
This command copies the last 10 documents based on the `timestamp` field:
```json
POST _reindex
{
"size":10,
"source":{
"index":"source",
"sort":{
"timestamp":"desc"
}
},
"dest":{
"index":"destination"
}
}
```
## Transform documents during reindexing
You can transform your data during the reindexing process using the `script` option.
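A minimal sketch of what that might look like, assuming a Painless script that drops a hypothetical `timestamp` field during the copy:
```json
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "destination"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.remove('timestamp')"
  }
}
```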
@ -272,7 +250,6 @@ Option | Valid values | Description | Required
`query` | Object | The search query to use for the reindex operation. | No
`size` | Integer | The number of documents to reindex. | No
`slice` | String | Specify manual or automatic slicing to parallelize reindexing. | No
`sort` | List | Sort specific fields in the document before reindexing. | No
## Destination index options

View File

@ -2,11 +2,12 @@
layout: default
title: Alias
parent: REST API reference
grand_parent: OpenSearch
nav_order: 8
nav_order: 5
---
# Alias
Introduced 1.0
{: .label .label-purple }
An alias is a virtual pointer that you can use to reference one or more indices. Creating and updating aliases are atomic operations, so you can reindex your data and point an alias at it without any downtime.
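For example, a minimal request that points an alias at an existing index (the index and alias names are placeholders):
```json
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my-index",
        "alias": "my-alias"
      }
    }
  ]
}
```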

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat aliases
Introduced 1.0
{: .label .label-purple }
The cat aliases operation lists the mapping of aliases to indices, plus routing and filtering information.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat allocation
Introduced 1.0
{: .label .label-purple }
The cat allocation operation lists the allocation of disk space for indices and the number of shards on each node.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat count
Introduced 1.0
{: .label .label-purple }
The cat count operation lists the number of documents in your cluster.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat fielddata
Introduced 1.0
{: .label .label-purple }
The cat fielddata operation lists the memory size used by each field per node.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat health
Introduced 1.0
{: .label .label-purple }
The cat health operation lists the status of the cluster, how long the cluster has been up, the number of nodes, and other useful information that helps you analyze the health of your cluster.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat indices
Introduced 1.0
{: .label .label-purple }
The cat indices operation lists information related to indices—how much disk space they are using, how many shards they have, their health status, and so on.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat master
Introduced 1.0
{: .label .label-purple }
The cat master operation lists information that helps identify the elected master node.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat nodeattrs
Introduced 1.0
{: .label .label-purple }
The cat nodeattrs operation lists the attributes of custom nodes.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat nodes
Introduced 1.0
{: .label .label-purple }
The cat nodes operation lists node-level information, including node roles and load metrics.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat pending tasks
Introduced 1.0
{: .label .label-purple }
The cat pending tasks operation lists the progress of all pending tasks, including task priority and time in queue.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat plugins
Introduced 1.0
{: .label .label-purple }
The cat plugins operation lists the names, components, and versions of the installed plugins.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat recovery
Introduced 1.0
{: .label .label-purple }
The cat recovery operation lists all completed and ongoing index and shard recoveries.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat repositories
Introduced 1.0
{: .label .label-purple }
The cat repositories operation lists all snapshot repositories registered for a cluster.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat segments
Introduced 1.0
{: .label .label-purple }
The cat segments operation lists Lucene segment-level information for each index.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat shards
Introduced 1.0
{: .label .label-purple }
The cat shards operation lists the state of all primary and replica shards and how they are distributed.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat snapshots
Introduced 1.0
{: .label .label-purple }
The cat snapshots operation lists all snapshots for a repository.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat tasks
Introduced 1.0
{: .label .label-purple }
The cat tasks operation lists the progress of all tasks currently running on your cluster.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat templates
Introduced 1.0
{: .label .label-purple }
The cat templates operation lists the names, patterns, order numbers, and version numbers of index templates.

View File

@ -8,6 +8,8 @@ has_children: false
---
# cat thread pool
Introduced 1.0
{: .label .label-purple }
The cat thread pool operation lists the active, queued, and rejected threads of different thread pools on each node.
@ -47,5 +49,5 @@ master_timeout | Time | The amount of time to wait for a connection to the maste
node_name name active queue rejected
odfe-node2 ad-batch-task-threadpool 0 0 0
odfe-node2 ad-threadpool 0 0 0
odfe-node2 analyze 0 0 0s
odfe-node2 analyze 0 0 0s
```

View File

@ -6,6 +6,8 @@ nav_order: 10
---
# Cluster allocation explain
Introduced 1.0
{: .label .label-purple }
The most basic cluster allocation explain request finds an unassigned shard and explains why it can't be allocated to a node.
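For example, calling the API with no request body reports on the first unassigned shard it finds:
```json
GET _cluster/allocation/explain
```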

View File

@ -6,6 +6,8 @@ nav_order: 15
---
# Cluster health
Introduced 1.0
{: .label .label-purple }
The most basic cluster health request returns a simple status of the health of your cluster. OpenSearch expresses cluster health in three colors: green, yellow, and red. A green status means all primary shards and their replicas are allocated to nodes. A yellow status means all primary shards are allocated to nodes, but some replicas aren't. A red status means at least one primary shard is not allocated to any node.
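That basic request takes no body or parameters:
```json
GET _cluster/health
```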

View File

@ -6,6 +6,8 @@ nav_order: 20
---
# Cluster settings
Introduced 1.0
{: .label .label-purple }
The cluster settings operation lets you check the current settings for your cluster, review default settings, and change settings. When you update a setting using the API, OpenSearch applies it to all nodes in the cluster.
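For example, to review the current settings along with their defaults, you can add the `include_defaults` flag:
```json
GET _cluster/settings?include_defaults=true
```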

View File

@ -0,0 +1,103 @@
---
layout: default
title: Count
parent: REST API reference
nav_order: 150
---
# Count
Introduced 1.0
{: .label .label-purple }
The count API gives you quick access to the number of documents that match a query.
You can also use it to check the document count of an index, data stream, or cluster.
## Example
To see the number of documents that match a query:
```json
GET opensearch_dashboards_sample_data_logs/_count
{
"query": {
"term": {
"response": "200"
}
}
}
```
The following call to the search API produces equivalent results:
```json
GET opensearch_dashboards_sample_data_logs/_search
{
"query": {
"term": {
"response": "200"
}
},
"size": 0,
"track_total_hits": true
}
```
To see the number of documents in an index:
```json
GET opensearch_dashboards_sample_data_logs/_count
```
To check for the number of documents in a [data stream]({{site.url}}{{site.baseurl}}/opensearch/data-streams/), replace the index name with the data stream name.
To see the number of documents in your cluster:
```json
GET _count
```
Alternatively, you could use the [cat indices]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-indices/) and [cat count]({{site.url}}{{site.baseurl}}/opensearch/rest-api/cat/cat-count/) APIs to see the number of documents per index or data stream.
{: .note }
## Path and HTTP methods
```
GET <target>/_count/<id>
POST <target>/_count/<id>
```
## URL parameters
All count parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
`allow_no_indices` | Boolean | If false, the request returns an error if any wildcard expression or index alias targets any closed or missing indices. Default is false.
`analyzer` | String | The analyzer to use in the query string.
`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
`default_operator` | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
`df` | String | The default field in case a field prefix is not provided in the query string.
`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`.
`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indices in the response. Default is false.
`lenient` | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
`min_score` | Float | Include only documents with a minimum `_score` value in the result.
`routing` | String | Value used to route the operation to a specific shard.
`preference` | String | Specifies which shard or node OpenSearch should perform the count operation on.
`terminate_after` | Integer | The maximum number of documents OpenSearch should process before terminating the request.
## Response
```json
{
"count" : 14074,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
```

View File

@ -2,11 +2,12 @@
layout: default
title: Create index
parent: REST API reference
grand_parent: OpenSearch
nav_order: 7
nav_order: 3
---
# Create index
Introduced 1.0
{: .label .label-purple }
While you can create an index by using a document as a base, you can also just create an empty index for use later.
@ -82,9 +83,8 @@ index.number_of_routing_shards | The number of routing shards used to split an i
index.shard.check_on_startup | Whether the index's shards should be checked for corruption. Available options are `false` (do not check for corruption), `checksum` (check for physical corruption), and `true` (check for both physical and logical corruption). Default is `false`.
index.codec | The compression type to use to compress stored data. Available values are `best_compression` and `default`.
index.routing_partition_size | The number of shards a custom routing value can go to. Routing helps an imbalanced cluster by relocating values to a subset of shards rather than just a single shard. To enable, set this value to greater than 1 but less than `index.number_of_shards`. Default is 1.
index.soft_deletes_enabled | Whether to retain a history of operations performed on the index.
index.soft_deletes.retention_lease.period | The maximum amount of time to retain a shard's history of operations. Default is `12h`.
index.load_fixed_bitset_filters_eagerly | Indicates whether cached filters should be pre-loaded for nested queries. Available options are `true` and `false`. Default is `true`.
index.load_fixed_bitset_filters_eagerly | Whether OpenSearch should pre-load cached filters. Available options are `true` and `false`. Default is `true`.
index.hidden | Whether the index should be hidden. Hidden indices are not returned as part of queries that have wildcards. Available options are `true` and `false`. Default is `false`.
#### Dynamic index Settings
@ -92,28 +92,29 @@ index.hidden | Whether the index should be hidden. Hidden indices are not return
Setting | Description
:--- | :---
index.number_of_replicas | The number of replica shards each primary shard should have. For example, if you have 4 primary shards and set `index.number_of_replicas` to 3, the index has 12 replica shards. Default is 1.
index.auto_expand_replicas | Automatically expands the number of replicas based on the number of data nodes in the cluster. Specify a lower bound and upper limit (for example, 0-9), or `all` for the upper limit. For example, if you have 5 data nodes and set `index.auto_expand_replicas` to 0-3, then the cluster does not automatically add another replica shard. However, if you set this value to `0-all` and add 2 more nodes for a total of 7, the cluster expands to 6 replica shards. Default is disabled.
index.auto_expand_replicas | Whether the cluster should automatically add replica shards based on the number of data nodes. Specify a lower bound and upper limit (for example, 0-9), or `all` for the upper limit. For example, if you have 5 data nodes and set `index.auto_expand_replicas` to 0-3, then the cluster does not automatically add another replica shard. However, if you set this value to `0-all` and add 2 more nodes for a total of 7, the cluster expands to 6 replica shards. Default is disabled.
index.search.idle.after | Amount of time a shard should wait for a search or get request until it goes idle. Default is `30s`.
index.refresh_interval | How often the index should refresh, which publishes its most recent changes and makes them available for searching. Can be set to `-1` to disable refreshing. Default is `1s`.
index.max_result_window | The maximum value of `from` + `size` for searches to the index. `from` is the starting index to search from, and `size` is the amount of results to return. Default: 10000.
index.max_inner_result_window | The maximum value of `from` + `size` for inner hits definitions and top hits aggregations to the index. `from` is the starting index to search from, and `size` is the amount of top hits to return. Default is 100.
index.max_inner_result_window | Maximum value of `from` + `size` to return nested search hits and most relevant document aggregated during the query. `from` is the starting index to search from, and `size` is the amount of top hits to return. Default is 100.
index.max_rescore_window | The maximum value of `window_size` for rescore requests to the index. Rescore requests reorder the index's documents and return a new score, which can be more precise. Default is the same as index.max_inner_result_window or 10000 by default.
index.max_docvalue_fields_search | Maximum amount of `docvalue_fields` allowed in a query. Default is 100.
index.max_script_fields | Maximum amount of `script_fields` allowed in a query. Default is 32.
index.max_ngram_diff | Maximum allowed difference between `min_gram` and `max_gram` values for `NGramTokenizer` and `NGramTokenFilter`. Default is 1.
index.max_shingle_diff | Maximum allowed difference between `max_shingle_size` and `min_shingle_size` for the `shingle` token filter. Default is 3.
index.max_refresh_listeners | Maximum amount of refresh listeners available on the index's shards.
index.max_ngram_diff | Maximum difference between `min_gram` and `max_gram` values for `NGramTokenizer` and `NGramTokenFilter` fields. Default is 1.
index.max_shingle_diff | Maximum difference between `max_shingle_size` and `min_shingle_size` to feed into the `shingle` token filter. Default is 3.
index.max_refresh_listeners | Maximum amount of refresh listeners each shard is allowed to have.
index.analyze.max_token_count | Maximum amount of tokens that can return from the `_analyze` API operation. Default is 10000.
index.highlight.max_analyzed_offset | The maximum amount of characters that will be analyzed in a highlight request. Default is 1000000.
index.max_terms_count | The maximum amount of terms that can be used in a terms query. Default is 65536.
index.highlight.max_analyzed_offset | The amount of characters a highlight request can analyze. Default is 1000000.
index.max_terms_count | The maximum amount of terms a terms query can accept. Default is 65536.
index.max_regex_length | The maximum character length of regex that can be in a regexp query. Default is 1000.
index.query.default_field | A field or list of fields that OpenSearch uses in queries in case a field isn't specified in the parameters.
index.routing.allocation.enable | Specifies options for the index's shard allocation. Available options are `all` (allow shard allocation for all shards), `primaries` (allow shard allocation only for primary shards), `new_primaries` (allow shard allocaiton only for new primary shards), and `none` (do not allow shard allocation). Default is `all`.
index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow shard rebalancing for all shards), `primaries`, (allow shard rebalancing only for primary shards), `replicas` (allow shard rebalancing only for replicas), and `none` (do not allow shard rebalancing). Default is `all`.
index.routing.allocation.enable | Specifies options for the index's shard allocation. Available options are all (allow allocation for all shards), primaries (allow allocation only for primary shards), new_primaries (allow allocation only for new primary shards), and none (do not allow allocation). Default is all.
index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow rebalancing for all shards), `primaries` (allow rebalancing only for primary shards), `replicas` (allow rebalancing only for replicas), and `none` (do not allow rebalancing). Default is `all`.
index.gc_deletes | Amount of time to retain a deleted document's version number. Default is `60s`.
index.default_pipeline | The default ingest node pipeline for the index. If the default pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
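Because these settings are dynamic, you can change them on a live index through the `_settings` endpoint. A minimal sketch (the index name and values are placeholders):
```json
PUT my-index/_settings
{
  "index": {
    "number_of_replicas": 2,
    "refresh_interval": "30s"
  }
}
```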
### Mappings
Mappings define how a document and its fields are stored and indexed. If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and their fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings.
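A minimal sketch of an explicit mapping supplied at index creation time (the index name and fields are placeholders):
```json
PUT my-index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "year": { "type": "integer" }
    }
  }
}
```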

View File

@ -7,6 +7,8 @@ nav_order: 20
---
# Bulk
Introduced 1.0
{: .label .label-purple }
The bulk operation lets you add, update, or delete many documents in a single request. Compared to individual OpenSearch indexing requests, the bulk operation has significant performance benefits. Whenever practical, we recommend batching indexing operations into bulk requests.
@ -30,7 +32,7 @@ POST _bulk
```
POST _bulk
POST {index}/_bulk
POST <index>/_bulk
```
Specifying the index in the path means you don't need to include it in the [request body]({{site.url}}{{site.baseurl}}/opensearch/rest-api/document-apis/bulk/#request-body).
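For example, a small sketch of a bulk body that indexes one document and deletes another (the index name and IDs are placeholders; each action and each document must be on its own line):
```json
POST movies/_bulk
{ "index": { "_id": "1" } }
{ "title": "The Wind Rises" }
{ "delete": { "_id": "2" } }
```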

View File

@ -0,0 +1,122 @@
---
layout: default
title: Delete by query
parent: Document APIs
grand_parent: REST API reference
nav_order: 40
---
# Delete by query
Introduced 1.0
{: .label .label-purple}
You can include a query as part of your delete request so OpenSearch deletes all documents that match that query.
## Example
```json
POST sample-index1/_delete_by_query
{
"query": {
"match": {
"movie-length": "124"
}
}
}
```
## Path and HTTP methods
```
POST <index>/_delete_by_query
```
## URL parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
&lt;index&gt; | String | Name or list of the data streams, indices, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indices.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`.
analyzer | String | The analyzer to use in the query string.
analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
conflicts | String | Indicates to OpenSearch what should happen if the delete by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.
default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
df | String | The default field in case a field prefix is not provided in the query string.
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indices), `closed` (match closed, non-hidden indices), `hidden` (match hidden indices), and `none` (deny wildcard expressions). Default is `open`.
from | Integer | The starting index to search from. Default is 0.
ignore_unavailable | Boolean | Specifies whether to include missing or closed indices in the response. Default is false.
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
max_docs | Integer | How many documents the delete by query operation should process at most. Default is all documents.
preference | String | Specifies which shard or node OpenSearch should perform the delete by query operation on.
q | String | Lucene query string's query.
request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether it's enabled in the index's settings.
refresh | Boolean | If true, OpenSearch refreshes shards to make the delete by query operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
requests_per_second | Integer | Specifies the request's throttling in sub-requests per second. Default is -1, which means no throttling.
routing | String | Value used to route the operation to a specific shard.
scroll | Time | Amount of time the search context should be open.
scroll_size | Integer | Size of the operation's scroll requests. Default is 1000.
search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It's usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It's usually slower but more accurate. Default is `query_then_fetch`.
search_timeout | Time | How long to wait until OpenSearch deems the request timed out. Default is no timeout.
sort | String | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
_source | String | Specifies whether to include the `_source` field in the response.
_source_excludes | String | A comma-separated list of source fields to exclude from the response.
_source_includes | String | A comma-separated list of source fields to include in the response.
stats | String | Value to associate with the request for additional logging.
terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request.
timeout | Time | How long the operation should wait from a response from active shards. Default is `1m`.
version | Boolean | Whether to include the document version as a match.
wait_for_active_shards | String | The number of shards that must be active before OpenSearch executes the operation. Valid values are `all` or any integer up to the total number of shards in the index. Default is 1, which is the primary shard.
## Request body
To search your index for specific documents, you must include a [query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) in the request body that OpenSearch uses to match documents. If you don't use a query, OpenSearch treats your delete request as a simple [delete document operation]({{site.url}}{{site.baseurl}}/opensearch/rest-api/document-apis/delete-document).
```json
{
"query": {
"match": {
"movie-length": "124"
}
}
}
```
## Response
```json
{
"took": 143,
"timed_out": false,
"total": 1,
"deleted": 1,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1.0,
"throttled_until_millis": 0,
"failures": []
}
```
## Response body fields
Field | Description
:--- | :---
took | The amount of time in milliseconds OpenSearch needed to complete the operation.
timed_out | Whether any delete requests during the operation timed out.
total | Total number of documents processed.
deleted | Total number of documents deleted.
batches | Number of scroll responses the request processed.
version_conflicts | Number of conflicts the request ran into.
noops | How many delete requests OpenSearch ignored during the operation. This field always returns 0.
retries | The number of bulk and search retry requests.
throttled_millis | Number of throttled milliseconds during the request.
requests_per_second | Number of requests executed per second during the operation.
throttled_until_millis | The amount of time until OpenSearch executes the next throttled request. Always equal to 0 in a delete by query request.
failures | Any failures that occur during the request.

View File

@ -2,10 +2,13 @@
layout: default
title: Delete document
parent: Document APIs
nav_order: 10
grand_parent: REST API reference
nav_order: 15
---
# Get document
# Delete document
Introduced 1.0
{: .label .label-purple }
If you no longer need a document in your index, you can use the delete document API operation to delete it.
@ -18,20 +21,20 @@ DELETE /sample-index1/_doc/1
## Path and HTTP methods
```
DELETE /<index-name>/_doc/<_id>
DELETE /<index>/_doc/<_id>
```
## URL parameters
Parameter | Type | Description | Required
:--- | :--- | :--- | :---
&lt;index-name&gt; | String | The index to delete from. | Yes
&lt;index&gt; | String | The index to delete from. | Yes
&lt;_id&gt; | String | The ID of the document to delete. | Yes
if_seq_no | Integer | Only perform the delete operation if the document's version number matches the specified number. | No
if_primary_term | Integer | Only perform the delete operation if the document has the specified primary term. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is false. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the delete operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. | No
routing | String | Value used to route the operation to a specific shard. | No
timeout | Time | How long to wait for a response from the cluster. | No
timeout | Time | How long to wait for a response from the cluster. Default is `1m`. | No
version | Integer | The version of the document to delete, which must match the last updated version of the document. | No
version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to delete version 3 of a document, use `/_doc/1?version=3&version_type=external`. | No
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the delete request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No

View File

@ -2,10 +2,13 @@
layout: default
title: Get document
parent: Document APIs
grand_parent: REST API reference
nav_order: 5
---
# Get document
Introduced 1.0
{: .label .label-purple }
After adding a JSON document to your index, you can use the get document API operation to retrieve the document's information and data.
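For example, assuming a document with ID 1 exists in `sample-index1`:
```json
GET sample-index1/_doc/1
```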
@ -34,9 +37,9 @@ Parameter | Type | Description
:--- | :--- | :---
preference | String | Specifies a preference of which shard to retrieve results from. Available options are `_local`, which tells the operation to retrieve results from a locally allocated shard replica, and a custom string value assigned to a specific shard replica. By default, OpenSearch executes get document operations on random shards.
realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is true.
refresh | Boolean | If true, OpenSearch refreshes shards to make the operation visible to searching. Default is false.
refresh | Boolean | If true, OpenSearch refreshes shards to make the get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
routing | String | A value used to route the operation to a specific shard.
stored_fields | Boolean | If true, the operation retrieves document fields stored in the index rather than the document's `_source`. Default is false.
stored_fields | Boolean | Whether the get operation should retrieve fields stored in the index. Default is false.
_source | String | Whether to include the `_source` field in the response body. Default is true.
_source_excludes | String | A comma-separated list of source fields to exclude in the query response.
_source_includes | String | A comma-separated list of source fields to include in the query response.

View File

@ -0,0 +1,95 @@
---
layout: default
title: Index document
parent: Document APIs
grand_parent: REST API reference
nav_order: 1
---
# Index document
Introduced 1.0
{: .label .label-purple}
Before you can search for data, you must first add documents. This operation adds a single document to your index.
## Example
```json
PUT sample-index/_doc/1
{
"Description": "To be or not to be, that is the question."
}
```
## Path and HTTP methods
```
PUT <index>/_doc/<_id>
POST <index>/_doc
PUT <index>/_create/<_id>
POST <index>/_create/<_id>
```
## URL parameters
In your request, you must specify the index you want to add your document to. If the index doesn't already exist, OpenSearch automatically creates the index and adds in your document. All other URL parameters are optional.
Parameter | Type | Description | Required
:--- | :--- | :--- | :---
&lt;index&gt; | String | Name of the index. | Yes
&lt;_id&gt; | String | A unique identifier to attach to the document. To automatically generate an ID, use `POST <target>/_doc` in your request instead of PUT. | No
if_seq_no | Integer | Only perform the index operation if the document has the specified sequence number. | No
if_primary_term | Integer | Only perform the index operation if the document has the specified primary term.| No
op_type | Enum | Specifies the type of operation to complete with the document. Valid values are `create` (create the index if it doesn't exist) and `index`. If a document ID is included in the request, then the default is `index`. Otherwise, the default is `create`. | No
pipeline | String | Route the index operation to a certain pipeline. | No
routing | String | Value used to route the index operation to a specific shard. | No
timeout | Time | How long to wait for a response from the cluster. Default is `1m`. | No
version | Integer | The document's version number. | No
version_type | Enum | Assigns a specific type to the document. Valid options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to index version 3 of a document, use `/_doc/1?version=3&version_type=external`. | No
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No
require_alias | Boolean | Specifies whether the target index must be an index alias. Default is false. | No
## Request body
Your request body must contain the information you want to index.
```json
{
"Description": "This is just a sample document"
}
```
## Response
```json
{
"_index": "sample-index",
"_type": "_doc",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
```
## Response body fields
Field | Description
:--- | :---
_index | The name of the index.
_type | The document's type. OpenSearch supports only one type, which is `_doc`.
_id | The document's ID.
_version | The document's version.
result | The result of the index operation.
_shards | Detailed information about the cluster's shards.
total | The total number of shards.
successful | The number of shards OpenSearch successfully added the document to.
failed | The number of shards OpenSearch failed to add the document to.
_seq_no | The sequence number assigned when the document was indexed.
_primary_term | The primary term assigned when the document was indexed.

View File

@ -4,6 +4,8 @@ title: Document APIs
parent: REST API reference
has_children: true
nav_order: 7
redirect_from:
- /opensearch/rest-api/document-apis/
---
# Document APIs

View File

@ -0,0 +1,142 @@
---
layout: default
title: Multi-get document
parent: Document APIs
grand_parent: REST API reference
nav_order: 30
---
# Multi-get documents
Introduced 1.0
{: .label .label-purple }
The multi-get operation allows you to execute multiple GET operations in one request, so you can get back all documents that match your criteria.
## Example without specifying index in URL
```json
GET _mget
{
"docs": [
{
"_index": "sample-index1",
"_id": "1"
},
{
"_index": "sample-index2",
"_id": "1",
"_source": {
"include": ["Length"]
}
}
]
}
```
## Example of specifying index in URL
```json
GET sample-index1/_mget
{
"docs": [
{
"_type": "_doc",
"_id": "1",
"_source": false
},
{
"_type": "_doc",
"_id": "2",
"_source": [ "Director", "Title" ]
}
]
}
```
## Path and HTTP methods
```
GET _mget
GET <index>/_mget
```
## URL parameters
All multi-get URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
&lt;index&gt; | String | Name of the index to retrieve documents from.
preference | String | Specifies the nodes or shards OpenSearch should execute the multi-get operation on. Default is random.
realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is `true`.
refresh | Boolean | If true, OpenSearch refreshes shards to make the multi-get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
routing | String | Value used to route the multi-get operation to a specific shard.
stored_fields | Boolean | Specifies whether OpenSearch should retrieve documents fields from the index instead of the document's `_source`. Default is `false`.
_source | String | Whether to include the `_source` field in the query response. Default is `true`.
_source_excludes | String | A comma-separated list of source fields to exclude in the query response.
_source_includes | String | A comma-separated list of source fields to include in the query response.
## Request body
If you don't specify an index in your request's URL, you must specify your target indices and the relevant document IDs in the request body. Other fields are optional.
Field | Type | Description | Required
:--- | :--- | :--- | :---
docs | Array | The documents you want to retrieve data from. Can contain the attributes: `_id`, `_index`, `_routing`, `_source`, and `_stored_fields`. If you specify an index in the URL, you can omit this field and add IDs of the documents to retrieve. | Yes if an index is not specified in the URL
_id | String | The ID of the document. | Yes if `docs` is specified in the request body
_index | String | Name of the index. | Yes if an index is not specified in the URL
_routing | String | The routing value of the shard that contains the document. | Yes if a routing value was used when indexing the document
_source | Object | Specifies whether to return the `_source` field from an index (boolean), whether to return specific fields (array), or whether to include or exclude certain fields. | No
_source.includes | Array | Specifies which fields to include in the query response. For example, `"_source": { "include": ["Title"] }` retrieves `Title` from the index. | No
_source.excludes | Array | Specifies which fields to exclude in the query response. For example, `"_source": { "exclude": ["Director"] }` excludes `Director` from the query response. | No
ids | Array | IDs of the documents to retrieve. Only allowed when an index is specified in the URL. | No
## Response
```json
{
"docs": [
{
"_index": "sample-index1",
"_type": "_doc",
"_id": "1",
"_version": 4,
"_seq_no": 5,
"_primary_term": 19,
"found": true,
"_source": {
"Title": "Batman Begins",
"Director": "Christopher Nolan"
}
},
{
"_index": "sample-index2",
"_type": "_doc",
"_id": "1",
"_version": 1,
"_seq_no": 6,
"_primary_term": 19,
"found": true,
"_source": {
"Title": "The Dark Knight",
"Director": "Christopher Nolan"
}
}
]
}
```
## Response body fields
Field | Description
:--- | :---
_index | The name of the index.
_type | The document's type. OpenSearch only supports one type, which is `_doc`.
_id | The document's ID.
_version | The document's version number. Updated whenever the document changes.
_seq_no | The sequence number assigned when the document is indexed.
_primary_term | The primary term assigned when the document is indexed.
found | Whether the document exists.
_routing | The shard that the document is routed to. If the document is not routed to a particular shard, this field is omitted.
_source | Contains the document's data if `found` is true. If `_source` is set to false or `stored_fields` is set to true in the URL parameters, this field is omitted.
_fields | Contains the document's data that's stored in the index. Only returned if both `stored_fields` and `found` are true.

View File

@ -0,0 +1,121 @@
---
layout: default
title: Reindex
parent: Document APIs
grand_parent: REST API reference
nav_order: 60
---
# Reindex
Introduced 1.0
{: .label .label-purple}
The reindex API operation lets you copy all or a subset of your data from a source index into a destination index.
## Example
```json
POST /_reindex
{
"source":{
"index":"my-source-index"
},
"dest":{
"index":"my-destination-index"
}
}
```
## Path and HTTP methods
```
POST /_reindex
```
## URL parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
refresh | Boolean | If true, OpenSearch refreshes shards to make the reindex operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`.
timeout | Time | How long to wait for a response from the cluster. Default is `30s`.
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the reindex request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed.
wait_for_completion | Boolean | Whether the request should block until the operation is complete. Default is `true`.
requests_per_second | Integer | Specifies the throttling rate, in sub-requests per second. Default is `-1`, which means no throttling.
require_alias | Boolean | Whether the destination index must be an index alias. Default is `false`.
scroll | Time | How long to keep the search context open. Default is `5m`.
slices | Integer | Number of sub-tasks OpenSearch should divide this task into. Default is 1, which means OpenSearch should not divide this task. Setting this parameter to `auto` indicates to OpenSearch that it should automatically decide how many slices to split the task into.
max_docs | Integer | The maximum number of documents the reindex operation should process. Default is all documents.
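As a sketch of how these parameters combine (the index names are placeholders), the following request splits the operation into automatically sized slices, throttles it to 500 sub-requests per second, and returns immediately with a task ID instead of waiting for the copy to finish:

```json
POST _reindex?slices=auto&requests_per_second=500&wait_for_completion=false
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-destination-index"
  }
}
```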
## Request body
Your request body must contain the names of the source index and destination index. All other fields are optional.
Field | Description
:--- | :---
conflicts | Indicates to OpenSearch what should happen if the reindex operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.
source | Information about the source index to include. Valid fields are `index`, `max_docs`, `query`, `remote`, `size`, `slice`, and `_source`.
index | The name of the source index to copy data from.
max_docs | The maximum number of documents to reindex.
query | The search query to use for the reindex operation.
remote | Information about a remote OpenSearch cluster to copy data from. Valid fields are `host`, `username`, `password`, `socket_timeout`, and `connect_timeout`.
host | Host URL of the OpenSearch cluster to copy data from.
username | Username to authenticate with the remote cluster.
password | Password to authenticate with the remote cluster.
socket_timeout | The wait time for socket reads. Default is `30s`.
connect_timeout | The wait time for remote connection timeouts. Default is `30s`.
size | The number of documents to reindex.
slice | Whether to manually or automatically slice the reindex operation so it executes in parallel.
_source | Whether to reindex source fields. Specify a list of fields to reindex or `true` to reindex all fields. Default is `true`.
id | The ID to associate with manual slicing.
max | Maximum number of slices.
dest | Information about the destination index. Valid values are `index`, `version_type`, and `op_type`.
index | Name of the destination index.
version_type | The indexing operation's version type. Valid values are `internal`, `external`, `external_gt` (retrieve the document if the specified version number is greater than the document's current version), and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version).
op_type | How to write documents into the destination index. Valid values are `create` (copy only documents that do not already exist in the destination index) and `index` (copy everything from the source index, overwriting documents with the same ID).
script | A script that OpenSearch uses to apply transformations to the data during the reindex operation.
source | The actual script that OpenSearch runs.
lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`.
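To see several of these fields working together, consider the following hypothetical request. The index names, the `Director` and `Title` fields, and the Painless script are placeholders; the request copies only documents that match the query, keeps two source fields, skips documents that already exist in the destination, and uppercases a field along the way:

```json
POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "my-source-index",
    "query": {
      "match": {
        "Director": "Christopher Nolan"
      }
    },
    "_source": ["Title", "Director"]
  },
  "dest": {
    "index": "my-destination-index",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.Title != null) { ctx._source.Title = ctx._source.Title.toUpperCase(); }"
  }
}
```

Because `op_type` is `create` and `conflicts` is `proceed`, documents whose IDs already exist in the destination index are skipped rather than causing the operation to abort.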
## Response
```json
{
  "took": 28829,
  "timed_out": false,
  "total": 111396,
  "updated": 0,
  "created": 111396,
  "deleted": 0,
  "batches": 112,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}
```
## Response body fields
Field | Description
:--- | :---
took | How long the operation took in milliseconds.
timed_out | Whether the operation timed out.
total | The total number of documents processed.
updated | The number of documents updated in the destination index.
created | The number of documents created in the destination index.
deleted | The number of documents deleted.
batches | Number of scroll responses.
version_conflicts | Number of version conflicts.
noops | How many documents OpenSearch ignored during the operation.
retries | Number of bulk and search retry requests.
throttled_millis | Number of throttled milliseconds during the request.
requests_per_second | Number of requests executed per second during the operation.
throttled_until_millis | The amount of time until OpenSearch executes the next throttled request.
failures | Any failures that occurred during the operation.
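Note that if you submit the reindex request with `wait_for_completion=false`, the immediate response contains only a `task` ID rather than the fields above; you can then check the operation's progress and statistics through the tasks API. A hypothetical follow-up request (the task ID is a placeholder) might look like this:

```json
GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345
```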
Some files were not shown because too many files have changed in this diff.