Initial documentation cut

aetter 2021-05-05 10:09:47 -07:00
parent b14d081d82
commit 4c7109366d
194 changed files with 28284 additions and 13 deletions

5
.gitignore vendored Normal file

@ -0,0 +1,5 @@
_site
.sass-cache
.jekyll-metadata
.DS_Store
Gemfile.lock

3
404.md Normal file

@ -0,0 +1,3 @@
---
permalink: /404.html
---

View File

@ -11,7 +11,7 @@ information to effectively respond to your bug report or contribution.
We welcome you to use the GitHub issue tracker to report bugs or suggest features.
When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
When filing an issue, please check [existing open](https://github.com/opensearch-project/documentation-website/issues), or [recently closed](https://github.com/opensearch-project/documentation-website/issues?q=is%3Aissue+is%3Aclosed), issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
* A reproducible test case or series of steps
@ -23,7 +23,7 @@ reported the issue. Please try to include as much information as you can. Detail
## Contributing via Pull Requests
Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
1. You are working against the latest source on the *main* branch.
1. You are working against the latest source on the *master* branch.
2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
@ -41,7 +41,7 @@ GitHub provides additional document on [forking a repository](https://help.githu
## Finding contributions to work on
Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/opensearch-project/documentation-website/issues?q=is%3Aissue+label%3A%22help+wanted%22+is%3Aopen) issues is a great place to start.
## Code of Conduct
@ -56,4 +56,6 @@ If you discover a potential security issue in this project we ask that you notif
## Licensing
See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
See the [LICENSE](https://github.com/opensearch-project/documentation-website/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.

32
Gemfile Normal file

@ -0,0 +1,32 @@
source "https://rubygems.org"
# Hello! This is where you manage which Jekyll version is used to run.
# When you want to use a different version, change it below, save the
# file and run `bundle install`. Run Jekyll with `bundle exec`, like so:
#
# bundle exec jekyll serve
#
# This will help ensure the proper Jekyll version is running.
# Happy Jekylling!
# gem "jekyll", "~> 3.9.0"
# This is the default theme for new Jekyll sites. You may change this to anything you like.
gem "just-the-docs", "~> 0.3.3"
# If you want to use GitHub Pages, remove the "gem "jekyll"" above and
# uncomment the line below. To upgrade, run `bundle update github-pages`.
gem 'github-pages', group: :jekyll_plugins
# If you have any plugins, put them here!
# group :jekyll_plugins do
# # gem "jekyll-feed", "~> 0.6"
# gem "jekyll-remote-theme"
# gem "jekyll-redirect-from"
# end
# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]
# Performance-booster for watching directories on Windows
gem "wdm", "~> 0.1.0" if Gem.win_platform?

277
README.md

@ -1,17 +1,276 @@
## My Project
# OpenSearch documentation
TODO: Fill this README out!
This repository contains the documentation for OpenSearch, the search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more. You can find the rendered documentation at [docs-beta.opensearch.org](https://docs-beta.opensearch.org).
Be sure to:
Community contributions remain essential in keeping this documentation comprehensive, useful, well-organized, and up-to-date.
* Change the title in this README
* Edit your repository description on GitHub
## Security
## How you can help
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
- Do you work on one of the various OpenSearch plugins? Take a look at the documentation for the plugin. Is everything accurate? Will anything change in the near future?
## License
Often, engineering teams can keep existing documentation up-to-date with minimal effort, thus freeing up the documentation team to focus on larger projects.
This project is licensed under the Apache-2.0 License.
- Do you have expertise in a particular area of OpenSearch? Cluster sizing? The query DSL? Painless scripting? Aggregations? JVM settings? Take a look at the [current content](https://docs-beta.opensearch.org/docs/opensearch/) and see where you can add value. The [documentation team](#points-of-contact) is happy to help you polish and organize your drafts.
- Are you an OpenSearch Dashboards expert? How did you set up your visualizations? Why is a particular dashboard so valuable to your organization? We have [very little](https://docs-beta.opensearch.org/docs/opensearch-dashboards/) on how to use OpenSearch Dashboards, only how to install it.
- Are you a web developer? Do you want to add an optional dark mode to the documentation? A "copy to clipboard" button for our code samples? Other improvements to the design or usability? See [major changes](#major-changes) for information on building the website locally.
- Our [issue tracker](https://github.com/opensearch-project/documentation-website/issues) contains documentation bugs and other content gaps, some of which have colorful labels like "good first issue" and "help wanted."
## Points of contact
If you encounter problems or have questions when contributing to the documentation, these people can help:
- [aetter](https://github.com/aetter)
- [ashwinkumar12345](https://github.com/ashwinkumar12345)
- [keithhc2](https://github.com/keithhc2)
- [snyder114](https://github.com/snyder114)
## How we build the website
After each commit to this repository, GitHub Pages automatically uses [Jekyll](https://jekyllrb.com) to rebuild the [website](https://docs-beta.opensearch.org). The whole process takes around 30 seconds.
This repository contains many [Markdown](https://guides.github.com/features/mastering-markdown/) files in the `/docs` directory. Each Markdown file corresponds to one page on the website. For example, the Markdown file for [this page](https://docs-beta.opensearch.org/docs/opensearch/) is [here](https://github.com/opensearch-project/documentation-website/blob/master/docs/opensearch/index.md).
Using plain text on GitHub has many advantages:
- Everything is free, open source, and works on every operating system. Use your favorite text editor, Ruby, Jekyll, and Git.
- Markdown is easy to learn and looks good in side-by-side diffs.
- The workflow is no different than contributing code. Make your changes, build locally to check your work, and submit a pull request. Reviewers check the PR before merging.
- Alternatives like wikis and WordPress are full web applications that require databases and ongoing maintenance. They also have inferior versioning and content review processes compared to Git. Static websites, such as the ones Jekyll produces, are faster, more secure, and more stable.
In addition to the content for a given page, each Markdown file contains some Jekyll [front matter](https://jekyllrb.com/docs/front-matter/). Front matter looks like this:
```
---
layout: default
title: Alerting security
nav_order: 10
parent: Alerting
has_children: false
---
```
If you're making [trivial changes](#trivial-changes), you don't have to worry about front matter.
If you want to reorganize content or add new pages, keep an eye on `has_children`, `parent`, and `nav_order`, which define the hierarchy and order of pages in the left-hand navigation. For more information, see the documentation for [our upstream Jekyll theme](https://pmarsceill.github.io/just-the-docs/docs/navigation-structure/).
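As a rough illustration of how `parent` and `nav_order` shape the navigation, the following Ruby sketch parses front matter from a few in-memory pages and groups them into a two-level tree. It is not how the theme builds its navigation (the theme does that in Liquid). The `PAGES` hash, the `About` and `Monitors` pages, every `nav_order` value except the one for `Alerting security`, and the helper names `front_matter` and `nav_tree` are assumptions made up for this example.

```ruby
require "yaml"

# Hypothetical pages. In a real checkout these strings would be the tops of
# Markdown files under /docs. Only "Alerting security" comes from the README.
PAGES = {
  "about.md" => "---\nlayout: default\ntitle: About\nnav_order: 1\n---\n",
  "alerting/index.md" => "---\nlayout: default\ntitle: Alerting\nnav_order: 34\nhas_children: true\n---\n",
  "alerting/security.md" => "---\nlayout: default\ntitle: Alerting security\nnav_order: 10\nparent: Alerting\nhas_children: false\n---\n",
  "alerting/monitors.md" => "---\nlayout: default\ntitle: Monitors\nnav_order: 1\nparent: Alerting\n---\n"
}.freeze

# Extract the YAML between the leading pair of `---` lines as a Hash.
def front_matter(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*$/m)
  match ? YAML.safe_load(match[1]) : {}
end

# Build [[top_level_title, [child_titles...]], ...], ordered by nav_order.
def nav_tree(pages)
  metas = pages.values.map { |text| front_matter(text) }
  top_level = metas.select { |m| m["parent"].nil? }.sort_by { |m| m["nav_order"] }
  top_level.map do |m|
    children = metas.select { |c| c["parent"] == m["title"] }
                    .sort_by { |c| c["nav_order"] }
    [m["title"], children.map { |c| c["title"] }]
  end
end

nav_tree(PAGES).each do |title, children|
  puts title
  children.each { |child| puts "  #{child}" }
end
```

In this sketch, a page appears under another page only because its `parent` value matches that page's `title`, and siblings are ordered by `nav_order`.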
## Contribute content
There are three ways to contribute content, depending on the magnitude of the change.
- [Trivial changes](#trivial-changes)
- [Minor changes](#minor-changes)
- [Major changes](#major-changes)
### Trivial changes
If you just need to fix a typo or add a sentence, this web-based method works well:
1. On any page in the documentation, click the **Edit this page** link in the lower-left.
1. Make your changes.
1. Choose **Create a new branch for this commit and start a pull request** and **Commit changes**.
### Minor changes
If you want to add a few paragraphs across multiple files and are comfortable with Git, try this approach:
1. Fork this repository.
1. Download [GitHub Desktop](https://desktop.github.com), install it, and clone your fork.
1. Navigate to the repository root.
1. Create a new branch.
1. Edit the Markdown files in `/docs`.
1. Commit, push your changes to your fork, and submit a pull request.
### Major changes
If you're making major changes to the documentation and need to see the rendered HTML before submitting a pull request, here's how to build locally:
1. Fork this repository.
1. Download [GitHub Desktop](https://desktop.github.com), install it, and clone your fork.
1. Navigate to the repository root.
1. Install [Ruby](https://www.ruby-lang.org/en/) if you don't already have it. We recommend [RVM](https://rvm.io/), but use whatever method you prefer:
```
curl -sSL https://get.rvm.io | bash -s stable
rvm install 2.6
ruby -v
```
1. Install [Jekyll](https://jekyllrb.com/) if you don't already have it:
```
gem install bundler jekyll
```
1. Install dependencies:
```
bundle install
```
1. Build:
```
sh build.sh
```
1. If the build script doesn't automatically open your web browser (it should), open [http://localhost:4000/](http://localhost:4000/).
1. Create a new branch.
1. Edit the Markdown files in `/docs`.
If you're a web developer, you can customize `_layouts/default.html` and `_sass/custom/custom.scss`.
1. When you save a file, marvel as Jekyll automatically rebuilds the site and refreshes your web browser. This process takes roughly 30 seconds.
1. When you're happy with how everything looks, commit, push your changes to your fork, and submit a pull request.
## Writing tips
1. Try to stay consistent with existing content and consistent within your new content. Don't call the same plugin KNN, k-nn, and k-NN in three different places.
1. Shorter paragraphs are better than longer paragraphs. Use headers, tables, lists, and images to make your content easier for readers to scan.
1. Use **bold** for user interface elements, *italics* for key terms or emphasis, and `monospace` for Bash commands, file names, REST paths, and code.
1. Markdown file names should be all lowercase, use hyphens to separate words, and end in `.md`.
1. Avoid future tense. Use present tense.
**Bad**: After you click the button, the process will start.
**Better**: After you click the button, the process starts.
1. "You" refers to the person reading the page. "We" refers to the OpenSearch contributors.
**Bad**: Now that we've finished the configuration, we have a working cluster.
**Better**: At this point, you have a working cluster, but we recommend adding dedicated master nodes.
1. Don't use "this" and "that" to refer to something without adding a noun.
**Bad**: This can cause high latencies.
**Better**: This additional loading time can cause high latencies.
1. Use active voice.
**Bad**: After the request is sent, the data is added to the index.
**Better**: After you send the request, the OpenSearch cluster indexes the data.
1. Introduce acronyms before using them.
**Bad**: Reducing customer TTV should accelerate our ROIC.
**Better**: Reducing customer time to value (TTV) should accelerate our return on invested capital (ROIC).
1. Spell out one through nine. Start using numerals at 10. If a number needs a unit (GB, pounds, millimeters, kg, Celsius, etc.), use numerals, even if the number is smaller than 10.
**Bad**: 3 kids looked for thirteen files on a six GB hard drive.
**Better**: Three kids looked for 13 files on a 6 GB hard drive.
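The file-naming tip in the list above (all lowercase, hyphens between words, ending in `.md`) can be checked mechanically. The Ruby sketch below is one possible reading of that rule, assuming that digits are also allowed; the constant `VALID_NAME` and the method `valid_markdown_name?` are hypothetical names, not part of this repository.

```ruby
# Assumed interpretation of the tip: lowercase letters and digits,
# words separated by single hyphens, and a literal ".md" extension.
VALID_NAME = /\A[a-z0-9]+(?:-[a-z0-9]+)*\.md\z/.freeze

def valid_markdown_name?(name)
  VALID_NAME.match?(name)
end

%w[version-history.md Version_History.MD index.md].each do |name|
  puts "#{name}: #{valid_markdown_name?(name) ? 'ok' : 'rename'}"
end
```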
## New releases
1. Branch.
1. Change the `opensearch_version` and `opensearch_major_minor_version` variables in `_config.yml`.
1. Start up a new cluster using the updated Docker Compose file in `docs/install/docker.md`.
1. Update the version table in `version-history.md`.
Use `curl -XGET https://localhost:9200 -u admin:admin -k` to verify the OpenSearch version.
1. Update the plugin compatibility table in `docs/install/plugins.md`.
Use `curl -XGET https://localhost:9200/_cat/plugins -u admin:admin -k` to get the correct version strings.
1. Update the plugin compatibility table in `docs/opensearch-dashboards/plugins.md`.
Use `docker ps` to find the ID for the OpenSearch Dashboards node. Then use `docker exec -it <opensearch-dashboards-node-id> /bin/bash` to get shell access. Finally, run `./bin/opensearch-dashboards-plugin list` to get the plugins and version strings.
1. Run a build (`build.sh`), and look for any warnings or errors you introduced.
1. Verify that the individual plugin download links in `docs/install/plugins.md` and `docs/opensearch-dashboards/plugins.md` work.
1. Check for any other bad links (`check-links.sh`). Expect a few false positives for the `localhost` links.
1. Submit a PR.
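The two version variables in `_config.yml` are related: in this commit they are `1.0.0-beta1` and `1.0`. As a small sanity check, the Ruby sketch below derives the major.minor value from a full version string. It assumes the full version always starts with `MAJOR.MINOR`; the method name `major_minor` is hypothetical and not part of the build scripts.

```ruby
# Return the "MAJOR.MINOR" prefix of a version string such as "1.0.0-beta1".
# Assumption: the string starts with two dot-separated integers.
def major_minor(version)
  match = version.match(/\A(\d+)\.(\d+)/)
  raise ArgumentError, "unexpected version string: #{version.inspect}" unless match

  "#{match[1]}.#{match[2]}"
end

puts major_minor("1.0.0-beta1")
```

If the derived value differs from what you typed into `_config.yml`, one of the two variables was probably missed during the update.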
## Classes within Markdown
This documentation uses a modified version of the [just-the-docs](https://github.com/pmarsceill/just-the-docs) Jekyll theme, which has some useful classes for labels and buttons:
```
[Get started](#get-started){: .btn .btn-blue }
## Get started
New
{: .label .label-green :}
```
* Labels come in default (blue), green, purple, yellow, and red.
* Buttons come in default, purple, blue, green, and outline.
* Warning, tip, and note blocks are available (`{: .warning }`, etc.).
* If an image has a white background, you can use `{: .img-border }` to add a one pixel border to the image.
These classes can help with readability, but should be used *sparingly*. Each addition of a class damages the portability of the Markdown files and makes moving to a different Jekyll theme (or a different static site generator) more difficult.
Besides, standard Markdown elements suffice for most documentation.
## Math
If you want to use the sorts of pretty formulas that [MathJax](https://www.mathjax.org) allows, add `has_math: true` to the Jekyll page metadata. Then insert LaTeX math into HTML tags with the rest of your Markdown content:
```
## Math
Some Markdown paragraph. Here's a formula:
<p>
When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are
\[x = {-b \pm \sqrt{b^2-4ac} \over 2a}.\]
</p>
And back to Markdown.
```
## Code of conduct
This project has adopted an [Open Source Code of Conduct](https://opensearch.org/codeofconduct.html).
## Security issue notifications
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
## Licensing
See the [LICENSE](./LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
## Copyright
Copyright Amazon.com, Inc. or its affiliates. All rights reserved.

11
THIRD-PARTY Normal file

@ -0,0 +1,11 @@
** (MIT License) Just the Docs 0.3.3 - https://github.com/pmarsceill/just-the-docs
Copyright (c) 2016 Patrick Marsceill
** (MIT License) Jekyll Pure Liquid Table of Contents 1.1.0 - https://github.com/allejo/jekyll-toc
Copyright (c) 2017 Vladimir Jimenez
** (MIT License) Bootstrap Icons 1.4.1 - https://github.com/twbs/icons
Copyright (c) 2019-2020 The Bootstrap Authors

98
_config.yml Normal file

@ -0,0 +1,98 @@
# Welcome to Jekyll!
#
# This config file is meant for settings that affect your whole blog, values
# which you are expected to set up once and rarely edit after that. If you find
# yourself editing this file very often, consider using Jekyll's data files
# feature for the data you need to update frequently.
#
# For technical reasons, this file is *NOT* reloaded automatically when you use
# 'bundle exec jekyll serve'. If you change this file, please restart the server process.
# Site settings
# These are used to personalize your new site. If you look in the HTML files,
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
title: OpenSearch documentation
description: >- # this means to ignore newlines until "baseurl:"
Documentation for OpenSearch, the Apache 2.0 search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more.
baseurl: "" # the subpath of your site, e.g. /blog
url: "https://docs-beta.opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
permalink: pretty
opensearch_version: 1.0.0-beta1
opensearch_major_minor_version: 1.0
# Build settings
markdown: kramdown
remote_theme: pmarsceill/just-the-docs@v0.3.3
# Kramdown settings
kramdown:
toc_levels: 2..3
logo: "/assets/images/fake-logo.svg"
# Aux links for the upper right navigation
aux_links:
"Back to OpenSearch.org":
- "https://opensearch.org/"
color_scheme: opensearch
# Enable or disable the site search
# Supports true (default) or false
search_enabled: true
search:
# Split pages into sections that can be searched individually
# Supports 1 - 6, default: 2
heading_level: 2
# Maximum amount of previews per search result
# Default: 3
previews: 3
# Maximum amount of words to display before a matched word in the preview
# Default: 5
preview_words_before: 5
# Maximum amount of words to display after a matched word in the preview
# Default: 10
preview_words_after: 10
# Set the search token separator
# Default: /[\s\-/]+/
# Example: enable support for hyphenated search words
tokenizer_separator: /[\s/]+/
# Display the relative url in search results
# Supports true (default) or false
rel_url: true
# Enable or disable the search button that appears in the bottom right corner of every page
# Supports true or false (default)
button: false
# Google Analytics Tracking (optional)
# e.g, UA-1234567-89
ga_tracking: UA-135423944-1
# Disable the just-the-docs theme anchor links in favor of our custom ones
# See _includes/head_custom.html
heading_anchors: false
# Adds on-hover anchor links to h2-h6
anchor_links: true
footer_content:
plugins:
- jekyll-remote-theme
- jekyll-redirect-from
# Exclude from processing.
# The following items will not be processed, by default. Create a custom list
# to override the default setting.
exclude:
- Gemfile
- Gemfile.lock
- node_modules
- vendor/bundle/
- vendor/cache/
- vendor/gems/
- vendor/ruby/
- README.md

21
_includes/head_custom.html Executable file

@ -0,0 +1,21 @@
{% if site.anchor_links != nil %}
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/4.2.0/anchor.min.js"></script>
{% endif %}
{% if page.has_math == true %}
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3.0.1/es5/tex-mml-chtml.js"></script>
{% endif %}
<!-- SiteCatalyst code version: H.25.1. Copyright 1996-2012 Adobe, Inc. All Rights Reserved -->
<script><!--
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
var s_code=s.t();if(s_code)document.write(s_code)//--></script>
<script language="JavaScript" type="text/javascript"><!--
if(navigator.appVersion.indexOf('MSIE')>=0)document.write(unescape('%3C')+'\!-'+'-')
//--></script>
<noscript>
<img src="//amazonwebservices.d2.sc.omtrdc.net/b/ss/awsamazonalldev2/1/H.25.1--NS/0" height="1" width="1" border="0" alt="" />
</noscript>
<!--/DO NOT REMOVE/-->
<!-- End SiteCatalyst code version: H.25.1. -->

106
_includes/nav.html Normal file

@ -0,0 +1,106 @@
<ul class="nav-list">
{%- assign titled_pages = include.pages
| where_exp:"item", "item.title != nil" -%}
{%- comment -%}
The values of `title` and `nav_order` can be numbers or strings.
Jekyll gives build failures when sorting on mixtures of different types,
so numbers and strings need to be sorted separately.
Here, numbers are sorted by their values, and come before all strings.
An omitted `nav_order` value is equivalent to the page's `title` value
(except that a numerical `title` value is treated as a string).
The case-sensitivity of string sorting is determined by `site.nav_sort`.
{%- endcomment -%}
{%- assign string_ordered_pages = titled_pages
| where_exp:"item", "item.nav_order == nil" -%}
{%- assign nav_ordered_pages = titled_pages
| where_exp:"item", "item.nav_order != nil" -%}
{%- comment -%}
The nav_ordered_pages have to be added to number_ordered_pages and
string_ordered_pages, depending on the nav_order value.
The first character of the jsonify result is `"` only for strings.
{%- endcomment -%}
{%- assign nav_ordered_groups = nav_ordered_pages
| group_by_exp:"item", "item.nav_order | jsonify | slice: 0" -%}
{%- assign number_ordered_pages = "" | split:"X" -%}
{%- for group in nav_ordered_groups -%}
{%- if group.name == '"' -%}
{%- assign string_ordered_pages = string_ordered_pages | concat: group.items -%}
{%- else -%}
{%- assign number_ordered_pages = number_ordered_pages | concat: group.items -%}
{%- endif -%}
{%- endfor -%}
{%- assign sorted_number_ordered_pages = number_ordered_pages | sort:"nav_order" -%}
{%- comment -%}
The string_ordered_pages have to be sorted by nav_order, and otherwise title
(where appending the empty string to a numeric title converts it to a string).
After grouping them by those values, the groups are sorted, then the items
of each group are concatenated.
{%- endcomment -%}
{%- assign string_ordered_groups = string_ordered_pages
| group_by_exp:"item", "item.nav_order | default: item.title | append:''" -%}
{%- if site.nav_sort == 'case_insensitive' -%}
{%- assign sorted_string_ordered_groups = string_ordered_groups | sort_natural:"name" -%}
{%- else -%}
{%- assign sorted_string_ordered_groups = string_ordered_groups | sort:"name" -%}
{%- endif -%}
{%- assign sorted_string_ordered_pages = "" | split:"X" -%}
{%- for group in sorted_string_ordered_groups -%}
{%- assign sorted_string_ordered_pages = sorted_string_ordered_pages | concat: group.items -%}
{%- endfor -%}
{%- assign pages_list = sorted_number_ordered_pages | concat: sorted_string_ordered_pages -%}
{%- for node in pages_list -%}
{%- if node.parent == nil -%}
{%- unless node.nav_exclude -%}
<li class="nav-list-item{% if page.url == node.url or page.parent == node.title or page.grand_parent == node.title %} active{% endif %}">
{%- if node.has_children -%}
<a href="#" class="nav-list-expander"><svg viewBox="0 0 24 24"><use xlink:href="#svg-arrow-right"></use></svg></a>
{%- endif -%}
<a href="{{ node.url | absolute_url }}" class="nav-list-link{% if page.url == node.url %} active{% endif %}">{{ node.title }}</a>
{%- if node.has_children -%}
{%- assign children_list = pages_list | where: "parent", node.title -%}
<ul class="nav-list ">
{%- for child in children_list -%}
{%- unless child.nav_exclude -%}
<li class="nav-list-item {% if page.url == child.url or page.parent == child.title %} active{% endif %}">
{%- if child.has_children -%}
<a href="#" class="nav-list-expander"><svg viewBox="0 0 24 24"><use xlink:href="#svg-arrow-right"></use></svg></a>
{%- endif -%}
<a href="{{ child.url | absolute_url }}" class="nav-list-link{% if page.url == child.url %} active{% endif %}">{{ child.title }}</a>
{%- if child.has_children -%}
{%- assign grand_children_list = pages_list | where: "parent", child.title | where: "grand_parent", node.title -%}
<ul class="nav-list">
{%- for grand_child in grand_children_list -%}
{%- unless grand_child.nav_exclude -%}
<li class="nav-list-item {% if page.url == grand_child.url %} active{% endif %}">
<a href="{{ grand_child.url | absolute_url }}" class="nav-list-link{% if page.url == grand_child.url %} active{% endif %}">{{ grand_child.title }}</a>
</li>
{%- endunless -%}
{%- endfor -%}
</ul>
{%- endif -%}
</li>
{%- endunless -%}
{%- endfor -%}
</ul>
{%- endif -%}
</li>
{%- endunless -%}
{%- endif -%}
{%- endfor -%}
<li class="nav-list-item">
<a href="https://opensearch.org" target="_blank" class="nav-list-link">Javadoc <svg class="external-arrow" width="16" height="16" fill="#002A3A"><use xlink:href="#external-arrow"></use></svg></a>
</li>
<li class="nav-list-item">
<a href="https://opensearch.org" target="_blank" class="nav-list-link">Reference archive <svg class="external-arrow" width="16" height="16" fill="#002A3A"><use xlink:href="#external-arrow"></use></svg></a>
</li>
</ul>
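
The ordering rules described in the Liquid comments of `_includes/nav.html` can be hard to follow in template syntax. The Ruby sketch below restates them under stated assumptions: pages without a title are dropped, numeric `nav_order` values sort first by value, and everything else sorts as a string using `nav_order` or, if that is omitted, `title`. It is an approximation for reading purposes, not the code Jekyll runs; the `Page` struct and the `nav_sort` method are hypothetical, and `downcase` stands in for Liquid's `sort_natural`.

```ruby
# Hypothetical stand-in for a Jekyll page with only the fields used here.
Page = Struct.new(:title, :nav_order)

def nav_sort(pages, case_insensitive: false)
  titled = pages.reject { |page| page.title.nil? }
  numbered, stringy = titled.partition { |page| page.nav_order.is_a?(Numeric) }

  # String key: nav_order if present, otherwise the title, always as a string.
  key = lambda do |page|
    value = (page.nav_order || page.title).to_s
    case_insensitive ? value.downcase : value
  end

  # The index breaks ties so pages with equal keys keep their original order.
  numbered.sort_by(&:nav_order) +
    stringy.sort_by.with_index { |page, index| [key.call(page), index] }
end

pages = [
  Page.new("Zeta", nil), Page.new("alpha", nil), Page.new("Two", 2),
  Page.new("One", 1), Page.new("Named", "beta"), Page.new(nil, 0)
]
puts nav_sort(pages).map(&:title).join(", ")
puts nav_sort(pages, case_insensitive: true).map(&:title).join(", ")
```

The `case_insensitive` flag plays the role of `site.nav_sort == 'case_insensitive'` in the template.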

182
_includes/toc.html Normal file

@ -0,0 +1,182 @@
{% capture tocWorkspace %}
{% comment %}
Copyright (c) 2017 Vladimir "allejo" Jimenez
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
{% endcomment %}
{% comment %}
Version 1.1.0
https://github.com/allejo/jekyll-toc
"...like all things liquid - where there's a will, and ~36 hours to spare, there's usually a/some way" ~jaybe
Usage:
{% include toc.html html=content sanitize=true class="inline_toc" id="my_toc" h_min=2 h_max=3 %}
Parameters:
* html (string) - the HTML of compiled markdown generated by kramdown in Jekyll
Optional Parameters:
* sanitize (bool) : false - when set to true, the headers will be stripped of any HTML in the TOC
* class (string) : '' - a CSS class assigned to the TOC
* id (string) : '' - an ID to assign to the TOC
* h_min (int) : 1 - the minimum TOC header level to use; any header lower than this value will be ignored
* h_max (int) : 6 - the maximum TOC header level to use; any header greater than this value will be ignored
* ordered (bool) : false - when set to true, an ordered list will be outputted instead of an unordered list
* item_class (string) : '' - add custom class(es) for each list item; has support for '%level%' placeholder, which is the current heading level
* submenu_class (string) : '' - add custom class(es) for each child group of headings; has support for '%level%' placeholder which is the current "submenu" heading level
* base_url (string) : '' - add a base url to the TOC links for when your TOC is on another page than the actual content
* anchor_class (string) : '' - add custom class(es) for each anchor element
* skip_no_ids (bool) : false - skip headers that do not have an `id` attribute
Output:
An ordered or unordered list representing the table of contents of a markdown block. This snippet will only
generate the table of contents and will NOT output the markdown given to it
{% endcomment %}
{% capture newline %}
{% endcapture %}
{% assign newline = newline | rstrip %} <!-- Remove the extra spacing but preserve the newline -->
{% capture deprecation_warnings %}{% endcapture %}
{% if include.baseurl %}
{% capture deprecation_warnings %}{{ deprecation_warnings }}<!-- jekyll-toc :: "baseurl" has been deprecated, use "base_url" instead -->{{ newline }}{% endcapture %}
{% endif %}
{% if include.skipNoIDs %}
{% capture deprecation_warnings %}{{ deprecation_warnings }}<!-- jekyll-toc :: "skipNoIDs" has been deprecated, use "skip_no_ids" instead -->{{ newline }}{% endcapture %}
{% endif %}
{% capture jekyll_toc %}{% endcapture %}
{% assign orderedList = include.ordered | default: false %}
{% assign baseURL = include.base_url | default: include.baseurl | default: '' %}
{% assign skipNoIDs = include.skip_no_ids | default: include.skipNoIDs | default: false %}
{% assign minHeader = include.h_min | default: 1 %}
{% assign maxHeader = include.h_max | default: 6 %}
{% assign nodes = include.html | strip | split: '<h' %}
{% assign firstHeader = true %}
{% assign currLevel = 0 %}
{% assign lastLevel = 0 %}
{% capture listModifier %}{% if orderedList %}ol{% else %}ul{% endif %}{% endcapture %}
{% for node in nodes %}
{% if node == "" %}
{% continue %}
{% endif %}
{% assign currLevel = node | replace: '"', '' | slice: 0, 1 | times: 1 %}
{% if currLevel < minHeader or currLevel > maxHeader %}
{% continue %}
{% endif %}
{% assign _workspace = node | split: '</h' %}
{% assign _idWorkspace = _workspace[0] | split: 'id="' %}
{% assign _idWorkspace = _idWorkspace[1] | split: '"' %}
{% assign htmlID = _idWorkspace[0] %}
{% assign _classWorkspace = _workspace[0] | split: 'class="' %}
{% assign _classWorkspace = _classWorkspace[1] | split: '"' %}
{% assign htmlClass = _classWorkspace[0] %}
{% if htmlClass contains "no_toc" %}
{% continue %}
{% endif %}
{% if firstHeader %}
{% assign minHeader = currLevel %}
{% endif %}
{% capture _hAttrToStrip %}{{ _workspace[0] | split: '>' | first }}>{% endcapture %}
{% assign header = _workspace[0] | replace: _hAttrToStrip, '' %}
{% if include.item_class and include.item_class != blank %}
{% capture listItemClass %} class="{{ include.item_class | replace: '%level%', currLevel | split: '.' | join: ' ' }}"{% endcapture %}
{% endif %}
{% if include.submenu_class and include.submenu_class != blank %}
{% assign subMenuLevel = currLevel | minus: 1 %}
{% capture subMenuClass %} class="{{ include.submenu_class | replace: '%level%', subMenuLevel | split: '.' | join: ' ' }}"{% endcapture %}
{% endif %}
{% capture anchorBody %}{% if include.sanitize %}{{ header | strip_html }}{% else %}{{ header }}{% endif %}{% endcapture %}
{% if htmlID %}
{% capture anchorAttributes %} href="{% if baseURL %}{{ baseURL }}{% endif %}#{{ htmlID }}"{% endcapture %}
{% if include.anchor_class %}
{% capture anchorAttributes %}{{ anchorAttributes }} class="{{ include.anchor_class | split: '.' | join: ' ' }}"{% endcapture %}
{% endif %}
{% capture listItem %}<a{{ anchorAttributes }}>{{ anchorBody }}</a>{% endcapture %}
{% elsif skipNoIDs == true %}
{% continue %}
{% else %}
{% capture listItem %}{{ anchorBody }}{% endcapture %}
{% endif %}
{% if currLevel > lastLevel %}
{% capture jekyll_toc %}{{ jekyll_toc }}<{{ listModifier }}{{ subMenuClass }}>{% endcapture %}
{% elsif currLevel < lastLevel %}
{% assign repeatCount = lastLevel | minus: currLevel %}
{% for i in (1..repeatCount) %}
{% capture jekyll_toc %}{{ jekyll_toc }}</li></{{ listModifier }}>{% endcapture %}
{% endfor %}
{% capture jekyll_toc %}{{ jekyll_toc }}</li>{% endcapture %}
{% else %}
{% capture jekyll_toc %}{{ jekyll_toc }}</li>{% endcapture %}
{% endif %}
{% capture jekyll_toc %}{{ jekyll_toc }}<li{{ listItemClass }}>{{ listItem }}{% endcapture %}
{% assign lastLevel = currLevel %}
{% assign firstHeader = false %}
{% endfor %}
{% assign repeatCount = minHeader | minus: 1 %}
{% assign repeatCount = lastLevel | minus: repeatCount %}
{% for i in (1..repeatCount) %}
{% capture jekyll_toc %}{{ jekyll_toc }}</li></{{ listModifier }}>{% endcapture %}
{% endfor %}
{% if jekyll_toc != '' %}
{% assign rootAttributes = '' %}
{% if include.class and include.class != blank %}
{% capture rootAttributes %} class="{{ include.class | split: '.' | join: ' ' }}"{% endcapture %}
{% endif %}
{% if include.id and include.id != blank %}
{% capture rootAttributes %}{{ rootAttributes }} id="{{ include.id }}"{% endcapture %}
{% endif %}
{% if rootAttributes %}
{% assign nodes = jekyll_toc | split: '>' %}
{% capture jekyll_toc %}<{{ listModifier }}{{ rootAttributes }}>{{ nodes | shift | join: '>' }}>{% endcapture %}
{% endif %}
{% endif %}
{% endcapture %}{% assign tocWorkspace = '' %}{{ deprecation_warnings }}{{ jekyll_toc }}

219
_layouts/default.html Executable file
View File

@ -0,0 +1,219 @@
---
layout: table_wrappers
---
<!DOCTYPE html>
<html lang="{{ site.lang | default: 'en-US' }}">
{% include head.html %}
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-link" viewBox="0 0 24 24">
<title>Link</title>
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-link">
<path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path>
</svg>
</symbol>
<symbol id="svg-search" viewBox="0 0 24 24">
<title>Search</title>
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-search">
<circle cx="11" cy="11" r="8"></circle><line x1="21" y1="21" x2="16.65" y2="16.65"></line>
</svg>
</symbol>
<symbol id="svg-menu" viewBox="0 0 24 24">
<title>Menu</title>
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-menu">
<line x1="3" y1="12" x2="21" y2="12"></line><line x1="3" y1="6" x2="21" y2="6"></line><line x1="3" y1="18" x2="21" y2="18"></line>
</svg>
</symbol>
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
<title>Expand</title>
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-chevron-right">
<polyline points="9 18 15 12 9 6"></polyline>
</svg>
</symbol>
<symbol id="svg-doc" viewBox="0 0 24 24">
<title>Document</title>
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-file">
<path d="M13 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V9z"></path><polyline points="13 2 13 9 20 9"></polyline>
</svg>
</symbol>
<symbol id="external-arrow" viewBox="0 0 16 16">
<title>External</title>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path fill-rule="evenodd" d="M8.636 3.5a.5.5 0 0 0-.5-.5H1.5A1.5 1.5 0 0 0 0 4.5v10A1.5 1.5 0 0 0 1.5 16h10a1.5 1.5 0 0 0 1.5-1.5V7.864a.5.5 0 0 0-1 0V14.5a.5.5 0 0 1-.5.5h-10a.5.5 0 0 1-.5-.5v-10a.5.5 0 0 1 .5-.5h6.636a.5.5 0 0 0 .5-.5z"/>
<path fill-rule="evenodd" d="M16 .5a.5.5 0 0 0-.5-.5h-5a.5.5 0 0 0 0 1h3.793L6.146 9.146a.5.5 0 1 0 .708.708L15 1.707V5.5a.5.5 0 0 0 1 0v-5z"/>
</svg>
</symbol>
</svg>
<div class="side-bar">
<div class="site-header">
<a href="{{ '/' | absolute_url }}" class="site-title lh-tight">{% include title.html %}</a>
<a href="#" id="menu-button" class="site-button">
<svg viewBox="0 0 24 24" class="icon"><use xlink:href="#svg-menu"></use></svg>
</a>
</div>
<nav role="navigation" aria-label="Main" id="site-nav" class="site-nav">
{% if site.just_the_docs.collections %}
{% assign collections_size = site.just_the_docs.collections | size %}
{% for collection_entry in site.just_the_docs.collections %}
{% assign collection_key = collection_entry[0] %}
{% assign collection_value = collection_entry[1] %}
{% assign collection = site[collection_key] %}
{% if collection_value.nav_exclude != true %}
{% if collections_size > 1 %}
<div class="nav-category">{{ collection_value.name }}</div>
{% endif %}
{% include nav.html pages=collection %}
{% endif %}
{% endfor %}
{% else %}
{% include nav.html pages=site.html_pages %}
{% endif %}
</nav>
<footer class="site-footer">
<p class="text-small text-grey-dk-100">See a problem? Submit <a href="https://github.com/opensearch-project/documentation-website/issues">issues</a> or <a href="https://github.com/opensearch-project/documentation-website/edit/master/{{ page.path }}">edit this page</a> on <a href="https://github.com/opensearch-project/documentation-website/">GitHub</a>.</p>
<p class="text-small text-grey-dk-100 mb-0">© Amazon Web Services, Inc. or its affiliates. All rights reserved.</p>
</footer>
</div>
<div class="main" id="top">
<div id="main-header" class="main-header">
{% if site.search_enabled != false %}
<div class="search">
<div class="search-input-wrap">
<input type="text" id="search-input" class="search-input" tabindex="0" placeholder="Search..." aria-label="Search {{ site.title }}" autocomplete="off">
<label for="search-input" class="search-label"><svg viewBox="0 0 24 24" class="search-icon"><use xlink:href="#svg-search"></use></svg></label>
</div>
<div id="search-results" class="search-results"></div>
</div>
{% endif %}
{% include header_custom.html %}
{% if site.aux_links %}
<nav aria-label="Auxiliary" class="aux-nav">
<ul class="aux-nav-list">
{% for link in site.aux_links %}
<li class="aux-nav-list-item">
<a href="{{ link.last }}" class="site-button"
{% if site.aux_links_new_tab %}
target="_blank" rel="noopener noreferrer"
{% endif %}
>
{{ link.first }}
</a>
</li>
{% endfor %}
</ul>
</nav>
{% endif %}
</div>
<div id="main-content-wrap" class="main-content-wrap">
{% unless page.url == "/" %}
{% if page.parent %}
{%- for node in pages_list -%}
{%- if node.parent == nil -%}
{%- if page.parent == node.title or page.grand_parent == node.title -%}
{%- assign first_level_url = node.url | absolute_url -%}
{%- endif -%}
{%- if node.has_children -%}
{%- assign children_list = pages_list | where: "parent", node.title -%}
{%- for child in children_list -%}
{%- if page.url == child.url or page.parent == child.title -%}
{%- assign second_level_url = child.url | absolute_url -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
<nav aria-label="Breadcrumb" class="breadcrumb-nav">
<ol class="breadcrumb-nav-list">
{% if page.grand_parent %}
<li class="breadcrumb-nav-list-item"><a href="{{ first_level_url }}">{{ page.grand_parent }}</a></li>
<li class="breadcrumb-nav-list-item"><a href="{{ second_level_url }}">{{ page.parent }}</a></li>
{% else %}
<li class="breadcrumb-nav-list-item"><a href="{{ first_level_url }}">{{ page.parent }}</a></li>
{% endif %}
<li class="breadcrumb-nav-list-item"><span>{{ page.title }}</span></li>
</ol>
</nav>
{% endif %}
{% endunless %}
<div id="main-content" class="main-content" role="main">
{% if site.heading_anchors != false %}
{% include vendor/anchor_headings.html html=content beforeHeading="true" anchorBody="<svg viewBox=\"0 0 16 16\" aria-hidden=\"true\"><use xlink:href=\"#svg-link\"></use></svg>" anchorClass="anchor-heading" anchorAttrs="aria-labelledby=\"%html_id%\"" %}
{% else %}
<p class="warning" style="margin-top: 0">Like OpenSearch itself, this documentation is a beta. It has content gaps and might contain bugs.</p>
{{ content }}
{% endif %}
{% if page.has_children == true and page.has_toc != false %}
<hr>
<h2 class="text-delta">Table of contents</h2>
<ul>
{%- assign children_list = pages_list | where: "parent", page.title | where: "grand_parent", page.parent -%}
{% for child in children_list %}
<li>
<a href="{{ child.url | absolute_url }}">{{ child.title }}</a>{% if child.summary %} - {{ child.summary }}{% endif %}
</li>
{% endfor %}
</ul>
{% endif %}
{% capture footer_custom %}
{%- include footer_custom.html -%}
{% endcapture %}
{% if footer_custom != "" or site.last_edit_timestamp or site.gh_edit_link %}
<hr>
<footer>
{% if site.back_to_top %}
<p><a href="#top" id="back-to-top">{{ site.back_to_top_text }}</a></p>
{% endif %}
{{ footer_custom }}
{% if site.last_edit_timestamp or site.gh_edit_link %}
<div class="d-flex mt-2">
{% if site.last_edit_timestamp and site.last_edit_time_format and page.last_modified_date %}
<p class="text-small text-grey-dk-000 mb-0 mr-2">
Page last modified: <span class="d-inline-block">{{ page.last_modified_date | date: site.last_edit_time_format }}</span>.
</p>
{% endif %}
{% if
site.gh_edit_link and
site.gh_edit_link_text and
site.gh_edit_repository and
site.gh_edit_branch and
site.gh_edit_view_mode
%}
<p class="text-small text-grey-dk-000 mb-0">
<a href="{{ site.gh_edit_repository }}/{{ site.gh_edit_view_mode }}/{{ site.gh_edit_branch }}{% if site.gh_edit_source %}/{{ site.gh_edit_source }}{% endif %}/{{ page.path }}" id="edit-this-page">{{ site.gh_edit_link_text }}</a>
</p>
{% endif %}
</div>
{% endif %}
</footer>
{% endif %}
</div>
</div>
<div class="toc">
{% include toc.html html=content h_min=2 h_max=2 class="toc-list" item_class="toc-item" sanitize=true %}
</div>
{% if site.search_enabled != false %}
{% if site.search.button %}
<a href="#" id="search-button" class="search-button">
<svg viewBox="0 0 24 24" class="icon"><use xlink:href="#svg-search"></use></svg>
</a>
{% endif %}
<div class="search-overlay"></div>
{% endif %}
</div>
{% if site.anchor_links != nil %}
<script>
anchors.add();
</script>
{% endif %}
</body>
</html>

View File

@ -0,0 +1,75 @@
//
// Brand colors
//
$white: #FFFFFF;
$grey-dk-300: #241F21; // Error
$grey-dk-250: mix(white, $grey-dk-300, 12.5%);
$grey-dk-200: mix(white, $grey-dk-300, 25%);
$grey-dk-100: mix(white, $grey-dk-300, 50%);
$grey-dk-000: mix(white, $grey-dk-300, 75%);
$grey-lt-300: #DBDBDB; // Cloud
$grey-lt-200: mix(white, $grey-lt-300, 25%);
$grey-lt-100: mix(white, $grey-lt-300, 50%);
$grey-lt-000: mix(white, $grey-lt-300, 75%);
$blue-300: #00007C; // Meta
$blue-200: mix(white, $blue-300, 25%);
$blue-100: mix(white, $blue-300, 50%);
$blue-000: mix(white, $blue-300, 75%);
$purple-300: #9600FF; // Prpl
$purple-200: mix(white, $purple-300, 25%);
$purple-100: mix(white, $purple-300, 50%);
$purple-000: mix(white, $purple-300, 75%);
$green-300: #00671A; // Element
$green-200: mix(white, $green-300, 25%);
$green-100: mix(white, $green-300, 50%);
$green-000: mix(white, $green-300, 75%);
$yellow-300: #FFDF00; // Kan-Banana
$yellow-200: mix(white, $yellow-300, 25%);
$yellow-100: mix(white, $yellow-300, 50%);
$yellow-000: mix(white, $yellow-300, 75%);
$red-300: #BD145A; // Ruby
$red-200: mix(white, $red-300, 25%);
$red-100: mix(white, $red-300, 50%);
$red-000: mix(white, $red-300, 75%);
$blue-lt-300: #0000FF; // Cascade
$blue-lt-200: mix(white, $blue-lt-300, 25%);
$blue-lt-100: mix(white, $blue-lt-300, 50%);
$blue-lt-000: mix(white, $blue-lt-300, 75%);
/*
Other, unused brand colors
Float #2797F4
Firewall #0FF006B
Hyper Pink #F261A1
Cluster #ED20EB
Back End #808080
Python #25EE5C
Warm Node #FEA501
*/
$body-background-color: $white;
$sidebar-color: $grey-lt-000;
$code-background-color: $grey-lt-000;
$body-text-color: $grey-dk-200;
$body-heading-color: $grey-dk-300;
$nav-child-link-color: $grey-dk-200;
$link-color: mix(black, $blue-lt-300, 37.5%);
$btn-primary-color: $purple-300;
$base-button-color: $grey-lt-000;
// $border-color: $grey-dk-200;
// $search-result-preview-color: $grey-dk-000;
// $search-background-color: $grey-dk-250;
// $table-background-color: $grey-dk-250;
// $feedback-color: darken($sidebar-color, 3%);

View File

@ -0,0 +1,75 @@
//
// Brand colors
//
$white: #FFFFFF;
$grey-dk-300: #002A3A; //
$grey-dk-250: mix(white, $grey-dk-300, 12.5%);
$grey-dk-200: mix(white, $grey-dk-300, 25%);
$grey-dk-100: mix(white, $grey-dk-300, 50%);
$grey-dk-000: mix(white, $grey-dk-300, 75%);
$grey-lt-300: #D9E1E2; //
$grey-lt-200: mix(white, $grey-lt-300, 25%);
$grey-lt-100: mix(white, $grey-lt-300, 50%);
$grey-lt-000: mix(white, $grey-lt-300, 75%);
$blue-300: #005eb8; //
$blue-200: mix(white, $blue-300, 25%);
$blue-100: mix(white, $blue-300, 50%);
$blue-000: mix(white, $blue-300, 75%);
$purple-300: #963CBD; //
$purple-200: mix(white, $purple-300, 25%);
$purple-100: mix(white, $purple-300, 50%);
$purple-000: mix(white, $purple-300, 75%);
$green-300: #2cd5c4; //
$green-200: mix(white, $green-300, 25%);
$green-100: mix(white, $green-300, 50%);
$green-000: mix(white, $green-300, 75%);
$yellow-300: #FFDF00; //
$yellow-200: mix(white, $yellow-300, 25%);
$yellow-100: mix(white, $yellow-300, 50%);
$yellow-000: mix(white, $yellow-300, 75%);
$red-300: #F65275; //
$red-200: mix(white, $red-300, 25%);
$red-100: mix(white, $red-300, 50%);
$red-000: mix(white, $red-300, 75%);
$blue-lt-300: #00A3E0; //
$blue-lt-200: mix(white, $blue-lt-300, 25%);
$blue-lt-100: mix(white, $blue-lt-300, 50%);
$blue-lt-000: mix(white, $blue-lt-300, 75%);
/*
Other, unused brand colors
Float #2797F4
Firewall #0FF006B
Hyper Pink #F261A1
Cluster #ED20EB
Back End #808080
Python #25EE5C
Warm Node #FEA501
*/
$body-background-color: $white;
$sidebar-color: $grey-lt-000;
$code-background-color: $grey-lt-000;
$body-text-color: $grey-dk-200;
$body-heading-color: $grey-dk-300;
$nav-child-link-color: $grey-dk-200;
$link-color: mix(black, $blue-lt-300, 37.5%);
$btn-primary-color: $purple-300;
$base-button-color: $grey-lt-000;
// $border-color: $grey-dk-200;
// $search-result-preview-color: $grey-dk-000;
// $search-background-color: $grey-dk-250;
// $table-background-color: $grey-dk-250;
// $feedback-color: darken($sidebar-color, 3%);

232
_sass/custom/custom.scss Executable file
View File

@ -0,0 +1,232 @@
@import url('https://fonts.googleapis.com/css?family=Open+Sans:400,400i,600,700');
// Additional variables
$table-border-color: $grey-lt-300;
$toc-width: 232px !default;
$red-dk-200: mix(black, $red-300, 25%);
// Replaces xl size
$media-queries: (
xs: 320px,
sm: 500px,
md: $content-width,
lg: $content-width + $nav-width,
xl: $content-width + $nav-width + $toc-width
);
body {
padding-bottom: 6rem;
font-family: 'Open Sans', sans-serif;
@include mq(md) {
padding-bottom: 0;
}
}
code {
font-family: "SFMono-Regular", Menlo, "DejaVu Sans Mono", "Droid Sans Mono", Consolas, Monospace;
font-size: 0.75rem;
}
.site-nav {
padding-top: 2rem;
}
.main-content {
ol {
> li {
&:before {
color: $grey-dk-100;
}
}
}
ul {
> li {
&:before {
color: $grey-dk-100;
}
}
}
h1, h2, h3, h4, h5, h6 {
margin-top: 2.4rem;
margin-bottom: 0.8rem;
}
.highlight {
line-height: 1.4;
}
}
.site-title {
@include mq(md) {
padding-top: 1rem;
padding-bottom: 0.6rem;
padding-left: $sp-5;
}
}
.external-arrow {
position: relative;
top: 0.125rem;
left: 0.25rem;
}
img {
padding: 1rem 0;
}
.img-border {
border: 1px solid $grey-lt-200;
}
// Note, tip, and warning blocks
%callout {
border: 1px solid $grey-lt-200;
border-radius: 5px;
margin: 1rem 0;
padding: 1rem;
position: relative;
}
.note {
@extend %callout;
border-left: 5px solid $blue-300;
}
.tip {
@extend %callout;
border-left: 5px solid $green-300;
}
.warning {
@extend %callout;
border-left: 5px solid $red-dk-200;
}
// Labels
.label,
.label-blue {
background-color: $blue-300;
}
.label-green {
background-color: $green-300;
}
.label-purple {
background-color: $purple-300;
}
.label-red {
background-color: $red-300;
}
.label-yellow {
color: $grey-dk-200;
background-color: $yellow-300;
}
// Buttons
.btn-primary {
@include btn-color($white, $btn-primary-color);
}
.btn-purple {
@include btn-color($white, $purple-300);
}
.btn-blue {
@include btn-color($white, $blue-300);
}
.btn-green {
@include btn-color($white, $green-300);
}
// Tables
th,
td {
border-bottom: $border rgba($table-border-color, 0.5);
border-left: $border $table-border-color;
}
thead {
th {
border-bottom: 1px solid $table-border-color;
}
}
td {
pre {
margin-bottom: 0;
}
}
// Keeps labels high and tight next to headers
h1 + p.label {
margin: -23px 0 0 0;
}
h2 + p.label {
margin: -15px 0 0 0;
}
h3 + p.label {
margin: -10px 0 0 0;
}
h4 + p.label,
h5 + p.label,
h6 + p.label {
margin: -7px 0 0 0;
}
// Modifies margins in xl layout to support TOC
.side-bar {
@include mq(xl) {
width: calc((100% - #{$nav-width + $content-width + $toc-width}) / 2 + #{$nav-width});
min-width: $nav-width;
}
}
.main {
@include mq(xl) {
margin-left: calc((100% - #{$nav-width + $content-width + $toc-width}) / 2 + #{$nav-width});
}
}
// Adds TOC to righthand side in xl layout
.toc {
display: none;
@include mq(xl) {
z-index: 0;
display: block;
position: fixed;
top: 59px;
right: calc((100% - #{$nav-width + $content-width + $toc-width}) / 2);
width: $toc-width;
max-height: calc(100% - 118px);
overflow: auto;
}
}
.toc-list {
&:before {
content: "On this page";
// Basically duplicates h4 styling
font-size: 12px;
font-weight: 300;
text-transform: uppercase;
letter-spacing: 0.1em;
color: $grey-dk-300;
line-height: 1.8;
}
border: 1px solid $border-color;
font-size: 14px;
list-style-type: none;
background-color: $sidebar-color;
padding: $sp-6 $sp-4;
margin-left: $sp-6;
margin-right: 0;
margin-bottom: 0;
overflow: auto;
}
.toc-item {
padding-top: .25rem;
padding-bottom: .25rem;
}

Binary file not shown.

Binary file not shown.

BIN
assets/images/fake-logo.png Normal file

Binary file not shown.

View File

@ -0,0 +1,7 @@
<svg width="743" height="142" viewBox="0 0 743 142" fill="none" xmlns="http://www.w3.org/2000/svg">
<path fill-rule="evenodd" clip-rule="evenodd" d="M123.975 53C126.198 53 128 54.8021 128 57.0252C128 98.9849 93.9849 133 52.0252 133C49.8021 133 48 131.198 48 128.975C48 126.752 49.8021 124.95 52.0252 124.95C89.5388 124.95 119.95 94.5388 119.95 57.0252C119.95 54.8021 121.752 53 123.975 53Z" fill="#005EB8"/>
<path d="M96.1628 81C100.514 73.901 104.723 64.4356 103.895 51.1842C102.18 23.7345 77.3178 2.91074 53.8413 5.16747C44.6506 6.05093 35.2136 13.5424 36.052 26.961C36.4164 32.7923 39.2705 36.2339 43.9088 38.8799C48.3237 41.3984 53.9956 42.9938 60.4255 44.8023C68.1925 46.9868 77.2017 49.4405 84.1261 54.5433C92.4251 60.6591 98.0982 67.7485 96.1628 81Z" fill="#003B5C"/>
<path d="M7.83722 33C3.48552 40.099 -0.723013 49.5644 0.104986 62.8158C1.82014 90.2655 26.6822 111.089 50.1587 108.833C59.3494 107.949 68.7864 100.458 67.948 87.039C67.5836 81.2077 64.7295 77.7661 60.0912 75.1201C55.6763 72.6016 50.0044 71.0062 43.5745 69.1977C35.8075 67.0132 26.7983 64.5595 19.8739 59.4567C11.5749 53.3409 5.90185 46.2515 7.83722 33Z" fill="#005EB8"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M725 61V107H743V57C743 47.7853 741.199 40.8241 737.598 36.1022C733.996 31.3344 728.564 29 721.5 29C713.835 29 707.694 33.4816 704 41H703C703.273 37.1181 703.519 34.8944 703.699 33.2706C703.885 31.596 704 30.5594 704 29V0H686V107H705V70C705 61.7022 705.277 55.563 706.985 51.162C708.694 46.7151 711.672 44.4916 715.92 44.4916C721.599 44.4916 725 49.7682 725 61ZM463.704 101.458C468.568 96.4296 471 89.1872 471 79.7305C471 73.8259 469.667 68.5671 467.001 63.9541C464.382 59.3411 459.729 54.8204 453.041 50.3919C448.084 47.1628 444.6 44.2797 442.589 41.7425C440.624 39.2054 439.642 36.23 439.642 32.8164C439.642 29.3566 440.461 26.635 442.098 24.6514C443.781 22.6216 446.166 21.6068 449.253 21.6068C452.059 21.6068 454.678 22.1142 457.11 23.1291C459.589 24.1439 461.95 25.2972 464.195 26.5888L470.509 11.5043C463.26 7.16811 455.707 5 447.85 5C439.619 5 433.048 7.53715 428.138 12.6114C423.274 17.6857 420.842 24.5591 420.842 33.2315C420.842 37.7523 421.45 41.7195 422.666 45.1331C423.928 48.5467 425.682 51.6374 427.927 54.4052C430.219 57.1269 433.563 59.9869 437.959 62.9854C443.01 66.399 446.634 69.5128 448.832 72.3267C451.03 75.0945 452.129 78.1621 452.129 81.5296C452.129 84.9432 451.194 87.6418 449.323 89.6254C447.499 91.609 444.763 92.6008 441.116 92.6008C434.708 92.6008 427.67 90.1329 420 85.1969V103.81C426.267 107.27 433.867 109 442.799 109C451.919 109 458.887 106.486 463.704 101.458ZM483.348 98.7489C488.959 105.583 496.612 109 506.304 109C514.606 109 521.725 107.222 527.661 103.667V88.4978C521.354 92.2381 515.186 94.1082 509.157 94.1082C504.426 94.1082 500.716 92.4459 498.026 89.1212C495.336 85.7504 494.139 80.8802 494 74H531V63.9091C531 52.873 528.565 44.3074 523.696 38.2121C518.826 32.0707 512.171 29 503.73 29C494.687 29 487.638 32.5786 482.583 39.7359C477.528 46.8932 475 56.8442 475 69.5887C475 82.1486 477.783 91.8687 483.348 98.7489ZM497.052 47.4242C498.768 44.6075 500.948 43.1991 503.591 43.1991C506.42 43.1991 508.646 
44.6537 510.27 47.5628C511.893 50.4719 512.907 55.3665 513 61H494C494.278 55.1356 495.336 50.1948 497.052 47.4242ZM576 107L573 97H572C569.243 101.517 566.591 104.736 563.647 106.442C560.703 108.147 556.989 109 552.503 109C546.756 109 542.224 106.88 538.906 102.639C535.635 98.3979 534 92.4976 534 84.9378C534 76.8248 536.243 70.8093 540.728 66.8911C545.261 62.9268 552.013 60.7373 560.984 60.3224L571.357 59.9075V54.376C571.357 47.185 568.203 43.5895 561.895 43.5895C557.222 43.5895 551.849 45.3872 545.775 48.9827L539.327 36.2602C547.083 31.4201 555.389 29 564.5 29C572.77 29 579.183 31.3509 583.482 36.0527C587.827 40.7084 590 47.3232 590 55.8971V107H576ZM560.143 94.618C563.554 94.618 566.264 93.1199 568.273 90.1236C570.329 87.0812 571.357 83.0478 571.357 78.0233V71.5238L565.61 71.8003C561.358 72.0308 558.227 73.2293 556.218 75.3959C554.255 77.5624 553.274 80.7891 553.274 85.0761C553.274 91.4373 555.564 94.618 560.143 94.618ZM636 30.5C633.86 29.8101 630.674 29 628.443 29C625.301 29 622.546 30.0349 620.179 32.1046C617.811 34.1743 616.004 36.1706 614 41H613L610 31H596V107H614.927V67C614.927 60.2849 615.352 55.9525 617.72 52.457C620.088 48.9154 623.48 47.1447 627.897 47.1447C629.946 47.1447 631.725 47.5401 633 48L636 30.5ZM664 109C654.885 109 647.908 105.956 643.145 99.2604C638.382 92.5649 636 82.7294 636 69.7539C636 56.1782 638.244 46.0425 642.733 39.347C647.267 32.6515 654.034 29 663.469 29C666.309 29 669.504 29.7193 672.618 30.5505C675.733 31.3816 679.527 32.43 682 34L675.779 48.4899C671.977 46.2272 668.611 45.0959 665.679 45.0959C661.786 45.0959 658.969 47.1508 657.229 51.2604C655.534 55.3239 654.687 61.4422 654.687 69.6154C654.687 77.6039 655.534 83.5836 657.229 87.5548C658.924 91.4798 661.695 93.4422 665.542 93.4422C670.122 93.4422 674.908 91.8261 679.901 88.5938V104.802C675.092 107.803 669.817 109 664 109Z" fill="#003B5C"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M215.554 95.5249C221.851 86.5415 225 73.6884 225 56.9655C225 40.2425 221.874 27.4124 215.623 18.4751C209.372 9.49169 200.389 5 188.674 5C176.82 5 167.744 9.46866 161.446 18.406C155.149 27.2972 152 40.1043 152 56.8272C152 73.6884 155.149 86.6106 161.446 95.594C167.744 104.531 176.773 109 188.535 109C200.25 109 209.256 104.508 215.554 95.5249ZM175.685 83.2937C172.768 77.2587 171.309 68.4826 171.309 56.9655C171.309 45.4022 172.768 36.6261 175.685 30.6372C178.602 24.6022 182.932 21.5847 188.674 21.5847C199.972 21.5847 205.621 33.3783 205.621 56.9655C205.621 80.5526 199.926 92.3462 188.535 92.3462C182.886 92.3462 178.602 89.3287 175.685 83.2937ZM256.372 106.996C258.938 108.477 261.7 109 265 109C272.059 109 277.801 105.653 281.881 98.5224C285.96 91.3919 288 81.5528 288 69.0049C288 56.2719 286.029 46.4327 282.087 39.4874C278.145 32.4958 272.69 29 265.723 29C258.48 29 252.805 33.3139 249 41H248L245 31H231V142H249V109C249 107.704 248.733 103.297 248 97H249C250.5 101.5 253.85 105.468 256.372 106.996ZM251.765 49.7664C253.369 46.3864 255.959 44.6964 259.534 44.6964C262.881 44.6964 265.333 46.6874 266.891 50.6693C268.496 54.6513 269.298 60.6706 269.298 68.7271C269.298 85.118 266.089 93.3135 259.672 93.3135C255.959 93.3135 253.3 91.3688 251.696 87.4794C250.092 83.59 249.29 77.3856 249.29 68.866V66.4352C249.381 58.6564 250.206 53.1002 251.765 49.7664ZM323.304 109C313.612 109 305.959 105.583 300.348 98.7489C294.783 91.8687 292 82.1486 292 69.5887C292 56.8442 294.528 46.8932 299.583 39.7359C304.638 32.5786 311.687 29 320.73 29C329.171 29 335.826 32.0707 340.696 38.2121C345.565 44.3074 348 52.873 348 63.9091V74H311C311.139 80.8802 312.336 85.7504 315.026 89.1212C317.716 92.4459 321.426 94.1082 326.157 94.1082C332.186 94.1082 338.354 92.2381 344.661 88.4978V103.667C338.725 107.222 331.606 109 323.304 109ZM320.591 43.1991C317.948 43.1991 315.768 44.6075 314.052 47.4242C312.336 50.1948 311.278 55.1356 311 61H330C329.907 55.3665 328.893 
50.4719 327.27 47.5628C325.646 44.6537 323.42 43.1991 320.591 43.1991ZM393 61V107H411V57.3982C411 48.1178 409.245 41.0646 405.736 36.2388C402.273 31.4129 397.033 29 390.015 29C385.859 29 382.235 30.0209 379.141 32.0626C376.047 34.0579 373.662 37.427 372 41H371L368.5 31H354V107H373V70.5C373 61.0803 373.346 54.6605 375.193 50.7163C377.04 46.7257 379.949 44.7304 383.92 44.7304C386.921 44.7304 389.091 46.1689 390.43 49.0458C391.769 51.9228 393 55.3853 393 61Z" fill="#005EB8"/>
</svg>

1
build.sh Normal file
View File

@ -0,0 +1 @@
bundle exec jekyll serve --host localhost --port 4000 --incremental --livereload --open-url

5
check-links.sh Normal file
View File

@ -0,0 +1,5 @@
# Checks for broken links in the documentation.
# Run `bundle exec jekyll serve` first.
# Uses https://github.com/stevenvachon/broken-link-checker
# I have no idea why we have to exclude the ISM section, but that's the only way I can get this to run. - ae
blc http://127.0.0.1:4000/docs/ -ro --filter-level 0 --exclude http://127.0.0.1:4000/docs/docs/ism/ --exclude http://localhost:5601/

2174
docs/ad/api.md Normal file

File diff suppressed because it is too large Load Diff

167
docs/ad/index.md Normal file
View File

@ -0,0 +1,167 @@
---
layout: default
title: Anomaly detection
nav_order: 46
has_children: true
---
# Anomaly detection
An anomaly is any unusual change in behavior. Anomalies in your time-series data can lead to valuable insights. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you uncover early signs of a system failure.
Discovering anomalies using conventional methods such as creating visualizations and dashboards can be challenging. You can set an alert based on a static threshold, but this requires prior domain knowledge and is not adaptive to data that exhibits organic growth or seasonal behavior.
The anomaly detection feature automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://pdfs.semanticscholar.org/8bba/52e9797f2e2cc9a823dbd12514d02f29c8b9.pdf?_ga=2.56302955.1913766445.1574109076-1059151610.1574109076).
You can pair the anomaly detection plugin with the [alerting plugin](../alerting/) to notify you as soon as an anomaly is detected.
To use the anomaly detection plugin, your computer needs to have more than one CPU core.
{: .note }
## Get started with Anomaly Detection
To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
To first test with sample streaming data, choose **Sample Detectors** and try out one of the preconfigured detectors.
### Step 1: Create a detector
A detector is an individual anomaly detection task. You can create multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
1. Choose **Create Detector**.
1. Enter the **Name** of the detector and a brief **Description**. Make sure the name that you enter is unique and descriptive enough to help you to identify the purpose of this detector.
1. For **Data source**, choose the index that you want to use as the data source. You can optionally use index patterns to choose multiple indices.
1. Choose the **Timestamp field** in your index.
1. For **Data filter**, you can optionally filter the index that you chose as the data source. From the **Filter type** menu, choose **Visual filter**, and then design your filter query by selecting **Fields**, **Operator**, and **Value**, or choose **Custom Expression** and add in your own JSON filter query.
1. For **Detector operation settings**, define the **Detector interval** to set the time interval at which the detector collects data.
- The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
The shorter you set this interval, the fewer data points the detector aggregates.
The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals.
- We recommend you set the detector interval based on your actual data. An interval that is too long might delay the results, and an interval that is too short might miss some data and not provide enough consecutive data points for the shingle process.
1. To add extra processing time for data collection, specify a **Window delay** value. This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay.
Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute.
If the detector runs at 2:00, it attempts to get the last 10 minutes of data, from 1:50 to 2:00, but because of the 1-minute delay, it gets only 9 minutes of data and misses the data from 1:59 to 2:00.
Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Choose **Create**.
After you create the detector, the next step is to add features to it.
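The window delay example from the steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the plugin's implementation; the function name `detection_window` and the calendar date are assumptions made for the example.

```python
from datetime import datetime, timedelta


def detection_window(run_time, interval, window_delay=timedelta(0)):
    """Return the (start, end) of the data window queried at run_time.

    Hypothetical helper: the window ends window_delay before the run time
    and spans one detector interval.
    """
    end = run_time - window_delay
    start = end - interval
    return start, end


run_time = datetime(2021, 5, 5, 2, 0)   # detector runs at 2:00 (date is arbitrary)
interval = timedelta(minutes=10)        # detector interval

no_delay = detection_window(run_time, interval)
with_delay = detection_window(run_time, interval, timedelta(minutes=1))

print("no delay:  ", no_delay[0].strftime("%H:%M"), "-", no_delay[1].strftime("%H:%M"))
print("1 min delay:", with_delay[0].strftime("%H:%M"), "-", with_delay[1].strftime("%H:%M"))
```

With no delay, the window is 1:50 to 2:00 and the last minute of data has not arrived yet. With a 1-minute delay, the window is 1:49 to 1:59, which the cluster has fully ingested.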
### Step 2: Add features to your detector
In this case, a feature is the field in your index that you want to check for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.
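To see why the aggregation method matters, consider this rough Python sketch. It assumes each detector interval has already been grouped into a list of raw values; the sample numbers and the `aggregate_intervals` helper are hypothetical and only mimic the aggregation step, not the anomaly detection model itself.

```python
from statistics import mean

# Aggregation methods named in the text, mapped to Python equivalents.
AGGREGATIONS = {
    "average": mean,
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
}


def aggregate_intervals(intervals, method):
    """Reduce each interval's raw values to one data point."""
    fn = AGGREGATIONS[method]
    return [fn(values) for values in intervals]


# Three detector intervals of made-up metric values; the second has a brief dip.
intervals = [[50, 52, 51], [49, 5, 53], [51, 50, 52]]

min_series = aggregate_intervals(intervals, "min")
avg_series = aggregate_intervals(intervals, "average")

print("min:    ", min_series)
print("average:", avg_series)
```

The brief dip dominates the `min()` series, but it is partly smoothed out in the `average()` series, so the same raw data can look more or less anomalous depending on the method you choose.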
A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. We recommend experimenting with a historical detector with different feature sets and checking the precision before moving on to real-time detectors. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opensearch.anomaly_detection.max_anomaly_features` setting.
{: .note }
1. On the **Model configuration** page, enter the **Feature name**.
1. For **Find anomalies based on**, choose the method to find anomalies. For the **Field Value** menu, choose the **field** and the **aggregation method**. Or choose **Custom expression**, and add your own JSON aggregation query.
#### (Optional) Set a category field for high cardinality
You can categorize anomalies based on a keyword or IP field type.
The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues.
To set a category field, choose **Enable a category field** and select a field.
Only a certain number of unique entities are supported in the category field. Use the following equation to calculate the recommended total number of entities supported in a cluster:
```
(data nodes * heap size * anomaly detection maximum memory percentage) / (entity size of a detector)
```
This formula provides a good starting point, but make sure to test with a representative workload.
{: .note }
For example, for a cluster with 3 data nodes, each with 8 GB of JVM heap size, a maximum memory percentage of 10% (default), and a detector entity size of 1 MB, the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1M) * 3 = 2429.
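To try the formula with your own cluster values, you can use a short Python sketch like the following. The function name and parameter names are hypothetical, and the inputs are the ones from the example above.

```python
def recommended_entity_count(data_nodes, heap_bytes_per_node,
                             max_memory_fraction, entity_size_bytes):
    """Estimate the number of unique entities a cluster can support.

    Implements: (data nodes * heap size * max memory percentage) / entity size
    """
    total = data_nodes * heap_bytes_per_node * max_memory_fraction
    return round(total / entity_size_bytes)

estimate = recommended_entity_count(
    data_nodes=3,
    heap_bytes_per_node=8.096e9,   # heap size in bytes, as used in the example
    max_memory_fraction=0.1,       # 10% (default)
    entity_size_bytes=1e6,         # 1 MB per entity
)
print(estimate)  # 2429
```

Treat the result as a starting estimate only, for the same reason the note above gives for the formula itself.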
#### Set a window size
Set the number of aggregation intervals from your data stream to consider in a detection window. We recommend you choose this value based on your actual data to see which one leads to the best results for your use case.
Based on experiments performed on a wide variety of one-dimensional data streams, we recommend using a window size between 1 and 16. The default window size is 8. If you have set the category field for high cardinality, the default window size is 1.
If you expect missing values in your data or if you want the anomalies based on the current interval, choose 1. If your data is continuously ingested and you want the anomalies based on multiple intervals, choose a larger window size.
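As a rough illustration of what a window size means, the following Python sketch groups consecutive interval values into overlapping windows. It is an assumption about the general sliding-window idea, not the plugin's actual code, and the `shingle` function name is hypothetical.

```python
def shingle(values, window_size):
    """Group consecutive interval values into overlapping windows of `window_size`."""
    return [values[i:i + window_size]
            for i in range(len(values) - window_size + 1)]

series = [3, 4, 5, 50, 4, 3]  # one aggregated value per detector interval

# Window size 1: each interval is evaluated on its own.
print(shingle(series, 1))  # [[3], [4], [5], [50], [4], [3]]

# Window size 3: each window spans three consecutive intervals.
print(shingle(series, 3))  # [[3, 4, 5], [4, 5, 50], [5, 50, 4], [50, 4, 3]]
```

In this sketch, a larger window needs more consecutive intervals before the first complete window is available, which is one reason a window size of 1 suits data with missing values.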
#### Preview sample anomalies
Preview sample anomalies and adjust the feature settings if needed.
For sample previews, the anomaly detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results.
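The following Python sketch shows the general idea of estimating data points between sparse samples. It assumes linear interpolation, which might differ from the method the plugin uses, and the `interpolate_samples` function is hypothetical.

```python
def interpolate_samples(samples, step_minutes):
    """Fill in evenly spaced points between sparse (minute, value) samples.

    Assumes linear interpolation between neighboring samples.
    """
    filled = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        t = t0
        while t < t1:
            fraction = (t - t0) / (t1 - t0)
            filled.append((t, v0 + fraction * (v1 - v0)))
            t += step_minutes
    filled.append(samples[-1])  # keep the final sampled point as-is
    return filled

# One real sample every 30 minutes, estimated at 10-minute intervals.
sparse = [(0, 10.0), (30, 40.0), (60, 10.0)]
for minute, value in interpolate_samples(sparse, step_minutes=10):
    print(minute, round(value, 2))
```

Because the points between samples are estimates, a preview built this way approximates the feature data rather than reproducing it.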
Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results.
1. Choose **Save and start detector**.
1. Choose between automatically starting the detector (recommended) or manually starting the detector at a later time.
### Step 3: Observe the results
Choose the **Anomaly results** tab.
You will have to wait for some time to see the anomaly results.
If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies.
A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner.
Use the [profile detector](./api#profile-detector) operation to make sure you have sufficient data points.
If you see the detector pending in "initialization" for longer than a day, aggregate your existing data using the detector interval to check for any missing data points. If you find a lot of missing data points from the aggregated data, consider increasing the detector interval.
![Anomaly detection results](../images/ad.png)
- The **Live anomalies** chart displays the live anomaly results for the last 60 intervals. For example, if the interval is set to 10 minutes, it shows the results for the last 600 minutes. This chart refreshes every 30 seconds.
- The **Anomaly history** chart plots the anomaly grade with the corresponding measure of confidence.
- The **Feature breakdown** graph plots the features based on the aggregation method. You can vary the date-time range of the detector.
- The **Anomaly occurrence** table shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each anomaly detected.
Anomaly grade is a number between 0 and 1 that indicates the level of severity of how anomalous a data point is. An anomaly grade of 0 represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly. The confidence score is an estimate of the probability that the reported anomaly grade matches the expected anomaly grade. Confidence increases as the model observes more data and learns the data behavior and trends. Note that confidence is distinct from model accuracy.
If you set the category field, you see an additional **Heat map** chart. The heat map correlates results for anomalous entities. This chart is empty until you select an anomalous entity. You also see the anomaly and feature line chart for the time period of the anomaly (`anomaly_grade` > 0).
Choose a filled rectangle to see a more detailed view of the anomaly.
{: .note }
### Step 4: Set up alerts
To create a monitor to send you notifications when any anomalies are detected, choose **Set up alerts**.
You're redirected to the **Alerting**, **Add monitor** page.
For steps to create a monitor and set notifications based on your anomaly detector, see [Monitor](../alerting/monitors/).
If you stop or delete a detector, make sure to delete any monitors associated with the detector.
### Step 5: Adjust the model
To see all the configuration settings, choose the **Detector configuration** tab.
1. To make any changes to the detector configuration, or fine-tune the time interval to minimize any false positives, in the **Detector configuration** section, choose **Edit**.
- You need to stop the detector to change the detector configuration. In the pop-up box, confirm that you want to stop the detector and proceed.
1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**.
- Choose between automatically starting the detector (recommended) or manually starting the detector at a later time.
### Step 6: Analyze historical data
Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.
To use a historical detector, the date range that you specify must have data present in at least 1,000 detection intervals.
{: .note }
1. Choose **Historical detectors** and **Create historical detector**.
1. Enter the **Name** of the detector and a brief **Description**.
1. For **Data source**, choose the index that you want to use as the data source. You can optionally use index patterns to choose multiple indices.
1. For **Time range**, select a time range for historical analysis.
1. For **Detector settings**, choose to use settings of an existing detector. Or choose the **Timestamp field** in your index, add individual features to the detector, and set the detector interval.
1. You can choose to run the historical detector automatically after you create it.
1. Choose **Create**.
- You can stop the historical detector even before it completes.
### Step 7: Manage your detectors
Go to the **Detector details** page to change or delete your detectors.
1. To make changes to your detector, choose the detector name to open the detector details page.
1. Choose **Actions**, and then choose **Edit detector**.
- You need to stop the detector to change the detector configuration. In the pop-up box, confirm that you want to stop the detector and proceed.
1. After making your changes, choose **Save changes**.
1. To delete your detector, choose **Actions**, and then choose **Delete detector**.
- In the pop-up box, type `delete` to confirm and choose **Delete**.

86
docs/ad/security.md Normal file

@ -0,0 +1,86 @@
---
layout: default
title: Anomaly detection security
nav_order: 10
parent: Anomaly detection
has_children: false
---
# Anomaly detection security
You can use the security plugin with anomaly detection to limit non-admin users to specific actions. For example, you might want some users to only be able to create, update, or delete detectors, while others can only view detectors.
All anomaly detection indices are protected as system indices. Only a super admin user or an admin user with a TLS certificate can access system indices. For more information, see [System indices](../../security/configuration/system-indices/).
Security for anomaly detection works the same as [security for alerting](../../alerting/security/).
## Basic permissions
As an admin user, you can use the security plugin to assign specific permissions to users based on which APIs they need access to. For a list of supported APIs, see [Anomaly Detection API](../api/).
The security plugin has two built-in roles that cover most anomaly detection use cases: `anomaly_full_access` and `anomaly_read_access`. For descriptions of each, see [Predefined roles](../../security/access-control/users-roles/#predefined-roles).
If these roles don't meet your needs, mix and match individual anomaly detection [permissions](../../security/access-control/permissions/) to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/ad/detector/delete` permission lets you delete detectors.
## (Advanced) Limit access by backend role
Use backend roles to configure fine-grained access to individual detectors based on roles. For example, users of different departments in an organization can view detectors owned by their own department.
First, make sure that your users have the appropriate [backend roles](../../security/access-control/). Backend roles usually come from an [LDAP server](../../security/configuration/ldap/) or [SAML provider](../../security/configuration/saml/), but if you use the internal user database, you can use the REST API to [add them manually](../../security/access-control/api/#create-user).
Next, enable the following setting:
```json
PUT _cluster/settings
{
"transient": {
"opensearch.anomaly_detection.filter_by_backend_roles": "true"
}
}
```
Now when users view anomaly detection resources in OpenSearch Dashboards (or make REST API calls), they only see detectors created by users who share at least one backend role.
For example, consider two users: `alice` and `bob`.
`alice` has an analyst backend role:
```json
PUT _opensearch/_security/api/internalusers/alice
{
"password": "alice",
"backend_roles": [
"analyst"
],
"attributes": {}
}
```
`bob` has a human-resources backend role:
```json
PUT _opensearch/_security/api/internalusers/bob
{
"password": "bob",
"backend_roles": [
"human-resources"
],
"attributes": {}
}
```
Both `alice` and `bob` have full access to anomaly detection:
```json
PUT _opensearch/_security/api/rolesmapping/anomaly_full_access
{
"backend_roles": [],
"hosts": [],
"users": [
"alice",
"bob"
]
}
```
Because they have different backend roles, `alice` and `bob` cannot view each other's detectors or their results.
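The filtering rule can be sketched in Python as a simple set intersection. This only mirrors the rule described above (users see detectors created by users who share at least one backend role); it is not the plugin's code, and `can_view_detector` is a hypothetical name.

```python
def can_view_detector(viewer_roles, creator_roles):
    """Return True if the viewer shares at least one backend role with the creator."""
    return bool(set(viewer_roles) & set(creator_roles))

# Backend roles from the alice and bob example.
backend_roles = {
    "alice": ["analyst"],
    "bob": ["human-resources"],
}

print(can_view_detector(backend_roles["bob"], backend_roles["alice"]))    # False
print(can_view_detector(backend_roles["alice"], backend_roles["alice"]))  # True
```

If you gave `bob` the `analyst` backend role as well, the first check would return `True`, because one shared role is enough.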

42
docs/ad/settings.md Normal file

@ -0,0 +1,42 @@
---
layout: default
title: Settings
parent: Anomaly detection
nav_order: 4
---
# Settings
The anomaly detection plugin adds several settings to the standard OpenSearch cluster settings.
They are dynamic, so you can change the default behavior of the plugin without restarting your cluster.
You can mark them `persistent` or `transient`.
For example, to update the retention period of the result index:
```json
PUT _cluster/settings
{
"transient": {
"opensearch.anomaly_detection.ad_result_history_retention_period": "5m"
}
}
```
Setting | Default | Description
:--- | :--- | :---
`opensearch.anomaly_detection.enabled` | True | Whether the anomaly detection plugin is enabled or not. If disabled, all detectors immediately stop running.
`opensearch.anomaly_detection.max_anomaly_detectors` | 1,000 | The maximum number of non-high cardinality detectors (no category field) users can create.
`opensearch.anomaly_detection.max_multi_entity_anomaly_detectors` | 10 | The maximum number of high cardinality detectors (with category field) in a cluster.
`opensearch.anomaly_detection.max_anomaly_features` | 5 | The maximum number of features for a detector.
`opensearch.anomaly_detection.ad_result_history_rollover_period` | 12h | How often the plugin checks the rollover condition. If the condition is met, the plugin rolls over the result index to a new index.
`opensearch.anomaly_detection.ad_result_history_max_docs` | 250000000 | The maximum number of documents in one result index. The plugin only counts refreshed documents in the primary shards.
`opensearch.anomaly_detection.ad_result_history_retention_period` | 30d | The maximum age of the result index. If its age exceeds the threshold, the plugin deletes the rolled over result index. If the cluster has only one result index, the plugin keeps it even if it's older than its configured retention period.
`opensearch.anomaly_detection.max_entities_per_query` | 1,000 | The maximum unique values per detection interval for high cardinality detectors. By default, if the category field has more than 1,000 unique values in a detector interval, the plugin selects the top 1,000 values and orders them by `doc_count`.
`opensearch.anomaly_detection.max_entities_for_preview` | 30 | The maximum unique category field values displayed with the preview operation for high cardinality detectors. If the category field has more than 30 unique values, the plugin selects the top 30 values and orders them by `doc_count`.
`opensearch.anomaly_detection.max_primary_shards` | 10 | The maximum number of primary shards an anomaly detection index can have.
`opensearch.anomaly_detection.filter_by_backend_roles` | False | When you enable the security plugin and set this to `true`, the plugin filters results based on the user's backend role(s).
`opensearch.anomaly_detection.max_cache_miss_handling_per_second` | 100 | High cardinality detectors use a cache to store active models. In the event of a cache miss, the cache gets the models from the model checkpoint index. Use this setting to limit the rate of fetching models. Because the thread pool for a GET operation has a queue of 1,000, we recommend setting this value below 1,000.
`opensearch.anomaly_detection.max_batch_task_per_node` | 2 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks and if you're not sure if the data nodes are capable of running more historical detectors, add more data nodes instead of changing this setting to a higher value.
`opensearch.anomaly_detection.max_old_ad_task_docs_per_detector` | 10 | You can run the same historical detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster.
`opensearch.anomaly_detection.batch_task_piece_size` | 1,000 | The date range for a historical task is split into smaller pieces and the anomaly detection plugin runs the task piece by piece. Each piece contains 1,000 detection intervals by default. For example, if the detector interval is 1 minute, one piece covers 1,000 minutes, and the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000.
`opensearch.anomaly_detection.batch_task_piece_interval_seconds` | 5 | Add a time interval between historical detector tasks. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds.
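To get a feel for how the batch task piece size relates to a historical date range, the following Python sketch splits a range into pieces of 1,000 detection intervals. It is an illustration only: the `split_into_pieces` function is hypothetical, and the handling of a final, shorter piece is an assumption.

```python
from datetime import datetime, timedelta

def split_into_pieces(start, end, interval_minutes, piece_size=1000):
    """Split [start, end) into consecutive pieces of `piece_size` detection intervals."""
    piece_length = timedelta(minutes=interval_minutes * piece_size)
    pieces = []
    piece_start = start
    while piece_start < end:
        # Assumption: the last piece is simply shorter if the range does not divide evenly.
        piece_end = min(piece_start + piece_length, end)
        pieces.append((piece_start, piece_end))
        piece_start = piece_end
    return pieces

# Two days of data with a 1-minute detector interval: 2,880 intervals.
pieces = split_into_pieces(datetime(2021, 5, 1), datetime(2021, 5, 3), interval_minutes=1)
print(len(pieces))  # 3
```

In this sketch, two days of 1-minute intervals produce two full pieces of 1,000 minutes and one piece of 880 minutes.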

1466
docs/alerting/api.md Normal file

File diff suppressed because it is too large

64
docs/alerting/cron.md Normal file

@ -0,0 +1,64 @@
---
layout: default
title: Cron
nav_order: 20
parent: Alerting
has_children: false
---
# Cron expression reference
Monitors can run at a variety of fixed intervals (e.g. hourly, daily, etc.), but you can also define custom cron expressions for when they should run. Monitors use the Unix cron syntax and support five fields:
Field | Valid values
:--- | :---
Minute | 0-59
Hour | 0-23
Day of month | 1-31
Month | 1-12
Day of week | 0-7 (0 and 7 are both Sunday) or SUN, MON, TUE, WED, THU, FRI, SAT
For example, the following expression translates to "every Monday through Friday at 11:30 AM":
```
30 11 * * 1-5
```
## Features
Feature | Description
:--- | :---
`*` | Wildcard. Specifies all valid values.
`,` | List. Use to specify several values (e.g. `1,15,30`).
`-` | Range. Use to specify a range of values (e.g. `1-15`).
`/` | Step. Use after a wildcard or range to specify the "step" between values. For example, `0-11/2` is equivalent to `0,2,4,6,8,10`.
Note that you can specify the day using two fields: day of month and day of week. For most situations, we recommend that you use just one of these fields and leave the other as `*`.
If you use a non-wildcard value in both fields, the monitor runs when either field matches the time. For example, `15 2 1,15 * 1` causes the monitor to run at 2:15 AM on the 1st of the month, the 15th of the month, and every Monday.
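The following Python sketch illustrates how the features and the day-matching rule above combine. It is a simplified model for experimenting with expressions, not the scheduler that the alerting plugin uses. The `expand_field` and `matches` functions are hypothetical, and the sketch supports numeric values only (no `SUN` through `SAT` names).

```python
from datetime import datetime

def expand_field(field, low, high):
    """Expand one cron field (wildcard, list, range, step) into a set of integers."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, int(step) if step else 1))
    return values

def matches(expression, moment):
    """Return True if `moment` matches a five-field cron expression."""
    minute, hour, dom, month, dow = expression.split()
    if moment.minute not in expand_field(minute, 0, 59):
        return False
    if moment.hour not in expand_field(hour, 0, 23):
        return False
    if moment.month not in expand_field(month, 1, 12):
        return False
    dom_ok = moment.day in expand_field(dom, 1, 31)
    # isoweekday() is 1 (Monday) to 7 (Sunday), so map cron's 0 to 7 for Sunday.
    dow_values = {7 if v == 0 else v for v in expand_field(dow, 0, 7)}
    dow_ok = moment.isoweekday() in dow_values
    if dom == "*" or dow == "*":
        # At most one day field is restricted, so both checks must pass.
        return dom_ok and dow_ok
    # Both day fields are restricted: either one matching is enough.
    return dom_ok or dow_ok

print(sorted(expand_field("0-11/2", 0, 23)))                       # [0, 2, 4, 6, 8, 10]
print(matches("15 2 1,15 * 1", datetime(2021, 5, 1, 2, 15)))       # True (1st of the month)
print(matches("15 2 1,15 * 1", datetime(2021, 5, 3, 2, 15)))       # True (a Monday)
print(matches("15 2 1,15 * 1", datetime(2021, 5, 4, 2, 15)))       # False
```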
## Sample expressions
Every other day at 1:45 PM:
```
45 13 1-31/2 * *
```
Every 10 minutes on Saturday and Sunday:
```
*/10 * * * 6-7
```
Every three hours on the first day of every other month:
```
0 0-23/3 1 1-12/2 *
```
## API
For an example of how to use a custom cron expression in an API call, see the [create monitor API operation](../api/#request-1).

16
docs/alerting/index.md Normal file

@ -0,0 +1,16 @@
---
layout: default
title: Alerting
nav_order: 34
has_children: true
---
# Alerting
OpenSearch Dashboards
{: .label .label-yellow :}
The alerting feature notifies you when data from one or more OpenSearch indices meets certain conditions. For example, you might want to notify a [Slack](https://slack.com/) channel if your application logs more than five HTTP 503 errors in one hour, or you might want to page a developer if no new documents have been indexed in the past 20 minutes.
To get started, choose **Alerting** in OpenSearch Dashboards.
![OpenSearch Dashboards side bar with link](../images/alerting.png)

331
docs/alerting/monitors.md Normal file

@ -0,0 +1,331 @@
---
layout: default
title: Monitors
nav_order: 1
parent: Alerting
has_children: false
---
# Monitors
#### Table of contents
- TOC
{:toc}
---
## Key terms
Term | Definition
:--- | :---
Monitor | A job that runs on a defined schedule and queries OpenSearch. The results of these queries are then used as input for one or more *triggers*.
Trigger | Conditions that, if met, generate *alerts* and can perform some *action*.
Alert | A notification that a monitor's trigger condition has been met.
Action | The information that you want the monitor to send out after being triggered. Actions have a *destination*, a message subject, and a message body.
Destination | A reusable location for an action, such as Amazon Chime, Slack, or a webhook URL.
---
## Create destinations
1. Choose **Alerting**, **Destinations**, **Add destination**.
1. Specify a name for the destination so that you can identify it later.
1. For **Type**, choose Slack, Amazon Chime, custom webhook, or [email](#email-as-a-destination).
For the Email type, refer to the [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. For more information about webhooks, see the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html).
For custom webhooks, you must specify more information: parameters and headers. For example, if your endpoint requires basic authentication, you might need to add a header with a key of `Authorization` and a value of `Basic <Base64-encoded-credential-string>`. You might also need to change `Content-Type` to whatever your webhook requires. Popular values are `application/json`, `application/xml`, and `text/plain`.
This information is stored in plain text in the OpenSearch cluster. We will improve this design in the future, but for now, the encoded credentials (which are neither encrypted nor hashed) might be visible to other OpenSearch users.
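As an illustration of the `Authorization` header value for basic authentication, the following Python sketch builds the Base64-encoded credential string using only the standard library. The `basic_auth_header` function and the credentials are placeholders.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

# Placeholder credentials, for illustration only.
header = basic_auth_header("user", "pass")
print(header)  # Basic dXNlcjpwYXNz

# Base64 is an encoding, not encryption: anyone can decode the value.
print(base64.b64decode(header.split(" ", 1)[1]).decode("utf-8"))  # user:pass
```

The second `print` shows why the encoded credentials should be treated as if they were stored in plain text.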
### Email as a destination
To send or receive an alert notification as an email, choose **Email** as the destination type. Next, add at least one sender and recipient. We recommend adding email groups if you want to notify more than a few people of an alert. You can configure senders and recipients using **Manage senders** and **Manage email groups**.
#### Manage senders
Senders are email accounts from which the alerting plugin sends notifications.
To configure a sender email, do the following:
1. After you choose **Email** as the destination type, choose **Manage senders**.
1. Choose **Add sender**, **New sender** and enter a unique name.
1. Enter the email address, SMTP host (e.g. `smtp.gmail.com` for a Gmail account), and the port.
1. Choose an encryption method, or use the default value of **None**. However, most email providers require SSL or TLS, which requires a username and password in the OpenSearch keystore. Refer to [Authenticate sender account](#authenticate-sender-account) to learn more.
1. Choose **Save** to save the configuration and create the sender. You can create a sender even before you add your credentials to the OpenSearch keystore. However, you must [authenticate each sender account](#authenticate-sender-account) before you use the destination to send your alert.
You can reuse senders across many different destinations, but each destination only supports one sender.
#### Manage email groups or recipients
Use email groups to create and manage reusable lists of email addresses. For example, one alert might email the DevOps team, whereas another might email the executive team and the engineering team.
You can enter individual email addresses or an email group in the **Recipients** field.
1. After you choose **Email** as the destination type, choose **Manage email groups**. Then choose **Add email group**, **New email group**.
1. Enter a unique name.
1. For recipient emails, enter any number of email addresses.
1. Choose **Save**.
#### Authenticate sender account
If your email provider requires SSL or TLS, you must authenticate each sender account before you can send an email. Enter these credentials in the OpenSearch keystore using the CLI. Run the following commands (in your OpenSearch directory) to enter your username and password. The `<sender_name>` is the name you entered for **Sender** earlier.
```bash
./bin/opensearch-keystore add opensearch.alerting.destination.email.<sender_name>.username
./bin/opensearch-keystore add opensearch.alerting.destination.email.<sender_name>.password
```
**Note**: Keystore settings are node-specific. You must run these commands on each node.
{: .note}
To change or update your credentials (after you've added them to the keystore on every node), call the reload API to automatically update those credentials without restarting OpenSearch:
```json
POST _nodes/reload_secure_settings
{
"secure_settings_password": "1234"
}
```
---
## Create monitors
1. Choose **Alerting**, **Monitors**, **Create monitor**.
1. Specify a name for the monitor.
The anomaly detection option is for pairing with the anomaly detection plugin. See [Anomaly Detection](../../ad/).
For an anomaly detector monitor, choose an appropriate schedule for the monitor based on the detector interval. Otherwise, the alerting monitor might miss reading the results.
For example, assume you set both the monitor interval and the detector interval to 5 minutes, and you start the detector at 12:00. If an anomaly is detected at 12:05, it might be available at 12:06 because of the delay between writing the anomaly and it being available for queries. The monitor reads the anomaly results between 12:00 and 12:05, so it does not get the anomaly results available at 12:06.
To avoid this issue, make sure the alerting monitor interval is at least twice the detector interval.
When you create a monitor using OpenSearch Dashboards, the anomaly detector plugin generates a default monitor schedule that's twice the detector interval.
Whenever you update a detector's interval, make sure to update the associated monitor interval as well, as the anomaly detection plugin does not do this automatically.
1. Choose one or more indices. You can also use `*` as a wildcard to specify an index pattern.
If you use the security plugin, you can only choose indices that you have permission to access. For details, see [Alerting security](../security/).
1. Define the monitor in one of three ways: visually, using a query, or using an anomaly detector.
- Visual definition works well for monitors that you can define as "some value is above or below some threshold for some amount of time."
- Query definition gives you flexibility in terms of what you query for (using [the OpenSearch query DSL](../../opensearch/full-text)) and how you evaluate the results of that query (Painless scripting).
This example averages the `cpu_usage` field:
```json
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"avg_cpu": {
"avg": {
"field": "cpu_usage"
}
}
}
}
```
You can even filter query results using `{% raw %}{{period_start}}{% endraw %}` and `{% raw %}{{period_end}}{% endraw %}`:
```json
{
"size": 0,
"query": {
"bool": {
"filter": [{
"range": {
"timestamp": {
"from": "{% raw %}{{period_end}}{% endraw %}||-1h",
"to": "{% raw %}{{period_end}}{% endraw %}",
"include_lower": true,
"include_upper": true,
"format": "epoch_millis",
"boost": 1
}
}
}],
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {}
}
```
"Start" and "end" refer to the interval at which the monitor runs. See [Available variables](#available-variables).
1. To define a monitor visually, choose **Define using visual graph**. Then choose an aggregation (for example, `count()` or `average()`), a set of documents, and a timeframe. Visual definition works well for most monitors.
To use a query, choose **Define using extraction query**, add your query (using [the OpenSearch query DSL](../../opensearch/full-text/)), and test it using the **Run** button.
The monitor makes this query to OpenSearch as often as the schedule dictates; check the **Query Performance** section and make sure you're comfortable with the performance implications.
To use an anomaly detector, choose **Define using Anomaly detector** and select your **Detector**.
1. Choose a frequency and timezone for your monitor. Note that you can only pick a timezone if you choose Daily, Weekly, Monthly, or [custom cron expression](../cron/) for frequency.
1. Choose **Create**.
---
## Create triggers
The next step in creating a monitor is to create a trigger. These steps differ depending on whether you chose **Define using visual graph**, **Define using extraction query**, or **Define using Anomaly detector** when you created the monitor.
Either way, you begin by specifying a name and severity level for the trigger. Severity levels help you manage alerts. A trigger with a high severity level (e.g. 1) might page a specific individual, whereas a trigger with a low severity level might message a chat room.
### Visual graph
For **Trigger condition**, specify a threshold for the aggregation and timeframe you chose earlier, such as "is below 1,000" or "is exactly 10."
The line moves up and down as you increase and decrease the threshold. Once this line is crossed, the trigger evaluates to true.
### Extraction query
For **Trigger condition**, specify a Painless script that returns true or false. Painless is the default OpenSearch scripting language and has a syntax similar to Groovy.
Trigger condition scripts revolve around the `ctx.results[0]` variable, which corresponds to the extraction query response. For example, your script might reference `ctx.results[0].hits.total.value` or `ctx.results[0].hits.hits[i]._source.error_code`.
A return value of true means the trigger condition has been met, and the trigger should execute its actions. Test your script using the **Run** button.
The **Info** link next to **Trigger condition** contains a useful summary of the variables and results available to your query.
{: .tip }
### Anomaly detector
For **Trigger type**, choose **Anomaly detector grade and confidence**.
Specify the **Anomaly grade condition** for the aggregation and timeframe you chose earlier, such as "IS ABOVE 0.7" or "IS EXACTLY 0.5." The *anomaly grade* is a number between 0 and 1 that indicates the level of severity of how anomalous a data point is.
Specify the **Anomaly confidence condition** for the aggregation and timeframe you chose earlier, such as "IS ABOVE 0.7" or "IS EXACTLY 0.5." The *anomaly confidence* is an estimate of the probability that the reported anomaly grade matches the expected anomaly grade.
The line moves up and down as you increase and decrease the threshold. Once this line is crossed, the trigger evaluates to true.
#### Sample scripts
{::comment}
These scripts are Painless, not Groovy, but calling them Groovy in Jekyll gets us syntax highlighting in the generated HTML.
{:/comment}
```groovy
// Evaluates to true if the query returned any documents
ctx.results[0].hits.total.value > 0
```
```groovy
// Returns true if the avg_cpu aggregation exceeds 90, and false otherwise
if (ctx.results[0].aggregations.avg_cpu.value > 90) {
return true;
}
return false;
```
```groovy
// Performs some crude custom scoring and returns true if that score exceeds a certain value
int score = 0;
for (int i = 0; i < ctx.results[0].hits.hits.length; i++) {
// Weighs 500 errors 10 times as heavily as 503 errors
if (ctx.results[0].hits.hits[i]._source.http_status_code == "500") {
score += 10;
} else if (ctx.results[0].hits.hits[i]._source.http_status_code == "503") {
score += 1;
}
}
if (score > 99) {
return true;
} else {
return false;
}
```
#### Available variables
Variable | Description
:--- | :---
`ctx.results` | An array with one element (i.e. `ctx.results[0]`). Contains the query results. This variable is empty if the trigger was unable to retrieve results. See `ctx.error`.
`ctx.monitor` | Includes `ctx.monitor.name`, `ctx.monitor.type`, `ctx.monitor.enabled`, `ctx.monitor.enabled_time`, `ctx.monitor.schedule`, `ctx.monitor.inputs`, `ctx.monitor.triggers`, and `ctx.monitor.last_update_time`.
`ctx.trigger` | Includes `ctx.trigger.name`, `ctx.trigger.severity`, `ctx.trigger.condition`, and `ctx.trigger.actions`.
`ctx.periodStart` | Unix timestamp for the beginning of the period during which the alert triggered. For example, if a monitor runs every ten minutes, a period might begin at 10:40 and end at 10:50.
`ctx.periodEnd` | The end of the period during which the alert triggered.
`ctx.error` | The error message if the trigger was unable to retrieve results or unable to evaluate the trigger, typically due to a compile error or null pointer exception. Null otherwise.
`ctx.alert` | The current, active alert (if it exists). Includes `ctx.alert.id`, `ctx.alert.version`, and `ctx.alert.isAcknowledged`. Null if no alert is active.
---
## Add actions
The final step in creating a monitor is to add one or more actions. Actions send notifications when trigger conditions are met and support [Slack](https://slack.com/), [Amazon Chime](https://aws.amazon.com/chime/), and webhooks.
If you don't want to receive notifications for alerts, you don't have to add actions to your triggers. Instead, you can periodically check OpenSearch Dashboards.
{: .tip }
1. Specify a name for the action.
1. Choose a destination.
1. Add a subject and body for the message.
You can add variables to your messages using [Mustache templates](https://mustache.github.io/mustache.5.html). You have access to `ctx.action.name`, the name of the current action, as well as all [trigger variables](#available-variables).
If your destination is a custom webhook that expects a particular data format, you might need to include JSON (or even XML) directly in the message body:
```json
{% raw %}{ "text": "Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue. - Trigger: {{ctx.trigger.name}} - Severity: {{ctx.trigger.severity}} - Period start: {{ctx.periodStart}} - Period end: {{ctx.periodEnd}}" }{% endraw %}
```
In this case, the message content must conform to the `Content-Type` header in the [custom webhook](#create-destinations).
1. (Optional) Use action throttling to limit the number of notifications you receive within a given span of time.
For example, if a monitor checks a trigger condition every minute, you could receive one notification per minute. If you set action throttling to 60 minutes, you receive no more than one notification per hour, even if the trigger condition is met dozens of times in that hour.
1. Choose **Create**.
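The effect of action throttling described in the steps above can be sketched with a small simulation. This is an assumed model (a notification is sent only if the throttle window has passed since the last one that was sent), not the plugin's actual implementation, and `count_notifications` is a hypothetical helper.

```python
def count_notifications(trigger_minutes, throttle_minutes):
    """Count notifications sent for the minutes at which a trigger fired.

    Assumed model: a notification goes out only if at least
    `throttle_minutes` have passed since the last one that was sent.
    """
    sent = 0
    last_sent = None
    for minute in sorted(trigger_minutes):
        if last_sent is None or minute - last_sent >= throttle_minutes:
            sent += 1
            last_sent = minute
    return sent


# The trigger condition is met every minute for three hours.
every_minute = range(180)
throttled = count_notifications(every_minute, throttle_minutes=60)
```

Under this model, three hours of per-minute triggers with a 60-minute throttle produce three notifications, one per hour.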
After an action sends a message, the content of that message has left the purview of the security plugin. Securing access to the message (e.g. access to the Slack channel) is your responsibility.
#### Sample message
```mustache
{% raw %}Monitor {{ctx.monitor.name}} just entered an alert state. Please investigate the issue.
- Trigger: {{ctx.trigger.name}}
- Severity: {{ctx.trigger.severity}}
- Period start: {{ctx.periodStart}}
- Period end: {{ctx.periodEnd}}{% endraw %}
```
If you want to use the `ctx.results` variable in a message, use `{% raw %}{{ctx.results.0}}{% endraw %}` rather than `{% raw %}{{ctx.results[0]}}{% endraw %}`. This difference is due to how Mustache handles bracket notation.
{: .note }
---
## Work with alerts
Alerts persist until you resolve the root cause. Each alert has one of the following states:
State | Description
:--- | :---
Active | The alert is ongoing and unacknowledged. Alerts remain in this state until you acknowledge them, delete the trigger associated with the alert, or delete the monitor entirely.
Acknowledged | Someone has acknowledged the alert, but not fixed the root cause.
Completed | The alert is no longer ongoing. Alerts enter this state after the corresponding trigger evaluates to false.
Error | An error occurred while executing the trigger---usually the result of a bad trigger or destination.
Deleted | Someone deleted the monitor or trigger associated with this alert while the alert was ongoing.

79
docs/alerting/security.md Normal file
View File

@ -0,0 +1,79 @@
---
layout: default
title: Alerting Security
nav_order: 10
parent: Alerting
has_children: false
---
# Alerting security
If you use the security plugin alongside alerting, you might want to limit certain users to certain actions. For example, you might want some users to only be able to view and acknowledge alerts, while others can modify monitors and destinations.
## Basic permissions
The security plugin has three built-in roles that cover most alerting use cases: `alerting_read_access`, `alerting_ack_alerts`, and `alerting_full_access`. For descriptions of each, see [Predefined roles](../../security/access-control/users-roles/#predefined-roles).
If these roles don't meet your needs, mix and match individual alerting [permissions](../../security/access-control/permissions/) to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/alerting/destination/delete` permission lets you delete destinations.
## How monitors access data
Monitors run with the permissions of the user who created or last modified them. For example, consider the user `jdoe`, who works at a chain of retail stores. `jdoe` has two roles. Together, these two roles allow read access to three indices: `store1-returns`, `store2-returns`, and `store3-returns`.
`jdoe` creates a monitor that sends an email to management whenever the number of returns across all three indices exceeds 40 per hour.
Later, the user `psantos` wants to edit the monitor to run every two hours, but `psantos` only has access to `store1-returns`. To make the change, `psantos` has two options:
- Update the monitor so that it only checks `store1-returns`.
- Ask an administrator for read access to the other two indices.
After making the change, the monitor now runs with the same permissions as `psantos`, including any [document-level security](../../security/access-control/document-level-security/) queries, [excluded fields](../../security/access-control/field-level-security/), and [masked fields](../../security/access-control/field-masking/). If you use an extraction query to define your monitor, use the **Run** button to ensure that the response includes the fields you need.
## (Advanced) Limit access by backend role
Out of the box, the alerting plugin has no concept of ownership. For example, if you have the `cluster:admin/opensearch/alerting/monitor/write` permission, you can edit *all* monitors, regardless of whether you created them. If a small number of trusted users manage your monitors and destinations, this lack of ownership generally isn't a problem. A larger organization might need to segment access by backend role.
First, make sure that your users have the appropriate [backend roles](../../security/access-control/). Backend roles usually come from an [LDAP server](../../security/configuration/ldap/) or [SAML provider](../../security/configuration/saml/). However, if you use the internal user database, you can use the REST API to [add them manually](../../security/access-control/api/#create-user).
Next, enable the following setting:
```json
PUT _cluster/settings
{
"transient": {
"opensearch.alerting.filter_by_backend_roles": "true"
}
}
```
Now when users view alerting resources in OpenSearch Dashboards (or make REST API calls), they only see monitors and destinations that are created by users who share *at least one* backend role. For example, consider three users who all have full access to alerting: `jdoe`, `jroe`, and `psantos`.
`jdoe` and `jroe` are on the same team at work and both have the `analyst` backend role. `psantos` has the `human-resources` backend role.
If `jdoe` creates a monitor, `jroe` can see and modify it, but `psantos` can't. If that monitor generates an alert, the situation is the same: `jroe` can see and acknowledge it, but `psantos` can't. If `psantos` creates a destination, `jdoe` and `jroe` can't see or modify it.
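The "at least one shared backend role" rule from this example can be sketched as a set intersection. The sketch below only restates the rule with the three users above; `can_access` is a hypothetical helper, not an alerting API.

```python
# Backend roles from the example above.
BACKEND_ROLES = {
    "jdoe": {"analyst"},
    "jroe": {"analyst"},
    "psantos": {"human-resources"},
}


def can_access(viewer, creator, roles=BACKEND_ROLES):
    """True if the viewer shares at least one backend role with the creator."""
    return bool(roles[viewer] & roles[creator])
```

For example, `can_access("jroe", "jdoe")` is true because both users have the `analyst` role, while `can_access("psantos", "jdoe")` is false.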
<!-- ## (Advanced) Limit access by individual
If you only want users to be able to see and modify their own monitors and destinations, duplicate the `alerting_full_access` role and add the following [DLS query](../../security/access-control/document-level-security/) to it:
```json
{
"bool": {
"should": [{
"match": {
"monitor.created_by": "${user.name}"
}
}, {
"match": {
"destination.created_by": "${user.name}"
}
}]
}
}
```
Then, use this new role for all alerting users. -->

59
docs/alerting/settings.md Normal file
View File

@ -0,0 +1,59 @@
---
layout: default
title: Management
parent: Alerting
nav_order: 5
---
# Management
## Alerting indices
The alerting feature creates several indices and one alias. The security plugin demo script configures them as [system indices](../../security/configuration/system-indices/) for an extra layer of protection. Don't delete these indices or modify their contents without using the alerting APIs.
Index | Purpose
:--- | :---
`.opensearch-alerting-alerts` | Stores ongoing alerts.
`.opensearch-alerting-alert-history-<date>` | Stores a history of completed alerts.
`.opensearch-alerting-config` | Stores monitors, triggers, and destinations. [Take a snapshot](../../opensearch/snapshot-restore) of this index to back up your alerting configuration.
`.opensearch-alerting-alert-history-write` (alias) | Provides a consistent URI for the `.opensearch-alerting-alert-history-<date>` index.
All alerting indices are hidden by default. For a summary, make the following request:
```
GET _cat/indices?expand_wildcards=open,hidden
```
## Alerting settings
We don't recommend changing these settings; the defaults should work well for most use cases.
All settings are available using the OpenSearch `_cluster/settings` API. None require a restart, and all can be marked `persistent` or `transient`.
Setting | Default | Description
:--- | :--- | :---
`opensearch.scheduled_jobs.enabled` | true | Whether the alerting plugin is enabled or not. If disabled, all monitors immediately stop running.
`opensearch.alerting.index_timeout` | 60s | The timeout for creating monitors and destinations using the REST APIs.
`opensearch.alerting.request_timeout` | 10s | The timeout for miscellaneous requests from the plugin.
`opensearch.alerting.action_throttle_max_value` | 24h | The maximum amount of time you can set for action throttling. By default, this value displays as 1440 minutes in OpenSearch Dashboards.
`opensearch.alerting.input_timeout` | 30s | How long the monitor can take to issue the search request.
`opensearch.alerting.bulk_timeout` | 120s | How long the monitor can write alerts to the alert index.
`opensearch.alerting.alert_backoff_count` | 3 | The number of retries for writing alerts before the operation fails.
`opensearch.alerting.alert_backoff_millis` | 50ms | The amount of time to wait between retries---increases exponentially after each failed retry.
`opensearch.alerting.alert_history_rollover_period` | 12h | How frequently to check whether the `.opensearch-alerting-alert-history-write` alias should roll over to a new history index and whether the Alerting plugin should delete any history indices.
`opensearch.alerting.move_alerts_backoff_millis` | 250 | The amount of time to wait between retries---increases exponentially after each failed retry.
`opensearch.alerting.move_alerts_backoff_count` | 3 | The number of retries for moving alerts to a deleted state after their monitor or trigger has been deleted.
`opensearch.alerting.monitor.max_monitors` | 1000 | The maximum number of monitors users can create.
`opensearch.alerting.alert_history_max_age` | 30d | The oldest document to store in the `.opensearch-alerting-alert-history-<date>` index before creating a new index. If the number of alerts in this time period does not exceed `alert_history_max_docs`, alerting creates one history index per period (e.g. one index every 30 days).
`opensearch.alerting.alert_history_max_docs` | 1000 | The maximum number of alerts to store in the `.opensearch-alerting-alert-history-<date>` index before creating a new index.
`opensearch.alerting.alert_history_enabled` | true | Whether to create `.opensearch-alerting-alert-history-<date>` indices.
`opensearch.alerting.alert_history_retention_period` | 60d | The amount of time to keep history indices before automatically deleting them.
`opensearch.alerting.destination.allow_list` | ["chime", "slack", "custom_webhook", "email", "test_action"] | The list of allowed destinations. If you don't want to allow users to use a certain type of destination, you can remove it from this list, but we recommend leaving this setting as-is.
`opensearch.alerting.filter_by_backend_roles` | "false" | Restricts access to monitors by backend role. See [Alerting security](../security/).
`opensearch.scheduled_jobs.sweeper.period` | 5m | The alerting feature uses its "job sweeper" component to periodically check for new or updated jobs. This setting is the rate at which the sweeper checks to see if any jobs (monitors) have changed and need to be rescheduled.
`opensearch.scheduled_jobs.sweeper.page_size` | 100 | The page size for the sweeper. You shouldn't need to change this value.
`opensearch.scheduled_jobs.sweeper.backoff_millis` | 50ms | The amount of time the sweeper waits between retries---increases exponentially after each failed retry.
`opensearch.scheduled_jobs.sweeper.retry_count` | 3 | The total number of times the sweeper should retry before throwing an error.
`opensearch.scheduled_jobs.request_timeout` | 10s | The timeout for the request that sweeps shards for jobs.
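The interaction between `alert_history_max_age` and `alert_history_max_docs` can be sketched as a simple check. This is an assumption-laden illustration: it assumes that a new history index is created once either limit is reached, and `should_roll_over` is a hypothetical helper rather than plugin code.

```python
MAX_AGE_DAYS = 30   # default of opensearch.alerting.alert_history_max_age
MAX_DOCS = 1000     # default of opensearch.alerting.alert_history_max_docs


def should_roll_over(index_age_days, doc_count,
                     max_age_days=MAX_AGE_DAYS, max_docs=MAX_DOCS):
    """Assumed rule: roll over once either the age or the doc limit is reached."""
    return index_age_days >= max_age_days or doc_count >= max_docs
```

Under this assumption, a quiet cluster rolls over on age alone (one index per 30 days), while a noisy one rolls over earlier on document count.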

251
docs/async/index.md Normal file
View File

@ -0,0 +1,251 @@
---
layout: default
title: Asynchronous search
nav_order: 51
has_children: true
---
# Asynchronous Search
Searching large volumes of data can take a long time, especially if you're searching across warm nodes or multiple remote clusters.
Asynchronous search lets you run search requests in the background. You can monitor the progress of these searches and get back partial results as they become available. After the search finishes, you can save the results to examine at a later time.
## REST API
To perform an asynchronous search, send requests to `_opensearch/_asynchronous_search`, with your query in the request body:
```json
POST _opensearch/_asynchronous_search
```
You can specify the following options.
Options | Description | Default value | Required
:--- | :--- |:--- |:--- |
`wait_for_completion_timeout` | Specifies the amount of time that you plan to wait for the results. You can see whatever results you get within this time just like in a normal search. You can poll the remaining results based on an ID. The maximum value is 300 seconds. | 1 second | No
`keep_on_completion` | Specifies whether you want to save the results in the cluster after the search is complete. You can examine the stored results at a later time. | `false` | No
`keep_alive` | Specifies the amount of time that the result is saved in the cluster. For example, `2d` means that the results are stored in the cluster for 48 hours. The saved search results are deleted after this period or if the search is cancelled. Note that this includes the query execution time. If the query overruns this time, the process cancels this query automatically. | 12 hours | No
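As a rough sketch of how these options combine into a request, the following Python helper builds the request path with its query string. It only assembles a string and sends nothing; `async_search_path` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def async_search_path(wait_for_completion_timeout=None,
                      keep_on_completion=None,
                      keep_alive=None):
    """Build the request path, leaving out options that keep their defaults."""
    options = {
        "wait_for_completion_timeout": wait_for_completion_timeout,
        "keep_on_completion": keep_on_completion,
        "keep_alive": keep_alive,
    }
    params = {}
    for name, value in options.items():
        if value is None:
            continue
        # Booleans are written in lowercase in the query string.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    path = "_opensearch/_asynchronous_search"
    return f"{path}?{urlencode(params)}" if params else path
```

Calling `async_search_path(keep_on_completion=True, keep_alive="2d")` yields a path that asks the cluster to store the results for 48 hours.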
#### Sample request
```json
POST _opensearch/_asynchronous_search/?pretty&size=10&wait_for_completion_timeout=1ms&keep_on_completion=true&request_cache=false
{
"aggs": {
"city": {
"terms": {
"field": "city",
"size": 10
}
}
}
}
```
#### Sample response
```json
{
"id": "FklfVlU4eFdIUTh1Q1hyM3ZnT19fUVEUd29KLWZYUUI3TzRpdU5wMjRYOHgAAAAAAAAABg==",
"state": "RUNNING",
"start_time_in_millis": 1599833301297,
"expiration_time_in_millis": 1600265301297,
"response": {
"took": 15,
"timed_out": false,
"terminated_early": false,
"num_reduce_phases": 4,
"_shards": {
"total": 21,
"successful": 4,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 807,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"city": {
"doc_count_error_upper_bound": 16,
"sum_other_doc_count": 403,
"buckets": [
{
"key": "downsville",
"doc_count": 1
},
....
....
....
{
"key": "blairstown",
"doc_count": 1
}
]
}
}
}
}
```
#### Response parameters
Options | Description
:--- | :---
`id` | The ID of an asynchronous search. Use this ID to monitor the progress of the search, get its partial results, and/or delete the results. If the asynchronous search finishes within the timeout period, the response doesn't include the ID because the results aren't stored in the cluster.
`state` | Specifies whether the search is still running or if it has finished, and if the results persist in the cluster. The possible states are `RUNNING`, `COMPLETED`, and `PERSISTED`.
`start_time_in_millis` | The start time in milliseconds.
`expiration_time_in_millis` | The expiration time in milliseconds.
`took` | The total time, in milliseconds, that the search has been running.
`response` | The actual search response.
`num_reduce_phases` | The number of times that the coordinating node aggregates results from batches of shard responses (5 by default). If this number increases compared to the last retrieved results, you can expect additional results to be included in the search response.
`total` | The total number of shards that run the search.
`successful` | The number of shard responses that the coordinating node received successfully.
`aggregations` | The partial aggregation results that have been completed by the shards so far.
## Get partial results
After you submit an asynchronous search request, you can request partial responses with the ID that you see in the asynchronous search response.
```json
GET _opensearch/_asynchronous_search/<ID>?pretty
```
#### Sample response
```json
{
"id": "Fk9lQk5aWHJIUUltR2xGWnpVcWtFdVEURUN1SWZYUUJBVkFVMEJCTUlZUUoAAAAAAAAAAg==",
"state": "STORE_RESIDENT",
"start_time_in_millis": 1599833907465,
"expiration_time_in_millis": 1600265907465,
"response": {
"took": 83,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1000,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "bank",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"email": "amberduke@abc.com",
"city": "Brogan",
"state": "IL"
}
},
{....}
]
},
"aggregations": {
"city": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 997,
"buckets": [
{
"key": "belvoir",
"doc_count": 2
},
{
"key": "aberdeen",
"doc_count": 1
},
{
"key": "abiquiu",
"doc_count": 1
}
]
}
}
}
}
```
After the response is successfully persisted, you get back the `STORE_RESIDENT` state in the response.
You can poll the ID with the `wait_for_completion_timeout` parameter to wait for results for up to the time that you specify.
For asynchronous searches with `keep_on_completion` as `true` and a sufficiently long `keep_alive` time, you can keep polling the IDs until the search finishes. If you don't want to periodically poll each ID, you can retain the results in your cluster with the `keep_alive` parameter and come back to them at a later time.
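The polling pattern described above can be sketched as a loop that stops once the state is no longer `RUNNING`. In this sketch, `fetch` is a hypothetical callable standing in for `GET _opensearch/_asynchronous_search/<ID>`, and `make_fake_fetch` replaces a real HTTP call so the example runs without a cluster.

```python
import time


def poll_until_done(fetch, search_id, interval_seconds=1.0, max_polls=10):
    """Poll `fetch(search_id)` until the state is no longer RUNNING.

    `fetch` is a hypothetical callable that requests the search by ID
    and returns the parsed JSON response as a dict.
    """
    response = fetch(search_id)
    polls = 1
    while response.get("state") == "RUNNING" and polls < max_polls:
        time.sleep(interval_seconds)
        response = fetch(search_id)
        polls += 1
    return response


def make_fake_fetch(states):
    """Stand-in for a real HTTP call: returns the given states in order."""
    remaining = iter(states)

    def fetch(search_id):
        return {"id": search_id, "state": next(remaining)}

    return fetch
```

The `max_polls` cap is a design choice of this sketch so that a long-running search cannot keep the loop going forever; the caller can inspect the returned state and decide whether to poll again later.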
## Delete searches and results
You can use the DELETE API operation to delete any ongoing asynchronous search by its ID. If the search is still running, it's canceled. If the search is complete, the saved search results are deleted.
```json
DELETE _opensearch/_asynchronous_search/<ID>?pretty
```
#### Sample response
```json
{
"acknowledged": "true"
}
```
## Monitor stats
You can use the stats API operation to monitor asynchronous searches that are running, completed, and/or persisted.
```json
GET _opensearch/_asynchronous_search/stats
```
#### Sample response
```json
{
"_nodes": {
"total": 8,
"successful": 8,
"failed": 0
},
"cluster_name": "264071961897:asynchronous-search",
"nodes": {
"JKEFl6pdRC-xNkKQauy7Yg": {
"asynchronous_search_stats": {
"submitted": 18236,
"initialized": 112,
"search_failed": 56,
"search_completed": 56,
"rejected": 18124,
"persist_failed": 0,
"cancelled": 1,
"running_current": 399,
"persisted": 100
}
}
}
}
```
#### Response parameters
Options | Description
:--- | :---
`submitted` | The number of asynchronous search requests that were submitted.
`initialized` | The number of asynchronous search requests that were initialized.
`rejected` | The number of asynchronous search requests that were rejected.
`search_completed` | The number of asynchronous search requests that completed with a successful response.
`search_failed` | The number of asynchronous search requests that completed with a failed response.
`persisted` | The number of asynchronous search requests whose final result successfully persisted in the cluster.
`persist_failed` | The number of asynchronous search requests whose final result failed to persist in the cluster.
`running_current` | The number of asynchronous search requests that are running on a given coordinator node.
`cancelled` | The number of asynchronous search requests that were canceled while the search was running.
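As one possible use of these counters, the sketch below sums `submitted` and `rejected` across all nodes in a stats response and computes the share of rejected requests. `rejection_ratio` is a hypothetical helper, and `sample_stats` is a trimmed copy of the sample response above.

```python
def rejection_ratio(stats_response):
    """Fraction of submitted searches that were rejected, across all nodes."""
    submitted = 0
    rejected = 0
    for node in stats_response["nodes"].values():
        node_stats = node["asynchronous_search_stats"]
        submitted += node_stats["submitted"]
        rejected += node_stats["rejected"]
    return rejected / submitted if submitted else 0.0


# Trimmed copy of the sample response above.
sample_stats = {
    "nodes": {
        "JKEFl6pdRC-xNkKQauy7Yg": {
            "asynchronous_search_stats": {"submitted": 18236, "rejected": 18124}
        }
    }
}
```

A ratio close to 1, as in the sample, would suggest that most requests are being turned away, for example because a per-node concurrency limit is being hit.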

76
docs/async/security.md Normal file
View File

@ -0,0 +1,76 @@
---
layout: default
title: Asynchronous search security
nav_order: 2
parent: Asynchronous search
has_children: false
---
# Asynchronous search security
You can use the security plugin with asynchronous searches to limit non-admin users to specific actions. For example, you might want some users to only be able to submit or delete asynchronous searches, while you might want others to only view the results.
All asynchronous search indices are protected as system indices. Only a super admin user or an admin user with a Transport Layer Security (TLS) certificate can access system indices. For more information, see [System indices](../../security/configuration/system-indices/).
## Basic permissions
As an admin user, you can use the security plugin to assign specific permissions to users based on which API operations they need access to. For a list of supported API operations, see [Asynchronous search](../).
The security plugin has two built-in roles that cover most asynchronous search use cases: `asynchronous_search_full_access` and `asynchronous_search_read_access`. For descriptions of each, see [Predefined roles](../../security/access-control/users-roles/#predefined-roles).
If these roles don't meet your needs, mix and match individual asynchronous search permissions to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/asynchronous_search/delete` permission lets you delete a previously submitted asynchronous search.
## (Advanced) Limit access by backend role
Use backend roles to configure fine-grained access to asynchronous searches based on roles. For example, users of different departments in an organization can view asynchronous searches owned by their own department.
First, make sure that your users have the appropriate [backend roles](../../security/access-control/). Backend roles usually come from an [LDAP server](../../security/configuration/ldap/) or [SAML provider](../../security/configuration/saml/). However, if you use the internal user database, you can use the REST API to [add them manually](../../security/access-control/api/#create-user).
Now when users view asynchronous search resources in OpenSearch Dashboards (or make REST API calls), they only see asynchronous searches that are submitted by users whose backend roles are a subset of their own.
For example, consider two users: `judy` and `elon`.
`judy` has an IT backend role:
```json
PUT _opensearch/_security/api/internalusers/judy
{
"password": "judy",
"backend_roles": [
"IT"
],
"attributes": {}
}
```
`elon` has an admin backend role:
```json
PUT _opensearch/_security/api/internalusers/elon
{
"password": "elon",
"backend_roles": [
"admin"
],
"attributes": {}
}
```
Both `judy` and `elon` have full access to asynchronous search:
```json
PUT _opensearch/_security/api/rolesmapping/async_full_access
{
"backend_roles": [],
"hosts": [],
"users": [
"judy",
"elon"
]
}
```
Because they have different backend roles, an asynchronous search submitted by `judy` will not be visible to `elon` and vice versa.
`judy` needs to have at least all of the backend roles that `elon` has to see `elon`'s asynchronous searches.
For example, if `judy` has five backend roles and `elon` has only one of these roles, then `judy` can see asynchronous searches submitted by `elon`, but `elon` can't see the asynchronous searches submitted by `judy`. This means that `judy` can perform GET and DELETE operations on asynchronous searches that are submitted by `elon`, but not the reverse.
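The superset rule described above can be sketched as a subset test on backend roles. This is only an illustration of the rule as stated here; `can_see` is a hypothetical helper, and how the plugin treats an owner with no backend roles is not covered by this sketch.

```python
def can_see(viewer_roles, owner_roles):
    """True if the viewer holds every backend role that the owner holds."""
    return set(owner_roles) <= set(viewer_roles)


# The two users from the example above.
judy = {"IT"}
elon = {"admin"}
```

With these roles, neither `can_see(judy, elon)` nor `can_see(elon, judy)` is true, which matches the statement that their searches are not visible to each other.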

29
docs/async/settings.md Normal file
View File

@ -0,0 +1,29 @@
---
layout: default
title: Settings
parent: Asynchronous search
nav_order: 4
---
# Settings
The asynchronous search plugin adds several settings to the standard OpenSearch cluster settings. They are dynamic, so you can change the default behavior of the plugin without restarting your cluster. You can mark the settings as `persistent` or `transient`.
For example, to update the maximum value for the `wait_for_completion_timeout` parameter:
```json
PUT _cluster/settings
{
"transient": {
"opensearch.asynchronous_search.max_wait_for_completion_timeout": "5m"
}
}
```
Setting | Default | Description
:--- | :--- | :---
`opensearch.asynchronous_search.max_search_running_time` | 12 hours | The maximum running time for the search beyond which the search is terminated.
`opensearch.asynchronous_search.node_concurrent_running_searches` | 20 | The concurrent searches running per coordinator node.
`opensearch.asynchronous_search.max_keep_alive` | 5 days | The maximum amount of time that search results can be stored in the cluster.
`opensearch.asynchronous_search.max_wait_for_completion_timeout` | 1 minute | The maximum value for the `wait_for_completion_timeout` parameter.
`opensearch.asynchronous_search.persist_search_failures` | false | Persist asynchronous search results that end with a search failure in the system index.

100
docs/cli/index.md Normal file
View File

@ -0,0 +1,100 @@
---
layout: default
title: OpenSearch CLI
nav_order: 52
has_children: false
---
# OpenSearch CLI
The OpenSearch command line interface (opensearch-cli) lets you manage your OpenSearch cluster from the command line and automate tasks.
Currently, opensearch-cli supports the [Anomaly Detection](../ad/) and [k-NN](../knn/) plugins, along with arbitrary REST API paths. Among other things, you can use opensearch-cli to create and delete detectors, start and stop them, and check k-NN statistics.
Profiles let you easily access different clusters or sign requests with different credentials. opensearch-cli supports unauthenticated requests, HTTP basic signing, and IAM signing for Amazon Web Services.
This example moves a detector (`ecommerce-count-quantity`) from a staging cluster to a production cluster:
```bash
opensearch-cli ad get ecommerce-count-quantity --profile staging > ecommerce-count-quantity.json
opensearch-cli ad create ecommerce-count-quantity.json --profile production
opensearch-cli ad start ecommerce-count-quantity.json --profile production
opensearch-cli ad stop ecommerce-count-quantity --profile staging
opensearch-cli ad delete ecommerce-count-quantity --profile staging
```
## Install
1. [Download](https://opensearch.org/downloads.html){:target='\_blank'} and extract the appropriate installation package for your computer.
1. Make the `opensearch-cli` file executable:
```bash
chmod +x ./opensearch-cli
```
1. Add the command to your path:
```bash
export PATH=$PATH:$(pwd)
```
1. Check that the CLI is working properly:
```bash
opensearch-cli --version
```
## Profiles
Profiles let you easily switch between different clusters and user credentials. To get started, run `opensearch-cli profile create` with the `--auth-type`, `--endpoint`, and `--name` options:
```bash
opensearch-cli profile create --auth-type basic --endpoint https://localhost:9200 --name docker-local
```
Alternatively, save a configuration file to `~/.opensearch-cli/config.yaml`:
```yaml
profiles:
- name: docker-local
endpoint: https://localhost:9200
user: admin
password: foobar
- name: aws
endpoint: https://some-cluster.us-east-1.es.amazonaws.com
aws_iam:
profile: ""
service: es
```
## Usage
opensearch-cli commands use the following syntax:
```bash
opensearch-cli <command> <subcommand> <flags>
```
For example, the following command retrieves information about a detector:
```bash
opensearch-cli ad get my-detector --profile docker-local
```
For a request to the OpenSearch CAT API, try the following command:
```bash
opensearch-cli curl get --path _cat/plugins --profile aws
```
Use the `-h` or `--help` flag to see all supported commands, subcommands, or usage for a specific command:
```bash
opensearch-cli -h
opensearch-cli ad -h
opensearch-cli ad get -h
```

View File

@ -0,0 +1,457 @@
---
layout: default
title: Index Rollups
nav_order: 35
parent: Index management
has_children: true
redirect_from: /docs/ism/index-rollups/
has_toc: false
---
# Index Rollups
Time series data increases storage costs, strains cluster health, and slows down aggregations over time. Index rollup lets you periodically reduce data granularity by rolling up old data into summarized indices.
You pick the fields that interest you and use index rollup to create a new index with only those fields aggregated into coarser time buckets. You can store months or years of historical data at a fraction of the cost with the same query performance.
For example, say you collect CPU consumption data every five seconds and store it on a hot node. Instead of moving older data to a read-only warm node, you can roll up or compress this data with only the average CPU consumption per day or with a 10% decrease in its interval every week.
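To put rough numbers on the CPU example, the following sketch counts how many raw readings a single source produces per day and compares that with one daily average. It assumes one document per reading and one rolled-up document per source per day, which are simplifications made for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def documents_per_day(sample_interval_seconds):
    """Number of raw documents one source emits per day at a fixed interval."""
    return SECONDS_PER_DAY // sample_interval_seconds


raw_per_day = documents_per_day(5)   # one reading every five seconds
rolled_up_per_day = 1                # one daily average per source (assumed)
reduction_factor = raw_per_day // rolled_up_per_day
```

Under these assumptions, 17,280 raw documents per day collapse into a single rolled-up document.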
You can use index rollup in three ways:
1. Use the index rollup API for an on-demand index rollup job that operates on an index that's not being actively ingested, such as a rolled-over index. For example, you can perform an index rollup operation to reduce data collected at a five-minute interval to a weekly average for trend analysis.
2. Use the OpenSearch Dashboards UI to create an index rollup job that runs on a defined schedule. You can also set it up to roll up your indices as they're being actively ingested. For example, you can continuously roll up Logstash indices from a five-second interval to a one-hour interval.
3. Specify the index rollup job as an ISM action for complete index management. This allows you to roll up an index after a certain event such as a rollover, index age reaching a certain point, index becoming read-only, and so on. You can also have rollover and index rollup jobs running in sequence, where the rollover first moves the current index to a warm node and then the index rollup job creates a new index with the minimized data on the hot node.
## Create an Index Rollup Job
To get started, choose **Index Management** in OpenSearch Dashboards.
Select **Rollup Jobs** and choose **Create rollup job**.
### Step 1: Set up indices
1. In the **Job name and description** section, specify a unique name and an optional description for the index rollup job.
2. In the **Indices** section, select the source and target index. The source index is the one that you want to roll up. The source index remains as is; the index rollup job creates a new index, referred to as the target index, where the index rollup results are saved. For the target index, you can either type in a name for a new index or select an existing index.
3. Choose **Next**.
After you create an index rollup job, you can't change your index selections.
### Step 2: Define aggregations and metrics
Select the attributes with the aggregations (terms and histograms) and metrics (avg, sum, max, min, and value count) that you want to roll up. Make sure you don't add a lot of highly granular attributes, because you won't save much space.
For example, consider a dataset of cities and demographics within those cities. You can aggregate based on cities and specify demographics within a city as metrics.
The order in which you select attributes is critical. A city followed by a demographic is different from a demographic followed by a city.
1. In the **Time aggregation** section, select a timestamp field. Choose between a **Fixed** or **Calendar** interval type and specify the interval and timezone. The index rollup job uses this information to create a date histogram for the timestamp field.
2. (Optional) Add additional aggregations for each field. You can choose terms aggregation for all field types and histogram aggregation only for numeric fields.
3. (Optional) Add additional metrics for each field. You can choose between **All**, **Min**, **Max**, **Sum**, **Avg**, or **Value Count**.
4. Choose **Next**.
### Step 3: Specify schedule
Specify a schedule to roll up your indices as its being ingested. The index rollup job is enabled by default.
1. Specify if the data is continuous or not.
2. For roll up execution frequency, select either **Define by fixed interval** and specify the **Rollup interval** and the time unit, or **Define by cron expression** and enter a cron expression to select the interval. To learn how to define a cron expression, see [Alerting](../alerting/cron/).
3. Specify the number of pages per execution process. A larger number means faster execution but higher memory use.
4. (Optional) Add a delay to the roll up executions. This is the amount of time the job waits for data ingestion to accommodate any processing time. For example, if you set this value to 10 minutes, an index rollup that executes at 2 PM to roll up 1 PM to 2 PM of data starts at 2:10 PM.
5. Choose **Next**.
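The delay arithmetic in the 10-minute example can be sketched in Python as follows. The `delayed_start` helper and the date are hypothetical; the sketch only adds the delay to the end of the rollup window and does not call the plugin.

```python
from datetime import datetime, timedelta


def delayed_start(window_end: datetime, delay_minutes: int) -> datetime:
    """Return when a rollup for the window ending at `window_end` starts."""
    return window_end + timedelta(minutes=delay_minutes)


# The window covers 1 PM to 2 PM; the date itself is arbitrary.
window_end = datetime(2021, 5, 5, 14, 0)
start = delayed_start(window_end, delay_minutes=10)

print(start.strftime("%H:%M"))  # 14:10
```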
### Step 4: Review and create
Review your configuration and select **Create**.
### Step 5: Search the target index
You can use the standard `_search` API to search the target index. Make sure that the query matches the constraints of the target index. For example, if you don't set up terms aggregations on a field, you don't receive results for terms aggregations. If you don't set up the maximum aggregations, you don't receive results for maximum aggregations.
You can't access the internal structure of the data in the target index because the plugin automatically rewrites the query in the background to suit the target index. This is to make sure you can use the same query for the source and target index.
To query the target index, set `size` to 0:
```json
GET target_index/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"avg_cpu": {
"avg": {
"field": "cpu_usage"
}
}
}
}
```
Consider a scenario where you collect rolled up data from 1 PM to 9 PM in hourly intervals and live data from 7 PM to 11 PM in minutely intervals. If you execute an aggregation over these in the same query, for 7 PM to 9 PM, you see an overlap of both rolled up data and live data because they get counted twice in the aggregations.
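Before you combine rolled up and live data in one query, it can help to work out where the two time ranges overlap. The following Python sketch uses a hypothetical `overlap` helper with hours on a 24-hour clock; it is an illustration of the scenario, not something the plugin provides.

```python
def overlap(rolled_up, live):
    """Return the (start, end) hours covered by both ranges, or None."""
    start = max(rolled_up[0], live[0])
    end = min(rolled_up[1], live[1])
    return (start, end) if start < end else None


# Hours on a 24-hour clock, taken from the scenario above.
rolled_up_range = (13, 21)  # 1 PM to 9 PM, hourly rollups
live_range = (19, 23)       # 7 PM to 11 PM, live data

print(overlap(rolled_up_range, live_range))  # (19, 21), that is, 7 PM to 9 PM
```

One option is to restrict the query on the live data so that it starts where the rolled up data ends.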
## Sample Walkthrough
This walkthrough uses the OpenSearch Dashboards sample e-commerce data. To add that sample data, log in to OpenSearch Dashboards, choose **Home** and **Try our sample data**. For **Sample eCommerce orders**, choose **Add data**.
Then run a search:
```json
GET opensearch_dashboards_sample_data_ecommerce/_search
```
#### Sample response
```json
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4675,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "opensearch_dashboards_sample_data_ecommerce",
"_type": "_doc",
"_id": "jlMlwXcBQVLeQPrkC_kQ",
"_score": 1,
"_source": {
"category": [
"Women's Clothing",
"Women's Accessories"
],
"currency": "EUR",
"customer_first_name": "Selena",
"customer_full_name": "Selena Mullins",
"customer_gender": "FEMALE",
"customer_id": 42,
"customer_last_name": "Mullins",
"customer_phone": "",
"day_of_week": "Saturday",
"day_of_week_i": 5,
"email": "selena@mullins-family.zzz",
"manufacturer": [
"Tigress Enterprises"
],
"order_date": "2021-02-27T03:56:10+00:00",
"order_id": 581553,
"products": [
{
"base_price": 24.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Tigress Enterprises",
"tax_amount": 0,
"product_id": 19240,
"category": "Women's Clothing",
"sku": "ZO0064500645",
"taxless_price": 24.99,
"unit_discount_amount": 0,
"min_price": 12.99,
"_id": "sold_product_581553_19240",
"discount_amount": 0,
"created_on": "2016-12-24T03:56:10+00:00",
"product_name": "Blouse - port royal",
"price": 24.99,
"taxful_price": 24.99,
"base_unit_price": 24.99
},
{
"base_price": 10.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Tigress Enterprises",
"tax_amount": 0,
"product_id": 17221,
"category": "Women's Accessories",
"sku": "ZO0085200852",
"taxless_price": 10.99,
"unit_discount_amount": 0,
"min_price": 5.06,
"_id": "sold_product_581553_17221",
"discount_amount": 0,
"created_on": "2016-12-24T03:56:10+00:00",
"product_name": "Snood - rose",
"price": 10.99,
"taxful_price": 10.99,
"base_unit_price": 10.99
}
],
"sku": [
"ZO0064500645",
"ZO0085200852"
],
"taxful_total_price": 35.98,
"taxless_total_price": 35.98,
"total_quantity": 2,
"total_unique_products": 2,
"type": "order",
"user": "selena",
"geoip": {
"country_iso_code": "MA",
"location": {
"lon": -8,
"lat": 31.6
},
"region_name": "Marrakech-Tensift-Al Haouz",
"continent_name": "Africa",
"city_name": "Marrakesh"
},
"event": {
"dataset": "sample_ecommerce"
}
}
}
]
}
}
...
```
Create an index rollup job.
This example picks the `order_date`, `customer_gender`, `geoip.city_name`, `geoip.region_name`, and `day_of_week` fields and rolls them into an `example_rollup` target index:
```json
PUT _opensearch/_rollup/jobs/example
{
"rollup": {
"enabled": true,
"schedule": {
"interval": {
"period": 1,
"unit": "Minutes",
"start_time": 1602100553
}
},
"last_updated_time": 1602100553,
"description": "An example policy that rolls up the sample ecommerce data",
"source_index": "opensearch_dashboards_sample_data_ecommerce",
"target_index": "example_rollup",
"page_size": 1000,
"delay": 0,
"continuous": false,
"dimensions": [
{
"date_histogram": {
"source_field": "order_date",
"fixed_interval": "60m",
"timezone": "America/Los_Angeles"
}
},
{
"terms": {
"source_field": "customer_gender"
}
},
{
"terms": {
"source_field": "geoip.city_name"
}
},
{
"terms": {
"source_field": "geoip.region_name"
}
},
{
"terms": {
"source_field": "day_of_week"
}
}
],
"metrics": [
{
"source_field": "taxless_total_price",
"metrics": [
{
"avg": {}
},
{
"sum": {}
},
{
"max": {}
},
{
"min": {}
},
{
"value_count": {}
}
]
},
{
"source_field": "total_quantity",
"metrics": [
{
"avg": {}
},
{
"max": {}
}
]
}
]
}
}
```
You can query the `example_rollup` index for the terms aggregations on the fields set up in the rollup job.
You get back the same response that you would on the original `opensearch_dashboards_sample_data_ecommerce` source index.
```json
POST example_rollup/_search
{
"size": 0,
"query": {
"bool": {
"must": {"term": { "geoip.region_name": "California" } }
}
},
"aggregations": {
"daily_numbers": {
"terms": {
"field": "day_of_week"
},
"aggs": {
"per_city": {
"terms": {
"field": "geoip.city_name"
},
"aggregations": {
"average quantity": {
"avg": {
"field": "total_quantity"
}
}
}
},
"total_revenue": {
"sum": {
"field": "taxless_total_price"
}
}
}
}
}
}
```
#### Sample response
```json
{
"took": 476,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 281,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"daily_numbers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Friday",
"doc_count": 53,
"total_revenue": {
"value": 4858.84375
},
"per_city": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Los Angeles",
"doc_count": 53,
"average quantity": {
"value": 2.305084745762712
}
}
]
}
},
{
"key": "Saturday",
"doc_count": 43,
"total_revenue": {
"value": 3547.203125
},
"per_city": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Los Angeles",
"doc_count": 43,
"average quantity": {
"value": 2.260869565217391
}
}
]
}
},
{
"key": "Tuesday",
"doc_count": 42,
"total_revenue": {
"value": 3983.28125
},
"per_city": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Los Angeles",
"doc_count": 42,
"average quantity": {
"value": 2.2888888888888888
}
}
]
}
},
{
"key": "Sunday",
"doc_count": 40,
"total_revenue": {
"value": 3308.1640625
},
"per_city": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Los Angeles",
"doc_count": 40,
"average quantity": {
"value": 2.090909090909091
}
}
]
}
}
...
]
}
}
}
```
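If you process the response in a client, you can walk the nested buckets like any other aggregation result. The following Python sketch assumes the response has already been parsed into a dictionary. The `revenue_by_day` helper is hypothetical, and the data is a trimmed copy of the sample response.

```python
# A trimmed copy of the sample response above, keeping two buckets.
response = {
    "aggregations": {
        "daily_numbers": {
            "buckets": [
                {"key": "Friday", "doc_count": 53,
                 "total_revenue": {"value": 4858.84375}},
                {"key": "Saturday", "doc_count": 43,
                 "total_revenue": {"value": 3547.203125}},
            ]
        }
    }
}


def revenue_by_day(search_response):
    """Map each `daily_numbers` bucket key to its `total_revenue` value."""
    buckets = search_response["aggregations"]["daily_numbers"]["buckets"]
    return {bucket["key"]: bucket["total_revenue"]["value"] for bucket in buckets}


print(revenue_by_day(response))
```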

View File

@ -0,0 +1,235 @@
---
layout: default
title: Index Rollups API
parent: Index Rollups
grand_parent: Index management
redirect_from: /docs/ism/rollup-api/
nav_order: 9
---
# Index Rollups API
Use the index rollup operations to programmatically work with index rollup jobs.
---
#### Table of contents
- TOC
{:toc}
---
## Create or update an index rollup job
Creates or updates an index rollup job.
To update an existing job, you must provide the `if_seq_no` and `if_primary_term` parameters with the job's current sequence number and primary term.
#### Request
```json
PUT _opensearch/_rollup/jobs/<rollup_id> // Create
PUT _opensearch/_rollup/jobs/<rollup_id>?if_seq_no=1&if_primary_term=1 // Update
{
"rollup": {
"source_index": "nyc-taxi-data",
"target_index": "rollup-nyc-taxi-data",
"schedule": {
"interval": {
"period": 1,
"unit": "Days"
}
},
"description": "Example rollup job",
"enabled": true,
"page_size": 200,
"delay": 0,
"roles": [
"rollup_all",
"nyc_taxi_all",
"example_rollup_index_all"
],
"continuous": false,
"dimensions": [
{
"date_histogram": {
"source_field": "tpep_pickup_datetime",
"fixed_interval": "1h",
"timezone": "America/Los_Angeles"
}
},
{
"terms": {
"source_field": "PULocationID"
}
}
],
"metrics": [
{
"source_field": "passenger_count",
"metrics": [
{
"avg": {}
},
{
"sum": {}
},
{
"max": {}
},
{
"min": {}
},
{
"value_count": {}
}
]
}
]
}
}
```
You can specify the following options.
Options | Description | Type | Required
:--- | :--- |:--- |:--- |
`source_index` | The name of the source index that you want to roll up. | `string` | Yes
`target_index` | Specify the target index that the rolled up data is ingested into. You could either create a new target index or use an existing index. The target index cannot be a combination of raw and rolled up data. | `string` | Yes
`schedule` | Schedule of the index rollup job which can be an interval or a cron expression. | `object` | Yes
`schedule.interval` | Specify the frequency of execution of the rollup job. | `object` | No
`schedule.interval.start_time` | Start time of the interval. | `timestamp` | Yes
`schedule.interval.period` | Define the interval period. | `string` | Yes
`schedule.interval.unit` | Specify the time unit of the interval. | `string` | Yes
`schedule.interval.cron` | Optionally, specify a cron expression to define the rollup frequency. | `list` | No
`schedule.interval.cron.expression` | Specify a Unix cron expression. | `string` | Yes
`schedule.interval.cron.timezone` | Specify timezones as defined by the IANA Time Zone Database. Defaults to UTC. | `string` | No
`description` | Optionally, describe the rollup job. | `string` | No
`enabled` | When true, the index rollup job is scheduled. Default is true. | `boolean` | Yes
`continuous` | Specify whether or not the index rollup job continuously rolls up data forever or just executes over the current data set once and stops. Default is false. | `boolean` | Yes
`error_notification` | Set up a Mustache message template sent for error notifications. For example, if an index rollup job fails, the system sends a message to a Slack channel. | `object` | No
`page_size` | Specify the number of buckets to paginate through at a time while rolling up. | `number` | Yes
`delay` | Specify time value to delay execution of the index rollup job. | `time_unit` | No
`dimensions` | Specify aggregations to create dimensions for the roll up time window. | `object` | Yes
`dimensions.date_histogram` | Specify either fixed_interval or calendar_interval, but not both. Either one limits what you can query in the target index. | `object` | No
`dimensions.date_histogram.fixed_interval` | Specify the fixed interval for aggregations in milliseconds, seconds, minutes, hours, or days. | `string` | No
`dimensions.date_histogram.calendar_interval` | Specify the calendar interval for aggregations in minutes, hours, days, weeks, months, quarters, or years. | `string` | No
`dimensions.date_histogram.field` | Specify the date field used in date histogram aggregation. | `string` | No
`dimensions.date_histogram.timezone` | Specify the timezones as defined by the IANA Time Zone Database. The default is UTC. | `string` | No
`dimensions.terms` | Specify the term aggregations that you want to roll up. | `object` | No
`dimensions.terms.fields` | Specify terms aggregation for compatible fields. | `object` | No
`dimensions.histogram` | Specify the histogram aggregations that you want to roll up. | `object` | No
`dimensions.histogram.field` | Add a field for histogram aggregations. | `string` | Yes
`dimensions.histogram.interval` | Specify the histogram aggregation interval for the field. | `long` | Yes
`dimensions.metrics` | Specify a list of objects that represent the fields and metrics that you want to calculate. | `nested object` | No
`dimensions.metrics.field` | Specify the field that you want to perform metric aggregations on. | `string` | No
`dimensions.metrics.field.metrics` | Specify the metric aggregations you want to calculate for the field. | `multiple strings` | No
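
The rule that a date histogram takes either `fixed_interval` or `calendar_interval`, but not both, can be checked on the client before you send the request. The following Python sketch is one way to do that. The `check_date_histogram` helper is hypothetical, and treating a missing interval as an error is an assumption rather than documented plugin behavior.

```python
def check_date_histogram(date_histogram: dict) -> None:
    """Raise ValueError unless exactly one interval type is present."""
    has_fixed = "fixed_interval" in date_histogram
    has_calendar = "calendar_interval" in date_histogram
    # Equal flags mean both are set or neither is set.
    if has_fixed == has_calendar:
        raise ValueError(
            "Specify exactly one of fixed_interval or calendar_interval."
        )


# Passes silently because only fixed_interval is present.
check_date_histogram(
    {"source_field": "tpep_pickup_datetime", "fixed_interval": "1h"}
)
```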
#### Sample response
```json
{
"_id": "rollup_id",
"_seqNo": 1,
"_primaryTerm": 1,
"rollup": { ... }
}
```
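The `_seqNo` and `_primaryTerm` values in the response are the ones you pass back when you update the job. The following Python sketch builds the update path from such a response. The `update_path` helper is hypothetical and only formats a string; it does not send a request.

```python
from urllib.parse import urlencode


def update_path(rollup_id: str, current: dict) -> str:
    """Build the update path from a previously returned job document."""
    params = urlencode({
        "if_seq_no": current["_seqNo"],
        "if_primary_term": current["_primaryTerm"],
    })
    return f"_opensearch/_rollup/jobs/{rollup_id}?{params}"


# Values copied from the sample response above.
current_job = {"_id": "rollup_id", "_seqNo": 1, "_primaryTerm": 1}

print(update_path("rollup_id", current_job))
```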
## Get an index rollup job
Returns all information about an index rollup job based on the `rollup_id`.
#### Request
```json
GET _opensearch/_rollup/jobs/<rollup_id>
```
#### Sample response
```json
{
"_id": "my_rollup",
"_seqNo": 1,
"_primaryTerm": 1,
"rollup": { ... }
}
```
---
## Delete an index rollup job
Deletes an index rollup job based on the `rollup_id`.
#### Request
```json
DELETE _opensearch/_rollup/jobs/<rollup_id>
```
#### Sample response
```json
200 OK
```
---
## Start or stop an index rollup job
Starts or stops an index rollup job.
#### Request
```json
POST _opensearch/_rollup/jobs/<rollup_id>/_start
POST _opensearch/_rollup/jobs/<rollup_id>/_stop
```
#### Sample response
```json
200 OK
```
---
## Explain an index rollup job
Returns detailed metadata information about the index rollup job and its current progress.
#### Request
```json
GET _opensearch/_rollup/jobs/<rollup_id>/_explain
```
#### Sample response
```json
{
"example_rollup": {
"rollup_id": "example_rollup",
"last_updated_time": 1602014281,
"continuous": {
"next_window_start_time": 1602055591,
"next_window_end_time": 1602075591
},
"status": "running",
"failure_reason": null,
"stats": {
"pages_processed": 342,
"documents_processed": 489359,
"rollups_indexed": 3420,
"index_time_in_ms": 30495,
"search_time_in_ms": 584922
}
}
}
```
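The raw counters under `stats` can be easier to interpret as ratios. The following Python sketch derives a few from the sample values. The `summarize` helper and the derived field names are hypothetical and are not returned by the API.

```python
# Stats copied from the sample explain response above.
stats = {
    "pages_processed": 342,
    "documents_processed": 489359,
    "rollups_indexed": 3420,
    "index_time_in_ms": 30495,
    "search_time_in_ms": 584922,
}


def summarize(job_stats: dict) -> dict:
    """Derive a few ratios that are easier to read than raw counters."""
    pages = job_stats["pages_processed"]
    total_ms = job_stats["index_time_in_ms"] + job_stats["search_time_in_ms"]
    return {
        "documents_per_page": job_stats["documents_processed"] / pages,
        "rollups_per_page": job_stats["rollups_indexed"] / pages,
        "total_time_in_s": total_ms / 1000,
    }


print(summarize(stats))
```

The sketch assumes that at least one page has been processed; a job that has not started yet would need a guard against division by zero.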

12
docs/im/index.md Normal file
View File

@ -0,0 +1,12 @@
---
layout: default
title: Index management
nav_order: 30
has_children: true
---
# Index Management
OpenSearch Dashboards
{: .label .label-yellow :}
The Index Management (IM) plugin lets you automate recurring index management activities and reduce storage costs.

494
docs/im/ism/api.md Normal file
View File

@ -0,0 +1,494 @@
---
layout: default
title: ISM API
parent: Index State Management
grand_parent: Index management
redirect_from: /docs/ism/api/
nav_order: 5
---
# ISM API
Use the index state management operations to programmatically work with policies and managed indices.
---
#### Table of contents
- TOC
{:toc}
---
## Create policy
Creates a policy.
#### Request
```json
PUT _opensearch/_ism/policies/policy_1
{
"policy": {
"description": "ingesting logs",
"default_state": "ingest",
"states": [
{
"name": "ingest",
"actions": [
{
"rollover": {
"min_doc_count": 5
}
}
],
"transitions": [
{
"state_name": "search"
}
]
},
{
"name": "search",
"actions": [],
"transitions": [
{
"state_name": "delete",
"conditions": {
"min_index_age": "5m"
}
}
]
},
{
"name": "delete",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
]
}
}
```
#### Sample response
```json
{
"_id": "policy_1",
"_version": 1,
"_primary_term": 1,
"_seq_no": 7,
"policy": {
"policy": {
"policy_id": "policy_1",
"description": "ingesting logs",
"last_updated_time": 1577990761311,
"schema_version": 1,
"error_notification": null,
"default_state": "ingest",
"states": [
{
"name": "ingest",
"actions": [
{
"rollover": {
"min_doc_count": 5
}
}
],
"transitions": [
{
"state_name": "search"
}
]
},
{
"name": "search",
"actions": [],
"transitions": [
{
"state_name": "delete",
"conditions": {
"min_index_age": "5m"
}
}
]
},
{
"name": "delete",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
]
}
}
}
```
---
## Add policy
Adds a policy to an index. This operation does not change the policy if the index already has one.
#### Request
```json
POST _opensearch/_ism/add/index_1
{
"policy_id": "policy_1"
}
```
#### Sample response
```json
{
"updated_indices": 1,
"failures": false,
"failed_indices": []
}
```
---
## Update policy
Updates a policy. Use the `seq_no` and `primary_term` parameters to update an existing policy. If these numbers don't match the existing policy or the policy doesn't exist, ISM throws an error.
#### Request
```json
PUT _opensearch/_ism/policies/policy_1?if_seq_no=7&if_primary_term=1
{
"policy": {
"description": "ingesting logs",
"default_state": "ingest",
"states": [
{
"name": "ingest",
"actions": [
{
"rollover": {
"min_doc_count": 5
}
}
],
"transitions": [
{
"state_name": "search"
}
]
},
{
"name": "search",
"actions": [],
"transitions": [
{
"state_name": "delete",
"conditions": {
"min_index_age": "5m"
}
}
]
},
{
"name": "delete",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
]
}
}
```
#### Sample response
```json
{
"_id": "policy_1",
"_version": 2,
"_primary_term": 1,
"_seq_no": 10,
"policy": {
"policy": {
"policy_id": "policy_1",
"description": "ingesting logs",
"last_updated_time": 1577990934044,
"schema_version": 1,
"error_notification": null,
"default_state": "ingest",
"states": [
{
"name": "ingest",
"actions": [
{
"rollover": {
"min_doc_count": 5
}
}
],
"transitions": [
{
"state_name": "search"
}
]
},
{
"name": "search",
"actions": [],
"transitions": [
{
"state_name": "delete",
"conditions": {
"min_index_age": "5m"
}
}
]
},
{
"name": "delete",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
]
}
}
}
```
---
## Get policy
Gets the policy by `policy_id`.
#### Request
```json
GET _opensearch/_ism/policies/policy_1
```
#### Sample response
```json
{
"_id": "policy_1",
"_version": 2,
"_seq_no": 10,
"_primary_term": 1,
"policy": {
"policy_id": "policy_1",
"description": "ingesting logs",
"last_updated_time": 1577990934044,
"schema_version": 1,
"error_notification": null,
"default_state": "ingest",
"states": [
{
"name": "ingest",
"actions": [
{
"rollover": {
"min_doc_count": 5
}
}
],
"transitions": [
{
"state_name": "search"
}
]
},
{
"name": "search",
"actions": [],
"transitions": [
{
"state_name": "delete",
"conditions": {
"min_index_age": "5m"
}
}
]
},
{
"name": "delete",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
]
}
}
```
---
## Remove policy from index
Removes any ISM policy from the index.
#### Request
```json
POST _opensearch/_ism/remove/index_1
```
#### Sample response
```json
{
"updated_indices": 1,
"failures": false,
"failed_indices": []
}
```
---
## Update managed index policy
Updates the managed index policy to a new policy (or to a new version of the policy). You can use an index pattern to update multiple indices at once. When updating multiple indices, you might want to include a state filter to only affect certain managed indices. The change policy filters out all the existing managed indices and only applies the change to the ones in the state that you specify. You can also explicitly specify the state that the managed index transitions to after the change policy takes effect.
A policy change is an asynchronous background process. The changes are queued and are not executed immediately by the background process. This delay in execution protects the currently running managed indices from being put into a broken state. If the policy you are changing to has only some small configuration changes, then the change takes place immediately. For example, if the policy changes the `min_index_age` parameter in a rollover condition from `1000d` to `100d`, this change takes place immediately in its next execution. If the change modifies the state, actions, or the order of actions of the current state the index is in, then the change happens at the end of its current state before transitioning to a new state.
In this example, the policy applied on the `index_1` index is changed to `policy_1`, which could either be a completely new policy or an updated version of its existing policy. The process only applies the change if the index is currently in the `searches` state. After this change in policy takes place, `index_1` transitions to the `delete` state.
#### Request
```json
POST _opensearch/_ism/change_policy/index_1
{
"policy_id": "policy_1",
"state": "delete",
"include": [
{
"state": "searches"
}
]
}
```
#### Sample response
```json
{
"updated_indices": 0,
"failures": false,
"failed_indices": []
}
```
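One way to picture the state filter is as a selection step that runs before the policy change is queued. The following Python sketch models that step with a hypothetical `indices_to_change` helper and made-up current states; an index in any other state is skipped. It is a simplified model, not the plugin's implementation.

```python
def indices_to_change(managed_indices: dict, include: list) -> list:
    """Return the index names whose current state passes the state filter."""
    wanted = {entry["state"] for entry in include}
    return [name for name, state in managed_indices.items() if state in wanted]


# Hypothetical current states, keyed by index name.
managed = {"index_1": "search", "index_2": "searches"}
state_filter = [{"state": "searches"}]

print(indices_to_change(managed, state_filter))  # ['index_2']
```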
---
## Retry failed index
Retries the failed action for an index. For the retry call to succeed, ISM must manage the index, and the index must be in a failed state. You can use index patterns (`*`) to retry multiple failed indices.
#### Request
```json
POST _opensearch/_ism/retry/index_1
{
"state": "delete"
}
```
#### Sample response
```json
{
"updated_indices": 0,
"failures": false,
"failed_indices": []
}
```
---
## Explain index
Gets the current state of the index. You can use index patterns to get the status of multiple indices.
#### Request
```json
GET _opensearch/_ism/explain/index_1
```
#### Sample response
```json
{
"index_1": {
"index.opensearch.index_state_management.policy_id": "policy_1"
}
}
```
The `opensearch.index_state_management.policy_id` setting is deprecated starting from version 1.13.0.
We retain this field in the response API for consistency.
---
## Delete policy
Deletes the policy by `policy_id`.
#### Request
```json
DELETE _opensearch/_ism/policies/policy_1
```
#### Sample response
```json
{
"_index": ".opensearch-ism-config",
"_type": "_doc",
"_id": "policy_1",
"_version": 3,
"result": "deleted",
"forced_refresh": true,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 15,
"_primary_term": 1
}
```

103
docs/im/ism/index.md Normal file
View File

@ -0,0 +1,103 @@
---
layout: default
title: Index State Management
nav_order: 3
parent: Index management
has_children: true
redirect_from: /docs/ism/
has_toc: false
---
# Index State Management
OpenSearch Dashboards
{: .label .label-yellow :}
If you analyze time-series data, you likely prioritize new data over old data. You might periodically perform certain operations on older indices, such as reducing replica count or deleting them.
Index State Management (ISM) is a plugin that lets you automate these periodic, administrative operations by triggering them based on changes in the index age, index size, or number of documents. Using the ISM plugin, you can define *policies* that automatically handle index rollovers or deletions to fit your use case.
For example, you can define a policy that moves your index into a `read_only` state after 30 days and then deletes it after a set period of 90 days. You can also set up the policy to send you a notification message when the index is deleted.
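As a rough sketch, such a policy body could be assembled in Python as follows. The state names are hypothetical, the notification is left out, and the sketch assumes that both the 30-day and the 90-day thresholds are measured from index creation with `min_index_age`. Check the policy reference for the exact fields before you use it.

```python
import json

# Hypothetical policy body; state names are illustrative only.
policy = {
    "policy": {
        "description": "Read-only after 30 days, deleted after 90 days.",
        "default_state": "active",
        "states": [
            {
                "name": "active",
                "actions": [],
                "transitions": [
                    {"state_name": "read_only",
                     "conditions": {"min_index_age": "30d"}}
                ],
            },
            {
                "name": "read_only",
                "actions": [{"read_only": {}}],
                "transitions": [
                    {"state_name": "delete",
                     "conditions": {"min_index_age": "90d"}}
                ],
            },
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
    }
}

# Print the JSON body that you would send when creating the policy.
print(json.dumps(policy, indent=2))
```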
You might want to perform an index rollover after a certain amount of time or run a `force_merge` operation on an index during off-peak hours to improve search performance during peak hours.
To use the ISM plugin, your user role needs to be mapped to the `all_access` role that gives you full access to the cluster. To learn more, see [Users and roles](../security/access-control/users-roles/).
{: .note }
## Get started with ISM
To get started, choose **Index Management** in OpenSearch Dashboards.
### Step 1: Set up policies
A policy is a set of rules that describes how an index should be managed. For information about creating a policy, see [Policies](policies/).
1. Choose the **Index Policies** tab.
2. Choose **Create policy**.
3. In the **Name policy** section, enter a policy ID.
4. In the **Define policy** section, enter your policy.
5. Choose **Create**.
After you create a policy, your next step is to attach this policy to an index or indices.
You can set up an `ism_template` in the policy so when you create an index that matches the ISM template pattern, the index will have this policy attached to it:
```json
PUT _opensearch/_ism/policies/policy_id
{
"policy": {
"description": "Example policy.",
"default_state": "...",
"states": [...],
"ism_template": {
"index_patterns": ["index_name-*"],
"priority": 100
}
}
}
```
For an example ISM template policy, see [Sample policy with ISM template](policies/#sample-policy-with-ism-template).
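The following Python sketch approximates how an index name might be matched against `ism_template` patterns. The policies and the `matching_policy` helper are hypothetical, `fnmatchcase` only approximates the plugin's wildcard handling, and the rule that the highest `priority` wins when several patterns match is an assumption to verify against the plugin's behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical policies, each with the ism_template fields shown above.
templates = {
    "policy_a": {"index_patterns": ["index_name-*"], "priority": 100},
    "policy_b": {"index_patterns": ["index_*"], "priority": 50},
}


def matching_policy(index_name: str, ism_templates: dict):
    """Return the ID of the highest-priority policy whose pattern matches."""
    matches = [
        (template["priority"], policy_id)
        for policy_id, template in ism_templates.items()
        if any(
            fnmatchcase(index_name, pattern)
            for pattern in template["index_patterns"]
        )
    ]
    # Assumption: the highest priority wins when several templates match.
    return max(matches)[1] if matches else None


print(matching_policy("index_name-000001", templates))  # policy_a
print(matching_policy("other-000001", templates))       # None
```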
Older versions of the plugin include the `policy_id` in an index template, so when an index is created that matches the index template pattern, the index will have the policy attached to it:
```json
PUT _index_template/<template_name>
{
"index_patterns": [
"index_name-*"
],
"template": {
"settings": {
"opensearch.index_state_management.policy_id": "policy_id"
}
}
}
```
The `opensearch.index_state_management.policy_id` setting is deprecated starting from version 1.13.0. You can continue to automatically manage newly created indices with the ISM template field.
{: .note }
### Step 2: Attach policies to indices
1. Choose **Indices**.
2. Choose the index or indices that you want to attach your policy to.
3. Choose **Apply policy**.
4. From the **Policy ID** menu, choose the policy that you created.
You can see a preview of your policy.
5. If your policy includes a rollover operation, specify a rollover alias.
Make sure that the alias that you enter already exists. For more information about the rollover operation, see [rollover](policies/#rollover).
6. Choose **Apply**.
After you attach a policy to an index, ISM creates a job that runs every 5 minutes by default to perform policy actions, check conditions, and transition the index into different states. To change the default time interval for this job, see [Settings](settings/).
If you want to use an OpenSearch operation to create an index with a policy already attached to it, see [create index](api/#create-index).
### Step 3: Manage indices
1. Choose **Managed Indices**.
2. To change your policy, see [Change Policy](managedindices/#change-policy).
3. To attach a rollover alias to your index, select your policy and choose **Add rollover alias**.
Make sure that the alias that you enter already exists. For more information about the rollover operation, see [rollover](policies/#rollover).
4. To remove a policy, choose your policy, and then choose **Remove policy**.
5. To retry a policy, choose your policy, and then choose **Retry policy**.
For information about managing your policies, see [Managed Indices](managedindices/).

View File

@ -0,0 +1,75 @@
---
layout: default
title: Managed Indices
nav_order: 3
parent: Index State Management
grand_parent: Index management
redirect_from: /docs/ism/managedindices/
has_children: false
---
# Managed indices
You can change or update a policy using the managed index operations.
This table lists the fields of managed index operations.
Parameter | Description | Type | Required | Read Only
:--- | :--- |:--- |:--- |:--- |
`name` | The name of the managed index policy. | `string` | Yes | No
`index` | The name of the managed index that this policy is managing. | `string` | Yes | No
`index_uuid` | The uuid of the index. | `string` | Yes | No
`enabled` | When `true`, the managed index is scheduled and run by the scheduler. | `boolean` | Yes | No
`enabled_time` | The time the managed index was last enabled. If the managed index process is disabled, then this is null. | `timestamp` | Yes | Yes
`last_updated_time` | The time the managed index was last updated. | `timestamp` | Yes | Yes
`schedule` | The schedule of the managed index job. | `object` | Yes | No
`policy_id` | The name of the policy used by this managed index. | `string` | Yes | No
`policy_seq_no` | The sequence number of the policy used by this managed index. | `number` | Yes | No
`policy_primary_term` | The primary term of the policy used by this managed index. | `number` | Yes | No
`policy_version` | The version of the policy used by this managed index. | `number` | Yes | Yes
`policy` | The cached JSON of the policy for the `policy_version` that's used during runs. If the policy is null, it means that this is the first execution of the job and the latest policy document is read in/saved. | `object` | No | No
`change_policy` | The information regarding what policy and state to change to. | `object` | No | No
`policy_name` | The name of the policy to update to. To update to the latest version, set this to be the same as the current `policy_name`. | `string` | No | Yes
`state` | The state of the managed index after it finishes updating. If no state is specified, it's assumed that the policy structure did not change. | `string` | No | Yes
The following example shows a managed index policy:
```json
{
"managed_index": {
"name": "my_index",
"index": "my_index",
"index_uuid": "sOKSOfkdsoSKeofjIS",
"enabled": true,
"enabled_time": 1553112384,
"last_updated_time": 1553112384,
"schedule": {
"interval": {
"period": 1,
"unit": "MINUTES",
"start_time": 1553112384
}
},
"policy_id": "log_rotation",
"policy_version": 1,
"policy": {...},
"change_policy": null
}
}
```
## Change policy
You can change any managed index policy, but ISM has a few constraints in place to make sure that policy changes don't break indices.
If an index is stuck in its current state, never proceeding, and you want to update its policy immediately, make sure that the new policy includes the same state---same name, same actions, same order---as the old policy. In this case, even if the policy is in the middle of executing an action, ISM applies the new policy.
If you update the policy without including an identical state, ISM updates the policy only after all actions in the current state finish executing. Alternatively, you can choose a specific state in your old policy after which you want the new policy to take effect.
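The "identical state" rule can be sketched as a comparison between the old and new policy. The following Python example uses hypothetical helpers and trimmed-down policies; it compares only the state name and its ordered list of actions, and it is a simplified model rather than the check that ISM itself performs.

```python
def find_state(policy: dict, name: str):
    """Return the state called `name` from a policy body, or None."""
    return next((s for s in policy["states"] if s["name"] == name), None)


def applies_immediately(old_policy: dict, new_policy: dict, current_state: str) -> bool:
    """True when the new policy keeps the current state's name and actions."""
    old_state = find_state(old_policy, current_state)
    new_state = find_state(new_policy, current_state)
    if old_state is None or new_state is None:
        return False
    # List equality is order-sensitive, so reordered actions count as a change.
    return old_state["actions"] == new_state["actions"]


old = {"states": [{"name": "ingest", "actions": [{"rollover": {"min_doc_count": 5}}]}]}
same = {"states": [{"name": "ingest", "actions": [{"rollover": {"min_doc_count": 5}}]}]}
changed = {"states": [{"name": "ingest", "actions": [{"delete": {}}]}]}

print(applies_immediately(old, same, "ingest"))     # True
print(applies_immediately(old, changed, "ingest"))  # False
```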
To change a policy using OpenSearch Dashboards, do the following:
- Under **Managed indices**, choose the indices that you want to attach the new policy to.
- To attach the new policy to indices in specific states, choose **Choose state filters**, and then choose those states.
- Under **Choose New Policy**, choose the new policy.
- To start the new policy for indices in the current state, choose **Keep indices in their current state after the policy takes effect**.
- To start the new policy in a specific state, choose **Start from a chosen state after changing policies**, and then choose the default start state in your new policy.

666
docs/im/ism/policies.md Normal file
View File

@ -0,0 +1,666 @@
---
layout: default
title: Policies
nav_order: 1
parent: Index State Management
grand_parent: Index management
redirect_from: /docs/ism/policies/
has_children: false
---
# Policies
Policies are JSON documents that define the following:
- The *states* that an index can be in, including the default state for new indices. For example, you might name your states "hot," "warm," "delete," and so on. For more information, see [States](#states).
- Any *actions* that you want the plugin to take when an index enters a state, such as performing a rollover. For more information, see [Actions](#actions).
- The conditions that must be met for an index to move into a new state, known as *transitions*. For example, if an index is more than eight weeks old, you might want to move it to the "delete" state. For more information, see [Transitions](#transitions).
In other words, a policy defines the *states* that an index can be in, the *actions* to perform when in a state, and the conditions that must be met to *transition* between states.
You have complete flexibility in the way you can design your policies. You can create any state, transition to any other state, and specify any number of actions in each state.
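Because states refer to each other by name, a quick consistency check can catch typos before you submit a policy. The following Python sketch uses a hypothetical `undefined_states` helper on a trimmed-down policy body; it is a client-side convenience and not a validation that the plugin documents.

```python
def undefined_states(policy: dict) -> set:
    """Return state names that are referenced but never defined."""
    defined = {state["name"] for state in policy["states"]}
    referenced = {policy["default_state"]}
    for state in policy["states"]:
        for transition in state.get("transitions", []):
            referenced.add(transition["state_name"])
    return referenced - defined


# Trimmed-down policy body with three states chained by transitions.
policy = {
    "default_state": "ingest",
    "states": [
        {"name": "ingest", "transitions": [{"state_name": "search"}]},
        {"name": "search", "transitions": [{"state_name": "delete"}]},
        {"name": "delete", "transitions": []},
    ],
}

print(undefined_states(policy))  # set()
```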
This table lists the relevant fields of a policy.
Field | Description | Type | Required | Read Only
:--- | :--- |:--- |:--- |:--- |
`policy_id` | The name of the policy. | `string` | Yes | Yes
`description` | A human-readable description of the policy. | `string` | Yes | No
`ism_template` | Specify an ISM template pattern that matches the index to apply the policy. | `nested list of objects` | No | No
`last_updated_time` | The time the policy was last updated. | `timestamp` | Yes | Yes
`error_notification` | The destination and message template for error notifications. The destination could be Amazon Chime, Slack, or a webhook URL. | `object` | No | No
`default_state` | The default starting state for each index that uses this policy. | `string` | Yes | No
`states` | The states that you define in the policy. | `nested list of objects` | Yes | No
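As a rough illustration of how these fields fit together, the following Python sketch checks a policy body for the required fields that you supply yourself (the read-only fields are left out) and confirms that `default_state` names one of the defined states. The `validate_policy` helper is hypothetical and is not part of ISM; it only mirrors the table above.

```python
# Hypothetical helper that mirrors the "Required" column of the table above.
# It is not part of the ISM plugin.
REQUIRED_FIELDS = ("description", "default_state", "states")


def validate_policy(body):
    """Return a list of problems found in a policy request body."""
    policy = body.get("policy", {})
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in policy]
    state_names = [state.get("name") for state in policy.get("states", [])]
    default_state = policy.get("default_state")
    if default_state is not None and default_state not in state_names:
        problems.append(f"default_state '{default_state}' is not a defined state")
    return problems


example = {
    "policy": {
        "description": "hot warm delete workflow",
        "default_state": "hot",
        "states": [{"name": "hot", "actions": [], "transitions": []}],
    }
}
print(validate_policy(example))  # prints []
```

A check like this could run before you send the policy to the cluster, so that a misspelled state name shows up early.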
---
#### Table of contents
1. TOC
{:toc}
---
## States
A state is the description of the status that the managed index is currently in. A managed index can be in only one state at a time. Each state has associated actions that are executed sequentially on entering a state and transitions that are checked after all the actions have been completed.
This table lists the parameters that you can define for a state.
Field | Description | Type | Required
:--- | :--- |:--- |:--- |
`name` | The name of the state. | `string` | Yes
`actions` | The actions to execute after entering a state. For more information, see [Actions](#actions). | `nested list of objects` | Yes
`transitions` | The next states and the conditions required to transition to those states. If no transitions exist, the policy assumes that it's complete and can now stop managing the index. For more information, see [Transitions](#transitions). | `nested list of objects` | Yes
---
## Actions
Actions are the steps that the policy sequentially executes on entering a specific state.
They are executed in the order in which they are defined.
This table lists the parameters that you can define for an action.
Parameter | Description | Type | Required | Default
:--- | :--- | :--- | :--- | :---
`timeout` | The timeout period for the action. Accepts time units for minutes, hours, and days. | `time unit` | No | -
`retry` | The retry configuration for the action. | `object` | No | Specific to action
The `retry` operation has the following parameters:
Parameter | Description | Type | Required | Default
:--- | :--- | :--- | :--- | :---
`count` | The number of retries. | `number` | Yes | -
`backoff` | The backoff policy type to use when retrying. | `string` | No | Exponential
`delay` | The time to wait between retries. Accepts time units for minutes, hours, and days. | `time unit` | No | 1 minute
The following example action has a timeout period of one hour. The policy retries this action three times with an exponential backoff policy, with a delay of 10 minutes between each retry:
```json
"actions": {
"timeout": "1h",
"retry": {
"count": 3,
"backoff": "exponential",
"delay": "10m"
}
}
```
For a list of available unit types, see [Supported units](../../../opensearch/units/).
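The text above doesn't spell out how an exponential backoff spaces the retries. The following Python sketch assumes that the wait starts at the configured `delay` and doubles after every failed attempt; that doubling rule and the `retry_delays` helper are assumptions made for illustration, not ISM's documented behavior.

```python
def retry_delays(count, delay_minutes, backoff="exponential"):
    """Return the assumed wait, in minutes, before each retry."""
    if backoff != "exponential":
        raise ValueError(f"backoff type not covered by this sketch: {backoff}")
    # Assumption: the wait doubles after every failed attempt.
    return [delay_minutes * 2 ** attempt for attempt in range(count)]


print(retry_delays(count=3, delay_minutes=10))  # prints [10, 20, 40]
```

Under this assumption, the example action would spend up to 70 minutes waiting across its three retries, which is worth comparing against the action's `timeout`.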
## ISM supported operations
ISM supports the following operations:
- [force_merge](#force_merge)
- [read_only](#read_only)
- [read_write](#read_write)
- [replica_count](#replica_count)
- [close](#close)
- [open](#open)
- [delete](#delete)
- [rollover](#rollover)
- [notification](#notification)
- [snapshot](#snapshot)
- [index_priority](#index_priority)
- [allocation](#allocation)
### force_merge
Reduces the number of Lucene segments by merging the segments of individual shards. This operation attempts to set the index to a `read-only` state before starting the merging process.
Parameter | Description | Type | Required
:--- | :--- |:--- |:--- |
`max_num_segments` | The number of segments to reduce the shard to. | `number` | Yes
```json
{
"force_merge": {
"max_num_segments": 1
}
}
```
### read_only
Sets a managed index to be read only.
```json
{
"read_only": {}
}
```
### read_write
Sets a managed index to be writable.
```json
{
"read_write": {}
}
```
### replica_count
Sets the number of replicas to assign to an index.
Parameter | Description | Type | Required
:--- | :--- |:--- |:--- |
`number_of_replicas` | Defines the number of replicas to assign to an index. | `number` | Yes
```json
{
"replica_count": {
"number_of_replicas": 2
}
}
```
For information about setting replicas, see [Primary and replica shards](../../../opensearch/#primary-and-replica-shards).
### close
Closes the managed index.
```json
{
"close": {}
}
```
Closed indices remain on disk, but consume no CPU or memory. You can't read from, write to, or search closed indices.
Closing an index is a good option if you need to retain data for longer than you need to actively search it and have sufficient disk space on your data nodes. If you need to search the data again, reopening a closed index is simpler than restoring an index from a snapshot.
### open
Opens a managed index.
```json
{
"open": {}
}
```
### delete
Deletes a managed index.
```json
{
"delete": {}
}
```
### rollover
Rolls an alias over to a new index when the managed index meets one of the rollover conditions.
The index name must match the pattern `^.*-\d+$`, for example, `logs-000001`.
Set `index.opensearch.index_state_management.rollover_alias` to the alias that you want to roll over.
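To check index names against the pattern before relying on rollover, a short Python sketch such as the following might help. The `can_roll_over` helper is a hypothetical name; only the regular expression comes from the text above.

```python
import re

# Pattern quoted in the text above.
ROLLOVER_NAME = re.compile(r"^.*-\d+$")


def can_roll_over(index_name):
    """Return True if the index name ends in a hyphen followed by digits."""
    return ROLLOVER_NAME.match(index_name) is not None


print(can_roll_over("logs-000001"))  # prints True
print(can_roll_over("logs"))         # prints False
```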
Parameter | Description | Type | Example | Required
:--- | :--- | :--- | :--- | :---
`min_size` | The minimum size of the total primary shard storage (not counting replicas) required to roll over the index. For example, if you set `min_size` to 100 GiB and your index has 5 primary shards and 5 replica shards of 20 GiB each, the total size of the primaries is 100 GiB, so the rollover occurs. ISM doesn't check indices continually, so it doesn't roll over indices at exactly 100 GiB. Instead, if an index is continuously growing, ISM might check it at 99 GiB, not perform the rollover, check again when the shards reach 105 GiB, and then perform the operation. | `string` | `20gb` or `5mb` | No
`min_doc_count` | The minimum number of documents required to roll over the index. | `number` | `2000000` | No
`min_index_age` | The minimum age required to roll over the index. Index age is the time between its creation and the present. | `string` | `5d` or `7h` | No
```json
{
"rollover": {
"min_size": "50gb"
}
}
```
```json
{
"rollover": {
"min_doc_count": 100000000
}
}
```
```json
{
"rollover": {
"min_index_age": "30d"
}
}
```
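The `min_size` arithmetic described in the table can be sketched in Python as follows. The helper names are hypothetical, and the sketch only models the comparison: it counts primary shards, ignores replicas, and returns `True` only when ISM happens to check the index at or above the threshold.

```python
GIB = 1024 ** 3


def total_primary_bytes(primary_shard_sizes):
    """Sum the sizes of the primary shards; replicas are not counted."""
    return sum(primary_shard_sizes)


def meets_min_size(primary_shard_sizes, min_size_bytes):
    """Return True if the total primary storage has reached min_size."""
    return total_primary_bytes(primary_shard_sizes) >= min_size_bytes


# Five primary shards of 20 GiB each; the five replicas are left out.
primaries = [20 * GIB] * 5
print(meets_min_size(primaries, 100 * GIB))  # prints True

# A check at 99 GiB doesn't trigger the rollover; the next one at 105 GiB does.
print(meets_min_size([99 * GIB], 100 * GIB))   # prints False
print(meets_min_size([105 * GIB], 100 * GIB))  # prints True
```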
### notification
Sends you a notification.
Parameter | Description | Type | Required
:--- | :--- |:--- |:--- |
`destination` | The destination URL. | `Slack, Amazon Chime, or webhook URL` | Yes
`message_template` | The text of the message. You can add variables to your messages using [Mustache templates](https://mustache.github.io/mustache.5.html). | `object` | Yes
The destination system **must** return a response; otherwise, the notification operation throws an error.
#### Example 1: Chime notification
```json
{
"notification": {
"destination": {
"chime": {
"url": "<url>"
}
},
"message_template": {
"source": "the index is {% raw %}{{ctx.index}}{% endraw %}"
}
}
}
```
#### Example 2: Custom webhook notification
```json
{
"notification": {
"destination": {
"custom_webhook": {
"url": "https://<your_webhook>"
}
},
"message_template": {
"source": "the index is {% raw %}{{ctx.index}}{% endraw %}"
}
}
}
```
#### Example 3: Slack notification
```json
{
"notification": {
"destination": {
"slack": {
"url": "https://hooks.slack.com/services/xxx/xxxxxx"
}
},
"message_template": {
"source": "the index is {% raw %}{{ctx.index}}{% endraw %}"
}
}
}
```
You can use `ctx` variables in your message to represent a number of policy parameters based on the past executions of your policy. For example, if your policy has a rollover action, you can use `{% raw %}{{ctx.action.name}}{% endraw %}` in your message to represent the name of the rollover.
The following `ctx` variable options are available for every policy:
#### Guaranteed variables
Parameter | Description | Type
:--- | :--- | :---
`index` | The name of the index. | `string`
`index_uuid` | The UUID of the index. | `string`
`policy_id` | The name of the policy. | `string`
### snapshot
Back up your cluster's indices and state. For more information about snapshots, see [Take and restore snapshots](../../../opensearch/snapshot-restore/).
The `snapshot` operation has the following parameters:
Parameter | Description | Type | Required | Default
:--- | :--- | :--- | :--- | :---
`repository` | The repository name that you register through the native snapshot API operations. | `string` | Yes | -
`snapshot` | The name of the snapshot. | `string` | Yes | -
```json
{
"snapshot": {
"repository": "my_backup",
"snapshot": "my_snapshot"
}
}
```
### index_priority
Sets the priority for the index in a specific state. Unallocated shards of indices are recovered in the order of their priority, whenever possible. Indices with higher priority values are recovered first, followed by indices with lower priority values.
The `index_priority` operation has the following parameter:
Parameter | Description | Type | Required | Default
:--- | :--- |:--- |:--- |:---
`priority` | The priority for the index as soon as it enters a state. | `number` | Yes | 1
```json
"actions": [
{
"index_priority": {
"priority": 50
}
}
]
```
### allocation
Allocate the index to a node with a specific attribute.
For example, setting `require` to `warm` moves your data only to "warm" nodes.
The `allocation` operation has the following parameters:
Parameter | Description | Type | Required
:--- | :--- |:--- |:---
`require` | Allocate the index to a node with a specified attribute. | `string` | Yes
`include` | Allocate the index to a node with any of the specified attributes. | `string` | Yes
`exclude` | Don't allocate the index to a node with any of the specified attributes. | `string` | Yes
`wait_for` | Wait for the policy to execute before allocating the index to a node with a specified attribute. | `string` | Yes
```json
"actions": [
{
"allocation": {
"require": { "box_type": "warm" }
}
}
]
```
---
## Transitions
Transitions define the conditions that need to be met for a state to change. After all actions in the current state are completed, the policy starts checking the conditions for transitions.
Transitions are evaluated in the order in which they are defined. For example, if the conditions for the first transition are met, then this transition takes place and the rest of the transitions are dismissed.
If you don't specify any conditions in a transition and leave it empty, then it's assumed to be the equivalent of always true. This means that the policy transitions the index to this state the moment it checks.
This table lists the parameters you can define for transitions.
Parameter | Description | Type | Required
:--- | :--- |:--- |:--- |
`state_name` | The name of the state to transition to if the conditions are met. | `string` | Yes
`conditions` | List the conditions for the transition. | `list` | Yes
The `conditions` object has the following parameters:
Parameter | Description | Type | Required
:--- | :--- |:--- |:--- |
`min_index_age` | The minimum age of the index required to transition. | `string` | No
`min_doc_count` | The minimum document count of the index required to transition. | `number` | No
`min_size` | The minimum size of the index required to transition. | `string` | No
`cron` | The `cron` job that triggers the transition if no other transition happens first. | `object` | No
`cron.cron.expression` | The `cron` expression that triggers the transition. | `string` | Yes
`cron.cron.timezone` | The timezone that triggers the transition. | `string` | Yes
The following example transitions the index to a `cold` state after a period of 30 days:
```json
"transitions": [
{
"state_name": "cold",
"conditions": {
"min_index_age": "30d"
}
}
]
```
ISM checks the conditions on every execution of the policy based on the set interval.
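The evaluation order described above (first matching transition wins, and a transition without conditions is always true) can be sketched in Python. This is a simplified model, not ISM's implementation: the `next_state` and `condition_met` helpers and the `stats` keys are hypothetical, only two condition types are covered, and combining several conditions with `all` is an assumption.

```python
def condition_met(name, value, stats):
    """Evaluate one condition against hypothetical index statistics."""
    if name == "min_doc_count":
        return stats["doc_count"] >= value
    if name == "min_index_age":
        # Sketch only: handles day values such as "30d".
        return stats["age_days"] >= int(value.rstrip("d"))
    raise ValueError(f"condition not covered by this sketch: {name}")


def next_state(transitions, stats):
    """Return the target of the first transition whose conditions are met."""
    for transition in transitions:
        conditions = transition.get("conditions", {})
        # With no conditions, all() over an empty sequence is True.
        if all(condition_met(n, v, stats) for n, v in conditions.items()):
            return transition["state_name"]
    return None


transitions = [
    {"state_name": "cold", "conditions": {"min_index_age": "30d"}},
    {"state_name": "warm"},  # no conditions: always true
]
print(next_state(transitions, {"age_days": 45, "doc_count": 10}))  # prints cold
print(next_state(transitions, {"age_days": 3, "doc_count": 10}))   # prints warm
```

Because the unconditional transition comes last, it acts as a fallback. If it were listed first, the `cold` transition would never be reached.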
This example uses the `cron` condition to transition indices every Saturday at 5:00 PM PT:
```json
"transitions": [
{
"state_name": "cold",
"conditions": {
"cron": {
"cron": {
"expression": "* 17 * * SAT",
"timezone": "America/Los_Angeles"
}
}
}
}
]
```
Note that this condition does not execute at exactly 5:00 PM; the job still executes based on the `job_interval` setting. Because of this variance in start time and the time that actions can take to complete before ISM checks the transition conditions, we recommend against overly narrow cron expressions. For example, don't use `15 17 * * SAT` (5:15 PM on Saturday).
A window of an hour, which this example uses, is generally sufficient, but you might increase it to 2--3 hours to avoid missing the window and having to wait a week for the transition to occur. Alternatively, you could use a broader expression such as `* * * * SAT,SUN` to have the transition occur at any time during the weekend.
For information on writing cron expressions, see [Cron expression reference](../../../alerting/cron/).
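To see why a narrow cron expression can be missed, the following Python sketch lists the job runs that land inside a cron window. It is a deliberately simplified model: it assumes the condition is only evaluated when a job runs, ignores the time that actions take, and uses a hypothetical `matching_runs` helper with times expressed as minutes of the day.

```python
def matching_runs(first_run_minute, job_interval, cron_minutes, cron_hour=17):
    """Return the minutes of the day at which a job run falls in the window.

    first_run_minute: minute of the day (0-1439) of the first job run.
    cron_minutes: set of minutes within cron_hour that the expression matches.
    """
    runs = range(first_run_minute, 24 * 60, job_interval)
    return [m for m in runs if m // 60 == cron_hour and m % 60 in cron_minutes]


every_minute = set(range(60))  # "* 17 * * SAT"
only_quarter_past = {15}       # "15 17 * * SAT"

# The job runs every 5 minutes, starting at 00:02 on Saturday.
print(len(matching_runs(2, 5, every_minute)))       # prints 12
print(len(matching_runs(2, 5, only_quarter_past)))  # prints 0
```

With the one-hour window, 12 runs qualify. With the single-minute expression, none do, so in this model the transition would wait another week.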
---
## Error notifications
The `error_notification` operation sends you a notification if your managed index fails.
It notifies a single destination with a custom message.
Set up error notifications at the policy level:
```json
{
"policy": {
"description": "hot warm delete workflow",
"default_state": "hot",
"schema_version": 1,
"error_notification": { },
"states": [ ]
}
}
```
Parameter | Description | Type | Required
:--- | :--- |:--- |:--- |
`destination` | The destination URL. | `Slack, Amazon Chime, or webhook URL` | Yes
`message_template` | The text of the message. You can add variables to your messages using [Mustache templates](https://mustache.github.io/mustache.5.html). | `object` | Yes
The destination system **must** return a response; otherwise, the `error_notification` operation throws an error.
#### Example 1: Chime notification
```json
{
"error_notification": {
"destination": {
"chime": {
"url": "<url>"
}
},
"message_template": {
"source": "The index {% raw %}{{ctx.index}}{% endraw %} failed during policy execution."
}
}
}
```
#### Example 2: Custom webhook notification
```json
{
"error_notification": {
"destination": {
"custom_webhook": {
"url": "https://<your_webhook>"
}
},
"message_template": {
"source": "The index {% raw %}{{ctx.index}}{% endraw %} failed during policy execution."
}
}
}
```
#### Example 3: Slack notification
```json
{
"error_notification": {
"destination": {
"slack": {
"url": "https://hooks.slack.com/services/xxx/xxxxxx"
}
},
"message_template": {
"source": "The index {% raw %}{{ctx.index}}{% endraw %} failed during policy execution."
}
}
}
```
You can use the same options for `ctx` variables as the [notification](#notification) operation.
## Sample policy with ISM template
The following sample template policy is for a rollover use case:
1. Create a policy with an `ism_template` field.
```json
PUT _opensearch/_ism/policies/rollover_policy
{
  "policy": {
    "description": "Example rollover policy.",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "rollover": {
              "min_doc_count": 1
            }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["log*"],
      "priority": 100
    }
  }
}
```
You need to specify the `index_patterns` field. If you don't specify a value for `priority`, it defaults to 0.
1. Set up a template with the `rollover_alias` as `log`:
```json
PUT _template/ism_rollover
{
"index_patterns": ["log*"],
"settings": {
"opensearch.index_state_management.rollover_alias": "log"
}
}
```
1. Create an index with the `log` alias:
```json
PUT log-000001
{
"aliases": {
"log": {
"is_write_index": true
}
}
}
```
1. Index a document to trigger the rollover condition:
```json
POST log/_doc
{
"message": "dummy"
}
```
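The `ism_template` in step 1 matches new indices by pattern. The following Python sketch shows one way that matching could be modeled. It assumes that, when several templates match, the one with the highest `priority` wins; the steps above don't state this rule. The `pick_policy` helper and the `catch_all` policy are hypothetical.

```python
from fnmatch import fnmatchcase


def pick_policy(index_name, policies):
    """Return the ID of the matching policy with the highest priority, or None.

    policies maps a policy ID to its ism_template object.
    """
    matches = [
        (template.get("priority", 0), policy_id)
        for policy_id, template in policies.items()
        if any(fnmatchcase(index_name, pattern) for pattern in template["index_patterns"])
    ]
    # Assumption: the highest priority wins when several templates match.
    return max(matches)[1] if matches else None


policies = {
    "rollover_policy": {"index_patterns": ["log*"], "priority": 100},
    "catch_all": {"index_patterns": ["*"]},  # hypothetical; priority defaults to 0
}
print(pick_policy("log-000001", policies))  # prints rollover_policy
print(pick_policy("metrics-1", policies))   # prints catch_all
```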
## Example policy
The following example policy implements a `hot`, `warm`, and `delete` workflow. You can use this policy as a template to prioritize resources to your indices based on their levels of activity.
In this case, an index is initially in a `hot` state. After a day, it changes to a `warm` state, where the number of replicas increases to 5 to improve the read performance.
After 30 days, the policy moves this index into a `delete` state. The service sends a notification to a Chime room that the index is being deleted, and then permanently deletes it.
```json
{
  "policy": {
    "description": "hot warm delete workflow",
    "default_state": "hot",
    "schema_version": 1,
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_index_age": "1d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm"
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": {
              "number_of_replicas": 5
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "notification": {
              "destination": {
                "chime": {
                  "url": "<URL>"
                }
              },
              "message_template": {
                "source": "The index {% raw %}{{ctx.index}}{% endraw %} is being deleted"
              }
            }
          },
          {
            "delete": {}
          }
        ]
      }
    ]
  }
}
```
This diagram shows the `states`, `transitions`, and `actions` of the above policy as a finite-state machine. For more information about finite-state machines, see [Wikipedia](https://en.wikipedia.org/wiki/Finite-state_machine).
![Policy State Machine](../../images/ism.png)
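The same finite-state view can be derived from the policy body itself. The following Python sketch maps each state to the states it can transition to; the `state_graph` helper is hypothetical, and the policy is a trimmed copy of the example with actions and conditions omitted.

```python
def state_graph(policy_body):
    """Map each state name to the states it can transition to."""
    return {
        state["name"]: [t["state_name"] for t in state.get("transitions", [])]
        for state in policy_body["policy"]["states"]
    }


# Trimmed copy of the example policy: actions and conditions are omitted.
policy_body = {
    "policy": {
        "default_state": "hot",
        "states": [
            {"name": "hot", "transitions": [{"state_name": "warm"}]},
            {"name": "warm", "transitions": [{"state_name": "delete"}]},
            {"name": "delete"},
        ],
    }
}
print(state_graph(policy_body))
# prints {'hot': ['warm'], 'warm': ['delete'], 'delete': []}
```

A state with an empty list, such as `delete` here, has no transitions, so the policy stops managing the index once that state's actions complete.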

50
docs/im/ism/settings.md Normal file
View File

@ -0,0 +1,50 @@
---
layout: default
title: Settings
parent: Index State Management
grand_parent: Index management
redirect_from: /docs/ism/settings/
nav_order: 4
---
# ISM Settings
We don't recommend changing these settings; the defaults should work well for most use cases.
Index State Management (ISM) stores its configuration in the `.opensearch-ism-config` index. Don't modify this index without using the [ISM API operations](../api/).
All settings are available using the OpenSearch `_cluster/settings` operation. None require a restart, and all can be marked `persistent` or `transient`.
Setting | Default | Description
:--- | :--- | :---
`opensearch.index_state_management.enabled` | True | Specifies whether ISM is enabled or not.
`opensearch.index_state_management.job_interval` | 5 minutes | The interval at which the managed index jobs are run.
`opensearch.index_state_management.coordinator.sweep_period` | 10 minutes | How often the routine background sweep is run.
`opensearch.index_state_management.coordinator.backoff_millis` | 50 milliseconds | The backoff time between retries for failures in the `ManagedIndexCoordinator` (such as when we update managed indices).
`opensearch.index_state_management.coordinator.backoff_count` | 2 | The count of retries for failures in the `ManagedIndexCoordinator`.
`opensearch.index_state_management.history.enabled` | True | Specifies whether audit history is enabled or not. The logs from ISM are automatically indexed to a logs document.
`opensearch.index_state_management.history.max_docs` | 2,500,000 | The maximum number of documents before rolling over the audit history index.
`opensearch.index_state_management.history.max_age` | 24 hours | The maximum age before rolling over the audit history index.
`opensearch.index_state_management.history.rollover_check_period` | 8 hours | The time between rollover checks for the audit history index.
`opensearch.index_state_management.history.rollover_retention_period` | 30 days | How long audit history indices are kept.
`opensearch.index_state_management.allow_list` | All actions | List of actions that you can use.
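If you do need to change a setting, the request body for the `_cluster/settings` operation nests the setting under `persistent` or `transient`. The following Python sketch only builds that body; the `cluster_settings_body` helper is hypothetical, and sending the request is left to whichever HTTP client you use.

```python
import json


def cluster_settings_body(setting, value, persistent=True):
    """Build a request body for the _cluster/settings operation."""
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: {setting: value}})


print(cluster_settings_body("opensearch.index_state_management.history.enabled", False))
# prints {"persistent": {"opensearch.index_state_management.history.enabled": false}}
```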
## Audit history indices
If you don't want to disable ISM audit history or shorten the retention period, you can create an [index template](../../../opensearch/index-templates/) to reduce the shard count of the history indices:
```json
PUT _index_template/ism_history_indices
{
"index_patterns": [
".opensearch-ism-managed-index-history-*"
],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
}
```

View File

@ -0,0 +1,40 @@
---
layout: default
title: Refresh Search Analyzer
nav_order: 40
parent: Index management
has_children: false
redirect_from: /docs/ism/refresh-analyzer/
has_toc: false
---
# Refresh search analyzer
With ISM installed, you can refresh search analyzers in real time with the following API:
```json
POST /_opensearch/_refresh_search_analyzers/<index or alias or wildcard>
```
For example, if you change the synonym list in your analyzer, the change takes effect without you needing to close and reopen the index.
For the refresh to take effect, the token filter must have an `updateable` flag of `true`:
```json
{
"analyzer": {
"my_synonyms": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym_graph",
"synonyms_path": "synonyms.txt",
"updateable": true
}
}
}
```
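Before calling the refresh API, you could check your analysis settings for filters that are missing the flag. The following Python sketch is one way to do that. The `filters_needing_updateable` helper is hypothetical, and limiting the check to filter types that start with `synonym` is an assumption based on the example above.

```python
def filters_needing_updateable(analysis):
    """Return names of synonym token filters that lack "updateable": true."""
    return [
        name
        for name, config in analysis.get("filter", {}).items()
        # Assumption: only synonym-type filters are relevant to the refresh.
        if config.get("type", "").startswith("synonym") and config.get("updateable") is not True
    ]


analysis = {
    "analyzer": {
        "my_synonyms": {"tokenizer": "whitespace", "filter": ["synonym"]}
    },
    "filter": {
        "synonym": {
            "type": "synonym_graph",
            "synonyms_path": "synonyms.txt",
            "updateable": True,
        }
    },
}
print(filters_needing_updateable(analysis))  # prints []
```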

BIN docs/images/ad.png Normal file. Binary file not shown. Size: 328 KiB
BIN docs/images/alerting.png Normal file. Binary file not shown. Size: 41 KiB
BIN docs/images/cli.gif Normal file. Binary file not shown. Size: 3.1 MiB
BIN docs/images/cluster.png Normal file. Binary file not shown. Size: 24 KiB
BIN docs/images/expression.png Normal file. Binary file not shown. Size: 18 KiB
Binary file not shown. Size: 28 KiB
BIN docs/images/gantt-chart.png Normal file. Binary file not shown. Size: 152 KiB
BIN docs/images/hot-rod.png Normal file. Binary file not shown. Size: 68 KiB
BIN docs/images/ism.png Normal file. Binary file not shown. Size: 25 KiB
BIN docs/images/joinPart.png Normal file. Binary file not shown. Size: 31 KiB
Binary file not shown. Size: 121 KiB
Binary file not shown. Size: 43 KiB
BIN docs/images/perftop.png Normal file. Binary file not shown. Size: 236 KiB
BIN docs/images/ppl.png Normal file. Binary file not shown. Size: 220 KiB
BIN docs/images/predicate.png Normal file. Binary file not shown. Size: 31 KiB
Binary file not shown. Size: 32 KiB
Binary file not shown. Size: 32 KiB
Binary file not shown. Size: 112 KiB
Binary file not shown. Size: 218 KiB
BIN docs/images/security.png Normal file. Binary file not shown. Size: 54 KiB
Binary file not shown. Size: 19 KiB
Binary file not shown. Size: 5.9 KiB
BIN docs/images/showFilter.png Normal file. Binary file not shown. Size: 6.2 KiB
Binary file not shown. Size: 8.3 KiB
Binary file not shown. Size: 8.5 KiB
BIN docs/images/sql.png Normal file. Binary file not shown. Size: 307 KiB
Binary file not shown. Size: 165 KiB

View File

@ -0,0 +1 @@
<mxfile host="app.diagrams.net" modified="2020-11-23T19:22:49.045Z" agent="5.0 (Macintosh)" etag="Ku95kVLzIJHlZrevsW8q" version="13.10.0" type="device"><diagram id="HFMyMVgqT-aEfhmvL3X9" name="Page-1">5Vhdb5swFP01eawEmBD62CZpp31ok1qpe5scfAPeDJca0yT79bskpoSQRYm0bCx9ifC5Juaec66vYcDG6fJe8zz5hALUwHPEcsAmA88LGaPfClhtgOHwegPEWooN5DbAg/wJFnQsWkoBRWuiQVRG5m0wwiyDyLQwrjUu2tPmqNqr5jyGDvAQcdVFn6QwiU1r6DT4O5BxUq/sOjaS8nqyBYqEC1xsQWw6YGONaDZX6XIMquKu5mVz391voq8PpiEzx9xw5Rjff35/823+dTLxnp30qRBXweZfXrgqbcL2Yc2qZgAEEWKHqE2CMWZcTRv0VmOZCaiWcWjUzPmImBPoEvgdjFlZdXlpkKDEpMpG55gZG3QDGndTs9kWWOoIDuRTW4TrGMyBeTbNKretBSxx94ApGL2iCRoUN/KlbQZuPRW/zmtopwvL/AkquB0VbvJcyYiWxqwjSEN3xd0ikQYecr6mZUE1uEOtVGqMCvX6XiY4hPOI8MJo/AFbkSAKYTY/RYwX0AaWB+mzUd+WgN0D3NCOF01FuXWZJFvVFDhnIjy8MNt7R9qe9cr2XkeFzzlkj6Cgfg7yp6JdnTz6/xeBF/atCq4vrArYkVXg96oKWEeFCTeckC8a8hz+sPWHEAp/n/VDb8aC4DzWH3p9s747ujDv+0d63w16ZX6/2wImd9PqPK/KwlyE+UdB78zfPW8+6opDek1bbz07nFOipk1sm8AMM9hh20JcyTijYUT0kZjstqKNzrXqxgZSKcS6lPYp2S6vM0jj7kozOlIa/2zS7CmInTPR25LId3on0bAj0VTxgjgrgOsoqSTCqEwp5+Lt6BTsvuP9e5263zY+yBnP+LqGimSGXIsD/cU5ub+ASx1mtK+/XAcjxv9Sf/HO2F9o2HywWse2vvqx6S8=</diagram></mxfile>

BIN docs/images/ta-services.png Normal file. Binary file not shown. Size: 200 KiB
BIN docs/images/ta-trace.png Normal file. Binary file not shown. Size: 186 KiB
3
docs/images/ta.svg Normal file
File diff suppressed because one or more lines are too long. Size: 10 KiB
BIN docs/images/tableName.png Normal file. Binary file not shown. Size: 5.8 KiB
BIN docs/images/tableSource.png Normal file. Binary file not shown. Size: 9.4 KiB
BIN docs/images/workbench.gif Normal file. Binary file not shown. Size: 1.3 MiB

View File

@ -0,0 +1,143 @@
---
layout: default
title: Docker security configuration
parent: Install and configure
nav_order: 5
---
# Docker security configuration
Before deploying to a production environment, you should replace the demo security certificates and configuration YAML files with your own. With the tarball, you have direct access to the file system, but the Docker image requires modifying the Docker storage volumes to include the replacement files.
Additionally, you can set the Docker environment variable `DISABLE_INSTALL_DEMO_CONFIG` to `true`. This change completely disables the demo installer.
#### Sample Docker Compose file
```yml
version: '3'
services:
  opensearch-node1:
    image: amazon/opensearch:{{site.opensearch_version}}
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
      - network.host=0.0.0.0 # required if not using the demo security configuration
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
      - ./root-ca.pem:/usr/share/opensearch/config/root-ca.pem
      - ./node.pem:/usr/share/opensearch/config/node.pem
      - ./node-key.pem:/usr/share/opensearch/config/node-key.pem
      - ./admin.pem:/usr/share/opensearch/config/admin.pem
      - ./admin-key.pem:/usr/share/opensearch/config/admin-key.pem
      - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml
      - ./internal_users.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/internal_users.yml
      - ./roles_mapping.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles_mapping.yml
      - ./tenants.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/tenants.yml
      - ./roles.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles.yml
      - ./action_groups.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/action_groups.yml
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    image: amazon/opensearch:{{site.opensearch_version}}
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - network.host=0.0.0.0
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
      - ./root-ca.pem:/usr/share/opensearch/config/root-ca.pem
      - ./node.pem:/usr/share/opensearch/config/node.pem
      - ./node-key.pem:/usr/share/opensearch/config/node-key.pem
      - ./admin.pem:/usr/share/opensearch/config/admin.pem
      - ./admin-key.pem:/usr/share/opensearch/config/admin-key.pem
      - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml
      - ./internal_users.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/internal_users.yml
      - ./roles_mapping.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles_mapping.yml
      - ./tenants.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/tenants.yml
      - ./roles.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles.yml
      - ./action_groups.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/action_groups.yml
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: amazon/opensearch-dashboards:{{site.opensearch_version}}
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      OPENSEARCH_URL: https://opensearch-node1:9200
      OPENSEARCH_HOSTS: https://opensearch-node1:9200
    volumes:
      - ./custom-opensearch_dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml
    networks:
      - opensearch-net
volumes:
  opensearch-data1:
  opensearch-data2:
networks:
  opensearch-net:
```
Then make your changes to `opensearch.yml`. For a full list of settings, see [Security](../../security/configuration/). This example adds (extremely) verbose audit logging:
```yml
opensearch_security.ssl.transport.pemcert_filepath: node.pem
opensearch_security.ssl.transport.pemkey_filepath: node-key.pem
opensearch_security.ssl.transport.pemtrustedcas_filepath: root-ca.pem
opensearch_security.ssl.transport.enforce_hostname_verification: false
opensearch_security.ssl.http.enabled: true
opensearch_security.ssl.http.pemcert_filepath: node.pem
opensearch_security.ssl.http.pemkey_filepath: node-key.pem
opensearch_security.ssl.http.pemtrustedcas_filepath: root-ca.pem
opensearch_security.allow_default_init_securityindex: true
opensearch_security.authcz.admin_dn:
- CN=A,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA
opensearch_security.nodes_dn:
- 'CN=N,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA'
opensearch_security.audit.type: internal_opensearch
opensearch_security.enable_snapshot_restore_privilege: true
opensearch_security.check_snapshot_restore_write_privileges: true
opensearch_security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
cluster.routing.allocation.disk.threshold_enabled: false
opensearch_security.audit.config.disabled_rest_categories: NONE
opensearch_security.audit.config.disabled_transport_categories: NONE
```
Use this same override process to specify new [authentication settings](../../security/configuration/configuration/) in `/usr/share/opensearch/plugins/opensearch_security/securityconfig/config.yml`, as well as new default [internal users, roles, mappings, action groups, and tenants](../../security/configuration/yaml/).
To start the cluster, run `docker-compose up`.
If you encounter any `File /usr/share/opensearch/config/opensearch.yml has insecure file permissions (should be 0600)` messages, you can use `chmod` to set file permissions before running `docker-compose up`. Docker Compose passes files to the container as-is.
{: .note }
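If you prefer to set the permissions from a script rather than with `chmod` directly, a minimal Python sketch might look like the following. The `restrict_to_owner` helper is hypothetical, and the temporary file only stands in for your own `custom-opensearch.yml`.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Set permissions to 0600 and return the resulting mode bits."""
    os.chmod(path, 0o600)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrated on a temporary stand-in for custom-opensearch.yml.
with tempfile.NamedTemporaryFile(suffix=".yml", delete=False) as handle:
    config_path = handle.name
print(oct(restrict_to_owner(config_path)))  # prints 0o600 on Linux and macOS
os.remove(config_path)
```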
Finally, you can reach OpenSearch Dashboards at http://localhost:5601, sign in, and use the **Security** panel to perform other management tasks.

344
docs/install/docker.md Normal file
View File

@ -0,0 +1,344 @@
---
layout: default
title: Docker
parent: Install and configure
nav_order: 1
---
# Docker image
You can pull the OpenSearch Docker image just like any other image:
```bash
docker pull amazon/opensearch:{{site.opensearch_version}}
docker pull amazon/opensearch-dashboards:{{site.opensearch_version}}
```
To check available versions, see [Docker Hub](https://hub.docker.com/r/amazon/opensearch/tags).
OpenSearch images use `centos:7` as the base image. If you run Docker locally, we recommend allowing Docker to use at least 4 GB of RAM in **Preferences** > **Resources**.
---
#### Table of contents
1. TOC
{:toc}
---
## Run the image
To run the image for local development:
```bash
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" amazon/opensearch:{{site.opensearch_version}}
```
Then send requests to the server to verify that OpenSearch is up and running:
```bash
curl -XGET https://localhost:9200 -u 'admin:admin' --insecure
curl -XGET 'https://localhost:9200/_cat/nodes?v' -u 'admin:admin' --insecure
curl -XGET 'https://localhost:9200/_cat/plugins?v' -u 'admin:admin' --insecure
```
To find the container ID:
```bash
docker ps
```
Then you can stop the container using:
```bash
docker stop <container-id>
```
## Start a cluster
To deploy multiple nodes and simulate a more realistic deployment, create a [docker-compose.yml](https://docs.docker.com/compose/compose-file/) file appropriate for your environment and run:
```bash
docker-compose up
```
To stop the cluster, run:
```bash
docker-compose down
```
To stop the cluster and delete all data volumes, run:
```bash
docker-compose down -v
```
#### Sample Docker Compose file
This sample file starts two data nodes and a container for OpenSearch Dashboards.
```yml
version: '3'
services:
  opensearch-node1:
    image: opensearchstaging/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    image: opensearchstaging/opensearch:latest
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchstaging/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      OPENSEARCH_HOSTS: https://opensearch-node1:9200
    networks:
      - opensearch-net
volumes:
  opensearch-data1:
  opensearch-data2:
networks:
  opensearch-net:
```
If you override `opensearch_dashboards.yml` settings using environment variables, as seen above, use all uppercase letters and underscores in place of periods (for example, for `opensearch.url`, specify `OPENSEARCH_URL`).
{: .note}
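The naming rule in the note (uppercase every letter and replace each period with an underscore, as in the `opensearch.url` example) can be sketched as a small helper. The function name is illustrative and not part of OpenSearch Dashboards:

```bash
# Convert a dotted opensearch_dashboards.yml setting name into the
# environment-variable form: uppercase, with periods replaced by underscores.
# to_env_name is a hypothetical helper name.
to_env_name() {
  printf '%s\n' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}

to_env_name "opensearch.hosts"   # prints OPENSEARCH_HOSTS
```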
## Configure OpenSearch
You can pass a custom `opensearch.yml` file to the Docker container using the [`-v` flag](https://docs.docker.com/engine/reference/commandline/run/#mount-volume--v---read-only) for `docker run`:
```bash
docker run \
-p 9200:9200 -p 9600:9600 \
-e "discovery.type=single-node" \
-v /<full-path-to>/custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml \
amazon/opensearch:{{site.opensearch_version}}
```
You can perform the same operation in `docker-compose.yml` using a relative path:
```yml
services:
  opensearch-node1:
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
      - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml
  opensearch-node2:
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
      - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml
  opensearch-dashboards:
    volumes:
      - ./custom-opensearch_dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml
```
You can use this same method to [pass your own certificates](../docker-security/) to the containers for use with the [Security](../../security/configuration/) plugin.
### (Optional) Set up Performance Analyzer
1. Enable the Performance Analyzer plugin:
```bash
curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
```
If you receive the `curl: (52) Empty reply from server` error, you are likely protecting your cluster with the security plugin and you need to provide credentials. Modify the following command to use your username and password:
```bash
curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
```
1. Enable the Root Cause Analyzer (RCA) framework:
```bash
curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
```
As in step 1, if you receive `curl: (52) Empty reply from server`, run the following command with your credentials to enable RCA:
```bash
curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
```
1. By default, Performance Analyzer's endpoints are not accessible from outside the Docker container.
To edit this behavior, open a shell session in the container and modify the configuration:
```bash
docker ps # Look up the container id
docker exec -it <container-id> /bin/bash
# Inside container
cd plugins/opensearch_performance_analyzer/pa_config/
vi performance-analyzer.properties
```
Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`:
```
# ======================== OpenSearch performance analyzer plugin config =========================
# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
# WebService bind host; default to all interfaces
webservice-bind-host = 0.0.0.0
# Metrics data location
metrics-location = /dev/shm/performanceanalyzer/
# Metrics deletion interval (minutes) for metrics data.
# Interval should be between 1 to 60.
metrics-deletion-interval = 1
# If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
# metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
# the files and wouldn't like for them to be cleaned up.
cleanup-metrics-db-files = true
# WebService exposed by App's port
webservice-listener-port = 9600
# Metric DB File Prefix Path location
metrics-db-file-prefix-path = /tmp/metricsdb_
https-enabled = false
#Setup the correct path for certificates
certificate-file-path = specify_path
private-key-file-path = specify_path
# Plugin Stats Metadata file name, expected to be in the same location
plugin-stats-metadata = plugin-stats-metadata
# Agent Stats Metadata file name, expected to be in the same location
agent-stats-metadata = agent-stats-metadata
```
1. Then restart the Performance Analyzer agent:
```bash
kill $(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print $2}')
```
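If you prefer not to open `vi` inside the container, the edit to `webservice-bind-host` in the steps above can be scripted. This is a sketch under the assumption that the commented line starts with `#webservice-bind-host`; the function name is hypothetical:

```bash
# Uncomment webservice-bind-host and set it to 0.0.0.0 without opening an editor.
# Writes through a temporary file so it works with both GNU and BSD sed,
# then copies the result back so the original file keeps its permissions.
# set_pa_bind_host is an illustrative helper name.
set_pa_bind_host() {
  properties_file="$1"
  bind_host="${2:-0.0.0.0}"
  tmp_file="$properties_file.tmp"
  sed "s/^#*[[:space:]]*webservice-bind-host.*/webservice-bind-host = $bind_host/" \
    "$properties_file" > "$tmp_file" &&
    cat "$tmp_file" > "$properties_file" &&
    rm -f "$tmp_file"
}

# Usage (inside the container, from the OpenSearch home directory):
# set_pa_bind_host plugins/opensearch_performance_analyzer/pa_config/performance-analyzer.properties
```

You still need to restart the Performance Analyzer agent afterward for the change to take effect.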
## Bash access to containers
To create an interactive Bash session in a container, run `docker ps` to find the container ID. Then run:
```bash
docker exec -it <container-id> /bin/bash
```
## Important settings
For production workloads, make sure the [Linux setting](https://www.kernel.org/doc/Documentation/sysctl/vm.txt) `vm.max_map_count` is set to at least 262144. On the OpenSearch Docker image, this setting is the default. To verify, start a Bash session in the container and run:
```bash
cat /proc/sys/vm/max_map_count
```
To increase this value, you have to modify the Docker image. For other install types, add this setting to the host machine's `/etc/sysctl.conf` file with the following line:
```
vm.max_map_count=262144
```
Then run `sudo sysctl -p` to reload.
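As a quick sanity check, a sketch like the following compares the current value against the 262144 minimum. The function name is hypothetical; with no argument it reads the value from `/proc`, which assumes a Linux host or container:

```bash
# Compare vm.max_map_count against the 262144 minimum mentioned above.
# With no argument the value is read from /proc (Linux only).
# check_max_map_count is an illustrative helper name.
check_max_map_count() {
  required=262144
  current="${1:-$(cat /proc/sys/vm/max_map_count)}"
  if [ "$current" -ge "$required" ]; then
    echo "vm.max_map_count=$current is sufficient"
  else
    echo "vm.max_map_count=$current is below $required" >&2
    return 1
  fi
}

# Usage: check_max_map_count || echo "Raise vm.max_map_count before going to production"
```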
The `docker-compose.yml` file above also contains several key settings: `bootstrap.memory_lock=true`, `ES_JAVA_OPTS=-Xms512m -Xmx512m`, `nofile 65536` and `port 9600`. Respectively, these settings disable memory swapping (along with `memlock`), set the size of the Java heap (we recommend half of system RAM), set a limit of 65536 open files for the OpenSearch user, and allow you to access Performance Analyzer on port 9600.
## Customize the Docker image
To run the image with a custom plugin, first create a [`Dockerfile`](https://docs.docker.com/engine/reference/builder/):
```
FROM amazon/opensearch:{{site.opensearch_version}}
RUN /usr/share/opensearch/bin/opensearch-plugin install --batch <plugin-name-or-url>
```
Then run the following commands:
```bash
docker build --tag=opensearch-custom-plugin .
docker run -p 9200:9200 -p 9600:9600 -v /usr/share/opensearch/data opensearch-custom-plugin
```
You can also use a `Dockerfile` to pass your own certificates for use with the [Security](../../security/) plugin, similar to the `-v` argument in [Configure OpenSearch](#configure-opensearch):
```
FROM amazon/opensearch:{{site.opensearch_version}}
COPY --chown=opensearch:opensearch opensearch.yml /usr/share/opensearch/config/
COPY --chown=opensearch:opensearch my-key-file.pem /usr/share/opensearch/config/
COPY --chown=opensearch:opensearch my-certificate-chain.pem /usr/share/opensearch/config/
COPY --chown=opensearch:opensearch my-root-cas.pem /usr/share/opensearch/config/
```
Alternatively, you might want to remove a plugin. This `Dockerfile` removes the security plugin:
```
FROM amazon/opensearch:{{site.opensearch_version}}
RUN /usr/share/opensearch/bin/opensearch-plugin remove opensearch_security
COPY --chown=opensearch:opensearch opensearch.yml /usr/share/opensearch/config/
```
In this case, `opensearch.yml` is a "vanilla" version of the file with no security plugin entries. It might look like this:
```yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
```

10
docs/install/index.md Normal file
View File

@ -0,0 +1,10 @@
---
layout: default
title: Install and configure
nav_order: 3
has_children: true
---
# Install and configure OpenSearch
OpenSearch has two installation options at this time: Docker images and tarballs.

257
docs/install/plugins.md Normal file
View File

@ -0,0 +1,257 @@
---
layout: default
title: OpenSearch plugin install
parent: Install and configure
nav_order: 90
---
# Standalone OpenSearch plugin installation
If you don't want to use the all-in-one OpenSearch installation options, you can install the individual plugins on a compatible OpenSearch cluster, just like any other plugin.
---
#### Table of contents
1. TOC
{:toc}
---
## Plugin compatibility
<table>
<thead style="text-align: left">
<tr>
<th>OpenSearch version</th>
<th>Plugin versions</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.0.0-beta1</td>
<td>
<pre>opensearch-alerting 1.13.1.0
opensearch-anomaly-detection 1.13.0.0
opensearch-asynchronous-search 1.13.0.1
opensearch-index-management 1.13.2.0
opensearch-job-scheduler 1.13.0.0
opensearch-knn 1.13.0.0
opensearch-performance-analyzer 1.13.0.0
opensearch-reports-scheduler 1.13.0.0
opensearch-sql 1.13.2.0
opensearch_security 1.13.1.0
</pre>
</td>
</tr>
</tbody>
</table>
To install plugins manually, you must have the exact OSS version of OpenSearch installed (for example, 6.6.2 and not 6.6.1). To get a list of available OpenSearch versions on CentOS 7 and Amazon Linux 2, run the following command:
```bash
sudo yum list opensearch-oss --showduplicates
```
Then you can specify the version that you need:
```bash
sudo yum install opensearch-oss-6.7.1
```
## Install plugins
Navigate to the OpenSearch home directory (most likely, it is `/usr/share/opensearch`), and run the install command for each plugin.
### Security
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-security/opensearch-security-{{site.opensearch_major_minor_version}}.1.0.zip
```
After installing the security plugin, you can run `sudo sh /usr/share/opensearch/plugins/opensearch_security/tools/install_demo_configuration.sh` to quickly get started with demo certificates. Otherwise, you must configure it manually and run [securityadmin.sh](../../security/configuration/security-admin/).
The security plugin has a corresponding [OpenSearch Dashboards plugin](../../opensearch-dashboards/plugins) that you probably want to install as well.
### Job scheduler
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-job-scheduler/opensearch-job-scheduler-{{site.opensearch_major_minor_version}}.0.0.zip
```
### Alerting
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-alerting/opensearch-alerting-{{site.opensearch_major_minor_version}}.1.0.zip
```
To install Alerting, you must first install the Job Scheduler plugin. Alerting has a corresponding [OpenSearch Dashboards plugin](../../opensearch-dashboards/plugins) that you probably want to install as well.
### SQL
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-sql/opensearch-sql-{{site.opensearch_major_minor_version}}.2.0.zip
```
### Reports scheduler
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-reports-scheduler/opensearch-reports-scheduler-{{site.opensearch_major_minor_version}}.0.0.zip
```
### Index State Management
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-index-management/opensearch-index-management-{{site.opensearch_major_minor_version}}.2.0.zip
```
To install Index State Management, you must first install the Job Scheduler plugin. ISM has a corresponding [OpenSearch Dashboards plugin](../../opensearch-dashboards/plugins) that you probably want to install as well.
### k-NN
k-NN is only available as part of the all-in-one installs: Docker, RPM, and Debian.
### Anomaly detection
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-anomaly-detection/opensearch-anomaly-detection-{{site.opensearch_major_minor_version}}.0.0.zip
```
### Asynchronous search
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-asynchronous-search/opensearch-asynchronous-search-{{site.opensearch_major_minor_version}}.0.1.zip
```
### Performance Analyzer
```bash
sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/performance-analyzer/opensearch-performance-analyzer-{{site.opensearch_major_minor_version}}.0.0.zip
```
Performance Analyzer requires some manual configuration after installing the plugin:
1. Create `/usr/lib/systemd/system/opensearch-performance-analyzer.service` based on [this file](https://github.com/opensearch-project/performance-analyzer/blob/master/packaging/opensearch-performance-analyzer.service).
1. Make the CLI executable:
```bash
sudo chmod +x /usr/share/opensearch/bin/performance-analyzer-agent-cli
```
1. Run the appropriate `postinst` script for your Linux distribution:
```bash
# Debian-based distros
sudo sh /usr/share/opensearch/plugins/opensearch-performance-analyzer/install/deb/postinst.sh 1
# RPM distros
sudo sh /usr/share/opensearch/plugins/opensearch-performance-analyzer/install/rpm/postinst.sh 1
```
1. Make Performance Analyzer accessible outside of the host machine:
```bash
cd /usr/share/opensearch # navigate to the OpenSearch home directory
cd plugins/opensearch_performance_analyzer/pa_config/
vi performance-analyzer.properties
```
Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`:
```bash
# ======================== OpenSearch performance analyzer plugin config =========================
# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
# WebService bind host; default to all interfaces
webservice-bind-host = 0.0.0.0
# Metrics data location
metrics-location = /dev/shm/performanceanalyzer/
# Metrics deletion interval (minutes) for metrics data.
# Interval should be between 1 to 60.
metrics-deletion-interval = 1
# If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
# metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
# the files and wouldn't like for them to be cleaned up.
cleanup-metrics-db-files = true
# WebService exposed by App's port
webservice-listener-port = 9600
# Metric DB File Prefix Path location
metrics-db-file-prefix-path = /tmp/metricsdb_
https-enabled = false
#Setup the correct path for certificates
certificate-file-path = specify_path
private-key-file-path = specify_path
# Plugin Stats Metadata file name, expected to be in the same location
plugin-stats-metadata = plugin-stats-metadata
# Agent Stats Metadata file name, expected to be in the same location
agent-stats-metadata = agent-stats-metadata
```
1. Start the OpenSearch service:
```bash
sudo systemctl start opensearch.service
```
1. Send a test request:
```bash
curl -XGET "localhost:9600/_opensearch/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg,max&dim=ShardID&nodes=all"
```
## List installed plugins
To check your installed plugins:
```bash
sudo bin/opensearch-plugin list
```
## Remove plugins
If you are removing Performance Analyzer, see below. Otherwise, you can remove the plugin with a single command:
```bash
sudo bin/opensearch-plugin remove <plugin-name>
```
Then restart OpenSearch on the node:
```bash
sudo systemctl restart opensearch.service
```
## Update plugins
OpenSearch doesn't update plugins. Instead, you have to remove and reinstall them:
```bash
sudo bin/opensearch-plugin remove <plugin-name>
sudo bin/opensearch-plugin install <plugin-name>
```

178
docs/install/tar.md Normal file
View File

@ -0,0 +1,178 @@
---
layout: default
title: Tarball
parent: Install and configure
nav_order: 50
---
# Tarball
The tarball installation works on Linux systems and provides a self-contained directory with everything you need to run OpenSearch, including an integrated Java Development Kit (JDK). The tarball is a good option for testing and development.
The tarball supports CentOS 7, Amazon Linux 2, Ubuntu 18.04, and most other Linux distributions. If you have your own Java installation and you set `JAVA_HOME` in the terminal, macOS works as well.
1. Download the tarball:
```bash
# x64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-x64.tar.gz -o opensearch-{{site.opensearch_version}}-linux-x64.tar.gz
# ARM64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz -o opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz
```
1. Download the checksum:
```bash
# x64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 -o opensearch-{{site.opensearch_version}}-linux-x64.tar.gz.sha512
# ARM64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 -o opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512
```
1. Verify the tarball against the checksum:
```bash
# x64
shasum -a 512 -c opensearch-{{site.opensearch_version}}-linux-x64.tar.gz.sha512
# ARM64
shasum -a 512 -c opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512
```
On CentOS, you might not have `shasum`. Install this package:
```bash
sudo yum install perl-Digest-SHA
```
Due to a [known issue](https://github.com/opensearch/opensearch-build/issues/81) with the checksum, this step might fail. You can still proceed with the installation.
1. Extract the TAR file to a directory and change to that directory:
```bash
# x64
tar -zxf opensearch-{{site.opensearch_version}}-linux-x64.tar.gz
cd opensearch-{{site.opensearch_version}}
# ARM64
tar -zxf opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz
cd opensearch-{{site.opensearch_version}}
```
1. Run OpenSearch:
```bash
./opensearch-tar-install.sh
```
1. Open a second terminal session, and send requests to the server to verify that OpenSearch is up and running:
```bash
curl -XGET https://localhost:9200 -u 'admin:admin' --insecure
curl -XGET https://localhost:9200/_cat/plugins?v -u 'admin:admin' --insecure
```
## Configuration
You can modify `config/opensearch.yml` or specify settings as arguments using `-E`:
```bash
./opensearch-tar-install.sh -Ecluster.name=opensearch-cluster -Enode.name=opensearch-node1 -Ehttp.host=0.0.0.0 -Ediscovery.type=single-node
```
For other settings, see [Important settings](../docker/#important-settings).
### (Optional) Set up Performance Analyzer
In a tarball installation, Performance Analyzer collects data when it is enabled. But in order to read that data using the REST API on port 9600, you must first manually launch the associated reader agent process:
1. Make Performance Analyzer accessible outside of the host machine:
```bash
cd /usr/share/opensearch # navigate to the OpenSearch home directory
cd plugins/opensearch_performance_analyzer/pa_config/
vi performance-analyzer.properties
```
Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`:
```
# ======================== OpenSearch performance analyzer plugin config =========================
# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
# WebService bind host; default to all interfaces
webservice-bind-host = 0.0.0.0
# Metrics data location
metrics-location = /dev/shm/performanceanalyzer/
# Metrics deletion interval (minutes) for metrics data.
# Interval should be between 1 to 60.
metrics-deletion-interval = 1
# If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
# metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
# the files and wouldn't like for them to be cleaned up.
cleanup-metrics-db-files = true
# WebService exposed by App's port
webservice-listener-port = 9600
# Metric DB File Prefix Path location
metrics-db-file-prefix-path = /tmp/metricsdb_
https-enabled = false
#Setup the correct path for certificates
certificate-file-path = specify_path
private-key-file-path = specify_path
# Plugin Stats Metadata file name, expected to be in the same location
plugin-stats-metadata = plugin-stats-metadata
# Agent Stats Metadata file name, expected to be in the same location
agent-stats-metadata = agent-stats-metadata
```
1. Make the CLI executable:
```bash
sudo chmod +x ./bin/performance-analyzer-agent-cli
```
1. Launch the agent CLI:
```bash
ES_HOME="$PWD" ./bin/performance-analyzer-agent-cli
```
1. In a separate window, enable the Performance Analyzer plugin:
```bash
curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
```
If you receive the `curl: (52) Empty reply from server` error, you are likely protecting your cluster with the security plugin and you need to provide credentials. Modify the following command to use your username and password:
```bash
curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
```
1. Finally, enable the Root Cause Analyzer (RCA) framework:
```bash
curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
```
As in step 4, if you receive `curl: (52) Empty reply from server`, run the following command with your credentials to enable RCA:
```bash
curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
```
### (Optional) Removing Performance Analyzer
See [Clean up Performance Analyzer files](../plugins/#optional-clean-up-performance-analyzer-files).

153
docs/knn/api.md Normal file
View File

@ -0,0 +1,153 @@
---
layout: default
title: API
nav_order: 4
parent: k-NN
has_children: false
---
# API
The k-NN plugin adds two API operations in order to allow users to better manage the plugin's functionality.
## Stats
The k-NN `stats` API provides information about the current status of the k-NN plugin. The plugin keeps track of both cluster-level and node-level stats. Cluster-level stats have a single value for the entire cluster. Node-level stats have a single value for each node in the cluster. You can filter the query by node ID and stat name in the following way:
```
GET /_opensearch/_knn/nodeId1,nodeId2/stats/statName1,statName2
```
Statistic | Description
:--- | :---
`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search.
`total_load_time` | The time in nanoseconds that k-NN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search.
`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. Note: Explicit evictions that occur because of index deletion are not counted. This statistic is only relevant to approximate k-NN search.
`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This statistic is only relevant to approximate k-NN search.
`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This statistic is only relevant to approximate k-NN search.
`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search.
`graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity.
`graph_index_requests` | The number of requests to add the `knn_vector` field of a document into a graph.
`graph_index_errors` | The number of requests to add the `knn_vector` field of a document into a graph that have produced an error.
`graph_query_requests` | The number of graph queries that have been made.
`graph_query_errors` | The number of graph queries that have produced an error.
`knn_query_requests` | The number of k-NN query requests received.
`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search.
`load_success_count` | The number of times k-NN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search.
`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search.
`indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs that index has and the total `graph_memory_usage` that index is using, in kilobytes.
`script_compilations` | The number of times the k-NN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the k-NN script might be recompiled. This statistic is only relevant to k-NN score script search.
`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search.
`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search.
`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search.
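To illustrate how the node and stat filters combine into a request path, here is a small sketch. The function name is hypothetical, and only the unfiltered form and the form with both filters appear in this document; treating each filter as independently optional is an assumption:

```bash
# Build the request path for the k-NN stats API from optional filters.
# Both arguments are comma-separated lists; pass "" to skip a filter.
# knn_stats_path is an illustrative helper name. Using only one of the
# two filters is an assumption, not something this page confirms.
knn_stats_path() {
  node_ids="$1"
  stat_names="$2"
  path="/_opensearch/_knn"
  [ -n "$node_ids" ] && path="$path/$node_ids"
  path="$path/stats"
  [ -n "$stat_names" ] && path="$path/$stat_names"
  echo "$path"
}

# Usage:
# curl -XGET "https://localhost:9200$(knn_stats_path "nodeId1" "hit_count,miss_count")" -u 'admin:admin' --insecure
```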
### Usage
```json
GET /_opensearch/_knn/stats?pretty
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "_run",
  "circuit_breaker_triggered" : false,
  "nodes" : {
    "HYMrXXsBSamUkcAjhjeN0w" : {
      "eviction_count" : 0,
      "miss_count" : 1,
      "graph_memory_usage" : 1,
      "graph_memory_usage_percentage" : 3.68,
      "graph_index_requests" : 7,
      "graph_index_errors" : 1,
      "knn_query_requests" : 4,
      "graph_query_requests" : 30,
      "graph_query_errors" : 15,
      "indices_in_cache" : {
        "myindex" : {
          "graph_memory_usage" : 2,
          "graph_memory_usage_percentage" : 3.68,
          "graph_count" : 2
        }
      },
      "cache_capacity_reached" : false,
      "load_exception_count" : 0,
      "hit_count" : 0,
      "load_success_count" : 1,
      "total_load_time" : 2878745,
      "script_compilations" : 1,
      "script_compilation_errors" : 0,
      "script_query_requests" : 534,
      "script_query_errors" : 0
    }
  }
}
```
```json
GET /_opensearch/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,graph_memory_usage?pretty
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "_run",
  "circuit_breaker_triggered" : false,
  "nodes" : {
    "HYMrXXsBSamUkcAjhjeN0w" : {
      "graph_memory_usage" : 1
    }
  }
}
```
## Warmup operation
The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory.
If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory.
After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory.
### Usage
This request performs a warmup on three indices:
```json
GET /_opensearch/_knn/warmup/index1,index2,index3?pretty
{
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  }
}
```
`total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up.
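If you call the warmup operation from a script, you may want to fail fast when any shard could not be warmed up. The following is a rough sketch, not a robust JSON parser: it assumes the simple response shape shown above, and the function name is hypothetical:

```bash
# Read a warmup response on stdin and fail if any shard could not be warmed up.
# Uses sed instead of a JSON parser, so it only suits the simple response
# shape shown above. warmup_succeeded is an illustrative helper name.
warmup_succeeded() {
  failed=$(tr -d ' \n' | sed -n 's/.*"failed":\([0-9][0-9]*\).*/\1/p')
  if [ -z "$failed" ]; then
    echo "Could not find a failed-shard count in the response" >&2
    return 2
  fi
  [ "$failed" -eq 0 ]
}

# Usage:
# curl -s -XGET "https://localhost:9200/_opensearch/_knn/warmup/index1" -u 'admin:admin' --insecure | warmup_succeeded
```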
The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the OpenSearch `_tasks` API:
```json
GET /_tasks
```
After the operation has finished, use the [k-NN `stats` API operation](#stats) to see what the k-NN plugin loaded into native memory.
### Best practices
For the warmup operation to function properly, follow these best practices.
First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present.
Second, confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit setting](../settings/#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again.
Finally, don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes.

162
docs/knn/approximate-knn.md Normal file
View File

@ -0,0 +1,162 @@
---
layout: default
title: Approximate Search
nav_order: 1
parent: k-NN
has_children: false
has_math: true
---
# Approximate k-NN Search
The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach should be preferred.
This plugin builds an HNSW graph of the vectors for each `knn_vector` field/Lucene segment pair during indexing. During search, the plugin uses these graphs to efficiently find the k-nearest neighbors to a query vector. To learn more about Lucene segments, refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about preloading graphs into memory, refer to the [warmup API](../api#warmup). To see which graphs are already loaded in memory, refer to the [stats API section](../api#stats).
Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search.
## Get started with approximate k-NN
To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with `index.knn` set to `true`. This setting tells the plugin to create HNSW graphs for the index.
Additionally, if you are using the approximate k-nearest neighbor method, you should set `index.knn.space_type` to the space that you want to use. This setting cannot be changed after it is set. To see which spaces the plugin supports, refer to the [spaces section](#spaces). By default, `index.knn.space_type` is `l2`. For more information about index settings, such as algorithm parameters that you can adjust to tune performance, refer to the [settings documentation](../settings#index-settings).
Next, you must add one or more fields of the `knn_vector` data type. Here is an example that creates an index with two `knn_vector` fields and uses cosine similarity:
```json
PUT my-knn-index-1
{
"settings": {
"index": {
"knn": true,
"knn.space_type": "cosinesimil"
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 2
},
"my_vector2": {
"type": "knn_vector",
"dimension": 4
}
}
}
}
```
The `knn_vector` data type supports a vector of floats that can have a dimension of up to 10,000, as set by the dimension mapping parameter.
In OpenSearch, codecs handle the storage and retrieval of indices. The k-NN plugin uses a custom codec to write vector data to graphs so that the underlying k-NN search library can read it.
{: .tip }
After you create the index, you can add some data to it:
```json
POST _bulk
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{ "my_vector1": [1.5, 2.5], "price": 12.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{ "my_vector1": [2.5, 3.5], "price": 7.1 }
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{ "my_vector1": [3.5, 4.5], "price": 12.9 }
{ "index": { "_index": "my-knn-index-1", "_id": "4" } }
{ "my_vector1": [5.5, 6.5], "price": 1.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "5" } }
{ "my_vector1": [4.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-knn-index-1", "_id": "6" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 }
{ "index": { "_index": "my-knn-index-1", "_id": "7" } }
{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 }
{ "index": { "_index": "my-knn-index-1", "_id": "8" } }
{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 }
{ "index": { "_index": "my-knn-index-1", "_id": "9" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 }
```
Then you can execute an approximate nearest neighbor search on the data using the `knn` query type:
```json
GET my-knn-index-1/_search
{
"size": 2,
"query": {
"knn": {
"my_vector2": {
"vector": [2, 3, 5, 6],
"k": 2
}
}
}
}
```
`k` is the number of neighbors that the search of each graph returns. You must also include the `size` option, which indicates how many results the query actually returns. The plugin returns `k` results for each shard (and each segment) and `size` results for the entire query. The plugin supports a maximum `k` value of 10,000.
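The relationship between `k` and `size` can be sketched in Python. This is only an illustration under simplifying assumptions: a brute-force scan stands in for each segment's HNSW graph, and the `search_segment` and `search_index` helpers and the two-segment layout are hypothetical, not part of the plugin.

```python
import heapq


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_segment(segment, query, k):
    """Return up to k (distance, doc_id) pairs from one segment."""
    scored = ((l2_squared(vec, query), doc_id) for doc_id, vec in segment.items())
    return heapq.nsmallest(k, scored)


def search_index(segments, query, k, size):
    """Collect k candidates per segment, then keep the best `size` overall."""
    candidates = []
    for segment in segments:
        candidates.extend(search_segment(segment, query, k))
    return [doc_id for _, doc_id in heapq.nsmallest(size, candidates)]


# Hypothetical layout: the my_vector1 documents split across two segments.
segments = [
    {"1": [1.5, 2.5], "2": [2.5, 3.5], "3": [3.5, 4.5]},
    {"4": [5.5, 6.5], "5": [4.5, 5.5]},
]

# k=2 yields up to 4 candidates (2 per segment); size=3 trims them to 3 hits.
hits = search_index(segments, query=[2.0, 3.0], k=2, size=3)
print(hits)
```

In this sketch, a `size` larger than the total number of candidates simply returns every candidate, which is one reason a query can return fewer hits than `size`.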
### Using approximate k-NN with filters
If you use the `knn` query alongside filters or other clauses (e.g. `bool`, `must`, `match`), you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1:
```json
GET my-knn-index-1/_search
{
"size": 2,
"query": {
"knn": {
"my_vector2": {
"vector": [2, 3, 5, 6],
"k": 2
}
}
},
"post_filter": {
"range": {
"price": {
"gte": 5,
"lte": 10
}
}
}
}
```
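The effect of `post_filter` can be reproduced offline with a minimal Python sketch. It assumes an exact brute-force scan in place of the approximate HNSW search and reuses the `my_vector2` documents from the bulk request earlier in this section; the variable names are illustrative.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, as in the cosinesimil space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# The my_vector2 documents from the bulk request earlier in this section.
docs = [
    {"_id": "6", "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3},
    {"_id": "7", "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5},
    {"_id": "8", "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4},
    {"_id": "9", "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9},
]
query, k = [2, 3, 5, 6], 2

# Step 1: the k-NN search picks k neighbors without looking at price.
# Documents 6 and 9 hold identical vectors; this stable sort keeps 6 first,
# whereas a real search could return either one.
neighbors = sorted(docs, key=lambda d: cosine_distance(d["my_vector2"], query))[:k]

# Step 2: post_filter drops neighbors outside the price range afterward.
filtered = [d["_id"] for d in neighbors if 5 <= d["price"] <= 10]

print([d["_id"] for d in neighbors], filtered)
```

Because the filter runs after the neighbors are chosen, a neighbor that fails the filter is not replaced by the next-closest document.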
## Spaces
A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, the plugin computes 1 / (1 + distance). Currently, the k-NN plugin supports the following spaces:
<table>
<thead style="text-align: left">
<tr>
<th>spaceType</th>
<th>Distance Function</th>
<th>OpenSearch Score</th>
</tr>
</thead>
<tr>
<td>l2</td>
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \]</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr>
<td>l1</td>
<td>\[ Distance(X, Y) = \sum_{i=1}^n \lvert X_i - Y_i \rvert \]</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr>
<td>cosinesimil</td>
<td>\[ 1 - {A &middot; B \over \|A\| &middot; \|B\|} = 1 -
{\sum_{i=1}^n (A_i &middot; B_i) \over \sqrt{\sum_{i=1}^n A_i^2} &middot; \sqrt{\sum_{i=1}^n B_i^2}}\]
where \(\|A\|\) and \(\|B\|\) represent the Euclidean norms (magnitudes) of vectors A and B.</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr>
<td>hammingbit</td>
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>
<td>1 / (1 + Distance Function)</td>
</tr>
</table>
The cosine similarity formula does not include the `1 - ` prefix. However, because nmslib equates smaller scores with closer results, it returns `1 - cosineSimilarity` for its cosine similarity space, which is why `1 - ` is included in the distance function.
{: .note }
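As a worked illustration of the table, the following Python sketch computes each distance and applies the 1 / (1 + distance) conversion. The function names are illustrative and this is not the plugin's or nmslib's implementation; the Hamming helper assumes non-negative integers that encode the bit strings.

```python
import math


def l2_distance(x, y):
    """Sum of squared differences, for the l2 space."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def l1_distance(x, y):
    """Sum of absolute differences (Manhattan distance), for the l1 space."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosinesimil_distance(x, y):
    """1 - cosine similarity, for the cosinesimil space."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norms


def hammingbit_distance(x, y):
    """Number of set bits in x XOR y, for non-negative integers."""
    return bin(x ^ y).count("1")


def to_score(distance):
    """Convert a distance to an OpenSearch score: 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)


print(to_score(l2_distance([1.0, 2.0], [3.0, 5.0])))           # distance 13
print(to_score(l1_distance([1.0, 2.0], [3.0, 5.0])))           # distance 5
print(to_score(cosinesimil_distance([1.0, 0.0], [0.0, 1.0])))  # distance 1
print(to_score(hammingbit_distance(0b1010, 0b0110)))           # distance 2
```

A distance of 0 maps to the maximum score of 1, and larger distances approach a score of 0 without reaching it.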

42
docs/knn/index.md Normal file
View File

@ -0,0 +1,42 @@
---
layout: default
title: k-NN
nav_order: 50
has_children: true
has_toc: false
---
# k-NN
Short for *k-nearest neighbors*, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of vectors. To determine the neighbors, you can specify the space (the distance function) you want to use to measure the distance between points.
Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For more background information on k-NN search, see [Wikipedia](https://en.wikipedia.org/wiki/Nearest_neighbor_search).
This plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors:
1. **Approximate k-NN**
The first method takes an approximate nearest neighbor approach; it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320).
Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched. In this case, you should use either the script scoring method or painless extensions.
For more details about this method, refer to the [Approximate k-NN section](approximate-knn).
2. **Script Score k-NN**
The second method extends OpenSearch's script scoring functionality to execute a brute force, exact k-NN search over "knn_vector" fields or fields that can represent binary objects. With this approach, you can run k-NN search on a subset of vectors in your index (sometimes referred to as a pre-filter search).
This approach should be used for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indices may lead to high latencies.
For more details about this method, refer to the [k-NN Script Score section](knn-score-script).
3. **Painless extensions**
The third method adds the distance functions as painless extensions that you can use in more complex combinations. Similar to the k-NN Script Score, you can use this method to perform a brute force, exact k-NN search across an index, which also supports pre-filtering.
This approach has slightly slower query performance compared to the k-NN Script Score. If your use case requires more customization over the final score, you should use this approach over Script Score k-NN.
For more details about this method, refer to the [painless functions section](painless-functions).
Overall, for larger data sets, you should generally choose the approximate nearest neighbor method because it scales significantly better. For smaller data sets, where you may want to apply a filter, you should choose the custom scoring approach. If you have a more complex use case where you need to use a distance function as part of your scoring method, you should use the Painless scripting approach.

10
docs/knn/jni-library.md Normal file
View File

@ -0,0 +1,10 @@
---
layout: default
title: JNI Library
nav_order: 5
parent: k-NN
has_children: false
---
# JNI Library
To integrate [nmslib's](https://github.com/nmslib/nmslib/) approximate k-NN functionality, which is implemented in C++, into the k-NN plugin, which is implemented in Java, we created a Java Native Interface (JNI) library. To learn more about JNI, see the [Java Native Interface article on Wikipedia](https://en.wikipedia.org/wiki/Java_Native_Interface). This library allows the k-NN plugin to use nmslib's functionality. For more information about how we build the JNI library binary and how to get the most out of it in your production environment, see [JNI library artifacts](https://github.com/opensearch-project/k-NN#jni-library-artifacts).

View File

@ -0,0 +1,212 @@
---
layout: default
title: Exact k-NN with Scoring Script
nav_order: 2
parent: k-NN
has_children: false
has_math: true
---
# Exact k-NN with Scoring Script
The k-NN plugin implements the OpenSearch score script plugin that you can use to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions. Because this approach executes a brute force search, it does not scale as well as the [Approximate approach](../approximate-knn). In some cases, it may be better to think about refactoring your workflow or index structure to use the Approximate approach instead of this approach.
## Getting started with the score script
Similar to approximate nearest neighbor search, in order to use the score script on a body of vectors, you must first create an index with one or more `knn_vector` fields. If you intend to use only the score script approach (and not the approximate approach), you can set `index.knn` to `false` and you don't need to set `index.knn.space_type`. You can choose the space type during search. See the [spaces section](#spaces) for the spaces that the k-NN score script supports. Here is an example that creates an index with two `knn_vector` fields:
```json
PUT my-knn-index-1
{
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 2
},
"my_vector2": {
"type": "knn_vector",
"dimension": 4
}
}
}
}
```
*Note* -- For binary spaces, such as the Hamming bit space, `type` needs to be either `binary` or `long`. The binary data then needs to be encoded either as a base64 string or as a long (if the data is 64 bits or less).
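To make the two encodings concrete, here is a minimal Python sketch that encodes one bit pattern both as a long and as a base64 string, and then counts differing bits with XOR. It illustrates the distance definition only; the helper names and sample values are assumptions, and the plugin's own handling of binary fields may differ.

```python
import base64


def hamming_distance_long(x, y):
    """Count differing bits between two values that fit in 64 bits."""
    mask = (1 << 64) - 1  # view signed longs as 64-bit patterns
    return bin((x ^ y) & mask).count("1")


def hamming_distance_base64(x_b64, y_b64):
    """Count differing bits between two equal-length base64-encoded values."""
    x, y = base64.b64decode(x_b64), base64.b64decode(y_b64)
    if len(x) != len(y):
        raise ValueError("binary values must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


# The same 16-bit pattern encoded both ways (illustrative values).
as_long = 0b1111000010101010
as_base64 = base64.b64encode(as_long.to_bytes(2, "big")).decode("ascii")
other_base64 = base64.b64encode(b"\xf0\xa0").decode("ascii")

print(as_base64)
print(hamming_distance_long(as_long, 0b1111000010100000))
print(hamming_distance_base64(as_base64, other_base64))
```

Both calls compare the same pair of bit patterns, so they report the same distance.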
If you *only* want to use the score script, you can omit `"index.knn": true`. The benefit of this approach is faster indexing speed and lower memory usage, but you lose the ability to perform standard k-NN queries on the index.
{: .tip}
After you create the index, you can add some data to it:
```json
POST _bulk
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{ "my_vector1": [1.5, 2.5], "price": 12.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{ "my_vector1": [2.5, 3.5], "price": 7.1 }
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{ "my_vector1": [3.5, 4.5], "price": 12.9 }
{ "index": { "_index": "my-knn-index-1", "_id": "4" } }
{ "my_vector1": [5.5, 6.5], "price": 1.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "5" } }
{ "my_vector1": [4.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-knn-index-1", "_id": "6" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 }
{ "index": { "_index": "my-knn-index-1", "_id": "7" } }
{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 }
{ "index": { "_index": "my-knn-index-1", "_id": "8" } }
{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 }
{ "index": { "_index": "my-knn-index-1", "_id": "9" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 }
```
Finally, you can execute an exact nearest neighbor search on the data using the `knn` script:
```json
GET my-knn-index-1/_search
{
"size": 4,
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "knn_score",
"lang": "knn",
"params": {
"field": "my_vector2",
"query_value": [2.0, 3.0, 5.0, 6.0],
"space_type": "cosinesimil"
}
}
}
}
}
```
All parameters are required.
- `lang` is the script type. This value is usually `painless`, but here you must specify `knn`.
- `source` is the name of the script, `knn_score`.
This script is part of the k-NN plugin and isn't available at the standard `_scripts` path. A GET request to `_cluster/state/metadata` doesn't return it, either.
- `field` is the field that contains your vector data.
- `query_value` is the point you want to find the nearest neighbors for. For the Euclidean and cosine similarity spaces, the value must be an array of floats that matches the dimension set in the field's mapping. For Hamming bit distance, this value can be either of type signed long or a base64-encoded string (for the long and binary field types, respectively).
- `space_type` corresponds to the distance function. See the [spaces section](#spaces).
*Note* -- In later versions of the k-NN plugin, `vector` was replaced by `query_value` due to the addition of the `hammingbit` space.
The [post filter example in the approximate approach](../approximate-knn/#using-approximate-k-nn-with-filters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over.
This example shows a pre-filter approach to k-NN search with the score script approach. First, create the index:
```json
PUT my-knn-index-2
{
"mappings": {
"properties": {
"my_vector": {
"type": "knn_vector",
"dimension": 2
},
"color": {
"type": "keyword"
}
}
}
}
```
Then add some documents:
```json
POST _bulk
{ "index": { "_index": "my-knn-index-2", "_id": "1" } }
{ "my_vector": [1, 1], "color" : "RED" }
{ "index": { "_index": "my-knn-index-2", "_id": "2" } }
{ "my_vector": [2, 2], "color" : "RED" }
{ "index": { "_index": "my-knn-index-2", "_id": "3" } }
{ "my_vector": [3, 3], "color" : "RED" }
{ "index": { "_index": "my-knn-index-2", "_id": "4" } }
{ "my_vector": [10, 10], "color" : "BLUE" }
{ "index": { "_index": "my-knn-index-2", "_id": "5" } }
{ "my_vector": [20, 20], "color" : "BLUE" }
{ "index": { "_index": "my-knn-index-2", "_id": "6" } }
{ "my_vector": [30, 30], "color" : "BLUE" }
```
Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors:
```json
GET my-knn-index-2/_search
{
"size": 2,
"query": {
"script_score": {
"query": {
"bool": {
"filter": {
"term": {
"color": "BLUE"
}
}
}
},
"script": {
"lang": "knn",
"source": "knn_score",
"params": {
"field": "my_vector",
"query_value": [9.9, 9.9],
"space_type": "l2"
}
}
}
}
}
```
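To see what the pre-filter query computes, the following Python sketch replays it offline on the six documents above. It assumes the l2 scoring from the spaces table, 1 / (1 + squared Euclidean distance), and the helper name `l2_score` is illustrative rather than part of the plugin.

```python
def l2_score(vector, query):
    """Score for the l2 space: 1 / (1 + squared Euclidean distance)."""
    distance = sum((a - b) ** 2 for a, b in zip(vector, query))
    return 1.0 / (1.0 + distance)


# The documents indexed into my-knn-index-2 above.
docs = {
    "1": {"my_vector": [1, 1], "color": "RED"},
    "2": {"my_vector": [2, 2], "color": "RED"},
    "3": {"my_vector": [3, 3], "color": "RED"},
    "4": {"my_vector": [10, 10], "color": "BLUE"},
    "5": {"my_vector": [20, 20], "color": "BLUE"},
    "6": {"my_vector": [30, 30], "color": "BLUE"},
}
query_value, size = [9.9, 9.9], 2

# Step 1: the filter narrows the candidate set to BLUE documents.
candidates = {i: d for i, d in docs.items() if d["color"] == "BLUE"}

# Step 2: every remaining candidate is scored exactly; the top `size` win.
ranked = sorted(
    candidates,
    key=lambda i: l2_score(candidates[i]["my_vector"], query_value),
    reverse=True,
)[:size]

print(ranked)
```

Because the filter runs first, the search still returns `size` results as long as at least that many documents pass the filter.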
## Spaces
A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. We include the conversions to OpenSearch scores in the table below:
<table>
<thead style="text-align: left">
<tr>
<th>spaceType</th>
<th>Distance Function</th>
<th>OpenSearch Score</th>
</tr>
</thead>
<tr>
<td>l2</td>
<td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \]</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr>
<td>l1</td>
<td>\[ Distance(X, Y) = \sum_{i=1}^n \lvert X_i - Y_i \rvert \]</td>
<td>1 / (1 + Distance Function)</td>
</tr>
<tr>
<td>cosinesimil</td>
<td>\[ {A &middot; B \over \|A\| &middot; \|B\|} =
{\sum_{i=1}^n (A_i &middot; B_i) \over \sqrt{\sum_{i=1}^n A_i^2} &middot; \sqrt{\sum_{i=1}^n B_i^2}}\]
where \(\|A\|\) and \(\|B\|\) represent the Euclidean norms (magnitudes) of vectors A and B.</td>
<td>1 + Distance Function</td>
</tr>
<tr>
<td>hammingbit</td>
<td style="text-align:center">Distance = countSetBits(X \(\oplus\) Y)</td>
<td> 1 / (1 + Distance Function)</td>
</tr>
</table>
Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score.
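A short Python check of that conversion, using unit vectors chosen for illustration (the function names are not part of the plugin):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def score_script_cosinesimil(a, b):
    """Score script conversion for cosinesimil: 1 + cosine similarity."""
    return 1.0 + cosine_similarity(a, b)


# Opposite, orthogonal, and identical directions relative to [1, 0].
for other in ([-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]):
    print(other, score_script_cosinesimil([1.0, 0.0], other))
```

The resulting scores span 0 (opposite direction) to 2 (same direction), so they never fall below 0.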

View File

@ -0,0 +1,83 @@
---
layout: default
title: k-NN Painless Extensions
nav_order: 3
parent: k-NN
has_children: false
has_math: true
---
# Painless Scripting Functions
With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin adds Painless Scripting extensions to a few of the distance functions used in [k-NN score script](../knn-score-script), so you can utilize them when you need more customization with respect to your k-NN workload.
## Get started with k-NN's Painless Scripting functions
To use k-NN's Painless Scripting functions, you must first create an index with `knn_vector` fields, as in the [k-NN score script](../knn-score-script#getting-started-with-the-score-script) section. After you create the index and ingest some data, you can use the Painless extensions:
```json
GET my-knn-index-2/_search
{
"size": 2,
"query": {
"script_score": {
"query": {
"bool": {
"filter": {
"term": {
"color": "BLUE"
}
}
}
},
"script": {
"source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
"params": {
"field": "my_vector",
"query_value": [9.9, 9.9]
}
}
}
}
}
```
`field` needs to map to a `knn_vector` field, and `query_value` needs to be a floating point array with the same dimension as `field`.
## Function types
The following table contains the available painless functions the k-NN plugin provides:
<table>
<thead style="text-align: left">
<tr>
<th>Function Name</th>
<th>Function Signature</th>
<th>Description</th>
</tr>
</thead>
<tr>
<td>l2Squared</td>
<td><code>float l2Squared (float[] queryVector, doc['vector field'])</code></td>
<td>This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so a script typically inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so the script should also add 1 to the distance to avoid divide-by-zero errors, as in <code>1 / (1 + l2Squared(params.query_value, doc[params.field]))</code>.</td>
</tr>
<tr>
<td>l1Norm</td>
<td><code>float l1Norm (float[] queryVector, doc['vector field'])</code></td>
<td>This function calculates the L1 Norm distance (Manhattan distance) between a given query vector and document vectors.</td>
</tr>
<tr>
<td>cosineSimilarity</td>
<td><code>float cosineSimilarity (float[] queryVector, doc['vector field'])</code></td>
<td>Cosine similarity is the inner product of the query vector and document vector, each normalized to length 1. If the magnitude of the query vector does not change throughout the query, you can pass the magnitude of the query vector to improve performance instead of calculating it for every filtered document: <code>float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)</code>. In general, the range of cosine similarity is [-1, 1], but in information retrieval, the cosine similarity of two documents ranges from 0 to 1 because tf-idf cannot be negative. Hence, the k-NN plugin adds 1.0 to always yield a positive cosine similarity score. </td>
</tr>
</table>
## Constraints
1. If a document's `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`.
2. If a vector field doesn't have a value, the function throws an `IllegalStateException`.
You can avoid this situation by first checking if a document has a value in its field:
```
"source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
```
Because scores can only be positive, this script ranks documents with vector fields higher than those without.
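The same guard can be mirrored in Python to show how the three cases behave. This is a sketch of the script's logic only: the documents are made up, and `ValueError` stands in for the `IllegalArgumentException` that the Painless function throws on a dimension mismatch.

```python
def l2_squared(query, vector):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(query) != len(vector):
        raise ValueError("query and document vectors must have the same dimension")
    return sum((q - v) ** 2 for q, v in zip(query, vector))


def guarded_score(doc, field, query_value):
    """Mirror of the guarded script: 0 for a missing vector, else 1 / (1 + distance)."""
    vector = doc.get(field)
    if not vector:  # plays the role of doc[params.field].size() == 0
        return 0.0
    return 1.0 / (1.0 + l2_squared(query_value, vector))


docs = [
    {"_id": "a", "my_vector": [9.9, 9.9]},  # exact match
    {"_id": "b", "my_vector": [30, 30]},    # far away
    {"_id": "c"},                           # no vector value
]
for doc in docs:
    print(doc["_id"], guarded_score(doc, "my_vector", [9.9, 9.9]))
```

An exact match scores 1, a distant vector scores a small positive number, and a document without a vector scores 0.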

View File

@ -0,0 +1,104 @@
---
layout: default
title: Performance Tuning
parent: k-NN
nav_order: 7
---
# Performance tuning
This section provides recommendations for performance tuning to improve indexing/search performance for approximate k-NN. From a high level, k-NN works according to these principles:
* Graphs are created per `knn_vector` field/(Lucene) segment pair.
* Queries execute on segments sequentially inside the shard (same as any other OpenSearch query).
* Each graph in the segment returns at most `k` neighbors.
* The coordinator node picks the final `size` neighbors from the neighbors returned by each shard.
Additionally, this section provides recommendations for comparing approximate k-NN to exact k-NN with score script.
## Indexing performance tuning
The following steps can be taken to help improve indexing performance, especially when you plan to index a large number of vectors at once:
1. Disable the refresh interval (default = 1 sec) or set a long duration for the refresh interval to avoid creating multiple small segments.
```json
PUT /<index_name>/_settings
{
"index" : {
"refresh_interval" : "-1"
}
}
```
*Note* -- Be sure to reenable `refresh_interval` after indexing finishes.
2. Disable replicas (no OpenSearch replica shards).
Setting replicas to 0 avoids duplicate construction of graphs in both primary and replica shards. When you enable replicas after indexing finishes, the serialized graphs are copied directly. Having no replicas means that losing nodes can cause data loss, so it is important that the data lives elsewhere so that the initial load can be retried if an issue occurs.
3. Increase the number of indexing threads.
If your hardware has multiple cores, you can speed up the indexing process by allowing multiple threads for graph construction. You can set the number of threads with the [knn.algo_param.index_thread_qty](../settings/#cluster-settings) setting.
Monitor CPU utilization and choose the number of threads accordingly. Because graph construction is costly, having multiple threads can put additional load on the CPU.
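The three indexing steps above can be collected into request bodies. The Python sketch below only builds the JSON and sends nothing; the helper names are hypothetical, restoring to `1s` and one replica is an assumption about your original configuration, and sending the thread setting under `persistent` in a `_cluster/settings` request is likewise an assumption to verify for your cluster.

```python
import json


def bulk_load_settings():
    """Index settings to apply before a large one-time load."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Index settings to apply again after the load finishes (assumed values)."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


def thread_settings(thread_qty):
    """Cluster settings body that raises the graph construction thread count."""
    if thread_qty < 1:
        raise ValueError("thread_qty must be at least 1")
    return {"persistent": {"knn.algo_param.index_thread_qty": thread_qty}}


# Bodies for PUT /<index_name>/_settings and PUT /_cluster/settings.
print(json.dumps(bulk_load_settings()))
print(json.dumps(restore_settings()))
print(json.dumps(thread_settings(2)))
```

Keeping the "before" and "after" bodies side by side makes it harder to forget to reenable refresh and replicas once the load completes.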
## Search performance tuning
1. Have fewer segments
To improve search performance, you must keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the `size` best results. However, because the complexity of search for the HNSW algorithm is logarithmic with respect to the number of vectors, searching over 5 graphs with 100 vectors each and then taking the top `size` results from 5 * k results takes longer than searching over 1 graph with 500 vectors and then taking the top `size` results from k results. Ideally, having 1 segment per shard gives the best search latency. You can configure the index to have multiple shards to avoid giant shards and achieve more parallelism.
You can control the number of segments during indexing by asking OpenSearch to slow down segment creation, either by disabling the refresh interval or by choosing a larger refresh interval.
2. Warm up the index
The graphs are constructed during indexing, but they are loaded into memory during the first search. In Lucene, each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top `size` results by score are returned from all of the results returned by the segments at the shard level (a higher score means a better result).
After a graph is loaded (graphs are loaded outside the OpenSearch JVM), the plugin caches it in memory. Initial queries are expensive and can take a few seconds, while subsequent queries should be faster, on the order of milliseconds (assuming that the k-NN circuit breaker is not hit).
To avoid this latency penalty during your first queries, you can use the warmup API operation on the indices that you want to search.
### Usage
```json
GET /_opensearch/_knn/warmup/index1,index2,index3?pretty
{
"_shards" : {
"total" : 6,
"successful" : 6,
"failed" : 0
}
}
```
The warmup API operation loads all of the graphs for all of the shards (primaries and replicas) for the specified indices into the cache. Thus, there will be no penalty to load graphs during initial searches.
*Note* -- This API only loads the segments of the indices it sees into the cache. If a merge or refresh operation finishes after you run this API, or if new documents are added, you need to run this API again to load those graphs into memory.
3. Avoid reading stored fields
If your use case only needs the IDs and scores of the nearest neighbors, you can disable reading stored fields, which saves the time spent retrieving the vectors from stored fields.
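The segment-count argument in step 1 can be checked with quick arithmetic. The Python sketch below uses a deliberately rough cost model, assuming that each graph search costs about log2(n) steps and ignoring constants, result merging, and HNSW parameters; `relative_search_cost` is a hypothetical helper.

```python
import math


def relative_search_cost(num_graphs, vectors_per_graph):
    """Rough cost model: each graph search costs about log2(n) steps."""
    return num_graphs * math.log2(vectors_per_graph)


many_small = relative_search_cost(5, 100)  # 5 graphs of 100 vectors each
one_large = relative_search_cost(1, 500)   # 1 graph of 500 vectors

print(round(many_small, 1), round(one_large, 1))
```

Under this model, the five small graphs cost roughly 33 steps in total, compared with roughly 9 steps for the single larger graph, which is why fewer segments tend to search faster.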
## Improving Recall
Recall depends on multiple factors like number of vectors, number of dimensions, segments, etc. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the graph, the more chances of losing recall if you are sticking with smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. That being said, it is important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation.
Recall can be configured by adjusting the algorithm parameters of the HNSW algorithm exposed through index settings. The algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more details about the influence of the algorithm parameters on indexing and search recall, refer to the [HNSW algorithm parameters document](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values could help recall (leading to better search results) but at the cost of higher memory utilization and increased indexing time. Our default values work on a broader set of use cases from our experiments, but we encourage users to run their own experiments on their data sets and choose the appropriate values. For index-level settings, refer to the [settings page](../settings#index-settings). We will add details on our experiments here shortly.
## Estimating Memory Usage
Typically, in an OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the `knn.memory.circuit_breaker.limit` cluster setting. By default, the circuit breaker limit is set at 50%.
The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.
As an example, assume that we have 1 Million vectors with a dimension of 256 and M of 16, and the memory required can be estimated as:
```
1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.27 GB
```
*Note* -- Remember that having a replica will double the total number of vectors.
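The estimate is easy to script when you want to try different dimensions or values of M. The Python helper below is a sketch of the formula above; the function name and the `replicas` parameter are illustrative, with each replica counted as one more full copy of the vectors.

```python
def estimate_graph_memory_bytes(num_vectors, dimension, m, replicas=0):
    """Estimate native memory for HNSW graphs: 1.1 * (4 * d + 8 * M) bytes/vector."""
    bytes_per_vector = 1.1 * (4 * dimension + 8 * m)
    # Each replica holds its own copy of every vector's graph data.
    return bytes_per_vector * num_vectors * (1 + replicas)


primary_only = estimate_graph_memory_bytes(1_000_000, dimension=256, m=16)
with_replica = estimate_graph_memory_bytes(1_000_000, dimension=256, m=16, replicas=1)

print(f"{primary_only / 1e9:.2f} GB, {with_replica / 1e9:.2f} GB with one replica")
```

Comparing the result against the memory available under the circuit breaker limit indicates whether all graphs can stay cached.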
## Approximate nearest neighbor vs. score script
The standard k-NN query and custom scoring option perform differently. Test with a representative set of documents to see if the search results and latencies match your expectations.
Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../../opensearch/#primary-and-replica-shards).

36
docs/knn/settings.md Normal file
View File

@ -0,0 +1,36 @@
---
layout: default
title: Settings
parent: k-NN
nav_order: 6
---
# k-NN Settings
The k-NN plugin adds several new index and cluster settings.
## Index settings
The default values should work well for most use cases, but you can change these settings when you create the index.
Setting | Default | Description
:--- | :--- | :---
`index.knn.algo_param.ef_search` | 512 | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate, but slower searches.
`index.knn.algo_param.ef_construction` | 512 | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
`index.knn.algo_param.m` | 16 | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
`index.knn.space_type` | "l2" | The vector space used to calculate the distance between vectors. Currently, the k-NN plugin supports the `l2` space (Euclidean distance) and `cosinesimil` space (cosine similarity). For more information on these spaces, refer to the [nmslib documentation](https://github.com/nmslib/nmslib/blob/master/manual/spaces.md).
## Cluster settings
Setting | Default | Description
:--- | :--- | :---
`knn.algo_param.index_thread_qty` | 1 | The number of threads used for graph creation. Keeping this value low reduces the CPU impact of the k-NN plugin, but also reduces indexing performance.
`knn.cache.item.expiry.enabled` | false | Whether to remove graphs that have not been accessed for a certain duration from memory.
`knn.cache.item.expiry.minutes` | 3h | If enabled, the idle time before removing a graph from memory.
`knn.circuit_breaker.unset.percentage` | 75.0 | The native memory usage threshold for the circuit breaker. Memory usage must be below this percentage of `knn.memory.circuit_breaker.limit` for `knn.circuit_breaker.triggered` to remain false.
`knn.circuit_breaker.triggered` | false | True when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value.
`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for graphs. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, the k-NN plugin removes the least recently used graphs.
`knn.memory.circuit_breaker.enabled` | true | Whether to enable the k-NN memory circuit breaker.
`knn.plugin.enabled`| true | Enables or disables the k-NN plugin.
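The circuit breaker arithmetic from the table can be sketched in Python. The helper below is hypothetical and simply applies the two percentages in sequence, following the descriptions of `knn.memory.circuit_breaker.limit` and `knn.circuit_breaker.unset.percentage` above; it is not how the plugin measures memory.

```python
def knn_memory_limits(total_gb, jvm_heap_gb, limit_pct=50.0, unset_pct=75.0):
    """Return (graph memory limit, unset threshold) in GB."""
    remaining = total_gb - jvm_heap_gb
    limit = remaining * limit_pct / 100.0  # knn.memory.circuit_breaker.limit
    unset = limit * unset_pct / 100.0      # knn.circuit_breaker.unset.percentage
    return limit, unset


limit_gb, unset_gb = knn_memory_limits(total_gb=100, jvm_heap_gb=32)
print(limit_gb, unset_gb)
```

For the 100 GB machine with a 32 GB heap, the limit is 34 GB, and with the default 75% threshold, graph memory usage must stay below 25.5 GB for the breaker to remain unset.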

View File

@ -0,0 +1,26 @@
---
layout: default
title: Gantt Charts
parent: OpenSearch Dashboards
nav_order: 10
---
# Gantt charts
OpenSearch Dashboards includes a Gantt chart visualization. These charts show the start, end, and duration of unique events in a sequence. Gantt charts are useful in trace analytics, telemetry, and anomaly detection use cases, where you want to understand interactions and dependencies between various events in a schedule.
For example, consider an index of log data. The fields in a typical set of log data, especially audit logs, contain a specific operation or event with a start time and duration.
To create a Gantt chart, do the following:
1. In the visualizations menu, choose **Create visualization** and **Gantt Chart**.
1. Choose a source for the chart (for example, some log data).
1. Under **Metrics**, choose **Event**. For log data, each log is an event.
1. Select the **Start Time** and the **Duration** fields from your data set. The start time is the timestamp for the beginning of an event. The duration is the amount of time to add to the start time.
1. Under **Results**, choose the number of events that you want to display on the chart. Gantt charts sequence events from earliest to latest based on start time.
1. Choose **Panel settings** to adjust axis labels, time format, and colors.
1. Choose **Update**.
![Gantt Chart](../../images/gantt-chart.png)
This Gantt chart displays the ID for each log on the Y axis. Each bar is a unique event that spans some amount of time. Hover over a bar to see the duration of that event.
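As a rough sketch of how a bar is derived from the two fields, the end of an event is its start time plus its duration. The variable names and the millisecond unit below are assumptions for illustration, not fields that the plugin requires.

```bash
# Hypothetical event: start time and duration, both assumed to be in milliseconds.
start_ms=1620000000000
duration_ms=4500

# The bar for this event ends at start + duration.
end_ms=$(( start_ms + duration_ms ))
echo "$end_ms"   # prints 1620000004500
```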

View File

@ -0,0 +1,150 @@
---
layout: default
title: OpenSearch Dashboards
nav_order: 11
has_children: true
has_toc: false
---
# OpenSearch Dashboards
OpenSearch Dashboards is the default visualization tool for data in OpenSearch. It also serves as a user interface for the OpenSearch [security](../security/configuration/), [alerting](../alerting/), and [Index State Management](../ism/) plugins.
## Run OpenSearch Dashboards using Docker
You *can* start OpenSearch Dashboards using `docker run` after [creating a Docker network](https://docs.docker.com/engine/reference/commandline/network_create/) and starting OpenSearch, but the process of connecting OpenSearch Dashboards to OpenSearch is significantly easier with a Docker Compose file.
1. Run `docker pull opensearch/opensearch-dashboards:{{site.opensearch_version}}`.
1. Create a [`docker-compose.yml`](https://docs.docker.com/compose/compose-file/) file appropriate for your environment. A sample file that includes OpenSearch Dashboards is available on the OpenSearch [Docker installation page](../install/docker/#sample-docker-compose-file).
Just like `opensearch.yml`, you can pass a custom `opensearch_dashboards.yml` to the container in the Docker Compose file.
{: .tip }
1. Run `docker-compose up`.
Wait for the containers to start. Then see [Get started with OpenSearch Dashboards](#get-started-with-opensearch-dashboards).
1. When finished, run `docker-compose down`.
## Run OpenSearch Dashboards using the RPM or Debian package
1. If you haven't already, add the `yum` repositories specified in steps 1--2 in [RPM](../install/rpm) or the `apt` repositories in steps 2--3 of [Debian package](../install/deb).
1. `sudo yum install opensearch-dashboards` or `sudo apt install opensearch-dashboards`
1. Modify `/etc/opensearch-dashboards/opensearch_dashboards.yml` to use `opensearch.hosts` rather than `opensearch.url`.
1. `sudo systemctl start opensearch-dashboards.service`
1. To stop OpenSearch Dashboards:
```bash
sudo systemctl stop opensearch-dashboards.service
```
### Configuration
To run OpenSearch Dashboards when the system starts:
```bash
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable opensearch-dashboards.service
```
You can also modify the values in `/etc/opensearch-dashboards/opensearch_dashboards.yml`.
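To preview the `opensearch.url` to `opensearch.hosts` change before editing the file, a small `sed` filter is one option. This is a minimal sketch: the helper name and the sample URL are hypothetical, and the function only prints the rewritten configuration without modifying the file.

```bash
# Hypothetical helper: prints a config file with `opensearch.url` rewritten
# as `opensearch.hosts`. It writes to standard output and leaves the file untouched.
use_hosts_setting() {
  sed -E 's|^opensearch\.url:[[:space:]]*(.*)$|opensearch.hosts: [\1]|' "$1"
}

# Demonstration on a temporary file with an assumed sample URL.
sample=$(mktemp)
echo 'opensearch.url: https://localhost:9200' > "$sample"
use_hosts_setting "$sample"   # prints opensearch.hosts: [https://localhost:9200]
rm -f "$sample"
```

Run `use_hosts_setting /etc/opensearch-dashboards/opensearch_dashboards.yml` to review the result, then apply the change with your editor of choice.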
## Run OpenSearch Dashboards using the tarball
1. Download the tarball:
```bash
# x64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz -o opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz
# ARM64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz -o opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz
```
1. Download the checksum:
```bash
# x64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 -o opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz.sha512
# ARM64
curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 -o opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512
```
1. Verify the tarball against the checksum:
```bash
# x64
shasum -a 512 -c opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz.sha512
# ARM64
shasum -a 512 -c opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512
```
On CentOS, you might not have `shasum`. Install this package:
```bash
sudo yum install perl-Digest-SHA
```
1. Extract the TAR file to a directory and change to that directory:
```bash
# x64
tar -zxf opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz
cd opensearch-dashboards
# ARM64
tar -zxf opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz
cd opensearch-dashboards
```
1. If desired, modify `config/opensearch_dashboards.yml`.
1. Run OpenSearch Dashboards:
```bash
./bin/opensearch-dashboards
```
## Run OpenSearch Dashboards on Windows (ZIP)
1. Download the ZIP.
1. Extract [the ZIP file](https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-windows/ode-windows-zip/opensearch-dashboards-{{site.opensearch_version}}-windows-x64.zip) to a directory and open that directory at the command prompt.
1. If desired, modify `config/opensearch_dashboards.yml`.
1. Run OpenSearch Dashboards:
```
.\bin\opensearch-dashboards.bat
```
## Run OpenSearch Dashboards on Windows (EXE)
1. Download [the EXE file](https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-windows/opensearch-executables/opensearch-dashboards-{{site.opensearch_version}}-windows-x64.exe), run it, and click through the steps.
1. Open the command prompt.
1. Navigate to the OpenSearch Dashboards install directory.
1. If desired, modify `config/opensearch_dashboards.yml`.
1. Run OpenSearch Dashboards:
```
.\bin\opensearch-dashboards.bat
```
## Get started with OpenSearch Dashboards
1. After starting OpenSearch Dashboards, you can access it at port 5601. For example, http://localhost:5601.
1. Log in with the default username `admin` and password `admin`.
1. Choose **Try our sample data** and add the sample flight data.
1. Choose **Discover** and search for a few flights.
1. Choose **Dashboard**, **[Flights] Global Flight Dashboard**, and wait for the dashboard to load.
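If the page at port 5601 doesn't load right away, OpenSearch Dashboards might still be starting. The following sketch polls the URL with `curl` until it gets any HTTP response. The function name, the retry count, and the delay are hypothetical defaults, not values taken from OpenSearch Dashboards.

```bash
# Hypothetical helper: waits until OpenSearch Dashboards answers HTTP requests.
# Arguments: URL, number of attempts, delay between attempts in seconds.
wait_for_dashboards() {
  local url=${1:-http://localhost:5601} attempts=${2:-30} delay=${3:-2}
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # -s: silent, -o /dev/null: discard the body. Any HTTP response counts as "up".
    if curl -s -o /dev/null "$url"; then
      echo "OpenSearch Dashboards is responding at $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_dashboards http://localhost:5601 30 2` checks every two seconds for up to a minute.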

View File

@ -0,0 +1,31 @@
---
layout: default
title: WMS Map Server
parent: OpenSearch Dashboards
nav_order: 5
---
# Configure WMS map server
Due to licensing restrictions, the default installation of OpenSearch Dashboards doesn't include a map server for tile map visualizations. To configure OpenSearch Dashboards to use a WMS map server:
1. Open OpenSearch Dashboards at `https://<host>:<port>`. For example, [https://localhost:5601](https://localhost:5601).
1. If necessary, log in.
1. Choose **Management**.
1. Choose **Advanced Settings**.
1. Locate `visualization:tileMap:WMSdefaults`.
1. Change `enabled` to true, and add the URL of a valid WMS map server.
```json
{
"enabled": true,
"url": "<wms-map-server-url>",
"options": {
"format": "image/png",
"transparent": true
}
}
```
Map services often have licensing fees or restrictions. You are responsible for all such considerations on any map server that you specify.
{: .note }

View File

@ -0,0 +1,68 @@
---
layout: default
title: Notebooks (experimental)
parent: OpenSearch Dashboards
nav_order: 50
redirect_from: /docs/notebooks/
has_children: false
---
# OpenSearch Dashboards notebooks (experimental)
Notebooks have a known issue with [tenants](../../security/access-control/multi-tenancy/). If you open a notebook and can't see its visualizations, you might be under the wrong tenant, or you might not have access to the tenant at all.
{: .warning }
An OpenSearch Dashboards notebook lets you combine live visualizations and narrative text in a single interface.
With OpenSearch Dashboards notebooks, you can interactively explore data by running different visualizations and share your work with team members to collaborate on a project.
A notebook is a document composed of two elements: OpenSearch Dashboards visualizations and paragraphs (Markdown). Choose multiple timelines to compare and contrast visualizations.
Common use cases include creating postmortem reports, designing runbooks, building live infrastructure reports, and writing documentation.
## Get started with notebooks
To get started, choose **OpenSearch Dashboards Notebooks** in OpenSearch Dashboards.
### Step 1: Create a notebook
A notebook is an interface for creating reports.
1. Choose **Create notebook** and enter a descriptive name.
1. Choose **Create**.
Choose **Notebook actions** to rename, duplicate, or delete a notebook.
### Step 2: Add a paragraph
Paragraphs combine text and visualizations for describing data.
#### Add a markdown paragraph
1. To add text, choose **Add markdown paragraph**.
1. Add rich text with markdown syntax.
![Markdown paragraph](../../images/markdown-notebook.png)
#### Add a visualization paragraph
1. To add a visualization, choose **Add OpenSearch Dashboards visualization paragraph**.
1. In **Title**, select your visualization and choose a date range.
You can choose multiple timelines to compare and contrast visualizations.
To run and save a paragraph, choose **Run**.
You can perform the following actions on paragraphs:
- Add a new paragraph to the top of a report.
- Add a new paragraph to the bottom of a report.
- Run all the paragraphs at the same time.
- Clear the outputs of all paragraphs.
- Delete all the paragraphs.
- Move paragraphs up and down.

View File

@ -0,0 +1,202 @@
---
layout: default
title: Standalone OpenSearch Dashboards Plugin Install
parent: OpenSearch Dashboards
nav_order: 1
---
# Standalone plugin install
If you don't want to use the all-in-one installation options, you can install the various plugins for OpenSearch Dashboards individually.
---
#### Table of contents
1. TOC
{:toc}
---
## Plugin compatibility
<table>
<thead style="text-align: left">
<tr>
<th>OpenSearch Dashboards version</th>
<th>Plugin versions</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.0.0-beta1</td>
<td>
<pre>opensearchDashboardsAlerting 1.0.0-beta1
opensearchDashboardsAnomalyDetection 1.0.0-beta1
opensearchDashboardsGanttChart 1.0.0-beta1
opensearchDashboardsIndexManagement 1.0.0-beta1
opensearchDashboardsNotebooks 1.0.0-beta1
opensearchDashboardsQueryWorkbench 1.0.0-beta1
opensearchDashboardsReports 1.0.0-beta1
opensearchDashboardsSecurity 1.0.0-beta1
opensearchDashboardsTraceAnalytics 1.0.0-beta1
</pre>
</td>
</tr>
</tbody>
</table>
## Prerequisites
- A compatible OpenSearch cluster
- The corresponding OpenSearch plugins [installed on that cluster](../../install/plugins)
- The corresponding version of [OpenSearch Dashboards](../) (e.g. OpenSearch Dashboards 1.0.0 works with OpenSearch 1.0.0)
## Install
Navigate to the OpenSearch Dashboards home directory (likely `/usr/share/opensearch-dashboards`) and run the install command for each plugin.
#### Security OpenSearch Dashboards
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-security/opensearchSecurityOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.1.zip
```
This plugin provides a user interface for managing users, roles, mappings, action groups, and tenants.
#### Alerting OpenSearch Dashboards
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-alerting/opensearchAlertingOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip
```
This plugin provides a user interface for creating monitors and managing alerts.
#### Index State Management OpenSearch Dashboards
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-index-management/opensearchIndexManagementOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.1.zip
```
This plugin provides a user interface for managing policies.
#### Anomaly Detection OpenSearch Dashboards
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-anomaly-detection/opensearchAnomalyDetectionOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip
```
This plugin provides a user interface for adding detectors.
#### Query Workbench OpenSearch Dashboards
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-query-workbench/opensearchQueryWorkbenchOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip
```
This plugin provides a user interface for using SQL queries to explore your data.
#### Trace Analytics
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-trace-analytics/opensearchTraceAnalyticsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0.zip
```
This plugin uses distributed trace data (indexed in OpenSearch using Data Prepper) to display latency trends, error rates, and more.
#### Notebooks OpenSearch Dashboards
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-notebooks/opensearchNotebooksOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0.zip
```
This plugin lets you combine OpenSearch Dashboards visualizations and narrative text in a single interface.
#### Reports OpenSearch Dashboards
```bash
# x86 Linux
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-reports/linux/x64/opensearchReportsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0-linux-x64.zip
# ARM64 Linux
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-reports/linux/arm64/opensearchReportsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0-linux-arm64.zip
# x86 Windows
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-reports/windows/x64/opensearchReportsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0-windows-x64.zip
```
This plugin lets you export and share reports from OpenSearch Dashboards dashboards, visualizations, and saved searches.
#### Gantt Chart OpenSearch Dashboards
```bash
sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-gantt-chart/opensearchGanttChartOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip
```
This plugin adds a new Gantt chart visualization.
## List installed plugins
To check your installed plugins:
```bash
sudo bin/opensearch-dashboards-plugin list
```
## Remove plugins
```bash
sudo bin/opensearch-dashboards-plugin remove <plugin-name>
```
For certain plugins, you must also remove the "optimize" bundle. Here is a sample command for the Anomaly Detection plugin:
```bash
sudo rm /usr/share/opensearch-dashboards/optimize/bundles/opensearch-anomaly-detection-opensearch-dashboards.*
```
Then restart OpenSearch Dashboards. After the removal of any plugin, OpenSearch Dashboards performs an optimize operation the next time you start it. This operation takes several minutes even on fast machines, so be patient.
## Update plugins
OpenSearch Dashboards doesn't update plugins. Instead, you have to remove the old version and its optimized bundle, install the new version, and restart OpenSearch Dashboards:
1. Remove the old version:
```bash
sudo bin/opensearch-dashboards-plugin remove <plugin-name>
```
1. Remove the optimized bundle:
```bash
sudo rm /usr/share/opensearch-dashboards/optimize/bundles/<bundle-name>
```
1. Reinstall the new version:
```bash
sudo bin/opensearch-dashboards-plugin install <plugin-name>
```
1. Restart OpenSearch Dashboards.
For example, to remove and reinstall the anomaly detection plugin:
```bash
sudo bin/opensearch-dashboards-plugin remove opensearch-anomaly-detection
sudo rm /usr/share/opensearch-dashboards/optimize/bundles/opensearch-anomaly-detection-opensearch-dashboards.*
sudo bin/opensearch-dashboards-plugin install <AD OpenSearch Dashboards plugin artifact URL>
```
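The remove, clean up, and reinstall steps can also be chained in one shell function. This is a sketch under stated assumptions: the function name, the `DASHBOARDS_HOME` and `DRY_RUN` variables, and the argument order are all hypothetical, and the plugin name, bundle pattern, and artifact URL must come from your own installation.

```bash
# Hypothetical helper that chains the update steps for one plugin.
# Arguments: plugin name, bundle file pattern, artifact URL of the new version.
# Set DRY_RUN=1 to print the commands instead of running them.
update_dashboards_plugin() {
  local plugin_name=$1 bundle_glob=$2 artifact_url=$3
  local home=${DASHBOARDS_HOME:-/usr/share/opensearch-dashboards}
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

  # 1. Remove the old version.
  $run sudo "$home/bin/opensearch-dashboards-plugin" remove "$plugin_name" || return 1
  # 2. Remove the optimized bundle (the pattern is left unquoted so it can expand).
  $run sudo rm -f "$home"/optimize/bundles/$bundle_glob || return 1
  # 3. Install the new version.
  $run sudo "$home/bin/opensearch-dashboards-plugin" install "$artifact_url" || return 1
  # 4. The restart is left to you because it depends on how the service is managed.
  echo "Restart OpenSearch Dashboards to finish the update."
}
```

Running the function with `DRY_RUN=1` first prints the three commands so that you can review them before anything is removed.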

View File

@ -0,0 +1,55 @@
---
layout: default
title: Reporting
parent: OpenSearch Dashboards
nav_order: 20
---
# Reporting
The OpenSearch Dashboards reports feature lets you create PNG, PDF, and CSV reports. To use reports, you must have the correct permissions. For summaries of the predefined roles and the permissions they grant, see the [security plugin](../../security/access-control/users-roles/#predefined-roles).
## Create reports from Discover, Visualize, or Dashboard
On-demand reports let you quickly generate a report from the current view.
1. From the top bar, choose **Reporting**.
1. For dashboards or visualizations, choose **Download PDF** or **Download PNG**. From the Discover page, choose **Download CSV**.
Reports generate asynchronously in the background and might take a few minutes, depending on the size of the report. A notification appears when your report is ready to download.
1. To create a schedule-based report, choose **Create report definition**. Then proceed to [Create reports using a definition](#create-reports-using-a-definition). This option pre-fills many of the fields for you based on the visualization, dashboard, or data you were viewing.
## Create reports using a definition
Definitions let you schedule reports for periodic creation.
1. From the left navigation panel, choose **Reporting**.
1. Choose **Create**.
1. Under **Report settings**, enter a name and optional description for your report.
1. Choose the **Report Source** (i.e. the page from which the report is generated). You can generate reports from the **Dashboard**, **Visualize** or **Discover** pages.
1. Choose your dashboard, visualization, or saved search. Then choose a time range for the report.
1. Choose an appropriate file format for the report.
1. (Optional) Add a header or footer for the report. Headers and footers are only available for dashboard or visualization reports.
1. Under **Report trigger**, choose either **On-demand** or **Schedule**.
For scheduled reports, choose either **Recurring** or **Cron based**. You can receive reports daily or at some other time interval. Cron expressions give you even more flexibility. See [Cron expression reference](../../alerting/cron/) for more information.
1. Choose **Create**.
## Troubleshooting
### Chromium fails to launch with OpenSearch Dashboards
While creating a report for dashboards or visualizations, you might see a `Download error`:
![OpenSearch Dashboards reporting pop-up error message](../../images/reporting-error.png)
This problem can occur for two reasons:
1. You don't have the correct version of `headless-chrome` to match the operating system on which OpenSearch Dashboards is running. Download the correct version of `headless-chrome` from [here](https://github.com/opensearch-project/opensearch-dashboards-reports/releases/tag/chromium-1.12.0.0).
2. You're missing additional dependencies. Install the required dependencies for your operating system from the [additional libraries](https://github.com/opensearch-project/opensearch-dashboards-reports/blob/dev/opensearch-dashboards-reports/rendering-engine/headless-chrome/README.md#additional-libaries) section.

290
docs/opensearch/bool.md Normal file
View File

@ -0,0 +1,290 @@
---
layout: default
title: Boolean Queries
parent: OpenSearch
nav_order: 11
---
# Boolean queries
The `bool` query lets you combine multiple search queries with boolean logic. You can use boolean logic between queries to either narrow or broaden your search results.
The `bool` query is a go-to query because it allows you to construct an advanced query by chaining together several simple ones.
Use the following clauses (subqueries) within the `bool` query:
Clause | Behavior
:--- | :---
`must` | The results must match the queries in this clause. If you have multiple queries, every single one must match. Acts as an `and` operator.
`must_not` | This is the anti-must clause. All matches are excluded from the results. Acts as a `not` operator.
`should` | The results should, but don't have to, match the queries. Each matching `should` clause increases the relevancy score. Optionally, use the `minimum_should_match` parameter to require a minimum number of `should` queries to match. The default is 1 if the `bool` query contains no `must` or `filter` clause and 0 otherwise.
`filter` | Filters reduce your dataset before applying the queries. A query within a filter clause is a yes-no option, where if a document matches the query it's included in the results. Otherwise, it's not. Filter queries do not affect the relevancy score that the results are sorted by. The results of a filter query are generally cached so they tend to run faster. Use the filter query to filter the results based on exact matches, ranges, dates, numbers, and so on.
The structure of a `bool` query is as follows:
```json
GET _search
{
"query": {
"bool": {
"must": [
{}
],
"must_not": [
{}
],
"should": [
{}
],
"filter": {}
}
}
}
```
For example, assume you have the complete works of Shakespeare indexed in an OpenSearch cluster. You want to construct a single query that meets the following requirements:
1. The `text_entry` field must contain the word `love` and should contain either `life` or `grace`.
2. The `speaker` field must not contain `ROMEO`.
3. Filter these results to the play `Romeo and Juliet` without affecting the relevancy score.
Use the following query:
```json
GET shakespeare/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"text_entry": "love"
}
}
],
"should": [
{
"match": {
"text_entry": "life"
}
},
{
"match": {
"text_entry": "grace"
}
}
],
"minimum_should_match": 1,
"must_not": [
{
"match": {
"speaker": "ROMEO"
}
}
],
"filter": {
"term": {
"play_name": "Romeo and Juliet"
}
}
}
}
}
```
#### Sample output
```json
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 11.356054,
"hits": [
{
"_index": "shakespeare",
"_type": "_doc",
"_id": "88020",
"_score": 11.356054,
"_source": {
"type": "line",
"line_id": 88021,
"play_name": "Romeo and Juliet",
"speech_number": 19,
"line_number": "4.5.61",
"speaker": "PARIS",
"text_entry": "O love! O life! not life, but love in death!"
}
}
]
}
}
```
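Outside of the Dev Tools console, a `bool` query can be sent with `curl`. The wrapper below is a minimal sketch: the function name, the `OPENSEARCH_URL` variable, the `https://localhost:9200` endpoint, and the `admin:admin` credentials are assumptions for a local demo cluster, and `-k` skips certificate verification, so it is only suitable for testing.

```bash
# Hypothetical wrapper: sends a search body to an index with curl.
# The endpoint and the admin:admin credentials are assumptions for a local demo cluster.
search_index() {
  local index=$1 body=$2
  local endpoint=${OPENSEARCH_URL:-https://localhost:9200}
  curl -s -k -u admin:admin \
    -H 'Content-Type: application/json' \
    -X GET "$endpoint/$index/_search?pretty" \
    -d "$body"
}

# must acts as AND, must_not as NOT, and filter is a yes/no match without scoring.
query='{
  "query": {
    "bool": {
      "must":     [ { "match": { "text_entry": "love" } } ],
      "must_not": [ { "match": { "speaker": "ROMEO" } } ],
      "filter":   { "term": { "play_name": "Romeo and Juliet" } }
    }
  }
}'
```

With a cluster running, `search_index shakespeare "$query"` returns the matching lines as JSON.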
If you want to identify which of these clauses actually caused the matching results, name each query with the `_name` parameter.
To add the `_name` parameter, change the field name in the `match` query to an object:
```json
GET shakespeare/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"text_entry": {
"query": "love",
"_name": "love-must"
}
}
}
],
"should": [
{
"match": {
"text_entry": {
"query": "life",
"_name": "life-should"
}
}
},
{
"match": {
"text_entry": {
"query": "grace",
"_name": "grace-should"
}
}
}
],
"minimum_should_match": 1,
"must_not": [
{
"match": {
"speaker": {
"query": "ROMEO",
"_name": "ROMEO-must-not"
}
}
}
],
"filter": {
"term": {
"play_name": "Romeo and Juliet"
}
}
}
}
}
```
OpenSearch returns a `matched_queries` array that lists the queries that matched these results:
```json
"matched_queries": [
"love-must",
"life-should"
]
```
If you remove the queries not in this list, you will still see the exact same result.
By examining which `should` clause matched, you can better understand the relevancy score of the results.
You can also construct complex boolean expressions by nesting `bool` queries.
For example, to find a `text_entry` field that matches (`love` OR `hate`) AND (`life` OR `grace`) in the play `Romeo and Juliet`:
```json
GET shakespeare/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"text_entry": "love"
}
},
{
"match": {
"text_entry": "hate"
}
}
]
}
},
{
"bool": {
"should": [
{
"match": {
"text_entry": "life"
}
},
{
"match": {
"text_entry": "grace"
}
}
]
}
}
],
"filter": {
"term": {
"play_name": "Romeo and Juliet"
}
}
}
}
}
```
#### Sample output
```json
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 11.37006,
"hits": [
{
"_index": "shakespeare",
"_type": "doc",
"_id": "88020",
"_score": 11.37006,
"_source": {
"type": "line",
"line_id": 88021,
"play_name": "Romeo and Juliet",
"speech_number": 19,
"line_number": "4.5.61",
"speaker": "PARIS",
"text_entry": "O love! O life! not life, but love in death!"
}
}
]
}
}
```

257
docs/opensearch/catapis.md Normal file
View File

@ -0,0 +1,257 @@
---
layout: default
title: CAT API
parent: OpenSearch
nav_order: 7
---
# cat API
You can get essential statistics about your cluster in an easy-to-understand, tabular format using the compact and aligned text (CAT) API. The cat API is a human-readable interface that returns plain text instead of traditional JSON.
Using the cat API, you can answer questions like which node is the elected master, what state is the cluster in, how many documents are in each index, and so on.
To see the available operations in the cat API, use the following command:
```
GET _cat
```
You can also use the following string parameters with your query.
Parameter | Description
:--- | :--- |
`?v` | Makes the output more verbose by adding headers to the columns. It also adds some formatting to help align each of the columns together. All examples on this page include the `v` parameter.
`?help` | Lists the default and other available headers for a given operation.
`?h` | Limits the output to specific headers.
`?format` | Outputs the result in JSON, YAML, or CBOR formats.
`?s` | Sorts the output by the specified columns.
To see what each column represents, use the `?v` parameter:
```
GET _cat/<operation_name>?v
```
To see all the available headers, use the `?help` parameter:
```
GET _cat/<operation_name>?help
```
To limit the output to a subset of headers, use the `?h` parameter:
```
GET _cat/<operation_name>?h=<header_name_1>,<header_name_2>&v
```
Typically, for any operation you can find out what headers are available using the `?help` parameter, and then use the `?h` parameter to limit the output to only the headers that you care about.
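The workflow above can be sketched as a small helper that assembles the request path. The function name is hypothetical; it only builds the string and doesn't contact a cluster.

```bash
# Hypothetical helper: builds a cat API path from an operation name and
# an optional comma-separated list of headers for the h parameter.
cat_path() {
  local operation=$1 headers=$2
  local path="_cat/$operation?v"
  if [ -n "$headers" ]; then
    path="$path&h=$headers"
  fi
  echo "$path"
}

cat_path health                    # prints _cat/health?v
cat_path indices index,docs.count  # prints _cat/indices?v&h=index,docs.count
```

You can paste the resulting path after `GET` in the console or append it to your cluster URL in a `curl` command.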
---
#### Table of contents
1. TOC
{:toc}
---
## Aliases
Lists the mapping of aliases to indices, plus routing and filtering information.
```
GET _cat/aliases?v
```
To limit the information to a specific alias, add the alias name after your query.
```
GET _cat/aliases/<alias>?v
```
## Allocation
Lists the allocation of disk space for indices and the number of shards on each node.
Default request:
```
GET _cat/allocation?v
```
## Count
Lists the number of documents in your cluster.
```
GET _cat/count?v
```
To see the number of documents in a specific index, add the index name after your query.
```
GET _cat/count/<index>?v
```
## Field data
Lists the memory size used by each field per node.
```
GET _cat/fielddata?v
```
To limit the information to a specific field, add the field name after your query.
```
GET _cat/fielddata/<fields>?v
```
## Health
Lists the status of the cluster, how long the cluster has been up, the number of nodes, and other useful information that helps you analyze the health of your cluster.
```
GET _cat/health?v
```
## Indices
Lists information related to indices—how much disk space they are using, how many shards they have, their health status, and so on.
```
GET _cat/indices?v
```
To limit the information to a specific index, add the index name after your query.
```
GET _cat/indices/<index>?v
```
## Master
Lists information that helps identify the elected master node.
```
GET _cat/master?v
```
## Node attributes
Lists the custom attributes of nodes.
```
GET _cat/nodeattrs?v
```
## Nodes
Lists node-level information, including node roles and load metrics.
A few important node metrics are `pid`, `name`, `master`, `ip`, `port`, `version`, `build`, `jdk`, along with `disk`, `heap`, `ram`, and `file_desc`.
```
GET _cat/nodes?v
```
## Pending tasks
Lists the progress of all pending tasks, including task priority and time in queue.
```
GET _cat/pending_tasks?v
```
## Plugins
Lists the names, components, and versions of the installed plugins.
```
GET _cat/plugins?v
```
## Recovery
Lists all completed and ongoing index and shard recoveries.
```
GET _cat/recovery?v
```
To see only the recoveries of a specific index, add the index name after your query.
```
GET _cat/recovery/<index>?v
```
## Repositories
Lists all snapshot repositories and their types.
```
GET _cat/repositories?v
```
## Segments
Lists Lucene segment-level information for each index.
```
GET _cat/segments?v
```
To see only the information about segments of a specific index, add the index name after your query.
```
GET _cat/segments/<index>?v
```
## Shards
Lists the state of all primary and replica shards and how they are distributed.
```
GET _cat/shards?v
```
To see only the information about shards of a specific index, add the index name after your query.
```
GET _cat/shards/<index>?v
```
## Snapshots
Lists all snapshots for a repository.
```
GET _cat/snapshots/<repository>?v
```
## Tasks
Lists the progress of all tasks currently running on your cluster.
```
GET _cat/tasks?v
```
## Templates
Lists the names, patterns, order numbers, and version numbers of index templates.
```
GET _cat/templates?v
```
## Thread pool
Lists the active, queued, and rejected threads of different thread pools on each node.
```
GET _cat/thread_pool?v
```
To limit the information to a specific thread pool, add the thread pool name after your query.
```
GET _cat/thread_pool/<thread_pool>?v
```

341
docs/opensearch/cluster.md Normal file
View File

@ -0,0 +1,341 @@
---
layout: default
title: Cluster Formation
parent: OpenSearch
nav_order: 2
---
# Cluster formation
Before diving into OpenSearch and searching and aggregating data, you first need to create an OpenSearch cluster.
OpenSearch can operate as a single-node or multi-node cluster. The steps to configure both are, in general, quite similar. This page demonstrates how to create and configure a multi-node cluster, but with only a few minor adjustments, you can follow the same steps to create a single-node cluster.
To create and deploy an OpenSearch cluster according to your requirements, it's important to understand how node discovery and cluster formation work and what settings govern them.
There are many ways that you can design a cluster. The following illustration shows a basic architecture.
![multi-node cluster architecture diagram](../../images/cluster.png)
This is a four-node cluster that has one dedicated master node, one dedicated coordinating node, and two data nodes that are master-eligible and also used for ingesting data.
The following table provides brief descriptions of the node types.
Node type | Description | Best practices for production
:--- | :--- | :-- |
`Master` | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indices, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated master nodes in three different zones is the right approach for almost all production use cases. This configuration lets the cluster keep quorum if a single node or zone fails. Two of the nodes are idle most of the time, except when the elected master goes down or needs maintenance.
`Master-eligible` | Elects one node among them as the master node through a voting process. | For production clusters, make sure you have dedicated master nodes. To dedicate a node to a single type, set all of its other node types to `false`. In this case, you also have to mark all the other nodes as not master-eligible.
`Data` | Stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards. These are the worker nodes of your cluster and need more disk space than any other node type. | As you add data nodes, keep them balanced between zones. For example, if you have three zones, add data nodes in multiples of three, one for each zone. We recommend using storage and RAM-heavy nodes.
`Ingest` | Preprocesses data before storing it in the cluster. Runs an ingest pipeline that transforms your data before adding it to an index. | If you plan to ingest a lot of data and run complex ingest pipelines, we recommend you use dedicated ingest nodes. You can also optionally offload your indexing from the data nodes so that your data nodes are used exclusively for searching and aggregating.
`Coordinating` | Delegates client requests to the shards on the data nodes, collects and aggregates the results into one final result, and sends this result back to the client. | A couple of dedicated coordinating-only nodes is appropriate to prevent bottlenecks for search-heavy workloads. We recommend using CPUs with as many cores as you can.
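The recommendation of three dedicated master nodes follows from majority voting: a master can be elected only while more than half of the master-eligible nodes can reach each other. The following Python sketch works through that arithmetic under a simple majority rule. The function names are illustrative and are not part of any OpenSearch API.

```python
def quorum_size(master_eligible):
    """Smallest majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """Number of master-eligible nodes that can fail while a majority remains."""
    return master_eligible - quorum_size(master_eligible)


# Print node count, majority size, and tolerated failures for a few cluster sizes.
for count in (1, 2, 3, 5):
    print(count, quorum_size(count), tolerated_failures(count))
```

Under this rule, two master-eligible nodes tolerate no failures, so three is the smallest count that survives the loss of one node.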
By default, each node is a master-eligible, data, ingest, and coordinating node. Deciding on the number of nodes, assigning node types, and choosing the hardware for each node type depends on your use case. You must take into account factors like the amount of time you want to hold on to your data, the average size of your documents, your typical workload (indexing, searches, aggregations), your expected price-performance ratio, your risk tolerance, and so on.
After you assess all these requirements, we recommend you use a benchmark testing tool like Rally to provision a small sample cluster and run tests with varying workloads and configurations. Compare and analyze the system and query metrics for these tests to design an optimum architecture. To get started with Rally, see the [Rally documentation](https://esrally.readthedocs.io/en/stable/).
This page demonstrates how to work with the different node types. It assumes that you have a four-node cluster similar to the preceding illustration.
## Prerequisites
Before you get started, you must install and configure OpenSearch on all of your nodes. For information about the available options, see [Install and Configure](../../install/).
After you are done, use SSH to connect to each node, and then open the `config/opensearch.yml` file.
You can set all configurations for your cluster in this file.
## Step 1: Name a cluster
Specify a unique name for the cluster. If you don't specify a cluster name, it's set to `opensearch` by default. Setting a descriptive cluster name is important, especially if you want to run multiple clusters inside a single network.
To specify the cluster name, change the following line:
```yml
#cluster.name: my-application
```
to
```yml
cluster.name: opensearch-cluster
```
Make the same change on all the nodes to make sure that they'll join to form a cluster.
## Step 2: Set node attributes for each node in a cluster
After you name the cluster, set node attributes for each node in your cluster.
#### Master node
Give your master node a name. If you don't specify a name, OpenSearch assigns a machine-generated name that makes the node difficult to monitor and troubleshoot.
```yml
node.name: opensearch-master
```
You can also explicitly specify that this node is master-eligible. This is already true by default, but adding the setting makes the node's role easier to identify:
```yml
node.master: true
```
Then make the node a dedicated master that won't perform double-duty as a data node:
```yml
node.data: false
```
Specify that this node will not be used for ingesting data:
```yml
node.ingest: false
```
#### Data nodes
Change the name of two nodes to `opensearch-d1` and `opensearch-d2`, respectively:
```yml
node.name: opensearch-d1
```
```yml
node.name: opensearch-d2
```
You can make them master-eligible data nodes that will also be used for ingesting data:
```yml
node.master: true
node.data: true
node.ingest: true
```
You can also specify any other attributes that you'd like to set for the data nodes.
#### Coordinating node
Change the name of the coordinating node to `opensearch-c1`:
```yml
node.name: opensearch-c1
```
Every node is a coordinating node by default, so to make this node a dedicated coordinating node, set `node.master`, `node.data`, and `node.ingest` to `false`:
```yml
node.master: false
node.data: false
node.ingest: false
```
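To double-check the combinations used in this step, the following Python sketch maps the three boolean settings to the roles they imply. It is a simplified model for illustration only: `node_roles` is a hypothetical helper, and the defaults mirror this page's statement that every node is master-eligible, data, and ingest unless you configure it otherwise.

```python
def node_roles(master=True, data=True, ingest=True):
    """Return the roles implied by node.master, node.data, and node.ingest."""
    roles = []
    if master:
        roles.append("master-eligible")
    if data:
        roles.append("data")
    if ingest:
        roles.append("ingest")
    # Every node coordinates requests; with no other role it is coordinating-only.
    return roles or ["coordinating-only"]


# The four nodes configured in this step.
cluster = {
    "opensearch-master": node_roles(master=True, data=False, ingest=False),
    "opensearch-d1": node_roles(master=True, data=True, ingest=True),
    "opensearch-d2": node_roles(master=True, data=True, ingest=True),
    "opensearch-c1": node_roles(master=False, data=False, ingest=False),
}

for name, roles in cluster.items():
    print(name, roles)
```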
## Step 3: Bind a cluster to specific IP addresses
`network.host` defines the IP address that's used to bind the node. By default, OpenSearch binds only to localhost (the loopback address), which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6:
```yml
network.host: [_local_, _site_]
```
To form a multi-node cluster, specify the IP address of the node:
```yml
network.host: <IP address of the node>
```
Make sure to configure these settings on all of your nodes.
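If you're unsure whether a given address counts as loopback or site-local, the following Python sketch gives a rough classification using the standard `ipaddress` module. It is an approximation for illustration, not the logic OpenSearch itself uses: it assumes that site-local IPv4 means the RFC 1918 private ranges, and `classify` is a hypothetical helper.

```python
import ipaddress

# Assumption: site-local IPv4 addresses are the RFC 1918 private ranges.
SITE_LOCAL_V4 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify(address):
    """Return '_local_', '_site_', or 'other' for an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "_local_"
    if ip.version == 4 and any(ip in network for network in SITE_LOCAL_V4):
        return "_site_"
    if ip.version == 6 and ip.is_site_local:
        return "_site_"
    return "other"


for candidate in ("127.0.0.1", "::1", "10.0.0.225", "8.8.8.8"):
    print(candidate, classify(candidate))
```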
## Step 4: Configure discovery hosts for a cluster
Now that you've configured the network hosts, you need to configure the discovery hosts.
Zen Discovery is the built-in, default mechanism that uses [unicast](https://en.wikipedia.org/wiki/Unicast) to find other nodes in the cluster.
You can generally just add all of your master-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other master-eligible nodes, determines which one is the master, and asks to join the cluster.
For example, for `opensearch-master` the line looks something like this:
```yml
discovery.seed_hosts: ["<private IP of opensearch-d1>", "<private IP of opensearch-d2>", "<private IP of opensearch-c1>"]
```
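If you generate configuration files with a script, a sketch like the following can render the `discovery.seed_hosts` line for each node. The IP addresses and the `seed_hosts_line` helper are placeholders for illustration. The sketch leaves out the node's own address, as in the example above.

```python
import json


def seed_hosts_line(seed_ips, own_ip=None):
    """Render a discovery.seed_hosts line that lists the other seed nodes."""
    hosts = [ip for ip in seed_ips if ip != own_ip]
    # A JSON array is also a valid YAML flow sequence.
    return "discovery.seed_hosts: " + json.dumps(hosts)


# Hypothetical private IPs; replace them with the addresses of your own nodes.
line = seed_hosts_line(["10.0.0.10", "10.0.0.225", "10.0.0.74"], own_ip="10.0.0.10")
print(line)
```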
## Step 5: Start the cluster
After you set the configurations, start OpenSearch on all nodes.
```bash
sudo systemctl start opensearch.service
```
Then open the log file to see the formation of the cluster:
```bash
less /var/log/opensearch/opensearch-cluster.log
```
Run the following `_cat` query on any node to see all the nodes that formed the cluster:
```bash
curl -XGET "https://<private-ip>:9200/_cat/nodes?v" -u 'admin:admin' --insecure
```
```
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
x.x.x.x 13 61 0 0.02 0.04 0.05 mi * opensearch-master
x.x.x.x 16 60 0 0.06 0.05 0.05 md - opensearch-d1
x.x.x.x 34 38 0 0.12 0.07 0.06 md - opensearch-d2
x.x.x.x 23 38 0 0.12 0.07 0.06 md - opensearch-c1
```
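When you check many nodes, it can help to parse this output in a script. The following Python sketch splits whitespace-separated `_cat` output into one dictionary per row and picks out the elected master, which is the row marked with `*`. The sample text is copied from the preceding output, and `parse_cat_table` is a hypothetical helper that assumes no column value contains spaces.

```python
SAMPLE = """\
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
x.x.x.x 13 61 0 0.02 0.04 0.05 mi * opensearch-master
x.x.x.x 16 60 0 0.06 0.05 0.05 md - opensearch-d1
x.x.x.x 34 38 0 0.12 0.07 0.06 md - opensearch-d2
x.x.x.x 23 38 0 0.12 0.07 0.06 md - opensearch-c1
"""


def parse_cat_table(text):
    """Turn whitespace-separated _cat output into a list of row dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


nodes = parse_cat_table(SAMPLE)
# The elected master is the row whose "master" column contains "*".
elected = [node["name"] for node in nodes if node["master"] == "*"]
print(elected)
```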
To better understand and monitor your cluster, use the [cat API](../catapis/).
## (Advanced) Step 6: Configure shard allocation awareness or forced awareness
If your nodes are spread across several geographical zones, you can configure shard allocation awareness to allocate all replica shards to a zone that's different from their primary shard.
With shard allocation awareness, if the nodes in one of your zones fail, you can be assured that your replica shards are spread across your other zones. It adds a layer of fault tolerance to ensure your data survives a zone failure beyond just individual node failures.
To configure shard allocation awareness, add zone attributes to `opensearch-d1` and `opensearch-d2`, respectively:
```yml
node.attr.zone: zoneA
```
```yml
node.attr.zone: zoneB
```
Update the cluster settings:
```json
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.awareness.attributes": "zone"
}
}
```
You can use either `persistent` or `transient` settings. We recommend the `persistent` setting because it persists through a cluster reboot. Transient settings do not persist through a cluster reboot.
Shard allocation awareness attempts to separate primary and replica shards across multiple zones. However, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone.
Another option is to require that primary and replica shards are never allocated to the same zone. This is called forced awareness.
To configure forced awareness, specify all the possible values for your zone attributes:
```json
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.awareness.attributes": "zone",
"cluster.routing.allocation.awareness.force.zone.values":["zoneA", "zoneB"]
}
}
```
Now, if a data node fails, forced awareness does not allocate the replicas to a node in the same zone. Instead, the cluster enters a yellow state and only allocates the replicas when nodes in another zone come online.
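The difference between the two modes can be summarized in a small model. The following Python sketch is a deliberate simplification and not the actual allocator: it considers only zones, ignores disk space and per-node rules, and uses a hypothetical `place_replica` helper.

```python
def place_replica(primary_zone, live_zones, forced=False):
    """Choose a zone for a replica, or return None if it stays unassigned."""
    other_zones = sorted(zone for zone in live_zones if zone != primary_zone)
    if other_zones:
        # Both modes prefer a zone other than the primary's.
        return other_zones[0]
    if forced:
        # Forced awareness: wait until a node in another zone comes online.
        return None
    if primary_zone in live_zones:
        # Plain awareness: fall back to the surviving zone (on another node).
        return primary_zone
    return None


print(place_replica("zoneA", {"zoneA", "zoneB"}))
print(place_replica("zoneA", {"zoneA"}))
print(place_replica("zoneA", {"zoneA"}, forced=True))
```

In this model, the third call returns `None`, which corresponds to the unassigned replicas behind the yellow cluster state.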
In our two-zone architecture, we can use allocation awareness if `opensearch-d1` and `opensearch-d2` are less than 50% utilized, so that each of them has the storage capacity to allocate replicas in the same zone.
If that is not the case, and `opensearch-d1` and `opensearch-d2` do not have the capacity to contain all primary and replica shards, we can use forced awareness. This approach helps to make sure that, in the event of a failure, OpenSearch doesn't overload your last remaining zone and lock up your cluster due to lack of storage.
Choosing allocation awareness or forced awareness depends on how much space you might need in each zone to balance your primary and replica shards.
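The 50% guideline can be expressed as a quick check. The following Python sketch assumes two equally sized zones and one replica per primary, so each zone holds one full copy of the data and the surviving zone's usage roughly doubles if replicas are rebuilt there. It ignores disk watermarks and other headroom, and the function names are illustrative.

```python
def fits_after_zone_loss(used_gb, capacity_gb):
    """True if the surviving zone can hold both primaries and rebuilt replicas."""
    return 2 * used_gb < capacity_gb


def suggest_mode(used_gb, capacity_gb):
    """Suggest a mode based on the per-zone storage in use."""
    if fits_after_zone_loss(used_gb, capacity_gb):
        return "allocation awareness"
    return "forced awareness"


# Hypothetical per-zone numbers: 400 GB or 600 GB used out of 1,000 GB.
print(suggest_mode(400, 1000))
print(suggest_mode(600, 1000))
```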
## (Advanced) Step 7: Set up a hot-warm architecture
You can design a hot-warm architecture where you first index your data to hot nodes (fast and expensive) and, after a certain period of time, move it to warm nodes (slow and cheap).
If you analyze time series data that you rarely update and want the older data to go onto cheaper storage, this architecture can be a good fit.
This architecture helps save money on storage costs. Rather than increasing the number of hot nodes and using fast, expensive storage, you can add warm nodes for data that you don't access as frequently.
To configure a hot-warm storage architecture, add `temp` attributes to `opensearch-d1` and `opensearch-d2`, respectively:
```yml
node.attr.temp: hot
```
```yml
node.attr.temp: warm
```
You can set the attribute name and value to whatever you want as long as it's consistent for all your hot and warm nodes.
To add an index `newindex` to the hot node:
```json
PUT newindex
{
"settings": {
"index.routing.allocation.require.temp": "hot"
}
}
```
Take a look at the following shard allocation for `newindex`:
```json
GET _cat/shards/newindex?v
index shard prirep state docs store ip node
newindex 2 p STARTED 0 230b 10.0.0.225 opensearch-d1
newindex 2 r UNASSIGNED
newindex 3 p STARTED 0 230b 10.0.0.225 opensearch-d1
newindex 3 r UNASSIGNED
newindex 4 p STARTED 0 230b 10.0.0.225 opensearch-d1
newindex 4 r UNASSIGNED
newindex 1 p STARTED 0 230b 10.0.0.225 opensearch-d1
newindex 1 r UNASSIGNED
newindex 0 p STARTED 0 230b 10.0.0.225 opensearch-d1
newindex 0 r UNASSIGNED
```
In this example, all primary shards are allocated to `opensearch-d1`, which is our hot node. All replica shards are unassigned because we're forcing this index to allocate only to hot nodes, and a replica can't share a node with its primary on our single hot node.
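The following Python sketch models why the replicas have nowhere to go. It is a simplified illustration, not the real allocation logic: a node is eligible only if it matches every required attribute and doesn't already hold a copy of the same shard. The `NODES` mapping and the `eligible_nodes` helper are hypothetical.

```python
# The two data nodes and their temp attributes from this step.
NODES = {
    "opensearch-d1": {"temp": "hot"},
    "opensearch-d2": {"temp": "warm"},
}


def eligible_nodes(nodes, require, holding_copy):
    """Nodes that match all required attributes and hold no copy of the shard."""
    return sorted(
        name
        for name, attrs in nodes.items()
        if name not in holding_copy
        and all(attrs.get(key) == value for key, value in require.items())
    )


# The primary can go to the hot node.
primary_targets = eligible_nodes(NODES, {"temp": "hot"}, holding_copy=set())
# The replica needs a second hot node, and there isn't one.
replica_targets = eligible_nodes(NODES, {"temp": "hot"}, holding_copy={"opensearch-d1"})
print(primary_targets, replica_targets)
```

Adding a second node with `temp` set to `hot` gives the replicas an eligible target in this model.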
To add an index `oldindex` to the warm node:
```json
PUT oldindex
{
"settings": {
"index.routing.allocation.require.temp": "warm"
}
}
```
Take a look at the following shard allocation for `oldindex`:
```json
GET _cat/shards/oldindex?v
index shard prirep state docs store ip node
oldindex 2 p STARTED 0 230b 10.0.0.74 opensearch-d2
oldindex 2 r UNASSIGNED
oldindex 3 p STARTED 0 230b 10.0.0.74 opensearch-d2
oldindex 3 r UNASSIGNED
oldindex 4 p STARTED 0 230b 10.0.0.74 opensearch-d2
oldindex 4 r UNASSIGNED
oldindex 1 p STARTED 0 230b 10.0.0.74 opensearch-d2
oldindex 1 r UNASSIGNED
oldindex 0 p STARTED 0 230b 10.0.0.74 opensearch-d2
oldindex 0 r UNASSIGNED
```
In this case, all primary shards are allocated to `opensearch-d2`. Again, all replica shards are unassigned because we only have one warm node.
A popular approach is to configure your [index templates](../index-templates/) to set the `index.routing.allocation.require.temp` value to `hot`. This way, OpenSearch stores your most recent data on your hot nodes.
You can then use the [Index State Management (ISM)](../../ism/index/) plugin to periodically check the age of an index and specify actions to take on it. For example, when the index reaches a specific age, change the `index.routing.allocation.require.temp` setting to `warm` to automatically move your data from hot nodes to warm nodes.
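ISM handles this transition for you, but the underlying decision is simple. The following Python sketch shows the idea: choose a tier from the index age, then build the settings body that would be sent to the index's `_settings` endpoint. The seven-day threshold and the helper names are assumptions made for illustration.

```python
import json


def tier_for_age(age_days, warm_after_days=7):
    """Pick 'hot' or 'warm' from the index age (the threshold is an assumption)."""
    return "warm" if age_days >= warm_after_days else "hot"


def routing_settings(tier):
    """Settings body that pins an index to nodes with the given temp attribute."""
    return {"index.routing.allocation.require.temp": tier}


# A 30-day-old index moves to the warm tier.
body = json.dumps(routing_settings(tier_for_age(30)))
print("PUT oldindex/_settings")
print(body)
```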
## Next steps
If you are using the security plugin, the previous request to `_cat/nodes?v` might have failed with an initialization error. To initialize the plugin, run `opensearch/plugins/opensearch_security/tools/securityadmin.sh`. A sample command that uses the demo certificates might look like this:
```bash
sudo ./securityadmin.sh -cd ../securityconfig/ -icl -nhnv -cacert /etc/opensearch/root-ca.pem -cert /etc/opensearch/kirk.pem -key /etc/opensearch/kirk-key.pem -h <private-ip>
```
For full guidance around configuration options, see [Security configuration](../../security/configuration).

View File

@ -0,0 +1,18 @@
---
layout: default
title: Common REST Parameters
parent: OpenSearch
nav_order: 91
---
# Common REST parameters
OpenSearch supports the following parameters for all REST operations:
Option | Description | Example
:--- | :--- | :---
Human-readable output | To convert output units to human-readable values (for example, `1h` for 1 hour and `1kb` for 1,024 bytes), add `?human=true` to the request URL. | `GET <index_name>/_search?human=true`
Pretty result | To get back JSON responses in a readable format, add `?pretty=true` to the request URL. | `GET <index_name>/_search?pretty=true`
Content type | To specify the type of content in the request body, use the `Content-Type` key name in the request header. Most operations support JSON, YAML, and CBOR formats. | `POST _scripts/<template_name> -H 'Content-Type: application/json'`
Request body in query string | If the client library does not accept a request body for non-POST requests, use the `source` query string parameter to pass the request body. Also, specify the `source_content_type` parameter with a supported media type such as `application/json`. | `GET _search?source_content_type=application/json&source={"query":{"match_all":{}}}`
Stack traces | To include the error stack trace in the response when an exception is raised, add `?error_trace=true` to the request URL. | `GET <index_name>/_search?error_trace=true`
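When you pass the request body in the query string, the JSON must be URL-encoded. The following Python sketch builds such a URL with the standard library and decodes it again to confirm that the parameters survive the round trip. The base URL and the `search_url` helper are placeholders for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base, query, **flags):
    """Build a _search URL that carries the request body in the source parameter."""
    params = {
        "source_content_type": "application/json",
        "source": json.dumps(query, separators=(",", ":")),
    }
    params.update(flags)  # for example, pretty="true" or human="true"
    return base.rstrip("/") + "/_search?" + urlencode(params)


# Hypothetical endpoint; replace it with the address of your own cluster.
url = search_url("https://localhost:9200", {"query": {"match_all": {}}}, pretty="true")
decoded = parse_qs(urlsplit(url).query)
print(url)
```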

Some files were not shown because too many files have changed in this diff Show More