diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..903c8bf2 --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +_site +.sass-cache +.jekyll-metadata +.DS_Store +Gemfile.lock diff --git a/404.md b/404.md new file mode 100644 index 00000000..f21d3a51 --- /dev/null +++ b/404.md @@ -0,0 +1,3 @@ +--- +permalink: /404.html +--- diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c4b6a1c5..83943154 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -11,7 +11,7 @@ information to effectively respond to your bug report or contribution. We welcome you to use the GitHub issue tracker to report bugs or suggest features. -When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already +When filing an issue, please check [existing open](https://github.com/opensearch-project/documentation-website/issues), or [recently closed](https://github.com/opensearch-project/documentation-website/issues?q=is%3Aissue+is%3Aclosed), issues to make sure somebody else hasn't already reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: * A reproducible test case or series of steps @@ -23,7 +23,7 @@ reported the issue. Please try to include as much information as you can. Detail ## Contributing via Pull Requests Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: -1. You are working against the latest source on the *main* branch. +1. You are working against the latest source on the *master* branch. 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 
@@ -41,7 +41,7 @@ GitHub provides additional document on [forking a repository](https://help.githu ## Finding contributions to work on -Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. +Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/opensearch-project/documentation-website/issues?q=is%3Aissue+label%3A%22help+wanted%22+is%3Aopen) issues is a great place to start. ## Code of Conduct @@ -56,4 +56,6 @@ If you discover a potential security issue in this project we ask that you notif ## Licensing -See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. +See the [LICENSE](https://github.com/opensearch-project/documentation-website/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. + +We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. diff --git a/Gemfile b/Gemfile new file mode 100644 index 00000000..053f5b20 --- /dev/null +++ b/Gemfile @@ -0,0 +1,32 @@ +source "https://rubygems.org" + +# Hello! This is where you manage which Jekyll version is used to run. +# When you want to use a different version, change it below, save the +# file and run `bundle install`. Run Jekyll with `bundle exec`, like so: +# +# bundle exec jekyll serve +# +# This will help ensure the proper Jekyll version is running. +# Happy Jekylling! +# gem "jekyll", "~> 3.9.0" + +# This is the default theme for new Jekyll sites. 
You may change this to anything you like. +gem "just-the-docs", "~> 0.3.3" + +# If you want to use GitHub Pages, remove the "gem "jekyll"" above and +# uncomment the line below. To upgrade, run `bundle update github-pages`. + +gem 'github-pages', group: :jekyll_plugins + +# If you have any plugins, put them here! +# group :jekyll_plugins do +# # gem "jekyll-feed", "~> 0.6" +# gem "jekyll-remote-theme" +# gem "jekyll-redirect-from" +# end + +# Windows does not include zoneinfo files, so bundle the tzinfo-data gem +gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby] + +# Performance-booster for watching directories on Windows +gem "wdm", "~> 0.1.0" if Gem.win_platform? diff --git a/README.md b/README.md index 847260ca..c3c48e48 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,276 @@ -## My Project +# OpenSearch documentation -TODO: Fill this README out! +This repository contains the documentation for OpenSearch, the search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more. You can find the rendered documentation at [docs-beta.opensearch.org](https://docs-beta.opensearch.org). -Be sure to: +Community contributions remain essential in keeping this documentation comprehensive, useful, well-organized, and up-to-date. -* Change the title in this README -* Edit your repository description on GitHub -## Security +## How you can help -See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. +- Do you work on one of the various OpenSearch plugins? Take a look at the documentation for the plugin. Is everything accurate? Will anything change in the near future? -## License + Often, engineering teams can keep existing documentation up-to-date with minimal effort, thus freeing up the documentation team to focus on larger projects. -This project is licensed under the Apache-2.0 License. +- Do you have expertise in a particular area of OpenSearch? 
Cluster sizing? The query DSL? Painless scripting? Aggregations? JVM settings? Take a look at the [current content](https://docs-beta.opensearch.org/docs/opensearch/) and see where you can add value. The [documentation team](#points-of-contact) is happy to help you polish and organize your drafts. +- Are you an OpenSearch Dashboards expert? How did you set up your visualizations? Why is a particular dashboard so valuable to your organization? We have [very little](https://docs-beta.opensearch.org/docs/opensearch-dashboards/) content on how to use OpenSearch Dashboards, only on how to install it. + +- Are you a web developer? Do you want to add an optional dark mode to the documentation? A "copy to clipboard" button for our code samples? Other improvements to the design or usability? See [major changes](#major-changes) for information on building the website locally. + +- Our [issue tracker](https://github.com/opensearch-project/documentation-website/issues) contains documentation bugs and other content gaps, some of which have colorful labels like "good first issue" and "help wanted." + + +## Points of contact + +If you encounter problems or have questions when contributing to the documentation, these people can help: + +- [aetter](https://github.com/aetter) +- [ashwinkumar12345](https://github.com/ashwinkumar12345) +- [keithhc2](https://github.com/keithhc2) +- [snyder114](https://github.com/snyder114) + + +## How we build the website + +After each commit to this repository, GitHub Pages automatically uses [Jekyll](https://jekyllrb.com) to rebuild the [website](https://docs-beta.opensearch.org). The whole process takes around 30 seconds. + +This repository contains many [Markdown](https://guides.github.com/features/mastering-markdown/) files in the `/docs` directory. Each Markdown file corresponds to one page on the website. 
For example, the Markdown file for [this page](https://docs-beta.opensearch.org/docs/opensearch/) is [here](https://github.com/opensearch-project/documentation-website/blob/master/docs/opensearch/index.md). + +Using plain text on GitHub has many advantages: + +- Everything is free, open source, and works on every operating system. Use your favorite text editor, Ruby, Jekyll, and Git. +- Markdown is easy to learn and looks good in side-by-side diffs. +- The workflow is no different than contributing code. Make your changes, build locally to check your work, and submit a pull request. Reviewers check the PR before merging. +- Alternatives like wikis and WordPress are full web applications that require databases and ongoing maintenance. They also have inferior versioning and content review processes compared to Git. Static websites, such as the ones Jekyll produces, are faster, more secure, and more stable. + +In addition to the content for a given page, each Markdown file contains some Jekyll [front matter](https://jekyllrb.com/docs/front-matter/). Front matter looks like this: + +``` +--- +layout: default +title: Alerting security +nav_order: 10 +parent: Alerting +has_children: false +--- +``` + +If you're making [trivial changes](#trivial-changes), you don't have to worry about front matter. + +If you want to reorganize content or add new pages, keep an eye on `has_children`, `parent`, and `nav_order`, which define the hierarchy and order of pages in the lefthand navigation. For more information, see the documentation for [our upstream Jekyll theme](https://pmarsceill.github.io/just-the-docs/docs/navigation-structure/). + + +## Contribute content + +There are three ways to contribute content, depending on the magnitude of the change. + +- [Trivial changes](#trivial-changes) +- [Minor changes](#minor-changes) +- [Major changes](#major-changes) + + +### Trivial changes + +If you just need to fix a typo or add a sentence, this web-based method works well: + +1. 
On any page in the documentation, click the **Edit this page** link in the lower-left. + +1. Make your changes. + +1. Choose **Create a new branch for this commit and start a pull request** and **Commit changes**. + + +### Minor changes + +If you want to add a few paragraphs across multiple files and are comfortable with Git, try this approach: + +1. Fork this repository. + +1. Download [GitHub Desktop](https://desktop.github.com), install it, and clone your fork. + +1. Navigate to the repository root. + +1. Create a new branch. + +1. Edit the Markdown files in `/docs`. + +1. Commit, push your changes to your fork, and submit a pull request. + + +### Major changes + +If you're making major changes to the documentation and need to see the rendered HTML before submitting a pull request, here's how to build locally: + +1. Fork this repository. + +1. Download [GitHub Desktop](https://desktop.github.com), install it, and clone your fork. + +1. Navigate to the repository root. + +1. Install [Ruby](https://www.ruby-lang.org/en/) if you don't already have it. We recommend [RVM](https://rvm.io/), but use whatever method you prefer: + + ``` + curl -sSL https://get.rvm.io | bash -s stable + rvm install 2.6 + ruby -v + ``` + +1. Install [Jekyll](https://jekyllrb.com/) if you don't already have it: + + ``` + gem install bundler jekyll + ``` + +1. Install dependencies: + + ``` + bundle install + ``` + +1. Build: + + ``` + sh build.sh + ``` + +1. If the build script doesn't automatically open your web browser (it should), open [http://localhost:4000/](http://localhost:4000/). + +1. Create a new branch. + +1. Edit the Markdown files in `/docs`. + + If you're a web developer, you can customize `_layouts/default.html` and `_sass/custom/custom.scss`. + +1. When you save a file, marvel as Jekyll automatically rebuilds the site and refreshes your web browser. This process takes roughly 30 seconds. + +1. 
When you're happy with how everything looks, commit, push your changes to your fork, and submit a pull request. + + +## Writing tips + +1. Try to stay consistent with existing content and consistent within your new content. Don't call the same plugin KNN, k-nn, and k-NN in three different places. + +1. Shorter paragraphs are better than longer paragraphs. Use headers, tables, lists, and images to make your content easier for readers to scan. + +1. Use **bold** for user interface elements, *italics* for key terms or emphasis, and `monospace` for Bash commands, file names, REST paths, and code. + +1. Markdown file names should be all lowercase, use hyphens to separate words, and end in `.md`. + +1. Avoid future tense. Use present tense. + + **Bad**: After you click the button, the process will start. + + **Better**: After you click the button, the process starts. + +1. "You" refers to the person reading the page. "We" refers to the OpenSearch contributors. + + **Bad**: Now that we've finished the configuration, we have a working cluster. + + **Better**: At this point, you have a working cluster, but we recommend adding dedicated master nodes. + +1. Don't use "this" and "that" to refer to something without adding a noun. + + **Bad**: This can cause high latencies. + + **Better**: This additional loading time can cause high latencies. + +1. Use active voice. + + **Bad**: After the request is sent, the data is added to the index. + + **Better**: After you send the request, the OpenSearch cluster indexes the data. + +1. Introduce acronyms before using them. + + **Bad**: Reducing customer TTV should accelerate our ROIC. + + **Better**: Reducing customer time to value (TTV) should accelerate our return on invested capital (ROIC). + +1. Spell out one through nine. Start using numerals at 10. If a number needs a unit (GB, pounds, millimeters, kg, Celsius, etc.), use numerals, even if the number is smaller than 10. 
+ + **Bad**: 3 kids looked for thirteen files on a six GB hard drive. + + **Better**: Three kids looked for 13 files on a 6 GB hard drive. + + +## New releases + +1. Branch. +1. Change the `opensearch_version` and `opensearch_major_minor_version` variables in `_config.yml`. +1. Start up a new cluster using the updated Docker Compose file in `docs/install/docker.md`. +1. Update the version table in `version-history.md`. + + Use `curl -XGET https://localhost:9200 -u admin:admin -k` to verify the OpenSearch version. + +1. Update the plugin compatibility table in `docs/install/plugins.md`. + + Use `curl -XGET https://localhost:9200/_cat/plugins -u admin:admin -k` to get the correct version strings. + +1. Update the plugin compatibility table in `docs/opensearch-dashboards/plugins.md`. + + Use `docker ps` to find the ID for the OpenSearch Dashboards node. Then use `docker exec -it <container-id> /bin/bash` to get shell access. Finally, run `./bin/opensearch-dashboards-plugin list` to get the plugins and version strings. + +1. Run a build (`build.sh`), and look for any warnings or errors you introduced. +1. Verify that the individual plugin download links in `docs/install/plugins.md` and `docs/opensearch-dashboards/plugins.md` work. +1. Check for any other bad links (`check-links.sh`). Expect a few false positives for the `localhost` links. +1. Submit a PR. + + +## Classes within Markdown + +This documentation uses a modified version of the [just-the-docs](https://github.com/pmarsceill/just-the-docs) Jekyll theme, which has some useful classes for labels and buttons: + +``` +[Get started](#get-started){: .btn .btn-blue } + +## Get started +New +{: .label .label-green :} +``` + +* Labels come in default (blue), green, purple, yellow, and red. +* Buttons come in default, purple, blue, green, and outline. +* Warning, tip, and note blocks are available (`{: .warning }`, etc.). +* If an image has a white background, you can use `{: .img-border }` to add a one-pixel border to the image. 
+ +These classes can help with readability, but should be used *sparingly*. Each addition of a class damages the portability of the Markdown files and makes moving to a different Jekyll theme (or a different static site generator) more difficult. + +Besides, standard Markdown elements suffice for most documentation. + + +## Math + +If you want to use the sorts of pretty formulas that [MathJax](https://www.mathjax.org) allows, add `has_math: true` to the Jekyll page metadata. Then insert LaTeX math into HTML tags with the rest of your Markdown content: + +``` +## Math + +Some Markdown paragraph. Here's a formula: + +

+ When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are + \[x = {-b \pm \sqrt{b^2-4ac} \over 2a}.\] +

+ +And back to Markdown. +``` + + +## Code of conduct + +This project has adopted an [Open Source Code of Conduct](https://opensearch.org/codeofconduct.html). + + +## Security issue notifications + +If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. + + +## Licensing + +See the [LICENSE](./LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. + + +## Copyright + +Copyright Amazon.com, Inc. or its affiliates. All rights reserved. diff --git a/THIRD-PARTY b/THIRD-PARTY new file mode 100644 index 00000000..da1263c9 --- /dev/null +++ b/THIRD-PARTY @@ -0,0 +1,11 @@ +** (MIT License) Just the Docs 0.3.3 - https://github.com/pmarsceill/just-the-docs + + Copyright (c) 2016 Patrick Marsceill + +** (MIT License) Jekyll Pure Liquid Table of Contents 1.1.0 - https://github.com/allejo/jekyll-toc + + Copyright (c) 2017 Vladimir Jimenez + +** (MIT License) Bootstrap Icons 1.4.1 - https://github.com/twbs/icons + + Copyright (c) 2019-2020 The Bootstrap Authors diff --git a/_config.yml b/_config.yml new file mode 100644 index 00000000..4b3af416 --- /dev/null +++ b/_config.yml @@ -0,0 +1,98 @@ +# Welcome to Jekyll! +# +# This config file is meant for settings that affect your whole blog, values +# which you are expected to set up once and rarely edit after that. If you find +# yourself editing this file very often, consider using Jekyll's data files +# feature for the data you need to update frequently. +# +# For technical reasons, this file is *NOT* reloaded automatically when you use +# 'bundle exec jekyll serve'. If you change this file, please restart the server process. + +# Site settings +# These are used to personalize your new site. 
If you look in the HTML files, +# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on. +# You can create any custom variable you would like, and they will be accessible +# in the templates via {{ site.myvariable }}. +title: OpenSearch documentation +description: >- # this means to ignore newlines until "baseurl:" + Documentation for OpenSearch, the Apache 2.0 search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more. +baseurl: "" # the subpath of your site, e.g. /blog +url: "https://docs-beta.opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com +permalink: pretty + +opensearch_version: 1.0.0-beta1 +opensearch_major_minor_version: 1.0 + +# Build settings +markdown: kramdown +remote_theme: pmarsceill/just-the-docs@v0.3.3 + +# Kramdown settings +kramdown: + toc_levels: 2..3 + +logo: "/assets/images/fake-logo.svg" + +# Aux links for the upper right navigation +aux_links: + "Back to OpenSearch.org": + - "https://opensearch.org/" +color_scheme: opensearch + +# Enable or disable the site search +# Supports true (default) or false +search_enabled: true + +search: + # Split pages into sections that can be searched individually + # Supports 1 - 6, default: 2 + heading_level: 2 + # Maximum amount of previews per search result + # Default: 3 + previews: 3 + # Maximum amount of words to display before a matched word in the preview + # Default: 5 + preview_words_before: 5 + # Maximum amount of words to display after a matched word in the preview + # Default: 10 + preview_words_after: 10 + # Set the search token separator + # Default: /[\s\-/]+/ + # Example: enable support for hyphenated search words + tokenizer_separator: /[\s/]+/ + # Display the relative url in search results + # Supports true (default) or false + rel_url: true + # Enable or disable the search button that appears in the bottom right corner of every page + # 
Supports true or false (default) + button: false + +# Google Analytics Tracking (optional) +# e.g, UA-1234567-89 +ga_tracking: UA-135423944-1 + +# Disable the just-the-docs theme anchor links in favor of our custom ones +# See _includes/head_custom.html +heading_anchors: false + +# Adds on-hover anchor links to h2-h6 +anchor_links: true + +footer_content: + +plugins: + - jekyll-remote-theme + - jekyll-redirect-from + +# Exclude from processing. +# The following items will not be processed, by default. Create a custom list +# to override the default setting. +exclude: + - Gemfile + - Gemfile.lock + - node_modules + - vendor/bundle/ + - vendor/cache/ + - vendor/gems/ + - vendor/ruby/ + - README.md diff --git a/_includes/head_custom.html b/_includes/head_custom.html new file mode 100755 index 00000000..e40a90b7 --- /dev/null +++ b/_includes/head_custom.html @@ -0,0 +1,21 @@ +{% if site.anchor_links != nil %} + +{% endif %} + +{% if page.has_math == true %} + + +{% endif %} + + + + + + + diff --git a/_includes/nav.html b/_includes/nav.html new file mode 100644 index 00000000..555d6a5b --- /dev/null +++ b/_includes/nav.html @@ -0,0 +1,106 @@ + diff --git a/_includes/toc.html b/_includes/toc.html new file mode 100644 index 00000000..8c710072 --- /dev/null +++ b/_includes/toc.html @@ -0,0 +1,182 @@ +{% capture tocWorkspace %} + {% comment %} + Copyright (c) 2017 Vladimir "allejo" Jimenez + + Permission is hereby granted, free of charge, to any person + obtaining a copy of this software and associated documentation + files (the "Software"), to deal in the Software without + restriction, including without limitation the rights to use, + copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the + Software is furnished to do so, subject to the following + conditions: + + The above copyright notice and this permission notice shall be + included in all copies or substantial portions of the Software. 
+ + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES + OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT + HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, + WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + OTHER DEALINGS IN THE SOFTWARE. + {% endcomment %} + {% comment %} + Version 1.1.0 + https://github.com/allejo/jekyll-toc + + "...like all things liquid - where there's a will, and ~36 hours to spare, there's usually a/some way" ~jaybe + + Usage: + {% include toc.html html=content sanitize=true class="inline_toc" id="my_toc" h_min=2 h_max=3 %} + + Parameters: + * html (string) - the HTML of compiled markdown generated by kramdown in Jekyll + + Optional Parameters: + * sanitize (bool) : false - when set to true, the headers will be stripped of any HTML in the TOC + * class (string) : '' - a CSS class assigned to the TOC + * id (string) : '' - an ID to assigned to the TOC + * h_min (int) : 1 - the minimum TOC header level to use; any header lower than this value will be ignored + * h_max (int) : 6 - the maximum TOC header level to use; any header greater than this value will be ignored + * ordered (bool) : false - when set to true, an ordered list will be outputted instead of an unordered list + * item_class (string) : '' - add custom class(es) for each list item; has support for '%level%' placeholder, which is the current heading level + * submenu_class (string) : '' - add custom class(es) for each child group of headings; has support for '%level%' placeholder which is the current "submenu" heading level + * base_url (string) : '' - add a base url to the TOC links for when your TOC is on another page than the actual content + * anchor_class (string) : '' - add custom class(es) for each anchor element + * skip_no_ids (bool) : 
false - skip headers that do not have an `id` attribute + + Output: + An ordered or unordered list representing the table of contents of a markdown block. This snippet will only + generate the table of contents and will NOT output the markdown given to it + {% endcomment %} + + {% capture newline %} + {% endcapture %} + {% assign newline = newline | rstrip %} + + {% capture deprecation_warnings %}{% endcapture %} + + {% if include.baseurl %} + {% capture deprecation_warnings %}{{ deprecation_warnings }}{{ newline }}{% endcapture %} + {% endif %} + + {% if include.skipNoIDs %} + {% capture deprecation_warnings %}{{ deprecation_warnings }}{{ newline }}{% endcapture %} + {% endif %} + + {% capture jekyll_toc %}{% endcapture %} + {% assign orderedList = include.ordered | default: false %} + {% assign baseURL = include.base_url | default: include.baseurl | default: '' %} + {% assign skipNoIDs = include.skip_no_ids | default: include.skipNoIDs | default: false %} + {% assign minHeader = include.h_min | default: 1 %} + {% assign maxHeader = include.h_max | default: 6 %} + {% assign nodes = include.html | strip | split: ' maxHeader %} + {% continue %} + {% endif %} + + {% assign _workspace = node | split: '' | first }}>{% endcapture %} + {% assign header = _workspace[0] | replace: _hAttrToStrip, '' %} + + {% if include.item_class and include.item_class != blank %} + {% capture listItemClass %} class="{{ include.item_class | replace: '%level%', currLevel | split: '.' | join: ' ' }}"{% endcapture %} + {% endif %} + + {% if include.submenu_class and include.submenu_class != blank %} + {% assign subMenuLevel = currLevel | minus: 1 %} + {% capture subMenuClass %} class="{{ include.submenu_class | replace: '%level%', subMenuLevel | split: '.' 
| join: ' ' }}"{% endcapture %} + {% endif %} + + {% capture anchorBody %}{% if include.sanitize %}{{ header | strip_html }}{% else %}{{ header }}{% endif %}{% endcapture %} + + {% if htmlID %} + {% capture anchorAttributes %} href="{% if baseURL %}{{ baseURL }}{% endif %}#{{ htmlID }}"{% endcapture %} + + {% if include.anchor_class %} + {% capture anchorAttributes %}{{ anchorAttributes }} class="{{ include.anchor_class | split: '.' | join: ' ' }}"{% endcapture %} + {% endif %} + + {% capture listItem %}{{ anchorBody }}{% endcapture %} + {% elsif skipNoIDs == true %} + {% continue %} + {% else %} + {% capture listItem %}{{ anchorBody }}{% endcapture %} + {% endif %} + + {% if currLevel > lastLevel %} + {% capture jekyll_toc %}{{ jekyll_toc }}<{{ listModifier }}{{ subMenuClass }}>{% endcapture %} + {% elsif currLevel < lastLevel %} + {% assign repeatCount = lastLevel | minus: currLevel %} + + {% for i in (1..repeatCount) %} + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% endfor %} + + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% else %} + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% endif %} + + {% capture jekyll_toc %}{{ jekyll_toc }}{{ listItem }}{% endcapture %} + + {% assign lastLevel = currLevel %} + {% assign firstHeader = false %} + {% endfor %} + + {% assign repeatCount = minHeader | minus: 1 %} + {% assign repeatCount = lastLevel | minus: repeatCount %} + {% for i in (1..repeatCount) %} + {% capture jekyll_toc %}{{ jekyll_toc }}{% endcapture %} + {% endfor %} + + {% if jekyll_toc != '' %} + {% assign rootAttributes = '' %} + {% if include.class and include.class != blank %} + {% capture rootAttributes %} class="{{ include.class | split: '.' 
| join: ' ' }}"{% endcapture %} + {% endif %} + + {% if include.id and include.id != blank %} + {% capture rootAttributes %}{{ rootAttributes }} id="{{ include.id }}"{% endcapture %} + {% endif %} + + {% if rootAttributes %} + {% assign nodes = jekyll_toc | split: '>' %} + {% capture jekyll_toc %}<{{ listModifier }}{{ rootAttributes }}>{{ nodes | shift | join: '>' }}>{% endcapture %} + {% endif %} + {% endif %} +{% endcapture %}{% assign tocWorkspace = '' %}{{ deprecation_warnings }}{{ jekyll_toc }} diff --git a/_layouts/default.html b/_layouts/default.html new file mode 100755 index 00000000..b1c1ff0a --- /dev/null +++ b/_layouts/default.html @@ -0,0 +1,219 @@ +--- +layout: table_wrappers +--- + + + + +{% include head.html %} + + + + Link + + + + + + Search + + + + + + Menu + + + + + + Expand + + + + + + Document + + + + + + External + + + + + + + + +
+
+ {% if site.search_enabled != false %} + + {% endif %} + {% include header_custom.html %} + {% if site.aux_links %} + + {% endif %} +
+
+ {% unless page.url == "/" %} + {% if page.parent %} + {%- for node in pages_list -%} + {%- if node.parent == nil -%} + {%- if page.parent == node.title or page.grand_parent == node.title -%} + {%- assign first_level_url = node.url | absolute_url -%} + {%- endif -%} + {%- if node.has_children -%} + {%- assign children_list = pages_list | where: "parent", node.title -%} + {%- for child in children_list -%} + {%- if page.url == child.url or page.parent == child.title -%} + {%- assign second_level_url = child.url | absolute_url -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + {%- endif -%} + {%- endfor -%} + + {% endif %} + {% endunless %} +
+ {% if site.heading_anchors != false %} + {% include vendor/anchor_headings.html html=content beforeHeading="true" anchorBody="" anchorClass="anchor-heading" anchorAttrs="aria-labelledby=\"%html_id%\"" %} + {% else %} +

Like OpenSearch itself, this documentation is a beta. It has content gaps and might contain bugs.

+ {{ content }} + {% endif %} + + {% if page.has_children == true and page.has_toc != false %} +
+

Table of contents

+
    + {%- assign children_list = pages_list | where: "parent", page.title | where: "grand_parent", page.parent -%} + {% for child in children_list %} +
  • + {{ child.title }}{% if child.summary %} - {{ child.summary }}{% endif %} +
  • + {% endfor %} +
+ {% endif %} + + {% capture footer_custom %} + {%- include footer_custom.html -%} + {% endcapture %} + {% if footer_custom != "" or site.last_edit_timestamp or site.gh_edit_link %} +
+
+ {% if site.back_to_top %} +

{{ site.back_to_top_text }}

+ {% endif %} + + {{ footer_custom }} + + {% if site.last_edit_timestamp or site.gh_edit_link %} +
+ {% if site.last_edit_timestamp and site.last_edit_time_format and page.last_modified_date %} +

+ Page last modified: {{ page.last_modified_date | date: site.last_edit_time_format }}. +

+ {% endif %} + {% if + site.gh_edit_link and + site.gh_edit_link_text and + site.gh_edit_repository and + site.gh_edit_branch and + site.gh_edit_view_mode + %} +

+ {{ site.gh_edit_link_text }} +

+ {% endif %} +
+ {% endif %} +
+ {% endif %} + +
+
+
+ {% include toc.html html=content h_min=2 h_max=2 class="toc-list" item_class="toc-item" sanitize=true %} +
+ {% if site.search_enabled != false %} + {% if site.search.button %} + + + + {% endif %} + +
+ {% endif %} +
+ + {% if site.anchor_links != nil %} + + {% endif %} + + diff --git a/_sass/color_schemes/odfe.scss b/_sass/color_schemes/odfe.scss new file mode 100644 index 00000000..f9b2ca02 --- /dev/null +++ b/_sass/color_schemes/odfe.scss @@ -0,0 +1,75 @@ +// +// Brand colors +// + +$white: #FFFFFF; + +$grey-dk-300: #241F21; // Error +$grey-dk-250: mix(white, $grey-dk-300, 12.5%); +$grey-dk-200: mix(white, $grey-dk-300, 25%); +$grey-dk-100: mix(white, $grey-dk-300, 50%); +$grey-dk-000: mix(white, $grey-dk-300, 75%); + +$grey-lt-300: #DBDBDB; // Cloud +$grey-lt-200: mix(white, $grey-lt-300, 25%); +$grey-lt-100: mix(white, $grey-lt-300, 50%); +$grey-lt-000: mix(white, $grey-lt-300, 75%); + +$blue-300: #00007C; // Meta +$blue-200: mix(white, $blue-300, 25%); +$blue-100: mix(white, $blue-300, 50%); +$blue-000: mix(white, $blue-300, 75%); + +$purple-300: #9600FF; // Prpl +$purple-200: mix(white, $purple-300, 25%); +$purple-100: mix(white, $purple-300, 50%); +$purple-000: mix(white, $purple-300, 75%); + +$green-300: #00671A; // Element +$green-200: mix(white, $green-300, 25%); +$green-100: mix(white, $green-300, 50%); +$green-000: mix(white, $green-300, 75%); + +$yellow-300: #FFDF00; // Kan-Banana +$yellow-200: mix(white, $yellow-300, 25%); +$yellow-100: mix(white, $yellow-300, 50%); +$yellow-000: mix(white, $yellow-300, 75%); + +$red-300: #BD145A; // Ruby +$red-200: mix(white, $red-300, 25%); +$red-100: mix(white, $red-300, 50%); +$red-000: mix(white, $red-300, 75%); + +$blue-lt-300: #0000FF; // Cascade +$blue-lt-200: mix(white, $blue-lt-300, 25%); +$blue-lt-100: mix(white, $blue-lt-300, 50%); +$blue-lt-000: mix(white, $blue-lt-300, 75%); + +/* +Other, unused brand colors + +Float #2797F4 +Firewall #0FF006B +Hyper Pink #F261A1 +Cluster #ED20EB +Back End #808080 +Python #25EE5C +Warm Node #FEA501 +*/ + +$body-background-color: $white; +$sidebar-color: $grey-lt-000; +$code-background-color: $grey-lt-000; + +$body-text-color: $grey-dk-200; +$body-heading-color: $grey-dk-300; 
+$nav-child-link-color: $grey-dk-200; +$link-color: mix(black, $blue-lt-300, 37.5%); +$btn-primary-color: $purple-300; +$base-button-color: $grey-lt-000; + +// $border-color: $grey-dk-200; +// $search-result-preview-color: $grey-dk-000; +// $search-background-color: $grey-dk-250; +// $table-background-color: $grey-dk-250; +// $feedback-color: darken($sidebar-color, 3%); diff --git a/_sass/color_schemes/opensearch.scss b/_sass/color_schemes/opensearch.scss new file mode 100644 index 00000000..47bff937 --- /dev/null +++ b/_sass/color_schemes/opensearch.scss @@ -0,0 +1,75 @@ +// +// Brand colors +// + +$white: #FFFFFF; + +$grey-dk-300: #002A3A; // +$grey-dk-250: mix(white, $grey-dk-300, 12.5%); +$grey-dk-200: mix(white, $grey-dk-300, 25%); +$grey-dk-100: mix(white, $grey-dk-300, 50%); +$grey-dk-000: mix(white, $grey-dk-300, 75%); + +$grey-lt-300: #D9E1E2; // +$grey-lt-200: mix(white, $grey-lt-300, 25%); +$grey-lt-100: mix(white, $grey-lt-300, 50%); +$grey-lt-000: mix(white, $grey-lt-300, 75%); + +$blue-300: #005eb8; // +$blue-200: mix(white, $blue-300, 25%); +$blue-100: mix(white, $blue-300, 50%); +$blue-000: mix(white, $blue-300, 75%); + +$purple-300: #963CBD; // +$purple-200: mix(white, $purple-300, 25%); +$purple-100: mix(white, $purple-300, 50%); +$purple-000: mix(white, $purple-300, 75%); + +$green-300: #2cd5c4; // +$green-200: mix(white, $green-300, 25%); +$green-100: mix(white, $green-300, 50%); +$green-000: mix(white, $green-300, 75%); + +$yellow-300: #FFDF00; // +$yellow-200: mix(white, $yellow-300, 25%); +$yellow-100: mix(white, $yellow-300, 50%); +$yellow-000: mix(white, $yellow-300, 75%); + +$red-300: #F65275; // +$red-200: mix(white, $red-300, 25%); +$red-100: mix(white, $red-300, 50%); +$red-000: mix(white, $red-300, 75%); + +$blue-lt-300: #00A3E0; // +$blue-lt-200: mix(white, $blue-lt-300, 25%); +$blue-lt-100: mix(white, $blue-lt-300, 50%); +$blue-lt-000: mix(white, $blue-lt-300, 75%); + +/* +Other, unused brand colors + +Float #2797F4 +Firewall 
#0FF006B +Hyper Pink #F261A1 +Cluster #ED20EB +Back End #808080 +Python #25EE5C +Warm Node #FEA501 +*/ + +$body-background-color: $white; +$sidebar-color: $grey-lt-000; +$code-background-color: $grey-lt-000; + +$body-text-color: $grey-dk-200; +$body-heading-color: $grey-dk-300; +$nav-child-link-color: $grey-dk-200; +$link-color: mix(black, $blue-lt-300, 37.5%); +$btn-primary-color: $purple-300; +$base-button-color: $grey-lt-000; + +// $border-color: $grey-dk-200; +// $search-result-preview-color: $grey-dk-000; +// $search-background-color: $grey-dk-250; +// $table-background-color: $grey-dk-250; +// $feedback-color: darken($sidebar-color, 3%); diff --git a/_sass/custom/custom.scss b/_sass/custom/custom.scss new file mode 100755 index 00000000..5b0a48fd --- /dev/null +++ b/_sass/custom/custom.scss @@ -0,0 +1,232 @@ +@import url('https://fonts.googleapis.com/css?family=Open+Sans:400,400i,600,700'); + +// Additional variables +$table-border-color: $grey-lt-300; +$toc-width: 232px !default; +$red-dk-200: mix(black, $red-300, 25%); + +// Replaces xl size +$media-queries: ( + xs: 320px, + sm: 500px, + md: $content-width, + lg: $content-width + $nav-width, + xl: $content-width + $nav-width + $toc-width +); + +body { + padding-bottom: 6rem; + font-family: 'Open Sans', sans-serif; + @include mq(md) { + padding-bottom: 0; + } +} + +code { + font-family: "SFMono-Regular", Menlo, "DejaVu Sans Mono", "Droid Sans Mono", Consolas, Monospace; + font-size: 0.75rem; +} + +.site-nav { + padding-top: 2rem; +} + +.main-content { + ol { + > li { + &:before { + color: $grey-dk-100; + } + } + } + ul { + > li { + &:before { + color: $grey-dk-100; + } + } + } + h1, h2, h3, h4, h5, h6 { + margin-top: 2.4rem; + margin-bottom: 0.8rem; + } + .highlight { + line-height: 1.4; + } +} + +.site-title { + @include mq(md) { + padding-top: 1rem; + padding-bottom: 0.6rem; + padding-left: $sp-5; + } +} + +.external-arrow { + position: relative; + top: 0.125rem; + left: 0.25rem; +} + +img { + padding: 
1rem 0; +} + +.img-border { + border: 1px solid $grey-lt-200; +} + +// Note, tip, and warning blocks +%callout { + border: 1px solid $grey-lt-200; + border-radius: 5px; + margin: 1rem 0; + padding: 1rem; + position: relative; +} + +.note { + @extend %callout; + border-left: 5px solid $blue-300; +} + +.tip { + @extend %callout; + border-left: 5px solid $green-300; +} + +.warning { + @extend %callout; + border-left: 5px solid $red-dk-200; +} + +// Labels +.label, +.label-blue { + background-color: $blue-300; +} + +.label-green { + background-color: $green-300; +} + +.label-purple { + background-color: $purple-300; +} + +.label-red { + background-color: $red-300; +} + +.label-yellow { + color: $grey-dk-200; + background-color: $yellow-300; +} + +// Buttons +.btn-primary { + @include btn-color($white, $btn-primary-color); +} + +.btn-purple { + @include btn-color($white, $purple-300); +} + +.btn-blue { + @include btn-color($white, $blue-300); +} + +.btn-green { + @include btn-color($white, $green-300); +} + +// Tables +th, +td { + border-bottom: $border rgba($table-border-color, 0.5); + border-left: $border $table-border-color; +} + +thead { + th { + border-bottom: 1px solid $table-border-color; + } +} +td { + pre { + margin-bottom: 0; + } +} + +// Keeps labels high and tight next to headers +h1 + p.label { + margin: -23px 0 0 0; +} +h2 + p.label { + margin: -15px 0 0 0; +} +h3 + p.label { + margin: -10px 0 0 0; +} +h4 + p.label, +h5 + p.label, +h6 + p.label { + margin: -7px 0 0 0; +} + +// Modifies margins in xl layout to support TOC +.side-bar { + @include mq(xl) { + width: calc((100% - #{$nav-width + $content-width + $toc-width}) / 2 + #{$nav-width}); + min-width: $nav-width; + } +} + +.main { + @include mq(xl) { + margin-left: calc((100% - #{$nav-width + $content-width + $toc-width}) / 2 + #{$nav-width}); + } +} + +// Adds TOC to righthand side in xl layout +.toc { + display: none; + @include mq(xl) { + z-index: 0; + display: block; + position: fixed; + top: 59px; + 
right: calc((100% - #{$nav-width + $content-width + $toc-width}) / 2); + width: $toc-width; + max-height: calc(100% - 118px); + overflow: auto; + } +} + +.toc-list { + &:before { + content: "On this page"; + // Basically duplicates h4 styling + font-size: 12px; + font-weight: 300; + text-transform: uppercase; + letter-spacing: 0.1em; + color: $grey-dk-300; + line-height: 1.8; + } + border: 1px solid $border-color; + font-size: 14px; + list-style-type: none; + background-color: $sidebar-color; + padding: $sp-6 $sp-4; + margin-left: $sp-6; + margin-right: 0; + margin-bottom: 0; + overflow: auto; +} + +.toc-item { + padding-top: .25rem; + padding-bottom: .25rem; +} diff --git a/assets/examples/ldap-example.zip b/assets/examples/ldap-example.zip new file mode 100644 index 00000000..29a2ee81 Binary files /dev/null and b/assets/examples/ldap-example.zip differ diff --git a/assets/examples/saml-example.zip b/assets/examples/saml-example.zip new file mode 100644 index 00000000..fb0e0265 Binary files /dev/null and b/assets/examples/saml-example.zip differ diff --git a/assets/images/fake-logo.png b/assets/images/fake-logo.png new file mode 100644 index 00000000..6394dc76 Binary files /dev/null and b/assets/images/fake-logo.png differ diff --git a/assets/images/fake-logo.svg b/assets/images/fake-logo.svg new file mode 100644 index 00000000..7ff23f7a --- /dev/null +++ b/assets/images/fake-logo.svg @@ -0,0 +1,7 @@ + + + + + + + diff --git a/build.sh b/build.sh new file mode 100644 index 00000000..4806f144 --- /dev/null +++ b/build.sh @@ -0,0 +1 @@ +bundle exec jekyll serve --host localhost --port 4000 --incremental --livereload --open-url diff --git a/check-links.sh b/check-links.sh new file mode 100644 index 00000000..deff68cb --- /dev/null +++ b/check-links.sh @@ -0,0 +1,5 @@ +# Checks for broken links in the documentation. +# Run `bundle exec jekyll serve` first. 
+# Uses https://github.com/stevenvachon/broken-link-checker +# I have no idea why we have to exclude the ISM section, but that's the only way I can get this to run. - ae +blc http://127.0.0.1:4000/docs/ -ro --filter-level 0 --exclude http://127.0.0.1:4000/docs/docs/ism/ --exclude http://localhost:5601/ diff --git a/docs/ad/api.md b/docs/ad/api.md new file mode 100644 index 00000000..d44c7d4e --- /dev/null +++ b/docs/ad/api.md @@ -0,0 +1,2174 @@ +--- +layout: default +title: Anomaly detection API +parent: Anomaly detection +nav_order: 1 +--- + +# Anomaly detection API + +Use these anomaly detection operations to programmatically create and manage detectors. + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + +## Create anomaly detector + +Creates an anomaly detector. + +This command creates a detector named `test-detector` that finds anomalies based on the sum of the `value` field in indices that match `order*`: + + +#### Request + +```json +POST _opensearch/_anomaly_detection/detectors +{ + "name": "test-detector", + "description": "Test detector", + "time_field": "timestamp", + "indices": [ + "order*" + ], + "feature_attributes": [ + { + "feature_name": "total_order", + "feature_enabled": true, + "aggregation_query": { + "total_order": { + "sum": { + "field": "value" + } + } + } + } + ], + "filter_query": { + "bool": { + "filter": [ + { + "exists": { + "field": "value", + "boost": 1 + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + } +} +``` + +#### Sample response + +```json +{ + "_id": "m4ccEnIBTXsGi3mvMt9p", + "_version": 1, + "_seq_no": 3, + "_primary_term": 1, + "anomaly_detector": { + "name": "test-detector", + "description": "Test detector", + "time_field": "timestamp", + "indices": [ + "order*" + ], + "filter_query": { + "bool": { + "filter": [ + { + "exists": { + "field": 
"value", 
+ "boost": 1 + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "mYccEnIBTXsGi3mvMd8_", + "feature_name": "total_order", + "feature_enabled": true, + "aggregation_query": { + "total_order": { + "sum": { + "field": "value" + } + } + } + } + ] + } +} +``` + +To set a category field for high cardinality: + +#### Request + +```json +POST _opensearch/_anomaly_detection/detectors +{ + "name": "Host OK Rate Detector", + "description": "ok rate", + "time_field": "@timestamp", + "indices": [ + "host-cloudwatch" + ], + "category_field": [ + "host" + ], + "feature_attributes": [ + { + "feature_name": "latency_max", + "feature_enabled": true, + "aggregation_query": { + "latency_max": { + "max": { + "field": "latency" + } + } + } + } + ], + "window_delay": { + "period": { + "interval": 10, + "unit": "MINUTES" + } + }, + "detection_interval": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + } +} +``` + +#### Sample response + +```json +{ + "_id": "4CIGoHUBTpMGN-4KzBQg", + "_version": 1, + "_seq_no": 0, + "anomaly_detector": { + "name": "Host OK Rate Detector", + "description": "ok rate", + "time_field": "@timestamp", + "indices": [ + "server-metrics" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 10, + "unit": "MINUTES" + } + }, + "shingle_size": 1, + "schema_version": 2, + "feature_attributes": [ + { + "feature_id": "0Kld3HUBhpHMyt2e_UHn", + "feature_name": "latency_max", + "feature_enabled": true, + "aggregation_query": { + "latency_max": { + "max": { + "field": "latency" + } + } + } + } + ], + "last_update_time": 1604707601438, + "category_field": [ + "host" + ] + }, + 
"_primary_term": 1 +} +``` + +To create a historical detector: + +#### Request + +```json +POST _opensearch/_anomaly_detection/detectors +{ + "name": "test1", + "description": "test historical detector", + "time_field": "timestamp", + "indices": [ + "host-cloudwatch" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "feature_attributes": [ + { + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "detection_date_range": { + "start_time": 1577840401000, + "end_time": 1606121925000 + } +} +``` + +You can specify the following options. + +Options | Description | Type | Required +:--- | :--- |:--- |:--- | +`name` | The name of the detector. | `string` | Yes +`description` | A description of the detector. | `string` | Yes +`time_field` | The name of the time field. | `string` | Yes +`indices` | A list of indices to use as the data source. | `list` | Yes +`feature_attributes` | Specify a `feature_name`, set the `enabled` parameter to `true`, and specify an aggregation query. | `list` | Yes +`filter_query` | Provide an optional filter query for your feature. | `object` | No +`detection_interval` | The time interval for your anomaly detector. | `object` | Yes +`window_delay` | Add extra processing time for data collection. | `object` | No +`category_field` | Categorizes or slices data with a dimension. Similar to `GROUP BY` in SQL. | `list` | No +`detection_date_range` | Specify the start time and end time for a historical detector. | `object` | No + +--- + +## Preview detector + +Passes a date range to the anomaly detector to return any anomalies within that date range. 
+ +#### Request + +```json +POST _opensearch/_anomaly_detection/detectors//_preview +{ + "period_start": 1588838250000, + "period_end": 1589443050000 +} +``` + +#### Sample response + +```json +{ + "anomaly_result": [ + ... + { + "detector_id": "m4ccEnIBTXsGi3mvMt9p", + "data_start_time": 1588843020000, + "data_end_time": 1588843620000, + "feature_data": [ + { + "feature_id": "xxokEnIBcpeWMD987A1X", + "feature_name": "total_order", + "data": 489.9929131106 + } + ], + "anomaly_grade": 0, + "confidence": 0.99 + } + ... + ], + "anomaly_detector": { + "name": "test-detector", + "description": "Test detector", + "time_field": "timestamp", + "indices": [ + "order*" + ], + "filter_query": { + "bool": { + "filter": [ + { + "exists": { + "field": "value", + "boost": 1 + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 10, + "unit": "MINUTES" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "xxokEnIBcpeWMD987A1X", + "feature_name": "total_order", + "feature_enabled": true, + "aggregation_query": { + "total_order": { + "sum": { + "field": "value" + } + } + } + } + ], + "last_update_time": 1589442309241 + } +} +``` + +If you specify a category field, each result is associated with an entity: + +#### Sample response + +```json +{ + "anomaly_result": [ + { + "detector_id": "4CIGoHUBTpMGN-4KzBQg", + "data_start_time": 1604277960000, + "data_end_time": 1604278020000, + "schema_version": 0, + "anomaly_grade": 0, + "confidence": 0.99 + } + ], + "entity": [ + { + "name": "host", + "value": "i-00f28ec1eb8997686" + } + ] +}, +{ + "detector_id": "4CIGoHUBTpMGN-4KzBQg", + "data_start_time": 1604278020000, + "data_end_time": 1604278080000, + "schema_version": 0, + "feature_data": [ + { + "feature_id": "0Kld3HUBhpHMyt2e_UHn", + "feature_name": "latency_max", + "data": -17 + } + ], + "anomaly_grade": 0, + 
"confidence": 0.99, + "entity": [ + { + "name": "host", + "value": "i-00f28ec1eb8997686" + } + ] +} +... + +``` + +--- + +## Start detector job + +Starts a real-time or historical detector job. + + +#### Request + +```json +POST _opensearch/_anomaly_detection/detectors//_start +``` + +#### Sample response + +```json +{ + "_id" : "m4ccEnIBTXsGi3mvMt9p", + "_version" : 1, + "_seq_no" : 6, + "_primary_term" : 1 +} +``` + + +--- + +## Stop detector job + +Stops a real-time or historical anomaly detector job. + +#### Request + +```json +POST _opensearch/_anomaly_detection/detectors//_stop +``` + +#### Sample response + +```json +Stopped detector: m4ccEnIBTXsGi3mvMt9p +``` + +--- + +## Search detector result + +Returns all results for a search query. + +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors/results/_search +POST _opensearch/_anomaly_detection/detectors/results/_search + +{ + "query": { + "bool": { + "must": { + "range": { + "anomaly_score": { + "gte": 0.6, + "lte": 1 + } + } + } + } + } +} +``` + +#### Sample response + +```json +{ + "took": 9, + "timed_out": false, + "_shards": { + "total": 25, + "successful": 25, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": ".opensearch-anomaly-results-history-2020.04.30-1", + "_type": "_doc", + "_id": "_KBrzXEBbpoKkFM5mStm", + "_version": 1, + "_seq_no": 58, + "_primary_term": 1, + "_score": 1, + "_source": { + "detector_id": "2KDozHEBbpoKkFM58yr6", + "anomaly_score": 0.8995068350366767, + "execution_start_time": 1588289313114, + "data_end_time": 1588289313114, + "confidence": 0.84214852704501, + "data_start_time": 1588289253114, + "feature_data": [ + { + "feature_id": "X0fpzHEB5NGZmIRkXKcy", + "feature_name": "total_error", + "data": 20 + } + ], + "execution_end_time": 1588289313126, + "anomaly_grade": 0 + } + }, + { + "_index": ".opensearch-anomaly-results-history-2020.04.30-1", + "_type": "_doc", + 
"_id": "EqB1zXEBbpoKkFM5qyyE", + "_version": 1, + "_seq_no": 61, + "_primary_term": 1, + "_score": 1, + "_source": { + "detector_id": "2KDozHEBbpoKkFM58yr6", + "anomaly_score": 0.7086834513354907, + "execution_start_time": 1588289973113, + "data_end_time": 1588289973113, + "confidence": 0.42162017029510446, + "data_start_time": 1588289913113, + "feature_data": [ + { + "feature_id": "X0fpzHEB5NGZmIRkXKcy", + "feature_name": "memory_usage", + "data": 20.0347333108 + } + ], + "execution_end_time": 1588289973124, + "anomaly_grade": 0 + } + } + ] + } +} +``` + +In high cardinality detectors, the result contains entities’ information. +To see an ordered set of anomaly records for an entity with an anomaly within a certain time range for a specific feature value: + +#### Request + +```json +POST _opensearch/_anomaly_detection/detectors/results/_search +{ + "query": { + "bool": { + "filter": [ + { + "term": { + "detector_id": "4CIGoHUBTpMGN-4KzBQg" + } + }, + { + "range": { + "anomaly_grade": { + "gt": 0 + } + } + }, + { + "nested": { + "path": "entity", + "query": { + "bool": { + "must": [ + { + "term": { + "entity.value": "i-00f28ec1eb8997685" + } + } + ] + } + } + } + } + ] + } + }, + "size": 8, + "sort": [ + { + "execution_end_time": { + "order": "desc" + } + } + ], + "track_total_hits": true +} +``` + +#### Sample response + +```json +{ + "took": 443, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 7, + "relation": "eq" + }, + "max_score": null, + "hits": [ + { + "_index": ".opensearch-anomaly-results-history-2020.11.07-1", + "_type": "_doc", + "_id": "BiItoHUBTpMGN-4KARY5", + "_version": 1, + "_seq_no": 206, + "_primary_term": 1, + "_score": null, + "_source": { + "detector_id": "4CIGoHUBTpMGN-4KzBQg", + "schema_version": 2, + "anomaly_score": 2.462550517055763, + "execution_start_time": 1604710105400, + "data_end_time": 1604710094516, + "confidence": 0.8246254862573076, + 
"data_start_time": 1604710034516, + "feature_data": [ + { + "feature_id": "0Kld3HUBhpHMyt2e_UHn", + "feature_name": "latency_max", + "data": 3526 + } + ], + "execution_end_time": 1604710105401, + "anomaly_grade": 0.08045977011494891, + "entity": [ + { + "name": "host", + "value": "i-00f28ec1eb8997685" + } + ] + }, + "sort": [ + 1604710105401 + ] + }, + { + "_index": ".opensearch-anomaly-results-history-2020.11.07-1", + "_type": "_doc", + "_id": "wiImoHUBTpMGN-4KlhXs", + "_version": 1, + "_seq_no": 156, + "_primary_term": 1, + "_score": null, + "_source": { + "detector_id": "4CIGoHUBTpMGN-4KzBQg", + "schema_version": 2, + "anomaly_score": 4.892453213261217, + "execution_start_time": 1604709684971, + "data_end_time": 1604709674522, + "confidence": 0.8313735633713821, + "data_start_time": 1604709614522, + "feature_data": [ + { + "feature_id": "0Kld3HUBhpHMyt2e_UHn", + "feature_name": "latency_max", + "data": 5709 + } + ], + "execution_end_time": 1604709684971, + "anomaly_grade": 0.06542056074767538, + "entity": [ + { + "name": "host", + "value": "i-00f28ec1eb8997685" + } + ] + }, + "sort": [ + 1604709684971 + ] + }, + { + "_index": ".opensearch-anomaly-results-history-2020.11.07-1", + "_type": "_doc", + "_id": "ZiIcoHUBTpMGN-4KhhVA", + "_version": 1, + "_seq_no": 79, + "_primary_term": 1, + "_score": null, + "_source": { + "detector_id": "4CIGoHUBTpMGN-4KzBQg", + "schema_version": 2, + "anomaly_score": 3.187717536855158, + "execution_start_time": 1604709025343, + "data_end_time": 1604709014520, + "confidence": 0.8301116064308817, + "data_start_time": 1604708954520, + "feature_data": [ + { + "feature_id": "0Kld3HUBhpHMyt2e_UHn", + "feature_name": "latency_max", + "data": 441 + } + ], + "execution_end_time": 1604709025344, + "anomaly_grade": 0.040767386091133916, + "entity": [ + { + "name": "host", + "value": "i-00f28ec1eb8997685" + } + ] + }, + "sort": [ + 1604709025344 + ] + } + ] + } +} +``` + +In historical detectors, specify the `detector_id`. 
+To get the latest task: + +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors/?task=true +``` + +To query the anomaly results with `task_id`: + +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors/results/_search +{ + "query": { + "term": { + "task_id": { + "value": "NnlV9HUBQxqfQ7vBJNzy" + } + } + } +} +``` + +#### Sample response + +```json +{ + "took": 1, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 2.1366, + "hits": [ + { + "_index": ".opensearch-anomaly-detection-state", + "_type": "_doc", + "_id": "CoM8WncBtt2qvI-LZO7_", + "_version": 8, + "_seq_no": 1351, + "_primary_term": 3, + "_score": 2.1366, + "_source": { + "detector_id": "dZc8WncBgO2zoQoFWVBA", + "worker_node": "dk6-HuKQRMKm2fi8TSDHsg", + "task_progress": 0.09486946, + "last_update_time": 1612126667008, + "execution_start_time": 1612126643455, + "state": "RUNNING", + "coordinating_node": "gs213KqjS4q7H4Bmn_ZuLA", + "current_piece": 1583503800000, + "task_type": "HISTORICAL", + "started_by": "admin", + "init_progress": 1, + "is_latest": true, + "detector": { + "description": "test", + "ui_metadata": { + "features": { + "F1": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + } + }, + "detection_date_range": { + "start_time": 1580504240308, + "end_time": 1612126640308 + }, + "feature_attributes": [ + { + "feature_id": "dJc8WncBgO2zoQoFWVAt", + "feature_enabled": true, + "feature_name": "F1", + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "schema_version": 0, + "time_field": "timestamp", + "last_update_time": 1612126640448, + "indices": [ + "nab_art_daily_jumpsdown" + ], + "window_delay": { + "period": { + "unit": "Minutes", + "interval": 1 + } + }, + "detection_interval": { + "period": { + "unit": "Minutes", + "interval": 10 + } + }, + "name": 
"test-historical-detector", + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "shingle_size": 8, + "user": { + "backend_roles": [ + "admin" + ], + "custom_attribute_names": [], + "roles": [ + "all_access", + "own_index" + ], + "name": "admin", + "user_requested_tenant": "__user__" + }, + "detector_type": "HISTORICAL_SINGLE_ENTITY" + }, + "user": { + "backend_roles": [ + "admin" + ], + "custom_attribute_names": [], + "roles": [ + "all_access", + "own_index" + ], + "name": "admin", + "user_requested_tenant": "__user__" + } + } + } + ] + } +} +``` + + +--- + +## Delete detector + +Deletes a detector based on the `detector_id`. +To delete a detector, you need to first stop the detector. + +#### Request + +```json +DELETE _opensearch/_anomaly_detection/detectors/ +``` + + +#### Sample response + +```json +{ + "_index" : ".opensearch-anomaly-detectors", + "_type" : "_doc", + "_id" : "m4ccEnIBTXsGi3mvMt9p", + "_version" : 2, + "result" : "deleted", + "forced_refresh" : true, + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 6, + "_primary_term" : 1 +} +``` + + +--- + +## Update detector + +Updates a detector with any changes, including the description or adding or removing of features. +To update a detector, you need to first stop the detector. 
+ +#### Request + +```json +PUT _opensearch/_anomaly_detection/detectors/ +{ + "name": "test-detector", + "description": "Test detector", + "time_field": "timestamp", + "indices": [ + "order*" + ], + "feature_attributes": [ + { + "feature_name": "total_order", + "feature_enabled": true, + "aggregation_query": { + "total_order": { + "sum": { + "field": "value" + } + } + } + } + ], + "filter_query": { + "bool": { + "filter": [ + { + "exists": { + "field": "value", + "boost": 1 + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 10, + "unit": "MINUTES" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + } +} +``` + + +#### Sample response + +```json +{ + "_id" : "m4ccEnIBTXsGi3mvMt9p", + "_version" : 2, + "_seq_no" : 4, + "_primary_term" : 1, + "anomaly_detector" : { + "name" : "test-detector", + "description" : "Test detector", + "time_field" : "timestamp", + "indices" : [ + "order*" + ], + "filter_query" : { + "bool" : { + "filter" : [ + { + "exists" : { + "field" : "value", + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "detection_interval" : { + "period" : { + "interval" : 10, + "unit" : "Minutes" + } + }, + "window_delay" : { + "period" : { + "interval" : 1, + "unit" : "Minutes" + } + }, + "schema_version" : 0, + "feature_attributes" : [ + { + "feature_id" : "xxokEnIBcpeWMD987A1X", + "feature_name" : "total_order", + "feature_enabled" : true, + "aggregation_query" : { + "total_order" : { + "sum" : { + "field" : "value" + } + } + } + } + ] + } +} +``` + +To update a historical detector: + +#### Request + +```json +PUT _opensearch/_anomaly_detection/detectors/ +{ + "name": "test1", + "description": "test historical detector", + "time_field": "timestamp", + "indices": [ + "nab_art_daily_jumpsdown" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 1, + 
"unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "feature_attributes": [ + { + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "detection_date_range": { + "start_time": 1577840401000, + "end_time": 1606121925000 + } +} +``` + +--- + +## Get detector + +Returns all information about a detector based on the `detector_id`. + +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors/ +``` + +#### Sample response + +```json +{ + "_id" : "m4ccEnIBTXsGi3mvMt9p", + "_version" : 1, + "_primary_term" : 1, + "_seq_no" : 3, + "anomaly_detector" : { + "name" : "test-detector", + "description" : "Test detector", + "time_field" : "timestamp", + "indices" : [ + "order*" + ], + "filter_query" : { + "bool" : { + "filter" : [ + { + "exists" : { + "field" : "value", + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "detection_interval" : { + "period" : { + "interval" : 1, + "unit" : "Minutes" + } + }, + "window_delay" : { + "period" : { + "interval" : 1, + "unit" : "Minutes" + } + }, + "schema_version" : 0, + "feature_attributes" : [ + { + "feature_id" : "mYccEnIBTXsGi3mvMd8_", + "feature_name" : "total_order", + "feature_enabled" : true, + "aggregation_query" : { + "total_order" : { + "sum" : { + "field" : "value" + } + } + } + } + ], + "last_update_time" : 1589441737319 + } +} +``` + + +Use `job=true` to get anomaly detection job information. 
+ +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors/?job=true +``` + +#### Sample response + +```json +{ + "_id" : "m4ccEnIBTXsGi3mvMt9p", + "_version" : 1, + "_primary_term" : 1, + "_seq_no" : 3, + "anomaly_detector" : { + "name" : "test-detector", + "description" : "Test detector", + "time_field" : "timestamp", + "indices" : [ + "order*" + ], + "filter_query" : { + "bool" : { + "filter" : [ + { + "exists" : { + "field" : "value", + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "detection_interval" : { + "period" : { + "interval" : 1, + "unit" : "Minutes" + } + }, + "window_delay" : { + "period" : { + "interval" : 1, + "unit" : "Minutes" + } + }, + "schema_version" : 0, + "feature_attributes" : [ + { + "feature_id" : "mYccEnIBTXsGi3mvMd8_", + "feature_name" : "total_order", + "feature_enabled" : true, + "aggregation_query" : { + "total_order" : { + "sum" : { + "field" : "value" + } + } + } + } + ], + "last_update_time" : 1589441737319 + }, + "anomaly_detector_job" : { + "name" : "m4ccEnIBTXsGi3mvMt9p", + "schedule" : { + "interval" : { + "start_time" : 1589442051271, + "period" : 1, + "unit" : "Minutes" + } + }, + "window_delay" : { + "period" : { + "interval" : 1, + "unit" : "Minutes" + } + }, + "enabled" : true, + "enabled_time" : 1589442051271, + "last_update_time" : 1589442051271, + "lock_duration_seconds" : 60 + } +} +``` + +Use `task=true` to get historical detector task information. 
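
The `job=true` and `task=true` flags are query parameters on the get-detector path. The sketch below shows one way a client might assemble that path. The helper `detector_path` and the ID `abc123` are hypothetical, and placing the detector ID directly after `detectors/` is an assumption based on the request shapes in this section.

```python
from urllib.parse import quote, urlencode

# Base path taken from the requests in this section.
BASE = "_opensearch/_anomaly_detection/detectors"


def detector_path(detector_id: str, job: bool = False, task: bool = False) -> str:
    """Return the get-detector path, optionally with job/task flags.

    Hypothetical helper: it only builds a string and sends no request.
    Whether the plugin accepts both flags together is not stated here.
    """
    if not detector_id:
        raise ValueError("detector_id must not be empty")
    # Percent-encode the ID so it is safe to place in a URL path segment.
    path = f"{BASE}/{quote(detector_id, safe='')}"
    params = {}
    if job:
        params["job"] = "true"
    if task:
        params["task"] = "true"
    return f"{path}?{urlencode(params)}" if params else path


if __name__ == "__main__":
    print("GET", detector_path("abc123", task=True))
```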
+ +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors/?task=true +``` + +#### Sample response + +```json +{ + "_id": "BwzKQXcB89DLS7G9rg7Y", + "_version": 1, + "_primary_term": 2, + "_seq_no": 10, + "anomaly_detector": { + "name": "test-ylwu1", + "description": "test", + "time_field": "timestamp", + "indices": [ + "nab*" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 10, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "shingle_size": 8, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "BgzKQXcB89DLS7G9rg7G", + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "ui_metadata": { + "features": { + "F1": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + } + }, + "last_update_time": 1611716538071, + "user": { + "name": "admin", + "backend_roles": [ + "admin" + ], + "roles": [ + "all_access", + "own_index" + ], + "custom_attribute_names": [], + "user_requested_tenant": "__user__" + }, + "detector_type": "HISTORICAL_SINGLE_ENTITY", + "detection_date_range": { + "start_time": 1580094137997, + "end_time": 1611716537997 + } + }, + "anomaly_detection_task": { + "task_id": "sgxaRXcB89DLS7G9RfIO", + "last_update_time": 1611776648699, + "started_by": "admin", + "state": "FINISHED", + "detector_id": "BwzKQXcB89DLS7G9rg7Y", + "task_progress": 1, + "init_progress": 1, + "current_piece": 1611716400000, + "execution_start_time": 1611776279822, + "execution_end_time": 1611776648679, + "is_latest": true, + "task_type": "HISTORICAL", + "coordinating_node": "gs213KqjS4q7H4Bmn_ZuLA", + "worker_node": "PgfR3JhbT7yJMx7bwQ6E3w", + "detector": { + "name": "test-ylwu1", + "description": "test", + "time_field": "timestamp", + "indices": [ + "nab*" + ], + "filter_query": { + "match_all": { + 
"boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 10, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "shingle_size": 8, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "BgzKQXcB89DLS7G9rg7G", + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "ui_metadata": { + "features": { + "F1": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + } + }, + "last_update_time": 1611716538071, + "user": { + "name": "admin", + "backend_roles": [ + "admin" + ], + "roles": [ + "all_access", + "own_index" + ], + "custom_attribute_names": [], + "user_requested_tenant": "__user__" + }, + "detector_type": "HISTORICAL_SINGLE_ENTITY", + "detection_date_range": { + "start_time": 1580094137997, + "end_time": 1611716537997 + } + }, + "user": { + "name": "admin", + "backend_roles": [ + "admin" + ], + "roles": [ + "all_access", + "own_index" + ], + "custom_attribute_names": [], + "user_requested_tenant": "__user__" + } + } +} +``` + +--- + +## Search detector + +Returns all anomaly detectors for a search query. 
+ +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors/_search +POST _opensearch/_anomaly_detection/detectors/_search + +Sample Input: +{ + "query": { + "match": { + "name": "test-detector" + } + } +} +``` + + +#### Sample response + +```json +{ + "took": 13, + "timed_out": false, + "_shards": { + "total": 5, + "successful": 5, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 994, + "relation": "eq" + }, + "max_score": 3.5410638, + "hits": [ + { + "_index": ".opensearch-anomaly-detectors", + "_type": "_doc", + "_id": "m4ccEnIBTXsGi3mvMt9p", + "_version": 2, + "_seq_no": 221, + "_primary_term": 1, + "_score": 3.5410638, + "_source": { + "name": "test-detector", + "description": "Test detector", + "time_field": "timestamp", + "indices": [ + "order*" + ], + "filter_query": { + "bool": { + "filter": [ + { + "exists": { + "field": "value", + "boost": 1 + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 10, + "unit": "MINUTES" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "xxokEnIBcpeWMD987A1X", + "feature_name": "total_order", + "feature_enabled": true, + "aggregation_query": { + "total_order": { + "sum": { + "field": "value" + } + } + } + } + ], + "last_update_time": 1589442309241 + } + } + ] + } +} +``` + +--- + +## Get detector stats + +Provides information about how the plugin is performing. 
+ +#### Request + +```json +GET _opensearch/_anomaly_detection/stats +GET _opensearch/_anomaly_detection//stats +GET _opensearch/_anomaly_detection//stats/ +GET _opensearch/_anomaly_detection/stats/ +``` + +#### Sample response + +```json +{ + "_nodes" : { + "total" : 3, + "successful" : 3, + "failed" : 0 + }, + "cluster_name" : "multi-node-run", + "anomaly_detectors_index_status" : "green", + "detector_count" : 1, + "models_checkpoint_index_status" : "green", + "anomaly_results_index_status" : "green", + "nodes" : { + "IgWDUfzFRzW0FWAXM5FGJw" : { + "ad_execute_request_count" : 8, + "ad_execute_failure_count" : 7, + "models" : [ + { + "detector_id" : "m4ccEnIBTXsGi3mvMt9p", + "model_type" : "rcf", + "model_id" : "m4ccEnIBTXsGi3mvMt9p_model_rcf_0" + }, + { + "detector_id" : "m4ccEnIBTXsGi3mvMt9p", + "model_type" : "threshold", + "model_id" : "m4ccEnIBTXsGi3mvMt9p_model_threshold" + } + ] + }, + "y7YUQWukQEWOYbfdEq13hQ" : { + "ad_execute_request_count" : 0, + "ad_execute_failure_count" : 0, + "models" : [ ] + }, + "cDcGNsPoRAyRMlPP1m-vZw" : { + "ad_execute_request_count" : 0, + "ad_execute_failure_count" : 0, + "models" : [ + { + "detector_id" : "m4ccEnIBTXsGi3mvMt9p", + "model_type" : "rcf", + "model_id" : "m4ccEnIBTXsGi3mvMt9p_model_rcf_2" + }, + { + "detector_id" : "m4ccEnIBTXsGi3mvMt9p", + "model_type" : "rcf", + "model_id" : "m4ccEnIBTXsGi3mvMt9p_model_rcf_1" + } + ] + } + } +} +``` + +Historical detectors contain additional fields: + +#### Sample response + +```json +{ + "anomaly_detectors_index_status": "yellow", + "anomaly_detection_state_status": "yellow", + "historical_detector_count": 3, + "detector_count": 7, + "anomaly_detection_job_index_status": "yellow", + "models_checkpoint_index_status": "yellow", + "anomaly_results_index_status": "yellow", + "nodes": { + "Mz9HDZnuQwSCw0UiisxwWg": { + "ad_execute_request_count": 0, + "models": [], + "ad_canceled_batch_task_count": 2, + "ad_hc_execute_request_count": 0, + "ad_hc_execute_failure_count": 0, + 
"ad_execute_failure_count": 0, + "ad_batch_task_failure_count": 0, + "ad_executing_batch_task_count": 1, + "ad_total_batch_task_count": 8 + } + } +} +``` + +--- + +## Create monitor + +Create a monitor to set up alerts for the detector. + +#### Request + +```json +POST _opensearch/_alerting/monitors +{ + "type": "monitor", + "name": "test-monitor", + "enabled": true, + "schedule": { + "period": { + "interval": 20, + "unit": "MINUTES" + } + }, + "inputs": [ + { + "search": { + "indices": [ + ".opensearch-anomaly-results*" + ], + "query": { + "size": 1, + "query": { + "bool": { + "filter": [ + { + "range": { + "data_end_time": { + "from": "{{period_end}}||-20m", + "to": "{{period_end}}", + "include_lower": true, + "include_upper": true, + "boost": 1 + } + } + }, + { + "term": { + "detector_id": { + "value": "m4ccEnIBTXsGi3mvMt9p", + "boost": 1 + } + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "sort": [ + { + "anomaly_grade": { + "order": "desc" + } + }, + { + "confidence": { + "order": "desc" + } + } + ], + "aggregations": { + "max_anomaly_grade": { + "max": { + "field": "anomaly_grade" + } + } + } + } + } + } + ], + "triggers": [ + { + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "return ctx.results[0].aggregations.max_anomaly_grade.value != null && ctx.results[0].aggregations.max_anomaly_grade.value > 0.7 && ctx.results[0].hits.hits[0]._source.confidence > 0.7", + "lang": "painless" + } + }, + "actions": [ + { + "name": "test-action", + "destination_id": "ld7912sBlQ5JUWWFThoW", + "message_template": { + "source": "This is my message body." 
+ }, + "throttle_enabled": false, + "subject_template": { + "source": "TheSubject" + } + } + ] + } + ] +} +``` + +#### Sample response + +```json +{ + "_id": "OClTEnIBmSf7y6LP11Jz", + "_version": 1, + "_seq_no": 10, + "_primary_term": 1, + "monitor": { + "type": "monitor", + "schema_version": 1, + "name": "test-monitor", + "enabled": true, + "enabled_time": 1589445384043, + "schedule": { + "period": { + "interval": 20, + "unit": "MINUTES" + } + }, + "inputs": [ + { + "search": { + "indices": [ + ".opensearch-anomaly-results*" + ], + "query": { + "size": 1, + "query": { + "bool": { + "filter": [ + { + "range": { + "data_end_time": { + "from": "{{period_end}}||-20m", + "to": "{{period_end}}", + "include_lower": true, + "include_upper": true, + "boost": 1 + } + } + }, + { + "term": { + "detector_id": { + "value": "m4ccEnIBTXsGi3mvMt9p", + "boost": 1 + } + } + } + ], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "sort": [ + { + "anomaly_grade": { + "order": "desc" + } + }, + { + "confidence": { + "order": "desc" + } + } + ], + "aggregations": { + "max_anomaly_grade": { + "max": { + "field": "anomaly_grade" + } + } + } + } + } + } + ], + "triggers": [ + { + "id": "NilTEnIBmSf7y6LP11Jr", + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "return ctx.results[0].aggregations.max_anomaly_grade.value != null && ctx.results[0].aggregations.max_anomaly_grade.value > 0.7 && ctx.results[0].hits.hits[0]._source.confidence > 0.7", + "lang": "painless" + } + }, + "actions": [ + { + "id": "NylTEnIBmSf7y6LP11Jr", + "name": "test-action", + "destination_id": "ld7912sBlQ5JUWWFThoW", + "message_template": { + "source": "This is my message body.", + "lang": "mustache" + }, + "throttle_enabled": false, + "subject_template": { + "source": "TheSubject", + "lang": "mustache" + } + } + ] + } + ], + "last_update_time": 1589445384043 + } +} +``` + +--- + +## Profile detector + +Returns information related to the current state of the detector and 
memory usage, including current errors and shingle size, to help troubleshoot the detector. + +This command helps locate logs by identifying the nodes that run the anomaly detector job for each detector. + +It also helps track the initialization percentage, the required shingles, and the estimated time left. + +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors//_profile/ +GET _opensearch/_anomaly_detection/detectors//_profile?_all=true +GET _opensearch/_anomaly_detection/detectors//_profile/ +GET /_opensearch/_anomaly_detection/detectors//_profile/, +``` + +#### Sample Responses + +```json +GET _opensearch/_anomaly_detection/detectors//_profile + +{ + "state":"DISABLED", + "error":"Stopped detector: AD models memory usage exceeds our limit." +} + +GET _opensearch/_anomaly_detection/detectors//_profile?_all=true&pretty + +{ + "state": "RUNNING", + "models": [ + { + "model_id": "cneh7HEBHPICjJIdXdrR_model_rcf_2", + "model_size_in_bytes": 4456448, + "node_id": "VS29z70PSzOdHiEw4SoV9Q" + }, + { + "model_id": "cneh7HEBHPICjJIdXdrR_model_rcf_1", + "model_size_in_bytes": 4456448, + "node_id": "VS29z70PSzOdHiEw4SoV9Q" + }, + { + "model_id": "cneh7HEBHPICjJIdXdrR_model_threshold", + "node_id": "Og23iUroTdKrkwS-y89zLw" + }, + { + "model_id": "cneh7HEBHPICjJIdXdrR_model_rcf_0", + "model_size_in_bytes": 4456448, + "node_id": "Og23iUroTdKrkwS-y89zLw" + } + ], + "shingle_size": 8, + "coordinating_node": "Og23iUroTdKrkwS-y89zLw", + "total_size_in_bytes": 13369344, + "init_progress": { + "percentage": "70%", + "estimated_minutes_left": 77, + "needed_shingles": 77 + } +} + +GET _opensearch/_anomaly_detection/detectors//_profile/total_size_in_bytes + +{ + "total_size_in_bytes" : 13369344 +} +``` + +If you have configured the category field, you can see the number of unique values in the field and also all the active entities with models running in memory. 
+You can use this data to estimate the memory required for anomaly detection to help decide the size of your cluster. +For example, if a detector has one million entities and only 10 of them are active in memory, then you need to scale up or scale out your cluster. + +#### Request + +```json +GET /_opensearch/_anomaly_detection/detectors//_profile?_all=true&pretty + +{ + "state": "RUNNING", + "models": [ + { + "model_id": "T4c3dXUBj-2IZN7itix__entity_i-00f28ec1eb8997684", + "model_size_in_bytes": 712480, + "node_id": "g6pmr547QR-CfpEvO67M4g" + }, + { + "model_id": "T4c3dXUBj-2IZN7itix__entity_i-00f28ec1eb8997685", + "model_size_in_bytes": 712480, + "node_id": "g6pmr547QR-CfpEvO67M4g" + }, + { + "model_id": "T4c3dXUBj-2IZN7itix__entity_i-00f28ec1eb8997686", + "model_size_in_bytes": 712480, + "node_id": "g6pmr547QR-CfpEvO67M4g" + }, + { + "model_id": "T4c3dXUBj-2IZN7itix__entity_i-00f28ec1eb8997680", + "model_size_in_bytes": 712480, + "node_id": "g6pmr547QR-CfpEvO67M4g" + }, + { + "model_id": "T4c3dXUBj-2IZN7itix__entity_i-00f28ec1eb8997681", + "model_size_in_bytes": 712480, + "node_id": "g6pmr547QR-CfpEvO67M4g" + }, + { + "model_id": "T4c3dXUBj-2IZN7itix__entity_i-00f28ec1eb8997682", + "model_size_in_bytes": 712480, + "node_id": "g6pmr547QR-CfpEvO67M4g" + }, + { + "model_id": "T4c3dXUBj-2IZN7itix__entity_i-00f28ec1eb8997683", + "model_size_in_bytes": 712480, + "node_id": "g6pmr547QR-CfpEvO67M4g" + } + ], + "total_size_in_bytes": 4987360, + "init_progress": { + "percentage": "100%" + }, + "total_entities": 7, + "active_entities": 7 +} +``` + +The `profile` operation also provides information about each entity, such as the entity’s `last_sample_timestamp` and `last_active_timestamp`. + +No anomaly results for an entity indicates that either the entity doesn't have any sample data or its model is removed from the model cache. 
+ + `last_sample_timestamp` shows the last document in the input data source index containing the entity, while `last_active_timestamp` shows the timestamp when the entity’s model was last seen in the model cache. + +#### Request + +```json +GET /_opensearch/_anomaly_detection/detectors//_profile?_all=true&entity=i-00f28ec1eb8997686 +{ + "category_field": "host", + "value": "i-00f28ec1eb8997686", + "is_active": true, + "last_active_timestamp": 1604026394879, + "last_sample_timestamp": 1604026394879, + "init_progress": { + "percentage": "100%" + }, + "model": { + "model_id": "TFUdd3UBBwIAGQeRh5IS_entity_i-00f28ec1eb8997686", + "model_size_in_bytes": 712480, + "node_id": "MQ-bTBW3Q2uU_2zX3pyEQg" + }, + "state": "RUNNING" +} +``` + +For a historical detector, specify `_all` or `ad_task` to see information about its latest task: + +#### Request + +```json +GET _opensearch/_anomaly_detection/detectors//_profile?_all +GET _opensearch/_anomaly_detection/detectors//_profile/ad_task +``` + +#### Sample Responses + +```json +{ + "ad_task": { + "ad_task": { + "task_id": "JXxyG3YBv5IHYYfMlFS2", + "last_update_time": 1606778263543, + "state": "STOPPED", + "detector_id": "SwvxCHYBPhugfWD9QAL6", + "task_progress": 0.010480972, + "init_progress": 1, + "current_piece": 1578140400000, + "execution_start_time": 1606778262709, + "is_latest": true, + "task_type": "HISTORICAL", + "detector": { + "name": "historical_test1", + "description": "test", + "time_field": "timestamp", + "indices": [ + "nab_art_daily_jumpsdown" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 5, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "shingle_size": 8, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "zgvyCHYBPhugfWD9Ap_F", + "feature_name": "sum", + "feature_enabled": true, + "aggregation_query": { + "sum": { + "sum": { + "field": "value" + } + } + } + 
}, + { + "feature_id": "zwvyCHYBPhugfWD9Ap_G", + "feature_name": "max", + "feature_enabled": true, + "aggregation_query": { + "max": { + "max": { + "field": "value" + } + } + } + } + ], + "ui_metadata": { + "features": { + "max": { + "aggregationBy": "max", + "aggregationOf": "value", + "featureType": "simple_aggs" + }, + "sum": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + }, + "filters": [], + "filterType": "simple_filter" + }, + "last_update_time": 1606467935713, + "detector_type": "HISTORICAL_SIGLE_ENTITY", + "detection_date_range": { + "start_time": 1577840400000, + "end_time": 1606463775000 + } + } + }, + "shingle_size": 8, + "rcf_total_updates": 1994, + "threshold_model_trained": true, + "threshold_model_training_data_size": 0, + "node_id": "Q9yznwxvTz-yJxtz7rJlLg" + } +} +``` + +--- diff --git a/docs/ad/index.md b/docs/ad/index.md new file mode 100644 index 00000000..32849e6e --- /dev/null +++ b/docs/ad/index.md @@ -0,0 +1,167 @@ +--- +layout: default +title: Anomaly detection +nav_order: 46 +has_children: true +--- + +# Anomaly detection + +An anomaly is any unusual change in behavior. Anomalies in your time-series data can lead to valuable insights. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you uncover early signs of a system failure. + +Discovering anomalies using conventional methods such as creating visualizations and dashboards can be challenging. You can set an alert based on a static threshold, but this requires prior domain knowledge and is not adaptive to data that exhibits organic growth or seasonal behavior. + +The anomaly detection feature automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. 
These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://pdfs.semanticscholar.org/8bba/52e9797f2e2cc9a823dbd12514d02f29c8b9.pdf?_ga=2.56302955.1913766445.1574109076-1059151610.1574109076). + +You can pair the anomaly detection plugin with the [alerting plugin](../alerting/) to notify you as soon as an anomaly is detected. + +To use the anomaly detection plugin, your computer needs to have more than one CPU core. +{: .note } + +## Get started with Anomaly Detection + +To get started, choose **Anomaly Detection** in OpenSearch Dashboards. +To first test with sample streaming data, choose **Sample Detectors** and try out one of the preconfigured detectors. + +### Step 1: Create a detector + +A detector is an individual anomaly detection task. You can create multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources. + +1. Choose **Create Detector**. +1. Enter the **Name** of the detector and a brief **Description**. Make sure the name that you enter is unique and descriptive enough to help you to identify the purpose of this detector. +1. For **Data source**, choose the index that you want to use as the data source. You can optionally use index patterns to choose multiple indices. +1. Choose the **Timestamp field** in your index. +1. For **Data filter**, you can optionally filter the index that you chose as the data source. From the **Filter type** menu, choose **Visual filter**, and then design your filter query by selecting **Fields**, **Operator**, and **Value**, or choose **Custom Expression** and add in your own JSON filter query. +1. For **Detector operation settings**, define the **Detector interval** to set the time interval at which the detector collects data. +- The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model. 
+The shorter you set this interval, the fewer data points the detector aggregates. +The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals. +- We recommend you set the detector interval based on your actual data. An interval that is too long might delay the results, and an interval that is too short might miss some data and might not provide a sufficient number of consecutive data points for the shingle process. +1. To add extra processing time for data collection, specify a **Window delay** value. This tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. +Set the window delay to shift the detector interval to account for this delay. +- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. +Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. +Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time. +1. Choose **Create**. + +After you create the detector, the next step is to add features to it. + +### Step 2: Add features to your detector + +In this case, a feature is the field in your index that you want to check for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly. + +For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature.
If you choose `average()`, the detector finds anomalies based on the average values of your feature. + +A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. We recommend experimenting with a historical detector with different feature sets and checking the precision before moving on to real-time detectors. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opensearch.anomaly_detection.max_anomaly_features` setting. +{: .note } + +1. On the **Model configuration** page, enter the **Feature name**. +1. For **Find anomalies based on**, choose the method to find anomalies. For **Field Value** menu, choose the **field** and the **aggregation method**. Or choose **Custom expression**, and add in your own JSON aggregation query. + +#### (Optional) Set a category field for high cardinality + +You can categorize anomalies based on a keyword or IP field type. + +The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues. + +To set a category field, choose **Enable a category field** and select a field. + +Only a certain number of unique entities are supported in the category field. 
Use the following equation to calculate the recommended total number of entities supported in a cluster: + +``` +(data nodes * heap size * anomaly detection maximum memory percentage) / (entity size of a detector) +``` + +This formula provides a good starting point. Test with a representative workload to confirm the estimate. +{: .note } + +For example, for a cluster with 3 data nodes, each with 8 GB of JVM heap size, a maximum memory percentage of 10% (default), and a detector entity size of 1 MB, the total number of unique entities supported is approximately (8.096 * 10^9 * 0.1 / 10^6) * 3 = 2429. + +#### Set a window size + +Set the number of aggregation intervals from your data stream to consider in a detection window. We recommend you choose this value based on your actual data to see which one leads to the best results for your use case. + +Based on experiments performed on a wide variety of one-dimensional data streams, we recommend using a window size between 1 and 16. The default window size is 8. If you have set the category field for high cardinality, the default window size is 1. + +If you expect missing values in your data or if you want the anomalies based on the current interval, choose 1. If your data is continuously ingested and you want the anomalies based on multiple intervals, choose a larger window size. + +#### Preview sample anomalies + +Preview sample anomalies and adjust the feature settings if needed. +For sample previews, the anomaly detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results. +Examine the sample preview and use it to fine-tune your feature configurations, for example, enable or disable features, to get more accurate results. + +1.
Choose **Save and start detector**. +1. Choose between automatically starting the detector (recommended) or manually starting the detector at a later time. + +### Step 3: Observe the results + +Choose the **Anomaly results** tab. + +You will have to wait for some time to see the anomaly results. + +If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies. + +A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner. +Use the [profile detector](./api#profile-detector) operation to check that you have sufficient data points. + +If you see the detector pending in "initialization" for longer than a day, aggregate your existing data using the detector interval to check for any missing data points. If you find a lot of missing data points from the aggregated data, consider increasing the detector interval. + +![Anomaly detection results](../images/ad.png) + +- The **Live anomalies** chart displays the live anomaly results for the last 60 intervals. For example, if the interval is set to 10 minutes, it shows the results for the last 600 minutes. This chart refreshes every 30 seconds. +- The **Anomaly history** chart plots the anomaly grade with the corresponding measure of confidence. +- The **Feature breakdown** graph plots the features based on the aggregation method. You can vary the date-time range of the detector. +- The **Anomaly occurrence** table shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each anomaly detected. + +Anomaly grade is a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of 0 represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly. The confidence score is an estimate of the probability that the reported anomaly grade matches the expected anomaly grade.
Confidence increases as the model observes more data and learns the data behavior and trends. Note that confidence is distinct from model accuracy. + +If you set the category field, you see an additional **Heat map** chart. The heat map correlates results for anomalous entities. This chart is empty until you select an anomalous entity. You also see the anomaly and feature line chart for the time period of the anomaly (`anomaly_grade` > 0). + +Choose a filled rectangle to see a more detailed view of the anomaly. +{: .note } + +### Step 4: Set up alerts + +To create a monitor to send you notifications when any anomalies are detected, choose **Set up alerts**. +You're redirected to the **Alerting**, **Add monitor** page. + +For steps to create a monitor and set notifications based on your anomaly detector, see [Monitor](../alerting/monitors/). + +If you stop or delete a detector, make sure to delete any monitors associated with the detector. + +### Step 5: Adjust the model + +To see all the configuration settings, choose the **Detector configuration** tab. + +1. To make any changes to the detector configuration, or fine tune the time interval to minimize any false positives, in the **Detector configuration** section, choose **Edit**. +- You need to stop the detector to change the detector configuration. In the pop-up box, confirm that you want to stop the detector and proceed. +1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**. +- Choose between automatically starting the detector (recommended) or manually starting the detector at a later time. + +### Step 6: Analyze historical data + +Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it. 
+ +To use a historical detector, the date range that you specify must have data present in at least 1,000 detection intervals. +{: .note } + +1. Choose **Historical detectors** and **Create historical detector**. +1. Enter the **Name** of the detector and a brief **Description**. +1. For **Data source**, choose the index that you want to use as the data source. You can optionally use index patterns to choose multiple indices. +1. For **Time range**, select a time range for historical analysis. +1. For **Detector settings**, choose to use settings of an existing detector. Or choose the **Timestamp field** in your index, add individual features to the detector, and set the detector interval. +1. You can choose to run the historical detector automatically after creating. +1. Choose **Create**. + - You can stop the historical detector even before it completes. + +### Step 7: Manage your detectors + +Go to the **Detector details** page to change or delete your detectors. + +1. To make changes to your detector, choose the detector name to open the detector details page. +1. Choose **Actions**, and then choose **Edit detector**. + - You need to stop the detector to change the detector configuration. In the pop-up box, confirm that you want to stop the detector and proceed. +1. After making your changes, choose **Save changes**. +1. To delete your detector, choose **Actions**, and then choose **Delete detector**. + - In the pop-up box, type `delete` to confirm and choose **Delete**. diff --git a/docs/ad/security.md b/docs/ad/security.md new file mode 100644 index 00000000..4e2ae207 --- /dev/null +++ b/docs/ad/security.md @@ -0,0 +1,86 @@ +--- +layout: default +title: Anomaly detection security +nav_order: 10 +parent: Anomaly detection +has_children: false +--- + +# Anomaly detection security + +You can use the security plugin with anomaly detection to limit non-admin users to specific actions. 
For example, you might want some users to only be able to create, update, or delete detectors, while others can only view detectors. + +All anomaly detection indices are protected as system indices. Only a super admin user or an admin user with a TLS certificate can access system indices. For more information, see [System indices](../../security/configuration/system-indices/). + + +Security for anomaly detection works the same as [security for alerting](../../alerting/security/). + +## Basic permissions + +As an admin user, you can use the security plugin to assign specific permissions to users based on which APIs they need access to. For a list of supported APIs, see [Anomaly Detection API](../api/). + +The security plugin has two built-in roles that cover most anomaly detection use cases: `anomaly_full_access` and `anomaly_read_access`. For descriptions of each, see [Predefined roles](../../security/access-control/users-roles/#predefined-roles). + +If these roles don't meet your needs, mix and match individual anomaly detection [permissions](../../security/access-control/permissions/) to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/ad/detector/delete` permission lets you delete detectors. + +## (Advanced) Limit access by backend role + +Use backend roles to configure fine-grained access to individual detectors based on roles. For example, users of different departments in an organization can view detectors owned by their own department. + +First, make sure that your users have the appropriate [backend roles](../../security/access-control/). Backend roles usually come from an [LDAP server](../../security/configuration/ldap/) or [SAML provider](../../security/configuration/saml/), but if you use the internal user database, you can use the REST API to [add them manually](../../security/access-control/api/#create-user).
+ +Next, enable the following setting: + +```json +PUT _cluster/settings +{ + "transient": { + "opensearch.anomaly_detection.filter_by_backend_roles": "true" + } +} +``` + +Now when users view anomaly detection resources in OpenSearch Dashboards (or make REST API calls), they only see detectors created by users who share at least one backend role. +For example, consider two users: `alice` and `bob`. + +`alice` has an analyst backend role: + +```json +PUT _opensearch/_security/api/internalusers/alice +{ + "password": "alice", + "backend_roles": [ + "analyst" + ], + "attributes": {} +} +``` + +`bob` has a human-resources backend role: + +```json +PUT _opensearch/_security/api/internalusers/bob +{ + "password": "bob", + "backend_roles": [ + "human-resources" + ], + "attributes": {} +} +``` + +Both `alice` and `bob` have full access to anomaly detection: + +```json +PUT _opensearch/_security/api/rolesmapping/anomaly_full_access +{ + "backend_roles": [], + "hosts": [], + "users": [ + "alice", + "bob" + ] +} +``` + +Because they have different backend roles, `alice` and `bob` cannot view each other's detectors or their results. diff --git a/docs/ad/settings.md b/docs/ad/settings.md new file mode 100644 index 00000000..dfa004a4 --- /dev/null +++ b/docs/ad/settings.md @@ -0,0 +1,42 @@ +--- +layout: default +title: Settings +parent: Anomaly detection +nav_order: 4 +--- + +# Settings + +The anomaly detection plugin adds several settings to the standard OpenSearch cluster settings. +They are dynamic, so you can change the default behavior of the plugin without restarting your cluster. +You can mark them `persistent` or `transient`.
+ +For example, to update the retention period of the result index: + +```json +PUT _cluster/settings +{ + "transient": { + "opensearch.anomaly_detection.ad_result_history_retention_period": "5m" + } +} +``` + +Setting | Default | Description +:--- | :--- | :--- +`opensearch.anomaly_detection.enabled` | True | Whether the anomaly detection plugin is enabled or not. If disabled, all detectors immediately stop running. +`opensearch.anomaly_detection.max_anomaly_detectors` | 1,000 | The maximum number of non-high cardinality detectors (no category field) users can create. +`opensearch.anomaly_detection.max_multi_entity_anomaly_detectors` | 10 | The maximum number of high cardinality detectors (with category field) in a cluster. +`opensearch.anomaly_detection.max_anomaly_features` | 5 | The maximum number of features for a detector. +`opensearch.anomaly_detection.ad_result_history_rollover_period` | 12h | How often the rollover condition is checked. If `true`, the plugin rolls over the result index to a new index. +`opensearch.anomaly_detection.ad_result_history_max_docs` | 250000000 | The maximum number of documents in one result index. The plugin only counts refreshed documents in the primary shards. +`opensearch.anomaly_detection.ad_result_history_retention_period` | 30d | The maximum age of the result index. If its age exceeds the threshold, the plugin deletes the rolled over result index. If the cluster has only one result index, the plugin keeps it even if it's older than its configured retention period. +`opensearch.anomaly_detection.max_entities_per_query` | 1,000 | The maximum unique values per detection interval for high cardinality detectors. By default, if the category field has more than 1,000 unique values in a detector interval, the plugin selects the top 1,000 values and orders them by `doc_count`. 
+`opensearch.anomaly_detection.max_entities_for_preview` | 30 | The maximum unique category field values displayed with the preview operation for high cardinality detectors. If the category field has more than 30 unique values, the plugin selects the top 30 values and orders them by `doc_count`. +`opensearch.anomaly_detection.max_primary_shards` | 10 | The maximum number of primary shards an anomaly detection index can have. +`opensearch.anomaly_detection.filter_by_backend_roles` | False | When you enable the security plugin and set this to `true`, the plugin filters results based on the user's backend role(s). +`opensearch.anomaly_detection.max_cache_miss_handling_per_second` | 100 | High cardinality detectors use a cache to store active models. In the event of a cache miss, the cache gets the models from the model checkpoint index. Use this setting to limit the rate of fetching models. Because the thread pool for a GET operation has a queue of 1,000, we recommend setting this value below 1,000. +`opensearch.anomaly_detection.max_batch_task_per_node` | 2 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks and if you're not sure if the data nodes are capable of running more historical detectors, add more data nodes instead of changing this setting to a higher value. +`opensearch.anomaly_detection.max_old_ad_task_docs_per_detector` | 10 | You can run the same historical detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster. 
+`opensearch.anomaly_detection.batch_task_piece_size` | 1000 | The date range for a historical task is split into smaller pieces and the anomaly detection plugin runs the task piece by piece. Each piece contains 1,000 detection intervals by default. For example, if the detector interval is 1 minute, one piece covers 1,000 minutes, and the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000. +`opensearch.anomaly_detection.batch_task_piece_interval_seconds` | 5 | Add a time interval between historical detector tasks. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds. diff --git a/docs/alerting/api.md b/docs/alerting/api.md new file mode 100644 index 00000000..52d85e43 --- /dev/null +++ b/docs/alerting/api.md @@ -0,0 +1,1466 @@ +--- +layout: default +title: API +parent: Alerting +nav_order: 15 +--- + +# Alerting API + +Use the alerting API to programmatically manage monitors and alerts. + + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + +## Create monitor + +#### Request + +```json +POST _opensearch/_alerting/monitors +{ + "type": "monitor", + "name": "test-monitor", + "enabled": true, + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "inputs": [{ + "search": { + "indices": ["movies"], + "query": { + "size": 0, + "aggregations": {}, + "query": { + "bool": { + "filter": { + "range": { + "@timestamp": { + "gte": "||-1h", + "lte": "", + "format": "epoch_millis" + } + } + } + } + } + } + } + }], + "triggers": [{ + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "ctx.results[0].hits.total.value > 0", + "lang": "painless" + } + }, + "actions": [{ + "name": "test-action", + "destination_id": "ld7912sBlQ5JUWWFThoW", + "message_template": { + "source": "This is my message body." 
+ }, + "throttle_enabled": true, + "throttle": { + "value": 27, + "unit": "MINUTES" + }, + "subject_template": { + "source": "TheSubject" + } + }] + }] +} +``` + +If you use a custom webhook for your destination and need to embed JSON in the message body, be sure to escape your quotes: + +```json +{ + "message_template": { + {% raw %}"source": "{ \"text\": \"Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue. - Trigger: {{ctx.trigger.name}} - Severity: {{ctx.trigger.severity}} - Period start: {{ctx.periodStart}} - Period end: {{ctx.periodEnd}}\" }"{% endraw %} + } +} +``` + +#### Sample response + +```json +{ + "_id": "vd5k2GsBlQ5JUWWFxhsP", + "_version": 1, + "_seq_no": 7, + "_primary_term": 1, + "monitor": { + "type": "monitor", + "schema_version": 1, + "name": "test-monitor", + "enabled": true, + "enabled_time": 1562703611363, + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "inputs": [{ + "search": { + "indices": [ + "movies" + ], + "query": { + "size": 0, + "query": { + "bool": { + "filter": [{ + "range": { + "@timestamp": { + "from": "||-1h", + "to": "", + "include_lower": true, + "include_upper": true, + "format": "epoch_millis", + "boost": 1 + } + } + }], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "aggregations": {} + } + } + }], + "triggers": [{ + "id": "ud5k2GsBlQ5JUWWFxRvi", + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "ctx.results[0].hits.total.value > 0", + "lang": "painless" + } + }, + "actions": [{ + "id": "ut5k2GsBlQ5JUWWFxRvj", + "name": "test-action", + "destination_id": "ld7912sBlQ5JUWWFThoW", + "message_template": { + "source": "This is my message body.", + "lang": "mustache" + }, + "throttle_enabled": false, + "subject_template": { + "source": "TheSubject", + "lang": "mustache" + } + }] + }], + "last_update_time": 1562703611363 + } +} +``` + +If you want to specify a timezone, you can do so by including a [cron 
expression](../cron/) with a timezone name in the `schedule` section of your request. + +The following example creates a monitor that runs at 12:10 PM Pacific Time on the 1st day of every month. + +#### Request + +```json +{ + "type": "monitor", + "name": "test-monitor", + "enabled": true, + "schedule": { + "cron" : { + "expression": "10 12 1 * *", + "timezone": "America/Los_Angeles" + } + }, + "inputs": [{ + "search": { + "indices": ["movies"], + "query": { + "size": 0, + "aggregations": {}, + "query": { + "bool": { + "filter": { + "range": { + "@timestamp": { + "gte": "||-1h", + "lte": "", + "format": "epoch_millis" + } + } + } + } + } + } + } + }], + "triggers": [{ + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "ctx.results[0].hits.total.value > 0", + "lang": "painless" + } + }, + "actions": [{ + "name": "test-action", + "destination_id": "ld7912sBlQ5JUWWFThoW", + "message_template": { + "source": "This is my message body." + }, + "throttle_enabled": true, + "throttle": { + "value": 27, + "unit": "MINUTES" + }, + "subject_template": { + "source": "TheSubject" + } + }] + }] +} +``` + +For a full list of timezone names, refer to [Wikipedia](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). The alerting plugin uses the Java [TimeZone](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/TimeZone.html) class to convert a [`ZoneId`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/time/ZoneId.html) to a valid timezone. + +--- + +## Update monitor + +When you update a monitor, include the current version number as a parameter. OpenSearch increments the version number automatically (see the sample response). 
+ +#### Request + +```json +PUT _opensearch/_alerting/monitors/ +{ + "type": "monitor", + "name": "test-monitor", + "enabled": true, + "enabled_time": 1551466220455, + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "inputs": [{ + "search": { + "indices": [ + "*" + ], + "query": { + "query": { + "match_all": { + "boost": 1 + } + } + } + } + }], + "triggers": [{ + "id": "StaeOmkBC25HCRGmL_y-", + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "return true", + "lang": "painless" + } + }, + "actions": [{ + "name": "test-action", + "destination_id": "RtaaOmkBC25HCRGm0fxi", + "subject_template": { + "source": "My Message Subject", + "lang": "mustache" + }, + "message_template": { + "source": "This is my message body.", + "lang": "mustache" + } + }] + }], + "last_update_time": 1551466639295 +} +``` + +#### Sample response + +```json +{ + "_id": "Q9aXOmkBC25HCRGmzfw-", + "_version": 4, + "monitor": { + "type": "monitor", + "name": "test-monitor", + "enabled": true, + "enabled_time": 1551466220455, + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "inputs": [{ + "search": { + "indices": [ + "*" + ], + "query": { + "query": { + "match_all": { + "boost": 1 + } + } + } + } + }], + "triggers": [{ + "id": "StaeOmkBC25HCRGmL_y-", + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "return true", + "lang": "painless" + } + }, + "actions": [{ + "name": "test-action", + "destination_id": "RtaaOmkBC25HCRGm0fxi", + "subject_template": { + "source": "My Message Subject", + "lang": "mustache" + }, + "message_template": { + "source": "This is my message body.", + "lang": "mustache" + } + }] + }], + "last_update_time": 1551466761596 + } +} +``` + + +--- + +## Get monitor + +#### Request + +``` +GET _opensearch/_alerting/monitors/ +``` + +#### Sample response + +```json +{ + "_id": "Q9aXOmkBC25HCRGmzfw-", + "_version": 3, + "monitor": { + "type": "monitor", + 
"name": "test-monitor", + "enabled": true, + "enabled_time": 1551466220455, + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "inputs": [{ + "search": { + "indices": [ + "*" + ], + "query": { + "query": { + "match_all": { + "boost": 1 + } + } + } + } + }], + "triggers": [{ + "id": "StaeOmkBC25HCRGmL_y-", + "name": "test-trigger", + "severity": "1", + "condition": { + "script": { + "source": "return true", + "lang": "painless" + } + }, + "actions": [{ + "name": "test-action", + "destination_id": "RtaaOmkBC25HCRGm0fxi", + "subject_template": { + "source": "My Message Subject", + "lang": "mustache" + }, + "message_template": { + "source": "This is my message body.", + "lang": "mustache" + } + }] + }], + "last_update_time": 1551466639295 + } +} +``` + + +--- + +## Monitor stats + +Returns statistics about the alerting feature. Use `_opensearch/_alerting/stats` to find node IDs and metrics. Then you can drill down using those values. + +#### Request + +```json +GET _opensearch/_alerting/stats +GET _opensearch/_alerting/stats/ +GET _opensearch/_alerting//stats +GET _opensearch/_alerting//stats/ +``` + +#### Sample response + +```json +{ + "_nodes": { + "total": 9, + "successful": 9, + "failed": 0 + }, + "cluster_name": "475300751431:alerting65-dont-delete", + "opensearch.scheduled_jobs.enabled": true, + "scheduled_job_index_exists": true, + "scheduled_job_index_status": "green", + "nodes_on_schedule": 9, + "nodes_not_on_schedule": 0, + "nodes": { + "qWcbKbb-TVyyI-Q7VSeOqA": { + "name": "qWcbKbb", + "schedule_status": "green", + "roles": [ + "MASTER" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 207017, + "full_sweep_on_time": true + }, + "jobs_info": {} + }, + "Do-DX9ZcS06Y9w1XbSJo1A": { + "name": "Do-DX9Z", + "schedule_status": "green", + "roles": [ + "DATA", + "INGEST" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 230516, + "full_sweep_on_time": true + }, + "jobs_info": {} + }, + 
"n5phkBiYQfS5I0FDzcqjZQ": { + "name": "n5phkBi", + "schedule_status": "green", + "roles": [ + "MASTER" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 228406, + "full_sweep_on_time": true + }, + "jobs_info": {} + }, + "Tazzo8cQSY-g3vOjgYYLzA": { + "name": "Tazzo8c", + "schedule_status": "green", + "roles": [ + "DATA", + "INGEST" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 211722, + "full_sweep_on_time": true + }, + "jobs_info": { + "i-wsFmkB8NzS6aXjQSk0": { + "last_execution_time": 1550864912882, + "running_on_time": true + } + } + }, + "Nyf7F8brTOSJuFPXw6CnpA": { + "name": "Nyf7F8b", + "schedule_status": "green", + "roles": [ + "DATA", + "INGEST" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 223300, + "full_sweep_on_time": true + }, + "jobs_info": { + "NbpoFmkBeSe-hD59AKgE": { + "last_execution_time": 1550864928354, + "running_on_time": true + }, + "-LlLFmkBeSe-hD59Ydtb": { + "last_execution_time": 1550864732727, + "running_on_time": true + }, + "pBFxFmkBNXkgNmTBaFj1": { + "last_execution_time": 1550863325024, + "running_on_time": true + }, + "hfasEmkBNXkgNmTBrvIW": { + "last_execution_time": 1550862000001, + "running_on_time": true + } + } + }, + "oOdJDIBVT5qbbO3d8VLeEw": { + "name": "oOdJDIB", + "schedule_status": "green", + "roles": [ + "DATA", + "INGEST" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 227570, + "full_sweep_on_time": true + }, + "jobs_info": { + "4hKRFmkBNXkgNmTBKjYX": { + "last_execution_time": 1550864806101, + "running_on_time": true + } + } + }, + "NRDG6JYgR8m0GOZYQ9QGjQ": { + "name": "NRDG6JY", + "schedule_status": "green", + "roles": [ + "MASTER" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 227652, + "full_sweep_on_time": true + }, + "jobs_info": {} + }, + "URMrXRz3Tm-CB72hlsl93Q": { + "name": "URMrXRz", + "schedule_status": "green", + "roles": [ + "DATA", + "INGEST" + ], + "job_scheduling_metrics": { + 
"last_full_sweep_time_millis": 231048, + "full_sweep_on_time": true + }, + "jobs_info": { + "m7uKFmkBeSe-hD59jplP": { + "running_on_time": true + } + } + }, + "eXgt1k9oTRCLmx2HBGElUw": { + "name": "eXgt1k9", + "schedule_status": "green", + "roles": [ + "DATA", + "INGEST" + ], + "job_scheduling_metrics": { + "last_full_sweep_time_millis": 229234, + "full_sweep_on_time": true + }, + "jobs_info": { + "wWkFFmkBc2NG-PeLntxk": { + "running_on_time": true + }, + "3usNFmkB8NzS6aXjO1Gs": { + "last_execution_time": 1550863959848, + "running_on_time": true + } + } + } + } +} +``` + + +--- + +## Delete monitor + +#### Request + +``` +DELETE _opensearch/_alerting/monitors/ +``` + +#### Sample response + +```json +{ + "_index": ".opensearch-scheduled-jobs", + "_type": "_doc", + "_id": "OYAHOmgBl3cmwnqZl_yH", + "_version": 2, + "result": "deleted", + "forced_refresh": true, + "_shards": { + "total": 2, + "successful": 2, + "failed": 0 + }, + "_seq_no": 11, + "_primary_term": 1 +} +``` + + +--- + +## Search monitors + +#### Request + +```json +GET _opensearch/_alerting/monitors/_search +{ + "query": { + "match" : { + "monitor.name": "my-monitor-name" + } + } +} +``` + +#### Sample response + +```json +{ + "took": 17, + "timed_out": false, + "_shards": { + "total": 5, + "successful": 5, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": 1, + "max_score": 0.6931472, + "hits": [{ + "_index": ".opensearch-scheduled-jobs", + "_type": "_doc", + "_id": "eGQi7GcBRS7-AJEqfAnr", + "_score": 0.6931472, + "_source": { + "type": "monitor", + "name": "my-monitor-name", + "enabled": true, + "enabled_time": 1545854942426, + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "inputs": [{ + "search": { + "indices": [ + "*" + ], + "query": { + "size": 0, + "query": { + "bool": { + "filter": [{ + "range": { + "@timestamp": { + "from": "{{period_end}}||-1h", + "to": "{{period_end}}", + "include_lower": true, + "include_upper": true, + "format": "epoch_millis", + 
"boost": 1 + } + } + }], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "aggregations": {} + } + } + }], + "triggers": [{ + "id": "Sooi7GcB53a0ewuj_6MH", + "name": "Over", + "severity": "1", + "condition": { + "script": { + "source": "_ctx.results[0].hits.total > 400000", + "lang": "painless" + } + }, + "actions": [] + }], + "last_update_time": 1545854975758 + } + }] + } +} +``` + + +--- + +## Run monitor + +You can add the optional `?dryrun=true` parameter to the URL to show the results of a run without actions sending any message. + + +#### Request + +```json +POST _opensearch/_alerting/monitors//_execute +``` + +#### Sample response + +```json +{ + "monitor_name": "logs", + "period_start": 1547161872322, + "period_end": 1547161932322, + "error": null, + "trigger_results": { + "Sooi7GcB53a0ewuj_6MH": { + "name": "Over", + "triggered": true, + "error": null, + "action_results": {} + } + } +} +``` + +--- + +## Get alerts + +Returns an array of all alerts. + +#### Request + +```json +GET _opensearch/_alerting/monitors/alerts +``` + +#### Response + +```json +{ + "alerts": [ + { + "id": "eQURa3gBKo1jAh6qUo49", + "version": 300, + "monitor_id": "awUMa3gBKo1jAh6qu47E", + "schema_version": 2, + "monitor_version": 2, + "monitor_name": "Example_monitor_name", + "monitor_user": { + "name": "admin", + "backend_roles": [ + "admin" + ], + "roles": [ + "all_access", + "own_index" + ], + "custom_attribute_names": [], + "user_requested_tenant": null + }, + "trigger_id": "bQUQa3gBKo1jAh6qnY6G", + "trigger_name": "Example_trigger_name", + "state": "ACTIVE", + "error_message": null, + "alert_history": [ + { + "timestamp": 1617314504873, + "message": "Example error emssage" + }, + { + "timestamp": 1617312543925, + "message": "Example error message" + } + ], + "severity": "1", + "action_execution_results": [ + { + "action_id": "bgUQa3gBKo1jAh6qnY6G", + "last_execution_time": 1617317979908, + "throttled_count": 0 + } + ], + "start_time": 1616704000492, + 
"last_notification_time": 1617317979908, + "end_time": null, + "acknowledged_time": null + } + ], + "totalAlerts": 1 +} +``` + +--- + +## Acknowledge alert + +[After getting your alerts](#get-alerts/), you can acknowledge any number of active alerts in one call. If the alert is already in an ERROR, COMPLETED, or ACKNOWLEDGED state, it appears in the `failed` array. + + +#### Request + +```json +POST _opensearch/_alerting/monitors//_acknowledge/alerts +{ + "alerts": ["eQURa3gBKo1jAh6qUo49"] +} +``` + +#### Sample response + +```json +{ + "success": [ + "eQURa3gBKo1jAh6qUo49" + ], + "failed": [] +} +``` + +--- + +## Create destination + +#### Requests + +```json +POST _opensearch/_alerting/destinations +{ + "name": "my-destination", + "type": "slack", + "slack": { + "url": "http://www.example.com" + } +} + +POST _opensearch/_alerting/destinations +{ + "type": "custom_webhook", + "name": "my-custom-destination", + "custom_webhook": { + "path": "incomingwebhooks/123456-123456-XXXXXX", + "header_params": { + "Content-Type": "application/json" + }, + "scheme": "HTTPS", + "port": 443, + "query_params": { + "token": "R2x1UlN4ZHF8MXxxVFJpelJNVDgzdGNwXXXXXXXXX" + }, + "host": "hooks.chime.aws" + } +} +``` + +#### Sample response + +```json +{ + "_id": "nO-yFmkB8NzS6aXjJdiI", + "_version": 1, + "destination": { + "type": "slack", + "name": "my-destination", + "last_update_time": 1550863967624, + "slack": { + "url": "http://www.example.com" + } + } +} +``` + + +--- + +## Update destination + +#### Request + +```json +PUT _opensearch/_alerting/destinations/ +{ + "name": "my-updated-destination", + "type": "slack", + "slack": { + "url": "http://www.example.com" + } +} +``` + +#### Sample response + +```json +{ + "_id": "pe-1FmkB8NzS6aXjqvVY", + "_version": 4, + "destination": { + "type": "slack", + "name": "my-updated-destination", + "last_update_time": 1550864289375, + "slack": { + "url": "http://www.example.com" + } + } +} +``` + + +--- + +## Get destination + +Retrieve one 
destination. + +#### Requests + +```json +GET _opensearch/_alerting/destinations/ +``` + +#### Sample response + +```json +{ + "totalDestinations": 1, + "destinations": [{ + "id": "1a2a3a4a5a6a7a", + "type": "slack", + "name": "sample-destination", + "user": { + "name": "psantos", + "backend_roles": [ + "human-resources" + ], + "roles": [ + "alerting_full_access", + "hr-role" + ], + "custom_attribute_names": [] + }, + "schema_version": 3, + "seq_no": 0, + "primary_term": 6, + "last_update_time": 1603943261722, + "slack": { + "url": "https://example.com" + } + } + ] +} +``` + + +--- + +## Get destinations + +Retrieve all destinations. + +#### Requests + +```json +GET _opensearch/_alerting/destinations +``` + +#### Sample response + +```json +{ + "totalDestinations": 1, + "destinations": [{ + "id": "1a2a3a4a5a6a7a", + "type": "slack", + "name": "sample-destination", + "user": { + "name": "psantos", + "backend_roles": [ + "human-resources" + ], + "roles": [ + "alerting_full_access", + "hr-role" + ], + "custom_attribute_names": [] + }, + "schema_version": 3, + "seq_no": 0, + "primary_term": 6, + "last_update_time": 1603943261722, + "slack": { + "url": "https://example.com" + } + } + ] +} +``` + + +--- + +## Delete destination + +#### Request + +``` +DELETE _opensearch/_alerting/destinations/ +``` + +#### Sample response + +```json +{ + "_index": ".opensearch-alerting-config", + "_type": "_doc", + "_id": "Zu-zFmkB8NzS6aXjLeBI", + "_version": 2, + "result": "deleted", + "forced_refresh": true, + "_shards": { + "total": 2, + "successful": 2, + "failed": 0 + }, + "_seq_no": 8, + "_primary_term": 1 +} +``` +--- + +## Create email account + +#### Request +```json +POST _opensearch/_alerting/destinations/email_accounts +{ + "name": "example_account", + "email": "example@email.com", + "host": "smtp.email.com", + "port": 465, + "method": "ssl" +} +``` + +#### Sample response +```json +{ + "_id" : "email_account_id", + "_version" : 1, + "_seq_no" : 7, + "_primary_term" : 2, + 
"email_account" : { + "schema_version" : 2, + "name" : "example_account", + "email" : "example@email.com", + "host" : "smtp.email.com", + "port" : 465, + "method" : "ssl" + } +} +``` + +## Update email account + +#### Request +```json +PUT _opensearch/_alerting/destinations/email_accounts/ +{ + "name": "example_account", + "email": "example@email.com", + "host": "smtp.email.com", + "port": 465, + "method": "ssl" +} +``` +#### Sample response +```json +{ + "_id" : "email_account_id", + "_version" : 3, + "_seq_no" : 19, + "_primary_term" : 2, + "email_account" : { + "schema_version" : 2, + "name" : "example_account", + "email" : "example@email.com", + "host" : "smtp.email.com", + "port" : 465, + "method" : "ssl" + } +} +``` + +## Get email account + +#### Request +```json +GET _opensearch/_alerting/destinations/email_accounts/ +{ + "name": "example_account", + "email": "example@email.com", + "host": "smtp.email.com", + "port": 465, + "method": "ssl" +} +``` +#### Sample response +```json +{ + "_id" : "email_account_id", + "_version" : 2, + "_seq_no" : 8, + "_primary_term" : 2, + "email_account" : { + "schema_version" : 2, + "name" : "test_account", + "email" : "test@email.com", + "host" : "smtp.test.com", + "port" : 465, + "method" : "ssl" + } +} +``` + +## Delete email account + +#### Request +``` +DELETE _opensearch/_alerting/destinations/email_accounts/ +``` +#### Sample response + +```json +{ + "_index" : ".opensearch-alerting-config", + "_type" : "_doc", + "_id" : "email_account_id", + "_version" : 1, + "result" : "deleted", + "forced_refresh" : true, + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 12, + "_primary_term" : 2 +} +``` + +## Search email account + +#### Request + +```json +POST _opensearch/_alerting/destinations/email_accounts/_search +{ + "from": 0, + "size": 20, + "sort": { "email_account.name.keyword": "desc" }, + "query": { + "bool": { + "must": { + "match_all": {} + } + } + } +} +``` + +#### Sample response 
+ +```json +{ + "took" : 8, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 2, + "relation" : "eq" + }, + "max_score" : null, + "hits" : [ + { + "_index" : ".opensearch-alerting-config", + "_type" : "_doc", + "_id" : "email_account_id", + "_seq_no" : 8, + "_primary_term" : 2, + "_score" : null, + "_source" : { + "schema_version" : 2, + "name" : "example_account", + "email" : "example@email.com", + "host" : "smtp.email.com", + "port" : 465, + "method" : "ssl" + }, + "sort" : [ + "example_account" + ] + }, + ... + ] + } +} +``` + +--- + +## Create email group + +#### Request + +```json +POST _opensearch/_alerting/destinations/email_groups +{ + "name": "example_email_group", + "emails": [{ + "email": "example@email.com" + }] +} +``` + +#### Sample response + +```json +{ + "_id" : "email_group_id", + "_version" : 1, + "_seq_no" : 9, + "_primary_term" : 2, + "email_group" : { + "schema_version" : 2, + "name" : "example_email_group", + "emails" : [ + { + "email" : "example@email.com" + } + ] + } +} +``` + +## Update email group + +#### Request + +```json +PUT _opensearch/_alerting/destinations/email_groups/ +{ + "name": "example_email_group", + "emails": [{ + "email": "example@email.com" + }] +} +``` +#### Sample response + +```json +{ + "_id" : "email_group_id", + "_version" : 4, + "_seq_no" : 17, + "_primary_term" : 2, + "email_group" : { + "schema_version" : 2, + "name" : "example_email_group", + "emails" : [ + { + "email" : "example@email.com" + } + ] + } +} +``` + +## Get email group + +#### Request +```json +GET _opensearch/_alerting/destinations/email_groups/ +{ + "name": "example_email_group", + "emails": [{ + "email": "example@email.com" + }] +} +``` +#### Sample response + +```json +{ + "_id" : "email_group_id", + "_version" : 4, + "_seq_no" : 17, + "_primary_term" : 2, + "email_group" : { + "schema_version" : 2, + "name" : "example_email_group", + "emails" : [ 
+ { + "email" : "example@email.com" + } + ] + } +} +``` + +## Delete email group + +#### Request +``` +DELETE _opensearch/_alerting/destinations/email_groups/ +``` +#### Sample response + +```json +{ + "_index" : ".opensearch-alerting-config", + "_type" : "_doc", + "_id" : "email_group_id", + "_version" : 1, + "result" : "deleted", + "forced_refresh" : true, + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 11, + "_primary_term" : 2 +} +``` + +## Search email group + +#### Request + +```json +POST _opensearch/_alerting/destinations/email_groups/_search +{ + "from": 0, + "size": 20, + "sort": { "email_group.name.keyword": "desc" }, + "query": { + "bool": { + "must": { + "match_all": {} + } + } + } +} +``` + +#### Sample response + +```json +{ + "took" : 7, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 5, + "relation" : "eq" + }, + "max_score" : null, + "hits" : [ + { + "_index" : ".opensearch-alerting-config", + "_type" : "_doc", + "_id" : "email_group_id", + "_seq_no" : 10, + "_primary_term" : 2, + "_score" : null, + "_source" : { + "schema_version" : 2, + "name" : "example_email_group", + "emails" : [ + { + "email" : "example@email.com" + } + ] + }, + "sort" : [ + "example_email_group" + ] + }, + ... + ] + } +} +``` +--- diff --git a/docs/alerting/cron.md b/docs/alerting/cron.md new file mode 100644 index 00000000..65a61c0b --- /dev/null +++ b/docs/alerting/cron.md @@ -0,0 +1,64 @@ +--- +layout: default +title: Cron +nav_order: 20 +parent: Alerting +has_children: false +--- + +# Cron expression reference + +Monitors can run at a variety of fixed intervals (e.g. hourly, daily, etc.), but you can also define custom cron expressions for when they should run. 
Monitors use the Unix cron syntax and support five fields: + +Field | Valid values +:--- | :--- +Minute | 0-59 +Hour | 0-23 +Day of month | 1-31 +Month | 1-12 +Day of week | 0-7 (0 and 7 are both Sunday) or SUN, MON, TUE, WED, THU, FRI, SAT + +For example, the following expression translates to "every Monday through Friday at 11:30 AM": + +``` +30 11 * * 1-5 +``` + + +## Features + +Feature | Description +:--- | :--- +`*` | Wildcard. Specifies all valid values. +`,` | List. Use to specify several values (e.g. `1,15,30`). +`-` | Range. Use to specify a range of values (e.g. `1-15`). +`/` | Step. Use after a wildcard or range to specify the "step" between values. For example, `0-11/2` is equivalent to `0,2,4,6,8,10`. + +Note that you can specify the day using two fields: day of month and day of week. For most situations, we recommend that you use just one of these fields and leave the other as `*`. + +If you use a non-wildcard value in both fields, the monitor runs when either field matches the time. For example, `15 2 1,15 * 1` causes the monitor to run at 2:15 AM on the 1st of the month, the 15th of the month, and every Monday. + + +## Sample expressions + +Every other day at 1:45 PM: + +``` +45 13 1-31/2 * * +``` + +Every 10 minutes on Saturday and Sunday: + +``` +0/10 * * * 6-7 +``` + +Every three hours on the first day of every other month: + +``` +0 0-23/3 1 1-12/2 * +``` + +## API + +For an example of how to use a custom cron expression in an API call, see the [create monitor API operation](../api/#request-1). diff --git a/docs/alerting/index.md b/docs/alerting/index.md new file mode 100644 index 00000000..3826203a --- /dev/null +++ b/docs/alerting/index.md @@ -0,0 +1,16 @@ +--- +layout: default +title: Alerting +nav_order: 34 +has_children: true +--- + +# Alerting +OpenSearch Dashboards +{: .label .label-yellow :} + +The alerting feature notifies you when data from one or more OpenSearch indices meets certain conditions. 
For example, you might want to notify a [Slack](https://slack.com/) channel if your application logs more than five HTTP 503 errors in one hour, or you might want to page a developer if no new documents have been indexed in the past 20 minutes. + +To get started, choose **Alerting** in OpenSearch Dashboards. + +![OpenSearch Dashboards side bar with link](../images/alerting.png) diff --git a/docs/alerting/monitors.md b/docs/alerting/monitors.md new file mode 100644 index 00000000..4a2bfbcf --- /dev/null +++ b/docs/alerting/monitors.md @@ -0,0 +1,331 @@ +--- +layout: default +title: Monitors +nav_order: 1 +parent: Alerting +has_children: false +--- + +# Monitors + +#### Table of contents +- TOC +{:toc} + + +--- + +## Key terms + +Term | Definition +:--- | :--- +Monitor | A job that runs on a defined schedule and queries OpenSearch. The results of these queries are then used as input for one or more *triggers*. +Trigger | Conditions that, if met, generate *alerts* and can perform some *action*. +Alert | A notification that a monitor's trigger condition has been met. +Action | The information that you want the monitor to send out after being triggered. Actions have a *destination*, a message subject, and a message body. +Destination | A reusable location for an action, such as Amazon Chime, Slack, or a webhook URL. + + +--- + +## Create destinations + +1. Choose **Alerting**, **Destinations**, **Add destination**. +1. Specify a name for the destination so that you can identify it later. +1. For **Type**, choose Slack, Amazon Chime, custom webhook, or [email](#email-as-a-destination). + +For the email type, refer to the [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. For more information about webhooks, see the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html). 
+ +For custom webhooks, you must specify more information: parameters and headers. For example, if your endpoint requires basic authentication, you might need to add a header with a key of `Authorization` and a value of `Basic `. You might also need to change `Content-Type` to whatever your webhook requires. Popular values are `application/json`, `application/xml`, and `text/plain`. + +This information is stored in plain text in the OpenSearch cluster. We will improve this design in the future, but for now, the encoded credentials (which are neither encrypted nor hashed) might be visible to other OpenSearch users. + + +### Email as a destination + +To send or receive an alert notification as an email, choose **Email** as the destination type. Next, add at least one sender and recipient. We recommend adding email groups if you want to notify more than a few people of an alert. You can configure senders and recipients using **Manage senders** and **Manage email groups**. + + +#### Manage senders + +Senders are email accounts from which the alerting plugin sends notifications. + +To configure a sender email, do the following: + +1. After you choose **Email** as the destination type, choose **Manage senders**. +1. Choose **Add sender**, **New sender** and enter a unique name. +1. Enter the email address, SMTP host (e.g. `smtp.gmail.com` for a Gmail account), and the port. +1. Choose an encryption method, or use the default value of **None**. However, most email providers require SSL or TLS, which requires a username and password in OpenSearch keystore. Refer to [Authenticate sender account](#authenticate-sender-account) to learn more. +1. Choose **Save** to save the configuration and create the sender. You can create a sender even before you add your credentials to the OpenSearch keystore. However, you must [authenticate each sender account](#authenticate-sender-account) before you use the destination to send your alert. 
+ +You can reuse senders across many different destinations, but each destination only supports one sender. + + +#### Manage email groups or recipients + +Use email groups to create and manage reusable lists of email addresses. For example, one alert might email the DevOps team, whereas another might email the executive team and the engineering team. + +You can enter individual email addresses or an email group in the **Recipients** field. + +1. After you choose **Email** as the destination type, choose **Manage email groups**. Then choose **Add email group**, **New email group**. +1. Enter a unique name. +1. For recipient emails, enter any number of email addresses. +1. Choose **Save**. + + +#### Authenticate sender account + +If your email provider requires SSL or TLS, you must authenticate each sender account before you can send an email. Enter these credentials in the OpenSearch keystore using the CLI. Run the following commands (in your OpenSearch directory) to enter your username and password. The `` is the name you entered for **Sender** earlier. + +```bash +./bin/opensearch-keystore add opensearch.alerting.destination.email..username +./bin/opensearch-keystore add opensearch.alerting.destination.email..password +``` + +**Note**: Keystore settings are node-specific. You must run these commands on each node. +{: .note} + +To change or update your credentials (after you've added them to the keystore on every node), call the reload API to automatically update those credentials without restarting OpenSearch: + +```json +POST _nodes/reload_secure_settings +{ + "secure_settings_password": "1234" +} +``` + + +--- + +## Create monitors + +1. Choose **Alerting**, **Monitors**, **Create monitor**. +1. Specify a name for the monitor. + +The anomaly detection option is for pairing with the anomaly detection plugin. See [Anomaly Detection](../../ad/). +For anomaly detector, choose an appropriate schedule for the monitor based on the detector interval. 
Otherwise, the alerting monitor might miss reading the results. + +For example, assume you set both the monitor interval and the detector interval to 5 minutes, and you start the detector at 12:00. If an anomaly is detected at 12:05, it might be available at 12:06 because of the delay between writing the anomaly and it being available for queries. The monitor reads the anomaly results between 12:00 and 12:05, so it does not get the anomaly results available at 12:06. + +To avoid this issue, make sure the alerting monitor interval is at least twice the detector interval. +When you create a monitor using OpenSearch Dashboards, the anomaly detection plugin generates a default monitor schedule that's twice the detector interval. + +Whenever you update a detector’s interval, make sure to update the associated monitor interval as well, as the anomaly detection plugin does not do this automatically. + +1. Choose one or more indices. You can also use `*` as a wildcard to specify an index pattern. + + If you use the security plugin, you can only choose indices that you have permission to access. For details, see [Alerting security](../security/). + +1. Define the monitor in one of three ways: visually, using a query, or using an anomaly detector. + + - Visual definition works well for monitors that you can define as "some value is above or below some threshold for some amount of time." + + - Query definition gives you flexibility in terms of what you query for (using [the OpenSearch query DSL](../../opensearch/full-text)) and how you evaluate the results of that query (Painless scripting).
+ + This example averages the `cpu_usage` field: + + ```json + { + "size": 0, + "query": { + "match_all": {} + }, + "aggs": { + "avg_cpu": { + "avg": { + "field": "cpu_usage" + } + } + } + } + ``` + + You can even filter query results using `{% raw %}{{period_start}}{% endraw %}` and `{% raw %}{{period_end}}{% endraw %}`: + + ```json + { + "size": 0, + "query": { + "bool": { + "filter": [{ + "range": { + "timestamp": { + "from": "{% raw %}{{period_end}}{% endraw %}||-1h", + "to": "{% raw %}{{period_end}}{% endraw %}", + "include_lower": true, + "include_upper": true, + "format": "epoch_millis", + "boost": 1 + } + } + }], + "adjust_pure_negative": true, + "boost": 1 + } + }, + "aggregations": {} + } + ``` + + "Start" and "end" refer to the interval at which the monitor runs. See [Available variables](#available-variables). + + +1. To define a monitor visually, choose **Define using visual graph**. Then choose an aggregation (for example, `count()` or `average()`), a set of documents, and a timeframe. Visual definition works well for most monitors. + + To use a query, choose **Define using extraction query**, add your query (using [the OpenSearch query DSL](../../opensearch/full-text/)), and test it using the **Run** button. + + The monitor makes this query to OpenSearch as often as the schedule dictates; check the **Query Performance** section and make sure you're comfortable with the performance implications. + + To use an anomaly detector, choose **Define using Anomaly detector** and select your **Detector**. +1. Choose a frequency and timezone for your monitor. Note that you can only pick a timezone if you choose Daily, Weekly, Monthly, or [custom cron expression](../cron/) for frequency. +1. Choose **Create**. + + +--- + +## Create triggers + +The next step in creating a monitor is to create a trigger. 
These steps differ depending on whether you chose **Define using visual graph**, **Define using extraction query**, or **Define using Anomaly detector** when you created the monitor. + +In each case, you begin by specifying a name and severity level for the trigger. Severity levels help you manage alerts. A trigger with a high severity level (e.g. 1) might page a specific individual, whereas a trigger with a low severity level might message a chat room. + + +### Visual graph + +For **Trigger condition**, specify a threshold for the aggregation and timeframe you chose earlier, such as "is below 1,000" or "is exactly 10." + +The line moves up and down as you increase and decrease the threshold. Once this line is crossed, the trigger evaluates to true. + + +### Extraction query + +For **Trigger condition**, specify a Painless script that returns true or false. Painless is the default OpenSearch scripting language and has a syntax similar to Groovy. + +Trigger condition scripts revolve around the `ctx.results[0]` variable, which corresponds to the extraction query response. For example, your script might reference `ctx.results[0].hits.total.value` or `ctx.results[0].hits.hits[i]._source.error_code`. + +A return value of true means the trigger condition has been met, and the trigger should execute its actions. Test your script using the **Run** button. + +The **Info** link next to **Trigger condition** contains a useful summary of the variables and results available to your query. +{: .tip } + + +### Anomaly detector + +For **Trigger type**, choose **Anomaly detector grade and confidence**. + +Specify the **Anomaly grade condition** for the aggregation and timeframe you chose earlier, such as "IS ABOVE 0.7" or "IS EXACTLY 0.5." The *anomaly grade* is a number between 0 and 1 that indicates how anomalous a data point is. + +Specify the **Anomaly confidence condition** for the aggregation and timeframe you chose earlier, such as "IS ABOVE 0.7" or "IS EXACTLY 0.5."
The *anomaly confidence* is an estimate of the probability that the reported anomaly grade matches the expected anomaly grade. + +The line moves up and down as you increase and decrease the threshold. Once this line is crossed, the trigger evaluates to true. + + +#### Sample scripts + +{::comment} +These scripts are Painless, not Groovy, but calling them Groovy in Jekyll gets us syntax highlighting in the generated HTML. +{:/comment} + +```groovy +// Evaluates to true if the query returned any documents +ctx.results[0].hits.total.value > 0 +``` + +```groovy +// Returns true if the avg_cpu aggregation exceeds 90 +if (ctx.results[0].aggregations.avg_cpu.value > 90) { + return true; +} +``` + +```groovy +// Performs some crude custom scoring and returns true if that score exceeds a certain value +int score = 0; +for (int i = 0; i < ctx.results[0].hits.hits.length; i++) { + // Weighs 500 errors 10 times as heavily as 503 errors + if (ctx.results[0].hits.hits[i]._source.http_status_code == "500") { + score += 10; + } else if (ctx.results[0].hits.hits[i]._source.http_status_code == "503") { + score += 1; + } +} +if (score > 99) { + return true; +} else { + return false; +} +``` + + +#### Available variables + +Variable | Description +:--- | :--- +`ctx.results` | An array with one element (i.e. `ctx.results[0]`). Contains the query results. This variable is empty if the trigger was unable to retrieve results. See `ctx.error`. +`ctx.monitor` | Includes `ctx.monitor.name`, `ctx.monitor.type`, `ctx.monitor.enabled`, `ctx.monitor.enabled_time`, `ctx.monitor.schedule`, `ctx.monitor.inputs`, `ctx.monitor.triggers`, and `ctx.monitor.last_update_time`. +`ctx.trigger` | Includes `ctx.trigger.name`, `ctx.trigger.severity`, `ctx.trigger.condition`, and `ctx.trigger.actions`. +`ctx.periodStart` | Unix timestamp for the beginning of the period during which the alert triggered. For example, if a monitor runs every ten minutes, a period might begin at 10:40 and end at 10:50.
+`ctx.periodEnd` | The end of the period during which the alert triggered. +`ctx.error` | The error message if the trigger was unable to retrieve results or unable to evaluate the trigger, typically due to a compile error or null pointer exception. Null otherwise. +`ctx.alert` | The current, active alert (if it exists). Includes `ctx.alert.id`, `ctx.alert.version`, and `ctx.alert.isAcknowledged`. Null if no alert is active. + + +--- + +## Add actions + +The final step in creating a monitor is to add one or more actions. Actions send notifications when trigger conditions are met and support [Slack](https://slack.com/), [Amazon Chime](https://aws.amazon.com/chime/), and webhooks. + +If you don't want to receive notifications for alerts, you don't have to add actions to your triggers. Instead, you can periodically check OpenSearch Dashboards. +{: .tip } + +1. Specify a name for the action. +1. Choose a destination. +1. Add a subject and body for the message. + + You can add variables to your messages using [Mustache templates](https://mustache.github.io/mustache.5.html). You have access to `ctx.action.name`, the name of the current action, as well as all [trigger variables](#available-variables). + + If your destination is a custom webhook that expects a particular data format, you might need to include JSON (or even XML) directly in the message body: + + ```json + {% raw %}{ "text": "Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue. - Trigger: {{ctx.trigger.name}} - Severity: {{ctx.trigger.severity}} - Period start: {{ctx.periodStart}} - Period end: {{ctx.periodEnd}}" }{% endraw %} + ``` + + In this case, the message content must conform to the `Content-Type` header in the [custom webhook](#create-destinations). + +1. (Optional) Use action throttling to limit the number of notifications you receive within a given span of time. 
+ + For example, if a monitor checks a trigger condition every minute, you could receive one notification per minute. If you set action throttling to 60 minutes, you receive no more than one notification per hour, even if the trigger condition is met dozens of times in that hour. + +1. Choose **Create**. + +After an action sends a message, the content of that message has left the purview of the security plugin. Securing access to the message (e.g. access to the Slack channel) is your responsibility. + + +#### Sample message + +```mustache +{% raw %}Monitor {{ctx.monitor.name}} just entered an alert state. Please investigate the issue. +- Trigger: {{ctx.trigger.name}} +- Severity: {{ctx.trigger.severity}} +- Period start: {{ctx.periodStart}} +- Period end: {{ctx.periodEnd}}{% endraw %} +``` + +If you want to use the `ctx.results` variable in a message, use `{% raw %}{{ctx.results.0}}{% endraw %}` rather than `{% raw %}{{ctx.results[0]}}{% endraw %}`. This difference is due to how Mustache handles bracket notation. +{: .note } + + +--- + +## Work with alerts + +Alerts persist until you resolve the root cause. An alert can be in one of the following states: + +State | Description +:--- | :--- +Active | The alert is ongoing and unacknowledged. Alerts remain in this state until you acknowledge them, delete the trigger associated with the alert, or delete the monitor entirely. +Acknowledged | Someone has acknowledged the alert, but not fixed the root cause. +Completed | The alert is no longer ongoing. Alerts enter this state after the corresponding trigger evaluates to false. +Error | An error occurred while executing the trigger---usually the result of a bad trigger or destination. +Deleted | Someone deleted the monitor or trigger associated with this alert while the alert was ongoing.
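
The action throttling step described earlier on this page can be illustrated with a short simulation. This is only a sketch of the behavior as the text describes it, not the alerting plugin's implementation: the function name `count_notifications` is hypothetical, and it assumes an action fires only when at least the throttle period has passed since the last notification.

```python
def count_notifications(check_minutes, throttle_minutes):
    """Count notifications sent when the trigger condition is met at every check.

    Assumed model (not taken from the plugin source): an action fires only if
    at least `throttle_minutes` have passed since its last notification.
    """
    sent = 0
    last_sent = None
    for minute in check_minutes:
        if last_sent is None or minute - last_sent >= throttle_minutes:
            sent += 1
            last_sent = minute
    return sent


one_hour_of_checks = range(60)  # the monitor checks the trigger once per minute

print(count_notifications(one_hour_of_checks, throttle_minutes=0))   # 60
print(count_notifications(one_hour_of_checks, throttle_minutes=60))  # 1
```

With no throttling, every check that meets the condition produces a notification. With a 60-minute throttle, the same hour of checks produces a single notification.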
diff --git a/docs/alerting/security.md b/docs/alerting/security.md new file mode 100644 index 00000000..30c19684 --- /dev/null +++ b/docs/alerting/security.md @@ -0,0 +1,79 @@ +--- +layout: default +title: Alerting Security +nav_order: 10 +parent: Alerting +has_children: false +--- + +# Alerting security + +If you use the security plugin alongside alerting, you might want to limit certain users to certain actions. For example, you might want some users to only be able to view and acknowledge alerts, while others can modify monitors and destinations. + + +## Basic permissions + +The security plugin has three built-in roles that cover most alerting use cases: `alerting_read_access`, `alerting_ack_alerts`, and `alerting_full_access`. For descriptions of each, see [Predefined roles](../../security/access-control/users-roles/#predefined-roles). + +If these roles don't meet your needs, mix and match individual alerting [permissions](../../security/access-control/permissions/) to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/alerting/destination/delete` permission lets you delete destinations. + + +## How monitors access data + +Monitors run with the permissions of the user who created or last modified them. For example, consider the user `jdoe`, who works at a chain of retail stores. `jdoe` has two roles. Together, these two roles allow read access to three indices: `store1-returns`, `store2-returns`, and `store3-returns`. + +`jdoe` creates a monitor that sends an email to management whenever the number of returns across all three indices exceeds 40 per hour. + +Later, the user `psantos` wants to edit the monitor to run every two hours, but `psantos` only has access to `store1-returns`. To make the change, `psantos` has two options: + +- Update the monitor so that it only checks `store1-returns`. +- Ask an administrator for read access to the other two indices. 
+ +After making the change, the monitor now runs with the same permissions as `psantos`, including any [document-level security](../../security/access-control/document-level-security/) queries, [excluded fields](../../security/access-control/field-level-security/), and [masked fields](../../security/access-control/field-masking/). If you use an extraction query to define your monitor, use the **Run** button to ensure that the response includes the fields you need. + + +## (Advanced) Limit access by backend role + +Out of the box, the alerting plugin has no concept of ownership. For example, if you have the `cluster:admin/opensearch/alerting/monitor/write` permission, you can edit *all* monitors, regardless of whether you created them. If a small number of trusted users manage your monitors and destinations, this lack of ownership generally isn't a problem. A larger organization might need to segment access by backend role. + +First, make sure that your users have the appropriate [backend roles](../../security/access-control/). Backend roles usually come from an [LDAP server](../../security/configuration/ldap/) or [SAML provider](../../security/configuration/saml/). However, if you use the internal user database, you can use the REST API to [add them manually](../../security/access-control/api/#create-user). + +Next, enable the following setting: + +```json +PUT _cluster/settings +{ + "transient": { + "opensearch.alerting.filter_by_backend_roles": "true" + } +} +``` + +Now when users view alerting resources in OpenSearch Dashboards (or make REST API calls), they only see monitors and destinations that are created by users who share *at least one* backend role. For example, consider three users who all have full access to alerting: `jdoe`, `jroe`, and `psantos`. + +`jdoe` and `jroe` are on the same team at work and both have the `analyst` backend role. `psantos` has the `human-resources` backend role. 
+ +If `jdoe` creates a monitor, `jroe` can see and modify it, but `psantos` can't. If that monitor generates an alert, the situation is the same: `jroe` can see and acknowledge it, but `psantos` can't. If `psantos` creates a destination, `jdoe` and `jroe` can't see or modify it. + + + diff --git a/docs/alerting/settings.md b/docs/alerting/settings.md new file mode 100644 index 00000000..30a16f59 --- /dev/null +++ b/docs/alerting/settings.md @@ -0,0 +1,59 @@ +--- +layout: default +title: Management +parent: Alerting +nav_order: 5 +--- + +# Management + + +## Alerting indices + +The alerting feature creates several indices and one alias. The security plugin demo script configures them as [system indices](../../security/configuration/system-indices/) for an extra layer of protection. Don't delete these indices or modify their contents without using the alerting APIs. + +Index | Purpose +:--- | :--- +`.opensearch-alerting-alerts` | Stores ongoing alerts. +`.opensearch-alerting-alert-history-` | Stores a history of completed alerts. +`.opensearch-alerting-config` | Stores monitors, triggers, and destinations. [Take a snapshot](../../opensearch/snapshot-restore) of this index to back up your alerting configuration. +`.opensearch-alerting-alert-history-write` (alias) | Provides a consistent URI for the `.opensearch-alerting-alert-history-` index. + +All alerting indices are hidden by default. For a summary, make the following request: + +``` +GET _cat/indices?expand_wildcards=open,hidden +``` + + +## Alerting settings + +We don't recommend changing these settings; the defaults should work well for most use cases. + +All settings are available using the OpenSearch `_cluster/settings` API. None require a restart, and all can be marked `persistent` or `transient`. + +Setting | Default | Description +:--- | :--- | :--- +`opensearch.scheduled_jobs.enabled` | true | Whether the alerting plugin is enabled or not. If disabled, all monitors immediately stop running. 
+`opensearch.alerting.index_timeout` | 60s | The timeout for creating monitors and destinations using the REST APIs. +`opensearch.alerting.request_timeout` | 10s | The timeout for miscellaneous requests from the plugin. +`opensearch.alerting.action_throttle_max_value` | 24h | The maximum amount of time you can set for action throttling. By default, this value displays as 1440 minutes in OpenSearch Dashboards. +`opensearch.alerting.input_timeout` | 30s | How long the monitor can take to issue the search request. +`opensearch.alerting.bulk_timeout` | 120s | How long the monitor can take to write alerts to the alert index. +`opensearch.alerting.alert_backoff_count` | 3 | The number of retries for writing alerts before the operation fails. +`opensearch.alerting.alert_backoff_millis` | 50ms | The amount of time to wait between retries---increases exponentially after each failed retry. +`opensearch.alerting.alert_history_rollover_period` | 12h | How frequently to check whether the `.opensearch-alerting-alert-history-write` alias should roll over to a new history index and whether the alerting plugin should delete any history indices. +`opensearch.alerting.move_alerts_backoff_millis` | 250 | The amount of time to wait between retries---increases exponentially after each failed retry. +`opensearch.alerting.move_alerts_backoff_count` | 3 | The number of retries for moving alerts to a deleted state after their monitor or trigger has been deleted. +`opensearch.alerting.monitor.max_monitors` | 1000 | The maximum number of monitors users can create. +`opensearch.alerting.alert_history_max_age` | 30d | The maximum age of a document in the `.opensearch-alerting-alert-history-` index before alerting creates a new index. If the number of alerts in this time period does not exceed `alert_history_max_docs`, alerting creates one history index per period (e.g. one index every 30 days).
+`opensearch.alerting.alert_history_max_docs` | 1000 | The maximum number of alerts to store in the `.opensearch-alerting-alert-history-` index before creating a new index. +`opensearch.alerting.alert_history_enabled` | true | Whether to create `.opensearch-alerting-alert-history-` indices. +`opensearch.alerting.alert_history_retention_period` | 60d | The amount of time to keep history indices before automatically deleting them. +`opensearch.alerting.destination.allow_list` | ["chime", "slack", "custom_webhook", "email", "test_action"] | The list of allowed destinations. If you don't want to allow users to use a certain type of destination, you can remove it from this list, but we recommend leaving this setting as-is. +`opensearch.alerting.filter_by_backend_roles` | "false" | Restricts access to monitors by backend role. See [Alerting security](../security/). +`opensearch.scheduled_jobs.sweeper.period` | 5m | The alerting feature uses its "job sweeper" component to periodically check for new or updated jobs. This setting is the rate at which the sweeper checks to see if any jobs (monitors) have changed and need to be rescheduled. +`opensearch.scheduled_jobs.sweeper.page_size` | 100 | The page size for the sweeper. You shouldn't need to change this value. +`opensearch.scheduled_jobs.sweeper.backoff_millis` | 50ms | The amount of time the sweeper waits between retries---increases exponentially after each failed retry. +`opensearch.scheduled_jobs.sweeper.retry_count` | 3 | The total number of times the sweeper should retry before throwing an error. +`opensearch.scheduled_jobs.request_timeout` | 10s | The timeout for the request that sweeps shards for jobs.
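
The settings in the table above are changed through the `_cluster/settings` API and can be marked `persistent` or `transient`. As a minimal sketch of what such a request body looks like, the following Python snippet builds one and prints it in console form. The helper name `build_settings_body` and the value `500` are hypothetical and chosen only for illustration; the defaults should work well for most use cases.

```python
import json

VALID_SCOPES = ("persistent", "transient")


def build_settings_body(setting, value, scope="persistent"):
    """Return a _cluster/settings request body that sets one setting.

    Hypothetical helper for illustration; it only assembles the JSON body
    and does not send anything to a cluster.
    """
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}")
    return {scope: {setting: value}}


# Example value only: lower the monitor limit from its default of 1000.
body = build_settings_body("opensearch.alerting.monitor.max_monitors", 500)

print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```

Passing `scope="transient"` produces the same body under the `transient` key instead.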
diff --git a/docs/async/index.md b/docs/async/index.md new file mode 100644 index 00000000..eced6c5b --- /dev/null +++ b/docs/async/index.md @@ -0,0 +1,251 @@ +--- +layout: default +title: Asynchronous search +nav_order: 51 +has_children: true +--- + +# Asynchronous Search + +Searching large volumes of data can take a long time, especially if you're searching across warm nodes or multiple remote clusters. + +Asynchronous search lets you run search requests that run in the background. You can monitor the progress of these searches and get back partial results as they become available. After the search finishes, you can save the results to examine at a later time. + +## REST API + +To perform an asynchronous search, send requests to `_opensearch/_asynchronous_search`, with your query in the request body: + +```json +POST _opensearch/_asynchronous_search +``` + +You can specify the following options. + +Options | Description | Default value | Required +:--- | :--- |:--- |:--- | +`wait_for_completion_timeout` | Specifies the amount of time that you plan to wait for the results. You can see whatever results you get within this time just like in a normal search. You can poll the remaining results based on an ID. The maximum value is 300 seconds. | 1 second | No +`keep_on_completion` | Specifies whether you want to save the results in the cluster after the search is complete. You can examine the stored results at a later time. | `false` | No +`keep_alive` | Specifies the amount of time that the result is saved in the cluster. For example, `2d` means that the results are stored in the cluster for 48 hours. The saved search results are deleted after this period or if the search is cancelled. Note that this includes the query execution time. If the query overruns this time, the process cancels this query automatically. 
| 12 hours | No + +#### Sample request + +```json +POST _opensearch/_asynchronous_search/?pretty&size=10&wait_for_completion_timeout=1ms&keep_on_completion=true&request_cache=false +{ + "aggs": { + "city": { + "terms": { + "field": "city", + "size": 10 + } + } + } +} +``` + +#### Sample response + +```json +{ + "id": "FklfVlU4eFdIUTh1Q1hyM3ZnT19fUVEUd29KLWZYUUI3TzRpdU5wMjRYOHgAAAAAAAAABg==", + "state": "RUNNING", + "start_time_in_millis": 1599833301297, + "expiration_time_in_millis": 1600265301297, + "response": { + "took": 15, + "timed_out": false, + "terminated_early": false, + "num_reduce_phases": 4, + "_shards": { + "total": 21, + "successful": 4, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 807, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "aggregations": { + "city": { + "doc_count_error_upper_bound": 16, + "sum_other_doc_count": 403, + "buckets": [ + { + "key": "downsville", + "doc_count": 1 + }, + .... + .... + .... + { + "key": "blairstown", + "doc_count": 1 + } + ] + } + } + } +} +``` + +#### Response parameters + +Options | Description +:--- | :--- +`id` | The ID of an asynchronous search. Use this ID to monitor the progress of the search, get its partial results, and/or delete the results. If the asynchronous search finishes within the timeout period, the response doesn't include the ID because the results aren't stored in the cluster. +`state` | Specifies whether the search is still running or if it has finished, and if the results persist in the cluster. The possible states are `RUNNING`, `COMPLETED`, and `PERSISTED`. +`start_time_in_millis` | The start time in milliseconds. +`expiration_time_in_millis` | The expiration time in milliseconds. +`took` | The total time that the search is running. +`response` | The actual search response. +`num_reduce_phases` | The number of times that the coordinating node aggregates results from batches of shard responses (5 by default).
If this number increases compared to the last retrieved results, you can expect additional results to be included in the search response. +`total` | The total number of shards that run the search. +`successful` | The number of shard responses that the coordinating node received successfully. +`aggregations` | The partial aggregation results that have been completed by the shards so far. + +## Get partial results + +After you submit an asynchronous search request, you can request partial responses with the ID that you see in the asynchronous search response. + +```json +GET _opensearch/_asynchronous_search/?pretty +``` + +#### Sample response + +```json +{ + "id": "Fk9lQk5aWHJIUUltR2xGWnpVcWtFdVEURUN1SWZYUUJBVkFVMEJCTUlZUUoAAAAAAAAAAg==", + "state": "STORE_RESIDENT", + "start_time_in_millis": 1599833907465, + "expiration_time_in_millis": 1600265907465, + "response": { + "took": 83, + "timed_out": false, + "_shards": { + "total": 20, + "successful": 20, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1000, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "bank", + "_type": "_doc", + "_id": "1", + "_score": 1, + "_source": { + "email": "amberduke@abc.com", + "city": "Brogan", + "state": "IL" + } + }, + {....} + ] + }, + "aggregations": { + "city": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 997, + "buckets": [ + { + "key": "belvoir", + "doc_count": 2 + }, + { + "key": "aberdeen", + "doc_count": 1 + }, + { + "key": "abiquiu", + "doc_count": 1 + } + ] + } + } + } +} +``` + +After the response is successfully persisted, you get back the `STORE_RESIDENT` state in the response. + +You can poll the ID with the `wait_for_completion_timeout` parameter to wait for the results received for the time that you specify. + +For asynchronous searches with `keep_on_completion` as `true` and a sufficiently long `keep_alive` time, you can keep polling the IDs until the search finishes. 
If you don’t want to periodically poll each ID, you can retain the results in your cluster with the `keep_alive` parameter and come back to it at a later time. + +## Delete searches and results + +You can use the DELETE API operation to delete any ongoing asynchronous search by its ID. If the search is still running, it’s canceled. If the search is complete, the saved search results are deleted. + +```json +DELETE _opensearch/_asynchronous_search/?pretty +``` + +#### Sample response + +```json +{ + "acknowledged": "true" +} +``` + +## Monitor stats + +You can use the stats API operation to monitor asynchronous searches that are running, completed, and/or persisted. + +```json +GET _opensearch/_asynchronous_search/stats +``` + +#### Sample response + +```json +{ + "_nodes": { + "total": 8, + "successful": 8, + "failed": 0 + }, + "cluster_name": "264071961897:asynchronous-search", + "nodes": { + "JKEFl6pdRC-xNkKQauy7Yg": { + "asynchronous_search_stats": { + "submitted": 18236, + "initialized": 112, + "search_failed": 56, + "search_completed": 56, + "rejected": 18124, + "persist_failed": 0, + "cancelled": 1, + "running_current": 399, + "persisted": 100 + } + } + } +} +``` + +#### Response parameters + +Options | Description +:--- | :--- +`submitted` | The number of asynchronous search requests that were submitted. +`initialized` | The number of asynchronous search requests that were initialized. +`rejected` | The number of asynchronous search requests that were rejected. +`search_completed` | The number of asynchronous search requests that completed with a successful response. +`search_failed` | The number of asynchronous search requests that completed with a failed response. +`persisted` | The number of asynchronous search requests whose final result successfully persisted in the cluster. +`persist_failed` | The number of asynchronous search requests whose final result failed to persist in the cluster. 
+`running_current` | The number of asynchronous search requests that are running on a given coordinator node. +`cancelled` | The number of asynchronous search requests that were canceled while the search was running. diff --git a/docs/async/security.md b/docs/async/security.md new file mode 100644 index 00000000..c7eb1bb5 --- /dev/null +++ b/docs/async/security.md @@ -0,0 +1,76 @@ +--- +layout: default +title: Asynchronous search security +nav_order: 2 +parent: Asynchronous search +has_children: false +--- + +# Asynchronous search security + +You can use the security plugin with asynchronous searches to limit non-admin users to specific actions. For example, you might want some users to only be able to submit or delete asynchronous searches, while you might want others to only view the results. + +All asynchronous search indices are protected as system indices. Only a super admin user or an admin user with a Transport Layer Security (TLS) certificate can access system indices. For more information, see [System indices](../../security/configuration/system-indices/). + +## Basic permissions + +As an admin user, you can use the security plugin to assign specific permissions to users based on which API operations they need access to. For a list of supported API operations, see [Asynchronous search](../). + +The security plugin has two built-in roles that cover most asynchronous search use cases: `asynchronous_search_full_access` and `asynchronous_search_read_access`. For descriptions of each, see [Predefined roles](../../security/access-control/users-roles/#predefined-roles). + +If these roles don’t meet your needs, mix and match individual asynchronous search permissions to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/asynchronous_search/delete` permission lets you delete a previously submitted asynchronous search.
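+
+As a rough illustration of mixing and matching permissions, the following request sketches a custom role that can only delete asynchronous searches. The role name `async_delete_only` is hypothetical, and the `roles` endpoint path is an assumption modeled on the `internalusers` and `rolesmapping` paths used later on this page, so check it against your version of the security plugin. Only the permission string comes from the text above.
+
+```json
+// Hypothetical role name; endpoint path assumed, not confirmed by this page
+PUT _opensearch/_security/api/roles/async_delete_only
+{
+  "cluster_permissions": [
+    "cluster:admin/opensearch/asynchronous_search/delete"
+  ]
+}
+```
+
+You would then map users to this role the same way you map them to the built-in roles.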
+ +## (Advanced) Limit access by backend role + +Use backend roles to configure fine-grained access to asynchronous searches based on roles. For example, users of different departments in an organization can view asynchronous searches owned by their own department. + +First, make sure that your users have the appropriate [backend roles](../../security/access-control/). Backend roles usually come from an [LDAP server](../../security/configuration/ldap/) or [SAML provider](../../security/configuration/saml/). However, if you use the internal user database, you can use the REST API to [add them manually](../../security/access-control/api/#create-user). + +Now when users view asynchronous search resources in OpenSearch Dashboards (or make REST API calls), they see only the asynchronous searches submitted by users whose backend roles are a subset of their own. +For example, consider two users: `judy` and `elon`. + +`judy` has an IT backend role: + +```json +PUT _opensearch/_security/api/internalusers/judy +{ + "password": "judy", + "backend_roles": [ + "IT" + ], + "attributes": {} +} +``` + +`elon` has an admin backend role: + +```json +PUT _opensearch/_security/api/internalusers/elon +{ + "password": "elon", + "backend_roles": [ + "admin" + ], + "attributes": {} +} +``` + +Both `judy` and `elon` have full access to asynchronous search: + +```json +PUT _opensearch/_security/api/rolesmapping/async_full_access +{ + "backend_roles": [], + "hosts": [], + "users": [ + "judy", + "elon" + ] +} +``` + +Because they have different backend roles, an asynchronous search submitted by `judy` is not visible to `elon` and vice versa. + +To see `elon`'s asynchronous searches, `judy` needs backend roles that are at least a superset of all the backend roles that `elon` has. + +For example, if `judy` has five backend roles and `elon` has only one of these roles, then `judy` can see asynchronous searches submitted by `elon`, but `elon` can’t see the asynchronous searches submitted by `judy`.
This means that `judy` can perform GET and DELETE operations on asynchronous searches that are submitted by `elon`, but not the reverse. diff --git a/docs/async/settings.md b/docs/async/settings.md new file mode 100644 index 00000000..31bbeb16 --- /dev/null +++ b/docs/async/settings.md @@ -0,0 +1,29 @@ +--- +layout: default +title: Settings +parent: Asynchronous search +nav_order: 4 +--- + +# Settings + +The asynchronous search plugin adds several settings to the standard OpenSearch cluster settings. They are dynamic, so you can change the default behavior of the plugin without restarting your cluster. You can mark the settings as `persistent` or `transient`. + +For example, to raise the maximum value of the `wait_for_completion_timeout` parameter to five minutes: + +```json +PUT _cluster/settings +{ + "transient": { + "opensearch.asynchronous_search.max_wait_for_completion_timeout": "5m" + } +} +``` + +Setting | Default | Description +:--- | :--- | :--- +`opensearch.asynchronous_search.max_search_running_time` | 12 hours | The maximum running time for the search beyond which the search is terminated. +`opensearch.asynchronous_search.node_concurrent_running_searches` | 20 | The maximum number of concurrent searches running per coordinator node. +`opensearch.asynchronous_search.max_keep_alive` | 5 days | The maximum amount of time that search results can be stored in the cluster. +`opensearch.asynchronous_search.max_wait_for_completion_timeout` | 1 minute | The maximum value for the `wait_for_completion_timeout` parameter. +`opensearch.asynchronous_search.persist_search_failures` | false | Whether to persist asynchronous search results that end with a search failure in the system index.
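+
+The same request shape works for `persistent` changes. As a minimal sketch, the following request lowers the maximum retention time for stored results. The value `3d` is only an illustrative choice, not a recommendation.
+
+```json
+// Illustrative value; pick a retention limit that suits your cluster
+PUT _cluster/settings
+{
+  "persistent": {
+    "opensearch.asynchronous_search.max_keep_alive": "3d"
+  }
+}
+```
+
+Persistent settings survive a full cluster restart, whereas transient settings do not.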
diff --git a/docs/cli/index.md b/docs/cli/index.md new file mode 100644 index 00000000..50a0041b --- /dev/null +++ b/docs/cli/index.md @@ -0,0 +1,100 @@ +--- +layout: default +title: OpenSearch CLI +nav_order: 52 +has_children: false +--- + +# OpenSearch CLI + +The OpenSearch command line interface (opensearch-cli) lets you manage your OpenSearch cluster from the command line and automate tasks. + +Currently, opensearch-cli supports the [Anomaly Detection](../ad/) and [k-NN](../knn/) plugins, along with arbitrary REST API paths. Among other things, you can use opensearch-cli to create and delete detectors, start and stop them, and check k-NN statistics. + +Profiles let you easily access different clusters or sign requests with different credentials. opensearch-cli supports unauthenticated requests, HTTP basic signing, and IAM signing for Amazon Web Services. + +This example moves a detector (`ecommerce-count-quantity`) from a staging cluster to a production cluster: + +```bash +opensearch-cli ad get ecommerce-count-quantity --profile staging > ecommerce-count-quantity.json +opensearch-cli ad create ecommerce-count-quantity.json --profile production +opensearch-cli ad start ecommerce-count-quantity.json --profile production +opensearch-cli ad stop ecommerce-count-quantity --profile staging +opensearch-cli ad delete ecommerce-count-quantity --profile staging +``` + + +## Install + +1. [Download](https://opensearch.org/downloads.html){:target='\_blank'} and extract the appropriate installation package for your computer. + +1. Make the `opensearch-cli` file executable: + + ```bash + chmod +x ./opensearch-cli + ``` + +1. Add the command to your path: + + ```bash + export PATH=$PATH:$(pwd) + ``` + +1. Check that the CLI is working properly: + + ```bash + opensearch-cli --version + ``` + + +## Profiles + +Profiles let you easily switch between different clusters and user credentials.
To get started, run `opensearch-cli profile create` with the `--auth-type`, `--endpoint`, and `--name` options: + +```bash +opensearch-cli profile create --auth-type basic --endpoint https://localhost:9200 --name docker-local +``` + +Alternatively, save a configuration file to `~/.opensearch-cli/config.yaml`: + +```yaml +profiles: + - name: docker-local + endpoint: https://localhost:9200 + user: admin + password: foobar + - name: aws + endpoint: https://some-cluster.us-east-1.es.amazonaws.com + aws_iam: + profile: "" + service: es +``` + + +## Usage + +opensearch-cli commands use the following syntax: + +```bash +opensearch-cli +``` + +For example, the following command retrieves information about a detector: + +```bash +opensearch-cli ad get my-detector --profile docker-local +``` + +For a request to the OpenSearch CAT API, try the following command: + +```bash +opensearch-cli curl get --path _cat/plugins --profile aws +``` + +Use the `-h` or `--help` flag to see all supported commands, subcommands, or usage for a specific command: + +```bash +opensearch-cli -h +opensearch-cli ad -h +opensearch-cli ad get -h +``` diff --git a/docs/im/index-rollups/index.md b/docs/im/index-rollups/index.md new file mode 100644 index 00000000..17b48e46 --- /dev/null +++ b/docs/im/index-rollups/index.md @@ -0,0 +1,457 @@ +--- +layout: default +title: Index Rollups +nav_order: 35 +parent: Index management +has_children: true +redirect_from: /docs/ism/index-rollups/ +has_toc: false +--- + +# Index Rollups + +Time series data increases storage costs, strains cluster health, and slows down aggregations over time. Index rollup lets you periodically reduce data granularity by rolling up old data into summarized indices. + +You pick the fields that interest you and use index rollup to create a new index with only those fields aggregated into coarser time buckets. You can store months or years of historical data at a fraction of the cost with the same query performance. 
+ +For example, say you collect CPU consumption data every five seconds and store it on a hot node. Instead of moving older data to a read-only warm node, you can roll up or compress this data with only the average CPU consumption per day or with a 10% decrease in its interval every week. + +You can use index rollup in three ways: + +1. Use the index rollup API for an on-demand index rollup job that operates on an index that's not being actively ingested, such as a rolled-over index. For example, you can perform an index rollup operation to reduce data collected at a five-minute interval to a weekly average for trend analysis. +2. Use the OpenSearch Dashboards UI to create an index rollup job that runs on a defined schedule. You can also set it up to roll up your indices as they’re being actively ingested. For example, you can continuously roll up Logstash indices from a five-second interval to a one-hour interval. +3. Specify the index rollup job as an ISM action for complete index management. This allows you to roll up an index after a certain event, such as a rollover, the index age reaching a certain point, the index becoming read-only, and so on. You can also have rollover and index rollup jobs running in sequence, where the rollover first moves the current index to a warm node and then the index rollup job creates a new index with the minimized data on the hot node. + +## Create an Index Rollup Job + +To get started, choose **Index Management** in OpenSearch Dashboards. +Select **Rollup Jobs** and choose **Create rollup job**. + +### Step 1: Set up indices + +1. In the **Job name and description** section, specify a unique name and an optional description for the index rollup job. +2. In the **Indices** section, select the source and target index. The source index is the one that you want to roll up. The source index remains as is; the index rollup job creates a new index, referred to as the target index. The target index is where the index rollup results are saved.
For the target index, you can either type in a name for a new index or select an existing index. +3. Choose **Next**. + +After you create an index rollup job, you can't change your index selections. + +### Step 2: Define aggregations and metrics + +Select the attributes with the aggregations (terms and histograms) and metrics (avg, sum, max, min, and value count) that you want to roll up. Avoid adding many highly granular attributes, because then you won’t save much space. + +For example, consider a dataset of cities and demographics within those cities. You can aggregate based on cities and specify demographics within a city as metrics. +The order in which you select attributes is critical. A city followed by a demographic is different from a demographic followed by a city. + +1. In the **Time aggregation** section, select a timestamp field. Choose either a **Fixed** or a **Calendar** interval type and specify the interval and timezone. The index rollup job uses this information to create a date histogram for the timestamp field. +2. (Optional) Add additional aggregations for each field. You can choose terms aggregation for all field types and histogram aggregation only for numeric fields. +3. (Optional) Add additional metrics for each field. You can choose between **All**, **Min**, **Max**, **Sum**, **Avg**, or **Value Count**. +4. Choose **Next**. + +### Step 3: Specify schedule + +Specify a schedule to roll up your indices as they’re being ingested. The index rollup job is enabled by default. + +1. Specify if the data is continuous or not. +3. For rollup execution frequency, select **Define by fixed interval** and specify the **Rollup interval** and the time unit, or select **Define by cron expression** and add a cron expression to define the interval. To learn how to define a cron expression, see [Alerting](../alerting/cron/). +4. Specify the number of pages per execution process. A larger number means faster execution but higher memory usage. +5.
(Optional) Add a delay to the rollup executions. This is the amount of time the job waits for data ingestion to accommodate any processing time. For example, if you set this value to 10 minutes, an index rollup scheduled for 2 PM to roll up data from 1 PM to 2 PM starts at 2:10 PM. +6. Choose **Next**. + +### Step 4: Review and create + +Review your configuration and select **Create**. + +### Step 5: Search the target index + +You can use the standard `_search` API to search the target index. Make sure that the query matches the constraints of the target index. For example, if you don’t set up terms aggregations on a field, you don’t receive results for terms aggregations. If you don’t set up the maximum aggregations, you don’t receive results for maximum aggregations. + +You can’t access the internal structure of the data in the target index because the plugin automatically rewrites the query in the background to suit the target index. This is to make sure you can use the same query for the source and target index. + +To query the target index, set `size` to 0: + +```json +GET target_index/_search +{ + "size": 0, + "query": { + "match_all": {} + }, + "aggs": { + "avg_cpu": { + "avg": { + "field": "cpu_usage" + } + } + } +} +``` + +Consider a scenario where you collect rolled up data from 1 PM to 9 PM in hourly intervals and live data from 7 PM to 11 PM in one-minute intervals. If you execute an aggregation over both in the same query, the data from 7 PM to 9 PM is counted twice in the aggregations, once as rolled up data and once as live data. + +## Sample Walkthrough + +This walkthrough uses the OpenSearch Dashboards sample e-commerce data. To add that sample data, log in to OpenSearch Dashboards, choose **Home** and **Try our sample data**. For **Sample eCommerce orders**, choose **Add data**.
+ +Then run a search: + +```json +GET opensearch_dashboards_sample_data_ecommerce/_search +``` + +#### Sample response + +```json +{ + "took": 23, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4675, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "opensearch_dashboards_sample_data_ecommerce", + "_type": "_doc", + "_id": "jlMlwXcBQVLeQPrkC_kQ", + "_score": 1, + "_source": { + "category": [ + "Women's Clothing", + "Women's Accessories" + ], + "currency": "EUR", + "customer_first_name": "Selena", + "customer_full_name": "Selena Mullins", + "customer_gender": "FEMALE", + "customer_id": 42, + "customer_last_name": "Mullins", + "customer_phone": "", + "day_of_week": "Saturday", + "day_of_week_i": 5, + "email": "selena@mullins-family.zzz", + "manufacturer": [ + "Tigress Enterprises" + ], + "order_date": "2021-02-27T03:56:10+00:00", + "order_id": 581553, + "products": [ + { + "base_price": 24.99, + "discount_percentage": 0, + "quantity": 1, + "manufacturer": "Tigress Enterprises", + "tax_amount": 0, + "product_id": 19240, + "category": "Women's Clothing", + "sku": "ZO0064500645", + "taxless_price": 24.99, + "unit_discount_amount": 0, + "min_price": 12.99, + "_id": "sold_product_581553_19240", + "discount_amount": 0, + "created_on": "2016-12-24T03:56:10+00:00", + "product_name": "Blouse - port royal", + "price": 24.99, + "taxful_price": 24.99, + "base_unit_price": 24.99 + }, + { + "base_price": 10.99, + "discount_percentage": 0, + "quantity": 1, + "manufacturer": "Tigress Enterprises", + "tax_amount": 0, + "product_id": 17221, + "category": "Women's Accessories", + "sku": "ZO0085200852", + "taxless_price": 10.99, + "unit_discount_amount": 0, + "min_price": 5.06, + "_id": "sold_product_581553_17221", + "discount_amount": 0, + "created_on": "2016-12-24T03:56:10+00:00", + "product_name": "Snood - rose", + "price": 10.99, + "taxful_price": 10.99, + 
"base_unit_price": 10.99 + } + ], + "sku": [ + "ZO0064500645", + "ZO0085200852" + ], + "taxful_total_price": 35.98, + "taxless_total_price": 35.98, + "total_quantity": 2, + "total_unique_products": 2, + "type": "order", + "user": "selena", + "geoip": { + "country_iso_code": "MA", + "location": { + "lon": -8, + "lat": 31.6 + }, + "region_name": "Marrakech-Tensift-Al Haouz", + "continent_name": "Africa", + "city_name": "Marrakesh" + }, + "event": { + "dataset": "sample_ecommerce" + } + } + } + ] + } +} +... +``` + +Create an index rollup job. +This example picks the `order_date`, `customer_gender`, `geoip.city_name`, `geoip.region_name`, and `day_of_week` fields and rolls them into an `example_rollup` target index: + +```json +PUT _opensearch/_rollup/jobs/example +{ + "rollup": { + "enabled": true, + "schedule": { + "interval": { + "period": 1, + "unit": "Minutes", + "start_time": 1602100553 + } + }, + "last_updated_time": 1602100553, + "description": "An example policy that rolls up the sample ecommerce data", + "source_index": "opensearch_dashboards_sample_data_ecommerce", + "target_index": "example_rollup", + "page_size": 1000, + "delay": 0, + "continuous": false, + "dimensions": [ + { + "date_histogram": { + "source_field": "order_date", + "fixed_interval": "60m", + "timezone": "America/Los_Angeles" + } + }, + { + "terms": { + "source_field": "customer_gender" + } + }, + { + "terms": { + "source_field": "geoip.city_name" + } + }, + { + "terms": { + "source_field": "geoip.region_name" + } + }, + { + "terms": { + "source_field": "day_of_week" + } + } + ], + "metrics": [ + { + "source_field": "taxless_total_price", + "metrics": [ + { + "avg": {} + }, + { + "sum": {} + }, + { + "max": {} + }, + { + "min": {} + }, + { + "value_count": {} + } + ] + }, + { + "source_field": "total_quantity", + "metrics": [ + { + "avg": {} + }, + { + "max": {} + } + ] + } + ] + } +} +``` + +You can query the `example_rollup` index for the terms aggregations on the fields set up in the 
rollup job. +You get back the same response that you would on the original `opensearch_dashboards_sample_data_ecommerce` source index. + +```json +POST example_rollup/_search +{ + "size": 0, + "query": { + "bool": { + "must": {"term": { "geoip.region_name": "California" } } + } + }, + "aggregations": { + "daily_numbers": { + "terms": { + "field": "day_of_week" + }, + "aggs": { + "per_city": { + "terms": { + "field": "geoip.city_name" + }, + "aggregations": { + "average quantity": { + "avg": { + "field": "total_quantity" + } + } + } + }, + "total_revenue": { + "sum": { + "field": "taxless_total_price" + } + } + } + } + } +} +``` + +#### Sample Response + +```json +{ + "took": 476, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 281, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "aggregations": { + "daily_numbers": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0, + "buckets": [ + { + "key": "Friday", + "doc_count": 53, + "total_revenue": { + "value": 4858.84375 + }, + "per_city": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0, + "buckets": [ + { + "key": "Los Angeles", + "doc_count": 53, + "average quantity": { + "value": 2.305084745762712 + } + } + ] + } + }, + { + "key": "Saturday", + "doc_count": 43, + "total_revenue": { + "value": 3547.203125 + }, + "per_city": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0, + "buckets": [ + { + "key": "Los Angeles", + "doc_count": 43, + "average quantity": { + "value": 2.260869565217391 + } + } + ] + } + }, + { + "key": "Tuesday", + "doc_count": 42, + "total_revenue": { + "value": 3983.28125 + }, + "per_city": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0, + "buckets": [ + { + "key": "Los Angeles", + "doc_count": 42, + "average quantity": { + "value": 2.2888888888888888 + } + } + ] + } + }, + { + "key": "Sunday", + "doc_count": 40, + 
"total_revenue": { + "value": 3308.1640625 + }, + "per_city": { + "doc_count_error_upper_bound": 0, + "sum_other_doc_count": 0, + "buckets": [ + { + "key": "Los Angeles", + "doc_count": 40, + "average quantity": { + "value": 2.090909090909091 + } + } + ] + } + } + ... + ] + } + } +} +``` diff --git a/docs/im/index-rollups/rollup-api.md b/docs/im/index-rollups/rollup-api.md new file mode 100644 index 00000000..78a6a423 --- /dev/null +++ b/docs/im/index-rollups/rollup-api.md @@ -0,0 +1,235 @@ +--- +layout: default +title: Index Rollups API +parent: Index Rollups +grand_parent: Index management +redirect_from: /docs/ism/rollup-api/ +nav_order: 9 +--- + +# Index Rollups API + +Use the index rollup operations to programmatically work with index rollup jobs. + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + +## Create or update an index rollup job + +Creates or updates an index rollup job. +You must provide the `seq_no` and `primary_term` parameters. + +#### Request + +```json +PUT _opensearch/_rollup/jobs/ // Create +PUT _opensearch/_rollup/jobs/?if_seq_no=1&if_primary_term=1 // Update +{ + "rollup": { + "source_index": "nyc-taxi-data", + "target_index": "rollup-nyc-taxi-data", + "schedule": { + "interval": { + "period": 1, + "unit": "Days" + } + }, + "description": "Example rollup job", + "enabled": true, + "page_size": 200, + "delay": 0, + "roles": [ + "rollup_all", + "nyc_taxi_all", + "example_rollup_index_all" + ], + "continuous": false, + "dimensions": { + "date_histogram": { + "source_field": "tpep_pickup_datetime", + "fixed_interval": "1h", + "timezone": "America/Los_Angeles" + }, + "terms": { + "source_field": "PULocationID" + }, + "metrics": [ + { + "source_field": "passenger_count", + "metrics": [ + { + "avg": {} + }, + { + "sum": {} + }, + { + "max": {} + }, + { + "min": {} + }, + { + "value_count": {} + } + ] + } + ] + } + } +} +``` + +You can specify the following options. 
+ +Options | Description | Type | Required +:--- | :--- |:--- |:--- | +`source_index` | The name of the source index whose data you want to roll up. | `string` | Yes +`target_index` | Specify the target index that the rolled up data is ingested into. You can either create a new target index or use an existing index. The target index cannot be a combination of raw and rolled up data. | `string` | Yes +`schedule` | The schedule of the index rollup job, which can be either an interval or a cron expression. | `object` | Yes +`schedule.interval` | Specify the frequency of execution of the rollup job. | `object` | No +`schedule.interval.start_time` | Start time of the interval. | `timestamp` | Yes +`schedule.interval.period` | Define the interval period. | `string` | Yes +`schedule.interval.unit` | Specify the time unit of the interval. | `string` | Yes +`schedule.interval.cron` | Optionally, specify a cron expression to define the rollup frequency. | `list` | No +`schedule.interval.cron.expression` | Specify a Unix cron expression. | `string` | Yes +`schedule.interval.cron.timezone` | Specify timezones as defined by the IANA Time Zone Database. Defaults to UTC. | `string` | No +`description` | Optionally, describe the rollup job. | `string` | No +`enabled` | When true, the index rollup job is scheduled. Default is true. | `boolean` | Yes +`continuous` | Specify whether the index rollup job rolls up data continuously or executes over the current data set once and then stops. Default is false. | `boolean` | Yes +`error_notification` | Set up a Mustache message template sent for error notifications. For example, if an index rollup job fails, the system sends a message to a Slack channel. | `object` | No +`page_size` | Specify the number of buckets to paginate through at a time while rolling up. | `number` | Yes +`delay` | Specify a time value by which to delay execution of the index rollup job. | `time_unit` | No +`dimensions` | Specify aggregations to create dimensions for the roll up time window.
| `object` | Yes +`dimensions.date_histogram` | Specify either fixed_interval or calendar_interval, but not both. Either one limits what you can query in the target index. | `object` | No +`dimensions.date_histogram.fixed_interval` | Specify the fixed interval for aggregations in milliseconds, seconds, minutes, hours, or days. | `string` | No +`dimensions.date_histogram.calendar_interval` | Specify the calendar interval for aggregations in minutes, hours, days, weeks, months, quarters, or years. | `string` | No +`dimensions.date_histogram.field` | Specify the date field used in date histogram aggregation. | `string` | No +`dimensions.date_histogram.timezone` | Specify the timezones as defined by the IANA Time Zone Database. The default is UTC. | `string` | No +`dimensions.terms` | Specify the term aggregations that you want to roll up. | `object` | No +`dimensions.terms.fields` | Specify terms aggregation for compatible fields. | `object` | No +`dimensions.histogram` | Specify the histogram aggregations that you want to roll up. | `object` | No +`dimensions.histogram.field` | Add a field for histogram aggregations. | `string` | Yes +`dimensions.histogram.interval` | Specify the histogram aggregation interval for the field. | `long` | Yes +`dimensions.metrics` | Specify a list of objects that represent the fields and metrics that you want to calculate. | `nested object` | No +`dimensions.metrics.field` | Specify the field that you want to perform metric aggregations on. | `string` | No +`dimensions.metrics.field.metrics` | Specify the metric aggregations you want to calculate for the field. | `multiple strings` | No + + +#### Sample response + +```json +{ + "_id": "rollup_id", + "_seqNo": 1, + "_primaryTerm": 1, + "rollup": { ... } +} +``` + + +## Get an index rollup job + +Returns all information about an index rollup job based on the `rollup_id`. 
+ +#### Request + +```json +GET _opensearch/_rollup/jobs/ +``` + + +#### Sample response + +```json +{ + "_id": "my_rollup", + "_seqNo": 1, + "_primaryTerm": 1, + "rollup": { ... } +} +``` + + +--- + +## Delete an index rollup job + +Deletes an index rollup job based on the `rollup_id`. + +#### Request + +```json +DELETE _opensearch/_rollup/jobs/ +``` + +#### Sample response + +```json +200 OK +``` + +--- + + +## Start or stop an index rollup job + +Start or stop an index rollup job. + +#### Request + +```json +POST _opensearch/_rollup/jobs//_start +POST _opensearch/_rollup/jobs//_stop +``` + + +#### Sample response + +```json +200 OK +``` + + +--- + +## Explain an index rollup job + +Returns detailed metadata information about the index rollup job and its current progress. + +#### Request + +```json +GET _opensearch/_rollup/jobs//_explain +``` + + +#### Sample response + +```json +{ + "example_rollup": { + "rollup_id": "example_rollup", + "last_updated_time": 1602014281, + "continuous": { + "next_window_start_time": 1602055591, + "next_window_end_time": 1602075591 + }, + "status": "running", + "failure_reason": null, + "stats": { + "pages_processed": 342, + "documents_processed": 489359, + "rollups_indexed": 3420, + "index_time_in_ms": 30495, + "search_time_in_ms": 584922 + } + } +} +``` diff --git a/docs/im/index.md b/docs/im/index.md new file mode 100644 index 00000000..2a091bd1 --- /dev/null +++ b/docs/im/index.md @@ -0,0 +1,12 @@ +--- +layout: default +title: Index management +nav_order: 30 +has_children: true +--- + +# Index Management +OpenSearch Dashboards +{: .label .label-yellow :} + +The Index Management (IM) plugin lets you automate recurring index management activities and reduce storage costs. 
diff --git a/docs/im/ism/api.md b/docs/im/ism/api.md new file mode 100644 index 00000000..f6f17415 --- /dev/null +++ b/docs/im/ism/api.md @@ -0,0 +1,494 @@ +--- +layout: default +title: ISM API +parent: Index State Management +grand_parent: Index management +redirect_from: /docs/ism/api/ +nav_order: 5 +--- + +# ISM API + +Use the index state management operations to programmatically work with policies and managed indices. + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + + +## Create policy + +Creates a policy. + +#### Request + +```json +PUT _opensearch/_ism/policies/policy_1 +{ + "policy": { + "description": "ingesting logs", + "default_state": "ingest", + "states": [ + { + "name": "ingest", + "actions": [ + { + "rollover": { + "min_doc_count": 5 + } + } + ], + "transitions": [ + { + "state_name": "search" + } + ] + }, + { + "name": "search", + "actions": [], + "transitions": [ + { + "state_name": "delete", + "conditions": { + "min_index_age": "5m" + } + } + ] + }, + { + "name": "delete", + "actions": [ + { + "delete": {} + } + ], + "transitions": [] + } + ] + } +} +``` + + +#### Sample response + +```json +{ + "_id": "policy_1", + "_version": 1, + "_primary_term": 1, + "_seq_no": 7, + "policy": { + "policy": { + "policy_id": "policy_1", + "description": "ingesting logs", + "last_updated_time": 1577990761311, + "schema_version": 1, + "error_notification": null, + "default_state": "ingest", + "states": [ + { + "name": "ingest", + "actions": [ + { + "rollover": { + "min_doc_count": 5 + } + } + ], + "transitions": [ + { + "state_name": "search" + } + ] + }, + { + "name": "search", + "actions": [], + "transitions": [ + { + "state_name": "delete", + "conditions": { + "min_index_age": "5m" + } + } + ] + }, + { + "name": "delete", + "actions": [ + { + "delete": {} + } + ], + "transitions": [] + } + ] + } + } +} +``` + + +--- + +## Add policy + +Adds a policy to an index. This operation does not change the policy if the index already has one. 
+ +#### Request + +```json +POST _opensearch/_ism/add/index_1 +{ + "policy_id": "policy_1" +} +``` + +#### Sample response + +```json +{ + "updated_indices": 1, + "failures": false, + "failed_indices": [] +} +``` + +--- + + +## Update policy + +Updates a policy. Use the `seq_no` and `primary_term` parameters to update an existing policy. If these numbers don't match the existing policy or the policy doesn't exist, ISM throws an error. + +#### Request + +```json +PUT _opensearch/_ism/policies/policy_1?if_seq_no=7&if_primary_term=1 +{ + "policy": { + "description": "ingesting logs", + "default_state": "ingest", + "states": [ + { + "name": "ingest", + "actions": [ + { + "rollover": { + "min_doc_count": 5 + } + } + ], + "transitions": [ + { + "state_name": "search" + } + ] + }, + { + "name": "search", + "actions": [], + "transitions": [ + { + "state_name": "delete", + "conditions": { + "min_index_age": "5m" + } + } + ] + }, + { + "name": "delete", + "actions": [ + { + "delete": {} + } + ], + "transitions": [] + } + ] + } +} +``` + + +#### Sample response + +```json +{ + "_id": "policy_1", + "_version": 2, + "_primary_term": 1, + "_seq_no": 10, + "policy": { + "policy": { + "policy_id": "policy_1", + "description": "ingesting logs", + "last_updated_time": 1577990934044, + "schema_version": 1, + "error_notification": null, + "default_state": "ingest", + "states": [ + { + "name": "ingest", + "actions": [ + { + "rollover": { + "min_doc_count": 5 + } + } + ], + "transitions": [ + { + "state_name": "search" + } + ] + }, + { + "name": "search", + "actions": [], + "transitions": [ + { + "state_name": "delete", + "conditions": { + "min_index_age": "5m" + } + } + ] + }, + { + "name": "delete", + "actions": [ + { + "delete": {} + } + ], + "transitions": [] + } + ] + } + } +} +``` + + +--- + +## Get policy + +Gets the policy by `policy_id`. 
+ +#### Request + +```json +GET _opensearch/_ism/policies/policy_1 +``` + + +#### Sample response + +```json +{ + "_id": "policy_1", + "_version": 2, + "_seq_no": 10, + "_primary_term": 1, + "policy": { + "policy_id": "policy_1", + "description": "ingesting logs", + "last_updated_time": 1577990934044, + "schema_version": 1, + "error_notification": null, + "default_state": "ingest", + "states": [ + { + "name": "ingest", + "actions": [ + { + "rollover": { + "min_doc_count": 5 + } + } + ], + "transitions": [ + { + "state_name": "search" + } + ] + }, + { + "name": "search", + "actions": [], + "transitions": [ + { + "state_name": "delete", + "conditions": { + "min_index_age": "5m" + } + } + ] + }, + { + "name": "delete", + "actions": [ + { + "delete": {} + } + ], + "transitions": [] + } + ] + } +} +``` + +--- + +## Remove policy from index + +Removes any ISM policy from the index. + +#### Request + +```json +POST _opensearch/_ism/remove/index_1 +``` + + +#### Sample response + +```json +{ + "updated_indices": 1, + "failures": false, + "failed_indices": [] +} +``` + +--- + +## Update managed index policy + +Updates the managed index policy to a new policy (or to a new version of the policy). You can use an index pattern to update multiple indices at once. When updating multiple indices, you might want to include a state filter to only affect certain managed indices. The change policy filters out all the existing managed indices and only applies the change to the ones in the state that you specify. You can also explicitly specify the state that the managed index transitions to after the change policy takes effect. + +A policy change is an asynchronous background process. The changes are queued and are not executed immediately by the background process. This delay in execution protects the currently running managed indices from being put into a broken state. 
If the policy you are changing to has only some small configuration changes, then the change takes place immediately. For example, if the policy changes the `min_index_age` parameter in a rollover condition from `1000d` to `100d`, this change takes place immediately in its next execution. If the change modifies the state, actions, or the order of actions of the current state the index is in, then the change happens at the end of its current state before transitioning to a new state. + +In this example, the policy applied on the `index_1` index is changed to `policy_1`, which could either be a completely new policy or an updated version of its existing policy. The process only applies the change if the index is currently in the `searches` state. After this change in policy takes place, `index_1` transitions to the `delete` state. + +#### Request + +```json +POST _opensearch/_ism/change_policy/index_1 +{ + "policy_id": "policy_1", + "state": "delete", + "include": [ + { + "state": "searches" + } + ] +} +``` + + +#### Sample response + +```json +{ + "updated_indices": 0, + "failures": false, + "failed_indices": [] +} +``` + +--- + +## Retry failed index + +Retries the failed action for an index. For the retry call to succeed, ISM must manage the index, and the index must be in a failed state. You can use index patterns (`*`) to retry multiple failed indices. + +#### Request + +```json +POST _opensearch/_ism/retry/index_1 +{ + "state": "delete" +} +``` + + +#### Sample response + +```json +{ + "updated_indices": 0, + "failures": false, + "failed_indices": [] +} +``` + +--- + +## Explain index + +Gets the current state of the index. You can use index patterns to get the status of multiple indices. 
+ +#### Request + +```json +GET _opensearch/_ism/explain/index_1 +``` + + +#### Sample response + +```json +{ + "index_1": { + "index.opensearch.index_state_management.policy_id": "policy_1" + } +} +``` + +The `opensearch.index_state_management.policy_id` setting is deprecated starting from version 1.13.0. +We retain this field in the response API for consistency. + +--- + +## Delete policy + +Deletes the policy by `policy_id`. + +#### Request + +```json +DELETE _opensearch/_ism/policies/policy_1 +``` + + +#### Sample response + +```json +{ + "_index": ".opensearch-ism-config", + "_type": "_doc", + "_id": "policy_1", + "_version": 3, + "result": "deleted", + "forced_refresh": true, + "_shards": { + "total": 2, + "successful": 2, + "failed": 0 + }, + "_seq_no": 15, + "_primary_term": 1 +} +``` diff --git a/docs/im/ism/index.md b/docs/im/ism/index.md new file mode 100644 index 00000000..81b9d7f3 --- /dev/null +++ b/docs/im/ism/index.md @@ -0,0 +1,103 @@ +--- +layout: default +title: Index State Management +nav_order: 3 +parent: Index management +has_children: true +redirect_from: /docs/ism/ +has_toc: false +--- + +# Index State Management +OpenSearch Dashboards +{: .label .label-yellow :} + +If you analyze time-series data, you likely prioritize new data over old data. You might periodically perform certain operations on older indices, such as reducing replica count or deleting them. + +Index State Management (ISM) is a plugin that lets you automate these periodic, administrative operations by triggering them based on changes in the index age, index size, or number of documents. Using the ISM plugin, you can define *policies* that automatically handle index rollovers or deletions to fit your use case. + +For example, you can define a policy that moves your index into a `read_only` state after 30 days and then deletes it after a set period of 90 days. You can also set up the policy to send you a notification message when the index is deleted. 
+ +You might want to perform an index rollover after a certain amount of time or run a `force_merge` operation on an index during off-peak hours to improve search performance during peak hours. + +To use the ISM plugin, your user role needs to be mapped to the `all_access` role that gives you full access to the cluster. To learn more, see [Users and roles](../security/access-control/users-roles/). +{: .note } + +## Get started with ISM + +To get started, choose **Index Management** in OpenSearch Dashboards. + +### Step 1: Set up policies + +A policy is a set of rules that describes how an index should be managed. For information about creating a policy, see [Policies](policies/). + +1. Choose the **Index Policies** tab. +2. Choose **Create policy**. +3. In the **Name policy** section, enter a policy ID. +4. In the **Define policy** section, enter your policy. +5. Choose **Create**. + +After you create a policy, your next step is to attach this policy to an index or indices. +You can set up an `ism_template` in the policy so when you create an index that matches the ISM template pattern, the index will have this policy attached to it: + +```json +PUT _opensearch/_ism/policies/policy_id +{ + "policy": { + "description": "Example policy.", + "default_state": "...", + "states": [...], + "ism_template": { + "index_patterns": ["index_name-*"], + "priority": 100 + } + } +} +``` + +For an example ISM template policy, see [Sample policy with ISM template](policies/#sample-policy-with-ism-template). 
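As a rough illustration of how an `ism_template` pattern might be matched against a new index name, the following Python sketch uses shell-style wildcard matching. It is not the plugin's implementation: the `pick_policy` helper, the sample policy IDs, and the rule that the highest `priority` wins when several templates match are all assumptions made for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory view of two policies; only ism_template matters here.
policies = {
    "policy_logs": {"ism_template": {"index_patterns": ["index_name-*"], "priority": 100}},
    "policy_all": {"ism_template": {"index_patterns": ["index_*"], "priority": 1}},
}


def pick_policy(index_name, policies):
    """Return the ID of the matching policy with the highest priority, or None.

    Assumption: among several matching templates, the highest priority wins.
    """
    best_id, best_priority = None, -1
    for policy_id, policy in policies.items():
        template = policy.get("ism_template")
        if not template:
            continue  # policies without a template never auto-attach
        matches = any(fnmatchcase(index_name, p) for p in template["index_patterns"])
        priority = template.get("priority", 0)  # a missing priority is treated as 0
        if matches and priority > best_priority:
            best_id, best_priority = policy_id, priority
    return best_id


print(pick_policy("index_name-000001", policies))
print(pick_policy("metrics-000001", policies))
```

Both patterns match `index_name-000001`, so the sketch returns the policy with the larger `priority`. No pattern matches `metrics-000001`, so no policy is returned.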
+ +Older versions of the plugin include the `policy_id` in an index template, so when an index is created that matches the index template pattern, the index will have the policy attached to it: + +```json +PUT _index_template/ +{ + "index_patterns": [ + "index_name-*" + ], + "template": { + "settings": { + "opensearch.index_state_management.policy_id": "policy_id" + } + } +} +``` + +The `opensearch.index_state_management.policy_id` setting is deprecated starting from version 1.13.0. You can continue to automatically manage newly created indices with the ISM template field. +{: .note } + +### Step 2: Attach policies to indices + +1. Choose **Indices**. +2. Choose the index or indices that you want to attach your policy to. +3. Choose **Apply policy**. +4. From the **Policy ID** menu, choose the policy that you created. +You can see a preview of your policy. +5. If your policy includes a rollover operation, specify a rollover alias. +Make sure that the alias that you enter already exists. For more information about the rollover operation, see [rollover](policies/#rollover). +6. Choose **Apply**. + +After you attach a policy to an index, ISM creates a job that runs every 5 minutes by default to perform policy actions, check conditions, and transition the index into different states. To change the default time interval for this job, see [Settings](settings/). + +If you want to use an OpenSearch operation to create an index with a policy already attached to it, see [create index](api/#create-index). + +### Step 3: Manage indices + +1. Choose **Managed Indices**. +2. To change your policy, see [Change Policy](managedindices/#change-policy). +3. To attach a rollover alias to your index, select your policy and choose **Add rollover alias**. +Make sure that the alias that you enter already exists. For more information about the rollover operation, see [rollover](policies/#rollover). +4. To remove a policy, choose your policy, and then choose **Remove policy**. +5. 
To retry a policy, choose your policy, and then choose **Retry policy**. + +For information about managing your policies, see [Managed Indices](managedindices/). diff --git a/docs/im/ism/managedindices.md b/docs/im/ism/managedindices.md new file mode 100644 index 00000000..d83c1a86 --- /dev/null +++ b/docs/im/ism/managedindices.md @@ -0,0 +1,75 @@ +--- +layout: default +title: Managed Indices +nav_order: 3 +parent: Index State Management +grand_parent: Index management +redirect_from: /docs/ism/managedindices/ +has_children: false +--- + +# Managed indices + +You can change or update a policy using the managed index operations. + +This table lists the fields of managed index operations. + +Parameter | Description | Type | Required | Read Only +:--- | :--- |:--- |:--- | +`name` | The name of the managed index policy. | `string` | Yes | No +`index` | The name of the managed index that this policy is managing. | `string` | Yes | No +`index_uuid` | The uuid of the index. | `string` | Yes | No +`enabled` | When `true`, the managed index is scheduled and run by the scheduler. | `boolean` | Yes | No +`enabled_time` | The time the managed index was last enabled. If the managed index process is disabled, then this is null. | `timestamp` | Yes | Yes +`last_updated_time` | The time the managed index was last updated. | `timestamp` | Yes | Yes +`schedule` | The schedule of the managed index job. | `object` | Yes | No +`policy_id` | The name of the policy used by this managed index. | `string` | Yes | No +`policy_seq_no` | The sequence number of the policy used by this managed index. | `number` | Yes | No +`policy_primary_term` | The primary term of the policy used by this managed index. | `number` | Yes | No +`policy_version` | The version of the policy used by this managed index. | `number` | Yes | Yes +`policy` | The cached JSON of the policy for the `policy_version` that's used during runs. 
If the policy is null, it means that this is the first execution of the job and the latest policy document is read in/saved. | `object` | No | No +`change_policy` | The information regarding what policy and state to change to. | `object` | No | No +`policy_name` | The name of the policy to update to. To update to the latest version, set this to be the same as the current `policy_name`. | `string` | No | Yes +`state` | The state of the managed index after it finishes updating. If no state is specified, it's assumed that the policy structure did not change. | `string` | No | Yes + +The following example shows a managed index policy: + +```json +{ + "managed_index": { + "name": "my_index", + "index": "my_index", + "index_uuid": "sOKSOfkdsoSKeofjIS", + "enabled": true, + "enabled_time": 1553112384, + "last_updated_time": 1553112384, + "schedule": { + "interval": { + "period": 1, + "unit": "MINUTES", + "start_time": 1553112384 + } + }, + "policy_id": "log_rotation", + "policy_version": 1, + "policy": {...}, + "change_policy": null + } +} +``` + +## Change policy + +You can change any managed index policy, but ISM has a few constraints in place to make sure that policy changes don't break indices. + +If an index is stuck in its current state, never proceeding, and you want to update its policy immediately, make sure that the new policy includes the same state---same name, same actions, same order---as the old policy. In this case, even if the policy is in the middle of executing an action, ISM applies the new policy. + +If you update the policy without including an identical state, ISM updates the policy only after all actions in the current state finish executing. Alternately, you can choose a specific state in your old policy after which you want the new policy to take effect. + +To change a policy using OpenSearch Dashboards, do the following: + +- Under **Managed indices**, choose the indices that you want to attach the new policy to. 
+- To attach the new policy to indices in specific states, choose **Choose state filters**, and then choose those states. +- Under **Choose New Policy**, choose the new policy. +- To start the new policy for indices in the current state, choose **Keep indices in their current state after the policy takes effect**. +- To start the new policy in a specific state, choose **Start from a chosen state after changing policies**, and then choose the default start state in your new policy. diff --git a/docs/im/ism/policies.md b/docs/im/ism/policies.md new file mode 100644 index 00000000..50c0e95c --- /dev/null +++ b/docs/im/ism/policies.md @@ -0,0 +1,666 @@ +--- +layout: default +title: Policies +nav_order: 1 +parent: Index State Management +grand_parent: Index management +redirect_from: /docs/ism/policies/ +has_children: false +--- + +# Policies + +Policies are JSON documents that define the following: + +- The *states* that an index can be in, including the default state for new indices. For example, you might name your states "hot," "warm," "delete," and so on. For more information, see [States](#states). +- Any *actions* that you want the plugin to take when an index enters a state, such as performing a rollover. For more information, see [Actions](#actions). +- The conditions that must be met for an index to move into a new state, known as *transitions*. For example, if an index is more than eight weeks old, you might want to move it to the "delete" state. For more information, see [Transitions](#transitions). + +In other words, a policy defines the *states* that an index can be in, the *actions* to perform when in a state, and the conditions that must be met to *transition* between states. + +You have complete flexibility in the way you can design your policies. You can create any state, transition to any other state, and specify any number of actions in each state. + +This table lists the relevant fields of a policy. 
+ +Field | Description | Type | Required | Read Only +:--- | :--- |:--- |:--- | +`policy_id` | The name of the policy. | `string` | Yes | Yes +`description` | A human-readable description of the policy. | `string` | Yes | No +`ism_template` | Specify an ISM template pattern that matches the index to apply the policy. | `nested list of objects` | No | No +`last_updated_time` | The time the policy was last updated. | `timestamp` | Yes | Yes +`error_notification` | The destination and message template for error notifications. The destination could be Amazon Chime, Slack, or a webhook URL. | `object` | No | No +`default_state` | The default starting state for each index that uses this policy. | `string` | Yes | No +`states` | The states that you define in the policy. | `nested list of objects` | Yes | No + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## States + +A state is the description of the status that the managed index is currently in. A managed index can be in only one state at a time. Each state has associated actions that are executed sequentially on entering a state and transitions that are checked after all the actions have been completed. + +This table lists the parameters that you can define for a state. + +Field | Description | Type | Required +:--- | :--- |:--- |:--- | +`name` | The name of the state. | `string` | Yes +`actions` | The actions to execute after entering a state. For more information, see [Actions](#actions). | `nested list of objects` | Yes +`transitions` | The next states and the conditions required to transition to those states. If no transitions exist, the policy assumes that it's complete and can now stop managing the index. For more information, see [Transitions](#transitions). | `nested list of objects` | Yes + +--- + +## Actions + +Actions are the steps that the policy sequentially executes on entering a specific state. + +They are executed in the order in which they are defined. 
+ +This table lists the parameters that you can define for an action. + +Parameter | Description | Type | Required | Default +:--- | :--- | :--- | :--- | :--- +`timeout` | The timeout period for the action. Accepts time units for minutes, hours, and days. | `time unit` | No | - +`retry` | The retry configuration for the action. | `object` | No | Specific to action + +The `retry` operation has the following parameters: + +Parameter | Description | Type | Required | Default +:--- | :--- | :--- | :--- | :--- +`count` | The number of times to retry the action. | `number` | Yes | - +`backoff` | The backoff policy type to use when retrying. | `string` | No | Exponential +`delay` | The time to wait between retries. Accepts time units for minutes, hours, and days. | `time unit` | No | 1 minute + +The following example action has a timeout period of one hour. The policy retries this action three times with an exponential backoff policy, with a delay of 10 minutes between each retry: + +```json +"actions": { + "timeout": "1h", + "retry": { + "count": 3, + "backoff": "exponential", + "delay": "10m" + } +} +``` + +For a list of available unit types, see [Supported units](../../../opensearch/units/). + +## ISM supported operations + +ISM supports the following operations: + +- [force_merge](#force_merge) +- [read_only](#read_only) +- [read_write](#read_write) +- [replica_count](#replica_count) +- [close](#close) +- [open](#open) +- [delete](#delete) +- [rollover](#rollover) +- [notification](#notification) +- [snapshot](#snapshot) +- [index_priority](#index_priority) +- [allocation](#allocation) + +### force_merge + +Reduces the number of Lucene segments by merging the segments of individual shards. This operation attempts to set the index to a `read-only` state before starting the merging process. + +Parameter | Description | Type | Required +:--- | :--- | :--- | :--- +`max_num_segments` | The number of segments to reduce the shard to.
| `number` | Yes + +```json +{ + "force_merge": { + "max_num_segments": 1 + } +} +``` + +### read_only + +Sets a managed index to be read only. + +```json +{ + "read_only": {} +} +``` + +### read_write + +Sets a managed index to be writable. + +```json +{ + "read_write": {} +} +``` + +### replica_count + +Sets the number of replicas to assign to an index. + +Parameter | Description | Type | Required +:--- | :--- | :--- | :--- +`number_of_replicas` | Defines the number of replicas to assign to an index. | `number` | Yes + +```json +{ + "replica_count": { + "number_of_replicas": 2 + } +} +``` + +For information about setting replicas, see [Primary and replica shards](../../../opensearch/#primary-and-replica-shards). + +### close + +Closes the managed index. + +```json +{ + "close": {} +} +``` + +Closed indices remain on disk, but consume no CPU or memory. You can't read from, write to, or search closed indices. + +Closing an index is a good option if you need to retain data for longer than you need to actively search it and have sufficient disk space on your data nodes. If you need to search the data again, reopening a closed index is simpler than restoring an index from a snapshot. + +### open + +Opens a managed index. + +```json +{ + "open": {} +} +``` + +### delete + +Deletes a managed index. + +```json +{ + "delete": {} +} +``` + +### rollover + +Rolls an alias over to a new index when the managed index meets one of the rollover conditions. + +The index name must match the pattern `^.*-\d+$`. For example, `logs-000001`. +Set `index.opensearch.index_state_management.rollover_alias` to the alias that you want to roll over. + +Parameter | Description | Type | Example | Required +:--- | :--- | :--- | :--- | :--- +`min_size` | The minimum size of the total primary shard storage (not counting replicas) required to roll over the index.
For example, if you set `min_size` to 100 GiB and your index has 5 primary shards and 5 replica shards of 20 GiB each, the total size of the primaries is 100 GiB, so the rollover occurs. ISM doesn't check indices continually, so it doesn't roll over indices at exactly 100 GiB. Instead, if an index is continuously growing, ISM might check it at 99 GiB, not perform the rollover, check again when the primary shards total 105 GiB, and then perform the operation. | `string` | `20gb` or `5mb` | No +`min_doc_count` | The minimum number of documents required to roll over the index. | `number` | `2000000` | No +`min_index_age` | The minimum age required to roll over the index. Index age is the time between its creation and the present. | `string` | `5d` or `7h` | No + +```json +{ + "rollover": { + "min_size": "50gb" + } +} +``` + +```json +{ + "rollover": { + "min_doc_count": 100000000 + } +} +``` + +```json +{ + "rollover": { + "min_index_age": "30d" + } +} +``` + +### notification + +Sends you a notification. + +Parameter | Description | Type | Required +:--- | :--- | :--- | :--- +`destination` | The destination URL. | `Slack, Amazon Chime, or webhook URL` | Yes +`message_template` | The text of the message. You can add variables to your messages using [Mustache templates](https://mustache.github.io/mustache.5.html). | `object` | Yes + +The destination system **must** return a response; otherwise, the notification operation throws an error.
+ +#### Example 1: Chime notification + +```json +{ + "notification": { + "destination": { + "chime": { + "url": "" + } + }, + "message_template": { + "source": "the index is {% raw %}{{ctx.index}}{% endraw %}" + } + } +} +``` + +#### Example 2: Custom webhook notification + +```json +{ + "notification": { + "destination": { + "custom_webhook": { + "url": "https://" + } + }, + "message_template": { + "source": "the index is {% raw %}{{ctx.index}}{% endraw %}" + } + } +} +``` + +#### Example 3: Slack notification + +```json +{ + "notification": { + "destination": { + "slack": { + "url": "https://hooks.slack.com/services/xxx/xxxxxx" + } + }, + "message_template": { + "source": "the index is {% raw %}{{ctx.index}}{% endraw %}" + } + } +} +``` + +You can use `ctx` variables in your message to represent a number of policy parameters based on the past executions of your policy. For example, if your policy has a rollover action, you can use `{% raw %}{{ctx.action.name}}{% endraw %}` in your message to represent the name of the rollover action. + +The following `ctx` variable options are available for every policy: + +#### Guaranteed variables + +Parameter | Description | Type +:--- | :--- | :--- +`index` | The name of the index. | `string` +`index_uuid` | The UUID of the index. | `string` +`policy_id` | The name of the policy. | `string` + +### snapshot + +Backs up your cluster’s indices and state. For more information about snapshots, see [Take and restore snapshots](../../../opensearch/snapshot-restore/). + +The `snapshot` operation has the following parameters: + +Parameter | Description | Type | Required | Default +:--- | :--- | :--- | :--- | :--- +`repository` | The repository name that you register through the native snapshot API operations. | `string` | Yes | - +`snapshot` | The name of the snapshot.
| `string` | Yes | - + +```json +{ + "snapshot": { + "repository": "my_backup", + "snapshot": "my_snapshot" + } +} +``` + +### index_priority + +Set the priority for the index in a specific state. Unallocated shards of indices are recovered in the order of their priority, whenever possible. The indices with higher priority values are recovered first followed by the indices with lower priority values. + +The `index_priority` operation has the following parameter: + +Parameter | Description | Type | Required | Default +:--- | :--- |:--- |:--- |:--- +`priority` | The priority for the index as soon as it enters a state. | `number` | Yes | 1 + +```json +"actions": [ + { + "index_priority": { + "priority": 50 + } + } +] +``` + +### allocation + +Allocate the index to a node with a specific attribute. +For example, setting `require` to `warm` moves your data only to "warm" nodes. + +The `allocation` operation has the following parameters: + +Parameter | Description | Type | Required +:--- | :--- |:--- |:--- +`require` | Allocate the index to a node with a specified attribute. | `string` | Yes +`include` | Allocate the index to a node with any of the specified attributes. | `string` | Yes +`exclude` | Don’t allocate the index to a node with any of the specified attributes. | `string` | Yes +`wait_for` | Wait for the policy to execute before allocating the index to a node with a specified attribute. | `string` | Yes + +```json +"actions": [ + { + "allocation": { + "require": { "box_type": "warm" } + } + } +] +``` + +--- + +## Transitions + +Transitions define the conditions that need to be met for a state to change. After all actions in the current state are completed, the policy starts checking the conditions for transitions. + +Transitions are evaluated in the order in which they are defined. For example, if the conditions for the first transition are met, then this transition takes place and the rest of the transitions are dismissed. 
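The ordering rule above can be sketched in Python. This is an illustrative model only, not ISM's code: the `next_state` and `condition_met` functions and the simplified index statistics (`age_days`, `doc_count`) are hypothetical names introduced here. The sketch treats a transition without conditions as always true, and it assumes that every condition listed in a transition must hold.

```python
def condition_met(conditions, stats):
    """Return True when the (simplified) conditions hold for an index."""
    if not conditions:
        # A transition without conditions is treated as always true.
        return True
    checks = []
    if "min_index_age" in conditions:
        # Only the "<n>d" (days) form is handled in this sketch.
        min_days = int(conditions["min_index_age"].rstrip("d"))
        checks.append(stats["age_days"] >= min_days)
    if "min_doc_count" in conditions:
        checks.append(stats["doc_count"] >= conditions["min_doc_count"])
    # Assumption: every listed condition must hold.
    return all(checks)


def next_state(transitions, stats):
    """Return the target of the first transition whose conditions are met."""
    for transition in transitions:
        if condition_met(transition.get("conditions"), stats):
            return transition["state_name"]
    return None  # no transition applies yet; stay in the current state


transitions = [
    {"state_name": "delete", "conditions": {"min_index_age": "30d"}},
    {"state_name": "cold", "conditions": {"min_doc_count": 1000}},
    {"state_name": "warm"},
]

print(next_state(transitions, {"age_days": 45, "doc_count": 5000}))
print(next_state(transitions, {"age_days": 2, "doc_count": 5000}))
print(next_state(transitions, {"age_days": 2, "doc_count": 10}))
```

In the first call, both the `delete` and `cold` conditions hold, but only `delete` is returned because it is defined first. Reordering the list changes the outcome, which is why the order of transitions in a policy matters.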
+ +If you don't specify any conditions in a transition and leave it empty, then it's assumed to be the equivalent of always true. This means that the policy transitions the index to this state the moment it checks. + +This table lists the parameters you can define for transitions. + +Parameter | Description | Type | Required +:--- | :--- | :--- | :--- +`state_name` | The name of the state to transition to if the conditions are met. | `string` | Yes +`conditions` | The conditions for the transition. If omitted, the transition is always true. | `object` | No + +The `conditions` object has the following parameters: + +Parameter | Description | Type | Required +:--- | :--- | :--- | :--- +`min_index_age` | The minimum age of the index required to transition. | `string` | No +`min_doc_count` | The minimum document count of the index required to transition. | `number` | No +`min_size` | The minimum size of the index required to transition. | `string` | No +`cron` | The `cron` job that triggers the transition if no other transition happens first. | `object` | No +`cron.cron.expression` | The `cron` expression that triggers the transition. | `string` | Yes +`cron.cron.timezone` | The time zone in which the `cron` expression is evaluated. | `string` | Yes + +The following example transitions the index to a `cold` state after a period of 30 days: + +```json +"transitions": [ + { + "state_name": "cold", + "conditions": { + "min_index_age": "30d" + } + } +] +``` + +ISM checks the conditions on every execution of the policy based on the set interval. + +This example uses the `cron` condition to transition indices every Saturday at 5:00 PM PT: + +```json +"transitions": [ + { + "state_name": "cold", + "conditions": { + "cron": { + "cron": { + "expression": "* 17 * * SAT", + "timezone": "America/Los_Angeles" + } + } + } + } +] +``` + +Note that this condition does not execute at exactly 5:00 PM; the job still executes based on the `job_interval` setting.
Due to this variance in start time and the amount of time that it can take for actions to complete prior to checking transition conditions, we recommend against overly narrow cron expressions. For example, don't use `15 17 * * SAT` (5:15 PM on Saturday). + +A window of an hour, which this example uses, is generally sufficient, but you might increase it to 2--3 hours to avoid missing the window and having to wait a week for the transition to occur. Alternatively, you could use a broader expression such as `* * * * SAT,SUN` to have the transition occur at any time during the weekend. + +For information on writing cron expressions, see [Cron expression reference](../../../alerting/cron/). + +--- + +## Error notifications + +The `error_notification` operation sends you a notification if your managed index fails. +It notifies a single destination with a custom message. + +Set up error notifications at the policy level: + +```json +{ + "policy": { + "description": "hot warm delete workflow", + "default_state": "hot", + "schema_version": 1, + "error_notification": { }, + "states": [ ] + } +} +``` + +Parameter | Description | Type | Required +:--- | :--- | :--- | :--- +`destination` | The destination URL. | `Slack, Amazon Chime, or webhook URL` | Yes +`message_template` | The text of the message. You can add variables to your messages using [Mustache templates](https://mustache.github.io/mustache.5.html). | `object` | Yes + +The destination system **must** return a response; otherwise, the `error_notification` operation throws an error. + +#### Example 1: Chime notification + +```json +{ + "error_notification": { + "destination": { + "chime": { + "url": "" + } + }, + "message_template": { + "source": "The index {% raw %}{{ctx.index}}{% endraw %} failed during policy execution."
+ } + } +} +``` + +#### Example 2: Custom webhook notification + +```json +{ + "error_notification": { + "destination": { + "custom_webhook": { + "url": "https://" + } + }, + "message_template": { + "source": "The index {% raw %}{{ctx.index}}{% endraw %} failed during policy execution." + } + } +} +``` + +#### Example 3: Slack notification + +```json +{ + "error_notification": { + "destination": { + "slack": { + "url": "https://hooks.slack.com/services/xxx/xxxxxx" + } + }, + "message_template": { + "source": "The index {% raw %}{{ctx.index}}{% endraw %} failed during policy execution." + } + } +} +``` + +You can use the same options for `ctx` variables as the [notification](#notification) operation. + +## Sample policy with ISM template + +The following sample template policy is for a rollover use case: + +1. Create a policy with an `ism_template` field. + +```json +PUT _opensearch/_ism/policies/rollover_policy +{ + "policy": { + "description": "Example rollover policy.", + "default_state": "rollover", + "states": [ + { + "name": "rollover", + "actions": [ + { + "rollover": { + "min_doc_count": 1 + } + } + ], + "transitions": [] + } + ], + "ism_template": { + "index_patterns": ["log*"], + "priority": 100 + } + } +} +``` + +You need to specify the `index_patterns` field. If you don't specify a value for `priority`, it defaults to 0. + +1. Set up a template with the `rollover_alias` as `log` : + +```json +PUT _template/ism_rollover +{ + "index_patterns": ["log*"], + "settings": { + "opensearch.index_state_management.rollover_alias": "log" + } +} +``` + +1. Create an index with the `log` alias: + +```json +PUT log-000001 +{ + "aliases": { + "log": { + "is_write_index": true + } + } +} +``` + +1. Index a document to trigger the rollover condition: + +```json +POST log/_doc +{ + "message": "dummy" +} +``` + +## Example policy + +The following example policy implements a `hot`, `warm`, and `delete` workflow. 
You can use this policy as a template to prioritize resources to your indices based on their levels of activity. + +In this case, an index is initially in a `hot` state. After a day, it changes to a `warm` state, where the number of replicas increases to 5 to improve the read performance. + +After 30 days, the policy moves this index into a `delete` state. The service sends a notification to a Chime room that the index is being deleted, and then permanently deletes it. + +```json +{ + "policy": { + "description": "hot warm delete workflow", + "default_state": "hot", + "schema_version": 1, + "states": [ + { + "name": "hot", + "actions": [ + { + "rollover": { + "min_index_age": "1d" + } + } + ], + "transitions": [ + { + "state_name": "warm" + } + ] + }, + { + "name": "warm", + "actions": [ + { + "replica_count": { + "number_of_replicas": 5 + } + } + ], + "transitions": [ + { + "state_name": "delete", + "conditions": { + "min_index_age": "30d" + } + } + ] + }, + { + "name": "delete", + "actions": [ + { + "notification": { + "destination": { + "chime": { + "url": "" + } + }, + "message_template": { + "source": "The index {% raw %}{{ctx.index}}{% endraw %} is being deleted" + } + } + }, + { + "delete": {} + } + ] + } + ] + } +} +``` + +This diagram shows the `states`, `transitions`, and `actions` of the above policy as a finite-state machine. For more information about finite-state machines, see [Wikipedia](https://en.wikipedia.org/wiki/Finite-state_machine). + +![Policy State Machine](../../images/ism.png) diff --git a/docs/im/ism/settings.md b/docs/im/ism/settings.md new file mode 100644 index 00000000..6d2ea017 --- /dev/null +++ b/docs/im/ism/settings.md @@ -0,0 +1,50 @@ +--- +layout: default +title: Settings +parent: Index State Management +grand_parent: Index management +redirect_from: /docs/ism/settings/ +nav_order: 4 +--- + +# ISM Settings + +We don't recommend changing these settings; the defaults should work well for most use cases. 
+ +Index State Management (ISM) stores its configuration in the `.opensearch-ism-config` index. Don't modify this index without using the [ISM API operations](../api/). + +All settings are available using the OpenSearch `_cluster/settings` operation. None require a restart, and all can be marked `persistent` or `transient`. + +Setting | Default | Description +:--- | :--- | :--- +`opensearch.index_state_management.enabled` | True | Specifies whether ISM is enabled or not. +`opensearch.index_state_management.job_interval` | 5 minutes | The interval at which the managed index jobs are run. +`opensearch.index_state_management.coordinator.sweep_period` | 10 minutes | How often the routine background sweep is run. +`opensearch.index_state_management.coordinator.backoff_millis` | 50 milliseconds | The backoff time between retries for failures in the `ManagedIndexCoordinator` (such as when we update managed indices). +`opensearch.index_state_management.coordinator.backoff_count` | 2 | The count of retries for failures in the `ManagedIndexCoordinator`. +`opensearch.index_state_management.history.enabled` | True | Specifies whether audit history is enabled or not. The logs from ISM are automatically indexed to a logs document. +`opensearch.index_state_management.history.max_docs` | 2,500,000 | The maximum number of documents before rolling over the audit history index. +`opensearch.index_state_management.history.max_age` | 24 hours | The maximum age before rolling over the audit history index. +`opensearch.index_state_management.history.rollover_check_period` | 8 hours | The time between rollover checks for the audit history index. +`opensearch.index_state_management.history.rollover_retention_period` | 30 days | How long audit history indices are kept. +`opensearch.index_state_management.allow_list` | All actions | List of actions that you can use. 
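As a rough sketch of how one of these settings could be changed, the following snippet builds a `_cluster/settings` request body that turns off audit history. The helper name `build_settings_body`, the choice of setting, and the commented-out `curl` call (local address, demo `admin:admin` credentials) are illustrative assumptions, not something the ISM documentation prescribes.

```bash
#!/bin/sh
# Sketch: build a persistent cluster-settings body for a single ISM setting.
# build_settings_body is a hypothetical helper; $2 is inserted as raw JSON.
build_settings_body() {
  printf '{"persistent":{"%s":%s}}' "$1" "$2"
}

body=$(build_settings_body "opensearch.index_state_management.history.enabled" "false")
echo "$body"

# Assumed local demo cluster; uncomment to send the request:
# curl -XPUT https://localhost:9200/_cluster/settings \
#   -H 'Content-Type: application/json' -d "$body" -u 'admin:admin' --insecure
```

Swapping `persistent` for `transient` in the helper would produce a body for a change that does not survive a full cluster restart.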
+ + +## Audit history indices + +If you don't want to disable ISM audit history or shorten the retention period, you can create an [index template](../../../opensearch/index-templates/) to reduce the shard count of the history indices: + +```json +PUT _index_template/ism_history_indices +{ + "index_patterns": [ + ".opensearch-ism-managed-index-history-*" + ], + "template": { + "settings": { + "number_of_shards": 1, + "number_of_replicas": 0 + } + } +} +``` diff --git a/docs/im/refresh-analyzer/index.md b/docs/im/refresh-analyzer/index.md new file mode 100644 index 00000000..4c28bf4b --- /dev/null +++ b/docs/im/refresh-analyzer/index.md @@ -0,0 +1,40 @@ +--- +layout: default +title: Refresh Search Analyzer +nav_order: 40 +parent: Index management +has_children: false +redirect_from: /docs/ism/refresh-analyzer/ +has_toc: false +--- + +# Refresh search analyzer + +With ISM installed, you can refresh search analyzers in real time with the following API: + +```json +POST /_opensearch/_refresh_search_analyzers/ +``` +For example, if you change the synonym list in your analyzer, the change takes effect without you needing to close and reopen the index. 
+ +To work, the token filter must have an `updateable` flag of `true`: + +```json +{ + "analyzer": { + "my_synonyms": { + "tokenizer": "whitespace", + "filter": [ + "synonym" + ] + } + }, + "filter": { + "synonym": { + "type": "synonym_graph", + "synonyms_path": "synonyms.txt", + "updateable": true + } + } +} +``` diff --git a/docs/images/ad.png b/docs/images/ad.png new file mode 100644 index 00000000..8bf13752 Binary files /dev/null and b/docs/images/ad.png differ diff --git a/docs/images/alerting.png b/docs/images/alerting.png new file mode 100644 index 00000000..4a2076d0 Binary files /dev/null and b/docs/images/alerting.png differ diff --git a/docs/images/cli.gif b/docs/images/cli.gif new file mode 100644 index 00000000..e55c47ba Binary files /dev/null and b/docs/images/cli.gif differ diff --git a/docs/images/cluster.png b/docs/images/cluster.png new file mode 100644 index 00000000..f5262a06 Binary files /dev/null and b/docs/images/cluster.png differ diff --git a/docs/images/expression.png b/docs/images/expression.png new file mode 100644 index 00000000..ab853f1d Binary files /dev/null and b/docs/images/expression.png differ diff --git a/docs/images/expressionAtom.png b/docs/images/expressionAtom.png new file mode 100644 index 00000000..9572c10e Binary files /dev/null and b/docs/images/expressionAtom.png differ diff --git a/docs/images/gantt-chart.png b/docs/images/gantt-chart.png new file mode 100644 index 00000000..105bb772 Binary files /dev/null and b/docs/images/gantt-chart.png differ diff --git a/docs/images/hot-rod.png b/docs/images/hot-rod.png new file mode 100644 index 00000000..ef962b8f Binary files /dev/null and b/docs/images/hot-rod.png differ diff --git a/docs/images/ism.png b/docs/images/ism.png new file mode 100644 index 00000000..680013cf Binary files /dev/null and b/docs/images/ism.png differ diff --git a/docs/images/joinPart.png b/docs/images/joinPart.png new file mode 100644 index 00000000..635f4838 Binary files /dev/null and 
b/docs/images/joinPart.png differ diff --git a/docs/images/markdown-notebook.png b/docs/images/markdown-notebook.png new file mode 100644 index 00000000..bfe06b72 Binary files /dev/null and b/docs/images/markdown-notebook.png differ diff --git a/docs/images/perftop-grid.png b/docs/images/perftop-grid.png new file mode 100644 index 00000000..90aae999 Binary files /dev/null and b/docs/images/perftop-grid.png differ diff --git a/docs/images/perftop.png b/docs/images/perftop.png new file mode 100644 index 00000000..0cdb3e40 Binary files /dev/null and b/docs/images/perftop.png differ diff --git a/docs/images/ppl.png b/docs/images/ppl.png new file mode 100644 index 00000000..c8cc6f91 Binary files /dev/null and b/docs/images/ppl.png differ diff --git a/docs/images/predicate.png b/docs/images/predicate.png new file mode 100644 index 00000000..ebc83fdc Binary files /dev/null and b/docs/images/predicate.png differ diff --git a/docs/images/reporting-error.png b/docs/images/reporting-error.png new file mode 100644 index 00000000..8bb03b2f Binary files /dev/null and b/docs/images/reporting-error.png differ diff --git a/docs/images/saml-keycloak-sign-documents.png b/docs/images/saml-keycloak-sign-documents.png new file mode 100644 index 00000000..34ce392f Binary files /dev/null and b/docs/images/saml-keycloak-sign-documents.png differ diff --git a/docs/images/security-ccs.png b/docs/images/security-ccs.png new file mode 100644 index 00000000..55f32322 Binary files /dev/null and b/docs/images/security-ccs.png differ diff --git a/docs/images/security-dls.png b/docs/images/security-dls.png new file mode 100644 index 00000000..a0533e86 Binary files /dev/null and b/docs/images/security-dls.png differ diff --git a/docs/images/security.png b/docs/images/security.png new file mode 100644 index 00000000..cb206581 Binary files /dev/null and b/docs/images/security.png differ diff --git a/docs/images/selectElement.png b/docs/images/selectElement.png new file mode 100644 index 
00000000..89ee1812 Binary files /dev/null and b/docs/images/selectElement.png differ diff --git a/docs/images/selectElements.png b/docs/images/selectElements.png new file mode 100644 index 00000000..22d6e43d Binary files /dev/null and b/docs/images/selectElements.png differ diff --git a/docs/images/showFilter.png b/docs/images/showFilter.png new file mode 100644 index 00000000..47dbc074 Binary files /dev/null and b/docs/images/showFilter.png differ diff --git a/docs/images/showStatement.png b/docs/images/showStatement.png new file mode 100644 index 00000000..a1939e7d Binary files /dev/null and b/docs/images/showStatement.png differ diff --git a/docs/images/singleDeleteStatement.png b/docs/images/singleDeleteStatement.png new file mode 100644 index 00000000..9b1a88c4 Binary files /dev/null and b/docs/images/singleDeleteStatement.png differ diff --git a/docs/images/sql.png b/docs/images/sql.png new file mode 100644 index 00000000..39a7546a Binary files /dev/null and b/docs/images/sql.png differ diff --git a/docs/images/ta-dashboard.png b/docs/images/ta-dashboard.png new file mode 100644 index 00000000..70065261 Binary files /dev/null and b/docs/images/ta-dashboard.png differ diff --git a/docs/images/ta-diagram.drawio b/docs/images/ta-diagram.drawio new file mode 100644 index 00000000..4c6c2193 --- /dev/null +++ b/docs/images/ta-diagram.drawio @@ -0,0 +1 @@ 
+5Vhdb5swFP01eawEmBD62CZpp31ok1qpe5scfAPeDJca0yT79bskpoSQRYm0bCx9ifC5Juaec66vYcDG6fJe8zz5hALUwHPEcsAmA88LGaPfClhtgOHwegPEWooN5DbAg/wJFnQsWkoBRWuiQVRG5m0wwiyDyLQwrjUu2tPmqNqr5jyGDvAQcdVFn6QwiU1r6DT4O5BxUq/sOjaS8nqyBYqEC1xsQWw6YGONaDZX6XIMquKu5mVz391voq8PpiEzx9xw5Rjff35/823+dTLxnp30qRBXweZfXrgqbcL2Yc2qZgAEEWKHqE2CMWZcTRv0VmOZCaiWcWjUzPmImBPoEvgdjFlZdXlpkKDEpMpG55gZG3QDGndTs9kWWOoIDuRTW4TrGMyBeTbNKretBSxx94ApGL2iCRoUN/KlbQZuPRW/zmtopwvL/AkquB0VbvJcyYiWxqwjSEN3xd0ikQYecr6mZUE1uEOtVGqMCvX6XiY4hPOI8MJo/AFbkSAKYTY/RYwX0AaWB+mzUd+WgN0D3NCOF01FuXWZJFvVFDhnIjy8MNt7R9qe9cr2XkeFzzlkj6Cgfg7yp6JdnTz6/xeBF/atCq4vrArYkVXg96oKWEeFCTeckC8a8hz+sPWHEAp/n/VDb8aC4DzWH3p9s747ujDv+0d63w16ZX6/2wImd9PqPK/KwlyE+UdB78zfPW8+6opDek1bbz07nFOipk1sm8AMM9hh20JcyTijYUT0kZjstqKNzrXqxgZSKcS6lPYp2S6vM0jj7kozOlIa/2zS7CmInTPR25LId3on0bAj0VTxgjgrgOsoqSTCqEwp5+Lt6BTsvuP9e5263zY+yBnP+LqGimSGXIsD/cU5ub+ASx1mtK+/XAcjxv9Sf/HO2F9o2HywWse2vvqx6S8= \ No newline at end of file diff --git a/docs/images/ta-services.png b/docs/images/ta-services.png new file mode 100644 index 00000000..2d7e6389 Binary files /dev/null and b/docs/images/ta-services.png differ diff --git a/docs/images/ta-trace.png b/docs/images/ta-trace.png new file mode 100644 index 00000000..77471dc8 Binary files /dev/null and b/docs/images/ta-trace.png differ diff --git a/docs/images/ta.svg b/docs/images/ta.svg new file mode 100644 index 00000000..855297bd --- /dev/null +++ b/docs/images/ta.svg @@ -0,0 +1,3 @@ + + +
[Text labels recovered from the ta.svg diagram: Application, OpenTelemetry Collector, Data Prepper, OpenSearch cluster, Trace data, OpenTelemetry data, OpenSearch documents, OpenSearch Dashboards dashboard]
 diff --git a/docs/images/tableName.png b/docs/images/tableName.png new file mode 100644 index 00000000..c3b1c5d0 Binary files /dev/null and b/docs/images/tableName.png differ diff --git a/docs/images/tableSource.png b/docs/images/tableSource.png new file mode 100644 index 00000000..f109f44d Binary files /dev/null and b/docs/images/tableSource.png differ diff --git a/docs/images/workbench.gif b/docs/images/workbench.gif new file mode 100644 index 00000000..cbfa312a Binary files /dev/null and b/docs/images/workbench.gif differ diff --git a/docs/install/docker-security.md b/docs/install/docker-security.md new file mode 100644 index 00000000..0312f5c7 --- /dev/null +++ b/docs/install/docker-security.md @@ -0,0 +1,143 @@ +--- +layout: default +title: Docker security configuration +parent: Install and configure +nav_order: 5 +--- + +# Docker security configuration + +Before deploying to a production environment, you should replace the demo security certificates and configuration YAML files with your own. With the tarball, you have direct access to the file system, but the Docker image requires modifying the Docker storage volumes to include the replacement files. + +Additionally, you can set the Docker environment variable `DISABLE_INSTALL_DEMO_CONFIG` to `true`. This change completely disables the demo installer. 
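For a single container, the environment variable can be passed with `-e`. The snippet below is a minimal sketch that only prints the command it would run; the image tag `latest` and the shell variable names are placeholders chosen for illustration. With the demo installer disabled, the container will most likely also need your own certificates and configuration mounted, as in the sample Docker Compose file that follows.

```bash
#!/bin/sh
# Sketch: assemble a docker run command that disables the demo security installer.
# IMAGE is a placeholder; substitute the tag you actually deploy.
IMAGE="amazon/opensearch:latest"
ENV_FLAGS="-e discovery.type=single-node -e DISABLE_INSTALL_DEMO_CONFIG=true"

# Dry run: print the command instead of executing it.
CMD="docker run -p 9200:9200 -p 9600:9600 $ENV_FLAGS $IMAGE"
echo "$CMD"
```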
+ +#### Sample Docker Compose file + +```yml +version: '3' +services: + opensearch-node1: + image: amazon/opensearch:{{site.opensearch_version}} + container_name: opensearch-node1 + environment: + - cluster.name=opensearch-cluster + - node.name=opensearch-node1 + - discovery.seed_hosts=opensearch-node1,opensearch-node2 + - cluster.initial_master_nodes=opensearch-node1,opensearch-node2 + - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping + - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM + - network.host=0.0.0.0 # required if not using the demo security configuration + ulimits: + memlock: + soft: -1 + hard: -1 + nofile: + soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems + hard: 65536 + volumes: + - opensearch-data1:/usr/share/opensearch/data + - ./root-ca.pem:/usr/share/opensearch/config/root-ca.pem + - ./node.pem:/usr/share/opensearch/config/node.pem + - ./node-key.pem:/usr/share/opensearch/config/node-key.pem + - ./admin.pem:/usr/share/opensearch/config/admin.pem + - ./admin-key.pem:/usr/share/opensearch/config/admin-key.pem + - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml + - ./internal_users.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/internal_users.yml + - ./roles_mapping.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles_mapping.yml + - ./tenants.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/tenants.yml + - ./roles.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles.yml + - ./action_groups.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/action_groups.yml + ports: + - 9200:9200 + - 9600:9600 # required for Performance Analyzer + networks: + - opensearch-net + opensearch-node2: + image: amazon/opensearch:{{site.opensearch_version}} + container_name: 
opensearch-node2 + environment: + - cluster.name=opensearch-cluster + - node.name=opensearch-node2 + - discovery.seed_hosts=opensearch-node1,opensearch-node2 + - cluster.initial_master_nodes=opensearch-node1,opensearch-node2 + - bootstrap.memory_lock=true + - "ES_JAVA_OPTS=-Xms512m -Xmx512m" + - network.host=0.0.0.0 + ulimits: + memlock: + soft: -1 + hard: -1 + nofile: + soft: 65536 + hard: 65536 + volumes: + - opensearch-data2:/usr/share/opensearch/data + - ./root-ca.pem:/usr/share/opensearch/config/root-ca.pem + - ./node.pem:/usr/share/opensearch/config/node.pem + - ./node-key.pem:/usr/share/opensearch/config/node-key.pem + - ./admin.pem:/usr/share/opensearch/config/admin.pem + - ./admin-key.pem:/usr/share/opensearch/config/admin-key.pem + - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml + - ./internal_users.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/internal_users.yml + - ./roles_mapping.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles_mapping.yml + - ./tenants.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/tenants.yml + - ./roles.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/roles.yml + - ./action_groups.yml:/usr/share/opensearch/plugins/opensearch_security/securityconfig/action_groups.yml + networks: + - opensearch-net + opensearch-dashboards: + image: amazon/opensearch-dashboards:{{site.opensearch_version}} + container_name: opensearch-dashboards + ports: + - 5601:5601 + expose: + - "5601" + environment: + OPENSEARCH_URL: https://opensearch-node1:9200 + OPENSEARCH_HOSTS: https://opensearch-node1:9200 + volumes: + - ./custom-opensearch_dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml + networks: + - opensearch-net + +volumes: + opensearch-data1: + opensearch-data2: + +networks: + opensearch-net: +``` + +Then make your changes to `opensearch.yml`. 
For a full list of settings, see [Security](../../security/configuration/). This example adds (extremely) verbose audit logging: + +```yml +opensearch_security.ssl.transport.pemcert_filepath: node.pem +opensearch_security.ssl.transport.pemkey_filepath: node-key.pem +opensearch_security.ssl.transport.pemtrustedcas_filepath: root-ca.pem +opensearch_security.ssl.transport.enforce_hostname_verification: false +opensearch_security.ssl.http.enabled: true +opensearch_security.ssl.http.pemcert_filepath: node.pem +opensearch_security.ssl.http.pemkey_filepath: node-key.pem +opensearch_security.ssl.http.pemtrustedcas_filepath: root-ca.pem +opensearch_security.allow_default_init_securityindex: true +opensearch_security.authcz.admin_dn: + - CN=A,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA +opensearch_security.nodes_dn: + - 'CN=N,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA' +opensearch_security.audit.type: internal_opensearch +opensearch_security.enable_snapshot_restore_privilege: true +opensearch_security.check_snapshot_restore_write_privileges: true +opensearch_security.restapi.roles_enabled: ["all_access", "security_rest_api_access"] +cluster.routing.allocation.disk.threshold_enabled: false +opensearch_security.audit.config.disabled_rest_categories: NONE +opensearch_security.audit.config.disabled_transport_categories: NONE +``` + +Use this same override process to specify new [authentication settings](../../security/configuration/configuration/) in `/usr/share/opensearch/plugins/opensearch_security/securityconfig/config.yml`, as well as new default [internal users, roles, mappings, action groups, and tenants](../../security/configuration/yaml/). + +To start the cluster, run `docker-compose up`. + +If you encounter any `File /usr/share/opensearch/config/opensearch.yml has insecure file permissions (should be 0600)` messages, you can use `chmod` to set file permissions before running `docker-compose up`. Docker Compose passes files to the container as-is. 
+{: .note } + +Finally, you can reach OpenSearch Dashboards at http://localhost:5601, sign in, and use the **Security** panel to perform other management tasks. diff --git a/docs/install/docker.md b/docs/install/docker.md new file mode 100644 index 00000000..fadf5e1a --- /dev/null +++ b/docs/install/docker.md @@ -0,0 +1,344 @@ +--- +layout: default +title: Docker +parent: Install and configure +nav_order: 1 +--- + +# Docker image + +You can pull the OpenSearch Docker image just like any other image: + +```bash +docker pull amazon/opensearch:{{site.opensearch_version}} +docker pull amazon/opensearch-dashboards:{{site.opensearch_version}} +``` + +To check available versions, see [Docker Hub](https://hub.docker.com/r/amazon/opensearch/tags). + +OpenSearch images use `centos:7` as the base image. If you run Docker locally, we recommend allowing Docker to use at least 4 GB of RAM in **Preferences** > **Resources**. + + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Run the image + +To run the image for local development: + +```bash +docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" amazon/opensearch:{{site.opensearch_version}} +``` + +Then send requests to the server to verify that OpenSearch is up and running: + +```bash +curl -XGET https://localhost:9200 -u 'admin:admin' --insecure +curl -XGET https://localhost:9200/_cat/nodes?v -u 'admin:admin' --insecure +curl -XGET https://localhost:9200/_cat/plugins?v -u 'admin:admin' --insecure +``` + +To find the container ID: + +```bash +docker ps +``` + +Then you can stop the container using: + +```bash +docker stop +``` + + +## Start a cluster + +To deploy multiple nodes and simulate a more realistic deployment, create a [docker-compose.yml](https://docs.docker.com/compose/compose-file/) file appropriate for your environment and run: + +```bash +docker-compose up +``` + +To stop the cluster, run: + +```bash +docker-compose down +``` + +To stop the cluster and delete all data volumes, run: + 
+```bash +docker-compose down -v +``` + + +#### Sample Docker Compose file + +This sample file starts two data nodes and a container for OpenSearch Dashboards. + +```yml +version: '3' +services: + opensearch-node1: + image: opensearchstaging/opensearch:latest + container_name: opensearch-node1 + environment: + - cluster.name=opensearch-cluster + - node.name=opensearch-node1 + - discovery.seed_hosts=opensearch-node1,opensearch-node2 + - cluster.initial_master_nodes=opensearch-node1,opensearch-node2 + - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping + - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM + ulimits: + memlock: + soft: -1 + hard: -1 + nofile: + soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems + hard: 65536 + volumes: + - opensearch-data1:/usr/share/opensearch/data + ports: + - 9200:9200 + - 9600:9600 # required for Performance Analyzer + networks: + - opensearch-net + opensearch-node2: + image: opensearchstaging/opensearch:latest + container_name: opensearch-node2 + environment: + - cluster.name=opensearch-cluster + - node.name=opensearch-node2 + - discovery.seed_hosts=opensearch-node1,opensearch-node2 + - cluster.initial_master_nodes=opensearch-node1,opensearch-node2 + - bootstrap.memory_lock=true + - "ES_JAVA_OPTS=-Xms512m -Xmx512m" + ulimits: + memlock: + soft: -1 + hard: -1 + nofile: + soft: 65536 + hard: 65536 + volumes: + - opensearch-data2:/usr/share/opensearch/data + networks: + - opensearch-net + opensearch-dashboards: + image: opensearchstaging/opensearch-dashboards:latest + container_name: opensearch-dashboards + ports: + - 5601:5601 + expose: + - "5601" + environment: + OPENSEARCH_HOSTS: https://opensearch-node1:9200 + networks: + - opensearch-net + +volumes: + opensearch-data1: + opensearch-data2: + +networks: + opensearch-net: + +``` + +If you override `opensearch_dashboards.yml` 
settings using environment variables, as seen above, use all uppercase letters and underscores in place of periods (e.g. for `opensearch.url`, specify `OPENSEARCH_URL`). +{: .note} + + +## Configure OpenSearch + +You can pass a custom `opensearch.yml` file to the Docker container using the [`-v` flag](https://docs.docker.com/engine/reference/commandline/run/#mount-volume--v---read-only) for `docker run`: + +```bash +docker run \ +-p 9200:9200 -p 9600:9600 \ +-e "discovery.type=single-node" \ +-v //custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml \ +amazon/opensearch:{{site.opensearch_version}} +``` + +You can perform the same operation in `docker-compose.yml` using a relative path: + +```yml +services: + opensearch-node1: + volumes: + - opensearch-data1:/usr/share/opensearch/data + - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml + opensearch-node2: + volumes: + - opensearch-data2:/usr/share/opensearch/data + - ./custom-opensearch.yml:/usr/share/opensearch/config/opensearch.yml + opensearch-dashboards: + volumes: + - ./custom-opensearch_dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml +``` + +You can use this same method to [pass your own certificates](../docker-security/) to the containers for use with the [Security](../../security/configuration/) plugin. + + +### (Optional) Set up Performance Analyzer + +1. Enable the Performance Analyzer plugin: + + ```bash + curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' + ``` + + If you receive the `curl: (52) Empty reply from server` error, you are likely protecting your cluster with the security plugin and you need to provide credentials. 
Modify the following command to use your username and password: + + ```bash + curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k + ``` + +1. Enable the Root Cause Analyzer (RCA) framework + + ```bash + curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' + ``` + + Similar to step 1, if you run into `curl: (52) Empty reply from server`, run the command below to enable RCA + + ```bash + curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k + ``` + +1. By default, Performance Analyzer's endpoints are not accessible from outside the Docker container. + + To edit this behavior, open a shell session in the container and modify the configuration: + + ```bash + docker ps # Look up the container id + docker exec -it /bin/bash + # Inside container + cd plugins/opensearch_performance_analyzer/pa_config/ + vi performance-analyzer.properties + ``` + + Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`: + + ``` + # ======================== OpenSearch performance analyzer plugin config ========================= + + # NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS. + + # WebService bind host; default to all interfaces + webservice-bind-host = 0.0.0.0 + + # Metrics data location + metrics-location = /dev/shm/performanceanalyzer/ + + # Metrics deletion interval (minutes) for metrics data. + # Interval should be between 1 to 60. + metrics-deletion-interval = 1 + + # If set to true, the system cleans up the files behind it. So at any point, we should expect only 2 + # metrics-db-file-prefix-path files. If set to false, no files are cleaned up. 
This can be useful, if you are archiving + # the files and wouldn't like for them to be cleaned up. + cleanup-metrics-db-files = true + + # WebService exposed by App's port + webservice-listener-port = 9600 + + # Metric DB File Prefix Path location + metrics-db-file-prefix-path = /tmp/metricsdb_ + + https-enabled = false + + #Setup the correct path for certificates + certificate-file-path = specify_path + + private-key-file-path = specify_path + + # Plugin Stats Metadata file name, expected to be in the same location + plugin-stats-metadata = plugin-stats-metadata + + # Agent Stats Metadata file name, expected to be in the same location + agent-stats-metadata = agent-stats-metadata + ``` + +1. Then restart the Performance Analyzer agent: + + ```bash + kill $(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print $2}') + ``` + + +## Bash access to containers + +To create an interactive Bash session in a container, run `docker ps` to find the container ID. Then run: + +```bash +docker exec -it /bin/bash +``` + + +## Important settings + +For production workloads, make sure the [Linux setting](https://www.kernel.org/doc/Documentation/sysctl/vm.txt) `vm.max_map_count` is set to at least 262144. On the OpenSearch Docker image, this setting is the default. To verify, start a Bash session in the container and run: + +```bash +cat /proc/sys/vm/max_map_count +``` + +To increase this value, you have to modify the Docker image. For other install types, add this setting to the host machine's `/etc/sysctl.conf` file with the following line: + +``` +vm.max_map_count=262144 +``` + +Then run `sudo sysctl -p` to reload. + +The `docker-compose.yml` file above also contains several key settings: `bootstrap.memory_lock=true`, `ES_JAVA_OPTS=-Xms512m -Xmx512m`, `nofile 65536` and `port 9600`. 
Respectively, these settings disable memory swapping (along with `memlock`), set the size of the Java heap (we recommend half of system RAM), set a limit of 65536 open files for the OpenSearch user, and allow you to access Performance Analyzer on port 9600. + + +## Customize the Docker image + +To run the image with a custom plugin, first create a [`Dockerfile`](https://docs.docker.com/engine/reference/builder/): + +``` +FROM amazon/opensearch:{{site.opensearch_version}} +RUN /usr/share/opensearch/bin/opensearch-plugin install --batch +``` + +Then run the following commands: + +```bash +docker build --tag=opensearch-custom-plugin . +docker run -p 9200:9200 -p 9600:9600 -v /usr/share/opensearch/data opensearch-custom-plugin +``` + +You can also use a `Dockerfile` to pass your own certificates for use with the [Security](../../security/) plugin, similar to the `-v` argument in [Configure OpenSearch](#configure-opensearch): + +``` +FROM amazon/opensearch:{{site.opensearch_version}} +COPY --chown=opensearch:opensearch opensearch.yml /usr/share/opensearch/config/ +COPY --chown=opensearch:opensearch my-key-file.pem /usr/share/opensearch/config/ +COPY --chown=opensearch:opensearch my-certificate-chain.pem /usr/share/opensearch/config/ +COPY --chown=opensearch:opensearch my-root-cas.pem /usr/share/opensearch/config/ +``` + +Alternately, you might want to remove a plugin. This `Dockerfile` removes the security plugin: + +``` +FROM amazon/opensearch:{{site.opensearch_version}} +RUN /usr/share/opensearch/bin/opensearch-plugin remove opensearch_security +COPY --chown=opensearch:opensearch opensearch.yml /usr/share/opensearch/config/ +``` + +In this case, `opensearch.yml` is a "vanilla" version of the file with no OpenSearch entries. 
It might look like this: + +```yml +cluster.name: "docker-cluster" +network.host: 0.0.0.0 +``` diff --git a/docs/install/index.md b/docs/install/index.md new file mode 100644 index 00000000..6b1c4d69 --- /dev/null +++ b/docs/install/index.md @@ -0,0 +1,10 @@ +--- +layout: default +title: Install and configure +nav_order: 3 +has_children: true +--- + +# Install and configure OpenSearch + +OpenSearch has two installation options at this time: Docker images and tarballs. diff --git a/docs/install/plugins.md b/docs/install/plugins.md new file mode 100644 index 00000000..8537de20 --- /dev/null +++ b/docs/install/plugins.md @@ -0,0 +1,257 @@ +--- +layout: default +title: OpenSearch plugin install +parent: Install and configure +nav_order: 90 +--- + +# Standalone OpenSearch plugin installation + +If you don't want to use the all-in-one OpenSearch installation options, you can install the individual plugins on a compatible OpenSearch cluster, just like any other plugin. + + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Plugin compatibility + + + + + + + + + + + + + + +
+OpenSearch version | Plugin | Plugin version
+:--- | :--- | :---
+1.0.0-beta1 | opensearch-alerting | 1.13.1.0
+1.0.0-beta1 | opensearch-anomaly-detection | 1.13.0.0
+1.0.0-beta1 | opensearch-asynchronous-search | 1.13.0.1
+1.0.0-beta1 | opensearch-index-management | 1.13.2.0
+1.0.0-beta1 | opensearch-job-scheduler | 1.13.0.0
+1.0.0-beta1 | opensearch-knn | 1.13.0.0
+1.0.0-beta1 | opensearch-performance-analyzer | 1.13.0.0
+1.0.0-beta1 | opensearch-reports-scheduler | 1.13.0.0
+1.0.0-beta1 | opensearch-sql | 1.13.2.0
+1.0.0-beta1 | opensearch_security | 1.13.1.0
+ +To install plugins manually, you must have the exact OSS version of OpenSearch installed (for example, 6.6.2 and not 6.6.1). To get a list of available OpenSearch versions on CentOS 7 and Amazon Linux 2, run the following command: + +```bash +sudo yum list opensearch-oss --showduplicates +``` + +Then you can specify the version that you need: + +```bash +sudo yum install opensearch-oss-6.7.1 +``` + + +## Install plugins + +Navigate to the OpenSearch home directory (most likely, it is `/usr/share/opensearch`), and run the install command for each plugin. + + +### Security + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-security/opensearch-security-{{site.opensearch_major_minor_version}}.1.0.zip +``` + +After installing the security plugin, you can run `sudo sh /usr/share/opensearch/plugins/opensearch_security/tools/install_demo_configuration.sh` to quickly get started with demo certificates. Otherwise, you must configure it manually and run [securityadmin.sh](../../security/configuration/security-admin/). + +The security plugin has a corresponding [OpenSearch Dashboards plugin](../../opensearch-dashboards/plugins) that you probably want to install as well. + + +### Job scheduler + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-job-scheduler/opensearch-job-scheduler-{{site.opensearch_major_minor_version}}.0.0.zip +``` + + +### Alerting + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-alerting/opensearch-alerting-{{site.opensearch_major_minor_version}}.1.0.zip +``` + +To install Alerting, you must first install the Job Scheduler plugin. Alerting has a corresponding [OpenSearch Dashboards plugin](../../opensearch-dashboards/plugins) that you probably want to install as well. 
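Because Alerting depends on Job Scheduler, it can help to check what is already installed before running the install command. The sketch below assumes that `bin/opensearch-plugin list` prints one plugin name per line and that Job Scheduler appears there as `opensearch-job-scheduler`; the `has_plugin` helper and the sample listing are hypothetical.

```bash
#!/bin/sh
# Sketch: check a plugin listing for a dependency before installing Alerting.
# has_plugin NAME LISTING succeeds if NAME is a whole line of LISTING.
has_plugin() {
  printf '%s\n' "$2" | grep -qxF -- "$1"
}

# Sample listing for illustration; on a real node you might use:
#   installed=$(sudo bin/opensearch-plugin list)
installed="opensearch-job-scheduler
opensearch_security"

if has_plugin "opensearch-job-scheduler" "$installed"; then
  status="Job Scheduler found: Alerting can be installed"
else
  status="Install Job Scheduler first"
fi
echo "$status"
```

The same check would apply before installing Index State Management, which also requires Job Scheduler.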
+ + +### SQL + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-sql/opensearch-sql-{{site.opensearch_major_minor_version}}.2.0.zip +``` + + +### Reports scheduler + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-reports-scheduler/opensearch-reports-scheduler-{{site.opensearch_major_minor_version}}.0.0.zip +``` + + +### Index State Management + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-index-management/opensearch-index-management-{{site.opensearch_major_minor_version}}.2.0.zip +``` + +To install Index State Management, you must first install the Job Scheduler plugin. ISM has a corresponding [OpenSearch Dashboards plugin](../../opensearch-dashboards/plugins) that you probably want to install as well. + + +### k-NN + +k-NN is only available as part of the all-in-one installs: Docker, RPM, and Debian. + + +### Anomaly detection + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-anomaly-detection/opensearch-anomaly-detection-{{site.opensearch_major_minor_version}}.0.0.zip +``` + + +### Asynchronous search + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/opensearch-asynchronous-search/opensearch-asynchronous-search-{{site.opensearch_major_minor_version}}.0.1.zip +``` + + +### Performance Analyzer + +```bash +sudo bin/opensearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-plugins/performance-analyzer/opensearch-performance-analyzer-{{site.opensearch_major_minor_version}}.0.0.zip +``` + +Performance Analyzer requires some manual configuration after installing the plugin: + +1. 
Create `/usr/lib/systemd/system/opensearch-performance-analyzer.service` based on [this file](https://github.com/opensearch-project/performance-analyzer/blob/master/packaging/opensearch-performance-analyzer.service). + +1. Make the CLI executable: + + ```bash + sudo chmod +x /usr/share/opensearch/bin/performance-analyzer-agent-cli + ``` + +1. Run the appropriate `postinst` script for your Linux distribution: + + ```bash + # Debian-based distros + sudo sh /usr/share/opensearch/plugins/opensearch-performance-analyzer/install/deb/postinst.sh 1 + + # RPM distros + sudo sh /usr/share/opensearch/plugins/opensearch-performance-analyzer/install/rpm/postinst.sh 1 + ``` + +1. Make Performance Analyzer accessible outside of the host machine + + ```bash + cd /usr/share/opensearch # navigate to the OpenSearch home directory + cd plugins/opensearch_performance_analyzer/pa_config/ + vi performance-analyzer.properties + ``` + + Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`: + + ```bash + # ======================== OpenSearch performance analyzer plugin config ========================= + + # NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS. + + # WebService bind host; default to all interfaces + webservice-bind-host = 0.0.0.0 + + # Metrics data location + metrics-location = /dev/shm/performanceanalyzer/ + + # Metrics deletion interval (minutes) for metrics data. + # Interval should be between 1 to 60. + metrics-deletion-interval = 1 + + # If set to true, the system cleans up the files behind it. So at any point, we should expect only 2 + # metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving + # the files and wouldn't like for them to be cleaned up. 
+ cleanup-metrics-db-files = true + + # WebService exposed by App's port + webservice-listener-port = 9600 + + # Metric DB File Prefix Path location + metrics-db-file-prefix-path = /tmp/metricsdb_ + + https-enabled = false + + #Setup the correct path for certificates + certificate-file-path = specify_path + + private-key-file-path = specify_path + + # Plugin Stats Metadata file name, expected to be in the same location + plugin-stats-metadata = plugin-stats-metadata + + # Agent Stats Metadata file name, expected to be in the same location + agent-stats-metadata = agent-stats-metadata + ``` + +1. Start the OpenSearch service: + + ```bash + sudo systemctl start opensearch.service + ``` + +1. Send a test request: + + ```bash + curl -XGET "localhost:9600/_opensearch/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg,max&dim=ShardID&nodes=all" + ``` + + +## List installed plugins + +To check your installed plugins: + +```bash +sudo bin/opensearch-plugin list +``` + + +## Remove plugins + +If you are removing Performance Analyzer, see below. Otherwise, you can remove the plugin with a single command: + +```bash +sudo bin/opensearch-plugin remove +``` + +Then restart OpenSearch on the node: + +```bash +sudo systemctl restart opensearch.service +``` + +## Update plugins + +OpenSearch doesn't update plugins. Instead, you have to remove and reinstall them: + +```bash +sudo bin/opensearch-plugin remove +sudo bin/opensearch-plugin install +``` diff --git a/docs/install/tar.md b/docs/install/tar.md new file mode 100644 index 00000000..5b5eccca --- /dev/null +++ b/docs/install/tar.md @@ -0,0 +1,178 @@ +--- +layout: default +title: Tarball +parent: Install and configure +nav_order: 50 +--- + +# Tarball + +The tarball installation works on Linux systems and provides a self-contained directory with everything you need to run OpenSearch, including an integrated Java Development Kit (JDK). The tarball is a good option for testing and development. 
+ +The tarball supports CentOS 7, Amazon Linux 2, Ubuntu 18.04, and most other Linux distributions. If you have your own Java installation and you set `JAVA_HOME` in the terminal, macOS works as well. + +1. Download the tarball: + + ```bash + # x64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-x64.tar.gz -o opensearch-{{site.opensearch_version}}-linux-x64.tar.gz + # ARM64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz -o opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz + ``` + +1. Download the checksum: + + ```bash + # x64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 -o opensearch-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 + # ARM64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch/opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 -o opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 + ``` + +1. Verify the tarball against the checksum: + + ```bash + # x64 + shasum -a 512 -c opensearch-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 + # ARM64 + shasum -a 512 -c opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 + ``` + + On CentOS, you might not have `shasum`. Install this package: + + ```bash + sudo yum install perl-Digest-SHA + ``` + + Due to a [known issue](https://github.com/opensearch/opensearch-build/issues/81) with the checksum, this step might fail. You can still proceed with the installation. + +1. Extract the TAR file to a directory and change to that directory: + + ```bash + # x64 + tar -zxf opensearch-{{site.opensearch_version}}-linux-x64.tar.gz + cd opensearch-{{site.opensearch_version}} + # ARM64 + tar -zxf opensearch-{{site.opensearch_version}}-linux-arm64.tar.gz + cd opensearch-{{site.opensearch_version}} + ``` + +1.
Run OpenSearch: + + ```bash + ./opensearch-tar-install.sh + ``` + +1. Open a second terminal session, and send requests to the server to verify that OpenSearch is up and running: + + ```bash + curl -XGET https://localhost:9200 -u 'admin:admin' --insecure + curl -XGET https://localhost:9200/_cat/plugins?v -u 'admin:admin' --insecure + ``` + + +## Configuration + +You can modify `config/opensearch.yml` or specify environment variables as arguments using `-E`: + +```bash +./opensearch-tar-install.sh -Ecluster.name=opensearch-cluster -Enode.name=opensearch-node1 -Ehttp.host=0.0.0.0 -Ediscovery.type=single-node +``` + +For other settings, see [Important settings](../docker/#important-settings). + + +### (Optional) Set up Performance Analyzer + +In a tarball installation, Performance Analyzer collects data when it is enabled. But in order to read that data using the REST API on port 9600, you must first manually launch the associated reader agent process: + +1. Make Performance Analyzer accessible outside of the host machine + + ```bash + cd /usr/share/opensearch # navigate to the OpenSearch home directory + cd plugins/opensearch_performance_analyzer/pa_config/ + vi performance-analyzer.properties + ``` + + Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`: + + ``` + # ======================== OpenSearch performance analyzer plugin config ========================= + + # NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS. + + # WebService bind host; default to all interfaces + webservice-bind-host = 0.0.0.0 + + # Metrics data location + metrics-location = /dev/shm/performanceanalyzer/ + + # Metrics deletion interval (minutes) for metrics data. + # Interval should be between 1 to 60. + metrics-deletion-interval = 1 + + # If set to true, the system cleans up the files behind it. So at any point, we should expect only 2 + # metrics-db-file-prefix-path files. If set to false, no files are cleaned up. 
This can be useful, if you are archiving + # the files and wouldn't like for them to be cleaned up. + cleanup-metrics-db-files = true + + # WebService exposed by App's port + webservice-listener-port = 9600 + + # Metric DB File Prefix Path location + metrics-db-file-prefix-path = /tmp/metricsdb_ + + https-enabled = false + + #Setup the correct path for certificates + certificate-file-path = specify_path + + private-key-file-path = specify_path + + # Plugin Stats Metadata file name, expected to be in the same location + plugin-stats-metadata = plugin-stats-metadata + + # Agent Stats Metadata file name, expected to be in the same location + agent-stats-metadata = agent-stats-metadata + ``` + +1. Make the CLI executable: + + ```bash + sudo chmod +x ./bin/performance-analyzer-agent-cli + ``` + +1. Launch the agent CLI: + + ```bash + ES_HOME="$PWD" ./bin/performance-analyzer-agent-cli + ``` + +1. In a separate window, enable the Performance Analyzer plugin: + + ```bash + curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' + ``` + + If you receive the `curl: (52) Empty reply from server` error, you are likely protecting your cluster with the security plugin and you need to provide credentials. Modify the following command to use your username and password: + + ```bash + curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k + ``` + +1. 
Finally, enable the Root Cause Analyzer (RCA) framework + + ```bash + curl -XPOST localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' + ``` + + Similar to step 4, if you run into `curl: (52) Empty reply from server`, run the command below to enable RCA + + ```bash + curl -XPOST https://localhost:9200/_opensearch/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k + ``` + + +### (Optional) Removing Performance Analyzer + +See [Clean up Performance Analyzer files](../plugins/#optional-clean-up-performance-analyzer-files). diff --git a/docs/knn/api.md b/docs/knn/api.md new file mode 100644 index 00000000..04457e3f --- /dev/null +++ b/docs/knn/api.md @@ -0,0 +1,153 @@ +--- +layout: default +title: API +nav_order: 4 +parent: k-NN +has_children: false +--- + +# API + +The k-NN plugin adds two API operations in order to allow users to better manage the plugin's functionality. + + +## Stats + +The k-NN `stats` API provides information about the current status of the k-NN Plugin. The plugin keeps track of both cluster level and node level stats. Cluster level stats have a single value for the entire cluster. Node level stats have a single value for each node in the cluster. You can filter their query by nodeID and statName in the following way: +``` +GET /_opensearch/_knn/nodeId1,nodeId2/stats/statName1,statName2 +``` + +Statistic | Description +:--- | :--- +`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search. +`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search. +`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. 
Note: Explicit evictions that occur because of index deletion are not counted. This statistic is only relevant to approximate k-NN search. +`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This statistic is only relevant to approximate k-NN search. +`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This statistic is only relevant to approximate k-NN search. +`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search. +`graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity. +`graph_index_requests` | The number of requests to add the knn_vector field of a document into a graph. +`graph_index_errors` | The number of requests to add the knn_vector field of a document into a graph that have produced an error. +`graph_query_requests` | The number of graph queries that have been made. +`graph_query_errors` | The number of graph queries that have produced an error. +`knn_query_requests` | The number of KNN query requests received. +`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search. +`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search. +`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search. +`indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs that index has and the total graph_memory_usage that index is using in Kilobytes. +`script_compilations` | The number of times the KNN script has been compiled. 
This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This statistic is only relevant to k-NN score script search. +`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search. +`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search. +`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search. + + +### Usage + +```json +GET /_opensearch/_knn/stats?pretty +{ + "_nodes" : { + "total" : 1, + "successful" : 1, + "failed" : 0 + }, + "cluster_name" : "_run", + "circuit_breaker_triggered" : false, + "nodes" : { + "HYMrXXsBSamUkcAjhjeN0w" : { + "eviction_count" : 0, + "miss_count" : 1, + "graph_memory_usage" : 1, + "graph_memory_usage_percentage" : 3.68, + "graph_index_requests" : 7, + "graph_index_errors" : 1, + "knn_query_requests" : 4, + "graph_query_requests" : 30, + "graph_query_errors" : 15, + "indices_in_cache" : { + "myindex" : { + "graph_memory_usage" : 2, + "graph_memory_usage_percentage" : 3.68, + "graph_count" : 2 + } + }, + "cache_capacity_reached" : false, + "load_exception_count" : 0, + "hit_count" : 0, + "load_success_count" : 1, + "total_load_time" : 2878745, + "script_compilations" : 1, + "script_compilation_errors" : 0, + "script_query_requests" : 534, + "script_query_errors" : 0 + } + } +} +``` + +```json +GET /_opensearch/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,graph_memory_usage?pretty +{ + "_nodes" : { + "total" : 1, + "successful" : 1, + "failed" : 0 + }, + "cluster_name" : "_run", + "circuit_breaker_triggered" : false, + "nodes" : { + "HYMrXXsBSamUkcAjhjeN0w" : { + "graph_memory_usage" : 1 + } + } +} +``` + + +## Warmup operation + +The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest 
Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory. + +If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. + +As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. + +After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory. + + +### Usage + +This request performs a warmup on three indices: + +```json +GET /_opensearch/_knn/warmup/index1,index2,index3?pretty +{ + "_shards" : { + "total" : 6, + "successful" : 6, + "failed" : 0 + } +} +``` + +`total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up. + +The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. 
To monitor the warmup operation, use the OpenSearch `_tasks` API: + +```json +GET /_tasks +``` + +After the operation has finished, use the [k-NN `stats` API operation](#stats) to see what the k-NN plugin loaded into the cache. + + +### Best practices + +For the warmup operation to function properly, follow these best practices. + +First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present. + +Second, confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit setting](../settings/#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again. + +Finally, don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes. diff --git a/docs/knn/approximate-knn.md b/docs/knn/approximate-knn.md new file mode 100644 index 00000000..4902e331 --- /dev/null +++ b/docs/knn/approximate-knn.md @@ -0,0 +1,162 @@ +--- +layout: default +title: Approximate Search +nav_order: 1 +parent: k-NN +has_children: false +has_math: true +--- + +# Approximate k-NN Search + +The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search.
In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach should be preferred. + +This plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API](../api#warmup). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section](../api#stats). + +Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search. + +## Get started with approximate k-NN + +To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index. + +Additionally, if you are using the approximate k-nearest neighbor method, you should specify `knn.space_type` to the space that you are interested in. This setting cannot be changed after it is set. To see what spaces we support, please refer to the [spaces section](#spaces). By default, `index.knn.space_type` is `l2`. 
For more information on index settings, such as algorithm parameters that can be tweaked to tune performance, please refer to the [documentation](../settings#index-settings). + +Next, you must add one or more fields of the `knn_vector` data type. Here is an example that creates an index with two `knn_vector` fields and uses cosine similarity: + +```json +PUT my-knn-index-1 +{ + "settings": { + "index": { + "knn": true, + "knn.space_type": "cosinesimil" + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2 + }, + "my_vector2": { + "type": "knn_vector", + "dimension": 4 + } + } + } +} +``` + +The `knn_vector` data type supports a vector of floats that can have a dimension of up to 10,000, as set by the dimension mapping parameter. + +In OpenSearch, codecs handle the storage and retrieval of indices. The k-NN plugin uses a custom codec to write vector data to graphs so that the underlying k-NN search library can read it. +{: .tip } + +After you create the index, you can add some data to it: + +```json +POST _bulk +{ "index": { "_index": "my-knn-index-1", "_id": "1" } } +{ "my_vector1": [1.5, 2.5], "price": 12.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "2" } } +{ "my_vector1": [2.5, 3.5], "price": 7.1 } +{ "index": { "_index": "my-knn-index-1", "_id": "3" } } +{ "my_vector1": [3.5, 4.5], "price": 12.9 } +{ "index": { "_index": "my-knn-index-1", "_id": "4" } } +{ "my_vector1": [5.5, 6.5], "price": 1.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "5" } } +{ "my_vector1": [4.5, 5.5], "price": 3.7 } +{ "index": { "_index": "my-knn-index-1", "_id": "6" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } +{ "index": { "_index": "my-knn-index-1", "_id": "7" } } +{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } +{ "index": { "_index": "my-knn-index-1", "_id": "8" } } +{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } +{ "index": { "_index": "my-knn-index-1", "_id": "9" } } +{ "my_vector2": [1.5, 5.5, 
4.5, 6.4], "price": 8.9 } + +``` + +Then you can execute an approximate nearest neighbor search on the data using the `knn` query type: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2 + } + } + } +} +``` + +`k` is the number of neighbors the search of each graph will return. You must also include the `size` option. This option indicates how many results the query actually returns. The plugin returns `k` amount of results for each shard (and each segment) and `size` amount of results for the entire query. The plugin supports a maximum `k` value of 10,000. + +### Using approximate k-NN with filters +If you use the `knn` query alongside filters or other clauses (e.g. `bool`, `must`, `match`), you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2 + } + } + }, + "post_filter": { + "range": { + "price": { + "gte": 5, + "lte": 10 + } + } + } +} +``` + +## Spaces + +A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, we take 1 / (1 + distance). Currently, the k-NN plugin supports the following spaces: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
<table>
  <thead>
    <tr>
      <th>spaceType</th>
      <th>Distance Function</th>
      <th>OpenSearch Score</th>
    </tr>
  </thead>
  <tr>
    <td>l2</td>
    <td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \]</td>
    <td>1 / (1 + Distance Function)</td>
  </tr>
  <tr>
    <td>l1</td>
    <td>\[ Distance(X, Y) = \sum_{i=1}^n |X_i - Y_i| \]</td>
    <td>1 / (1 + Distance Function)</td>
  </tr>
  <tr>
    <td>cosinesimil</td>
    <td>\[ 1 - {A · B \over \|A\| · \|B\|} = 1 -
    {\sum_{i=1}^n (A_i · B_i) \over \sqrt{\sum_{i=1}^n A_i^2} · \sqrt{\sum_{i=1}^n B_i^2}}\]
    where \(\|A\|\) and \(\|B\|\) represent the norms of vectors A and B.</td>
    <td>1 / (1 + Distance Function)</td>
  </tr>
  <tr>
    <td>hammingbit</td>
    <td>Distance = countSetBits(X \(\oplus\) Y)</td>
    <td>1 / (1 + Distance Function)</td>
  </tr>
</table>
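
As a rough illustration of the spaces listed above, the following Python sketch computes each distance and converts it to a score with 1 / (1 + distance). It is not the plugin's code: the function names are made up for this example, the `l1` function assumes the standard sum of absolute differences, and `hammingbit` assumes its inputs are integers (for example, values of a `long` field).

```python
import math


def l2(x, y):
    # Squared Euclidean distance, as in the l2 row (no square root is taken).
    return sum((a - b) ** 2 for a, b in zip(x, y))


def l1(x, y):
    # Assumption: standard L1 (Manhattan) distance, the sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))


def cosinesimil(x, y):
    # 1 - cosine similarity, so that smaller values mean closer vectors.
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1 - dot / norms


def hammingbit(x, y):
    # Number of set bits in x XOR y; x and y are assumed to be integers.
    return bin(x ^ y).count("1")


def to_score(distance):
    # Conversion described in the text: a smaller distance gives a larger score.
    return 1 / (1 + distance)


if __name__ == "__main__":
    query = [2, 3, 5, 6]
    doc = [2.5, 3.5, 5.6, 6.7]
    for name, fn in (("l2", l2), ("l1", l1), ("cosinesimil", cosinesimil)):
        d = fn(query, doc)
        print(name, "distance:", d, "score:", to_score(d))
    print("hammingbit distance:", hammingbit(0b1010, 0b0110))
```

Because every space maps its distance through the same conversion, ranking by descending score is equivalent to ranking by ascending distance.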
+ +The cosine similarity formula does not include the `1 - ` prefix. However, because nmslib equates smaller scores with closer results, they return `1 - cosineSimilarity` for their cosine similarity space---that's why `1 - ` is included in the distance function. +{: .note } diff --git a/docs/knn/index.md b/docs/knn/index.md new file mode 100644 index 00000000..e8d79fad --- /dev/null +++ b/docs/knn/index.md @@ -0,0 +1,42 @@ +--- +layout: default +title: k-NN +nav_order: 50 +has_children: true +has_toc: false +--- + +# k-NN + +Short for *k-nearest neighbors*, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of vectors. To determine the neighbors, you can specify the space (the distance function) you want to use to measure the distance between points. + +Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For more background information on k-NN search, see [Wikipedia](https://en.wikipedia.org/wiki/Nearest_neighbor_search). + +This plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors: + +1. **Approximate k-NN** + + The first method takes an approximate nearest neighbor approach; it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320). + + Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched. 
In this case, you should use either the script scoring method or painless extensions. + + For more details about this method, refer to the [Approximate k-NN section](approximate-knn). + +2. **Script Score k-NN** + + The second method extends OpenSearch's script scoring functionality to execute a brute force, exact k-NN search over "knn_vector" fields or fields that can represent binary objects. With this approach, you can run k-NN search on a subset of vectors in your index (sometimes referred to as a pre-filter search). + + Use this approach for searches over smaller sets of documents or when a pre-filter is needed. Using this approach on large indices may lead to high latencies. + + For more details about this method, refer to the [k-NN Script Score section](knn-score-script). + +3. **Painless extensions** + + The third method adds the distance functions as painless extensions that you can use in more complex combinations. Similar to the k-NN Script Score, you can use this method to perform a brute force, exact k-NN search across an index, which also supports pre-filtering. + + This approach has slightly slower query performance compared to the k-NN Script Score. If your use case requires more customization over the final score, you should use this approach over Script Score k-NN. + + For more details about this method, refer to the [painless functions section](painless-functions). + + +Overall, for larger data sets, you should generally choose the approximate nearest neighbor method because it scales significantly better. For smaller data sets, where you may want to apply a filter, you should choose the custom scoring approach. If you have a more complex use case where you need to use a distance function as part of your scoring method, you should use the painless scripting approach.
diff --git a/docs/knn/jni-library.md b/docs/knn/jni-library.md new file mode 100644 index 00000000..9c751212 --- /dev/null +++ b/docs/knn/jni-library.md @@ -0,0 +1,10 @@ +--- +layout: default +title: JNI Library +nav_order: 5 +parent: k-NN +has_children: false +--- + +# JNI Library +In order to integrate [nmslib's](https://github.com/nmslib/nmslib/) approximate k-NN functionality, which is implemented in C++, into the k-NN plugin, which is implemented in Java, we created a Java Native Interface library. See [this wiki](https://en.wikipedia.org/wiki/Java_Native_Interface) to learn more about JNI. This library allows the k-NN plugin to leverage nmslib's functionality. For more information about how we build the JNI library binary and how to get the most out of it in your production environment, see [here](https://github.com/opensearch-project/k-NN#jni-library-artifacts). diff --git a/docs/knn/knn-score-script.md b/docs/knn/knn-score-script.md new file mode 100644 index 00000000..d7e2556c --- /dev/null +++ b/docs/knn/knn-score-script.md @@ -0,0 +1,212 @@ +--- +layout: default +title: Exact k-NN with Scoring Script +nav_order: 2 +parent: k-NN +has_children: false +has_math: true +--- + +# Exact k-NN with Scoring Script +The k-NN plugin implements the OpenSearch score script plugin that you can use to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions. Because this approach executes a brute force search, it does not scale as well as the [Approximate approach](../approximate-knn). In some cases, it may be better to think about refactoring your workflow or index structure to use the Approximate approach instead of this approach.
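
To make the idea of "filter first, then brute-force search" concrete, here is a minimal Python sketch. It only illustrates the concept and is not how the plugin is implemented: the `exact_knn` function, the `keep` predicate, and the `id`, `vector`, and `price` keys are hypothetical names chosen for this example, and squared L2 distance is assumed as the space.

```python
import heapq


def l2(x, y):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def exact_knn(docs, query, k, keep=lambda doc: True, distance=l2):
    # Pre-filter: drop documents that fail the predicate before any distance
    # is computed, so the search runs only over the remaining subset.
    candidates = [doc for doc in docs if keep(doc)]
    # Brute force: compute the distance to every candidate and keep the k
    # smallest. The cost grows linearly with the number of candidates.
    return heapq.nsmallest(k, candidates, key=lambda doc: distance(doc["vector"], query))


# Hypothetical in-memory stand-in for an index of vectors with a price field.
docs = [
    {"id": "6", "vector": [1.5, 5.5, 4.5, 6.4], "price": 10.3},
    {"id": "7", "vector": [2.5, 3.5, 5.6, 6.7], "price": 5.5},
    {"id": "8", "vector": [4.5, 5.5, 6.7, 3.7], "price": 4.4},
    {"id": "9", "vector": [1.5, 5.5, 4.5, 6.4], "price": 8.9},
]

hits = exact_knn(docs, [2, 3, 5, 6], k=2, keep=lambda doc: 5 <= doc["price"] <= 10)
print([doc["id"] for doc in hits])  # prints ['7', '9']
```

Because every remaining candidate is scored, the result is exact, but the work per query is proportional to the size of the filtered subset. That is why this style of search suits smaller or heavily filtered document sets.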
+ +## Getting started with the score script + +Similar to approximate nearest neighbor search, in order to use the score script on a body of vectors, you must first create an index with one or more `knn_vector` fields. If you intend to use only the script score approach (and not the approximate approach), `index.knn` can be set to `false` and `index.knn.space_type` does not need to be set. The space type can be chosen during search. See the [spaces section](#spaces) to see what spaces the k-NN score script supports. Here is an example that creates an index with two `knn_vector` fields: + +```json +PUT my-knn-index-1 +{ + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2 + }, + "my_vector2": { + "type": "knn_vector", + "dimension": 4 + } + } + } +} +``` + +*Note* -- For binary spaces, such as the Hamming bit space, `type` needs to be either `binary` or `long`. The binary data then needs to be encoded either as a base64 string or as a long (if the data is 64 bits or less). + +If you *only* want to use the score script, you can omit `"index.knn": true`. The benefit of this approach is faster indexing speed and lower memory usage, but you lose the ability to perform standard k-NN queries on the index.
+{: .tip} + +After you create the index, you can add some data to it: + +```json +POST _bulk +{ "index": { "_index": "my-knn-index-1", "_id": "1" } } +{ "my_vector1": [1.5, 2.5], "price": 12.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "2" } } +{ "my_vector1": [2.5, 3.5], "price": 7.1 } +{ "index": { "_index": "my-knn-index-1", "_id": "3" } } +{ "my_vector1": [3.5, 4.5], "price": 12.9 } +{ "index": { "_index": "my-knn-index-1", "_id": "4" } } +{ "my_vector1": [5.5, 6.5], "price": 1.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "5" } } +{ "my_vector1": [4.5, 5.5], "price": 3.7 } +{ "index": { "_index": "my-knn-index-1", "_id": "6" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } +{ "index": { "_index": "my-knn-index-1", "_id": "7" } } +{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } +{ "index": { "_index": "my-knn-index-1", "_id": "8" } } +{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } +{ "index": { "_index": "my-knn-index-1", "_id": "9" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } + +``` + +Finally, you can execute an exact nearest neighbor search on the data using the `knn` script: +```json +GET my-knn-index-1/_search +{ + "size": 4, + "query": { + "script_score": { + "query": { + "match_all": {} + }, + "script": { + "source": "knn_score", + "lang": "knn", + "params": { + "field": "my_vector2", + "query_value": [2.0, 3.0, 5.0, 6.0], + "space_type": "cosinesimil" + } + } + } + } +} +``` + +All parameters are required. + +- `lang` is the script type. This value is usually `painless`, but here you must specify `knn`. +- `source` is the name of the script, `knn_score`. + + This script is part of the k-NN plugin and isn't available at the standard `_scripts` path. A GET request to `_cluster/state/metadata` doesn't return it, either. + +- `field` is the field that contains your vector data. +- `query_value` is the point you want to find the nearest neighbors for. 
For the Euclidean and cosine similarity spaces, the value must be an array of floats that matches the dimension set in the field's mapping. For Hamming bit distance, this value can be either of type signed long or a base64-encoded string (for the long and binary field types, respectively). +- `space_type` corresponds to the distance function. See the [spaces section](#spaces). + + +*Note* -- In later versions of the k-NN plugin, `vector` was replaced by `query_value` due to the addition of the `bithamming` space. + + +The [post filter example in the approximate approach](../approximate-knn/#using-approximate-k-nn-with-filters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over. + +This example shows a pre-filter approach to k-NN search with the score script approach. First, create the index: + +```json +PUT my-knn-index-2 +{ + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 2 + }, + "color": { + "type": "keyword" + } + } + } +} +``` + +Then add some documents: + +```json +POST _bulk +{ "index": { "_index": "my-knn-index-2", "_id": "1" } } +{ "my_vector": [1, 1], "color" : "RED" } +{ "index": { "_index": "my-knn-index-2", "_id": "2" } } +{ "my_vector": [2, 2], "color" : "RED" } +{ "index": { "_index": "my-knn-index-2", "_id": "3" } } +{ "my_vector": [3, 3], "color" : "RED" } +{ "index": { "_index": "my-knn-index-2", "_id": "4" } } +{ "my_vector": [10, 10], "color" : "BLUE" } +{ "index": { "_index": "my-knn-index-2", "_id": "5" } } +{ "my_vector": [20, 20], "color" : "BLUE" } +{ "index": { "_index": "my-knn-index-2", "_id": "6" } } +{ "my_vector": [30, 30], "color" : "BLUE" } + +``` + +Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors: + +```json +GET 
my-knn-index-2/_search +{ + "size": 2, + "query": { + "script_score": { + "query": { + "bool": { + "filter": { + "term": { + "color": "BLUE" + } + } + } + }, + "script": { + "lang": "knn", + "source": "knn_score", + "params": { + "field": "my_vector", + "query_value": [9.9, 9.9], + "space_type": "l2" + } + } + } + } +} +``` + +## Spaces + +A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. We include the conversions to OpenSearch scores in the table below: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+<table>
+  <tr><th>spaceType</th><th>Distance Function</th><th>OpenSearch Score</th></tr>
+  <tr><td>l2</td><td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \]</td><td>1 / (1 + Distance Function)</td></tr>
+  <tr><td>l1</td><td>\[ Distance(X, Y) = \sum_{i=1}^n \lvert X_i - Y_i \rvert \]</td><td>1 / (1 + Distance Function)</td></tr>
+  <tr><td>cosinesimil</td><td>\[ {A · B \over \|A\| · \|B\|} = {\sum_{i=1}^n (A_i · B_i) \over \sqrt{\sum_{i=1}^n A_i^2} · \sqrt{\sum_{i=1}^n B_i^2}}\] where \(\|A\|\) and \(\|B\|\) are the norms (magnitudes) of the vectors A and B.</td><td>1 + Distance Function</td></tr>
+  <tr><td>hammingbit</td><td>Distance = countSetBits(X \(\oplus\) Y)</td><td>1 / (1 + Distance Function)</td></tr>
+</table>
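
The conversions in the table can be sketched in a few lines of Python. This is an illustrative reading of the table, not the plugin's code: the function names are hypothetical, the `l1` distance is assumed to sum absolute differences (Manhattan distance), and `hammingbit` assumes its operands are 64-bit longs.

```python
import math


def l2(x, y):
    # Squared Euclidean distance, as listed for the l2 space in the table.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def l1(x, y):
    # Manhattan distance (assumed to use absolute differences).
    return sum(abs(a - b) for a, b in zip(x, y))


def cosinesimil(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def hammingbit(x, y):
    # Count the differing bits, treating both operands as 64-bit values
    # so that negative (signed) longs are handled correctly.
    return bin((x ^ y) & 0xFFFFFFFFFFFFFFFF).count("1")


def to_score(space_type, distance):
    # Convert a distance into a "higher is better" score, following the
    # last column of the table.
    if space_type == "cosinesimil":
        return 1 + distance
    return 1 / (1 + distance)


print(to_score("l2", l2([9.9, 9.9], [10, 10])))
print(to_score("cosinesimil", cosinesimil([1, 0], [0, 1])))
```

An exact match in the `l2`, `l1`, or `hammingbit` space has distance 0 and therefore the maximum score of 1, while more distant vectors approach a score of 0.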
+ + +Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score. diff --git a/docs/knn/painless-functions.md b/docs/knn/painless-functions.md new file mode 100644 index 00000000..7b022f82 --- /dev/null +++ b/docs/knn/painless-functions.md @@ -0,0 +1,83 @@ +--- +layout: default +title: k-NN Painless Extensions +nav_order: 3 +parent: k-NN +has_children: false +has_math: true +--- + +# Painless Scripting Functions + +With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin adds Painless Scripting extensions to a few of the distance functions used in [k-NN score script](../knn-score-script), so you can utilize them when you need more customization with respect to your k-NN workload. + +## Get started with k-NN's Painless Scripting functions + +To use k-NN's Painless Scripting functions, first, you must create an index with `knn_vector` fields like in [k-NN score script](../knn-score-script#Getting-started-with-the-score-script). Once the index is created and you have ingested some data, you can use the painless extensions: + +```json +GET my-knn-index-2/_search +{ + "size": 2, + "query": { + "script_score": { + "query": { + "bool": { + "filter": { + "term": { + "color": "BLUE" + } + } + } + }, + "script": { + "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])", + "params": { + "field": "my_vector", + "query_value": [9.9, 9.9] + } + } + } + } +} +``` + +`field` needs to map to a `knn_vector` field, and `query_value` needs to be a floating point array with the same dimension as `field`. 
+ +## Function types +The following table contains the available painless functions the k-NN plugin provides: + + + + + + + + + + + + + + + + + + + + + + + + +
+<table>
+  <tr><th>Function Name</th><th>Function Signature</th><th>Description</th></tr>
+  <tr><td>l2Squared</td><td><code>float l2Squared (float[] queryVector, doc['vector field'])</code></td><td>This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so a script typically inverts the return value of the l2Squared function, for example <code>1 / (1 + l2Squared(params.query_value, doc[params.field]))</code>. If the document vector matches the query vector, the result is 0, so the script also adds 1 to the distance to avoid divide-by-zero errors.</td></tr>
+  <tr><td>l1Norm</td><td><code>float l1Norm (float[] queryVector, doc['vector field'])</code></td><td>This function calculates the L1 Norm distance (Manhattan distance) between a given query vector and document vectors.</td></tr>
+  <tr><td>cosineSimilarity</td><td><code>float cosineSimilarity (float[] queryVector, doc['vector field'])</code></td><td>Cosine similarity is an inner product of the query vector and document vector normalized to both have length 1. If the magnitude of the query vector does not change throughout the query, users can pass the magnitude of the query vector to improve performance, instead of calculating the magnitude every time for every filtered document: <code>float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)</code>. In general, the range of cosine similarity is [-1, 1], but in the case of information retrieval, the cosine similarity of two documents will range from 0 to 1 because tf-idf cannot be negative. Hence, the k-NN plugin adds 1.0 to always yield a positive cosine similarity score.</td></tr>
+</table>
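
The following Python sketch mirrors the scoring expressions described in the table, including the optimization of computing the query vector's magnitude once and reusing it. The Python function names and sample vectors are illustrative assumptions and only approximate what the Painless functions compute.

```python
import math


def norm(vector):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def l2_squared(query, doc_vector):
    return sum((q - d) ** 2 for q, d in zip(query, doc_vector))


def cosine_similarity(query, doc_vector, query_norm=None):
    # When the caller supplies query_norm, the query magnitude is not
    # recomputed for every document.
    if query_norm is None:
        query_norm = norm(query)
    dot = sum(q * d for q, d in zip(query, doc_vector))
    return dot / (query_norm * norm(doc_vector))


query = [9.9, 9.9]
query_norm = norm(query)  # computed once, reused for every document
doc_vectors = [[10, 10], [20, 20], [30, 30]]  # hypothetical sample vectors

# Adding 1.0 keeps the cosine-based score non-negative.
cosine_scores = [1.0 + cosine_similarity(query, v, query_norm) for v in doc_vectors]
# Inverting 1 + distance turns "smaller is better" into "larger is better"
# and avoids dividing by zero on an exact match.
l2_scores = [1 / (1 + l2_squared(query, v)) for v in doc_vectors]
print(cosine_scores, l2_scores)
```

Note that the three sample vectors all point in the same direction as the query, so their cosine-based scores are essentially identical, while the L2-based scores still separate them by distance.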
+ + +## Constraints +1. If a document’s `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`. +2. If a vector field doesn't have a value, the function throws an IllegalStateException. + You can avoid this situation by first checking if a document has a value in its field: +``` + "source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))", +``` +Because scores can only be positive, this script ranks documents with vector fields higher than those without. diff --git a/docs/knn/performance-tuning.md b/docs/knn/performance-tuning.md new file mode 100644 index 00000000..7ee83656 --- /dev/null +++ b/docs/knn/performance-tuning.md @@ -0,0 +1,104 @@ +--- +layout: default +title: Performance Tuning +parent: k-NN +nav_order: 7 +--- + +# Performance tuning + +This section provides recommendations for performance tuning to improve indexing/search performance for approximate k-NN. From a high level, k-NN works according to these principles: +* Graphs are created per knn_vector field / (Lucene) segment pair. +* Queries execute on segments sequentially inside the shard (same as any other OpenSearch query). +* Each graph in the segment returns <=k neighbors. +* Coordinator node picks up final size number of neighbors from the neighbors returned by each shard. + +Additionally, this section provides recommendations for comparing approximate k-NN to exact k-NN with score script. + +## Indexing performance tuning + +The following steps can be taken to help improve indexing performance, especially when you plan to index a large number of vectors at once: +1. Disable refresh interval (Default = 1 sec) or set a long duration for refresh interval to avoid creating multiple small segments + +```json +PUT //_settings +{ + "index" : { + "refresh_interval" : "-1" + } +} +``` +*Note* -- Be sure to reenable refresh_interval after indexing finishes. + +2. 
Disable Replicas (No OpenSearch replica shard). + +Setting replicas to 0 avoids duplicate construction of graphs in both primary and replicas. When we enable replicas after the indexing, the serialized graphs are directly copied. Having no replicas means that losing one or more nodes may incur data loss, so it is important that the data lives elsewhere so that this initial load can be retried in case of an issue. + +3. Increase number of indexing threads + +If the hardware we choose has multiple cores, we can allow multiple threads in graph construction to speed up the indexing process. You can determine the number of threads to be allotted by using the [knn.algo_param.index_thread_qty](../settings/#cluster-settings) setting. + +Please keep an eye on CPU utilization and choose the right number of threads. Because graph construction is costly, having multiple threads can put additional load on the CPU. + +## Search performance tuning + +1. Have fewer segments + +To improve search performance, it is necessary to keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the 'size' best results. But, because the complexity of search for the HNSW algorithm is logarithmic with respect to the number of vectors, searching over 5 graphs with 100 vectors each and then taking the top size results from 5*k results will take longer than searching over 1 graph with 500 vectors and then taking the top size results from k results. Ideally, having 1 segment per shard will give the optimal performance with respect to search latency. We can configure the index to have multiple shards to avoid giant shards and achieve more parallelism. + +We can control the number of segments during indexing by asking OpenSearch to slow down segment creation, either by disabling the refresh interval or by choosing a larger refresh interval. + +2.
Warm up the index + +The graphs are constructed during indexing, but they are loaded into memory during the first search. The way search works in Lucene is that each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top size number of results based on the score would be returned from all of the results returned by segments at a shard level (higher score --> better result). + +Once a graph is loaded (graphs are loaded outside the OpenSearch JVM), we cache the graphs in memory. The initial queries would be expensive, on the order of a few seconds, and subsequent queries should be faster, on the order of milliseconds (assuming the knn circuit breaker is not hit). + +To avoid this latency penalty during your first queries, you can use the warmup API operation on the indices you want to search. + +### Usage + +```json +GET /_opensearch/_knn/warmup/index1,index2,index3?pretty +{ + "_shards" : { + "total" : 6, + "successful" : 6, + "failed" : 0 + } +} +``` + +The warmup API operation loads all of the graphs for all of the shards (primaries and replicas) for the specified indices into the cache. Thus, there will be no penalty to load graphs during initial searches. + +*Note* -- This API only loads the segments of the indices it sees into the cache. If a merge or refresh operation finishes after this API is run or if new documents are added, this API will need to be rerun to load those graphs into memory. + +3. Avoid reading stored fields + +If the use case is to just read the nearest neighbors' IDs and scores, then we can disable reading stored fields, which can save some time retrieving the vectors from stored fields. + +## Improving Recall + +Recall depends on multiple factors like number of vectors, number of dimensions, segments, etc.
Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating results. The larger the graph, the more chances of losing recall if you are sticking with smaller algorithm parameters. Choosing larger values for algorithm parameters should help solve this issue but sacrifices search latency and indexing time. That being said, it is important to understand your system's requirements for latency and accuracy, and then choose the number of segments you want your index to have based on experimentation. + +Recall can be configured by adjusting the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm params that control recall are m, ef_construction, ef_search. For more details on influence of algorithm parameters on the indexing and search recall, please refer to the [HNSW algorithm parameters document](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values could help recall (leading to better search results) but at the cost of higher memory utilization and increased indexing time. Our default values work on a broader set of use cases from our experiments, but we encourage users to run their own experiments on their data sets and choose the appropriate values. For index-level settings, please refer to the [settings page](../settings#index-settings). We will add details on our experiments here shortly. + +## Estimating Memory Usage + +Typically, in an OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the circuit_breaker_limit cluster setting. By default, the circuit breaker limit is set at 50%. + +The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector. 
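
The estimate above is easy to turn into a small helper. The following Python sketch applies the formula as written; the function name and the `replicas` parameter are hypothetical conveniences, with `replicas` reflecting the fact that each replica adds another full copy of the vectors.

```python
def estimate_graph_memory_bytes(num_vectors, dimension, m, replicas=0):
    """Estimate native memory for k-NN graphs using 1.1 * (4 * dimension + 8 * M)."""
    bytes_per_vector = 1.1 * (4 * dimension + 8 * m)
    # Each replica holds its own copy of every vector.
    total_vectors = num_vectors * (1 + replicas)
    return bytes_per_vector * total_vectors


primary_only = estimate_graph_memory_bytes(1_000_000, dimension=256, m=16)
with_replica = estimate_graph_memory_bytes(1_000_000, dimension=256, m=16, replicas=1)
print(f"{primary_only / 1e9:.2f} GB without replicas, {with_replica / 1e9:.2f} GB with one replica")
```

Comparing the result against the memory available under the circuit breaker limit gives a rough idea of whether all graphs can stay in the cache at once.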
+ +As an example, assume that we have 1 Million vectors with a dimension of 256 and M of 16, and the memory required can be estimated as: + +``` +1.1 * (4 *256 + 8 * 16) * 1,000,000 ~= 1.26 GB +``` + +*Note* -- Remember that having a replica will double the total number of vectors. + +## Approximate nearest neighbor vs. score script + +The standard k-NN query and custom scoring option perform differently. Test with a representative set of documents to see if the search results and latencies match your expectations. + +Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../../opensearch/#primary-and-replica-shards). diff --git a/docs/knn/settings.md b/docs/knn/settings.md new file mode 100644 index 00000000..5180555a --- /dev/null +++ b/docs/knn/settings.md @@ -0,0 +1,36 @@ +--- +layout: default +title: Settings +parent: k-NN +nav_order: 6 +--- + +# k-NN Settings + +The k-NN plugin adds several new index and cluster settings. + + +## Index settings + +The default values should work well for most use cases, but you can change these settings when you create the index. + +Setting | Default | Description +:--- | :--- | :--- +`index.knn.algo_param.ef_search` | 512 | The size of the dynamic list used during KNN searches. Higher values lead to more accurate, but slower searches. +`index.knn.algo_param.ef_construction` | 512 | The size of the dynamic list used during KNN graph creation. Higher values lead to a more accurate graph, but slower indexing speed. +`index.knn.algo_param.m` | 16 | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100. +`index.knn.space_type` | "l2" | The vector space used to calculate the distance between vectors. 
Currently, the KNN plugin supports the `l2` space (Euclidean distance) and `cosinesimil` space (cosine similarity). For more information on these spaces, refer to the [nmslib documentation](https://github.com/nmslib/nmslib/blob/master/manual/spaces.md). + + +## Cluster settings + +Setting | Default | Description +:--- | :--- | :--- +`knn.algo_param.index_thread_qty` | 1 | The number of threads used for graph creation. Keeping this value low reduces the CPU impact of the KNN plugin, but also reduces indexing performance. +`knn.cache.item.expiry.enabled` | false | Whether to remove graphs that have not been accessed for a certain duration from memory. +`knn.cache.item.expiry.minutes` | 3h | If enabled, the idle time before removing a graph from memory. +`knn.circuit_breaker.unset.percentage` | 75.0 | The native memory usage threshold for the circuit breaker. Memory usage must be below this percentage of `knn.memory.circuit_breaker.limit` for `knn.circuit_breaker.triggered` to remain false. +`knn.circuit_breaker.triggered` | false | True when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value. +`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for graphs. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, KNN removes the least recently used graphs. +`knn.memory.circuit_breaker.enabled` | true | Whether to enable the KNN memory circuit breaker. +`knn.plugin.enabled`| true | Enables or disables the KNN plugin. diff --git a/docs/opensearch-dashboards/gantt.md b/docs/opensearch-dashboards/gantt.md new file mode 100644 index 00000000..cac189aa --- /dev/null +++ b/docs/opensearch-dashboards/gantt.md @@ -0,0 +1,26 @@ +--- +layout: default +title: Gantt Charts +parent: OpenSearch Dashboards +nav_order: 10 +--- + +# Gantt charts + +OpenSearch includes a Gantt chart visualization. 
These charts show the start, end, and duration of unique events in a sequence. Gantt charts are useful in trace analytics, telemetry, and anomaly detection use cases, where you want to understand interactions and dependencies between various events in a schedule. + +For example, consider an index of log data. The fields in a typical set of log data, especially audit logs, contain a specific operation or event with a start time and duration. + +To create a Gantt chart, do the following: + +1. In the visualizations menu, choose **Create visualization** and **Gantt Chart**. +1. Choose a source for the chart (e.g. some log data). +1. Under **Metrics**, choose **Event**. For log data, each log is an event. +1. Select the **Start Time** and the **Duration** fields from your data set. The start time is the timestamp for the beginning of an event. The duration is the amount of time to add to the start time. +1. Under **Results**, choose the number of events that you want to display on the chart. Gantt charts sequence events from earliest to latest based on start time. +1. Choose **Panel settings** to adjust axis labels, time format, and colors. +1. Choose **Update**. + +![Gantt Chart](../../images/gantt-chart.png) + +This Gantt chart shows the ID for each log on the Y axis. Each bar is a unique event that spans some amount of time. Hover over a bar to see the duration of that event. diff --git a/docs/opensearch-dashboards/index.md b/docs/opensearch-dashboards/index.md new file mode 100644 index 00000000..644d3e6e --- /dev/null +++ b/docs/opensearch-dashboards/index.md @@ -0,0 +1,150 @@ +--- +layout: default +title: OpenSearch Dashboards +nav_order: 11 +has_children: true +has_toc: false +--- + +# OpenSearch Dashboards + +OpenSearch Dashboards is the default visualization tool for data in OpenSearch. It also serves as a user interface for the OpenSearch [security](../security/configuration/), [alerting](../alerting/), and [Index State Management](../ism/) plugins.
+ + +## Run OpenSearch Dashboards using Docker + +You *can* start OpenSearch Dashboards using `docker run` after [creating a Docker network](https://docs.docker.com/engine/reference/commandline/network_create/) and starting OpenSearch, but the process of connecting OpenSearch Dashboards to OpenSearch is significantly easier with a Docker Compose file. + +1. Run `docker pull opensearch/opensearch-dashboards:{{site.opensearch_version}}`. + +1. Create a [`docker-compose.yml`](https://docs.docker.com/compose/compose-file/) file appropriate for your environment. A sample file that includes OpenSearch Dashboards is available on the OpenSearch [Docker installation page](../install/docker/#sample-docker-compose-file). + + Just like `opensearch.yml`, you can pass a custom `opensearch_dashboards.yml` to the container in the Docker Compose file. + {: .tip } + +1. Run `docker-compose up`. + + Wait for the containers to start. Then see [Get started with OpenSearch Dashboards](#get-started-with-opensearch-dashboards). + +1. When finished, run `docker-compose down`. + + +## Run OpenSearch Dashboards using the RPM or Debian package + +1. If you haven't already, add the `yum` repositories specified in steps 1--2 in [RPM](../install/rpm) or the `apt` repositories in steps 2--3 of [Debian package](../install/deb). +1. `sudo yum install opensearch-dashboards` or `sudo apt install opensearch-dashboards` +1. Modify `/etc/opensearch-dashboards/opensearch_dashboards.yml` to use `opensearch.hosts` rather than `opensearch.url`. +1. `sudo systemctl start opensearch-dashboards.service` +1. To stop OpenSearch Dashboards: + + ```bash + sudo systemctl stop opensearch-dashboards.service + ``` + + +### Configuration + +To run OpenSearch Dashboards when the system starts: + +```bash +sudo /bin/systemctl daemon-reload +sudo /bin/systemctl enable opensearch-dashboards.service +``` + +You can also modify the values in `/etc/opensearch-dashboards/opensearch_dashboards.yml`. 
+ + +## Run OpenSearch Dashboards using the tarball + +1. Download the tarball: + + ```bash + # x64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz -o opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz + # ARM64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz -o opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz + ``` + +1. Download the checksum: + + ```bash + # x64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 -o opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 + # ARM64 + curl https://d3g5vo6xdbdb9a.cloudfront.net/tarball/opensearch-dashboards/opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 -o opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 + ``` + +1. Verify the tarball against the checksum: + + ```bash + # x64 + shasum -a 512 -c opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz.sha512 + # ARM64 + shasum -a 512 -c opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz.sha512 + ``` + + On CentOS, you might not have `shasum`. Install this package: + + ```bash + sudo yum install perl-Digest-SHA + ``` + +1. Extract the TAR file to a directory and change to that directory: + + ```bash + # x64 + tar -zxf opensearch-dashboards-{{site.opensearch_version}}-linux-x64.tar.gz + cd opensearch-dashboards + # ARM64 + tar -zxf opensearch-dashboards-{{site.opensearch_version}}-linux-arm64.tar.gz + cd opensearch-dashboards + ``` + +1. If desired, modify `config/opensearch_dashboards.yml`. + +1. Run OpenSearch Dashboards: + + ```bash + ./bin/opensearch-dashboards + ``` + + +## Run OpenSearch Dashboards on Windows (ZIP) + +1. Download the ZIP. + +1. 
Extract [the ZIP file](https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-windows/ode-windows-zip/opensearch-dashboards-{{site.opensearch_version}}-windows-x64.zip) to a directory and open that directory at the command prompt. + +1. If desired, modify `config/opensearch_dashboards.yml`. + +1. Run OpenSearch Dashboards: + + ``` + .\bin\opensearch-dashboards.bat + ``` + + +## Run OpenSearch Dashboards on Windows (EXE) + +1. Download [the EXE file](https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-windows/opensearch-executables/opensearch-dashboards-{{site.opensearch_version}}-windows-x64.exe), run it, and click through the steps. + +1. Open the command prompt. + +1. Navigate to the OpenSearch Dashboards install directory. + +1. If desired, modify `config/opensearch_dashboards.yml`. + +1. Run OpenSearch Dashboards: + + ``` + .\bin\opensearch-dashboards.bat + ``` + + +## Get started with OpenSearch Dashboards + +1. After starting OpenSearch Dashboards, you can access it at port 5601. For example, http://localhost:5601. +1. Log in with the default username `admin` and password `admin`. +1. Choose **Try our sample data** and add the sample flight data. +1. Choose **Discover** and search for a few flights. +1. Choose **Dashboard**, **[Flights] Global Flight Dashboard**, and wait for the dashboard to load. diff --git a/docs/opensearch-dashboards/maptiles.md b/docs/opensearch-dashboards/maptiles.md new file mode 100644 index 00000000..f03854ff --- /dev/null +++ b/docs/opensearch-dashboards/maptiles.md @@ -0,0 +1,31 @@ +--- +layout: default +title: WMS Map Server +parent: OpenSearch Dashboards +nav_order: 5 +--- + +# Configure WMS map server + +Due to licensing restrictions, the default installation of OpenSearch Dashboards doesn't include a map server for tile map visualizations. To configure OpenSearch Dashboards to use a WMS map server: + +1. Open OpenSearch Dashboards at `https://:`.
For example, [https://localhost:5601](https://localhost:5601). +1. If necessary, log in. +1. **Management**. +1. **Advanced Settings**. +1. Locate `visualization:tileMap:WMSdefaults`. +1. Change `enabled` to true, and add the URL of a valid WMS map server. + + ```json + { + "enabled": true, + "url": "", + "options": { + "format": "image/png", + "transparent": true + } + } + ``` + +Map services often have licensing fees or restrictions. You are responsible for all such considerations on any map server that you specify. +{: .note } diff --git a/docs/opensearch-dashboards/notebooks.md b/docs/opensearch-dashboards/notebooks.md new file mode 100644 index 00000000..4dfa98fc --- /dev/null +++ b/docs/opensearch-dashboards/notebooks.md @@ -0,0 +1,68 @@ +--- +layout: default +title: Notebooks (experimental) +parent: OpenSearch Dashboards +nav_order: 50 +redirect_from: /docs/notebooks/ +has_children: false +--- + +# OpenSearch Dashboards notebooks (experimental) + +Notebooks have a known issue with [tenants](../../security/access-control/multi-tenancy/). If you open a notebook and can't see its visualizations, you might be under the wrong tenant, or you might not have access to the tenant at all. +{: .warning } + +An OpenSearch Dashboards notebook is an interface that lets you easily combine live visualizations and narrative text in a single notebook interface. + +With OpenSearch Dashboards notebooks, you can interactively explore data by running different visualizations and share your work with team members to collaborate on a project. + +A notebook is a document composed of two elements: OpenSearch Dashboards visualizations and paragraphs (Markdown). Choose multiple timelines to compare and contrast visualizations. + +Common use cases include creating postmortem reports, designing runbooks, building live infrastructure reports, and writing documentation. 
+ + +## Get Started with Notebooks + +To get started, choose **OpenSearch Dashboards Notebooks** in OpenSearch Dashboards. + + +### Step 1: Create a notebook + +A notebook is an interface for creating reports. + +1. Choose **Create notebook** and enter a descriptive name. +1. Choose **Create**. + +Choose **Notebook actions** to rename, duplicate, or delete a notebook. + + +### Step 2: Add a paragraph + +Paragraphs combine text and visualizations for describing data. + + +#### Add a markdown paragraph + +1. To add text, choose **Add markdown paragraph**. +1. Add rich text with markdown syntax. + +![Markdown paragraph](../../images/markdown-notebook.png) + + +#### Add a visualization paragraph + +1. To add a visualization, choose **Add OpenSearch Dashboards visualization paragraph**. +1. In **Title**, select your visualization and choose a date range. + +You can choose multiple timelines to compare and contrast visualizations. + +To run and save a paragraph, choose **Run**. + +You can perform the following actions on paragraphs: + +- Add a new paragraph to the top of a report. +- Add a new paragraph to the bottom of a report. +- Run all the paragraphs at the same time. +- Clear the outputs of all paragraphs. +- Delete all the paragraphs. +- Move paragraphs up and down. diff --git a/docs/opensearch-dashboards/plugins.md b/docs/opensearch-dashboards/plugins.md new file mode 100644 index 00000000..bd5a76be --- /dev/null +++ b/docs/opensearch-dashboards/plugins.md @@ -0,0 +1,202 @@ +--- +layout: default +title: Standalone OpenSearch Dashboards Plugin Install +parent: OpenSearch Dashboards +nav_order: 1 +--- + +# Standalone plugin install + +If you don't want to use the all-in-one installation options, you can install the various plugins for OpenSearch Dashboards individually. + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Plugin compatibility + + + + + + + + + + + + + + +
OpenSearch Dashboards version | Plugin versions
1.0.0-beta1 +
opensearchDashboardsAlerting          1.0.0-beta1
+opensearchDashboardsAnomalyDetection  1.0.0-beta1
+opensearchDashboardsGanttChart        1.0.0-beta1
+opensearchDashboardsIndexManagement   1.0.0-beta1
+opensearchDashboardsNotebooks         1.0.0-beta1
+opensearchDashboardsQueryWorkbench    1.0.0-beta1
+opensearchDashboardsReports           1.0.0-beta1
+opensearchDashboardsSecurity          1.0.0-beta1
+opensearchDashboardsTraceAnalytics    1.0.0-beta1
+
+
+ + +## Prerequisites + +- A compatible OpenSearch cluster +- The corresponding OpenSearch plugins [installed on that cluster](../../install/plugins) +- The corresponding version of [OpenSearch Dashboards](../) (e.g. OpenSearch Dashboards 1.0.0 works with OpenSearch 1.0.0) + + +## Install + +Navigate to the OpenSearch Dashboards home directory (likely `/usr/share/opensearch-dashboards`) and run the install command for each plugin. + + +#### Security OpenSearch Dashboards + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-security/opensearchSecurityOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.1.zip +``` + +This plugin provides a user interface for managing users, roles, mappings, action groups, and tenants. + + +#### Alerting OpenSearch Dashboards + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-alerting/opensearchAlertingOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip +``` + +This plugin provides a user interface for creating monitors and managing alerts. + + +#### Index State Management OpenSearch Dashboards + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-index-management/opensearchIndexManagementOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.1.zip +``` + +This plugin provides a user interface for managing policies. + + +#### Anomaly Detection OpenSearch Dashboards + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-anomaly-detection/opensearchAnomalyDetectionOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip +``` + +This plugin provides a user interface for adding detectors. 
+ + +#### Query Workbench OpenSearch Dashboards + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-query-workbench/opensearchQueryWorkbenchOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip +``` + +This plugin provides a user interface for using SQL queries to explore your data. + + +#### Trace Analytics + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-trace-analytics/opensearchTraceAnalyticsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0.zip +``` + +This plugin uses distributed trace data (indexed in OpenSearch using Data Prepper) to display latency trends, error rates, and more. + + +#### Notebooks OpenSearch Dashboards + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-notebooks/opensearchNotebooksOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0.zip +``` + +This plugin lets you combine OpenSearch Dashboards visualizations and narrative text in a single interface. 
+ + +#### Reports OpenSearch Dashboards + +```bash +# x86 Linux +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-reports/linux/x64/opensearchReportsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0-linux-x64.zip +# ARM64 Linux +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-reports/linux/arm64/opensearchReportsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0-linux-arm64.zip +# x86 Windows +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-reports/windows/x64/opensearchReportsOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.2.0-windows-x64.zip +``` + +This plugin lets you export and share reports from OpenSearch Dashboards dashboards, visualizations, and saved searches. + + +#### Gantt Chart OpenSearch Dashboards + +```bash +sudo bin/opensearch-dashboards-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/opensearch-dashboards-plugins/opensearch-gantt-chart/opensearchGanttChartOpenSearch Dashboards-{{site.opensearch_major_minor_version}}.0.0.zip +``` + +This plugin adds a new Gantt chart visualization. + + +## List installed plugins + +To check your installed plugins: + +```bash +sudo bin/opensearch-dashboards-plugin list +``` + + +## Remove plugins + +```bash +sudo bin/opensearch-dashboards-plugin remove +``` + +For certain plugins, you must also remove the "optimize" bundle. Here is a sample command for the Anomaly Detection plugin: + +```bash +sudo rm /usr/share/opensearch-dashboards/optimize/bundles/opensearch-anomaly-detection-opensearch-dashboards.* +``` + +Then restart OpenSearch Dashboards. After the removal of any plugin, OpenSearch Dashboards performs an optimize operation the next time you start it.
This operation takes several minutes even on fast machines, so be patient. + + +## Update plugins + +OpenSearch Dashboards doesn’t update plugins. Instead, you have to remove the old version and its optimized bundle, install the new version, and restart OpenSearch Dashboards: + +1. Remove the old version: + + ```bash + sudo bin/opensearch-dashboards-plugin remove + ``` + +1. Remove the optimized bundle: + + ```bash + sudo rm /usr/share/opensearch-dashboards/optimize/bundles/ + ``` + +1. Install the new version: + + ```bash + sudo bin/opensearch-dashboards-plugin install + ``` + +1. Restart OpenSearch Dashboards. + +For example, to remove and reinstall the anomaly detection plugin: + +```bash +sudo bin/opensearch-dashboards-plugin remove opensearch-anomaly-detection +sudo rm /usr/share/opensearch-dashboards/optimize/bundles/opensearch-anomaly-detection-opensearch-dashboards.* +sudo bin/opensearch-dashboards-plugin install +``` diff --git a/docs/opensearch-dashboards/reporting.md b/docs/opensearch-dashboards/reporting.md new file mode 100644 index 00000000..644d88ef --- /dev/null +++ b/docs/opensearch-dashboards/reporting.md @@ -0,0 +1,55 @@ +--- +layout: default +title: Reporting +parent: OpenSearch Dashboards +nav_order: 20 +--- + + +# Reporting + +The OpenSearch Dashboards reports feature lets you create PNG, PDF, and CSV reports. To use reports, you must have the correct permissions. For summaries of the predefined roles and the permissions they grant, see the [security plugin](../../security/access-control/users-roles/#predefined-roles). + + +## Create reports from Discover, Visualize, or Dashboard + +On-demand reports let you quickly generate a report from the current view. + +1. From the top bar, choose **Reporting**. +1. For dashboards or visualizations, choose **Download PDF** or **Download PNG**. From the Discover page, choose **Download CSV**. + + Reports generate asynchronously in the background and might take a few minutes, depending on the size of the report.
A notification appears when your report is ready to download. + +1. To create a schedule-based report, choose **Create report definition**. Then proceed to [Create reports using a definition](#create-reports-using-a-definition). This option pre-fills many of the fields for you based on the visualization, dashboard, or data you were viewing. + + +## Create reports using a definition + +Definitions let you schedule reports for periodic creation. + +1. From the left navigation panel, choose **Reporting**. +1. Choose **Create**. +1. Under **Report settings**, enter a name and optional description for your report. +1. Choose the **Report Source** (i.e. the page from which the report is generated). You can generate reports from the **Dashboard**, **Visualize**, or **Discover** pages. +1. Choose your dashboard, visualization, or saved search. Then choose a time range for the report. +1. Choose an appropriate file format for the report. +1. (Optional) Add a header or footer for the report. Headers and footers are only available for dashboard or visualization reports. +1. Under **Report trigger**, choose either **On-demand** or **Schedule**. + + For scheduled reports, choose either **Recurring** or **Cron based**. You can receive reports daily or at some other time interval. Cron expressions give you even more flexibility. See [Cron expression reference](../../alerting/cron/) for more information. + +1. Choose **Create**. + +## Troubleshooting + +### Chromium fails to launch with OpenSearch Dashboards + +While creating a report for dashboards or visualizations, you might see a `Download error`: + +![OpenSearch Dashboards reporting pop-up error message](../../images/reporting-error.png) + +This problem occurs for one of two reasons: + +1. You don't have the correct version of `headless-chrome` to match the operating system on which OpenSearch Dashboards is running.
Download the correct version of `headless-chrome` from [here](https://github.com/opensearch-project/opensearch-dashboards-reports/releases/tag/chromium-1.12.0.0). + +2. You're missing additional dependencies. Install the required dependencies for your operating system from the [additional libraries](https://github.com/opensearch-project/opensearch-dashboards-reports/blob/dev/opensearch-dashboards-reports/rendering-engine/headless-chrome/README.md#additional-libaries) section. diff --git a/docs/opensearch/bool.md b/docs/opensearch/bool.md new file mode 100644 index 00000000..3ddb77ef --- /dev/null +++ b/docs/opensearch/bool.md @@ -0,0 +1,290 @@ +--- +layout: default +title: Boolean Queries +parent: OpenSearch +nav_order: 11 +--- + +# Boolean queries + +The `bool` query lets you combine multiple search queries with boolean logic. You can use boolean logic between queries to either narrow or broaden your search results. + +The `bool` query is a go-to query because it allows you to construct an advanced query by chaining together several simple ones. + +Use the following clauses (subqueries) within the `bool` query: + +Clause | Behavior +:--- | :--- +`must` | The results must match the queries in this clause. If you have multiple queries, every single one must match. Acts as an `and` operator. +`must_not` | This is the anti-must clause. All matches are excluded from the results. Acts as a `not` operator. +`should` | The results should, but don't have to, match the queries. Each matching `should` clause increases the relevancy score. Optionally, use the `minimum_should_match` parameter to require a minimum number of `should` queries to match. The default is 1 if the `bool` query has no `must` or `filter` clause and 0 otherwise. +`filter` | Filters reduce your dataset before applying the queries. A query within a filter clause is a yes-no option, where if a document matches the query it's included in the results. Otherwise, it's not. Filter queries do not affect the relevancy score that the results are sorted by.
The results of a filter query are generally cached so they tend to run faster. Use the filter query to filter the results based on exact matches, ranges, dates, numbers, and so on. + +The structure of a `bool` query is as follows: + +```json +GET _search +{ + "query": { + "bool": { + "must": [ + {} + ], + "must_not": [ + {} + ], + "should": [ + {} + ], + "filter": {} + } + } +} +``` + +For example, assume you have the complete works of Shakespeare indexed in an OpenSearch cluster. You want to construct a single query that meets the following requirements: + +1. The `text_entry` field must contain the word `love` and should contain either `life` or `grace`. +2. The `speaker` field must not contain `ROMEO`. +3. Filter these results to the play `Romeo and Juliet` without affecting the relevancy score. + +Use the following query: + +```json +GET shakespeare/_search +{ + "query": { + "bool": { + "must": [ + { + "match": { + "text_entry": "love" + } + } + ], + "should": [ + { + "match": { + "text_entry": "life" + } + }, + { + "match": { + "text_entry": "grace" + } + } + ], + "minimum_should_match": 1, + "must_not": [ + { + "match": { + "speaker": "ROMEO" + } + } + ], + "filter": { + "term": { + "play_name": "Romeo and Juliet" + } + } + } + } +} +``` + +#### Sample output + +```json +{ + "took": 12, + "timed_out": false, + "_shards": { + "total": 4, + "successful": 4, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 11.356054, + "hits": [ + { + "_index": "shakespeare", + "_type": "_doc", + "_id": "88020", + "_score": 11.356054, + "_source": { + "type": "line", + "line_id": 88021, + "play_name": "Romeo and Juliet", + "speech_number": 19, + "line_number": "4.5.61", + "speaker": "PARIS", + "text_entry": "O love! O life! not life, but love in death!" + } + } + ] + } +} +``` + +If you want to identify which of these clauses actually caused the matching results, name each query with the `_name` parameter. 
+To add the `_name` parameter, change the field name in the `match` query to an object: + + +```json +GET shakespeare/_search +{ + "query": { + "bool": { + "must": [ + { + "match": { + "text_entry": { + "query": "love", + "_name": "love-must" + } + } + } + ], + "should": [ + { + "match": { + "text_entry": { + "query": "life", + "_name": "life-should" + } + } + }, + { + "match": { + "text_entry": { + "query": "grace", + "_name": "grace-should" + } + } + } + ], + "minimum_should_match": 1, + "must_not": [ + { + "match": { + "speaker": { + "query": "ROMEO", + "_name": "ROMEO-must-not" + } + } + } + ], + "filter": { + "term": { + "play_name": "Romeo and Juliet" + } + } + } + } +} +``` + +OpenSearch returns a `matched_queries` array that lists the queries that matched these results: + +```json +"matched_queries": [ + "love-must", + "life-should" +] +``` + +If you remove the queries not in this list, you will still see the exact same result. +By examining which `should` clause matched, you can better understand the relevancy score of the results. + +You can also construct complex boolean expressions by nesting `bool` queries. 
+For example, to find a `text_entry` field that matches (`love` OR `hate`) AND (`life` OR `grace`) in the play `Romeo and Juliet`: + +```json +GET shakespeare/_search +{ + "query": { + "bool": { + "must": [ + { + "bool": { + "should": [ + { + "match": { + "text_entry": "love" + } + }, + { + "match": { + "text_entry": "hate" + } + } + ] + } + }, + { + "bool": { + "should": [ + { + "match": { + "text_entry": "life" + } + }, + { + "match": { + "text_entry": "grace" + } + } + ] + } + } + ], + "filter": { + "term": { + "play_name": "Romeo and Juliet" + } + } + } + } +} +``` + +#### Sample output + +```json +{ + "took": 10, + "timed_out": false, + "_shards": { + "total": 2, + "successful": 2, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": 1, + "max_score": 11.37006, + "hits": [ + { + "_index": "shakespeare", + "_type": "doc", + "_id": "88020", + "_score": 11.37006, + "_source": { + "type": "line", + "line_id": 88021, + "play_name": "Romeo and Juliet", + "speech_number": 19, + "line_number": "4.5.61", + "speaker": "PARIS", + "text_entry": "O love! O life! not life, but love in death!" + } + } + ] + } +} +``` diff --git a/docs/opensearch/catapis.md b/docs/opensearch/catapis.md new file mode 100644 index 00000000..b28cd9d6 --- /dev/null +++ b/docs/opensearch/catapis.md @@ -0,0 +1,257 @@ +--- +layout: default +title: CAT API +parent: OpenSearch +nav_order: 7 +--- + +# cat API + +You can get essential statistics about your cluster in an easy-to-understand, tabular format using the compact and aligned text (CAT) API. The cat API is a human-readable interface that returns plain text instead of traditional JSON. + +Using the cat API, you can answer questions like which node is the elected master, what state is the cluster in, how many documents are in each index, and so on. + +To see the available operations in the cat API, use the following command: + +``` +GET _cat +``` + +You can also use the following string parameters with your query.
+ +Parameter | Description +:--- | :--- | +`?v` | Makes the output more verbose by adding headers to the columns. It also adds some formatting to help align each of the columns together. All examples on this page include the `v` parameter. +`?help` | Lists the default and other available headers for a given operation. +`?h` | Limits the output to specific headers. +`?format` | Outputs the result in JSON, YAML, or CBOR formats. +`?sort` | Sorts the output by the specified columns. + +To see what each column represents, use the `?v` parameter: + +``` +GET _cat/?v +``` + +To see all the available headers, use the `?help` parameter: + +``` +GET _cat/?help +``` + +To limit the output to a subset of headers, use the `?h` parameter: + +``` +GET _cat/?h=,&v +``` + +Typically, for any operation you can find out what headers are available using the `?help` parameter, and then use the `?h` parameter to limit the output to only the headers that you care about. + +--- + +#### Table of contents +1. TOC +{:toc} + +--- +## Aliases + +Lists the mapping of aliases to indices, plus routing and filtering information. + +``` +GET _cat/aliases?v +``` + +To limit the information to a specific alias, add the alias name after your query. + +``` +GET _cat/aliases/?v +``` + +## Allocation + +Lists the allocation of disk space for indices and the number of shards on each node. +Default request: +``` +GET _cat/allocation?v +``` + +## Count + +Lists the number of documents in your cluster. + +``` +GET _cat/count?v +``` + +To see the number of documents in a specific index, add the index name after your query. + +``` +GET _cat/count/?v +``` + +## Field data + +Lists the memory size used by each field per node. + +``` +GET _cat/fielddata?v +``` + +To limit the information to a specific field, add the field name after your query. 
+ +``` +GET _cat/fielddata/?v +``` + +## Health + +Lists the status of the cluster, how long the cluster has been up, the number of nodes, and other useful information that helps you analyze the health of your cluster. + +``` +GET _cat/health?v +``` + +## Indices + +Lists information related to indices⁠—how much disk space they are using, how many shards they have, their health status, and so on. + +``` +GET _cat/indices?v +``` + +To limit the information to a specific index, add the index name after your query. + +``` +GET _cat/indices/?v +``` + +## Master + +Lists information that helps identify the elected master node. + +``` +GET _cat/master?v +``` + +## Node attributes + +Lists the attributes of custom nodes. + +``` +GET _cat/nodeattrs?v +``` + +## Nodes + +Lists node-level information, including node roles and load metrics. + +A few important node metrics are `pid`, `name`, `master`, `ip`, `port`, `version`, `build`, `jdk`, along with `disk`, `heap`, `ram`, and `file_desc`. + +``` +GET _cat/nodes?v +``` + +## Pending tasks + +Lists the progress of all pending tasks, including task priority and time in queue. + +``` +GET _cat/pending_tasks?v +``` + +## Plugins + +Lists the names, components, and versions of the installed plugins. + +``` +GET _cat/plugins?v +``` + +## Recovery + +Lists all completed and ongoing index and shard recoveries. + +``` +GET _cat/recovery?v +``` + +To see only the recoveries of a specific index, add the index name after your query. + +``` +GET _cat/recovery/?v +``` + +## Repositories + +Lists all snapshot repositories and their types. + +``` +GET _cat/repositories?v +``` + +## Segments + +Lists Lucene segment-level information for each index. + +``` +GET _cat/segments?v +``` + +To see only the information about segments of a specific index, add the index name after your query. + +``` +GET _cat/segments/?v +``` + +## Shards + +Lists the state of all primary and replica shards and how they are distributed. 
+ +``` +GET _cat/shards?v +``` + +To see only the information about shards of a specific index, add the index name after your query. + +``` +GET _cat/shards/?v +``` + +## Snapshots + +Lists all snapshots for a repository. + +``` +GET _cat/snapshots/?v +``` + +## Tasks + +Lists the progress of all tasks currently running on your cluster. + +``` +GET _cat/tasks?v +``` + +## Templates + +Lists the names, patterns, order numbers, and version numbers of index templates. + +``` +GET _cat/templates?v +``` + +## Thread pool + +Lists the active, queued, and rejected threads of different thread pools on each node. + +``` +GET _cat/thread_pool?v +``` + +To limit the information to a specific thread pool, add the thread pool name after your query. + +``` +GET _cat/thread_pool/?v +``` diff --git a/docs/opensearch/cluster.md b/docs/opensearch/cluster.md new file mode 100644 index 00000000..25fdb853 --- /dev/null +++ b/docs/opensearch/cluster.md @@ -0,0 +1,341 @@ +--- +layout: default +title: Cluster Formation +parent: OpenSearch +nav_order: 2 +--- + +# Cluster formation + +Before diving into OpenSearch and searching and aggregating data, you first need to create an OpenSearch cluster. + +OpenSearch can operate as a single-node or multi-node cluster. The steps to configure both are, in general, quite similar. This page demonstrates how to create and configure a multi-node cluster, but with only a few minor adjustments, you can follow the same steps to create a single-node cluster. + +To create and deploy an OpenSearch cluster according to your requirements, it’s important to understand how node discovery and cluster formation work and what settings govern them. + +There are many ways that you can design a cluster. The following illustration shows a basic architecture. 
+ +![multi-node cluster architecture diagram](../../images/cluster.png) + +This is a four-node cluster that has one dedicated master node, one dedicated coordinating node, and two data nodes that are master-eligible and also used for ingesting data. + +The following table provides brief descriptions of the node types. + +Node type | Description | Best practices for production +:--- | :--- | :-- | +`Master` | Manages the overall operation of a cluster and keeps track of the cluster state. This includes creating and deleting indices, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes. | Three dedicated master nodes in three different zones is the right approach for almost all production use cases. This makes sure your cluster never loses quorum. Two nodes will be idle for most of the time except when one node goes down or needs some maintenance. +`Master-eligible` | Elects one node among them as the master node through a voting process. | For production clusters, make sure you have dedicated master nodes. The way to achieve a dedicated node type is to mark all other node types as false. In this case, you have to mark all the other nodes as not master-eligible. +`Data` | Stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards. These are the worker nodes of your cluster and need more disk space than any other node type. | As you add data nodes, keep them balanced between zones. For example, if you have three zones, add data nodes in multiples of three, one for each zone. We recommend using storage and RAM-heavy nodes. +`Ingest` | Preprocesses data before storing it in the cluster. Runs an ingest pipeline that transforms your data before adding it to an index. | If you plan to ingest a lot of data and run complex ingest pipelines, we recommend you use dedicated ingest nodes. 
You can also optionally offload your indexing from the data nodes so that your data nodes are used exclusively for searching and aggregating. +`Coordinating` | Delegates client requests to the shards on the data nodes, collects and aggregates the results into one final result, and sends this result back to the client. | A couple of dedicated coordinating-only nodes is appropriate to prevent bottlenecks for search-heavy workloads. We recommend using CPUs with as many cores as you can. + +By default, each node is a master-eligible, data, ingest, and coordinating node. Deciding on the number of nodes, assigning node types, and choosing the hardware for each node type depends on your use case. You must take into account factors like the amount of time you want to hold on to your data, the average size of your documents, your typical workload (indexing, searches, aggregations), your expected price-performance ratio, your risk tolerance, and so on. + +After you assess all these requirements, we recommend you use a benchmark testing tool like Rally to provision a small sample cluster and run tests with varying workloads and configurations. Compare and analyze the system and query metrics for these tests to design an optimum architecture. To get started with Rally, see the [Rally documentation](https://esrally.readthedocs.io/en/stable/). + +This page demonstrates how to work with the different node types. It assumes that you have a four-node cluster similar to the preceding illustration. + +## Prerequisites + +Before you get started, you must install and configure OpenSearch on all of your nodes. For information about the available options, see [Install and Configure](../../install/). + +After you are done, use SSH to connect to each node, and then open the `config/opensearch.yml` file. + +You can set all configurations for your cluster in this file. + +## Step 1: Name a cluster + +Specify a unique name for the cluster. 
If you don't specify a cluster name, it's set to `opensearch` by default. Setting a descriptive cluster name is important, especially if you want to run multiple clusters inside a single network. + +To specify the cluster name, change the following line: + +```yml +#cluster.name: my-application +``` + +to + +```yml +cluster.name: opensearch-cluster +``` + +Make the same change on all the nodes to make sure that they'll join to form a cluster. + + +## Step 2: Set node attributes for each node in a cluster + +After you name the cluster, set node attributes for each node in your cluster. + + +#### Master node + +Give your master node a name. If you don't specify a name, OpenSearch assigns a machine-generated name that makes the node difficult to monitor and troubleshoot. + +```yml +node.name: opensearch-master +``` + +You can also explicitly specify that this node is a master node. This is already true by default, but adding it makes it easier to identify the master node: + +```yml +node.master: true +``` + +Then make the node a dedicated master that won’t perform double-duty as a data node: + +```yml +node.data: false +``` + +Specify that this node will not be used for ingesting data: + +```yml +node.ingest: false +``` + +#### Data nodes + +Change the name of two nodes to `opensearch-d1` and `opensearch-d2`, respectively: + +```yml +node.name: opensearch-d1 +``` +```yml +node.name: opensearch-d2 +``` + +You can make them master-eligible data nodes that will also be used for ingesting data: + +```yml +node.master: true +node.data: true +node.ingest: true +``` + +You can also specify any other attributes that you'd like to set for the data nodes. 
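
Putting the preceding settings together, the `opensearch.yml` file on the first data node might look like the following sketch. It only combines values already used on this page (the cluster name from Step 1 and the data node attributes from this step), so adjust them for your own environment:

```yml
# Sketch of config/opensearch.yml on the first data node
cluster.name: opensearch-cluster   # must be identical on every node (Step 1)
node.name: opensearch-d1           # use opensearch-d2 on the second data node
node.master: true                  # master-eligible
node.data: true                    # stores and searches data
node.ingest: true                  # runs ingest pipelines
```

Keeping all node settings in one place like this makes it easier to compare the files across nodes before you start the cluster.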
+ +#### Coordinating node + +Change the name of the coordinating node to `opensearch-c1`: + +```yml +node.name: opensearch-c1 +``` + +Every node is a coordinating node by default, so to make this node a dedicated coordinating node, set `node.master`, `node.data`, and `node.ingest` to `false`: + +```yml +node.master: false +node.data: false +node.ingest: false +``` + +## Step 3: Bind a cluster to specific IP addresses + +`network.host` defines the IP address that's used to bind the node. By default, OpenSearch listens only on localhost, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6: + +```yml +network.host: [_local_, _site_] +``` + +To form a multi-node cluster, specify the IP address of the node: + +```yml +network.host: +``` + + +Make sure to configure these settings on all of your nodes. + + +## Step 4: Configure discovery hosts for a cluster + +Now that you've configured the network hosts, you need to configure the discovery hosts. + +Zen Discovery is the built-in, default mechanism that uses [unicast](https://en.wikipedia.org/wiki/Unicast) to find other nodes in the cluster. + +You can generally just add all of your master-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other master-eligible nodes, determines which one is the master, and asks to join the cluster. + +For example, for `opensearch-master` the line looks something like this: + +```yml +discovery.seed_hosts: ["", "", ""] +``` + + +## Step 5: Start the cluster + +After you set the configurations, start OpenSearch on all nodes.
+ +```bash +sudo systemctl start opensearch.service +``` + +Then check the log file to see the formation of the cluster: + +```bash +less /var/log/opensearch/opensearch-cluster.log +``` + +Perform the following `_cat` query on any node to see all the nodes formed as a cluster: + +```bash +curl -XGET https://:9200/_cat/nodes?v -u 'admin:admin' --insecure +``` + +``` +ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name +x.x.x.x 13 61 0 0.02 0.04 0.05 mi * opensearch-master +x.x.x.x 16 60 0 0.06 0.05 0.05 md - opensearch-d1 +x.x.x.x 34 38 0 0.12 0.07 0.06 md - opensearch-d2 +x.x.x.x 23 38 0 0.12 0.07 0.06 md - opensearch-c1 +``` + +To better understand and monitor your cluster, use the [cat API](../catapis/). + + +## (Advanced) Step 6: Configure shard allocation awareness or forced awareness + +If your nodes are spread across several geographical zones, you can configure shard allocation awareness to allocate all replica shards to a zone that’s different from their primary shard. + +With shard allocation awareness, if the nodes in one of your zones fail, you can be assured that your replica shards are spread across your other zones. It adds a layer of fault tolerance to ensure your data survives a zone failure beyond just individual node failures. + +To configure shard allocation awareness, add zone attributes to `opensearch-d1` and `opensearch-d2`, respectively: + +```yml +node.attr.zone: zoneA +``` +```yml +node.attr.zone: zoneB +``` + +Update the cluster settings: + +```json +PUT _cluster/settings +{ + "persistent": { + "cluster.routing.allocation.awareness.attributes": "zone" + } +} +``` + +You can use either `persistent` or `transient` settings. We recommend the `persistent` setting because it persists through a cluster reboot. Transient settings do not persist through a cluster reboot. + +Shard allocation awareness attempts to separate primary and replica shards across multiple zones.
However, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone. + +Another option is to require that primary and replica shards are never allocated to the same zone. This is called forced awareness. + +To configure forced awareness, specify all the possible values for your zone attributes: + +```json +PUT _cluster/settings +{ + "persistent": { + "cluster.routing.allocation.awareness.attributes": "zone", + "cluster.routing.allocation.awareness.force.zone.values":["zoneA", "zoneB"] + } +} +``` + +Now, if a data node fails, forced awareness does not allocate the replicas to a node in the same zone. Instead, the cluster enters a yellow state and only allocates the replicas when nodes in another zone come online. + +In our two-zone architecture, we can use allocation awareness if `opensearch-d1` and `opensearch-d2` are less than 50% utilized, so that each of them has the storage capacity to allocate replicas in the same zone. +If that is not the case, and `opensearch-d1` and `opensearch-d2` do not have the capacity to contain all primary and replica shards, we can use forced awareness. This approach helps to make sure that, in the event of a failure, OpenSearch doesn't overload your last remaining zone and lock up your cluster due to lack of storage. + +Choosing allocation awareness or forced awareness depends on how much space you might need in each zone to balance your primary and replica shards. + + +## (Advanced) Step 7: Set up a hot-warm architecture + +You can design a hot-warm architecture where you first index your data to hot nodes---fast and expensive---and after a certain period of time move it to warm nodes---slow and cheap. + +If you analyze time series data that you rarely update and want the older data to go onto cheaper storage, this architecture can be a good fit. + +This architecture helps save money on storage costs.
Rather than increasing the number of hot nodes and using fast, expensive storage, you can add warm nodes for data that you don't access as frequently. + +To configure a hot-warm storage architecture, add `temp` attributes to `opensearch-d1` and `opensearch-d2`, respectively: + +```yml +node.attr.temp: hot +``` +```yml +node.attr.temp: warm +``` + +You can set the attribute name and value to whatever you want as long as it’s consistent for all your hot and warm nodes. + +To add an index `newindex` to the hot node: + +```json +PUT newindex +{ + "settings": { + "index.routing.allocation.require.temp": "hot" + } +} +``` + +Take a look at the following shard allocation for `newindex`: + +```json +GET _cat/shards/newindex?v +index shard prirep state docs store ip node +new_index 2 p STARTED 0 230b 10.0.0.225 opensearch-d1 +new_index 2 r UNASSIGNED +new_index 3 p STARTED 0 230b 10.0.0.225 opensearch-d1 +new_index 3 r UNASSIGNED +new_index 4 p STARTED 0 230b 10.0.0.225 opensearch-d1 +new_index 4 r UNASSIGNED +new_index 1 p STARTED 0 230b 10.0.0.225 opensearch-d1 +new_index 1 r UNASSIGNED +new_index 0 p STARTED 0 230b 10.0.0.225 opensearch-d1 +new_index 0 r UNASSIGNED +``` + +In this example, all primary shards are allocated to `opensearch-d1`, which is our hot node. All replica shards are unassigned, because we're forcing this index to allocate only to hot nodes. 
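The effect of `index.routing.allocation.require.temp` can be pictured as a filter over node attributes. The following Python sketch is a toy model under stated assumptions, not OpenSearch's allocator: the `eligible_nodes` helper is hypothetical, and the node names and `temp` values are the ones used in this example.

```python
def eligible_nodes(nodes, required):
    """Return the names of nodes whose attributes match every required key/value.

    Hypothetical helper for illustration; it only mimics the idea behind
    index.routing.allocation.require.* filtering.
    """
    return [
        name
        for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in required.items())
    ]


# Node attributes as configured with node.attr.temp in this example.
nodes = {
    "opensearch-d1": {"temp": "hot"},
    "opensearch-d2": {"temp": "warm"},
}

hot_nodes = eligible_nodes(nodes, {"temp": "hot"})
print(hot_nodes)  # ['opensearch-d1']

# A replica is never placed on the same node as its primary, so with a
# single eligible node the replicas have nowhere to go.
replicas_assignable = len(hot_nodes) > 1
print(replicas_assignable)  # False
```

With only one hot node, the filter leaves a single candidate, which is why the replicas stay unassigned until a second node with `temp: hot` joins the cluster.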
+ +To add an index `oldindex` to the warm node: + +```json +PUT oldindex +{ + "settings": { + "index.routing.allocation.require.temp": "warm" + } +} +``` + +The shard allocation for `oldindex`: + +```json +GET _cat/shards/oldindex?v +index shard prirep state docs store ip node +old_index 2 p STARTED 0 230b 10.0.0.74 opensearch-d2 +old_index 2 r UNASSIGNED +old_index 3 p STARTED 0 230b 10.0.0.74 opensearch-d2 +old_index 3 r UNASSIGNED +old_index 4 p STARTED 0 230b 10.0.0.74 opensearch-d2 +old_index 4 r UNASSIGNED +old_index 1 p STARTED 0 230b 10.0.0.74 opensearch-d2 +old_index 1 r UNASSIGNED +old_index 0 p STARTED 0 230b 10.0.0.74 opensearch-d2 +old_index 0 r UNASSIGNED +``` + +In this case, all primary shards are allocated to `opensearch-d2`. Again, all replica shards are unassigned because we only have one warm node. + +A popular approach is to configure your [index templates](../index-templates/) to set the `index.routing.allocation.require.temp` value to `hot`. This way, OpenSearch stores your most recent data on your hot nodes. + +You can then use the [Index State Management (ISM)](../../ism/index/) plugin to periodically check the age of an index and specify actions to take on it. For example, when the index reaches a specific age, change the `index.routing.allocation.require.temp` setting to `warm` to automatically move your data from hot nodes to warm nodes. + + +## Next steps + +If you are using the security plugin, the previous request to `_cat/nodes?v` might have failed with an initialization error. To initialize the plugin, run `opensearch/plugins/opensearch_security/tools/securityadmin.sh`. A sample command that uses the demo certificates might look like this: + +```bash +sudo ./securityadmin.sh -cd ../securityconfig/ -icl -nhnv -cacert /etc/opensearch/root-ca.pem -cert /etc/opensearch/kirk.pem -key /etc/opensearch/kirk-key.pem -h +``` + +For full guidance around configuration options, see [Security configuration](../../security/configuration). 
diff --git a/docs/opensearch/common-parameters.md b/docs/opensearch/common-parameters.md new file mode 100644 index 00000000..f1e678a4 --- /dev/null +++ b/docs/opensearch/common-parameters.md @@ -0,0 +1,18 @@ +--- +layout: default +title: Common REST Parameters +parent: OpenSearch +nav_order: 91 +--- + +# Common REST parameters + +OpenSearch supports the following parameters for all REST operations: + +Option | Description | Example +:--- | :--- | :--- +Human-readable output | To convert output units to human-readable values (for example, `1h` for 1 hour and `1kb` for 1,024 bytes), add `?human=true` to the request URL. | `GET /_search?human=true` +Pretty result | To get back JSON responses in a readable format, add `?pretty=true` to the request URL. | `GET /_search?pretty=true` +Content type | To specify the type of content in the request body, use the `Content-Type` key name in the request header. Most operations support JSON, YAML, and CBOR formats. | `POST _scripts/ -H 'Content-Type: application/json'` +Request body in query string | If the client library does not accept a request body for non-POST requests, use the `source` query string parameter to pass the request body. Also, specify the `source_content_type` parameter with a supported media type such as `application/json`. | `GET _search?source_content_type=application/json&source={"query":{"match_all":{}}}` +Stack traces | To include the error stack trace in the response when an exception is raised, add `?error_trace=true` to the request URL. | `GET /_search?error_trace=true` diff --git a/docs/opensearch/configuration.md b/docs/opensearch/configuration.md new file mode 100755 index 00000000..48acb312 --- /dev/null +++ b/docs/opensearch/configuration.md @@ -0,0 +1,69 @@ +--- +layout: default +title: Configuration +parent: OpenSearch +nav_order: 1 +--- + +# OpenSearch configuration + +Most OpenSearch configuration can take place in the cluster settings API.
Certain operations require you to modify `opensearch.yml` and restart the cluster. + +Whenever possible, use the cluster settings API instead; `opensearch.yml` is local to each node, whereas the API applies the setting to all nodes in the cluster. + + +## Cluster settings API + +The first step in changing a setting is to view the current settings: + +``` +GET _cluster/settings?include_defaults=true +``` + +For a more concise summary of non-default settings: + +``` +GET _cluster/settings +``` + +Three categories of setting exist in the cluster settings API: persistent, transient, and default. Persistent settings, well, persist after a cluster restart. After a restart, OpenSearch clears transient settings. + +If you specify the same setting in multiple places, OpenSearch uses the following precedence: + +1. Transient settings +2. Persistent settings +3. Settings from `opensearch.yml` +4. Default settings + +To change a setting, just specify the new one as either persistent or transient. This example shows the flat settings form: + +```json +PUT /_cluster/settings +{ + "persistent" : { + "action.auto_create_index" : false + } +} +``` + +You can also use the expanded form, which lets you copy and paste from the GET response and change existing values: + +```json +PUT /_cluster/settings +{ + "persistent": { + "action": { + "auto_create_index": false + } + } +} +``` + + +--- + +## Configuration file + +You can find `opensearch.yml` in `/usr/share/opensearch/config/opensearch.yml` (Docker) or `/etc/opensearch/opensearch.yml` (RPM and DEB) on each node. + +The demo configuration includes a number of settings for the security plugin that you should modify before using OpenSearch for a production workload. To learn more, see [Security](../../security/). 
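The precedence order listed under the cluster settings API (transient, then persistent, then `opensearch.yml`, then defaults) behaves like a layered lookup. The following Python sketch only illustrates that idea: the setting values are made up, and `ChainMap` is a stand-in, not how OpenSearch stores settings.

```python
from collections import ChainMap

# Hypothetical values for illustration only; they are not read from a cluster.
transient = {}
persistent = {"action.auto_create_index": False}
opensearch_yml = {"action.auto_create_index": True, "cluster.name": "demo-cluster"}
defaults = {"action.auto_create_index": True, "cluster.name": "opensearch"}

# ChainMap searches its maps from left to right, mirroring the precedence order.
effective = ChainMap(transient, persistent, opensearch_yml, defaults)

print(effective["action.auto_create_index"])  # False: persistent overrides opensearch.yml
print(effective["cluster.name"])              # demo-cluster: opensearch.yml overrides the default
```

Adding a key to `transient` in this sketch would shadow the other three layers, which matches the order in the list.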
diff --git a/docs/opensearch/full-text.md b/docs/opensearch/full-text.md new file mode 100644 index 00000000..aa416538 --- /dev/null +++ b/docs/opensearch/full-text.md @@ -0,0 +1,437 @@ +--- +layout: default +title: Full-Text Queries +parent: OpenSearch +nav_order: 10 +--- + +# Full-text queries + +Although you can use HTTP request parameters to perform simple searches, the OpenSearch query domain-specific language (DSL) lets you specify the full range of search options. The query DSL uses the HTTP request body. Queries specified in this way have the added advantage of being more explicit in their intent and easier to tune over time. + +This page lists all full-text query types and common options. Given the sheer number of options and subtle behaviors, the best method of ensuring useful search results is to test different queries against representative indices and verify the output. + + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Match + +Creates a [boolean query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field. + +The most basic form of the query provides only a field (`title`) and a term (`wind`): + +```json +GET _search +{ + "query": { + "match": { + "title": "wind" + } + } +} +``` + +For an example that uses [curl](https://curl.haxx.se/), try: + +```bash +curl --insecure -XGET -u 'admin:admin' https://://_search \ + -H "content-type: application/json" \ + -d '{ + "query": { + "match": { + "title": "wind" + } + } + }' +``` + +The query accepts the following options. For descriptions of each, see [Options](#options). 
+ +```json +GET _search +{ + "query": { + "match": { + "title": { + "query": "wind", + "fuzziness": "AUTO", + "fuzzy_transpositions": true, + "operator": "or", + "minimum_should_match": 1, + "analyzer": "standard", + "zero_terms_query": "none", + "lenient": false, + "cutoff_frequency": 0.01, + "prefix_length": 0, + "max_expansions": 50, + "boost": 1 + } + } + } +} +``` + + +## Multi match + +Similar to [match](#match), but searches multiple fields. + +The `^` lets you "boost" certain fields. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. In the following example, a match for "wind" in the title field influences `_score` four times as much as a match in the plot field. The result is that films like *The Wind Rises* and *Gone with the Wind* are near the top of the search results, and films like *Twister* and *Sharknado*, which presumably have "wind" in their plot summaries, are near the bottom. + +```json +GET _search +{ + "query": { + "multi_match": { + "query": "wind", + "fields": ["title^4", "plot"] + } + } +} +``` + +The query accepts the following options. For descriptions of each, see [Options](#options). + +```json +GET _search +{ + "query": { + "multi_match": { + "query": "wind", + "fields": ["title^4", "description"], + "type": "most_fields", + "operator": "and", + "minimum_should_match": 3, + "tie_breaker": 0.0, + "analyzer": "standard", + "boost": 1, + "fuzziness": "AUTO", + "fuzzy_transpositions": true, + "lenient": false, + "prefix_length": 0, + "max_expansions": 50, + "auto_generate_synonyms_phrase_query": true, + "cutoff_frequency": 0.01, + "zero_terms_query": "none" + } + } +} +``` + + +## Match boolean prefix + +Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string. 
+ +```json +GET _search +{ + "query": { + "match_bool_prefix": { + "title": "rises wi" + } + } +} +``` + +The query accepts the following options. For descriptions of each, see [Options](#options). + +```json +GET _search +{ + "query": { + "match_bool_prefix": { + "title": { + "query": "rises wi", + "fuzziness": "AUTO", + "fuzzy_transpositions": true, + "max_expansions": 50, + "prefix_length": 0, + "operator": "or", + "minimum_should_match": 2, + "analyzer": "standard" + } + } + } +} +``` + + +## Match phrase + +Creates a [phrase query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms. + +```json +GET _search +{ + "query": { + "match_phrase": { + "title": "the wind rises" + } + } +} +``` + +The query accepts the following options. For descriptions of each, see [Options](#options). + +```json +GET _search +{ + "query": { + "match_phrase": { + "title": { + "query": "wind rises the", + "slop": 3, + "analyzer": "standard", + "zero_terms_query": "none" + } + } + } +} +``` + + +## Match phrase prefix + +Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string. + +```json +GET _search +{ + "query": { + "match_phrase_prefix": { + "title": "the wind ri" + } + } +} +``` + +The query accepts the following options. For descriptions of each, see [Options](#options). + +```json +GET _search +{ + "query": { + "match_phrase_prefix": { + "title": { + "query": "the wind ri", + "analyzer": "standard", + "max_expansions": 50, + "slop": 3 + } + } + } +} +``` + + +## Common terms + +The common terms query separates the query string into high- and low-frequency terms based on number of occurrences on the shard. Low-frequency terms are weighed more heavily in the results, and high-frequency terms are considered only for documents that already matched one or more low-frequency terms. 
In that sense, you can think of this query as having a built-in, ever-changing list of stop words. + +```json +GET _search +{ + "query": { + "common": { + "title": { + "query": "the wind rises" + } + } + } +} +``` + +The query accepts the following options. For descriptions of each, see [Options](#options). + +```json +GET _search +{ + "query": { + "common": { + "title": { + "query": "the wind rises", + "cutoff_frequency": 0.002, + "low_freq_operator": "or", + "boost": 1, + "analyzer": "standard", + "minimum_should_match": { + "low_freq" : 2, + "high_freq" : 3 + } + } + } + } +} +``` + + +## Query string + +The query string query splits text based on operators and analyzes each individually. + +If you search using the HTTP request parameters (i.e. `_search?q=wind`), OpenSearch creates a query string query. +{: .note } + +```json +GET _search +{ + "query": { + "query_string": { + "query": "the wind AND (rises OR rising)" + } + } +} +``` + +The query accepts the following options. For descriptions of each, see [Options](#options). + +```json +GET _search +{ + "query": { + "query_string": { + "query": "the wind AND (rises OR rising)", + "default_field": "title", + "type": "best_fields", + "fuzziness": "AUTO", + "fuzzy_transpositions": true, + "fuzzy_max_expansions": 50, + "fuzzy_prefix_length": 0, + "minimum_should_match": 1, + "default_operator": "or", + "analyzer": "standard", + "lenient": false, + "boost": 1, + "allow_leading_wildcard": true, + "enable_position_increments": true, + "phrase_slop": 3, + "max_determinized_states": 10000, + "time_zone": "-08:00", + "quote_field_suffix": "", + "quote_analyzer": "standard", + "analyze_wildcard": false, + "auto_generate_synonyms_phrase_query": true + } + } +} +``` + + +## Simple query string + +The simple query string query is like the query string query, but it lets advanced users specify many arguments directly in the query string. The query discards any invalid portions of the query string. 
+ +```json +GET _search +{ + "query": { + "simple_query_string": { + "query": "\"rises wind the\"~4 | *ising~2", + "fields": ["title"] + } + } +} +``` + +Special character | Behavior +:--- | :--- +`+` | Acts as the `and` operator. +`|` | Acts as the `or` operator. +`*` | Acts as a wildcard. +`""` | Wraps several terms into a phrase. +`()` | Wraps a clause for precedence. +`~n` | When used after a term (e.g. `wnid~3`), sets `fuzziness`. When used after a phrase, sets `slop`. See [Options](#options). +`-` | Negates the term. + +The query accepts the following options. For descriptions of each, see [Options](#options). + +```json +GET _search +{ + "query": { + "simple_query_string": { + "query": "\"rises wind the\"~4 | *ising~2", + "fields": ["title"], + "flags": "ALL", + "fuzzy_transpositions": true, + "fuzzy_max_expansions": 50, + "fuzzy_prefix_length": 0, + "minimum_should_match": 1, + "default_operator": "or", + "analyzer": "standard", + "lenient": false, + "quote_field_suffix": "", + "analyze_wildcard": false, + "auto_generate_synonyms_phrase_query": true + } + } +} +``` + + +## Match all + +Matches all documents. Can be useful for testing. + +```json +GET _search +{ + "query": { + "match_all": {} + } +} +``` + + +## Match none + +Matches no documents. Rarely useful. + +```json +GET _search +{ + "query": { + "match_none": {} + } +} +``` + + +## Options + +Option | Valid values | Description +:--- | :--- | :--- +`allow_leading_wildcard` | Boolean | Whether `*` and `?` are allowed as the first character of a search term. The default is true. +`analyze_wildcard` | Boolean | Whether OpenSearch should attempt to analyze wildcard terms. Some analyzers do a poor job at this task, so the default is false. +`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, , fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. 
The `stop` analyzer, for example, removes stop words (e.g. "an," "but," "this") from the query string. +`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false). +`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. The default is 1.0. +`cutoff_frequency` | Between `0.0` and `1.0` or a positive integer | This value lets you define high and low frequency terms based on number of occurrences in the index. Numbers between 0 and 1 are treated as a percentage. For example, 0.10 is 10%. This value means that if a word occurs within the search field in more than 10% of the documents on the shard, OpenSearch considers the word "high frequency" and deemphasizes it when calculating search score.<br /><br />Because this setting is *per shard*, testing its impact on search results can be challenging unless a cluster has many documents. +`enable_position_increments` | Boolean | When true, result queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. The default is true. +`fields` | String array | The list of fields to search (e.g. `"fields": ["title^4", "description"]`). If unspecified, defaults to the `index.query.default_field` setting, which defaults to `["*"]`. +`flags` | String | A `|`-delimited string of [flags](#simple-query-string) to enable (e.g. `AND|OR|NOT`). The default is `ALL`. +`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases. +`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n").<br /><br />If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases. +`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false. +`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common Terms](#common-terms) queries and `operator` in this table. +`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000. +`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50. +`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common Terms](#common-terms) queries.
+`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match. +`phrase_slop` | `0` (default) or a positive integer | See `slop`. +`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness. +`quote_field_suffix` | String | This option lets you search different fields depending on whether terms are wrapped in quotes. For example, if `quote_field_suffix` is `".exact"` and you search for `"lightly"` (in quotes) in the `title` field, OpenSearch searches the `title.exact` field. This second field might use a different type (e.g. `keyword` rather than `text`) or a different analyzer. The default is null. +`rewrite` | `constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N` | Determines how OpenSearch rewrites and scores multi-term queries. The default is `constant_score`. +`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match." +`tie_breaker` | `0.0` (default) to `1.0` | Changes the way OpenSearch scores searches. For example, a `type` of `best_fields` typically uses the highest score from any one field. If you specify a `tie_breaker` value between 0.0 and 1.0, the score changes to highest score + `tie_breaker` * score for all other matching fields. 
If you specify a value of 1.0, OpenSearch adds together the scores for all matching fields (effectively defeating the purpose of `best_fields`). +`time_zone` | UTC offset | The time zone to use (e.g. `-08:00`) if the query string contains a date range (e.g. `"query": "wind rises release_date:[2012-01-01 TO 2014-01-01]"`). The default is `UTC`. +`type` | `best_fields, most_fields, cross_fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`. +`zero_terms_query` | `none, all` | If the analyzer removes all terms from a query string, whether to match no documents (default) or all documents. For example, the `stop` analyzer removes all terms from the string "an but this." diff --git a/docs/opensearch/index-alias.md b/docs/opensearch/index-alias.md new file mode 100644 index 00000000..53289d51 --- /dev/null +++ b/docs/opensearch/index-alias.md @@ -0,0 +1,203 @@ +--- +layout: default +title: Index Aliases +parent: OpenSearch +nav_order: 4 +--- + +# Index alias + +An alias is a virtual index name that can point to one or more indices. + +If your data is spread across multiple indices, rather than keeping track of which indices to query, you can create an alias and query it instead. + +For example, if you’re storing logs into indices based on the month and you frequently query the logs for the previous two months, you can create a `last_2_months` alias and update the indices it points to each month. + +Because you can change the indices an alias points to at any time, referring to indices using aliases in your applications allows you to reindex your data without any downtime. + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Create aliases + +To create an alias, use a POST request: + +```json +POST _aliases +``` + +Use the `actions` array in the request body to specify the list of actions that you want to perform.
This command creates an alias named `alias1` and adds `index-1` to this alias: + +```json +POST _aliases +{ + "actions": [ + { + "add": { + "index": "index-1", + "alias": "alias1" + } + } + ] +} +``` + +You should see the following response: + +```json +{ + "acknowledged": true +} +``` + +If this request fails, make sure the index that you're adding to the alias already exists. + +To check if `alias1` refers to `index-1`, run the following command: + +```json +GET alias1 +``` + +## Add or remove indices + +You can perform multiple actions in the same `_aliases` operation. +For example, the following command removes `index-1` and adds `index-2` to `alias1`: + +```json +POST _aliases +{ + "actions": [ + { + "remove": { + "index": "index-1", + "alias": "alias1" + } + }, + { + "add": { + "index": "index-2", + "alias": "alias1" + } + } + ] +} +``` + +The `add` and `remove` actions occur atomically, which means that at no point will `alias1` point to both `index-1` and `index-2`. + +You can also add indices based on an index pattern: + +```json +POST _aliases +{ + "actions": [ + { + "add": { + "index": "index*", + "alias": "alias1" + } + } + ] +} +``` + +## Manage aliases + +To list the mapping of aliases to indices, run the following command: + +```json +GET _cat/aliases?v +``` + +#### Sample response + +```json +alias index filter routing.index routing.search +alias1 index-1 * - - +``` + +To check which indices an alias points to, run the following command: + +```json +GET _alias/alias1 +``` + +#### Sample response + +```json +{ + "index-2": { + "aliases": { + "alias1": {} + } + } +} +``` + +Conversely, to find which alias points to a specific index, run the following command: + +```json +GET /index-2/_alias/* +``` + +To check if an alias exists, run the following command: + +```json +HEAD /alias1/_alias/ +``` + +## Add aliases at index creation + +You can add an index to an alias as you create the index: + +```json +PUT index-1 +{ + "aliases": { + "alias1": {} + } +} 
+``` + +## Create filtered aliases + +You can create a filtered alias to access a subset of documents or fields from the underlying indices. + +This command adds `index-1` to `alias1` with a filter, so the alias exposes only the documents whose `timestamp` field matches the given value: + +```json +POST _aliases +{ + "actions": [ + { + "add": { + "index": "index-1", + "alias": "alias1", + "filter": { + "term": { + "timestamp": "1574641891142" + } + } + } + } + ] +} +``` + +## Index alias options + +You can specify the options shown in the following table. + +Option | Valid values | Description | Required +:--- | :--- | :--- | :--- +`index` | String | The name of the index that the alias points to. | Yes +`alias` | String | The name of the alias. | No +`filter` | Object | Add a filter to the alias. | No +`routing` | String | Limit search to an associated shard value. You can specify `search_routing` and `index_routing` independently. | No +`is_write_index` | Boolean | Specify the index that accepts any write operations to the alias. If the alias points to multiple indices and none of them has this value set, no write operations are allowed through the alias. | No diff --git a/docs/opensearch/index-data.md b/docs/opensearch/index-data.md new file mode 100644 index 00000000..c6bcaaf0 --- /dev/null +++ b/docs/opensearch/index-data.md @@ -0,0 +1,274 @@ +--- +layout: default +title: Index Data +parent: OpenSearch +nav_order: 3 +--- + +# Index data + +You index data using the OpenSearch REST API. Two APIs exist: the index API and the `_bulk` API. + +For situations in which new data arrives incrementally (for example, customer orders from a small business), you might use the index API to add documents individually as they arrive. For situations in which the flow of data is less frequent (for example, weekly updates to a marketing website), you might prefer to generate a file and send it to the `_bulk` API. For large numbers of documents, lumping requests together and using the `_bulk` API offers superior performance. If your documents are enormous, however, you might need to index them individually.
+ + +## Introduction to indexing + +Before you can search data, you must *index* it. Indexing is the method by which search engines organize data for fast retrieval. The resulting structure is called, fittingly, an index. + +In OpenSearch, the basic unit of data is a JSON *document*. Within an index, OpenSearch identifies each document using a unique *ID*. + +A request to the index API looks like the following: + +```json +PUT /_doc/ +{ "A JSON": "document" } +``` + +A request to the `_bulk` API looks a little different, because you specify the index and ID in the bulk data: + +```json +POST _bulk +{ "index": { "_index": "", "_id": "" } } +{ "A JSON": "document" } + +``` + +Bulk data must conform to a specific format, which requires a newline character (`\n`) at the end of every line, including the last line. This is the basic format: + +``` +Action and metadata\n +Optional document\n +Action and metadata\n +Optional document\n + +``` + +The document is optional, because `delete` actions do not require a document. The other actions (`index`, `create`, and `update`) all require a document. If you specifically want the action to fail if the document already exists, use the `create` action instead of the `index` action. +{: .note } + +To index bulk data using the `curl` command, navigate to the folder where you have your file saved and run the following command: + +```bash +curl -H "Content-Type: application/x-ndjson" -XPOST https://localhost:9200/data/_bulk -u 'admin:admin' --insecure --data-binary "@data.json" +``` + +If any one of the actions in the `_bulk` API fails, OpenSearch continues to execute the other actions. Examine the `items` array in the response to figure out what went wrong. The entries in the `items` array are in the same order as the actions specified in the request. + +OpenSearch features automatic index creation when you add a document to an index that doesn't already exist.
It also features automatic ID generation if you don't specify an ID in the request. This simple example automatically creates the movies index, indexes the document, and assigns it a unique ID: + +```json +POST movies/_doc +{ "title": "Spirited Away" } +``` + +Automatic ID generation has a clear downside: because the indexing request didn't specify a document ID, you can't easily update the document at a later time. Also, if you run this request 10 times, OpenSearch indexes this document as 10 different documents with unique IDs. To specify an ID of 1, use the following request, and note the use of PUT instead of POST: + +```json +PUT movies/_doc/1 +{ "title": "Spirited Away" } +``` + +Because you must specify an ID, if you run this command 10 times, you still have just one document indexed with the `_version` field incremented to 10. + +Indices default to one primary shard and one replica. If you want to specify non-default settings, create the index before adding documents: + +```json +PUT more-movies +{ "settings": { "number_of_shards": 6, "number_of_replicas": 2 } } +``` + +## Naming restrictions for indices + +OpenSearch indices have the following naming restrictions: + +- All letters must be lowercase. +- Index names can't begin with `_` (underscore) or `-` (hyphen). +- Index names can't contain spaces, commas, or the following characters: + + `:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, or `<` + +## Read data + +After you index a document, you can retrieve it by sending a GET request to the same endpoint that you used for indexing: + +```json +GET movies/_doc/1 + +{ + "_index" : "movies", + "_type" : "_doc", + "_id" : "1", + "_version" : 1, + "_seq_no" : 0, + "_primary_term" : 1, + "found" : true, + "_source" : { + "title" : "Spirited Away" + } +} +``` + +You can see the document in the `_source` object. If the document is not found, the `found` key is `false` and the `_source` object is not part of the response. 
+ +To retrieve multiple documents with a single command, use the `_mget` operation. +The format for retrieving multiple documents is similar to the `_bulk` operation, where you must specify the index and ID in the request body: + +```json +GET _mget +{ + "docs": [ + { + "_index": "", + "_id": "" + }, + { + "_index": "", + "_id": "" + } + ] +} +``` + +To only return specific fields in a document: + +```json +GET _mget +{ + "docs": [ + { + "_index": "", + "_id": "", + "_source": "field1" + }, + { + "_index": "", + "_id": "", + "_source": "field2" + } + ] +} +``` + +To check if a document exists: + +```json +HEAD movies/_doc/ +``` + +If the document exists, you get back a `200 OK` response, and if it doesn't, you get back a `404 - Not Found` error. + +## Update data + +To update existing fields or to add new fields, send a POST request to the `_update` operation with your changes in a `doc` object: + +```json +POST movies/_update/1 +{ + "doc": { + "title": "Castle in the Sky", + "genre": ["Animation", "Fantasy"] + } +} +``` + +Note the updated `title` field and new `genre` field: + +```json +GET movies/_doc/1 + +{ + "_index" : "movies", + "_type" : "_doc", + "_id" : "1", + "_version" : 2, + "_seq_no" : 1, + "_primary_term" : 1, + "found" : true, + "_source" : { + "title" : "Castle in the Sky", + "genre" : [ + "Animation", + "Fantasy" + ] + } +} +``` + +The document also has an incremented `_version` field. Use this field to keep track of how many times a document is updated. + +POST requests make partial updates to documents. To altogether replace a document, use a PUT request: + +```json +PUT movies/_doc/1 +{ + "title": "Spirited Away" +} +``` + +The document with ID of 1 will contain only the `title` field, because the entire document will be replaced with the document indexed in this PUT request. + +Use the `upsert` object to conditionally update documents based on whether they already exist. 
Here, if the document exists, its `title` field changes to `Castle in the Sky`. If it doesn't, OpenSearch indexes the document in the `upsert` object. + +```json +POST movies/_update/2 +{ + "doc": { + "title": "Castle in the Sky" + }, + "upsert": { + "title": "Only Yesterday", + "genre": ["Animation", "Fantasy"], + "date": 1993 + } +} +``` + +#### Sample response + +```json +{ + "_index" : "movies", + "_type" : "_doc", + "_id" : "2", + "_version" : 2, + "result" : "updated", + "_shards" : { + "total" : 2, + "successful" : 1, + "failed" : 0 + }, + "_seq_no" : 3, + "_primary_term" : 1 +} +``` + +Each update operation for a document has a unique combination of the `_seq_no` and `_primary_term` values. + +OpenSearch first writes your updates to the primary shard and then sends this change to all the replica shards. An uncommon issue can occur if multiple users of your OpenSearch-based application make updates to existing documents in the same index. In this situation, another user can read and update a document from a replica before it receives your update from the primary shard. Your update operation then ends up updating an older version of the document. In the best case, you and the other user make the same changes, and the document remains accurate. In the worst case, the document now contains out-of-date information. + +To prevent this situation, pass the `_seq_no` and `_primary_term` values that you last read as the `if_seq_no` and `if_primary_term` query parameters of the request: + +```json +POST movies/_update/2?if_seq_no=3&if_primary_term=1 +{ + "doc": { + "title": "Castle in the Sky", + "genre": ["Animation", "Fantasy"] + } +} +``` + +If the document is updated after you retrieved it, the `_seq_no` and `_primary_term` values are different and your update operation fails with a `409 - Conflict` error. + +When using the `_bulk` API, specify the `_seq_no` and `_primary_term` values within the action metadata.
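
The following sketch shows one way to build such a bulk body in Python. It assumes that the action metadata accepts the keys `if_seq_no` and `if_primary_term`, mirroring the query parameters used in the single-document request; the helper name `bulk_update_body` is hypothetical and the sketch only builds the request text without sending it:

```python
import json


def bulk_update_body(index, doc_id, doc, seq_no, primary_term):
    """Build an NDJSON body for one conditional `update` action.

    Hypothetical helper: the action metadata carries `if_seq_no` and
    `if_primary_term` (assumed key names, mirroring the query parameters).
    """
    action = {
        "update": {
            "_index": index,
            "_id": doc_id,
            "if_seq_no": seq_no,
            "if_primary_term": primary_term,
        }
    }
    lines = [json.dumps(action), json.dumps({"doc": doc})]
    # Bulk data needs a newline after every line, including the last one.
    return "\n".join(lines) + "\n"


body = bulk_update_body("movies", "2", {"title": "Castle in the Sky"}, 3, 1)
print(body, end="")
```

Serializing each line with `json.dumps` keeps every action and document on a single line, which the newline-delimited bulk format requires.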
+ +## Delete data + +To delete a document from an index, use a DELETE request: + +```json +DELETE movies/_doc/1 +``` + +The DELETE operation increments the `_version` field. If you add the document back to the same ID, the `_version` field increments again. This behavior occurs because OpenSearch deletes the document `_source`, but retains its metadata. diff --git a/docs/opensearch/index-templates.md b/docs/opensearch/index-templates.md new file mode 100644 index 00000000..3d0a6f94 --- /dev/null +++ b/docs/opensearch/index-templates.md @@ -0,0 +1,206 @@ +--- +layout: default +title: Index Templates +parent: OpenSearch +nav_order: 5 +--- + +# Index template + +Index templates let you initialize new indices with predefined mappings and settings. For example, if you continuously index log data, you can define an index template so that all of these indices have the same number of shards and replicas. + +OpenSearch switched from `_template` to `_index_template` in version 7.8. Use `_template` for older versions of OpenSearch. +{: .note } + +--- + +#### Table of contents +1. 
TOC +{:toc} + + +--- + +## Create template + +To create an index template, use a POST request: + +```json +POST _index_template +``` + +This command creates a template named `daily_logs`, applies it to any new index whose name matches the wildcard pattern `logs-2020-01-*`, and adds each matching index to the `my_logs` alias: + +```json +PUT _index_template/daily_logs +{ + "index_patterns": [ + "logs-2020-01-*" + ], + "template": { + "aliases": { + "my_logs": {} + }, + "settings": { + "number_of_shards": 2, + "number_of_replicas": 1 + }, + "mappings": { + "properties": { + "timestamp": { + "type": "date", + "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" + }, + "value": { + "type": "double" + } + } + } + } +} +``` + +You should see the following response: + +```json +{ + "acknowledged": true +} +``` + +If you create an index named `logs-2020-01-01`, you can see that it has the mappings and settings from the template: + +```json +PUT logs-2020-01-01 +GET logs-2020-01-01 +``` + +```json +{ + "logs-2020-01-01": { + "aliases": { + "my_logs": {} + }, + "mappings": { + "properties": { + "timestamp": { + "type": "date", + "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" + }, + "value": { + "type": "double" + } + } + }, + "settings": { + "index": { + "creation_date": "1578107970779", + "number_of_shards": "2", + "number_of_replicas": "1", + "uuid": "U1vMDMOHSAuS2IzPcPHpOA", + "version": { + "created": "7010199" + }, + "provided_name": "logs-2020-01-01" + } + } + } +} +``` + +Any additional indices that match this pattern---`logs-2020-01-02`, `logs-2020-01-03`, and so on---will inherit the same mappings and settings.
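
To get a feel for which index names a pattern such as `logs-2020-01-*` covers, the following sketch approximates the matching with Python's `fnmatch` module. This is an assumption made for illustration, not how OpenSearch evaluates `index_patterns` internally, and the helper name `matches_template` is hypothetical:

```python
from fnmatch import fnmatchcase


def matches_template(index_name, index_patterns):
    """Return True if the index name matches any wildcard pattern.

    Approximation for illustration: `fnmatchcase` also treats `?` and
    `[...]` as special characters, which template patterns may not.
    """
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


patterns = ["logs-2020-01-*"]
candidates = [
    "logs-2020-01-01",
    "logs-2020-01-31",
    "logs-2020-02-01",
    "my-logs-2020-01-01",
]
for name in candidates:
    print(name, matches_template(name, patterns))
```

The pattern must match the whole index name, so a name that merely contains `logs-2020-01-` somewhere in the middle is not covered.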
+ +## Retrieve template + +To list all index templates: + +```json +GET _cat/templates +``` + +To find a template by its name: + +```json +GET _index_template/daily_logs +``` + +To get a list of all your templates: + +```json +GET _index_template +``` + +To get a list of all templates that match a pattern: + +```json +GET _index_template/daily* +``` + +To check if a specific template exists: + +```json +HEAD _index_template/ +``` + +## Configure multiple templates + +You can create multiple index templates for your indices. If the index name matches more than one template, OpenSearch merges all mappings and settings from all matching templates and applies them to the index. + +The settings from the more recently created index templates override the settings of older index templates. So, you can first define a few common settings in a generic template that can act as a catch-all and then add more specialized settings as required. + +An even better approach is to explicitly specify template priority using the `priority` parameter. OpenSearch applies templates with lower priority numbers first and then overrides them with templates that have higher priority numbers. + +For example, say you have the following two templates that both match the `logs-2020-01-02` index and there’s a conflict in the `number_of_shards` field: + +#### Template 1 + +```json +PUT _index_template/template-01 +{ + "index_patterns": [ + "logs*" + ], + "priority": 0, + "template": { + "settings": { + "number_of_shards": 2 + } + } +} +``` + +#### Template 2 + +```json +PUT _index_template/template-02 +{ + "index_patterns": [ + "logs-2020-01-*" + ], + "priority": 1, + "template": { + "settings": { + "number_of_shards": 3 + } + } +} +``` + +Because `template-02` has a higher `priority` value, it takes precedence over `template-01`. The `logs-2020-01-02` index has a `number_of_shards` value of 3.
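
The resolution rule described in this section can be sketched in a few lines of Python. This models only the behavior as stated here (lower priority applied first, higher priority overriding), not OpenSearch's actual implementation. The function name `resolve_settings` and the use of `fnmatch` for pattern matching are assumptions:

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name, templates):
    """Merge settings of matching templates, lowest priority first.

    Hypothetical model of the rule described above: later (higher
    priority) templates overwrite conflicting keys from earlier ones.
    """
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    merged = {}
    for template in sorted(matching, key=lambda t: t.get("priority", 0)):
        merged.update(template["template"].get("settings", {}))
    return merged


templates = [
    {
        "index_patterns": ["logs*"],
        "priority": 0,
        "template": {"settings": {"number_of_shards": 2}},
    },
    {
        "index_patterns": ["logs-2020-01-*"],
        "priority": 1,
        "template": {"settings": {"number_of_shards": 3}},
    },
]

print(resolve_settings("logs-2020-01-02", templates))
print(resolve_settings("logs-2019-12-31", templates))
```

An index that matches only the catch-all `logs*` pattern keeps that template's value, while an index that matches both takes the value from the higher-priority template.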
+ +## Delete template + +You can delete an index template using its name, as shown in the following command: + +```json +DELETE _index_template/daily_logs +``` + +## Index template options + +You can specify the options shown in the following table: + +Option | Type | Description | Required +:--- | :--- | :--- | :--- +`priority` | `Number` | Specify the priority of the index template. | No +`create` | `Boolean` | Specify whether this index template should replace an existing one. | No diff --git a/docs/opensearch/index.md b/docs/opensearch/index.md new file mode 100644 index 00000000..6f7c4255 --- /dev/null +++ b/docs/opensearch/index.md @@ -0,0 +1,95 @@ +--- +layout: default +title: OpenSearch +nav_order: 10 +has_children: true +has_toc: false +--- + +# Introduction to OpenSearch + +OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indices, boost fields, rank results by score, sort results by field, and aggregate results. + +Unsurprisingly, people often use OpenSearch as the backend for a search application---think [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#What_software_is_used_to_run_Wikipedia?) or an online store. It offers excellent performance and can scale up and down as the needs of the application grow or shrink. + +An equally popular, but less obvious use case is log analytics, in which you take the logs from an application, feed them into OpenSearch, and use the rich search and visualization functionality to identify issues. For example, a malfunctioning web server might throw a 500 error 0.5% of the time, which can be hard to notice unless you have a real-time graph of all HTTP status codes that the server has thrown in the past four hours. 
You can use [OpenSearch Dashboards](../opensearch-dashboards/) to build these sorts of visualizations from data in OpenSearch. + + +## Clusters and nodes + +Its distributed design means that you interact with OpenSearch *clusters*. Each cluster is a collection of one or more *nodes*, servers that store your data and process search requests. + +You can run OpenSearch locally on a laptop---its system requirements are minimal---but you can also scale a single cluster to hundreds of powerful machines in a data center. + +In a single node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. For more information on setting node types, see [Cluster Formation](cluster/). + + +## Indices and documents + +OpenSearch organizes data into *indices*. Each index is a collection of JSON *documents*. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to [JSON](https://www.json.org/). A simple JSON document for a movie might look like this: + +```json +{ + "title": "The Wind Rises", + "release_date": "2013-07-20" +} +``` + +When you add the document to an index, OpenSearch adds some metadata, such as the unique document *ID*: + +```json +{ + "_index": "", + "_type": "_doc", + "_id": "", + "_version": 1, + "_source": { + "title": "The Wind Rises", + "release_date": "2013-07-20" + } +} +``` + +Indices also contain mappings and settings: + +- A *mapping* is the collection of *fields* that documents in the index have. In this case, those fields are `title` and `release_date`. +- Settings include data like the index name, creation date, and number of shards. 
+ +Older versions of OpenSearch used arbitrary document *types*, but indices created in current versions of OpenSearch should use a single type named `_doc`. Store different document types in different indices. +{: .note } + + +## Primary and replica shards + +OpenSearch splits indices into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually. + +By default, OpenSearch creates a *replica* shard for each *primary* shard. If you split your index into ten shards, for example, OpenSearch also creates ten replica shards. These replica shards act as backups in the event of a node failure---OpenSearch distributes replica shards to different nodes than their corresponding primary shards---but they also improve the speed and rate at which the cluster can process search requests. You might specify more than one replica per index for a search-heavy workload. + +Despite being a piece of an OpenSearch index, each shard is actually a full Lucene index---confusing, we know. This detail is important, though, because each instance of Lucene is a running process that consumes CPU and memory. More shards is not necessarily better. Splitting a 400 GB index into 1,000 shards, for example, would place needless strain on your cluster. A good rule of thumb is to keep shard size between 10--50 GB. + + +## REST API + +You interact with OpenSearch clusters using the REST API, which offers a lot of flexibility. You can use clients like [curl](https://curl.haxx.se/) or any programming language that can send HTTP requests. To add a JSON document to an OpenSearch index (i.e. 
index a document), you send an HTTP request: + +```json +PUT https://://_doc/ +{ + "title": "The Wind Rises", + "release_date": "2013-07-20" +} +``` + +To run a search for the document: + +``` +GET https://://_search?q=wind +``` + +To delete the document: + +``` +DELETE https://://_doc/ +``` + +You can change most OpenSearch settings using the REST API, modify indices, check the health of the cluster, get statistics---almost everything. diff --git a/docs/opensearch/logs.md b/docs/opensearch/logs.md new file mode 100644 index 00000000..2797999d --- /dev/null +++ b/docs/opensearch/logs.md @@ -0,0 +1,175 @@ +--- +layout: default +title: Logs +parent: OpenSearch +nav_order: 20 +--- + +# Logs + +The OpenSearch logs include valuable information for monitoring cluster operations and troubleshooting issues. The location of the logs differs based on the installation type: + +- On Docker, OpenSearch writes most logs to the console and stores the remainder in `opensearch/logs/`. The tarball installation also uses `opensearch/logs/`. +- On the RPM and Debian installations, OpenSearch writes logs to `/var/log/opensearch/`. + +Logs are available as `.log` (plain text) and `.json` files. + + +## Application logs + +For its application logs, OpenSearch uses [Apache Log4j 2](https://logging.apache.org/log4j/2.x/) and its built-in log levels (from least to most severe) of TRACE, DEBUG, INFO, WARN, ERROR, and FATAL. The default OpenSearch log level is INFO. + +Rather than changing the default log level (`logger.level`), you change the log level for individual OpenSearch modules: + +```json +PUT /_cluster/settings +{ + "persistent" : { + "logger.org.opensearch.index.reindex" : "DEBUG" + } +} +``` + +The easiest way to identify modules is not from the logs, which abbreviate the path (for example, `o.o.i.r`), but from the [OpenSearch source code](https://github.com/opensearch-project/opensearch/tree/master/server/src/main/java/org/opensearch). 
+{: .tip } + +After this sample change, OpenSearch emits much more detailed logs during reindex operations: + +``` +[2019-10-18T16:52:51,184][DEBUG][o.o.i.r.TransportReindexAction] [node1] [1626]: starting +[2019-10-18T16:52:51,186][DEBUG][o.o.i.r.TransportReindexAction] [node1] executing initial scroll against [some-index] +[2019-10-18T16:52:51,291][DEBUG][o.o.i.r.TransportReindexAction] [node1] scroll returned [3] documents with a scroll id of [DXF1Z==] +[2019-10-18T16:52:51,292][DEBUG][o.o.i.r.TransportReindexAction] [node1] [1626]: got scroll response with [3] hits +[2019-10-18T16:52:51,294][DEBUG][o.o.i.r.WorkerBulkByScrollTaskState] [node1] [1626]: preparing bulk request for [0s] +[2019-10-18T16:52:51,297][DEBUG][o.o.i.r.TransportReindexAction] [node1] [1626]: preparing bulk request +[2019-10-18T16:52:51,299][DEBUG][o.o.i.r.TransportReindexAction] [node1] [1626]: sending [3] entry, [222b] bulk request +[2019-10-18T16:52:51,310][INFO ][o.e.c.m.MetaDataMappingService] [node1] [some-new-index/R-j3adc6QTmEAEb-eAie9g] create_mapping [_doc] +[2019-10-18T16:52:51,383][DEBUG][o.o.i.r.TransportReindexAction] [node1] [1626]: got scroll response with [0] hits +[2019-10-18T16:52:51,384][DEBUG][o.o.i.r.WorkerBulkByScrollTaskState] [node1] [1626]: preparing bulk request for [0s] +[2019-10-18T16:52:51,385][DEBUG][o.o.i.r.TransportReindexAction] [node1] [1626]: preparing bulk request +[2019-10-18T16:52:51,386][DEBUG][o.o.i.r.TransportReindexAction] [node1] [1626]: finishing without any catastrophic failures +[2019-10-18T16:52:51,395][DEBUG][o.o.i.r.TransportReindexAction] [node1] Freed [1] contexts +``` + +The DEBUG and TRACE levels are extremely verbose. If you enable either one to troubleshoot a problem, disable it after you finish. + +There are other ways to change log levels: + +1. 
Add lines to `opensearch.yml`: + + ```yml + logger.org.opensearch.index.reindex: debug + ``` + + Modifying `opensearch.yml` makes the most sense if you want to reuse your logging configuration across multiple clusters or debug startup issues with a single node. + +2. Modify `log4j2.properties`: + + ``` + # Define a new logger with unique ID of reindex + logger.reindex.name = org.opensearch.index.reindex + # Set the log level for that ID + logger.reindex.level = debug + ``` + + This approach is extremely flexible, but requires familiarity with the [Log4j 2 property file syntax](https://logging.apache.org/log4j/2.x/manual/configuration.html#Properties). In general, the other options offer a simpler configuration experience. + + If you examine the default `log4j2.properties` file in the configuration directory, you can see a few OpenSearch-specific variables: + + ``` + appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n + appender.rolling_old.fileName = ${sys:os.logs.base_path}${sys:file.separator}${sys:os.logs.cluster_name}.log + ``` + + - `${sys:os.logs.base_path}` is the directory for logs (for example, `/var/log/opensearch/`). + - `${sys:os.logs.cluster_name}` is the name of the cluster. + - `[%node_name]` is the name of the node. + + +## Slow logs + +OpenSearch has two *slow logs*, logs that help you identify performance issues: the search slow log and the indexing slow log. + +These logs rely on thresholds to define what qualifies as a "slow" search or indexing operation. For example, you might decide that a query is slow if it takes more than 15 seconds to complete. Unlike application logs, which you configure for modules, you configure slow logs for indices. 
By default, both logs are disabled (all thresholds are set to `-1`): + +```json +GET /_settings?include_defaults=true + +{ + "indexing": { + "slowlog": { + "reformat": "true", + "threshold": { + "index": { + "warn": "-1", + "trace": "-1", + "debug": "-1", + "info": "-1" + } + }, + "source": "1000", + "level": "TRACE" + } + }, + "search": { + "slowlog": { + "level": "TRACE", + "threshold": { + "fetch": { + "warn": "-1", + "trace": "-1", + "debug": "-1", + "info": "-1" + }, + "query": { + "warn": "-1", + "trace": "-1", + "debug": "-1", + "info": "-1" + } + } + } + } +} +``` + +To enable these logs, increase one or more thresholds: + +```json +PUT /_settings +{ + "indexing": { + "slowlog": { + "threshold": { + "index": { + "warn": "15s", + "trace": "750ms", + "debug": "3s", + "info": "10s" + } + }, + "source": "500", + "level": "INFO" + } + } +} +``` + +In this example, OpenSearch logs indexing operations that take 15 seconds or longer at the WARN level and operations that take between 10 and 14.*x* seconds at the INFO level. If you set a threshold to 0 seconds, OpenSearch logs all operations, which can be useful for testing that slow logs are indeed enabled. + +- `reformat` specifies whether to log the document `_source` field as a single line (`true`) or let it span multiple lines (`false`). +- `source` is the number of characters of the document `_source` field to log. +- `level` is the minimum log level to include. + +A line from `opensearch_index_indexing_slowlog.log` might look like this: + +``` +node1 | [2019-10-24T19:48:51,012][WARN][i.i.s.index] [node1] [some-index/i86iF5kyTyy-PS8zrdDeAA] took[3.4ms], took_millis[3], type[_doc], id[1], routing[], source[{"title":"Your Name", "Director":"Makoto Shinkai"}] +``` + +Slow logs can consume considerable disk space if you set thresholds or levels too low. Consider enabling them temporarily for troubleshooting or performance tuning. To disable slow logs, return all thresholds to `-1`. 
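
As a rough model of how the indexing thresholds in the example above map a duration to a log level, consider the following sketch. It is an illustration under stated assumptions, not OpenSearch's implementation: the function name `slowlog_level` is hypothetical, durations are given in seconds, and the separate `level` setting (the minimum level to include) is ignored:

```python
# Thresholds from the example above, in seconds, most severe first.
THRESHOLDS = [("WARN", 15.0), ("INFO", 10.0), ("DEBUG", 3.0), ("TRACE", 0.75)]


def slowlog_level(took_seconds, thresholds=THRESHOLDS):
    """Return the most severe level whose threshold the operation reached.

    Hypothetical model: a threshold of -1 is treated as disabled, and
    None means the operation is not written to the slow log at all.
    """
    for level, limit in thresholds:
        if limit >= 0 and took_seconds >= limit:
            return level
    return None


for took in (20, 12, 5, 1, 0.1):
    print(took, slowlog_level(took))
```

Checking the most severe threshold first means each operation is reported once, at the highest level it qualifies for.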
+ + +## Deprecation logs + +Deprecation logs record when clients make deprecated API calls to your cluster. These logs can help you identify and fix issues prior to upgrading to a new major version. By default, OpenSearch logs deprecated API calls at the WARN level, which works well for almost all use cases. If desired, configure `logger.deprecation.level` using `_cluster/settings`, `opensearch.yml`, or `log4j2.properties`. diff --git a/docs/opensearch/popular-api.md b/docs/opensearch/popular-api.md new file mode 100644 index 00000000..6db1cb5d --- /dev/null +++ b/docs/opensearch/popular-api.md @@ -0,0 +1,190 @@ +--- +layout: default +title: Popular APIs +parent: OpenSearch +nav_order: 98 +--- + +# Popular APIs + +This page contains sample requests for popular OpenSearch APIs. + + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Create index with non-default settings + +```json +PUT my-logs +{ + "settings": { + "number_of_shards": 4, + "number_of_replicas": 2 + }, + "mappings": { + "properties": { + "title": { + "type": "text" + }, + "year": { + "type": "integer" + } + } + } +} +``` + + +## Index a document with a random ID + +```json +POST my-logs/_doc +{ + "title": "Your Name", + "year": "2016" +} +``` + + +## Index a document with a specific ID + +```json +PUT my-logs/_doc/1 +{ + "title": "Weathering with You", + "year": "2019" +} +``` + + +## Index several documents at once + +The blank line at the end of the request body is required. If you omit the `_id` field, OpenSearch generates a random ID. 
+ +```json +POST _bulk +{ "index": { "_index": "my-logs", "_id": "2" } } +{ "title": "The Garden of Words", "year": 2013 } +{ "index" : { "_index": "my-logs", "_id" : "3" } } +{ "title": "5 Centimeters Per Second", "year": 2007 } + +``` + + +## List all indices + +``` +GET _cat/indices?v +``` + + +## Open or close all indices that match a pattern + +``` +POST my-logs*/_open +POST my-logs*/_close +``` + + +## Delete all indices that match a pattern + +``` +DELETE my-logs* +``` + + +## Create an index alias + +This request creates the alias `my-logs-today` for the index `my-logs-2019-11-13`. + +``` +PUT my-logs-2019-11-13/_alias/my-logs-today +``` + + +## List all aliases + +``` +GET _cat/aliases?v +``` + + +## Search an index or all indices that match a pattern + +``` +GET my-logs/_search?q=test +GET my-logs*/_search?q=test +``` + + +## Get cluster settings, including defaults + +``` +GET _cluster/settings?include_defaults=true +``` + + +## Change disk watermarks (or other cluster settings) + +```json +PUT _cluster/settings +{ + "transient": { + "cluster.routing.allocation.disk.watermark.low": "80%", + "cluster.routing.allocation.disk.watermark.high": "85%" + } +} +``` + + +## Get cluster health + +``` +GET _cluster/health +``` + + +## List nodes in the cluster + +``` +GET _cat/nodes?v +``` + + +## Get node statistics + +``` +GET _nodes/stats +``` + + +## Get snapshots in a repository + +``` +GET _snapshot/my-repository/_all +``` + + +## Take a snapshot + +``` +PUT _snapshot/my-repository/my-snapshot +``` + + +## Restore a snapshot + +```json +POST _snapshot/my-repository/my-snapshot/_restore +{ + "indices": "-.opensearch_security", + "include_global_state": false +} +``` diff --git a/docs/opensearch/reindex-data.md b/docs/opensearch/reindex-data.md new file mode 100644 index 00000000..68069c74 --- /dev/null +++ b/docs/opensearch/reindex-data.md @@ -0,0 +1,285 @@ +--- +layout: default +title: Reindex Data +parent: OpenSearch +nav_order: 6 +--- + +# Reindex data + 
+After creating an index, if you need to make an extensive change such as adding a new field to every document or combining multiple indices to form a new one, rather than deleting your index, making the change offline, and then indexing your data all over again, you can use the `reindex` operation. + +With the `reindex` operation, you can copy all or a subset of documents that you select through a query to another index. Reindex is a `POST` operation. In its most basic form, you specify a source index and a destination index. + +Reindexing can be an expensive operation depending on the size of your source index. We recommend you disable replicas in your destination index by setting `number_of_replicas` to `0` and re-enable them once the reindex process is complete. +{: .note } + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Reindex all documents + +You can copy all documents from one index to another. + +You first need to create a destination index with your desired field mappings and settings or you can copy the ones from your source index: + +```json +PUT destination +{ + "mappings":{ + "Add in your desired mappings" + }, + "settings":{ + "Add in your desired settings" + } +} +``` + +This `reindex` command copies all the documents from a source index to a destination index: + +```json +POST _reindex +{ + "source":{ + "index":"source" + }, + "dest":{ + "index":"destination" + } +} +``` + +If the destination index is not already created, the `reindex` operation creates a new destination index with default configurations. + +## Reindex from a remote cluster + +You can copy documents from an index in a remote cluster. Use the `remote` option to specify the remote hostname and the required login credentials. 
+ +This command reaches out to a remote cluster, logs in with the username and password, and copies all the documents from the source index in that remote cluster to the destination index in your local cluster: + +```json +POST _reindex +{ + "source":{ + "remote":{ + "host":"https://:9200", + "username":"YOUR_USERNAME", + "password":"YOUR_PASSWORD" + }, + "index":"source" + }, + "dest":{ + "index":"destination" + } +} +``` + +You can specify the following options: + +Option | Valid values | Description | Required +:--- | :--- | :--- | :--- +`host` | String | The REST endpoint of the remote cluster. | Yes +`username` | String | The username to log in to the remote cluster. | No +`password` | String | The password to log in to the remote cluster. | No +`socket_timeout` | Time Unit | The wait time for socket reads (default 30s). | No +`connect_timeout` | Time Unit | The wait time for remote connection timeouts (default 30s). | No + + +## Reindex a subset of documents + +You can copy only a specific set of documents that match a search query. + +This command copies only a subset of documents matched by a query operation to the destination index: + +```json +POST _reindex +{ + "source":{ + "index":"source", + "query": { + "match": { + "field_name": "text" + } + } + }, + "dest":{ + "index":"destination" + } +} +``` + +For a list of all query operations, see [Full-text queries](../full-text/). + +## Combine one or more indices + +You can combine documents from one or more indices by adding the source indices as a list. + +This command copies all documents from two source indices to one destination index: + +```json +POST _reindex +{ + "source":{ + "index":[ + "source_1", + "source_2" + ] + }, + "dest":{ + "index":"destination" + } +} +``` +Make sure the number of shards for your source and destination indices is the same. + +## Reindex only unique documents + +You can copy only documents missing from a destination index by setting the `op_type` option to `create`.
+In this case, if a document with the same ID already exists, the operation ignores the one from the source index. +To ignore all version conflicts of documents, set the `conflicts` option to `proceed`. + +```json +POST _reindex +{ + "conflicts":"proceed", + "source":{ + "index":"source" + }, + "dest":{ + "index":"destination", + "op_type":"create" + } +} +``` + +## Reindex sorted documents + +You can copy certain documents after sorting specific fields in the document. + +This command copies the last 10 documents based on the `timestamp` field: + +```json +POST _reindex +{ + "size":10, + "source":{ + "index":"source", + "sort":{ + "timestamp":"desc" + } + }, + "dest":{ + "index":"destination" + } +} +``` + +## Transform documents during reindexing + +You can transform your data during the reindexing process using the `script` option. +We recommend Painless for scripting in OpenSearch. + +This command runs the source index through a Painless script that increments a `number` field inside an `account` object before copying it to the destination index: + +```json +POST _reindex +{ + "source":{ + "index":"source" + }, + "dest":{ + "index":"destination" + }, + "script":{ + "lang":"painless", + "source":"ctx._source.account.number++" + } +} +``` + +You can also specify an ingest pipeline to transform your data during the reindexing process. + +You would first have to create a pipeline with `processors` defined. You have a number of different `processors` available to use in your ingest pipeline. + +Here's a sample ingest pipeline that defines a `split` processor that splits a `text` field on whitespace and stores the result in a new `word` field. The `script` processor is a Painless script that finds the length of the `word` field and stores it in a new `word_count` field. The `remove` processor removes the `test` field. + +```json +PUT _ingest/pipeline/pipeline-test +{ +"description": "Splits the text field into a list.
Computes the length of the 'word' field and stores it in a new 'word_count' field. Removes the 'test' field.", +"processors": [ + { + "split": { + "field": "text", + "separator": "\\s+", + "target_field": "word" + } + }, + { + "script": { + "lang": "painless", + "source": "ctx.word_count = ctx.word.length" + } + }, + { + "remove": { + "field": "test" + } + } +] +} +``` + +After creating a pipeline, you can use the `reindex` operation: + +```json +POST _reindex +{ + "source": { + "index": "source" + }, + "dest": { + "index": "destination", + "pipeline": "pipeline-test" + } +} +``` + +## Update documents in current index + +To update your data in your current index itself without copying it to a different index, use the `update_by_query` operation. + +The `update_by_query` operation is a `POST` operation that you can perform on a single index at a time. + +```json +POST /_update_by_query +``` + +If you run this command with no parameters, it increments the version number for all documents in the index. + +## Source index options + +You can specify the following options for your source index: + +Option | Valid values | Description | Required +:--- | :--- | :--- | :--- +`index` | String | The name of the source index. You can provide multiple source indices as a list. | Yes +`max_docs` | Integer | The maximum number of documents to reindex. | No +`query` | Object | The search query to use for the reindex operation. | No +`size` | Integer | The number of documents to reindex. | No +`slice` | String | Specify manual or automatic slicing to parallelize reindexing. | No +`sort` | List | Sort specific fields in the document before reindexing. | No + +## Destination index options + +You can specify the following options for your destination index: + +Option | Valid values | Description | Required +:--- | :--- | :--- | :--- +`index` | String | The name of the destination index. | Yes +`version_type` | Enum | The version type for the indexing operation.
Valid values: internal, external, external_gt, external_gte. | No diff --git a/docs/opensearch/search-template.md b/docs/opensearch/search-template.md new file mode 100644 index 00000000..4432fd54 --- /dev/null +++ b/docs/opensearch/search-template.md @@ -0,0 +1,416 @@ +--- +layout: default +title: Search Templates +parent: OpenSearch +nav_order: 11 +--- + +# Search templates + +You can convert your full-text queries into a search template to accept user input and dynamically insert it into your query. + +For example, if you use OpenSearch as a backend search engine for your application or website, you can take in user queries from a search bar or a form field and pass them as parameters into a search template. That way, the syntax to create OpenSearch queries is abstracted from your end users. + +When you're writing code to convert user input into OpenSearch queries, you can simplify your code with search templates. If you need to add fields to your search query, you can just modify the template without making changes to your code. + +Search templates use the Mustache language. For a list of all syntax options, see the [Mustache manual](http://mustache.github.io/mustache.5.html). +{: .note } + +## Create search templates + +A search template has two components: the query and the parameters. Parameters are user-inputted values that get placed into variables. Variables are represented with double braces in Mustache notation. When encountering a variable like `{% raw %}{{var}}{% endraw %}` in the query, OpenSearch goes to the `params` section, looks for a parameter called `var`, and replaces it with the specified value. + +You can code your application to ask your user what they want to search for and then plug in that value in the `params` object at runtime. + +This command defines a search template to find a play by its name. 
The `{% raw %}{{play_name}}{% endraw %}` in the query is replaced by the value `Henry IV`: + +```json +GET _search/template +{ + "source": { + "query": { + "match": { + "play_name": "{% raw %}{{play_name}}{% endraw %}" + } + } + }, + "params": { + "play_name": "Henry IV" + } +} +``` + +This template runs the search on your entire cluster. +To run this search on a specific index, add the index name to the request: + +```json +GET shakespeare/_search/template +``` + +Specify the `from` and `size` parameters: + +```json +GET _search/template +{ + "source": { + "from": "{% raw %}{{from}}{% endraw %}", + "size": "{% raw %}{{size}}{% endraw %}", + "query": { + "match": { + "play_name": "{% raw %}{{play_name}}{% endraw %}" + } + } + }, + "params": { + "play_name": "Henry IV", + "from": 10, + "size": 10 + } +} +``` + +To improve the search experience, you can define defaults so that the user doesn’t have to specify every possible parameter. If the parameter is not defined in the `params` section, OpenSearch uses the default value. + +The syntax for defining the default value for a variable `var` is as follows: + +```json +{% raw %}{{var}}{{^var}}default value{{/var}}{% endraw %} +``` + +This command sets the defaults for `from` as 10 and `size` as 10: + +```json +GET _search/template +{ + "source": { + "from": "{% raw %}{{from}}{{^from}}10{{/from}}{% endraw %}", + "size": "{% raw %}{{size}}{{^size}}10{{/size}}{% endraw %}", + "query": { + "match": { + "play_name": "{% raw %}{{play_name}}{% endraw %}" + } + } + }, + "params": { + "play_name": "Henry IV" + } +} +``` + + +## Save and execute search templates + +After the search template works the way you want it to, you can save the source of that template as a script, making it reusable for different input parameters. 
+ +When saving the search template as a script, you need to specify the `lang` parameter as `mustache`: + +```json +POST _scripts/play_search_template +{ + "script": { + "lang": "mustache", + "source": { + "from": "{% raw %}{{from}}{{^from}}0{{/from}}{% endraw %}", + "size": "{% raw %}{{size}}{{^size}}10{{/size}}{% endraw %}", + "query": { + "match": { + "play_name": "{% raw %}{{play_name}}{% endraw %}" + } + } + }, + "params": { + "play_name": "Henry IV" + } + } +} +``` + +Now you can reuse the stored template with different input values by referring to its `id`: + +```json +GET _search/template +{ + "id": "play_search_template", + "params": { + "play_name": "Henry IV", + "from": 0, + "size": 1 + } +} +``` + +#### Sample output + +```json +{ + "took": 7, + "timed_out": false, + "_shards": { + "total": 6, + "successful": 6, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3205, + "relation": "eq" + }, + "max_score": 3.641852, + "hits": [ + { + "_index": "shakespeare", + "_type": "_doc", + "_id": "4", + "_score": 3.641852, + "_source": { + "type": "line", + "line_id": 5, + "play_name": "Henry IV", + "speech_number": 1, + "line_number": "1.1.2", + "speaker": "KING HENRY IV", + "text_entry": "Find we a time for frighted peace to pant," + } + } + ] + } +} +``` + +If you have a stored template and want to validate it, use the `render` operation: + +```json +POST _render/template +{ + "id": "play_search_template", + "params": { + "play_name": "Henry IV" + } +} +``` + +#### Sample output + +```json +{ + "template_output": { + "from": "0", + "size": "10", + "query": { + "match": { + "play_name": "Henry IV" + } + } + } +} +``` + +## Advanced parameter conversion with search templates + +Mustache offers many syntax options for transposing input parameters into a query. +You can specify conditions, run loops, join arrays, convert arrays to JSON, and so on.
+ +### Conditions + +Use the section tag in Mustache to represent conditions: + +```json +{% raw %}{{#var}}var{{/var}}{% endraw %} +``` + +When `var` is a boolean value, this syntax acts as an `if` condition. The `{% raw %}{{#var}}{% endraw %}` and `{% raw %}{{/var}}{% endraw %}` tags insert the values placed between them only if `var` evaluates to `true`. + +Section tags make the request body invalid JSON, so you must write your query as a string instead. + +This command includes the `size` parameter in the query only when the `limit` parameter is set to `true`. +In the following example, the `limit` parameter is `true`, so the `size` parameter is activated. As a result, you get back only two documents. + +```json +GET _search/template +{ + "source": "{% raw %}{ {{#limit}} \"size\": \"{{size}}\", {{/limit}} \"query\":{\"match\":{\"play_name\": \"{{play_name}}\"}}}{% endraw %}", + "params": { + "play_name": "Henry IV", + "limit": true, + "size": 2 + } +} +``` + +You can also design an `if-else` condition. +This command sets `size` to `2` if `limit` is `true`. Otherwise, it sets `size` to `10`. + +```json +GET _search/template +{ + "source": "{% raw %}{ {{#limit}} \"size\": \"2\", {{/limit}} {{^limit}} \"size\": \"10\", {{/limit}} \"query\":{\"match\":{\"play_name\": \"{{play_name}}\"}}}{% endraw %}", + "params": { + "play_name": "Henry IV", + "limit": true + } +} +``` + +### Loops + +You can also use the section tag to implement a foreach loop: + +``` +{% raw %}{{#var}}{{.}}{{/var}}{% endraw %} +``` + +When `var` is an array, the search template iterates through it and creates a `terms` query.
+ +```json +GET _search/template +{ + "source": "{% raw %}{\"query\":{\"terms\":{\"play_name\":[\"{{#play_name}}\",\"{{.}}\",\"{{/play_name}}\"]}}}{% endraw %}", + "params": { + "play_name": [ + "Henry IV", + "Othello" + ] + } +} +``` + +This template is rendered as: + +```json +GET _search/template +{ + "source": { + "query": { + "terms": { + "play_name": [ + "Henry IV", + "Othello" + ] + } + } + } +} +``` + +### Join + +You can use the `join` tag to concatenate values of an array (separated by commas): + +```json +GET _search/template +{ + "source": { + "query": { + "match": { + "text_entry": "{% raw %}{{#join}}{{text_entry}}{{/join}}{% endraw %}" + } + } + }, + "params": { + "text_entry": [ + "To be", + "or not to be" + ] + } +} +``` + +Renders as: + +```json +GET _search/template +{ + "source": { + "query": { + "match": { + "text_entry": "{0=To be, 1=or not to be}" + } + } + } +} +``` + +### Convert to JSON + +You can use the `toJson` tag to convert parameters to their JSON representation: + +```json +GET _search/template +{ + "source": "{\"query\":{\"bool\":{\"must\":[{\"terms\": {\"text_entries\": {% raw %}{{#toJson}}text_entries{{/toJson}}{% endraw %} }}] }}}", + "params": { + "text_entries": [ + { "term": { "text_entry" : "love" } }, + { "term": { "text_entry" : "soldier" } } + ] + } +} +``` + +Renders as: + +```json +GET _search/template +{ + "source": { + "query": { + "bool": { + "must": [ + { + "terms": { + "text_entries": [ + { + "term": { + "text_entry": "love" + } + }, + { + "term": { + "text_entry": "soldier" + } + } + ] + } + } + ] + } + } + } +} +``` + +## Multiple search templates + +You can bundle multiple search templates and send them to your OpenSearch cluster in a single request using the `msearch` operation. +This saves network round trip time, so you get back the response more quickly as compared to independent requests. 
+ +```json +GET _msearch/template +{"index":"shakespeare"} +{"id":"if_search_template","params":{"play_name":"Henry IV","limit":false,"size":2}} +{"index":"shakespeare"} +{"id":"play_search_template","params":{"play_name":"Henry IV"}} +``` + +## Manage search templates + +To list all scripts, run the following command: + +```json +GET _cluster/state/metadata?pretty&filter_path=**.stored_scripts +``` + +To retrieve a specific search template, run the following command: + +```json +GET _scripts/ +``` + +To delete a search template, run the following command: + +```json +DELETE _scripts/ +``` + +--- diff --git a/docs/opensearch/snapshot-restore.md b/docs/opensearch/snapshot-restore.md new file mode 100644 index 00000000..b49f5b4e --- /dev/null +++ b/docs/opensearch/snapshot-restore.md @@ -0,0 +1,385 @@ +--- +layout: default +title: Take and Restore Snapshots +parent: OpenSearch +nav_order: 30 +--- + +# Take and restore snapshots + +Snapshots are backups of a cluster's indices and state. State includes cluster settings, node information, index settings, and shard allocation. + +Snapshots have two main uses: + +- **Recovering from failure** + + For example, if cluster health goes red, you might restore the red indices from a snapshot. + +- **Migrating from one cluster to another** + + For example, if you're moving from a proof-of-concept to a production cluster, you might take a snapshot of the former and restore it on the latter. + + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## About snapshots + +Snapshots aren't instantaneous. They take time to complete and do not represent perfect point-in-time views of the cluster. While a snapshot is in progress, you can still index documents and make other requests to the cluster, but new documents and updates to existing documents generally aren't included in the snapshot. The snapshot includes primary shards as they existed when OpenSearch initiated the snapshot. 
Depending on the size of your snapshot thread pool, different shards might be included in the snapshot at slightly different times. + +OpenSearch snapshots are incremental, meaning that they only store data that has changed since the last successful snapshot. The difference in disk usage between frequent and infrequent snapshots is often minimal. + +In other words, taking hourly snapshots for a week (for a total of 168 snapshots) might not use much more disk space than taking a single snapshot at the end of the week. Also, the more frequently you take snapshots, the less time they take to complete. Some OpenSearch users take snapshots as often as every half hour. + +If you need to delete a snapshot, be sure to use the OpenSearch API rather than navigating to the storage location and purging files. Incremental snapshots from a cluster often share a lot of the same data; when you use the API, OpenSearch only removes data that no other snapshot is using. +{: .tip } + + +## Register repository + +Before you can take a snapshot, you have to "register" a snapshot repository. A snapshot repository is just a storage location: a shared file system, Amazon S3, Hadoop Distributed File System (HDFS), Azure Storage, etc. + + +### Shared file system + +1. To use a shared file system as a snapshot repository, add it to `opensearch.yml`: + + ```yml + path.repo: ["/mnt/snapshots"] + ``` + + On the RPM and Debian installs, you can then mount the file system. If you're using the Docker install, add the file system to each node in `docker-compose.yml` before starting the cluster: + + ```yml + volumes: + - /Users/jdoe/snapshots:/mnt/snapshots + ``` + +1. 
Then register the repository using the REST API: + + ```json + PUT _snapshot/my-fs-repository + { + "type": "fs", + "settings": { + "location": "/mnt/snapshots" + } + } + ``` + + If the request is successful, the response from OpenSearch is minimal: + + ```json + { + "acknowledged": true + } + ``` + +You probably only need to specify `location`, but the following table summarizes the options: + +Setting | Description +:--- | :--- +location | The shared file system for snapshots. Required. +chunk_size | Breaks large files into chunks during snapshot operations (e.g. `64mb`, `1gb`), which is important for cloud storage providers and far less important for shared file systems. Default is `null` (unlimited). Optional. +compress | Whether to compress metadata files. This setting does not affect data files, which might already be compressed, depending on your index settings. Default is `false`. Optional. +max_restore_bytes_per_sec | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional. +max_snapshot_bytes_per_sec | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional. +readonly | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional. + + +### Amazon S3 + +1. To use an Amazon S3 bucket as a snapshot repository, install the `repository-s3` plugin on all nodes: + + ```bash + sudo ./bin/opensearch-plugin install repository-s3 + ``` + + If you're using the Docker installation, see [Customize the Docker image](../../install/docker/#customize-the-docker-image). 
Your `Dockerfile` should look something like this: + + ``` + FROM amazon/opensearch:{{site.opensearch_version}} + + ENV AWS_ACCESS_KEY_ID + ENV AWS_SECRET_ACCESS_KEY + + # Optional + ENV AWS_SESSION_TOKEN + + RUN /usr/share/opensearch/bin/opensearch-plugin install --batch repository-s3 + RUN /usr/share/opensearch/bin/opensearch-keystore create + + RUN echo $AWS_ACCESS_KEY_ID | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.access_key + RUN echo $AWS_SECRET_ACCESS_KEY | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.secret_key + + # Optional + RUN echo $AWS_SESSION_TOKEN | /usr/share/opensearch/bin/opensearch-keystore add --stdin s3.client.default.session_token + ``` + + After the Docker cluster starts, skip to step 7. + +1. Add your AWS access and secret keys to the OpenSearch keystore: + + ```bash + sudo ./bin/opensearch-keystore add s3.client.default.access_key + sudo ./bin/opensearch-keystore add s3.client.default.secret_key + ``` + +1. (Optional) If you're using temporary credentials, add your session token: + + ```bash + sudo ./bin/opensearch-keystore add s3.client.default.session_token + ``` + +1. (Optional) If you connect to the internet through a proxy, add those credentials: + + ```bash + sudo ./bin/opensearch-keystore add s3.client.default.proxy.username + sudo ./bin/opensearch-keystore add s3.client.default.proxy.password + ``` + +1. (Optional) Add other settings to `opensearch.yml`: + + ```yml + s3.client.default.disable_chunked_encoding: false # Disables chunked encoding for compatibility with some storage services, but you probably don't need to change this value. + s3.client.default.endpoint: s3.amazonaws.com # S3 has alternate endpoints, but you probably don't need to change this value. + s3.client.default.max_retries: 3 # number of retries if a request fails + s3.client.default.path_style_access: false # whether to use the deprecated path-style bucket URLs. 
+ # You probably don't need to change this value, but for more information, see https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html#path-style-access. + s3.client.default.protocol: https # http or https + s3.client.default.proxy.host: my-proxy-host # the hostname for your proxy server + s3.client.default.proxy.port: 8080 # port for your proxy server + s3.client.default.read_timeout: 50s # the S3 connection timeout + s3.client.default.use_throttle_retries: true # whether the client should wait a progressively longer amount of time (exponential backoff) between each successive retry + ``` + +1. If you changed `opensearch.yml`, you must restart each node in the cluster. Otherwise, you only need to reload secure cluster settings: + + ``` + POST _nodes/reload_secure_settings + ``` + +1. Create an S3 bucket if you don't already have one. To take snapshots, you need permissions to access the bucket. The following IAM policy is an example of those permissions: + + ```json + { + "Version": "2012-10-17", + "Statement": [{ + "Action": [ + "s3:*" + ], + "Effect": "Allow", + "Resource": [ + "arn:aws:s3:::your-bucket", + "arn:aws:s3:::your-bucket/*" + ] + }] + } + ``` + +1. Register the repository using the REST API: + + ```json + PUT _snapshot/my-s3-repository + { + "type": "s3", + "settings": { + "bucket": "my-s3-bucket", + "base_path": "my/snapshot/directory" + } + } + ``` + +You probably don't need to specify anything but `bucket` and `base_path`, but the following table summarizes the options: + +Setting | Description +:--- | :--- +base_path | The path within the bucket where you want to store snapshots (e.g. `my/snapshot/directory`). Optional. If not specified, snapshots are stored in the bucket root. +bucket | Name of the S3 bucket. Required. +buffer_size | The threshold beyond which chunks (of `chunk_size`) should be broken into pieces (of `buffer_size`) and sent to S3 using a different API. 
Default is the smaller of two values: 100 MB or 5% of the Java heap. Valid values are between `5mb` and `5gb`. We don't recommend changing this option. +canned_acl | S3 has several [canned ACLs](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) that the `repository-s3` plugin can add to objects as it creates them in S3. Default is `private`. Optional. +chunk_size | Breaks files into chunks during snapshot operations (e.g. `64mb`, `1gb`), which is important for cloud storage providers and far less important for shared file systems. Default is `1gb`. Optional. +client | When specifying client settings (e.g. `s3.client.default.access_key`), you can use a string other than `default` (e.g. `s3.client.backup-role.access_key`). If you used an alternate name, change this value to match. Default and recommended value is `default`. Optional. +compress | Whether to compress metadata files. This setting does not affect data files, which might already be compressed, depending on your index settings. Default is `false`. Optional. +max_restore_bytes_per_sec | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional. +max_snapshot_bytes_per_sec | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional. +readonly | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional. +server_side_encryption | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting Data Using Server-Side Encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is false. Optional. +storage_class | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshots files. Default is `standard`. 
Do not use the `glacier` and `deep_archive` storage classes. Optional. + + +## Take snapshots + +You specify two pieces of information when you create a snapshot: + +- Name of your snapshot repository +- Name for the snapshot + +The following snapshot includes all indices and the cluster state: + +```json +PUT _snapshot/my-repository/1 +``` + +You can also add a request body to include or exclude certain indices or specify other settings: + +```json +PUT _snapshot/my-repository/2 +{ + "indices": "opensearch-dashboards*,my-index*,-my-index-2016", + "ignore_unavailable": true, + "include_global_state": false, + "partial": false +} +``` + +Setting | Description +:--- | :--- +indices | The indices you want to include in the snapshot. You can use `,` to create a list of indices, `*` to specify an index pattern, and `-` to exclude certain indices. Don't put spaces between items. Default is all indices. +ignore_unavailable | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the snapshot. Default is false. +include_global_state | Whether to include cluster state in the snapshot. Default is true. +partial | Whether to allow partial snapshots. Default is false, which fails the entire snapshot if one or more shards fails to store. + +If you request the snapshot immediately after taking it, you might see something like this: + +```json +GET _snapshot/my-repository/2 +{ + "snapshots": [{ + "snapshot": "2", + "version": "6.5.4", + "indices": [ + "opensearch_dashboards_sample_data_ecommerce", + "my-index", + "opensearch_dashboards_sample_data_logs", + "opensearch_dashboards_sample_data_flights" + ], + "include_global_state": true, + "state": "IN_PROGRESS", + ... + }] +} +``` + +Note that the snapshot is still in progress. If you want to wait for the snapshot to finish before continuing, add the `wait_for_completion` parameter to your request. 
Snapshots can take a while to complete, so consider whether or not this option fits your use case: + +``` +PUT _snapshot/my-repository/3?wait_for_completion=true +``` + +Snapshots have the following states: + +State | Description +:--- | :--- +SUCCESS | The snapshot successfully stored all shards. +IN_PROGRESS | The snapshot is currently running. +PARTIAL | At least one shard failed to store successfully. Can only occur if you set `partial` to `true` when taking the snapshot. +FAILED | The snapshot encountered an error and stored no data. +INCOMPATIBLE | The snapshot is incompatible with the version of OpenSearch running on this cluster. See [Conflicts and compatibility](#conflicts-and-compatibility). + +You can't take a snapshot if one is currently in progress. To check the status: + +``` +GET _snapshot/_status +``` + + +## Restore snapshots + +The first step in restoring a snapshot is retrieving existing snapshots. To see all snapshot repositories: + +``` +GET _snapshot/_all +``` + +To see all snapshots in a repository: + +``` +GET _snapshot/my-repository/_all +``` + +Then restore a snapshot: + +``` +POST _snapshot/my-repository/2/_restore +``` + +Just like when taking a snapshot, you can add a request body to include or exclude certain indices or specify some other settings: + +```json +POST _snapshot/my-repository/2/_restore +{ + "indices": "opensearch-dashboards*,my-index*", + "ignore_unavailable": true, + "include_global_state": false, + "include_aliases": false, + "partial": false, + "rename_pattern": "opensearch-dashboards(.+)", + "rename_replacement": "restored-opensearch-dashboards$1", + "index_settings": { + "index.blocks.read_only": false + }, + "ignore_index_settings": [ + "index.refresh_interval" + ] +} +``` + +Setting | Description +:--- | :--- +indices | The indices you want to restore. You can use `,` to create a list of indices, `*` to specify an index pattern, and `-` to exclude certain indices. Don't put spaces between items. 
Default is all indices. +ignore_unavailable | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the restore operation. Default is false. +include_global_state | Whether to restore the cluster state. Default is false. +include_aliases | Whether to restore aliases alongside their associated indices. Default is true. +partial | Whether to allow the restoration of partial snapshots. Default is false. +rename_pattern | If you want to rename indices as you restore them, use this option to specify a regular expression that matches all indices you want to restore. Use capture groups (`()`) to reuse portions of the index name. +rename_replacement | If you want to rename indices as you restore them, use this option to specify the replacement pattern. Use `$0` to include the entire matching index name, `$1` to include the content of the first capture group, etc. +index_settings | If you want to change index settings on restore, specify them here. +ignore_index_settings | Rather than explicitly specifying new settings with `index_settings`, you can ignore certain index settings in the snapshot and use the cluster defaults on restore. + + +### Conflicts and compatibility + +One way to avoid naming conflicts when restoring indices is to use the `rename_pattern` and `rename_replacement` options. Then, if necessary, you can use the `_reindex` API to combine the two. The simpler way is to delete existing indices prior to restoring from a snapshot. + +You can use the `_close` API to close existing indices prior to restoring from a snapshot, but the index in the snapshot has to have the same number of shards as the existing index. + +We recommend ceasing write requests to a cluster before restoring from a snapshot, which helps avoid scenarios such as: + +1. You delete an index, which also deletes its alias. +1. A write request to the now-deleted alias creates a new index with the same name as the alias. +1. 
The alias from the snapshot fails to restore due to a naming conflict with the new index. + +Snapshots are only forward-compatible by one major version. For example, you can't restore snapshots taken on a 2.x cluster to a 1.x cluster or a 6.x cluster, but you *can* restore them on a 2.x or 5.x cluster. + +If you have an old snapshot, you can sometimes restore it into an intermediate cluster, reindex all indices, take a new snapshot, and repeat until you arrive at your desired version, but you might find it easier to just manually index your data on the new cluster. + + +## Security plugin considerations + +If you're using the security plugin, snapshots have some additional restrictions: + +- To perform snapshot and restore operations, users must have the built-in `manage_snapshots` role. +- You can't restore snapshots that contain global state or the `.opensearch_security` index. + +If a snapshot contains global state, you must exclude it when performing the restore. If your snapshot also contains the `.opensearch_security` index, either exclude it or list all the other indices you want to include: + +```json +POST _snapshot/my-repository/3/_restore +{ + "indices": "-.opensearch_security", + "include_global_state": false +} +``` + +The `.opensearch_security` index contains sensitive data, so we recommend excluding it when you take a snapshot. If you do need to restore the index from a snapshot, you must include an admin certificate in the request: + +```bash +curl -k --cert ./kirk.pem --key ./kirk-key.pem -XPOST 'https://localhost:9200/_snapshot/my-repository/3/_restore?pretty' +``` diff --git a/docs/opensearch/tasksapis.md b/docs/opensearch/tasksapis.md new file mode 100644 index 00000000..7e2587ba --- /dev/null +++ b/docs/opensearch/tasksapis.md @@ -0,0 +1,232 @@ +--- +layout: default +title: Tasks API +parent: OpenSearch +nav_order: 8 +has_math: false +--- + +# Tasks API operation + +A task is any operation you run in a cluster. 
For example, searching your data collection of books for a title or author name is a task. When you run OpenSearch, a task is automatically created to monitor your cluster's health and performance. For more information about all of the tasks currently executing in your cluster, you can use the `tasks` API operation. + +The following request returns information about all of your tasks: + +``` +GET _tasks +``` + +By including a task ID, you can get information that's specific to a particular task. Note that a task ID consists of a node's identifying string and the task's numerical ID. For example, if your node's identifying string is `nodestring` and the task's numerical ID is `1234`, then your task ID is `nodestring:1234`. You can find this information by running the `tasks` operation. + +``` +GET _tasks/ +``` + +Note that if a task finishes running, it won't be returned as part of your request. For an example of a task that takes a little longer to finish, you can run the [`_reindex`](../reindex-data) API operation on a larger document, and then run `tasks`. 
+ +**Sample Response** +```json +{ + "nodes": { + "Mgqdm0r9SEGClWxp_RbnaQ": { + "name": "opensearch-node1", + "transport_address": "172.18.0.3:9300", + "host": "172.18.0.3", + "ip": "172.18.0.3:9300", + "roles": [ + "data", + "ingest", + "master", + "remote_cluster_client" + ], + "tasks": { + "Mgqdm0r9SEGClWxp_RbnaQ:17416": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 17416, + "type": "transport", + "action": "cluster:monitor/tasks/lists", + "start_time_in_millis": 1613599752458, + "running_time_in_nanos": 994000, + "cancellable": false, + "headers": {} + }, + "Mgqdm0r9SEGClWxp_RbnaQ:17413": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 17413, + "type": "transport", + "action": "indices:data/write/bulk", + "start_time_in_millis": 1613599752286, + "running_time_in_nanos": 172846500, + "cancellable": false, + "parent_task_id": "Mgqdm0r9SEGClWxp_RbnaQ:17366", + "headers": {} + }, + "Mgqdm0r9SEGClWxp_RbnaQ:17366": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 17366, + "type": "transport", + "action": "indices:data/write/reindex", + "start_time_in_millis": 1613599750929, + "running_time_in_nanos": 1529733100, + "cancellable": true, + "headers": {} + } + } + } + } +} +``` +You can also use the following parameters with your query. + +Parameter | Data type | Description | +:--- | :--- | :--- +nodes | List | A comma-separated list of node IDs or names to limit the returned information. Use `_local` to return information from the node you're connecting to, specify the node name to get information from specific nodes, or keep the parameter empty to get information from all nodes. +actions | List | A comma-separated list of actions that should be returned. Keep empty to return all. +detailed | Boolean | Returns detailed task information. (Default: false) +parent_task_id | String | Returns tasks with a specified parent task ID (node_id:task_number). Keep empty or set to -1 to return all. +wait_for_completion | Boolean | Waits for the matching tasks to complete. 
(Default: false) +group_by | Enum | Groups tasks by parent/child relationships or nodes. (Default: nodes) +timeout | Time | An explicit operation timeout. (Default: 30 seconds) +master_timeout | Time | The time to wait for a connection to the primary node. (Default: 30 seconds) + +For example, this request returns tasks currently running on a node named `opensearch-node1`. + +**Sample Request** + +``` +GET /_tasks?nodes=opensearch-node1 +``` + +**Sample Response** + +```json +{ + "nodes": { + "Mgqdm0r9SEGClWxp_RbnaQ": { + "name": "opensearch-node1", + "transport_address": "sample_address", + "host": "sample_host", + "ip": "sample_ip", + "roles": [ + "data", + "ingest", + "master", + "remote_cluster_client" + ], + "tasks": { + "Mgqdm0r9SEGClWxp_RbnaQ:24578": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 24578, + "type": "transport", + "action": "cluster:monitor/tasks/lists", + "start_time_in_millis": 1611612517044, + "running_time_in_nanos": 638700, + "cancellable": false, + "headers": {} + }, + "Mgqdm0r9SEGClWxp_RbnaQ:24579": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 24579, + "type": "direct", + "action": "cluster:monitor/tasks/lists[n]", + "start_time_in_millis": 1611612517044, + "running_time_in_nanos": 222200, + "cancellable": false, + "parent_task_id": "Mgqdm0r9SEGClWxp_RbnaQ:24578", + "headers": {} + } + } + } + } +} +``` + +## Task canceling + +After getting a list of tasks, you can cancel all cancelable tasks with the following request: + +``` +POST _tasks/_cancel +``` + +Note that not all tasks are cancelable. To see if a task is cancelable, refer to the `cancellable` field in the response to your `tasks` API request. + +You can also cancel a task by including a specific task ID. + +``` +POST _tasks//_cancel +``` + +The `cancel` operation supports the same parameters as the `tasks` operation. The following example shows how to cancel all cancelable tasks on multiple nodes. 
+ +``` +POST _tasks/_cancel?nodes=opensearch-node1,opensearch-node2 +``` + +## Attaching headers to tasks + +To associate requests with tasks for better tracking, you can include an `X-Opaque-Id` header in the HTTPS request that you send with your `curl` command. The API returns the specified header in the response and attaches it to the matching tasks. + +Usage: + +```bash +curl -i -H "X-Opaque-Id: 111111" "https://localhost:9200/_tasks" -u 'admin:admin' --insecure +``` + +The `_tasks` operation returns the following result. + +```json +HTTP/1.1 200 OK +X-Opaque-Id: 111111 +content-type: application/json; charset=UTF-8 +content-length: 768 + +{ + "nodes": { + "Mgqdm0r9SEGClWxp_RbnaQ": { + "name": "opensearch-node1", + "transport_address": "172.18.0.4:9300", + "host": "172.18.0.4", + "ip": "172.18.0.4:9300", + "roles": [ + "data", + "ingest", + "master", + "remote_cluster_client" + ], + "tasks": { + "Mgqdm0r9SEGClWxp_RbnaQ:30072": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 30072, + "type": "direct", + "action": "cluster:monitor/tasks/lists[n]", + "start_time_in_millis": 1613166701725, + "running_time_in_nanos": 245400, + "cancellable": false, + "parent_task_id": "Mgqdm0r9SEGClWxp_RbnaQ:30071", + "headers": { + "X-Opaque-Id": "111111" + } + }, + "Mgqdm0r9SEGClWxp_RbnaQ:30071": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 30071, + "type": "transport", + "action": "cluster:monitor/tasks/lists", + "start_time_in_millis": 1613166701725, + "running_time_in_nanos": 658200, + "cancellable": false, + "headers": { + "X-Opaque-Id": "111111" + } + } + } + } + } +} +``` +This operation supports the same parameters as the `tasks` operation. The following example shows how you can associate `X-Opaque-Id` with specific tasks.
+ +```bash +curl -i -H "X-Opaque-Id: 123456" "https://localhost:9200/_tasks?nodes=opensearch-node1" -u 'admin:admin' --insecure +``` diff --git a/docs/opensearch/term.md b/docs/opensearch/term.md new file mode 100644 index 00000000..013b78ba --- /dev/null +++ b/docs/opensearch/term.md @@ -0,0 +1,450 @@ +--- +layout: default +title: Term-Level Queries +parent: OpenSearch +nav_order: 9 +--- + +# Term-level queries + +OpenSearch supports two types of queries when you search for data: term-level queries and full-text queries. + +The following table shows the differences between them: + +| | Term-level queries | Full-text queries +:--- | :--- | :--- +*Description* | Term-level queries answer which documents match a query. | Full-text queries answer how well the documents match a query. +*Analyzer* | The search term isn't analyzed. This means that the term query searches for your search term as it is. | The search term is analyzed by the same analyzer that was used for the specific field of the document at the time it was indexed. This means that your search term goes through the same analysis process that the document's field did. +*Relevance* | Term-level queries simply return documents that match without sorting them based on the relevance score. They still calculate the relevance score, but this score is the same for all the documents that are returned. | Full-text queries calculate a relevance score for each match and sort the results by decreasing order of relevance. +*Use Case* | Use term-level queries when you want to match exact values such as numbers, dates, tags, and so on, and don't need the matches to be sorted by relevance. | Use full-text queries to match text fields and sort by relevance after taking into account factors like casing and stemming variants. + +OpenSearch uses a probabilistic ranking framework called Okapi BM25 to calculate relevance scores. To learn more about Okapi BM25, see [Wikipedia](https://en.wikipedia.org/wiki/Okapi_BM25). 
+{: .note } + +Assume that you have the complete works of Shakespeare indexed in an OpenSearch cluster. We use a term-level query to search for the phrase "To be, or not to be" in the `text_entry` field: + +```json +GET shakespeare/_search +{ + "query": { + "term": { + "text_entry": "To be, or not to be" + } + } +} +``` + +#### Sample response + +```json +{ + "took" : 3, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 0, + "relation" : "eq" + }, + "max_score" : null, + "hits" : [ ] + } +} +``` + +We don’t get back any matches (`hits`). This is because the term “To be, or not to be” is searched literally in the inverted index, where only the analyzed values of the text fields are stored. Term-level queries aren't suited for searching on analyzed text fields because they often yield unexpected results. When working with text data, use term-level queries only for fields mapped as `keyword`. + +Now run the same search as a full-text query: + +```json +GET shakespeare/_search +{ + "query": { + "match": { + "text_entry": "To be, or not to be" + } + } +} +``` + +The search query “To be, or not to be” is analyzed and tokenized into an array of tokens just like the `text_entry` field of the documents.
The full-text query performs an intersection of tokens between our search query and the `text_entry` fields for all the documents, and then sorts the results by relevance scores: + +#### Sample response + +```json +{ + "took" : 19, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 10000, + "relation" : "gte" + }, + "max_score" : 17.419369, + "hits" : [ + { + "_index" : "shakespeare", + "_type" : "_doc", + "_id" : "34229", + "_score" : 17.419369, + "_source" : { + "type" : "line", + "line_id" : 34230, + "play_name" : "Hamlet", + "speech_number" : 19, + "line_number" : "3.1.64", + "speaker" : "HAMLET", + "text_entry" : "To be, or not to be: that is the question:" + } + }, + { + "_index" : "shakespeare", + "_type" : "_doc", + "_id" : "109930", + "_score" : 14.883024, + "_source" : { + "type" : "line", + "line_id" : 109931, + "play_name" : "A Winters Tale", + "speech_number" : 23, + "line_number" : "4.4.153", + "speaker" : "PERDITA", + "text_entry" : "Not like a corse; or if, not to be buried," + } + }, + { + "_index" : "shakespeare", + "_type" : "_doc", + "_id" : "103117", + "_score" : 14.782743, + "_source" : { + "type" : "line", + "line_id" : 103118, + "play_name" : "Twelfth Night", + "speech_number" : 53, + "line_number" : "1.3.95", + "speaker" : "SIR ANDREW", + "text_entry" : "will not be seen; or if she be, its four to one" + } + } + ] + } +} +... +``` + +For a list of all full-text queries, see [Full-text queries](../full-text/). 
+ +If you want to query for an exact term like “HAMLET” in the speaker field and don't need the results to be sorted by relevance scores, a term-level query is more efficient: + +```json +GET shakespeare/_search +{ + "query": { + "term": { + "speaker": "HAMLET" + } + } +} +``` + +#### Sample response + +```json +{ + "took" : 5, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 1582, + "relation" : "eq" + }, + "max_score" : 4.2540946, + "hits" : [ + { + "_index" : "shakespeare", + "_type" : "_doc", + "_id" : "32700", + "_score" : 4.2540946, + "_source" : { + "type" : "line", + "line_id" : 32701, + "play_name" : "Hamlet", + "speech_number" : 9, + "line_number" : "1.2.66", + "speaker" : "HAMLET", + "text_entry" : "[Aside] A little more than kin, and less than kind." + } + }, + { + "_index" : "shakespeare", + "_type" : "_doc", + "_id" : "32702", + "_score" : 4.2540946, + "_source" : { + "type" : "line", + "line_id" : 32703, + "play_name" : "Hamlet", + "speech_number" : 11, + "line_number" : "1.2.68", + "speaker" : "HAMLET", + "text_entry" : "Not so, my lord; I am too much i' the sun." + } + }, + { + "_index" : "shakespeare", + "_type" : "_doc", + "_id" : "32709", + "_score" : 4.2540946, + "_source" : { + "type" : "line", + "line_id" : 32710, + "play_name" : "Hamlet", + "speech_number" : 13, + "line_number" : "1.2.75", + "speaker" : "HAMLET", + "text_entry" : "Ay, madam, it is common." + } + } + ] + } +} +... +``` + +Term-level queries are exact matches. So, if you search for “Hamlet”, you don’t get back any matches, because `speaker` is a keyword field and the value “HAMLET” is stored in OpenSearch literally and not in an analyzed form. +The search term is also matched literally, without analysis. So, to get a match on this field, we need to enter the exact same characters. + +--- + +## Term + +Use the `term` query to search for an exact term in a field.
+ +```json +GET shakespeare/_search +{ + "query": { + "term": { + "line_id": { + "value": "61809" + } + } + } +} +``` + +## Terms + +Use the `terms` query to search for multiple terms in the same field. + +```json +GET shakespeare/_search +{ + "query": { + "terms": { + "line_id": [ + "61809", + "61810" + ] + } + } +} +``` + +You get back documents that match any of the terms. + +## IDs + +Use the `ids` query to search for one or more document ID values. + +```json +GET shakespeare/_search +{ + "query": { + "ids": { + "values": [ + 34229, + 91296 + ] + } + } +} +``` + +## Range + +Use the `range` query to search for a range of values in a field. + +To search for documents where the `line_id` value is >= 10 and <= 20: + +```json +GET shakespeare/_search +{ + "query": { + "range": { + "line_id": { + "gte": 10, + "lte": 20 + } + } + } +} +``` + +Parameter | Behavior +:--- | :--- +`gte` | Greater than or equal to. +`gt` | Greater than. +`lte` | Less than or equal to. +`lt` | Less than. + +Assume that you have a `products` index and you want to find all the products that were added in the year 2019: + +```json +GET products/_search +{ + "query": { + "range": { + "created": { + "gte": "2019/01/01", + "lte": "2019/12/31" + } + } + } +} +``` + +Specify relative dates by using basic math expressions. + +To subtract 1 year and 1 day from the specified date: + +```json +GET products/_search +{ + "query": { + "range": { + "created": { + "gte": "2019/01/01||-1y-1d" + } + } + } +} +``` + +The first date that we specify is the anchor date or the starting point for the date math. Add two trailing pipe symbols. You could then add one day (`+1d`) or subtract two weeks (`-2w`). This math expression is relative to the anchor date that you specify. + +You could also round off dates by adding a forward slash to the date or time unit. 
+ +To find products added in the last year and rounded off by month: + +```json +GET products/_search +{ + "query": { + "range": { + "created": { + "gte": "now-1y/M" + } + } + } +} +``` + +The keyword `now` refers to the current date and time. + +## Prefix + +Use the `prefix` query to search for terms that begin with a specific prefix. + +```json +GET shakespeare/_search +{ + "query": { + "prefix": { + "speaker": "KING" + } + } +} +``` + +## Exists + +Use the `exists` query to search for documents that contain a specific field. + +```json +GET shakespeare/_search +{ + "query": { + "exists": { + "field": "speaker" + } + } +} +``` + +## Wildcards + +Use wildcard queries to search for terms that match a wildcard pattern. + +Feature | Behavior +:--- | :--- +`*` | Matches zero or more characters. +`?` | Matches any single character. + +To search for terms that start with `H` and end with `Y`: + +```json +GET shakespeare/_search +{ + "query": { + "wildcard": { + "speaker": { + "value": "H*Y" + } + } + } +} +``` + +If we change `*` to `?`, we get no matches, because `?` refers to a single character. + +Wildcard queries tend to be slow because they need to iterate over a lot of terms. Avoid placing wildcard characters at the beginning of a query because it could be a very expensive operation in terms of both resources and time. + +## Regex + +Use the `regexp` query to search for terms that match a regular expression. + +In this regular expression, `[a-zA-Z]+` matches one or more uppercase or lowercase letters: + +```json +GET shakespeare/_search +{ + "query": { + "regexp": { + "play_name": "H[a-zA-Z]+mlet" + } + } +} +``` + +Regular expressions are applied to the terms in the field and not the entire value of the field. + +The efficiency of your regular expression depends a lot on the patterns you write. Make sure that you write `regexp` queries with either a prefix or suffix to improve performance.
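
To illustrate how the pattern `H[a-zA-Z]+mlet` from the Regex section is applied to individual terms, the following sketch checks a few candidate terms locally. It uses Python's `re` module as a stand-in, which is an assumption: OpenSearch evaluates regular expressions with its own engine and syntax, and `fullmatch` is used here on the assumption that the pattern has to match a whole term. The helper name `matching_terms` is hypothetical.

```python
import re

# The same pattern as in the regexp query above.
PATTERN = re.compile(r"H[a-zA-Z]+mlet")


def matching_terms(terms: list[str]) -> list[str]:
    """Return the terms that the pattern matches from start to end."""
    return [term for term in terms if PATTERN.fullmatch(term)]


# "hamlet" fails on the lowercase "h", "Hmlet" has no letter between
# "H" and "mlet", and "Hamlets" has a trailing character.
print(matching_terms(["Hamlet", "hamlet", "Hmlet", "Hamlets"]))  # ['Hamlet']
```

The lowercase `hamlet` case is worth noting: if a field is analyzed and its terms are lowercased, a pattern that starts with an uppercase letter would not match any term.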
diff --git a/docs/opensearch/units.md b/docs/opensearch/units.md new file mode 100644 index 00000000..24a5970d --- /dev/null +++ b/docs/opensearch/units.md @@ -0,0 +1,19 @@ +--- +layout: default +title: Supported Units +parent: OpenSearch +nav_order: 90 +--- + +# Supported units + +OpenSearch supports the following units for all REST operations: + +Unit | Description | Example +:--- | :--- | :--- +Times | The supported units for time are `d` for days, `h` for hours, `m` for minutes, `s` for seconds, `ms` for milliseconds, `micros` for microseconds, and `nanos` for nanoseconds. | `5d` or `7h` +Bytes | The supported units for byte size are `b` for bytes, `kb` for kibibytes, `mb` for mebibytes, `gb` for gibibytes, `tb` for tebibytes, and `pb` for pebibytes. Despite the base-10 abbreviations, these units are base-2; `1kb` is 1,024 bytes, `1mb` is 1,048,576 bytes, etc. | `7kb` or `6gb` +Distances | The supported units for distance are `mi` for miles, `yd` for yards, `ft` for feet, `in` for inches, `km` for kilometers, `m` for meters, `cm` for centimeters, `mm` for millimeters, and `nmi` or `NM` for nautical miles. | `5mi` or `4ft` +Quantities without units | For large values that don't have a unit, use `k` for kilo, `m` for mega, `g` for giga, `t` for tera, and `p` for peta. | `5k` for 5,000 + +To convert output units to human-readable values, see [Common REST parameters](../common-parameters/). diff --git a/docs/opensearch/ux.md b/docs/opensearch/ux.md new file mode 100644 index 00000000..3870b3e9 --- /dev/null +++ b/docs/opensearch/ux.md @@ -0,0 +1,1077 @@ +--- +layout: default +title: Search Experience +parent: OpenSearch +nav_order: 12 +--- + +# Search Experience + +Expectations from search engines have evolved over the years. Just returning relevant results quickly is no longer enough for most users. 
OpenSearch includes many features that enhance the user’s search experience: + +Feature | Description +:--- | :--- +Autocomplete queries | Suggest phrases as the user types. +Paginate results | Rather than a single, long list, break search results into pages. +Scroll search | Return a large number of results in batches. +Sort results | Allow sorting results by different criteria. +Highlight query matches | Highlight the search term in the results. + +--- + +## Autocomplete queries + +Autocomplete shows suggestions to users while they type. + +For example, if a user types "pop," OpenSearch provides suggestions like "popcorn" or "popsicles." These suggestions anticipate your user's intention and lead them to a possible search term more quickly. + +OpenSearch allows you to design autocomplete that updates with each keystroke, provides a few relevant suggestions, and tolerates typos. + +Implement autocomplete using one of three methods: + +- Prefix matching +- Edge N-gram matching +- Completion suggesters + +These methods are described below. + +### Prefix matching + +Prefix matching finds documents that match the last term in the query string as a prefix. + +For example, assume that the user types “qui” into a search UI. To autocomplete this phrase, use the `match_phrase_prefix` query to search all `text_entry` fields that begin with the prefix "qui." +To make the word order and relative positions flexible, specify a `slop` value. To learn about the `slop` option, see [Options](../full-text/#options). + +#### Sample Request + +```json +GET shakespeare/_search +{ + "query": { + "match_phrase_prefix": { + "text_entry": { + "query": "qui", + "slop": 3 + } + } + } +} +``` + +Prefix matching doesn’t require any special mappings. It works with your data as-is. +However, it’s a fairly resource-intensive operation. A prefix of `a` could match hundreds of thousands of terms and not be useful to your user.
+To limit the impact of prefix expansion, set `max_expansions` to a reasonable number. To learn about the `max_expansions` option, see [Options](../full-text/#options). + +#### Sample Request + +```json +GET shakespeare/_search +{ + "query": { + "match_phrase_prefix": { + "text_entry": { + "query": "qui", + "slop": 3, + "max_expansions": 10 + } + } + } +} +``` + +The ease of implementing query-time autocomplete comes at the cost of performance. +When implementing this feature on a large scale, we recommend an index-time solution. With an index-time solution, you might experience slower indexing, but it’s a price you pay only once and not for every query. The edge N-gram and completion suggester methods are index time. + +### Edge N-gram matching + +During indexing, edge N-grams chop up a word into a sequence of N characters to support a faster lookup of partial search terms. + +If you N-gram the word "quick," the results depend on the value of N. + +N | Type | N-gram +:--- | :--- | :--- +1 | Unigram | [ `q`, `u`, `i`, `c`, `k` ] +2 | Bigram | [ `qu`, `ui`, `ic`, `ck` ] +3 | Trigram | [ `qui`, `uic`, `ick` ] +4 | Four-gram | [ `quic`, `uick` ] +5 | Five-gram | [ `quick` ] + +Autocomplete needs only the beginning N-grams of a search phrase, so OpenSearch uses a special type of N-gram called edge N-gram. + +Edge N-gramming the word "quick" results in the following: + +- `q` +- `qu` +- `qui` +- `quic` +- `quick` + +This follows the same sequence the user types. 
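
The difference between plain N-grams and edge N-grams can be reproduced with a few lines of Python. This is only an illustrative sketch of the idea, not how OpenSearch implements its filters; the helper names `ngrams` and `edge_ngrams` and their `min_gram`/`max_gram` defaults are assumptions chosen for the example.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Return every contiguous substring of length n."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Return the prefixes of word with lengths from min_gram to max_gram."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


print(ngrams("quick", 3))    # ['qui', 'uic', 'ick']
print(edge_ngrams("quick"))  # ['q', 'qu', 'qui', 'quic', 'quick']
```

Because every edge N-gram is a prefix of the original word, a partially typed word such as "qui" is itself one of the indexed tokens, so it can be looked up directly.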
+ +To configure a field to use edge N-grams, create an autocomplete analyzer with an `edge_ngram` filter: + +#### Sample Request + +```json +PUT shakespeare +{ + "mappings": { + "properties": { + "text_entry": { + "type": "text", + "analyzer": "autocomplete" + } + } + }, + "settings": { + "analysis": { + "filter": { + "edge_ngram_filter": { + "type": "edge_ngram", + "min_gram": 1, + "max_gram": 20 + } + }, + "analyzer": { + "autocomplete": { + "type": "custom", + "tokenizer": "standard", + "filter": [ + "lowercase", + "edge_ngram_filter" + ] + } + } + } + } +} +``` + +This example creates the index and instantiates the edge N-gram filter and analyzer. + +The `edge_ngram_filter` produces edge N-grams with a minimum N-gram length of 1 (a single letter) and a maximum length of 20. So it offers suggestions for words of up to 20 letters. + +The `autocomplete` analyzer tokenizes a string into individual terms, lowercases the terms, and then produces edge N-grams for each term using the `edge_ngram_filter`. + +Use the `analyze` operation to test this analyzer: + +```json +POST shakespeare/_analyze +{ + "analyzer": "autocomplete", + "text": "quick" +} +``` + +It returns edge N-grams as tokens: + +* `q` +* `qu` +* `qui` +* `quic` +* `quick` + +Use the `standard` analyzer at search time. Otherwise, the search query is also split into edge N-grams, and a query for "qui" returns everything that matches `q`, `qu`, or `qui`. +This is one of the few occasions where you use a different analyzer on the index and query side.
+ +#### Sample Request + +```json +GET shakespeare/_search +{ + "query": { + "match": { + "text_entry": { + "query": "qui", + "analyzer": "standard" + } + } + } +} +``` + +#### Sample Response + +```json +{ + "took": 5, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 533, + "relation": "eq" + }, + "max_score": 9.712725, + "hits": [ + { + "_index": "shakespeare", + "_type": "_doc", + "_id": "22006", + "_score": 9.712725, + "_source": { + "type": "line", + "line_id": 22007, + "play_name": "Antony and Cleopatra", + "speech_number": 12, + "line_number": "5.2.44", + "speaker": "CLEOPATRA", + "text_entry": "Quick, quick, good hands." + } + }, + { + "_index": "shakespeare", + "_type": "_doc", + "_id": "54665", + "_score": 9.712725, + "_source": { + "type": "line", + "line_id": 54666, + "play_name": "Loves Labours Lost", + "speech_number": 21, + "line_number": "5.1.52", + "speaker": "HOLOFERNES", + "text_entry": "Quis, quis, thou consonant?" + } + } + ] + } +} +``` + +Alternatively, specify the `search_analyzer` in the mapping itself: + +```json +"mappings": { + "properties": { + "text_entry": { + "type": "text", + "analyzer": "autocomplete", + "search_analyzer": "standard" + } + } +} +``` + +### Completion suggester + +The completion suggester accepts a list of suggestions and builds them into a finite-state transducer (FST), an optimized data structure that’s essentially a graph. This data structure lives in memory and is optimized for fast prefix lookups. To learn more about FSTs, see [Wikipedia](https://en.wikipedia.org/wiki/Finite-state_transducer). + +As the user types, the completion suggester moves through the FST graph one character at a time along a matching path. After it runs out of user input, it examines the remaining endings to produce a list of suggestions. 
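
The traversal described above can be sketched with a plain trie standing in for the FST. This is a simplification under stated assumptions: a real FST also shares common suffixes and can carry weights, and the class names `TrieNode` and `PrefixIndex` are hypothetical, not part of any OpenSearch API. The sketch is case-sensitive and does no text analysis.

```python
class TrieNode:
    """One character position in the trie."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_end = False  # True if a stored suggestion ends here


class PrefixIndex:
    """Stores suggestions in a trie and answers prefix lookups."""

    def __init__(self, suggestions: list[str]) -> None:
        self.root = TrieNode()
        for suggestion in suggestions:
            node = self.root
            for ch in suggestion:
                node = node.children.setdefault(ch, TrieNode())
            node.is_end = True

    def suggest(self, prefix: str, size: int = 5) -> list[str]:
        # Walk the graph one character at a time along the matching path.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Out of user input: collect the remaining endings depth-first.
        results: list[str] = []
        stack = [(node, prefix)]
        while stack and len(results) < size:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            # Push children in reverse order so the smallest is visited first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


index = PrefixIndex([
    "To be a comrade with the wolf and owl,--",
    "To be a make-peace shall become my age:",
    "To make a bastard and a slave of me!",
])
print(index.suggest("To be"))
```

The lookup cost depends on the length of the prefix and the number of suggestions collected, not on the total number of stored entries, which is why this kind of structure suits keystroke-by-keystroke autocomplete.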
+ +The completion suggester makes your autocomplete solution as efficient as possible and lets you have explicit control over its suggestions. + +Use a dedicated field type called `completion`, which stores the FST-like data structures in the index: + +```json +PUT shakespeare +{ + "mappings": { + "properties": { + "text_entry": { + "type": "completion" + } + } + } +} +``` + +To get back suggestions, use the `search` endpoint with the `suggest` parameter: + +```json +GET shakespeare/_search +{ + "suggest": { + "autocomplete": { + "prefix": "To be", + "completion": { + "field": "text_entry" + } + } + } +} +``` + +The phrase "to be" is prefix matched with the FST of the `text_entry` field. + +#### Sample Response + +```json +{ + "took": 9, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "suggest": { + "text_entry": [ + { + "text": "To be", + "offset": 0, + "length": 5, + "options": [ + { + "text": "To be a comrade with the wolf and owl,--", + "_index": "shakespeare", + "_type": "_doc", + "_id": "50652", + "_score": 1, + "_source": { + "type": "line", + "line_id": 50653, + "play_name": "King Lear", + "speech_number": 68, + "line_number": "2.4.230", + "speaker": "KING LEAR", + "text_entry": "To be a comrade with the wolf and owl,--" + } + }, + { + "text": "To be a make-peace shall become my age:", + "_index": "shakespeare", + "_type": "_doc", + "_id": "78566", + "_score": 1, + "_source": { + "type": "line", + "line_id": 78567, + "play_name": "Richard II", + "speech_number": 20, + "line_number": "1.1.160", + "speaker": "JOHN OF GAUNT", + "text_entry": "To be a make-peace shall become my age:" + } + } + ] + } + ] + } +} +``` + +To specify the number of suggestions that you want to return, use the `size` parameter: + +```json +GET shakespeare/_search +{ + "suggest": { + "autocomplete": { + "prefix": "To m", + 
"completion": { + "field": "text_entry", + "size": 3 + } + } + } +} +``` + +#### Sample Response + +```json +{ + "took": 3, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "suggest": { + "text_entry": [ + { + "text": "To m", + "offset": 0, + "length": 5, + "options": [ + { + "text": "To make a bastard and a slave of me!", + "_index": "shakespeare", + "_type": "_doc", + "_id": "5369", + "_score": 4, + "_source": { + "type": "line", + "line_id": 5370, + "play_name": "Henry VI Part 1", + "speech_number": 2, + "line_number": "4.5.15", + "speaker": "JOHN TALBOT", + "text_entry": "To make a bastard and a slave of me!" + } + }, + { + "text": "To make a bloody supper in the Tower.", + "_index": "shakespeare", + "_type": "_doc", + "_id": "12504", + "_score": 4, + "_source": { + "type": "line", + "line_id": 12505, + "play_name": "Henry VI Part 3", + "speech_number": 40, + "line_number": "5.5.85", + "speaker": "CLARENCE", + "text_entry": "To make a bloody supper in the Tower." + } + } + ] + } + ] + } +} +``` + +The `suggest` parameter finds suggestions using only prefix matching. +For example, you don't get back "To be, or not to be," which you might want as a suggestion. +To work around this issue, manually add curated suggestions and add weights to prioritize your suggestions. 
+ +Index a document with an input suggestion and assign a weight: + +```json +PUT shakespeare/_doc/1 +{ + "text": "To m", + "text_entry": { + "input": [ + "To be, or not to be: that is the question:" + ], + "weight": 10 + } +} +``` + +Perform the same search as before: + +```json +GET shakespeare/_search +{ + "suggest": { + "autocomplete": { + "prefix": "To m", + "completion": { + "field": "text_entry", + "size": 3 + } + } + } +} +``` + +You see the indexed document as the first result: + +```json +{ + "took": 1021, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "suggest": { + "autocomplete": [ + { + "text": "To m", + "offset": 0, + "length": 5, + "options": [ + { + "text": "To be, or not to be: that is the question:", + "_index": "shakespeare", + "_type": "_doc", + "_id": "1", + "_score": 30, + "_source": { + "text": "To me", + "text_entry": { + "input": [ + "To be, or not to be: that is the question:" + ], + "weight": 10 + } + } + }, + { + "text": "To make a bastard and a slave of me!", + "_index": "shakespeare", + "_type": "_doc", + "_id": "5369", + "_score": 4, + "_source": { + "type": "line", + "line_id": 5370, + "play_name": "Henry VI Part 1", + "speech_number": 2, + "line_number": "4.5.15", + "speaker": "JOHN TALBOT", + "text_entry": "To make a bastard and a slave of me!" + } + } + ] + } + ] + } +} +``` + +Use the `term` suggester to suggest corrected spellings for individual words. +The `term` suggester uses an edit distance to compute suggestions. Edit distance is the number of characters that need to be changed for a term to match. 
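
As a rough illustration of edit distance, the following sketch computes the classic Levenshtein distance, which counts single-character insertions, deletions, and substitutions. It is an assumption that this matches how you want to reason about corrections: the `term` suggester's own distance metric and scoring may differ (for example, a metric that counts a transposition of two adjacent characters as a single edit), and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    # previous[j] is the distance between the processed part of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


print(edit_distance("lief", "life"))  # 2
```

Under plain Levenshtein distance, swapping two adjacent letters costs two edits, which is why the misspelling "lief" is two edits away from "life" in this sketch.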
+ +In this example, the user misspells a search term: + +```json +GET shakespeare/_search +{ + "suggest": { + "spell-check": { + "text": "lief", + "term": { + "field": "text_entry" + } + } + } +} +``` + +The `term` suggester returns a list of corrections: + +```json +{ + "took": 48, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "suggest": { + "spell-check": [ + { + "text": "lief", + "offset": 0, + "length": 4, + "options": [ + { + "text": "lifes", + "score": 0.8, + "freq": 21 + }, + { + "text": "life", + "score": 0.75, + "freq": 805 + }, + { + "text": "lives", + "score": 0.6, + "freq": 187 + }, + { + "text": "liege", + "score": 0.6, + "freq": 138 + }, + { + "text": "lived", + "score": 0.6, + "freq": 80 + } + ] + } + ] + } +} +``` + +The higher the score, the better the suggestion is. The frequency represents the number of times the term appears in the documents of that index. + +To implement a "Did you mean `suggestion`?" feature, use a `phrase` suggester. +The `phrase` suggester is similar to the `term` suggester, except that it uses N-gram language models to suggest whole phrases instead of individual words. + +Create a custom analyzer called `trigram` that uses a `shingle` filter. 
This filter is similar to the `edge_ngram` filter, but it applies to words instead of letters: + +```json +PUT shakespeare +{ + "settings": { + "index": { + "analysis": { + "analyzer": { + "trigram": { + "type": "custom", + "tokenizer": "standard", + "filter": [ + "lowercase", + "shingle" + ] + } + }, + "filter": { + "shingle": { + "type": "shingle", + "min_shingle_size": 2, + "max_shingle_size": 3 + } + } + } + } + }, + "mappings": { + "properties": { + "text_entry": { + "type": "text", + "fields": { + "trigram": { + "type": "text", + "analyzer": "trigram" + } + } + } + } + } +} +``` + +This example includes an incorrect phrase: + +```json +POST shakespeare/_search +{ + "suggest": { + "text": "That the qution", + "simple_phrase": { + "phrase": { + "field": "text_entry.trigram" + } + } + } +} +``` + +You get back the corrected phrase: + +```json +{ + "took": 3, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "suggest": { + "simple_phrase": [ + { + "text": "That the qution", + "offset": 0, + "length": 18, + "options": [ + { + "text": "that is the question", + "score": 0.0015543294 + } + ] + } + ] + } +} +``` + + +## Paginate results + +The `from` and `size` parameters return results to your users one page at a time. + +The `from` parameter is the document number that you want to start showing the results from. The `size` parameter is the number of results that you want to show. Together, they let you return a subset of the search results. + +For example, if the value of `size` is 10 and the value of `from` is 0, you see the first 10 results. If you change the value of `from` to 10, you see the next 10 results (because the results are zero-indexed). So, if you want to see results starting from result 11, `from` must be 10.
+ +```json +GET shakespeare/_search +{ + "from": 0, + "size": 10, + "query": { + "match": { + "play_name": "Hamlet" + } + } +} +``` + +To calculate the `from` parameter relative to the page number: + +```json +from = size * (page_number - 1) +``` + +Each time the user chooses the next page of the results, your application needs to make the same search query with an incremented `from` value. + +You can also specify the `from` and `size` parameters in the search URI: + +```json +GET shakespeare/_search?from=0&size=10 +``` + +If you only specify the `size` parameter, the `from` parameter defaults to 0. + +Querying for pages deep in your results can have a significant performance impact, so OpenSearch limits this approach to 10,000 results. + +The `from` and `size` parameters are stateless, so the results are based on the latest available data. +This can cause inconsistent pagination. +For example, assume a user stays on the first page of the results for a minute and then navigates to the second page; in that time, a new document is indexed in the background which is relevant enough to show up on the first page. In this scenario, the last result of the first page is pushed to the second page, so the user ends up seeing a result on the second page that they already saw on the first page. + +Use the `scroll` operation for consistent pagination. The `scroll` operation keeps a search context open for a certain period of time. Any data changes do not affect the results during this time. + +## Scroll search + +The `from` and `size` parameters allow you to paginate your search results, but with a limit of 10,000 results at a time. + +If you need to request massive volumes of data from, for example, a machine learning job, use the `scroll` operation instead. The `scroll` operation allows you to request an unlimited number of results. 
+ +To use the `scroll` operation, add a `scroll` query parameter to the search request to tell OpenSearch how long to keep the search context open. This period needs to be long enough to process a single batch of results. + +To set the number of results that you want returned for each batch, use the `size` parameter: + +```json +GET shakespeare/_search?scroll=10m +{ + "size": 10000 +} +``` + +OpenSearch caches the results and returns a scroll ID to access them in batches: + +```json +"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAUWdmpUZDhnRFBUcWFtV21nMmFwUGJEQQ==" +``` + +Pass this scroll ID to the `scroll` operation to get back the next batch of results: + +```json +GET _search/scroll +{ + "scroll": "10m", + "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAUWdmpUZDhnRFBUcWFtV21nMmFwUGJEQQ==" +} +``` + +Using this scroll ID, you get results in batches of 10,000 as long as the search context is still open. Typically, the scroll ID does not change between requests, but it *can* change, so make sure to always use the latest scroll ID. If you don't send the next scroll request within the set search context, the `scroll` operation does not return any results. + +If you expect billions of results, use a sliced scroll. Slicing allows you to perform multiple scroll operations for the same request, but in parallel. +Set the ID and the maximum number of slices for the scroll: + +```json +GET shakespeare/_search?scroll=10m +{ + "slice": { + "id": 0, + "max": 10 + }, + "query": { + "match_all": {} + } +} +``` + +With a single scroll ID, you get back 10 results. +Because `max` is 10, you can have up to 10 slice IDs, 0 through 9.
+Run the same request with the slice ID set to 1: + +```json +GET shakespeare/_search?scroll=10m +{ + "slice": { + "id": 1, + "max": 10 + }, + "query": { + "match_all": {} + } +} +``` + +Close the search context when you’re done scrolling, because it continues to consume computing resources until the timeout: + +```json +DELETE _search/scroll/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAcWdmpUZDhnRFBUcWFtV21nMmFwUGJEQQ== +``` + +#### Sample Response + +```json +{ + "succeeded": true, + "num_freed": 1 +} +``` + +To close all open scroll contexts: + +```json +DELETE _search/scroll/_all +``` + +The `scroll` operation corresponds to a specific timestamp. It does not consider documents added after that timestamp as potential results. + +Because open search contexts consume a lot of memory, we suggest you do not use the `scroll` operation for frequent user queries that don't need the search context open. Instead, use the `sort` parameter with the `search_after` parameter to scroll responses for user queries. + +## Sort results + +Sorting allows your users to sort the results in a way that’s most meaningful to them. + +By default, full-text queries sort results by the relevance score. +You can choose to sort the results by any field value in either ascending or descending order. + +For example, to sort results by descending order of a `line_id` value: + +```json +GET shakespeare/_search +{ + "query": { + "term": { + "play_name": { + "value": "Henry IV" + } + } + }, + "sort": [ + { + "line_id": { + "order": "desc" + } + } + ] +} +``` + +The sort parameter is an array, so you can specify multiple field values in the order of their priority. + +If two documents have the same value for `line_id`, OpenSearch uses `speech_number`, which is the second option for sorting.
+ +```json +GET shakespeare/_search +{ + "query": { + "term": { + "play_name": { + "value": "Henry IV" + } + } + }, + "sort": [ + { + "line_id": { + "order": "desc" + } + }, + { + "speech_number": { + "order": "desc" + } + } + ] +} +``` + +You can continue to sort by any number of field values to get the results in just the right order. The sort field doesn’t have to be numeric; you can also sort by date or timestamp fields: + +```json +"sort": [ + { + "date": { + "order": "desc" + } + } + ] +``` + +For numeric fields that contain an array of numbers, you can sort by `avg`, `sum`, and `median` modes: + +```json +"sort": [ + { + "price": { + "order": "asc", + "mode": "avg" + } + } +] +``` + +To sort by the minimum or maximum values, use the `min` or `max` modes. These modes work for both numeric and string data types. + +A text field that’s analyzed cannot be used to sort documents, because the inverted index only contains the individual tokenized terms and not the entire string. So, you cannot sort by the `play_name`, for example. + +One workaround is to map a raw version of the text field as a keyword type, so it won’t be analyzed and you have a copy of the full original string for sorting purposes. + +```json +GET shakespeare/_search +{ + "query": { + "term": { + "play_name": { + "value": "Henry IV" + } + } + }, + "sort": [ + { + "play_name.keyword": { + "order": "desc" + } + } + ] +} +``` + +You get back results sorted by the `play_name` field in descending alphabetical order. + +Use `sort` with the `search_after` parameter for more efficient scrolling. +You get back results after the values you specify in the `search_after` array. + +Make sure you have the same number of values in the `search_after` array as in the `sort` array, also ordered in the same way. +In this case, you get back results after `line_id = 3202` and `speech_number = 8`.
+ +```json +GET shakespeare/_search +{ + "query": { + "term": { + "play_name": { + "value": "Henry IV" + } + } + }, + "sort": [ + { + "line_id": { + "order": "desc" + } + }, + { + "speech_number": { + "order": "desc" + } + } + ], + "search_after": [ + "3202", + "8" + ] +} +``` + +## Highlight query matches + +Highlighting emphasizes the search term(s) in the results. + +To highlight the search terms, add a `highlight` parameter outside of the query block: + +```json +GET shakespeare/_search +{ + "query": { + "match": { + "text_entry": "life" + } + }, + "highlight": { + "fields": { + "text_entry": {} + } + } +} +``` + +For each document in the results, you get back a `highlight` object that shows your search term wrapped in an `em` tag: + +```json +"highlight": { + "text_entry": [ + "my <em>life</em>, except my <em>life</em>." + ] +} +``` + +Design your application code to parse the results from the `highlight` object and perform some action on the search terms, such as changing their color, bolding, italicizing, and so on. + +To change the default `em` tags, use the `pre_tags` and `post_tags` parameters: + +```json +GET shakespeare/_search?format=yaml +{ + "query": { + "match": { + "play_name": "Henry IV" + } + }, + "highlight": { + "pre_tags": [ + "" + ], + "post_tags": [ + "" + ], + "fields": { + "play_name": {} + } + } +} +``` + +The `highlight` parameter highlights the original terms even when using synonyms or stemming for the search itself. diff --git a/docs/pa/api.md b/docs/pa/api.md new file mode 100644 index 00000000..784d1c6b --- /dev/null +++ b/docs/pa/api.md @@ -0,0 +1,196 @@ +--- +layout: default +title: API +parent: Performance Analyzer +nav_order: 1 +--- + +# Performance Analyzer API + +Performance Analyzer uses a single HTTP method and URI for most requests: + +``` +GET :9600/_opensearch/_performanceanalyzer/metrics +``` + +Note the use of port 9600.
Provide parameters for metrics, aggregations, dimensions, and nodes (optional): + +``` +?metrics=&agg=&dim=&nodes=all" +``` + +For a full list of metrics, see [Metrics reference](../reference/). Performance Analyzer updates its data every five seconds. If you create a custom client, we recommend using that same interval for calls to the API. + + +#### Sample request + +``` +GET localhost:9600/_opensearch/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg,max&dim=ShardID&nodes=all +``` + + +#### Sample response + +```json +{ + "keHlhQbbTpm1BYicficEQg": { + "timestamp": 1554940530000, + "data": { + "fields": [{ + "name": "ShardID", + "type": "VARCHAR" + }, + { + "name": "Latency", + "type": "DOUBLE" + }, + { + "name": "CPU_Utilization", + "type": "DOUBLE" + } + ], + "records": [ + [ + null, + null, + 0.012552206029147535 + ], + [ + "1", + 4.8, + 0.0009780939762972104 + ] + ] + } + }, + "bHdpbMJZTs-TKtZro2SmYA": { + "timestamp": 1554940530000, + "data": { + "fields": [{ + "name": "ShardID", + "type": "VARCHAR" + }, + { + "name": "Latency", + "type": "DOUBLE" + }, + { + "name": "CPU_Utilization", + "type": "DOUBLE" + } + ], + "records": [ + [ + null, + 18.2, + 0.011966493817311527 + ], + [ + "1", + 14.8, + 0.0007670829370071493 + ] + ] + } + } +} +``` + +In this case, each top-level object represents a node. The API returns names and data types for the metrics and dimensions that you specified, along with values from five seconds ago and current values (if different). Null values represent inactivity during that time period. + +Performance Analyzer has one additional URI that returns the unit for each metric. 
+ + +#### Sample request + +``` +GET localhost:9600/_opensearch/_performanceanalyzer/metrics/units +``` + + +#### Sample response + +```json +{ + "Disk_Utilization": "%", + "Cache_Request_Hit": "count", + "TermVectors_Memory": "B", + "Segments_Memory": "B", + "HTTP_RequestDocs": "count", + "Net_TCP_Lost": "segments/flow", + "Refresh_Time": "ms", + "GC_Collection_Event": "count", + "Merge_Time": "ms", + "Sched_CtxRate": "count/s", + "Cache_Request_Size": "B", + "ThreadPool_QueueSize": "count", + "Sched_Runtime": "s/ctxswitch", + "Disk_ServiceRate": "MB/s", + "Heap_AllocRate": "B/s", + "Heap_Max": "B", + "Sched_Waittime": "s/ctxswitch", + "ShardBulkDocs": "count", + "Thread_Blocked_Time": "s/event", + "VersionMap_Memory": "B", + "Master_Task_Queue_Time": "ms", + "Merge_CurrentEvent": "count", + "Indexing_Buffer": "B", + "Bitset_Memory": "B", + "Norms_Memory": "B", + "Net_PacketDropRate4": "packets/s", + "Heap_Committed": "B", + "Net_PacketDropRate6": "packets/s", + "Thread_Blocked_Event": "count", + "GC_Collection_Time": "ms", + "Cache_Query_Miss": "count", + "IO_TotThroughput": "B/s", + "Latency": "ms", + "Net_PacketRate6": "packets/s", + "Cache_Query_Hit": "count", + "IO_ReadSyscallRate": "count/s", + "Net_PacketRate4": "packets/s", + "Cache_Request_Miss": "count", + "CB_ConfiguredSize": "B", + "CB_TrippedEvents": "count", + "ThreadPool_RejectedReqs": "count", + "Disk_WaitTime": "ms", + "Net_TCP_TxQ": "segments/flow", + "Master_Task_Run_Time": "ms", + "IO_WriteSyscallRate": "count/s", + "IO_WriteThroughput": "B/s", + "Flush_Event": "count", + "Net_TCP_RxQ": "segments/flow", + "Refresh_Event": "count", + "Points_Memory": "B", + "Flush_Time": "ms", + "Heap_Init": "B", + "CPU_Utilization": "cores", + "HTTP_TotalRequests": "count", + "ThreadPool_ActiveThreads": "count", + "Cache_Query_Size": "B", + "Paging_MinfltRate": "count/s", + "Merge_Event": "count", + "Net_TCP_SendCWND": "B/flow", + "Cache_Request_Eviction": "count", + "Segments_Total": "count", + "Terms_Memory": 
"B", + "DocValues_Memory": "B", + "Heap_Used": "B", + "Cache_FieldData_Eviction": "count", + "IO_TotalSyscallRate": "count/s", + "CB_EstimatedSize": "B", + "Net_Throughput": "B/s", + "Paging_RSS": "pages", + "Indexing_ThrottleTime": "ms", + "StoredFields_Memory": "B", + "IndexWriter_Memory": "B", + "Master_PendingQueueSize": "count", + "Net_TCP_SSThresh": "B/flow", + "Cache_FieldData_Size": "B", + "Paging_MajfltRate": "count/s", + "ThreadPool_TotalThreads": "count", + "IO_ReadThroughput": "B/s", + "ShardEvents": "count", + "Net_TCP_NumFlows": "count" +} +``` diff --git a/docs/pa/dashboards.md b/docs/pa/dashboards.md new file mode 100644 index 00000000..3d532d1b --- /dev/null +++ b/docs/pa/dashboards.md @@ -0,0 +1,162 @@ +--- +layout: default +title: Create Dashboards +parent: Performance Analyzer +nav_order: 2 +--- + +# PerfTop dashboards + +Dashboards are defined in JSON and composed of three main elements: tables, line graphs, and bar graphs. You define a grid of rows and columns and then place elements within that grid, with each element spanning as many rows and columns as you specify. + +The best way to get started with building custom dashboards is to duplicate and modify one of the existing JSON files in the `dashboards` directory. +{: .tip } + +--- + +#### Table of contents +1. TOC +{:toc} + +--- + + +## Summary of elements + +- Tables show metrics per dimension. For example, if your metric is `CPU_Utilization` and your dimension `ShardID`, a PerfTop table shows a row for each shard on each node. +- Bar graphs are aggregated for the cluster, unless you add `nodeName` to the dashboard. See the [options for all elements](#all-elements). +- Line graphs are aggregated for each node. Each line represents a node. + + +## Position elements + +PerfTop positions elements within a grid. For example, consider this 12 * 12 grid. 
+ +![Dashboard grid](../../images/perftop-grid.png) + +The upper-left of the grid represents row 0, column 0, so the starting positions for the three boxes are: + +- Orange: row 0, column 0 +- Purple: row 2, column 2 +- Green: row 1, column 6 + +These boxes span a number of rows and columns. In this case: + +- Orange: 2 rows, 4 columns +- Purple: 1 row, 4 columns +- Green: 3 rows, 2 columns + +In JSON form, we have the following: + +```json +{ + "gridOptions": { + "rows": 12, + "cols": 12 + }, + "graphs": { + "tables": [{ + "options": { + "gridPosition": { + "row": 0, + "col": 0, + "rowSpan": 2, + "colSpan": 4 + } + } + }, + { + "options": { + "gridPosition": { + "row": 2, + "col": 2, + "rowSpan": 1, + "colSpan": 4 + } + } + }, + { + "options": { + "gridPosition": { + "row": 1, + "col": 6, + "rowSpan": 3, + "colSpan": 2 + } + } + } + ] + } +} +``` + +At this point, however, all the JSON does is define the size and position of three tables. To fill elements with data, you specify a query. + + +## Add queries + +Queries use the same elements as the [REST API](../api/), just in JSON form: + +```json +{ + "queryParams": { + "metrics": "estimated,limitConfigured", + "aggregates": "avg,avg", + "dimensions": "type", + "sortBy": "estimated" + } +} +``` + +For details on available metrics, see [Metrics reference](../reference/). + + +## Add options + +Options include labels, colors, and a refresh interval. Different element types have different options. + +Dashboards support the 16 ANSI colors: the eight base colors (black, red, green, yellow, blue, magenta, cyan, and white) and their "bright" variants, which you specify with the numbers 8--15. If your terminal supports 256 colors, you can also use hex codes (e.g. `#6D40ED`). +{: .note } + + +### All elements + +Option | Type | Description +:--- | :--- | :--- +`label` | String or integer | The text in the upper-left corner of the box. +`labelColor` | String or integer | The color of the label.
+`refreshInterval` | Integer | The number of milliseconds between calls to the Performance Analyzer API for new data. Minimum value is 5000. +`dimensionFilters` | String array | The dimension value to display for the graph. For example, if you query for `metrics=Net_Throughput&agg=sum&dim=Direction` and the possible dimension values are `in` and `out`, you can define `dimensionFilters: ["in"]` to only display the metric data for the `in` dimension. +`nodeName` | String | If non-null, lets you restrict elements to individual nodes. You can specify the node name directly in the dashboard file, but the better approach is to use `"nodeName": "#nodeName"` in the dashboard and include the `--nodename ` argument when starting PerfTop. + + +### Tables + +Option | Type | Description +:--- | :--- | :--- +`bg` | String or integer | The background color. +`fg` | String or integer | The text color. +`selectedFg` | String or integer | The text color for focused text. +`selectedBg` | String or integer | The background color for focused text. +`columnSpacing` | Integer | The amount of space (measured in characters) between columns. +`keys` | Boolean | Has no impact at this time. + + +### Bars + +Option | Type | Description +:--- | :--- | :--- +`barWidth` | Integer | The width of each bar (measured in characters) in the graph. +`xOffset` | Integer | The amount of space (measured in characters) between the y-axis and the first bar in the graph. +`maxHeight` | Integer | The maximum height of each bar (measured in characters) in the graph. + + +### Lines + +Option | Type | Description +:--- | :--- | :--- +`showNthLabel` | Integer | Which of the `xAxis` labels to show. For example, `"showNthLabel": 2` shows every other label. +`showLegend` | Boolean | Whether or not to display a legend for the line graph. +`legend.width` | Integer | The width of the legend (measured in characters) in the graph. +`xAxis` | String array | Array of labels for the x-axis.
For example, `["0:00", "0:10", "0:20", "0:30", "0:40", "0:50"]`. +`colors` | String array | Array of line colors to choose from. For example, `["magenta", "cyan"]`. If you don't provide this value, PerfTop chooses random colors for each line. diff --git a/docs/pa/index.md b/docs/pa/index.md new file mode 100644 index 00000000..f4ea41b9 --- /dev/null +++ b/docs/pa/index.md @@ -0,0 +1,101 @@ +--- +layout: default +title: Performance Analyzer +nav_order: 58 +has_children: true +--- + +# Performance Analyzer + +Performance Analyzer is an agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics, independent of the Java Virtual Machine (JVM). PerfTop is the default command line interface (CLI) for displaying those metrics. + +To download PerfTop, see [Download](https://opensearch.org/downloads.html) on the OpenSearch website. + +You can also install it using [npm](https://www.npmjs.com/): + +```bash +npm install -g @aws/opensearch-perftop +``` + +![PerfTop screenshot](../images/perftop.png) + + +## Get started with PerfTop + +The basic syntax is: + +```bash +./perf-top- --dashboard .json --endpoint +``` + +If you're using npm, the syntax is similar: + +```bash +perf-top --dashboard --endpoint +``` + +If you're running PerfTop from a node (i.e. locally), specify port 9600: + +```bash +./perf-top-linux --dashboard dashboards/.json --endpoint localhost:9600 +``` + +Otherwise, just specify the OpenSearch endpoint: + +```bash +./perf-top-macos --dashboard dashboards/.json --endpoint my-cluster.my-domain.com +``` + +PerfTop has four pre-built dashboards in the `dashboards` directory, but you can also [create your own](dashboards/). + +You can also load the pre-built dashboards (ClusterOverview, ClusterNetworkMemoryAnalysis, ClusterThreadAnalysis, or NodeAnalysis) without the JSON files, such as `--dashboard ClusterThreadAnalysis`. + +PerfTop has no interactivity. 
Start the application, monitor the dashboard, and press esc, q, or Ctrl + C to quit. +{: .note } + + +### Other options + +- For NodeAnalysis and similar custom dashboards, you can add the `--nodename ` argument if you want your dashboard to display metrics for only a single node. +- For troubleshooting, add the `--logfile .txt` argument. + + +## Performance Analyzer configuration + +### Storage + +Performance Analyzer uses `/dev/shm` for temporary storage. During heavy workloads on a cluster, Performance Analyzer can use up to 1 GB of space. + +Docker, however, has a default `/dev/shm` size of 64 MB. To change this value, you can use the `docker run --shm-size 1gb` flag or [a similar setting in Docker Compose](https://docs.docker.com/compose/compose-file/#shm_size). + +If you're not using Docker, check the size of `/dev/shm` using `df -h`. The default value is probably plenty, but if you need to change its size, add the following line to `/etc/fstab`: + +```bash +tmpfs /dev/shm tmpfs defaults,noexec,nosuid,size=1G 0 0 +``` + +Then remount the file system: + +```bash +mount -o remount /dev/shm +``` + + +### Security + +Performance Analyzer supports encryption in transit for requests. It currently does *not* support client or server authentication for requests. To enable encryption in transit, edit `performance-analyzer.properties` in your `$ES_HOME` directory: + +```bash +vi $ES_HOME/plugins/opensearch_performance_analyzer/pa_config/performance-analyzer.properties +``` + +Change the following lines to configure encryption in transit. 
Note that `certificate-file-path` must be a certificate for the server, not a root CA: + +``` +https-enabled = true + +#Setup the correct path for certificates +certificate-file-path = specify_path + +private-key-file-path = specify_path +``` diff --git a/docs/pa/rca/api.md b/docs/pa/rca/api.md new file mode 100644 index 00000000..d43ae681 --- /dev/null +++ b/docs/pa/rca/api.md @@ -0,0 +1,63 @@ +--- +layout: default +title: API +parent: Root Cause Analysis +grand_parent: Performance Analyzer +nav_order: 1 +--- + +# RCA API + +## Sample request + +``` +# Request all available RCAs +GET localhost:9600/_opensearch/_performanceanalyzer/rca + +# Request a specific RCA +GET localhost:9600/_opensearch/_performanceanalyzer/rca?name=HighHeapUsageClusterRca +``` + + +## Sample response + +```json +{ + "HighHeapUsageClusterRca": [{ + "rca_name": "HighHeapUsageClusterRca", + "state": "unhealthy", + "timestamp": 1587426650942, + "HotClusterSummary": [{ + "number_of_nodes": 2, + "number_of_unhealthy_nodes": 1, + "HotNodeSummary": [{ + "host_address": "192.168.144.2", + "node_id": "JtlEoRowSI6iNpzpjlbp_Q", + "HotResourceSummary": [{ + "resource_type": "old gen", + "threshold": 0.65, + "value": 0.81827232588145373, + "avg": NaN, + "max": NaN, + "min": NaN, + "unit_type": "heap usage in percentage", + "time_period_seconds": 600, + "TopConsumerSummary": [{ + "name": "CACHE_FIELDDATA_SIZE", + "value": 590702564 + }, + { + "name": "CACHE_REQUEST_SIZE", + "value": 28375 + }, + { + "name": "CACHE_QUERY_SIZE", + "value": 12687 + } + ], + }] + }] + }] + }] +} +``` diff --git a/docs/pa/rca/index.md b/docs/pa/rca/index.md new file mode 100644 index 00000000..765b9e23 --- /dev/null +++ b/docs/pa/rca/index.md @@ -0,0 +1,17 @@ +--- +layout: default +title: Root Cause Analysis +nav_order: 50 +parent: Performance Analyzer +has_children: true +--- + +# Root Cause Analysis + +The OpenSearch Performance Analyzer plugin (PA) captures OpenSearch and JVM activity, plus their lower-level resource usage 
(e.g. disk, network, CPU, and memory). Based on this instrumentation, Performance Analyzer computes and exposes diagnostic metrics so that administrators can measure and understand the bottlenecks in their OpenSearch clusters. + +The Root Cause Analysis framework (RCA) uses the information from PA to alert administrators about the root cause of performance and availability issues that their clusters might be experiencing. + +In broad strokes, the framework helps you access data streams from OpenSearch nodes running Performance Analyzer. You write snippets of Java to choose the streams that matter to you and evaluate the streams' PA metrics against certain thresholds. As RCA runs, you can access the state of each analysis using the REST API. + +To learn more about Root Cause Analysis, see [its repository on GitHub](https://github.com/opensearch-project/performance-analyzer-rca). diff --git a/docs/pa/rca/reference.md b/docs/pa/rca/reference.md new file mode 100644 index 00000000..8a853e3a --- /dev/null +++ b/docs/pa/rca/reference.md @@ -0,0 +1,11 @@ +--- +layout: default +title: RCA Reference +parent: Root Cause Analysis +grand_parent: Performance Analyzer +nav_order: 3 +--- + +# RCA reference + +You can find a reference of available RCAs and their purposes on [Github](https://github.com/opensearch-project/performance-analyzer-rca/tree/master/docs). diff --git a/docs/pa/reference.md b/docs/pa/reference.md new file mode 100644 index 00000000..77b61d89 --- /dev/null +++ b/docs/pa/reference.md @@ -0,0 +1,560 @@ +--- +layout: default +title: Metrics Reference +parent: Performance Analyzer +nav_order: 3 +--- + +# Metrics reference + +This page contains all Performance Analyzer metrics. All metrics support the `avg`, `sum`, `min`, and `max` aggregations, although certain metrics measure only one thing, making the choice of aggregation irrelevant. + +For information on dimensions, see the [dimensions reference](#dimensions-reference). + +This list is extensive. 
We recommend using Ctrl/Cmd + F to find what you're looking for. +{: .tip } +
Metric | Dimensions | Description
CPU_Utilization + ShardID, IndexName, Operation, ShardRole + CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds. +
Paging_MajfltRate + The number of major faults per second in the past five seconds. A major fault requires the process to load a memory page from disk. +
Paging_MinfltRate + The number of minor faults per second in the past five seconds. A minor fault does not require the process to load a memory page from disk. +
Paging_RSS + The number of pages the process has in real memory---the pages that count towards text, data, or stack space. This number does not include pages that have not been demand-loaded in or that are swapped out. +
Sched_Runtime + Time (seconds) spent executing on the CPU per context switch. +
Sched_Waittime + Time (seconds) spent waiting on a run queue per context switch. +
Sched_CtxRate + Number of times run on the CPU per second in the past five seconds. +
Heap_AllocRate + An approximation of the heap memory allocated, in bytes, per second in the past five seconds. +
IO_ReadThroughput + Number of bytes read per second in the last five seconds. +
IO_WriteThroughput + Number of bytes written per second in the last five seconds. +
IO_TotThroughput + Number of bytes read or written per second in the last five seconds. +
IO_ReadSyscallRate + Read system calls per second in the last five seconds. +
IO_WriteSyscallRate + Write system calls per second in the last five seconds. +
IO_TotalSyscallRate + Read and write system calls per second in the last five seconds. +
Thread_Blocked_Time + Average time (seconds) that the associated thread(s) blocked to enter or reenter a monitor. +
Thread_Blocked_Event + The total number of times that the associated thread(s) blocked to enter or reenter a monitor (i.e. the number of times a thread has been in the blocked state). +
ShardEvents + The total number of events executed on a shard in the past five seconds. +
ShardBulkDocs + The total number of documents indexed in the past five seconds. +
Indexing_ThrottleTime + ShardID, IndexName + Time (milliseconds) that the index has been under merge throttling control in the past five seconds. +
Cache_Query_Hit + The number of successful lookups in the query cache in the past five seconds. +
Cache_Query_Miss + The number of lookups in the query cache that failed to retrieve a `DocIdSet` in the past five seconds. `DocIdSet` is a set of document IDs in Lucene. +
Cache_Query_Size + Query cache memory size in bytes. +
Cache_FieldData_Eviction + The number of times OpenSearch has evicted data from the fielddata heap space (occurs when the heap space is full) in the past five seconds. +
Cache_FieldData_Size + Fielddata memory size in bytes. +
Cache_Request_Hit + The number of successful lookups in the shard request cache in the past five seconds. +
Cache_Request_Miss + The number of lookups in the request cache that failed to retrieve the results of search requests in the past five seconds. +
Cache_Request_Eviction + The number of times OpenSearch evicts data from shard request cache (occurs when the request cache is full) in the past five seconds. +
Cache_Request_Size + Shard request cache memory size in bytes. +
Refresh_Event + The total number of refreshes executed in the past five seconds. +
Refresh_Time + The total time (milliseconds) spent executing refreshes in the past five seconds. +
Flush_Event + The total number of flushes executed in the past five seconds. +
Flush_Time + The total time (milliseconds) spent executing flushes in the past five seconds. +
Merge_Event + The total number of merges executed in the past five seconds. +
Merge_Time + The total time (milliseconds) spent executing merges in the past five seconds. +
Merge_CurrentEvent + The current number of merges executing. +
Indexing_Buffer + Index buffer memory size in bytes. +
Segments_Total + The number of segments. +
Segments_Memory + Estimated memory usage of segments in bytes. +
Terms_Memory + Estimated memory usage of terms dictionaries in bytes. +
StoredFields_Memory + Estimated memory usage of stored fields in bytes. +
TermVectors_Memory + Estimated memory usage of term vectors in bytes. +
Norms_Memory + Estimated memory usage of norms (normalization factors) in bytes. +
Points_Memory + Estimated memory usage of points in bytes. +
DocValues_Memory + Estimated memory usage of doc values in bytes. +
IndexWriter_Memory + Estimated memory usage by the index writer in bytes. +
Bitset_Memory + Estimated memory usage for the cached bit sets in bytes. +
VersionMap_Memory + Estimated memory usage of the version map in bytes. +
Shard_Size_In_Bytes + Estimated disk usage of the shard in bytes. +
Latency + Operation, Exception, Indices, HTTPRespCode, ShardID, IndexName, ShardRole + Latency (milliseconds) of a request. +
GC_Collection_Event + MemType + The number of garbage collections that have occurred in the past five seconds. +
GC_Collection_Time + The approximate accumulated time (milliseconds) of all garbage collections that have occurred in the past five seconds. +
Heap_Committed + The amount of memory (bytes) that is committed for the JVM to use. +
Heap_Init + The amount of memory (bytes) that the JVM initially requests from the operating system for memory management. +
Heap_Max + The maximum amount of memory (bytes) that can be used for memory management. +
Heap_Used + The amount of used memory in bytes. +
Disk_Utilization + DiskName + Disk utilization rate: percentage of disk time spent reading and writing by the OpenSearch process in the past five seconds. +
Disk_WaitTime + Average duration (milliseconds) of read and write operations in the past five seconds. +
Disk_ServiceRate + Service rate: MB read or written per second in the past five seconds. This metric assumes that each disk sector stores 512 bytes. +
Net_TCP_NumFlows + DestAddr + Number of samples collected. Performance Analyzer collects one sample every five seconds. +
Net_TCP_TxQ + Average number of TCP packets in the send buffer. +
Net_TCP_RxQ + Average number of TCP packets in the receive buffer. +
Net_TCP_Lost + Average number of unrecovered recurring timeouts. This number is reset when the recovery finishes or `SND.UNA` is advanced. `SND.UNA` is the sequence number of the first byte of data that has been sent, but not yet acknowledged. +
Net_TCP_SendCWND + Average size (bytes) of the sending congestion window. +
Net_TCP_SSThresh + Average size (bytes) of the slow start size threshold. +
Net_PacketRate4 + Direction + The total number of IPv4 datagrams transmitted or received from or by interfaces per second, including those transmitted or received in error. +
Net_PacketDropRate4 + The total number of IPv4 datagrams transmitted or received in error per second. +
Net_PacketRate6 + The total number of IPv6 datagrams transmitted or received from or by interfaces per second, including those transmitted or received in error. +
Net_PacketDropRate6 + The total number of IPv6 datagrams transmitted or received in error per second. +
Net_Throughput + The number of bits transmitted or received per second by all network interfaces. +
ThreadPool_QueueSize + ThreadPoolType + The size of the task queue. +
ThreadPool_RejectedReqs + The number of rejected executions. +
ThreadPool_TotalThreads + The current number of threads in the pool. +
ThreadPool_ActiveThreads + The approximate number of threads that are actively executing tasks. +
Master_PendingQueueSize + N/A + The current number of pending tasks in the cluster state update thread. Each node has a cluster state update thread that submits cluster state update tasks (create index, update mapping, allocate shard, fail shard, etc.). +
HTTP_RequestDocs + Operation, Exception, Indices, HTTPRespCode + The number of items in the request (only for `_bulk` request type). +
HTTP_TotalRequests + The number of finished requests in the past five seconds. +
CB_EstimatedSize + CBType + The current number of estimated bytes. +
CB_TrippedEvents + The number of times the circuit breaker has tripped. +
CB_ConfiguredSize + The limit (bytes) for how much memory operations can use. +
Master_Task_Queue_Time + MasterTaskInsertOrder, MasterTaskPriority, MasterTaskType, MasterTaskMetadata + The time (milliseconds) that a master task spent in the queue. +
Master_Task_Run_Time + The time (milliseconds) that a master task spent executing. +
+ + +## Dimensions reference + +Dimension | Return values +:--- | :--- +ShardID | ID for the shard (e.g. `1`). +IndexName | Name of the index (e.g. `my-index`). +Operation | Type of operation (e.g. `shardbulk`). +ShardRole | `primary`, `replica` +Exception | OpenSearch exceptions (e.g. `org.opensearch.index_not_found_exception`). +Indices | The list of indices in the request URI. +HTTPRespCode | Response code from OpenSearch (e.g. `200`). +MemType | `totYoungGC`, `totFullGC`, `Survivor`, `PermGen`, `OldGen`, `Eden`, `NonHeap`, `Heap` +DiskName | Name of the disk (e.g. `sda1`). +DestAddr | Destination address (e.g. `010015AC`). +Direction | `in`, `out` +ThreadPoolType | The OpenSearch thread pools (e.g. `index`, `search`,`snapshot`). +CBType | `accounting`, `fielddata`, `in_flight_requests`, `parent`, `request` +MasterTaskInsertOrder | The order in which the task was inserted (e.g. `3691`). +MasterTaskPriority | Priority of the task (e.g. `URGENT`). OpenSearch executes higher priority tasks before lower priority ones, regardless of `insert_order`. +MasterTaskType | `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, `Update snapshot state` +MasterTaskMetadata | Metadata for the task (if any). diff --git a/docs/ppl/commands.md b/docs/ppl/commands.md new file mode 100644 index 00000000..1f9e2586 --- /dev/null +++ b/docs/ppl/commands.md @@ -0,0 +1,672 @@ +--- +layout: default +title: Commands +parent: Piped processing language +nav_order: 4 +--- + + +# Commands + +Start a PPL query with a `search` command to reference a table to search from. You can have the commands that follow in any order. 
+ +In the following example, the `search` command refers to an `accounts` index as the source, then uses the `where` command to filter documents and the `fields` command to select the fields to return: + +```sql +search source=accounts +| where age > 18 +| fields firstname, lastname +``` + +In the following examples, we represent required arguments in angle brackets `< >` and optional arguments in square brackets `[ ]`. +{: .note } + +## search + +Use the `search` command to retrieve documents from an index. You can only use the `search` command as the first command in the PPL query. + +### Syntax + +```sql +search source=<index> [boolean-expression] +``` + +Field | Description | Required +:--- | :--- |:--- +`search` | The `search` keyword. You can omit it. | No +`index` | Specify which index to query from. | Yes +`bool-expression` | Specify an expression that evaluates to a boolean value. | No + +*Example 1*: Get all documents + +To get all documents from the `accounts` index: + +```sql +search source=accounts; +``` + +| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname | +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke +| 6 | Hattie | 671 Bristol Street | 5686 | M | Dante | Netagy | TN | 36 | hattiebond@netagy.com | Bond +| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates +| 18 | Dale | 467 Hutchinson Court | 4180 | M | Orick | null | MD | 33 | daleadams@boink.com | Adams + +*Example 2*: Get documents that match a condition + +To get all documents from the `accounts` index that have either `account_number` equal to 1 or `gender` equal to `F`: + +```sql +search source=accounts account_number=1 or gender="F"; +``` + +| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname | +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | +| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke | +| 13 | Nanette | 789 Madison
Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates | + +## dedup + +The `dedup` (data deduplication) command removes duplicate documents, as defined by one or more fields, from the search result. + +### Syntax + +```sql +dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>] +``` + +Field | Description | Type | Required | Default +:--- | :--- |:--- |:--- |:--- +`int` | Retain the specified number of duplicate events for each combination. The number must be greater than 0. If you do not specify a number, only the first occurring event is kept and all other duplicates are removed from the results. | `integer` | No | 1 +`keepempty` | If true, keep the document if any field in the field list has a null value or is missing. | `boolean` | No | False +`consecutive` | If true, remove only consecutive events with duplicate combinations of values. | `boolean` | No | False +`field-list` | Specify a comma-delimited field list. At least one field is required. | - | Yes | - + +*Example 1*: Dedup by one field + +To remove duplicate documents with the same gender: + +```sql +search source=accounts | dedup gender | fields account_number, gender; +``` + +| account_number | gender +:--- | :--- | +1 | M +13 | F + + +*Example 2*: Keep two duplicate documents + +To keep two duplicate documents with the same gender: + +```sql +search source=accounts | dedup 2 gender | fields account_number, gender; +``` + +| account_number | gender +:--- | :--- | +1 | M +6 | M +13 | F + +*Example 3*: Keep or ignore documents with an empty field + +To keep documents that have a `null` field value: + +```sql +search source=accounts | dedup email keepempty=true | fields account_number, email; +``` + +| account_number | email +:--- | :--- | +1 | amberduke@pyrami.com +6 | hattiebond@netagy.com +13 | null +18 | daleadams@boink.com + +To remove documents that have a `null` field value (the default behavior): + +```sql +search source=accounts | dedup email | fields account_number, email; +``` + +| account_number | email +:--- | 
:--- | +1 | amberduke@pyrami.com +6 | hattiebond@netagy.com +18 | daleadams@boink.com + +*Example 4*: Dedup of consecutive documents + +To remove duplicates of consecutive documents: + +```sql +search source=accounts | dedup gender consecutive=true | fields account_number, gender; +``` + +| account_number | gender +:--- | :--- | +1 | M +13 | F +18 | M + +## eval + +The `eval` command evaluates an expression and appends its result to the search result. + +### Syntax + +```sql +eval <field>=<expression> ["," <field>=<expression> ]... +``` + +Field | Description | Required +:--- | :--- |:--- +`field` | If a field name does not exist, a new field is added. If the field name already exists, it's overwritten. | Yes +`expression` | Specify any supported expression. | Yes + +*Example 1*: Create a new field + +To create a new `doubleAge` field for each document, where `doubleAge` is the result of `age` multiplied by 2: + +```sql +search source=accounts | eval doubleAge = age * 2 | fields age, doubleAge; +``` + +| age | doubleAge +:--- | :--- | +32 | 64 +36 | 72 +28 | 56 +33 | 66 + +*Example 2*: Overwrite the existing field + +To overwrite the `age` field with `age` plus 1: + +```sql +search source=accounts | eval age = age + 1 | fields age; +``` + +| age +:--- | +| 33 +| 37 +| 29 +| 34 + +*Example 3*: Create a new field with a field defined with the `eval` command + +To create a new `ddAge` field, where `ddAge` is the result of `doubleAge` multiplied by 2 and `doubleAge` is defined in the same `eval` command: + +```sql +search source=accounts | eval doubleAge = age * 2, ddAge = doubleAge * 2 | fields age, doubleAge, ddAge; +``` + +| age | doubleAge | ddAge +:--- | :--- | :--- | +| 32 | 64 | 128 +| 36 | 72 | 144 +| 28 | 56 | 112 +| 33 | 66 | 132 + +## fields + +Use the `fields` command to keep or remove fields from a search result. + +### Syntax + +```sql +fields [+|-] <field-list> +``` + +Field | Description | Required | Default +:--- | :--- |:---|:--- +`[+|-]` | Plus (+) keeps only fields specified in the field list.
Minus (-) removes all fields specified in the field list. | No | + +`field-list` | Specify a comma-delimited list of fields. | Yes | No default + +*Example 1*: Select specified fields from result + +To get `account_number`, `firstname`, and `lastname` fields from a search result: + +```sql +search source=accounts | fields account_number, firstname, lastname; +``` + +| account_number | firstname | lastname +:--- | :--- | :--- | +| 1 | Amber | Duke +| 6 | Hattie | Bond +| 13 | Nanette | Bates +| 18 | Dale | Adams + +*Example 2*: Remove specified fields from a search result + +To remove the `account_number` field from the search results: + +```sql +search source=accounts | fields account_number, firstname, lastname | fields - account_number; +``` + +| firstname | lastname +:--- | :--- | +| Amber | Duke +| Hattie | Bond +| Nanette | Bates +| Dale | Adams + +## rename + +Use the `rename` command to rename one or more fields in the search result. + +### Syntax + +```sql +rename <source-field> AS <target-field> ["," <source-field> AS <target-field>]... +``` + +Field | Description | Required +:--- | :--- |:--- +`source-field` | The name of the field that you want to rename. | Yes +`target-field` | The name you want to rename to. | Yes + +*Example 1*: Rename one field + +Rename the `account_number` field as `an`: + +```sql +search source=accounts | rename account_number as an | fields an; +``` + +| an +:--- | +| 1 +| 6 +| 13 +| 18 + +*Example 2*: Rename multiple fields + +Rename the `account_number` field as `an` and `employer` as `emp`: + +```sql +search source=accounts | rename account_number as an, employer as emp | fields an, emp; +``` + +| an | emp +:--- | :--- | +| 1 | Pyrami +| 6 | Netagy +| 13 | Quility +| 18 | null + +## sort + +Use the `sort` command to sort search results by a specified field. + +### Syntax + +```sql +sort [count] <[+|-] sort-field>... +``` + +Field | Description | Required | Default +:--- | :--- |:--- |:--- +`count` | The maximum number of results to return from the sorted result. If count=0, all results are returned.
| No | 1000 +`[+|-]` | Use plus [+] to sort by ascending order and minus [-] to sort by descending order. | No | Ascending order +`sort-field` | Specify the field that you want to sort by. | Yes | - + +*Example 1*: Sort by one field + +To sort all documents by the `age` field in ascending order: + +```sql +search source=accounts | sort age | fields account_number, age; +``` + +| account_number | age | +:--- | :--- | +| 13 | 28 +| 1 | 32 +| 18 | 33 +| 6 | 36 + +*Example 2*: Sort by one field and return all results + +To sort all documents by the `age` field in ascending order and specify count as 0 to get back all results: + +```sql +search source=accounts | sort 0 age | fields account_number, age; +``` + +| account_number | age | +:--- | :--- | +| 13 | 28 +| 1 | 32 +| 18 | 33 +| 6 | 36 + +*Example 3*: Sort by one field in descending order + +To sort all documents by the `age` field in descending order: + +```sql +search source=accounts | sort - age | fields account_number, age; +``` + +| account_number | age | +:--- | :--- | +| 6 | 36 +| 18 | 33 +| 1 | 32 +| 13 | 28 + +*Example 4*: Specify the number of sorted documents to return + +To sort all documents by the `age` field in ascending order and specify count as 2 to get back two results: + +```sql +search source=accounts | sort 2 age | fields account_number, age; +``` + +| account_number | age | +:--- | :--- | +| 13 | 28 +| 1 | 32 + +*Example 5*: Sort by multiple fields + +To sort all documents by the `gender` field in ascending order and `age` field in descending order: + +```sql +search source=accounts | sort + gender, - age | fields account_number, gender, age; +``` + +| account_number | gender | age | +:--- | :--- | :--- | +| 13 | F | 28 +| 6 | M | 36 +| 18 | M | 33 +| 1 | M | 32 + +## stats + +Use the `stats` command to aggregate from search results. 
+ +The following table lists the aggregation functions and also indicates how each one handles null or missing values: + +Function | NULL | MISSING +:--- | :--- |:--- +`COUNT` | Not counted | Not counted +`SUM` | Ignored | Ignored +`AVG` | Ignored | Ignored +`MAX` | Ignored | Ignored +`MIN` | Ignored | Ignored + + +### Syntax + +``` +stats <aggregation>... [by-clause]... +``` + +Field | Description | Required | Default +:--- | :--- |:--- |:--- +`aggregation` | Specify a statistical aggregation function. The argument of this function must be a field. | Yes | - +`by-clause` | Specify one or more fields to group the results by. If not specified, the `stats` command returns only one row, which is the aggregation over the entire result set. | No | - + +*Example 1*: Calculate the average value of a field + +To calculate the average `age` of all documents: + +```sql +search source=accounts | stats avg(age); +``` + +| avg(age) +:--- | +| 32.25 + +*Example 2*: Calculate the average value of a field by group + +To calculate the average age grouped by gender: + +```sql +search source=accounts | stats avg(age) by gender; +``` + +| gender | avg(age) +:--- | :--- | +| F | 28.0 +| M | 33.666666666666664 + +*Example 3*: Calculate the average and sum of a field by group + +To calculate the average and sum of age grouped by gender: + +```sql +search source=accounts | stats avg(age), sum(age) by gender; +``` + +| gender | avg(age) | sum(age) +:--- | :--- | :--- | +| F | 28 | 28 +| M | 33.666666666666664 | 101 + +*Example 4*: Calculate the maximum value of a field + +To calculate the maximum age: + +```sql +search source=accounts | stats max(age); +``` + +| max(age) +:--- | +| 36 + +*Example 5*: Calculate the maximum and minimum values of a field by group + +To calculate the maximum and minimum age values grouped by gender: + +```sql +search source=accounts | stats max(age), min(age) by gender; +``` + +| gender | min(age) | max(age) +:--- | :--- | :--- | +| F | 28 | 28 +| M | 32 | 36 + +## where + +Use the `where`
command with a boolean expression to filter the search result. The `where` command returns a document only when the boolean expression evaluates to true. + +### Syntax + +```sql +where <boolean-expression> +``` + +Field | Description | Required +:--- | :--- |:--- +`bool-expression` | An expression that evaluates to a boolean value. | No + +*Example 1*: Filter result set with a condition + +To get all documents from the `accounts` index where `account_number` is 1 or `gender` is `F`: + +```sql +search source=accounts | where account_number=1 or gender="F" | fields account_number, gender; +``` + +| account_number | gender +:--- | :--- | +| 1 | M +| 13 | F + +## head + +Use the `head` command to return the first N results in a specified search order. + +### Syntax + +```sql +head [keeplast = (true | false)] [while "(" <boolean-expression> ")"] [N] +``` + +Field | Description | Required | Default +:--- | :--- |:--- |:--- +`keeplast` | Use along with the `while` argument to control whether the last result in the result set is retained. The last result is the one that caused the `while` condition to evaluate to false or NULL. Set `keeplast` to true to retain the last result and false to discard it. | No | True +`while` | An expression that evaluates to either true or false. You cannot use statistical functions in this expression. | No | False +`N` | Specify the number of results to return.
| No | 10 + +*Example 1*: Get the first 10 results + +To get the first 10 results: + +```sql +search source=accounts | fields firstname, age | head; +``` + +| firstname | age +:--- | :--- | +| Amber | 32 +| Hattie | 36 +| Nanette | 28 + +*Example 2*: Get the first N results + +To get the first two results: + +```sql +search source=accounts | fields firstname, age | head 2; +``` + +| firstname | age +:--- | :--- | +| Amber | 32 +| Hattie | 36 + +*Example 3*: Get the first N results that match a while condition + +To get up to 3 results, sorted by age, while `age` is less than 30. Because `keeplast` defaults to true, the first result that fails the condition is also returned: + +```sql +search source=accounts | fields firstname, age | sort age | head while(age < 30) 3; +``` + +| firstname | age +:--- | :--- | +| Nanette | 28 +| Amber | 32 + +*Example 4*: Get the first N results with a while condition, discarding the last result that failed the condition + +To get up to 3 results while `age` is less than 30 and discard the result that failed the condition: + +```sql +search source=accounts | fields firstname, age | sort age | head keeplast=false while(age < 30) 3; +``` + +| firstname | age +:--- | :--- | +| Nanette | 28 + +## rare + +Use the `rare` command to find the least common values of all fields in a field list. +The command returns a maximum of 10 results for each distinct set of values of the group-by fields. + +### Syntax + +```sql +rare <field-list> [by-clause] +``` + +Field | Description | Required +:--- | :--- |:--- +`field-list` | Specify a comma-delimited list of field names. | Yes +`by-clause` | Specify one or more fields to group the results by.
| No + +*Example 1*: Find the least common values in a field + +To find the least common values of gender: + +```sql +search source=accounts | rare gender; +``` + +| gender +:--- | +| F +| M + +*Example 2*: Find the least common values grouped by gender + +To find the least common age grouped by gender: + +```sql +search source=accounts | rare age by gender; +``` + +| gender | age +:--- | :--- | +| F | 28 +| M | 32 +| M | 33 + +## top {#top-command} + +Use the `top` command to find the most common values of all fields in the field list. + +### Syntax + +```sql +top [N] <field-list> [by-clause] +``` + +Field | Description | Default +:--- | :--- |:--- +`N` | Specify the number of results to return. | 10 +`field-list` | Specify a comma-delimited list of field names. | - +`by-clause` | Specify one or more fields to group the results by. | - + +*Example 1*: Find the most common values in a field + +To find the most common genders: + +```sql +search source=accounts | top gender; +``` + +| gender +:--- | +| M +| F + +*Example 2*: Find the most common value in a field + +To find the most common gender: + +```sql +search source=accounts | top 1 gender; +``` + +| gender +:--- | +| M + +*Example 3*: Find the most common values grouped by gender + +To find the most common age grouped by gender: + +```sql +search source=accounts | top 1 age by gender; +``` + +| gender | age +:--- | :--- | +| F | 28 +| M | 32 diff --git a/docs/ppl/datatypes.md b/docs/ppl/datatypes.md new file mode 100644 index 00000000..5073e06d --- /dev/null +++ b/docs/ppl/datatypes.md @@ -0,0 +1,36 @@ +--- +layout: default +title: Data Types +parent: Piped processing language +nav_order: 6 +--- + + +# Data types + +The following table shows the data types supported by the PPL plugin and how each one maps to OpenSearch and SQL data types: + +PPL Type | OpenSearch Type | SQL Type +:--- | :--- | :--- +boolean | boolean | BOOLEAN +byte | byte | TINYINT +short | short | SMALLINT +integer | integer | INTEGER +long | long | BIGINT
+float | float | REAL +float | half_float | FLOAT +float | scaled_float | DOUBLE +double | double | DOUBLE +string | keyword | VARCHAR +text | text | VARCHAR +timestamp | date | TIMESTAMP +ip | ip | VARCHAR +timestamp | date | TIMESTAMP +binary | binary | VARBINARY +struct | object | STRUCT +array | nested | STRUCT + +In addition to this list, the PPL plugin also supports the `datetime` type, though it doesn't have a corresponding mapping with OpenSearch. +To use a function without a corresponding mapping, you must explicitly convert the data type to one that does. + +The PPL plugin supports all SQL date and time types. To learn more, see [SQL Data Types](../../sql/datatypes/). diff --git a/docs/ppl/endpoint.md b/docs/ppl/endpoint.md new file mode 100644 index 00000000..4ea6919a --- /dev/null +++ b/docs/ppl/endpoint.md @@ -0,0 +1,22 @@ +--- +layout: default +title: Endpoint +parent: Piped processing language +nav_order: 1 +--- + +# Endpoint + +To send a query request to the PPL plugin, use an HTTP POST request. +We recommend a POST request because it doesn't have any length limit and it allows you to pass other parameters to the plugin for other functionality. + +Use the explain endpoint for query translation and troubleshooting. + +## Request Format + +To use the PPL plugin with your own applications, send requests to `_opensearch/_ppl`, with your query in the request body: + +```bash +curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_ppl \ +  -d '{"query" : "source=accounts | fields firstname, lastname"}' +``` diff --git a/docs/ppl/functions.md b/docs/ppl/functions.md new file mode 100644 index 00000000..59cf3eb8 --- /dev/null +++ b/docs/ppl/functions.md @@ -0,0 +1,10 @@ +--- +layout: default +title: Functions +parent: Piped processing language +nav_order: 10 +--- + +# Functions + +The PPL plugin supports all SQL functions. To learn more, see [SQL Functions](../../sql/functions/).
diff --git a/docs/ppl/identifiers.md b/docs/ppl/identifiers.md new file mode 100644 index 00000000..aac5b744 --- /dev/null +++ b/docs/ppl/identifiers.md @@ -0,0 +1,72 @@ +--- +layout: default +title: Identifiers +parent: Piped processing language +nav_order: 7 +--- + + +# Identifiers + +An identifier is a name for a database object, such as an index name, a field name, or an alias. +OpenSearch supports two types of identifiers: regular identifiers and delimited identifiers. + +## Regular identifiers + +A regular identifier is a string of characters that starts with an ASCII letter (lower or upper case). +Subsequent characters can be letters, digits, or underscores (_). It can't be a reserved keyword. +Whitespace and other special characters are also not allowed. + +OpenSearch additionally supports the following as regular identifiers: + +1. Identifiers prefixed by a dot `.` sign. Use to hide an index. For example `.opensearch-dashboards`. +2. Identifiers prefixed by an `@` sign. Use for meta fields generated by Logstash ingestion. +3. Identifiers with hyphen `-` in the middle. Use for index names with date information. +4. Identifiers with star `*` present. Use for wildcard match of index patterns. + +For regular identifiers, you can use the name without any back ticks or escape characters. +In this example, `source`, `fields`, `account_number`, `firstname`, and `lastname` are all identifiers. Out of these, `source` is a reserved identifier. + +```sql +source=accounts | fields account_number, firstname, lastname; +``` + +| account_number | firstname | lastname | +:--- | :--- | :--- | +| 1 | Amber | Duke +| 6 | Hattie | Bond +| 13 | Nanette | Bates +| 18 | Dale | Adams + + +## Delimited identifiers + +A delimited identifier can contain special characters not allowed by a regular identifier. +You must enclose delimited identifiers with back ticks (\`\`). Back ticks differentiate the identifier from special characters.
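
As a rough illustration of the back tick rule, the following Python sketch decides whether a name needs delimiting before it goes into a query string. The `quote_identifier` helper is hypothetical and not part of the PPL plugin. It assumes the strict definition of a regular identifier given above (an ASCII letter followed by letters, digits, or underscores) and quotes everything else. It does not check for reserved keywords, which also need back ticks.

```python
import re

# Assumption: "regular" means an ASCII letter followed by letters, digits,
# or underscores. Reserved keywords are NOT detected by this sketch.
_REGULAR_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Return the name as-is if it is regular, otherwise wrap it in back ticks."""
    if not name:
        raise ValueError("identifier must not be empty")
    if "`" in name:
        # Escaping a back tick inside a delimited identifier is out of scope here.
        raise ValueError("identifier must not contain a back tick")
    if _REGULAR_IDENTIFIER.fullmatch(name):
        return name
    return f"`{name}`"


if __name__ == "__main__":
    # Build a query against an index whose name contains dots and a hyphen.
    index = quote_identifier("log-2021.01.11")
    print(f"source={index} | fields {quote_identifier('account_number')}")
```

Quoting more names than strictly necessary is harmless because a delimited identifier is also valid where a regular one would be, so the sketch errs on the side of adding back ticks.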
+ +If the index name includes a dot (`.`), for example, `log-2021.01.11`, use delimited identifiers with back ticks to escape it \``log-2021.01.11`\`. + +Typical examples of using delimited identifiers: + +1. Identifiers with reserved keywords. +2. Identifiers with a `.` present. Similarly, `-` to include date information. +3. Identifiers with other special characters. For example, Unicode characters. + +To quote an index name with back ticks: + +```sql +source=`accounts` | fields `account_number`; +``` + +| account_number | +:--- | +| 1 | +| 6 | +| 13 | +| 18 | + +## Case sensitivity + +Identifiers are case sensitive. They must be exactly the same as what's stored in OpenSearch. + +For example, if you run `source=Accounts`, you'll get an index not found exception because the actual index name is in lower case. diff --git a/docs/ppl/index.md b/docs/ppl/index.md new file mode 100644 index 00000000..ec68cee3 --- /dev/null +++ b/docs/ppl/index.md @@ -0,0 +1,58 @@ +--- +layout: default +title: Piped processing language +nav_order: 42 +has_children: true +has_toc: false +--- + +# Piped Processing Language + +Piped Processing Language (PPL) is a query language that lets you use pipe (`|`) syntax to explore, discover, and query data stored in OpenSearch. + +To quickly get up and running with PPL, use **Query Workbench** in OpenSearch Dashboards. To learn more, see [Workbench](../sql/workbench/). + +The PPL syntax consists of commands delimited by the pipe character (`|`) where data flows from left to right through each pipeline. + +```sql +search command | command 1 | command 2 ... +``` + +You can only use read-only commands like `search`, `where`, `fields`, `rename`, `dedup`, `stats`, `sort`, `eval`, `head`, `top`, and `rare`. 
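
The pipe syntax can be sketched in a few lines of Python. The `build_ppl` and `request_body` helpers below are hypothetical and only illustrate how commands chain from left to right. The set of allowed commands is copied from the read-only list above, and the `{"query": ...}` body shape is the one shown on the Endpoint page.

```python
import json

# Read-only commands, copied from the list above.
READ_ONLY_COMMANDS = {
    "search", "where", "fields", "rename", "dedup",
    "stats", "sort", "eval", "head", "top", "rare",
}


def build_ppl(source: str, *commands: str) -> str:
    """Join a search command and follow-up commands with the pipe character."""
    for command in commands:
        parts = command.split(maxsplit=1)
        if not parts:
            raise ValueError("empty command")
        if parts[0] not in READ_ONLY_COMMANDS:
            raise ValueError(f"unsupported command: {parts[0]}")
    return " | ".join([f"search source={source}", *commands])


def request_body(query: str) -> str:
    """Wrap a PPL query in the JSON body expected by the PPL endpoint."""
    return json.dumps({"query": query})


if __name__ == "__main__":
    query = build_ppl("accounts", "where age > 18", "fields firstname, lastname")
    print(query)
    print(request_body(query))
```

The helper only validates the first word of each command. It doesn't parse arguments, so a malformed expression still reaches the plugin and is rejected there.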
+ +## Quick start + +To get started with PPL, choose **Dev Tools** in OpenSearch Dashboards and use the `bulk` operation to index some sample data: + +```json +PUT accounts/_bulk?refresh +{"index":{"_id":"1"}} +{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"} +{"index":{"_id":"6"}} +{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"} +{"index":{"_id":"13"}} +{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","city":"Nogal","state":"VA"} +{"index":{"_id":"18"}} +{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","email":"daleadams@boink.com","city":"Orick","state":"MD"} +``` + +Go to **Query Workbench** and select **PPL**. + +The following example returns `firstname` and `lastname` fields for documents in an `accounts` index with `age` greater than 18: + +```json +search source=accounts +| where age > 18 +| fields firstname, lastname +``` + +#### Sample Response + +| id | firstname | lastname | +:--- | :--- | :--- | +| 0 | Amber | Duke +| 1 | Hattie | Bond +| 2 | Nanette | Bates +| 3 | Dale | Adams + +![PPL query workbench](../images/ppl.png) diff --git a/docs/ppl/protocol.md b/docs/ppl/protocol.md new file mode 100644 index 00000000..747a265c --- /dev/null +++ b/docs/ppl/protocol.md @@ -0,0 +1,71 @@ +--- +layout: default +title: Protocol +parent: Piped processing language +nav_order: 2 +--- + +# Protocol + +The PPL plugin provides responses in JDBC format. The JDBC format is widely used because it provides schema information and more functionality such as pagination. 
Besides the JDBC driver, various clients can benefit from the detailed and well-formatted response. + +## Response Format + +The body of the HTTP POST request can include additional fields along with the PPL query: + +```bash +curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_ppl \ +  -d '{"query" : "source=accounts | fields firstname, lastname"}' +``` + +The following example shows a normal response, where `schema` lists each field name with its type and `datarows` contains the result set: + +```json +{ + "schema": [ + { + "name": "firstname", + "type": "string" + }, + { + "name": "lastname", + "type": "string" + } + ], + "datarows": [ + [ + "Amber", + "Duke" + ], + [ + "Hattie", + "Bond" + ], + [ + "Nanette", + "Bates" + ], + [ + "Dale", + "Adams" + ] + ], + "total": 4, + "size": 4 +} +``` + +If an error occurs, the error message and its cause are returned instead: + +```json +curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_ppl \ +  -d '{"query" : "source=unknown | fields firstname, lastname"}' +{ + "error": { + "reason": "Error occurred in OpenSearch engine: no such index [unknown]", + "details": "org.opensearch.index.IndexNotFoundException: no such index [unknown]\nFor more details, please send request for Json format to see the raw response from opensearch engine.", + "type": "IndexNotFoundException" + }, + "status": 404 +} +``` diff --git a/docs/ppl/settings.md b/docs/ppl/settings.md new file mode 100644 index 00000000..742d3e2b --- /dev/null +++ b/docs/ppl/settings.md @@ -0,0 +1,35 @@ +--- +layout: default +title: Settings +parent: Piped processing language +nav_order: 3 +--- + +# Settings + +The PPL plugin adds a few settings to the standard OpenSearch cluster settings. Most are dynamic, so you can change the default behavior of the plugin without restarting your cluster.
+ +You can update these settings like any other cluster setting: + +```json +PUT _cluster/settings +{ + "transient": { + "opensearch": { + "ppl": { + "enabled": "false" + } + } + } +} +``` + +Requests to `_opensearch/_ppl` include index names in the request body, so they have the same access policy considerations as the `bulk`, `mget`, and `msearch` operations. If you set the `rest.action.multi.allow_explicit_index` parameter to `false`, the PPL plugin is disabled. + +You can specify the settings shown in the following table: + +Setting | Description | Default +:--- | :--- | :--- +`opensearch.ppl.enabled` | Change to `false` to disable the plugin. | True +`opensearch.ppl.query.memory_limit` | Set the heap memory usage limit. If a query exceeds this limit, it's terminated. | 85% +`opensearch.query.size_limit` | Set the maximum number of results that you want to see. This impacts the accuracy of aggregation operations. For example, if you have 1000 documents in an index, by default, only 200 documents are extracted from the index for aggregation. | 200 diff --git a/docs/security/access-control/api.md b/docs/security/access-control/api.md new file mode 100644 index 00000000..e043ffe4 --- /dev/null +++ b/docs/security/access-control/api.md @@ -0,0 +1,1257 @@ +--- +layout: default +title: API +parent: Access Control +grand_parent: Security +nav_order: 90 +--- + +# API + +The security plugin REST API lets you programmatically create and manage users, roles, role mappings, action groups, and tenants. + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Access control for the API + +Just like OpenSearch permissions, you control access to the security plugin REST API using roles. Specify roles in `opensearch.yml`: + +```yml +opensearch_security.restapi.roles_enabled: ["<role>", ...] +``` + +These roles can now access all APIs. To prevent access to certain APIs: + +```yml +opensearch_security.restapi.endpoints_disabled.<role>.<endpoint>: ["<method>", ...] 
+``` + +Possible values for `endpoint` are: + +- ACTIONGROUPS +- ROLES +- ROLESMAPPING +- INTERNALUSERS +- CONFIG +- CACHE +- LICENSE +- SYSTEMINFO + +Possible values for `method` are: + +- GET +- PUT +- POST +- DELETE +- PATCH + +For example, the following configuration grants three roles access to the REST API, but then prevents `test-role` from making PUT, POST, DELETE, or PATCH requests to `_opensearch/_security/api/roles` or `_opensearch/_security/api/internalusers`: + +```yml +opensearch_security.restapi.roles_enabled: ["all_access", "security_rest_api_access", "test-role"] +opensearch_security.restapi.endpoints_disabled.test-role.ROLES: ["PUT", "POST", "DELETE", "PATCH"] +opensearch_security.restapi.endpoints_disabled.test-role.INTERNALUSERS: ["PUT", "POST", "DELETE", "PATCH"] +``` + +To use the PUT and PATCH methods for the [configuration APIs](#configuration), add the following line to `opensearch.yml`: + +```yml +opensearch_security.unsupported.restapi.allow_securityconfig_modification: true +``` + + +## Reserved and hidden resources + +You can mark users, roles, role mappings, and action groups as reserved. Resources that have this flag set to true can't be changed using the REST API or OpenSearch Dashboards. + +To mark a resource as reserved, add the following flag: + +```yml +opensearch_dashboards_user: + reserved: true +``` + +Likewise, you can mark users, roles, role mappings, and action groups as hidden. Resources that have this flag set to true are not returned by the REST API and are not visible in OpenSearch Dashboards: + +```yml +opensearch_dashboards_user: + hidden: true +``` + +Hidden resources are automatically reserved. + +To add or remove these flags, you need to modify `plugins/opensearch_security/securityconfig/internal_users.yml` and run `plugins/opensearch_security/tools/securityadmin.sh`. + + +--- + +## Account + +### Get account details + +Returns account details for the current user.
For example, if you sign the request as the `admin` user, the response includes details for that user. + + +#### Request + +``` +GET _opensearch/_security/api/account +``` + +#### Sample response + +```json +{ + "user_name": "admin", + "is_reserved": true, + "is_hidden": false, + "is_internal_user": true, + "user_requested_tenant": null, + "backend_roles": [ + "admin" + ], + "custom_attribute_names": [], + "tenants": { + "global_tenant": true, + "admin_tenant": true, + "admin": true + }, + "roles": [ + "all_access", + "own_index" + ] +} +``` + + +### Change password + +Changes the password for the current user. + + +#### Request + +```json +PUT _opensearch/_security/api/account +{ + "current_password" : "old-password", + "password" : "new-password" +} +``` + + +#### Sample response + +```json +{ + "status": "OK", + "message": "'test-user' updated." +} +``` + + +--- + +## Action groups + +### Get action group + +Retrieves one action group. + + +#### Request + +``` +GET _opensearch/_security/api/actiongroups/ +``` + +#### Sample response + +```json +{ + "custom_action_group": { + "reserved": false, + "hidden": false, + "allowed_actions": [ + "opensearch_dashboards_all_read", + "indices:admin/aliases/get", + "indices:admin/aliases/exists" + ], + "description": "My custom action group", + "static": false + } +} +``` + + +### Get action groups + +Retrieves all action groups. + + +#### Request + +``` +GET _opensearch/_security/api/actiongroups/ +``` + + +#### Sample response + +```json +{ + "read": { + "reserved": true, + "hidden": false, + "allowed_actions": [ + "indices:data/read*", + "indices:admin/mappings/fields/get*" + ], + "type": "index", + "description": "Allow all read operations", + "static": true + }, + ... +} +``` + + +### Delete action group + +#### Request + +``` +DELETE _opensearch/_security/api/actiongroups/ +``` + +#### Sample response + +```json +{ + "status":"OK", + "message":"actiongroup SEARCH deleted." 
+} +``` + + +### Create action group + +Creates or replaces the specified action group. + +#### Request + +```json +PUT _opensearch/_security/api/actiongroups/ +{ + "allowed_actions": [ + "indices:data/write/index*", + "indices:data/write/update*", + "indices:admin/mapping/put", + "indices:data/write/bulk*", + "read", + "write" + ] +} +``` + +#### Sample response + +```json +{ + "status": "CREATED", + "message": "'my-action-group' created." +} +``` + + +### Patch action group + +Updates individual attributes of an action group. + +#### Request + +```json +PATCH _opensearch/_security/api/actiongroups/ +[ + { + "op": "replace", "path": "/allowed_actions", "value": ["indices:admin/create", "indices:admin/mapping/put"] + } +] +``` + +#### Sample response + +```json +{ + "status":"OK", + "message":"actiongroup SEARCH deleted." +} +``` + + +### Patch action groups + +Creates, updates, or deletes multiple action groups in a single call. + +#### Request + +```json +PATCH _opensearch/_security/api/actiongroups +[ + { + "op": "add", "path": "/CREATE_INDEX", "value": { "allowed_actions": ["indices:admin/create", "indices:admin/mapping/put"] } + }, + { + "op": "remove", "path": "/CRUD" + } +] +``` + +#### Sample response + +```json +{ + "status":"OK", + "message":"actiongroup SEARCH deleted." +} +``` + + +--- + +## Users + +These calls let you create, update, and delete internal users. If you use an external authentication backend, you probably don't need to worry about internal users. 
+ + +### Get user + +#### Request + +``` +GET _opensearch/_security/api/internalusers/ +``` + + +#### Sample response + +```json +{ + "kirk": { + "hash": "", + "roles": [ "captains", "starfleet" ], + "attributes": { + "attribute1": "value1", + "attribute2": "value2" + } + } +} +``` + + +### Get users + +#### Request + +``` +GET _opensearch/_security/api/internalusers/ +``` + +#### Sample response + +```json +{ + "kirk": { + "hash": "", + "roles": [ "captains", "starfleet" ], + "attributes": { + "attribute1": "value1", + "attribute2": "value2" + } + } +} +``` + + +### Delete user + +#### Request + +``` +DELETE _opensearch/_security/api/internalusers/ +``` + +#### Sample response + +```json +{ + "status":"OK", + "message":"user kirk deleted." +} +``` + + +### Create user + +Creates or replaces the specified user. You must specify either `password` (plain text) or `hash` (the hashed user password). If you specify `password`, the security plugin automatically hashes the password before storing it. + +Note that any role you supply in the `opensearch_security_roles` array must already exist for the security plugin to map the user to that role. To see predefined roles, refer to [the list of predefined roles](../users-roles/#predefined-roles). For instructions on how to create a role, refer to [creating a role](./#create-role). + +#### Request + +```json +PUT _opensearch/_security/api/internalusers/ +{ + "password": "kirkpass", + "opensearch_security_roles": ["maintenance_staff", "weapons"], + "backend_roles": ["captains", "starfleet"], + "attributes": { + "attribute1": "value1", + "attribute2": "value2" + } +} +``` + +#### Sample response + +```json +{ + "status":"CREATED", + "message":"User kirk created" +} +``` + + +### Patch user + +Updates individual attributes of an internal user.
+ +#### Request + +```json +PATCH _opensearch/_security/api/internalusers/ +[ + { + "op": "replace", "path": "/backend_roles", "value": ["klingons"] + }, + { + "op": "replace", "path": "/opensearch_security_roles", "value": ["ship_manager"] + }, + { + "op": "replace", "path": "/attributes", "value": { "newattribute": "newvalue" } + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "'kirk' updated." +} +``` + +### Patch users + +Creates, updates, or deletes multiple internal users in a single call. + +#### Request + +```json +PATCH _opensearch/_security/api/internalusers +[ + { + "op": "add", "path": "/spock", "value": { "password": "testpassword1", "backend_roles": ["testrole1"] } + }, + { + "op": "add", "path": "/worf", "value": { "password": "testpassword2", "backend_roles": ["testrole2"] } + }, + { + "op": "remove", "path": "/riker" + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "Resource updated." +} +``` + + +--- + +## Roles + + +### Get role + +Retrieves one role. + +#### Request + +``` +GET _opensearch/_security/api/roles/ +``` + +#### Sample response + +```json +{ + "test-role": { + "reserved": false, + "hidden": false, + "cluster_permissions": [ + "cluster_composite_ops", + "indices_monitor" + ], + "index_permissions": [{ + "index_patterns": [ + "movies*" + ], + "dls": "", + "fls": [], + "masked_fields": [], + "allowed_actions": [ + "read" + ] + }], + "tenant_permissions": [{ + "tenant_patterns": [ + "human_resources" + ], + "allowed_actions": [ + "opensearch_dashboards_all_read" + ] + }], + "static": false + } +} +``` + + +### Get roles + +Retrieves all roles. 
+ +#### Request + +``` +GET _opensearch/_security/api/roles/ +``` + +#### Sample response + +```json +{ + "manage_snapshots": { + "reserved": true, + "hidden": false, + "description": "Provide the minimum permissions for managing snapshots", + "cluster_permissions": [ + "manage_snapshots" + ], + "index_permissions": [{ + "index_patterns": [ + "*" + ], + "fls": [], + "masked_fields": [], + "allowed_actions": [ + "indices:data/write/index", + "indices:admin/create" + ] + }], + "tenant_permissions": [], + "static": true + }, + ... +} +``` + + +### Delete role + +#### Request + +``` +DELETE _opensearch/_security/api/roles/ +``` + +#### Sample response + +```json +{ + "status":"OK", + "message":"role test-role deleted." +} +``` + + +### Create role + +Creates or replaces the specified role. + +#### Request + +```json +PUT _opensearch/_security/api/roles/ +{ + "cluster_permissions": [ + "cluster_composite_ops", + "indices_monitor" + ], + "index_permissions": [{ + "index_patterns": [ + "movies*" + ], + "dls": "", + "fls": [], + "masked_fields": [], + "allowed_actions": [ + "read" + ] + }], + "tenant_permissions": [{ + "tenant_patterns": [ + "human_resources" + ], + "allowed_actions": [ + "opensearch_dashboards_all_read" + ] + }] +} +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "'test-role' updated." +} +``` + + +### Patch role + +Updates individual attributes of a role. + +#### Request + +```json +PATCH _opensearch/_security/api/roles/ +[ + { + "op": "replace", "path": "/index_permissions/0/fls", "value": ["myfield1", "myfield2"] + }, + { + "op": "remove", "path": "/index_permissions/0/dls" + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "'' updated." +} +``` + + +### Patch roles + +Creates, updates, or deletes multiple roles in a single call. 
+ +#### Request + +```json +PATCH _opensearch/_security/api/roles +[ + { + "op": "replace", "path": "/role1/index_permissions/0/fls", "value": ["test1", "test2"] + }, + { + "op": "remove", "path": "/role1/index_permissions/0/dls" + }, + { + "op": "add", "path": "/role2/cluster_permissions", "value": ["manage_snapshots"] + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "Resource updated." +} +``` + + +--- + +## Role mappings + +### Get role mapping + +Retrieves one role mapping. + +#### Request + +``` +GET _opensearch/_security/api/rolesmapping/ +``` + +#### Sample response + +```json +{ + "role_starfleet" : { + "backend_roles" : [ "starfleet", "captains", "defectors", "cn=ldaprole,ou=groups,dc=example,dc=com" ], + "hosts" : [ "*.starfleetintranet.com" ], + "users" : [ "worf" ] + } +} +``` + + +### Get role mappings + +Retrieves all role mappings. + +#### Request + +``` +GET _opensearch/_security/api/rolesmapping +``` + +#### Sample response + +```json +{ + "role_starfleet" : { + "backend_roles" : [ "starfleet", "captains", "defectors", "cn=ldaprole,ou=groups,dc=example,dc=com" ], + "hosts" : [ "*.starfleetintranet.com" ], + "users" : [ "worf" ] + } +} +``` + + +### Delete role mapping + +#### Request + +``` +DELETE _opensearch/_security/api/rolesmapping/ +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "'my-role' deleted." +} +``` + + +### Create role mapping + +Creates or replaces the specified role mapping. + +#### Request + +```json +PUT _opensearch/_security/api/rolesmapping/ +{ + "backend_roles" : [ "starfleet", "captains", "defectors", "cn=ldaprole,ou=groups,dc=example,dc=com" ], + "hosts" : [ "*.starfleetintranet.com" ], + "users" : [ "worf" ] +} +``` + +#### Sample response + +```json +{ + "status": "CREATED", + "message": "'my-role' created." +} +``` + + +### Patch role mapping + +Updates individual attributes of a role mapping. 
+ +#### Request + +```json +PATCH _opensearch/_security/api/rolesmapping/ +[ + { + "op": "replace", "path": "/users", "value": ["myuser"] + }, + { + "op": "replace", "path": "/backend_roles", "value": ["mybackendrole"] + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "'my-role' updated." +} +``` + + +### Patch role mappings + +Creates or updates multiple role mappings in a single call. + +#### Request + +```json +PATCH _opensearch/_security/api/rolesmapping +[ + { + "op": "add", "path": "/human_resources", "value": { "users": ["user1"], "backend_roles": ["backendrole2"] } + }, + { + "op": "add", "path": "/finance", "value": { "users": ["user2"], "backend_roles": ["backendrole2"] } + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "Resource updated." +} +``` + + +--- + +## Tenants + +### Get tenant + +Retrieves one tenant. + +#### Request + +``` +GET _opensearch/_security/api/tenants/ +``` + +#### Sample response + +```json +{ + "human_resources": { + "reserved": false, + "hidden": false, + "description": "A tenant for the human resources team.", + "static": false + } +} +``` + + +### Get tenants + +Retrieves all tenants. + +#### Request + +``` +GET _opensearch/_security/api/tenants/ +``` + +#### Sample response + +```json +{ + "global_tenant": { + "reserved": true, + "hidden": false, + "description": "Global tenant", + "static": true + }, + "human_resources": { + "reserved": false, + "hidden": false, + "description": "A tenant for the human resources team.", + "static": false + } +} +``` + + +### Delete tenant + +#### Request + +``` +DELETE _opensearch/_security/api/tenants/ +``` + +#### Sample response + +```json +{ + "status":"OK", + "message":"tenant human_resources deleted." +} +``` + + +### Create tenant + +Creates or replaces the specified tenant. + +#### Request + +```json +PUT _opensearch/_security/api/tenants/ +{ + "description": "A tenant for the human resources team." 
+} +``` + +#### Sample response + +```json +{ + "status":"CREATED", + "message":"tenant human_resources created" +} +``` + + +### Patch tenant + +Add, delete, or modify a single tenant. + +#### Request + +```json +PATCH _opensearch/_security/api/tenants/ +[ + { + "op": "replace", "path": "/description", "value": "An updated description" + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "Resource updated." +} +``` + + +### Patch tenants + +Add, delete, or modify multiple tenants in a single call. + +#### Request + +```json +PATCH _opensearch/_security/api/tenants/ +[ + { + "op": "replace", + "path": "/human_resources/description", + "value": "An updated description" + }, + { + "op": "add", + "path": "/another_tenant", + "value": { + "description": "Another description." + } + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "Resource updated." +} +``` + +--- + +## Configuration + +### Get configuration + +Retrieves the current security plugin configuration in JSON format. + +#### Request + +``` +GET _opensearch/_security/api/securityconfig +``` + + +### Update configuration + +Creates or updates the existing configuration using the REST API rather than `securityadmin.sh`. This operation can easily break your existing configuration, so we recommend using `securityadmin.sh` instead. See [Access control for the API](#access-control-for-the-api) for how to enable this operation. 
+ +#### Request + +```json +PUT _opensearch/_security/api/securityconfig/config +{ + "dynamic": { + "filtered_alias_mode": "warn", + "disable_rest_auth": false, + "disable_intertransport_auth": false, + "respect_request_indices_options": false, + "opensearch-dashboards": { + "multitenancy_enabled": true, + "server_username": "opensearch-dashboardsserver", + "index": ".opensearch-dashboards" + }, + "http": { + "anonymous_auth_enabled": false + }, + "authc": { + "basic_internal_auth_domain": { + "http_enabled": true, + "transport_enabled": true, + "order": 0, + "http_authenticator": { + "challenge": true, + "type": "basic", + "config": {} + }, + "authentication_backend": { + "type": "intern", + "config": {} + }, + "description": "Authenticate via HTTP Basic against internal users database" + } + }, + "auth_failure_listeners": {}, + "do_not_fail_on_forbidden": false, + "multi_rolespan_enabled": true, + "hosts_resolver_mode": "ip-only", + "do_not_fail_on_forbidden_empty": false + } +} +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "'config' updated." +} +``` + + +### Patch configuration + +Updates the existing configuration using the REST API rather than `securityadmin.sh`. This operation can easily break your existing configuration, so we recommend using `securityadmin.sh` instead. See [Access control for the API](#access-control-for-the-api) for how to enable this operation. + +#### Request + +```json +PATCH _opensearch/_security/api/securityconfig +[ + { + "op": "replace", "path": "/config/dynamic/authc/basic_internal_auth_domain/transport_enabled", "value": "true" + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "Resource updated." +} +``` + + +--- + +## Certificates + +### Get certificates + +Retrieves the current security plugin configuration in JSON format. 
+ +#### Request + +``` +GET _opensearch/_security/api/securityconfig +``` + + +### Update configuration + +Creates or updates the existing configuration using the REST API rather than `securityadmin.sh`. This operation can easily break your existing configuration, so we recommend using `securityadmin.sh` instead. See [Access control for the API](#access-control-for-the-api) for how to enable this operation. + +#### Request + +```json +PUT _opensearch/_security/api/securityconfig/config +{ + "dynamic": { + "filtered_alias_mode": "warn", + "disable_rest_auth": false, + "disable_intertransport_auth": false, + "respect_request_indices_options": false, + "opensearch-dashboards": { + "multitenancy_enabled": true, + "server_username": "opensearch-dashboardsserver", + "index": ".opensearch-dashboards" + }, + "http": { + "anonymous_auth_enabled": false + }, + "authc": { + "basic_internal_auth_domain": { + "http_enabled": true, + "transport_enabled": true, + "order": 0, + "http_authenticator": { + "challenge": true, + "type": "basic", + "config": {} + }, + "authentication_backend": { + "type": "intern", + "config": {} + }, + "description": "Authenticate via HTTP Basic against internal users database" + } + }, + "auth_failure_listeners": {}, + "do_not_fail_on_forbidden": false, + "multi_rolespan_enabled": true, + "hosts_resolver_mode": "ip-only", + "do_not_fail_on_forbidden_empty": false + } +} +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "'config' updated." +} +``` + + +### Patch configuration + +Updates the existing configuration using the REST API rather than `securityadmin.sh`. This operation can easily break your existing configuration, so we recommend using `securityadmin.sh` instead. See [Access control for the API](#access-control-for-the-api) for how to enable this operation. 
+ +#### Request + +```json +PATCH _opensearch/_security/api/securityconfig +[ + { + "op": "replace", "path": "/config/dynamic/authc/basic_internal_auth_domain/transport_enabled", "value": "true" + } +] +``` + +#### Sample response + +```json +{ + "status": "OK", + "message": "Resource updated." +} +``` + +--- + +## Cache + +### Flush cache + +Flushes the security plugin user, authentication, and authorization cache. + + +#### Request + +``` +DELETE _opensearch/_security/api/cache +``` + + +#### Sample response + +```json +{ + "status": "OK", + "message": "Cache flushed successfully." +} +``` + + +--- + +## Health + +### Health check + +Checks to see if the security plugin is up and running. If you operate your cluster behind a load balancer, this operation is useful for determining node health and doesn't require a signed request. + + +#### Request + +``` +GET _opensearch/_security/health +``` + + +#### Sample response + +```json +{ + "message": null, + "mode": "strict", + "status": "UP" +} +``` diff --git a/docs/security/access-control/cross-cluster-search.md b/docs/security/access-control/cross-cluster-search.md new file mode 100644 index 00000000..28941ca3 --- /dev/null +++ b/docs/security/access-control/cross-cluster-search.md @@ -0,0 +1,242 @@ +--- +layout: default +title: Cross-Cluster Search +parent: Access Control +grand_parent: Security +nav_order: 40 +--- + +# Cross-cluster search + +Cross-cluster search is exactly what it sounds like: it lets any node in a cluster execute search requests against other clusters. The security plugin supports cross-cluster search out of the box. + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Authentication flow + +When accessing a *remote cluster* from a *coordinating cluster* using cross-cluster search: + +1. The security plugin authenticates the user on the coordinating cluster. +1. The security plugin fetches the user's backend roles on the coordinating cluster. +1. 
The call, including the authenticated user, is forwarded to the remote cluster. +1. The user's permissions are evaluated on the remote cluster. + +You can have different authentication and authorization configurations on the remote and coordinating cluster, but we recommend using the same settings on both. + + +## Permissions + +To query indices on remote clusters, users need to have the following permissions for the index, in addition to `READ` or `SEARCH` permissions: + +``` +indices:admin/shards/search_shards +``` + + +#### Sample roles.yml configuration + +```yml +humanresources: + cluster: + - CLUSTER_COMPOSITE_OPS_RO + indices: + 'humanresources': + '*': + - READ + - indices:admin/shards/search_shards # needed for CCS +``` + + +#### Sample role in OpenSearch Dashboards + +![OpenSearch Dashboards UI for creating a cross-cluster search role](../../../images/security-ccs.png) + + +## Walkthrough + +Save this file as `docker-compose.yml` and run `docker-compose up` to start two single-node clusters on the same network: + +```yml +version: '3' +services: + opensearch-node1: + image: opensearch/opensearch:{{site.opensearch_version}} + container_name: opensearch-node1 + environment: + - cluster.name=opensearch-cluster1 + - discovery.type=single-node + - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping + - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM + ulimits: + memlock: + soft: -1 + hard: -1 + volumes: + - opensearch-data1:/usr/share/opensearch/data + ports: + - 9200:9200 + - 9600:9600 # required for Performance Analyzer + networks: + - opensearch-net + + opensearch-node2: + image: opensearch/opensearch:{{site.opensearch_version}} + container_name: opensearch-node2 + environment: + - cluster.name=opensearch-cluster2 + - discovery.type=single-node + - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping + - 
"ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM + ulimits: + memlock: + soft: -1 + hard: -1 + volumes: + - opensearch-data2:/usr/share/opensearch/data + ports: + - 9250:9200 + - 9700:9600 # required for Performance Analyzer + networks: + - opensearch-net + +volumes: + opensearch-data1: + opensearch-data2: + +networks: + opensearch-net: +``` + +After the clusters start, verify the names of each: + +```json +curl -XGET -u 'admin:admin' -k 'https://localhost:9200' +{ + "cluster_name" : "opensearch-cluster1", + ... +} + +curl -XGET -u 'admin:admin' -k 'https://localhost:9250' +{ + "cluster_name" : "opensearch-cluster2", + ... +} +``` + +Both clusters run on `localhost`, so the important identifier is the port number. In this case, use port 9200 (`opensearch-node1`) as the remote cluster, and port 9250 (`opensearch-node2`) as the coordinating cluster. + +To get the IP address for the remote cluster, first identify its container ID: + +```bash +docker ps +CONTAINER ID IMAGE PORTS NAMES +6fe89ebc5a8e opensearch/opensearch:{{site.opensearch_version}} 0.0.0.0:9200->9200/tcp, 0.0.0.0:9600->9600/tcp, 9300/tcp opensearch-node1 +2da08b6c54d8 opensearch/opensearch:{{site.opensearch_version}} 9300/tcp, 0.0.0.0:9250->9200/tcp, 0.0.0.0:9700->9600/tcp opensearch-node2 +``` + +Then get that container's IP address: + +```bash +docker inspect --format='{% raw %}{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}{% endraw %}' 6fe89ebc5a8e +172.31.0.3 +``` + +On the coordinating cluster, add the remote cluster name and the IP address (with port 9300) for each "seed node." 
In this case, you only have one seed node: + +```json +curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9250/_cluster/settings' -d ' +{ + "persistent": { + "search.remote": { + "opensearch-cluster1": { + "seeds": ["172.31.0.3:9300"] + } + } + } +}' +``` + +On the remote cluster, index a document: + +```bash +curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/books/_doc/1' -d '{"Dracula": "Bram Stoker"}' +``` + +At this point, cross-cluster search works. You can test it using the `admin` user: + +```bash +curl -XGET -k -u 'admin:admin' 'https://localhost:9250/opensearch-cluster1:books/_search?pretty' +{ + ... + "hits": [{ + "_index": "opensearch-cluster1:books", + "_type": "_doc", + "_id": "1", + "_score": 1.0, + "_source": { + "Dracula": "Bram Stoker" + } + }] +} +``` + +To continue testing, create a new user on both clusters: + +```bash +curl -XPUT -k -u 'admin:admin' 'https://localhost:9200/_opensearch/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}' +curl -XPUT -k -u 'admin:admin' 'https://localhost:9250/_opensearch/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}' +``` + +Then run the same search as before with `booksuser`: + +```json +curl -XGET -k -u booksuser:password 'https://localhost:9250/opensearch-cluster1:books/_search?pretty' +{ + "error" : { + "root_cause" : [ + { + "type" : "security_exception", + "reason" : "no permissions for [indices:admin/shards/search_shards, indices:data/read/search] and User [name=booksuser, roles=[], requestedTenant=null]" + } + ], + "type" : "security_exception", + "reason" : "no permissions for [indices:admin/shards/search_shards, indices:data/read/search] and User [name=booksuser, roles=[], requestedTenant=null]" + }, + "status" : 403 +} +``` + +Note the permissions error. 
On the remote cluster, create a role with the appropriate permissions, and map `booksuser` to that role: + +```bash +curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opensearch/_security/api/roles/booksrole' -d '{"index_permissions":[{"index_patterns":["books"],"allowed_actions":["indices:admin/shards/search_shards","indices:data/read/search"]}]}' +curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opensearch/_security/api/rolesmapping/booksrole' -d '{"users" : ["booksuser"]}' +``` + +Both clusters must have the user, but only the remote cluster needs the role and mapping; in this case, the coordinating cluster handles authentication (i.e. "Does this request include valid user credentials?"), and the remote cluster handles authorization (i.e. "Can this user access this data?"). +{: .tip } + +Finally, repeat the search: + +```bash +curl -XGET -k -u booksuser:password 'https://localhost:9250/opensearch-cluster1:books/_search?pretty' +{ + ... + "hits": [{ + "_index": "opensearch-cluster1:books", + "_type": "_doc", + "_id": "1", + "_score": 1.0, + "_source": { + "Dracula": "Bram Stoker" + } + }] +} +``` diff --git a/docs/security/access-control/default-action-groups.md b/docs/security/access-control/default-action-groups.md new file mode 100644 index 00000000..3162b8c7 --- /dev/null +++ b/docs/security/access-control/default-action-groups.md @@ -0,0 +1,47 @@ +--- +layout: default +title: Default Action Groups +parent: Access Control +grand_parent: Security +nav_order: 51 +--- + +# Default action groups + +This page catalogs all default action groups. Often, the most coherent way to create new action groups is to use a combination of these default groups and [individual permissions](../permissions). + + +## General + +Name | Description +:--- | :--- +unlimited | Grants complete access. Can be used at the cluster or index level. Equates to `"*"`.
+ + +## Cluster-level + +Name | Description +:---| :--- +cluster_all | Grants all cluster permissions. Equates to `cluster:*`. +cluster_monitor | Grants all cluster monitoring permissions. Equates to `cluster:monitor/*`. +cluster_composite_ops_ro | Grants read-only permissions to execute requests like `mget`, `msearch`, or `mtv`, plus permissions to query for aliases. +cluster_composite_ops | Same as `cluster_composite_ops_ro`, but also grants `bulk` permissions and all alias permissions. +manage_snapshots | Grants permissions to manage snapshots and repositories. + + +## Index-level + +Name | Description +:--- | :--- +indices_all | Grants all permissions on the index. Equates to `indices:*`. +get | Grants permissions to use `get` and `mget` actions only. +read | Grants read permissions such as search, get field mappings, `get`, and `mget`. +write | Grants permissions to create and update documents within *existing indices*. To create new indices, see `create_index`. +delete | Grants permissions to delete documents. +crud | Combines the `read`, `write`, and `delete` action groups. +search | Grants permissions to search documents. Includes `suggest`. +suggest | Grants permissions to use the suggest API. Included in the `read` action group. +create_index | Grants permissions to create indices and mappings. +indices_monitor | Grants permissions to execute all index monitoring actions (e.g. recovery, segments info, index stats, and status). +manage_aliases | Grants permissions to manage aliases. +manage | Grants all monitoring and administration permissions for indices.
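+
+
+## Example: combining default groups and individual permissions
+
+As noted at the top of this page, new action groups are often easiest to build from these defaults plus individual permissions. The following request is a minimal sketch of that idea using the REST API's create action group operation. The group name `books_read_write` is hypothetical, and the individual permission is only an illustration:
+
+```json
+PUT _opensearch/_security/api/actiongroups/books_read_write
+{
+  "allowed_actions": [
+    "read",
+    "write",
+    "indices:admin/mapping/put"
+  ]
+}
+```
+
+Here `read` and `write` are the default index-level groups from the table above, and `indices:admin/mapping/put` is an individual permission added on top of them.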
diff --git a/docs/security/access-control/document-level-security.md b/docs/security/access-control/document-level-security.md new file mode 100644 index 00000000..cdee8bbf --- /dev/null +++ b/docs/security/access-control/document-level-security.md @@ -0,0 +1,127 @@ +--- +layout: default +title: Document-Level Security +parent: Access Control +grand_parent: Security +nav_order: 10 +--- + +# Document-level security + +Document-level security lets you restrict a role to a subset of documents in an index. The easiest way to get started with document- and field-level security is to open OpenSearch Dashboards and choose **Security**. Then choose **Roles**, create a new role, and review the **Index permissions** section. + +![Document- and field-level security screen in OpenSearch Dashboards](../../../images/security-dls.png) + + +## Simple roles + +Document-level security uses the OpenSearch query DSL to define which documents a role grants access to. In OpenSearch Dashboards, choose an index pattern and provide a query in the **Document level security** section: + +```json +{ + "bool": { + "must": { + "match": { + "genres": "Comedy" + } + } + } +} +``` + +This query specifies that for the role to have access to a document, its `genres` field must include `Comedy`. + +A typical request to the `_search` API includes `{ "query": { ... } }` around the query, but in this case, you only need to specify the query itself. + +In the REST API, you provide the query as a string, so you must escape your quotes.
This role allows a user to read any document that has the `public` field set to `true` in any index that matches `pub*`: + +```json +PUT _opensearch/_security/api/roles/public_data +{ + "cluster_permissions": [ + "*" + ], + "index_permissions": [{ + "index_patterns": [ + "pub*" + ], + "dls": "{\"term\": { \"public\": true}}", + "allowed_actions": [ + "read" + ] + }] +} +``` + +These queries can be as complex as you want, but we recommend keeping them simple to minimize the performance impact that the document-level security feature has on the cluster. +{: .warning } + + +## Parameter substitution + +A number of variables exist that you can use to enforce rules based on the properties of a user. For example, `${user.name}` is replaced with the name of the current user. + +This rule allows a user to read any document where the username is a value of the `readable_by` field: + +```json +PUT _opensearch/_security/api/roles/user_data +{ + "cluster_permissions": [ + "*" + ], + "index_permissions": [{ + "index_patterns": [ + "pub*" + ], + "dls": "{\"term\": { \"readable_by\": \"${user.name}\"}}", + "allowed_actions": [ + "read" + ] + }] +} +``` + +This table lists the available substitutions. + +Term | Replaced with +:--- | :--- +`${user.name}` | Username. +`${user.roles}` | A comma-separated, quoted list of user roles. +`${attr.<TYPE>.<NAME>}` | An attribute with name `<NAME>` defined for a user. `<TYPE>` is `internal`, `jwt`, `proxy`, or `ldap`. + + +## Attribute-based security + +You can use roles and parameter substitution with the `terms_set` query to enable attribute-based security. + +> Note that the `security_attributes` field of the index needs to be of type `keyword`.
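+
+One way to meet this requirement is to map the field explicitly when you create the index. The following request is a minimal sketch, assuming a hypothetical index named `documents`. Only the `security_attributes` field name comes from the role definition in this section, and the rest is illustrative:
+
+```json
+PUT documents
+{
+  "mappings": {
+    "properties": {
+      "security_attributes": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+```
+
+With an explicit `keyword` mapping, the values in `security_attributes` are stored as exact terms, which is what the `terms_set` query in the role definition compares against.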
+ +#### User definition + +```json +PUT _opensearch/_security/api/internalusers/user1 +{ + "password": "asdf", + "backend_roles": ["abac"], + "attributes": { + "permissions": "\"att1\", \"att2\", \"att3\"" + } +} +``` + +#### Role definition + +```json +PUT _opensearch/_security/api/roles/abac +{ + "index_permissions": [{ + "index_patterns": [ + "*" + ], + "dls": "{\"terms_set\": {\"security_attributes\": {\"terms\": [${attr.internal.permissions}], \"minimum_should_match_script\": {\"source\": \"doc['security_attributes'].length\"}}}}", + "allowed_actions": [ + "read" + ] + }] +} +``` diff --git a/docs/security/access-control/field-level-security.md b/docs/security/access-control/field-level-security.md new file mode 100644 index 00000000..f92ea79b --- /dev/null +++ b/docs/security/access-control/field-level-security.md @@ -0,0 +1,125 @@ +--- +layout: default +title: Field-Level Security +parent: Access Control +grand_parent: Security +nav_order: 11 +--- + +# Field-level security + +Field-level security lets you control which document fields a user can see. Just like [document-level security](../document-level-security/), you control access by index within a role. + +The easiest way to get started with document- and field-level security is to open OpenSearch Dashboards and choose **Security**. Then choose **Roles**, create a new role, and review the **Index permissions** section. + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Include or exclude fields + +You have two options when you configure field-level security: include or exclude fields. If you include fields, users see *only* those fields when they retrieve a document.
For example, if you include the `actors`, `title`, and `year` fields, a search result might look like this: + +```json +{ + "_index": "movies", + "_type": "_doc", + "_source": { + "year": 2013, + "title": "Rush", + "actors": [ + "Daniel Brühl", + "Chris Hemsworth", + "Olivia Wilde" + ] + } +} +``` + +If you exclude fields, users see everything *but* those fields when they retrieve a document. For example, if you exclude those same fields, the same search result might look like this: + +```json +{ + "_index": "movies", + "_type": "_doc", + "_source": { + "directors": [ + "Ron Howard" + ], + "plot": "A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda.", + "genres": [ + "Action", + "Biography", + "Drama", + "Sport" + ] + } +} +``` + +You can achieve the same outcomes using inclusion or exclusion, so choose whichever makes sense for your use case. Mixing the two doesn't make sense and is not supported. + +You can specify field-level security settings using OpenSearch Dashboards, `roles.yml`, and the REST API. + +- To exclude fields in `roles.yml` or the REST API, add `~` before the field name. +- Field names support wildcards (`*`). + + Wildcards are especially useful for excluding *subfields*. For example, if you index a document that has a string (e.g. `{"title": "Thor"}`), OpenSearch creates a `title` field of type `text`, but it also creates a `title.keyword` subfield of type `keyword`. In this example, to prevent unauthorized access to data in the `title` field, you must also exclude the `title.keyword` subfield. Use `title*` to match all fields that begin with `title`. + + +### OpenSearch Dashboards + +1. Choose a role and **Add index permission**. +1. Choose an index pattern. +1. Under **Field level security**, use the drop-down to select your preferred option. Then specify one or more fields and press Enter. 
+ + +### roles.yml + +```yml +someonerole: + cluster: [] + indices: + movies: + '*': + - "READ" + _fls_: + - "~actors" + - "~title" + - "~year" +``` + +### REST API + +See [Create role](../api/#create-role). + + +## Interaction with multiple roles + +If you map a user to multiple roles, we recommend that those roles use either include *or* exclude statements for each index. The security plugin evaluates field-level security settings using the `AND` operator, so combining include and exclude statements can lead to neither behavior working properly. + +For example, in the `movies` index, if you include `actors`, `title`, and `year` in one role, exclude `actors`, `title`, and `genres` in another role, and then map both roles to the same user, a search result might look like this: + +```json +{ + "_index": "movies", + "_type": "_doc", + "_source": { + "year": 2013, + "directors": [ + "Ron Howard" + ], + "plot": "A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda." + } +} +``` + + +## Interaction with document-level security + +[Document-level security](../document-level-security/) relies on OpenSearch queries, which means that all fields in the query must be visible in order for it to work properly. If you use field-level security in conjunction with document-level security, make sure you don't restrict access to the fields that document-level security uses. diff --git a/docs/security/access-control/field-masking.md b/docs/security/access-control/field-masking.md new file mode 100644 index 00000000..84095953 --- /dev/null +++ b/docs/security/access-control/field-masking.md @@ -0,0 +1,126 @@ +--- +layout: default +title: Field Masking +parent: Access Control +grand_parent: Security +nav_order: 12 +--- + +# Field masking + +If you don't want to remove fields from a document using [field-level security](../field-level-security/), you can mask their values. 
Currently, field masking is only available for string-based fields and replaces the field's value with a cryptographic hash. + +Field masking works alongside field-level security on the same per-role, per-index basis. You can allow certain roles to see sensitive fields in plain text and mask them for others. A search result with a masked field might look like this: + +```json +{ + "_index": "movies", + "_type": "_doc", + "_source": { + "year": 2013, + "directors": [ + "Ron Howard" + ], + "title": "ca998e768dd2e6cdd84c77015feb29975f9f498a472743f159bec6f1f1db109e" + } +} +``` + + +## Set the salt + +You set the salt (a random string used to hash your data) in `opensearch.yml`: + +```yml +opensearch_security.compliance.salt: abcdefghijklmnopqrstuvqxyz1234567890 +``` + +Property | Description +:--- | :--- +`opensearch_security.compliance.salt` | The salt to use when generating the hash value. Must be at least 32 characters. Only ASCII characters are allowed. Optional. + +Setting the salt is optional, but we highly recommend it. + + +## Configure field masking + +You configure field masking using OpenSearch Dashboards, `roles.yml`, or the REST API. + +### OpenSearch Dashboards + +1. Choose a role. +1. Choose an index permission. +1. For **Anonymization**, specify one or more fields and press Enter. + + +### roles.yml + +```yml +someonerole: + cluster: [] + indices: + movies: + _masked_fields_: + - "title" + - "genres" + '*': + - "READ" +``` + + +### REST API + +See [Create role](../api/#create-role). + + +## (Advanced) Use an alternative hash algorithm + +By default, the security plugin uses the BLAKE2b algorithm, but you can use any hashing algorithm that your JVM provides. This list typically includes MD5, SHA-1, SHA-384, and SHA-512. 
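To make the idea of hash-based masking concrete, here is a minimal Python sketch that replaces a string value with a salted BLAKE2b digest. It is an illustration under stated assumptions, not the plugin's code: the `mask_value` helper is hypothetical, and prepending the salt to the value is only one simple way to combine the two. The security plugin may combine salt and value differently, so the output of this sketch is not expected to match what the plugin produces.

```python
import hashlib


def mask_value(value, salt):
    """Hypothetical helper: return a salted BLAKE2b hash of `value` as hex."""
    # digest_size=32 yields a 64-character hexadecimal string.
    digest = hashlib.blake2b(digest_size=32)
    # Assumption: the salt is simply prepended to the value before hashing.
    digest.update(salt.encode("ascii"))  # the salt must be ASCII-only
    digest.update(value.encode("utf-8"))
    return digest.hexdigest()


salt = "abcdefghijklmnopqrstuvqxyz1234567890"
masked = mask_value("Rush", salt)
print(len(masked))
```

Because the hash is deterministic for a given salt, identical clear text values produce identical masked values, while changing the salt changes every masked value.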
+ +To specify a different algorithm, add it after the masked field: + +```yml +someonerole: + cluster: [] + indices: + movies: + _masked_fields_: + - "title::SHA-512" + - "genres" + '*': + - "READ" +``` + + +## (Advanced) Pattern-based field masking + +Rather than creating a hash, you can use one or more regular expressions and replacement strings to mask a field. The syntax is `<field>::/<regular-expression>/::<replacement-string>`. If you use multiple regular expressions, the results are passed from left to right, like piping in a shell: + +```yml +hr_employee: + index_permissions: + - index_patterns: + - 'humanresources' + allowed_actions: + - ... + masked_fields: + - 'lastname::/.*/::*' + - '*ip_source::/[0-9]{1,3}$/::XXX::/^[0-9]{1,3}/::***' +someonerole: + cluster: [] + indices: + movies: + _masked_fields_: + - "title::/./::*" + - "genres::/^[a-zA-Z]{1,3}/::XXX::/[a-zA-Z]{1,3}$/::YYY" + '*': + - "READ" + +``` + +The `title` statement changes each character in the field to `*`, so you can still discern the length of the masked string. The `genres` statement changes the first three characters of the string to `XXX` and the last three characters to `YYY`. + + +## Effect on audit logging + +The read history feature lets you track read access to sensitive fields in your documents. For example, you might track access to the email field of your customer records. Access to masked fields is excluded from read history, because the user sees only the hash value, not the clear text value of the field.
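The left-to-right evaluation of pattern-based masking rules described earlier can be sketched in a few lines of Python. This is only an approximation for reasoning about the rules: the `mask_with_patterns` helper is hypothetical, it assumes that neither the regular expressions nor the replacements contain `::`, and Python's `re` module differs from the Java regular expression engine in some details.

```python
import re


def mask_with_patterns(spec, value):
    """Hypothetical helper: apply a `field::/regex/::replacement...` spec."""
    parts = spec.split("::")
    rules = parts[1:]  # parts[0] is the field name
    if not rules or len(rules) % 2 != 0:
        raise ValueError("expected regex/replacement pairs after the field")
    # Each pair is applied to the result of the previous one, like a pipe.
    for pattern, replacement in zip(rules[0::2], rules[1::2]):
        regex = pattern[1:-1]  # drop the surrounding slashes
        # A function replacement keeps the replacement text literal.
        value = re.sub(regex, lambda _match: replacement, value)
    return value


title = mask_with_patterns("title::/./::*", "Rush")
genre = mask_with_patterns(
    "genres::/^[a-zA-Z]{1,3}/::XXX::/[a-zA-Z]{1,3}$/::YYY", "Action"
)
print(title)  # ****
print(genre)  # XXXYYY
```

The second call shows the piping: the first rule turns `Action` into `XXXion`, and the second rule then replaces the trailing `ion` with `YYY`.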
diff --git a/docs/security/access-control/impersonation.md b/docs/security/access-control/impersonation.md new file mode 100644 index 00000000..1fcfc70a --- /dev/null +++ b/docs/security/access-control/impersonation.md @@ -0,0 +1,49 @@ +--- +layout: default +title: User Impersonation +parent: Access Control +grand_parent: Security +nav_order: 20 +--- + +# User impersonation + +User impersonation allows specially privileged users to act as another user without knowledge of or access to the impersonated user's credentials. + +Impersonation can be useful for testing and troubleshooting, or for allowing system services to safely act as a user. + +Impersonation can occur at either the REST interface or the transport layer. + + +## REST interface + +To allow one user to impersonate another, add the following to `opensearch.yml`: + +```yml +opensearch_security.authcz.rest_impersonation_user: + <AUTHENTICATED_USER>: + - <IMPERSONATED_USER_1> + - <IMPERSONATED_USER_2> +``` + +The impersonated user field supports wildcards. Setting it to `*` allows `AUTHENTICATED_USER` to impersonate any user. + + +## Transport interface + +In a similar fashion, add the following to enable transport layer impersonation: + +```yml +opensearch_security.authcz.impersonation_dn: + "CN=spock,OU=client,O=client,L=Test,C=DE": + - worf +``` + + +## Impersonating users + +To impersonate another user, submit a request to the system with the HTTP header `opensearch_security_impersonate_as` set to the name of the user to be impersonated.
A good test is to make a GET request to the `_opensearch/_security/authinfo` URI: + +```bash +curl -XGET -u 'admin:admin' -k -H "opensearch_security_impersonate_as: user_1" https://localhost:9200/_opensearch/_security/authinfo?pretty +``` diff --git a/docs/security/access-control/index.md b/docs/security/access-control/index.md new file mode 100644 index 00000000..30818a56 --- /dev/null +++ b/docs/security/access-control/index.md @@ -0,0 +1,28 @@ +--- +layout: default +title: Access Control +nav_order: 10 +parent: Security +has_children: true +has_toc: false +--- + +# Access control + +After you [configure the security plugin](../configuration/) to use your own certificates and preferred authentication backend, you can start adding users, creating roles, and mapping roles to users. + +This section of the documentation covers what a user is allowed to see and do after successfully authenticating. + + +## Concepts + +Term | Description +:--- | :--- +Permission | An individual action, such as creating an index (e.g. `indices:admin/create`). For a complete list, see [Permissions](permissions/). +Action group | A set of permissions. For example, the predefined `SEARCH` action group authorizes roles to use the `_search` and `_msearch` APIs. +Role | Security roles define the scope of a permission or action group: cluster, index, document, or field. For example, a role named `delivery_analyst` might have no cluster permissions, the `READ` action group for all indices that match the `delivery-data-*` pattern, access to all document types within those indices, and access to all fields except `delivery_driver_name`. +Backend role | (Optional) Arbitrary strings that you specify *or* that come from an external authentication system (e.g. LDAP/Active Directory). Backend roles can help simplify the role mapping process. Rather than mapping a role to 100 individual users, you can map the role to a single backend role that all 100 users share. 
+User | Users make requests to OpenSearch clusters. A user has credentials (e.g. a username and password), zero or more backend roles, and zero or more custom attributes. +Role mapping | Users assume roles after they successfully authenticate. Role mappings, well, map roles to users (or backend roles). For example, a mapping of `opensearch_dashboards_user` (role) to `jdoe` (user) means that John Doe gains all the permissions of `opensearch_dashboards_user` after authenticating. Likewise, a mapping of `all_access` (role) to `admin` (backend role) means that any user with the backend role of `admin` gains all the permissions of `all_access` after authenticating. You can map each role to many users and/or backend roles. + +The security plugin comes with a number of [predefined action groups](default-action-groups/), roles, mappings, and users. These entities serve as sensible defaults and are good examples of how to use the plugin. diff --git a/docs/security/access-control/multi-tenancy.md b/docs/security/access-control/multi-tenancy.md new file mode 100644 index 00000000..f4812009 --- /dev/null +++ b/docs/security/access-control/multi-tenancy.md @@ -0,0 +1,162 @@ +--- +layout: default +title: OpenSearch Dashboards Multi-Tenancy +parent: Access Control +grand_parent: Security +nav_order: 30 +--- + +# OpenSearch Dashboards multi-tenancy + +*Tenants* in OpenSearch Dashboards are spaces for saving index patterns, visualizations, dashboards, and other OpenSearch Dashboards objects. By default, all OpenSearch Dashboards users have access to two tenants: **Private** and **Global**. The global tenant is shared between every OpenSearch Dashboards user. The private tenant is exclusive to each user and can't be shared. + +Tenants are useful for safely sharing your work with other OpenSearch Dashboards users. You can control which roles have access to a tenant and whether those roles have read or write access. 
+ +You might use the private tenant for exploratory work, create detailed visualizations with your team in an `analysts` tenant, and maintain a summary dashboard for corporate leadership in an `executive` tenant. + +If you share a visualization or dashboard with someone, you can see that the URL includes the tenant: + +``` +http://:5601/app/opensearch-dashboards?security_tenant=analysts#/visualize/edit/c501fa50-7e52-11e9-ae4e-b5d69947d32e?_g=() +``` + + +## Configuration + +Multi-tenancy is enabled by default, but you can disable it or change its settings using `plugins/opensearch_security/securityconfig/config.yml`: + +```yml +config: + dynamic: + opensearch-dashboards: + multitenancy_enabled: true + server_username: opensearch-dashboardsserver + index: '.opensearch-dashboards' + do_not_fail_on_forbidden: false +``` + +Setting | Description +:--- | :--- +`multitenancy_enabled` | Enable or disable multi-tenancy. Default is true. +`server_username` | Must match the name of the OpenSearch Dashboards server user from `opensearch_dashboards.yml`. Default is `opensearch-dashboardsserver`. +`index` | Must match the name of the OpenSearch Dashboards index from `opensearch_dashboards.yml`. Default is `.opensearch-dashboards`. +`do_not_fail_on_forbidden` | If true, the security plugin removes any content that a user is not allowed to see from search results. If false, the plugin returns a security exception. Default is false. 
+ +`opensearch_dashboards.yml` has some additional settings: + +```yml +opensearch.username: opensearch-dashboardsserver +opensearch.password: opensearch-dashboardsserver +opensearch.requestHeadersWhitelist: ["securitytenant","Authorization"] +opensearch_security.multitenancy.enabled: true +opensearch_security.multitenancy.tenants.enable_global: true +opensearch_security.multitenancy.tenants.enable_private: true +opensearch_security.multitenancy.tenants.preferred: ["Private", "Global"] +opensearch_security.multitenancy.enable_filter: false +``` + +Setting | Description +:--- | :--- +`opensearch.requestHeadersWhitelist` | OpenSearch Dashboards requires that you whitelist all HTTP headers that it passes to OpenSearch. Multi-tenancy uses a specific header, `securitytenant`, that must be present with the standard `Authorization` header. If the `securitytenant` header is not whitelisted, OpenSearch Dashboards starts with a red status. +`opensearch_security.multitenancy.enabled` | Enables or disables multi-tenancy in OpenSearch Dashboards. Default is true. +`opensearch_security.multitenancy.tenants.enable_global` | Enables or disables the global tenant. Default is true. +`opensearch_security.multitenancy.tenants.enable_private` | Enables or disables the private tenant. Default is true. +`opensearch_security.multitenancy.tenants.preferred` | Lets you change ordering in the **Tenants** tab of OpenSearch Dashboards. By default, the list starts with global and private (if enabled) and then proceeds alphabetically. You can add tenants here to move them to the top of the list. +`opensearch_security.multitenancy.enable_filter` | If you have many tenants, you can add a search bar to the top of the list. Default is false. + + +## Add tenants + +To create tenants, use OpenSearch Dashboards, the REST API, or `tenants.yml`. + + +#### OpenSearch Dashboards + +1. Open OpenSearch Dashboards. +1. Choose **Security**, **Tenants**, and **Create tenant**. +1. 
Give the tenant a name and description. +1. Choose **Create**. + + +#### REST API + +See [Create tenant](../api/#create-tenant). + + +#### tenants.yml + +```yml +--- +_meta: + type: "tenants" + config_version: 2 + +## Demo tenants +admin_tenant: + reserved: false + description: "Demo tenant for admin user" +``` + +## Give roles access to tenants + +After creating a tenant, give a role access to it using OpenSearch Dashboards, the REST API, or `roles.yml`. + +- Read-write (`opensearch_dashboards_all_write`) permissions let the role view and modify objects in the tenant. +- Read-only (`opensearch_dashboards_all_read`) permissions let the role view objects, but not modify them. + + +#### OpenSearch Dashboards + +1. Open OpenSearch Dashboards. +1. Choose **Security**, **Roles**, and a role. +1. For **Tenant permissions**, add tenants, press Enter, and give the role read and/or write permissions to it. + + +#### REST API + +See [Create role](../api/#create-role). + + +#### roles.yml + +```yml +--- +test-role: + reserved: false + hidden: false + cluster_permissions: + - "cluster_composite_ops" + - "indices_monitor" + index_permissions: + - index_patterns: + - "movies*" + dls: "" + fls: [] + masked_fields: [] + allowed_actions: + - "read" + tenant_permissions: + - tenant_patterns: + - "human_resources" + allowed_actions: + - "opensearch_dashboards_all_read" + static: false +_meta: + type: "roles" + config_version: 2 +``` + + +## Manage OpenSearch Dashboards indices + +The open source version of OpenSearch Dashboards saves all objects to a single index: `.opensearch-dashboards`. The security plugin uses this index for the global tenant, but separate indices for every other tenant. 
Each user also has a private tenant, so you might see a large number of indices that follow two patterns: + +``` +.opensearch_dashboards_<hash>_<tenant_name> +.opensearch_dashboards_<hash>_<username> +``` + +The security plugin scrubs these index names of special characters, so they might not be a perfect match of tenant names and usernames. +{: .tip } + +To back up your OpenSearch Dashboards data, [take a snapshot](../../opensearch/snapshot-restore/) of all tenant indices using an index pattern such as `.opensearch-dashboards*`. diff --git a/docs/security/access-control/permissions.md b/docs/security/access-control/permissions.md new file mode 100644 index 00000000..cb2963f7 --- /dev/null +++ b/docs/security/access-control/permissions.md @@ -0,0 +1,162 @@ +--- +layout: default +title: Permissions +parent: Access Control +grand_parent: Security +nav_order: 50 +--- + +# Permissions + +This page is a complete list of available permissions in the security plugin. Each permission controls access to a data type or API. + +Rather than creating new action groups from individual permissions, you can often achieve your desired security posture using some combination of the default action groups. To learn more, see [Default Action Groups](../default-action-groups).
+{: .tip } + + +## Cluster + +- cluster:admin/ingest/pipeline/delete +- cluster:admin/ingest/pipeline/get +- cluster:admin/ingest/pipeline/put +- cluster:admin/ingest/pipeline/simulate +- cluster:admin/ingest/processor/grok/get +- cluster:admin/opensearch/ad/detector/delete +- cluster:admin/opensearch/ad/detector/jobmanagement +- cluster:admin/opensearch/ad/detector/run +- cluster:admin/opensearch/ad/detector/search +- cluster:admin/opensearch/ad/detector/stats +- cluster:admin/opensearch/ad/detector/write +- cluster:admin/opensearch/ad/detectors/get +- cluster:admin/opensearch/ad/result/search +- cluster:admin/opensearch/alerting/alerts/ack +- cluster:admin/opensearch/alerting/alerts/get +- cluster:admin/opensearch/alerting/destination/delete +- cluster:admin/opensearch/alerting/destination/email_account/delete +- cluster:admin/opensearch/alerting/destination/email_account/get +- cluster:admin/opensearch/alerting/destination/email_account/search +- cluster:admin/opensearch/alerting/destination/email_account/write +- cluster:admin/opensearch/alerting/destination/email_group/delete +- cluster:admin/opensearch/alerting/destination/email_group/get +- cluster:admin/opensearch/alerting/destination/email_group/search +- cluster:admin/opensearch/alerting/destination/email_group/write +- cluster:admin/opensearch/alerting/destination/get +- cluster:admin/opensearch/alerting/destination/write +- cluster:admin/opensearch/alerting/monitor/delete +- cluster:admin/opensearch/alerting/monitor/execute +- cluster:admin/opensearch/alerting/monitor/get +- cluster:admin/opensearch/alerting/monitor/search +- cluster:admin/opensearch/alerting/monitor/write +- cluster:admin/opensearch/asynchronous_search/stats +- cluster:admin/opensearch/asynchronous_search/delete +- cluster:admin/opensearch/asynchronous_search/get +- cluster:admin/opensearch/asynchronous_search/submit +- cluster:admin/opensearch/reports/definition/create +- cluster:admin/opensearch/reports/definition/delete +- 
cluster:admin/opensearch/reports/definition/get +- cluster:admin/opensearch/reports/definition/list +- cluster:admin/opensearch/reports/definition/on_demand +- cluster:admin/opensearch/reports/definition/update +- cluster:admin/opensearch/reports/instance/get +- cluster:admin/opensearch/reports/instance/list +- cluster:admin/opensearch/reports/menu/download +- cluster:admin/reindex/rethrottle +- cluster:admin/repository/delete +- cluster:admin/repository/get +- cluster:admin/repository/put +- cluster:admin/repository/verify +- cluster:admin/reroute +- cluster:admin/script/delete +- cluster:admin/script/get +- cluster:admin/script/put +- cluster:admin/settings/update +- cluster:admin/snapshot/create +- cluster:admin/snapshot/delete +- cluster:admin/snapshot/get +- cluster:admin/snapshot/restore +- cluster:admin/snapshot/status +- cluster:admin/snapshot/status* +- cluster:admin/tasks/cancel +- cluster:admin/tasks/test +- cluster:admin/tasks/testunblock +- cluster:monitor/allocation/explain +- cluster:monitor/health +- cluster:monitor/main +- cluster:monitor/nodes/hot_threads +- cluster:monitor/nodes/info +- cluster:monitor/nodes/liveness +- cluster:monitor/nodes/stats +- cluster:monitor/nodes/usage +- cluster:monitor/remote/info +- cluster:monitor/state +- cluster:monitor/stats +- cluster:monitor/task +- cluster:monitor/task/get +- cluster:monitor/tasks/list + + +## Indices + +- indices:admin/aliases +- indices:admin/aliases/exists +- indices:admin/aliases/get +- indices:admin/analyze +- indices:admin/cache/clear +- indices:admin/close +- indices:admin/create +- indices:admin/delete +- indices:admin/exists +- indices:admin/flush +- indices:admin/flush* +- indices:admin/forcemerge +- indices:admin/get +- indices:admin/mapping/put +- indices:admin/mappings/fields/get +- indices:admin/mappings/fields/get* +- indices:admin/mappings/get +- indices:admin/open +- indices:admin/refresh +- indices:admin/refresh* +- indices:admin/resolve/index +- indices:admin/rollover +- 
indices:admin/seq_no/global_checkpoint_sync +- indices:admin/settings/update +- indices:admin/shards/search_shards +- indices:admin/shrink +- indices:admin/synced_flush +- indices:admin/template/delete +- indices:admin/template/get +- indices:admin/template/put +- indices:admin/types/exists +- indices:admin/upgrade +- indices:admin/validate/query +- indices:data/read/explain +- indices:data/read/field_caps +- indices:data/read/field_caps* +- indices:data/read/get +- indices:data/read/mget +- indices:data/read/mget* +- indices:data/read/msearch +- indices:data/read/msearch/template +- indices:data/read/mtv +- indices:data/read/mtv* +- indices:data/read/scroll +- indices:data/read/scroll/clear +- indices:data/read/search +- indices:data/read/search* +- indices:data/read/search/template +- indices:data/read/tv +- indices:data/write/bulk +- indices:data/write/bulk* +- indices:data/write/delete +- indices:data/write/delete/byquery +- indices:data/write/index +- indices:data/write/reindex +- indices:data/write/update +- indices:data/write/update/byquery +- indices:monitor/recovery +- indices:monitor/segments +- indices:monitor/settings/get +- indices:monitor/shard_stores +- indices:monitor/stats +- indices:monitor/upgrade diff --git a/docs/security/access-control/users-roles.md b/docs/security/access-control/users-roles.md new file mode 100644 index 00000000..4351c4a2 --- /dev/null +++ b/docs/security/access-control/users-roles.md @@ -0,0 +1,171 @@ +--- +layout: default +title: Users and Roles +parent: Access Control +grand_parent: Security +nav_order: 1 +--- + +# Users and roles + +The security plugin includes an internal user database. Use this database in place of or in addition to an external authentication system such as LDAP or Active Directory. + +Roles are the core way of controlling access to your cluster. Roles contain any combination of cluster-wide permissions, index-specific permissions, document- and field-level security, and tenants. 
Then you map users to these roles so that users gain those permissions. + +Unless you need to create new [read-only or hidden users](../api/#read-only-and-hidden-resources), we **highly** recommend using OpenSearch Dashboards or the REST API to create new users, roles, and role mappings. The `.yml` files are for initial setup, not ongoing use. +{: .warning } + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Create users + +You can create users using OpenSearch Dashboards, `internal_users.yml`, or the REST API. When creating a user, you can map users to roles using `internal_users.yml` or the REST API, but that feature is not currently available in OpenSearch Dashboards. + +### OpenSearch Dashboards + +1. Choose **Security**, **Internal Users**, and **Create internal user**. +1. Provide a username and password. The security plugin automatically hashes the password and stores it in the `.opensearch_security` index. +1. If desired, specify user attributes. + + Attributes are optional user properties that you can use for variable substitution in index permissions or document-level security. + +1. Choose **Submit**. + +### internal_users.yml + +See [YAML files](../../configuration/yaml/#internal_usersyml). + + +### REST API + +See [Create user](../api/#create-user). + + +## Create roles + +Just like users, you can create roles using OpenSearch Dashboards, `roles.yml`, or the REST API. + + +### OpenSearch Dashboards + +1. Choose **Security**, **Roles**, and **Create role**. +1. Provide a name for the role. +1. Add permissions as desired. + + For example, you might give a role no cluster permissions, `read` permissions to two indices, `unlimited` permissions to a third index, and read permissions to the `analysts` tenant. + +1. Choose **Submit**. + + +### roles.yml + +See [YAML files](../../configuration/yaml/#rolesyml). + + +### REST API + +See [Create role](../api/#create-role). 
+ + +## Map users to roles + +If you didn't specify roles when you created your user, you can map roles to it afterwards. + +Just like users and roles, you create role mappings using OpenSearch Dashboards, `roles_mapping.yml`, or the REST API. + +### OpenSearch Dashboards + +1. Choose **Security**, **Roles**, and a role. +1. Choose the **Mapped users** tab and **Manage mapping**. +1. Specify users or external identities (also known as backend roles). +1. Choose **Map**. + + +### roles_mapping.yml + +See [YAML files](../../configuration/yaml/#roles_mappingyml). + + +### REST API + +See [Create role mapping](../api/#create-role-mapping). + + +## Predefined roles + +The security plugin includes several predefined roles that serve as useful defaults. + +Role | Description +:--- | :--- +`alerting_ack_alerts` | Grants permissions to view and acknowledge alerts, but not modify destinations or monitors. +`alerting_full_access` | Grants full permissions to all alerting actions. +`alerting_read_access` | Grants permissions to view alerts, destinations, and monitors, but not acknowledge alerts or modify destinations or monitors. +`anomaly_full_access` | Grants full permissions to all anomaly detection actions. +`anomaly_read_access` | Grants permissions to view detectors, but not create, modify, or delete detectors. +`all_access` | Grants full access to the cluster: all cluster-wide operations, write to all indices, write to all tenants. +`opensearch_dashboards_read_only` | A special role that prevents users from making changes to visualizations, dashboards, and other OpenSearch Dashboards objects. See `opensearch_security.readonly_mode.roles` in `opensearch_dashboards.yml`. Pair with the `opensearch_dashboards_user` role. +`opensearch_dashboards_user` | Grants permissions to use OpenSearch Dashboards: cluster-wide searches, index monitoring, and write to various OpenSearch Dashboards indices. 
+`logstash` | Grants permissions for Logstash to interact with the cluster: cluster-wide searches, cluster monitoring, and write to the various Logstash indices. +`manage_snapshots` | Grants permissions to manage snapshot repositories, take snapshots, and restore snapshots. +`readall` | Grants permissions for cluster-wide searches like `msearch` and search permissions for all indices. +`readall_and_monitor` | Same as `readall`, but with added cluster monitoring permissions. +`security_rest_api_access` | A special role that allows access to the REST API. See `opensearch_security.restapi.roles_enabled` in `opensearch.yml` and [Access control for the API](../api/#access-control-for-the-api). +`reports_read_access` | Grants permissions to generate on-demand reports, download existing reports, and view report definitions, but not to create report definitions. +`reports_instances_read_access` | Grants permissions to generate on-demand reports and download existing reports, but not to view or create report definitions. +`reports_full_access` | Grants full permissions to reports. +`asynchronous_search_full_access` | Grants full permissions to all asynchronous search actions. +`asynchronous_search_read_access` | Grants permissions to view asynchronous searches, but not to submit, modify, or delete async searches. + + +For more detailed summaries of the permissions for each role, reference their action groups against the descriptions in [Default action groups](../default-action-groups/). + + +## Sample roles + +The following examples show how you might set up a read-only and a bulk access role. + + +### Set up a read-only user in OpenSearch Dashboards + +Create a new `read_only_index` role: + +1. Open OpenSearch Dashboards. +1. Choose **Security**, **Roles**. +1. Create a new role named `read_only_index`. +1. For **Cluster permissions**, add the `cluster_composite_ops_ro` action group. +1. For **Index Permissions**, add an index pattern. 
For example, you might specify `my-index-*`. +1. For index permissions, add the `read` action group. +1. Choose **Create**. + +Map three roles to the read-only user: + +1. Choose the **Mapped users** tab and **Manage mapping**. +1. For **Internal users**, add your read-only user. +1. Choose **Map**. +1. Repeat these steps for the `opensearch_dashboards_user` and `opensearch_dashboards_read_only` roles. + + +### Set up a bulk access role in OpenSearch Dashboards + +Create a new `bulk_access` role: + +1. Open OpenSearch Dashboards. +1. Choose **Security**, **Roles**. +1. Create a new role named `bulk_access`. +1. For **Cluster permissions**, add the `cluster_composite_ops` action group. +1. For **Index Permissions**, add an index pattern. For example, you might specify `my-index-*`. +1. For index permissions, add the `write` action group. +1. Choose **Create**. + +Map the role to your user: + +1. Choose the **Mapped users** tab and **Manage mapping**. +1. For **Internal users**, add your bulk access user. +1. Choose **Map**. diff --git a/docs/security/audit-logs/field-reference.md b/docs/security/audit-logs/field-reference.md new file mode 100644 index 00000000..9694cf28 --- /dev/null +++ b/docs/security/audit-logs/field-reference.md @@ -0,0 +1,171 @@ +--- +layout: default +title: Audit Log Field Reference +parent: Audit Logs +grand_parent: Security +nav_order: 1 +--- + +# Audit log field reference + +This page contains descriptions for all audit log fields. + + +## Common attributes + +The following attributes are logged for all event categories, independent of the layer. + +Name | Description +:--- | :--- +`audit_format_version` | The audit log message format version. +`audit_category` | The audit log category, one of FAILED_LOGIN, MISSING_PRIVILEGES, BAD_HEADERS, SSL_EXCEPTION, opensearch_SECURITY_INDEX_ATTEMPT, AUTHENTICATED or GRANTED_PRIVILEGES. +`audit_node_id ` | The ID of the node where the event was generated. 
+`audit_node_name` | The name of the node where the event was generated. +`audit_node_host_address` | The host address of the node where the event was generated. +`audit_node_host_name` | The host name of the node where the event was generated. +`audit_request_layer` | The layer on which the event has been generated, either TRANSPORT or REST. +`audit_request_origin` | The layer from which the event originated, either TRANSPORT or REST. +`audit_request_effective_user_is_admin` | True if the request was made with a TLS admin certificate, otherwise false. + + +## REST FAILED_LOGIN attributes + +Name | Description +:--- | :--- +`audit_request_effective_user` | The username that failed to authenticate. +`audit_rest_request_path` | The REST endpoint URI. +`audit_rest_request_params` | The HTTP request parameters, if any. +`audit_rest_request_headers` | The HTTP headers, if any. +`audit_request_initiating_user` | The user that initiated the request. Only logged if it differs from the effective user. +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). + + +## REST AUTHENTICATED attributes + +Name | Description +:--- | :--- +`audit_request_effective_user` | The username that successfully authenticated. +`audit_request_initiating_user` | The user that initiated the request. Only logged if it differs from the effective user. +`audit_rest_request_path` | The REST endpoint URI. +`audit_rest_request_params` | The HTTP request parameters, if any. +`audit_rest_request_headers` | The HTTP headers, if any. +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). + + +## REST SSL_EXCEPTION attributes + +Name | Description +:--- | :--- +`audit_request_exception_stacktrace` | The stack trace of the SSL exception. + + +## REST BAD_HEADERS attributes + +Name | Description +:--- | :--- +`audit_rest_request_path` | The REST endpoint URI. +`audit_rest_request_params` | The HTTP request parameters, if any.
+`audit_rest_request_headers` | The HTTP headers, if any. +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). + + +## Transport FAILED_LOGIN attributes + +Name | Description +:--- | :--- +`audit_trace_task_id` | The ID of the request. +`audit_transport_headers` | The headers of the request, if any. +`audit_request_effective_user` | The username that failed to authenticate. +`audit_request_initiating_user` | The user that initiated the request. Only logged if it differs from the effective user. +`audit_transport_request_type` | The type of request (e.g. `IndexRequest`). +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). +`audit_trace_indices` | The index name(s) included in the request. Can contain wildcards, date patterns, and aliases. Only logged if `resolve_indices` is true. +`audit_trace_resolved_indices` | The resolved index name(s) affected by the request. Only logged if `resolve_indices` is true. +`audit_trace_doc_types` | The document types affected by the request. Only logged if `resolve_indices` is true. + + +## Transport AUTHENTICATED attributes + +Name | Description +:--- | :--- +`audit_trace_task_id` | The ID of the request. +`audit_transport_headers` | The headers of the request, if any. +`audit_request_effective_user` | The username that authenticated successfully. +`audit_request_initiating_user` | The user that initiated the request. Only logged if it differs from the effective user. +`audit_transport_request_type` | The type of request (e.g. `IndexRequest`). +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). +`audit_trace_indices` | The index name(s) included in the request. Can contain wildcards, date patterns, and aliases. Only logged if `resolve_indices` is true. +`audit_trace_resolved_indices` | The resolved index name(s) affected by the request. Only logged if `resolve_indices` is true.
+`audit_trace_doc_types` | The document types affected by the request. Only logged if `resolve_indices` is true. + + +## Transport MISSING_PRIVILEGES attributes + +Name | Description +:--- | :--- +`audit_trace_task_id` | The ID of the request. +`audit_trace_task_parent_id` | The parent ID of this request, if any. +`audit_transport_headers` | The headers of the request, if any. +`audit_request_effective_user` | The username that failed to authenticate. +`audit_request_initiating_user` | The user that initiated the request. Only logged if it differs from the effective user. +`audit_transport_request_type` | The type of request (e.g. `IndexRequest`). +`audit_request_privilege` | The required privilege of the request (e.g. `indices:data/read/search`). +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). +`audit_trace_indices` | The index name(s) included in the request. Can contain wildcards, date patterns, and aliases. Only logged if `resolve_indices` is true. +`audit_trace_resolved_indices` | The resolved index name(s) affected by the request. Only logged if `resolve_indices` is true. +`audit_trace_doc_types` | The document types affected by the request. Only logged if `resolve_indices` is true. + + +## Transport GRANTED_PRIVILEGES attributes + +Name | Description +:--- | :--- +`audit_trace_task_id` | The ID of the request. +`audit_trace_task_parent_id` | The parent ID of this request, if any. +`audit_transport_headers` | The headers of the request, if any. +`audit_request_effective_user` | The username that failed to authenticate. +`audit_request_initiating_user` | The user that initiated the request. Only logged if it differs from the effective user. +`audit_transport_request_type` | The type of request (e.g. `IndexRequest`). +`audit_request_privilege` | The required privilege of the request (e.g. `indices:data/read/search`). +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). 
+`audit_trace_indices` | The index name(s) included in the request. Can contain wildcards, date patterns, and aliases. Only logged if `resolve_indices` is true. +`audit_trace_resolved_indices` | The resolved index name(s) affected by the request. Only logged if `resolve_indices` is true. +`audit_trace_doc_types` | The document types affected by the request. Only logged if `resolve_indices` is true. + + +## Transport SSL_EXCEPTION attributes + +Name | Description +:--- | :--- +`audit_request_exception_stacktrace` | The stack trace of the SSL exception. + + +## Transport BAD_HEADERS attributes + +Name | Description +:--- | :--- +`audit_trace_task_id` | The ID of the request. +`audit_trace_task_parent_id` | The parent ID of this request, if any. +`audit_transport_headers` | The headers of the request, if any. +`audit_request_effective_user` | The username that failed to authenticate. +`audit_request_initiating_user` | The user that initiated the request. Only logged if it differs from the effective user. +`audit_transport_request_type` | The type of request (e.g. `IndexRequest`). +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). +`audit_trace_indices` | The index name(s) included in the request. Can contain wildcards, date patterns, and aliases. Only logged if `resolve_indices` is true. +`audit_trace_resolved_indices` | The resolved index name(s) affected by the request. Only logged if `resolve_indices` is true. +`audit_trace_doc_types` | The document types affected by the request. Only logged if `resolve_indices` is true. + + +## Transport opensearch_SECURITY_INDEX_ATTEMPT attributes + +Name | Description +:--- | :--- +`audit_trace_task_id` | The ID of the request. +`audit_transport_headers` | The headers of the request, if any. +`audit_request_effective_user` | The username that failed to authenticate. +`audit_request_initiating_user` | The user that initiated the request. 
Only logged if it differs from the effective user. +`audit_transport_request_type` | The type of request (e.g. `IndexRequest`). +`audit_request_body` | The HTTP request body, if any (and if request body logging is enabled). +`audit_trace_indices` | The index name(s) included in the request. Can contain wildcards, date patterns, and aliases. Only logged if `resolve_indices` is true. +`audit_trace_resolved_indices` | The resolved index name(s) affected by the request. Only logged if `resolve_indices` is true. +`audit_trace_doc_types` | The document types affected by the request. Only logged if `resolve_indices` is true. diff --git a/docs/security/audit-logs/index.md b/docs/security/audit-logs/index.md new file mode 100644 index 00000000..23f1e7ef --- /dev/null +++ b/docs/security/audit-logs/index.md @@ -0,0 +1,191 @@ +--- +layout: default +title: Audit Logs +nav_order: 90 +parent: Security +has_children: true +has_toc: false +--- + +# Audit logs + +Audit logs let you track access to your OpenSearch cluster and are useful for compliance purposes or in the aftermath of a security breach. You can configure the categories to be logged, the detail level of the logged messages, and where to store the logs. + +To enable audit logging: + +1. Add the following line to `opensearch.yml` on each node: + + ```yml + opensearch_security.audit.type: internal_opensearch + ``` + + This setting stores audit logs on the current cluster. For other storage options, see [Audit Log Storage Types](storage-types/). + +2. Restart each node. + +After this initial setup, you can use OpenSearch Dashboards to manage your audit log categories and other settings. In OpenSearch Dashboards, choose **Security**, **Audit logs**. + + +--- + +#### Table of contents +1. TOC +{:toc} + + +--- + +## Tracked events + +Audit logging records events in two ways: HTTP requests (REST) and the transport layer. 
+ +Event | Logged on REST | Logged on transport | Description +:--- | :--- | :--- | :--- +`FAILED_LOGIN` | Yes | Yes | The credentials of a request could not be validated, most likely because the user does not exist or the password is incorrect. +`AUTHENTICATED` | Yes | Yes | A user successfully authenticated. +`MISSING_PRIVILEGES` | No | Yes | The user does not have the required permissions to execute the request. +`GRANTED_PRIVILEGES` | No | Yes | A user made a successful request to OpenSearch. +`SSL_EXCEPTION` | Yes | Yes | An attempt was made to access OpenSearch without a valid SSL/TLS certificate. +`opensearch_SECURITY_INDEX_ATTEMPT` | No | Yes | An attempt was made to modify the security plugin internal user and privileges index without the required permissions or TLS admin certificate. +`BAD_HEADERS` | Yes | Yes | An attempt was made to spoof a request to OpenSearch with the security plugin internal headers. + +These default log settings work well for most use cases, but you can change settings to save storage space or adapt the information to your exact needs. + + +## Exclude categories + +To exclude categories, set: + +```yml +opensearch_security.audit.config.disabled_rest_categories: +opensearch_security.audit.config.disabled_transport_categories: +``` + +For example: + +```yml +opensearch_security.audit.config.disabled_rest_categories: AUTHENTICATED, opensearch_SECURITY_INDEX_ATTEMPT +opensearch_security.audit.config.disabled_transport_categories: GRANTED_PRIVILEGES +``` + +If you want to log events in all categories, use `NONE`: + +```yml +opensearch_security.audit.config.disabled_rest_categories: NONE +opensearch_security.audit.config.disabled_transport_categories: NONE +``` + + +## Disable REST or the transport layer + +By default, the security plugin logs events on both REST and the transport layer. 
You can disable either type: + +```yml +opensearch_security.audit.enable_rest: false +opensearch_security.audit.enable_transport: false +``` + + +## Disable request body logging + +By default, the security plugin includes the body of the request (if available) for both REST and the transport layer. If you do not want or need the request body, you can disable it: + +```yml +opensearch_security.audit.log_request_body: false +``` + + +## Log index names + +By default, the security plugin logs all indices affected by a request. Because index names can be aliases and can contain wildcards or date patterns, the security plugin logs the index name that the user submitted *and* the actual index name to which it resolves. + +For example, if you use an alias or a wildcard, the audit event might look like: + +```json +audit_trace_indices: [ + "human*" +], +audit_trace_resolved_indices: [ + "humanresources" +] +``` + +You can disable this feature by setting: + +```yml +opensearch_security.audit.resolve_indices: false +``` + +Disabling this feature only takes effect if `opensearch_security.audit.log_request_body` is also set to `false`. +{: .note } + + +## Configure bulk request handling + +Bulk requests can contain many indexing operations. By default, the security plugin only logs the single bulk request, not each individual operation. + +The security plugin can be configured to log each indexing operation as a separate event: + +```yml +opensearch_security.audit.resolve_bulk_requests: true +``` + +This change can create a massive number of events in the audit logs, so we don't recommend enabling this setting if you make heavy use of the `_bulk` API.
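
To get a rough feel for how many extra audit events `resolve_bulk_requests` might produce, the following sketch counts the operations in a `_bulk` request body. It is only an illustration: the function name `count_bulk_operations` and the sample body are hypothetical, and it assumes the plugin emits roughly one event per operation when the setting is enabled.

```python
import json

# Actions accepted by the _bulk API. "delete" is the only one that is not
# followed by a source/document line.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}
ACTIONS_WITHOUT_SOURCE = {"delete"}


def count_bulk_operations(body: str) -> int:
    """Count the individual operations in a newline-delimited _bulk body."""
    lines = [line for line in body.splitlines() if line.strip()]
    count = 0
    position = 0
    while position < len(lines):
        # The action line is a JSON object with a single key naming the action.
        action = next(iter(json.loads(lines[position])))
        if action in ACTIONS_WITH_SOURCE:
            position += 2  # skip the action line and its source line
        elif action in ACTIONS_WITHOUT_SOURCE:
            position += 1  # delete has no source line
        else:
            raise ValueError(f"Unknown bulk action: {action}")
        count += 1
    return count


# Hypothetical request body with three operations.
sample_body = "\n".join([
    '{"index": {"_index": "my-index-1", "_id": "1"}}',
    '{"title": "first"}',
    '{"delete": {"_index": "my-index-1", "_id": "2"}}',
    '{"update": {"_index": "my-index-1", "_id": "3"}}',
    '{"doc": {"title": "third"}}',
])

operations = count_bulk_operations(sample_body)
print(f"1 audit event by default, about {operations} with resolve_bulk_requests")
```

Running a check like this against typical bulk payloads can help you estimate the audit log volume before you enable the setting.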
+ + +## Exclude requests + +You can exclude certain requests from being logged completely by configuring actions (for transport requests), HTTP request paths (for REST requests), or both: + +```yml +opensearch_security.audit.ignore_requests: ["indices:data/read/*", "SearchRequest"] +``` + + +## Exclude users + +By default, the security plugin logs events from all users, but excludes the internal OpenSearch Dashboards server user `opensearch-dashboardsserver`. You can exclude other users: + +```yml +opensearch_security.audit.ignore_users: + - opensearch-dashboardsserver + - admin +``` + +If requests from all users should be logged, use `NONE`: + +```yml +opensearch_security.audit.ignore_users: NONE +``` + + +## Configure the audit log index name + +By default, the security plugin stores audit events in a daily rolling index named `auditlog-YYYY.MM.dd`. You can configure the name of the index in `opensearch.yml`: + +```yml +opensearch_security.audit.config.index: myauditlogindex +``` + +Use a date pattern in the index name to configure daily, weekly, or monthly rolling indices: + +```yml +opensearch_security.audit.config.index: "'auditlog-'YYYY.MM.dd" +``` + +For a reference on the date pattern format, see the [Joda DateTimeFormat documentation](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html). + + +## (Advanced) Tune the thread pool + +The security plugin logs events asynchronously, which keeps performance impact on your cluster minimal. The plugin uses a fixed thread pool to log events. You can define the number of threads in the pool in `opensearch.yml`: + +```yml +opensearch_security.audit.threadpool.size: +``` + +The default setting is `10`. Setting this value to `0` disables the thread pool, which means the plugin logs events synchronously.
To set the maximum queue length per thread: + +```yml +opensearch_security.audit.threadpool.max_queue_len: 100000 +``` diff --git a/docs/security/audit-logs/storage-types.md b/docs/security/audit-logs/storage-types.md new file mode 100644 index 00000000..84fc3c32 --- /dev/null +++ b/docs/security/audit-logs/storage-types.md @@ -0,0 +1,109 @@ +--- +layout: default +title: Audit Log Storage Types +parent: Audit Logs +grand_parent: Security +nav_order: 10 +--- + +# Audit log storage types + +Audit logs can take up quite a bit of space, so the security plugin offers several options for storage locations. + +Setting | Description +:--- | :--- +debug | Outputs to stdout. Useful for testing and debugging. +internal_opensearch | Writes to an audit index on the current OpenSearch cluster. +external_opensearch | Writes to an audit index on a remote OpenSearch cluster. +webhook | Sends events to an arbitrary HTTP endpoint. +log4j | Writes the events to a Log4j logger. You can use any Log4j [appender](https://logging.apache.org/log4j/2.x/manual/appenders.html), such as SNMP, JDBC, Cassandra, and Kafka. + +You configure the output location in `opensearch.yml`: + +``` +opensearch_security.audit.type: +``` + +`external_opensearch`, `webhook`, and `log4j` all have additional configuration options. Details follow. + + +## External OpenSearch + +The `external_opensearch` storage type requires one or more OpenSearch endpoints with a host/IP address and port. Optionally, provide the index name and a document type. + +```yml +opensearch_security.audit.type: external_opensearch +opensearch_security.audit.config.http_endpoints: [] +opensearch_security.audit.config.index: +opensearch_security.audit.config.type: _doc +``` + +The security plugin uses the OpenSearch REST API to send events, just like any other indexing request. For `opensearch_security.audit.config.http_endpoints`, use a comma-separated list of hosts/IP addresses and the REST port (default 9200). 
+ +``` +opensearch_security.audit.config.http_endpoints: [192.168.178.1:9200,192.168.178.2:9200] +``` + +If you use `external_opensearch` and the remote cluster also uses the security plugin, you must supply some additional parameters for authentication. These parameters depend on which authentication type you configured for the remote cluster. + + +### TLS settings + +Name | Data Type | Description +:--- | :--- | :--- +`opensearch_security.audit.config.enable_ssl` | Boolean | If you enabled SSL/TLS on the receiving cluster, set to true. The default is false. +`opensearch_security.audit.config.verify_hostnames` | Boolean | Whether to verify the hostname of the SSL/TLS certificate of the receiving cluster. Default is true. +`opensearch_security.audit.config.pemtrustedcas_filepath` | String | The trusted root certificate of the external OpenSearch cluster, relative to the `config` directory. +`opensearch_security.audit.config.pemtrustedcas_content` | String | Instead of specifying the path (`opensearch_security.audit.config.pemtrustedcas_filepath`), you can configure the Base64-encoded certificate content directly. +`opensearch_security.audit.config.enable_ssl_client_auth` | Boolean | Whether to enable SSL/TLS client authentication. If you set this to true, the audit log module sends the node's certificate along with the request. The receiving cluster can use this certificate to verify the identity of the caller. +`opensearch_security.audit.config.pemcert_filepath` | String | The path to the TLS certificate to send to the external OpenSearch cluster, relative to the `config` directory. +`opensearch_security.audit.config.pemcert_content` | String | Instead of specifying the path (`opensearch_security.audit.config.pemcert_filepath`), you can configure the Base64-encoded certificate content directly. 
+`opensearch_security.audit.config.pemkey_filepath` | String | The path to the private key of the TLS certificate to send to the external OpenSearch cluster, relative to the `config` directory. +`opensearch_security.audit.config.pemkey_content` | String | Instead of specifying the path (`opensearch_security.audit.config.pemkey_filepath`), you can configure the Base64-encoded key content directly. +`opensearch_security.audit.config.pemkey_password` | String | The password of the private key. + + +### Basic auth settings + +If you enabled HTTP basic authentication on the receiving cluster, use these settings to specify the username and password: + +```yml +opensearch_security.audit.config.username: +opensearch_security.audit.config.password: +``` + + +## Webhook + +Use the following keys to configure the `webhook` storage type. + +Name | Data Type | Description +:--- | :--- | :--- +`opensearch_security.audit.config.webhook.url` | String | The HTTP or HTTPS URL to send the logs to. +`opensearch_security.audit.config.webhook.ssl.verify` | Boolean | If true, the TLS certificate provided by the endpoint (if any) is verified. If false, no verification is performed. You can disable this check if you use self-signed certificates. +`opensearch_security.audit.config.webhook.ssl.pemtrustedcas_filepath` | String | The path to the trusted certificate against which the webhook's TLS certificate is validated. +`opensearch_security.audit.config.webhook.ssl.pemtrustedcas_content` | String | Same as `opensearch_security.audit.config.webhook.ssl.pemtrustedcas_filepath`, but you configure the Base64-encoded certificate content directly instead of a path. +`opensearch_security.audit.config.webhook.format` | String | The format in which the audit log message is sent. One of `URL_PARAMETER_GET`, `URL_PARAMETER_POST`, `TEXT`, `JSON`, or `SLACK`. See [Formats](#formats).
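
As a rough illustration of what a receiving endpoint for the `JSON` webhook format could look like, the sketch below uses Python's standard `http.server` module. The handler class, the `summarize_event` helper, and port `8080` are all hypothetical, and the exact shape of the payload is an assumption: the sketch only reads a few fields named in the audit log field reference and tolerates their absence.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_event(raw_body: bytes) -> str:
    """Return a one-line summary of a JSON-formatted audit event."""
    event = json.loads(raw_body.decode("utf-8"))
    # Field names come from the audit log field reference; missing fields
    # fall back to placeholders because the payload shape is assumed.
    category = event.get("audit_category", "UNKNOWN")
    layer = event.get("audit_request_layer", "UNKNOWN")
    user = event.get("audit_request_effective_user", "<none>")
    return f"{category} on {layer} by {user}"


class AuditWebhookHandler(BaseHTTPRequestHandler):
    """Accepts POST requests that carry one audit event in the body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            print(summarize_event(body))
            self.send_response(204)  # accepted, nothing to return
        except (ValueError, AttributeError):
            # Body was not valid JSON, or was JSON but not an object.
            self.send_response(400)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) a local receiver on the given port."""
    return HTTPServer(("127.0.0.1", port), AuditWebhookHandler)
```

To try it, call `make_server().serve_forever()` and point `opensearch_security.audit.config.webhook.url` at `http://127.0.0.1:8080/` with `opensearch_security.audit.config.webhook.format` set to `JSON`. A production receiver would also need TLS and authentication.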
+ + +### Formats + +Format | Description +:--- | :--- +`URL_PARAMETER_GET` | Uses HTTP GET to send logs to the webhook URL. All logged information is appended to the URL as request parameters. +`URL_PARAMETER_POST` | Uses HTTP POST to send logs to the webhook URL. All logged information is appended to the URL as request parameters. +`TEXT` | Uses HTTP POST to send logs to the webhook URL. The request body contains the audit log message in plain text format. +`JSON` | Uses HTTP POST to send logs to the webhook URL. The request body contains the audit log message in JSON format. +`SLACK` | Uses HTTP POST to send logs to the webhook URL. The request body contains the audit log message in JSON format suitable for consumption by Slack. The default implementation returns `"text": ""`. + + +## Log4j + +The `log4j` storage type lets you specify the name of the logger and log level. + +```yml +opensearch_security.audit.config.log4j.logger_name: audit +opensearch_security.audit.config.log4j.level: INFO +``` + +By default, the security plugin uses the logger name `audit` and logs the events on `INFO` level. Audit events are stored in JSON format. diff --git a/docs/security/configuration/client-auth.md b/docs/security/configuration/client-auth.md new file mode 100644 index 00000000..33af183a --- /dev/null +++ b/docs/security/configuration/client-auth.md @@ -0,0 +1,107 @@ +--- +layout: default +title: Client certificate authentication +parent: Configuration +grand_parent: Security +nav_order: 50 +--- + +# Client certificate authentication + +After obtaining your own certificates either from a certificate authority (CA) or by [generating your own certificates using OpenSSL](../generate-certificates), you can start configuring OpenSearch to authenticate a user using a client certificate. + +Client certificate authentication offers more security advantages than just using basic authentication (username and password). 
Because client certificate authentication requires both a client certificate and its private key, which are often in the user's possession, it is less vulnerable to brute force attacks in which malicious individuals try to guess a user's password. + +Another benefit of client certificate authentication is that you can use it along with basic authentication, providing two layers of security. + +## Enabling client certificate authentication + +To enable client certificate authentication, you must first set `clientauth_mode` in `opensearch.yml` to either `OPTIONAL` or `REQUIRE`: + +```yml +opensearch_security.ssl.http.clientauth_mode: OPTIONAL +``` + +Next, enable client certificate authentication in the `clientcert_auth_domain` section of `config.yml`. + +```yml +clientcert_auth_domain: + description: "Authenticate via SSL client certificates" + http_enabled: true + transport_enabled: true + order: 1 + http_authenticator: + type: clientcert + config: + username_attribute: cn #optional, if omitted DN becomes username + challenge: false + authentication_backend: + type: noop +``` + +## Assigning roles to your common name + +You can now assign your certificate's common name (CN) to a role. For this step, you must know your certificate's CN and the role you want to assign it to. To get a list of all predefined roles in OpenSearch, refer to our [list of predefined roles](../../access-control/users-roles#predefined-roles). If you want to first create a role, refer to [how to create a role](../../access-control/users-roles#create-users), and then map your certificate's CN to that role. + +After deciding which role you want to map your certificate's CN to, you can use [OpenSearch Dashboards](../../access-control/users-roles#map-users-to-roles), [`roles_mapping.yml`](../yaml/#roles_mappingyml), or the [REST API](../../access-control/api/#create-role-mapping) to map your certificate's CN to the role.
The following example uses the `REST API` to map the common name `CLIENT1` to the role `readall`. + +**Sample request** + +```json +PUT _opensearch/_security/api/rolesmapping/readall +{ + "backend_roles" : ["sample_role" ], + "hosts" : [ "example.host.com" ], + "users" : [ "CLIENT1" ] +} +``` + +**Sample response** + +```json +{ + "status": "OK", + "message": "'readall' updated." +} +``` + +After mapping a role to your client certificate's CN, you're ready to connect to your cluster using those credentials. + +The code example below uses the Python `requests` library to connect to a local OpenSearch cluster and sends a GET request to the `movies` index. + +```python +import requests +import json +base_url = 'https://localhost:9200/' +headers = { + 'Content-Type': 'application/json' +} +cert_file_path = "/full/path/to/client-cert.pem" +key_file_path = "/full/path/to/client-cert-key.pem" +root_ca_path = "/full/path/to/root-ca.pem" + +# Send the request. +path = 'movies/_doc/3' +url = base_url + path +response = requests.get(url, cert = (cert_file_path, key_file_path), verify=root_ca_path) +print(response.text) +``` + +## Configuring Beats + +You can also configure your Beats so that it uses a client certificate for authentication with OpenSearch. Afterwards, it can start sending output to OpenSearch. + +This output configuration specifies which settings you need for client certificate authentication: + +```yml +output.opensearch: + enabled: true + # Array of hosts to connect to. + hosts: ["localhost:9200"] + # Protocol - either `http` (default) or `https`. 
+ protocol: "https" + ssl.certificate_authorities: ["/full/path/to/CA.pem"] + ssl.verification_mode: certificate + ssl.certificate: "/full/path/to/client-cert.pem" + ssl.key: "/full/path/to/client-cert-key.pem" +``` diff --git a/docs/security/configuration/concepts.md b/docs/security/configuration/concepts.md new file mode 100755 index 00000000..864cb49c --- /dev/null +++ b/docs/security/configuration/concepts.md @@ -0,0 +1,27 @@ +--- +layout: default +title: Authentication flow +parent: Configuration +grand_parent: Security +nav_order: 1 +--- + +# Authentication flow + +Understanding the authentication flow is a great way to get started with configuring the security plugin. + +1. To identify a user who wants to access the cluster, the security plugin needs the user's credentials. + + These credentials differ depending on how you've configured the plugin. For example, if you use basic authentication, the credentials are a user name and password. If you use a JSON web token, the credentials are stored within the token itself. If you use TLS certificates, the credentials are the distinguished name (DN) of the certificate. + +2. The security plugin authenticates the user's credentials against a backend: the internal user database, Lightweight Directory Access Protocol (LDAP), Active Directory, Kerberos, or JSON web tokens. + + The plugin supports chaining backends in `securityconfig/config.yml`. If more than one backend is present, the plugin tries to authenticate the user sequentially against each until one succeeds. A common use case is to combine the internal user database of the security plugin with LDAP/Active Directory. + +3. After a backend verifies the user's credentials, the plugin collects any backend roles. These roles can be arbitrary strings in the internal user database, but in most cases, these backend roles come from LDAP/Active Directory. + +4.
After the user is authenticated and any backend roles are retrieved, the security plugin uses the role mapping to assign security roles to the user. + + If the role mapping doesn't include the user (or the user's backend roles), the user is successfully authenticated, but has no permissions. + +5. The user can now perform actions as defined by the mapped security roles. For example, a user might map to the `opensearch_dashboards_user` role and thus have permissions to access OpenSearch Dashboards. diff --git a/docs/security/configuration/configuration.md b/docs/security/configuration/configuration.md new file mode 100755 index 00000000..59e51927 --- /dev/null +++ b/docs/security/configuration/configuration.md @@ -0,0 +1,402 @@ +--- +layout: default +title: Backend Configuration +parent: Configuration +grand_parent: Security +nav_order: 2 +--- + +# Backend configuration + +One of the first steps to using the security plugin is to decide on an authentication backend, which handles [steps 2-3 of the authentication flow](../concepts/#authentication-flow). The plugin has an internal user database, but many people prefer to use an existing authentication backend, such as an LDAP server, or some combination of the two. + +The main configuration file for authentication and authorization backends is `plugins/opensearch_security/securityconfig/config.yml`. It defines how the security plugin retrieves the user credentials, how it verifies these credentials, and how to fetch additional roles from backend systems (optional). + +`config.yml` has three main parts: + +```yml +opensearch_security: + dynamic: + http: + ... + authc: + ... + authz: + ... +``` + +For a more complete example, see the [sample file on GitHub](https://github.com/opensearch-project/security/blob/master/securityconfig/config.yml). 
+ + +## HTTP + +The `http` section has the following format: + +```yml +anonymous_auth_enabled: +xff: # optional section + enabled: + internalProxies: # Regex pattern + remoteIpHeader: # Name of the header in which to look. Typically: x-forwarded-for + proxiesHeader: + trustedProxies: # Regex pattern +``` + +If you disable anonymous authentication, the security plugin won't initialize if you have not provided at least one `authc`. + + +## Authentication + +The `authc` section has the following format: + +```yml +: + http_enabled: + transport_enabled: + order: + http_authenticator: + ... + authentication_backend: + ... +``` + +An entry in the `authc` section is called an *authentication domain*. It specifies where to get the user credentials and against which backend they should be authenticated. + +You can use more than one authentication domain. Each authentication domain has a name (for example, `basic_auth_internal`), `enabled` flags, and an `order`. The order makes it possible to chain authentication domains together. The security plugin uses them in the order that you provide. If the user successfully authenticates with one domain, the security plugin skips the remaining domains. + +`http_authenticator` specifies which authentication method that you want to use on the HTTP layer. + +This is the syntax for defining an authenticator on the HTTP layer: + +```yml +http_authenticator: + type: + challenge: + config: + ... +``` + +These are the allowed values for `type`: + +- `basic`: HTTP basic authentication. No additional configuration is needed. +- `kerberos`: Kerberos authentication. Additional, [Kerberos-specific configuration](#kerberos) is needed. +- `jwt`: JSON web token authentication. Additional, [JWT-specific configuration](#json-web-token) is needed. +- `clientcert`: Authentication through a client TLS certificate. This certificate must be trusted by one of the root CAs in the truststore of your nodes. 
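
The chaining behavior of authentication domains described above can be sketched in a few lines of Python. This is a conceptual model only, not the plugin's implementation: the `AuthDomain` class, the `authenticate_request` function, the `ldap_auth` domain name, and the in-memory user tables are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AuthDomain:
    """Simplified stand-in for one entry in the `authc` section."""
    name: str
    order: int
    http_enabled: bool
    # Returns the authenticated user name, or None if authentication fails.
    authenticate: Callable[[Dict[str, str]], Optional[str]]


def authenticate_request(
    domains: List[AuthDomain], credentials: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Try enabled domains in ascending `order`; stop at the first success."""
    for domain in sorted(domains, key=lambda d: d.order):
        if not domain.http_enabled:
            continue
        user = domain.authenticate(credentials)
        if user is not None:
            return domain.name, user  # remaining domains are skipped
    return None


def check_against(users: Dict[str, str]) -> Callable[[Dict[str, str]], Optional[str]]:
    """Build a toy backend that compares credentials to an in-memory table."""
    def check(credentials: Dict[str, str]) -> Optional[str]:
        username = credentials.get("username")
        if username in users and users[username] == credentials.get("password"):
            return username
        return None
    return check


# Hypothetical backends: an internal user database and an LDAP directory.
domains = [
    AuthDomain("ldap_auth", 1, True, check_against({"jdoe": "ldap-password"})),
    AuthDomain("basic_auth_internal", 0, True, check_against({"admin": "admin-password"})),
]

print(authenticate_request(domains, {"username": "jdoe", "password": "ldap-password"}))
```

Here the request for `jdoe` fails against `basic_auth_internal` (order 0) and then succeeds against `ldap_auth` (order 1), which mirrors the common setup of combining the internal user database with LDAP.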
+ +After setting an HTTP authenticator, you must specify against which backend system you want to authenticate the user: + +```yml +authentication_backend: + type: + config: + ... +``` + +These are the possible values for `type`: + +- `noop`: No further authentication against any backend system is performed. Use `noop` if the HTTP authenticator has already authenticated the user completely, as in the case of JWT, Kerberos, or client certificate authentication. +- `internal`: Use the users and roles defined in `internal_users.yml` for authentication. +- `ldap`: Authenticate users against an LDAP server. This setting requires [additional, LDAP-specific configuration settings](../ldap/). + + +## Authorization + +After the user has been authenticated, the security plugin can optionally collect additional roles from backend systems. The authorization configuration has the following format: + +```yml +authz: + : + http_enabled: + transport_enabled: + authorization_backend: + type: + config: + ... +``` + +You can define multiple entries in this section the same way as you can for authentication entries. In this case, execution order is not relevant, so there is no `order` field. + +These are the possible values for `type`: + +- `noop`: Skip this step altogether. +- `ldap`: Fetch additional roles from an LDAP server. This setting requires [additional, LDAP-specific configuration settings](../ldap/). + + +## Examples + +The default `plugins/opensearch_security/securityconfig/config.yml` that ships with OpenSearch contains many configuration examples. Use these examples as a starting point, and customize them to your needs. + + +## HTTP basic + +To set up HTTP basic authentication, you must enable it in the `http_authenticator` section of the configuration: + +```yml +http_authenticator: + type: basic + challenge: true +``` + +In most cases, you set the `challenge` flag to `true`. 
The flag defines the behavior of the security plugin if the `Authorization` field in the HTTP header is not set. + +If `challenge` is set to `true`, the security plugin sends a response with status `UNAUTHORIZED` (401) back to the client. If the client is accessing the cluster with a browser, this triggers the authentication dialog box, and the user is prompted to enter a user name and password. + +If `challenge` is set to `false` and no `Authorization` header field is set, the security plugin does not send a `WWW-Authenticate` response back to the client, and authentication fails. You might want to use this setting if you have another challenge `http_authenticator` in your configured authentication domains. One such scenario is when you plan to use basic authentication and Kerberos together. + + +## Kerberos + +Due to the nature of Kerberos, you must define some settings in `opensearch.yml` and some in `config.yml`. + +In `opensearch.yml`, define the following: + +```yml +opensearch_security.kerberos.krb5_filepath: '/etc/krb5.conf' +opensearch_security.kerberos.acceptor_keytab_filepath: 'eskeytab.tab' +``` + +`opensearch_security.kerberos.krb5_filepath` defines the path to your Kerberos configuration file. This file contains various settings regarding your Kerberos installation, for example, the realm names, hostnames, and ports of the Kerberos key distribution center (KDC). + +`opensearch_security.kerberos.acceptor_keytab_filepath` defines the path to the keytab file, which contains the principal that the security plugin uses to issue requests against Kerberos. + +`opensearch_security.kerberos.acceptor_principal: 'HTTP/localhost'` defines the principal that the security plugin uses to issue requests against Kerberos. This value must be present in the keytab file. + +Due to security restrictions, the keytab file must be placed in `config` or a subdirectory, and the path in `opensearch.yml` must be relative, not absolute. 
+{: .warning } + + +### Dynamic configuration + +A typical Kerberos authentication domain in `config.yml` looks like this: + +```yml + authc: + kerberos_auth_domain: + enabled: true + order: 1 + http_authenticator: + type: kerberos + challenge: true + config: + krb_debug: false + strip_realm_from_principal: true + authentication_backend: + type: noop +``` + +Authentication against Kerberos through a browser on an HTTP level is achieved using SPNEGO. Kerberos/SPNEGO implementations vary, depending on your browser and operating system. This is important when deciding if you need to set the `challenge` flag to `true` or `false`. + +As with [HTTP Basic Authentication](#http-basic), this flag determines how the security plugin should react when no `Authorization` header is found in the HTTP request or if this header does not equal `negotiate`. + +If set to `true`, the security plugin sends a response with status code 401 and a `WWW-Authenticate` header set to `negotiate`. This tells the client (browser) to resend the request with the `Authorization` header set. If set to `false`, the security plugin cannot extract the credentials from the request, and authentication fails. Setting `challenge` to `false` thus makes sense only if the Kerberos credentials are sent in the initial request. + +As the name implies, setting `krb_debug` to `true` will output Kerberos-specific debugging messages to `stdout`. Use this setting if you encounter problems with your Kerberos integration. + +If you set `strip_realm_from_principal` to `true`, the security plugin strips the realm from the user name. + + +### Authentication backend + +Because Kerberos/SPNEGO authenticates users on an HTTP level, no additional `authentication_backend` is needed. Set this value to `noop`. + + +## JSON web token + +JSON web tokens (JWTs) are JSON-based access tokens that assert one or more claims. 
They are commonly used to implement single sign-on (SSO) solutions and fall into the category of token-based authentication systems: + +1. A user logs in to an authentication server by providing credentials (for example, a user name and password). +1. The authentication server validates the credentials. +1. The authentication server creates an access token and signs it. +1. The authentication server returns the token to the user. +1. The user stores the access token. +1. The user sends the access token alongside every request to the service that the user wants to access. +1. The service verifies the token and grants or denies access. + +A JSON web token is self-contained: it carries all the information needed to verify a user. The tokens are base64url-encoded, signed JSON objects. + +JSON web tokens consist of three parts: + +1. Header +1. Payload +1. Signature + + +### Header + +The header contains information about the signing mechanism used, as shown in the following example: + +```json +{ + "alg": "HS256", + "typ": "JWT" +} +``` + +In this case, the header states that the message was signed using HMAC-SHA256. + + +### Payload + +The payload of a JSON web token contains the so-called [JWT Claims](http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html#RegisteredClaimName). A claim can be any piece of information about the user that the application that created the token has verified. + +The specification defines a set of standard claims with reserved names ("registered claims"). These include, for example, the token issuer, the expiration date, or the creation date. + +Public claims, on the other hand, can be created freely by the token issuer. They can contain arbitrary information, such as the user name and the roles of the user.
+ +Example: + +```json +{ + "iss": "example.com", + "exp": 1300819380, + "name": "John Doe", + "roles": "admin, devops" +} +``` + + +### Signature + +The issuer of the token calculates the signature of the token by applying a cryptographic hash function to the base64-encoded header and payload. The three parts (header, payload, and signature) are then concatenated using periods to form a complete JSON web token: + +``` +encoded = base64UrlEncode(header) + "." + base64UrlEncode(payload) +signature = HMACSHA256(encoded, 'secretkey') +jwt = encoded + "." + base64UrlEncode(signature) +``` + +Example: + +``` +eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI +``` + + +### Configure JSON web tokens + +If JSON web tokens are the only authentication method that you use, disable the user cache by setting `opensearch_security.cache.ttl_minutes: 0`. +{: .warning } + +Set up an authentication domain and choose `jwt` as the HTTP authentication type. Because the tokens already contain all required information to verify the request, `challenge` must be set to `false` and `authentication_backend` to `noop`. + +Example: + +```yml +jwt_auth_domain: + enabled: true + order: 0 + http_authenticator: + type: jwt + challenge: false + config: + signing_key: "base64 encoded key" + jwt_header: "Authorization" + jwt_url_parameter: null + subject_key: null + roles_key: null + authentication_backend: + type: noop +``` + +The following table shows the configuration parameters. + +Name | Description +:--- | :--- +`signing_key` | The signing key to use when verifying the token. If you use a symmetric key algorithm, it is the base64-encoded shared secret. If you use an asymmetric algorithm, it contains the public key. +`jwt_header` | The HTTP header in which the token is transmitted. This is typically the `Authorization` header with the `Bearer` scheme: `Authorization: Bearer <token>`. Default is `Authorization`.
+`jwt_url_parameter` | If the token is transmitted as a URL parameter rather than in an HTTP header, define the name of this parameter here. +`subject_key` | The key in the JSON payload that stores the user name. If not set, the [subject](https://tools.ietf.org/html/rfc7519#section-4.1.2) registered claim is used. +`roles_key` | The key in the JSON payload that stores the user's roles. The value of this key must be a comma-separated list of roles. + +Because JSON web tokens are self-contained and the user is authenticated on the HTTP level, no additional `authentication_backend` is needed. Set this value to `noop`. + + +### Symmetric key algorithms: HMAC + +Hash-based message authentication codes (HMACs) are a group of algorithms that provide a way of signing messages by means of a shared key. The key is shared between the authentication server and the security plugin. It must be configured as a base64-encoded value in the `signing_key` setting: + +```yml +jwt_auth_domain: + ... + config: + signing_key: "a3M5MjEwamRqOTAxOTJqZDE=" + ... +``` + + +### Asymmetric key algorithms: RSA and ECDSA + +RSA and ECDSA are asymmetric algorithms that use a public/private key pair to sign and verify tokens. The issuer signs the token with the private key, while the security plugin needs to know only the public key to verify it. + +Because you cannot issue new tokens with the public key, and because you can make valid assumptions about the creator of the token, RSA and ECDSA are considered more secure than HMAC. + +To use RS256, you need to configure only the (non-base64-encoded) public RSA key as `signing_key` in the JWT configuration: + +```yml +jwt_auth_domain: + ... + config: + signing_key: |- + -----BEGIN PUBLIC KEY----- + MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQK... + -----END PUBLIC KEY----- + ...
+``` + +The security plugin automatically detects the algorithm (RSA/ECDSA), and if necessary you can break the key into multiple lines. + + +### Bearer authentication for HTTP requests + +The most common way of transmitting a JSON web token in an HTTP request is to add it as an HTTP header with the bearer authentication scheme: + +``` +Authorization: Bearer <token> +``` + +The default name of the header is `Authorization`. If required by your authentication server or proxy, you can specify a different HTTP header name using the `jwt_header` configuration key. + +As with HTTP basic authentication, you should use HTTPS instead of HTTP when transmitting JSON web tokens in HTTP requests. + + +### URL parameters for HTTP requests + +Although the most common way to transmit JWTs in HTTP requests is to use a header field, the security plugin also supports URL parameters. Configure the name of the `GET` parameter using the following key: + +```yml + config: + signing_key: ... + jwt_url_parameter: "parameter_name" + subject_key: ... + roles_key: ... +``` + +As with HTTP basic authentication, you should use HTTPS instead of HTTP.
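The two transport options described above, the bearer header and the URL parameter, can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library and is not taken from the security plugin: the host `localhost:9200`, the parameter name `jwt_token`, the claim values, and the secret are all placeholder assumptions, and the HS256 signing simply mirrors the pseudocode in the Signature section.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import urlencode


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    encoded = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, encoded.encode("ascii"), hashlib.sha256).digest()
    return f"{encoded}.{b64url(signature)}"


# Placeholder claims and secret, for illustration only.
token = sign_hs256(
    {"alg": "HS256", "typ": "JWT"},
    {"sub": "jdoe", "roles": "admin,devops", "exp": 1300819380},
    b"secretkey",
)

# Option 1: bearer header (matches jwt_header: "Authorization").
headers = {"Authorization": f"Bearer {token}"}

# Option 2: URL parameter (assumes jwt_url_parameter: "jwt_token", a made-up name).
url = "https://localhost:9200/_search?" + urlencode({"jwt_token": token})
```

Either `headers` or `url` could then be handed to an HTTP client of your choice. In both cases the token travels in clear text unless the connection uses HTTPS.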
+ + +### Validated registered claims + +The following registered claims are validated automatically: + +* "iat" (Issued At) Claim +* "nbf" (Not Before) Claim +* "exp" (Expiration Time) Claim + + +### Supported formats and algorithms + +The security plugin supports digitally signed, compact JSON web tokens with all standard algorithms: + +``` +HS256: HMAC using SHA-256 +HS384: HMAC using SHA-384 +HS512: HMAC using SHA-512 +RS256: RSASSA-PKCS-v1_5 using SHA-256 +RS384: RSASSA-PKCS-v1_5 using SHA-384 +RS512: RSASSA-PKCS-v1_5 using SHA-512 +PS256: RSASSA-PSS using SHA-256 and MGF1 with SHA-256 +PS384: RSASSA-PSS using SHA-384 and MGF1 with SHA-384 +PS512: RSASSA-PSS using SHA-512 and MGF1 with SHA-512 +ES256: ECDSA using P-256 and SHA-256 +ES384: ECDSA using P-384 and SHA-384 +ES512: ECDSA using P-521 and SHA-512 +``` diff --git a/docs/security/configuration/disable.md b/docs/security/configuration/disable.md new file mode 100755 index 00000000..8d2950f2 --- /dev/null +++ b/docs/security/configuration/disable.md @@ -0,0 +1,69 @@ +--- +layout: default +title: Disable Security +parent: Configuration +grand_parent: Security +nav_order: 99 +--- + +# Disable security + +You might want to temporarily disable the security plugin to make testing or internal usage more straightforward. To disable the plugin, add the following line in `opensearch.yml`: + +```yml +opensearch_security.disabled: true +``` + +A more permanent option is to remove the security plugin entirely. Delete the `plugins/opensearch_security` folder on all nodes, and delete the `opensearch_security` configuration entries from `opensearch.yml`. + +To perform these steps on the Docker image, see [Customize the Docker image](../../../install/docker/#customize-the-docker-image). + +Disabling or removing the plugin exposes the configuration index for the security plugin. If the index contains sensitive information, be sure to protect it through some other means. If you no longer need the index, delete it. 
+{: .warning } + + +## Remove OpenSearch Dashboards plugin + +The security plugin is actually two plugins: one for OpenSearch and one for OpenSearch Dashboards. You can use the OpenSearch plugin independently, but the OpenSearch Dashboards plugin depends on a secured OpenSearch cluster. + +If you disable the security plugin in `opensearch.yml` (or delete the plugin entirely) and still want to use OpenSearch Dashboards, you must remove the corresponding OpenSearch Dashboards plugin. For more information, see [Standalone OpenSearch Dashboards plugin install](../../../opensearch-dashboards/plugins/). + + +### RPM or DEB + +1. Remove all `opensearch_security` lines from `opensearch_dashboards.yml`. +1. Change `opensearch.url` in `opensearch_dashboards.yml` to `http://` rather than `https://`. +1. Enter `sudo /usr/share/opensearch-dashboards/bin/opensearch-dashboards-plugin remove securityDashboards`. +1. Enter `sudo systemctl restart opensearch-dashboards.service`. + + +### Docker + +1. Create a new `Dockerfile`: + + ``` + FROM opensearch/opensearch-dashboards:{{site.opensearch_version}} + RUN /usr/share/opensearch-dashboards/bin/opensearch-dashboards-plugin remove securityDashboards + COPY --chown=opensearch-dashboards:opensearch-dashboards opensearch_dashboards.yml /usr/share/opensearch-dashboards/config/ + ``` + + In this case, `opensearch_dashboards.yml` is a "vanilla" version of the file with no security plugin entries. It might look like this: + + ```yml + --- + server.name: opensearch-dashboards + server.host: "0" + opensearch.hosts: http://localhost:9200 + ``` + + +1. To build the new Docker image, run the following command: + + ```bash + docker build --tag=opensearch-dashboards-no-security . + ``` + +1. In `docker-compose.yml`, change `opensearch/opensearch-dashboards:{{site.opensearch_version}}` to `opensearch-dashboards-no-security`. +1.
Change `OPENSEARCH_URL` (`docker-compose.yml`) or `opensearch.url` (your custom `opensearch_dashboards.yml`) to `http://` rather than `https://`. +1. Change `OPENSEARCH_HOSTS` or `opensearch.hosts` to `http://` rather than `https://`. +1. Enter `docker-compose up`. diff --git a/docs/security/configuration/generate-certificates.md b/docs/security/configuration/generate-certificates.md new file mode 100755 index 00000000..f1b4719f --- /dev/null +++ b/docs/security/configuration/generate-certificates.md @@ -0,0 +1,205 @@ +--- +layout: default +title: Generate Certificates +parent: Configuration +grand_parent: Security +nav_order: 11 +--- + +# Generate certificates + +If you don't have access to a certificate authority (CA) for your organization and want to use OpenSearch for non-demo purposes, you can generate your own self-signed certificates using [OpenSSL](https://www.openssl.org/){:target='\_blank'}. + +You can probably find OpenSSL in the package manager for your operating system. + +On CentOS, use Yum: + +```bash +sudo yum install openssl +``` + +On macOS, use [Homebrew](https://brew.sh/){:target='\_blank'}: + +```bash +brew install openssl +``` + + +## Generate a private key + +The first step in this process is to generate a private key using the `genrsa` command. As the name suggests, you should keep this file private. + +Private keys must be of sufficient length to be secure, so specify `2048`: + +```bash +openssl genrsa -out root-ca-key.pem 2048 +``` + +You can optionally add the `-aes256` option to encrypt the key using the AES-256 standard. This option requires a password. + + +## Generate a root certificate + +Next, use the key to generate a self-signed certificate for the root CA: + +```bash +openssl req -new -x509 -sha256 -key root-ca-key.pem -out root-ca.pem -days 30 +``` + +Change `-days 30` to 3650 (10 years) or some other number to set a non-default expiration date. The default value of 30 days is best for testing purposes. 
+ +- The `-x509` option specifies that you want a self-signed certificate rather than a certificate request. +- The `-sha256` option sets the hash algorithm to SHA-256. SHA-256 is the default in later versions of OpenSSL, but earlier versions might use SHA-1. + +Follow the prompts to specify details for your organization. Together, these details form the distinguished name (DN) of your CA. + + +## Generate an admin certificate + +To generate an admin certificate, first create a new key: + +```bash +openssl genrsa -out admin-key-temp.pem 2048 +``` + +Then convert that key to PKCS#8 format for use in Java using a PKCS#12-compatible algorithm (3DES): + +```bash +openssl pkcs8 -inform PEM -outform PEM -in admin-key-temp.pem -topk8 -nocrypt -v1 PBE-SHA1-3DES -out admin-key.pem +``` + +Next, create a certificate signing request (CSR). This file acts as an application to a CA for a signed certificate: + +```bash +openssl req -new -key admin-key.pem -out admin.csr +``` + +Follow the prompts to fill in the details. You don't need to specify a challenge password. As noted in the [OpenSSL Cookbook](https://www.feistyduck.com/books/openssl-cookbook/){:target='\_blank'}, "Having a challenge password does not increase the security of the CSR in any way." + +Finally, generate the certificate itself: + +```bash +openssl x509 -req -in admin.csr -CA root-ca.pem -CAkey root-ca-key.pem -CAcreateserial -sha256 -out admin.pem -days 30 +``` + +Just like the root certificate, use the `-days` option to specify an expiration date of longer than 30 days. + + +## (Optional) Generate node and client certificates + +Follow the steps in [Generate an admin certificate](#generate-an-admin-certificate) with new file names to generate a new certificate for each node and as many client certificates as you need. Each certificate should use its own private key. 
+ +If you generate node certificates and have `opensearch_security.ssl.transport.enforce_hostname_verification` set to `true` (default), be sure to specify a common name (CN) for the certificate that matches the hostname of the intended node. If you want to use the same node certificate on all nodes (not recommended), set the hostname verification to `false`. For more information, see [Configure TLS certificates](../tls/#advanced-hostname-verification-and-dns-lookup). + + +### Sample script + +```bash +# Root CA +openssl genrsa -out root-ca-key.pem 2048 +openssl req -new -x509 -sha256 -key root-ca-key.pem -out root-ca.pem -days 30 +# Admin cert +openssl genrsa -out admin-key-temp.pem 2048 +openssl pkcs8 -inform PEM -outform PEM -in admin-key-temp.pem -topk8 -nocrypt -v1 PBE-SHA1-3DES -out admin-key.pem +openssl req -new -key admin-key.pem -out admin.csr +openssl x509 -req -in admin.csr -CA root-ca.pem -CAkey root-ca-key.pem -CAcreateserial -sha256 -out admin.pem -days 30 +# Node cert +openssl genrsa -out node-key-temp.pem 2048 +openssl pkcs8 -inform PEM -outform PEM -in node-key-temp.pem -topk8 -nocrypt -v1 PBE-SHA1-3DES -out node-key.pem +openssl req -new -key node-key.pem -out node.csr +openssl x509 -req -in node.csr -CA root-ca.pem -CAkey root-ca-key.pem -CAcreateserial -sha256 -out node.pem -days 30 +#Client cert +openssl genrsa -out client-key-temp.pem 2048 +openssl pkcs8 -inform PEM -outform PEM -in client-key-temp.pem -topk8 -nocrypt -v1 PBE-SHA1-3DES -out client-key.pem +openssl req -new -key client-key.pem -out client.csr +openssl x509 -req -in client.csr -CA root-ca.pem -CAkey root-ca-key.pem -CAcreateserial -sha256 -out client.pem -days 30 +# Cleanup +rm admin-key-temp.pem +rm admin.csr +rm node-key-temp.pem +rm node.csr +rm client-key-temp.pem +rm client.csr +``` + +If you already know the certificate details and don't want to specify them as the script runs, use the `-subj` option in your `root-ca.pem` and CSR commands: + +```bash +openssl req -new 
-key node-key.pem -subj "/C=CA/ST=ONTARIO/L=TORONTO/O=ORG/OU=UNIT/CN=node1.example.com" -out node.csr +``` + + +## Get distinguished names + +If you created admin and node certificates, you must specify their distinguished names (DNs) in `opensearch.yml` on all nodes: + +```yml +opensearch_security.authcz.admin_dn: + - 'CN=ADMIN,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA' +opensearch_security.nodes_dn: + - 'CN=node1.example.com,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA' + - 'CN=node2.example.com,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA' +``` + +But if you look at the `subject` of the certificate after creating it, you might see different formatting: + +``` +subject=/C=CA/ST=ONTARIO/L=TORONTO/O=ORG/OU=UNIT/CN=node1.example.com +``` + +If you compare this string to the ones in `opensearch.yml` above, you can see that you need to invert the order of elements and use commas rather than slashes. Enter this command to get the correct string: + +```bash +openssl x509 -subject -nameopt RFC2253 -noout -in node.pem +``` + +Then you can copy and paste the output into `opensearch.yml`: + +``` +subject= CN=node1.example.com,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA +``` + + +## Configure certificates + +This process generates many files, but these are the ones you need to add to your cluster configuration: + +- `root-ca.pem` +- `admin.pem` +- `admin-key.pem` +- (Optional) `each-node-cert.pem` +- (Optional) `each-node-key.pem` + +For information about adding and configuring these certificates, see [Docker security configuration](../../../install/docker-security/) and [Configure TLS certificates](../tls/). 
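The DN reordering described under "Get distinguished names" (invert the element order, replace slashes with commas) can also be sketched in Python. The helper below is hypothetical and assumes a simple slash-separated subject whose values contain no escaped slashes or commas; `openssl x509 -nameopt RFC2253` remains the more reliable way to obtain the string.

```python
def subject_to_rfc2253(subject: str) -> str:
    """Convert an OpenSSL one-line subject to the reversed, comma-separated form."""
    # Drop an optional "subject=" prefix and surrounding whitespace.
    value = subject.strip()
    if value.startswith("subject="):
        value = value[len("subject="):].strip()
    # Split on slashes, ignoring the empty piece before the leading slash.
    parts = [part for part in value.split("/") if part]
    # The target format lists the most specific element (CN) first.
    return ",".join(reversed(parts))


print(subject_to_rfc2253(
    "subject=/C=CA/ST=ONTARIO/L=TORONTO/O=ORG/OU=UNIT/CN=node1.example.com"
))
# CN=node1.example.com,OU=UNIT,O=ORG,L=TORONTO,ST=ONTARIO,C=CA
```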
+ + +## Run securityadmin.sh + +After configuring your certificates and starting OpenSearch, run `securityadmin.sh` to initialize the security plugin: + +``` +./securityadmin.sh -cd ../securityconfig/ -icl -nhnv -cacert ../../../config/root-ca.pem -cert ../../../config/admin.pem -key ../../../config/admin-key.pem +``` + +For more information about what this command does, see [Apply configuration changes](../security-admin/). +{: .tip } + +If you use Docker, see [Bash access to containers](../../../install/docker/#bash-access-to-containers). + + +## OpenSearch Dashboards + +Depending on your settings in `opensearch_dashboards.yml`, you might need to add `root-ca.pem` to your OpenSearch Dashboards node. You have two options: disable SSL verification or add the root CA. + +- Disable SSL verification: + + ```yml + opensearch.ssl.verificationMode: none + ``` + +- Add the root CA: + + ```yml + opensearch.ssl.certificateAuthorities: ["/usr/share/opensearch-dashboards/config/root-ca.pem"] + opensearch.ssl.verificationMode: full + ``` diff --git a/docs/security/configuration/index.md b/docs/security/configuration/index.md new file mode 100644 index 00000000..2389696e --- /dev/null +++ b/docs/security/configuration/index.md @@ -0,0 +1,22 @@ +--- +layout: default +title: Configuration +nav_order: 1 +parent: Security +has_children: true +has_toc: false +--- + +# Security configuration + +The plugin includes demo certificates so that you can get up and running quickly, but before using OpenSearch in a production environment, you must configure it manually: + +1. [Replace the demo certificates](../../install/docker-security/) +1. [Reconfigure opensearch.yml to use your certificates](tls/) +1. [Reconfigure config.yml to use your authentication backend](configuration/) (if you don't plan to use the internal user database) +1. [Modify the configuration YAML files](yaml/) +1. [Apply changes using securityadmin.sh](security-admin/) +1. Start OpenSearch. +1. 
[Add users, roles, role mappings, and tenants](../access-control/) + +If you don't want to use the plugin, see [Disable security](disable/). diff --git a/docs/security/configuration/ldap.md b/docs/security/configuration/ldap.md new file mode 100755 index 00000000..3e33c02b --- /dev/null +++ b/docs/security/configuration/ldap.md @@ -0,0 +1,542 @@ +--- +layout: default +title: Active Directory and LDAP +parent: Configuration +grand_parent: Security +nav_order: 30 +--- + +# Active Directory and LDAP + +Active Directory and LDAP can be used for both authentication and authorization (the `authc` and `authz` sections of the configuration, respectively). Authentication checks whether the user has entered valid credentials. Authorization retrieves any backend roles for the user. + +In most cases, you want to configure both authentication and authorization. You can also use authentication only and map the users retrieved from LDAP directly to security plugin roles. + +{% comment %} + +## Docker example + +We provide a fully functional example that can help you understand how to use an LDAP server for both authentication and authorization. + +1. Download and unzip [the example ZIP file]({{site.url}}{{site.baseurl}}/assets/examples/ldap-example.zip). +1. At the command line, run `docker-compose up`. +1. Review the files: + + * `docker-compose.yml` defines a single OpenSearch node, an LDAP server, and a PHP administration tool for the LDAP server. + + You can access the administration tool at https://localhost:6443. Acknowledge the security warning and log in using `cn=admin,dc=example,dc=org` and `changethis`. + + * `directory.ldif` seeds the LDAP server with three users and two groups. + + `psantos` is in the `Administrator` and `Developers` groups. `jroe` and `jdoe` are in the `Developers` group. The security plugin loads these groups as backend roles. 
+ + * `roles_mapping.yml` maps the `Administrator` and `Developers` LDAP groups (as backend roles) to security roles so that users gain the appropriate permissions after authenticating. + + * `internal_users.yml` removes all default users except `administrator` and `opensearch-dashboardsserver`. + + * `config.yml` includes all necessary LDAP settings. + +1. Index a document as `psantos`: + + ```bash + curl -XPUT https://localhost:9200/new-index/_doc/1 -H 'Content-Type: application/json' -d '{"title": "Spirited Away"}' -u psantos:password -k + ``` + + If you try the same request as `jroe`, it fails. The `Developers` group is mapped to the `readall`, `manage_snapshots`, and `opensearch_dashboards_user` roles and has no write permissions. + +1. Search for the document as `jroe`: + + ```bash + curl -XGET https://localhost:9200/new-index/_search?pretty -u jroe:password -k + ``` + + This request succeeds, because the `Developers` group is mapped to the `readall` role. + +1. If you want to examine the contents of the various containers, run `docker ps` to find the container ID and then `docker exec -it /bin/bash`. + +{% endcomment %} + +## Connection settings + +To enable LDAP authentication and authorization, add the following lines to `plugins/opensearch_security/securityconfig/config.yml`: + +```yml +authc: + ldap: + http_enabled: true + transport_enabled: true + order: 1 + http_authenticator: + type: basic + challenge: false + authentication_backend: + type: ldap + config: + ... +``` + +```yml +authz: + ldap: + http_enabled: true + transport_enabled: true + authorization_backend: + type: ldap + config: + ... +``` + +The connection settings are identical for authentication and authorization and are added to the `config` sections. 
+ + +### Hostname and port + +To configure the hostname and port of your Active Directory servers, use the following: + +```yml +config: + hosts: + - primary.ldap.example.com:389 + - secondary.ldap.example.com:389 +``` + +You can configure more than one server here. If the security plugin cannot connect to the first server, it tries to connect to the remaining servers sequentially. + + +### Timeouts + +To configure connection and response timeouts to your Active Directory server, use the following (values are in milliseconds): + +```yml +config: + connect_timeout: 5000 + response_timeout: 0 +``` + +If your server supports two-factor authentication (2FA), the default timeout settings might result in login errors. You can increase `connect_timeout` to accommodate the 2FA process. Setting `response_timeout` to 0 (the default) indicates an indefinite waiting period. + + +### Bind DN and password + +To configure the `bind_dn` and `password` that the security plugin uses when issuing queries to your server, use the following: + +```yml +config: + bind_dn: cn=admin,dc=example,dc=com + password: password +``` + +If your server supports anonymous authentication, both `bind_dn` and `password` can be set to `null`. + + +### TLS settings + +Use the following parameters to configure TLS for connecting to your server: + +```yml +config: + enable_ssl: + enable_start_tls: + enable_ssl_client_auth: + verify_hostnames: +``` + +Name | Description +:--- | :--- +`enable_ssl` | Whether to use LDAP over SSL (LDAPS). +`enable_start_tls` | Whether to use STARTTLS. Can't be used in combination with LDAPS. +`enable_ssl_client_auth` | Whether to send the client certificate to the LDAP server. +`verify_hostnames` | Whether to verify the hostnames of the server's TLS certificate. 
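The constraint in the table above, that STARTTLS can't be combined with LDAPS, is easy to check before you apply a configuration. The following Python sketch is a hypothetical pre-flight check and not part of the security plugin; it assumes the `config` section has already been loaded into a dictionary (for example, by a YAML parser of your choice).

```python
TLS_FLAGS = (
    "enable_ssl",
    "enable_start_tls",
    "enable_ssl_client_auth",
    "verify_hostnames",
)


def check_ldap_tls(config: dict) -> list:
    """Return a list of human-readable problems found in the TLS settings."""
    problems = []
    # Each flag, if present, is expected to be a boolean.
    for name in TLS_FLAGS:
        value = config.get(name)
        if value is not None and not isinstance(value, bool):
            problems.append(f"{name} must be true or false, got {value!r}")
    # LDAPS and STARTTLS are mutually exclusive.
    if config.get("enable_ssl") is True and config.get("enable_start_tls") is True:
        problems.append("enable_start_tls cannot be combined with enable_ssl (LDAPS)")
    return problems


print(check_ldap_tls({"enable_ssl": True, "enable_start_tls": True}))
```

An empty list means the check found nothing to complain about; it does not prove that the LDAP server accepts the settings.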
+ + +### Certificate validation + +By default, the security plugin validates the TLS certificate of the LDAP servers against the root CA configured in `opensearch.yml`, either as a PEM certificate or a truststore: + +``` +opensearch_security.ssl.transport.pemtrustedcas_filepath: ... +opensearch_security.ssl.http.truststore_filepath: ... +``` + +If your server uses a certificate signed by a different CA, import this CA into your truststore or add it to your trusted CA file on each node. + +You can also use a separate root CA in PEM format by setting one of the following configuration options: + +```yml +config: + pemtrustedcas_filepath: /full/path/to/trusted_cas.pem +``` + +```yml +config: + pemtrustedcas_content: |- + MIID/jCCAuagAwIBAgIBATANBgkqhkiG9w0BAQUFADCBjzETMBEGCgmSJomT8ixk + ARkWA2NvbTEXMBUGCgmSJomT8ixkARkWB2V4YW1wbGUxGTAXBgNVBAoMEEV4YW1w + bGUgQ29tIEluYy4xITAfBgNVBAsMGEV4YW1wbGUgQ29tIEluYy4gUm9vdCBDQTEh + ... +``` + + +Name | Description +:--- | :--- +`pemtrustedcas_filepath` | Absolute path to the PEM file containing the root CAs of your Active Directory/LDAP server. +`pemtrustedcas_content` | The root CA content of your Active Directory/LDAP server. Cannot be used when `pemtrustedcas_filepath` is set. + + +### Client authentication + +If you use TLS client authentication, the security plugin sends the PEM certificate of the node, as configured in `opensearch.yml`. Set one of the following configuration options: + +```yml +config: + pemkey_filepath: /full/path/to/private.key.pem + pemkey_password: private_key_password + pemcert_filepath: /full/path/to/certificate.pem +``` + +or + +```yml +config: + pemkey_content: |- + MIID2jCCAsKgAwIBAgIBBTANBgkqhkiG9w0BAQUFADCBlTETMBEGCgmSJomT8ixk + ARkWA2NvbTEXMBUGCgmSJomT8ixkARkWB2V4YW1wbGUxGTAXBgNVBAoMEEV4YW1w + bGUgQ29tIEluYy4xJDAiBgNVBAsMG0V4YW1wbGUgQ29tIEluYy4gU2lnbmluZyBD + ... 
+ pemkey_password: private_key_password + pemcert_content: |- + MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCHRZwzwGlP2FvL + oEzNeDu2XnOF+ram7rWPT6fxI+JJr3SDz1mSzixTeHq82P5A7RLdMULfQFMfQPfr + WXgB4qfisuDSt+CPocZRfUqqhGlMG2l8LgJMr58tn0AHvauvNTeiGlyXy0ShxHbD + ... +``` + +Name | Description +:--- | :--- +`pemkey_filepath` | Absolute path to the file containing the private key of your certificate. +`pemkey_content` | The content of the private key of your certificate. Cannot be used when `pemkey_filepath` is set. +`pemkey_password` | The password of your private key, if any. +`pemcert_filepath` | Absolute path to the client certificate. +`pemcert_content` | The content of the client certificate. Cannot be used when `pemcert_filepath` is set. + + +### Enabled ciphers and protocols + +You can limit the allowed ciphers and TLS protocols for the LDAP connection. For example, you can allow only strong ciphers and limit the TLS versions to the most recent ones: + +```yml +ldap: + http_enabled: true + transport_enabled: true + ... + authentication_backend: + type: ldap + config: + enabled_ssl_ciphers: + - "TLS_DHE_RSA_WITH_AES_256_CBC_SHA" + - "TLS_DHE_DSS_WITH_AES_128_CBC_SHA256" + enabled_ssl_protocols: + - "TLSv1.1" + - "TLSv1.2" +``` + +Name | Description +:--- | :--- +`enabled_ssl_ciphers` | Array, enabled TLS ciphers. Only the Java format is supported. +`enabled_ssl_protocols` | Array, enabled TLS protocols. Only the Java format is supported. + + +--- + +## Use Active Directory and LDAP for authentication + +To use Active Directory/LDAP for authentication, first configure a respective authentication domain in the `authc` section of `plugins/opensearch_security/securityconfig/config.yml`: + +```yml +authc: + ldap: + http_enabled: true + transport_enabled: true + order: 1 + http_authenticator: + type: basic + challenge: true + authentication_backend: + type: ldap + config: + ... 
+``` + +Next, add the [connection settings](#connection-settings) for your Active Directory/LDAP server to the config section of the authentication domain: + +```yml +config: + enable_ssl: true + enable_start_tls: false + enable_ssl_client_auth: false + verify_hostnames: true + hosts: + - ldap.example.com:8389 + bind_dn: cn=admin,dc=example,dc=com + password: passw0rd +``` + +Authentication works by issuing an LDAP query containing the user name against the user subtree of the LDAP tree. + +The security plugin first takes the configured LDAP query and replaces the placeholder `{0}` with the user name from the user's credentials. + +```yml +usersearch: '(sAMAccountName={0})' +``` + +Then it issues this query against the user subtree. Currently, the entire subtree under the configured `userbase` is searched: + +```yml +userbase: 'ou=people,dc=example,dc=com' +``` + +If the query is successful, the security plugin retrieves the user name from the LDAP entry. You can specify which attribute from the LDAP entry the security plugin should use as the user name: + +```yml +username_attribute: uid +``` + +If this key is not set or null, then the distinguished name (DN) of the LDAP entry is used. + + +### Configuration summary + +Name | Description +:--- | :--- +`userbase` | Specifies the subtree in the directory where user information is stored. +`usersearch` | The actual LDAP query that the security plugin executes when trying to authenticate a user. The variable {0} is substituted with the user name. +`username_attribute` | The security plugin uses this attribute of the directory entry to look for the user name. If set to null, the DN is used (default). 
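To make the lookup concrete, the following Python sketch builds the search the way the steps above describe: it substitutes the user name into `usersearch` and pairs the result with `userbase`. The function names are hypothetical, and escaping special characters according to RFC 4515 is an assumption added here as a precaution; this document does not state how the security plugin handles such characters.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_query(usersearch: str, userbase: str, username: str) -> tuple:
    """Return (search base, filter) for looking up a single user."""
    return userbase, usersearch.replace("{0}", escape_filter_value(username))


base, ldap_filter = build_user_query(
    "(sAMAccountName={0})", "ou=people,dc=example,dc=com", "psantos"
)
print(base)         # ou=people,dc=example,dc=com
print(ldap_filter)  # (sAMAccountName=psantos)
```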
+ + +### Complete authentication example + +```yml +ldap: + http_enabled: true + transport_enabled: true + order: 1 + http_authenticator: + type: basic + challenge: true + authentication_backend: + type: ldap + config: + enable_ssl: true + enable_start_tls: false + enable_ssl_client_auth: false + verify_hostnames: true + hosts: + - ldap.example.com:636 + bind_dn: cn=admin,dc=example,dc=com + password: password + userbase: 'ou=people,dc=example,dc=com' + usersearch: '(sAMAccountName={0})' + username_attribute: uid +``` + + +--- + +## Use Active Directory and LDAP for authorization + +To use Active Directory/LDAP for authorization, first configure a respective authorization domain in the `authz` section of `config.yml`: + +```yml +authz: + ldap: + http_enabled: true + transport_enabled: true + authorization_backend: + type: ldap + config: + ... +``` + +Authorization is the process of retrieving backend roles for an authenticated user from an LDAP server. This is typically the same server that you use for authentication, but you can also use a different server. The only requirement is that the user whose roles you want to fetch actually exists on the LDAP server. + +Because the security plugin always checks if a user exists in the LDAP server, you must also configure `userbase`, `usersearch`, and `username_attribute` in the `authz` section. + +Authorization works similarly to authentication. The security plugin issues an LDAP query containing the user name against the role subtree of the LDAP tree. + +As an alternative, the security plugin can also fetch roles that are defined as a direct attribute of the user entry in the user subtree. + + +### Approach 1: Query the role subtree + +The security plugin first takes the LDAP query for fetching roles ("rolesearch") and substitutes any variables found in the query.
For example, for a standard Active Directory installation, you would use the following role search: + +```yml +rolesearch: '(member={0})' +``` + +You can use the following variables: + +- `{0}` is substituted with the DN of the user. +- `{1}` is substituted with the user name, as defined by the `username_attribute` setting. +- `{2}` is substituted with an arbitrary attribute value from the authenticated user's directory entry. + +The variable `{2}` refers to an attribute from the user's directory entry. The attribute that you should use is specified by the `userroleattribute` setting: + +```yml +userroleattribute: myattribute +``` + +The security plugin then issues the substituted query against the configured role subtree. The entire subtree under `rolebase` is searched: + +```yml +rolebase: 'ou=groups,dc=example,dc=com' +``` + +If you use nested roles (roles that are members of other roles), you can configure the security plugin to resolve them: + +```yml +resolve_nested_roles: false +``` + +After all roles have been fetched, the security plugin extracts the final role names from a configurable attribute of the role entries: + +```yml +rolename: cn +``` + +If this is not set, the DN of the role entry is used. You can now use this role name for mapping it to one or more of the security plugin roles, as defined in `roles_mapping.yml`. + + +### Approach 2: Use a user's attribute as the role name + +If you store the roles as a direct attribute of the user entries in the user subtree, you need to configure only the attribute name: + +```yml +userrolename: roles +``` + +You can configure multiple attribute names: + +```yml +userrolename: roles, otherroles +``` + +This approach can be combined with querying the role subtree. The security plugin fetches the roles from the user's role attribute and then executes the role search. 
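The two approaches can be sketched together as follows. This is a simplified Python illustration under stated assumptions, not the plugin's code: the function names are hypothetical, user entries are modeled as dictionaries whose values are lists, and the LDAP search itself is left out.

```python
import re

# Illustrative only: hypothetical helpers, not the security plugin's implementation.

def build_role_query(rolesearch, user_dn, user_name, role_attribute_value=""):
    """Substitute {0} (user DN), {1} (user name), and {2} (the value of the
    attribute named by `userroleattribute`) in a single pass."""
    values = {"0": user_dn, "1": user_name, "2": role_attribute_value}
    return re.sub(r"\{([012])\}", lambda match: values[match.group(1)], rolesearch)


def roles_from_user_entry(attributes, userrolename):
    """Collect roles stored directly on the user entry (Approach 2).
    `userrolename` may name several attributes, separated by commas."""
    roles = []
    for name in (part.strip() for part in userrolename.split(",")):
        roles.extend(attributes.get(name, []))
    return roles
```

When both approaches are combined, the roles returned by `roles_from_user_entry` would be collected first, and the query produced by `build_role_query` would then be issued against the subtree configured in `rolebase`.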
+ +If you don't use or have a role subtree, you can disable the role search completely: + +```yml +rolesearch_enabled: false +``` + + +### (Advanced) Control LDAP user attributes + +By default, the security plugin reads all LDAP user attributes and makes them available for index name variable substitution and DLS query variable substitution. If your LDAP entries have a lot of attributes, you might want to control which attributes should be made available. The fewer the attributes, the better the performance. + +Name | Description +:--- | :--- +`custom_attr_whitelist` | String array. Specifies the LDAP attributes that should be made available for variable substitution. +`custom_attr_maxval_len` | Integer. Specifies the maximum allowed length of each attribute. All attributes longer than this value are discarded. A value of `0` disables custom attributes altogether. Default is 36. + +Example: + +```yml +authz: + ldap: + http_enabled: true + transport_enabled: true + authorization_backend: + type: ldap + config: + custom_attr_whitelist: + - attribute1 + - attribute2 + custom_attr_maxval_len: 36 + ... +``` + + +### (Advanced) Exclude certain users from role lookup + +If you are using multiple authentication methods, it can make sense to exclude certain users from the LDAP role lookup. + +Consider the following scenario for a typical OpenSearch Dashboards setup: All OpenSearch Dashboards users are stored in an LDAP/Active Directory server. + +However, you also have an OpenSearch Dashboards server user. OpenSearch Dashboards uses this user to manage stored objects and perform monitoring and maintenance tasks. You do not want to add this user to your Active Directory installation, but rather store it in the security plugin internal user database. + +In this case, it makes sense to exclude the OpenSearch Dashboards server user from the LDAP authorization because we already know that there is no corresponding entry. 
You can use the `skip_users` configuration setting to define which users should be skipped. Wildcards and regular expressions are supported: + +```yml +skip_users: + - opensearch-dashboardsserver + - 'cn=Jane Doe,ou*people,o=TEST' + - '/\S*/' +``` + + +### (Advanced) Exclude roles from nested role lookups + +If the users in your LDAP installation have a large number of roles, and you have the requirement to resolve nested roles as well, you might run into performance issues. + +In most cases, however, not all user roles are related to OpenSearch and OpenSearch Dashboards. You might need only a couple of roles. In this case, you can use the nested role filter feature to define a list of roles that are filtered out from the list of the user's roles. Wildcards and regular expressions are supported. + +This has an effect only if `resolve_nested_roles` is `true`: + +```yml +nested_role_filter: + - 'cn=Jane Doe,ou*people,o=TEST' + - ... +``` + + +### Configuration summary + +Name | Description +:--- | :--- +`rolebase` | Specifies the subtree in the directory where role/group information is stored. +`rolesearch` | The actual LDAP query that the security plugin executes when trying to determine the roles of a user. You can use three variables here (see below). +`userroleattribute` | The attribute in a user entry to use for `{2}` variable substitution. +`userrolename` | If the roles/groups of a user are not stored in the groups subtree, but as an attribute of the user's directory entry, define this attribute name here. +`rolename` | The attribute of the role entry that should be used as the role name. +`resolve_nested_roles` | Boolean. Whether or not to resolve nested roles. Default is `false`. +`max_nested_depth` | Integer. When `resolve_nested_roles` is `true`, this defines the maximum number of nested roles to traverse. 
Setting smaller values can reduce the amount of data retrieved from LDAP and improve authentication times at the cost of failing to discover deeply nested roles. Default is `30`. +`skip_users` | Array of users that should be skipped when retrieving roles. Wildcards and regular expressions are supported. +`nested_role_filter` | Array of role DNs that should be filtered before resolving nested roles. Wildcards and regular expressions are supported. +`rolesearch_enabled` | Boolean. Enable or disable the role search. Default is `true`. +`custom_attr_whitelist` | String array. Specifies the LDAP attributes that should be made available for variable substitution. +`custom_attr_maxval_len` | Integer. Specifies the maximum allowed length of each attribute. All attributes longer than this value are discarded. A value of `0` disables custom attributes altogether. Default is 36. + + +### Complete authorization example + +```yml +authz: + ldap: + http_enabled: true + transport_enabled: true + authorization_backend: + type: ldap + config: + enable_ssl: true + enable_start_tls: false + enable_ssl_client_auth: false + verify_hostnames: true + hosts: + - ldap.example.com:636 + bind_dn: cn=admin,dc=example,dc=com + password: password + userbase: 'ou=people,dc=example,dc=com' + usersearch: '(uid={0})' + username_attribute: uid + rolebase: 'ou=groups,dc=example,dc=com' + rolesearch: '(member={0})' + userroleattribute: null + userrolename: none + rolename: cn + resolve_nested_roles: true + skip_users: + - opensearch-dashboardsserver + - 'cn=Jane Doe,ou*people,o=TEST' + - '/\S*/' +``` diff --git a/docs/security/configuration/openid-connect.md b/docs/security/configuration/openid-connect.md new file mode 100755 index 00000000..e8727a99 --- /dev/null +++ b/docs/security/configuration/openid-connect.md @@ -0,0 +1,337 @@ +--- +layout: default +title: OpenID Connect +parent: Configuration +grand_parent: Security +nav_order: 32 +--- + +# OpenID Connect + +The security plugin can integrate 
with identity providers that use the OpenID Connect standard. This feature enables the following: + +* Automatic configuration + + Point the security plugin to the metadata of your identity provider (IdP), and the security plugin uses that data for configuration. + +* Automatic key fetching + + The security plugin automatically retrieves the public key for validating the JSON web tokens (JWTs) from the JSON web key set (JWKS) endpoint of your IdP. You don't have to configure keys or shared secrets in `config.yml`. + +* Key rollover + + You can change the keys used for signing the JWTs directly in your IdP. If the security plugin detects an unknown key, it tries to retrieve it from the IdP. This rollover is transparent to the user. + +* OpenSearch Dashboards single sign-on + + +## Configure OpenID Connect integration + +To integrate with an OpenID IdP, set up an authentication domain and choose `openid` as the HTTP authentication type. JSON web tokens already contain all required information to verify the request, so set `challenge` to `false` and `authentication_backend` to `noop`. + +This is the minimal configuration: + +```yml +openid_auth_domain: + http_enabled: true + transport_enabled: true + order: 0 + http_authenticator: + type: openid + challenge: false + config: + subject_key: preferred_username + roles_key: roles + openid_connect_url: https://keycloak.example.com:8080/auth/realms/master/.well-known/openid-configuration + authentication_backend: + type: noop +``` + +The following table shows the configuration parameters. + +Name | Description +:--- | :--- +`openid_connect_url` | The URL of your IdP where the security plugin can find the OpenID Connect metadata/configuration settings. This URL differs between IdPs. Required. +`jwt_header` | The HTTP header that stores the token. Typically the `Authorization` header with the `Bearer` scheme: `Authorization: Bearer `. Optional. Default is `Authorization`.
+`jwt_url_parameter` | If the token is not transmitted in the HTTP header, but as a URL parameter, define the name of the parameter here. Optional. +`subject_key` | The key in the JSON payload that stores the user's name. If not defined, the [subject](https://tools.ietf.org/html/rfc7519#section-4.1.2) registered claim is used. Most IdPs use the `preferred_username` claim. Optional. +`roles_key` | The key in the JSON payload that stores the user's roles. The value of this key must be a comma-separated list of roles. Required only if you want to use roles in the JWT. + + +## OpenID Connect URL + +OpenID Connect specifies various endpoints for integration purposes. The most important endpoint is `well-known`, which lists endpoints and other configuration options for the security plugin. + +The URL differs between IdPs, but usually ends in `/.well-known/openid-configuration`. + +Keycloak example: + +``` +http(s)://:/auth/realms//.well-known/openid-configuration +``` + +The main information that the security plugin needs is `jwks_uri`. This URI specifies where the IdP's public keys in JWKS format can be found.
For example: + +``` +jwks_uri: "https://keycloak.example.com:8080/auth/realms/master/protocol/openid-connect/certs" +``` + +``` +{ + keys:[ + { + kid:"V-diposfUJIk5jDBFi_QRouiVinG5PowskcSWy5EuCo", + kty:"RSA", + alg:"RS256", + use:"sig", + n:"rI8aUrAcI_auAdF10KUopDOmEFa4qlUUaNoTER90XXWADtKne6VsYoD3ZnHGFXvPkRAQLM5d65ScBzWungcbLwZGWtWf5T2NzQj0wDyquMRwwIAsFDFtAZWkXRfXeXrFY0irYUS9rIJDafyMRvBbSz1FwWG7RTQkILkwiC4B8W1KdS5d9EZ8JPhrXvPMvW509g0GhLlkBSbPBeRSUlAS2Kk6nY5i3m6fi1H9CP3Y_X-TzOjOTsxQA_1pdP5uubXPUh5YfJihXcgewO9XXiqGDuQn6wZ3hrF6HTlhNWGcSyQPKh1gEcmXWQlRENZMvYET-BuJEE7eKyM5vRhjNoYR3w", + e:"AQAB" + } + ] +} +``` + +For more information about IdP endpoints, see the following: + +- [Okta](https://developer.okta.com/docs/api/resources/oidc#well-knownopenid-configuration) +- [Keycloak](https://www.keycloak.org/docs/latest/securing_apps/index.html#other-openid-connect-libraries) +- [Auth0](https://auth0.com/docs/protocols/oidc/openid-connect-discovery) +- [Connect2ID](https://connect2id.com/products/server/docs/api/discovery) +- [Salesforce](https://help.salesforce.com/articleView?id=remoteaccess_using_openid_discovery_endpoint.htm&type=5) +- [IBM OpenID Connect](https://www.ibm.com/support/knowledgecenter/en/SSEQTP_8.5.5/com.ibm.websphere.wlp.doc/ae/rwlp_oidc_endpoint_urls.html) + + +## Fetching public keys + +When an IdP generates and signs a JSON web token, it must add the ID of the key to the JWT header. For example: + +``` +{ + "alg": "RS256", + "typ": "JWT", + "kid": "V-diposfUJIk5jDBFi_QRouiVinG5PowskcSWy5EuCo" +} +``` + +As per the [OpenID Connect specification](http://openid.net/specs/openid-connect-messages-1_0-20.html), the `kid` (key ID) is mandatory. Token verification does not work if an IdP fails to add the `kid` field to the JWT. + +If the security plugin receives a JWT with an unknown `kid`, it visits the IdP's `jwks_uri` and retrieves all available, valid keys. These keys are used and cached until a refresh is triggered by retrieving another unknown key ID. 
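The caching behavior described above can be sketched as a small Python class. This is an illustration under stated assumptions rather than the plugin's implementation: `KeyCache` and `fetch_keys` are hypothetical names, and `fetch_keys` stands in for whatever retrieves the key set from the IdP's `jwks_uri` and returns it as a mapping from `kid` to public key.

```python
# Illustrative only: hypothetical class, not the security plugin's implementation.

class KeyCache:
    """Caches the IdP's public keys and refreshes them when an unknown `kid` appears."""

    def __init__(self, fetch_keys):
        # fetch_keys: hypothetical callable returning {kid: public_key} from jwks_uri.
        self._fetch_keys = fetch_keys
        self._keys = {}

    def get(self, kid):
        if kid not in self._keys:
            # Unknown key ID: replace the cached set with all currently valid keys.
            self._keys = dict(self._fetch_keys())
        # Returns None if the IdP does not list this key ID even after the refresh.
        return self._keys.get(kid)
```

Because the whole key set is replaced on refresh, a key that the IdP no longer lists disappears from the cache at the next refresh.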
+ + +## Key rollover and multiple public keys + +The security plugin can maintain multiple valid public keys at once. The OpenID specification does not allow for a validity period of public keys, so a key is valid until it has been removed from the list of valid keys in your IdP and the list of valid keys has been refreshed. + +If you want to roll over a key in your IdP, follow these best practices: + +- Create a new key pair in your IdP, and give the new key a higher priority than the currently used key. + + Your IdP uses this new key over the old key. + +- Upon first appearance of the new `kid` in a JWT, the security plugin refreshes the key list. + + At this point, both the old key and the new key are valid. Tokens signed with the old key are also still valid. + +- The old key can be removed from your IdP when the last JWT signed with this key has timed out. + +If you have to immediately change your public key, you can also delete the old key first and then create a new one. In this case, all JWTs signed with the old key become invalid immediately. + + +## TLS settings + +To prevent man-in-the-middle attacks, you should secure the connection between the security plugin and your IdP with TLS. + + +### Enabling TLS + +Use the following parameters to enable TLS for connecting to your IdP: + +```yml +config: + enable_ssl: + verify_hostnames: +``` + +Name | Description +:--- | :--- +`enable_ssl` | Whether to use TLS. Default is false. +`verify_hostnames` | Whether to verify the hostnames of the IdP's TLS certificate. Default is true. 
+ + +### Certificate validation + +To validate the TLS certificate of your IdP, configure either the path to the IdP's root CA or the root certificate's content: + +```yml +config: + pemtrustedcas_filepath: /path/to/trusted_cas.pem +``` + +```yml +config: + pemtrustedcas_content: |- + MIID/jCCAuagAwIBAgIBATANBgkqhkiG9w0BAQUFADCBjzETMBEGCgmSJomT8ixk + ARkWA2NvbTEXMBUGCgmSJomT8ixkARkWB2V4YW1wbGUxGTAXBgNVBAoMEEV4YW1w + bGUgQ29tIEluYy4xITAfBgNVBAsMGEV4YW1wbGUgQ29tIEluYy4gUm9vdCBDQTEh + ... +``` + + +Name | Description +:--- | :--- +`pemtrustedcas_filepath` | Absolute path to the PEM file containing the root CAs of your IdP. +`pemtrustedcas_content` | The root CA content of your IdP. Cannot be used if `pemtrustedcas_filepath` is set. + + +### TLS client authentication + +To use TLS client authentication, configure the PEM certificate and private key the security plugin should send for TLS client authentication (or its content): + +```yml +config: + pemkey_filepath: /path/to/private.key.pem + pemkey_password: private_key_password + pemcert_filepath: /path/to/certificate.pem +``` + +```yml +config: + pemkey_content: |- + MIID2jCCAsKgAwIBAgIBBTANBgkqhkiG9w0BAQUFADCBlTETMBEGCgmSJomT8ixk + ARkWA2NvbTEXMBUGCgmSJomT8ixkARkWB2V4YW1wbGUxGTAXBgNVBAoMEEV4YW1w + bGUgQ29tIEluYy4xJDAiBgNVBAsMG0V4YW1wbGUgQ29tIEluYy4gU2lnbmluZyBD + ... + pemkey_password: private_key_password + pemcert_content: |- + MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCHRZwzwGlP2FvL + oEzNeDu2XnOF+ram7rWPT6fxI+JJr3SDz1mSzixTeHq82P5A7RLdMULfQFMfQPfr + WXgB4qfisuDSt+CPocZRfUqqhGlMG2l8LgJMr58tn0AHvauvNTeiGlyXy0ShxHbD + ... +``` + +Name | Description +:--- | :--- +`enable_ssl_client_auth` | Whether to send the client certificate to the IdP server. Default is false. +`pemcert_filepath` | Absolute path to the client certificate. +`pemcert_content` | The content of the client certificate. Cannot be used when `pemcert_filepath` is set. 
+`pemkey_filepath` | Absolute path to the file containing the private key of the client certificate. +`pemkey_content` | The content of the private key of your client certificate. Cannot be used when `pemkey_filepath` is set. +`pemkey_password` | The password of your private key, if any. + + +### Enabled ciphers and protocols + +You can limit the allowed ciphers and TLS protocols by using the following keys. + +Name | Description +:--- | :--- +`enabled_ssl_ciphers` | Array. Enabled TLS cipher suites. Only Java format is supported. +`enabled_ssl_protocols` | Array. Enabled TLS protocols. Only Java format is supported. + + +## (Advanced) DoS protection + +To help protect against denial-of-service (DoS) attacks, the security plugin only allows a maximum number of new key IDs in a certain span of time. If the number of new key IDs exceeds this threshold, the security plugin returns HTTP status code 503 (Service Unavailable) and refuses to query the IdP. By default, the security plugin does not allow for more than 10 unknown key IDs within 10 seconds. The following table shows how to modify these settings. + +Name | Description +:--- | :--- +`refresh_rate_limit_count` | The maximum number of unknown key IDs in the time frame. Default is 10. +`refresh_rate_limit_time_window_ms` | The time frame to use when checking the maximum number of unknown key IDs, in milliseconds. Default is 10000 (10 seconds). + + +## OpenSearch Dashboards single sign-on + +Activate OpenID Connect by adding the following to `opensearch_dashboards.yml`: + +``` +opensearch_security.auth.type: "openid" +``` + + +### Configuration + +OpenID Connect providers usually publish their configuration in JSON format under the *metadata url*. Therefore, most settings can be pulled in automatically, so the OpenSearch Dashboards configuration becomes minimal. 
The most important settings are the following: + +- [Connect URL](#openid-connect-url) +- Client ID + + Every IdP can host multiple clients (sometimes called applications) with different settings and authentication protocols. When enabling OpenID Connect, you should create a new client for OpenSearch Dashboards in your IdP. The client ID uniquely identifies OpenSearch Dashboards. + +- Client secret + + Beyond the ID, each client also has a client secret assigned. The client secret is usually generated when the client is created. Applications can obtain an identity token only when they provide a client secret. You can find this secret in the settings of the client on your IdP. + + +### Configuration parameters + +Name | Description +:--- | :--- +`opensearch_security.openid.connect_url` | The URL where the IdP publishes the OpenID metadata. Required. +`opensearch_security.openid.client_id` | The ID of the OpenID Connect client configured in your IdP. Required. +`opensearch_security.openid.client_secret` | The client secret of the OpenID Connect client configured in your IdP. Required. +`opensearch_security.openid.scope` | The [scope of the identity token](https://auth0.com/docs/scopes/current) issued by the IdP. Optional. Default is `openid profile email address phone`. +`opensearch_security.openid.header` | HTTP header name of the JWT token. Optional. Default is `Authorization`. +`opensearch_security.openid.logout_url` | The logout URL of your IdP. Optional. Only necessary if your IdP does not publish the logout URL in its metadata. +`opensearch_security.openid.base_redirect_url` | The base of the redirect URL that will be sent to your IdP. Optional. Only necessary when OpenSearch Dashboards is behind a reverse proxy, in which case it should be different than `server.host` and `server.port` in `opensearch_dashboards.yml`. 
+ + +### Configuration example + +```yml +# Enable OpenID authentication +opensearch_security.auth.type: "openid" + +# The IdP metadata endpoint +opensearch_security.openid.connect_url: "http://keycloak.example.com:8080/auth/realms/master/.well-known/openid-configuration" + +# The ID of the OpenID Connect client in your IdP +opensearch_security.openid.client_id: "opensearch-dashboards-sso" + +# The client secret of the OpenID Connect client +opensearch_security.openid.client_secret: "a59c51f5-f052-4740-a3b0-e14ba355b520" + +# Use HTTPS instead of HTTP +opensearch.url: "https://.com:" + +# Configure the OpenSearch Dashboards internal server user +opensearch.username: "opensearch-dashboardsserver" +opensearch.password: "opensearch-dashboardsserver" + +# Disable SSL verification when using self-signed demo certificates +opensearch.ssl.verificationMode: none + +# Whitelist basic headers and multi-tenancy header +opensearch.requestHeadersWhitelist: ["Authorization", "security_tenant"] +``` + + +### OpenSearch security configuration + +Because OpenSearch Dashboards requires that the internal OpenSearch Dashboards server user can authenticate through HTTP basic authentication, you must configure two authentication domains. For OpenID Connect, the HTTP basic domain has to be placed first in the chain. Make sure you set the challenge flag to `false`. 
+ +Modify and apply the following example settings in `config.yml`: + +```yml +basic_internal_auth_domain: + http_enabled: true + transport_enabled: true + order: 0 + http_authenticator: + type: basic + challenge: false + authentication_backend: + type: internal +openid_auth_domain: + http_enabled: true + transport_enabled: true + order: 1 + http_authenticator: + type: openid + challenge: false + config: + subject_key: preferred_username + roles_key: roles + openid_connect_url: https://keycloak.example.com:8080/auth/realms/master/.well-known/openid-configuration + authentication_backend: + type: noop +``` diff --git a/docs/security/configuration/proxy.md b/docs/security/configuration/proxy.md new file mode 100644 index 00000000..e3e6d296 --- /dev/null +++ b/docs/security/configuration/proxy.md @@ -0,0 +1,208 @@ +--- +layout: default +title: Proxy-based authentication +parent: Configuration +grand_parent: Security +nav_order: 40 +--- + +# Proxy-based authentication + +If you already have a single sign-on (SSO) solution in place, you might want to use it as an authentication backend. + +Most solutions work as a proxy in front of OpenSearch and the security plugin. If proxy authentication succeeds, the proxy adds the (verified) username and its (verified) roles in HTTP header fields. The names of these fields depend on the SSO solution you have in place. + +The security plugin then extracts these HTTP header fields from the request and uses the values to determine the user's permissions. 
+ + +## Enable proxy detection + +To enable proxy detection for OpenSearch, configure it in the `xff` section of `config.yml`: + +```yml +--- +_meta: + type: "config" + config_version: 2 + +config: + dynamic: + http: + anonymous_auth_enabled: false + xff: + enabled: true + internalProxies: '192\.168\.0\.10|192\.168\.0\.11' + remoteIpHeader: 'x-forwarded-for' +``` + +You can configure the following settings: + +Name | Description +:--- | :--- +`enabled` | Enables or disables proxy support. Default is false. +`internalProxies` | A regular expression containing the IP addresses of all trusted proxies. The pattern `.*` trusts all internal proxies. +`remoteIpHeader` | Name of the HTTP header field that has the hostname chain. Default is `x-forwarded-for`. + +To determine whether a request comes from a trusted internal proxy, the security plugin compares the remote address of the HTTP request with the list of configured internal proxies. If the remote address is not in the list, the plugin treats the request like a client request. + + +## Enable proxy authentication + +Configure the names of the HTTP header fields that carry the authenticated username and role(s) in the `proxy` HTTP authenticator section: + +```yml +proxy_auth_domain: + http_enabled: true + transport_enabled: true + order: 0 + http_authenticator: + type: proxy + challenge: false + config: + user_header: "x-proxy-user" + roles_header: "x-proxy-roles" + authentication_backend: + type: noop +``` + +Name | Description +:--- | :--- +`user_header` | The HTTP header field containing the authenticated username. Default is `x-proxy-user`. +`roles_header` | The HTTP header field containing the comma-separated list of authenticated role names. The security plugin uses the roles found in this header field as backend roles. Default is `x-proxy-roles`. +`roles_separator` | The separator for roles. Default is `,`.
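The proxy check and header extraction described above can be sketched in Python. This is a simplified illustration, not the plugin's code: the function names are hypothetical, headers are modeled as a plain dictionary, and matching the whole remote address against the `internalProxies` pattern is an assumption of this sketch.

```python
import re

# Illustrative only: hypothetical helpers, not the security plugin's implementation.

def is_trusted_proxy(remote_address, internal_proxies):
    """Assumes the whole remote address must match the `internalProxies` regex."""
    return re.fullmatch(internal_proxies, remote_address) is not None


def extract_identity(headers, user_header="x-proxy-user",
                     roles_header="x-proxy-roles", roles_separator=","):
    """Return (username, backend_roles), or None if the user header is missing."""
    lowered = {name.lower(): value for name, value in headers.items()}
    user = lowered.get(user_header.lower())
    if not user:
        return None
    raw_roles = lowered.get(roles_header.lower(), "")
    roles = [role.strip() for role in raw_roles.split(roles_separator) if role.strip()]
    return user, roles
```

In this sketch, a request would be treated as proxy-authenticated only if `is_trusted_proxy` returns `True` for its remote address and `extract_identity` finds a user header.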
+ + +## Enable extended proxy authentication + +The security plugin has an extended version of the `proxy` type that lets you pass additional user attributes for use with document-level security. Aside from `type: extended-proxy` and `attr_header_prefix`, configuration is identical: + +```yml +proxy_auth_domain: + http_enabled: true + transport_enabled: true + order: 0 + http_authenticator: + type: extended-proxy + challenge: false + config: + user_header: "x-proxy-user" + roles_header: "x-proxy-roles" + attr_header_prefix: "x-proxy-ext-" + authentication_backend: + type: noop +``` + +Name | Description +:--- | :--- +`attr_header_prefix` | The header prefix that the proxy uses to provide user attributes. For example, if the proxy provides `x-proxy-ext-namespace: my-namespace`, use `${attr.proxy.namespace}` in document-level security queries. + + +## Example + +The following example uses an nginx proxy in front of a three-node OpenSearch cluster. For simplicity, we use hardcoded values for `x-proxy-user` and `x-proxy-roles`. In a real world example you would set these headers dynamically. The example also includes a commented header for use with the extended proxy. 
+ +``` +events { + worker_connections 1024; +} + +http { + + upstream opensearch { + server node1.example.com:9200; + server node2.example.com:9200; + server node3.example.com:9200; + keepalive 15; + } + + server { + listen 8090; + server_name nginx.example.com; + + location / { + proxy_pass https://opensearch; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header x-proxy-user test; + proxy_set_header x-proxy-roles test; + #proxy_set_header x-proxy-ext-namespace my-namespace; + } + } + +} +``` + +The corresponding minimal `config.yml` looks like: + +```yml +--- +_meta: + type: "config" + config_version: 2 + +config: + dynamic: + http: + xff: + enabled: true + internalProxies: '172.16.0.203' # the nginx proxy + authc: + proxy_auth_domain: + http_enabled: true + transport_enabled: true + order: 0 + http_authenticator: + type: proxy + #type: extended-proxy + challenge: false + config: + user_header: "x-proxy-user" + roles_header: "x-proxy-roles" + #attr_header_prefix: "x-proxy-ext-" + authentication_backend: + type: noop +``` + +The important part is to enable the `X-Forwarded-For (XFF)` resolution and set the IP(s) of the internal proxies correctly: + +```yml +enabled: true +internalProxies: '172.16.0.203' # nginx proxy +``` + +In this case, `nginx.example.com` runs on `172.16.0.203`, so add this IP to the list of internal proxies. Be sure to set `internalProxies` to the minimum number of IP addresses so that the security plugin only accepts requests from trusted IPs. + + +## OpenSearch Dashboards proxy authentication + +To use proxy authentication with OpenSearch Dashboards, the most common configuration is to place the proxy in front of OpenSearch Dashboards and let OpenSearch Dashboards pass the user and role headers to the security plugin. + +In this case, the remote address of the HTTP call is the IP of OpenSearch Dashboards, because it sits directly in front of OpenSearch. 
Add the IP of OpenSearch Dashboards to the list of internal proxies: + +```yml +--- +_meta: + type: "config" + config_version: 2 + +config: + dynamic: + http: + xff: + enabled: true + remoteIpHeader: "x-forwarded-for" + internalProxies: '' +``` + +To pass the user and role headers that the authenticating proxy adds from OpenSearch Dashboards to the security plugin, add them to the HTTP header whitelist in `opensearch_dashboards.yml`: + +```yml +opensearch.requestHeadersWhitelist: ["securitytenant","Authorization","x-forwarded-for","x-proxy-user","x-proxy-roles"] +``` + +You must also enable the authentication type in `opensearch_dashboards.yml`: + +```yml +opensearch_security.auth.type: "proxy" +opensearch_security.proxycache.user_header: "x-proxy-user" +opensearch_security.proxycache.roles_header: "x-proxy-roles" +``` diff --git a/docs/security/configuration/saml.md b/docs/security/configuration/saml.md new file mode 100755 index 00000000..35d322ac --- /dev/null +++ b/docs/security/configuration/saml.md @@ -0,0 +1,332 @@ +--- +layout: default +title: SAML +parent: Configuration +grand_parent: Security +nav_order: 31 +--- + +# SAML + +The security plugin supports user authentication through SAML single sign-on. The security plugin implements the web browser SSO profile of the SAML 2.0 protocol. + +This profile is meant for use with web browsers. It is not a general-purpose way of authenticating users against the security plugin, so its primary use case is to support OpenSearch Dashboards single sign-on. + +{% comment %} + +## Docker example + +We provide a fully functional example that can help you understand how to use SAML with OpenSearch Dashboards. + +1. Download and unzip [the example ZIP file]({{site.url}}{{site.baseurl}}/assets/examples/saml-example.zip). +1. At the command line, run `docker-compose up`. +1. Review the files: + + * `docker-compose.yml` defines two OpenSearch nodes, an OpenSearch Dashboards server, and a SAML server. 
+ * `custom-opensearch_dashboards.yml` adds a few SAML settings to the default `opensearch_dashboards.yml` file. + * `config.yml` configures SAML for authentication. + +1. Access OpenSearch Dashboards at [http://localhost:5601](http://localhost:5601){:target='\_blank'}. Note that OpenSearch Dashboards immediately redirects you to the SAML login page. + +1. Log in as `admin` with a password of `admin`. + +1. After logging in, note that your user in the upper-right is `SAMLAdmin`, as defined in `/var/www/simplesamlphp/config/authsources.php` of the SAML server. + +1. If you want to examine the SAML server, run `docker ps` to find its container ID and then `docker exec -it /bin/bash`. + + In particular, you might find it helpful to review the contents of the `/var/www/simplesamlphp/config/` and `/var/www/simplesamlphp/metadata/` directories. + +{% endcomment %} + +## Activating SAML + +To use SAML for authentication, you need to configure a respective authentication domain in the `authc` section of `plugins/opensearch_security/securityconfig/config.yml`. Because SAML works solely on the HTTP layer, you do not need any `authentication_backend` and can set it to `noop`. Place all SAML-specific configuration options in this chapter in the `config` section of the SAML HTTP authenticator: + +```yml +authc: + saml_auth_domain: + http_enabled: true + transport_enabled: false + order: 1 + http_authenticator: + type: saml + challenge: true + config: + idp: + metadata_file: okta.xml + ... + authentication_backend: + type: noop +``` + +After you have configured SAML in `config.yml`, you must also [activate it in OpenSearch Dashboards](#opensearch-dashboards-configuration). + + +## Running multiple authentication domains + +We recommend adding at least one other authentication domain, such as LDAP or the internal user database, to support API access to OpenSearch without SAML.
For OpenSearch Dashboards and the internal OpenSearch Dashboards server user, you also must add another authentication domain that supports basic authentication. This authentication domain should be placed first in the chain, and the `challenge` flag must be set to `false`: + +```yml +authc: + basic_internal_auth_domain: + http_enabled: true + transport_enabled: true + order: 0 + http_authenticator: + type: basic + challenge: false + authentication_backend: + type: internal + saml_auth_domain: + http_enabled: true + transport_enabled: false + order: 1 + http_authenticator: + type: saml + challenge: true + config: + ... + authentication_backend: + type: noop +``` + + +## Identity provider metadata + +A SAML identity provider (IdP) provides a SAML 2.0 metadata file describing the IdP's capabilities and configuration. The security plugin can read IdP metadata either from a URL or a file. The choice that you make depends on your IdP and your preferences. The SAML 2.0 metadata file is required. + +Name | Description +:--- | :--- +`idp.metadata_file` | The path to the SAML 2.0 metadata file of your IdP. Place the metadata file in the `config` directory of OpenSearch. The path has to be specified relative to the `config` directory. Required if `idp.metadata_url` is not set. +`idp.metadata_url` | The SAML 2.0 metadata URL of your IdP. Required if `idp.metadata_file` is not set. + + +## IdP and service provider entity ID + +An entity ID is a globally unique name for a SAML entity, either an IdP or a service provider (SP). The IdP entity ID is usually provided by your IdP. The SP entity ID is the name of the configured application or client in your IdP. We recommend adding a new application for OpenSearch Dashboards and using the URL of your OpenSearch Dashboards installation as the SP entity ID. + +Name | Description +:--- | :--- +`idp.entity_id` | The entity ID of your IdP. Required. +`sp.entity_id` | The entity ID of the service provider. Required. 
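As a hedged sketch of how the metadata and entity ID settings from the two tables above fit together, the fragment below loads the IdP metadata from a URL rather than a file. The host names and the metadata path are hypothetical placeholders, not values from any real IdP; substitute whatever your IdP provides.

```yml
# Sketch only: every URL below is a made-up placeholder.
config:
  idp:
    # Set one of metadata_url or metadata_file.
    metadata_url: https://idp.example.com/saml/metadata
    entity_id: https://idp.example.com/
  sp:
    # As recommended above: the URL of your OpenSearch Dashboards installation.
    entity_id: https://opensearch-dashboards.example.com
```

This fragment belongs in the `config` section of the SAML HTTP authenticator, alongside the other SAML options described in this chapter.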
+ + +## OpenSearch Dashboards settings + +The Web Browser SSO Profile exchanges information through HTTP GET or POST. For example, after you log in to your IdP, it sends an HTTP POST back to OpenSearch Dashboards containing the SAML response. You must configure the base URL of your OpenSearch Dashboards installation to which these HTTP requests are sent. + +Name | Description +:--- | :--- +`opensearch_dashboards_url` | The OpenSearch Dashboards base URL. Required. + + +## Username and Role attributes + +Subjects (for example, user names) are usually stored in the `NameID` element of a SAML response: + +``` +<saml2:Subject> + <saml2:NameID>admin</saml2:NameID> + ... +</saml2:Subject> +``` + +If your IdP is compliant with the SAML 2.0 specification, you do not need to set anything special. If your IdP uses a different element name, you can also specify its name explicitly. + +Role attributes are optional. However, most IdPs can be configured to add roles in the SAML assertions as well. If present, you can use these roles in your [role mappings](../concepts): + +``` +<saml2:Attribute Name='Role'> + <saml2:AttributeValue>Everyone</saml2:AttributeValue> + <saml2:AttributeValue>Admins</saml2:AttributeValue> +</saml2:Attribute> +``` + +If you want to extract roles from the SAML response, you need to specify the element name that contains the roles. + +Name | Description +:--- | :--- +`subject_key` | The attribute in the SAML response where the subject is stored. Optional. If not configured, the `NameID` attribute is used. +`roles_key` | The attribute in the SAML response where the roles are stored. Optional. If not configured, no roles are used. + + +## Request signing + +Requests from the security plugin to the IdP can optionally be signed. Use the following settings to configure request signing. + +Name | Description +:--- | :--- +`sp.signature_private_key` | The private key used to sign the requests or to decode encrypted assertions. Optional. Cannot be used when `sp.signature_private_key_filepath` is set. +`sp.signature_private_key_password` | The password of the private key, if any. +`sp.signature_private_key_filepath` | Path to the private key.
The file must be placed under the OpenSearch `config` directory, and the path must be specified relative to that same directory. +`sp.signature_algorithm` | The algorithm used to sign the requests. See the next table for possible values. + +The security plugin supports the following signature algorithms. + +Algorithm | Value +:--- | :--- +DSA_SHA1 | http://www.w3.org/2000/09/xmldsig#dsa-sha1 +RSA_SHA1 | http://www.w3.org/2000/09/xmldsig#rsa-sha1 +RSA_SHA256 | http://www.w3.org/2001/04/xmldsig-more#rsa-sha256 +RSA_SHA384 | http://www.w3.org/2001/04/xmldsig-more#rsa-sha384 +RSA_SHA512 | http://www.w3.org/2001/04/xmldsig-more#rsa-sha512 + + +## Logout + +Usually, IdPs provide information about their individual logout URL in their SAML 2.0 metadata. If this is the case, the security plugin uses it to render the correct logout link in OpenSearch Dashboards. If your IdP does not support an explicit logout, you can force a re-login when the user visits OpenSearch Dashboards again. + +Name | Description +:--- | :--- +`sp.forceAuthn` | Force a re-login even if the user has an active session with the IdP. + +Currently, the security plugin supports only the `HTTP-Redirect` logout binding. Make sure this is configured correctly in your IdP. + + +## Exchange key settings + +SAML, unlike other protocols, is not meant to be used for exchanging user credentials with each request. The security plugin trades the SAML response for a lightweight JSON web token that stores the validated user attributes. This token is signed by an exchange key that you can choose freely. Note that when you change this key, all tokens signed with it become invalid immediately. + +Name | Description +:--- | :--- +`exchange_key` | The key to sign the token. The algorithm is HMAC256, so it should have at least 32 characters. + + +## TLS settings + +If you are loading the IdP metadata from a URL, we recommend that you use SSL/TLS.
If you use an external IdP like Okta or Auth0 that uses a trusted certificate, you usually do not need to configure anything. If you host the IdP yourself and use your own root CA, you can customize the TLS settings as follows. These settings are used only for loading SAML metadata over HTTPS. + +Name | Description +:--- | :--- +`idp.enable_ssl` | Whether to enable the custom TLS configuration. Default is false (JDK settings are used). +`idp.verify_hostnames` | Whether to verify the hostnames of the server's TLS certificate. + +Example: + +```yml +authc: + saml_auth_domain: + http_enabled: true + transport_enabled: false + order: 1 + http_authenticator: + type: saml + challenge: true + config: + idp: + enable_ssl: true + verify_hostnames: true + ... + authentication_backend: + type: noop +``` + + +### Certificate validation + +Configure the root CA used for validating the IdP TLS certificate by setting **one** of the following configuration options: + +```yml +config: + idp: + pemtrustedcas_filepath: path/to/trusted_cas.pem +``` + +```yml +config: + idp: + pemtrustedcas_content: |- + MIID/jCCAuagAwIBAgIBATANBgkqhkiG9w0BAQUFADCBjzETMBEGCgmSJomT8ixk + ARkWA2NvbTEXMBUGCgmSJomT8ixkARkWB2V4YW1wbGUxGTAXBgNVBAoMEEV4YW1w + bGUgQ29tIEluYy4xITAfBgNVBAsMGEV4YW1wbGUgQ29tIEluYy4gUm9vdCBDQTEh + ... +``` + +Name | Description +:--- | :--- +`idp.pemtrustedcas_filepath` | Path to the PEM file containing the root CAs of your IdP. The files must be placed under the OpenSearch `config` directory, and you must specify the path relative to that same directory. +`idp.pemtrustedcas_content` | The root CA content of your IdP server. Cannot be used when `pemtrustedcas_filepath` is set. + + +### Client authentication + +The security plugin can use TLS client authentication when fetching the IdP metadata. If enabled, the security plugin sends a TLS client certificate to the IdP for each metadata request. Use the following keys to configure client authentication. 
+ +Name | Description +:--- | :--- +`idp.enable_ssl_client_auth` | Whether to send a client certificate to the IdP server. Default is false. +`idp.pemcert_filepath` | Path to the PEM file containing the client certificate. The file must be placed under the OpenSearch `config` directory, and the path must be specified relative to the `config` directory. +`idp.pemcert_content` | The content of the client certificate. Cannot be used when `pemcert_filepath` is set. +`idp.pemkey_filepath` | Path to the private key of the client certificate. The file must be placed under the OpenSearch `config` directory, and the path must be specified relative to the `config` directory. +`idp.pemkey_content` | The content of the private key of your certificate. Cannot be used when `pemkey_filepath` is set. +`idp.pemkey_password` | The password of your private key, if any. + + +### Enabled ciphers and protocols + +You can limit the allowed ciphers and TLS protocols for the IdP connection. For example, you can only enable strong ciphers and limit the TLS versions to the most recent ones. + +Name | Description +:--- | :--- +`idp.enabled_ssl_ciphers` | Array of enabled TLS ciphers. Only the Java format is supported. +`idp.enabled_ssl_protocols` | Array of enabled TLS protocols. Only the Java format is supported. + + +## Minimal configuration example +The following example shows the minimal configuration: + +```yml +authc: + saml_auth_domain: + http_enabled: true + transport_enabled: false + order: 1 + http_authenticator: + type: saml + challenge: true + config: + idp: + metadata_file: metadata.xml + entity_id: http://idp.example.com/ + sp: + entity_id: https://opensearch-dashboards.example.com + opensearch_dashboards_url: https://opensearch-dashboards.example.com:5601/ + roles_key: Role + exchange_key: 'peuvgOLrjzuhXf ...' 
+ authentication_backend: + type: noop +``` + +## OpenSearch Dashboards configuration + +Because most of the SAML-specific configuration is done in the security plugin, just activate SAML in your `opensearch_dashboards.yml` by adding the following: + +``` +opensearch_security.auth.type: "saml" +``` + +In addition, the OpenSearch Dashboards endpoint for validating the SAML assertions must be whitelisted: + +``` +server.xsrf.whitelist: ["/_opensearch/_security/saml/acs"] +``` + +If you use the logout POST binding, you also need to whitelist the logout endpoint: + +```yml +server.xsrf.whitelist: ["/_opensearch/_security/saml/acs", "/_opensearch/_security/saml/logout"] +``` + +### IdP-initiated SSO + +To use IdP-initiated SSO, set the Assertion Consumer Service endpoint of your IdP to this: + +``` +/_opensearch/_security/saml/acs/idpinitiated +``` + +Then add this endpoint to `server.xsrf.whitelist` in `opensearch_dashboards.yml`: + +```yml +server.xsrf.whitelist: ["/_opensearch/_security/saml/acs/idpinitiated", "/_opensearch/_security/saml/acs", "/_opensearch/_security/saml/logout"] +``` diff --git a/docs/security/configuration/security-admin.md b/docs/security/configuration/security-admin.md new file mode 100755 index 00000000..83daacba --- /dev/null +++ b/docs/security/configuration/security-admin.md @@ -0,0 +1,235 @@ +--- +layout: default +title: Apply Changes with securityadmin.sh +parent: Configuration +grand_parent: Security +nav_order: 20 +--- + +# Apply configuration changes using securityadmin.sh + +The security plugin stores its configuration---including users, roles, and permissions---in an index on the OpenSearch cluster (`.opensearch_security`). Storing these settings in an index lets you change settings without restarting the cluster and eliminates the need to edit configuration files on every single node. 
+ +After changing any of the configuration files in `plugins/opensearch_security/securityconfig`, however, you must run `plugins/opensearch_security/tools/securityadmin.sh` to load these new settings into the index. You must also run this script at least once to initialize the `.opensearch_security` index and configure your authentication and authorization methods. + +After the `.opensearch_security` index is initialized, you can use OpenSearch Dashboards to manage your users, roles, and permissions. + + +## Configure the admin certificate + +You can configure all certificates that should have admin privileges in `opensearch.yml` stating their respective distinguished names (DNs). If you use the demo certificates, for example, you can use the `kirk` certificate: + +```yml +opensearch_security.authcz.admin_dn: + - CN=kirk,OU=client,O=client,L=test,C=DE +``` + +You can't use node certificates as admin certificates. The two must be separate. Also, do not use any whitespace between the parts of the DN. +{: .warning } + + +## Basic usage + +The `securityadmin.sh` tool can be run from any machine that has access to the transport port of your OpenSearch cluster (the default is 9300). You can change the security plugin configuration without having to access your nodes through SSH. + +Each node also includes the tool at `plugins/opensearch_security/tools/securityadmin.sh`. 
You might need to make the script executable before running it: + +```bash +chmod +x plugins/opensearch_security/tools/securityadmin.sh +``` + +To print all available command line options, run the script with no arguments: + +```bash +./plugins/opensearch_security/tools/securityadmin.sh +``` + +To load configuration changes to the security plugin, you must provide your admin certificate to the tool: + +```bash +./securityadmin.sh -cd ../securityconfig/ -icl -nhnv \ + -cacert ../../../config/root-ca.pem \ + -cert ../../../config/kirk.pem \ + -key ../../../config/kirk-key.pem +``` + +- The `-cd` option specifies where the security plugin configuration files to upload to the cluster can be found. +- The `-icl` (`--ignore-clustername`) option tells the security plugin to upload the configuration regardless of the cluster name. As an alternative, you can also specify the cluster name with the `-cn` (`--clustername`) option. +- Because the demo certificates are self-signed, we also disable hostname verification with the `-nhnv` (`--disable-host-name-verification`) option. +- The `-cacert`, `-cert` and `-key` options define the location of your root CA certificate, the admin certificate, and the private key for the admin certificate. If the private key has a password, specify it with the `-keypass` option. + +The following table shows the PEM options. + +Name | Description +:--- | :--- +`-cert` | The location of the PEM file containing the admin certificate and all intermediate certificates, if any. You can use an absolute or relative path. Relative paths are resolved relative to the execution directory of `securityadmin.sh`. +`-key` | The location of the PEM file containing the private key of the admin certificate. You can use an absolute or relative path. Relative paths are resolved relative to the execution directory of `securityadmin.sh`. The key must be in PKCS#8 format. +`-keypass` | The password of the private key of the admin certificate, if any. 
+`-cacert` | The location of the PEM file containing the root certificate. You can use an absolute or relative path. Relative paths are resolved relative to the execution directory of `securityadmin.sh`. + + +## Sample commands + +Apply configuration in `securityconfig` using PEM certificates: + +```bash +/usr/share/opensearch/plugins/opensearch_security/tools/securityadmin.sh -cacert /etc/opensearch/root-ca.pem -cert /etc/opensearch/kirk.pem -key /etc/opensearch/kirk-key.pem -cd /usr/share/opensearch/plugins/opensearch_security/securityconfig/ +``` + +Apply configuration from a single file (`config.yml`) using PEM certificates: + +```bash +./securityadmin.sh -f ../securityconfig/config.yml -icl -nhnv -cert /etc/opensearch/kirk.pem -cacert /etc/opensearch/root-ca.pem -key /etc/opensearch/kirk-key.pem -t config +``` + +Apply configuration in `securityconfig` with keystore and truststore files: + +```bash +./securityadmin.sh \ + -cd /usr/share/opensearch/plugins/opensearch_security/securityconfig/ \ + -ks /path/to/keystore.jks \ + -kspass changeit \ + -ts /path/to/truststore.jks \ + -tspass changeit \ + -nhnv \ + -icl +``` + + +## Using securityadmin with keystore and truststore files + +You can also use keystore files in JKS format in conjunction with `securityadmin.sh`: + +```bash +./securityadmin.sh -cd ../securityconfig -icl -nhnv \ + -ts <path/to/truststore> -tspass <truststore password> \ + -ks <path/to/keystore> -kspass <keystore password> +``` + +Use the following options to control the keystore and truststore settings. + +Name | Description +:--- | :--- +`-ks` | The location of the keystore containing the admin certificate and all intermediate certificates, if any. You can use an absolute or relative path. Relative paths are resolved relative to the execution directory of `securityadmin.sh`. +`-kspass` | The password for the keystore. +`-kst` | The keystore type, either JKS or PKCS#12/PFX. If not specified, the security plugin tries to determine the type from the file extension. +`-ksalias` | The alias of the admin certificate, if any.
+`-ts` | The location of the truststore containing the root certificate. You can use an absolute or relative path. Relative paths are resolved relative to the execution directory of `securityadmin.sh`. +`-tspass` | The password for the truststore. +`-tst` | The truststore type, either JKS or PKCS#12/PFX. If not specified, the security plugin tries to determine the type from the file extension. +`-tsalias` | The alias for the root certificate, if any. + + +### OpenSearch settings + +If you run a default OpenSearch installation, which listens on transport port 9300 and uses `opensearch` as a cluster name, you can omit the following settings altogether. Otherwise, specify your OpenSearch settings by using the following switches. + +Name | Description +:--- | :--- +`-h` | OpenSearch hostname. Default is `localhost`. +`-p` | OpenSearch port. Default is 9300---not the HTTP port. +`-cn` | Cluster name. Default is `opensearch`. +`-icl` | Ignore cluster name. +`-sniff` | Sniff cluster nodes. Sniffing detects available nodes using the OpenSearch `_cluster/state` API. +`-arc,--accept-red-cluster` | Execute `securityadmin.sh` even if the cluster state is red. Default is false, which means the script will not execute on a red cluster. + + +### Certificate validation settings + +Use the following options to control certificate validation. + +Name | Description +:--- | :--- +`-nhnv` | Do not validate hostname. Default is false. +`-nrhn` | Do not resolve hostname. Only relevant if `-nhnv` is not set. +`-noopenssl` | Do not use OpenSSL, even if available. Default is to use OpenSSL if it is available. + + +### Configuration files settings + +The following switches define which configuration files you want to push to the security plugin. You can either push a single file or specify a directory containing one or more configuration files. + +Name | Description +:--- | :--- +`-cd` | Directory containing multiple security plugin configuration files. +`-f` | Single configuration file. 
Can't be used with `-cd`. +`-t` | File type. +`-rl` | Reload the current configuration and flush the internal cache. + +To upload all configuration files in a directory, use this: + +```bash +./securityadmin.sh -cd ../securityconfig -ts ... -tspass ... -ks ... -kspass ... +``` + +If you want to push a single configuration file, use this: + +```bash +./securityadmin.sh -f ../securityconfig/internal_users.yml -t internalusers \ + -ts ... -tspass ... -ks ... -kspass ... +``` + +The file type must be one of the following: + +* config +* roles +* rolesmapping +* internalusers +* actiongroups + + +### Cipher settings + +You probably won't need to change cipher settings. If you need to, use the following options. + +Name | Description +:--- | :--- +`-ec` | Comma-separated list of enabled TLS ciphers. +`-ep` | Comma-separated list of enabled TLS protocols. + + +### Backup, restore, and migrate + +You can download all current configuration files from your cluster with the following command: + +```bash +./securityadmin.sh -backup /file/path -ts ... -tspass ... -ks ... -kspass ... +``` + +This command dumps the current security plugin configuration from your cluster to individual files in the directory you specify. You can then use these files as backups or to load the configuration into a different cluster. This command is useful when moving a proof-of-concept to production: + +```bash +./securityadmin.sh -backup ~ -icl -nhnv -cacert ../../../config/root-ca.pem -cert ../../../config/kirk.pem -key ../../../config/kirk-key.pem +``` + +To upload the dumped files to another cluster: + +```bash +./securityadmin.sh -h production.example.com -p 9301 -cd /etc/backup/ -ts ... -tspass ... -ks ... -kspass ... +``` + +To migrate configuration YAML files from the OpenSearch 0.x.x format to the 1.x.x format: + +```bash +./securityadmin.sh -migrate ../securityconfig -ts ... -tspass ... -ks ... -kspass ... 
+``` + +Name | Description +:--- | :--- +`-backup` | Retrieve the current security plugin configuration from a running cluster and dump it to the specified directory. +`-migrate` | Migrate configuration YAML files from version 0.x.x to 1.x.x. + + +### Other options + +Name | Description +:--- | :--- +`-dci` | Delete the security plugin configuration index and exit. This option is useful if the cluster state is red due to a corrupted security plugin index. +`-esa` | Enable shard allocation and exit. This option is useful if you disabled shard allocation while performing a full cluster restart and need to recreate the security plugin index. +`-w` | Display information about the admin certificate in use. +`-rl` | By default, the security plugin caches authenticated users, along with their roles and permissions, for one hour. This option reloads the current security plugin configuration stored in your cluster, invalidating any cached users, roles, and permissions. +`-i` | The security plugin index name. Default is `.opensearch_security`. +`-er` | Set an explicit number of replicas or an auto-expand expression for the `.opensearch_security` index. +`-era` | Enable replica auto-expand. +`-dra` | Disable replica auto-expand. +`-us` | Update the replica settings. diff --git a/docs/security/configuration/system-indices.md b/docs/security/configuration/system-indices.md new file mode 100644 index 00000000..4c7d653d --- /dev/null +++ b/docs/security/configuration/system-indices.md @@ -0,0 +1,26 @@ +--- +layout: default +title: System Indices +parent: Configuration +grand_parent: Security +nav_order: 15 +--- + +# System indices + +By default, OpenSearch has a protected system index, `.opensearch_security`, which you create using [securityadmin.sh](../security-admin/). Even if your user account has read permissions for all indices, you can't directly access the data in this system index. + +You can add more system indices in `opensearch.yml`.
In addition to automatically creating `.opensearch_security`, the demo configuration adds several indices for the various OpenSearch plugins that integrate with the security plugin: + +```yml +opensearch_security.system_indices.enabled: true +opensearch_security.system_indices.indices: [".opensearch-alerting-config", ".opensearch-alerting-alert*", ".opensearch-anomaly-results*", ".opensearch-anomaly-detector*", ".opensearch-anomaly-checkpoints", ".opensearch-anomaly-detection-state"] +``` + +To access these indices, you must authenticate with an [admin certificate](../tls/#configure-admin-certificates): + +```bash +curl -k --cert ./kirk.pem --key ./kirk-key.pem -XGET 'https://localhost:9200/.opensearch_security/_search' +``` + +The alternative is to remove indices from the `opensearch_security.system_indices.indices` list on each node and restart OpenSearch. diff --git a/docs/security/configuration/tls.md b/docs/security/configuration/tls.md new file mode 100755 index 00000000..3ed006eb --- /dev/null +++ b/docs/security/configuration/tls.md @@ -0,0 +1,208 @@ +--- +layout: default +title: TLS Certificates +parent: Configuration +grand_parent: Security +nav_order: 10 +--- + +# Configure TLS certificates + +TLS is configured in `opensearch.yml`. There are two main configuration sections: the transport layer and the REST layer. TLS is optional for the REST layer and mandatory for the transport layer. + +You can find an example configuration template with all options on [GitHub](https://www.github.com/opensearch-project/security-ssl/blob/master/opensearchsecurity-ssl-config-template.yml). +{: .note } + + +## X.509 PEM certificates and PKCS \#8 keys + +The following tables contain the settings you can use to configure the location of your PEM certificates and private keys. 
+ + +### Transport layer TLS + +Name | Description +:--- | :--- +`opensearch_security.ssl.transport.pemkey_filepath` | Path to the certificate's key file (PKCS \#8), which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.transport.pemkey_password` | Key password. Omit this setting if the key has no password. Optional. +`opensearch_security.ssl.transport.pemcert_filepath` | Path to the X.509 node certificate chain (PEM format), which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.transport.pemtrustedcas_filepath` | Path to the root CAs (PEM format), which must be under the `config` directory, specified using a relative path. Required. + + +### REST layer TLS + +Name | Description +:--- | :--- +`opensearch_security.ssl.http.pemkey_filepath` | Path to the certificate's key file (PKCS \#8), which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.http.pemkey_password` | Key password. Omit this setting if the key has no password. Optional. +`opensearch_security.ssl.http.pemcert_filepath` | Path to the X.509 node certificate chain (PEM format), which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.http.pemtrustedcas_filepath` | Path to the root CAs (PEM format), which must be under the `config` directory, specified using a relative path. Required. + + +## Keystore and truststore files + +As an alternative to certificates and private keys in PEM format, you can instead use keystore and truststore files in JKS or PKCS12/PFX format. The following settings configure the location and password of your keystore and truststore files. If you want, you can use different keystore and truststore files for the REST and the transport layer. 
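For orientation, here is a minimal sketch of what keystore-based settings might look like in `opensearch.yml`, using the setting names documented in the tables of this section. The file names are hypothetical, and `changeit` is simply the documented default password; treat the whole fragment as an illustration, not a recommended configuration.

```yml
# Sketch only: file names are placeholders, and paths are relative
# to the OpenSearch config directory.

# Transport layer
opensearch_security.ssl.transport.keystore_type: JKS
opensearch_security.ssl.transport.keystore_filepath: node-keystore.jks
opensearch_security.ssl.transport.keystore_password: changeit
opensearch_security.ssl.transport.truststore_type: JKS
opensearch_security.ssl.transport.truststore_filepath: truststore.jks
opensearch_security.ssl.transport.truststore_password: changeit

# REST layer, here with its own keystore file
opensearch_security.ssl.http.enabled: true
opensearch_security.ssl.http.keystore_type: JKS
opensearch_security.ssl.http.keystore_filepath: rest-keystore.jks
opensearch_security.ssl.http.keystore_password: changeit
opensearch_security.ssl.http.truststore_type: JKS
opensearch_security.ssl.http.truststore_filepath: truststore.jks
opensearch_security.ssl.http.truststore_password: changeit
```

Using a separate keystore for the REST layer is optional; both layers can point at the same files.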
+ + +### Transport layer TLS + +Name | Description +:--- | :--- +`opensearch_security.ssl.transport.keystore_type` | The type of the keystore file, JKS or PKCS12/PFX. Optional. Default is JKS. +`opensearch_security.ssl.transport.keystore_filepath` | Path to the keystore file, which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.transport.keystore_alias` | Alias name. Optional. Default is the first alias. +`opensearch_security.ssl.transport.keystore_password` | Keystore password. Default is `changeit`. +`opensearch_security.ssl.transport.truststore_type` | The type of the truststore file, JKS or PKCS12/PFX. Default is JKS. +`opensearch_security.ssl.transport.truststore_filepath` | Path to the truststore file, which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.transport.truststore_alias` | Alias name. Optional. Default is all certificates. +`opensearch_security.ssl.transport.truststore_password` | Truststore password. Default is `changeit`. + + +### REST layer TLS + +Name | Description +:--- | :--- +`opensearch_security.ssl.http.enabled` | Whether to enable TLS on the REST layer. If enabled, only HTTPS is allowed. Optional. Default is false. +`opensearch_security.ssl.http.keystore_type` | The type of the keystore file, JKS or PKCS12/PFX. Optional. Default is JKS. +`opensearch_security.ssl.http.keystore_filepath` | Path to the keystore file, which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.http.keystore_alias` | Alias name. Optional. Default is the first alias. +`opensearch_security.ssl.http.keystore_password` | Keystore password. Default is `changeit`. +`opensearch_security.ssl.http.truststore_type` | The type of the truststore file, JKS or PKCS12/PFX. Default is JKS.
+`opensearch_security.ssl.http.truststore_filepath` | Path to the truststore file, which must be under the `config` directory, specified using a relative path. Required. +`opensearch_security.ssl.http.truststore_alias` | Alias name. Optional. Default is all certificates. +`opensearch_security.ssl.http.truststore_password` | Truststore password. Default is `changeit`. + + +## Configure node certificates + +The security plugin needs to identify inter-cluster requests (that is, requests between the nodes). The simplest way of configuring node certificates is to list the Distinguished Names (DNs) of these certificates in `opensearch.yml`. All DNs must be included in `opensearch.yml` on all nodes. The security plugin supports wildcards and regular expressions: + +```yml +opensearch_security.nodes_dn: + - 'CN=node.other.com,OU=SSL,O=Test,L=Test,C=DE' + - 'CN=*.example.com,OU=SSL,O=Test,L=Test,C=DE' + - 'CN=elk-devcluster*' + - '/CN=.*regex/' +``` + +If your node certificates have an OID identifier in the SAN section, you can omit this configuration. + + +## Configure admin certificates + +Admin certificates are regular client certificates that have elevated rights to perform administrative tasks. You need an admin certificate to change the security plugin configuration using `plugins/opensearch_security/tools/securityadmin.sh` or the REST API. Admin certificates are configured in `opensearch.yml` by stating their DN(s): + +```yml +opensearch_security.authcz.admin_dn: + - CN=admin,OU=SSL,O=Test,L=Test,C=DE +``` + +For security reasons, you can't use wildcards or regular expressions here. + + +## (Advanced) OpenSSL + +The security plugin supports OpenSSL, but we only recommend it if you use Java 8. If you use Java 11, we recommend the default configuration. + +To use OpenSSL, you must install OpenSSL, the Apache Portable Runtime, and a Netty version with OpenSSL support matching your platform on all nodes.
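If you would rather stay on the JDK's built-in TLS implementation, a sketch like the following switches off the OpenSSL lookup on both layers. It relies only on the two `enable_openssl_if_available` settings documented in this section; whether you need it at all depends on your installation.

```yml
# Sketch only: explicitly opt out of OpenSSL on both layers.
opensearch_security.ssl.transport.enable_openssl_if_available: false
opensearch_security.ssl.http.enable_openssl_if_available: false
```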
+ +If OpenSSL is enabled, but for one reason or another the installation does not work, the security plugin falls back to the Java JCE as the security engine. + +Name | Description +:--- | :--- +`opensearch_security.ssl.transport.enable_openssl_if_available` | Enable OpenSSL on the transport layer if available. Optional. Default is true. +`opensearch_security.ssl.http.enable_openssl_if_available` | Enable OpenSSL on the REST layer if available. Optional. Default is true. + + +{% comment %} +1. Install [OpenSSL 1.1.0](https://www.openssl.org/community/binaries.html) on every node. +1. Install [Apache Portable Runtime](https://apr.apache.org) on every node: + + ``` + sudo yum install apr + ``` +{% endcomment %} + +1. Download the statically-linked JAR that includes OpenSSL, Apache Portable Runtime, and `netty-tcnative` for [RPM-based distributions](https://bintray.com/floragunncom/netty-tcnative/download_file?file_path=netty-tcnative-openssl-1.1.0j-static-2.0.20.Final-fedora-linux-x86_64.jar) or [other distributions](https://bintray.com/floragunncom/netty-tcnative/download_file?file_path=netty-tcnative-openssl-1.1.0j-static-2.0.20.Final-non-fedora-linux-x86_64.jar) and place it in `plugins/opensearch_security/` on every node. + + +## (Advanced) Hostname verification and DNS lookup + +In addition to verifying the TLS certificates against the root CA and/or intermediate CA(s), the security plugin can apply additional checks on the transport layer. + +With `enforce_hostname_verification` enabled, the security plugin verifies that the hostname of the communication partner matches the hostname in the certificate. The hostname is taken from the `subject` or `SAN` entries of your certificate. For example, if the hostname of your node is `node-0.example.com`, then the hostname in the TLS certificate has to be set to `node-0.example.com`, as well. 
Otherwise, errors are thrown: + +``` +[ERROR][c.a.o.s.s.t.opensearchSecuritySSLNettyTransport] [WX6omJY] SSL Problem No name matching found +[ERROR][c.a.o.s.s.t.opensearchSecuritySSLNettyTransport] [WX6omJY] SSL Problem Received fatal alert: certificate_unknown +``` + +In addition, when `resolve_hostname` is enabled, the security plugin resolves the (verified) hostname against your DNS. If the hostname does not resolve, errors are thrown. + + +Name | Description +:--- | :--- +`opensearch_security.ssl.transport.enforce_hostname_verification` | Whether to verify hostnames on the transport layer. Optional. Default is true. +`opensearch_security.ssl.transport.resolve_hostname` | Whether to resolve hostnames against DNS on the transport layer. Optional. Default is true. Only works if hostname verification is also enabled. + + +## (Advanced) Client authentication + +With TLS client authentication enabled, REST clients can send a TLS certificate with the HTTP request to provide identity information to the security plugin. There are three main usage scenarios for TLS client authentication: + +- Providing an admin certificate when using the REST management API. +- Configuring roles and permissions based on a client certificate. +- Providing identity information for tools like OpenSearch Dashboards, Logstash, or Beats. + +TLS client authentication has three modes: + +* `NONE`: The security plugin does not accept TLS client certificates. If one is sent, it is discarded. +* `OPTIONAL`: The security plugin accepts TLS client certificates if they are sent, but does not require them. +* `REQUIRE`: The security plugin only accepts REST requests when a valid client TLS certificate is sent. + +For the REST management API, the client authentication mode must be at least `OPTIONAL`.
+ +You can configure the client authentication mode by using the following setting: + +Name | Description +:--- | :--- +`opensearch_security.ssl.http.clientauth_mode` | The TLS client authentication mode to use. Can be one of `NONE`, `OPTIONAL` (default) or `REQUIRE`. Optional. + + +## (Advanced) Enabled ciphers and protocols + +You can limit the allowed ciphers and TLS protocols for the REST layer. For example, you can allow only strong ciphers and limit the TLS versions to the most recent ones. + +If these settings are not configured, the ciphers and TLS versions are negotiated between the browser and the security plugin automatically, which in some cases can lead to a weaker cipher suite being used. You can configure the ciphers and protocols using the following settings. + +Name | Description +:--- | :--- +`opensearch_security.ssl.http.enabled_ciphers` | Array, enabled TLS cipher suites for the REST layer. Only Java format is supported. +`opensearch_security.ssl.http.enabled_protocols` | Array, enabled TLS protocols for the REST layer. Only Java format is supported. +`opensearch_security.ssl.transport.enabled_ciphers` | Array, enabled TLS cipher suites for the transport layer. Only Java format is supported. +`opensearch_security.ssl.transport.enabled_protocols` | Array, enabled TLS protocols for the transport layer. Only Java format is supported. + +### Example settings + +```yml +opensearch_security.ssl.http.enabled_ciphers: + - "TLS_DHE_RSA_WITH_AES_256_CBC_SHA" + - "TLS_DHE_DSS_WITH_AES_128_CBC_SHA256" +opensearch_security.ssl.http.enabled_protocols: + - "TLSv1.1" + - "TLSv1.2" +``` + +Because it is insecure, the security plugin disables `TLSv1` by default.
If you need to use `TLSv1` and accept the risks, you can still enable it: + +```yml +opensearch_security.ssl.http.enabled_protocols: + - "TLSv1" + - "TLSv1.1" + - "TLSv1.2" +``` + + +## (Advanced) Disable client initiated renegotiation for Java 8 + +Set `-Djdk.tls.rejectClientInitiatedRenegotiation=true` to disable secure client initiated renegotiation, which is enabled by default. This can be set via `ES_JAVA_OPTS` in `config/jvm.options`. diff --git a/docs/security/configuration/yaml.md b/docs/security/configuration/yaml.md new file mode 100644 index 00000000..e23148de --- /dev/null +++ b/docs/security/configuration/yaml.md @@ -0,0 +1,255 @@ +--- +layout: default +title: YAML Files +parent: Configuration +grand_parent: Security +nav_order: 3 +--- + +# YAML files + +Before running `securityadmin.sh` to load the settings into the `.opensearch_security` index, configure the YAML files in `plugins/opensearch_security/securityconfig`. You might want to back up these files so that you can reuse them on other clusters. + +The best use of these YAML files is to configure [reserved and hidden resources](../../access-control/api/#reserved-and-hidden-resources), such as the `admin` and `opensearch-dashboardsserver` users. You might find it easier to create other users, roles, mappings, action groups, and tenants using OpenSearch Dashboards or the REST API. + + +## internal_users.yml + +This file contains any initial users that you want to add to the security plugin's internal user database. + +The file format requires a hashed password. To generate one, run `plugins/opensearch_security/tools/hash.sh -p `. If you decide to keep any of the demo users, *change their passwords* and re-run [securityadmin.sh](../security-admin/) to apply the new passwords. 
+ +```yml +--- +# This is the internal user database +# The hash value is a bcrypt hash and can be generated with plugin/tools/hash.sh + +_meta: + type: "internalusers" + config_version: 2 + +# Define your internal users here +new-user: + hash: "$2y$12$88IFVl6IfIwCFh5aQYfOmuXVL9j2hz/GusQb35o.4sdTDAEMTOD.K" + reserved: false + hidden: false + opensearch_security_roles: + - "specify-some-security-role-here" + backend_roles: + - "specify-some-backend-role-here" + attributes: + attribute1: "value1" + static: false + +## Demo users + +admin: + hash: "$2a$12$VcCDgh2NDk07JGN0rjGbM.Ad41qVR/YFJcgHp0UGns5JDymv..TOG" + reserved: true + backend_roles: + - "admin" + description: "Demo admin user" + +opensearch-dashboardsserver: + hash: "$2a$12$4AcgAt3xwOWadA5s5blL6ev39OXDNhmOesEoo33eZtrq2N0YrU3H." + reserved: true + description: "Demo opensearch-dashboardsserver user" + +opensearch-dashboardsro: + hash: "$2a$12$JJSXNfTowz7Uu5ttXfeYpeYE0arACvcwlPBStB1F.MI7f0U9Z4DGC" + reserved: false + backend_roles: + - "opensearch-dashboardsuser" + - "readall" + attributes: + attribute1: "value1" + attribute2: "value2" + attribute3: "value3" + description: "Demo opensearch-dashboardsro user" + +logstash: + hash: "$2a$12$u1ShR4l4uBS3Uv59Pa2y5.1uQuZBrZtmNfqB3iM/.jL0XoV9sghS2" + reserved: false + backend_roles: + - "logstash" + description: "Demo logstash user" + +readall: + hash: "$2a$12$ae4ycwzwvLtZxwZ82RmiEunBbIPiAmGZduBAjKN0TXdwQFtCwARz2" + reserved: false + backend_roles: + - "readall" + description: "Demo readall user" + +snapshotrestore: + hash: "$2y$12$DpwmetHKwgYnorbgdvORCenv4NAK8cPUg8AI6pxLCuWf/ALc0.v7W" + reserved: false + backend_roles: + - "snapshotrestore" + description: "Demo snapshotrestore user" +``` + + +## roles.yml + +This file contains any initial roles that you want to add to the security plugin. Aside from some metadata, the default file is empty, because the security plugin has a number of static roles that it adds automatically. 
+ +```yml +--- +complex-role: + reserved: false + hidden: false + cluster_permissions: + - "read" + - "cluster:monitor/nodes/stats" + - "cluster:monitor/task/get" + index_permissions: + - index_patterns: + - "opensearch_dashboards_sample_data_*" + dls: "{\"match\": {\"FlightDelay\": true}}" + fls: + - "~FlightNum" + masked_fields: + - "Carrier" + allowed_actions: + - "read" + tenant_permissions: + - tenant_patterns: + - "analyst_*" + allowed_actions: + - "opensearch_dashboards_all_write" + static: false +_meta: + type: "roles" + config_version: 2 +``` + + +## roles_mapping.yml + +```yml +--- +manage_snapshots: + reserved: true + hidden: false + backend_roles: + - "snapshotrestore" + hosts: [] + users: [] + and_backend_roles: [] +logstash: + reserved: false + hidden: false + backend_roles: + - "logstash" + hosts: [] + users: [] + and_backend_roles: [] +own_index: + reserved: false + hidden: false + backend_roles: [] + hosts: [] + users: + - "*" + and_backend_roles: [] + description: "Allow full access to an index named like the username" +opensearch_dashboards_user: + reserved: false + hidden: false + backend_roles: + - "opensearch-dashboardsuser" + hosts: [] + users: [] + and_backend_roles: [] + description: "Maps opensearch-dashboardsuser to opensearch_dashboards_user" +complex-role: + reserved: false + hidden: false + backend_roles: + - "ldap-analyst" + hosts: [] + users: + - "new-user" + and_backend_roles: [] +_meta: + type: "rolesmapping" + config_version: 2 +all_access: + reserved: true + hidden: false + backend_roles: + - "admin" + hosts: [] + users: [] + and_backend_roles: [] + description: "Maps admin to all_access" +readall: + reserved: true + hidden: false + backend_roles: + - "readall" + hosts: [] + users: [] + and_backend_roles: [] +opensearch_dashboards_server: + reserved: true + hidden: false + backend_roles: [] + hosts: [] + users: + - "opensearch-dashboardsserver" + and_backend_roles: [] +``` + + +## action_groups.yml + +This file contains any 
initial action groups that you want to add to the security plugin. + +Aside from some metadata, the default file is empty, because the security plugin has a number of static action groups that it adds automatically. These static action groups cover a wide variety of use cases and are a great way to get started with the plugin. + +```yml +--- +my-action-group: + reserved: false + hidden: false + allowed_actions: + - "indices:data/write/index*" + - "indices:data/write/update*" + - "indices:admin/mapping/put" + - "indices:data/write/bulk*" + - "read" + - "write" + static: false +_meta: + type: "actiongroups" + config_version: 2 +``` + +## tenants.yml + +```yml +--- +_meta: + type: "tenants" + config_version: 2 +admin_tenant: + reserved: false + description: "Demo tenant for admin user" +``` + + +## nodes_dn.yml + +```yml +--- +_meta: + type: "nodesdn" + config_version: 2 + +# Define nodesdn mapping name and corresponding values +# cluster1: +# nodes_dn: +# - CN=*.example.com +``` diff --git a/docs/security/index.md b/docs/security/index.md new file mode 100755 index 00000000..d61436fe --- /dev/null +++ b/docs/security/index.md @@ -0,0 +1,22 @@ +--- +layout: default +title: Security +nav_order: 20 +has_children: true +has_toc: false +--- + +# Security + +OpenSearch has its own security plugin for authentication and access control. The plugin provides numerous features to help you secure your cluster. + +Feature | Description +:--- | :--- +Node-to-node encryption | Encrypts traffic between nodes in the OpenSearch cluster. +HTTP basic authentication | A simple authentication method that includes a user name and password as part of the HTTP request. +Support for Active Directory, LDAP, Kerberos, SAML, and OpenID Connect | Use existing, industry-standard infrastructure to authenticate users, or create new users in the internal user database. 
+Role-based access control | Roles define the actions that users can perform: the data they can read, the cluster settings they can modify, the indices to which they can write, and so on. Roles are reusable across users, and users can have multiple roles. +Index-level, document-level, and field-level security | Restrict access to entire indices, certain documents within an index, or certain fields within documents. +Audit logging | These logs let you track access to your OpenSearch cluster and are useful for compliance purposes or after unintended data exposure. +Cross-cluster search | Use a coordinating cluster to securely send search requests to remote clusters. +OpenSearch Dashboards multi-tenancy | Create shared (or private) spaces for visualizations and dashboards. diff --git a/docs/sql/aggregations.md b/docs/sql/aggregations.md new file mode 100644 index 00000000..688ec1b9 --- /dev/null +++ b/docs/sql/aggregations.md @@ -0,0 +1,148 @@ +--- +layout: default +title: Aggregation Functions +parent: SQL +nav_order: 11 +--- + +# Aggregation functions + +Aggregate functions use the `GROUP BY` clause to group sets of values into subsets. + +## Group By + +Use the `GROUP BY` clause as an identifier, ordinal, or expression. + +### Identifier + +```sql +SELECT gender, sum(age) FROM accounts GROUP BY gender; +``` + +| gender | sum (age) +:--- | :--- +F | 28 | +M | 101 | + +### Ordinal + +```sql +SELECT gender, sum(age) FROM accounts GROUP BY 1; +``` + +| gender | sum (age) +:--- | :--- +F | 28 | +M | 101 | + +### Expression + +```sql +SELECT abs(account_number), sum(age) FROM accounts GROUP BY abs(account_number); +``` + +| abs(account_number) | sum (age) +:--- | :--- +| 1 | 32 | +| 13 | 28 | +| 18 | 33 | +| 6 | 36 | + +## Aggregation + +Use aggregations as a select, expression, or an argument of an expression. 
+ +### Select + +```sql +SELECT gender, sum(age) FROM accounts GROUP BY gender; +``` + +| gender | sum (age) +:--- | :--- +F | 28 | +M | 101 | + +### Argument + +```sql +SELECT gender, sum(age) * 2 as sum2 FROM accounts GROUP BY gender; +``` + +| gender | sum2 +:--- | :--- +F | 56 | +M | 202 | + +### Expression + +```sql +SELECT gender, sum(age * 2) as sum2 FROM accounts GROUP BY gender; +``` + +| gender | sum2 +:--- | :--- +F | 56 | +M | 202 | + +### COUNT + +The `COUNT` function accepts arguments such as `*` or a literal like `1`. +These forms have the following meanings: + +- `COUNT(field)` - Counts only the input rows in which the given field (or expression) is not null or missing. +- `COUNT(*)` - Counts all input rows. +- `COUNT(1)` (same as `COUNT(*)`) - Counts any non-null literal. + +## Having + +Use the `HAVING` clause to filter out aggregated values. + +### HAVING with GROUP BY + +You can use aggregate expressions or their aliases defined in a `SELECT` clause in a `HAVING` condition. + +We recommend putting non-aggregate expressions in the `WHERE` clause, although you can also use them in a `HAVING` clause. + +The aggregations in a `HAVING` clause are not necessarily the same as those in the select list. As an extension to the SQL standard, you're not restricted to using identifiers only in the `GROUP BY` list. +For example: + +```sql +SELECT gender, sum(age) +FROM accounts +GROUP BY gender +HAVING sum(age) > 100; +``` + +| gender | sum (age) +:--- | :--- +M | 101 | + +Here's another example of using an alias in a `HAVING` condition. + +```sql +SELECT gender, sum(age) AS s +FROM accounts +GROUP BY gender +HAVING s > 100; +``` + +| gender | s +:--- | :--- +M | 101 | + +An identifier can be ambiguous, for example, when it is present both as a select alias and as an index field; the alias takes precedence.
In this case, the identifier is replaced with the expression aliased in the `SELECT` clause. + +### HAVING without GROUP BY + +You can use a `HAVING` clause without the `GROUP BY` clause. This is useful because aggregations are not supported in a `WHERE` clause: + +```sql +SELECT 'Total of age > 100' +FROM accounts +HAVING sum(age) > 100; +``` + +| Total of age > 100 | +:--- | +Total of age > 100 | diff --git a/docs/sql/basic.md b/docs/sql/basic.md new file mode 100644 index 00000000..c074ece9 --- /dev/null +++ b/docs/sql/basic.md @@ -0,0 +1,359 @@ +--- +layout: default +title: Basic Queries +parent: SQL +nav_order: 5 +--- + + +# Basic queries + +Use the `SELECT` clause, along with `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, and `LIMIT` to search and aggregate data. + +Among these clauses, `SELECT` and `FROM` are required, as they specify which fields to retrieve and which indices to retrieve them from. All other clauses are optional. Use them according to your needs. + +### Syntax + +The complete syntax for searching and aggregating data is as follows: + +```sql +SELECT [DISTINCT] (* | expression) [[AS] alias] [, ...] +FROM index_name +[WHERE predicates] +[GROUP BY expression [, ...] + [HAVING predicates]] +[ORDER BY expression [IS [NOT] NULL] [ASC | DESC] [, ...]] +[LIMIT [offset, ] size] +``` + +### Fundamentals + +Apart from the predefined keywords of SQL, the most basic elements are literals and identifiers. +A literal is a numeric, string, date, or boolean constant. An identifier is an OpenSearch index or field name. +With arithmetic operators and SQL functions, use literals and identifiers to build complex expressions. + +Rule `expressionAtom`: + +![expressionAtom](../../images/expressionAtom.png) + +Expressions in turn can be combined into a predicate with logical operators. Use a predicate in the `WHERE` and `HAVING` clauses to filter data by specific conditions.
+ +Rule `expression`: + +![expression](../../images/expression.png) + +Rule `predicate`: + +![expression](../../images/predicate.png) + +### Execution Order + +These SQL clauses execute in an order different from how they appear: + +```sql +FROM index + WHERE predicates + GROUP BY expressions + HAVING predicates + SELECT expressions + ORDER BY expressions + LIMIT size +``` + +## Select + +Specify the fields to be retrieved. + +### Syntax + +Rule `selectElements`: + +![selectElements](../../images/selectElements.png) + +Rule `selectElement`: + +![selectElements](../../images/selectElement.png) + +*Example 1*: Use `*` to retrieve all fields in an index: + +```sql +SELECT * +FROM accounts +``` + +| id | account_number | firstname | gender | city | balance | employer | state | email | address | lastname | age +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- +0 | 1 | Amber | M | Brogan | 39225 | Pyrami | IL | amberduke@pyrami.com | 880 Holmes Lane | Duke | 32 +1 | 16 | Hattie | M | Dante | 5686 | Netagy | TN | hattiebond@netagy.com | 671 Bristol Street | Bond | 36 +2 | 13 | Nanette | F | Nogal | 32838 | Quility | VA | nanettebates@quility.com | 789 Madison Street | Bates | 28 +3 | 18 | Dale | M | Orick | 4180 | | MD | daleadams@boink.com | 467 Hutchinson Court | Adams | 33 + +*Example 2*: Use field name(s) to retrieve only specific fields: + +```sql +SELECT firstname, lastname +FROM accounts +``` + +| id | firstname | lastname +:--- | :--- | :--- +0 | Amber | Duke +1 | Hattie | Bond +2 | Nanette | Bates +3 | Dale | Adams + +*Example 3*: Use field aliases instead of field names. Field aliases are used to make field names more readable: + +```sql +SELECT account_number AS num +FROM accounts +``` + +| id | num +:--- | :--- +0 | 1 +1 | 6 +2 | 13 +3 | 18 + +*Example 4*: Use the `DISTINCT` clause to get back only unique field values. 
You can specify one or more field names: + +```sql +SELECT DISTINCT age +FROM accounts +``` + +| id | age +:--- | :--- +0 | 28 +1 | 32 +2 | 33 +3 | 36 + +## From + +Specify the index that you want to search. +You can specify subqueries within the `FROM` clause. + +### Syntax + +Rule `tableName`: + +![tableName](../../images/tableName.png) + +*Example 1*: Use index aliases to query across indices. To learn about index aliases, see [Index Alias](../opensearch/index-alias/). +In this sample query, `acc` is an alias for the `accounts` index: + +```sql +SELECT account_number, accounts.age +FROM accounts +``` + +or + +```sql +SELECT account_number, acc.age +FROM accounts acc +``` + +| id | account_number | age +:--- | :--- | :--- +0 | 1 | 32 +1 | 6 | 36 +2 | 13 | 28 +3 | 18 | 33 + +*Example 2*: Use index patterns to query indices that match a specific pattern: + +```sql +SELECT account_number +FROM account* +``` + +| id | account_number +:--- | :--- +0 | 1 +1 | 6 +2 | 13 +3 | 18 + +## Where + +Specify a condition to filter the results. + +| Operators | Behavior +:--- | :--- +`=` | Equal to. +`<>` | Not equal to. +`>` | Greater than. +`<` | Less than. +`>=` | Greater than or equal to. +`<=` | Less than or equal to. +`IN` | Specify multiple `OR` operators. +`BETWEEN` | Similar to a range query. For more information about range queries, see [Range query](../opensearch/term/#range). +`LIKE` | Use for full-text search. For more information about full-text queries, see [Full-text queries](../opensearch/full-text/). +`IS NULL` | Check if the field value is `NULL`. +`IS NOT NULL` | Check if the field value is `NOT NULL`. + +Combine comparison operators (`=`, `<>`, `>`, `>=`, `<`, `<=`) with boolean operators `NOT`, `AND`, or `OR` to build more complex expressions.
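
As a minimal sketch of such a combined condition (illustrative only: it assumes the `accounts` sample index with the `age`, `gender`, and `balance` fields used in the other examples on this page, and the threshold values are arbitrary), the following query mixes comparison operators with `AND`, `OR`, and `NOT`. The parentheses make the intended evaluation order explicit:

```sql
SELECT account_number, age, balance
FROM accounts
WHERE age >= 30
  AND NOT (gender = 'F' OR balance < 5000)
```

The condition keeps accounts whose `age` is at least 30 and that are neither `F` accounts nor accounts with a balance below 5000.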
+ +*Example 1*: Use comparison operators for numbers, strings, or dates: + +```sql +SELECT account_number +FROM accounts +WHERE account_number = 1 +``` + +| id | account_number +:--- | :--- +0 | 1 + +*Example 2*: OpenSearch allows a flexible schema, so documents in an index may have different fields. Use `IS NULL` or `IS NOT NULL` to retrieve only missing fields or existing fields. We do not differentiate between missing fields and fields explicitly set to `NULL`: + +```sql +SELECT account_number, employer +FROM accounts +WHERE employer IS NULL +``` + +| id | account_number | employer +:--- | :--- | :--- +0 | 18 | + +*Example 3*: Delete the documents that satisfy the predicates in the `WHERE` clause: + +```sql +DELETE FROM accounts +WHERE age > 30 +``` + +## Group By + +Group documents with the same field value into buckets. + +*Example 1*: Group by fields: + +```sql +SELECT age +FROM accounts +GROUP BY age +``` + +| id | age +:--- | :--- +0 | 28 +1 | 32 +2 | 33 +3 | 36 + +*Example 2*: Group by field alias: + +```sql +SELECT account_number AS num +FROM accounts +GROUP BY num +``` + +| id | num +:--- | :--- +0 | 1 +1 | 6 +2 | 13 +3 | 18 + +*Example 3*: Use scalar functions in the `GROUP BY` clause: + +```sql +SELECT ABS(age) AS a +FROM accounts +GROUP BY ABS(age) +``` + +| id | a +:--- | :--- +0 | 28.0 +1 | 32.0 +2 | 33.0 +3 | 36.0 + +## Having + +Use the `HAVING` clause to aggregate inside each bucket based on aggregation functions (`COUNT`, `AVG`, `SUM`, `MIN`, and `MAX`). +The `HAVING` clause filters results from the `GROUP BY` clause: + +*Example 1*: + +```sql +SELECT age, MAX(balance) +FROM accounts +GROUP BY age HAVING MIN(balance) > 10000 +``` + +| id | age | MAX (balance) +:--- | :--- | :--- +0 | 28 | 32838 +1 | 32 | 39225 + +## Order By + +Use the `ORDER BY` clause to sort results into your desired order. + +*Example 1*: Use `ORDER BY` to sort by ascending or descending order.
Besides regular field names, ordinals, aliases, and scalar functions are also supported: + +```sql +SELECT account_number +FROM accounts +ORDER BY account_number DESC +``` + +| id | account_number +:--- | :--- +0 | 18 +1 | 13 +2 | 6 +3 | 1 + +*Example 2*: Specify if documents with missing fields are to be put at the beginning or at the end of the results. The default behavior of OpenSearch is to return nulls or missing fields at the end. To push them before non-nulls, use the `IS NOT NULL` operator: + +```sql +SELECT employer +FROM accounts +ORDER BY employer IS NOT NULL +``` + +| id | employer +:--- | :--- +0 | +1 | Netagy +2 | Pyrami +3 | Quility + +## Limit + +Specify the maximum number of documents that you want to retrieve. Use it to avoid fetching large amounts of data into memory. + +*Example 1*: If you pass in a single argument, it's mapped to the `size` parameter in OpenSearch and the `from` parameter is set to 0. + +```sql +SELECT account_number +FROM accounts +ORDER BY account_number LIMIT 1 +``` + +| id | account_number +:--- | :--- +0 | 1 + +*Example 2*: If you pass in two arguments, the first is mapped to the `from` parameter and the second to the `size` parameter in OpenSearch. You can use this for simple pagination on small indices; it's inefficient for large indices. +Use `ORDER BY` to ensure the same order between pages: + +```sql +SELECT account_number +FROM accounts +ORDER BY account_number LIMIT 1, 1 +``` + +| id | account_number +:--- | :--- +0 | 6 diff --git a/docs/sql/cli.md b/docs/sql/cli.md new file mode 100644 index 00000000..a385f6bf --- /dev/null +++ b/docs/sql/cli.md @@ -0,0 +1,101 @@ +--- +layout: default +title: SQL CLI +parent: SQL +nav_order: 2 +--- + +# SQL CLI + +SQL CLI is a stand-alone Python application that you can launch with the `opensearchsql` command. + +Install the SQL plugin on your OpenSearch instance, run the CLI using macOS or Linux, and connect to any valid OpenSearch endpoint.
+ +![SQL CLI](../../images/cli.gif) + +## Features + +SQL CLI has the following features: + +- Multi-line input +- Autocomplete for SQL syntax and index names +- Syntax highlighting +- Formatted output: + - Tabular format + - Field names with color + - Horizontal display (by default) and vertical display when output is too wide for your terminal, for better visualization + - Pagination for large output +- Works with or without security enabled +- Supports loading configuration files +- Supports all SQL plugin queries + +## Install + +Launch your local OpenSearch instance and make sure you have the SQL plugin installed. + +To install the SQL CLI: + +1. We suggest you install and activate a Python 3 virtual environment to avoid changing your local environment: +``` +pip install virtualenv +virtualenv venv +cd venv +source ./bin/activate +``` + +2. Install the CLI: +``` +pip3 install opensearch-sql-cli +``` + +The SQL CLI only works with Python 3. +{: .note } + +3. To launch the CLI, run: +``` +opensearchsql https://localhost:9200 --username admin --password admin +``` +By default, the `opensearchsql` command connects to http://localhost:9200. + +## Configure + +When you first launch the SQL CLI, a configuration file is automatically created at `~/.config/opensearchsql-cli/config` (for macOS and Linux). The configuration is auto-loaded thereafter. + +You can configure the following connection properties: + +- `endpoint`: You do not need to specify an option. Anything that follows the launch command `opensearchsql` is considered the endpoint. If you do not provide an endpoint, by default, the SQL CLI connects to http://localhost:9200. +- `-u/-w`: Supports username and password for HTTP basic authentication, such as with the security plugin or fine-grained access control for Amazon OpenSearch Service. +- `--aws-auth`: Turns on AWS SigV4 authentication to connect to an Amazon OpenSearch endpoint.
Use with the AWS CLI (`aws configure`) to retrieve the local AWS configuration to authenticate and connect. + +For a list of all available configurations, see [clirc](https://github.com/opensearch-project/sql-cli/blob/master/src/conf/clirc). + +## Using the CLI + +1. Save the sample [accounts test data](https://github.com/opensearch-project/sql/blob/master/src/test/resources/doctest/testdata/accounts.json) file. + +1. Index the sample data. +``` +curl -H "Content-Type: application/x-ndjson" -XPOST https://localhost:9200/data/_bulk -u 'admin:admin' --insecure --data-binary "@accounts.json" +``` + +1. Run a sample SQL command: +``` +SELECT * FROM accounts; +``` + +By default, you see a maximum output of 200 rows. To show more results, add a `LIMIT` clause with the desired value. + +## Query options + +Run a single query with the following options: + +- `--help`: Help page for options +- `-q`: Followed by a single query +- `-f`: Specify JDBC or raw format output +- `-v`: Display data vertically +- `-e`: Translate SQL to DSL + +## CLI options + +- `-p`: Always use pager to display output +- `--clirc`: Provide path for the configuration file diff --git a/docs/sql/complex.md b/docs/sql/complex.md new file mode 100644 index 00000000..4222aedc --- /dev/null +++ b/docs/sql/complex.md @@ -0,0 +1,420 @@ +--- +layout: default +title: Complex Queries +parent: SQL +nav_order: 6 +--- + +# Complex queries + +Besides simple SFW (`SELECT-FROM-WHERE`) queries, the SQL plugin supports complex queries such as subquery, join, union, and minus. These queries operate on more than one OpenSearch index. To examine how these queries execute behind the scenes, use the `explain` operation. + + +## Joins + +OpenSearch SQL supports inner joins, cross joins, and left outer joins. + +### Constraints + +Joins have a number of constraints: + +1. You can only join two indices. +1. You must use aliases for indices (e.g. `people p`). +1. Within an ON clause, you can only use AND conditions. +1.
In a WHERE statement, don't combine trees that contain multiple indices. For example, the following statement works: + + ``` + WHERE (a.type1 > 3 OR a.type1 < 0) AND (b.type2 > 4 OR b.type2 < -1) + ``` + + The following statement does not: + + ``` + WHERE (a.type1 > 3 OR b.type2 < 0) AND (a.type1 > 4 OR b.type2 < -1) + ``` + +1. You can't use GROUP BY or ORDER BY for results. +1. LIMIT with OFFSET (e.g. `LIMIT 25 OFFSET 25`) is not supported. + +### Description + +The `JOIN` clause combines columns from one or more indices using values common to each. + +### Syntax + +Rule `tableSource`: + +![tableSource](../../images/tableSource.png) + +Rule `joinPart`: + +![joinPart](../../images/joinPart.png) + +### Example 1: Inner join + +Inner join creates a new result set by combining columns of two indices based on your join predicates. It iterates the two indices and compares each document to find the ones that satisfy the join predicates. You can optionally precede the `JOIN` clause with an `INNER` keyword. + +The join predicate(s) is specified by the ON clause. + +SQL query: + +```sql +SELECT + a.account_number, a.firstname, a.lastname, + e.id, e.name +FROM accounts a +JOIN employees_nested e + ON a.account_number = e.id +``` + +Explain: + +The `explain` output is complicated, because a `JOIN` clause is associated with two OpenSearch DSL queries that execute in separate query planner frameworks. You can interpret it by examining the `Physical Plan` and `Logical Plan` objects. 
+ +```json +{ + "Physical Plan" : { + "Project [ columns=[a.account_number, a.firstname, a.lastname, e.name, e.id] ]" : { + "Top [ count=200 ]" : { + "BlockHashJoin[ conditions=( a.account_number = e.id ), type=JOIN, blockSize=[FixedBlockSize with size=10000] ]" : { + "Scroll [ employees_nested as e, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "from" : 0, + "_source" : { + "excludes" : [ ], + "includes" : [ + "id", + "name" + ] + } + } + }, + "Scroll [ accounts as a, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "from" : 0, + "_source" : { + "excludes" : [ ], + "includes" : [ + "account_number", + "firstname", + "lastname" + ] + } + } + }, + "useTermsFilterOptimization" : false + } + } + } + }, + "description" : "Hash Join algorithm builds hash table based on result of first query, and then probes hash table to find matched rows for each row returned by second query", + "Logical Plan" : { + "Project [ columns=[a.account_number, a.firstname, a.lastname, e.name, e.id] ]" : { + "Top [ count=200 ]" : { + "Join [ conditions=( a.account_number = e.id ) type=JOIN ]" : { + "Group" : [ + { + "Project [ columns=[a.account_number, a.firstname, a.lastname] ]" : { + "TableScan" : { + "tableAlias" : "a", + "tableName" : "accounts" + } + } + }, + { + "Project [ columns=[e.name, e.id] ]" : { + "TableScan" : { + "tableAlias" : "e", + "tableName" : "employees_nested" + } + } + } + ] + } + } + } + } +} +``` + +Result set: + +| a.account_number | a.firstname | a.lastname | e.id | e.name +:--- | :--- | :--- | :--- | :--- +6 | Hattie | Bond | 6 | Jane Smith + +### Example 2: Cross join + +Cross join, also known as Cartesian join, combines each document from the first index with each document from the second. +The result set is the Cartesian product of documents of both indices. +This operation is similar to the inner join without the `ON` clause that specifies the join condition.
+ +It's risky to perform a cross join on two indices of large or even medium size. It might trigger a circuit breaker that terminates the query to avoid running out of memory. +{: .warning } + +SQL query: + +```sql +SELECT + a.account_number, a.firstname, a.lastname, + e.id, e.name +FROM accounts a +JOIN employees_nested e +``` + +Result set: + +| a.account_number | a.firstname | a.lastname | e.id | e.name +:--- | :--- | :--- | :--- | :--- +1 | Amber | Duke | 3 | Bob Smith +1 | Amber | Duke | 4 | Susan Smith +1 | Amber | Duke | 6 | Jane Smith +6 | Hattie | Bond | 3 | Bob Smith +6 | Hattie | Bond | 4 | Susan Smith +6 | Hattie | Bond | 6 | Jane Smith +13 | Nanette | Bates | 3 | Bob Smith +13 | Nanette | Bates | 4 | Susan Smith +13 | Nanette | Bates | 6 | Jane Smith +18 | Dale | Adams | 3 | Bob Smith +18 | Dale | Adams | 4 | Susan Smith +18 | Dale | Adams | 6 | Jane Smith + +### Example 3: Left outer join + +Use left outer join to retain rows from the first index even if they do not satisfy the join predicate. The keyword `OUTER` is optional. + +SQL query: + +```sql +SELECT + a.account_number, a.firstname, a.lastname, + e.id, e.name +FROM accounts a +LEFT JOIN employees_nested e + ON a.account_number = e.id +``` + +Result set: + +| a.account_number | a.firstname | a.lastname | e.id | e.name +:--- | :--- | :--- | :--- | :--- +1 | Amber | Duke | null | null +6 | Hattie | Bond | 6 | Jane Smith +13 | Nanette | Bates | null | null +18 | Dale | Adams | null | null + +## Subquery + +A subquery is a complete `SELECT` statement used within another statement and enclosed in parentheses. +From the explain output, you can see that some subqueries are actually transformed to an equivalent join query to execute.
+ +### Example 1: Table subquery + +SQL query: + +```sql +SELECT a1.firstname, a1.lastname, a1.balance +FROM accounts a1 +WHERE a1.account_number IN ( + SELECT a2.account_number + FROM accounts a2 + WHERE a2.balance > 10000 +) +``` + +Explain: + +```json +{ + "Physical Plan" : { + "Project [ columns=[a1.balance, a1.firstname, a1.lastname] ]" : { + "Top [ count=200 ]" : { + "BlockHashJoin[ conditions=( a1.account_number = a2.account_number ), type=JOIN, blockSize=[FixedBlockSize with size=10000] ]" : { + "Scroll [ accounts as a2, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "query" : { + "bool" : { + "filter" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must_not" : [ + { + "bool" : { + "adjust_pure_negative" : true, + "must_not" : [ + { + "exists" : { + "field" : "account_number", + "boost" : 1 + } + } + ], + "boost" : 1 + } + } + ], + "boost" : 1 + } + }, + { + "range" : { + "balance" : { + "include_lower" : false, + "include_upper" : true, + "from" : 10000, + "boost" : 1, + "to" : null + } + } + } + ], + "boost" : 1 + } + } + ], + "boost" : 1 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1 + } + }, + "from" : 0 + } + }, + "Scroll [ accounts as a1, pageSize=10000 ]" : { + "request" : { + "size" : 200, + "from" : 0, + "_source" : { + "excludes" : [ ], + "includes" : [ + "firstname", + "lastname", + "balance", + "account_number" + ] + } + } + }, + "useTermsFilterOptimization" : false + } + } + } + }, + "description" : "Hash Join algorithm builds hash table based on result of first query, and then probes hash table to find matched rows for each row returned by second query", + "Logical Plan" : { + "Project [ columns=[a1.balance, a1.firstname, a1.lastname] ]" : { + "Top [ count=200 ]" : { + "Join [ conditions=( a1.account_number = a2.account_number ) type=JOIN ]" : { + "Group" : [ + { + "Project [ 
columns=[a1.balance, a1.firstname, a1.lastname, a1.account_number] ]" : { + "TableScan" : { + "tableAlias" : "a1", + "tableName" : "accounts" + } + } + }, + { + "Project [ columns=[a2.account_number] ]" : { + "Filter [ conditions=[AND ( AND account_number ISN null, AND balance GT 10000 ) ] ]" : { + "TableScan" : { + "tableAlias" : "a2", + "tableName" : "accounts" + } + } + } + } + ] + } + } + } + } +} +``` + +Result set: + +| a1.firstname | a1.lastname | a1.balance +:--- | :--- | :--- | :--- | :--- | :--- +Amber | Duke | 39225 +Nanette | Bates | 32838 + +### Example 2: From subquery + +SQL query: + +```sql +SELECT a.f, a.l, a.a +FROM ( + SELECT firstname AS f, lastname AS l, age AS a + FROM accounts + WHERE age > 30 +) AS a +``` + +Explain: + +```json +{ + "from" : 0, + "size" : 200, + "query" : { + "bool" : { + "filter" : [ + { + "bool" : { + "must" : [ + { + "range" : { + "age" : { + "from" : 30, + "to" : null, + "include_lower" : false, + "include_upper" : true, + "boost" : 1.0 + } + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "_source" : { + "includes" : [ + "firstname", + "lastname", + "age" + ], + "excludes" : [ ] + } +} +``` + +Result set: + +| f | l | a +:--- | :--- | :--- +Amber | Duke | 32 +Dale | Adams | 33 +Hattie | Bond | 36 diff --git a/docs/sql/datatypes.md b/docs/sql/datatypes.md new file mode 100644 index 00000000..36a6f4e8 --- /dev/null +++ b/docs/sql/datatypes.md @@ -0,0 +1,117 @@ +--- +layout: default +title: Data Types +parent: SQL +nav_order: 73 +--- + +# Data types + +The following table shows the data types supported by the SQL plugin and how each one maps to SQL and OpenSearch data types: + +| OpenSearch SQL Type | OpenSearch Type | SQL Type +:--- | :--- | :--- +boolean | boolean | BOOLEAN +byte | byte | TINYINT +short | byte | SMALLINT +integer | integer | INTEGER +long | long | BIGINT +float | float | REAL +half_float | float | FLOAT +scaled_float | 
float | DOUBLE +double | double | DOUBLE +keyword | string | VARCHAR +text | text | VARCHAR +date | timestamp | TIMESTAMP +ip | ip | VARCHAR +binary | binary | VARBINARY +object | struct | STRUCT +nested | array | STRUCT + +In addition to this list, the SQL plugin also supports the `datetime` type, though it doesn't have a corresponding mapping with OpenSearch or SQL. +To use a function without a corresponding mapping, you must explicitly convert the data type to one that does. + + +## Date and time types + +The date and time types represent a time period: `DATE`, `TIME`, `DATETIME`, `TIMESTAMP`, and `INTERVAL`. By default, the OpenSearch DSL uses the `date` type as the only date-time related type that contains all information of an absolute time point. + +To integrate with SQL, each type other than the `timestamp` type holds part of the time period information. To use date-time functions, see [datetime](../functions#date-and-time). Some functions might have restrictions for the input argument type. + + +### Date + +The `date` type represents the calendar date regardless of the time zone. A given date value is a 24-hour period, but this period varies across time zones and its length might change during daylight saving time. The `date` type doesn't contain time information and it only supports a range of `1000-01-01` to `9999-12-31`. + +| Type | Syntax | Range +:--- | :--- | :--- +date | `yyyy-MM-dd` | `0001-01-01` to `9999-12-31` + +### Time + +The `time` type represents the time of a clock regardless of its timezone. The `time` type doesn't contain date information. + +| Type | Syntax | Range +:--- | :--- | :--- +time | `hh:mm:ss[.fraction]` | `00:00:00.000000` to `23:59:59.999999` + +### Datetime + +The `datetime` type is a combination of date and time. It doesn't contain timezone information. For an absolute time point that contains date, time, and timezone information, see [Timestamp](#timestamp).
+ +| Type | Syntax | Range +:--- | :--- | :--- +datetime | `yyyy-MM-dd hh:mm:ss[.fraction]` | `0001-01-01 00:00:00.000000` to `9999-12-31 23:59:59.999999` + +### Timestamp + +The `timestamp` type is an absolute instant independent of timezone or convention. For a given point in time, if you change the timezone of a timestamp, its displayed value changes accordingly. + +The `timestamp` type is stored differently from the other types. It's converted from its current timezone to UTC for storage and converted back to its set timezone from UTC when it's retrieved. + +| Type | Syntax | Range +:--- | :--- | :--- +timestamp | `yyyy-MM-dd hh:mm:ss[.fraction]` | `0001-01-01 00:00:01.000000` UTC to `9999-12-31 23:59:59.999999` + +### Interval + +The `interval` type represents a temporal duration or a period. + +| Type | Syntax +:--- | :--- +interval | `INTERVAL expr unit` + +The `expr` is any expression that eventually evaluates to a quantity value. The `unit` represents the unit for interpreting the quantity, including `MICROSECOND`, `SECOND`, `MINUTE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `QUARTER`, and `YEAR`. The `INTERVAL` keyword and the unit specifier are not case sensitive. + +The `interval` type has two classes of intervals: year-week intervals and day-time intervals. + +- Year-week intervals store years, quarters, months, and weeks. +- Day-time intervals store days, hours, minutes, seconds, and microseconds. + + +### Convert between date and time types + +Apart from the `interval` type, all date and time types can be converted to each other. The conversion might alter the value or cause some information loss. For example, extracting the `time` value from a `datetime` value drops the date, and converting a `date` value to a `datetime` value fills in a default time.
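The information loss and fill-up described above can be illustrated with Python's standard `datetime` module. This is an analogy only, assuming the same sample values used in this section; it is not how the SQL plugin implements its conversions.

```python
from datetime import date, datetime, time, timezone

d = date(2020, 8, 17)
dt = datetime(2020, 8, 17, 14, 9, 0)

# date -> datetime: the missing time is filled with 00:00:00.
date_as_datetime = datetime.combine(d, time.min)

# datetime -> date / time: one half of the value is dropped.
datetime_as_date = dt.date()
datetime_as_time = dt.time()

# datetime -> timestamp: a timezone (UTC here) is attached.
datetime_as_timestamp = dt.replace(tzinfo=timezone.utc)

print(date_as_datetime)
print(datetime_as_date, datetime_as_time)
print(datetime_as_timestamp.isoformat())
```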
+ +The SQL plugin supports the following conversion rules for each of the types: + +**Convert from date** + +- Because the `date` value doesn't have any time information, conversion to the `time` type isn't useful and always returns a zero time value of `00:00:00`. +- Converting from `date` to `datetime` has a data fill-up due to the lack of time information. It attaches the time `00:00:00` to the original date by default and forms a `datetime` instance. For example, conversion of `2020-08-17` to a `datetime` type is `2020-08-17 00:00:00`. +- Converting to `timestamp` type adds both the `time` value and the `timezone` information. It attaches the zero time value `00:00:00` and the session timezone (UTC by default) to the date. For example, conversion of `2020-08-17` to a `timestamp` type with a session timezone UTC is `2020-08-17 00:00:00 UTC`. + +**Convert from time** + +- You cannot convert the `time` type to any other date and time type because it doesn't contain any date information. + +**Convert from datetime** + +- Converting `datetime` to `date` extracts the date value from the `datetime` value. For example, conversion of `2020-08-17 14:09:00` to a `date` type is `2020-08-17`. +- Converting `datetime` to `time` extracts the time value from the `datetime` value. For example, conversion of `2020-08-17 14:09:00` to a `time` type is `14:09:00`. +- Because the `datetime` type doesn't contain timezone information, converting to `timestamp` type fills up the timezone value with the session timezone. For example, conversion of `2020-08-17 14:09:00` (UTC) to a `timestamp` type is `2020-08-17 14:09:00 UTC`. + +**Convert from timestamp** + +- Converting from a `timestamp` type to a `date` type extracts the date value and converting to a `time` type extracts the time value. Converting from a `timestamp` type to `datetime` type extracts only the `datetime` value and leaves out the timezone value.
For example, conversion of `2020-08-17 14:09:00` UTC to a `date` type is `2020-08-17`, to a `time` type is `14:09:00`, and to a `datetime` type is `2020-08-17 14:09:00`. diff --git a/docs/sql/delete.md b/docs/sql/delete.md new file mode 100644 index 00000000..90e534d1 --- /dev/null +++ b/docs/sql/delete.md @@ -0,0 +1,78 @@ +--- +layout: default +title: Delete +parent: SQL +nav_order: 12 +--- + + +# Delete + +The `DELETE` statement deletes documents that satisfy the predicates in the `WHERE` clause. +If you don't specify the `WHERE` clause, all documents are deleted. + +### Syntax + +Rule `singleDeleteStatement`: + +![singleDeleteStatement](../../images/singleDeleteStatement.png) + +### Example + +SQL query: + +```sql +DELETE FROM accounts +WHERE age > 30 +``` + +Explain: + +```json +{ + "size" : 1000, + "query" : { + "bool" : { + "must" : [ + { + "range" : { + "age" : { + "from" : 30, + "to" : null, + "include_lower" : false, + "include_upper" : true, + "boost" : 1.0 + } + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "_source" : false +} +``` + +Result set: + +```json +{ + "schema" : [ + { + "name" : "deleted_rows", + "type" : "long" + } + ], + "total" : 1, + "datarows" : [ + [ + 3 + ] + ], + "size" : 1, + "status" : 200 +} +``` + +The `datarows` field shows the number of documents deleted. diff --git a/docs/sql/endpoints.md b/docs/sql/endpoints.md new file mode 100644 index 00000000..37cb33c0 --- /dev/null +++ b/docs/sql/endpoints.md @@ -0,0 +1,224 @@ +--- +layout: default +title: Endpoint +parent: SQL +nav_order: 13 +--- + + +# Endpoint + +To send a query request to the SQL plugin, you can either use a request +parameter in an HTTP GET request or the request body of an HTTP POST request. POST +is recommended because it has no length limitation and lets you pass +other parameters to the plugin for additional functionality, such as +prepared statements. The explain endpoint is also used frequently for +query translation and troubleshooting.
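The difference between the two request styles can be sketched in Python using only the standard library. The `http://localhost:9200` base URL and the helper names are assumptions for this sketch; the `sql` URL parameter and the `query` and `fetch_size` body keys are the ones used on this page. Nothing is sent over the network here; the helpers only build the URL and the body.

```python
import json
from urllib.parse import urlencode

# Assumed local endpoint; adjust host and port for your cluster.
BASE_URL = "http://localhost:9200/_opensearch/_sql"


def build_get_url(sql):
    """Embed the query in the `sql` URL parameter, percent-encoding it."""
    return BASE_URL + "?" + urlencode({"sql": sql})


def build_post_body(sql, **extra):
    """Put the query in a JSON body; extra keys such as fetch_size ride along."""
    return json.dumps({"query": sql, **extra})


print(build_get_url("SELECT * FROM accounts"))
print(build_post_body("SELECT * FROM accounts", fetch_size=5))
```

The GET form has to encode spaces and special characters into the URL, while the POST form carries the query unchanged and leaves room for additional parameters.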
+ +## GET + +### Description + +You can send an HTTP GET request with your query embedded in a URL parameter. + +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X GET 'localhost:9200/_opensearch/_sql?sql=SELECT%20*%20FROM%20accounts' +``` + +## POST + +### Description + +You can also send an HTTP POST request with your query in the request body. + +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql -d '{ + "query" : "SELECT * FROM accounts" +}' +``` + +## Explain + +### Description + +To translate your query, send it to the explain endpoint. The explain output +is OpenSearch domain-specific language (DSL) in JSON format. You can +copy and paste it into your console to run it against OpenSearch +directly. + +### Example + +Explain query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql/_explain -d '{ + "query" : "SELECT firstname, lastname FROM accounts WHERE age > 20" +}' +``` + +Explain: + +```json +{ + "from": 0, + "size": 200, + "query": { + "bool": { + "filter": [{ + "bool": { + "must": [{ + "range": { + "age": { + "from": 20, + "to": null, + "include_lower": false, + "include_upper": true, + "boost": 1.0 + } + } + }], + "adjust_pure_negative": true, + "boost": 1.0 + } + }], + "adjust_pure_negative": true, + "boost": 1.0 + } + }, + "_source": { + "includes": [ + "firstname", + "lastname" + ], + "excludes": [] + } +} +``` + + +## Cursor + +### Description + +To get back a paginated response, use the `fetch_size` parameter. The value of `fetch_size` should be greater than 0. The default value is 1,000. A value of 0 falls back to a non-paginated response. + +The `fetch_size` parameter is only supported for the JDBC response format.
+{: .note } + + +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql -d '{ + "fetch_size" : 5, + "query" : "SELECT firstname, lastname FROM accounts WHERE age > 20 ORDER BY state ASC" +}' +``` + +Result set: + +```json +{ + "schema": [ + { + "name": "firstname", + "type": "text" + }, + { + "name": "lastname", + "type": "text" + } + ], + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9", + "total": 956, + "datarows": [ + [ + "Cherry", + "Carey" + ], + [ + "Lindsey", + "Hawkins" + ], + [ + "Sargent", + "Powers" + ], + [ + "Campos", + "Olsen" + ], + [ + "Savannah", + "Kirby" + ] + ], + "size": 5, + "status": 200 +} +``` + +To fetch subsequent pages, use the `cursor` from last response: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql -d '{ + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9" +}' +``` + +The result only has the `fetch_size` number of `datarows` and `cursor`. +The last page has only `datarows` and no `cursor`. +The `datarows` can have more than the `fetch_size` number of records in case the nested fields are flattened. 
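The paging behavior described above (follow the `cursor` until a page arrives without one) can be sketched as a small loop. In this sketch, `send` is a hypothetical callable that POSTs a JSON body to the given path and returns the decoded response as a dictionary; it is not part of the plugin or of any client library. `fake_send` and its `page-2` and `page-3` cursor values are stand-ins so that the example runs without a cluster.

```python
def fetch_all_rows(send, sql, fetch_size):
    """Collect datarows from every page of a paginated query.

    `send(path, body)` is a hypothetical callable that POSTs `body` as JSON
    to the given path and returns the decoded JSON response as a dict.
    """
    response = send("_opensearch/_sql", {"query": sql, "fetch_size": fetch_size})
    rows = list(response["datarows"])
    # Keep following the cursor until a page arrives without one.
    while "cursor" in response:
        response = send("_opensearch/_sql", {"cursor": response["cursor"]})
        rows.extend(response["datarows"])
    return rows


def fake_send(path, body):
    """Stand-in for the HTTP call: three canned pages keyed by cursor."""
    pages = {
        None: {"cursor": "page-2", "datarows": [["Cherry", "Carey"]]},
        "page-2": {"cursor": "page-3", "datarows": [["Abbas", "Hussain"]]},
        "page-3": {"datarows": [["John", "Doe"]]},
    }
    return pages[body.get("cursor")]


all_rows = fetch_all_rows(fake_send, "SELECT firstname, lastname FROM accounts", 1)
print(all_rows)
```

A page returned for a `cursor` request looks like this: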
+ +```json +{ + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMabcde12345", + "datarows": [ + [ + "Abbas", + "Hussain" + ], + [ + "Chen", + "Dai" + ], + [ + "Anirudha", + "Jadhav" + ], + [ + "Peng", + "Huo" + ], + [ + "John", + "Doe" + ] + ] +} +``` + +The `cursor` context is automatically cleared on the last page. +To explicitly clear the cursor context, use the `_opensearch/_sql/close` endpoint. + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql/close -d '{ + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9" +}' +``` + +#### Sample response + +```json +{"succeeded":true} +``` diff --git a/docs/sql/functions.md b/docs/sql/functions.md new file mode 100644 index 00000000..215300fb --- /dev/null +++ b/docs/sql/functions.md @@ -0,0 +1,133 @@ +--- +layout: default +title: Functions +parent: SQL +nav_order: 10 +--- + +# Functions + +You must enable fielddata in the document mapping for most string functions to work properly. + +Each specification gives the argument and return types of the function, using `T` as a generic type. +For example, `abs(number T) -> T` means that the function `abs` accepts a numerical argument of type `T`, which could be any sub-type of the `number` type, and it returns the actual type of `T` as the return type. + +The SQL plugin supports the following functions.
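For readers more used to typed programming languages, the `abs(number T) -> T` notation reads much like a generic function signature. The Python sketch below is only an analogy: `sql_abs` is a hypothetical name, and restricting `T` to `int` and `float` is an assumption made to keep the example small.

```python
from typing import TypeVar

# T stands for "whichever numeric type the caller passed in".
T = TypeVar("T", int, float)


def sql_abs(value: T) -> T:
    """Mirror of the `abs(number T) -> T` notation: same type in, same type out."""
    return abs(value)


print(type(sql_abs(-2)).__name__, type(sql_abs(-0.5)).__name__)
```

By contrast, a specification such as `ln(number T) -> double` returns a fixed type regardless of the argument type.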
+ +## Mathematical + +Function | Specification | Example +:--- | :--- | :--- +abs | `abs(number T) -> T` | `SELECT abs(0.5) FROM my-index LIMIT 1` +add | `add(number T, number) -> T` | `SELECT add(1, 5) FROM my-index LIMIT 1` +cbrt | `cbrt(number T) -> T` | `SELECT cbrt(0.5) FROM my-index LIMIT 1` +ceil | `ceil(number T) -> T` | `SELECT ceil(0.5) FROM my-index LIMIT 1` +conv | `conv(string T, int a, int b) -> T` | `SELECT CONV('12', 10, 16), CONV('2C', 16, 10), CONV(12, 10, 2), CONV(1111, 2, 10) FROM my-index LIMIT 1` +crc32 | `crc32(string T) -> T` | `SELECT crc32('MySQL') FROM my-index LIMIT 1` +divide | `divide(number T, number) -> T` | `SELECT divide(1, 0.5) FROM my-index LIMIT 1` +e | `e() -> double` | `SELECT e() FROM my-index LIMIT 1` +exp | `exp(number T) -> T` | `SELECT exp(0.5) FROM my-index LIMIT 1` +expm1 | `expm1(number T) -> T` | `SELECT expm1(0.5) FROM my-index LIMIT 1` +floor | `floor(number T) -> T` | `SELECT floor(0.5) AS Rounded_Down FROM my-index LIMIT 1` +ln | `ln(number T) -> double` | `SELECT ln(10) FROM my-index LIMIT 1` +log | `log(number T) -> double` or `log(number T, number) -> double` | `SELECT log(10) FROM my-index LIMIT 1` +log2 | `log2(number T) -> double` | `SELECT log2(10) FROM my-index LIMIT 1` +log10 | `log10(number T) -> double` | `SELECT log10(10) FROM my-index LIMIT 1` +mod | `mod(number T, number) -> T` | `SELECT modulus(2, 3) FROM my-index LIMIT 1` +multiply | `multiply(number T, number) -> number` | `SELECT multiply(2, 3) FROM my-index LIMIT 1` +pi | `pi() -> double` | `SELECT pi() FROM my-index LIMIT 1` +pow | `pow(number T) -> T` or `pow(number T, number) -> T` | `SELECT pow(2, 3) FROM my-index LIMIT 1` +power | `power(number T) -> T` or `power(number T, number) -> T` | `SELECT power(2, 3) FROM my-index LIMIT 1` +rand | `rand() -> number` or `rand(number T) -> T` | `SELECT rand(0.5) FROM my-index LIMIT 1` +rint | `rint(number T) -> T` | `SELECT rint(1.5) FROM my-index LIMIT 1` +round | `round(number T) -> T` | `SELECT 
round(1.5) FROM my-index LIMIT 1` +sign | `sign(number T) -> T` | `SELECT sign(1.5) FROM my-index LIMIT 1` +signum | `signum(number T) -> T` | `SELECT signum(0.5) FROM my-index LIMIT 1` +sqrt | `sqrt(number T) -> T` | `SELECT sqrt(0.5) FROM my-index LIMIT 1` +strcmp | `strcmp(string T, string T) -> T` | `SELECT strcmp('hello', 'hello') FROM my-index LIMIT 1` +subtract | `subtract(number T, number) -> T` | `SELECT subtract(3, 2) FROM my-index LIMIT 1` +truncate | `truncate(number T, number T) -> T` | `SELECT truncate(56.78, 1) FROM my-index LIMIT 1` +/ | `number [op] number -> number` | `SELECT 1 / 100 FROM my-index LIMIT 1` +% | `number [op] number -> number` | `SELECT 1 % 100 FROM my-index LIMIT 1` + +## Trigonometric + +Function | Specification | Example +:--- | :--- | :--- +acos | `acos(number T) -> double` | `SELECT acos(0.5) FROM my-index LIMIT 1` +asin | `asin(number T) -> double` | `SELECT asin(0.5) FROM my-index LIMIT 1` +atan | `atan(number T) -> double` | `SELECT atan(0.5) FROM my-index LIMIT 1` +atan2 | `atan2(number T, number) -> double` | `SELECT atan2(1, 0.5) FROM my-index LIMIT 1` +cos | `cos(number T) -> double` | `SELECT cos(0.5) FROM my-index LIMIT 1` +cosh | `cosh(number T) -> double` | `SELECT cosh(0.5) FROM my-index LIMIT 1` +cot | `cot(number T) -> double` | `SELECT cot(0.5) FROM my-index LIMIT 1` +degrees | `degrees(number T) -> double` | `SELECT degrees(0.5) FROM my-index LIMIT 1` +radians | `radians(number T) -> double` | `SELECT radians(0.5) FROM my-index LIMIT 1` +sin | `sin(number T) -> double` | `SELECT sin(0.5) FROM my-index LIMIT 1` +sinh | `sinh(number T) -> double` | `SELECT sinh(0.5) FROM my-index LIMIT 1` +tan | `tan(number T) -> double` | `SELECT tan(0.5) FROM my-index LIMIT 1` + +## Date and time + +Function | Specification | Example +:--- | :--- | :--- +adddate | `adddate(date, INTERVAL expr unit) -> date` | `SELECT adddate(date('2020-08-26'), INTERVAL 1 hour) FROM my-index LIMIT 1` +curdate | `curdate() -> date` | `SELECT 
curdate() FROM my-index LIMIT 1` +date | `date(date) -> date` | `SELECT date() FROM my-index LIMIT 1` +date_format | `date_format(date, string) -> string` or `date_format(date, string, string) -> string` | `SELECT date_format(date, 'Y') FROM my-index LIMIT 1` +date_sub | `date_sub(date, INTERVAL expr unit) -> date` | `SELECT date_sub(date('2008-01-02'), INTERVAL 31 day) FROM my-index LIMIT 1` +dayofmonth | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date) FROM my-index LIMIT 1` +dayname | `dayname(date) -> string` | `SELECT dayname(date('2020-08-26')) FROM my-index LIMIT 1` +dayofyear | `dayofyear(date) -> integer` | `SELECT dayofyear(date('2020-08-26')) FROM my-index LIMIT 1` +dayofweek | `dayofweek(date) -> integer` | `SELECT dayofweek(date('2020-08-26')) FROM my-index LIMIT 1` +from_days | `from_days(N) -> integer` | `SELECT from_days(733687) FROM my-index LIMIT 1` +hour | `hour(time) -> integer` | `SELECT hour((time '01:02:03')) FROM my-index LIMIT 1` +maketime | `maketime(integer, integer, integer) -> date` | `SELECT maketime(1, 2, 3) FROM my-index LIMIT 1` +microsecond | `microsecond(expr) -> integer` | `SELECT microsecond((time '01:02:03.123456')) FROM my-index LIMIT 1` +minute | `minute(expr) -> integer` | `SELECT minute((time '01:02:03')) FROM my-index LIMIT 1` +month | `month(date) -> integer` | `SELECT month(date) FROM my-index` +monthname | `monthname(date) -> string` | `SELECT monthname(date) FROM my-index` +now | `now() -> date` | `SELECT now() FROM my-index LIMIT 1` +quarter | `quarter(date) -> integer` | `SELECT quarter(date('2020-08-26')) FROM my-index LIMIT 1` +second | `second(time) -> integer` | `SELECT second((time '01:02:03')) FROM my-index LIMIT 1` +subdate | `subdate(date, INTERVAL expr unit) -> date, datetime` | `SELECT subdate(date('2008-01-02'), INTERVAL 31 day) FROM my-index LIMIT 1` +time | `time(expr) -> time` | `SELECT time('13:49:00') FROM my-index LIMIT 1` +time_to_sec | `time_to_sec(time) -> long` | `SELECT time_to_sec(time 
'22:23:00') FROM my-index LIMIT 1` +timestamp | `timestamp(date) -> date` | `SELECT timestamp(date) FROM my-index LIMIT 1` +to_days | `to_days(date) -> long` | `SELECT to_days(date '2008-10-07') FROM my-index LIMIT 1` +week | `week(date[mode]) -> integer` | `SELECT week(date('2008-02-20')) FROM my-index LIMIT 1` +year | `year(date) -> integer` | `SELECT year(date) FROM my-index LIMIT 1` + +## String + +Function | Specification | Example +:--- | :--- | :--- +ascii | `ascii(string T) -> integer` | `SELECT ascii(name.keyword) FROM my-index LIMIT 1` +concat | `concat(str1, str2) -> string` | `SELECT concat('hello', 'world') FROM my-index LIMIT 1` +concat_ws | `concat_ws(separator, string, string…) -> string` | `SELECT concat_ws("-", "Tutorial", "is", "fun!") FROM my-index LIMIT 1` +left | `left(string T, integer) -> T` | `SELECT left('hello', 2) FROM my-index LIMIT 1` +length | `length(string) -> integer` | `SELECT length('hello') FROM my-index LIMIT 1` +locate | `locate(string, string, integer) -> integer` or `locate(string, string) -> INTEGER` | `SELECT locate('o', 'hello') FROM my-index LIMIT 1`, `SELECT locate('l', 'hello', 3) FROM my-index LIMIT 1` +replace | `replace(string T, string, string) -> T` | `SELECT replace('hello', 'l', 'x') FROM my-index LIMIT 1` +right | `right(string T, integer) -> T` | `SELECT right('hello', 1) FROM my-index LIMIT 1` +rtrim | `rtrim(string T) -> T` | `SELECT rtrim(name.keyword) FROM my-index LIMIT 1` +substring | `substring(string T, integer, integer) -> T` | `SELECT substring(name.keyword, 2,5) FROM my-index LIMIT 1` +trim | `trim(string T) -> T` | `SELECT trim(' hello') FROM my-index LIMIT 1` +upper | `upper(string T) -> T` | `SELECT upper('helloworld') FROM my-index LIMIT 1` + +## Aggregate + +Function | Specification | Example +:--- | :--- | :--- +avg | `avg(number T) -> T` | `SELECT avg(2, 3) FROM my-index LIMIT 1` +count | `count(number T) -> T` | `SELECT count(date) FROM my-index LIMIT 1` +min | `min(number T, number) -> T` | 
`SELECT min(2, 3) FROM my-index LIMIT 1` +show | `show(string T) -> T` | `SHOW TABLES LIKE my-index` + +## Advanced + +Function | Specification | Example +:--- | :--- | :--- +if | `if(boolean, es_type, es_type) -> es_type` | `SELECT if(false, 0, 1) FROM my-index LIMIT 1`, `SELECT if(true, 0, 1) FROM my-index LIMIT 1` +ifnull | `ifnull(es_type, es_type) -> es_type` | `SELECT ifnull('hello', 1) FROM my-index LIMIT 1`, `SELECT ifnull(null, 1) FROM my-index LIMIT 1` +isnull | `isnull(es_type) -> integer` | `SELECT isnull(null) FROM my-index LIMIT 1`, `SELECT isnull(1) FROM my-index LIMIT 1` diff --git a/docs/sql/index.md b/docs/sql/index.md new file mode 100644 index 00000000..5e41060f --- /dev/null +++ b/docs/sql/index.md @@ -0,0 +1,74 @@ +--- +layout: default +title: SQL +nav_order: 38 +has_children: true +has_toc: false +--- + +# SQL + +OpenSearch SQL lets you write queries in SQL rather than the [OpenSearch query domain-specific language (DSL)](../opensearch/full-text). If you're already familiar with SQL and don't want to learn the query DSL, this feature is a great option. + + +## Workbench + +The easiest way to get familiar with the SQL plugin is to use **SQL Workbench** in OpenSearch Dashboards to test various queries. To learn more, see [Workbench](workbench/). 
+ +![OpenSearch Dashboards SQL UI plugin](../images/sql.png) + + +## REST API + +To use the SQL plugin with your own applications, send requests to `_opensearch/_sql`: + +```json +POST _opensearch/_sql +{ + "query": "SELECT * FROM my-index LIMIT 50" +} +``` + +Here’s how core SQL concepts map to OpenSearch: + +SQL | OpenSearch +:--- | :--- +Table | Index +Row | Document +Column | Field + +You can query multiple indices by listing them or using wildcards: + +```json +POST _opensearch/_sql +{ + "query": "SELECT * FROM my-index1,myindex2,myindex3 LIMIT 50" +} + +POST _opensearch/_sql +{ + "query": "SELECT * FROM my-index* LIMIT 50" +} +``` + +For a sample [curl](https://curl.haxx.se/) command, try: + +```bash +curl -XPOST https://localhost:9200/_opensearch/_sql -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{"query": "SELECT * FROM opensearch_dashboards_sample_data_flights LIMIT 10"}' +``` + +By default, queries return data in JDBC format, but you can also return data in standard OpenSearch JSON, CSV, or raw formats: + +```json +POST _opensearch/_sql?format=json|csv|raw +{ + "query": "SELECT * FROM my-index LIMIT 50" +} +``` + +See the rest of this guide for detailed information on request parameters, settings, supported operations, tools, and more. + + +## Contributing + +To get involved and help us improve the SQL plugin, see the [development guide](https://github.com/opensearch-project/sql/blob/master/docs/developing.rst) for instructions on setting up your development environment and building the project. diff --git a/docs/sql/jdbc.md b/docs/sql/jdbc.md new file mode 100644 index 00000000..fa9c80d2 --- /dev/null +++ b/docs/sql/jdbc.md @@ -0,0 +1,12 @@ +--- +layout: default +title: JDBC Driver +parent: SQL +nav_order: 71 +--- + +# JDBC driver + +The Java Database Connectivity (JDBC) driver lets you integrate OpenSearch with your favorite business intelligence (BI) applications. 
+ +For information on downloading and using the JAR file, see [the SQL repository on GitHub](https://github.com/opensearch-project/sql/tree/master/sql-jdbc). diff --git a/docs/sql/limitation.md b/docs/sql/limitation.md new file mode 100644 index 00000000..bd57e635 --- /dev/null +++ b/docs/sql/limitation.md @@ -0,0 +1,119 @@ +--- +layout: default +title: Limitations +parent: SQL +nav_order: 18 +--- + +# Limitations + +The SQL plugin has the following limitations: + +## SELECT FROM WHERE + +### Select literal is not supported + +The select literal expression is not supported. For example, `SELECT 1` is not supported. +Here's a link to the GitHub issue: [Issue #256](https://github.com/opensearch-project/sql/issues/256). + +### Where clause does not support arithmetic operations + +The `WHERE` clause does not support arithmetic expressions. For example, `SELECT FlightNum FROM opensearch_dashboards_sample_data_flights WHERE (AvgTicketPrice + 100) <= 1000` is not supported. +Here's a link to the GitHub issue: [Issue #234](https://github.com/opensearch-project/sql/issues/234). + +### Aggregation over expression is not supported + +You can apply aggregations only to fields; aggregations can't accept an expression as a parameter. For example, `avg(log(age))` is not supported. +Here's a link to the GitHub issue: [Issue #288](https://github.com/opensearch-project/sql/issues/288). + +### Conflict type in multiple index query + +Queries that use a wildcard index pattern fail if the matching indices have a field with conflicting types. +For example, if you have two indices with field `a`: + +```json +POST conflict_index_1/_doc/ +{ + "a": { + "b": 1 + } +} + +POST conflict_index_2/_doc/ +{ + "a": { + "b": 1, + "c": 2 + } +} +``` + +Then, queries against these indices fail because of the field mapping conflict. The wildcard query `SELECT * FROM conflict_index*` fails for the same reason.
+ +```sql +Error occurred in OpenSearch engine: Different mappings are not allowed for the same field[a]: found [{properties:{b:{type:long},c:{type:long}}}] and [{properties:{b:{type:long}}}] ", + "details": "com.amazon.opensearchforopensearch.sql.rewriter.matchtoterm.VerificationException: Different mappings are not allowed for the same field[a]: found [{properties:{b:{type:long},c:{type:long}}}] and [{properties:{b:{type:long}}}] \nFor more details, please send request for Json format to see the raw response from opensearch engine.", + "type": "VerificationException +``` + +Here's a link to the GitHub issue: [Issue #445](https://github.com/opensearch-project/sql/issues/445). + +## Subquery in the FROM clause + +A subquery in the `FROM` clause in the format `SELECT outer FROM (SELECT inner)` is supported only when the query can be merged into a single query. For example, the following query is supported: + +```sql +SELECT t.f, t.d +FROM ( + SELECT FlightNum as f, DestCountry as d + FROM opensearch_dashboards_sample_data_flights + WHERE OriginCountry = 'US') t +``` + +However, if the outer query has `GROUP BY` or `ORDER BY`, it's not supported. + +## JOIN does not support aggregations on the joined result + +The `join` query does not support aggregations on the joined result. +For example, `SELECT depo.name, avg(empo.age) FROM empo JOIN depo WHERE empo.id == depo.id GROUP BY depo.name` is not supported. +Here's a link to the GitHub issue: [Issue #110](https://github.com/opensearch-project/sql/issues/110). + +## Pagination only supports basic queries + +The pagination query enables you to get back paginated responses. +Currently, pagination supports only basic queries. For example, the following query returns the data with a cursor ID: + +```json +POST _opensearch/_sql/ +{ + "fetch_size" : 5, + "query" : "SELECT OriginCountry, DestCountry FROM opensearch_dashboards_sample_data_flights ORDER BY OriginCountry ASC" +} +``` + +The response is in JDBC format and includes the cursor ID:
+ +```json +{ + "schema": [ + { + "name": "OriginCountry", + "type": "keyword" + }, + { + "name": "DestCountry", + "type": "keyword" + } + ], + "cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFCSllXVTJKVU4yeExiWEJSUkhsNFVrdDVXVEZSYkVKSmR3PT0iLCJjIjpbeyJuYW1lIjoiT3JpZ2luQ291bnRyeSIsInR5cGUiOiJrZXl3b3JkIn0seyJuYW1lIjoiRGVzdENvdW50cnkiLCJ0eXBlIjoia2V5d29yZCJ9XSwiZiI6MSwiaSI6ImtpYmFuYV9zYW1wbGVfZGF0YV9mbGlnaHRzIiwibCI6MTMwNTh9", + "total": 13059, + "datarows": [[ + "AE", + "CN" + ]], + "size": 1, + "status": 200 +} +``` + +The query with `aggregation` and `join` does not support pagination for now. diff --git a/docs/sql/metadata.md b/docs/sql/metadata.md new file mode 100644 index 00000000..8a67c367 --- /dev/null +++ b/docs/sql/metadata.md @@ -0,0 +1,70 @@ +--- +layout: default +title: Metadata Queries +parent: SQL +nav_order: 9 +--- + +# Metadata queries + +To see basic metadata about your indices, use the `SHOW` and `DESCRIBE` commands. + +### Syntax + +Rule `showStatement`: + +![showStatement](../../images/showStatement.png) + +Rule `showFilter`: + +![showFilter](../../images/showFilter.png) + +### Example 1: See metadata for indices + +To see metadata for indices that match a specific pattern, use the `SHOW` command. 
+Use the wildcard `%` to match all indices: + +```sql +SHOW TABLES LIKE % +``` + +| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION +:--- | :--- +docker-cluster | null | accounts | BASE TABLE | null | null | null | null | null | null +docker-cluster | null | employees_nested | BASE TABLE | null | null | null | null | null | null + + +### Example 2: See metadata for a specific index + +To see metadata for an index name with a prefix of `acc`: + +```sql +SHOW TABLES LIKE acc% +``` + +| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION +:--- | :--- +docker-cluster | null | accounts | BASE TABLE | null | null | null | null | null | null + + +### Example 3: See metadata for fields + +To see metadata for field names that match a specific pattern, use the `DESCRIBE` command: + +```sql +DESCRIBE TABLES LIKE accounts +``` + +| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS | COLUMN_DEF | SQL_DATA_TYPE | SQL_DATETIME_SUB | CHAR_OCTET_LENGTH | ORDINAL_POSITION | IS_NULLABLE | SCOPE_CATALOG | SCOPE_SCHEMA | SCOPE_TABLE | SOURCE_DATA_TYPE | IS_AUTOINCREMENT | IS_GENERATEDCOLUMN +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- +docker-cluster | null | accounts | account_number | null | long | null | null | null | 10 | 2 | null | null | null | null | null | 1 | | null | null | null | null | NO | +docker-cluster | null | accounts | firstname | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 2 | | null | null | null | null | NO | +docker-cluster | null | accounts | address | null | text | null | null | null | 10 | 2 | null | null | null | null | null 
| 3 | | null | null | null | null | NO | +docker-cluster | null | accounts | balance | null | long | null | null | null | 10 | 2 | null | null | null | null | null | 4 | | null | null | null | null | NO | +docker-cluster | null | accounts | gender | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 5 | | null | null | null | null | NO | +docker-cluster | null | accounts | city | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 6 | | null | null | null | null | NO | +docker-cluster | null | accounts | employer | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 7 | | null | null | null | null | NO | +docker-cluster | null | accounts | state | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 8 | | null | null | null | null | NO | +docker-cluster | null | accounts | age | null | long | null | null | null | 10 | 2 | null | null | null | null | null | 9 | | null | null | null | null | NO | +docker-cluster | null | accounts | email | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 10 | | null | null | null | null | NO | +docker-cluster | null | accounts | lastname | null | text | null | null | null | 10 | 2 | null | null | null | null | null | 11 | | null | null | null | null | NO | diff --git a/docs/sql/monitoring.md b/docs/sql/monitoring.md new file mode 100644 index 00000000..6992170d --- /dev/null +++ b/docs/sql/monitoring.md @@ -0,0 +1,50 @@ +--- +layout: default +title: Monitoring +parent: SQL +nav_order: 15 +--- + +# Monitoring + +Use the stats endpoint to collect metrics for the plugin +within the interval. Currently, only node-level statistics are +implemented, so you only get the metrics for the +node you're accessing. Cluster-level statistics have yet to be +implemented.
+ +## Node Stats + +### Description + +The meaning of fields in the response is as follows: + +| Field name| Description| +| ------------------------- | ------------------------------------------------------------- | +| request_total| Total count of requests| +| request_count| Total count of requests within the interval| +|failed_request_count_syserr|Count of failed requests due to system errors within the interval| +|failed_request_count_cuserr| Count of failed requests due to bad requests within the interval| +| failed_request_count_cb| Indicates whether the plugin is being circuit broken within the interval| + + +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X GET localhost:9200/_opensearch/_sql/stats +``` + +Result set: + +```json +{ + "failed_request_count_cb": 0, + "failed_request_count_cuserr": 0, + "circuit_breaker": 0, + "request_total": 0, + "request_count": 0, + "failed_request_count_syserr": 0 +} +``` diff --git a/docs/sql/odbc.md b/docs/sql/odbc.md new file mode 100644 index 00000000..f3617b69 --- /dev/null +++ b/docs/sql/odbc.md @@ -0,0 +1,206 @@ +--- +layout: default +title: ODBC Driver +parent: SQL +nav_order: 72 +--- + +# ODBC driver + +The Open Database Connectivity (ODBC) driver is a read-only ODBC driver for Windows and macOS that lets you connect business intelligence (BI) and data visualization applications like [Tableau](https://github.com/opensearch-project/sql/blob/develop/sql-odbc/docs/user/tableau_support.md), [Microsoft Excel](https://github.com/opensearch-project/sql/blob/develop/sql-odbc/docs/user/microsoft_excel_support.md), and [Power BI](https://github.com/opensearch-project/sql/blob/main/sql-odbc/docs/user/power_bi_support.md) to the SQL plugin. + +For information on downloading and using the driver, see [the SQL repository on GitHub](https://github.com/opensearch-project/sql/tree/master/sql-odbc). + +{% comment %} + +## Specifications + +The ODBC driver is compatible with ODBC version 3.51.
+ +## Supported OS versions + +The following operating systems are supported: + +Operating System | Version +:--- | :--- +Windows | Windows 10 +macOS | Catalina 10.15.4 and Mojave 10.14.6 + + +## Concepts + +Term | Definition +:--- | :--- +**DSN** | A DSN (Data Source Name) is used to store driver information in the system. By storing the information in the system, the information does not need to be specified each time the driver connects. +**.tdc** file | The TDC file contains configuration information that Tableau applies to any connection matching the database vendor name and driver name defined in the file. This configuration allows you to fine-tune parts of your ODBC data connection and turn on/off certain features not supported by the data source. + + +## Install driver + +To install the driver, download the bundled distribution installer from the [OpenSearch downloads page](https://opensearch.org/downloads.html), or build the driver from source. + + +### Windows + +1. Open the downloaded `OpenSearch SQL ODBC Driver--Windows.msi` installer. + + The installer is unsigned and shows a security dialog. Choose **More info** and **Run anyway**. + +1. Choose **Next** to proceed with the installation. + +1. Accept the agreement, and choose **Next**. + +1. The installer comes bundled with documentation and useful resource files to connect with various BI tools (for example, a `.tdc` file for Tableau). You can choose to keep or remove these resources. Choose **Next**. + +1. Choose **Install** and **Finish**. + +The following connection information is set up as part of the default DSN: + +``` +Host: localhost +Port: 9200 +Auth: NONE +``` + +To customize the DSN, use **ODBC Data Source Administrator**, which is pre-installed on Windows 10. + + +### macOS + +Before installing the ODBC Driver on macOS, install the iODBC Driver Manager. + +1. Open the downloaded `OpenSearch SQL ODBC Driver--Darwin.pkg` installer. + + The installer is unsigned and shows a security dialog.
Right-click on the installer and choose **Open**. + +1. Choose **Continue** several times to proceed with the installation. + +1. Choose the **Destination** to install the driver files. + +1. The installer comes bundled with documentation and useful resources files to connect with various BI tools (for example, a `.tdc` file for Tableau). You can choose to keep or remove these resources. Choose **Continue**. + +1. Choose **Install** and **Close**. + +Currently, the DSN is not set up as part of the installation and needs to be configured manually. First, open `iODBC Administrator`: + +``` +sudo /Applications/iODBC/iODBC\ Administrator64.app/Contents/MacOS/iODBC\ Administrator64 +``` + +This command gives the application permissions to save the driver and DSN configurations. + +1. Choose **ODBC Drivers** tab. +1. Choose **Add a Driver** and fill in the following details: + - **Description of the Driver**: Enter the driver name that you used for the ODBC connection (for example, OpenSearch SQL ODBC Driver). + - **Driver File Name**: Enter the path to the driver file (default: `/bin/libopensearchsqlodbc.dylib`). + - **Setup File Name**: Enter the path to the setup file (default: `/bin/libopensearchsqlodbc.dylib`). + +1. Choose the user driver. +1. Choose **OK** to save the options. +1. Choose the **User DSN** tab. +1. Select **Add**. +1. Choose the driver that you added above. +1. For **Data Source Name (DSN)**, enter the name of the DSN used to store connection options (for example, OpenSearch SQL ODBC DSN). +1. For **Comment**, add an optional comment. +1. Add key-value pairs by using the `+` button. 
We recommend the following options for a default local OpenSearch installation: + - **Host**: `localhost` - OpenSearch server endpoint + - **Port**: `9200` - The server port + - **Auth**: `NONE` - The authentication mode + - **Username**: `(blank)` - The username used for BASIC auth + - **Password**: `(blank)`- The password used for BASIC auth + - **ResponseTimeout**: `10` - The number of seconds to wait for a response from the server + - **UseSSL**: `0` - Do not use SSL for connections + +1. Choose **OK** to save the DSN configuration. +1. Choose **OK** to exit the iODBC Administrator. + + +## Customizing the ODBC driver + +The driver is in the form of a library file: `opensearchsqlodbc.dll` for Windows and `libopensearchsqlodbc.dylib` for macOS. + +If you're using with ODBC compatible BI tools, refer to your BI tool documentation for configuring a new ODBC driver. +Typically, all that's required is to make the BI tool aware of the location of the driver library file and then use it to set up the database (i.e., OpenSearch) connection. + + +### Connection strings and other settings + +The ODBC driver uses an ODBC connection string. +The connection strings are semicolon-delimited strings that specify the set of options that you can use for a connection. +Typically, a connection string will either: + - Specify a Data Source Name (DSN) that contains a pre-configured set of options (`DSN=xxx;User=xxx;Password=xxx;`). + - Or, configure options explicitly using the string (`Host=xxx;Port=xxx;LogLevel=ES_DEBUG;...`). + +You can configure the following driver options using a DSN or connection string: + +All option names are case-insensitive. +{: .note } + + +#### Basic options + +Option | Description | Type | Default +:--- | :--- +`DSN` | Data source name that you used for configuring the connection. | `string` | - +`Host / Server` | Hostname or IP address for the target cluster. 
| `string` | - +`Port` | Port number on which the OpenSearch cluster's REST interface is listening. | `string` | - + +#### Authentication Options + +Option | Description | Type | Default +:--- | :--- +`Auth` | Authentication mechanism to use. | `BASIC` (basic HTTP), `AWS_SIGV4` (AWS auth), or `NONE` | `NONE` +`User / UID` | [`Auth=BASIC`] Username for the connection. | `string` | - +`Password / PWD` | [`Auth=BASIC`] Password for the connection. | `string` | - +`Region` | [`Auth=AWS_SIGV4`] Region used for signing requests. | `AWS region (for example, us-west-1)` | - + +#### Advanced options + +Option | Description | Type | Default +:--- | :--- +`UseSSL` | Whether to establish the connection over SSL/TLS. | `boolean (0 or 1)` | `false (0)` +`HostnameVerification` | Indicates whether certificate hostname verification should be performed for an SSL/TLS connection. | `boolean` (0 or 1) | `true (1)` +`ResponseTimeout` | The maximum time to wait for responses from the host, in seconds. | `integer` | `10` + +#### Logging options + +Option | Description | Type | Default +:--- | :--- +`LogLevel` | Severity level for driver logs. | one of `ES_OFF`, `ES_FATAL`, `ES_ERROR`, `ES_INFO`, `ES_DEBUG`, `ES_TRACE`, `ES_ALL` | `ES_WARNING` +`LogOutput` | Location for storing driver logs. | `string` | `WIN: C:\`, `MAC: /tmp` + +You need administrative privileges to change the logging options. +{: .note } + + +## Connecting to Tableau + +Pre-requisites: + +- Make sure the DSN is already set up. +- Make sure OpenSearch is running on _host_ and _port_ as configured in DSN. +- Make sure the `.tdc` is copied to `/Documents/My Tableau Repository/Datasources` in both macOS and Windows. + +1. Start Tableau. Under the **Connect** section, go to **To a Server** and choose **Other Databases (ODBC)**. + +1. In the **DSN drop-down**, select the OpenSearch DSN you set up in the previous set of steps. The options you added will be automatically filled into the **Connection Attributes**. + +1. 
Select **Sign In**. After a few seconds, Tableau connects to your OpenSearch server. Once connected, you are directed to the **Datasource** window. The **Database** field is already populated with the name of the OpenSearch cluster. +To list all the indices, click the search icon under **Table**. + +1. Start exploring the data by dragging a table to the connection area. Choose **Update Now** or **Automatically Update** to populate table data. + + +### Troubleshooting + +**Problem** + +Unable to connect to server. + +**Workaround** + +This is most likely because the OpenSearch server is not running on the **host** and **port** configured in the DSN. +Confirm that the **host** and **port** are correct and that the OpenSearch server is running with the OpenSearch SQL plugin. +Also make sure the `.tdc` file that was downloaded with the installer is copied correctly to the `/Documents/My Tableau Repository/Datasources` directory. + +{% endcomment %} diff --git a/docs/sql/partiql.md b/docs/sql/partiql.md new file mode 100644 index 00000000..1dfadb81 --- /dev/null +++ b/docs/sql/partiql.md @@ -0,0 +1,215 @@ +--- +layout: default +title: JSON Support +parent: SQL +nav_order: 7 +--- + +# JSON Support + +The SQL plugin supports JSON by following the [PartiQL](https://partiql.org/) specification, a SQL-compatible query language that lets you query semi-structured and nested data for any data format. The SQL plugin only supports a subset of the PartiQL specification. + +## Querying nested collection + +PartiQL extends SQL to allow you to query and unnest nested collections. In OpenSearch, this is useful for querying a JSON index with nested objects or fields.
+ +To follow along, use the `bulk` operation to index some sample data: + +```json +POST employees_nested/_bulk?refresh +{"index":{"_id":"1"}} +{"id":3,"name":"Bob Smith","title":null,"projects":[{"name":"SQL Spectrum querying","started_year":1990},{"name":"SQL security","started_year":1999},{"name":"OpenSearch security","started_year":2015}]} +{"index":{"_id":"2"}} +{"id":4,"name":"Susan Smith","title":"Dev Mgr","projects":[]} +{"index":{"_id":"3"}} +{"id":6,"name":"Jane Smith","title":"Software Eng 2","projects":[{"name":"SQL security","started_year":1998},{"name":"Hello security","started_year":2015,"address":[{"city":"Dallas","state":"TX"}]}]} +``` + +### Example 1: Unnesting a nested collection + +This example finds the nested documents (`projects`) with a field value (`name`) that satisfies the predicate (contains `security`). Because each parent document can have more than one nested document, the matching nested documents are flattened. In other words, the final result is the Cartesian product between the parent and its matching nested documents.
+ +```sql +SELECT e.name AS employeeName, + p.name AS projectName +FROM employees_nested AS e, + e.projects AS p +WHERE p.name LIKE '%security%' +``` + +Explain: + +```json +{ + "from" : 0, + "size" : 200, + "query" : { + "bool" : { + "filter" : [ + { + "bool" : { + "must" : [ + { + "nested" : { + "query" : { + "wildcard" : { + "projects.name" : { + "wildcard" : "*security*", + "boost" : 1.0 + } + } + }, + "path" : "projects", + "ignore_unmapped" : false, + "score_mode" : "none", + "boost" : 1.0, + "inner_hits" : { + "ignore_unmapped" : false, + "from" : 0, + "size" : 3, + "version" : false, + "seq_no_primary_term" : false, + "explain" : false, + "track_scores" : false, + "_source" : { + "includes" : [ + "projects.name" + ], + "excludes" : [ ] + } + } + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "_source" : { + "includes" : [ + "name" + ], + "excludes" : [ ] + } +} +``` + +Result set: + +| employeeName | projectName +:--- | :--- +Bob Smith | OpenSearch Security +Bob Smith | SQL security +Jane Smith | Hello security +Jane Smith | SQL security + +### Example 2: Unnesting in existential subquery + +To unnest a nested collection in a subquery to check if it satisfies a condition: + +```sql +SELECT e.name AS employeeName +FROM employees_nested AS e +WHERE EXISTS ( + SELECT * + FROM e.projects AS p + WHERE p.name LIKE '%security%' +) +``` + +Explain: + +```json +{ + "from" : 0, + "size" : 200, + "query" : { + "bool" : { + "filter" : [ + { + "bool" : { + "must" : [ + { + "nested" : { + "query" : { + "bool" : { + "must" : [ + { + "bool" : { + "must" : [ + { + "bool" : { + "must_not" : [ + { + "bool" : { + "must_not" : [ + { + "exists" : { + "field" : "projects", + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + { + "wildcard" : { + "projects.name" : { + "wildcard" : "*security*", + 
"boost" : 1.0 + } + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "path" : "projects", + "ignore_unmapped" : false, + "score_mode" : "none", + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + } + ], + "adjust_pure_negative" : true, + "boost" : 1.0 + } + }, + "_source" : { + "includes" : [ + "name" + ], + "excludes" : [ ] + } +} +``` + +Result set: + +| employeeName | +:--- | :--- +Bob Smith | +Jane Smith | diff --git a/docs/sql/protocol.md b/docs/sql/protocol.md new file mode 100644 index 00000000..6d9316bf --- /dev/null +++ b/docs/sql/protocol.md @@ -0,0 +1,330 @@ +--- +layout: default +title: Protocol +parent: SQL +nav_order: 14 +--- + +# Protocol + +For the protocol, SQL plugin provides multiple response formats for +different purposes while the request format is same for all. Among them +JDBC format is widely used because it provides schema information and +more functionality such as pagination. Besides JDBC driver, various +clients can benefit from the detailed and well formatted response. + +## Request Format + +### Description + +The body of HTTP POST request can take a few more other fields with SQL +query. + +### Example 1 + +Use `filter` to add more conditions to +OpenSearch DSL directly. 
+ +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql -d '{ + "query" : "SELECT firstname, lastname, balance FROM accounts", + "filter" : { + "range" : { + "balance" : { + "lt" : 10000 + } + } + } +}' +``` + +Explain: + +```json +{ + "from": 0, + "size": 200, + "query": { + "bool": { + "filter": [{ + "bool": { + "filter": [{ + "range": { + "balance": { + "from": null, + "to": 10000, + "include_lower": true, + "include_upper": false, + "boost": 1.0 + } + } + }], + "adjust_pure_negative": true, + "boost": 1.0 + } + }], + "adjust_pure_negative": true, + "boost": 1.0 + } + }, + "_source": { + "includes": [ + "firstname", + "lastname", + "balance" + ], + "excludes": [] + } +} +``` + +### Example 2 + +Use `parameters` for actual parameter value +in prepared SQL query. + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql -d '{ + "query": "SELECT * FROM accounts WHERE age = ?", + "parameters": [{ + "type": "integer", + "value": 30 + }] +}' +``` + +Explain: + +```json +{ + "from": 0, + "size": 200, + "query": { + "bool": { + "filter": [{ + "bool": { + "must": [{ + "term": { + "age": { + "value": 30, + "boost": 1.0 + } + } + }], + "adjust_pure_negative": true, + "boost": 1.0 + } + }], + "adjust_pure_negative": true, + "boost": 1.0 + } + } +} +``` + +## OpenSearch DSL + +### Description + +By default the plugin returns original response from OpenSearch in +JSON. Because this is the native response from OpenSearch, extra +efforts are needed to parse and interpret it. 
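
As a rough illustration of that parsing effort, the following Python sketch flattens a native response into plain rows. It assumes the response has the `hits.hits[]._source` shape shown in the example for this section; the helper name `rows_from_dsl_response` is hypothetical and not part of the plugin.

```python
import json


def rows_from_dsl_response(response, fields):
    """Flatten a native OpenSearch response into a list of row tuples.

    Assumes each hit carries its document under "_source"; fields missing
    from a document come back as None.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [tuple(hit.get("_source", {}).get(f) for f in fields) for hit in hits]


# Trimmed copy of the response shape shown in this section.
sample = json.loads("""
{
  "hits": {
    "hits": [
      {"_index": "accounts", "_id": "13",
       "_source": {"firstname": "Nanette", "age": 28, "lastname": "Bates"}},
      {"_index": "accounts", "_id": "1",
       "_source": {"firstname": "Amber", "age": 32, "lastname": "Duke"}}
    ],
    "total": {"value": 4, "relation": "eq"}
  }
}
""")

rows = rows_from_dsl_response(sample, ["firstname", "lastname", "age"])
print(rows)
```

The JDBC format described later on this page returns the schema and rows directly, so a helper like this is only needed when you work with the native response.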
+ +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql -d '{ + "query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age LIMIT 2" +}' +``` + +Result set: + +```json +{ + "_shards": { + "total": 5, + "failed": 0, + "successful": 5, + "skipped": 0 + }, + "hits": { + "hits": [{ + "_index": "accounts", + "_type": "account", + "_source": { + "firstname": "Nanette", + "age": 28, + "lastname": "Bates" + }, + "_id": "13", + "sort": [ + 28 + ], + "_score": null + }, + { + "_index": "accounts", + "_type": "account", + "_source": { + "firstname": "Amber", + "age": 32, + "lastname": "Duke" + }, + "_id": "1", + "sort": [ + 32 + ], + "_score": null + } + ], + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": null + }, + "took": 100, + "timed_out": false +} +``` + +## JDBC Format + +### Description + +JDBC format is provided for JDBC driver and client side that needs both +schema and result set well formatted. + +### Example 1 + +Here is an example for normal response. The +`schema` includes field name and its type +and `datarows` includes the result set. + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql?format=jdbc -d '{ + "query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age LIMIT 2" +}' +``` + +Result set: + +```json +{ + "schema": [{ + "name": "firstname", + "type": "text" + }, + { + "name": "lastname", + "type": "text" + }, + { + "name": "age", + "type": "long" + } + ], + "total": 4, + "datarows": [ + [ + "Nanette", + "Bates", + 28 + ], + [ + "Amber", + "Duke", + 32 + ] + ], + "size": 2, + "status": 200 +} +``` + +### Example 2 + +If any error occurred, error message and the cause will be returned +instead. 
+ +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql?format=jdbc -d '{ + "query" : "SELECT unknown FROM accounts" +}' +``` + +Result set: + +```json +{ + "error": { + "reason": "Invalid SQL query", + "details": "Field [unknown] cannot be found or used here.", + "type": "SemanticAnalysisException" + }, + "status": 400 +} +``` + +## CSV Format + +### Description + +You can also use CSV format to download result set as CSV. + +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql?format=csv -d '{ + "query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age" +}' +``` + +Result set: + +```text +firstname,lastname,age +Nanette,Bates,28 +Amber,Duke,32 +Dale,Adams,33 +Hattie,Bond,36 +``` + +## Raw Format + +### Description + +Additionally raw format can be used to pipe the result to other command +line tool for post processing. + +### Example + +SQL query: + +```console +>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_opensearch/_sql?format=raw -d '{ + "query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age" +}' +``` + +Result set: + +```text +Nanette|Bates|28 +Amber|Duke|32 +Dale|Adams|33 +Hattie|Bond|36 +``` diff --git a/docs/sql/settings.md b/docs/sql/settings.md new file mode 100644 index 00000000..9d91926f --- /dev/null +++ b/docs/sql/settings.md @@ -0,0 +1,33 @@ +--- +layout: default +title: Settings +parent: SQL +nav_order: 16 +--- + +# Settings + +The SQL plugin adds a few settings to the standard OpenSearch cluster settings. Most are dynamic, so you can change the default behavior of the plugin without restarting your cluster. 
+ +You can update these settings like any other cluster setting: + +```json +PUT _cluster/settings +{ + "transient" : { + "opensearch.sql.enabled" : false + } +} +``` + +Setting | Default | Description +:--- | :--- | :--- +`opensearch.sql.enabled` | True | Change to `false` to disable the plugin. +`opensearch.sql.query.slowlog` | 2 seconds | Configure the time limit (in seconds) for slow queries. The plugin logs slow queries as `Slow query: elapsed=xxx (ms)` in `opensearch.log`. +`opensearch.sql.query.analysis.enabled` | True | Enables or disables the query analyzer. Changing this setting to `false` lets you bypass strict syntactic and semantic analysis. +`opensearch.sql.query.analysis.semantic.suggestion` | False | If enabled, the query analyzer suggests correct field names for quick fixes. +`opensearch.sql.query.analysis.semantic.threshold` | 200 | Because query analysis needs to build semantic context in memory, indices with a large number of fields are skipped. You can update this setting to apply analysis to smaller or larger indices as needed. +`opensearch.sql.query.response.format` | JDBC | Sets the default response format for queries. The supported formats are JDBC, JSON, CSV, raw, and table. +`opensearch.sql.cursor.enabled` | False | Enables or disables pagination for all supported queries. +`opensearch.sql.cursor.fetch_size` | 1,000 | Sets the default `fetch_size` for all queries that support pagination. An explicit `fetch_size` passed in the request overrides this value. +`opensearch.sql.cursor.keep_alive` | 1 minute | This value configures how long the cursor context is kept open. Cursor contexts are resource heavy, so we recommend a low value.
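
To change several of these settings at once from a script, you could build the request body programmatically. The Python sketch below only constructs the JSON body for `PUT _cluster/settings`; the helper name `sql_settings_body` and the example values are assumptions, and sending the request is left to whatever HTTP client you use.

```python
import json


def sql_settings_body(settings, scope="transient"):
    """Build a `PUT _cluster/settings` body for SQL plugin settings.

    `settings` maps short names (for example "cursor.enabled") to values;
    each key is prefixed with "opensearch.sql." as in the table above.
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    return {scope: {f"opensearch.sql.{name}": value for name, value in settings.items()}}


# Hypothetical values: turn pagination on and lower the default page size.
body = sql_settings_body({"cursor.enabled": True, "cursor.fetch_size": 500})
print(json.dumps(body, indent=2))
```

Keeping the scope as a parameter makes it easy to switch between a `transient` change for testing and a `persistent` one once the value is settled.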
diff --git a/docs/sql/sql-full-text.md b/docs/sql/sql-full-text.md new file mode 100644 index 00000000..809edea6 --- /dev/null +++ b/docs/sql/sql-full-text.md @@ -0,0 +1,119 @@ +--- +layout: default +title: Full-Text Search +parent: SQL +nav_order: 8 +--- + +# Full-text search + +Use SQL commands for full-text search. The SQL plugin supports a subset of the full-text queries available in OpenSearch. + +To learn about full-text queries in OpenSearch, see [Full-text queries](../../opensearch/full-text/). + +## Match + +To search for text in a single field, use `MATCHQUERY` or `MATCH_QUERY` functions. + +Pass in your search query and the field name that you want to search against. + + +```sql +SELECT account_number, address +FROM accounts +WHERE MATCH_QUERY(address, 'Holmes') +``` + +Alternate syntax: + +```sql +SELECT account_number, address +FROM accounts +WHERE address = MATCH_QUERY('Holmes') +``` + + +| account_number | address +:--- | :--- +1 | 880 Holmes Lane + + +## Multi match + +To search for text in multiple fields, use `MULTI_MATCH`, `MULTIMATCH`, or `MULTIMATCHQUERY` functions. + +For example, search for `Dale` in either the `firstname` or `lastname` fields: + + +```sql +SELECT firstname, lastname +FROM accounts +WHERE MULTI_MATCH('query'='Dale', 'fields'='*name') +``` + + +| firstname | lastname +:--- | :--- +Dale | Adams + + +## Query string + +To split text based on operators, use the `QUERY` function. + + +```sql +SELECT account_number, address +FROM accounts +WHERE QUERY('address:Lane OR address:Street') +``` + + +| account_number | address +:--- | :--- +1 | 880 Holmes Lane +6 | 671 Bristol Street +13 | 789 Madison Street + + +The `QUERY` function supports logical connectives, wildcard, regex, and proximity search. + + +## Match phrase + +To search for exact phrases, use `MATCHPHRASE`, `MATCH_PHRASE`, or `MATCHPHRASEQUERY` functions. 
+ + +```sql +SELECT account_number, address +FROM accounts +WHERE MATCH_PHRASE(address, '880 Holmes Lane') +``` + + +| account_number | address +:--- | :--- +1 | 880 Holmes Lane + + +## Score query + +To return a relevance score along with every matching document, use `SCORE`, `SCOREQUERY`, or `SCORE_QUERY` functions. + +You need to pass in two arguments. The first is the `MATCH_QUERY` expression. The second is an optional floating point number to boost the score (default value is 1.0). + + +```sql +SELECT account_number, address, _score +FROM accounts +WHERE SCORE(MATCH_QUERY(address, 'Lane'), 0.5) OR + SCORE(MATCH_QUERY(address, 'Street'), 100) +ORDER BY _score +``` + + +| account_number | address | score +:--- | :--- +1 | 880 Holmes Lane | 0.5 +6 | 671 Bristol Street | 100 +13 | 789 Madison Street | 100 diff --git a/docs/sql/troubleshoot.md b/docs/sql/troubleshoot.md new file mode 100644 index 00000000..03d0717a --- /dev/null +++ b/docs/sql/troubleshoot.md @@ -0,0 +1,90 @@ +--- +layout: default +title: Troubleshooting +parent: SQL +nav_order: 17 +--- + +# Troubleshooting + +The SQL plugin is stateless, so troubleshooting is mostly focused on why a particular query fails. + +The most common error is the dreaded null pointer exception, which can occur during parsing errors or when using the wrong HTTP method (POST vs. GET and vice versa). The POST method and HTTP request body offer the most consistent results: + +```json +POST _opensearch/_sql +{ + "query": "SELECT * FROM my-index WHERE ['name.firstname']='saanvi' LIMIT 5" +} +``` + +If a query isn't behaving the way you expect, use the `_explain` API to see the translated query, which you can then troubleshoot. For most operations, `_explain` returns OpenSearch query DSL. For `UNION`, `MINUS`, and `JOIN`, it returns something more akin to a SQL execution plan. 
+ +#### Sample request + +```json +POST _opensearch/_sql/_explain +{ + "query": "SELECT * FROM my-index LIMIT 50" +} +``` + + +#### Sample response + +```json +{ + "from": 0, + "size": 50 +} +``` + +## Syntax analysis exception + +You might receive the following error if the plugin can't parse your query: + +```json +{ + "reason": "Invalid SQL query", + "details": "Failed to parse query due to offending symbol [:] at: 'SELECT * FROM xxx WHERE xxx:' <--- HERE... + More details: Expecting tokens in {, 'AND', 'BETWEEN', 'GROUP', 'HAVING', 'IN', 'IS', 'LIKE', 'LIMIT', + 'NOT', 'OR', 'ORDER', 'REGEXP', '*', '/', '%', '+', '-', 'DIV', 'MOD', '=', '>', '<', '!', + '|', '&', '^', '.', DOT_ID}", + "type": "SyntaxAnalysisException" +} +``` + +To resolve this error: + +1. Check if your syntax follows the [MySQL grammar](https://dev.mysql.com/doc/refman/8.0/en/). +2. If your syntax is correct, disable strict query analysis: + + ```json + PUT _cluster/settings + { + "persistent" : { + "opensearch.sql.query.analysis.enabled" : false + } + } + ``` + +3. Run the query again to see if it works. + +## Index mapping verification exception + +If you see the following verification exception: + +```json +{ + "error": { + "reason": "There was internal problem at backend", + "details": "When using multiple indices, the mappings must be identical.", + "type": "VerificationException" + }, + "status": 503 +} +``` + +Make sure the index in your query is not an index pattern and doesn't have multiple types. + +If these steps don't work, submit a [GitHub issue](https://github.com/opensearch-project/sql/issues).
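
When scripting against the plugin, it can help to branch on the exception type before deciding what to try next. The following Python sketch assumes the error bodies have the shapes shown on this page (with or without the outer `error` object); the function name `troubleshooting_hint` and the hint strings are illustrative only.

```python
def troubleshooting_hint(body):
    """Map an SQL plugin error body to a next step from this page."""
    # Some samples wrap the fields in "error", others show them at the top level.
    error = body.get("error", body)
    error_type = error.get("type")
    if error_type == "SyntaxAnalysisException":
        return "Check the query syntax, then try disabling strict query analysis."
    if error_type == "VerificationException":
        return "Query a single index rather than an index pattern."
    return "Run the query through _explain and inspect the translated query."


syntax_error = {"reason": "Invalid SQL query", "type": "SyntaxAnalysisException"}
mapping_error = {
    "error": {
        "reason": "There was internal problem at backend",
        "details": "When using multiple indices, the mappings must be identical.",
        "type": "VerificationException",
    },
    "status": 503,
}

print(troubleshooting_hint(syntax_error))
print(troubleshooting_hint(mapping_error))
```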
diff --git a/docs/sql/workbench.md b/docs/sql/workbench.md new file mode 100644 index 00000000..579cd3b8 --- /dev/null +++ b/docs/sql/workbench.md @@ -0,0 +1,73 @@ +--- +layout: default +title: Workbench +parent: SQL +nav_order: 1 +--- + + +# Workbench + +Use the SQL workbench to easily run on-demand SQL queries, translate SQL into its REST equivalent, and view and save results as text, JSON, JDBC, or CSV. + + +## Quick start + +To get started with SQL Workbench, choose **Dev Tools** in OpenSearch Dashboards and use the `bulk` operation to index some sample data: + +```json +PUT accounts/_bulk?refresh +{"index":{"_id":"1"}} +{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"} +{"index":{"_id":"6"}} +{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"} +{"index":{"_id":"13"}} +{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"} +{"index":{"_id":"18"}} +{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","email":"daleadams@boink.com","city":"Orick","state":"MD"} +``` + +Then return to SQL Workbench. 
+ + +### List indices + +To list all your indices: + +```sql +SHOW TABLES LIKE % +``` + +| id | TABLE_NAME +:--- | :--- +0 | accounts + + +### Read data + +After you index a document, retrieve it using the following SQL expression: + +```sql +SELECT * +FROM accounts +WHERE _id = 1 +``` + +| id | account_number | firstname | gender | city | balance | employer | state | email | address | lastname | age +:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- +0 | 1 | Amber | M | Brogan | 39225 | Pyrami | IL | amberduke@pyrami.com | 880 Holmes Lane | Duke | 32 + + +### Delete data + +To delete a document from an index, use the `DELETE` clause: + +```sql +DELETE +FROM accounts +WHERE _id = 0 +``` + +| id | deleted_rows +:--- | :--- +0 | 1 diff --git a/docs/trace/data-prepper-reference.md b/docs/trace/data-prepper-reference.md new file mode 100644 index 00000000..59a6dd5c --- /dev/null +++ b/docs/trace/data-prepper-reference.md @@ -0,0 +1,168 @@ +--- +layout: default +title: Configuration Reference +parent: Trace analytics +nav_order: 25 +--- + +# Data Prepper configuration reference + +This page lists all supported Data Prepper sources, buffers, preppers, and sinks, along with their associated options. For example configuration files, see [Data Prepper](../data-prepper/). + + +## Data Prepper server options + +Option | Required | Description +:--- | :--- | :--- +ssl | No | Boolean, indicating whether TLS should be used for server APIs. Defaults to true. +keyStoreFilePath | No | String, path to a .jks or .p12 keystore file. Required if ssl is true. +keyStorePassword | No | String, password for keystore. Optional, defaults to empty string. +privateKeyPassword | No | String, password for private key within keystore. Optional, defaults to empty string. +serverPort | No | Integer, port number to use for server APIs. 
Defaults to 4900. + + +## General pipeline options + +Option | Required | Description +:--- | :--- | :--- +workers | No | Integer, default 1. Essentially the number of application threads. As a starting point for your use case, try setting this value to the number of CPU cores on the machine. +delay | No | Integer (milliseconds), default 3,000. How long workers wait between buffer read attempts. + + +## Sources + +Sources define where your data comes from. + + +### otel_trace_source + +Source for the OpenTelemetry Collector. + +Option | Required | Description +:--- | :--- | :--- +ssl | No | Boolean, whether to connect to the OpenTelemetry Collector over SSL. +sslKeyCertChainFile | No | String, path to the security certificate (e.g. `"config/demo-data-prepper.crt"`). +sslKeyFile | No | String, path to the security certificate key (e.g. `"config/demo-data-prepper.key"`). + + +### file + +Source for flat file input. + +Option | Required | Description +:--- | :--- | :--- +path | Yes | String, path to the input file (e.g. `logs/my-log.log`). + + +### pipeline + +Source for reading from another pipeline. + +Option | Required | Description +:--- | :--- | :--- +name | Yes | String, name of the pipeline to read from. + + +### stdin + +Source for console input. Can be useful for testing. No options. + + +## Buffers + +Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory-based (better performance) or disk-based (larger). + + +### bounded_blocking + +The default buffer. Memory-based. + +Option | Required | Description +:--- | :--- | :--- +buffer_size | No | Integer, default 512. The maximum number of records the buffer accepts. +batch_size | No | Integer, default 8. The maximum number of records the buffer drains after each read. + + +## Preppers + +Preppers perform some action on your data: filter, transform, enrich, etc. + + +### otel_trace_raw_prepper + +Converts OpenTelemetry data to OpenSearch-compatible JSON documents.
No options. + + +### service_map_stateful + +Uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards. No options. + +### peer_forwarder +Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment. + +Option | Required | Description +:--- | :--- | :--- +time_out | No | Integer, forwarded request timeout in seconds. Defaults to 3 seconds. +span_agg_count | No | Integer, batch size for number of spans per request. Defaults to 48. +discovery_mode | No | String, peer discovery mode to be used. Allowable values are `static` and `dns`. Defaults to `static`. +static_endpoints | No | List, containing string endpoints of all Data Prepper instances. +domain_name | No | String, single domain name to query DNS against. Typically used by creating multiple DNS A Records for the same domain. +ssl | No | Boolean, indicating whether TLS should be used. Default is true. +sslKeyCertChainFile | No | String, path to the security certificate + +### string_converter + +Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own prepper. + +Option | Required | Description +:--- | :--- | :--- +upper_case | No | Boolean, whether to convert to uppercase (`true`) or lowercase (`false`). + + +## Sinks + +Sinks define where Data Prepper writes your data to. + + +### opensearch + +Sink for an OpenSearch cluster. + +Option | Required | Description +:--- | :--- | :--- +hosts | Yes | List of OpenSearch hosts to write to (e.g. `["https://localhost:9200", "https://remote-cluster:9200"]`). +cert | No | String, path to the security certificate (e.g. `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin. +username | No | String, username for HTTP basic authentication. +password | No | String, password for HTTP basic authentication. 
+aws_sigv4 | No | Boolean, whether to use IAM signing to connect to an Amazon OpenSearch Service cluster. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credentials`, etc.). +aws_region | No | String, AWS region for the cluster (e.g. `"us-east-1"`) if you are connecting to Amazon OpenSearch Service. +trace_analytics_raw | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics OpenSearch Dashboards plugin. +trace_analytics_service_map | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics OpenSearch Dashboards plugin. +index | No | String, name of the index to export to. Only required if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets. +template_file | No | String, the path to a JSON [index template](https://opensearch.github.io/for-opensearch-docs/docs/opensearch/index-templates/) file (e.g. `/your/local/template-file.json`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example. +document_id_field | No | String, the field from the source data to use for the OpenSearch document ID (e.g. `"my-field"`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets. +dlq_file | No | String, the path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster. +bulk_size | No | Integer (long), default 5.
The maximum size (in MiB) of bulk requests to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually. + + +### file + +Sink for flat file output. + +Option | Required | Description +:--- | :--- | :--- +path | Yes | String, path for the output file (e.g. `logs/my-transformed-log.log`). + + +### pipeline + +Sink for writing to another pipeline. + +Option | Required | Description +:--- | :--- | :--- +name | Yes | String, name of the pipeline to write to. + + +### stdout + +Sink for console output. Can be useful for testing. No options. diff --git a/docs/trace/data-prepper.md b/docs/trace/data-prepper.md new file mode 100644 index 00000000..f15db417 --- /dev/null +++ b/docs/trace/data-prepper.md @@ -0,0 +1,135 @@ +--- +layout: default +title: Data Prepper +parent: Trace analytics +nav_order: 20 +--- + +# Data Prepper + +Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages. + + +## Install Data Prepper + +To use the Docker image, pull it like any other image: + +```bash +docker pull opensearch/opensearch-data-prepper:latest +``` + +Otherwise, [download](https://opensearch.org/downloads.html) the appropriate archive for your operating system and unzip it. + + +## Configure pipelines + +To use Data Prepper, you define pipelines in a configuration YAML file. 
Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks: + +```yml +sample-pipeline: + workers: 4 # the number of workers + delay: 100 # in milliseconds, how long workers wait between read attempts + source: + otel_trace_source: + ssl: true + sslKeyCertChainFile: "config/demo-data-prepper.crt" + sslKeyFile: "config/demo-data-prepper.key" + buffer: + bounded_blocking: + buffer_size: 1024 # max number of records the buffer accepts + batch_size: 256 # max number of records the buffer drains after each read + prepper: + - otel_trace_raw_prepper: + sink: + - opensearch: + hosts: ["https://localhost:9200"] + cert: "config/root-ca.pem" + username: "ta-user" + password: "ta-password" + trace_analytics_raw: true +``` + +- Sources define where your data comes from. In this case, the source is the OpenTelemetry Collector (`otel_trace_source`) with some optional SSL settings. + +- Buffers store data as it passes through the pipeline. + + By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings. + +- Preppers perform some action on your data: filter, transform, enrich, etc. + + You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `otel_trace_raw_prepper` prepper converts OpenTelemetry data into OpenSearch-compatible JSON documents. + +- Sinks define where your data goes. In this case, the sink is an OpenSearch cluster. + +Pipelines can act as the source for other pipelines.
In the following example, a pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks: + +```yml +entry-pipeline: + delay: "100" + source: + otel_trace_source: + ssl: true + sslKeyCertChainFile: "config/demo-data-prepper.crt" + sslKeyFile: "config/demo-data-prepper.key" + sink: + - pipeline: + name: "raw-pipeline" + - pipeline: + name: "service-map-pipeline" +raw-pipeline: + source: + pipeline: + name: "entry-pipeline" + prepper: + - otel_trace_raw_prepper: + sink: + - opensearch: + hosts: ["https://localhost:9200" ] + cert: "config/root-ca.pem" + username: "ta-user" + password: "ta-password" + trace_analytics_raw: true +service-map-pipeline: + delay: "100" + source: + pipeline: + name: "entry-pipeline" + prepper: + - service_map_stateful: + sink: + - opensearch: + hosts: ["https://localhost:9200"] + cert: "config/root-ca.pem" + username: "ta-user" + password: "ta-password" + trace_analytics_service_map: true +``` + +To learn more, see the [Data Prepper configuration reference](../data-prepper-reference/). + +## Configure the Data Prepper server +Data Prepper itself provides administrative HTTP endpoints such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port which serves these endpoints, as well as TLS configuration, is specified by a separate YAML file. 
Example: + +```yml +ssl: true +keyStoreFilePath: "/usr/share/data-prepper/keystore.jks" +keyStorePassword: "password" +privateKeyPassword: "other_password" +serverPort: 1234 +``` + +## Start Data Prepper + +**Docker** + +```bash +docker run --name data-prepper --expose 21890 -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearch/opensearch-data-prepper:latest +``` + +**macOS and Linux** + +```bash +./data-prepper-tar-install.sh config/pipelines.yaml config/data-prepper-config.yaml +``` + +For production workloads, you likely want to run Data Prepper on a dedicated machine, which makes connectivity a concern. Data Prepper uses port 21890 and must be able to connect to both the OpenTelemetry Collector and the OpenSearch cluster. In the [sample applications](https://github.com/opensearch-project/Data-Prepper/tree/main/examples), you can see that all components use the same Docker network and expose the appropriate ports. diff --git a/docs/trace/get-started.md b/docs/trace/get-started.md new file mode 100644 index 00000000..85fcd867 --- /dev/null +++ b/docs/trace/get-started.md @@ -0,0 +1,83 @@ +--- +layout: default +title: Get Started +parent: Trace analytics +nav_order: 1 +--- + +# Get started with Trace Analytics + +OpenSearch Trace Analytics consists of two components---Data Prepper and the Trace Analytics OpenSearch Dashboards plugin---that fit into the OpenTelemetry and OpenSearch ecosystems. The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started. + + +## Basic flow of data + +![Data flow diagram from a distributed application to OpenSearch](../../images/ta.svg) + +1. Trace Analytics relies on you adding instrumentation to your application and generating trace data. 
The [OpenTelemetry documentation](https://opentelemetry.io/docs/) contains example applications for many programming languages that can help you get started, including Java, Python, Go, and JavaScript. + + (In the [Jaeger HotROD](#jaeger-hotrod) example below, an extra component, the Jaeger agent, runs alongside the application and sends the data to the OpenTelemetry Collector, but the concept is similar.) + +1. The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/) receives data from the application and formats it into OpenTelemetry data. + +1. [Data Prepper](../data-prepper/) processes the OpenTelemetry data, transforms it for use in OpenSearch, and indexes it on an OpenSearch cluster. + +1. The [Trace Analytics OpenSearch Dashboards plugin](../ta-opensearch-dashboards/) displays the data in near real-time as a series of charts and tables, with an emphasis on service architecture, latency, error rate, and throughput. + + +## Jaeger HotROD + +One Trace Analytics sample application is the Jaeger HotROD demo, which mimics the flow of data through a distributed application. + +Download or clone the [Data Prepper repository](https://github.com/opensearch-project/data-prepper). Then navigate to `examples/jaeger-hotrod/` and open `docker-compose.yml` in a text editor. This file contains a container for each element from [Basic flow of data](#basic-flow-of-data): + +- A distributed application (`jaeger-hot-rod`) with the Jaeger agent (`jaeger-agent`) +- The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/) (`otel-collector`) +- Data Prepper (`data-prepper`) +- A single-node OpenSearch cluster (`opensearch`) +- OpenSearch Dashboards (`opensearch-dashboards`). + +Close the file and run `docker-compose up --build`. After the containers start, navigate to `http://localhost:8080` in a web browser. 
+ +![HotROD web interface](../../images/hot-rod.png) + +Click one of the buttons in the web interface to send a request to the application. Each request starts a series of operations across the services that make up the application. From the console logs, you can see that these operations share the same `trace-id`, which lets you track all of the operations in the request as a single *trace*: + +``` +jaeger-hot-rod | http://0.0.0.0:8081/customer?customer=392 +jaeger-hot-rod | 2020-11-19T16:29:53.425Z INFO frontend/server.go:92 HTTP request received {"service": "frontend", "trace_id": "12091bd60f45ea2c", "span_id": "12091bd60f45ea2c", "method": "GET", "url": "/dispatch?customer=392&nonse=0.6509021735471818"} +jaeger-hot-rod | 2020-11-19T16:29:53.426Z INFO customer/client.go:54 Getting customer{"service": "frontend", "component": "customer_client", "trace_id": "12091bd60f45ea2c", "span_id": "12091bd60f45ea2c", "customer_id": "392"} +jaeger-hot-rod | 2020-11-19T16:29:53.430Z INFO customer/server.go:67 HTTP request received {"service": "customer", "trace_id": "12091bd60f45ea2c", "span_id": "252ff7d0e1ac533b", "method": "GET", "url": "/customer?customer=392"} +jaeger-hot-rod | 2020-11-19T16:29:53.430Z INFO customer/database.go:73 Loading customer{"service": "customer", "component": "mysql", "trace_id": "12091bd60f45ea2c", "span_id": "252ff7d0e1ac533b", "customer_id": "392"} +``` + +These operations also have a `span_id`. *Spans* are units of work from a single service. Each trace contains some number of spans. 
Shortly after the application starts processing the request, you can see the OpenTelemetry Collector starts exporting the spans: + +``` +otel-collector | 2020-11-19T16:29:53.781Z INFO loggingexporter/logging_exporter.go:296 TraceExporter {"#spans": 1} +otel-collector | 2020-11-19T16:29:53.787Z INFO loggingexporter/logging_exporter.go:296 TraceExporter {"#spans": 3} +``` + +Then Data Prepper processes the data from the OpenTelemetry Collector and indexes it: + +``` +data-prepper | 1031918 [service-map-pipeline-process-worker-2-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker – service-map-pipeline Worker: Processing 3 records from buffer +data-prepper | 1031923 [entry-pipeline-process-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker – entry-pipeline Worker: Processing 1 records from buffer +``` + +Finally, you can see the OpenSearch node responding to the indexing request. + +``` +node-0.example.com | [2020-11-19T16:29:55,064][INFO ][o.e.c.m.MetadataMappingService] [9fb4fb37a516] [otel-v1-apm-span-000001/NGYbmVD9RmmqnxjfTzBQsQ] update_mapping [_doc] +node-0.example.com | [2020-11-19T16:29:55,267][INFO ][o.e.c.m.MetadataMappingService] [9fb4fb37a516] [otel-v1-apm-span-000001/NGYbmVD9RmmqnxjfTzBQsQ] update_mapping [_doc] +``` + +In a new terminal window, run the following command to see one of the raw documents in the OpenSearch cluster: + +```bash +curl -X GET -u 'admin:admin' -k 'https://localhost:9200/otel-v1-apm-span-000001/_search?pretty&size=1' +``` + +Navigate to `http://localhost:5601` in a web browser and choose **Trace Analytics**. You can see the results of your single click in the Jaeger HotROD web interface: the number of traces per API and HTTP method, latency trends, a color-coded map of the service architecture, and a list of trace IDs that you can use to drill down on individual operations. + +If you don't see your trace, adjust the timeframe in OpenSearch Dashboards. 
For more information on using the plugin, see [OpenSearch Dashboards plugin](../ta-opensearch-dashboards/). diff --git a/docs/trace/index.md b/docs/trace/index.md new file mode 100644 index 00000000..a5b93074 --- /dev/null +++ b/docs/trace/index.md @@ -0,0 +1,17 @@ +--- +layout: default +title: Trace analytics +nav_order: 48 +has_children: true +has_toc: false +--- + +# Trace Analytics + +Trace Analytics provides a way to ingest and visualize [OpenTelemetry](https://opentelemetry.io/) data in OpenSearch. This data can help you find and fix performance problems in distributed applications. + +A single operation, such as a user clicking a button, can trigger an extended series of events. The front end might call a back end service, which calls another service, which queries a database, processes the data, and sends it to the original service, which sends a confirmation to the front end. + +Trace Analytics can help you visualize this flow of events and identify performance problems. + +![Detailed trace view](../images/ta-trace.png) diff --git a/docs/trace/ta-dashboards.md b/docs/trace/ta-dashboards.md new file mode 100644 index 00000000..1518f90a --- /dev/null +++ b/docs/trace/ta-dashboards.md @@ -0,0 +1,22 @@ +--- +layout: default +title: OpenSearch Dashboards plugin +parent: Trace analytics +nav_order: 50 +--- + +# Trace Analytics OpenSearch Dashboards plugin + +The Trace Analytics plugin for OpenSearch Dashboards provides at-a-glance visibility into your application performance, along with the ability to drill down on individual traces. For installation instructions, see [Standalone OpenSearch Dashboards plugin install](../../opensearch-dashboards/plugins/). + +The **Dashboard** view groups traces together by HTTP method and path so that you can see the average latency, error rate, and trends associated with a particular operation. For a more focused view, try filtering by trace group name. 
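To make the grouping concrete, here is a minimal Python sketch of the kind of aggregation the **Dashboard** view describes: traces grouped by HTTP method and path, with an average latency and error rate per group. This is an illustration only, not the plugin's implementation; the function name `summarize_trace_groups`, the field names, and the sample data are all hypothetical.

```python
from collections import defaultdict


def summarize_trace_groups(traces):
    """Group traces by (HTTP method, path) and compute per-group statistics.

    Each trace is a dict with hypothetical keys: method, path, latency_ms, error.
    """
    groups = defaultdict(list)
    for trace in traces:
        groups[(trace["method"], trace["path"])].append(trace)

    summary = {}
    for key, members in groups.items():
        count = len(members)
        summary[key] = {
            "traces": count,
            # Mean latency across all traces in the group.
            "avg_latency_ms": sum(t["latency_ms"] for t in members) / count,
            # Fraction of traces in the group that reported an error.
            "error_rate": sum(1 for t in members if t["error"]) / count,
        }
    return summary


# Hypothetical sample data, loosely modeled on the HotROD demo endpoints.
sample_traces = [
    {"method": "GET", "path": "/dispatch", "latency_ms": 700.0, "error": False},
    {"method": "GET", "path": "/dispatch", "latency_ms": 900.0, "error": True},
    {"method": "GET", "path": "/customer", "latency_ms": 300.0, "error": False},
]

summary = summarize_trace_groups(sample_traces)
```

Filtering by trace group name then amounts to looking up a single key, such as `summary[("GET", "/dispatch")]`.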
+ +![Dashboard view](../../images/ta-dashboard.png) + +To drill down on the traces that make up a trace group, choose the number of traces in the right-hand column. Then choose an individual trace for a detailed summary. + +![Detailed trace view](../../images/ta-trace.png) + +The **Services** view lists all services in the application, plus an interactive map that shows how the various services connect to each other. In contrast to the dashboard, which helps identify problems by operation, the service map helps identify problems by service. Try sorting by error rate or latency to get a sense of potential problem areas of your application. + +![Service view](../../images/ta-services.png) diff --git a/docs/troubleshoot/index.md b/docs/troubleshoot/index.md new file mode 100644 index 00000000..1381e3a9 --- /dev/null +++ b/docs/troubleshoot/index.md @@ -0,0 +1,41 @@ +--- +layout: default +title: Troubleshoot +nav_order: 62 +has_children: true +has_toc: false +--- + +# Troubleshoot + +This section contains a list of issues and workarounds. + + +## Java error during startup + +You might see `[ERROR][c.a.o.s.s.t.OpenSearchSecuritySSLNettyTransport] [opensearch-node1] SSL Problem Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)` when starting OpenSearch. This problem is a [known issue with Java](https://bugs.openjdk.java.net/browse/JDK-8221218) and doesn't affect the operation of the cluster. + + +## OpenSearch Dashboards fails to start + +If you encounter the error `FATAL Error: Request Timeout after 30000ms` during startup, try running OpenSearch Dashboards on a more powerful machine. We recommend four CPU cores and 8 GB of RAM. + + +## Can't open OpenSearch Dashboards on Windows + +OpenSearch Dashboards doesn't support Microsoft Edge and many versions of Internet Explorer. We recommend using Firefox or Chrome.
+ + +## Can't update by script when FLS, DLS, or field masking is active + +The security plugin blocks the update by script operation (`POST /_update/`) when field-level security, document-level security, or field masking is active. You can still update documents using the standard index operation (`PUT /_doc/`). + + +## Illegal reflective access operation in logs + +This is a [known issue](https://github.com/opensearch-project/performance-analyzer/issues/21) with Performance Analyzer that shouldn't affect functionality. + + +## Multi-tenancy issues in OpenSearch Dashboards + +If you're testing multiple users in OpenSearch Dashboards and encounter unexpected changes in tenant, use Google Chrome in an Incognito window or Firefox in a Private window. diff --git a/docs/troubleshoot/openid-connect.md b/docs/troubleshoot/openid-connect.md new file mode 100644 index 00000000..96f175e5 --- /dev/null +++ b/docs/troubleshoot/openid-connect.md @@ -0,0 +1,118 @@ +--- +layout: default +title: Troubleshoot OpenID Connect +parent: Troubleshoot +nav_order: 3 +--- + +# OpenID Connect troubleshooting + +This page includes troubleshooting steps for using OpenID Connect with the security plugin. + + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + +## Set log level to debug + +To help troubleshoot OpenID Connect, set the log level to `debug` on OpenSearch. Add the following lines in `config/log4j2.properties` and restart the node: + +``` +logger.opensearch_security.name = com.amazon.dlic.auth.http.jwt +logger.opensearch_security.level = debug +``` + +This setting prints a lot of helpful information to your log file. If this information isn't sufficient, you can also set the log level to `trace`. + + +## "Failed when trying to obtain the endpoints from your IdP" + +This error indicates that the security plugin can't reach the metadata endpoint of your IdP.
In `opensearch_dashboards.yml`, check the following setting: + +``` +opensearch_security.openid.connect_url: "http://keycloak.example.com:8080/auth/realms/master/.well-known/openid-configuration" +``` + +If this error occurs on OpenSearch, check the following setting in `config.yml`: + +```yml +openid_auth_domain: + enabled: true + order: 1 + http_authenticator: + type: "openid" + ... + config: + openid_connect_url: http://keycloak.example.com:8080/auth/realms/master/.well-known/openid-configuration + ... +``` + +## "ValidationError: child 'opensearch_security' fails" + +This indicates that one or more of the OpenSearch Dashboards configuration settings are missing. + +Check `opensearch_dashboards.yml` and make sure you have set the following minimal configuration: + +```yml +opensearch_security.openid.connect_url: "..." +opensearch_security.openid.client_id: "..." +opensearch_security.openid.client_secret: "..." +``` + + +## "Authentication failed. Please provide a new token." + +This error has several potential root causes. + + +### Leftover cookies or cached credentials + +Please delete all cached browser data, or try again in a private browser window. + + +### Wrong client secret + +To trade the access token for an identity token, most IdPs require you to provide a client secret. Check if the client secret in `opensearch_dashboards.yml` matches the client secret of your IdP configuration: + +``` +opensearch_security.openid.client_secret: "..." +``` + + +### "Failed to get subject from JWT claims" + +This error is logged on OpenSearch and means that the username could not be extracted from the ID token. Make sure the following setting matches the claims in the JWT your IdP issues: + +``` +openid_auth_domain: + enabled: true + order: 1 + http_authenticator: + type: "openid" + ... + config: + subject_key: + ...
+``` + +### "Failed to get roles from JWT claims with roles_key" + +This error indicates that the roles key you configured in `config.yml` does not exist in the JWT issued by your IdP. Make sure the following setting matches the claims in the JWT your IdP issues: + +``` +openid_auth_domain: + enabled: true + order: 1 + http_authenticator: + type: "openid" + ... + config: + roles_key: + ... +``` diff --git a/docs/troubleshoot/saml.md b/docs/troubleshoot/saml.md new file mode 100644 index 00000000..16b69e99 --- /dev/null +++ b/docs/troubleshoot/saml.md @@ -0,0 +1,139 @@ +--- +layout: default +title: Troubleshoot SAML +parent: Troubleshoot +nav_order: 2 +--- + +# SAML troubleshooting + +This page includes troubleshooting steps for using SAML for OpenSearch Dashboards authentication. + + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + +## Check sp.entity_id + +Most identity providers (IdPs) allow you to configure multiple authentication methods for different applications. For example, in Okta, these clients are called "Applications." In Keycloak, they are called "Clients." Each one has its own entity ID. Make sure to configure `sp.entity_id` to match those settings: + +```yml +saml: + ... + http_authenticator: + type: 'saml' + challenge: true + config: + ... + sp: + entity_id: opensearch-dashboards-saml +``` + + +## Check the SAML assertion consumer service URL + +After a successful login, your IdP sends a SAML response using HTTP POST to OpenSearch Dashboards's "assertion consumer service URL" (ACS). + +The endpoint the OpenSearch Dashboards security plugin provides is: + +``` +/_opensearch/_security/saml/acs +``` + +Make sure that you have configured this endpoint correctly in your IdP. Some IdPs also require you to whitelist all endpoints that they send requests to. Ensure that the ACS endpoint is listed. + +OpenSearch Dashboards also requires you to whitelist this endpoint. 
Make sure you have the following entry in `opensearch_dashboards.yml`: + +``` +server.xsrf.whitelist: [/_opensearch/_security/saml/acs] +``` + + +## Sign all documents + +Some IdPs do not sign the SAML documents by default. Make sure the IdP signs all documents. + + +#### Keycloak + +![Keycloak UI](../../images/saml-keycloak-sign-documents.png) + + +## Role settings + +Including user roles in the SAML response is dependent on your IdP. For example, in Keycloak, this setting is in the **Mappers** section of your client. In Okta, you have to set group attribute statements. Make sure this is configured correctly and that the `roles_key` in the SAML configuration matches the role name in the SAML response: + +```yml +saml: + ... + http_authenticator: + type: 'saml' + challenge: true + config: + ... + roles_key: Role +``` + + +## Inspect the SAML response + +If you are not sure what the SAML response of your IdP contains and where it places the username and roles, you can enable debug mode in the `log4j2.properties`: + +``` +logger.token.name = com.amazon.dlic.auth.http.saml.Token +logger.token.level = debug +``` + +This setting prints the SAML response to the OpenSearch log file so that you can inspect and debug it. Setting this logger to `debug` generates many statements, so we don't recommend using it in production. + +Another way of inspecting the SAML response is to monitor network traffic while logging in to OpenSearch Dashboards. The IdP uses HTTP POST requests to send Base64-encoded SAML responses to: + +``` +/_opensearch/_security/saml/acs +``` + +Inspect the payload of this POST request, and use a tool like [base64decode.org](https://www.base64decode.org/) to decode it. + + +## Check role mapping + +The security plugin uses a standard role mapping to map a user or backend role to one or more Security roles. + +For username, the security plugin uses the `NameID` attribute of the SAML response by default. 
For some IdPs, this attribute does not contain the expected username, but some internal user ID. Check the content of the SAML response to locate the element you want to use as username, and configure it by setting the `subject_key`: + +```yml +saml: + ... + http_authenticator: + type: 'saml' + challenge: true + config: + ... + subject_key: preferred_username +``` + +For checking that the correct backend roles are contained in the SAML response, inspect the contents, and set the correct attribute name: + +```yml +saml: + ... + http_authenticator: + type: 'saml' + challenge: true + config: + ... + roles_key: Role +``` + + +## Inspect the JWT token + +The security plugin trades the SAML response for a more lightweight JSON web token. The username and backend roles in the JWT are ultimately mapped to roles in the security plugin. If there is a problem with the mapping, you can enable the token debug mode using the same setting as [Inspect the SAML response](#inspect-the-saml-response). + +This setting prints the JWT to the OpenSearch log file so that you can inspect and debug it using a tool like [JWT.io](https://jwt.io/). diff --git a/docs/troubleshoot/security-admin.md b/docs/troubleshoot/security-admin.md new file mode 100644 index 00000000..0da8af8f --- /dev/null +++ b/docs/troubleshoot/security-admin.md @@ -0,0 +1,107 @@ +--- +layout: default +title: Troubleshoot securityadmin.sh +parent: Troubleshoot +nav_order: 4 +--- + +# securityadmin.sh Troubleshooting + +This page includes troubleshooting steps for `securityadmin.sh`. + + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + +## Cluster not reachable + +If `securityadmin.sh` can't reach the cluster, it outputs: + +``` +OpenSearch Security Admin v6 +Will connect to localhost:9300 +ERR: Seems there is no opensearch running on localhost:9300 - Will exit +``` + + +### Check hostname + +By default, `securityadmin.sh` uses `localhost`. 
If your cluster runs on any other host, specify the hostname using the `-h` option. + + +### Check the port + +Check that you are running `securityadmin.sh` against the transport port, **not** the HTTP port. + +By default, `securityadmin.sh` uses `9300`. If your cluster runs on a different port, use the `-p` option to specify the port number. + + +## None of the configured nodes are available + +If `securityadmin.sh` can reach the cluster, but can't update the configuration, it outputs this error: + +``` +Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ... +Cannot retrieve cluster state due to: None of the configured nodes are available: [{#transport#-1}{mr2NlX3XQ3WvtVG0Dv5eHw}{localhost}{127.0.0.1:9300}]. This is not an error, will keep on trying ... +``` + +* Try running `securityadmin.sh` with `-icl` and `-nhnv`. + + If this works, check your cluster name as well as the hostnames in your SSL certificates. If this does not work, try running `securityadmin.sh` with `--diagnose` and check the generated diagnostic file. + +* Add `--accept-red-cluster` to allow `securityadmin.sh` to operate on a red cluster. + + +### Check cluster name + +By default, `securityadmin.sh` uses `opensearch` as the cluster name. + +If your cluster has a different name, you can either ignore the name completely using the `-icl` option or specify the name using the `-cn` option. + + +### Check hostname verification + +By default, `securityadmin.sh` verifies that the hostname in your node's certificate matches the node's actual hostname. + +If this is not the case (e.g. if you're using the demo certificates), you can disable hostname verification by adding the `-nhnv` option. + + +### Check cluster state + +By default, `securityadmin.sh` only executes if the cluster state is at least yellow. + +If your cluster state is red, you can still execute `securityadmin.sh`, but you need to add the `-arc` option.
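
The hostname, port, and cluster name checks above can be folded into one small helper. The following is only a sketch: `securityadmin_opts` is a hypothetical function name, not part of the security plugin, and it uses only the `-h`, `-p`, `-cn`, and `-icl` options described above.

```bash
# Hypothetical helper (not shipped with the security plugin): prints the
# securityadmin.sh connection options for a host, transport port, and an
# optional cluster name. Defaults mirror the defaults described above.
securityadmin_opts() {
  host="${1:-localhost}"   # -h: hostname of the cluster
  port="${2:-9300}"        # -p: transport port, not the HTTP port
  cluster_name="${3:-}"    # -cn: cluster name, if known

  opts="-h $host -p $port"
  if [ -n "$cluster_name" ]; then
    opts="$opts -cn $cluster_name"
  else
    # No cluster name given: ignore the cluster name completely.
    opts="$opts -icl"
  fi
  printf '%s\n' "$opts"
}
```

For example, `securityadmin_opts node1.example.com 9300 my-cluster` prints the options to pass to `securityadmin.sh` alongside your usual certificate arguments. Add `-nhnv` or `-arc` by hand only if the hostname verification or cluster state checks call for them.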
+ + +### Check the security index name + +By default, the security plugin uses `opensearch_security` as the name of the configuration index. If you configured a different index name in `opensearch.yml`, specify it using the `-i` option. + + +## "ERR: DN is not an admin user" + +If the TLS certificate used to start `securityadmin.sh` isn't an admin certificate, the script outputs: + +``` +Connected as CN=node-0.example.com,OU=SSL,O=Test,L=Test,C=DE +ERR: CN=node-0.example.com,OU=SSL,O=Test,L=Test,C=DE is not an admin user +``` + +You must use an admin certificate when executing the script. To learn more, see [Configure admin certificates](../../security/configuration/tls/#configure-admin-certificates). + + +## Use the diagnose option + +For more information on why `securityadmin.sh` is not executing, add the `--diagnose` option: + +``` +./securityadmin.sh --diagnose -cd ../securityconfig/ -cacert ... -cert ... -key ... -keypass ... +``` + +The script prints the location of the generated diagnostic file. diff --git a/docs/troubleshoot/tls.md b/docs/troubleshoot/tls.md new file mode 100644 index 00000000..3db3b0d7 --- /dev/null +++ b/docs/troubleshoot/tls.md @@ -0,0 +1,225 @@ +--- +layout: default +title: Troubleshoot TLS +parent: Troubleshoot +nav_order: 1 +--- + +# TLS troubleshooting + +This page includes troubleshooting steps for configuring TLS certificates with the security plugin. + + +--- + +#### Table of contents +- TOC +{:toc} + + +--- + + +## Validate YAML + +`opensearch.yml` and the files in `opensearch_security/securityconfig/` are in the YAML format. A linter like [YAML Lint](http://www.yamllint.com/) can help verify that you don't have any formatting errors. + + +## View contents of PEM certificates + +You can use OpenSSL to display the content of each PEM certificate: + +```bash +openssl x509 -subject -nameopt RFC2253 -noout -in node1.pem +``` + +Then ensure that the value matches the one in `opensearch.yml`.
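
If you want to script that comparison, one possible approach is sketched below. `subject_dn` is a hypothetical helper name, and the sketch assumes that `openssl` prefixes its output with `subject=` (with or without a trailing space, depending on the OpenSSL version) and that a plain text match against `opensearch.yml` is good enough for a first check.

```bash
# Hypothetical helper: strips the "subject=" prefix printed by
# `openssl x509 -subject`, leaving only the DN string.
subject_dn() {
  printf '%s\n' "$1" | sed 's/^subject= *//'
}

# Example usage (assumes node1.pem and opensearch.yml are in the current directory):
# dn=$(subject_dn "$(openssl x509 -subject -nameopt RFC2253 -noout -in node1.pem)")
# grep -F -- "$dn" opensearch.yml || echo "DN not found in opensearch.yml"
```

Because `grep -F` compares literal text, differences in whitespace or escaping between the certificate and the configuration show up as a failed match.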
+ +For more complete information on a certificate: + +```bash +openssl x509 -in node1.pem -text -noout +``` + + +### Check for special characters and whitespace in DNs + +The security plugin uses the [string representation of Distinguished Names (RFC1779)](https://www.ietf.org/rfc/rfc1779.txt) when validating node certificates. + +If parts of your DN contain special characters (e.g. a comma), make sure you escape them in your configuration: + +```yml +opensearch_security.nodes_dn: + - 'CN=node-0.example.com,OU=SSL,O=My\, Test,L=Test,C=DE' +``` + +You can have whitespace within a field, but not between fields. + +#### Bad configuration + +```yml +opensearch_security.nodes_dn: + - 'CN=node-0.example.com, OU=SSL,O=My\, Test, L=Test, C=DE' +``` + +#### Good configuration + +```yml +opensearch_security.nodes_dn: + - 'CN=node-0.example.com,OU=SSL,O=My\, Test,L=Test,C=DE' +``` + + +### Check certificate IP addresses + +Sometimes the IP address in your certificate is not the one communicating with the cluster. This problem can occur if your node has multiple interfaces or is running on a dual stack network (IPv6 and IPv4). + +If this problem occurs, you might see the following in the node's OpenSearch log: + +``` +SSL Problem Received fatal alert: certificate_unknown javax.net.ssl.SSLException: Received fatal alert: certificate_unknown +``` + +You might also see the following message in your cluster's master log when the new node tries to join the cluster: + +``` +Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 10.0.0.42 found +``` + +Check the IP address in the certificate: + +``` +IPAddress: 2001:db8:0:1:1.2.3.4 +``` + +In this example, the node tries to join the cluster with the IPv4 address of `10.0.0.42`, but the certificate contains the IPv6 address of `2001:db8:0:1:1.2.3.4`. + + +### Validate certificate chain + +TLS certificates are organized in a certificate chain.
You can check with `keytool` that the certificate chain is correct by inspecting the owner and the issuer of each certificate. If you used the demo installation script that ships with the security plugin, the chain looks like: + +#### Node certificate + +``` +Owner: CN=node-0.example.com, OU=SSL, O=Test, L=Test, C=DE +Issuer: CN=Example Com Inc. Signing CA, OU=Example Com Inc. Signing CA, O=Example Com Inc., DC=example, DC=com +``` + +#### Signing certificate + +``` +Owner: CN=Example Com Inc. Signing CA, OU=Example Com Inc. Signing CA, O=Example Com Inc., DC=example, DC=com +Issuer: CN=Example Com Inc. Root CA, OU=Example Com Inc. Root CA, O=Example Com Inc., DC=example, DC=com +``` + +#### Root certificate + +``` +Owner: CN=Example Com Inc. Root CA, OU=Example Com Inc. Root CA, O=Example Com Inc., DC=example, DC=com +Issuer: CN=Example Com Inc. Root CA, OU=Example Com Inc. Root CA, O=Example Com Inc., DC=example, DC=com +``` + +From the entries, you can see that the root certificate signed the intermediate certificate, which signed the node certificate. The root certificate signed itself, hence the name "self-signed certificate." If you're using separate keystore and truststore files, your root CA is most likely in the truststore. + +Generally, the keystore contains the client or node certificate and all intermediate certificates, and the truststore contains the root certificate. + + +### Check the configured alias + +If you have multiple entries in the keystore and you are using aliases to refer to them, make sure that the configured alias in `opensearch.yml` matches the one in the keystore. If there is only one entry in the keystore, you do not need to configure an alias.
+ + +## View contents of your keystore and truststore + +To view information about the certificates stored in your keystore or truststore, use the `keytool` command: + +```bash +keytool -list -v -keystore keystore.jks +``` + +`keytool` prompts for the password of the keystore and lists all entries. For example, you can use this output to check that the SAN and EKU settings are correct. + + +## Check SAN hostnames and IP addresses + +The valid hostnames and IP addresses of a TLS certificate are stored as `SAN` entries. Check that the hostname and IP entries in the `SAN` section are correct, especially when you use hostname verification: + +``` +Certificate[1]: +Owner: CN=node-0.example.com, OU=SSL, O=Test, L=Test, C=DE +... +Extensions: +... +#5: ObjectId: 2.5.29.17 Criticality=false +SubjectAlternativeName [ + DNSName: node-0.example.com + DNSName: localhost + IPAddress: 127.0.0.1 + ... +] +``` + + +## Check OID for node certificates + +If you are using OIDs to denote valid node certificates, check that the `SAN` extension for your node certificate contains the correct `OIDName`: + +``` +Certificate[1]: +Owner: CN=node-0.example.com, OU=SSL, O=Test, L=Test, C=DE +... +Extensions: +... +#5: ObjectId: 2.5.29.17 Criticality=false +SubjectAlternativeName [ + ... + OIDName: 1.2.3.4.5.5 +] +``` + + +## Check EKU field for node certificates + +Node certificates need to have both `serverAuth` and `clientAuth` set in the extended key usage field: + +``` +#3: ObjectId: 2.5.29.37 Criticality=false +ExtendedKeyUsages [ + serverAuth + clientAuth +] +``` + + +## TLS versions + +The security plugin disables TLS version 1.0 by default; it is outdated, insecure, and vulnerable.
If you need to use `TLSv1` and accept the risks, you can enable it in `opensearch.yml`: + +```yml +opensearch_security.ssl.http.enabled_protocols: + - "TLSv1" + - "TLSv1.1" + - "TLSv1.2" +``` + + +## Supported ciphers + +TLS relies on the server and client negotiating a common cipher suite. Depending on your system, the available ciphers will vary. They depend on the JDK or OpenSSL version you're using, and whether or not the `JCE Unlimited Strength Jurisdiction Policy Files` are installed. + +For legal reasons, the JDK does not include strong ciphers like AES256. In order to use strong ciphers you need to download and install the [Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files](http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html). If you don't have them installed, you might see an error message on startup: + +``` +[INFO ] AES-256 not supported, max key length for AES is 128 bit. +That is not an issue, it just limits possible encryption strength. +To enable AES 256 install 'Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files' +``` + +The security plugin still works and falls back to weaker cipher suites. The plugin also prints out all available cipher suites during startup: + +``` +[INFO ] sslTransportClientProvider: +JDK with ciphers [TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, +TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, ...] 
+``` diff --git a/favicon.ico b/favicon.ico new file mode 100644 index 00000000..f54f4541 Binary files /dev/null and b/favicon.ico differ diff --git a/favicon.png b/favicon.png new file mode 100644 index 00000000..f54f4541 Binary files /dev/null and b/favicon.png differ diff --git a/favicon.svg b/favicon.svg new file mode 100644 index 00000000..e3e4276c --- /dev/null +++ b/favicon.svg @@ -0,0 +1,5 @@ + + + + + diff --git a/index.md b/index.md new file mode 100755 index 00000000..542b8b6d --- /dev/null +++ b/index.md @@ -0,0 +1,72 @@ +--- +layout: default +title: Get started +nav_order: 1 +redirect_from: /404.html +permalink: / +--- + +# OpenSearch documentation + +This site contains the technical documentation for [OpenSearch](https://opensearch.org/), the search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more. + +[Get started](#docker-quickstart){: .btn .btn-blue } + + +--- + +## Why use OpenSearch? 
+ +OpenSearch is well-suited to the following use cases: + +* Log analytics +* Real-time application monitoring +* Clickstream analytics +* Search backend + +Component | Purpose +:--- | :--- +[OpenSearch](docs/opensearch/) | Data store and search engine +[OpenSearch Dashboards](docs/opensearch-dashboards/) | Search frontend and visualizations +[Security](docs/security/) | Authentication and access control for your cluster +[Alerting](docs/alerting/) | Receive notifications when your data meets certain conditions +[SQL](docs/sql/) | Use SQL or a piped processing language to query your data +[Index State Management](docs/ism/) | Automate index operations +[KNN](docs/knn/) | Find “nearest neighbors” in your vector data +[Performance Analyzer](docs/pa/) | Monitor and optimize your cluster +[Anomaly Detection](docs/ad/) | Identify atypical data and receive automatic notifications +[Asynchronous Search](docs/async/) | Run search requests in the background + +You can install OpenSearch plugins [individually](docs/install/plugins/) on existing clusters or use the [all-in-one packages](docs/install/) for new clusters. Most of these OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface. + + +--- + +## Docker quickstart +Docker +{: .label .label-green } + +1. Install and start [Docker Desktop](https://www.docker.com/products/docker-desktop). +1. Run the following commands: + + ```bash + docker pull amazon/opensearch:{{site.opensearch_version}} + docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" amazon/opensearch:{{site.opensearch_version}} + ``` + +1. In a new terminal session, run: + + ```bash + curl -XGET --insecure https://localhost:9200 -u admin:admin + ``` + +To learn more, see [Install](docs/install/). + + +--- + +## Get involved + +[OpenSearch](https://opensearch.org) is supported by Amazon Web Services. 
All components are available under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.html) on [GitHub](https://github.com/opensearch-project/). + +The project welcomes GitHub issues, bug fixes, features, plugins, documentation---anything at all. To get involved, see [Contribute](https://opensearch.org/contribute.html) on the OpenSearch website. diff --git a/version-history.md b/version-history.md new file mode 100644 index 00000000..a89d8447 --- /dev/null +++ b/version-history.md @@ -0,0 +1,24 @@ +--- +layout: default +title: Version history +nav_order: 2 +permalink: /version-history/ +--- + +# Version history + +OpenSearch version | Release highlights | Release date +:--- | :--- | :--- +[1.0.0-beta1](https://github.com/opensearch-project/) | Initial beta release. | 10 May 2021 + + +For detailed release notes, see these GitHub repositories: + +- [OpenSearch](https://github.com/opensearch-project/opensearch-build/tree/main/release-notes) +- [Security](https://github.com/opensearch-project/security/releases) +- [Alerting](https://github.com/opensearch-project/alerting/releases) +- [SQL](https://github.com/opensearch-project/sql/releases) +- [Index State Management](https://github.com/opensearch-project/index-management/releases) +- [Performance Analyzer](https://github.com/opensearch-project/performance-analyzer/releases) +- [k-NN](https://github.com/opensearch-project/k-NN/releases) +- [Anomaly detection](https://github.com/opensearch-project/anomaly-detection/releases)