OpenSearch/docs
Colin Goodheart-Smithe 0edb096eb4 Adds a new auto-interval date histogram (#28993)
* Adds a new auto-interval date histogram

This change adds a new type of histogram aggregation called `auto_date_histogram` where you can specify the target number of buckets you require and it will find an appropriate interval for the returned buckets. The aggregation works by first collecting documents in buckets at second interval, when it has created more than the target number of buckets it merges these buckets into minute interval bucket and continues collecting until it reaches the target number of buckets again. It will keep merging buckets when it exceeds the target until either collection is finished or the highest interval (currently years) is reached. A similar process happens at reduce time.

This aggregation intentionally does not support min_doc_count, offest and extended_bounds to keep the already complex logic from becoming more complex. The aggregation accepts sub-aggregations but will always operate in `breadth_first` mode deferring the computation of sub-aggregations until the final buckets from the shard are known. min_doc_count is effectively hard-coded to zero meaning that we will insert empty buckets where necessary.

Closes #9572

* Adds documentation

* Added sub aggregator test

* Fixes failing docs test

* Brings branch up to date with master changes

* trying to get tests to pass again

* Fixes multiBucketConsumer accounting

* Collects more buckets than needed on shards

This gives us more options at reduce time in terms of how we do the
final merge of the buckeets to produce the final result

* Revert "Collects more buckets than needed on shards"

This reverts commit 993c782d117892af9a3c86a51921cdee630a3ac5.

* Adds ability to merge within a rounding

* Fixes nonn-timezone doc test failure

* Fix time zone tests

* iterates on tests

* Adds test case and documentation changes

Added some notes in the documentation about the intervals that can bbe
returned.

Also added a test case that utilises the merging of conseecutive buckets

* Fixes performance bug

The bug meant that getAppropriate rounding look a huge amount of time
if the range of the data was large but also sparsely populated. In
these situations the rounding would be very low so iterating through
the rounding values from the min key to the max keey look a long time
(~120 seconds in one test).

The solution is to add a rough estimate first which chooses the
rounding based just on the long values of the min and max keeys alone
but selects the rounding one lower than the one it thinks is
appropriate so the accurate method can choose the final rounding taking
into account the fact that intervals are not always fixed length.

Thee commit also adds more tests

* Changes to only do complex reduction on final reduce

* merge latest with master

* correct tests and add a new test case for 10k buckets

* refactor to perform bucket number check in innerBuild

* correctly derive bucket setting, update tests to increase bucket threshold

* fix checkstyle

* address code review comments

* add documentation for default buckets

* fix typo
2018-07-13 13:08:35 -04:00
..
community-clients Docs: Add uptasticsearch to list of clients (#30738) 2018-05-18 17:13:12 -04:00
groovy-api [Docs] Unify spelling of Elasticsearch (#27567) 2017-11-29 09:44:25 +01:00
java-api Migrate scripted metric aggregation scripts to ScriptContext design (#30111) 2018-06-25 12:01:33 +01:00
java-rest HLRC: Add xpack usage api (#31975) 2018-07-13 09:33:27 -07:00
painless [DOCS] Fix broken link in painless example 2018-07-09 10:47:42 -07:00
perl [Docs] Update Copyright notices to 2018 (#29404) 2018-04-06 16:21:20 +02:00
plugins [Docs] Fix wrong link in Korean analyzer docs (#31815) 2018-07-06 09:30:48 +02:00
python [Docs] Update Copyright notices to 2018 (#29404) 2018-04-06 16:21:20 +02:00
reference Adds a new auto-interval date histogram (#28993) 2018-07-13 13:08:35 -04:00
resiliency Resilience page - Remove 6.0.0 as a target for the discovery refactoring. (#26311) 2017-08-21 18:15:24 +02:00
ruby [Docs] Update Copyright notices to 2018 (#29404) 2018-04-06 16:21:20 +02:00
src/test Node selector per client rather than per request (#31471) 2018-06-22 17:15:29 +02:00
README.asciidoc Docs: Remove duplicate test setup 2018-06-28 10:59:35 -04:00
Versions.asciidoc Docs: Use the default distribution to test docs (#31251) 2018-06-18 12:06:42 -04:00
build.gradle Add JDK11 support and enable in CI (#31644) 2018-07-05 03:24:01 +00:00

README.asciidoc

The Elasticsearch docs are in AsciiDoc format and can be built using the
Elasticsearch documentation build process.

See: https://github.com/elastic/docs

Snippets marked with `// CONSOLE` are automatically annotated with "VIEW IN
CONSOLE" and "COPY AS CURL" in the documentation and are automatically tested
by the command `gradle :docs:check`. To test just the docs from a single page,
use e.g. `gradle :docs:check -Dtests.method="\*rollover*"`.

NOTE: If you have an elasticsearch-extra folder alongside your elasticsearch
folder, you must temporarily rename it when you are testing 6.3 or later branches.

By default each `// CONSOLE` snippet runs as its own isolated test. You can
manipulate the test execution in the following ways:

* `// TEST`: Explicitly marks a snippet as a test. Snippets marked this way
are tests even if they don't have `// CONSOLE` but usually `// TEST` is used
for its modifiers:
  * `// TEST[s/foo/bar/]`: Replace `foo` with `bar` in the generated test. This
  should be used sparingly because it makes the snippet "lie". Sometimes,
  though, you can use it to make the snippet more clear more clear. Keep in
  mind the that if there are multiple substitutions then they are applied in
  the order that they are defined.
  * `// TEST[catch:foo]`: Used to expect errors in the requests. Replace `foo`
  with `request` to expect a 400 error, for example. If the snippet contains
  multiple requests then only the last request will expect the error.
  * `// TEST[continued]`: Continue the test started in the last snippet. Between
  tests the nodes are cleaned: indexes are removed, etc. This prevents that
  from happening between snippets because the two snippets are a single test.
  This is most useful when you have text and snippets that work together to
  tell the story of some use case because it merges the snippets (and thus the
  use case) into one big test.
  * `// TEST[skip:reason]`: Skip this test. Replace `reason` with the actual
  reason to skip the test. Snippets without `// TEST` or `// CONSOLE` aren't
  considered tests anyway but this is useful for explicitly documenting the
  reason why the test shouldn't be run.
  * `// TEST[setup:name]`: Run some setup code before running the snippet. This
  is useful for creating and populating indexes used in the snippet. The setup
  code is defined in `docs/build.gradle`. See `// TESTSETUP` below for a
  similar feature.
  * `// TEST[warning:some warning]`: Expect the response to include a `Warning`
  header. If the response doesn't include a `Warning` header with the exact
  text then the test fails. If the response includes `Warning` headers that
  aren't expected then the test fails.
* `// TESTRESPONSE`: Matches this snippet against the body of the response of
  the last test. If the response is JSON then order is ignored. If you add
  `// TEST[continued]` to the snippet after `// TESTRESPONSE` it will continue
  in the same test, allowing you to interleave requests with responses to check.
  * `// TESTRESPONSE[s/foo/bar/]`: Substitutions. See `// TEST[s/foo/bar]` for
  how it works. These are much more common than `// TEST[s/foo/bar]` because
  they are useful for eliding portions of the response that are not pertinent
  to the documentation.
    * One interesting difference here is that you often want to match against
    the response from Elasticsearch. To do that you can reference the "body" of
    the response like this: `// TESTRESPONSE[s/"took": 25/"took": $body.took/]`.
    Note the `$body` string. This says "I don't expect that 25 number in the
    response, just match against what is in the response." Instead of writing
    the path into the response after `$body` you can write `$_path` which
    "figures out" the path. This is especially useful for making sweeping
    assertions like "I made up all the numbers in this example, don't compare
    them" which looks like `// TESTRESPONSE[s/\d+/$body.$_path/]`.
  * `// TESTRESPONSE[_cat]`: Add substitutions for testing `_cat` responses. Use
  this after all other substitutions so it doesn't make other substitutions
  difficult.
* `// TESTSETUP`: Marks this snippet as the "setup" for all other snippets in
  this file. This is a somewhat natural way of structuring documentation. You
  say "this is the data we use to explain this feature" then you add the
  snippet that you mark `// TESTSETUP` and then every snippet will turn into
  a test that runs the setup snippet first. See the "painless" docs for a file
  that puts this to good use. This is fairly similar to `// TEST[setup:name]`
  but rather than the setup defined in `docs/build.gradle` the setup is defined
  right in the documentation file. In general, we should prefer `// TESTSETUP`
  over `// TEST[setup:name]` because it makes it more clear what steps have to
  be taken before the examples will work.

In addition to the standard CONSOLE syntax these snippets can contain blocks
of yaml surrounded by markers like this:

```
startyaml
  - compare_analyzers: {index: thai_example, first: thai, second: rebuilt_thai}
endyaml
```

This allows slightly more expressive testing of the snippets. Since that syntax
is not supported by CONSOLE the usual way to incorporate it is with a
`// TEST[s//]` marker like this:

```
// TEST[s/\n$/\nstartyaml\n  - compare_analyzers: {index: thai_example, first: thai, second: rebuilt_thai}\nendyaml\n/]
```

Any place you can use json you can use elements like `$body.path.to.thing`
which is replaced on the fly with the contents of the thing at `path.to.thing`
in the last response.