OpenSearch/docs
Jim Ferenczi 623367d793
Add composite aggregator (#26800)
* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-buckets aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
  * A `terms` source, values are extracted from a field or a script.
  * A `date_histogram` source, values are extracted from a date field and rounded to the provided interval.
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation in composite buckets.
A composite buckets is composed of one value per source and is built for each document as the combinations of values in the provided sources.
For instance the following aggregation:

````
"test_agg": {
  "terms": {
    "field": "field1"
  },
  "aggs": {
    "nested_test_agg":
      "terms": {
        "field": "field2"
      }
  }
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:

````
"composite_agg": {
  "composite": {
    "sources": [
      {
	"field1": {
          "terms": {
              "field": "field1"
            }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
        }
      },
    }
  }
````

The response of the aggregation looks like this:

````
"aggregations": {
  "composite_agg": {
    "buckets": [
      {
        "key": {
          "field1": "alabama",
          "field2": "almanach"
        },
        "doc_count": 100
      },
      {
        "key": {
          "field1": "alabama",
          "field2": "calendar"
        },
        "doc_count": 1
      },
      {
        "key": {
          "field1": "arizona",
          "field2": "calendar"
        },
        "doc_count": 1
      }
    ]
  }
}
````

By default this aggregation returns 10 buckets sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values, the values of the composite key to aggregate after.
For instance the following aggregation will aggregate all composite keys that sorts after `arizona, calendar`:

````
"composite_agg": {
  "composite": {
    "after": {"field1": "alabama", "field2": "calendar"},
    "size": 100,
    "sources": [
      {
	"field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
	}
      }
    }
  }
````

This aggregation is optimized for indices that set an index sorting that match the composite source definition.
For instance the aggregation above could run faster on indices that defines an index sorting like this:

````
"settings": {
  "index.sort.field": ["field1", "field2"]
}
````

In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued field but disables early termination for these fields even if index sorting matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
2017-11-16 15:13:36 +01:00
..
community-clients Add link to community Rust Client (#22897) 2017-06-09 14:50:51 -07:00
groovy-api Use Versions.asciidoc for groovy docs too 2017-02-04 11:42:45 +01:00
java-api Remove deploying in JBoss documentation 2017-10-05 15:41:32 -04:00
java-rest Add Delete Index API support to high-level REST client (#27019) 2017-10-26 09:52:46 +02:00
painless [Docs] "The the" is a great band, but ... (#26644) 2017-09-14 15:08:20 +02:00
perl Updated copyright years to include 2016 (#17808) 2016-04-18 12:39:23 +02:00
plugins Add note on plugin distributions in plugins folder 2017-11-15 13:33:59 -05:00
python Update version information (#25226) 2017-08-15 15:00:11 -06:00
reference Add composite aggregator (#26800) 2017-11-16 15:13:36 +01:00
resiliency Resilience page - Remove 6.0.0 as a target for the discovery refactoring. (#26311) 2017-08-21 18:15:24 +02:00
ruby Updated copyright years to include 2016 (#17808) 2016-04-18 12:39:23 +02:00
src/test Revert shading for the low level rest client (#26367) 2017-08-25 14:13:12 -05:00
README.asciidoc [DOCS] Clarified readme for testing a single page 2017-08-15 15:11:12 -07:00
Versions.asciidoc Upgrade to Lucene 7.1 (#27225) 2017-11-02 13:25:33 +00:00
build.gradle ScriptService: Replace max compilation per minute setting with max compilation rate (#26399) 2017-09-01 10:15:27 +02:00

README.asciidoc

The Elasticsearch docs are in AsciiDoc format and can be built using the
Elasticsearch documentation build process.

See: https://github.com/elastic/docs

Snippets marked with `// CONSOLE` are automatically annotated with "VIEW IN
CONSOLE" and "COPY AS CURL" in the documentation and are automatically tested
by the command `gradle :docs:check`. To test just the docs from a single page,
use e.g. `gradle :docs:check -Dtests.method="\*rollover*"`.

By default each `// CONSOLE` snippet runs as its own isolated test. You can
manipulate the test execution in the following ways:

* `// TEST`: Explicitly marks a snippet as a test. Snippets marked this way
are tests even if they don't have `// CONSOLE` but usually `// TEST` is used
for its modifiers:
  * `// TEST[s/foo/bar/]`: Replace `foo` with `bar` in the generated test. This
  should be used sparingly because it makes the snippet "lie". Sometimes,
  though, you can use it to make the snippet more clear more clear. Keep in
  mind the that if there are multiple substitutions then they are applied in
  the order that they are defined.
  * `// TEST[catch:foo]`: Used to expect errors in the requests. Replace `foo`
  with `request` to expect a 400 error, for example. If the snippet contains
  multiple requests then only the last request will expect the error.
  * `// TEST[continued]`: Continue the test started in the last snippet. Between
  tests the nodes are cleaned: indexes are removed, etc. This prevents that
  from happening between snippets because the two snippets are a single test.
  This is most useful when you have text and snippets that work together to
  tell the story of some use case because it merges the snippets (and thus the
  use case) into one big test.
  * `// TEST[skip:reason]`: Skip this test. Replace `reason` with the actual
  reason to skip the test. Snippets without `// TEST` or `// CONSOLE` aren't
  considered tests anyway but this is useful for explicitly documenting the
  reason why the test shouldn't be run.
  * `// TEST[setup:name]`: Run some setup code before running the snippet. This
  is useful for creating and populating indexes used in the snippet. The setup
  code is defined in `docs/build.gradle`.
  * `// TEST[warning:some warning]`: Expect the response to include a `Warning`
  header. If the response doesn't include a `Warning` header with the exact
  text then the test fails. If the response includes `Warning` headers that
  aren't expected then the test fails.
* `// TESTRESPONSE`: Matches this snippet against the body of the response of
  the last test. If the response is JSON then order is ignored. If you add
  `// TEST[continued]` to the snippet after `// TESTRESPONSE` it will continue
  in the same test, allowing you to interleave requests with responses to check.
  * `// TESTRESPONSE[s/foo/bar/]`: Substitutions. See `// TEST[s/foo/bar]` for
  how it works. These are much more common than `// TEST[s/foo/bar]` because
  they are useful for eliding portions of the response that are not pertinent
  to the documentation.
    * One interesting difference here is that you often want to match against
    the response from Elasticsearch. To do that you can reference the "body" of
    the response like this: `// TESTRESPONSE[s/"took": 25/"took": $body.took/]`.
    Note the `$body` string. This says "I don't expect that 25 number in the
    response, just match against what is in the response." Instead of writing
    the path into the response after `$body` you can write `$_path` which
    "figures out" the path. This is especially useful for making sweeping
    assertions like "I made up all the numbers in this example, don't compare
    them" which looks like `// TESTRESPONSE[s/\d+/$body.$_path/]`.
  * `// TESTRESPONSE[_cat]`: Add substitutions for testing `_cat` responses. Use
  this after all other substitutions so it doesn't make other substitutions
  difficult.
* `// TESTSETUP`: Marks this snippet as the "setup" for all other snippets in
  this file. This is a somewhat natural way of structuring documentation. You
  say "this is the data we use to explain this feature" then you add the
  snippet that you mark `// TESTSETUP` and then every snippet will turn into
  a test that runs the setup snippet first. See the "painless" docs for a file
  that puts this to good use. This is fairly similar to `// TEST[setup:name]`
  but rather than the setup defined in `docs/build.gradle` the setup is defined
  right in the documentation file.

Any place you can use json you can use elements like `$body.path.to.thing`
which is replaced on the fly with the contents of the thing at `path.to.thing`
in the last response.