Merge pull request #10929 from polyfractal/docs/aggs
Restructure Aggregation documentation
commit
f6d5167d41
|
@ -1,6 +1,8 @@
|
|||
[[search-aggregations]]
|
||||
== Aggregations
|
||||
= Aggregations
|
||||
|
||||
[partintro]
|
||||
--
|
||||
The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks
|
||||
called aggregations, that can be composed in order to build complex summaries of the data.
|
||||
|
||||
|
@ -11,16 +13,19 @@ query/filters of the search request).
|
|||
There are many different types of aggregations, each with its own purpose and output. To better understand these types,
|
||||
it is often easier to break them into two main families:
|
||||
|
||||
_Bucketing_::
|
||||
<<search-aggregations-bucket, _Bucketing_>>::
|
||||
A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document
|
||||
criterion. When the aggregation is executed, all the bucket criteria are evaluated on every document in
|
||||
the context and when a criterion matches, the document is considered to "fall in" the relevant bucket.
|
||||
By the end of the aggregation process, we'll end up with a list of buckets - each one with a set of
|
||||
documents that "belong" to it.
|
||||
|
||||
_Metric_::
|
||||
<<search-aggregations-metrics, _Metric_>>::
|
||||
Aggregations that keep track of and compute metrics over a set of documents.
|
||||
|
||||
<<search-aggregations-reducer, _Reducer_>>::
|
||||
Aggregations that aggregate the output of other aggregations and their associated metrics.
|
||||
|
||||
The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to
|
||||
the bucket), one can potentially associate aggregations on the bucket level, and those will execute within the context
|
||||
of that bucket. This is where the real power of aggregations kicks in: *aggregations can be nested!*
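As a rough illustration of nesting, the sketch below builds a search body in which a `terms` bucket aggregation holds an `avg` metric sub-aggregation, so the average is computed once per bucket. The aggregation names (`by_bucket`, `avg_metric`) and the field names (`category`, `price`) are hypothetical and not taken from this documentation.

```python
import json


def nested_terms_avg(bucket_field: str, metric_field: str) -> dict:
    """Build a search body: a `terms` bucket aggregation with an `avg` sub-aggregation."""
    return {
        "size": 0,  # only aggregation results are wanted, no search hits
        "aggs": {
            "by_bucket": {  # hypothetical aggregation name
                "terms": {"field": bucket_field},
                # Sub-aggregations execute within the context of each bucket.
                "aggs": {
                    "avg_metric": {"avg": {"field": metric_field}}
                },
            }
        },
    }


# Hypothetical field names, for illustration only.
body = nested_terms_avg("category", "price")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request; the metric sits inside the bucket aggregation's own `aggs` object rather than next to it.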
|
||||
|
@ -31,7 +36,7 @@ NOTE: Bucketing aggregations can have sub-aggregations (bucketing or metric). Th
|
|||
another higher-level aggregation).
|
||||
|
||||
[float]
|
||||
=== Structuring Aggregations
|
||||
== Structuring Aggregations
|
||||
|
||||
The following snippet captures the basic structure of aggregations:
|
||||
|
||||
|
@ -62,7 +67,7 @@ bucketing aggregation. For example, if you define a set of aggregations under th
|
|||
sub-aggregations will be computed for the range buckets that are defined.
|
||||
|
||||
[float]
|
||||
==== Values Source
|
||||
=== Values Source
|
||||
|
||||
Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
|
||||
a specific document field which is set using the `field` key for the aggregations. It is also possible to define a
|
||||
|
@ -89,142 +94,7 @@ perform optimizations when dealing with sorted values (for example, with the `mi
|
|||
sorted, Elasticsearch will skip the iterations over all the values and rely on the first value in the list to be the
|
||||
minimum value among all other values associated with the same document).
|
||||
|
||||
[float]
|
||||
=== Metrics Aggregations
|
||||
|
||||
The aggregations in this family compute metrics based on values extracted in one way or another from the documents that
|
||||
are being aggregated. The values are typically extracted from the fields of the document (using the field data), but
|
||||
can also be generated using scripts.
|
||||
|
||||
Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output
|
||||
a single numeric metric (e.g. `avg`) and are called `single-value numeric metrics aggregation`, others generate multiple
|
||||
metrics (e.g. `stats`) and are called `multi-value numeric metrics aggregation`. The distinction between single-value and
|
||||
multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some
|
||||
bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).
|
||||
|
||||
|
||||
[float]
|
||||
=== Bucket Aggregations
|
||||
|
||||
Bucket aggregations don't calculate metrics over fields like the metrics aggregations do, but instead, they create
|
||||
buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines
|
||||
whether or not a document in the current context "falls" into it. In other words, the buckets effectively define document
|
||||
sets. In addition to the buckets themselves, the `bucket` aggregations also compute and return the number of documents
|
||||
that "fell in" to each bucket.
|
||||
|
||||
Bucket aggregations, as opposed to `metrics` aggregations, can hold sub-aggregations. These sub-aggregations will be
|
||||
aggregated for the buckets created by their "parent" bucket aggregation.
|
||||
|
||||
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
|
||||
define a fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
|
||||
|
||||
[float]
|
||||
=== Reducer Aggregations
|
||||
|
||||
coming[2.0.0]
|
||||
|
||||
experimental[]
|
||||
|
||||
Reducer aggregations work on the outputs produced from other aggregations rather than from document sets, adding
|
||||
information to the output tree. There are many different types of reducer, each computing different information from
|
||||
other aggregations, but these types can be broken down into two families:
|
||||
|
||||
_Parent_::
|
||||
A family of reducer aggregations that is provided with the output of its parent aggregation and is able
|
||||
to compute new buckets or new aggregations to add to existing buckets.
|
||||
|
||||
_Sibling_::
|
||||
Reducer aggregations that are provided with the output of a sibling aggregation and are able to compute a
|
||||
new aggregation which will be at the same level as the sibling aggregation.
|
||||
|
||||
Reducer aggregations can reference the aggregations they need to perform their computation by using the `buckets_path`
|
||||
parameter to indicate the paths to the required metrics. The syntax for defining these paths can be found in the
|
||||
<<search-aggregations-bucket-terms-aggregation-order, terms aggregation order>> section.
|
||||
|
||||
?????? SHOULD THE SECTION ABOUT DEFINING AGGREGATION PATHS
|
||||
BE IN THIS PAGE AND REFERENCED FROM THE TERMS AGGREGATION DOCUMENTATION ???????
|
||||
|
||||
Reducer aggregations cannot have sub-aggregations but, depending on the type, they can reference another reducer in the `buckets_path`,
|
||||
allowing reducers to be chained.
|
||||
|
||||
NOTE: Because reducer aggregations only add to the output, when chaining reducer aggregations the output of each reducer will be
|
||||
included in the final output.
|
||||
|
||||
[float]
|
||||
=== Caching heavy aggregations
|
||||
|
||||
Frequently used aggregations (e.g. for display on the home page of a website)
|
||||
can be cached for faster responses. These cached results are the same results
|
||||
that would be returned by an uncached aggregation -- you will never get stale
|
||||
results.
|
||||
|
||||
See <<index-modules-shard-query-cache>> for more details.
|
||||
|
||||
[float]
|
||||
=== Returning only aggregation results
|
||||
|
||||
There are many occasions when aggregations are required but search hits are not. For these cases the hits can be ignored by
|
||||
setting `size=0`. For example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
|
||||
"size": 0,
|
||||
"aggregations": {
|
||||
"my_agg": {
|
||||
"terms": {
|
||||
"field": "text"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
'
|
||||
--------------------------------------------------
|
||||
|
||||
Setting `size` to `0` avoids executing the fetch phase of the search making the request more efficient.
|
||||
|
||||
[float]
|
||||
=== Metadata
|
||||
|
||||
You can associate a piece of metadata with individual aggregations at request time that will be returned in place
|
||||
at response time.
|
||||
|
||||
Consider this example where we want to associate the color blue with our `terms` aggregation.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
...
|
||||
"aggs": {
|
||||
"titles": {
|
||||
"terms": {
|
||||
"field": "title"
|
||||
},
|
||||
"meta": {
|
||||
"color": "blue"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Then that piece of metadata will be returned in place for our `titles` terms aggregation:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
...
|
||||
"aggregations": {
|
||||
"titles": {
|
||||
"meta": {
|
||||
"color" : "blue"
|
||||
},
|
||||
"buckets": [
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
--
|
||||
|
||||
include::aggregations/metrics.asciidoc[]
|
||||
|
||||
|
@ -232,3 +102,4 @@ include::aggregations/bucket.asciidoc[]
|
|||
|
||||
include::aggregations/reducer.asciidoc[]
|
||||
|
||||
include::aggregations/misc.asciidoc[]
|
|
@ -0,0 +1,49 @@
|
|||
[[search-aggregations-bucket]]
|
||||
== Bucket Aggregations
|
||||
|
||||
Bucket aggregations don't calculate metrics over fields like the metrics aggregations do, but instead, they create
|
||||
buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines
|
||||
whether or not a document in the current context "falls" into it. In other words, the buckets effectively define document
|
||||
sets. In addition to the buckets themselves, the `bucket` aggregations also compute and return the number of documents
|
||||
that "fell in" to each bucket.
|
||||
|
||||
Bucket aggregations, as opposed to `metrics` aggregations, can hold sub-aggregations. These sub-aggregations will be
|
||||
aggregated for the buckets created by their "parent" bucket aggregation.
|
||||
|
||||
There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
|
||||
define a fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
|
||||
|
||||
include::bucket/children-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/datehistogram-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/daterange-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/filter-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/filters-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/geodistance-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/geohashgrid-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/global-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/histogram-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/iprange-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/missing-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/nested-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/range-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/reverse-nested-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/sampler-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/significantterms-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/terms-aggregation.asciidoc[]
|
||||
|
|
@ -72,7 +72,7 @@ Response:
|
|||
The `shard_size` parameter limits how many top-scoring documents are collected in the sample processed on each shard.
|
||||
The default value is 100.
|
||||
|
||||
=== Controlling diversity
|
||||
==== Controlling diversity
|
||||
Optionally, you can use the `field` or `script` and `max_docs_per_value` settings to control the maximum number of documents collected on any one shard which share a common value.
|
||||
The choice of value (e.g. `author`) is loaded from a regular `field` or derived dynamically by a `script`.
|
||||
|
||||
|
@ -139,16 +139,16 @@ The default setting is to use `global_ordinals` if this information is available
|
|||
The `bytes_hash` setting may prove faster in some cases but introduces the possibility of false positives in de-duplication logic due to the possibility of hash collisions.
|
||||
Please note that Elasticsearch will ignore the choice of execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.
|
||||
|
||||
=== Limitations
|
||||
==== Limitations
|
||||
|
||||
==== Cannot be nested under `breadth_first` aggregations
|
||||
===== Cannot be nested under `breadth_first` aggregations
|
||||
Being a quality-based filter the sampler aggregation needs access to the relevance score produced for each document.
|
||||
It therefore cannot be nested under a `terms` aggregation which has the `collect_mode` switched from the default `depth_first` mode to `breadth_first` as this discards scores.
|
||||
In this situation an error will be thrown.
|
||||
|
||||
==== Limited de-dup logic.
|
||||
===== Limited de-dup logic.
|
||||
The de-duplication logic in the diversify settings applies only at a shard level so will not apply across shards.
|
||||
|
||||
==== No specialized syntax for geo/date fields
|
||||
===== No specialized syntax for geo/date fields
|
||||
Currently the syntax for defining the diversifying values is defined by a choice of `field` or `script` - there is no added syntactical sugar for expressing geo or date units such as "1w" (1 week).
|
||||
This support may be added in a later release and users will currently have to create these sorts of values using a script.
|
|
@ -0,0 +1,48 @@
|
|||
[[search-aggregations-metrics]]
|
||||
== Metrics Aggregations
|
||||
|
||||
The aggregations in this family compute metrics based on values extracted in one way or another from the documents that
|
||||
are being aggregated. The values are typically extracted from the fields of the document (using the field data), but
|
||||
can also be generated using scripts.
|
||||
|
||||
Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output
|
||||
a single numeric metric (e.g. `avg`) and are called `single-value numeric metrics aggregation`, others generate multiple
|
||||
metrics (e.g. `stats`) and are called `multi-value numeric metrics aggregation`. The distinction between single-value and
|
||||
multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some
|
||||
bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).
|
||||
|
||||
include::metrics/avg-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/cardinality-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/extendedstats-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/geobounds-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/max-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/min-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/percentile-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/percentile-rank-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/scripted-metric-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/stats-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/sum-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/tophits-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/valuecount-aggregation.asciidoc[]
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,76 @@
|
|||
|
||||
[[caching-heavy-aggregations]]
|
||||
== Caching heavy aggregations
|
||||
|
||||
Frequently used aggregations (e.g. for display on the home page of a website)
|
||||
can be cached for faster responses. These cached results are the same results
|
||||
that would be returned by an uncached aggregation -- you will never get stale
|
||||
results.
|
||||
|
||||
See <<index-modules-shard-query-cache>> for more details.
|
||||
|
||||
[[returning-only-agg-results]]
|
||||
== Returning only aggregation results
|
||||
|
||||
There are many occasions when aggregations are required but search hits are not. For these cases the hits can be ignored by
|
||||
setting `size=0`. For example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
  "size": 0,
  "aggregations": {
    "my_agg": {
      "terms": {
        "field": "text"
      }
    }
  }
}
'
|
||||
--------------------------------------------------
|
||||
|
||||
Setting `size` to `0` avoids executing the fetch phase of the search making the request more efficient.
|
||||
|
||||
[[agg-metadata]]
|
||||
== Aggregation Metadata
|
||||
|
||||
You can associate a piece of metadata with individual aggregations at request time that will be returned in place
|
||||
at response time.
|
||||
|
||||
Consider this example where we want to associate the color blue with our `terms` aggregation.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    ...
    "aggs": {
        "titles": {
            "terms": {
                "field": "title"
            },
            "meta": {
                "color": "blue"
            }
        }
    }
}
|
||||
--------------------------------------------------
|
||||
|
||||
Then that piece of metadata will be returned in place for our `titles` terms aggregation:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    ...
    "aggregations": {
        "titles": {
            "meta": {
                "color" : "blue"
            },
            "buckets": [
            ]
        }
    }
}
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,160 @@
|
|||
[[search-aggregations-reducer]]
|
||||
|
||||
== Reducer Aggregations
|
||||
|
||||
coming[2.0.0]
|
||||
|
||||
experimental[]
|
||||
|
||||
Reducer aggregations work on the outputs produced from other aggregations rather than from document sets, adding
|
||||
information to the output tree. There are many different types of reducer, each computing different information from
|
||||
other aggregations, but these types can be broken down into two families:
|
||||
|
||||
_Parent_::
|
||||
A family of reducer aggregations that is provided with the output of its parent aggregation and is able
|
||||
to compute new buckets or new aggregations to add to existing buckets.
|
||||
|
||||
_Sibling_::
|
||||
Reducer aggregations that are provided with the output of a sibling aggregation and are able to compute a
|
||||
new aggregation which will be at the same level as the sibling aggregation.
|
||||
|
||||
Reducer aggregations can reference the aggregations they need to perform their computation by using the `buckets_path`
|
||||
parameter to indicate the paths to the required metrics. The syntax for defining these paths can be found in the
|
||||
<<bucket-path-syntax, `buckets_path` Syntax>> section below.
|
||||
|
||||
Reducer aggregations cannot have sub-aggregations but, depending on the type, they can reference another reducer in the `buckets_path`,
|
||||
allowing reducers to be chained. For example, you can chain together two derivatives to calculate the second derivative
|
||||
(i.e. a derivative of a derivative).
|
||||
|
||||
NOTE: Because reducer aggregations only add to the output, when chaining reducer aggregations the output of each reducer will be
|
||||
included in the final output.
|
||||
|
||||
[[bucket-path-syntax]]
|
||||
[float]
|
||||
=== `buckets_path` Syntax
|
||||
|
||||
Most reducers require another aggregation as their input. The input aggregation is defined via the `buckets_path`
|
||||
parameter, which follows a specific format:
|
||||
|
||||
--------------------------------------------------
|
||||
AGG_SEPARATOR       :=  '>'
METRIC_SEPARATOR    :=  '.'
AGG_NAME            :=  <the name of the aggregation>
METRIC              :=  <the name of the metric (in case of multi-value metrics aggregation)>
PATH                :=  <AGG_NAME>[<AGG_SEPARATOR><AGG_NAME>]*[<METRIC_SEPARATOR><METRIC>]
|
||||
--------------------------------------------------
|
||||
|
||||
For example, the path `"my_bucket>my_stats.avg"` points to the `avg` value in the `"my_stats"` metric, which is
|
||||
contained in the `"my_bucket"` bucket aggregation.
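To make the grammar concrete, here is a minimal Python sketch that splits a `buckets_path` string into its aggregation names and optional metric. It is not the parser Elasticsearch uses; `parse_buckets_path` and `BucketsPath` are hypothetical names, and the sketch assumes that aggregation and metric names never contain the separator characters themselves.

```python
from typing import List, NamedTuple, Optional

AGG_SEPARATOR = ">"
METRIC_SEPARATOR = "."


class BucketsPath(NamedTuple):
    agg_names: List[str]    # aggregation names, outermost first
    metric: Optional[str]   # metric name for multi-value metrics, else None


def parse_buckets_path(path: str) -> BucketsPath:
    """Split a path of the form AGG_NAME[>AGG_NAME]*[.METRIC].

    Assumption: names do not themselves contain '>' or '.'.
    """
    # Everything after the first '.' is treated as the metric name.
    agg_part, sep, metric = path.partition(METRIC_SEPARATOR)
    agg_names = agg_part.split(AGG_SEPARATOR)
    if not all(agg_names):
        raise ValueError(f"empty aggregation name in path: {path!r}")
    if sep and not metric:
        raise ValueError(f"missing metric name in path: {path!r}")
    return BucketsPath(agg_names, metric if sep else None)


print(parse_buckets_path("my_bucket>my_stats.avg"))
print(parse_buckets_path("the_sum"))
```

Under this sketch the first call yields the aggregation names `my_bucket` and `my_stats` with the metric `avg`, while the second yields a single aggregation name and no metric.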
|
||||
|
||||
Paths are relative to the position of the reducer; they are not absolute paths, and the path cannot go back "up" the
|
||||
aggregation tree. For example, this moving average is embedded inside a date_histogram and refers to a "sibling"
|
||||
metric `"the_sum"`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    "my_date_histo":{
        "date_histogram":{
            "field":"timestamp",
            "interval":"day"
        },
        "aggs":{
            "the_sum":{
                "sum":{ "field": "lemmings" } <1>
            },
            "the_movavg":{
                "moving_avg":{ "buckets_path": "the_sum" } <2>
            }
        }
    }
}
|
||||
--------------------------------------------------
|
||||
<1> The metric is called `"the_sum"`
|
||||
<2> The `buckets_path` refers to the metric via a relative path `"the_sum"`
|
||||
|
||||
`buckets_path` is also used for Sibling reducer aggregations, where the aggregation is "next" to a series of buckets
|
||||
instead of embedded "inside" them. For example, the `max_bucket` aggregation uses the `buckets_path` to specify
|
||||
a metric embedded inside a sibling aggregation:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "max_monthly_sales": {
            "max_bucket": {
                "buckets_path": "sales_per_month>sales" <1>
            }
        }
    }
}
|
||||
--------------------------------------------------
|
||||
<1> `buckets_path` instructs this `max_bucket` aggregation that we want the maximum value of the `sales` aggregation in the
|
||||
`sales_per_month` date histogram.
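The effect of a sibling reducer such as `max_bucket` can be approximated client-side. The Python sketch below walks a list of buckets shaped like a `date_histogram` response and returns the maximum metric value together with the key(s) of the bucket(s) holding it. The function name, the bucket data, and the choice to skip buckets that lack the metric are all assumptions made for illustration; this is not the Elasticsearch implementation.

```python
from typing import Dict, List, Optional, Tuple


def max_bucket(buckets: List[Dict], metric_name: str) -> Tuple[Optional[float], List]:
    """Return (maximum value, keys of every bucket holding that value)."""
    best: Optional[float] = None
    keys: List = []
    for bucket in buckets:
        value = bucket.get(metric_name, {}).get("value")
        if value is None:
            continue  # assumption: buckets without the metric are skipped
        if best is None or value > best:
            best, keys = value, [bucket["key"]]
        elif value == best:
            keys.append(bucket["key"])  # ties: report every matching key
    return best, keys


# Hand-written buckets; keys and values are made up for illustration.
sales_per_month = [
    {"key": "2015-01", "sales": {"value": 550.0}},
    {"key": "2015-02", "sales": {"value": 60.0}},
    {"key": "2015-03", "sales": {"value": 550.0}},
]

print(max_bucket(sales_per_month, "sales"))  # (550.0, ['2015-01', '2015-03'])
```

Because the result is computed from the finished buckets rather than from documents, it sits next to the histogram in the output instead of inside each bucket.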
|
||||
|
||||
[float]
|
||||
==== Special Paths
|
||||
|
||||
Instead of pathing to a metric, `buckets_path` can use a special `"_count"` path. This instructs
|
||||
the reducer to use the document count as its input. For example, a moving average can be calculated on the document
|
||||
count of each bucket, instead of a specific metric:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    "my_date_histo":{
        "date_histogram":{
            "field":"timestamp",
            "interval":"day"
        },
        "aggs":{
            "the_movavg":{
                "moving_avg":{ "buckets_path": "_count" } <1>
            }
        }
    }
}
|
||||
--------------------------------------------------
|
||||
<1> By using `_count` instead of a metric name, we can calculate the moving average of document counts in the histogram
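To show what a moving average over document counts amounts to, the following Python sketch takes the `doc_count` of each histogram bucket and computes an unweighted mean over a trailing window. The function name and bucket data are hypothetical, and the window here ends at (and includes) the current bucket, which is an assumption of this sketch rather than a statement about how the `simple` model treats window edges.

```python
from typing import Dict, List


def simple_moving_avg(values: List[float], window: int = 5) -> List[float]:
    """Unweighted mean over a trailing window ending at each position.

    Assumption: the window includes the current value and is simply
    shorter near the start of the series.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hand-written buckets; only `doc_count` is used, mirroring the `_count` path.
buckets: List[Dict] = [
    {"key": 1, "doc_count": 2},
    {"key": 2, "doc_count": 4},
    {"key": 3, "doc_count": 6},
    {"key": 4, "doc_count": 8},
]
doc_counts = [b["doc_count"] for b in buckets]

print(simple_moving_avg(doc_counts, window=2))  # [2.0, 3.0, 5.0, 7.0]
```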
|
||||
|
||||
|
||||
[float]
|
||||
=== Dealing with gaps in the data
|
||||
|
||||
There are a couple of reasons why the data output by the enclosing histogram may have gaps:
|
||||
|
||||
* There are no documents matching the query for some buckets
|
||||
* The data for a metric is missing in all of the documents falling into a bucket (this is most likely with either a small interval
|
||||
on the enclosing histogram or with a query matching only a small number of documents)
|
||||
|
||||
Where there is no data available in a bucket for a given metric it presents a problem for calculating the derivative value for both
|
||||
the current bucket and the next bucket. The derivative reducer aggregation has a `gap_policy` parameter to define what the behavior
|
||||
should be when a gap in the data is found. There are currently two options for controlling the gap policy:
|
||||
|
||||
_ignore_::
|
||||
This option will not produce a derivative value for any buckets where the value in the current or previous bucket is
|
||||
missing
|
||||
|
||||
_insert_zeros_::
|
||||
This option will assume the missing value is `0` and calculate the derivative with the value `0`.
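The two gap policies can be illustrated with a small Python sketch that computes a first-order difference over per-bucket values, using `None` to mark a bucket with no data. It follows the descriptions above literally and is not the Elasticsearch implementation; the function name and sample values are hypothetical.

```python
from typing import List, Optional


def derivative(values: List[Optional[float]],
               gap_policy: str = "ignore") -> List[Optional[float]]:
    """Difference between consecutive bucket values; None marks a gap."""
    if gap_policy not in ("ignore", "insert_zeros"):
        raise ValueError(f"unknown gap_policy: {gap_policy!r}")
    if not values:
        return []
    if gap_policy == "insert_zeros":
        # Treat every missing value as 0 before differencing.
        values = [0.0 if v is None else v for v in values]
    out: List[Optional[float]] = [None]  # first bucket has no previous value
    for prev, cur in zip(values, values[1:]):
        if prev is None or cur is None:
            out.append(None)  # "ignore": no derivative next to a gap
        else:
            out.append(cur - prev)
    return out


series = [10.0, None, 30.0, 45.0]  # made-up values with one gap

print(derivative(series, "ignore"))        # [None, None, None, 15.0]
print(derivative(series, "insert_zeros"))  # [None, -10.0, 30.0, 15.0]
```

Note how a single gap suppresses two derivative values under `ignore`, one for the empty bucket and one for the bucket after it, whereas `insert_zeros` keeps every value but introduces a sharp dip and recovery.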
|
||||
|
||||
|
||||
|
||||
|
||||
include::reducer/derivative-aggregation.asciidoc[]
|
||||
include::reducer/max-bucket-aggregation.asciidoc[]
|
||||
include::reducer/min-bucket-aggregation.asciidoc[]
|
||||
include::reducer/movavg-aggregation.asciidoc[]
|
|
@ -5,6 +5,28 @@ A parent reducer aggregation which calculates the derivative of a specified metr
|
|||
aggregation. The specified metric must be numeric and the enclosing histogram must have `min_doc_count` set to `0` (default
|
||||
for `histogram` aggregations).
|
||||
|
||||
==== Syntax
|
||||
|
||||
A `derivative` aggregation looks like this in isolation:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    "derivative": {
        "buckets_path": "the_sum"
    }
}
|
||||
--------------------------------------------------
|
||||
|
||||
.`derivative` Parameters
|
||||
|===
|
||||
|Parameter Name |Description |Required |Default Value
|
||||
|`buckets_path` |Path to the metric of interest (see <<bucket-path-syntax, `buckets_path` Syntax>> for more details) |Required |
|
||||
|===
|
||||
|
||||
|
||||
==== First Order Derivative
|
||||
|
||||
The following snippet calculates the derivative of the total monthly `sales`:
|
||||
|
||||
[source,js]
|
||||
|
@ -82,7 +104,7 @@ And the following may be the response:
|
|||
<1> No derivative for the first bucket since we need at least 2 data points to calculate the derivative
|
||||
<2> Derivative value units are implicitly defined by the `sales` aggregation and the parent histogram so in this case the units
|
||||
would be $/month assuming the `price` field has units of $.
|
||||
<3> The number of documents in the bucket are represented by the `doc_count` value
|
||||
<3> The number of documents in the bucket is represented by the `doc_count` value
|
||||
|
||||
==== Second Order Derivative
|
||||
|
||||
|
@ -172,23 +194,3 @@ And the following may be the response:
|
|||
<1> No second derivative for the first two buckets since we need at least 2 data points from the first derivative to calculate the
|
||||
second derivative
|
||||
|
||||
==== Dealing with gaps in the data
|
||||
|
||||
There are a couple of reasons why the data output by the enclosing histogram may have gaps:
|
||||
|
||||
* There are no documents matching the query for some buckets
|
||||
* The data for a metric is missing in all of the documents falling into a bucket (this is most likely with either a small interval
|
||||
on the enclosing histogram or with a query matching only a small number of documents)
|
||||
|
||||
Where there is no data available in a bucket for a given metric it presents a problem for calculating the derivative value for both
|
||||
the current bucket and the next bucket. The derivative reducer aggregation has a `gap_policy` parameter to define what the behavior
|
||||
should be when a gap in the data is found. There are currently two options for controlling the gap policy:
|
||||
|
||||
_ignore_::
|
||||
This option will not produce a derivative value for any buckets where the value in the current or previous bucket is
|
||||
missing
|
||||
|
||||
_insert_zeros_::
|
||||
This option will assume the missing value is `0` and calculate the derivative with the value `0`.
|
||||
|
||||
|
|
@ -5,6 +5,26 @@ A sibling reducer aggregation which identifies the bucket(s) with the maximum va
|
|||
and outputs both the value and the key(s) of the bucket(s). The specified metric must be numeric and the sibling aggregation must
|
||||
be a multi-bucket aggregation.
|
||||
|
||||
==== Syntax
|
||||
|
||||
A `max_bucket` aggregation looks like this in isolation:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    "max_bucket": {
        "buckets_path": "the_sum"
    }
}
|
||||
--------------------------------------------------
|
||||
|
||||
.`max_bucket` Parameters
|
||||
|===
|
||||
|Parameter Name |Description |Required |Default Value
|
||||
|`buckets_path` |The path to the buckets we wish to find the maximum for (see <<bucket-path-syntax>> for more
|
||||
details) |Required |
|
||||
|===
|
||||
|
||||
The following snippet calculates the maximum of the total monthly `sales`:
|
||||
|
||||
[source,js]
|
||||
|
@ -32,7 +52,6 @@ The following snippet calculates the maximum of the total monthly `sales`:
|
|||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
<1> `buckets_path` instructs this `max_bucket` aggregation that we want the maximum value of the `sales` aggregation in the
|
||||
`sales_per_month` date histogram.
|
||||
|
|
@ -5,6 +5,26 @@ A sibling reducer aggregation which identifies the bucket(s) with the minimum va
|
|||
and outputs both the value and the key(s) of the bucket(s). The specified metric must be numeric and the sibling aggregation must
|
||||
be a multi-bucket aggregation.
|
||||
|
||||
==== Syntax
|
||||
|
||||
A `min_bucket` aggregation looks like this in isolation:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
    "min_bucket": {
        "buckets_path": "the_sum"
    }
}
|
||||
--------------------------------------------------
|
||||
|
||||
.`min_bucket` Parameters
|
||||
|===
|
||||
|Parameter Name |Description |Required |Default Value
|
||||
|`buckets_path` |Path to the metric of interest (see <<bucket-path-syntax, `buckets_path` Syntax>> for more details) |Required |
|
||||
|===
|
||||
|
||||
|
||||
The following snippet calculates the minimum of the total monthly `sales`:
|
||||
|
||||
[source,js]
|
|
@ -35,16 +35,14 @@ A `moving_avg` aggregation looks like this in isolation:
|
|||
|
||||
.`moving_avg` Parameters
|
||||
|===
|
||||
|Parameter Name |Description |Required |Default
|
||||
|
||||
|`buckets_path` |The path to the metric that we wish to calculate a moving average for |Required |
|
||||
|Parameter Name |Description |Required |Default Value
|
||||
|`buckets_path` |Path to the metric of interest (see <<bucket-path-syntax, `buckets_path` Syntax>> for more details) |Required |
|
||||
|`model` |The moving average weighting model that we wish to use |Optional |`simple`
|
||||
|`gap_policy` |Determines what should happen when a gap in the data is encountered. |Optional |`insert_zeros`
|
||||
|`window` |The size of window to "slide" across the histogram. |Optional |`5`
|
||||
|`settings` |Model-specific settings, contents which differ depending on the model specified. |Optional |
|
||||
|===
|
||||
|
||||
|
||||
`moving_avg` aggregations must be embedded inside of a `histogram` or `date_histogram` aggregation. They can be
|
||||
embedded like any other metric aggregation:
|
||||
|
||||
|
@ -73,27 +71,9 @@ embedded like any other metric aggregation:
|
|||
|
||||
Moving averages are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally
|
||||
add normal metrics, such as a `sum`, inside of that histogram. Finally, the `moving_avg` is embedded inside the histogram.
|
||||
The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram.
|
||||
The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see
|
||||
<<bucket-path-syntax>> for a description of the syntax for `buckets_path`).
|
||||
|
||||
A moving average can also be calculated on the document count of each bucket, instead of a metric:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"my_date_histo":{
|
||||
"date_histogram":{
|
||||
"field":"timestamp",
|
||||
"interval":"day"
|
||||
},
|
||||
"aggs":{
|
||||
"the_movavg":{
|
||||
"moving_avg":{ "buckets_path": "_count" } <1>
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
<1> By using `_count` instead of a metric name, we can calculate the moving average of document counts in the histogram
|
||||
|
||||
==== Models
|
||||
|
||||
|
@ -250,7 +230,7 @@ image::images/reducers_movavg/double_0.2beta.png[]
|
|||
.Double Exponential moving average with window of size 100, alpha = 0.5, beta = 0.7
|
||||
image::images/reducers_movavg/double_0.7beta.png[]
|
||||
|
||||
=== Prediction
|
||||
==== Prediction
|
||||
|
||||
All the moving average models support a "prediction" mode, which will attempt to extrapolate into the future given the
|
||||
current smoothed, moving average. Depending on the model and parameter, these predictions may or may not be accurate.
|
|
@ -18,6 +18,8 @@ include::docs.asciidoc[]
|
|||
|
||||
include::search.asciidoc[]
|
||||
|
||||
include::aggregations.asciidoc[]
|
||||
|
||||
include::indices.asciidoc[]
|
||||
|
||||
include::cat.asciidoc[]
|
||||
|
|
|
@ -85,8 +85,6 @@ include::search/search-template.asciidoc[]
|
|||
|
||||
include::search/search-shards.asciidoc[]
|
||||
|
||||
include::search/aggregations.asciidoc[]
|
||||
|
||||
include::search/facets.asciidoc[]
|
||||
|
||||
include::search/suggesters.asciidoc[]
|
||||
|
|
|
@ -1,33 +0,0 @@
|
|||
[[search-aggregations-bucket]]
|
||||
|
||||
include::bucket/global-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/filter-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/filters-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/missing-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/nested-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/reverse-nested-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/children-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/terms-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/significantterms-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/range-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/daterange-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/iprange-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/histogram-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/datehistogram-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/geodistance-aggregation.asciidoc[]
|
||||
|
||||
include::bucket/geohashgrid-aggregation.asciidoc[]
|
|
@ -1,27 +0,0 @@
|
|||
[[search-aggregations-metrics]]
|
||||
|
||||
include::metrics/min-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/max-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/sum-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/avg-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/stats-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/extendedstats-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/valuecount-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/percentile-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/percentile-rank-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/cardinality-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/geobounds-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/tophits-aggregation.asciidoc[]
|
||||
|
||||
include::metrics/scripted-metric-aggregation.asciidoc[]
|
|
@ -1,6 +0,0 @@
|
|||
[[search-aggregations-reducer]]
|
||||
|
||||
include::reducer/derivative-aggregation.asciidoc[]
|
||||
include::reducer/max-bucket-aggregation.asciidoc[]
|
||||
include::reducer/min-bucket-aggregation.asciidoc[]
|
||||
include::reducer/movavg-aggregation.asciidoc[]
|