OpenSearch/docs/reference/aggregations.asciidoc

435 lines
9.9 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[search-aggregations]]
= Aggregations
[partintro]
--
An aggregation summarizes your data as metrics, statistics, or other analytics.
Aggregations help you answer questions like:
* What's the average load time for my website?
* Who are my most valuable customers based on transaction volume?
* What would be considered a large file on my network?
* How many products are in each product category?
{es} organizes aggregations into three categories:
* <<search-aggregations-metrics,Metric>> aggregations that calculate metrics,
such as a sum or average, from field values.
* <<search-aggregations-bucket,Bucket>> aggregations that
group documents into buckets, also called bins, based on field values, ranges,
or other criteria.
* <<search-aggregations-pipeline,Pipeline>> aggregations that take input from
other aggregations instead of documents or fields.
[discrete]
[[run-an-agg]]
=== Run an aggregation
You can run aggregations as part of a <<search-your-data,search>> by specifying the <<search-search,search API>>'s `aggs` parameter. The
following search runs a
<<search-aggregations-bucket-terms-aggregation,terms aggregation>> on
`my-field`:
[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"terms": {
"field": "my-field"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]
Aggregation results are in the response's `aggregations` object:
[source,console-result]
----
{
"took": 78,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": 1.0,
"hits": [...]
},
"aggregations": {
"my-agg-name": { <1>
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
----
// TESTRESPONSE[s/"took": 78/"took": "$body.took"/]
// TESTRESPONSE[s/\.\.\.$/"took": "$body.took", "timed_out": false, "_shards": "$body._shards", /]
// TESTRESPONSE[s/"hits": \[\.\.\.\]/"hits": "$body.hits.hits"/]
// TESTRESPONSE[s/"buckets": \[\]/"buckets":\[\{"key":"get","doc_count":5\}\]/]
<1> Results for the `my-agg-name` aggregation.
[discrete]
[[change-agg-scope]]
=== Change an aggregation's scope
Use the `query` parameter to limit the documents on which an aggregation runs:
[source,console]
----
GET /my-index-000001/_search
{
"query": {
"range": {
"@timestamp": {
"gte": "now-1d/d",
"lt": "now/d"
}
}
},
"aggs": {
"my-agg-name": {
"terms": {
"field": "my-field"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]
[discrete]
[[return-only-agg-results]]
=== Return only aggregation results
By default, searches containing an aggregation return both search hits and
aggregation results. To return only aggregation results, set `size` to `0`:
[source,console]
----
GET /my-index-000001/_search
{
"size": 0,
"aggs": {
"my-agg-name": {
"terms": {
"field": "my-field"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]
[discrete]
[[run-multiple-aggs]]
=== Run multiple aggregations
You can specify multiple aggregations in the same request:
[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-first-agg-name": {
"terms": {
"field": "my-field"
}
},
"my-second-agg-name": {
"avg": {
"field": "my-other-field"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]
// TEST[s/my-other-field/http.response.bytes/]
[discrete]
[[run-sub-aggs]]
=== Run sub-aggregations
Bucket aggregations support bucket or metric sub-aggregations. For example, a
terms aggregation with an <<search-aggregations-metrics-avg-aggregation,avg>>
sub-aggregation calculates an average value for each bucket of documents. There
is no level or depth limit for nesting sub-aggregations.
[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"terms": {
"field": "my-field"
},
"aggs": {
"my-sub-agg-name": {
"avg": {
"field": "my-other-field"
}
}
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/_search/_search?size=0/]
// TEST[s/my-field/http.request.method/]
// TEST[s/my-other-field/http.response.bytes/]
The response nests sub-aggregation results under their parent aggregation:
[source,console-result]
----
{
...
"aggregations": {
"my-agg-name": { <1>
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "foo",
"doc_count": 5,
"my-sub-agg-name": { <2>
"value": 75.0
}
}
]
}
}
}
----
// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits",/]
// TESTRESPONSE[s/"key": "foo"/"key": "get"/]
// TESTRESPONSE[s/"value": 75.0/"value": $body.aggregations.my-agg-name.buckets.0.my-sub-agg-name.value/]
<1> Results for the parent aggregation, `my-agg-name`.
<2> Results for `my-agg-name`'s sub-aggregation, `my-sub-agg-name`.
[discrete]
[[add-metadata-to-an-agg]]
=== Add custom metadata
Use the `meta` object to associate custom metadata with an aggregation:
[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"terms": {
"field": "my-field"
},
"meta": {
"my-metadata-field": "foo"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/_search/_search?size=0/]
The response returns the `meta` object in place:
[source,console-result]
----
{
...
"aggregations": {
"my-agg-name": {
"meta": {
"my-metadata-field": "foo"
},
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
----
// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits",/]
[discrete]
[[return-agg-type]]
=== Return the aggregation type
By default, aggregation results include the aggregation's name but not its type.
To return the aggregation type, use the `typed_keys` query parameter.
[source,console]
----
GET /my-index-000001/_search?typed_keys
{
"aggs": {
"my-agg-name": {
"histogram": {
"field": "my-field",
"interval": 1000
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/typed_keys/typed_keys&size=0/]
// TEST[s/my-field/http.response.bytes/]
The response returns the aggregation type as a prefix to the aggregation's name.
IMPORTANT: Some aggregations return a different aggregation type from the
type in the request. For example, the terms,
<<search-aggregations-bucket-significantterms-aggregation,significant terms>>,
and <<search-aggregations-metrics-percentile-aggregation,percentiles>>
aggregations return different aggregations types depending on the data type of
the aggregated field.
[source,console-result]
----
{
...
"aggregations": {
"histogram#my-agg-name": { <1>
"buckets": []
}
}
}
----
// TESTRESPONSE[s/\.\.\./"took": "$body.took", "timed_out": false, "_shards": "$body._shards", "hits": "$body.hits",/]
// TESTRESPONSE[s/"buckets": \[\]/"buckets":\[\{"key":1070000.0,"doc_count":5\}\]/]
<1> The aggregation type, `histogram`, followed by a `#` separator and the aggregation's name, `my-agg-name`.
[discrete]
[[use-scripts-in-an-agg]]
=== Use scripts in an aggregation
Some aggregations support <<modules-scripting,scripts>>. You can
use a `script` to extract or generate values for the aggregation:
[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"histogram": {
"interval": 1000,
"script": {
"source": "doc['my-field'].value.length()"
}
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.request.method/]
If you also specify a `field`, the `script` modifies the field values used in
the aggregation. The following aggregation uses a script to modify `my-field`
values:
[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"histogram": {
"field": "my-field",
"interval": 1000,
"script": "_value / 1000"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.response.bytes/]
Some aggregations only work on specific data types. Use the `value_type`
parameter to specify a data type for a script-generated value or an unmapped
field. `value_type` accepts the following values:
* `boolean`
* `date`
* `double`, used for all floating-point numbers
* `long`, used for all integers
* `ip`
* `string`
[source,console]
----
GET /my-index-000001/_search
{
"aggs": {
"my-agg-name": {
"histogram": {
"field": "my-field",
"interval": 1000,
"script": "_value / 1000",
"value_type": "long"
}
}
}
}
----
// TEST[setup:my_index]
// TEST[s/my-field/http.response.bytes/]
[discrete]
[[agg-caches]]
=== Aggregation caches
For faster responses, {es} caches the results of frequently run aggregations in
the <<shard-request-cache,shard request cache>>. To get cached results, use the
same <<shard-and-node-preference,`preference` string>> for each search. If you
don't need search hits, <<return-only-agg-results,set `size` to `0`>> to avoid
filling the cache.
{es} routes searches with the same preference string to the same shards. If the
shards' data doesnt change between searches, the shards return cached
aggregation results.
[discrete]
[[limits-for-long-values]]
=== Limits for `long` values
When running aggregations, {es} uses <<number,`double`>> values to hold and
represent numeric data. As a result, aggregations on <<number,`long`>> numbers
greater than +2^53^+ are approximate.
--
include::aggregations/bucket.asciidoc[]
include::aggregations/metrics.asciidoc[]
include::aggregations/pipeline.asciidoc[]