mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-04-17 12:50:32 +00:00
This changes the `top_metrics` aggregation to return metrics in their original type. Since it only supports numerics, that means that dates, longs, and doubles will come back as stored, with their appropriate formatter applied.
405 lines
8.7 KiB
Plaintext
405 lines
8.7 KiB
Plaintext
[role="xpack"]
|
|
[testenv="basic"]
|
|
[[search-aggregations-metrics-top-metrics]]
|
|
=== Top Metrics Aggregation
|
|
|
|
experimental[We expect to change the response format of this aggregation as we add more features., https://github.com/elastic/elasticsearch/issues/51813]
|
|
|
|
The `top_metrics` aggregation selects metrics from the document with the largest or smallest "sort"
|
|
value. For example, This gets the value of the `v` field on the document with the largest value of `s`:
|
|
|
|
[source,console,id=search-aggregations-metrics-top-metrics-simple]
|
|
----
|
|
POST /test/_bulk?refresh
|
|
{"index": {}}
|
|
{"s": 1, "v": 3.1415}
|
|
{"index": {}}
|
|
{"s": 2, "v": 1.0}
|
|
{"index": {}}
|
|
{"s": 3, "v": 2.71828}
|
|
POST /test/_search?filter_path=aggregations
|
|
{
|
|
"aggs": {
|
|
"tm": {
|
|
"top_metrics": {
|
|
"metrics": {"field": "v"},
|
|
"sort": {"s": "desc"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
Which returns:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"aggregations": {
|
|
"tm": {
|
|
"top": [ {"sort": [3], "metrics": {"v": 2.718280076980591 } } ]
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE
|
|
|
|
`top_metrics` is fairly similar to <<search-aggregations-metrics-top-hits-aggregation, `top_hits`>>
|
|
in spirit but because it is more limited it is able to do its job using less memory and is often
|
|
faster.
|
|
|
|
==== `sort`
|
|
|
|
The `sort` field in the metric request functions exactly the same as the `sort` field in the
|
|
<<request-body-search-sort, search>> request except:
|
|
* It can't be used on <<binary,binary>>, <<flattened,flattened>, <<ip,ip>>,
|
|
<<keyword,keyword>>, or <<text,text>> fields.
|
|
* It only supports a single sort value.
|
|
|
|
The metrics that the aggregation returns is the first hit that would be returned by the search
|
|
request. So,
|
|
|
|
`"sort": {"s": "desc"}`:: gets metrics from the document with the highest `s`
|
|
`"sort": {"s": "asc"}`:: gets the metrics from the document with the lowest `s`
|
|
`"sort": {"_geo_distance": {"location": "35.7796, -78.6382"}}`::
|
|
gets metrics from the documents with `location` *closest* to `35.7796, -78.6382`
|
|
`"sort": "_score"`:: gets metrics from the document with the highest score
|
|
|
|
NOTE: This aggregation doesn't support any sort of "tie breaking". If two documents have
|
|
the same sort values then this aggregation could return either document's fields.
|
|
|
|
==== `metrics`
|
|
|
|
`metrics` selects the fields to of the "top" document to return.
|
|
|
|
You can return multiple metrics by providing a list:
|
|
|
|
[source,console,id=search-aggregations-metrics-top-metrics-list-of-metrics]
|
|
----
|
|
PUT /test
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"d": {"type": "date"}
|
|
}
|
|
}
|
|
}
|
|
POST /test/_bulk?refresh
|
|
{"index": {}}
|
|
{"s": 1, "v": 3.1415, "m": 1, "d": "2020-01-01T00:12:12Z"}
|
|
{"index": {}}
|
|
{"s": 2, "v": 1.0, "m": 6, "d": "2020-01-02T00:12:12Z"}
|
|
{"index": {}}
|
|
{"s": 3, "v": 2.71828, "m": -12, "d": "2019-12-31T00:12:12Z"}
|
|
POST /test/_search?filter_path=aggregations
|
|
{
|
|
"aggs": {
|
|
"tm": {
|
|
"top_metrics": {
|
|
"metrics": [
|
|
{"field": "v"},
|
|
{"field": "m"},
|
|
{"field": "d"}
|
|
],
|
|
"sort": {"s": "desc"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
Which returns:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"aggregations": {
|
|
"tm": {
|
|
"top": [ {
|
|
"sort": [3],
|
|
"metrics": {
|
|
"v": 2.718280076980591,
|
|
"m": -12,
|
|
"d": "2019-12-31T00:12:12.000Z"
|
|
}
|
|
} ]
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE
|
|
|
|
==== `size`
|
|
|
|
`top_metrics` can return the top few document's worth of metrics using the size parameter:
|
|
|
|
[source,console,id=search-aggregations-metrics-top-metrics-size]
|
|
----
|
|
POST /test/_bulk?refresh
|
|
{"index": {}}
|
|
{"s": 1, "v": 3.1415}
|
|
{"index": {}}
|
|
{"s": 2, "v": 1.0}
|
|
{"index": {}}
|
|
{"s": 3, "v": 2.71828}
|
|
POST /test/_search?filter_path=aggregations
|
|
{
|
|
"aggs": {
|
|
"tm": {
|
|
"top_metrics": {
|
|
"metrics": {"field": "v"},
|
|
"sort": {"s": "desc"},
|
|
"size": 2
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
Which returns:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"aggregations": {
|
|
"tm": {
|
|
"top": [
|
|
{"sort": [3], "metrics": {"v": 2.718280076980591 } },
|
|
{"sort": [2], "metrics": {"v": 1.0 } }
|
|
]
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE
|
|
|
|
The default `size` is 1. The maximum default size is `10` because the aggregation's
|
|
working storage is "dense", meaning we allocate `size` slots for every bucket. `10`
|
|
is a *very* conservative default maximum and you can raise it if you need to by
|
|
changing the `top_metrics_max_size` index setting. But know that large sizes can
|
|
take a fair bit of memory, especially if they are inside of an aggregation which
|
|
makes many buckes like a large
|
|
<<search-aggregations-metrics-top-metrics-example-terms, terms aggregation>>.
|
|
|
|
[source,console]
|
|
----
|
|
PUT /test/_settings
|
|
{
|
|
"top_metrics_max_size": 100
|
|
}
|
|
----
|
|
// TEST[continued]
|
|
|
|
NOTE: If `size` is more than `1` the `top_metrics` aggregation can't be the target of a sort.
|
|
|
|
|
|
==== Examples
|
|
|
|
[[search-aggregations-metrics-top-metrics-example-terms]]
|
|
===== Use with terms
|
|
|
|
This aggregation should be quite useful inside of <<search-aggregations-bucket-terms-aggregation, `terms`>>
|
|
aggregation, to, say, find the last value reported by each server.
|
|
|
|
[source,console,id=search-aggregations-metrics-top-metrics-terms]
|
|
----
|
|
PUT /node
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"ip": {"type": "ip"},
|
|
"date": {"type": "date"}
|
|
}
|
|
}
|
|
}
|
|
POST /node/_bulk?refresh
|
|
{"index": {}}
|
|
{"ip": "192.168.0.1", "date": "2020-01-01T01:01:01", "v": 1}
|
|
{"index": {}}
|
|
{"ip": "192.168.0.1", "date": "2020-01-01T02:01:01", "v": 2}
|
|
{"index": {}}
|
|
{"ip": "192.168.0.2", "date": "2020-01-01T02:01:01", "v": 3}
|
|
POST /node/_search?filter_path=aggregations
|
|
{
|
|
"aggs": {
|
|
"ip": {
|
|
"terms": {
|
|
"field": "ip"
|
|
},
|
|
"aggs": {
|
|
"tm": {
|
|
"top_metrics": {
|
|
"metrics": {"field": "v"},
|
|
"sort": {"date": "desc"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
Which returns:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"aggregations": {
|
|
"ip": {
|
|
"buckets": [
|
|
{
|
|
"key": "192.168.0.1",
|
|
"doc_count": 2,
|
|
"tm": {
|
|
"top": [ {"sort": ["2020-01-01T02:01:01.000Z"], "metrics": {"v": 2 } } ]
|
|
}
|
|
},
|
|
{
|
|
"key": "192.168.0.2",
|
|
"doc_count": 1,
|
|
"tm": {
|
|
"top": [ {"sort": ["2020-01-01T02:01:01.000Z"], "metrics": {"v": 3 } } ]
|
|
}
|
|
}
|
|
],
|
|
"doc_count_error_upper_bound": 0,
|
|
"sum_other_doc_count": 0
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE
|
|
|
|
Unlike `top_hits`, you can sort buckets by the results of this metric:
|
|
|
|
[source,console]
|
|
----
|
|
POST /node/_search?filter_path=aggregations
|
|
{
|
|
"aggs": {
|
|
"ip": {
|
|
"terms": {
|
|
"field": "ip",
|
|
"order": {"tm.v": "desc"}
|
|
},
|
|
"aggs": {
|
|
"tm": {
|
|
"top_metrics": {
|
|
"metrics": {"field": "v"},
|
|
"sort": {"date": "desc"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[continued]
|
|
|
|
Which returns:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"aggregations": {
|
|
"ip": {
|
|
"buckets": [
|
|
{
|
|
"key": "192.168.0.2",
|
|
"doc_count": 1,
|
|
"tm": {
|
|
"top": [ {"sort": ["2020-01-01T02:01:01.000Z"], "metrics": {"v": 3 } } ]
|
|
}
|
|
},
|
|
{
|
|
"key": "192.168.0.1",
|
|
"doc_count": 2,
|
|
"tm": {
|
|
"top": [ {"sort": ["2020-01-01T02:01:01.000Z"], "metrics": {"v": 2 } } ]
|
|
}
|
|
}
|
|
],
|
|
"doc_count_error_upper_bound": 0,
|
|
"sum_other_doc_count": 0
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE
|
|
|
|
===== Mixed sort types
|
|
|
|
Sorting `top_metrics` by a field that has different types across different
|
|
indices producs somewhat suprising results: floating point fields are
|
|
always sorted independantly of whole numbered fields.
|
|
|
|
[source,console,id=search-aggregations-metrics-top-metrics-mixed-sort]
|
|
----
|
|
POST /test/_bulk?refresh
|
|
{"index": {"_index": "test1"}}
|
|
{"s": 1, "v": 3.1415}
|
|
{"index": {"_index": "test1"}}
|
|
{"s": 2, "v": 1}
|
|
{"index": {"_index": "test2"}}
|
|
{"s": 3.1, "v": 2.71828}
|
|
POST /test*/_search?filter_path=aggregations
|
|
{
|
|
"aggs": {
|
|
"tm": {
|
|
"top_metrics": {
|
|
"metrics": {"field": "v"},
|
|
"sort": {"s": "asc"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
Which returns:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"aggregations": {
|
|
"tm": {
|
|
"top": [ {"sort": [3.0999999046325684], "metrics": {"v": 2.718280076980591 } } ]
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE
|
|
|
|
While this is better than an error it *probably* isn't what you were going for.
|
|
While it does lose some precision, you can explictly cast the whole number
|
|
fields to floating points with something like:
|
|
|
|
[source,console]
|
|
----
|
|
POST /test*/_search?filter_path=aggregations
|
|
{
|
|
"aggs": {
|
|
"tm": {
|
|
"top_metrics": {
|
|
"metrics": {"field": "v"},
|
|
"sort": {"s": {"order": "asc", "numeric_type": "double"}}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[continued]
|
|
|
|
Which returns the much more expected:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"aggregations": {
|
|
"tm": {
|
|
"top": [ {"sort": [1.0], "metrics": {"v": 3.1414999961853027 } } ]
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTRESPONSE
|