---
layout: doc_page
---
TopN queries
==================
TopN queries return a sorted set of results for the values in a given dimension according to some criteria. Conceptually, they can be thought of as an approximate [GroupByQuery](GroupByQuery.html) over a single dimension with an [Ordering](LimitSpec.html) spec. For this use case, TopNs are much faster and more resource-efficient than GroupBys. A topN query takes a topN query object and returns an array of JSON objects, where each object represents a value requested by the query.
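As a rough mental model only, the "GroupBy over a single dimension, ordered and limited" idea can be sketched in plain Python. This is not how Druid executes a topN, and unlike a real topN it is exact rather than approximate. The function name `top_n` and the sample rows are hypothetical.

```python
from collections import defaultdict


def top_n(rows, dimension, metric, threshold):
    """Group rows by one dimension, sum one metric, and keep the largest N."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[metric]
    # Order by the aggregated metric, largest first, then keep the first N.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [{dimension: value, metric: total} for value, total in ranked[:threshold]]


# Hypothetical input rows; real data lives in Druid segments.
rows = [
    {"sample_dim": "a", "count": 3},
    {"sample_dim": "b", "count": 5},
    {"sample_dim": "a", "count": 4},
    {"sample_dim": "c", "count": 1},
]
top_two = top_n(rows, "sample_dim", "count", threshold=2)
```

Here `threshold` plays the role of the N in topN, and `metric` is the value the list is sorted by.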
A topN query object looks like:
```json
{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim",
  "threshold": 5,
  "metric": "count",
  "granularity": "all",
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "dim1",
        "value": "some_value"
      },
      {
        "type": "selector",
        "dimension": "dim2",
        "value": "some_other_val"
      }
    ]
  },
  "aggregations": [
    {
      "type": "longSum",
      "name": "count",
      "fieldName": "count"
    },
    {
      "type": "doubleSum",
      "name": "some_metric",
      "fieldName": "some_metric"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "some_metric",
          "fieldName": "some_metric"
        },
        {
          "type": "fieldAccess",
          "name": "count",
          "fieldName": "count"
        }
      ]
    }
  ],
  "intervals": [
    "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
  ]
}
```
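A query object like this is sent to Druid as the JSON body of an HTTP POST. The sketch below only builds the request, using Python's standard library. The broker address in `BROKER_URL` is an assumption and must be replaced with the host and port of your own deployment; the helper name `build_topn_request` and the trimmed-down query are hypothetical.

```python
import json
import urllib.request

# Assumed broker address; substitute the host and port of your own deployment.
BROKER_URL = "http://localhost:8082/druid/v2/"


def build_topn_request(query, url=BROKER_URL):
    """Wrap a topN query dict in an HTTP POST request with a JSON body."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A trimmed-down version of the query object, without filter or post-aggregations.
query = {
    "queryType": "topN",
    "dataSource": "sample_data",
    "dimension": "sample_dim",
    "threshold": 5,
    "metric": "count",
    "granularity": "all",
    "aggregations": [{"type": "longSum", "name": "count", "fieldName": "count"}],
    "intervals": ["2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"],
}
request = build_topn_request(query)
# Sending is left out here; against a running broker it would be:
#   results = json.load(urllib.request.urlopen(request))
```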
A topN query has 10 parts, 7 of which are shared with [TimeseriesQuery](TimeseriesQuery.html). See [TimeseriesQuery](TimeseriesQuery.html) for the meaning of fields not defined below.

|property|description|required?|
|--------|-----------|---------|
|dimension|A string or JSON object defining the dimension whose top values you want. For more info, see [DimensionSpecs](DimensionSpecs.html).|yes|
|threshold|An integer defining the N in the topN (i.e., how many results you want in the top list).|yes|
|metric|A string or JSON object specifying the metric to sort by for the top list. For more info, see [TopNMetricSpec](TopNMetricSpec.html).|yes|

Note that the context JSON object is also available for topN queries and should be used with the same caution as for timeseries queries.
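The example query gives `dimension` and `metric` as plain strings, while the table points to object-valued specs. A minimal sketch of how the two forms relate, assuming the `default` DimensionSpec and the `numeric` TopNMetricSpec described on the linked pages; the helper name `expand_shorthand` is hypothetical and not part of Druid.

```python
def expand_shorthand(query):
    """Return a copy of a topN query with string shorthands expanded to objects."""
    expanded = dict(query)
    if isinstance(expanded.get("dimension"), str):
        name = expanded["dimension"]
        # Assumed equivalent: a default DimensionSpec that keeps the same output name.
        expanded["dimension"] = {"type": "default", "dimension": name, "outputName": name}
    if isinstance(expanded.get("metric"), str):
        # Assumed equivalent: a numeric TopNMetricSpec sorting on that metric.
        expanded["metric"] = {"type": "numeric", "metric": expanded["metric"]}
    return expanded


shorthand = {"queryType": "topN", "dimension": "sample_dim", "metric": "count", "threshold": 5}
explicit = expand_shorthand(shorthand)
```

The input dict is left unchanged, so the shorthand and explicit forms can be compared side by side.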
The results have the following format:
```json
[
  {
    "timestamp": "2013-08-31T00:00:00.000Z",
    "result": [
      {
        "sample_dim": "dim1_val",
        "count": 111,
        "some_metric": 10669,
        "sample_divide": 96.11711711711712
      },
      {
        "sample_dim": "another_dim1_val",
        "count": 88,
        "some_metric": 28344,
        "sample_divide": 322.09090909090907
      },
      {
        "sample_dim": "dim1_val3",
        "count": 70,
        "some_metric": 871,
        "sample_divide": 12.442857142857143
      },
      {
        "sample_dim": "dim1_val4",
        "count": 62,
        "some_metric": 815,
        "sample_divide": 13.14516129032258
      },
      {
        "sample_dim": "dim1_val5",
        "count": 60,
        "some_metric": 2787,
        "sample_divide": 46.45
      }
    ]
  }
]
```
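Because the top list is nested under `result` inside each timestamped bucket, client code usually flattens it before use. The sketch below shows one way to do that, and recomputes the arithmetic post-aggregation (metric divided by count) for the first entry. The function name `flatten_topn` is hypothetical, and the key names in the sample payload are illustrative, chosen to follow the names used in the query object.

```python
import json


def flatten_topn(response):
    """Turn a topN response into a flat list of rows, one per top value."""
    rows = []
    for bucket in response:
        for entry in bucket["result"]:
            # Copy the bucket timestamp onto every entry in its top list.
            rows.append({"timestamp": bucket["timestamp"], **entry})
    return rows


# Illustrative payload with two of the entries from the result format.
raw = """
[
  {
    "timestamp": "2013-08-31T00:00:00.000Z",
    "result": [
      {"sample_dim": "dim1_val", "count": 111, "some_metric": 10669},
      {"sample_dim": "another_dim1_val", "count": 88, "some_metric": 28344}
    ]
  }
]
"""
rows = flatten_topn(json.loads(raw))

# The "/" post-aggregation divides the first field by the second.
ratio = rows[0]["some_metric"] / rows[0]["count"]
```

With `"granularity": "all"` there is a single bucket, so the flattened list has at most `threshold` rows.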