---
layout: doc_page
---
TopN queries
==================

TopN queries return a sorted set of results for the values in a given dimension according to some criteria. Conceptually, they can be thought of as an approximate [GroupByQuery](GroupByQuery.html) over a single dimension with an [Ordering](Ordering.html) spec. TopNs are much faster and more resource-efficient than GroupBys for this use case. A topN query takes a topN query object and returns an array of JSON objects, where each object represents a value requested by the topN query.
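
To make the GroupBy analogy concrete, the following Python sketch computes an exact top N in memory: it groups rows by one dimension, sums a metric per value, sorts the groups in descending order, and keeps the first N. It only illustrates the semantics; Druid's topN is approximate and is not implemented this way. The function name `top_n` and the sample rows are hypothetical.

```python
from collections import defaultdict

def top_n(rows, dimension, metric, threshold):
    """Exact in-memory illustration of topN semantics (not Druid's algorithm)."""
    totals = defaultdict(float)
    for row in rows:
        # Group by the single dimension and sum the metric per value.
        totals[row[dimension]] += row[metric]
    # Order groups by the aggregated metric, largest first, and keep N.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [{dimension: value, metric: total} for value, total in ranked[:threshold]]

# Hypothetical sample rows, for illustration only.
rows = [
    {"sample_dim": "a", "count": 3},
    {"sample_dim": "b", "count": 5},
    {"sample_dim": "a", "count": 4},
    {"sample_dim": "c", "count": 1},
]
top_two = top_n(rows, "sample_dim", "count", 2)
```

Here `top_two` holds the two `sample_dim` values with the largest summed `count`, in rank order.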

A topN query object looks like:

```json
{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim",
  "threshold": 5,
  "metric": "count",
  "granularity": "all",
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "dim1",
        "value": "some_value"
      },
      {
        "type": "selector",
        "dimension": "dim2",
        "value": "some_other_val"
      }
    ]
  },
  "aggregations": [
    {
      "type": "longSum",
      "name": "count",
      "fieldName": "count"
    },
    {
      "type": "doubleSum",
      "name": "some_metric",
      "fieldName": "some_metric"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "some_metric",
          "fieldName": "some_metric"
        },
        {
          "type": "fieldAccess",
          "name": "count",
          "fieldName": "count"
        }
      ]
    }
  ],
  "intervals": [
    "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
  ]
}
```

A topN query has 10 parts, 7 of which are shared with [TimeseriesQuery](TimeseriesQuery.html). Please review [TimeseriesQuery](TimeseriesQuery.html) for the meanings of fields not defined below.

|property|description|required?|
|--------|-----------|---------|
|dimension|A string or JSON object defining the dimension that you want the top taken for. For more info, see [DimensionSpecs](DimensionSpecs.html).|yes|
|threshold|An integer defining the N in the topN (i.e. how many results you want in the top list).|yes|
|metric|A string or JSON object specifying the metric to sort by for the top list. For more info, see [TopNMetricSpec](TopNMetricSpec.html).|yes|
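
As a rough illustration of the table above, this Python sketch checks that a query object carries the three topN-specific properties before it is serialized. `missing_topn_properties` is a hypothetical helper, not part of Druid, and it checks only for presence, not for valid values.

```python
import json

# The three properties specific to topN queries, per the table above.
TOPN_REQUIRED = ("dimension", "threshold", "metric")

def missing_topn_properties(query):
    """Return the topN-specific required properties absent from `query`."""
    return [name for name in TOPN_REQUIRED if name not in query]

query = {
    "queryType": "topN",
    "dataSource": "sample_data",
    "dimension": "sample_dim",
    "threshold": 5,
    "metric": "count",
    "granularity": "all",
    "aggregations": [{"type": "longSum", "name": "count", "fieldName": "count"}],
    "intervals": ["2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"],
}

missing = missing_topn_properties(query)

# Drop one required property to show what the helper reports.
incomplete = {k: v for k, v in query.items() if k != "threshold"}
missing_from_incomplete = missing_topn_properties(incomplete)

body = json.dumps(query)  # JSON text of the query object
```

For the complete query `missing` is empty, while `missing_from_incomplete` names the property that was dropped.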

Please note that the context JSON object is also available for topN queries and should be used with the same caution as in the timeseries case.

The results have the following format:

```json
[
  {
    "timestamp": "2013-08-31T00:00:00.000Z",
    "result": [
      {
        "user": "67.173.175.77",
        "count": 111,
        "some_metrics": 10669,
        "average": 96.11711711711712
      },
      {
        "user": "24.10.49.170",
        "count": 88,
        "some_metrics": 28344,
        "average": 322.09090909090907
      },
      {
        "user": "72.193.24.148",
        "count": 70,
        "some_metrics": 871,
        "average": 12.442857142857143
      },
      {
        "user": "108.46.28.47",
        "count": 62,
        "some_metrics": 815,
        "average": 13.14516129032258
      },
      {
        "user": "99.181.143.133",
        "count": 60,
        "some_metrics": 2787,
        "average": 46.45
      }
    ]
  }
]
```
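
A response in this shape can be read with any JSON parser. The Python sketch below parses a shortened copy of the example response (the first two entries only) and lists the dimension values in rank order. It also checks that, in this sample, `average` equals `some_metrics / count`. The helper name `top_values` is hypothetical.

```python
import json
import math

# Shortened copy of the example response above (first two entries only).
response_text = """
[
  {
    "timestamp": "2013-08-31T00:00:00.000Z",
    "result": [
      {"user": "67.173.175.77", "count": 111, "some_metrics": 10669, "average": 96.11711711711712},
      {"user": "24.10.49.170", "count": 88, "some_metrics": 28344, "average": 322.09090909090907}
    ]
  }
]
"""

def top_values(response, dimension):
    """Yield (timestamp, value) pairs in the order the response lists them."""
    for bucket in response:
        for entry in bucket["result"]:
            yield bucket["timestamp"], entry[dimension]

response = json.loads(response_text)
ranked_users = [value for _, value in top_values(response, "user")]

# In this sample, "average" is the ratio of the other two fields.
ratios_match = all(
    math.isclose(entry["average"], entry["some_metrics"] / entry["count"])
    for bucket in response
    for entry in bucket["result"]
)
```

Each element of the outer array pairs a `timestamp` with its own `result` list, so the helper walks both levels rather than assuming a single bucket.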