---
layout: doc_page
---
# groupBy Queries
A groupBy query takes a groupBy query object and returns an array of JSON objects, where each object represents one grouping asked for by the query. Note: If you only want plain aggregates over some time range, we highly recommend using [TimeseriesQueries](TimeseriesQuery.html) instead. The performance will be substantially better.
An example groupBy query object is shown below:

``` json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["dim1", "dim2"],
  "limitSpec": { "type": "default", "limit": 5000, "columns": ["dim1", "metric1"] },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
      { "type": "or", 
        "fields": [
          { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
          { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
    { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "sample_name1", "fieldName": "sample_name1" },
        { "type": "fieldAccess", "name": "sample_name2", "fieldName": "sample_name2" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "having": { "type": "greaterThan", "aggregation": "sample_name1", "value": 0 }
}
```
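
A query object like the one above is normally sent to Druid as the JSON body of an HTTP POST. The following is a minimal sketch in Python using only the standard library. The broker address `localhost:8082` and the helper names `build_request` and `run_query` are assumptions made for illustration and do not come from this page, so adjust them for your own cluster.

```python
import json
import urllib.request

# Assumed broker location; replace host and port with those of your own cluster.
BROKER_URL = "http://localhost:8082/druid/v2/?pretty"


def build_request(query, url=BROKER_URL):
    """Wrap a groupBy query dict in an HTTP POST request (nothing is sent yet)."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=BROKER_URL):
    """Send the query and return the decoded JSON array of result rows."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# A trimmed-down version of the example query, used only to show the call shape.
example_query = {
    "queryType": "groupBy",
    "dataSource": "sample_datasource",
    "granularity": "day",
    "dimensions": ["dim1", "dim2"],
    "aggregations": [
        {"type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1"}
    ],
    "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"],
}

# Building the request does not touch the network; run_query(example_query) would.
request = build_request(example_query)
```

Keeping request construction separate from sending makes it easy to inspect or log the exact JSON body before it reaches the cluster.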

There are 11 main parts to a groupBy query:

|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be "groupBy"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String naming the data source to query (very similar to a table in a relational database), or a [DataSource](DataSource.html) structure.|yes|
|dimensions|A JSON list of dimensions to do the groupBy over; see [DimensionSpec](DimensionSpecs.html) for ways to extract dimensions.|yes|
|limitSpec|See [LimitSpec](LimitSpec.html).|no|
|having|See [Having](Having.html).|no|
|granularity|Defines the granularity of the query. See [Granularities](Granularities.html)|yes|
|filter|See [Filters](Filters.html)|no|
|aggregations|See [Aggregations](Aggregations.html)|yes|
|postAggregations|See [Post Aggregations](Post-aggregations.html)|no|
|intervals|A JSON Object representing ISO-8601 Intervals (in the example above, a list of interval strings). This defines the time ranges to run the query over.|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|
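
The "required?" column above can be turned into a quick client-side check before a query is sent. This is only a sketch: `check_group_by` is a hypothetical helper, not part of Druid, and it knows nothing beyond the eleven property names listed in the table.

```python
# Property names copied from the table above, split by the "required?" column.
REQUIRED = ("queryType", "dataSource", "dimensions",
            "granularity", "aggregations", "intervals")
OPTIONAL = ("limitSpec", "having", "filter", "postAggregations", "context")


def check_group_by(query):
    """Return a list of problems found in a groupBy query dict (empty if none)."""
    problems = []
    # Every property marked "yes" in the table must be present.
    for key in REQUIRED:
        if key not in query:
            problems.append("missing required property: " + key)
    # queryType, when present, must be the literal string "groupBy".
    if query.get("queryType", "groupBy") != "groupBy":
        problems.append('queryType must be "groupBy"')
    # Flag anything that the table does not list at all.
    for key in query:
        if key not in REQUIRED and key not in OPTIONAL:
            problems.append("property not listed in the table: " + key)
    return problems


complete_query = {
    "queryType": "groupBy",
    "dataSource": "sample_datasource",
    "granularity": "day",
    "dimensions": ["dim1", "dim2"],
    "aggregations": [
        {"type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1"}
    ],
    "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"],
}

# Drop a required property to show what a failing check reports.
incomplete_query = {k: v for k, v in complete_query.items() if k != "intervals"}

print(check_group_by(complete_query))
print(check_group_by(incomplete_query))
```

The check only looks at top-level property names; it does not validate the nested filter, aggregation, or having objects.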

To pull it all together, the above query would return up to *n\*m* data points for each day between 2012-01-01 and 2012-01-03 from the "sample_datasource" table, up to a maximum of 5000 points, where *n* is the cardinality of the "dim1" dimension and *m* is the cardinality of the "dim2" dimension. Each data point contains the (long) sum of sample_fieldName1, the (double) sum of sample_fieldName2, and the (double) result of dividing the first sum by the second, computed over the rows that match the filter for a particular grouping of "dim1" and "dim2". Because of the having clause, a data point is returned only if its sum of sample_fieldName1 is greater than 0. The output looks like this:

```json
[ 
  {
    "version" : "v1",
    "timestamp" : "2012-01-01T00:00:00.000Z",
    "event" : {
      "dim1" : <some_dim_value_one>,
      "dim2" : <some_dim_value_two>,
      "sample_name1" : <some_sample_name_value_one>,
      "sample_name2" : <some_sample_name_value_two>,
      "sample_divide" : <some_sample_divide_value>
    }
  }, 
  {
    "version" : "v1",
    "timestamp" : "2012-01-01T00:00:00.000Z",
    "event" : {
      "dim1" : <some_other_dim_value_one>,
      "dim2" : <some_other_dim_value_two>,
      "sample_name1" : <some_other_sample_name_value_one>,
      "sample_name2" : <some_other_sample_name_value_two>,
      "sample_divide" : <some_other_sample_divide_value>
    }
  },
...
]
```
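
Since every result object nests its dimension and aggregate values under `"event"`, client code often flattens each object into a single record. The sketch below shows one way to do that in Python. The helper name `flatten` is hypothetical, and the values in `sample_results` are made up to stand in for the `<...>` placeholders above.

```python
def flatten(results):
    """Merge each result's timestamp with its event fields into one flat dict."""
    rows = []
    for result in results:
        flat = {"timestamp": result["timestamp"]}
        flat.update(result["event"])  # dimensions, aggregations, post-aggregations
        rows.append(flat)
    return rows


# Made-up values in the shape of the output above; not real query results.
sample_results = [
    {
        "version": "v1",
        "timestamp": "2012-01-01T00:00:00.000Z",
        "event": {
            "dim1": "a", "dim2": "x",
            "sample_name1": 10, "sample_name2": 4.0, "sample_divide": 2.5,
        },
    },
    {
        "version": "v1",
        "timestamp": "2012-01-01T00:00:00.000Z",
        "event": {
            "dim1": "b", "dim2": "y",
            "sample_name1": 3, "sample_name2": 6.0, "sample_divide": 0.5,
        },
    },
]

rows = flatten(sample_results)
for row in rows:
    print(row["timestamp"], row["dim1"], row["dim2"], row["sample_divide"])
```

Flat records of this kind are convenient to load into a table or data frame, at the cost of dropping the `"version"` field.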