druid/docs/content/querying/groupbyquery.md

---
layout: doc_page
---
# groupBy Queries

These types of queries take a groupBy query object and return an array of JSON objects where each object represents a
grouping asked for by the query.

<div class="note info">
Note: If you are doing aggregations with time as your only grouping, or an ordered groupBy over a single dimension,
consider <a href="timeseriesquery.html">Timeseries</a> and <a href="topnquery.html">TopN</a> queries as well as
groupBy. Their performance may be better in some cases. See <a href="#alternatives">Alternatives</a> below for more details.
</div>

An example groupBy query object is shown below:

``` json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country", "device"],
  "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "carrier", "value": "AT&T" },
      { "type": "or", 
        "fields": [
          { "type": "selector", "dimension": "make", "value": "Apple" },
          { "type": "selector", "dimension": "make", "value": "Samsung" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "avg_usage",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "data_transfer" },
        { "type": "fieldAccess", "fieldName": "total_usage" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "having": {
    "type": "greaterThan",
    "aggregation": "total_usage",
    "value": 100
  }
}
```

There are 11 main parts to a groupBy query:

|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be "groupBy"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String or Object defining the data source to query, very similar to a table in a relational database. See [DataSource](../querying/datasource.html) for more information.|yes|
|dimensions|A JSON list of dimensions to do the groupBy over; or see [DimensionSpec](../querying/dimensionspecs.html) for ways to extract dimensions. |yes|
|limitSpec|See [LimitSpec](../querying/limitspec.html).|no|
|having|See [Having](../querying/having.html).|no|
|granularity|Defines the granularity of the query. See [Granularities](../querying/granularities.html)|yes|
|filter|See [Filters](../querying/filters.html)|no|
|aggregations|See [Aggregations](../querying/aggregations.html)|no|
|postAggregations|See [Post Aggregations](../querying/post-aggregations.html)|no|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|

To pull it all together, the above query would return *n\*m* data points, up to a maximum of 5000 points, where n is the cardinality of the `country` dimension, m is the cardinality of the `device` dimension, each day between 2012-01-01 and 2012-01-03, from the `sample_datasource` table. Each data point contains `total_usage`, the (long) sum of `user_count`; `data_transfer`, the (double) sum of `data_transfer`; and `avg_usage`, the (double) result of `data_transfer` divided by `total_usage`. All three are computed over the rows that pass the filter for a particular grouping of `country` and `device`. The `having` clause keeps only data points whose `total_usage` is greater than 100. The output looks like this:

```json
[ 
  {
    "version" : "v1",
    "timestamp" : "2012-01-01T00:00:00.000Z",
    "event" : {
      "country" : <some_dim_value_one>,
      "device" : <some_dim_value_two>,
      "total_usage" : <some_value_one>,
      "data_transfer" :<some_value_two>,
      "avg_usage" : <some_avg_usage_value>
    }
  }, 
  {
    "version" : "v1",
    "timestamp" : "2012-01-02T00:00:00.000Z",
    "event" : {
      "country" : <some_other_dim_value_one>,
      "device" : <some_other_dim_value_two>,
      "total_usage" : <some_other_value_one>,
      "data_transfer" : <some_other_value_two>,
      "avg_usage" : <some_other_avg_usage_value>
    }
  },
...
]
```

### Behavior on multi-value dimensions

groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
from matching rows will be used to generate one group per value. It's possible for a query to return more groups than
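there are rows. For example, a groupBy on the dimension `tags` with filter `"t1" AND "t3"` would match only row1 of the
example data in [Multi-value dimensions](multi-value-dimensions.html), whose `tags` values are `t1`, `t2`, and `t3`, and
generate a result with three groups: `t1`, `t2`, and `t3`. If you only need to include values that match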
your filter, you can use a [filtered dimensionSpec](dimensionspecs.html#filtered-dimensionspecs). This can also
improve performance.

See [Multi-value dimensions](multi-value-dimensions.html) for more details.
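
As a sketch of the filtered dimensionSpec approach, the query below groups on `tags` but keeps only the values `t1` and `t3`. It assumes the `listFiltered` dimensionSpec described on the DimensionSpec page; the data source name, interval, and `count` aggregator are illustrative placeholders and not part of the example above.

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "tags", "value": "t1" },
      { "type": "selector", "dimension": "tags", "value": "t3" }
    ]
  },
  "dimensions": [
    {
      "type": "listFiltered",
      "delegate": { "type": "default", "dimension": "tags", "outputName": "tags" },
      "values": ["t1", "t3"]
    }
  ],
  "aggregations": [
    { "type": "count", "name": "count" }
  ]
}
```

With this dimensionSpec, the `t2` group from the unfiltered case should no longer appear in the result, because only the listed values are used to form groups.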

### Implementation details

#### Strategies

GroupBy queries can be executed using two different strategies. The default strategy for a cluster is determined by the
"druid.query.groupBy.defaultStrategy" runtime property on the broker. This can be overridden using "groupByStrategy" in
the query context. If neither the context field nor the property is set, the "v2" strategy will be used.

- "v2", the default, is designed to offer better performance and memory management. This strategy generates
per-segment results using a fully off-heap map. Data nodes merge the per-segment results using a fully off-heap
concurrent facts map combined with an on-heap string dictionary. This may optionally involve spilling to disk. Data
nodes return sorted results to the broker, which merges result streams using an N-way merge. The broker materializes
the results if necessary (e.g. if the query sorts on columns other than its dimensions). Otherwise, it streams results
back as they are merged.

- "v1", a legacy engine, generates per-segment results on data nodes (historical, realtime, middleManager) using a map which
is partially on-heap (dimension keys and the map itself) and partially off-heap (the aggregated values). Data nodes then
merge the per-segment results using Druid's indexing mechanism. This merging is multi-threaded by default, but can
optionally be single-threaded. The broker merges the final result set using Druid's indexing mechanism again. The broker
merging is always single-threaded. Because the broker merges results using the indexing mechanism, it must materialize
the full result set before returning any results. On both the data nodes and the broker, the merging index is fully
on-heap by default, but it can optionally store aggregated values off-heap.
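
For illustration, a query can select its strategy through the `groupByStrategy` context key mentioned above. The sketch below asks for "v2" explicitly; the data source, dimension, and aggregator are placeholders borrowed from the earlier example.

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country"],
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": { "groupByStrategy": "v2" }
}
```

Setting the value to "v1" instead would request the legacy engine for this query only, leaving the cluster default untouched.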

#### Differences between v1 and v2

Query API and results are compatible between the two engines; however, there are some differences from a cluster
configuration perspective:

- groupBy v1 controls resource usage using a row-based limit (maxResults) whereas groupBy v2 uses bytes-based limits.
In addition, groupBy v1 merges results on-heap, whereas groupBy v2 merges results off-heap. These factors mean that
memory tuning and resource limits behave differently between v1 and v2. In particular, due to this, some queries
that can complete successfully in one engine may exceed resource limits and fail with the other engine. See the
"Memory tuning and resource limits" section for more details.
- groupBy v1 imposes no limit on the number of concurrently running queries, whereas groupBy v2 controls memory usage
by using a finite-sized merge buffer pool. By default, the number of merge buffers is 1/4 the number of processing
threads. You can adjust this as necessary to balance concurrency and memory usage.
- groupBy v1 supports caching on either the broker or historical nodes, whereas groupBy v2 only supports caching on
historical nodes.
- groupBy v1 supports using [chunkPeriod](query-context.html) to parallelize merging on the broker, whereas groupBy v2
ignores chunkPeriod.
- groupBy v2 supports both array-based aggregation and hash-based aggregation. The array-based aggregation is used only
when the grouping key is a single indexed string column. In array-based aggregation, the dictionary-encoded value is used
as the index, so the aggregated values in the array can be accessed directly without finding buckets based on hashing.

#### Memory tuning and resource limits

When using groupBy v2, three parameters control resource usage and limits:

- druid.processing.buffer.sizeBytes: size of the off-heap hash table used for aggregation, per query, in bytes. At
most druid.processing.numMergeBuffers of these will be created at once, which also serves as an upper limit on the
number of concurrently running groupBy queries.
- druid.query.groupBy.maxMergingDictionarySize: size of the on-heap dictionary used when grouping on strings, per query,
in bytes. Note that this is based on a rough estimate of the dictionary size, not the actual size.
- druid.query.groupBy.maxOnDiskStorage: amount of space on disk used for aggregation, per query, in bytes. By default,
this is 0, which means aggregation will not use disk.

If maxOnDiskStorage is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
aggregation table limit, will fail with a "Resource limit exceeded" error describing the limit that was exceeded.

If maxOnDiskStorage is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
In this case, when either the on-heap dictionary or off-heap hash table fills up, partially aggregated records will be
sorted and flushed to disk. Then, both in-memory structures will be cleared out for further aggregation. Queries that
then go on to exceed maxOnDiskStorage will fail with a "Resource limit exceeded" error indicating that they ran out of
disk space.

With groupBy v2, cluster operators should make sure that the off-heap hash tables and on-heap merging dictionaries
will not exceed available memory for the maximum possible concurrent query load (given by
druid.processing.numMergeBuffers). See [How much direct memory does Druid use?](../operations/performance-faq.html) for more details.

When using groupBy v1, all aggregation is done on-heap, and resource limits are done through the parameter
druid.query.groupBy.maxResults. This is a cap on the maximum number of results in a result set. Queries that exceed
this limit will fail with a "Resource limit exceeded" error indicating they exceeded their row limit. Cluster
operators should make sure that the on-heap aggregations will not exceed available JVM heap space for the expected
concurrent query load.

#### Performance tuning for groupBy v2

##### Limit pushdown optimization

Druid pushes down the `limit` spec in groupBy queries to the segments on historicals wherever possible, to prune unnecessary intermediate results early and minimize the amount of data transferred to brokers. By default, this technique is applied only when all fields in the `orderBy` spec are a subset of the grouping keys, because limit push down does not guarantee exact results if the `orderBy` spec includes any fields that are not in the grouping keys. However, you can enable this technique even in such cases if you can sacrifice some accuracy for faster query processing, as in topN queries. See `forceLimitPushDown` in [advanced groupBy v2 configurations](#groupby-v2-configurations).
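
A minimal sketch of a query that orders by an aggregator, which is not a grouping key, and opts in to limit push down anyway. The limit of 100 and the column choices are hypothetical; the results may be approximate, as described above.

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "dimensions": ["country", "device"],
  "aggregations": [
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 100,
    "columns": [
      { "dimension": "data_transfer", "direction": "descending" }
    ]
  },
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": { "forceLimitPushDown": true }
}
```

If the `limitSpec` ordered only by `country` and `device`, the push down would apply by default and the context flag would be unnecessary.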


##### Optimizing hash table

The groupBy v2 engine uses an open addressing hash table for aggregation. The hash table is initialized with a given initial number of buckets and grows gradually as the buffer fills up. Hash collisions are resolved with linear probing.

The default number of initial buckets is 1024 and the default max load factor of the hash table is 0.7. If you see too many collisions in the hash table, you can adjust these numbers. See `bufferGrouperInitialBuckets` and `bufferGrouperMaxLoadFactor` in [Advanced groupBy v2 configurations](#groupby-v2-configurations).
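
As a hedged example, the query context below starts the hash table with more buckets and a lower load factor, trading memory for fewer collisions. The values 4096 and 0.5 are arbitrary illustrations, not recommendations, and the rest of the query is a placeholder.

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "dimensions": ["country", "device"],
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": {
    "bufferGrouperInitialBuckets": 4096,
    "bufferGrouperMaxLoadFactor": 0.5
  }
}
```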


##### Parallel combine

Once a historical finishes aggregation using the hash table, it sorts the aggregates and merges them before sending them to the broker for N-way merge aggregation. By default, historicals use all their available processing threads (configured by `druid.processing.numThreads`) for aggregation, but use a single thread for sorting and merging aggregates: the HTTP thread that sends data to brokers.

This is to prevent some heavy groupBy queries from blocking other queries. In Druid, the processing threads are shared between all submitted queries and they are _not interruptible_. This means that if a heavy query takes all available processing threads, all other queries might be blocked until the heavy query is finished. Because groupBy queries usually take longer than timeseries or topN queries, they should release processing threads as soon as possible.

However, you might care about the performance of some really heavy groupBy queries. Usually, the performance bottleneck of heavy groupBy queries is merging sorted aggregates. In such cases, you can use processing threads for it as well. This is called _parallel combine_. To enable parallel combine, see `numParallelCombineThreads` in [Advanced groupBy v2 configurations](#groupby-v2-configurations). Note that parallel combine can be enabled only when data is actually spilled (see [Memory tuning and resource limits](#memory-tuning-and-resource-limits)).

Once parallel combine is enabled, the groupBy v2 engine can create a combining tree for merging sorted aggregates. Each intermediate node of the tree is a thread merging aggregates from the child nodes. The leaf node threads read and merge aggregates from hash tables, including spilled ones. Usually, leaf nodes are slower than intermediate nodes because they need to read data from disk. As a result, fewer threads are used for intermediate nodes by default. You can change the degree of intermediate nodes. See `intermediateCombineDegree` in [Advanced groupBy v2 configurations](#groupby-v2-configurations).
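
The following sketch enables parallel combine for a single query through its context. It assumes disk spilling is already allowed on the cluster (`druid.query.groupBy.maxOnDiskStorage` greater than 0), since parallel combine only takes effect when data is actually spilled; the thread count of 4 is a hypothetical choice, and 8 simply restates the documented default degree.

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "dimensions": ["country", "device"],
  "aggregations": [
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": {
    "numParallelCombineThreads": 4,
    "intermediateCombineDegree": 8
  }
}
```

The number of threads actually used is capped by `druid.processing.numThreads`, so a value of 4 here has no additional effect on a server with fewer processing threads.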


#### Alternatives

There are some situations where other query types may be a better choice than groupBy.

- For queries with no "dimensions" (i.e. grouping by time only) the [Timeseries query](timeseriesquery.html) will
generally be faster than groupBy. The major differences are that it is implemented in a fully streaming manner (taking
advantage of the fact that segments are already sorted on time) and does not need to use a hash table for merging.

- For queries with a single "dimensions" element (i.e. grouping by one string dimension), the [TopN query](topnquery.html)
will sometimes be faster than groupBy. This is especially true if you are ordering by a metric and find approximate
results acceptable.
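
To make the comparison concrete, here is a sketch of how the time-only case might be written as a Timeseries query. It reuses the data source and aggregator from the example at the top of this page and simply omits "dimensions".

```json
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
```

Similarly, grouping by a single dimension and ordering by a metric could be expressed as a TopN query. The threshold of 10 is a hypothetical value.

```json
{
  "queryType": "topN",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "dimension": "country",
  "metric": "data_transfer",
  "threshold": 10,
  "aggregations": [
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
```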

#### Nested groupBys

Nested groupBys (dataSource of type "query") are performed differently for "v1" and "v2". The broker first runs the
inner groupBy query in the usual way. "v1" strategy then materializes the inner query's results on-heap with Druid's
indexing mechanism, and runs the outer query on these materialized results. "v2" strategy runs the outer query on the
inner query's results stream with an off-heap fact map and an on-heap string dictionary that can spill to disk. Both
strategies perform the outer query on the broker in a single-threaded fashion.
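
A nested groupBy can be sketched as follows. The inner query sums `data_transfer` per `country` and `device`, and the outer query then takes the maximum of those sums per `country`. The output name `max_device_transfer` is a hypothetical choice for illustration.

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "sample_datasource",
      "granularity": "all",
      "dimensions": ["country", "device"],
      "aggregations": [
        { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
      ],
      "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
    }
  },
  "granularity": "all",
  "dimensions": ["country"],
  "aggregations": [
    { "type": "doubleMax", "name": "max_device_transfer", "fieldName": "data_transfer" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
```

Note that the outer query refers to the inner query's output column names, so `data_transfer` in the outer aggregator is the inner sum, not the raw column.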

#### Configurations

This section describes the configurations for groupBy queries. You can set system-wide configurations by adding them to runtime properties or query-specific configurations by adding them to query contexts. All runtime properties are prefixed by `druid.query.groupBy`.

#### Commonly tuned configurations

##### Configurations for groupBy v2

Supported runtime properties:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for the string dictionary during merging. When the dictionary exceeds this size, a spill to disk will be triggered.|100000000|
|`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)|

Supported query contexts:

|Key|Description|
|---|-----------|
|`maxMergingDictionarySize`|Can be used to lower the value of `druid.query.groupBy.maxMergingDictionarySize` for this query.|
|`maxOnDiskStorage`|Can be used to lower the value of `druid.query.groupBy.maxOnDiskStorage` for this query.|
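
As an illustration, the context below tightens both limits for one query. Because these keys can only lower the cluster-wide values, the sketch assumes the runtime properties are set to at least these amounts; the specific byte counts are hypothetical.

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "dimensions": ["country", "device"],
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": {
    "maxMergingDictionarySize": 50000000,
    "maxOnDiskStorage": 1000000000
  }
}
```

In particular, if `druid.query.groupBy.maxOnDiskStorage` is left at its default of 0, a query context cannot turn spilling on.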


#### Advanced configurations

##### Common configurations for all groupBy strategies

Supported runtime properties:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.groupBy.defaultStrategy`|Default groupBy query strategy.|v2|
|`druid.query.groupBy.singleThreaded`|Merge results using a single thread.|false|

Supported query contexts:

|Key|Description|
|---|-----------|
|`groupByStrategy`|Overrides the value of `druid.query.groupBy.defaultStrategy` for this query.|
|`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.|


##### GroupBy v2 configurations

Supported runtime properties:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.groupBy.bufferGrouperInitialBuckets`|Initial number of buckets in the off-heap hash table used for grouping results. Set to 0 to use a reasonable default (1024).|0|
|`druid.query.groupBy.bufferGrouperMaxLoadFactor`|Maximum load factor of the off-heap hash table used for grouping results. When the load factor exceeds this size, the table will be grown or spilled to disk. Set to 0 to use a reasonable default (0.7).|0|
|`druid.query.groupBy.forceHashAggregation`|Forces the use of hash-based aggregation.|false|
|`druid.query.groupBy.intermediateCombineDegree`|Number of intermediate nodes combined together in the combining tree. Higher degrees need fewer threads, which might improve query performance by reducing the overhead of too many threads if the server has sufficiently powerful CPU cores.|8|
|`druid.query.groupBy.numParallelCombineThreads`|Hint for the number of parallel combining threads. This should be larger than 1 to turn on the parallel combining feature. The actual number of threads used for parallel combining is min(`druid.query.groupBy.numParallelCombineThreads`, `druid.processing.numThreads`).|1 (disabled)|

Supported query contexts:

|Key|Description|Default|
|---|-----------|-------|
|`bufferGrouperInitialBuckets`|Overrides the value of `druid.query.groupBy.bufferGrouperInitialBuckets` for this query.|None|
|`bufferGrouperMaxLoadFactor`|Overrides the value of `druid.query.groupBy.bufferGrouperMaxLoadFactor` for this query.|None|
|`forceHashAggregation`|Overrides the value of `druid.query.groupBy.forceHashAggregation` for this query.|None|
|`intermediateCombineDegree`|Overrides the value of `druid.query.groupBy.intermediateCombineDegree` for this query.|None|
|`numParallelCombineThreads`|Overrides the value of `druid.query.groupBy.numParallelCombineThreads` for this query.|None|
|`sortByDimsFirst`|Sort the results first by dimension values and then by timestamp.|false|
|`forceLimitPushDown`|When all fields in the orderby are part of the grouping key, the broker will push limit application down to the historical nodes. When the sorting order uses fields that are not in the grouping key, applying this optimization can result in approximate results with unknown accuracy, so this optimization is disabled by default in that case. Enabling this context flag turns on limit push down for limit/orderbys that contain non-grouping key columns.|false|


##### GroupBy v1 configurations

Supported runtime properties:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.groupBy.maxIntermediateRows`|Maximum number of intermediate rows for the per-segment grouping engine. This is a tuning parameter that does not impose a hard limit; rather, it potentially shifts merging work from the per-segment engine to the overall merging index. Queries that exceed this limit will not fail.|50000|
|`druid.query.groupBy.maxResults`|Maximum number of results. Queries that exceed this limit will fail.|500000|

Supported query contexts:

|Key|Description|Default|
|---|-----------|-------|
|`maxIntermediateRows`|Can be used to lower the value of `druid.query.groupBy.maxIntermediateRows` for this query.|None|
|`maxResults`|Can be used to lower the value of `druid.query.groupBy.maxResults` for this query.|None|
|`useOffheap`|Set to true to store aggregations off-heap when merging results.|false|
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
+								---
 								layout: doc_page
 								---
 								# groupBy Queries
-												Alternative groupBy strategy. (#2998)

This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
											
										
										
											2016-06-24 21:06:09 -04:00
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
+								These types of queries take a groupBy query object and return an array of JSON objects where each object represents a
-												Alternative groupBy strategy. (#2998)

This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
											
										
										
											2016-06-24 21:06:09 -04:00
+								grouping asked for by the query.
 								<div class="note info">
 								Note: If you are doing aggregations with time as your only grouping, or an ordered groupBy over a single dimension,
 								consider <a href="timeseriesquery.html">Timeseries</a> and <a href="topnquery.html">TopN</a> queries as well as
 								groupBy. Their performance may be better in some cases. See <a href="#alternatives">Alternatives</a> below for more details.
 								</div>
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
+								An example groupBy query object is shown below:
 								``` json
 								{
 								  "queryType": "groupBy",
 								  "dataSource": "sample_datasource",
 								  "granularity": "day",
 								  "dimensions": ["country", "device"],
 								  "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
 								  "filter": {
 								    "type": "and",
 								    "fields": [
 								      { "type": "selector", "dimension": "carrier", "value": "AT&T" },
 								      { "type": "or",
 								        "fields": [
 								          { "type": "selector", "dimension": "make", "value": "Apple" },
 								          { "type": "selector", "dimension": "make", "value": "Samsung" }
 								        ]
 								      }
 								    ]
 								  },
 								  "aggregations": [
 								    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
 								    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
 								  ],
 								  "postAggregations": [
 								    { "type": "arithmetic",
 								      "name": "avg_usage",
 								      "fn": "/",
 								      "fields": [
 								        { "type": "fieldAccess", "fieldName": "data_transfer" },
 								        { "type": "fieldAccess", "fieldName": "total_usage" }
 								      ]
 								    }
 								  ],
 								  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
 								  "having": {
-												Add streaming aggregation as the last step of ConcurrentGrouper if data are spilled (#4704)

* Add steaming grouper

* Fix doc

* Use a single dictionary while combining

* Revert GroupByBenchmark

* Removed unused code

* More cleanup

* Remove unused config

* Fix some typos and bugs

* Refactor Groupers.mergeIterators()

* Add comments for combining tree

* Refactor buildCombineTree

* Refactor iterator

* Add ParallelCombiner

* Add ParallelCombinerTest

* Handle InterruptedException

* use AbstractPrioritizedCallable

* Address comments

* [maven-release-plugin] prepare release druid-0.11.0-sg

* [maven-release-plugin] prepare for next development iteration

* Address comments

* Revert "[maven-release-plugin] prepare for next development iteration"

This reverts commit 5c6b31e488c413073e2b1d4ce128bdff0649b41a.

* Revert "[maven-release-plugin] prepare release druid-0.11.0-sg"

This reverts commit 0f5c3a8b82415b34fa765dc375d87ae8fe4daa3b.

* Fix build failure

* Change list to array

* rename sortableIds

* Address comments

* change to foreach loop

* Fix comment

* Revert keyEquals()

* Remove loop

* Address comments

* Fix build fail

* Address comments

* Remove unused imports

* Fix method name

* Split intermediate and leaf combine degrees

* Add comments to StreamingMergeSortedGrouper

* Add more comments and fix overflow

* Address comments

* ConcurrentGrouperTest cleanup

* add thread number configuration for parallel combining

* improve doc

* address comments

* fix build

											
										
										
											2017-10-18 02:24:08 -04:00
+								    "type": "greaterThan",
 								    "aggregation": "total_usage",
 								    "value": 100
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
+								  }
 								}
 								```
 								There are 11 main parts to a groupBy query:
 								|property|description|required?|
 								|--------|-----------|---------|
 								|queryType|This String should always be "groupBy"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
 								|dataSource|A String or Object defining the data source to query, very similar to a table in a relational database. See [DataSource](../querying/datasource.html) for more information.|yes|
 								|dimensions|A JSON list of dimensions to do the groupBy over; or see [DimensionSpec](../querying/dimensionspecs.html) for ways to extract dimensions. |yes|
 								|limitSpec|See [LimitSpec](../querying/limitspec.html).|no|
 								|having|See [Having](../querying/having.html).|no|
 								|granularity|Defines the granularity of the query. See [Granularities](../querying/granularities.html)|yes|
 								|filter|See [Filters](../querying/filters.html)|no|
-												Allow queries with no aggregators. (#3216)

This is actually reasonable for a groupBy or lexicographic topNs that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.

It's kind of silly for timeseries, but why not.
											
										
										
											2016-07-06 11:08:54 -04:00
+								|aggregations|See [Aggregations](../querying/aggregations.html)|no|
-												renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well

											
										
										
											2015-05-05 17:07:32 -04:00
+								|postAggregations|See [Post Aggregations](../querying/post-aggregations.html)|no|
 								|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
 								|context|An additional JSON Object which can be used to specify certain flags.|no|
 								To pull it all together, the above query would return *n\*m* data points, up to a maximum of 5000 points, where n is the cardinality of the `country` dimension, m is the cardinality of the `device` dimension, each day between 2012-01-01 and 2012-01-03, from the `sample_datasource` table. Each data point contains the (long) sum of `total_usage` if the value of the data point is greater than 100, the (double) sum of `data_transfer` and the (double) result of `total_usage` divided by `data_transfer` for the filter set for a particular grouping of `country` and `device`. The output looks like this:
 								```json
 								[
 								  {
 								    "version" : "v1",
 								    "timestamp" : "2012-01-01T00:00:00.000Z",
 								    "event" : {
 								      "country" : <some_dim_value_one>,
 								      "device" : <some_dim_value_two>,
 								      "total_usage" : <some_value_one>,
 								      "data_transfer" :<some_value_two>,
 								      "avg_usage" : <some_avg_usage_value>
 								    }
 								  },
 								  {
 								    "version" : "v1",
 								    "timestamp" : "2012-01-01T00:00:12.000Z",
 								    "event" : {
 								      "dim1" : <some_other_dim_value_one>,
 								      "dim2" : <some_other_dim_value_two>,
 								      "sample_name1" : <some_other_value_one>,
 								      "sample_name2" :<some_other_value_two>,
 								      "avg_usage" : <some_other_avg_usage_value>
 								    }
 								  },
 								...
 								]
 								```
-												Improved docs for multi-value dimensions.

- Add central doc for multi-value dimensions, with some content from other docs.
- Link to multi-value dimension doc from topN and groupBy docs.
- Fixes a broken link from dimensionspecs.md, which was presciently already
  linking to this nonexistent doc.
- Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes
  "multi-value") in favor of "multi-value".
											
										
										
											2016-03-22 17:16:34 -04:00
 								### Behavior on multi-value dimensions
groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
from matching rows will be used to generate one group per value. It's possible for a query to return more groups than
there are rows. For example, a groupBy on the dimension `tags` with filter `"t1" AND "t3"` would match only row1, a row
whose `tags` values are `t1`, `t2`, and `t3`, and generate a result with three groups: `t1`, `t2`, and `t3`. If you only
need to include values that match your filter, you can use a
[filtered dimensionSpec](dimensionspecs.html#filtered-dimensionspecs). This can also improve performance.

 								See [Multi-value dimensions](multi-value-dimensions.html) for more details.
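
The grouping behavior described above can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the rows, the `matches_all` helper, and the `keep` parameter (standing in for a filtered dimensionSpec) are hypothetical and are not part of Druid's API.

```python
from collections import Counter

# Hypothetical rows; `tags` plays the role of a multi-value dimension.
rows = [
    {"name": "row1", "tags": ["t1", "t2", "t3"]},
    {"name": "row2", "tags": ["t3", "t4", "t5"]},
    {"name": "row3", "tags": ["t5", "t6", "t7"]},
]


def matches_all(row, required):
    """AND filter: the row must contain every required value."""
    return all(value in row["tags"] for value in required)


def group_by_tags(rows, required, keep=None):
    """Count matching rows per tag value.

    `keep` imitates a filtered dimensionSpec: when given, only the listed
    values produce groups.
    """
    counts = Counter()
    for row in rows:
        if not matches_all(row, required):
            continue
        for value in row["tags"]:
            if keep is None or value in keep:
                counts[value] += 1
    return dict(counts)


unfiltered = group_by_tags(rows, required=["t1", "t3"])
filtered = group_by_tags(rows, required=["t1", "t3"], keep={"t1", "t3"})
print(unfiltered)  # one matching row, three groups
print(filtered)    # only the values named in the filter
```

Only one row matches the filter, yet the unfiltered call yields three groups, because every value of the matching row becomes a group.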
 								### Implementation details
 								#### Strategies
 								GroupBy queries can be executed using two different strategies. The default strategy for a cluster is determined by the
 								"druid.query.groupBy.defaultStrategy" runtime property on the broker. This can be overridden using "groupByStrategy" in
the query context. If neither the context field nor the property is set, the "v2" strategy will be used.
- "v2", the default, is designed to offer better performance and memory management. This strategy generates
 								per-segment results using a fully off-heap map. Data nodes merge the per-segment results using a fully off-heap
 								concurrent facts map combined with an on-heap string dictionary. This may optionally involve spilling to disk. Data
 								nodes return sorted results to the broker, which merges result streams using an N-way merge. The broker materializes
 								the results if necessary (e.g. if the query sorts on columns other than its dimensions). Otherwise, it streams results
 								back as they are merged.
 								- "v1", a legacy engine, generates per-segment results on data nodes (historical, realtime, middleManager) using a map which
is partially on-heap (dimension keys and the map itself) and partially off-heap (the aggregated values). Data nodes then
 								merge the per-segment results using Druid's indexing mechanism. This merging is multi-threaded by default, but can
 								optionally be single-threaded. The broker merges the final result set using Druid's indexing mechanism again. The broker
 								merging is always single-threaded. Because the broker merges results using the indexing mechanism, it must materialize
 								the full result set before returning any results. On both the data nodes and the broker, the merging index is fully
 								on-heap by default, but it can optionally store aggregated values off-heap.
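
To illustrate why sorted per-node results let the broker stream instead of materializing, here is a minimal Python sketch of an N-way merge over key-sorted streams. It is not Druid's implementation; the function name and the sample data are hypothetical, and the only aggregation shown is a sum.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_streams(streams):
    """Merge key-sorted (key, value) streams, summing values of equal keys.

    A row is yielded as soon as its key is complete, so the merged result
    is never held in memory as a whole.
    """
    merged = heapq.merge(*streams, key=itemgetter(0))
    for key, rows in groupby(merged, key=itemgetter(0)):
        yield key, sum(value for _, value in rows)


# Hypothetical sorted results returned by three data nodes.
node_a = [("apple", 3), ("cherry", 1)]
node_b = [("apple", 2), ("banana", 5)]
node_c = [("banana", 1), ("cherry", 4)]

result = list(merge_sorted_streams([node_a, node_b, node_c]))
print(result)
```

If the query had to sort on something other than the merge key, the output order would no longer follow the input order, which is why such queries force materialization.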
#### Differences between v1 and v2
 								Query API and results are compatible between the two engines; however, there are some differences from a cluster
 								configuration perspective:
- groupBy v1 controls resource usage using a row-based limit (maxResults) whereas groupBy v2 uses bytes-based limits.
 								In addition, groupBy v1 merges results on-heap, whereas groupBy v2 merges results off-heap. These factors mean that
memory tuning and resource limits behave differently between v1 and v2. In particular, some queries
that can complete successfully in one engine may exceed resource limits and fail with the other engine. See
[Memory tuning and resource limits](#memory-tuning-and-resource-limits) for more details.
- groupBy v1 imposes no limit on the number of concurrently running queries, whereas groupBy v2 controls memory usage
 								by using a finite-sized merge buffer pool. By default, the number of merge buffers is 1/4 the number of processing
 								threads. You can adjust this as necessary to balance concurrency and memory usage.
 								- groupBy v1 supports caching on either the broker or historical nodes, whereas groupBy v2 only supports caching on
 								historical nodes.
 								- groupBy v1 supports using [chunkPeriod](query-context.html) to parallelize merging on the broker, whereas groupBy v2
 								ignores chunkPeriod.
- groupBy v2 supports both array-based aggregation and hash-based aggregation. The array-based aggregation is used only
 								when the grouping key is a single indexed string column. In array-based aggregation, the dictionary-encoded value is used
 								as the index, so the aggregated values in the array can be accessed directly without finding buckets based on hashing.
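
The difference between the two aggregation styles in the last item can be sketched in Python. This is an illustration only: the dictionary, the encoded rows, and both function names are hypothetical, and a Python `dict` stands in for the hash table.

```python
def aggregate_with_array(dictionary_ids, metrics, cardinality):
    """Array-based: the dictionary-encoded id is used directly as the slot index."""
    sums = [0] * cardinality
    for dict_id, metric in zip(dictionary_ids, metrics):
        sums[dict_id] += metric
    return sums


def aggregate_with_hash(keys, metrics):
    """Hash-based: every key is hashed to locate its bucket."""
    sums = {}
    for key, metric in zip(keys, metrics):
        sums[key] = sums.get(key, 0) + metric
    return sums


dictionary = ["android", "ios", "web"]  # value dictionary of one string column
encoded_rows = [1, 0, 1, 2, 0]          # per-row dictionary ids
metrics = [10, 20, 30, 40, 50]

array_result = aggregate_with_array(encoded_rows, metrics, len(dictionary))
hash_result = aggregate_with_hash([dictionary[i] for i in encoded_rows], metrics)
print(array_result)
print(hash_result)
```

Both produce the same sums; the array version needs a slot for every dictionary entry, which is why it only applies when the grouping key is a single dictionary-encoded column.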
#### Memory tuning and resource limits
 								When using groupBy v2, three parameters control resource usage and limits:
 								- druid.processing.buffer.sizeBytes: size of the off-heap hash table used for aggregation, per query, in bytes. At
 								most druid.processing.numMergeBuffers of these will be created at once, which also serves as an upper limit on the
 								number of concurrently running groupBy queries.
 								- druid.query.groupBy.maxMergingDictionarySize: size of the on-heap dictionary used when grouping on strings, per query,
 								in bytes. Note that this is based on a rough estimate of the dictionary size, not the actual size.
 								- druid.query.groupBy.maxOnDiskStorage: amount of space on disk used for aggregation, per query, in bytes. By default,
 								this is 0, which means aggregation will not use disk.

If maxOnDiskStorage is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
aggregation table limit, will fail with a "Resource limit exceeded" error describing the limit that was exceeded.

If maxOnDiskStorage is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
In this case, when either the on-heap dictionary or off-heap hash table fills up, partially aggregated records will be
sorted and flushed to disk. Then, both in-memory structures will be cleared out for further aggregation. Queries that
then go on to exceed maxOnDiskStorage will fail with a "Resource limit exceeded" error indicating that they ran out of
disk space.

With groupBy v2, cluster operators should make sure that the off-heap hash tables and on-heap merging dictionaries
will not exceed available memory for the maximum possible concurrent query load (given by
druid.processing.numMergeBuffers). See [How much direct memory does Druid use?](../operations/performance-faq.html) for more details.

When using groupBy v1, all aggregation is done on-heap, and resource limits are enforced through the parameter
 								druid.query.groupBy.maxResults. This is a cap on the maximum number of results in a result set. Queries that exceed
 								this limit will fail with a "Resource limit exceeded" error indicating they exceeded their row limit. Cluster
 								operators should make sure that the on-heap aggregations will not exceed available JVM heap space for the expected
 								concurrent query load.
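
For groupBy v2, the sizing guidance in this section can be turned into a back-of-the-envelope calculation. The sketch below assumes the worst case in which every merge buffer is held by a groupBy query at the same time; the function name and the sample buffer settings are hypothetical, and the result covers only the groupBy merge buffers and merging dictionaries, not other memory Druid uses.

```python
def groupby_v2_memory_bounds(num_merge_buffers, buffer_size_bytes,
                             max_merging_dictionary_size):
    """Worst-case memory if every merge buffer is in use by a groupBy query."""
    return {
        "off_heap_bytes": num_merge_buffers * buffer_size_bytes,
        "on_heap_bytes": num_merge_buffers * max_merging_dictionary_size,
    }


bounds = groupby_v2_memory_bounds(
    num_merge_buffers=2,                      # druid.processing.numMergeBuffers (sample value)
    buffer_size_bytes=500_000_000,            # druid.processing.buffer.sizeBytes (sample value)
    max_merging_dictionary_size=100_000_000,  # druid.query.groupBy.maxMergingDictionarySize
)
print(bounds)
```

Because the dictionary size is itself a rough estimate, the on-heap figure should be read as approximate.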
#### Performance tuning for groupBy v2
 								##### Limit pushdown optimization
Druid pushes down the `limit` spec in groupBy queries to the segments on historicals wherever possible, to prune unnecessary intermediate results early and minimize the amount of data transferred to brokers. By default, this technique is applied only when all fields in the `orderBy` spec are a subset of the grouping keys. This is because limit pushdown doesn't guarantee exact results if the `orderBy` spec includes any fields that are not in the grouping keys. However, you can enable this technique even in such cases if you can sacrifice some accuracy for faster query processing, as in topN queries. See `forceLimitPushDown` in [advanced groupBy v2 configurations](#groupby-v2-configurations).
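
The following Python sketch shows, on toy data, why pushing a limit down is exact when ordering by the grouping key but can be inexact when ordering by an aggregated metric. The two "segments", the helper names, and the numbers are all hypothetical; the code imitates the idea and is not how Druid implements it.

```python
from collections import Counter


def first_n(sums, limit, by_metric):
    """Order groups by metric (descending) or by key, then keep `limit` of them."""
    if by_metric:
        ordered = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
    else:
        ordered = sorted(sums.items())
    return ordered[:limit]


def merge(partials):
    """Sum per-key values across partial results."""
    total = Counter()
    for partial in partials:
        total.update(dict(partial))
    return dict(total)


segments = [{"a": 10, "b": 9}, {"c": 11, "b": 9}]
limit = 1

# Ordering by a metric: key "b" is never the local winner, so it is pruned.
exact_by_metric = first_n(merge(segments), limit, by_metric=True)
pushed_by_metric = first_n(
    merge(first_n(seg, limit, by_metric=True) for seg in segments),
    limit, by_metric=True)

# Ordering by the grouping key: pruning per segment loses nothing.
exact_by_key = first_n(merge(segments), limit, by_metric=False)
pushed_by_key = first_n(
    merge(first_n(seg, limit, by_metric=False) for seg in segments),
    limit, by_metric=False)

print(exact_by_metric, pushed_by_metric)
print(exact_by_key, pushed_by_key)
```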
 								##### Optimizing hash table
The groupBy v2 engine uses an open addressing hash table for aggregation. The hash table is initialized with a given initial bucket number and grows gradually as the buffer fills up. On hash collisions, the linear probing technique is used.

The default number of initial buckets is 1024 and the default max load factor of the hash table is 0.7. If you see too many collisions in the hash table, you can adjust these numbers. See `bufferGrouperInitialBuckets` and `bufferGrouperMaxLoadFactor` in [Advanced groupBy v2 configurations](#groupby-v2-configurations).
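
As a rough illustration of the technique named above, here is a small open addressing table with linear probing in Python. It is a teaching sketch, not Druid's buffer grouper: the class name, the doubling growth policy, and the on-heap list of buckets are assumptions, and only the two defaults (1024 and 0.7) come from the text.

```python
class LinearProbingTable:
    """Open addressing hash table that sums values per key."""

    def __init__(self, initial_buckets=1024, max_load_factor=0.7):
        self.buckets = [None] * initial_buckets
        self.max_load_factor = max_load_factor
        self.size = 0
        self.probes = 0  # extra steps taken because a bucket was occupied

    def _find_slot(self, key):
        index = hash(key) % len(self.buckets)
        while self.buckets[index] is not None and self.buckets[index][0] != key:
            self.probes += 1
            index = (index + 1) % len(self.buckets)  # linear probing
        return index

    def _grow(self):
        # Assumed policy: double the bucket count and re-insert every entry.
        entries = [entry for entry in self.buckets if entry is not None]
        self.buckets = [None] * (len(self.buckets) * 2)
        for key, value in entries:
            self.buckets[self._find_slot(key)] = (key, value)

    def add(self, key, value):
        index = self._find_slot(key)
        if self.buckets[index] is None:
            if self.size + 1 > self.max_load_factor * len(self.buckets):
                self._grow()
                index = self._find_slot(key)
            self.buckets[index] = (key, value)
            self.size += 1
        else:
            self.buckets[index] = (key, self.buckets[index][1] + value)

    def get(self, key):
        entry = self.buckets[self._find_slot(key)]
        return None if entry is None else entry[1]


# Tiny table so that growth is visible after a few inserts.
table = LinearProbingTable(initial_buckets=4, max_load_factor=0.7)
for key, value in [("us", 1), ("de", 2), ("us", 3), ("fr", 4), ("jp", 5)]:
    table.add(key, value)
print(table.get("us"), table.size, len(table.buckets))
```

A lower load factor leaves more empty buckets and so shortens probe chains, at the cost of memory; more initial buckets avoid early growth steps.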
 								##### Parallel combine
Once a historical finishes aggregation using the hash table, it sorts the aggregates and merges them before sending them to the broker, which performs an N-way merge. By default, historicals use all their available processing threads (configured by `druid.processing.numThreads`) for aggregation, but use a single thread for sorting and merging aggregates: the HTTP thread that sends data to brokers.

This is to prevent some heavy groupBy queries from blocking other queries. In Druid, the processing threads are shared between all submitted queries and they are _not interruptible_. This means that if a heavy query takes all available processing threads, all other queries might be blocked until the heavy query is finished. Because groupBy queries usually take longer than timeseries or topN queries, they should release processing threads as soon as possible.

However, you might care about the performance of some really heavy groupBy queries. Usually, the performance bottleneck of heavy groupBy queries is merging sorted aggregates. In such cases, you can use processing threads for it as well. This is called _parallel combine_. To enable parallel combine, see `numParallelCombineThreads` in [Advanced groupBy v2 configurations](#groupby-v2-configurations). Note that parallel combine can be enabled only when data is actually spilled (see [Memory tuning and resource limits](#memory-tuning-and-resource-limits)).

Once parallel combine is enabled, the groupBy v2 engine can create a combining tree for merging sorted aggregates. Each intermediate node of the tree is a thread merging aggregates from the child nodes. The leaf node threads read and merge aggregates from hash tables including spilled ones. Usually, leaf nodes are slower than intermediate nodes because they need to read data from disk. As a result, fewer threads are used for intermediate nodes by default. You can change the degree of intermediate nodes. See `intermediateCombineDegree` in [Advanced groupBy v2 configurations](#groupby-v2-configurations).
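
The shape of a combining tree can be sketched in Python as follows. This version runs sequentially, so it shows only how sorted runs are combined level by level, not the threading; the function names, the `leaf_degree` and `intermediate_degree` parameters, and the sample runs are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine(sorted_runs):
    """Merge key-sorted runs into one sorted run, summing equal keys."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [(key, sum(value for _, value in rows))
            for key, rows in groupby(merged, key=itemgetter(0))]


def combine_tree(sorted_runs, leaf_degree, intermediate_degree):
    """Combine runs level by level; each chunk stands for one tree node."""
    if leaf_degree < 1 or intermediate_degree < 2:
        raise ValueError("leaf_degree must be >= 1 and intermediate_degree >= 2")
    # Leaf level: each node reads `leaf_degree` sorted runs.
    level = [combine(sorted_runs[i:i + leaf_degree])
             for i in range(0, len(sorted_runs), leaf_degree)]
    # Intermediate levels: each node merges `intermediate_degree` children.
    while len(level) > 1:
        level = [combine(level[i:i + intermediate_degree])
                 for i in range(0, len(level), intermediate_degree)]
    return level[0] if level else []


# Hypothetical sorted runs, for example hash table contents and spilled files.
runs = [
    [("a", 1), ("c", 1)],
    [("a", 2), ("b", 1)],
    [("b", 3)],
    [("c", 5), ("d", 1)],
    [("a", 1)],
]
combined = combine_tree(runs, leaf_degree=2, intermediate_degree=2)
print(combined)
```

With five runs and both degrees set to 2, the tree has three leaf nodes, then two intermediate nodes, then a root. A larger degree means fewer nodes (threads) at that level.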
#### Alternatives
 								There are some situations where other query types may be a better choice than groupBy.
 								- For queries with no "dimensions" (i.e. grouping by time only) the [Timeseries query](timeseriesquery.html) will
 								generally be faster than groupBy. The major differences are that it is implemented in a fully streaming manner (taking
 								advantage of the fact that segments are already sorted on time) and does not need to use a hash table for merging.
 								- For queries with a single "dimensions" element (i.e. grouping by one string dimension), the [TopN query](topnquery.html)
 								will sometimes be faster than groupBy. This is especially true if you are ordering by a metric and find approximate
 								results acceptable.
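
As an illustration of the first case, a groupBy query with no dimensions can usually be expressed as a timeseries query with the same aggregations. The helper below is a hypothetical sketch: the list of fields assumed to carry over unchanged is an assumption of this example, and it refuses queries with `having` or `limitSpec` rather than silently dropping them.

```python
import json


def to_timeseries(group_by_query):
    """Rewrite a groupBy query that has no dimensions as a timeseries query."""
    if group_by_query.get("dimensions"):
        raise ValueError("only groupBy queries without dimensions can be rewritten")
    unsupported = [k for k in ("having", "limitSpec") if k in group_by_query]
    if unsupported:
        raise ValueError("cannot carry over: " + ", ".join(unsupported))
    # Fields assumed (for this sketch) to mean the same in both query types.
    shared = ("dataSource", "granularity", "intervals", "filter",
              "aggregations", "postAggregations", "context")
    query = {"queryType": "timeseries"}
    query.update({k: group_by_query[k] for k in shared if k in group_by_query})
    return query


group_by_query = {
    "queryType": "groupBy",
    "dataSource": "sample_datasource",
    "granularity": "day",
    "dimensions": [],
    "aggregations": [
        {"type": "longSum", "name": "total_usage", "fieldName": "user_count"}
    ],
    "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"],
}

timeseries_query = to_timeseries(group_by_query)
print(json.dumps(timeseries_query, indent=2))
```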
#### Nested groupBys
Nested groupBys (dataSource of type "query") are performed differently for "v1" and "v2". The broker first runs the
inner groupBy query in the usual way. The "v1" strategy then materializes the inner query's results on-heap with Druid's
indexing mechanism, and runs the outer query on these materialized results. The "v2" strategy runs the outer query on the
inner query's results stream with an off-heap fact map and an on-heap string dictionary that can spill to disk. Both
strategies perform the outer query on the broker in a single-threaded fashion.
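
For reference, a nested groupBy can be assembled as shown in this Python sketch, which builds the query as a dictionary and serializes it to JSON. The datasource, dimension, and aggregator names reuse the sample query at the top of this page; the outer aggregation (`max_device_usage`) is a hypothetical example.

```python
import json

# Inner query: usage per country and device, per day.
inner_query = {
    "queryType": "groupBy",
    "dataSource": "sample_datasource",
    "granularity": "day",
    "dimensions": ["country", "device"],
    "aggregations": [
        {"type": "longSum", "name": "total_usage", "fieldName": "user_count"}
    ],
    "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"],
}

# Outer query: its dataSource is of type "query" and wraps the inner query.
outer_query = {
    "queryType": "groupBy",
    "dataSource": {"type": "query", "query": inner_query},
    "granularity": "all",
    "dimensions": ["country"],
    "aggregations": [
        {"type": "longMax", "name": "max_device_usage", "fieldName": "total_usage"}
    ],
    "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"],
}

payload = json.dumps(outer_query, indent=2)
print(payload)
```

The outer aggregator reads `total_usage`, an output column of the inner query, rather than a column of the underlying datasource.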
#### Configurations
 								This section describes the configurations for groupBy queries. You can set system-wide configurations by adding them to runtime properties or query-specific configurations by adding them to query contexts. All runtime properties are prefixed by `druid.query.groupBy`.
 								#### Commonly tuned configurations
##### Configurations for groupBy v2
 								Supported runtime properties:
-												Alternative groupBy strategy. (#2998)

This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
											
										
										
											2016-06-24 21:06:09 -04:00
 								|Property|Description|Default|
 								|--------|-----------|-------|
-												groupBy v2: Configurable load factor. (#3437)

Also change defaults:

- bufferGrouperMaxLoadFactor from 0.75 to 0.7.
- maxMergingDictionarySize to 100MB from 25MB, should be more appropriate
  for most heaps.
											
										
										
											2016-09-07 15:14:59 -04:00
+								|`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for the string dictionary during merging. When the dictionary exceeds this size, a spill to disk will be triggered.|100000000|
-												Alternative groupBy strategy. (#2998)

This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
											
										
										
											2016-06-24 21:06:09 -04:00
+								|`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)|

Supported query contexts:

|Key|Description|
|---|-----------|
|`maxMergingDictionarySize`|Can be used to lower the value of `druid.query.groupBy.maxMergingDictionarySize` for this query.|
|`maxOnDiskStorage`|Can be used to lower the value of `druid.query.groupBy.maxOnDiskStorage` for this query.|
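
As a rough illustration, a query could lower these limits for itself through its `context` object. The sketch below reuses the sample datasource from the example at the top of this page. Both values are arbitrary assumptions: 50000000 is simply below the 100000000 default, and the `maxOnDiskStorage` value only makes sense if the cluster has been configured with a higher `druid.query.groupBy.maxOnDiskStorage` than the default of 0.

``` json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country", "device"],
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": {
    "maxMergingDictionarySize": 50000000,
    "maxOnDiskStorage": 500000000
  }
}
```

Because these keys can only lower the configured values, they are best suited to constraining an individual query rather than granting it extra resources.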

#### Advanced configurations

##### Common configurations for all groupBy strategies

Supported runtime properties:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.groupBy.defaultStrategy`|Default groupBy query strategy.|v2|
|`druid.query.groupBy.singleThreaded`|Merge results using a single thread.|false|

Supported query contexts:

|Key|Description|
|---|-----------|
|`groupByStrategy`|Overrides the value of `druid.query.groupBy.defaultStrategy` for this query.|
|`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.|
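
A minimal sketch of overriding both common settings for a single query is shown below. The datasource, dimension, and aggregator names are borrowed from the example at the top of this page, and the choice of `v1` with single-threaded merging is purely illustrative, not a recommendation.

``` json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country"],
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": {
    "groupByStrategy": "v1",
    "groupByIsSingleThreaded": true
  }
}
```

Queries that omit these keys fall back to `druid.query.groupBy.defaultStrategy` and `druid.query.groupBy.singleThreaded`.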

##### GroupBy v2 configurations

Supported runtime properties:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.groupBy.bufferGrouperInitialBuckets`|Initial number of buckets in the off-heap hash table used for grouping results. Set to 0 to use a reasonable default (1024).|0|
|`druid.query.groupBy.bufferGrouperMaxLoadFactor`|Maximum load factor of the off-heap hash table used for grouping results. When the load factor exceeds this value, the table will be grown or spilled to disk. Set to 0 to use a reasonable default (0.7).|0|
|`druid.query.groupBy.forceHashAggregation`|Force the use of hash-based aggregation.|false|
|`druid.query.groupBy.intermediateCombineDegree`|Number of intermediate nodes combined together in the combining tree. Higher degrees need fewer threads, which may improve query performance by reducing the overhead of too many threads if the server has sufficiently powerful CPU cores.|8|
|`druid.query.groupBy.numParallelCombineThreads`|Hint for the number of parallel combining threads. This should be larger than 1 to turn on the parallel combining feature. The actual number of threads used for parallel combining is min(`druid.query.groupBy.numParallelCombineThreads`, `druid.processing.numThreads`).|1 (disabled)|

Supported query contexts:

|Key|Description|Default|
|---|-----------|-------|
|`bufferGrouperInitialBuckets`|Overrides the value of `druid.query.groupBy.bufferGrouperInitialBuckets` for this query.|None|
|`bufferGrouperMaxLoadFactor`|Overrides the value of `druid.query.groupBy.bufferGrouperMaxLoadFactor` for this query.|None|
|`forceHashAggregation`|Overrides the value of `druid.query.groupBy.forceHashAggregation` for this query.|None|
|`intermediateCombineDegree`|Overrides the value of `druid.query.groupBy.intermediateCombineDegree` for this query.|None|
|`numParallelCombineThreads`|Overrides the value of `druid.query.groupBy.numParallelCombineThreads` for this query.|None|
|`sortByDimsFirst`|Sort the results first by dimension values and then by timestamp.|false|
|`forceLimitPushDown`|When all fields in the orderby are part of the grouping key, the broker will push limit application down to the historical nodes. When the sorting order uses fields that are not in the grouping key, applying this optimization can result in approximate results with unknown accuracy, so this optimization is disabled by default in that case. Enabling this context flag turns on limit push down for limit/orderbys that contain non-grouping key columns.|false|
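
The following sketch shows how a v2 query might request parallel combining and dimension-first sorting through its context. The datasource and column names come from the example at the top of this page; the thread count of 4 is an assumed value. Going by the `min(...)` rule in the runtime properties table, a server with `druid.processing.numThreads` set to 2 would presumably use 2 combining threads for this query, not 4.

``` json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country", "device"],
  "aggregations": [
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "context": {
    "groupByStrategy": "v2",
    "numParallelCombineThreads": 4,
    "intermediateCombineDegree": 8,
    "sortByDimsFirst": true
  }
}
```

`forceLimitPushDown` is left out here on purpose, since the table notes that it can produce approximate results when the ordering uses columns outside the grouping key.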

##### GroupBy v1 configurations

Supported runtime properties:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.groupBy.maxIntermediateRows`|Maximum number of intermediate rows for the per-segment grouping engine. This is a tuning parameter that does not impose a hard limit; rather, it potentially shifts merging work from the per-segment engine to the overall merging index. Queries that exceed this limit will not fail.|50000|
|`druid.query.groupBy.maxResults`|Maximum number of results. Queries that exceed this limit will fail.|500000|

Supported query contexts:

|Key|Description|Default|
|---|-----------|-------|
|`maxIntermediateRows`|Can be used to lower the value of `druid.query.groupBy.maxIntermediateRows` for this query.|None|
|`maxResults`|Can be used to lower the value of `druid.query.groupBy.maxResults` for this query.|None|
|`useOffheap`|Set to true to store aggregations off-heap when merging results.|false|