mirror of https://github.com/apache/druid.git
fix doc headers (#8729)
parent fdbc4ae147
commit cc3650ee3b

@@ -31,7 +31,7 @@ Apache Druid (incubating) uses [Apache ZooKeeper](http://zookeeper.apache.org/)
 4. [Overlord](../design/overlord.md) leader election
 5. [Overlord](../design/overlord.md) and [MiddleManager](../design/middlemanager.md) task management

-### Coordinator Leader Election
+## Coordinator Leader Election

 We use the Curator LeadershipLatch recipe to do leader election at path

@@ -39,7 +39,7 @@ We use the Curator LeadershipLatch recipe to do leader election at path
 ${druid.zk.paths.coordinatorPath}/_COORDINATOR
 ```

-### Segment "publishing" protocol from Historical and Realtime
+## Segment "publishing" protocol from Historical and Realtime

 The `announcementsPath` and `servedSegmentsPath` are used for this.

@@ -63,7 +63,7 @@ ${druid.zk.paths.servedSegmentsPath}/${druid.host}/_segment_identifier_

 Processes like the [Coordinator](../design/coordinator.md) and [Broker](../design/broker.md) can then watch these paths to see which processes are currently serving which segments.

-### Segment load/drop protocol between Coordinator and Historical
+## Segment load/drop protocol between Coordinator and Historical

 The `loadQueuePath` is used for this.

@@ -122,7 +122,7 @@ To pull it all together, the above query would return *n\*m* data points, up to
 ]
 ```

-### Behavior on multi-value dimensions
+## Behavior on multi-value dimensions

 groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
 from matching rows will be used to generate one group per value. It's possible for a query to return more groups than

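The multi-value behavior in the groupBy hunk above can be modeled in a few lines of plain Python. This is an illustrative sketch, not Druid's engine: rows are dicts, a multi-value cell is a Python list, and `group_by_multi_value`, `tags`, and `count` are hypothetical names.

```python
from collections import defaultdict


def group_by_multi_value(rows, dimension, metric):
    """Sum `metric` per value of `dimension`, exploding multi-value cells.

    A row whose dimension holds a list contributes its metric once to
    *each* value in that list, so one input row can feed several groups.
    """
    totals = defaultdict(int)
    for row in rows:
        value = row.get(dimension)
        values = value if isinstance(value, list) else [value]
        for v in values:
            totals[v] += row[metric]
    return dict(totals)


rows = [
    {"tags": ["t1", "t2", "t3"], "count": 1},
    {"tags": ["t3", "t4"], "count": 1},
]
print(group_by_multi_value(rows, "tags", "count"))
# {'t1': 1, 't2': 1, 't3': 2, 't4': 1}
```

Two input rows produce four groups here, which is how a query can return more groups than there are rows.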
@@ -133,7 +133,7 @@ improve performance.

 See [Multi-value dimensions](multi-value-dimensions.html) for more details.

-### More on subtotalsSpec
+## More on subtotalsSpec
 The subtotals feature allows computation of multiple sub-groupings in a single query. To use this feature, add a "subtotalsSpec" to your query, which should be a list of subgroup dimension sets. It should contain the "outputName" from dimensions in your "dimensions" attribute, in the same order as they appear in the "dimensions" attribute (although, of course, you may skip some). For example, consider a groupBy query like this one:

 ```json

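To make the subtotals description concrete, here is a plain-Python model of what a subtotalsSpec computes over in-memory rows: one grouping per listed dimension set, each set drawn from the query's dimensions in their original order. It is a sketch, not Druid's implementation; `compute_subtotals` and the column names are hypothetical, and reporting left-out dimensions as `None` is this sketch's own presentation choice.

```python
def compute_subtotals(rows, dimensions, subtotals_spec, metric):
    """Sum `metric` once per dimension set listed in subtotals_spec.

    Returns one list of result rows per subtotal set. Dimensions that are
    not part of a set are reported as None (a choice of this sketch).
    """
    results = []
    for subset in subtotals_spec:
        # Subtotal dimensions must come from `dimensions`, in the same order.
        if list(subset) != [d for d in dimensions if d in subset]:
            raise ValueError(f"{subset} is not an ordered subset of {dimensions}")
        groups = {}
        for row in rows:
            key = tuple(row[d] if d in subset else None for d in dimensions)
            groups[key] = groups.get(key, 0) + row[metric]
        results.append(
            [{**dict(zip(dimensions, key)), metric: total} for key, total in groups.items()]
        )
    return results


rows = [
    {"d1": "a", "d2": "x", "m": 1},
    {"d1": "a", "d2": "y", "m": 2},
    {"d1": "b", "d2": "x", "m": 3},
]
result = compute_subtotals(rows, ["d1", "d2"], [["d1", "d2"], ["d1"], []], "m")
```

The empty set `[]` collapses everything into a single total row, while `["d1"]` rolls `d2` up within each `d1` value.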
@@ -219,9 +219,9 @@ Response for above query would look something like below...
 ]
 ```

-### Implementation details
+## Implementation details

-#### Strategies
+### Strategies

 GroupBy queries can be executed using two different strategies. The default strategy for a cluster is determined by the
 "druid.query.groupBy.defaultStrategy" runtime property on the Broker. This can be overridden using "groupByStrategy" in

@@ -242,7 +242,7 @@ merging is always single-threaded. Because the Broker merges results using the i
 the full result set before returning any results. On both the data processes and the Broker, the merging index is fully
 on-heap by default, but it can optionally store aggregated values off-heap.

-#### Differences between v1 and v2
+### Differences between v1 and v2

 Query API and results are compatible between the two engines; however, there are some differences from a cluster
 configuration perspective:

@@ -263,7 +263,7 @@ ignores chunkPeriod.
 when the grouping key is a single indexed string column. In array-based aggregation, the dictionary-encoded value is used
 as the index, so the aggregated values in the array can be accessed directly without finding buckets based on hashing.

-#### Memory tuning and resource limits
+### Memory tuning and resource limits

 When using groupBy v2, three parameters control resource usage and limits:

@@ -299,21 +299,21 @@ this limit will fail with a "Resource limit exceeded" error indicating they exce
 operators should make sure that the on-heap aggregations will not exceed available JVM heap space for the expected
 concurrent query load.

-#### Performance tuning for groupBy v2
+### Performance tuning for groupBy v2

-##### Limit pushdown optimization
+#### Limit pushdown optimization

 Druid pushes down the `limit` spec in groupBy queries to the segments on Historicals wherever possible to early prune unnecessary intermediate results and minimize the amount of data transferred to Brokers. By default, this technique is applied only when all fields in the `orderBy` spec is a subset of the grouping keys. This is because the `limitPushDown` doesn't guarantee the exact results if the `orderBy` spec includes any fields that are not in the grouping keys. However, you can enable this technique even in such cases if you can sacrifice some accuracy for fast query processing like in topN queries. See `forceLimitPushDown` in [advanced groupBy v2 configurations](#groupby-v2-configurations).


-##### Optimizing hash table
+#### Optimizing hash table

 The groupBy v2 engine uses an open addressing hash table for aggregation. The hash table is initialized with a given initial bucket number and gradually grows on buffer full. On hash collisions, the linear probing technique is used.

 The default number of initial buckets is 1024 and the default max load factor of the hash table is 0.7. If you can see too many collisions in the hash table, you can adjust these numbers. See `bufferGrouperInitialBuckets` and `bufferGrouperMaxLoadFactor` in [Advanced groupBy v2 configurations](#groupby-v2-configurations).


-##### Parallel combine
+#### Parallel combine

 Once a Historical finishes aggregation using the hash table, it sorts the aggregated results and merges them before sending to the
 Broker for N-way merge aggregation in the broker. By default, Historicals use all their available processing threads

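The hash-table paragraph in the hunk above (open addressing, linear probing, 1024 initial buckets, 0.7 max load factor) can be illustrated with a small Python model of a sum aggregation. This is not Druid's buffer-based grouper: the class name is hypothetical, and growing by doubling the bucket count whenever an insert would exceed the load factor is an assumption made for the example.

```python
class LinearProbingSumTable:
    """Open-addressing hash table that sums values per key (linear probing)."""

    def __init__(self, initial_buckets=1024, max_load_factor=0.7):
        self.max_load_factor = max_load_factor
        self.keys = [None] * initial_buckets  # None marks an empty bucket
        self.sums = [0] * initial_buckets
        self.size = 0

    def _find_slot(self, key, keys):
        # Start at the hashed bucket and step forward one slot at a time
        # (wrapping around) until we hit the key or an empty slot.
        i = hash(key) % len(keys)
        while keys[i] is not None and keys[i] != key:
            i = (i + 1) % len(keys)
        return i

    def _grow(self):
        # Assumption for this sketch: double capacity and re-insert everything.
        old = [(k, s) for k, s in zip(self.keys, self.sums) if k is not None]
        self.keys = [None] * (2 * len(self.keys))
        self.sums = [0] * len(self.keys)
        for k, s in old:
            i = self._find_slot(k, self.keys)
            self.keys[i], self.sums[i] = k, s

    def add(self, key, value):
        if key is None:
            raise ValueError("None is reserved as the empty-slot marker")
        i = self._find_slot(key, self.keys)
        if self.keys[i] is None:
            # New key: grow first if inserting would exceed the max load factor.
            if (self.size + 1) / len(self.keys) > self.max_load_factor:
                self._grow()
                i = self._find_slot(key, self.keys)
            self.keys[i] = key
            self.size += 1
        self.sums[i] += value

    def items(self):
        return {k: s for k, s in zip(self.keys, self.sums) if k is not None}


table = LinearProbingSumTable(initial_buckets=8)
for i in range(100):
    table.add(f"k{i % 20}", 1)
```

Starting from 8 buckets, this table grows to 16 on the sixth distinct key and to 32 on the twelfth, so 20 keys end at a load of 0.625. Because the load factor stays below 1, probing always finds an empty slot.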
@@ -341,7 +341,7 @@ Please note that each Historical needs two merge buffers to process a groupBy v2
 computing intermediate aggregates from each segment and another for combining intermediate aggregates in parallel.


-#### Alternatives
+### Alternatives

 There are some situations where other query types may be a better choice than groupBy.

@@ -353,7 +353,7 @@ advantage of the fact that segments are already sorted on time) and does not nee
 will sometimes be faster than groupBy. This is especially true if you are ordering by a metric and find approximate
 results acceptable.

-#### Nested groupBys
+### Nested groupBys

 Nested groupBys (dataSource of type "query") are performed differently for "v1" and "v2". The Broker first runs the
 inner groupBy query in the usual way. "v1" strategy then materializes the inner query's results on-heap with Druid's

@@ -361,11 +361,11 @@ indexing mechanism, and runs the outer query on these materialized results. "v2"
 inner query's results stream with off-heap fact map and on-heap string dictionary that can spill to disk. Both
 strategy perform the outer query on the Broker in a single-threaded fashion.

-#### Configurations
+### Configurations

 This section describes the configurations for groupBy queries. You can set the runtime properties in the `runtime.properties` file on Broker, Historical, and MiddleManager processes. You can set the query context parameters through the [query context](query-context.html).

-##### Configurations for groupBy v2
+#### Configurations for groupBy v2

 Supported runtime properties:

@@ -382,9 +382,9 @@ Supported query contexts:
 |`maxOnDiskStorage`|Can be used to lower the value of `druid.query.groupBy.maxOnDiskStorage` for this query.|


-#### Advanced configurations
+### Advanced configurations

-##### Common configurations for all groupBy strategies
+#### Common configurations for all groupBy strategies

 Supported runtime properties:

@@ -401,7 +401,7 @@ Supported query contexts:
 |`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.|


-##### GroupBy v2 configurations
+#### GroupBy v2 configurations

 Supported runtime properties:

@@ -428,7 +428,7 @@ Supported query contexts:
 |`applyLimitPushDownToSegment`|If Broker pushes limit down to queryable nodes (historicals, peons) then limit results during segment scan. This context value can be used to override `druid.query.groupBy.applyLimitPushDownToSegment`.|true|


-##### GroupBy v1 configurations
+#### GroupBy v1 configurations

 Supported runtime properties:

@@ -445,7 +445,7 @@ Supported query contexts:
 |`maxResults`|Can be used to lower the value of `druid.query.groupBy.maxResults` for this query.|None|
 |`useOffheap`|Set to true to store aggregations off-heap when merging results.|false|

-##### Array based result rows
+#### Array based result rows

 Internally Druid always uses an array based representation of groupBy result rows, but by default this is translated
 into a map based result format at the Broker. To reduce the overhead of this translation, results may also be returned

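The array-to-map translation mentioned above amounts to pairing each array position with a column name. A minimal sketch, assuming the caller already knows the column order of the array rows (the sketch makes no claim about the order Druid uses); `array_row_to_map` and the column names are hypothetical.

```python
def array_row_to_map(columns, row):
    """Translate one array-based result row into a map keyed by column name."""
    if len(columns) != len(row):
        raise ValueError(f"expected {len(columns)} values, got {len(row)}")
    return dict(zip(columns, row))


columns = ["country", "device", "count"]
array_rows = [["US", "mobile", 10], ["DE", "desktop", 4]]
map_rows = [array_row_to_map(columns, r) for r in array_rows]
```

The array form also avoids repeating every column name in every row, which the map form cannot.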
@@ -124,7 +124,7 @@ only the rows which satisfy those filters, thereby saving I/O cost. However, it
 and cursor-based execution plans, and chooses the optimal one. Currently, it is not enabled by default due to the overhead
 of cost estimation.

-#### Server configuration
+## Server configuration

 The following runtime properties apply:

@@ -132,7 +132,7 @@ The following runtime properties apply:
 |--------|-----------|-------|
 |`druid.query.search.searchStrategy`|Default search query strategy.|useIndexes|

-#### Query context
+## Query context

 The following query context parameters apply:

@@ -89,18 +89,18 @@ undefined.

 Only columns which are dimensions (i.e., have type `STRING`) will have any cardinality. Rest of the columns (timestamp and metric columns) will show cardinality as `null`.

-### intervals
+## intervals

 If an interval is not specified, the query will use a default interval that spans a configurable period before the end time of the most recent segment.

 The length of this default time period is set in the Broker configuration via:
 druid.query.segmentMetadata.defaultHistory

-### toInclude
+## toInclude

 There are 3 types of toInclude objects.

-#### All
+### All

 The grammar is as follows:

@@ -108,7 +108,7 @@ The grammar is as follows:
 "toInclude": { "type": "all"}
 ```

-#### None
+### None

 The grammar is as follows:

@@ -116,7 +116,7 @@ The grammar is as follows:
 "toInclude": { "type": "none"}
 ```

-#### List
+### List

 The grammar is as follows:

@@ -124,7 +124,7 @@ The grammar is as follows:
 "toInclude": { "type": "list", "columns": [<string list of column names>]}
 ```

-### analysisTypes
+## analysisTypes

 This is a list of properties that determines the amount of information returned about the columns, i.e. analyses to be performed on the columns.

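The three toInclude grammars above can be read as a column selector. The Python sketch below evaluates that selection against a known column list; `included_columns` is hypothetical, and keeping segment column order while ignoring unknown names in a `list` are assumptions of the sketch, not documented Druid behavior.

```python
def included_columns(to_include, all_columns):
    """Return the columns selected by a segmentMetadata toInclude object."""
    kind = to_include.get("type")
    if kind == "all":
        return list(all_columns)
    if kind == "none":
        return []
    if kind == "list":
        wanted = set(to_include.get("columns", []))
        # Assumption: keep segment column order; unknown names are ignored.
        return [c for c in all_columns if c in wanted]
    raise ValueError(f"unknown toInclude type: {kind!r}")


cols = ["__time", "page", "added"]
```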
@@ -135,32 +135,32 @@ The default analysis types can be set in the Broker configuration via:

 Types of column analyses are described below:

-#### cardinality
+### cardinality

 * `cardinality` in the result will return the estimated floor of cardinality for each column. Only relevant for
 dimension columns.

-#### minmax
+### minmax

 * Estimated min/max values for each column. Only relevant for dimension columns.

-#### size
+### size

 * `size` in the result will contain the estimated total segment byte size as if the data were stored in text format

-#### interval
+### interval

 * `intervals` in the result will contain the list of intervals associated with the queried segments.

-#### timestampSpec
+### timestampSpec

 * `timestampSpec` in the result will contain timestampSpec of data stored in segments. this can be null if timestampSpec of segments was unknown or unmergeable (if merging is enabled).

-#### queryGranularity
+### queryGranularity

 * `queryGranularity` in the result will contain query granularity of data stored in segments. this can be null if query granularity of segments was unknown or unmergeable (if merging is enabled).

-#### aggregators
+### aggregators

 * `aggregators` in the result will contain the list of aggregators usable for querying metric columns. This may be
 null if the aggregators are unknown or unmergeable (if merging is enabled).

@@ -169,12 +169,12 @@ null if the aggregators are unknown or unmergeable (if merging is enabled).

 * The form of the result is a map of column name to aggregator.

-#### rollup
+### rollup

 * `rollup` in the result is true/false/null.
 * When merging is enabled, if some are rollup, others are not, result is null.

-### lenientAggregatorMerge
+## lenientAggregatorMerge

 Conflicts between aggregator metadata across segments can occur if some segments have unknown aggregators, or if
 two segments use incompatible aggregators for the same column (e.g. longSum changed to doubleSum).

@@ -94,7 +94,7 @@ To pull it all together, the above query would return 2 data points, one for eac
 ]
 ```

-#### Grand totals
+## Grand totals

 Druid can include an extra "grand totals" row as the last row of a timeseries result set. To enable this, add
 `"grandTotal" : true` to your query context. For example:

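The grand-totals row can be modeled as summing each aggregation over all result rows, then computing post-aggregations from those sums (as the next hunk notes) instead of adding up per-row post-aggregation values. A plain-Python sketch, assuming every aggregator combines by addition (true for count and sum, not for min or max); `append_grand_total`, the `avg` post-aggregation, and the shape of the extra row are hypothetical.

```python
def append_grand_total(rows, post_aggs):
    """Return rows plus a grand-totals row as the last element.

    rows: list of {"timestamp": ..., "result": {name: number}}
    post_aggs: {name: function(dict_of_totals) -> number}
    """
    totals = {}
    for row in rows:
        for name, value in row["result"].items():
            if name in post_aggs:
                continue  # post-aggregations are recomputed, not summed
            totals[name] = totals.get(name, 0) + value
    for name, fn in post_aggs.items():
        totals[name] = fn(totals)
    # Simplification: the extra row carries only the aggregated values.
    return rows + [{"result": totals}]


post_aggs = {"avg": lambda a: a["sum"] / a["count"]}
rows = [
    {"timestamp": "2012-01-01T00:00:00.000Z", "result": {"count": 2, "sum": 10, "avg": 5.0}},
    {"timestamp": "2012-01-02T00:00:00.000Z", "result": {"count": 3, "sum": 5, "avg": 5 / 3}},
]
out = append_grand_total(rows, post_aggs)
print(out[-1])
# {'result': {'count': 5, 'sum': 15, 'avg': 3.0}}
```

Summing the per-row `avg` values would give about 6.67; recomputing from the totals gives 15 / 5 = 3.0. The sketch appends the row last regardless of sort order.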
@@ -119,7 +119,7 @@ The grand totals row will appear as the last row in the result array, and will h
 row even if the query is run in "descending" mode. Post-aggregations in the grand totals row will be computed based
 upon the grand total aggregations.

-#### Zero-filling
+## Zero-filling

 Timeseries queries normally fill empty interior time buckets with zeroes. For example, if you issue a "day" granularity
 timeseries query for the interval 2012-01-01/2012-01-04, and no data exists for 2012-01-02, you will receive:

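Zero-filling can be sketched as enumerating every bucket in the query interval and emitting zeroed aggregates for buckets with no data. This Python model uses "day" buckets and treats the interval end as exclusive, matching how Druid interprets ISO-8601 intervals; `zero_fill_days` and the `count` metric are hypothetical. It fills every day in the requested interval and does not model which buckets Druid treats as "interior".

```python
from datetime import date, timedelta


def zero_fill_days(start, end, data, metrics):
    """Return one row per day in [start, end), zero-filling missing days.

    data maps a date to a dict of metric values for days that have data.
    """
    rows = []
    day = start
    while day < end:
        values = data.get(day, {m: 0 for m in metrics})
        rows.append({"timestamp": day.isoformat(), "result": dict(values)})
        day += timedelta(days=1)
    return rows


data = {
    date(2012, 1, 1): {"count": 42},
    date(2012, 1, 3): {"count": 7},
}
rows = zero_fill_days(date(2012, 1, 1), date(2012, 1, 4), data, ["count"])
for r in rows:
    print(r)
```

For the 2012-01-01/2012-01-04 interval this yields three daily rows, with 2012-01-02 reported as zero.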
@@ -149,7 +149,7 @@ The format of the results would look like so:
 ]
 ```

-### Behavior on multi-value dimensions
+## Behavior on multi-value dimensions

 topN queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
 from matching rows will be used to generate one group per value. It's possible for a query to return more groups than

@@ -160,7 +160,7 @@ improve performance.

 See [Multi-value dimensions](multi-value-dimensions.html) for more details.

-### Aliasing
+## Aliasing

 The current TopN algorithm is an approximate algorithm. The top 1000 local results from each segment are returned for merging to determine the global topN. As such, the topN algorithm is approximate in both rank and results. Approximate results *ONLY APPLY WHEN THERE ARE MORE THAN 1000 DIM VALUES*. A topN over a dimension with fewer than 1000 unique dimension values can be considered accurate in rank and accurate in aggregates.

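The approximation described above comes from each segment contributing only its local top results to the merge. The Python sketch below reproduces that two-phase shape with a tiny per-segment cutoff in place of 1000 so the effect is visible; it is a model of the idea, not Druid's topN engine, and `approximate_topn` and the sample data are hypothetical.

```python
from collections import Counter


def approximate_topn(segments, threshold, local_limit):
    """Merge per-segment local top-`local_limit` lists into a global top-`threshold`.

    segments: list of {dimension_value: metric} dicts, one per segment.
    Values cut from a segment's local list never reach the merge, which is
    why both rank and aggregate can be approximate.
    """
    merged = Counter()
    for seg in segments:
        local = sorted(seg.items(), key=lambda kv: kv[1], reverse=True)[:local_limit]
        for value, metric in local:
            merged[value] += metric
    return merged.most_common(threshold)


segments = [
    {"a": 10, "b": 9, "c": 8},
    {"c": 10, "a": 2, "b": 1},
]
```

With a local limit of 2, value `c` loses its 8 from the first segment, so it is ranked second with 10 instead of first with 18. Raising the limit to 3, which covers every distinct value in each segment, makes the result exact.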
@@ -176,7 +176,7 @@ Users wishing to get an *exact rank and exact aggregates* topN over a dimension

 Users who can tolerate *approximate rank* topN over a dimension with greater than 1000 unique values, but require *exact aggregates* can issue two queries. One to get the approximate topN dimension values, and another topN with dimension selection filters which only use the topN results of the first.

-#### Example First query:
+### Example First query

 ```json
 {

@@ -199,7 +199,7 @@ Users who can tolerate *approximate rank* topN over a dimension with greater tha
 }
 ```

-#### Example second query:
+### Example second query

 ```json
 {

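The two-query workaround can be scripted on the client: collect the dimension values returned by the first (approximate) topN and issue a second topN restricted to exactly those values, so their aggregates are computed exactly. The Python sketch below only builds the second query as a dict; it assumes a plain string `dimension`, the usual topN result layout (a list of `{"timestamp": ..., "result": [...]}` entries), and an `or` of `selector` filters. `build_exact_aggregates_query` and the sample values are hypothetical, and sending the query to the Broker is left out.

```python
import copy


def build_exact_aggregates_query(first_query, first_results):
    """Return a second topN query filtered to the values found by the first one."""
    dimension = first_query["dimension"]  # assumed to be a plain string
    values = []
    for bucket in first_results:
        for entry in bucket["result"]:
            if entry[dimension] not in values:
                values.append(entry[dimension])
    second = copy.deepcopy(first_query)
    value_filter = {
        "type": "or",
        "fields": [
            {"type": "selector", "dimension": dimension, "value": v} for v in values
        ],
    }
    if "filter" in second:
        # Keep any filter the first query already had.
        value_filter = {"type": "and", "fields": [second["filter"], value_filter]}
    second["filter"] = value_filter
    return second


first = {
    "queryType": "topN",
    "dataSource": "sample_data",
    "dimension": "page",
    "threshold": 2,
    "metric": "count",
    "granularity": "all",
    "intervals": ["2013-08-31/2013-09-03"],
}
results = [
    {"timestamp": "2013-08-31T00:00:00.000Z",
     "result": [{"page": "a", "count": 5}, {"page": "b", "count": 3}]},
]
second = build_exact_aggregates_query(first, results)
```

The first query is deep-copied, so it stays unchanged and can be reused.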
@@ -50,6 +50,9 @@
 "design/coordinator": {
 "title": "Coordinator Process"
 },
+"design/extensions-contrib/dropwizard": {
+"title": "Dropwizard metrics emitter"
+},
 "design/historical": {
 "title": "Historical Process"
 },

@@ -336,9 +339,6 @@
 "operations/pull-deps": {
 "title": "pull-deps tool"
 },
-"operations/recommendations": {
-"title": "Recommendations"
-},
 "operations/reset-cluster": {
 "title": "reset-cluster tool"
 },

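The markdown hunks in this commit mostly change heading levels: most move a heading up one level (for example `###` to `##`), while the search and timeseries hunks move `####` straight to `##`. As an illustration of that kind of mechanical change, and not the procedure actually used for this commit, here is a Python sketch that promotes ATX headings by a given number of levels while leaving fenced code blocks untouched; `promote_headings` is hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})(\s+\S.*)$")
FENCE = re.compile(r"^\s*(```|~~~)")


def promote_headings(markdown, levels=1):
    """Promote ATX headings by `levels`, never going above a level-1 `#`.

    Lines inside fenced code blocks are left alone, since `#` there is
    usually a comment, not a heading. Fence tracking is a simple toggle.
    """
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
            out.append(line)
            continue
        m = HEADING.match(line)
        if m and not in_fence:
            new_level = max(1, len(m.group(1)) - levels)
            line = "#" * new_level + m.group(2)
        out.append(line)
    return "\n".join(out)
```

For example, `promote_headings("#### Server configuration", 2)` returns `"## Server configuration"`, matching the search-page hunk.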