diff --git a/docs/operations/basic-cluster-tuning.md b/docs/operations/basic-cluster-tuning.md
index 653cd114c15..a376ad97c94 100644
--- a/docs/operations/basic-cluster-tuning.md
+++ b/docs/operations/basic-cluster-tuning.md
@@ -420,7 +420,7 @@ Enabling process termination on out-of-memory errors is useful as well, since th
 
 `ExitOnOutOfMemoryError` flag is only supported starting JDK 8u92 . For older versions, `-XX:OnOutOfMemoryError='kill -9 %p'` can be used.
 
-`MaxDirectMemorySize` restricts JVM from allocating more than specified limit, by setting it to unlimited JVM restriction is lifted and OS level memory limits would still be effective. It's still important to make sure that Druid is not configured to allocate more off-heap memory than your machine has available. Important settings here include druid.processing.numThreads, druid.processing.numMergeBuffers, and druid.processing.buffer.sizeBytes.
+`MaxDirectMemorySize` restricts the JVM from allocating more direct memory than the specified limit. Setting it to unlimited lifts this JVM restriction, but OS-level memory limits still apply. It's still important to make sure that Druid is not configured to allocate more off-heap memory than your machine has available. Important settings here include `druid.processing.numThreads`, `druid.processing.numMergeBuffers`, and `druid.processing.buffer.sizeBytes`.
 
 Please note that above flags are general guidelines only. Be cautious and feel free to change them if necessary for the specific deployment.
 
@@ -455,7 +455,7 @@ For Historical, MiddleManager, and Indexer processes (and for really large clust
 
 ##### ulimit
 
-The limit on the number of open files can be set permanantly by editing `/etc/security/limits.conf`. This value should be substantially greater than the number of segment files that will exist on the server.
+The limit on the number of open files can be set permanently by editing `/etc/security/limits.conf`. This value should be substantially greater than the number of segment files that will exist on the server.
 
 ##### max_map_count
 
diff --git a/docs/querying/groupbyquery.md b/docs/querying/groupbyquery.md
index a7dc72558db..d15b42b792d 100644
--- a/docs/querying/groupbyquery.md
+++ b/docs/querying/groupbyquery.md
@@ -267,26 +267,26 @@ as the index, so the aggregated values in the array can be accessed directly wit
 
 When using groupBy v2, three parameters control resource usage and limits:
 
-- druid.processing.buffer.sizeBytes: size of the off-heap hash table used for aggregation, per query, in bytes. At
-most druid.processing.numMergeBuffers of these will be created at once, which also serves as an upper limit on the
+- `druid.processing.buffer.sizeBytes`: size of the off-heap hash table used for aggregation, per query, in bytes. At
+most `druid.processing.numMergeBuffers` of these will be created at once, which also serves as an upper limit on the
 number of concurrently running groupBy queries.
-- druid.query.groupBy.maxMergingDictionarySize: size of the on-heap dictionary used when grouping on strings, per query,
+- `druid.query.groupBy.maxMergingDictionarySize`: size of the on-heap dictionary used when grouping on strings, per query,
 in bytes. Note that this is based on a rough estimate of the dictionary size, not the actual size.
-- druid.query.groupBy.maxOnDiskStorage: amount of space on disk used for aggregation, per query, in bytes. By default,
+- `druid.query.groupBy.maxOnDiskStorage`: amount of space on disk used for aggregation, per query, in bytes. By default,
 this is 0, which means aggregation will not use disk.
 
-If maxOnDiskStorage is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
+If `maxOnDiskStorage` is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
 aggregation table limit, will fail with a "Resource limit exceeded" error describing the limit that was exceeded.
 
-If maxOnDiskStorage is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
+If `maxOnDiskStorage` is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
 In this case, when either the on-heap dictionary or off-heap hash table fills up, partially aggregated records will be
 sorted and flushed to disk. Then, both in-memory structures will be cleared out for further aggregation. Queries that
-then go on to exceed maxOnDiskStorage will fail with a "Resource limit exceeded" error indicating that they ran out of
+then go on to exceed `maxOnDiskStorage` will fail with a "Resource limit exceeded" error indicating that they ran out of
 disk space.
 
 With groupBy v2, cluster operators should make sure that the off-heap hash tables and on-heap merging dictionaries
 will not exceed available memory for the maximum possible concurrent query load (given by
-druid.processing.numMergeBuffers). See the [basic cluster tuning guide](../operations/basic-cluster-tuning.md)
+`druid.processing.numMergeBuffers`). See the [basic cluster tuning guide](../operations/basic-cluster-tuning.md)
 for more details about direct memory usage, organized by Druid process type.
 
 Brokers do not need merge buffers for basic groupBy queries. Queries with subqueries (using a `query` dataSource) require one merge buffer if there is a single subquery, or two merge buffers if there is more than one layer of nested subqueries. Queries with [subtotals](groupbyquery.html#more-on-subtotalsspec) need one merge buffer. These can stack on top of each other: a groupBy query with multiple layers of nested subqueries, and that also uses subtotals, will need three merge buffers.
@@ -294,7 +294,7 @@ Brokers do not need merge buffers for basic groupBy queries. Queries with subque
 Historicals and ingestion tasks need one merge buffer for each groupBy query, unless [parallel combination](groupbyquery.html#parallel-combine) is enabled, in which case they need two merge buffers per query.
 
 When using groupBy v1, all aggregation is done on-heap, and resource limits are done through the parameter
-druid.query.groupBy.maxResults. This is a cap on the maximum number of results in a result set. Queries that exceed
+`druid.query.groupBy.maxResults`. This is a cap on the maximum number of results in a result set. Queries that exceed
 this limit will fail with a "Resource limit exceeded" error indicating they exceeded their row limit. Cluster
 operators should make sure that the on-heap aggregations will not exceed available JVM heap space for the expected
 concurrent query load.
diff --git a/docs/querying/query-context.md b/docs/querying/query-context.md
index a093706e1f1..a3c8c1a4db7 100644
--- a/docs/querying/query-context.md
+++ b/docs/querying/query-context.md
@@ -30,13 +30,13 @@ The query context is used for various query configuration parameters. The follow
 |timeout | `druid.server.http.defaultQueryTimeout`| Query timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means `no timeout`. To set the default timeout, see [Broker configuration](../configuration/index.html#broker) |
 |priority | `0` | Query Priority. Queries with higher priority get precedence for computational resources.|
 |queryId | auto-generated | Unique identifier given to this query. If a query ID is set or known, this can be used to cancel the query |
-|useCache | `true` | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid (incubating) uses druid.broker.cache.useCache or druid.historical.cache.useCache to determine whether or not to read from the query cache |
-|populateCache | `true` | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateCache or druid.historical.cache.populateCache to determine whether or not to save the results of this query to the query cache |
-|useResultLevelCache | `true` | Flag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses druid.broker.cache.useResultLevelCache to determine whether or not to read from the result-level query cache |
-|populateResultLevelCache | `true` | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateResultLevelCache to determine whether or not to save the results of this query to the result-level query cache |
+|useCache | `true` | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid (incubating) uses `druid.broker.cache.useCache` or `druid.historical.cache.useCache` to determine whether or not to read from the query cache |
+|populateCache | `true` | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateCache` or `druid.historical.cache.populateCache` to determine whether or not to save the results of this query to the query cache |
+|useResultLevelCache | `true` | Flag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses `druid.broker.cache.useResultLevelCache` to determine whether or not to read from the result-level query cache |
+|populateResultLevelCache | `true` | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateResultLevelCache` to determine whether or not to save the results of this query to the result-level query cache |
 |bySegment | `false` | Return "by segment" results. Primarily used for debugging, setting it to `true` returns results associated with the data segment they came from |
 |finalize | `true` | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the `hyperUnique` aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to `false` |
-|chunkPeriod | `P0D` (off) | At the Broker process level, long interval queries (of any type) may be broken into shorter interval queries to parallelize merging more than normal. Broken up queries will use a larger share of cluster resources, but, if you use groupBy "v1, it may be able to complete faster as a result. Use ISO 8601 periods. For example, if this property is set to `P1M` (one month), then a query covering a year would be broken into 12 smaller queries. The broker uses its query processing executor service to initiate processing for query chunks, so make sure "druid.processing.numThreads" is configured appropriately on the broker. [groupBy queries](groupbyquery.html) do not support chunkPeriod by default, although they do if using the legacy "v1" engine. This context is deprecated since it's only useful for groupBy "v1", and will be removed in the future releases.|
+|chunkPeriod | `P0D` (off) | At the Broker process level, long interval queries (of any type) may be broken into shorter interval queries to parallelize merging more than normal. Broken up queries will use a larger share of cluster resources, but, if you use groupBy "v1", it may be able to complete faster as a result. Use ISO 8601 periods. For example, if this property is set to `P1M` (one month), then a query covering a year would be broken into 12 smaller queries. The broker uses its query processing executor service to initiate processing for query chunks, so make sure `druid.processing.numThreads` is configured appropriately on the broker. [groupBy queries](groupbyquery.html) do not support chunkPeriod by default, although they do if using the legacy "v1" engine. This context is deprecated since it's only useful for groupBy "v1", and will be removed in the future releases.|
 |maxScatterGatherBytes| `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce `maxScatterGatherBytes` limit at query time. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |maxQueuedBytes | `druid.broker.http.maxQueuedBytes` | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to `maxScatterGatherBytes`, except unlike that configuration, this one will trigger backpressure rather than query failure. Zero means disabled.|
 |serializeDateTimeAsLong| `false` | If true, DateTime is serialized as long in the result returned by Broker and the data transportation between Broker and compute process|
diff --git a/website/.spelling b/website/.spelling
index 927a7fb4c51..bb5a8782aa6 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -87,6 +87,7 @@ IndexSpec
 IndexTask
 InfluxDB
 Integer.MAX_VALUE
+JBOD
 JDBC
 JDK
 JDK7
@@ -247,6 +248,7 @@ lookback
 lookups
 mapreduce
 masse
+max_map_count
 memcached
 mergeable
 metadata
@@ -336,6 +338,7 @@ timestamp
 timestamps
 tradeoffs
 tsv
+ulimit
 unannounce
 unannouncements
 unary
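
As a rough illustration of how the groupBy v2 limits discussed in `docs/querying/groupbyquery.md` relate to the direct-memory guidance in `docs/operations/basic-cluster-tuning.md`, a Historical's `runtime.properties` might be sized along these lines (hypothetical values, separate from the patch above):

```properties
# Illustrative sizing only: 7 processing threads and up to 4 concurrent
# groupBy merge buffers, each backed by a 500 MB off-heap buffer that
# serves as the per-query aggregation hash table.
druid.processing.numThreads=7
druid.processing.numMergeBuffers=4
druid.processing.buffer.sizeBytes=500000000

# Per-query groupBy v2 limits: on-heap merging dictionary estimate and
# disk spill. With maxOnDiskStorage=0 (the default), queries that exceed
# the in-memory limits fail instead of spilling to disk.
druid.query.groupBy.maxMergingDictionarySize=100000000
druid.query.groupBy.maxOnDiskStorage=1073741824
```

With values like these, `-XX:MaxDirectMemorySize` would need to cover roughly `druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)` of direct memory, in line with the sizing approach the basic cluster tuning guide describes.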
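For the parameters documented in `docs/querying/query-context.md`, a hypothetical native query sketch showing how several of them are passed in the `context` object (illustrative only; the datasource and interval are made up):

```json
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "aggregations": [ { "type": "count", "name": "rows" } ],
  "intervals": [ "2019-01-01/2019-01-02" ],
  "context": {
    "timeout": 60000,
    "priority": 10,
    "useCache": false,
    "populateCache": false
  }
}
```

Here the per-query `useCache` and `populateCache` flags disable cache reads and writes for this query only, overriding the broker/historical cache defaults described in the table above, while `timeout` and `priority` bound and prioritize its execution.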