* Cache: Add maxEntrySize config.
The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.
Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.
The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.
* Add CachePopulatorStats.
* Fix whitespace.
* Fix docs.
* Fix various tests.
* Add tests.
* Fix tests.
* Better tests
* Remove conflict markers.
* Fix licenses.
* Add graceful shutdown timeout
* Handle interruptedException
* Incorporate code review comments
* Address code review comments
* Poll for activeConnections to be zero
* Use statistics handler to get active requests
* Use native jetty shutdown gracefully
* Move log line back to where it was
* Add unannounce wait time
* Make the default retain prior behavior
* Update docs with new config defaults
* Make duration handling on jetty shutdown more consistent
* StatisticsHandler is a wrapper
* Move jetty lifecycle error logging to error
* maxQueryTimeout property in runtime properties.
* extra line
* move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle.
* Fix initialize method.
* remove unused import.
* doc update.
* some more details in doc about query failure..
* minor fix.
* decorating QueryRunner to set and verify context. Added by servers.
* remove whitespace.
* add configs to enable fast request failure on broker
* address review comments
* fix styling error
* fix style error
* have enableRequestLimit config instead of having user specify max limit
* add comment
* fix style error
* add UT fo LimitRequestsFilter
* address review comments
* fix test
* make LimitRequestsFilterTest more robust
* fix JettyQosTest
* remove ServerConfig from DruidNode as all information needs to be present in DruidNode serialized form
* sanitize output of /druid/coordinator/v1/cluster endpoint
* Fixed (#4216)
Modify the default value of `druid.server.http.numThreads` to `Math.max(10, (Runtime.getRuntime().availableProcessors() * 17) / 16 + 2) + 30`
* Fixed(#4216)
Modify the default value of `druid.server.http.numThreads` to `max(10, (Number of cores * 17) / 16 + 2) + 30`
* Fixed(#4216)
Modify the default value of `druid.server.http.numThreads` to `max(10, (Number of cores * 17) / 16 + 2) + 30`
* Make timeout behavior consistent to document
* Refactoring BlockingPool and add more methods to QueryContexts
* remove unused imports
* Addressed comments
* Address comments
* remove unused method
* Make default query timeout configurable
* Fix test failure
* Change timeout from period to millis
This puts all the SQL stuff in one place. It also makes life easier by
pointing out that configs be made in either common.runtime.properties
or the broker runtime.properties.
* SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators.
Switched from CalciteConnection to Planner, bringing benefits:
- CalciteConnection's JDBC interface no longer sits between the SQL server
(HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use
Druid Sequence objects directly, reducing overhead in the query return path.
- Implemented our own Planner-based Avatica Meta, letting us control
connection timeouts and connection / statement limits. The previous
CalciteConnection-based implementation didn't have any limits or timeouts.
- The Planner interface lets us override the operator table, opening up
SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT
in core, and a QUANTILE aggregator in the druid-histogram extension.
Also:
- Added INFORMATION_SCHEMA metadata schema.
- Added tests for Unicode literals and escapes.
* Verify statement is actually open before closing it.
* More detailed INFORMATION_SCHEMA docs.
* SQL support for nested groupBys.
Allows, for example, doing exact count distinct by writing:
SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)
Contrast with approximate count distinct, which is:
SELECT COUNT(DISTINCT col) FROM druid.foo
* Add deeply-nested groupBy docs, tests, and maxQueryCount config.
* Extract magic constants into statics.
* Rework rules to put preconditions in the "matches" method.
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.
Both of these are described in more detail in #2987.
There are two goals of this patch:
1. Make it possible for historical/realtime nodes to return larger groupBy
result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
columns, avoiding materialization.
This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
* make isSingleThreaded groupBy query processing overridable at query time
* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
* Add back FilteredServerView removed in a32906c7fd to reduce memory usage using watched tiers.
* Add functionality to specify "druid.broker.segment.watchedDataSources"