Commit Graph

187 Commits

Author SHA1 Message Date
Jonathan Wei 182261f713 Allow configurable temp directory for query processing (#3893) 2017-02-02 10:22:28 -08:00
Niketh Sabbineni 2b8d3c102b Remove throttling on drop segments (#3736)
* Remove throttling on drop

* Throttle loadqueuepeon segment change requests to ZK

* Make initial delay configurable, add docs, shutdown gracefully

* Make loadqueuepeon repeat delay configurable
2017-01-20 10:02:19 -08:00
Gian Merlino d51f5e058d SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852)
* SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators.

Switched from CalciteConnection to Planner, bringing benefits:

- CalciteConnection's JDBC interface no longer sits between the SQL server
  (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use
  Druid Sequence objects directly, reducing overhead in the query return path.

- Implemented our own Planner-based Avatica Meta, letting us control
  connection timeouts and connection / statement limits. The previous
  CalciteConnection-based implementation didn't have any limits or timeouts.

- The Planner interface lets us override the operator table, opening up
  SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT
  in core, and a QUANTILE aggregator in the druid-histogram extension.

Also:

- Added INFORMATION_SCHEMA metadata schema.

- Added tests for Unicode literals and escapes.

* Verify statement is actually open before closing it.

* More detailed INFORMATION_SCHEMA docs.
2017-01-19 16:32:20 -08:00
Gian Merlino e86859b228 SQL support for nested groupBys. (#3806)
* SQL support for nested groupBys.

Allows, for example, doing exact count distinct by writing:

  SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)

Contrast with approximate count distinct, which is:

  SELECT COUNT(DISTINCT col) FROM druid.foo

* Add deeply-nested groupBy docs, tests, and maxQueryCount config.

* Extract magic constants into statics.

* Rework rules to put preconditions in the "matches" method.
2017-01-11 18:32:53 -08:00
Gian Merlino 76620615a1 Properly respect the enableAvatica and enableJsonOverHttp options. (#3834) 2017-01-11 14:43:34 -06:00
Vinh Tran dddeae813a Update caching.md typo (#3824)
* Update caching.md

Typo of Command vs Comma

* Update index.md

Fixing `Command` typo
2017-01-06 12:14:07 -08:00
Yuusaku Taniguchi 02519d5b64 Exhibitor Support (#3664)
* allow JsonConfigTesterBase to treat the fields of collections

* [Feature] Exhibitor Support (#3664)

This patch provides the integration of Druid & Netflix Exhibitor. Druid
currently use Apache Curator as ZooKeeper client. Curator can be
integrated with Exhibitor to achieve a live/updating list of the
ZooKeeper ensemble. This patch enables Druid to use this features.
2017-01-02 09:15:36 -08:00
Himanshu 4ca3b7f1e4 overlord helpers framework and tasklog auto cleanup (#3677)
* overlord helpers framework and tasklog auto cleanup

* review comment changes

* further review comments addressed
2016-12-21 15:18:55 -08:00
Nishant 35160e5595 Add metrics for Query Count statistics (#3470)
* Add metrics for Query Count statistics

This PR adds a new metrics monitor “QueryCountStatsMonitor” which emits
three new metrics -
1) query/success/count - number of successful queries
2) query/failed/count - number of failed queries
3) query/interrupted/count - number of interrupted/timedout queries

fix bindings

* make fields final

* fix imports

* AsyncQueryForwardingServlet implement QueryStatsProvider

* remove unused import
2016-12-19 09:47:58 -08:00
Gian Merlino dd63f54325 Built-in SQL. (#3682) 2016-12-16 17:15:59 -08:00
Nishant 8cfcb95fbc Add Filtered and Composing request loggers (#3469)
* Add Filtered and Composing request loggers

Add Filtered and Composite Request loggers
- enables users to filter request logs for slow queries.

fix test

* review comments

* review comment

* remove unused import
2016-12-16 11:18:32 -08:00
Gian Merlino 943982b7b0 Configurable HTTP compression. (#3759)
* Configurable HTTP compression.

* Call real-time nodes real-time processes in docs.
2016-12-07 17:40:39 -08:00
Himanshu 06d0ef9c6c allow and load extensions with absolute paths in druid.extensions.loadList (#3747) 2016-12-06 17:40:23 -08:00
Niketh Sabbineni d904c79081 Normalized Cost Balancer (#3632)
* Normalized Cost Balancer

* Adding documentation and renaming to use diskNormalizedCostBalancer

* Remove balancer from the strings

* Update docs and include random cost balancer

* Fix checkstyle issues
2016-12-05 17:18:20 -08:00
Niketh Sabbineni 2640d170c3 Blacklist workers if they fail for too many times (#3643)
* Blacklist workers if they fail for too many times

* Adding documentation

* Changing to timeout to period and updating docs

* 1. Add configurable maxPercentageBlacklistWorkers
2. Rename variable

* Change maxPercentageBlacklistWorkers to double

* Remove thread.sleep
2016-11-29 12:38:56 +05:30
Erik Dubbelboer 7d36f540e8 WIP: Add Google Storage support (#2458)
Also excludes the correct artifacts from #2741
2016-11-16 14:06:45 +05:30
Gian Merlino 7a2a4bc6de JavaScript: Disable now affects worker selection and router strategy too. (#3458) 2016-09-13 16:37:42 -07:00
Gian Merlino e0e28866ee JavaScript docs: Fix links and typos, add to TOC. (#3457) 2016-09-13 15:26:44 -07:00
Gian Merlino 76a24054e3 JavaScript docs, including docs for globals. (#3454) 2016-09-13 13:46:55 -07:00
Himanshu 03cfcf002b fix the race described in #3174 (#3205) 2016-08-10 11:29:50 -07:00
Nishant 8035c73409 Implement EnvironmentVariablePasswordProvider (#3329)
* Implement EnvironmentVariablePasswordProvider

* Review Comment : rename passwordKey to passwordVariable

* add docs

* improve doc layout

* review comment: rename property for variable
2016-08-10 05:33:51 +08:00
Navis Ryu 39351fb8d2 Mask properties from logging (#3332)
* Mask properties from logging

* mask "password" by default
2016-08-08 21:36:10 +05:30
Charles Allen d04af6aee4 Add `slf4j` requst logger (#3146)
* Add `slf4j` requst logger

* Address comments

* Fix conflicts with master

* Fix removed map value
2016-07-29 15:15:41 -07:00
David Lim 9a068e1ba6 fix broken link and use of pipes in table (#3290) 2016-07-26 15:46:51 -07:00
Himanshu 3f82108d15 optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250) 2016-07-17 18:47:52 -07:00
Fangjin Yang 8eeae2e844 remove bad docs on setting up clusters (#3188) 2016-07-01 15:41:40 -05:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
Gian Merlino 2db5f49f35 Fix JavaScriptConfig. (#3062) 2016-06-02 23:59:00 -07:00
Parag Jain 44237e25d9 fix duration format and number format (#3057) 2016-06-02 10:09:21 -07:00
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Charles Allen 44e52acfc0 Link up metrics configuration to what they mean (#2921) 2016-05-04 10:30:02 -07:00
binlijin 9151099e08 add document for druid.segmentCache.numBootstrapThreads (#2872) 2016-04-22 12:06:08 +08:00
Himanshu 3cfd9c64c9 make singleThreaded groupBy query config overridable at query time (#2828)
* make isSingleThreaded groupBy query processing overridable at query time

* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
2016-04-21 17:12:58 -07:00
Gian Merlino c74391e54c JavaScript: Ability to disable. (#2853)
Fixes #2852.
2016-04-21 09:43:15 -05:00
Nishant dbf63f738f Add ability to filter segments for specific dataSources on broker without creating tiers (#2848)
* Add back FilteredServerView removed in a32906c7fd to reduce memory usage using watched tiers.

* Add functionality to specify "druid.broker.segment.watchedDataSources"
2016-04-19 10:10:06 -07:00
Nishant deb6ecf919 handle review comments for PR 2784
https://github.com/druid-io/druid/pull/2784#discussion_r59062021
2016-04-12 21:52:00 +05:30
Nishant edd74f2b67 Allow Lite DataSegment Announcements
separate config for each skipping dimensions, metrics and loadSpec

Add test

fix test comment

Add docs
2016-04-07 18:24:12 +05:30
Fangjin Yang 1e02eeab13 Merge pull request #2683 from metamx/default_retry
Better defaults for Retry policy for task actions
2016-03-29 08:02:59 -07:00
Fangjin Yang 9cb197adec Merge pull request #2722 from himanshug/fix_hadoop_jar_upload
config to explicitly specify classpath for hadoop container during hadoop ingestion
2016-03-28 14:49:03 -07:00
Himanshu Gupta e78a469fb7 UTs for ExtensionsConfig 2016-03-25 10:51:28 -05:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Bingkun Guo 0fa04305a6 refine description for mergeBytesLimit 2016-03-24 13:17:24 -05:00
Robin 448e0127b9 dynamic config endpoint is at coordinator 2016-03-23 17:22:19 -05:00
Gian Merlino 451c0bc6d8 Merge pull request #2702 from pjain1/improve_docs
how to query in the querying section, correct default for select strategy, formatting
2016-03-22 16:40:35 -07:00
Parag Jain 39ecb9929d how to query, correct default for select strategy, formatting 2016-03-22 17:06:15 -05:00
Nishant ed8f39fcfe Better defaults for Retry policy for task actions
This PR changes the retry of task actions to be a bit more aggressive
by reducing the maxWait. Current defaults were 1 min to 10 mins, which
lead to a very delayed recovery in case there are any transient network
issues between the overlord and the peons.

doc changes.
2016-03-18 11:59:55 -07:00
Charles Allen 5da9a280b6 Query Time Lookup - Dynamic Configuration 2016-03-18 09:45:05 -07:00
Jonathan Wei 5ec5ac92c6 Merge pull request #2382 from himanshug/broker_segment_tier_selection
at broker, if configured, only add segments from specific tiers to the timeline
2016-03-14 16:53:06 -07:00
Bingkun Guo 96c981cd0a fix broken link for Tasks 2016-03-11 11:36:34 -06:00
Himanshu Gupta ca5de3f583 only allow lowering maxResults and maxIntermediateRows from groupBy query context 2016-03-08 15:03:59 -06:00
Himanshu Gupta 099acb4966 allow groupBy max[Intermediate]Rows limit be overridable by context 2016-03-07 15:22:41 -06:00
jisookim 177b575d41 fix default number of connections on broker config documentation 2016-03-03 13:50:48 -08:00
Björn Zettergren 2462c82c0e New defaults for maxRowsInMemory rowFlushBoundary
To bring consistency to docs and source this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Charles Allen c6803c4364 Allow specifying peon javaOpts as an array 2016-02-26 13:24:35 -08:00
Gian Merlino eb13d7afe3 Merge pull request #2521 from himanshug/fix_2497
RTR has multiple threads for assignment of pending tasks now
2016-02-26 08:14:15 -08:00
Nishant 9f8faabddb Merge pull request #2469 from pdeva/patch-10
correct service names
2016-02-26 21:15:58 +05:30
Himanshu Gupta bc156effe7 RTR has multiple threads for assignment of pending tasks now. 2016-02-26 09:27:03 -06:00
Gian Merlino 23c993c9e7 Add druid.indexer.server.maxChatRequests for QoS; deprecate separate ports.
- Add druid.indexer.server.maxChatRequests, which sets up a QoSFilter on the main Jetty server.
- Deprecate druid.indexer.runner.separateIngestionEndpoint
- Deprecate druid.indexer.server.chathandler.*
2016-02-19 13:36:09 -08:00
pdeva dd81b5ebe4 correct service names
use a `/` instead of `:` cause thats how the service names are declared in the respective config files of coordinator and overlord
2016-02-13 15:26:19 -08:00
Gian Merlino e0c049c0b0 Make startup properties logging optional.
Off by default, but enabled in the example config files. See also #2452.
2016-02-12 14:12:16 -08:00
Charles Allen 3a6452c6d4 Make QuotableWhiteSpaceSplitter able to take json
* Fixes #2435
2016-02-10 16:42:14 -08:00
Himanshu Gupta d1cb17d3f7 at broker - only add segments from specific tiers to the timeline 2016-02-09 22:33:22 -06:00
Robin 1d57e3267d some minor doc changes 2016-02-09 08:20:53 -06:00
Himanshu Gupta b40c342cd1 make Global stupid pool cache size configurable 2016-02-05 14:18:06 -06:00
fjy 9e2295aa61 whitespace fixes 2016-02-04 16:25:51 -08:00
fjy 003f54e268 add doc rendering 2016-02-04 14:21:59 -08:00
fjy 1aa363cea7 new quickstart 2016-02-04 09:37:38 -08:00
Charles Allen c9393e5289 Add more docs around timezone handling
* Fixes #2356
2016-02-01 08:51:07 -08:00
Bingkun Guo b07db4089c fix doc: Setting druid.coordinator.merge.on will trigger an Append Task instead of Merge Task. 2016-01-26 10:20:32 -06:00
Slim Bouguerra e0d90f875c Graphite emitter 2016-01-21 13:43:37 -06:00
Robin c9368702fa do some editing of the instructions for using mysql for metadata 2016-01-21 10:37:30 -06:00
Fangjin Yang 2e54553a8f Merge pull request #1990 from himanshug/schedule_kill_task
support periodic hard delete of segments
2016-01-15 15:22:33 -06:00
Nikita Geer 1908d63162 acl for zookeeper is added 2016-01-13 14:56:05 -08:00
Himanshu Gupta eb2d251ac8 support periodic hard delete of segments 2016-01-12 16:55:05 -06:00
Himanshu 82bdfbbbf1 Merge pull request #2155 from metamx/taskConfigTmpdir
Make TaskConfig pull from java.io.tmpdir
2016-01-05 13:58:39 -06:00
Charles Allen e18301d99c Make TaskConfig pull from java.io.tmpdir
* Also makes paths built off of java.nio.file.Paths instead of String.format
2016-01-04 10:17:08 -08:00
pdeva 77863285e9 fix typo 2015-12-27 14:28:23 -08:00
pdeva b308a13483 correct docs 2015-12-27 14:27:20 -08:00
Gian Merlino bad270b6c4 druid.indexer.task.restoreTasksOnRestart configuration. 2015-12-22 10:59:15 -08:00
Bingkun Guo 951a4e9b35 Remove SingleDataSegmentAnnouncer in favor of BatchDataSegmentAnnouncer 2015-12-21 00:05:53 -06:00
Steve M 2b5a010332 Change sample worker config spec with host:port instead of ip:port.
Also extend description of the 'affinity' property of the worker strategy
fillCapacityWithAffinity and fix a couple typos of middle manager (to
be more consistent throughout the page).

Add additional verbiage about appropriate middle manager host value.
2015-12-14 14:59:23 -08:00
Fangjin Yang b0ab363022 Merge pull request #2052 from gianm/service-names
Change service names in docs, examples to match defaults in the code.
2015-12-08 15:40:35 -08:00
Nishant 9491e8de3b Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoffs
- fixes #1970
- extracted out segment handoff callbacks in SegmentHandoffNotifier
which is responsible for tracking segment handoffs and doing callbacks
when handoff is complete.
- Coordinator now maintains a view of segments in the cluster, this
will affect the jam heap requirements for the overlord for large
clusters.
realtime index task and nodes now use HTTP end points exposed by the
coordinator to get serverView

review comment

fix realtime node guide injection

review comments

make test not rely on scheduled exec

fix compilation

fix import

review comment

introduce immutableSegmentLoadInfo

fix son reading

remove unnecessary logging
2015-12-09 01:54:09 +05:30
Gian Merlino 8e594a2e72 Change service names in docs, examples to match defaults in the code. 2015-12-06 10:04:21 -08:00
Gian Merlino 501dcb43fa Some changes that make it possible to restart tasks on the same hardware.
This is done by killing and respawning the jvms rather than reconnecting to existing
jvms, for a couple reasons. One is that it lets you restore tasks after server reboots
too, and another is that it lets you upgrade all the software on a box at once by just
restarting everything.

The main changes are,

1) Add "canRestore" and "stopGracefully" methods to Tasks that say if a task can
   stop gracefully, and actually do a graceful stop. RealtimeIndexTask is the only
   one that currently implements this.

2) Add "stop" method to TaskRunners that attempts to do an orderly shutdown.
   ThreadPoolTaskRunner- call stopGracefully on restorable tasks, wait for exit
   ForkingTaskRunner- close output stream to restorable tasks, wait for exit
   RemoteTaskRunner- do nothing special, we actually don't want to shutdown

3) Add "restore" method to TaskRunners that attempts to bootstrap tasks from last run.
   Only ForkingTaskRunner does anything here. It maintains a "restore.json" file with
   a list of restorable tasks.

4) Have the CliPeon's ExecutorLifecycle lock the task base directory to avoid a restored
   task and a zombie old task from stomping on each other.
2015-11-23 11:22:08 -08:00
Charles Allen 8fcf2403e3 Merge pull request #1943 from metamx/realtime-caching
Enable caching on intermediate realtime persists
2015-11-17 15:06:43 -08:00
Charles Allen dbe201aeed Merge pull request #1929 from pjain1/jetty_threads
separate ingestion and query thread pool
2015-11-17 12:14:25 -08:00
Parag Jain 6c498b7d4a separate ingestion and query thread pool 2015-11-17 13:42:41 -06:00
Xavier Léauté d7eb2f717e enable query caching on intermediate realtime persists 2015-11-17 10:58:00 -08:00
Bartosz Ługowski 6e5d2c6745 Add count parameter to history endpoints. 2015-11-11 23:03:57 +01:00
Xavier Léauté cf779946ef Merge pull request #1791 from guobingkun/event_receiver_firehose_monitor
EventReceiverFirehoseMonitor
2015-11-10 11:09:42 -08:00
Xavier Léauté e9533db987 Merge pull request #1850 from metamx/friendlyBardCache
Allow setting upper limit on the number of cache segments a broker will try to fetch.
2015-11-06 10:25:49 -08:00
Bingkun Guo 3ee28c35ce fix curator compress doc 2015-11-03 16:48:59 -06:00
Bingkun Guo 962f65cc76 fix metadata typo and rename default extension directory 2015-11-03 14:50:42 -06:00
Oleg Zaezdny 95a5ae0373 Docs improved by adding more details about local cache and memory for segments on historicals. 2015-11-01 21:56:28 +02:00
Bingkun Guo c3b6fcce9d Add EventReceiverFirehoseMonitor
add an EventReceiverFirehoseMonitor so that we can monitor how many
events have been queued in the EventReceiverFirehose and get a sense
about whether the firehose is under too much pressure.
2015-10-30 11:40:02 -05:00
Charles Allen dfce14ed17 Allow setting upper limit on the number of cache segments a broker will try to fetch. 2015-10-29 11:50:00 -07:00
Xavier Léauté 59872bd0cd Merge pull request #1809 from metamx/fifoPriorityExecutorService
Make PrioritizedExecutorService optionally FIFO
2015-10-27 15:19:32 -07:00
Gian Merlino 7df7370935 Merge pull request #1862 from metamx/indexingServiceMMGone
Add timeout to shutdown request to middle manager for indexing service
2015-10-27 14:38:01 -07:00
Charles Allen ecdafa87c5 Make PrioritizedExecutorService optionally FIFO 2015-10-27 14:16:22 -07:00