Commit Graph

1056 Commits

Author SHA1 Message Date
Gian Merlino 195c9c5240 Overlord: Avoid a scary Jersey warning.
Avoids the following message from being printed on Overlord startup:

WARNING: Parameter 1 of type io.druid.indexing.common.actions.TaskActionHolder<T> from
public <T> javax.ws.rs.core.Response io.druid.indexing.overlord.http.OverlordResource.doAction
(io.druid.indexing.common.actions.TaskActionHolder<T>) is not resolvable to a concrete type
2016-03-28 19:08:56 -07:00
Fangjin Yang c2284929dc Merge pull request #2739 from gianm/fix-wtmtest-failure
Fix handling of InterruptedException in WorkerTaskMonitor's mainLoop.
2016-03-28 14:52:10 -07:00
Gian Merlino ee4bb96855 Fix handling of InterruptedException in WorkerTaskMonitor's mainLoop.
I believe this will fix #2664.
2016-03-25 12:17:33 -07:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Himanshu 00d7021291 Merge pull request #2607 from jon-wei/dim_schema
Support use of DimensionSchema class in DimensionsSpec
2016-03-22 11:53:46 -05:00
Himanshu 3220b109ad Merge pull request #2570 from binlijin/single_dimension_partitioning
Single dimension hash-based partitioning
2016-03-22 11:51:06 -05:00
binlijin bce600f5d5 Single dimension hash-based partitioning 2016-03-22 13:15:33 +08:00
jon-wei a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Nishant ed8f39fcfe Better defaults for Retry policy for task actions
This PR changes the retry of task actions to be a bit more aggressive
by reducing the maxWait. Current defaults were 1 min to 10 mins, which
lead to a very delayed recovery in case there are any transient network
issues between the overlord and the peons.

doc changes.
2016-03-18 11:59:55 -07:00
Charles Allen a52c6d3bee Fix some google related imports 2016-03-17 11:03:29 -07:00
Nishant 9cceff2274 Use ImmutableWorkerInfo instead of ZKWorker
review comments

add test for equals and hashcode
2016-03-14 11:17:15 -07:00
Himanshu d51a0a0cf4 Merge pull request #2220 from gianm/appenderator-kafka
Appenderators, DataSource metadata, KafkaIndexTask
2016-03-14 13:14:36 -05:00
Nishant cf7f6da392 Merge pull request #2634 from gianm/stopGracefully-avoid-interrupt
ThreadPoolTaskRunner: Make graceful shutdown logs less scary.
2016-03-11 16:36:10 -08:00
Charles Allen a3f0048ea4 Merge pull request #2631 from gianm/plumbers-rpe
Better logging for ParseExceptions on index aggregation, and remove unnecessary exception handling.
2016-03-11 14:22:58 -08:00
Gian Merlino 79a95f7789 WorkerTaskMonitor: stop() waits for mainLoop to exit.
Fixes #2637.
2016-03-11 11:40:13 -08:00
Gian Merlino 05397a9b4f ThreadPoolTaskRunner: Make graceful shutdown logs less scary.
- It's okay to suppress InterruptedException during graceful shutdown, as
  tasks may use it to accelerate their own shutdown.
- It's okay to ignore return statuses during graceful shutdown (which may
  be FAILED!) because it actually doesn't matter what they are.
2016-03-11 07:49:29 -08:00
Gian Merlino 187569e702 DataSource metadata.
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).

DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
Gian Merlino 3d2214377d Appenderatoring.
Appenderators are a way of getting more control over the ingestion process
than a Plumber allows. The idea is that existing Plumbers could be implemented
using Appenderators, but you could also implement things that Plumbers can't do.

FiniteAppenderatorDrivers help simplify indexing a finite stream of data.

Also:
- Sink: Ability to consider itself "finished" vs "still writable".
- Sink: Ability to return the number of rows contained within the sink.
2016-03-10 17:41:50 -08:00
Gian Merlino 92c828f904 Make SegmentHandoffNotifier Closeable. 2016-03-10 16:50:37 -08:00
Gian Merlino 8a11161b20 Plumbers: Move plumber.add out of try/catch for ParseException.
The incremental indexes handle that now so it's not necessary.

Also, add debug logging and more detailed exceptions to the incremental
indexes for the case where there are parse exceptions during aggregation.
2016-03-10 16:39:26 -08:00
Charles Allen d299540efc Make HadoopTask load hadoop dependency classes LAST for local isolated classrunner 2016-03-10 10:18:23 -08:00
Himanshu Gupta 0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Gian Merlino e9c23bf376 OverlordResource: Use getZkWorkers on RemoteTaskRunner.
Restores old behavior of this api, from before #2249 when getWorkers returned ZkWorkers.
2016-03-02 17:31:34 -08:00
Fangjin Yang 80d954578d Merge pull request #2572 from gianm/fix-rit-taskresource
Fix default TaskResource for RealtimeIndexTasks.
2016-03-02 10:20:27 -08:00
Gian Merlino acd95d3e28 TaskLocation: Add toString method.
Necessary because these objects are used in log messages.
2016-03-01 17:52:06 -08:00
Gian Merlino a355bfb7a9 Fix default TaskResource for RealtimeIndexTasks.
It was supposed to be the same as the task id, but it wasn't because
"makeTaskId" has a random component.
2016-03-01 16:54:22 -08:00
Björn Zettergren 2462c82c0e New defaults for maxRowsInMemory rowFlushBoundary
To bring consistency to docs and source this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Charles Allen c6803c4364 Allow specifying peon javaOpts as an array 2016-02-26 13:24:35 -08:00
Himanshu Gupta bc156effe7 RTR has multiple threads for assignment of pending tasks now. 2016-02-26 09:27:03 -06:00
Fangjin Yang 53a5f07c14 Merge pull request #2544 from metamx/fixMaxPort
Limit PortFinder to 0xFFFF
2016-02-25 17:12:53 -08:00
Fangjin Yang 143e85eaa5 Merge pull request #2419 from gianm/task-hostports
Plumb task peon host/ports back out to the overlord.
2016-02-25 17:11:53 -08:00
Charles Allen 3fa7a7ebfe Limit PortFinder to 0xFFFF 2016-02-25 08:16:40 -08:00
Charles Allen 187b788089 UnRegister port in ForkingTaskRunner 2016-02-25 08:04:25 -08:00
Gian Merlino cf0bc905fb Plumb task peon host/ports back out to the overlord.
- Add TaskLocation class
- Add registerListener to TaskRunner
- Add getLocation to TaskRunnerWorkItem
- Implement location tracking in existing TaskRunners
- Rework WorkerTaskMonitor to do management out of a single thread so it can
  handle status and location updates more simply.
2016-02-24 15:13:10 -08:00
Nishant fb7eae34ed Merge pull request #2249 from metamx/workerExpanded
Use Worker instead of ZkWorker whenever possible
2016-02-24 13:23:22 +05:30
Charles Allen ac13a5942a Use Worker instead of ZkWorker whenver possible
* Moves last run task state information to Worker
* Makes WorkerTaskRunner a TaskRunner which has interfaces to help with getting information about a Worker
2016-02-23 15:02:03 -08:00
Gian Merlino 3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
Bingkun Guo 499288ff4b Merge pull request #2509 from metamx/hadoopIsolatorTest
Add hadoop classloader isolation tests for HadoopTask
2016-02-19 14:23:22 -06:00
Fangjin Yang a3c29b91cc Merge pull request #2505 from gianm/rt-exceptions
Harmonize realtime indexing loop across the task and standalone nodes.
2016-02-19 11:23:14 -08:00
Charles Allen 9dff0e5dbd Add hadoop classloader isolation tests for HadoopTask 2016-02-19 11:15:53 -08:00
Fangjin Yang ddf913d626 Merge pull request #2508 from gianm/ftr-shutdown-logging
ForkingTaskRunner: Better logging during orderly shutdown.
2016-02-19 10:02:24 -08:00
Gian Merlino c0c6cf77fa ForkingTaskRunner: Better logging during orderly shutdown. 2016-02-19 09:17:16 -08:00
Gian Merlino 243ac5399b Harmonize realtime indexing loop across the task and standalone nodes.
- Both now catch ParseExceptions on plumber.add (see https://groups.google.com/d/topic/druid-user/wmiRDvx2RvM/discussion)
- Standalone now treats IndexSizeExceededException as fatal (previously only the task did)
2016-02-19 07:34:15 -08:00
Charles Allen 87752be740 Make HadoopTasks's classloader a single one 2016-02-18 20:58:09 -08:00
Andrés Gomez 07d714b1b5 Fixed equal distribution strategy when exist disable middleManager with same currCapacityUsed. 2016-02-17 08:40:42 +01:00
Himanshu 5779b32742 Merge pull request #2439 from metamx/fix2435
Make QuotableWhiteSpaceSplitter able to take JSON
2016-02-11 13:14:43 -06:00
Charles Allen 40ade32a1f Fix dependencies.
* Don't put druid****selfcontained.jar at the end of the hadoop isolated classpath
* Add `<scope>provided</scope>` to prevent repeated dependency inclusion in the extension directories
2016-02-11 07:30:14 -08:00
Charles Allen 3a6452c6d4 Make QuotableWhiteSpaceSplitter able to take json
* Fixes #2435
2016-02-10 16:42:14 -08:00
Xavier Léauté 91f23583f5 Merge pull request #2436 from gianm/mm-less-suppressey
Harmonize znode writing code in RTR and Worker.
2016-02-10 16:11:30 -08:00
Gian Merlino fa92b77f5a Harmonize znode writing code in RTR and Worker.
- Throw most exceptions rather than suppressing them, which should help
  detect problems. Continue suppressing exceptions that make sense to
  suppress.
- Handle payload length checks consistently, and improve error message.
- Remove unused WorkerCuratorCoordinator.announceTaskAnnouncement method.
- Max znode length should be int, not long.
- Add tests.
2016-02-10 14:52:00 -08:00
Charles Allen 2bde8b1d68 Make hadoop classpath isolation more explicit
* Fixes #2428
2016-02-10 12:09:17 -08:00
Charles Allen a0728fa854 Allow ScalingStats to be null
* Fixes #2378
2016-02-02 18:01:01 -08:00
Parag Jain 7853a9cc41 clean up TaskLifecycleTest 2016-01-31 11:19:20 -06:00
Gian Merlino 5fd4b79373 RealtimeIndexTask: Fix NPE caused by calling stopGracefully before a firehose had been connected. 2016-01-29 11:20:23 -08:00
Gian Merlino c4fde52160 Fix 'graceful shutdown aborted' log message in ThreadPoolTaskRunner. 2016-01-29 11:07:17 -08:00
Nishant dcb7830330 Merge pull request #984 from drcrallen/thread-priority-rebase
Use thread priorities. (aka set `nice` values for background-like tasks)
2016-01-21 15:02:34 +05:30
Charles Allen 66e74b1a63 Minor field name change in RemoteTaskRunnerFactory to be more descriptive
* Addresses https://github.com/druid-io/druid/pull/2309#discussion_r50335081
2016-01-20 17:43:20 -08:00
Charles Allen 3152d08844 Fix overlord scheduled executor injection
* Fixes https://github.com/druid-io/druid/issues/2308
2016-01-20 14:16:14 -08:00
Charles Allen 2e1d6aaf3d Use thread priorities. (aka set `nice` values for background-like tasks)
* Defaults the thread priority to java.util.Thread.NORM_PRIORITY in io.druid.indexing.common.task.AbstractTask
 * Each exec service has its own Task Factory which is assigned a priority for spawned task. Therefore each priority class has a unique exec service
 * Added priority to tasks as taskPriority in the task context. <0 means low, 0 means take default, >0 means high. It is up to any particular implementation to determine how to handle these numbers
 * Add options to ForkingTaskRunner
    * Add "-XX:+UseThreadPriorities" default option
    * Add "-XX:ThreadPriorityPolicy=42" default option
 * AbstractTask - Removed unneded @JsonIgnore on priority
 * Added priority to RealtimePlumber executors. All sub-executors (non query runners) get Thread.MIN_PRIORITY
 * Add persistThreadPriority and mergeThreadPriority to realtime tuning config
2016-01-20 14:00:31 -08:00
Nishant ac6c90e657 Merge pull request #1953 from metamx/taskRunnerResourceManagement
Move resource managemnt to be the responsibility of the TaskRunner
2016-01-20 22:27:47 +05:30
Jonathan Wei df2906a91c Merge pull request #2290 from gianm/index-merger-v9-stuff
Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.
2016-01-19 13:04:00 -08:00
Fangjin Yang 0c31f007fc Merge pull request #1728 from himanshug/aggregators_in_segment_metadata
Store AggregatorFactory[] in segment metadata
2016-01-19 12:55:49 -08:00
Himanshu fe841fd961 Merge pull request #2118 from guobingkun/fix_segment_loading
Fix loading segment for historical
2016-01-19 14:25:48 -06:00
Himanshu Gupta a99aef29a1 adding aggregators to segment metadata 2016-01-19 14:23:39 -06:00
Gian Merlino 1dcf22edb7 Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes.
Also parameterize some tests to run with/without buildV9Directly:

- IndexGeneratorJobTest
- RealtimeIndexTaskTest
- RealtimePlumberSchoolTest
2016-01-19 12:15:06 -08:00
Bingkun Guo c4ad50f92c Fix loading segment for historical
Historical will drop a segment that shouldn't be dropped in the following scenario:
Historical node tried to load segmentA, but failed with SegmentLoadingException,
then ZkCoordinator called removeSegment(segmentA, blah) to schedule a runnable that would drop segmentA by deleting its files. Now, before that runnable executed, another LOAD request was sent to this historical, this time historical actually succeeded on loading segmentA and announced it. But later on, the scheduled drop-of-segment runnable started executing and removed the segment files, while historical is still announcing segmentA.
2016-01-19 10:29:49 -06:00
Himanshu Gupta 164b0aad7a removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class
instead of a Map to store segment metadata
2016-01-18 22:03:46 -06:00
Kurt Young 82ff98c2bf add config for build v9 directly and update docs 2016-01-16 11:26:34 +08:00
Charles Allen 976d4c965b Move resource managemnt to be the responsibility of the TaskRunner 2016-01-13 10:38:22 -08:00
Himanshu 82bdfbbbf1 Merge pull request #2155 from metamx/taskConfigTmpdir
Make TaskConfig pull from java.io.tmpdir
2016-01-05 13:58:39 -06:00
Nishant 45f402f22f increase timeout
tune timeouts
2016-01-05 19:06:04 +05:30
Charles Allen e18301d99c Make TaskConfig pull from java.io.tmpdir
* Also makes paths built off of java.nio.file.Paths instead of String.format
2016-01-04 10:17:08 -08:00
fjy b5c094d951 Fixes #2180 2016-01-01 16:56:41 -08:00
Nishant b68265399c Merge pull request #2168 from druid-io/remove-indexmaker
Remove IndexMaker
2015-12-30 12:24:29 +05:30
Nishant df893dbaf8 Merge pull request #2141 from gianm/fix-restoring-realtime
Fix some problems with restoring
2015-12-30 10:44:45 +05:30
Fangjin Yang 7ffa706655 Merge pull request #2152 from metamx/add-taskId
Add taskId to realtimeMetrics
2015-12-29 10:33:40 -08:00
fjy 38b0f1fbc2 fix transient failures in unit tests 2015-12-28 20:03:30 -08:00
fjy faf421726b remove IndexMaker 2015-12-28 14:19:02 -08:00
Fangjin Yang 8cb52bddd8 Merge pull request #2140 from navis/fix-sporadic-testfail4
Fix sporadic fail of RemoteTaskRunnerTest#testWorkerRemoved
2015-12-27 14:55:49 -08:00
Fangjin Yang 9aa62e4631 Merge pull request #2154 from navis/fix-testfail-WorkerTaskMonitorTest
Fix sporadic fail of WorkerTaskMonitorTest#testRunTask
2015-12-23 20:52:33 -08:00
navis.ryu a8f6c6110d Fix sporadic fail of WorkerTaskMonitorTest#testRunTask 2015-12-24 02:31:30 +09:00
navis.ryu 2c3c4a3f8f Another try to fix xxServerViewTests 2015-12-24 02:13:40 +09:00
Nishant 978a3fd8ae Add taskId to realtimeMetrics
Add task Id to Realtime Metrics
2015-12-23 18:05:25 +05:30
Gian Merlino 32edd1538d RealtimeIndexTask: Fix a couple of problems with restoring.
- Shedding locks at startup is bad, we actually want to keep them. Stop doing that.
- stopGracefully now interrupts the run thread if had started running finishJob. This avoids
  waiting for handoff unnecessarily.
2015-12-22 16:04:47 -08:00
Gian Merlino f4ce2b9bc5 TaskLockbox: Consider active tasks active even if they have no locks. 2015-12-22 16:04:16 -08:00
Gian Merlino bad270b6c4 druid.indexer.task.restoreTasksOnRestart configuration. 2015-12-22 10:59:15 -08:00
navis.ryu 8a179fc273 Fix sporadic fail of RemoteTaskRunnerTest#testWorkerRemoved 2015-12-22 14:33:37 +09:00
Himanshu Gupta 5e178499e8 trying to fix transient errors in testRealtimeIndexTask() by increasing overall timeout and unlimited wait for segment publish 2015-12-21 00:11:20 -06:00
Fangjin Yang 14229ba0f2 Merge pull request #1922 from metamx/jsonIgnoresFinalFields
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-12-18 15:38:32 -08:00
Bingkun Guo 1e5aa2f3ac fix getType() and Json serialization in ClientMergeQuery and add serde tests 2015-12-15 12:08:43 -06:00
Nishant a32906c7fd Remove FilteredServerView 2015-12-09 01:54:12 +05:30
Nishant 9491e8de3b Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoffs
- fixes #1970
- extracted out segment handoff callbacks in SegmentHandoffNotifier
which is responsible for tracking segment handoffs and doing callbacks
when handoff is complete.
- Coordinator now maintains a view of segments in the cluster, this
will affect the jam heap requirements for the overlord for large
clusters.
realtime index task and nodes now use HTTP end points exposed by the
coordinator to get serverView

review comment

fix realtime node guide injection

review comments

make test not rely on scheduled exec

fix compilation

fix import

review comment

introduce immutableSegmentLoadInfo

fix son reading

remove unnecessary logging
2015-12-09 01:54:09 +05:30
Himanshu Gupta 62ba9ade37 unifying license header in all java files 2015-12-05 22:16:23 -06:00
Gian Merlino 20544d409b Merge pull request #1988 from himanshug/multi-interval-batch-delta
support multiple intervals in dataSource inputSpec
2015-12-04 09:07:52 -08:00
Gian Merlino 020a5e7081 Merge pull request #2024 from metamx/fairBigTaskQueueLock
Make the TaskQueue big lock fair
2015-12-03 19:32:53 -08:00
Himanshu Gupta 61aaa09012 support multiple intervals in dataSource input spec 2015-12-03 21:28:04 -06:00
Himanshu Gupta 86f0a36e83 support multiple intervals in SegmentListUsedAction 2015-12-03 21:28:04 -06:00
Himanshu Gupta 221fb95d07 add support for getting used segments for multiple interval in IndexerMetadataStorageCoordinator 2015-12-03 21:28:04 -06:00
Charles Allen dbaaa6af92 Make the TaskQueue big lock fair 2015-12-01 19:13:07 -08:00
Nishant 1eb8211346 Add datasource and taskId to metrics emitted by peons
This PR adds the datasource and taskId to the jvm and sys metrics
emitted by the peons.

fix spelling

review comment

review comment
2015-12-01 23:20:59 +05:30
Fangjin Yang 8e83d800d6 Merge pull request #1881 from gianm/restartable-tasks
Restorable indexing tasks
2015-11-23 21:14:37 -08:00
Gian Merlino 501dcb43fa Some changes that make it possible to restart tasks on the same hardware.
This is done by killing and respawning the jvms rather than reconnecting to existing
jvms, for a couple reasons. One is that it lets you restore tasks after server reboots
too, and another is that it lets you upgrade all the software on a box at once by just
restarting everything.

The main changes are,

1) Add "canRestore" and "stopGracefully" methods to Tasks that say if a task can
   stop gracefully, and actually do a graceful stop. RealtimeIndexTask is the only
   one that currently implements this.

2) Add "stop" method to TaskRunners that attempts to do an orderly shutdown.
   ThreadPoolTaskRunner- call stopGracefully on restorable tasks, wait for exit
   ForkingTaskRunner- close output stream to restorable tasks, wait for exit
   RemoteTaskRunner- do nothing special, we actually don't want to shutdown

3) Add "restore" method to TaskRunners that attempts to bootstrap tasks from last run.
   Only ForkingTaskRunner does anything here. It maintains a "restore.json" file with
   a list of restorable tasks.

4) Have the CliPeon's ExecutorLifecycle lock the task base directory to avoid a restored
   task and a zombie old task from stomping on each other.
2015-11-23 11:22:08 -08:00
Gian Merlino 666d785787 Switch TaskActions from Optionals to nullable.
Deserialization of Optionals does not work quite right- they come back as actual
nulls, rather than absent Optionals. So these probably only ever worked for the local
task action client.
2015-11-20 09:14:07 -08:00
Fangjin Yang 21c84b5ff7 Merge pull request #1896 from gianm/allocate-segment
SegmentAllocateAction (fixes #1515)
2015-11-18 21:05:46 -08:00
Fangjin Yang e52c156066 Merge pull request #1880 from gianm/rtr-adjust
RTR: Ensure that there is only one cleanup task scheduled for a worker at once.
2015-11-18 15:12:55 -08:00
Charles Allen 8fcf2403e3 Merge pull request #1943 from metamx/realtime-caching
Enable caching on intermediate realtime persists
2015-11-17 15:06:43 -08:00
Charles Allen dbe201aeed Merge pull request #1929 from pjain1/jetty_threads
separate ingestion and query thread pool
2015-11-17 12:14:25 -08:00
Parag Jain 6c498b7d4a separate ingestion and query thread pool 2015-11-17 13:42:41 -06:00
Xavier Léauté d7eb2f717e enable query caching on intermediate realtime persists 2015-11-17 10:58:00 -08:00
Charles Allen 46527a9610 Merge pull request #1954 from metamx/fix-stupid-aws-limit
EC2 autoscaler: avoid hitting aws filter limits
2015-11-13 10:52:35 -08:00
Fangjin Yang 4f46d457f1 Merge pull request #1947 from noddi/feature/count-parameter-history-endpoints
Add count parameter to history endpoints
2015-11-12 10:23:44 -08:00
Xavier Léauté 749ac12f88 EC2 autoscaler: avoid hitting aws filter limits 2015-11-11 20:28:06 -08:00
Fangjin Yang 465cbcf9a7 Merge pull request #1956 from metamx/remove-unused-imports
Cleanup + remove unused imports
2015-11-11 17:36:47 -08:00
Gian Merlino e4e5f0375b SegmentAllocateAction (fixes #1515)
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).

The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.

The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Bartosz Ługowski 6e5d2c6745 Add count parameter to history endpoints. 2015-11-11 23:03:57 +01:00
Xavier Léauté fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
zhxiaog c197a4cf32 fix #1918, add unit tests for RemoteTaskActionClient 2015-11-12 03:15:17 +08:00
Charles Allen abae47850a Add backwards compatability for PR #1922 2015-11-11 10:27:00 -08:00
Charles Allen 1df4baf489 Move Jackson Guice adapters into io.druid
* Removes access to protected methods in com.fasterxml
* Eliminates druid-common's use of foreign package com.fasterxml
2015-11-09 10:50:45 -08:00
Gian Merlino fc55314d1c ForkingTaskRunner: Log without buffering.
In #933 the ForkingTaskRunner's logging was changed to buffered from
unbuffered. This means that the last few KB of the logs are generally
not visible while a task is running, which makes debugging running
tasks difficult.
2015-11-07 15:16:53 -08:00
Charles Allen 929b981710 Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to 2015-11-05 18:10:13 -08:00
Gian Merlino cb409ee928 RemoteTaskActionClient: Fix statusCode check. 2015-11-05 10:03:49 -08:00
fjy 8f231fd3e3 cleanup druid codebase 2015-11-04 13:59:53 -08:00
Himanshu Gupta 84f7d8d264 making static final variables in HadoopDruidIndexerConfig upper case 2015-11-02 23:24:26 -06:00
Himanshu Gupta 8b67417ac8 make methods in Index[Merger,Maker,IO] non-static so that they can have
appropriate ObjectMapper injected instead of creating one statically
2015-11-02 23:24:26 -06:00
Gian Merlino 16ae8866b8 Log and continue on failure to schedule cleanup for missing workers at startup. 2015-10-28 08:10:54 -07:00
Gian Merlino 513bc76252 RTR: Ensure that there is only one cleanup task scheduled for a worker at once.
This is accomplished by making sure that scheduleTasksCleanupForWorker is only called
from the PathChildrenCache event thread, having it cancel existing cleanup tasks when
it adds a new one, and having tasks check on finish that the thing they are removing
from the task list is actually themselves.
2015-10-27 21:16:58 -07:00
Fangjin Yang ea2267e08c Merge pull request #1868 from gianm/fix-announcements
Historical and MiddleManager server announcements should not remove parents.
2015-10-27 14:50:05 -07:00
Gian Merlino 7df7370935 Merge pull request #1862 from metamx/indexingServiceMMGone
Add timeout to shutdown request to middle manager for indexing service
2015-10-27 14:38:01 -07:00
Charles Allen 44a2b204df Add timeout to shutdown request to middle manager for indexing service 2015-10-27 13:56:03 -07:00
Gian Merlino 4b92752deb Historical and MiddleManager server announcements should not remove parents.
Removing parent paths causes watchers of the "announcements" path to get stuck
and stop seeing new updates.
2015-10-27 08:06:11 -07:00
Bingkun Guo 4914925d65 New extension loading mechanism
1) Remove maven client from downloading extensions at runtime.
2) Provide a way to load Druid extensions and hadoop dependencies through file system.
3) Refactor pull-deps so that it can download extensions into extension directories.
4) Add documents on how to use this new extension loading mechanism.
5) Change the way how Druid tarball is generated. Now all the extensions + hadoop-client 2.3.0
are packaged within the Druid tarball.
2015-10-21 14:22:36 -05:00
Charles Allen 532e1c9fd5 Do not pass `druid.indexer.runner.javaOpts` to Peon as a property
* Still places `druid.indexer.runner.javaOpts` on the command line, but the Peon no longer tries to have the property `druid.indexer.runner.javaOpts` set
* Fixes https://github.com/druid-io/druid/issues/1841
2015-10-20 09:24:01 -07:00
Charles Allen bf11723a52 Update usages of io.druid.client.selector.Server to build URL or URI directly instead of using String.format 2015-10-12 12:30:56 -07:00
Charles Allen 2d847ad654 Merge pull request #1730 from metamx/union-queries-fix
fix #1727 - Union bySegment queries fix
2015-09-29 12:23:25 -07:00
Nishant 573aa96bd6 fix #1727 - Union bySegment queries fix
Fixes #1727.
revert to doing merging for results for union queries on broker.

revert unrelated changes

Add test for union query runner

Add test

remove unused imports

fix imports

fix renamed file

fix test

update docs.
2015-09-29 23:32:36 +05:30
Charles Allen d2e400f063 Merge pull request #1740 from metamx/validate-locks
fix #1715
2015-09-29 09:38:42 -07:00
Xavier Léauté 25bbc0b923 Merge pull request #1778 from gianm/redirect-fixes
Redirect fixes
2015-09-25 09:54:48 -07:00
Gian Merlino 348172203f OverlordRedirectInfo: Fix ability to detect that there is no leader. 2015-09-25 09:30:09 -07:00
Parag Jain b630720164 fail task if finishjob throws any exception
add realtime task failure test
2015-09-25 10:55:45 -05:00
Fangjin Yang aa9d90355e Merge pull request #1772 from gianm/fix-overlord-startup
RemoteTaskRunner: Fix for starting an overlord before any workers ever existed.
2015-09-24 21:55:03 -07:00
Gian Merlino 63bf021077 RemoteTaskRunner: Fix for starting an overlord before any workers ever existed. 2015-09-24 21:15:36 -07:00
Himanshu Gupta 6e550d5346 update doc about aggregation field in merge task and a null check 2015-09-24 22:25:07 -05:00
Nishant b638400acb fix #1715
fixes #1715
- TaskLockBox has a set of active tasks
- lock requests throws exception for if they are from a task not in
active task set.
- TaskQueue is responsible for updating the active task set on
tasklockbox

fix #1715

fixes #1715
- TaskLockBox has a set of active tasks
- lock requests throws exception for if they are from a task not in
active task set.
- TaskQueue is responsible for updating the active task set on
tasklockbox

review comment

remove duplicate line

use ISE instead

organise imports
2015-09-24 10:06:50 +05:30
Himanshu 61b0743943 Merge pull request #1748 from metamx/forkingJavaOptionsWithQuotes
Allow ForkingTaskRunner javaOpts to have quoted arguments which contain spaces
2015-09-21 21:03:00 -05:00
Charles Allen 465035e531 Allow ForkingTaskRunner javaOpts to have quoted arguments which contain spaces 2015-09-21 17:32:27 -07:00
Fangjin Yang e48f6dd660 Merge pull request #1736 from gianm/additional-ingest-segment-timeline-test
IngestSegmentFirehostFactoryTimelineTest for overshadowing of the middle of a segment.
2015-09-17 14:42:29 -07:00
Gian Merlino 64e33b2bcb IngestSegmentFirehostFactoryTimelineTest for overshadowing of the middle of a segment. 2015-09-16 10:17:43 -07:00
Himanshu Gupta 74f4572bd4 Lazily deserialize "parser" to InputRowParser in DataSchema
so that user hadoop related InputRowParsers are created only when needed
this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser
and not fail because hadoopy InputRowParser might need hadoop libraries
2015-09-16 10:58:13 -05:00
Charles Allen f5ed6e885c Merge pull request #1702 from himanshug/double_datasource_in_storage_dir
do not have dataSource twice in path to segment storage on hdfs
2015-09-15 14:00:35 -07:00
Nishant 4681ff22ed add task duration in response for completed tasks 2015-09-10 13:51:50 +05:30
Himanshu Gupta fe0233adf2 removing unused imports from HadoopIndexTask 2015-09-09 11:12:01 -05:00
Nishant 47aac991ec add null check for task context.
make variable final
2015-09-04 22:19:01 +05:30
Fangjin Yang 75a582974b Merge pull request #1639 from gianm/new-plumber
New plumber
2015-09-03 18:52:57 -07:00
Gian Merlino 062a47fba4 Modify Plumbers in these ways,
1) Persist using Committer instead of Runnable. (Although the metadata object
   is ignored in this patch)

2) Remove the getSink method.

3) Plumbers are now responsible for time-based and hydrant-full-based periodic
   committing. (FireChief, RealtimeIndexTask, and IndexTask used to do this)
2015-09-03 11:13:06 -07:00
Nishant 726326abc3 Add Task Context and ability to override task specific properties
override javaOpts

fix compilation

review comments

Add Test for typecast

review comments - remove unused method.
2015-09-03 23:36:32 +05:30
Gian Merlino 940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Gian Merlino 414a6fb477 Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
Fixes #1678. IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Nishant b306739e9c fix convert segment task
1) fix serde
2) fix wrong parameter being passed when creating subtask

remove sysout
2015-08-27 11:34:41 +05:30
Charles Allen e38cf54bc8 Migrate TestDerbyConnector to a JUnit @Rule 2015-08-26 21:47:40 -07:00
Xavier Léauté fdb6a6651b Merge pull request #1669 from metamx/upgrade-dependencies
Upgrade dependencies
2015-08-25 21:30:22 -07:00
Xavier Léauté 5c19ffa98c Merge pull request #1663 from gianm/segment-insert-constraints
TaskActionToolbox: Remove allowOlderVersions, lift interval constraint
2015-08-25 18:11:46 -07:00
Xavier Léauté 51f6a9a2c9 update jackson to 2.6.1 2015-08-25 16:07:01 -07:00
Gian Merlino 33681525e3 TaskActionToolbox: Remove allowOlderVersions switch, lift interval constraint.
allowOlderVersions has been stuck true for a while due to a bug (introduced in
566a3a61), but I think it's actually OK this way. I think it's reasonable to
expect tasks to choose versions in some way that makes sense, so long as they
don't choose one larger than their taskLock version. This is still verified.

The interval constraint was introduced to force tasks to break up their
segment insert lists into manageable chunks. They are already doing this, and
I think it's reasonable to expect them to do so without enforcement.

Lifting these constraints paves the way for transactional insertion of segments
that have varying versions and may be for varying intervals.
2015-08-25 14:17:38 -07:00
Paul Otto 2301b60365 Add ability to provide taskResource for IndexTask. 2015-08-24 17:38:31 -07:00
Himanshu Gupta 15fa43dd43 changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer
also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up
2015-08-16 14:07:35 -05:00
Himanshu Gupta 4d4aa8bfc6 refactor IngestSegmentFirehoseFactory so that IngestSegmentFirehose becomes reusable
Conflicts:
	indexing-service/src/main/java/io/druid/indexing/firehose/IngestSegmentFirehoseFactory.java
2015-08-14 14:44:22 -05:00
Gian Merlino bc0c7dd65d Avoid the Hadoop objectMapper in the local IndexTask. Fixes #1545. 2015-08-11 10:40:53 -07:00
Charles Allen 1ddaa3fb33 Merge pull request #1592 from metamx/clean-test-files
clean temporary files
2015-08-03 11:47:20 -07:00
Nishant 2679efee7a clean temporary files 2015-08-03 23:32:58 +05:30
Fangjin Yang 6f65e6d3ef Merge pull request #1547 from pjain1/improve_overlord_test
add test to OverlordResourceTest
2015-07-28 07:35:48 -10:00
Parag Jain 2e1b617346 add more tests 2015-07-24 15:12:08 -05:00
Fangjin Yang 97242356b4 Merge pull request #1480 from guobingkun/kill_task_test
Unit tests for KillTask and MetadataTaskStorage
2015-07-20 16:31:45 -07:00
Fangjin Yang 3f7ba58227 Merge pull request #1504 from metamx/fix-1447
fix for #1447
2015-07-14 08:50:08 -07:00
Himanshu e2ddfb7a1a Merge pull request #1511 from pjain1/remove_test
remove flaky overlord test
2015-07-13 18:38:34 -05:00
Parag Jain 59dec89f6a remove flaky overlord test 2015-07-13 15:32:12 -05:00
Himanshu 725086cc89 Merge pull request #1506 from gianm/realtime-plumber-nulls
Consider null inputRows and parse errors as unparseable during realtime ingestion.
2015-07-13 10:12:12 -05:00
Gian Merlino 9068bcd062 Consider null inputRows and parse errors as unparseable during realtime ingestion.
Also, harmonize exception handling between the RealtimeIndexTask and the RealtimeManager.
Conditions other than null inputRows and parse errors bubble up in both.
2015-07-11 20:40:03 -07:00
Himanshu cac722968e Merge pull request #1503 from metamx/fix-leaking-zk-nodes
Fix leaking Status Path nodes in ZK
2015-07-10 17:40:18 -05:00
Fangjin Yang 9f19e96658 Merge pull request #1477 from pjain1/overlord_test
overlord and task master test
2015-07-10 14:27:14 -07:00
Parag Jain 55c4fe64f3 overlord and task master test 2015-07-10 16:17:45 -05:00
Nishant 5fe27fe4ad fix for #1447
fixes #1447
2015-07-09 19:05:48 +05:30
Nishant 8d7a566bae Fix leaking Status Path nodes in ZK
- remove  ZK status path nodes for workers after they are removed
2015-07-09 17:20:09 +05:30
Charles Allen c0b60c0d2f I'm not your mom, indexing-service/test... cleanup after yourself 2015-07-01 15:00:09 -07:00
Bingkun Guo 282a0f9760 Unit tests for KillTask and MetadataTaskStorage 2015-06-29 17:55:41 -05:00
Himanshu b5b9ca1446 Merge pull request #1470 from pjain1/rtindex_test
Realtime Index Task test
2015-06-29 16:51:35 -05:00
Parag Jain 284b80b09e Realtime Index Task test 2015-06-29 09:52:41 -05:00
nishant fb4052d577 JavaScript Worker Select Strategy
this PR adds a JavaScriptWorkerSelectStrategy which allows defining
arbitrary logic for selecting workers to run task using a JavaScript
function.

This gives users full control to implement complex worker selection
strategies based on task attributes.

more tests and a complex javascript config

fix for java8 modify for nashorn compatibility
2015-06-20 02:01:34 +05:30
Charles Allen acc0a3fbf7 Add jitter to the retries for RemoteTaskActionClient 2015-06-12 17:43:25 -07:00
nishant e9afec4a2b fix task status issues on zk outages
docs

review comments

fix test

review comments

Review comments

fix compilation

fix typo
2015-06-11 00:49:52 +05:30
Xavier Léauté 78d468700b Merge pull request #1388 from metamx/fix-1360
fix race described in 1360
2015-06-10 11:59:36 -07:00
Xavier Léauté f6b336ac3e Merge pull request #1432 from metamx/config-fix
fix passing of config from IndexTuningConfig to RealtimeTuningConfig
2015-06-10 11:42:58 -07:00
nishant 963682d696 Add check for valid rowFlushBoundary configuration and fix tests 2015-06-10 21:38:34 +05:30
nishant 191b302f6a fix passing of config from IndexTuningConfig to RealtimeTuningConfig
- pass rowFlushboundary correctly instead of using default.
- fixes indexTask failing with
io.druid.segment.incremental.IndexSizeExceededException when
rowFlushboundary is set higher than
RealtimeTuningConfig.defaultMaxRowsInMemory

rename test method
2015-06-10 21:07:25 +05:30
nishant af9ea08041 fix race described in 1360
review comments

review comments

review comments

no need to remove

fix test

review comments
2015-06-10 12:19:12 +05:30
Charles Allen 056cab93ed Add Hadoop Converter Job and task
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Charles Allen ef9b67cce3 Merge pull request #1422 from metamx/fix-ec2-public-ip
fix public IP not working in EC2 autoscaling
2015-06-03 16:30:51 -07:00
Xavier Léauté 4ebdfea76f fix public IP not working in EC2 autoscaling 2015-06-03 16:05:59 -07:00
Charles Allen 8289914f76 Make AbstractTask.makeId use AbstractTask.joinId
* Also remove TaskUtil
2015-06-03 13:24:20 -07:00
Fangjin Yang ac9057c00e Merge pull request #1401 from metamx/ec2-public-ip
flag to enable public IP in EC2 autoscaling
2015-05-28 20:21:32 -07:00
Xavier Léauté d834a974ba flag to enable public IP in EC2-VPC autoscaling 2015-05-28 18:14:12 -07:00
fjy bb1145ef56 Make the index task use indexmerger and not indexmaker 2015-05-28 13:34:57 -07:00
Xavier Léauté 5ad5d7d18b Merge pull request #1379 from flowbehappy/fix-hadoop-ha
bug fix: hdfs task log and indexing task not work properly with Hadoop HA
2015-05-22 09:14:50 -04:00
flow 07659f30ab bug fix: hdfs task log and indexing task not work properly with Hadoop HA 2015-05-21 20:49:42 +08:00
Charles Allen 29ba05c04f Abstractify HadoopTask
* Add `invokeForeignLoader` to commonize the way tasks are attempted to be launched in a foreign class loader
* Add `buildClassLoader` to accomplish the common tasks for hadoop jobs when building a ClassLoader
2015-05-14 17:04:43 -07:00
Gian Merlino e69d82a2b4 Realtime: Delay firehose connection until job is started.
Some firehoses (like the Kafka firehose) acquire input resources when they
connect, so it helps to delay this until after plumber.startJob() runs.
2015-05-04 10:54:07 -07:00
Xavier Léauté 721505c017 Merge pull request #1208 from druid-io/rework-metrics
Schemaless metrics + additional metrics for things we care about
2015-04-27 15:04:54 -07:00
fjy 963e5765bf Schemaless metrics + additional metrics for things we care about 2015-04-27 13:39:40 -07:00
Charles Allen 633fdb029e Add option to ConvertSegmentTask to skip validation
* Validation is enabled by default
2015-04-27 08:37:55 -07:00
Charles Allen 29341f9837 Fix random unit test failure from NoopTask ID collision 2015-04-24 13:07:48 -07:00
Xavier Léauté f73f14ab91 Merge pull request #1297 from metamx/versionConverterTaskUpdates
Update VersionConverterTask for IndexSpec and allowing Forced updates
2015-04-20 16:44:35 -07:00
Charles Allen 7479ac9012 Update VersionConverterTask for IndexSepc and allowing Forced updates 2015-04-20 16:17:06 -07:00
fjy d260515a43 update druid-api version 2015-04-17 14:58:35 -07:00
Xavier Léauté ea5572d001 Merge pull request #1271 from metamx/strictErrorChecking
Add stricter checking for potential coding errors
2015-04-15 15:21:41 -07:00
Charles Allen abdeaa0746 Add stricter checking for potential coding errors
Can use via `mvn clean compile test-compile -P strict'
2015-04-15 14:52:25 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
fjy 195a3b8bb8 ignore rows with invalid interval 2015-04-06 16:08:40 -07:00
Charles Allen 1c6cbea89c Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
This reverts commit f904bc7858.
2015-03-30 13:40:04 -07:00
Fangjin Yang f904bc7858 Revert "Overhaul of SegmentPullers to add consistency and retries" 2015-03-30 13:15:50 -07:00
Charles Allen 6d407e8677 Add URI handling to SegmentPullers
* Requires https://github.com/druid-io/druid-api/pull/37
* Requires https://github.com/metamx/java-util/pull/22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
2015-03-30 12:33:23 -07:00
msprunck 942c17a2aa Remove timeline chunk count assumptions.
* Replace with generic iterables
2015-03-24 22:40:49 +01:00
Xavier Léauté 9d6b728054 Merge pull request #1215 from metamx/log-audit-IP-Address
Add remote ip address in audit log.
2015-03-17 13:59:31 -07:00
fjy bfe10bd156 This fixes arbitrary gran spec breaking 2015-03-17 12:19:43 -07:00
nishantmonu51 f9821d242f also log author ip address in audit log 2015-03-17 23:15:15 +05:30
Xavier Léauté ddfafa0711 randomize task ID to fix spurious test failure 2015-03-12 18:08:48 -07:00
Fangjin Yang a508c0955f Merge pull request #1195 from himanshug/task_storage_config_fix
correctly parse recentlyFinishedThreshold from config
2015-03-12 16:50:49 -07:00
nishantmonu51 3ec4a30ab5 initial commit
review comments

more refactoring and cleaning of redundant code

add UT + docs + more refactoring

fixes + review comments

more cleanup

end points to fetch history

review comments

remove unnecessary changes

review comments rename header name

review comments + add test for MetadataRulesManager

review comments docs
2015-03-12 22:50:29 +05:30
Himanshu Gupta 23545fc01c correctly parse recentlyFinishedThreshold from config 2015-03-12 09:46:57 -05:00
Xavier Léauté d3f5bddc5c Add ability to apply extraction functions to the time dimension
- Moves DimExtractionFn under a more generic ExtractionFn interface to
  support extracting dimension values other than strings
- pushes down extractionFn to the storage adapter from query engine
- 'dimExtractionFn' parameter has been deprecated in favor of 'extractionFn'
- adds a TimeFormatExtractionFn, allowing to project the '__time' dimension
- JavascriptDimExtractionFn renamed to JavascriptExtractionFn, adding
  support for any dimension value types that map directly to Javascript
- update documentation for time column extraction and related changes
2015-03-11 16:45:42 -07:00
Gian Merlino b00c243786 Need a null check for iamProfile. 2015-03-10 17:52:15 -07:00
Gian Merlino b810cdfe58 EC2AutoScaler: Allow setting "iamProfile". 2015-03-10 17:41:35 -07:00
Gian Merlino d102a89760 Fix license on EC2AutoScalerSerdeTest. 2015-03-10 17:31:30 -07:00
Gian Merlino 9235b45063 EC2AutoScaler: Support for setting subnetId. 2015-03-10 11:29:56 -07:00
Xavier Léauté 113d204b10 break up archive task actions, which was missed in #566a3a6112 2015-03-04 13:19:52 -08:00
Himanshu Gupta bd5cecdd44 UTs update for indexing service 2015-02-25 15:45:58 -08:00
Xavier Léauté 53d2b961c5 default to canonical hostname instead of localhost 2015-02-18 16:44:48 -08:00
fjy 708759e1e0 Update http-client to 1.0.0 2015-02-10 13:36:47 -08:00
Charles Allen 79a3e8f59f Fix overriding base of IndexerZkConfig to be absolute instead of relative
* Updated docs to clarify ZK config behavior
* Added unit tests for this case
2015-02-04 13:04:06 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté a01a22dba1 Merge pull request #1074 from druid-io/overlord-leader
Add an endpoint to return the overlord leader
2015-01-30 13:44:49 -08:00
Xavier Léauté bd49528805 Merge pull request #1073 from druid-io/fix-statusPath
Fix worker status path announcement with indexer zk config
2015-01-30 12:51:21 -08:00
fjy 649f285feb Add an endpoint to return the overlord leader 2015-01-30 12:37:48 -08:00
fjy bc1405bee0 fix worker status path announcement with indexer zk config 2015-01-30 12:26:08 -08:00
Xavier Léauté 2c2771b90e Make dynamic worker selection actually work 2015-01-27 14:17:42 -08:00
nishantmonu51 0f3eac4705 fix dimension exclusion 2015-01-23 00:31:23 +05:30
fjy 2d516fa591 Add a new equal distribution strategy for assigning tasks 2015-01-20 13:12:22 -08:00
Xavier Léauté cd9635ff5e Merge pull request #1034 from druid-io/minor-rename
minor rename of things in hadoop ingestion config to match 0.6.x
2015-01-15 15:46:13 -08:00
fjy ccddbf8747 minor rename of things in hadoop ingestion config to match 0.6.x 2015-01-15 14:04:55 -08:00
Fangjin Yang 5bfcc43377 Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate
Update all String conversions to and from byte[] to use the java-util StringUtils functions
2015-01-15 13:50:27 -08:00
Charles Allen 67757b6aea Change IndexerZkConfig to use @JacksonInject instead of just straight @Inject
* Updated IndexerZkConfig to use no setters, and take all arguments from constructor instead
* Also added more unit tests
2015-01-08 11:11:17 -08:00
Charles Allen f6fbb733b8 Added a few places where tests were using Object instead of Module 2015-01-05 13:47:25 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
Charles Allen 65286a24e0 Change zk configs to use Jackson injection instead of Skife
* Also added generic config testing class JsonConfigTesterBase
2014-12-29 10:36:12 -08:00
Fangjin Yang af1185b58c Merge pull request #969 from metamx/fixRemoteLogViewing
Remove try-with-resources for log stream in WokerResource
2014-12-15 16:26:02 -07:00
Charles Allen 54068e8b1d Remove try-with-resources for log stream in WokerResource 2014-12-15 15:24:59 -08:00
fjy ac407fb6ba clean up defaults 2014-12-15 15:05:02 -08:00
fjy e872952390 fix working path default bug 2014-12-15 14:51:58 -08:00
Fangjin Yang b3fe91bb50 Merge pull request #830 from metamx/union-merge-on-historical
Union merge on historical
2014-12-15 13:36:47 -07:00
Charles Allen bed3e7e1d2 Merge pull request #966 from metamx/fix-tasklog-streaming
fix task log streaming
2014-12-14 09:31:41 -08:00
Xavier Léauté bd91a40491 fix task log streaming 2014-12-13 15:22:55 -08:00
Xavier Léauté 092dfe0309 fix IndexTaskTest tmp dir
- Create local firehose files in a clean temp directory to avoid
firehose reading other random temp files that start with 'druid'
2014-12-12 17:05:45 -08:00
fjy 123db3da4d fix another broken ut 2014-12-09 15:47:28 -08:00
nishantmonu51 1a1b0e6f23 merge from master and review comments 2014-12-09 13:16:45 +05:30
Charles Allen a0f9f9877e Changed all "application/json" to MediaType.APPLICATION_JSON except for in druid.js 2014-12-08 14:21:49 -08:00
nishantmonu51 6e03a6245f Merge branch 'master' into onheap-incremental-index 2014-12-05 10:40:28 +05:30
Xavier Léauté 7cd45a6e1f IncrementalIndex throws exception if limit exceeded
- For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes
- canAppendRow is a workaround for realtime index since the
Firehose currently does not have a way of rolling back the last event in
case of error
- canAppendRow needs a fudge factor; there is a race between checking
if we can add a row and actually adding a row, because of the way MapDB
reports its size.
2014-12-04 14:38:16 -08:00
Charles Allen 18234a2f00 Fix confusing error message in HadoopIndexTask 2014-12-04 10:57:57 -08:00
Gian Merlino 20a7239ffd Replace google-http-client imports with real guava imports. 2014-12-04 10:57:57 -08:00
Xavier Léauté 0c521e0a77 update joda-time and fix min/max instant 2014-12-04 10:57:56 -08:00
Fangjin Yang 27d4b2bdea Merge pull request #934 from metamx/fix-hadoop-metadata-injection
metadata update handler injection not needed for indexing service
2014-12-04 10:45:20 -07:00
Xavier Léauté 2e6c254937 metadata injection not needed for indexing service 2014-12-03 15:09:31 -08:00
Charles Allen 325a5c4abc Update ForkingTaskRunner to remove @Deprecated Files method usage 2014-12-03 13:18:33 -08:00
xvrl 2681da4420 Merge pull request #929 from metamx/google-cleanup
Replace google-http-client imports with real guava imports.
2014-12-03 11:50:19 -08:00
Charles Allen b6f71d3fd6 Fix confusing error message in HadoopIndexTask 2014-12-03 11:11:53 -08:00
Gian Merlino d388a8fe89 Replace google-http-client imports with real guava imports. 2014-12-03 10:52:57 -08:00
nishantmonu51 da8bd7836b Introduce buffer size 2014-12-03 16:28:22 +05:30
Xavier Léauté a79389a9e5 update joda-time and fix min/max instant 2014-12-02 17:27:22 -08:00
Xavier Léauté d23fd1e1ab make host+port more explicit
- document the behavior for node host/port initialization
- throw exception if settings make no sense
- fixes announcement for nodes without host/port defaults
- makes code clearer as to when host vs. host+port are used
2014-11-26 22:03:25 -08:00
Fangjin Yang 3ff569ef2d Merge pull request #879 from metamx/rtr-with-pref
Rewrite autoscaling and enable easier configuration of worker selection and autoscaling behaviour
2014-11-24 17:54:28 -07:00
fjy 3808411340 address some cr 2014-11-24 16:54:47 -08:00
fjy 13cae41f6c Merge branch 'master' into refactor-examples 2014-11-24 11:00:26 -08:00
fjy 9b701bbc76 a few more code review fixes 2014-11-24 10:54:29 -08:00
fjy 1aaea9a0d7 address code review 2014-11-24 10:52:30 -08:00
fjy 8ee4d12562 Refactor structure for examples and extensions 2014-11-21 14:45:24 -08:00
fjy 580e1172c1 move IndexTask to use hashed partition; fixes #815 2014-11-21 11:15:25 -08:00
fjy fdeab0c6af make Druid case sensitive 2014-11-19 14:27:31 -08:00
Fangjin Yang 109fdf0b34 Merge pull request #852 from metamx/druid-0.7.x-TaskLogStreamer
(DO NOT MERGE YET) Update logging to https://github.com/druid-io/druid-api/pull/27
2014-11-19 15:03:12 -07:00
Fangjin Yang 590d31799e Merge pull request #876 from metamx/remove-backwards-compatible
Remove backwards compatible
2014-11-19 14:33:14 -07:00
fjy 64719b15e0 rewrite autoscaling with tests 2014-11-18 15:41:06 -08:00
fjy c91310914b fix a few naming things 2014-11-17 16:05:18 -08:00
fjy 32600e10bb address code review 2014-11-17 15:55:22 -08:00
fjy 1af6b337f2 optionally choose what worker to send tasks to 2014-11-17 14:50:56 -08:00
Xavier Léauté d914afe1cd make defaultVersion configurable for non-jar testing 2014-11-17 13:54:32 -08:00
nishantmonu51 0c2d06475d merge from master 2014-11-17 19:19:18 +05:30
nishantmonu51 cbffe3c648 merge from master and resolve conflicts 2014-11-17 18:07:08 +05:30
nishantmonu51 ad1dd161e7 formatting 2014-11-17 17:44:26 +05:30
Fangjin Yang d4ca805cb9 Merge pull request #806 from metamx/rtac-error-messages
RemoteTaskActionClient: Better error messages for non-2xx responses.
2014-11-13 11:57:50 -07:00
Charles Allen a89b539b4f Merge pull request #823 from metamx/roaring
Configurable bitmap indexes: roaring and concise
2014-11-11 17:26:38 -08:00
fjy 1cc162727b address code review 2014-11-11 14:05:37 -08:00
Xavier Léauté aeb194ad12 fix injection for real 2014-11-07 11:43:53 -08:00
Fangjin Yang 2336e6c167 Merge pull request #758 from metamx/jisoo-metadata
make metadata storage pluggable
2014-11-07 11:30:11 -07:00
nishantmonu51 fd8eb7742b handle union query on realtime node 2014-11-07 23:27:50 +05:30
Xavier Léauté 5bda4ee1dd global task entry type 2014-11-06 17:08:20 -08:00
Charles Allen d52530e0de Update logging to https://github.com/druid-io/druid-api/pull/27 2014-11-06 11:43:25 -08:00
Xavier Léauté 9bc20ef8bf prefer druid.curator.compress to druid.indexer.runner.compressZnodes 2014-11-06 11:28:51 -08:00
Xavier Léauté 350bb09605 refactor sql storage to abstract task storage 2014-11-05 17:19:37 -08:00
Xavier Léauté 1872b8f979 make it easier to test 2014-10-31 14:49:07 -07:00
Xavier Léauté 97a2f5af4a rename db->metadata 2014-10-31 10:54:33 -07:00
Xavier Léauté 9c06db021f rename db->metadata postgres->postgresql 2014-10-31 10:30:27 -07:00
Xavier Léauté fb4d41cedb make the injection gods happy 2014-10-30 21:16:36 -07:00
Xavier Léauté 377151beda better abstraction for metadatastorage 2014-10-30 18:23:35 -07:00
jisookim0513 aa754b86e8 build success! 2014-10-24 11:28:42 -07:00
fjy bef74104d9 merge with 0.7.x and resolve any conflicts 2014-10-23 17:24:06 -07:00
Gian Merlino ca0a4bd8a5 RemoteTaskActionClient: Better error messages for non-2xx responses. 2014-10-22 18:02:22 -07:00
jisookim0513 02e79d6b15 attempted to solve merge-conflict; IncrementalIndex has unresolved classes after updates - needs to be fixed 2014-10-22 00:18:17 -07:00
jisookim0513 37979282fe enabled ansi-quote in mysql; insert statement should now work 2014-10-21 00:09:19 -07:00
jisookim0513 b8cbe2457a fixed variable name 2014-10-17 00:13:30 -07:00
jisookim0513 7d5c5f2083 fixed createTable; fixed miscellaneous stuff; added DerbyMetadataRuleManagerProvider 2014-10-17 00:10:36 -07:00
nishantmonu51 f4a97aebbc fix rollup for hashed partitions
truncate timestamp while calculating the partitionNumber
2014-10-15 22:32:56 +05:30
nishantmonu51 bce388fb27 merge changes from 0.7.x branch 2014-10-14 18:46:02 +05:30
nishantmonu51 b5d66381f3 more cleanup 2014-10-14 18:32:40 +05:30
nishantmonu51 454acd3f5a remove backwards compatible code
1) remove backwards compatible and deprecated code
2) make hashed partitions spec default
2014-10-13 19:30:44 +05:30
Gian Merlino e1fedbe741 RemoteTaskRunner should respect worker version changes (fixes #787). 2014-10-12 11:27:44 -07:00
jisookim0513 521398267c fixed inconsistent variable names 2014-10-10 17:00:50 -07:00
fjy c7b4d5b7b4 Merge branch 'master' into druid-0.7.x
Conflicts:
	processing/src/test/java/io/druid/segment/filter/SpatialFilterTest.java
2014-10-02 18:12:10 -07:00
fjy c3bea245a7 fix up some bugs 2014-09-30 17:20:52 -07:00
fjy a0782d4c54 fix compile 2014-09-30 16:56:44 -07:00
jisookim0513 0e50852985 fixed MetadataTaskStorage and handler 2014-09-30 14:09:23 -07:00
Gian Merlino 0781781b99 Merge pull request #766 from metamx/extend-rtr
make the worker selection strategy in remotetaskrunner extendable
2014-09-30 12:52:12 -07:00
fjy fab7caafff final code reviews 2014-09-30 12:32:14 -07:00
fjy 06757034f2 add default impl 2014-09-30 11:54:29 -07:00
fjy 2b2b028e5c why am i so bad at coding 2014-09-30 11:53:18 -07:00
fjy b1b9e0a267 i suck 2014-09-30 11:45:19 -07:00
fjy 575d51b0ce fix compilation error 2014-09-30 11:44:50 -07:00
fjy 4c23a5e9f6 address cr again 2014-09-30 11:40:29 -07:00
fjy 55db06ccb1 address cr 2014-09-30 10:29:02 -07:00
nishantmonu51 358ff915bb fix merge conflicts 2014-09-30 22:19:18 +05:30
nishantmonu51 2789536bed merge changes from druid-0.7.x 2014-09-30 22:05:49 +05:30
nishantmonu51 61c7fd2e6e make ingestOffheap tuneable 2014-09-30 15:30:02 +05:30
Gian Merlino 1e6ce8ac9a TaskLogs fixes and cleanups.
- Fix negative offsets in FileTaskLogs and HdfsTaskLogs.
- Consolidate file offset code into LogUtils (currently used in two places).
- Clean up style for HdfsTaskLogs and related classes.
- Remove unused code in ForkingTaskRunner.
2014-09-29 16:20:34 -07:00
fjy 4a09678739 make the selection strategy in rtr extendable 2014-09-29 14:24:02 -07:00
jisookim0513 74565c9371 cleaned up the code 2014-09-27 13:10:01 -07:00
jisookim0513 6a641621b2 finished merging into druid-0.7.x; derby not working (to be fixed) 2014-09-26 14:24:53 -07:00
jisookim0513 43cc6283d3 trying to revert files that have overwritten changes 2014-09-26 12:38:04 -07:00
fjy eaf0a48b92 Merge branch 'master' into druid-0.7.x
Conflicts:
	cassandra-storage/pom.xml
	common/pom.xml
	examples/pom.xml
	hdfs-storage/pom.xml
	histogram/pom.xml
	indexing-hadoop/pom.xml
	indexing-service/pom.xml
	kafka-eight/pom.xml
	kafka-seven/pom.xml
	pom.xml
	processing/pom.xml
	processing/src/main/java/io/druid/guice/PropertiesModule.java
	rabbitmq/pom.xml
	s3-extensions/pom.xml
	server/pom.xml
	services/pom.xml
2014-09-26 11:39:24 -07:00
jisookim0513 3bf39cc9f8 attempted to fix merge-conflicts 2014-09-24 15:55:42 -07:00
nishantmonu51 f51ab84386 merge changes from druid-0.7.x 2014-09-22 23:48:45 +05:30
jisookim0513 273205f217 initial attempt for abstraction; druid cluster works with Derby as a default 2014-09-19 17:39:59 -07:00
nishantmonu51 8eb6466487 revert buffer size and add back rowFlushBoundary 2014-09-19 23:06:04 +05:30
nishantmonu51 e6d93a3070 fix NPE
fix NPE when the dimension of metric is not present one of the segments
to be reIndexed.
2014-09-17 15:57:58 +05:30
nishantmonu51 f006de8639 fix #732
fix metric list discovery
2014-09-16 22:12:36 +05:30
Xavier Léauté c8b8e3f6e9 negating compare is bad 2014-09-15 13:00:06 -07:00
Xavier Léauté 137ad50bf1 classes that should be static 2014-09-15 13:00:06 -07:00
Xavier Léauté e57e2d97ba make constants final 2014-09-15 12:53:40 -07:00
Xavier Léauté cfa92e8217 fix incorrect nullable annotations 2014-09-15 12:13:52 -07:00
fjy 469ccbbe5e Merge branch 'master' into druid-0.7.x
Conflicts:
	cassandra-storage/pom.xml
	common/pom.xml
	examples/pom.xml
	hdfs-storage/pom.xml
	histogram/pom.xml
	indexing-hadoop/pom.xml
	indexing-service/pom.xml
	kafka-eight/pom.xml
	kafka-seven/pom.xml
	pom.xml
	processing/pom.xml
	processing/src/main/java/io/druid/query/FinalizeResultsQueryRunner.java
	processing/src/main/java/io/druid/query/UnionQueryRunner.java
	processing/src/main/java/io/druid/query/groupby/GroupByQueryRunnerFactory.java
	processing/src/main/java/io/druid/query/topn/TopNQueryEngine.java
	processing/src/main/java/io/druid/query/topn/TopNQueryRunnerFactory.java
	rabbitmq/pom.xml
	s3-extensions/pom.xml
	server/pom.xml
	server/src/test/java/io/druid/server/initialization/JettyTest.java
	services/pom.xml
2014-09-11 16:20:50 -07:00
fjy fec7b43fcb make making v9 segments something completely configurable 2014-09-10 15:28:30 -07:00
fjy 351afb8be7 allow legacy index generator 2014-09-09 17:04:35 -07:00
Xavier Léauté 508e982190 Merge remote-tracking branch 'origin/master' into druid-0.7.x
Conflicts:
	cassandra-storage/pom.xml
	common/pom.xml
	examples/config/historical/runtime.properties
	examples/config/overlord/runtime.properties
	examples/config/realtime/runtime.properties
	examples/pom.xml
	hdfs-storage/pom.xml
	histogram/pom.xml
	indexing-hadoop/pom.xml
	indexing-service/pom.xml
	kafka-eight/pom.xml
	kafka-seven/pom.xml
	pom.xml
	processing/pom.xml
	rabbitmq/pom.xml
	s3-extensions/pom.xml
	server/pom.xml
	server/src/main/java/io/druid/server/ClientQuerySegmentWalker.java
	services/pom.xml
2014-08-30 22:42:36 -07:00
Xavier Léauté 58ab759fc6 remove unused imports 2014-08-29 14:03:47 -07:00
Xavier Léauté ac05836833 make Java 8 javadoc happy 2014-08-29 13:58:50 -07:00
Gian Merlino 8bcb73a0eb Java programs are best run with a java command. 2014-08-28 18:10:01 -07:00
Gian Merlino 5564a67bfc File.pathSeparator is better than ":". 2014-08-28 18:09:33 -07:00
Gian Merlino 68aeafaacd Allow indexing tasks to specify extra classpaths.
This could be used by Hadoop tasks to reference configs for different clusters, assuming
that the possible configs have been pre-distributed to middle managers.
2014-08-28 18:00:26 -07:00
fjy d64879ccca more cleanup 2014-08-20 13:22:42 -07:00
fjy 1614f40f1a fix index task 2014-08-20 13:11:00 -07:00
fjy 92f26d9a1f cleanup rowflushboundary 2014-08-20 13:09:37 -07:00
nishantmonu51 fe105d52ee use bufferSize for IndexTask 2014-08-20 22:41:34 +05:30
nishantmonu51 33354cf7fe replace maxRowsInMemory with BufferSize 2014-08-20 20:59:44 +05:30
nishantmonu51 e525562767 review comments - cleanup ColumnSelectorFactory 2014-08-20 15:04:43 +05:30
nishantmonu51 60906c3244 Revert "make valueType configurable"
This reverts commit 6f60a3f604.
2014-08-20 11:55:26 +05:30
fjy 4fd5479559 fix typo 2014-08-19 12:34:10 -07:00
fjy 77e514688a Merge branch 'druid-0.7.x' into offheap-incremental-index 2014-08-18 11:14:19 -07:00
Xavier Léauté 1fd30ab588 default service/host/port for all nodes 2014-08-15 17:14:05 -07:00
nishantmonu51 6f60a3f604 make valueType configurable 2014-08-13 14:37:57 +05:30
nishantmonu51 1b0a72751b Add support for LongColumn 2014-08-13 08:52:36 +05:30
nishantmonu51 c6712739dc merge changes from druid-0.7.x 2014-08-12 15:47:42 +05:30
fjy 91ebe45b4e support both rejectionPolicy and rejectionPolicyFactory in serde 2014-08-07 10:06:27 -07:00
nishantmonu51 637bd35785 merge changes from druid-0.7.x 2014-07-31 16:07:22 +05:30
Gian Merlino 09fcfc3b6d Fix race in RemoteTaskRunner that could lead to zombie tasks. 2014-07-18 11:41:50 -07:00
nishantmonu51 4ce12470a1 Add way to skip determine partitions for index task
Add a way to skip determinePartitions for IndexTask by manually
specifying numShards.
2014-07-18 18:52:15 +05:30
fjy d8b8826c2e Merge branch 'cleanup-ingest' of github.com:metamx/druid into cleanup-ingest
Conflicts:
	server/src/test/java/io/druid/realtime/firehose/CombiningFirehoseFactoryTest.java
2014-07-17 20:26:11 -07:00
fjy 291f4c00ae Merge branch 'master' of github.com:metamx/druid into cleanup-ingest 2014-07-17 20:24:59 -07:00
nishantmonu51 0e0454a34c switch reingest task to noop & fix compilation
switch back to noop task, its confusing to have a reinvest task that
does nothing.
fix compilation
2014-07-18 06:50:58 +05:30
fjy ded83557dd Merge pull request #640 from metamx/disable-worker
An alternative way to disable middlemanagers based on worker version
2014-07-17 19:07:27 -06:00
fjy beac0be45b fix enabled endpoint 2014-07-17 18:04:36 -07:00
fjy c6078ca841 address code review 2014-07-17 13:34:05 -07:00
fjy ba978d8b79 some minor cleanups to ingest firehose 2014-07-17 13:05:59 -07:00
fjy bc650a1c80 Merge pull request #627 from metamx/druid-firehose
Functionality to ingest a Druid segment and change the schema
2014-07-17 13:41:16 -06:00
fjy 5197ea527a disable middlemanagers based on worker version 2014-07-17 12:35:45 -07:00
nishantmonu51 e59c9ebdbc minor fixes
fix IndexOutOfBoundsException
fix ingestFirehose
2014-07-17 17:24:57 +05:30
nishantmonu51 972c5dac31 improve memory usage and rename firehose 2014-07-14 21:17:53 +05:30
nishantmonu51 f5f05e3a9b Sync changes from branch new-ingestion PR #599
Sync and Resolve Conflicts
2014-07-11 16:15:10 +05:30
nishantmonu51 7168adcca7 Use Ingest task instead of Noop Task 2014-07-08 21:20:17 +05:30
fjy ce478be899 Remove redundant forking task runner config 2014-07-07 18:08:37 -06:00
nishantmonu51 97b58eb193 review comments
1) Rename firehose to IngestSegment
2) fix segment overlapping, intervals
3) fix overshadow
2014-07-04 12:31:33 +05:30