1367 Commits

Author SHA1 Message Date
David Lim
b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Gian Merlino
f8ddfb9a4b Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922)
Fixes #2912.
2016-05-04 13:54:34 -07:00
Himanshu
50065c8288 fix spurious failure of RTR concurrency test (#2915) 2016-05-04 10:30:20 -07:00
Charles Allen
3f71a4a302 Fix missing log arguments in PendingTaskBasedWorkerResourceManagementStrategy (#2898) 2016-04-28 18:15:41 -07:00
Parag Jain
0d745ee120 Basic authorization support in Druid (#2424)
- Introduce `AuthorizationInfo` interface, specific implementations of which would be provided by extensions
- If `druid.auth.enabled` is set to `true` then the `isAuthorized` method of `AuthorizationInfo` will be called to perform authorization checks
- An `AuthorizationInfo` object will be created in the servlet filters of the specific extension and passed as a request attribute named `AuthConfig.DRUID_AUTH_TOKEN`
- As per the scope of this PR, all resources that need to be secured are divided into 3 types - `DATASOURCE`, `CONFIG` and `STATE`. For any type of resource, the possible actions are `READ` or `WRITE`
- Specific ResourceFilters are used to perform auth checks for all endpoints that correspond to a specific resource type. This prevents duplication of logic and the need to inject HttpServletRequest inside each endpoint. For example
 - `DatasourceResourceFilter` is used for endpoints where the datasource information is present after the "datasources" segment in the request path, such as `/druid/coordinator/v1/datasources/`, `/druid/coordinator/v1/metadata/datasources/`, `/druid/v2/datasources/`
 - `RulesResourceFilter` is used where the datasource information is present after the "rules" segment in the request path, such as `/druid/coordinator/v1/rules/`
 - `TaskResourceFilter` is used where the datasource information is present after the "task" segment in the request path, such as `druid/indexer/v1/task`
 - `ConfigResourceFilter` is used for endpoints like `/druid/coordinator/v1/config`, `/druid/indexer/v1/worker`, `/druid/worker/v1` etc
 - `StateResourceFilter` is used for endpoints like `/druid/broker/v1/loadstatus`, `/druid/coordinator/v1/leader`, `/druid/coordinator/v1/loadqueue`, `/druid/coordinator/v1/rules` etc
- For endpoints that return a list of resources, like `/druid/coordinator/v1/datasources`, `/druid/indexer/v1/completeTasks` etc., the list is filtered to return only the resources the requesting user has access to. In these cases, an `HttpServletRequest` instance needs to be injected into the endpoint method.

Note -
The JAX-RS specification provides an interface called `SecurityContext`, but we provided our own interface `AuthorizationInfo` instead, mainly because it offers more flexibility. For example, `SecurityContext` has a method called `isUserInRole(String role)` which would be used for auth checks; using it would require modeling which roles can access which resources inside Druid, either by convention or some other means, which is not very flexible since Druid has dynamic resources like datasources. Fixes #2355 with PR #2424. (A hedged interface sketch follows this entry.)
2016-04-28 16:50:28 -07:00
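
A minimal sketch of the `AuthorizationInfo` contract described in the commit above. The `isAuthorized` method name, the resource types, and the actions are quoted from the commit text; the exact signature and enum layout are assumptions, not the actual Druid API.

```java
// Hedged sketch of the AuthorizationInfo contract described above; the signature
// and enums are assumptions, not necessarily the real Druid interfaces.
public interface AuthorizationInfo {
  enum ResourceType { DATASOURCE, CONFIG, STATE }
  enum Action { READ, WRITE }

  // Consulted by the resource filters when druid.auth.enabled=true. Implementations
  // are provided by extensions and attached to each request under the
  // AuthConfig.DRUID_AUTH_TOKEN attribute.
  boolean isAuthorized(String resourceName, ResourceType resourceType, Action action);
}
```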
Himanshu
9669e79df2 fix misleading error log due to race in RTR and concurrency test (#2878) 2016-04-28 10:28:00 -07:00
Nishant
c29cb7d711 add pending task based resource management strategy (#2086) 2016-04-27 10:40:53 -07:00
Nishant
bf5e5e7b75 fix #2886 (#2887)
Fixes https://github.com/druid-io/druid/issues/2886
2016-04-27 08:29:41 -07:00
David Lim
7641f2628f add control and status endpoints to KafkaIndexTask (#2730) 2016-04-21 15:34:59 -07:00
Nishant
dbf63f738f Add ability to filter segments for specific dataSources on broker without creating tiers (#2848)
* Add back FilteredServerView (removed in a32906c7fd11c9a8554df2621a172353a523a9dd) to reduce memory usage via watched tiers.

* Add functionality to specify "druid.broker.segment.watchedDataSources"
2016-04-19 10:10:06 -07:00
Gian Merlino
08c784fbf6 KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844)
segment creation deterministic.

This means that each segment will contain data from just one Kafka
partition. So, users will probably not want to have a super high number
of Kafka partitions...

Fixes #2703.
2016-04-18 22:29:52 -07:00
jon-wei
0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
Fangjin Yang
1e02eeab13 Merge pull request #2683 from metamx/default_retry
Better defaults for Retry policy for task actions
2016-03-29 08:02:59 -07:00
Gian Merlino
195c9c5240 Overlord: Avoid a scary Jersey warning.
Avoids the following message from being printed on Overlord startup:

WARNING: Parameter 1 of type io.druid.indexing.common.actions.TaskActionHolder<T> from
public <T> javax.ws.rs.core.Response io.druid.indexing.overlord.http.OverlordResource.doAction
(io.druid.indexing.common.actions.TaskActionHolder<T>) is not resolvable to a concrete type
2016-03-28 19:08:56 -07:00
Fangjin Yang
c2284929dc Merge pull request #2739 from gianm/fix-wtmtest-failure
Fix handling of InterruptedException in WorkerTaskMonitor's mainLoop.
2016-03-28 14:52:10 -07:00
Gian Merlino
ee4bb96855 Fix handling of InterruptedException in WorkerTaskMonitor's mainLoop.
I believe this will fix #2664.
2016-03-25 12:17:33 -07:00
Himanshu Gupta
004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Himanshu
00d7021291 Merge pull request #2607 from jon-wei/dim_schema
Support use of DimensionSchema class in DimensionsSpec
2016-03-22 11:53:46 -05:00
Himanshu
3220b109ad Merge pull request #2570 from binlijin/single_dimension_partitioning
Single dimension hash-based partitioning
2016-03-22 11:51:06 -05:00
binlijin
bce600f5d5 Single dimension hash-based partitioning 2016-03-22 13:15:33 +08:00
jon-wei
a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Nishant
ed8f39fcfe Better defaults for Retry policy for task actions
This PR makes retries of task actions a bit more aggressive
by reducing the maxWait. The previous defaults were 1 min to 10 mins, which
led to very delayed recovery whenever there were transient network
issues between the overlord and the peons.

doc changes.
2016-03-18 11:59:55 -07:00
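
For intuition on why lowering maxWait matters in the commit above, here is a generic capped-exponential-backoff sketch; the class name, durations, and field names are illustrative assumptions, not the actual RetryPolicy defaults from this PR.

```java
// Generic backoff sketch: maxWait caps the delay between task-action retries,
// so a smaller cap means faster recovery from transient overlord connectivity
// issues. All values below are illustrative assumptions.
import java.util.concurrent.TimeUnit;

public final class RetryBackoffSketch {
  public static void main(String[] args) {
    final long minWaitMillis = TimeUnit.SECONDS.toMillis(1);
    final long maxWaitMillis = TimeUnit.MINUTES.toMillis(1); // hypothetical cap

    long wait = minWaitMillis;
    for (int attempt = 1; attempt <= 8; attempt++) {
      System.out.printf("attempt %d: wait %d ms before retrying%n", attempt, wait);
      wait = Math.min(wait * 2, maxWaitMillis); // grow exponentially, capped at maxWait
    }
  }
}
```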
Charles Allen
c716af5b04 Merge pull request #2678 from metamx/fixImports
Fix some google related imports
2016-03-17 11:53:16 -07:00
Charles Allen
a52c6d3bee Fix some google related imports 2016-03-17 11:03:29 -07:00
Gian Merlino
738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Nishant
9cceff2274 Use ImmutableWorkerInfo instead of ZKWorker
review comments

add test for equals and hashcode
2016-03-14 11:17:15 -07:00
Himanshu
d51a0a0cf4 Merge pull request #2220 from gianm/appenderator-kafka
Appenderators, DataSource metadata, KafkaIndexTask
2016-03-14 13:14:36 -05:00
Nishant
cf7f6da392 Merge pull request #2634 from gianm/stopGracefully-avoid-interrupt
ThreadPoolTaskRunner: Make graceful shutdown logs less scary.
2016-03-11 16:36:10 -08:00
Charles Allen
a3f0048ea4 Merge pull request #2631 from gianm/plumbers-rpe
Better logging for ParseExceptions on index aggregation, and remove unnecessary exception handling.
2016-03-11 14:22:58 -08:00
Gian Merlino
79a95f7789 WorkerTaskMonitor: stop() waits for mainLoop to exit.
Fixes #2637.
2016-03-11 11:40:13 -08:00
Gian Merlino
05397a9b4f ThreadPoolTaskRunner: Make graceful shutdown logs less scary.
- It's okay to suppress InterruptedException during graceful shutdown, as
  tasks may use it to accelerate their own shutdown.
- It's okay to ignore return statuses during graceful shutdown (which may
  be FAILED!) because it actually doesn't matter what they are.
2016-03-11 07:49:29 -08:00
Gian Merlino
187569e702 DataSource metadata.
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).

DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
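
A rough sketch of the combining behavior this commit describes; only the `DataSourceMetadata` name and the idea of merging partially specified metadata come from the commit, and the method names are assumptions.

```java
// Hedged sketch: partially specified metadata (e.g. per-partition offsets) can be
// combined into one committed state. Method names are assumptions.
public interface DataSourceMetadata {
  // Whether this metadata can be meaningfully combined with the other instance.
  boolean matches(DataSourceMetadata other);

  // Combine partially specified metadata into a more complete picture.
  DataSourceMetadata plus(DataSourceMetadata other);
}
```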
Gian Merlino
3d2214377d Appenderatoring.
Appenderators are a way of getting more control over the ingestion process
than a Plumber allows. The idea is that existing Plumbers could be implemented
using Appenderators, but you could also implement things that Plumbers can't do.

FiniteAppenderatorDrivers help simplify indexing a finite stream of data.

Also:
- Sink: Ability to consider itself "finished" vs "still writable".
- Sink: Ability to return the number of rows contained within the sink.
2016-03-10 17:41:50 -08:00
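
A simplified, hedged sketch of what "more control than a Plumber allows" looks like in practice, with explicit add/persist/push steps; the method names and signatures are assumptions, not Druid's actual Appenderator interface.

```java
// Hedged sketch of an Appenderator-style API; names and signatures are assumptions.
import java.io.Closeable;
import java.util.List;

interface AppenderatorSketch extends Closeable {
  // Append a row to the segment identified by segmentId; returns the segment's row count.
  int add(String segmentId, Object row);

  // Segments currently open for writing.
  List<String> getSegments();

  // Persist all in-memory rows to disk.
  void persistAll();

  // Publish the given segments (e.g. hand them off to deep storage).
  void push(List<String> segmentIds);
}
```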
Gian Merlino
08284fea62 Publish test-jar for indexing-service. 2016-03-10 16:50:37 -08:00
Gian Merlino
92c828f904 Make SegmentHandoffNotifier Closeable. 2016-03-10 16:50:37 -08:00
Gian Merlino
8a11161b20 Plumbers: Move plumber.add out of try/catch for ParseException.
The incremental indexes handle that now so it's not necessary.

Also, add debug logging and more detailed exceptions to the incremental
indexes for the case where there are parse exceptions during aggregation.
2016-03-10 16:39:26 -08:00
Charles Allen
d299540efc Make HadoopTask load hadoop dependency classes LAST for local isolated classrunner 2016-03-10 10:18:23 -08:00
Himanshu Gupta
0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Gian Merlino
e9c23bf376 OverlordResource: Use getZkWorkers on RemoteTaskRunner.
Restores old behavior of this api, from before #2249 when getWorkers returned ZkWorkers.
2016-03-02 17:31:34 -08:00
Fangjin Yang
80d954578d Merge pull request #2572 from gianm/fix-rit-taskresource
Fix default TaskResource for RealtimeIndexTasks.
2016-03-02 10:20:27 -08:00
Gian Merlino
acd95d3e28 TaskLocation: Add toString method.
Necessary because these objects are used in log messages.
2016-03-01 17:52:06 -08:00
Gian Merlino
a355bfb7a9 Fix default TaskResource for RealtimeIndexTasks.
It was supposed to be the same as the task id, but it wasn't because
"makeTaskId" has a random component.
2016-03-01 16:54:22 -08:00
Björn Zettergren
2462c82c0e New defaults for maxRowsInMemory rowFlushBoundary
To bring consistency to the docs and source, this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000, after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Charles Allen
c6803c4364 Allow specifying peon javaOpts as an array 2016-02-26 13:24:35 -08:00
Himanshu Gupta
bc156effe7 RTR has multiple threads for assignment of pending tasks now. 2016-02-26 09:27:03 -06:00
Fangjin Yang
53a5f07c14 Merge pull request #2544 from metamx/fixMaxPort
Limit PortFinder to 0xFFFF
2016-02-25 17:12:53 -08:00
Fangjin Yang
143e85eaa5 Merge pull request #2419 from gianm/task-hostports
Plumb task peon host/ports back out to the overlord.
2016-02-25 17:11:53 -08:00
Charles Allen
3fa7a7ebfe Limit PortFinder to 0xFFFF 2016-02-25 08:16:40 -08:00
Charles Allen
187b788089 UnRegister port in ForkingTaskRunner 2016-02-25 08:04:25 -08:00
Gian Merlino
cf0bc905fb Plumb task peon host/ports back out to the overlord.
- Add TaskLocation class
- Add registerListener to TaskRunner
- Add getLocation to TaskRunnerWorkItem
- Implement location tracking in existing TaskRunners
- Rework WorkerTaskMonitor to do management out of a single thread so it can
  handle status and location updates more simply.
2016-02-24 15:13:10 -08:00
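
A hedged sketch of the kind of `TaskLocation` value object this commit and the "TaskLocation: Add toString method" commit above describe; the constructor and accessor names are assumptions.

```java
// Hedged sketch of a TaskLocation-style value object; a readable toString matters
// because these objects show up in overlord log messages. Names are assumptions.
public final class TaskLocationSketch {
  private final String host;
  private final int port;

  public TaskLocationSketch(String host, int port) {
    this.host = host;
    this.port = port;
  }

  public String getHost() { return host; }

  public int getPort() { return port; }

  @Override
  public String toString() {
    return "TaskLocation{host='" + host + "', port=" + port + "}";
  }
}
```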