332 Commits

Author SHA1 Message Date
Akash Dwivedi
4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
David Lim
ca9114b41b add supervisor reset API (#3484)
* add supervisor reset API

* CR doc changes and kill running tasks / clear offsets from supervisor
2016-09-22 17:51:06 -07:00
Gian Merlino
27bd5cb13a Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473)
Fixes #3241.
2016-09-21 13:40:04 -06:00
Gian Merlino
7a2a4bc6de JavaScript: Disable now affects worker selection and router strategy too. (#3458) 2016-09-13 16:37:42 -07:00
Dave Li
c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
Nishant
4c2b8d29d3 Make RTR assign pending tasks by insertion order (#3405) 2016-08-30 12:22:44 -07:00
Gian Merlino
2f46effc8e FileTaskLogsTest: Throw unthrown exception. (#3352) 2016-08-11 09:40:28 -07:00
kaijianding
50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00
Gian Merlino
06624c40c0 Share query handling between Appenderator and RealtimePlumber. (#3248)
Fixes inconsistent metric handling between the two implementations. Formerly,
RealtimePlumber only emitted query/segmentAndCache/time and query/wait and
Appenderator only emitted query/partial/time and query/wait (all per sink).

Now they both do the same thing:
- query/segmentAndCache/time, query/segment/time are the time spent per sink.
- query/cpu/time is the CPU time spent per query.
- query/wait/time is the executor waiting time per sink.

These generally match historical metrics, except segmentAndCache & segment
mean the same thing here, because one Sink may be partially cached and
partially uncached and we aren't splitting that out.
2016-07-19 22:15:13 -05:00
Hyukjin Kwon
55e7a52475 Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215) 2016-07-14 09:19:17 -07:00
Gian Merlino
ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Xavier Léauté
485e381387 remove datasource from hadoop output path (#3196)
fixes #2083, follow-up to #1702
2016-06-29 08:53:45 -07:00
Hyukjin Kwon
45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
David Lim
5a3db634ff add synchronization to SupervisorManager (#3077) 2016-06-07 00:29:23 -06:00
Charles Allen
15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
David Lim
b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Gian Merlino
f8ddfb9a4b Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922)
Fixes #2912.
2016-05-04 13:54:34 -07:00
Himanshu
50065c8288 fix spurious failure of RTR concurrency test (#2915) 2016-05-04 10:30:20 -07:00
Parag Jain
0d745ee120 Basic authorization support in Druid (#2424)
- Introduce `AuthorizationInfo` interface, specific implementations of which would be provided by extensions
- If the `druid.auth.enabled` is set to `true` then the `isAuthorized` method of `AuthorizationInfo` will be called to perform authorization checks
-  `AuthorizationInfo` object will be created in the servlet filters of specific extension and will be passed as a request attribute with attribute name as `AuthConfig.DRUID_AUTH_TOKEN`
- As per the scope of this PR, all resources that needs to be secured are divided into 3 types - `DATASOURCE`, `CONFIG` and `STATE`. For any type of resource, possible actions are  - `READ` or `WRITE`
- Specific ResourceFilters are used to perform auth checks for all endpoints that corresponds to a specific resource type. This prevents duplication of logic and need to inject HttpServletRequest inside each endpoint. For example
 - `DatasourceResourceFilter` is used for endpoints where the datasource information is present after "datasources" segment in the request Path such as `/druid/coordinator/v1/datasources/`, `/druid/coordinator/v1/metadata/datasources/`, `/druid/v2/datasources/`
 - `RulesResourceFilter` is used where the datasource information is present after "rules" segment in the request Path such as `/druid/coordinator/v1/rules/`
 - `TaskResourceFilter` is used for endpoints is used where the datasource information is present after "task" segment in the request Path such as `druid/indexer/v1/task`
 - `ConfigResourceFilter` is used for endpoints like `/druid/coordinator/v1/config`, `/druid/indexer/v1/worker`, `/druid/worker/v1` etc
 - `StateResourceFilter` is used for endpoints like `/druid/broker/v1/loadstatus`, `/druid/coordinator/v1/leader`, `/druid/coordinator/v1/loadqueue`, `/druid/coordinator/v1/rules` etc
- For endpoints where a list of resources is returned like `/druid/coordinator/v1/datasources`, `/druid/indexer/v1/completeTasks` etc. the list is filtered to return only the resources to which the requested user has access. In these cases, `HttpServletRequest` instance needs to be injected in the endpoint method.

Note -
JAX-RS specification provides an interface called `SecurityContext`. However, we did not use this but provided our own interface `AuthorizationInfo` mainly because it provides more flexibility. For example, `SecurityContext` has a method called `isUserInRole(String role)` which would be used for auth checks and if used then the mapping of what roles can access what resource needs to be modeled inside Druid either using some convention or some other means which is not very flexible as Druid has dynamic resources like datasources. Fixes #2355 with PR #2424
2016-04-28 16:50:28 -07:00
Himanshu
9669e79df2 fix misleading error log due to race in RTR and concurrency test (#2878) 2016-04-28 10:28:00 -07:00
Nishant
c29cb7d711 add pending task based resource management strategy (#2086) 2016-04-27 10:40:53 -07:00
Nishant
bf5e5e7b75 fix #2886 (#2887)
Fixes https://github.com/druid-io/druid/issues/2886
2016-04-27 08:29:41 -07:00
David Lim
7641f2628f add control and status endpoints to KafkaIndexTask (#2730) 2016-04-21 15:34:59 -07:00
Nishant
dbf63f738f Add ability to filter segments for specific dataSources on broker without creating tiers (#2848)
* Add back FilteredServerView removed in a32906c7fd11c9a8554df2621a172353a523a9dd to reduce memory usage using watched tiers.

* Add functionality to specify "druid.broker.segment.watchedDataSources"
2016-04-19 10:10:06 -07:00
jon-wei
0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
jon-wei
a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Nishant
9cceff2274 Use ImmutableWorkerInfo instead of ZKWorker
review comments

add test for equals and hashcode
2016-03-14 11:17:15 -07:00
Himanshu
d51a0a0cf4 Merge pull request #2220 from gianm/appenderator-kafka
Appenderators, DataSource metadata, KafkaIndexTask
2016-03-14 13:14:36 -05:00
Gian Merlino
187569e702 DataSource metadata.
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).

DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
Gian Merlino
3d2214377d Appenderatoring.
Appenderators are a way of getting more control over the ingestion process
than a Plumber allows. The idea is that existing Plumbers could be implemented
using Appenderators, but you could also implement things that Plumbers can't do.

FiniteAppenderatorDrivers help simplify indexing a finite stream of data.

Also:
- Sink: Ability to consider itself "finished" vs "still writable".
- Sink: Ability to return the number of rows contained within the sink.
2016-03-10 17:41:50 -08:00
Gian Merlino
92c828f904 Make SegmentHandoffNotifier Closeable. 2016-03-10 16:50:37 -08:00
Gian Merlino
8a11161b20 Plumbers: Move plumber.add out of try/catch for ParseException.
The incremental indexes handle that now so it's not necessary.

Also, add debug logging and more detailed exceptions to the incremental
indexes for the case where there are parse exceptions during aggregation.
2016-03-10 16:39:26 -08:00
Himanshu Gupta
0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Gian Merlino
a355bfb7a9 Fix default TaskResource for RealtimeIndexTasks.
It was supposed to be the same as the task id, but it wasn't because
"makeTaskId" has a random component.
2016-03-01 16:54:22 -08:00
Charles Allen
c6803c4364 Allow specifying peon javaOpts as an array 2016-02-26 13:24:35 -08:00
Himanshu Gupta
bc156effe7 RTR has multiple threads for assignment of pending tasks now. 2016-02-26 09:27:03 -06:00
Gian Merlino
cf0bc905fb Plumb task peon host/ports back out to the overlord.
- Add TaskLocation class
- Add registerListener to TaskRunner
- Add getLocation to TaskRunnerWorkItem
- Implement location tracking in existing TaskRunners
- Rework WorkerTaskMonitor to do management out of a single thread so it can
  handle status and location updates more simply.
2016-02-24 15:13:10 -08:00
Nishant
fb7eae34ed Merge pull request #2249 from metamx/workerExpanded
Use Worker instead of ZkWorker whenever possible
2016-02-24 13:23:22 +05:30
Charles Allen
ac13a5942a Use Worker instead of ZkWorker whenver possible
* Moves last run task state information to Worker
* Makes WorkerTaskRunner a TaskRunner which has interfaces to help with getting information about a Worker
2016-02-23 15:02:03 -08:00
Gian Merlino
3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
Charles Allen
9dff0e5dbd Add hadoop classloader isolation tests for HadoopTask 2016-02-19 11:15:53 -08:00
Andrés Gomez
07d714b1b5 Fixed equal distribution strategy when exist disable middleManager with same currCapacityUsed. 2016-02-17 08:40:42 +01:00
Himanshu
5779b32742 Merge pull request #2439 from metamx/fix2435
Make QuotableWhiteSpaceSplitter able to take JSON
2016-02-11 13:14:43 -06:00
Charles Allen
3a6452c6d4 Make QuotableWhiteSpaceSplitter able to take json
* Fixes #2435
2016-02-10 16:42:14 -08:00
Gian Merlino
fa92b77f5a Harmonize znode writing code in RTR and Worker.
- Throw most exceptions rather than suppressing them, which should help
  detect problems. Continue suppressing exceptions that make sense to
  suppress.
- Handle payload length checks consistently, and improve error message.
- Remove unused WorkerCuratorCoordinator.announceTaskAnnouncement method.
- Max znode length should be int, not long.
- Add tests.
2016-02-10 14:52:00 -08:00
Parag Jain
7853a9cc41 clean up TaskLifecycleTest 2016-01-31 11:19:20 -06:00
Gian Merlino
5fd4b79373 RealtimeIndexTask: Fix NPE caused by calling stopGracefully before a firehose had been connected. 2016-01-29 11:20:23 -08:00
Charles Allen
2e1d6aaf3d Use thread priorities. (aka set nice values for background-like tasks)
* Defaults the thread priority to java.util.Thread.NORM_PRIORITY in io.druid.indexing.common.task.AbstractTask
 * Each exec service has its own Task Factory which is assigned a priority for spawned task. Therefore each priority class has a unique exec service
 * Added priority to tasks as taskPriority in the task context. <0 means low, 0 means take default, >0 means high. It is up to any particular implementation to determine how to handle these numbers
 * Add options to ForkingTaskRunner
    * Add "-XX:+UseThreadPriorities" default option
    * Add "-XX:ThreadPriorityPolicy=42" default option
 * AbstractTask - Removed unneded @JsonIgnore on priority
 * Added priority to RealtimePlumber executors. All sub-executors (non query runners) get Thread.MIN_PRIORITY
 * Add persistThreadPriority and mergeThreadPriority to realtime tuning config
2016-01-20 14:00:31 -08:00
Nishant
ac6c90e657 Merge pull request #1953 from metamx/taskRunnerResourceManagement
Move resource managemnt to be the responsibility of the TaskRunner
2016-01-20 22:27:47 +05:30
Jonathan Wei
df2906a91c Merge pull request #2290 from gianm/index-merger-v9-stuff
Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.
2016-01-19 13:04:00 -08:00