druid

Commit Graph

Author	SHA1	Message	Date
Roman Leventov	b2865b7c7b	Make possible to start Peon without DI loading of any querying-related stuff (#4516 ) * Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory * Extract QueryablePeonModule and add druid.modules.excludeList config * Typo	2017-07-12 13:18:25 -05:00
Jihoon Son	6d2df2a542	Fix duplicated locks after sync from storage (#4521 ) * Fix duplicated locks after sync from storage * Remove unnecessary table creation	2017-07-11 10:10:11 -07:00
Akash Dwivedi	5f411f14af	Timeout for LockAcquireAction (#4461 ) * Timeout for LockAcquireAction * Static inner class. * Rebase changes. * makeAlert and throw exception in case of overlapping interval. * Addressed comments. * remove unused import. * Addressed comments	2017-07-11 18:59:32 +09:00
Jihoon Son	cc20260078	Early publishing segments in the middle of data ingestion (#4238 ) * Early publishing segments in the middle of data ingestion * Remove unnecessary logs * Address comments * Refactoring the patch according to #4292 and address comments * Set the total shard number of NumberedShardSpec to 0 * refactoring * Address comments * Fix tests * Address comments * Fix sync problem of committer and retry push only * Fix doc * Fix build failure * Address comments * Fix compilation failure * Fix transient test failure	2017-07-10 22:35:36 -07:00
Jihoon Son	8ed25acc15	Fix a bug for CSVParser/DelimitedParser when empty column exists in the header row (#4504 ) * Fix a bug when empty column exists in header row * Address comments	2017-07-07 16:19:25 -07:00
Parag Jain	6e2f78f552	TLS support (#4270 )	2017-07-06 17:40:12 -07:00
Roman Leventov	9ae457f7ad	Avoid using the default system Locale and printing to System.out in production code (#4409 ) * Avoid usages of Default system Locale and printing to System.out or System.err in production code * Fix Charset in DruidKerberosUtil * Remove redundant string format in GenericIndexed * Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format() * Fix testSafeFormat() * More fixes of redundant StringUtils.format() inside ISE * Rename unimportantSafeFormat() to nonStrictFormat()	2017-06-29 14:06:19 -07:00
Jihoon Son	e3c13c246a	Respect reportParseExceptions option in IndexTask.determineShardSpecs() (#4467 ) * Respect reportParseExceptions option in IndexTask.determineShardSpecs() * Fix typo	2017-06-27 10:28:22 -07:00
Roman Leventov	05d58689ad	Remove the ability to create segments in v8 format (#4420 ) * Remove ability to create segments in v8 format * Fix IndexGeneratorJobTest * Fix parameterized test name in IndexMergerTest * Remove extra legacy merging stuff * Remove legacy serializer builders * Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest	2017-06-26 13:21:39 -07:00
Jihoon Son	b37c9b5fe0	Fix a bug of CSV/TSV parsers when extracting columns from header (#4443 ) * Reset fieldNames whenever a new file begins * Fix test failure * Fix test failure	2017-06-23 14:29:26 -07:00
Goh Wei Xiang	f68a0693f3	Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397 ) * Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe. * - Use of computeIfAbsent over putIfAbsent - Replace Maps.newXXXMap() with normal instantiation - Documentations on when is thread-safe required. - Use Builders for On/OffheapIncrementalIndex * - Optimization on computeIfAbsent - Constant EMPTY DimensionsSpec - Improvement on IncrementalIndexSchema.Builder - Remove setting of default values - Use var args for metrics - Correction on On/OffheapIncrementalIndex Builders - Combine On/OffheapIncrementalIndex Builders * - Removing unused imports. * - Helper method for testing with IncrementalIndex.Builder * - Correction on javadoc. * Style fix	2017-06-16 16:04:19 -05:00
Gian Merlino	1f2afccdf8	Expressions: Add ExprMacros. (#4365 ) * Expressions: Add ExprMacros, which have the same syntax as functions, but can convert themselves to any kind of Expr at parse-time. ExprMacroTable is an extension point for adding new ExprMacros. Anything that might need to parse expressions needs an ExprMacroTable, which can be injected through Guice. * Address code review comments.	2017-06-08 09:32:10 -04:00
Roman Leventov	63a897c278	Enable most IntelliJ 'Probable bugs' inspections (#4353 ) * Enable most IntelliJ 'Probable bugs' inspections * Fix in RemoteTestNG * Fix IndexSpec's equals() and hashCode() to include longEncoding * Fix inspection errors * Extract global instance of natural().nullsFirst(); address comments * Fix * Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections * Prohibit Ordering.natural().nullsFirst() using Checkstyle	2017-06-07 09:54:25 -07:00
Roman Leventov	31d33b333e	Make using implicit system Charset an error (#4326 ) * Make using implicit system charset an error * Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String() * Use English locale in StringUtils.safeFormat() * Restore comment	2017-06-05 23:57:25 -07:00
David Lim	13ecf90923	Report Kafka lag information in supervisor status report (#4314 ) * refactor lag reporting and report lag at status endpoint * refactor offset reporting logic to fetch offsets periodically vs. at request time * remove JavaCompatUtils * code review changes * code review changes	2017-06-05 13:26:25 -07:00
Slim	a2584d214a	Delegate creation of segmentPath/LoadSpec to DataSegmentPushers and add S3a support (#4116 ) * Adding s3a schema and s3a implem to hdfs storage module. * use 2.7.3 * use segment pusher to make loadspec * move getStorageDir and makeLoad spec under DataSegmentPusher * fix uts * fix comment part1 * move to hadoop 2.8 * inject deep storage properties * set version to 2.7.3 * fix build issue about static class * fix comments * fix default hadoop default coordinate * fix create filesystem * downgrade aws sdk * bump the version	2017-06-04 00:55:09 -06:00
Jihoon Son	f876246af7	Rename FiniteAppenderatorDriver to AppenderatorDriver (#4356 )	2017-06-03 00:48:44 +09:00
Jihoon Son	1150bf7a2c	Refactoring Appenderator Driver (#4292 ) * Refactoring Appenderator 1) Added publishExecutor and handoffExecutor for background publishing and handing segments off 2) Change add() to not move segments out in it * Address comments 1) Remove publishTimeout for KafkaIndexTask 2) Simplifying registerHandoff() 3) Add incremental handoff test * Remove unused variable * Add persist() to Appenderator and more tests for AppenderatorDriver * Remove unused imports * Fix strict build * Address comments	2017-06-02 07:09:11 +09:00
chaoqiang	5fc4abcf71	fix equalDistribution worker select strategy (#4318 ) * fix equalDistribution worker select strategy * replace anonymous Comparator * keep previous version sorting comment * fix code style * update comment * move JsonProperty	2017-05-25 13:30:42 +09:00
Gian Merlino	adeecc0e72	Add /isLeader call to overlord and coordinator. (#4282 ) This is useful for putting them behind load balancers or proxies, as it lets the load balancer know which server is currently active through an http health check. Also makes the method naming a little more consistent between coordinator and overlord code.	2017-05-18 20:46:13 -05:00
Jihoon Son	733dfc9b30	Add PrefetchableTextFilesFirehoseFactory for cloud storage types (#4193 ) * Add PrefetcheableTextFilesFirehoseFactory * fix comment * exception handling * Fix wrong json property * Remove ReplayableFirehoseFactory and fix misspelling * Defer object initialization * Add a temporaryDirectory parameter to FirehoseFactory.connect() * fix when cache and fetch are disabled * Address comments * Add more test * Increase timeout for test * Add wrapObjectStream * Move methods to Firehose from PrefetchableFirehoseFactory * Cleanup comment * add directory listing to s3 firehose * Rename a variable * Addressing comments * Update document * Support disabling prefetch * Fix race condition * Add fetchLock * Remove ReplayableFirehoseFactoryTest * Fix compilation error * Fix test failure * Address comments * Add default implementation for new method	2017-05-18 15:37:18 +09:00
Himanshu	daa8ef8658	Optional long-polling based segment announcement via HTTP instead of Zookeeper (#3902 ) * Optional long-polling based segment announcement via HTTP instead of Zookeeper * address review comments * make endpoint /druid-internal/v1 instead of /druid/internal so that jetty qos filters can be configured easily when needed * update segment callback initialization to be called only after first segment list fetch has succeeded from all servers * address review comments * remove size check not required anymore as only segment servers announce themselves and not all peon processes * announce segment server on historical only after cached segments are loaded * fix checkstyle errors	2017-05-17 16:31:58 -05:00
Roman Leventov	b7a52286e8	Make @Override annotation obligatory (#4274 ) * Make MissingOverride an error * Make travis script to fail fast * Add missing Override annotations * Comment	2017-05-16 13:30:30 -05:00
Benedict Jin	e823085866	Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135 )	2017-05-17 01:38:51 +09:00
Jihoon Son	50a4ec2b0b	Add support for headers and skipping thereof for CSV and TSV (#4254 ) * initial commit * small fixes * fix bug * fix bug * address code review * more cr * more cr * more cr * fix * Skip head rows for CSV and TSV * Move checking skipHeadRows to FileIteratingFirehose * Remove checking null iterators * Remove unused imports * Address comments * Fix compilation error * Address comments * Add more tests * Add a comment to ReplayableFirehose * Addressing comments * Add docs and fix typos	2017-05-15 22:57:31 -07:00
Roman Leventov	1ebfa22955	Update Error prone configuration; Fix bugs (#4252 ) * Make Errorprone the default compiler * Address comments * Make Error Prone's ClassCanBeStatic rule a error * Preconditions allow only %s pattern * Fix DruidCoordinatorBalancerTester * Try to give the compiler more memory * Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now * Don't show compiler warnings * Try different travis script * Fix travis.yml * Make Error Prone optional again * For error-prone compiler * Increase compiler's maxmem * Don't run Error Prone for benchmarks because of OOM * Skip install step in Travis * Remove MetricHolder.writeToChannel() * In travis.yml, check compilation before tests, because it may fail faster	2017-05-12 15:55:17 +09:00
Himanshu	462f6482df	optionally add extensions to explicitly specified hadoopContainerClassPath (#4230 ) * optionally add extensions to explicitly specified hadoopContainerClassPath * note extensions always pushed in hadoop container when druid.extensions.hadoopContainerDruidClasspath is not provided explicitly	2017-05-08 14:24:14 -05:00
Gian Merlino	f0fd8ba191	Add supervisors to overlord console. (#4248 )	2017-05-04 11:13:12 -07:00
Roman Leventov	15f3a94474	Copy closer into Druid codebase (fixes #3652 ) (#4153 )	2017-04-10 09:38:45 +09:00
Parag Jain	7e0d4c9555	secure supervisor endpoints (#3985 )	2017-04-05 16:42:32 -07:00
JackyWoo	a0f2cf05d5	Add EqualDistributionWithAffinityWorkerSelectStrategy which balance w… (#3998 ) * Add EqualDistributionWithAffinityWorkerSelectStrategy which balance work load within affinity workers. * add docs to equalDistributionWithAffinity	2017-03-25 19:15:49 -07:00
Himanshu	de081c711b	RealtimeIndexTask to support alertTimeout in context (#4089 ) * RealtimeIndexTask to support alertTimeout in context and raise alert if task process exists after the timeout * move alertTimeout config to tuningConfig and document	2017-03-24 12:48:12 -07:00
Gian Merlino	b4289c0004	Remove "granularity" from IngestSegmentFirehose. (#4110 ) It wasn't doing anything useful (the sequences were being concatted, and cursor.getTime() wasn't being called) and it defaulted to Granularities.NONE. Changing it to Granularities.ALL gave me a 700x+ performance boost on a small dataset I was reindexing (2m27s to 365ms). Most of that was from avoiding making a lot of unnecessary column selectors.	2017-03-24 10:28:54 -07:00
Zhihui Jiao	6febcd9f24	Fix IngestSegmentFirehoseFactory (#4069 )	2017-03-17 16:57:25 -06:00
Parag Jain	c155d9a5e9	increase kill timeout (#4002 )	2017-03-08 09:00:34 -08:00
kaijianding	19ac1c7c2c	Add SameIntervalMergeTask for easier usage of MergeTask (#3981 ) * Add SameIntervalMergeTask for easier usage of MergeTask * fix a bug and add ut * remove same_interval_merge_sub from Task.java and remove other no needed code	2017-03-06 11:21:32 -06:00
Roman Leventov	ea1f5b7954	LifecycleLock for better synchronization in lifecycled objects (#3964 ) * Introduce LifecycleLock * Add LifecycleLockTest * Rename LifecycleLock.release() to exitStart() * Rewrite LifecycleLock using AbstractQueuedSynchronizer for more safety, added tests * Add LifecycleLock.exitStop() and reset() * Add LifecycleLock.awaitStarted(timeout) * Braces * Fix	2017-03-02 12:22:57 -08:00
Akash Dwivedi	94da5e80f9	Namespace optimization for hdfs data segments. (#3877 ) * NN optimization for hdfs data segments. * HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage format.Docs update. * Common utility function in DataSegmentPusherUtil. * new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper * reuse getHdfsStorageDirUptoVersion in DataSegmentPusherUtil.getHdfsStorageDir() * Addressed comments. * Review comments. * HdfsDataSegmentKiller requested changes. * extra newline * Add maprfs.	2017-03-01 09:51:20 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
kaijianding	ef6a19c81b	buildV9Directly in MergeTask and AppendTask (#3976 ) * buildV9Directly in MergeTask and AppendTask * add doc	2017-02-28 10:04:32 -08:00
Parag Jain	469ae374a3	add kill task link on console (#3974 ) * add kill task link on console * refresh after kill	2017-02-25 14:58:16 +05:30
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-grunalrity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
David Lim	3c54fc912a	fix numShards = -1 not being handled correctly (#3937 )	2017-02-14 18:45:38 -08:00
Himanshu	9dfcf0763a	disable javascript execution by default (#3818 )	2017-02-13 15:11:18 -08:00
Himanshu	8cf7ad1e3a	druid.coordinator.asOverlord.enabled flag at coordinator to make it an overlord too (#3711 )	2017-02-13 15:03:59 -08:00
Parag Jain	8e31a465ad	report hand off count finite appenderator driver (#3925 )	2017-02-13 10:41:24 -08:00
DaimonPl	93b71e265e	Extract HLL related code to separate module (#3900 )	2017-02-03 09:45:11 -08:00
Parag Jain	1aabb45a09	auto reset option for Kafka Indexing service (#3842 ) * auto reset option for Kafka Indexing service in case message at the offset being fetched is not present anymore at kafka brokers * review comments * review comments * reverted last change * review comments * review comments * fix typo	2017-02-02 14:57:45 -06:00
David Lim	ff52581bd3	IndexTask improvements (#3611 ) * index task improvements * code review changes * add null check	2017-01-18 14:24:37 -08:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Charles Allen	229559b46a	Make TaskLockbox's ReentrantLock fair (#3828 )	2017-01-07 12:34:47 -08:00
Himanshu	4ca3b7f1e4	overlord helpers framework and tasklog auto cleanup (#3677 ) * overlord helpers framework and tasklog auto cleanup * review comment changes * further review comments addressed	2016-12-21 15:18:55 -08:00
Gian Merlino	6440ddcbca	Fix #3795 (Java 7 compatibility). (#3796 ) * Fix #3795 (Java 7 compatibility). Also introduce Animal Sniffer checks during build, which would have caught the original problems. * Add Animal Sniffer on caffeine-cache for JDK8.	2016-12-21 10:19:13 -08:00
Roman Leventov	70e83bea6d	Fix PathChildrenCache's ExecutorService leak (#3726 ) * Fix PathChildrenCache's executorService leak in Announcer, CuratorInventoryManager and RemoteTaskRunner * Use a single ExecutorService for all workerStatusPathChildrenCaches in RemoteTaskRunner	2016-12-07 13:00:10 -08:00
Gian Merlino	4e67dd28c0	RemoteTaskRunnerConfig: Fix Guice error on startup. (#3737 )	2016-12-06 00:19:53 +05:30
Charles Allen	27ab23ef44	Don't update segment metadata if archive doesn't move anything (#3476 ) * Don't update segment metadata if archive doesn't move anything * Fix restore task to handle potential null values * Don't try to update empty metadata * Address review comments * Move to druid-io java-util	2016-12-01 07:49:28 -08:00
Niketh Sabbineni	2640d170c3	Blacklist workers if they fail for too many times (#3643 ) * Blacklist workers if they fail for too many times * Adding documentation * Changing to timeout to period and updating docs * 1. Add configurable maxPercentageBlacklistWorkers 2. Rename variable * Change maxPercentageBlacklistWorkers to double * Remove thread.sleep	2016-11-29 12:38:56 +05:30
Roman Leventov	c070b4a816	Fix concurrency defects, remove unnecessary volatiles (#3701 )	2016-11-22 16:42:28 -08:00
Roman Leventov	7b56cec3b9	Fix resource leaks (#3702 )	2016-11-18 21:21:36 +05:30
Gian Merlino	bcd20441be	Make buildV9Directly the default. (#3688 )	2016-11-14 09:29:32 -08:00
Roman Leventov	fbbb55f867	Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics (#3679 ) * Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics * Remove unused imports * Use empty string instead of "testing-version" as a version placeholder	2016-11-11 17:17:27 -06:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670 )	2016-11-09 11:07:01 -06:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Parag Jain	1e79a1be82	fix useExplicitVersion (#3559 )	2016-10-10 14:28:06 -05:00
Akash Dwivedi	078de4fcf9	Use explicit version from HadoopIngestionSpec. (#3554 )	2016-10-07 13:59:14 -07:00
Parag Jain	e419407eba	handle supervisor spec metadata failures (#3456 ) close kafka consumer in case supervisor start fails	2016-10-04 10:15:28 -07:00
David Lim	ca9114b41b	add supervisor reset API (#3484 ) * add supervisor reset API * CR doc changes and kill running tasks / clear offsets from supervisor	2016-09-22 17:51:06 -07:00
Gian Merlino	27bd5cb13a	Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473 ) Fixes #3241.	2016-09-21 13:40:04 -06:00
Gian Merlino	7a2a4bc6de	JavaScript: Disable now affects worker selection and router strategy too. (#3458 )	2016-09-13 16:37:42 -07:00
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Nishant	4c2b8d29d3	Make RTR assign pending tasks by insertion order (#3405 )	2016-08-30 12:22:44 -07:00
Gian Merlino	2f46effc8e	FileTaskLogsTest: Throw unthrown exception. (#3352 )	2016-08-11 09:40:28 -07:00
Himanshu	03cfcf002b	fix the race described in #3174 (#3205 )	2016-08-10 11:29:50 -07:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
David Lim	d5ed3f1347	change expected response from ACCEPTED to OK (#3280 )	2016-07-23 19:48:30 -07:00
Gian Merlino	06624c40c0	Share query handling between Appenderator and RealtimePlumber. (#3248 ) Fixes inconsistent metric handling between the two implementations. Formerly, RealtimePlumber only emitted query/segmentAndCache/time and query/wait and Appenderator only emitted query/partial/time and query/wait (all per sink). Now they both do the same thing: - query/segmentAndCache/time, query/segment/time are the time spent per sink. - query/cpu/time is the CPU time spent per query. - query/wait/time is the executor waiting time per sink. These generally match historical metrics, except segmentAndCache & segment mean the same thing here, because one Sink may be partially cached and partially uncached and we aren't splitting that out.	2016-07-19 22:15:13 -05:00
Hyukjin Kwon	55e7a52475	Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215 )	2016-07-14 09:19:17 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228 ) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Xavier Léauté	485e381387	remove datasource from hadoop output path (#3196 ) fixes #2083, follow-up to #1702	2016-06-29 08:53:45 -07:00
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166 )	2016-06-25 10:27:25 -07:00
Charles Allen	6be18376c0	Make forking task runner have more informative thread names during the long-blocking part (#3172 ) * Make forking task runner have more informative thread names during the long-blocking part * Make string.format do the work	2016-06-24 08:56:01 -07:00
David Lim	5a3db634ff	add synchronization to SupervisorManager (#3077 )	2016-06-07 00:29:23 -06:00
David Lim	a2290a8f05	support seamless config changes (#3051 )	2016-06-03 13:50:19 -07:00
Charles Allen	474286bbce	Make TaskMaster giant lock fair (#3050 )	2016-06-02 12:10:40 -07:00
David Lim	3ef24c03b3	Validate X-Druid-Task-Id header in request/response and support retrying on outdated TaskLocation information, add KafkaIndexTaskClient unit tests (#3006 ) * validate X-Druid-Task-Id header in request and add header to response * modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant	2016-05-25 22:05:18 -07:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
Charles Allen	eaaad01de7	[QTL] Datasource as lookupTier (#2955 ) * Datasource as lookup tier * Adds an option to let indexing service tasks pull their lookup tier from the datasource they are working for. * Fix bad docs for lookups lookupTier * Add Datasource name holder * Move task and datasource to be pulled from Task file * Make LookupModule pull from bound dataSource * Fix test * Fix code style on imports * Fix formatting * Make naming better * Address code comments about naming	2016-05-17 15:44:42 -07:00
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656 ) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Gian Merlino	f8ddfb9a4b	Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922 ) Fixes #2912.	2016-05-04 13:54:34 -07:00
Himanshu	50065c8288	fix spurious failure of RTR concurrency test (#2915 )	2016-05-04 10:30:20 -07:00
Charles Allen	3f71a4a302	Fix missing log arguments in PendingTaskBasedWorkerResourceManagementStrategy (#2898 )	2016-04-28 18:15:41 -07:00
Parag Jain	0d745ee120	Basic authorization support in Druid (#2424 ) - Introduce `AuthorizationInfo` interface, specific implementations of which would be provided by extensions - If the `druid.auth.enabled` is set to `true` then the `isAuthorized` method of `AuthorizationInfo` will be called to perform authorization checks - `AuthorizationInfo` object will be created in the servlet filters of specific extension and will be passed as a request attribute with attribute name as `AuthConfig.DRUID_AUTH_TOKEN` - As per the scope of this PR, all resources that needs to be secured are divided into 3 types - `DATASOURCE`, `CONFIG` and `STATE`. For any type of resource, possible actions are - `READ` or `WRITE` - Specific ResourceFilters are used to perform auth checks for all endpoints that corresponds to a specific resource type. This prevents duplication of logic and need to inject HttpServletRequest inside each endpoint. For example - `DatasourceResourceFilter` is used for endpoints where the datasource information is present after "datasources" segment in the request Path such as `/druid/coordinator/v1/datasources/`, `/druid/coordinator/v1/metadata/datasources/`, `/druid/v2/datasources/` - `RulesResourceFilter` is used where the datasource information is present after "rules" segment in the request Path such as `/druid/coordinator/v1/rules/` - `TaskResourceFilter` is used for endpoints is used where the datasource information is present after "task" segment in the request Path such as `druid/indexer/v1/task` - `ConfigResourceFilter` is used for endpoints like `/druid/coordinator/v1/config`, `/druid/indexer/v1/worker`, `/druid/worker/v1` etc - `StateResourceFilter` is used for endpoints like `/druid/broker/v1/loadstatus`, `/druid/coordinator/v1/leader`, `/druid/coordinator/v1/loadqueue`, `/druid/coordinator/v1/rules` etc - For endpoints where a list of resources is returned like `/druid/coordinator/v1/datasources`, `/druid/indexer/v1/completeTasks` etc. the list is filtered to return only the resources to which the requested user has access. In these cases, `HttpServletRequest` instance needs to be injected in the endpoint method. Note - JAX-RS specification provides an interface called `SecurityContext`. However, we did not use this but provided our own interface `AuthorizationInfo` mainly because it provides more flexibility. For example, `SecurityContext` has a method called `isUserInRole(String role)` which would be used for auth checks and if used then the mapping of what roles can access what resource needs to be modeled inside Druid either using some convention or some other means which is not very flexible as Druid has dynamic resources like datasources. Fixes #2355 with PR #2424	2016-04-28 16:50:28 -07:00
Himanshu	9669e79df2	fix misleading error log due to race in RTR and concurrency test (#2878 )	2016-04-28 10:28:00 -07:00
Nishant	c29cb7d711	add pending task based resource management strategy (#2086 )	2016-04-27 10:40:53 -07:00
Nishant	bf5e5e7b75	fix #2886 (#2887 ) Fixes https://github.com/druid-io/druid/issues/2886	2016-04-27 08:29:41 -07:00
David Lim	7641f2628f	add control and status endpoints to KafkaIndexTask (#2730 )	2016-04-21 15:34:59 -07:00
Nishant	dbf63f738f	Add ability to filter segments for specific dataSources on broker without creating tiers (#2848 ) * Add back FilteredServerView removed in `a32906c7fd` to reduce memory usage using watched tiers. * Add functionality to specify "druid.broker.segment.watchedDataSources"	2016-04-19 10:10:06 -07:00
Gian Merlino	08c784fbf6	KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844 ) segment creation deterministic. This means that each segment will contain data from just one Kafka partition. So, users will probably not want to have a super high number of Kafka partitions... Fixes #2703.	2016-04-18 22:29:52 -07:00
jon-wei	0e481d6f93	Allow filters to use extraction functions	2016-04-05 13:24:56 -07:00
Fangjin Yang	1e02eeab13	Merge pull request #2683 from metamx/default_retry Better defaults for Retry policy for task actions	2016-03-29 08:02:59 -07:00
Gian Merlino	195c9c5240	Overlord: Avoid a scary Jersey warning. Avoids the following message from being printed on Overlord startup: WARNING: Parameter 1 of type io.druid.indexing.common.actions.TaskActionHolder<T> from public <T> javax.ws.rs.core.Response io.druid.indexing.overlord.http.OverlordResource.doAction (io.druid.indexing.common.actions.TaskActionHolder<T>) is not resolvable to a concrete type	2016-03-28 19:08:56 -07:00
Fangjin Yang	c2284929dc	Merge pull request #2739 from gianm/fix-wtmtest-failure Fix handling of InterruptedException in WorkerTaskMonitor's mainLoop.	2016-03-28 14:52:10 -07:00
Gian Merlino	ee4bb96855	Fix handling of InterruptedException in WorkerTaskMonitor's mainLoop. I believe this will fix #2664.	2016-03-25 12:17:33 -07:00
Himanshu Gupta	004b00bb96	config to explicitly specify classpath for hadoop container during hadoop ingestion	2016-03-25 10:51:28 -05:00
Himanshu	00d7021291	Merge pull request #2607 from jon-wei/dim_schema Support use of DimensionSchema class in DimensionsSpec	2016-03-22 11:53:46 -05:00
Himanshu	3220b109ad	Merge pull request #2570 from binlijin/single_dimension_partitioning Single dimension hash-based partitioning	2016-03-22 11:51:06 -05:00
binlijin	bce600f5d5	Single dimension hash-based partitioning	2016-03-22 13:15:33 +08:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Nishant	ed8f39fcfe	Better defaults for Retry policy for task actions This PR changes the retry of task actions to be a bit more aggressive by reducing the maxWait. Current defaults were 1 min to 10 mins, which lead to a very delayed recovery in case there are any transient network issues between the overlord and the peons. doc changes.	2016-03-18 11:59:55 -07:00
Charles Allen	a52c6d3bee	Fix some google related imports	2016-03-17 11:03:29 -07:00
Nishant	9cceff2274	Use ImmutableWorkerInfo instead of ZKWorker review comments add test for equals and hashcode	2016-03-14 11:17:15 -07:00
Himanshu	d51a0a0cf4	Merge pull request #2220 from gianm/appenderator-kafka Appenderators, DataSource metadata, KafkaIndexTask	2016-03-14 13:14:36 -05:00
Nishant	cf7f6da392	Merge pull request #2634 from gianm/stopGracefully-avoid-interrupt ThreadPoolTaskRunner: Make graceful shutdown logs less scary.	2016-03-11 16:36:10 -08:00
Charles Allen	a3f0048ea4	Merge pull request #2631 from gianm/plumbers-rpe Better logging for ParseExceptions on index aggregation, and remove unnecessary exception handling.	2016-03-11 14:22:58 -08:00
Gian Merlino	79a95f7789	WorkerTaskMonitor: stop() waits for mainLoop to exit. Fixes #2637.	2016-03-11 11:40:13 -08:00
Gian Merlino	05397a9b4f	ThreadPoolTaskRunner: Make graceful shutdown logs less scary. - It's okay to suppress InterruptedException during graceful shutdown, as tasks may use it to accelerate their own shutdown. - It's okay to ignore return statuses during graceful shutdown (which may be FAILED!) because it actually doesn't matter what they are.	2016-03-11 07:49:29 -08:00
Gian Merlino	187569e702	DataSource metadata. Geared towards supporting transactional inserts of new segments. This involves an interface "DataSourceMetadata" that allows combining of partially specified metadata (useful for partitioned ingestion). DataSource metadata is stored in a new "dataSource" table.	2016-03-10 17:41:50 -08:00
Gian Merlino	3d2214377d	Appenderatoring. Appenderators are a way of getting more control over the ingestion process than a Plumber allows. The idea is that existing Plumbers could be implemented using Appenderators, but you could also implement things that Plumbers can't do. FiniteAppenderatorDrivers help simplify indexing a finite stream of data. Also: - Sink: Ability to consider itself "finished" vs "still writable". - Sink: Ability to return the number of rows contained within the sink.	2016-03-10 17:41:50 -08:00
Gian Merlino	92c828f904	Make SegmentHandoffNotifier Closeable.	2016-03-10 16:50:37 -08:00
Gian Merlino	8a11161b20	Plumbers: Move plumber.add out of try/catch for ParseException. The incremental indexes handle that now so it's not necessary. Also, add debug logging and more detailed exceptions to the incremental indexes for the case where there are parse exceptions during aggregation.	2016-03-10 16:39:26 -08:00
Charles Allen	d299540efc	Make HadoopTask load hadoop dependency classes LAST for local isolated classrunner	2016-03-10 10:18:23 -08:00
Himanshu Gupta	0402636598	configurable handoffConditionTimeout in realtime tasks for segment handoff wait	2016-03-05 10:14:54 -06:00
Gian Merlino	e9c23bf376	OverlordResource: Use getZkWorkers on RemoteTaskRunner. Restores old behavior of this api, from before #2249 when getWorkers returned ZkWorkers.	2016-03-02 17:31:34 -08:00
Fangjin Yang	80d954578d	Merge pull request #2572 from gianm/fix-rit-taskresource Fix default TaskResource for RealtimeIndexTasks.	2016-03-02 10:20:27 -08:00
Gian Merlino	acd95d3e28	TaskLocation: Add toString method. Necessary because these objects are used in log messages.	2016-03-01 17:52:06 -08:00
Gian Merlino	a355bfb7a9	Fix default TaskResource for RealtimeIndexTasks. It was supposed to be the same as the task id, but it wasn't because "makeTaskId" has a random component.	2016-03-01 16:54:22 -08:00
Björn Zettergren	2462c82c0e	New defaults for maxRowsInMemory rowFlushBoundary To bring consistency to docs and source this commit changes the default values for maxRowsInMemory and rowFlushBoundary to 75000 after discussion in PR https://github.com/druid-io/druid/pull/2457. The previous default was 500000 and it's lower now on the grounds that it's better for a default to be somewhat less efficient, and work, than to reach for the stars and possibly result in "OutOfMemoryError: java heap space" errors.	2016-03-01 13:50:28 +01:00
Charles Allen	c6803c4364	Allow specifying peon javaOpts as an array	2016-02-26 13:24:35 -08:00
Himanshu Gupta	bc156effe7	RTR has multiple threads for assignment of pending tasks now.	2016-02-26 09:27:03 -06:00
Fangjin Yang	53a5f07c14	Merge pull request #2544 from metamx/fixMaxPort Limit PortFinder to 0xFFFF	2016-02-25 17:12:53 -08:00
Fangjin Yang	143e85eaa5	Merge pull request #2419 from gianm/task-hostports Plumb task peon host/ports back out to the overlord.	2016-02-25 17:11:53 -08:00
Charles Allen	3fa7a7ebfe	Limit PortFinder to 0xFFFF	2016-02-25 08:16:40 -08:00
Charles Allen	187b788089	UnRegister port in ForkingTaskRunner	2016-02-25 08:04:25 -08:00
Gian Merlino	cf0bc905fb	Plumb task peon host/ports back out to the overlord. - Add TaskLocation class - Add registerListener to TaskRunner - Add getLocation to TaskRunnerWorkItem - Implement location tracking in existing TaskRunners - Rework WorkerTaskMonitor to do management out of a single thread so it can handle status and location updates more simply.	2016-02-24 15:13:10 -08:00
Nishant	fb7eae34ed	Merge pull request #2249 from metamx/workerExpanded Use Worker instead of ZkWorker whenever possible	2016-02-24 13:23:22 +05:30
Charles Allen	ac13a5942a	Use Worker instead of ZkWorker whenever possible * Moves last run task state information to Worker * Makes WorkerTaskRunner a TaskRunner which has interfaces to help with getting information about a Worker	2016-02-23 15:02:03 -08:00
Gian Merlino	3534483433	Better handling of ParseExceptions. Two changes: - Allow IncrementalIndex to suppress ParseExceptions on "aggregate". - Add "reportParseExceptions" option to realtime tuning configs. By default this is "false". Behavior of the counters should now be: - processed: Number of rows indexed, including rows where some fields could be parsed and some could not. - thrownAway: Number of rows thrown away due to rejection policy. - unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all). If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would cause an exception to be thrown). In addition, "processed" will only include fully parseable rows (because even partial parse failures will cause exceptions to be thrown). Fixes #2510.	2016-02-23 10:11:43 -08:00
Bingkun Guo	499288ff4b	Merge pull request #2509 from metamx/hadoopIsolatorTest Add hadoop classloader isolation tests for HadoopTask	2016-02-19 14:23:22 -06:00
Fangjin Yang	a3c29b91cc	Merge pull request #2505 from gianm/rt-exceptions Harmonize realtime indexing loop across the task and standalone nodes.	2016-02-19 11:23:14 -08:00
Charles Allen	9dff0e5dbd	Add hadoop classloader isolation tests for HadoopTask	2016-02-19 11:15:53 -08:00
Fangjin Yang	ddf913d626	Merge pull request #2508 from gianm/ftr-shutdown-logging ForkingTaskRunner: Better logging during orderly shutdown.	2016-02-19 10:02:24 -08:00
Gian Merlino	c0c6cf77fa	ForkingTaskRunner: Better logging during orderly shutdown.	2016-02-19 09:17:16 -08:00
Gian Merlino	243ac5399b	Harmonize realtime indexing loop across the task and standalone nodes. - Both now catch ParseExceptions on plumber.add (see https://groups.google.com/d/topic/druid-user/wmiRDvx2RvM/discussion) - Standalone now treats IndexSizeExceededException as fatal (previously only the task did)	2016-02-19 07:34:15 -08:00
Charles Allen	87752be740	Make HadoopTasks's classloader a single one	2016-02-18 20:58:09 -08:00
Andrés Gomez	07d714b1b5	Fixed equal distribution strategy when exist disable middleManager with same currCapacityUsed.	2016-02-17 08:40:42 +01:00
Himanshu	5779b32742	Merge pull request #2439 from metamx/fix2435 Make QuotableWhiteSpaceSplitter able to take JSON	2016-02-11 13:14:43 -06:00
Charles Allen	40ade32a1f	Fix dependencies. * Don't put druid***selfcontained.jar at the end of the hadoop isolated classpath Add `<scope>provided</scope>` to prevent repeated dependency inclusion in the extension directories	2016-02-11 07:30:14 -08:00
Charles Allen	3a6452c6d4	Make QuotableWhiteSpaceSplitter able to take json * Fixes #2435	2016-02-10 16:42:14 -08:00
Xavier Léauté	91f23583f5	Merge pull request #2436 from gianm/mm-less-suppressey Harmonize znode writing code in RTR and Worker.	2016-02-10 16:11:30 -08:00
Gian Merlino	fa92b77f5a	Harmonize znode writing code in RTR and Worker. - Throw most exceptions rather than suppressing them, which should help detect problems. Continue suppressing exceptions that make sense to suppress. - Handle payload length checks consistently, and improve error message. - Remove unused WorkerCuratorCoordinator.announceTaskAnnouncement method. - Max znode length should be int, not long. - Add tests.	2016-02-10 14:52:00 -08:00
Charles Allen	2bde8b1d68	Make hadoop classpath isolation more explicit * Fixes #2428	2016-02-10 12:09:17 -08:00
Charles Allen	a0728fa854	Allow ScalingStats to be null * Fixes #2378	2016-02-02 18:01:01 -08:00
Parag Jain	7853a9cc41	clean up TaskLifecycleTest	2016-01-31 11:19:20 -06:00
Gian Merlino	5fd4b79373	RealtimeIndexTask: Fix NPE caused by calling stopGracefully before a firehose had been connected.	2016-01-29 11:20:23 -08:00
Gian Merlino	c4fde52160	Fix 'graceful shutdown aborted' log message in ThreadPoolTaskRunner.	2016-01-29 11:07:17 -08:00
Nishant	dcb7830330	Merge pull request #984 from drcrallen/thread-priority-rebase Use thread priorities. (aka set `nice` values for background-like tasks)	2016-01-21 15:02:34 +05:30
Charles Allen	66e74b1a63	Minor field name change in RemoteTaskRunnerFactory to be more descriptive * Addresses https://github.com/druid-io/druid/pull/2309#discussion_r50335081	2016-01-20 17:43:20 -08:00
Charles Allen	3152d08844	Fix overlord scheduled executor injection * Fixes https://github.com/druid-io/druid/issues/2308	2016-01-20 14:16:14 -08:00
Charles Allen	2e1d6aaf3d	Use thread priorities. (aka set `nice` values for background-like tasks) * Defaults the thread priority to java.util.Thread.NORM_PRIORITY in io.druid.indexing.common.task.AbstractTask * Each exec service has its own Task Factory which is assigned a priority for spawned task. Therefore each priority class has a unique exec service * Added priority to tasks as taskPriority in the task context. <0 means low, 0 means take default, >0 means high. It is up to any particular implementation to determine how to handle these numbers * Add options to ForkingTaskRunner * Add "-XX:+UseThreadPriorities" default option * Add "-XX:ThreadPriorityPolicy=42" default option * AbstractTask - Removed unneeded @JsonIgnore on priority * Added priority to RealtimePlumber executors. All sub-executors (non query runners) get Thread.MIN_PRIORITY * Add persistThreadPriority and mergeThreadPriority to realtime tuning config	2016-01-20 14:00:31 -08:00
Nishant	ac6c90e657	Merge pull request #1953 from metamx/taskRunnerResourceManagement Move resource management to be the responsibility of the TaskRunner	2016-01-20 22:27:47 +05:30
Jonathan Wei	df2906a91c	Merge pull request #2290 from gianm/index-merger-v9-stuff Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.	2016-01-19 13:04:00 -08:00
Fangjin Yang	0c31f007fc	Merge pull request #1728 from himanshug/aggregators_in_segment_metadata Store AggregatorFactory[] in segment metadata	2016-01-19 12:55:49 -08:00
Himanshu	fe841fd961	Merge pull request #2118 from guobingkun/fix_segment_loading Fix loading segment for historical	2016-01-19 14:25:48 -06:00
Himanshu Gupta	a99aef29a1	adding aggregators to segment metadata	2016-01-19 14:23:39 -06:00
Gian Merlino	1dcf22edb7	Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes. Also parameterize some tests to run with/without buildV9Directly: - IndexGeneratorJobTest - RealtimeIndexTaskTest - RealtimePlumberSchoolTest	2016-01-19 12:15:06 -08:00
Bingkun Guo	c4ad50f92c	Fix loading segment for historical Historical will drop a segment that shouldn't be dropped in the following scenario: Historical node tried to load segmentA, but failed with SegmentLoadingException, then ZkCoordinator called removeSegment(segmentA, blah) to schedule a runnable that would drop segmentA by deleting its files. Now, before that runnable executed, another LOAD request was sent to this historical, this time historical actually succeeded on loading segmentA and announced it. But later on, the scheduled drop-of-segment runnable started executing and removed the segment files, while historical is still announcing segmentA.	2016-01-19 10:29:49 -06:00
Himanshu Gupta	164b0aad7a	removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class instead of a Map to store segment metadata	2016-01-18 22:03:46 -06:00
Kurt Young	82ff98c2bf	add config for build v9 directly and update docs	2016-01-16 11:26:34 +08:00
Charles Allen	976d4c965b	Move resource management to be the responsibility of the TaskRunner	2016-01-13 10:38:22 -08:00
Himanshu	82bdfbbbf1	Merge pull request #2155 from metamx/taskConfigTmpdir Make TaskConfig pull from java.io.tmpdir	2016-01-05 13:58:39 -06:00
Nishant	45f402f22f	increase timeout tune timeouts	2016-01-05 19:06:04 +05:30
Charles Allen	e18301d99c	Make TaskConfig pull from java.io.tmpdir * Also makes paths built off of java.nio.file.Paths instead of String.format	2016-01-04 10:17:08 -08:00
fjy	b5c094d951	Fixes #2180	2016-01-01 16:56:41 -08:00
Nishant	b68265399c	Merge pull request #2168 from druid-io/remove-indexmaker Remove IndexMaker	2015-12-30 12:24:29 +05:30
Nishant	df893dbaf8	Merge pull request #2141 from gianm/fix-restoring-realtime Fix some problems with restoring	2015-12-30 10:44:45 +05:30
Fangjin Yang	7ffa706655	Merge pull request #2152 from metamx/add-taskId Add taskId to realtimeMetrics	2015-12-29 10:33:40 -08:00
fjy	38b0f1fbc2	fix transient failures in unit tests	2015-12-28 20:03:30 -08:00
fjy	faf421726b	remove IndexMaker	2015-12-28 14:19:02 -08:00
Fangjin Yang	8cb52bddd8	Merge pull request #2140 from navis/fix-sporadic-testfail4 Fix sporadic fail of RemoteTaskRunnerTest#testWorkerRemoved	2015-12-27 14:55:49 -08:00
Fangjin Yang	9aa62e4631	Merge pull request #2154 from navis/fix-testfail-WorkerTaskMonitorTest Fix sporadic fail of WorkerTaskMonitorTest#testRunTask	2015-12-23 20:52:33 -08:00
navis.ryu	a8f6c6110d	Fix sporadic fail of WorkerTaskMonitorTest#testRunTask	2015-12-24 02:31:30 +09:00
navis.ryu	2c3c4a3f8f	Another try to fix xxServerViewTests	2015-12-24 02:13:40 +09:00
Nishant	978a3fd8ae	Add taskId to realtimeMetrics Add task Id to Realtime Metrics	2015-12-23 18:05:25 +05:30
Gian Merlino	32edd1538d	RealtimeIndexTask: Fix a couple of problems with restoring. - Shedding locks at startup is bad, we actually want to keep them. Stop doing that. - stopGracefully now interrupts the run thread if had started running finishJob. This avoids waiting for handoff unnecessarily.	2015-12-22 16:04:47 -08:00
Gian Merlino	f4ce2b9bc5	TaskLockbox: Consider active tasks active even if they have no locks.	2015-12-22 16:04:16 -08:00
Gian Merlino	bad270b6c4	druid.indexer.task.restoreTasksOnRestart configuration.	2015-12-22 10:59:15 -08:00
navis.ryu	8a179fc273	Fix sporadic fail of RemoteTaskRunnerTest#testWorkerRemoved	2015-12-22 14:33:37 +09:00
Himanshu Gupta	5e178499e8	trying to fix transient errors in testRealtimeIndexTask() by increasing overall timeout and unlimited wait for segment publish	2015-12-21 00:11:20 -06:00
Fangjin Yang	14229ba0f2	Merge pull request #1922 from metamx/jsonIgnoresFinalFields Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-12-18 15:38:32 -08:00
Bingkun Guo	1e5aa2f3ac	fix getType() and Json serialization in ClientMergeQuery and add serde tests	2015-12-15 12:08:43 -06:00
Nishant	a32906c7fd	Remove FilteredServerView	2015-12-09 01:54:12 +05:30
Nishant	9491e8de3b	Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoffs - fixes #1970 - extracted out segment handoff callbacks in SegmentHandoffNotifier which is responsible for tracking segment handoffs and doing callbacks when handoff is complete. - Coordinator now maintains a view of segments in the cluster, this will affect the JVM heap requirements for the overlord for large clusters. realtime index task and nodes now use HTTP end points exposed by the coordinator to get serverView review comment fix realtime node guice injection review comments make test not rely on scheduled exec fix compilation fix import review comment introduce immutableSegmentLoadInfo fix json reading remove unnecessary logging	2015-12-09 01:54:09 +05:30
Himanshu Gupta	62ba9ade37	unifying license header in all java files	2015-12-05 22:16:23 -06:00
Gian Merlino	20544d409b	Merge pull request #1988 from himanshug/multi-interval-batch-delta support multiple intervals in dataSource inputSpec	2015-12-04 09:07:52 -08:00
Gian Merlino	020a5e7081	Merge pull request #2024 from metamx/fairBigTaskQueueLock Make the TaskQueue big lock fair	2015-12-03 19:32:53 -08:00
Himanshu Gupta	61aaa09012	support multiple intervals in dataSource input spec	2015-12-03 21:28:04 -06:00
Himanshu Gupta	86f0a36e83	support multiple intervals in SegmentListUsedAction	2015-12-03 21:28:04 -06:00
Himanshu Gupta	221fb95d07	add support for getting used segments for multiple interval in IndexerMetadataStorageCoordinator	2015-12-03 21:28:04 -06:00
Charles Allen	dbaaa6af92	Make the TaskQueue big lock fair	2015-12-01 19:13:07 -08:00
Nishant	1eb8211346	Add datasource and taskId to metrics emitted by peons This PR adds the datasource and taskId to the jvm and sys metrics emitted by the peons. fix spelling review comment review comment	2015-12-01 23:20:59 +05:30
Fangjin Yang	8e83d800d6	Merge pull request #1881 from gianm/restartable-tasks Restorable indexing tasks	2015-11-23 21:14:37 -08:00
Gian Merlino	501dcb43fa	Some changes that make it possible to restart tasks on the same hardware. This is done by killing and respawning the jvms rather than reconnecting to existing jvms, for a couple reasons. One is that it lets you restore tasks after server reboots too, and another is that it lets you upgrade all the software on a box at once by just restarting everything. The main changes are, 1) Add "canRestore" and "stopGracefully" methods to Tasks that say if a task can stop gracefully, and actually do a graceful stop. RealtimeIndexTask is the only one that currently implements this. 2) Add "stop" method to TaskRunners that attempts to do an orderly shutdown. ThreadPoolTaskRunner- call stopGracefully on restorable tasks, wait for exit ForkingTaskRunner- close output stream to restorable tasks, wait for exit RemoteTaskRunner- do nothing special, we actually don't want to shutdown 3) Add "restore" method to TaskRunners that attempts to bootstrap tasks from last run. Only ForkingTaskRunner does anything here. It maintains a "restore.json" file with a list of restorable tasks. 4) Have the CliPeon's ExecutorLifecycle lock the task base directory to avoid a restored task and a zombie old task from stomping on each other.	2015-11-23 11:22:08 -08:00
Gian Merlino	666d785787	Switch TaskActions from Optionals to nullable. Deserialization of Optionals does not work quite right- they come back as actual nulls, rather than absent Optionals. So these probably only ever worked for the local task action client.	2015-11-20 09:14:07 -08:00
Fangjin Yang	21c84b5ff7	Merge pull request #1896 from gianm/allocate-segment SegmentAllocateAction (fixes #1515)	2015-11-18 21:05:46 -08:00
Fangjin Yang	e52c156066	Merge pull request #1880 from gianm/rtr-adjust RTR: Ensure that there is only one cleanup task scheduled for a worker at once.	2015-11-18 15:12:55 -08:00
Charles Allen	8fcf2403e3	Merge pull request #1943 from metamx/realtime-caching Enable caching on intermediate realtime persists	2015-11-17 15:06:43 -08:00
Charles Allen	dbe201aeed	Merge pull request #1929 from pjain1/jetty_threads separate ingestion and query thread pool	2015-11-17 12:14:25 -08:00
Parag Jain	6c498b7d4a	separate ingestion and query thread pool	2015-11-17 13:42:41 -06:00
Xavier Léauté	d7eb2f717e	enable query caching on intermediate realtime persists	2015-11-17 10:58:00 -08:00
Charles Allen	46527a9610	Merge pull request #1954 from metamx/fix-stupid-aws-limit EC2 autoscaler: avoid hitting aws filter limits	2015-11-13 10:52:35 -08:00
Fangjin Yang	4f46d457f1	Merge pull request #1947 from noddi/feature/count-parameter-history-endpoints Add count parameter to history endpoints	2015-11-12 10:23:44 -08:00
Xavier Léauté	749ac12f88	EC2 autoscaler: avoid hitting aws filter limits	2015-11-11 20:28:06 -08:00
Fangjin Yang	465cbcf9a7	Merge pull request #1956 from metamx/remove-unused-imports Cleanup + remove unused imports	2015-11-11 17:36:47 -08:00
Gian Merlino	e4e5f0375b	SegmentAllocateAction (fixes #1515) This is a feature meant to allow realtime tasks to work without being told upfront what shardSpec they should use (so we can potentially publish a variable number of segments per interval). The idea is that there is a "pendingSegments" table in the metadata store that tracks allocated segments. Each one has a segment id (the same segment id we know and love) and is also part of a sequence. The sequences are an idea from @cheddar that offers a way of doing replication. If there are N tasks reading exactly the same data with exactly the same logic (think Kafka tasks reading a fixed range of offsets) then you can place them in the same sequence, and they will generate the same sequence of segments.	2015-11-11 16:54:35 -08:00
Bartosz Ługowski	6e5d2c6745	Add count parameter to history endpoints.	2015-11-11 23:03:57 +01:00
Xavier Léauté	fa6142e217	cleanup and remove unused imports	2015-11-11 12:25:21 -08:00
zhxiaog	c197a4cf32	fix #1918, add unit tests for RemoteTaskActionClient	2015-11-12 03:15:17 +08:00
Charles Allen	abae47850a	Add backwards compatibility for PR #1922	2015-11-11 10:27:00 -08:00
Charles Allen	1df4baf489	Move Jackson Guice adapters into io.druid * Removes access to protected methods in com.fasterxml * Eliminates druid-common's use of foreign package com.fasterxml	2015-11-09 10:50:45 -08:00
Gian Merlino	fc55314d1c	ForkingTaskRunner: Log without buffering. In #933 the ForkingTaskRunner's logging was changed to buffered from unbuffered. This means that the last few KB of the logs are generally not visible while a task is running, which makes debugging running tasks difficult.	2015-11-07 15:16:53 -08:00
Charles Allen	929b981710	Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-11-05 18:10:13 -08:00
Gian Merlino	cb409ee928	RemoteTaskActionClient: Fix statusCode check.	2015-11-05 10:03:49 -08:00
fjy	8f231fd3e3	cleanup druid codebase	2015-11-04 13:59:53 -08:00
Himanshu Gupta	84f7d8d264	making static final variables in HadoopDruidIndexerConfig upper case	2015-11-02 23:24:26 -06:00
Himanshu Gupta	8b67417ac8	make methods in Index[Merger,Maker,IO] non-static so that they can have appropriate ObjectMapper injected instead of creating one statically	2015-11-02 23:24:26 -06:00
Gian Merlino	16ae8866b8	Log and continue on failure to schedule cleanup for missing workers at startup.	2015-10-28 08:10:54 -07:00
Gian Merlino	513bc76252	RTR: Ensure that there is only one cleanup task scheduled for a worker at once. This is accomplished by making sure that scheduleTasksCleanupForWorker is only called from the PathChildrenCache event thread, having it cancel existing cleanup tasks when it adds a new one, and having tasks check on finish that the thing they are removing from the task list is actually themselves.	2015-10-27 21:16:58 -07:00
Fangjin Yang	ea2267e08c	Merge pull request #1868 from gianm/fix-announcements Historical and MiddleManager server announcements should not remove parents.	2015-10-27 14:50:05 -07:00
Gian Merlino	7df7370935	Merge pull request #1862 from metamx/indexingServiceMMGone Add timeout to shutdown request to middle manager for indexing service	2015-10-27 14:38:01 -07:00
Charles Allen	44a2b204df	Add timeout to shutdown request to middle manager for indexing service	2015-10-27 13:56:03 -07:00
Gian Merlino	4b92752deb	Historical and MiddleManager server announcements should not remove parents. Removing parent paths causes watchers of the "announcements" path to get stuck and stop seeing new updates.	2015-10-27 08:06:11 -07:00
Bingkun Guo	4914925d65	New extension loading mechanism 1) Remove maven client from downloading extensions at runtime. 2) Provide a way to load Druid extensions and hadoop dependencies through file system. 3) Refactor pull-deps so that it can download extensions into extension directories. 4) Add documents on how to use this new extension loading mechanism. 5) Change the way how Druid tarball is generated. Now all the extensions + hadoop-client 2.3.0 are packaged within the Druid tarball.	2015-10-21 14:22:36 -05:00
Charles Allen	532e1c9fd5	Do not pass `druid.indexer.runner.javaOpts` to Peon as a property * Still places `druid.indexer.runner.javaOpts` on the command line, but the Peon no longer tries to have the property `druid.indexer.runner.javaOpts` set * Fixes https://github.com/druid-io/druid/issues/1841	2015-10-20 09:24:01 -07:00
Charles Allen	bf11723a52	Update usages of io.druid.client.selector.Server to build URL or URI directly instead of using String.format	2015-10-12 12:30:56 -07:00
Charles Allen	2d847ad654	Merge pull request #1730 from metamx/union-queries-fix fix #1727 - Union bySegment queries fix	2015-09-29 12:23:25 -07:00
Nishant	573aa96bd6	fix #1727 - Union bySegment queries fix Fixes #1727. revert to doing merging for results for union queries on broker. revert unrelated changes Add test for union query runner Add test remove unused imports fix imports fix renamed file fix test update docs.	2015-09-29 23:32:36 +05:30
Charles Allen	d2e400f063	Merge pull request #1740 from metamx/validate-locks fix #1715	2015-09-29 09:38:42 -07:00
Xavier Léauté	25bbc0b923	Merge pull request #1778 from gianm/redirect-fixes Redirect fixes	2015-09-25 09:54:48 -07:00
Gian Merlino	348172203f	OverlordRedirectInfo: Fix ability to detect that there is no leader.	2015-09-25 09:30:09 -07:00
Parag Jain	b630720164	fail task if finishjob throws any exception add realtime task failure test	2015-09-25 10:55:45 -05:00
Fangjin Yang	aa9d90355e	Merge pull request #1772 from gianm/fix-overlord-startup RemoteTaskRunner: Fix for starting an overlord before any workers ever existed.	2015-09-24 21:55:03 -07:00
Gian Merlino	63bf021077	RemoteTaskRunner: Fix for starting an overlord before any workers ever existed.	2015-09-24 21:15:36 -07:00
Himanshu Gupta	6e550d5346	update doc about aggregation field in merge task and a null check	2015-09-24 22:25:07 -05:00
Nishant	b638400acb	fix #1715 fixes #1715 - TaskLockBox has a set of active tasks - lock requests throws exception for if they are from a task not in active task set. - TaskQueue is responsible for updating the active task set on tasklockbox review comment remove duplicate line use ISE instead organise imports	2015-09-24 10:06:50 +05:30
Himanshu	61b0743943	Merge pull request #1748 from metamx/forkingJavaOptionsWithQuotes Allow ForkingTaskRunner javaOpts to have quoted arguments which contain spaces	2015-09-21 21:03:00 -05:00
Charles Allen	465035e531	Allow ForkingTaskRunner javaOpts to have quoted arguments which contain spaces	2015-09-21 17:32:27 -07:00
Fangjin Yang	e48f6dd660	Merge pull request #1736 from gianm/additional-ingest-segment-timeline-test IngestSegmentFirehostFactoryTimelineTest for overshadowing of the middle of a segment.	2015-09-17 14:42:29 -07:00
Gian Merlino	64e33b2bcb	IngestSegmentFirehostFactoryTimelineTest for overshadowing of the middle of a segment.	2015-09-16 10:17:43 -07:00
Himanshu Gupta	74f4572bd4	Lazily deserialize "parser" to InputRowParser in DataSchema so that user hadoop related InputRowParsers are created only when needed this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser and not fail because hadoopy InputRowParser might need hadoop libraries	2015-09-16 10:58:13 -05:00
Charles Allen	f5ed6e885c	Merge pull request #1702 from himanshug/double_datasource_in_storage_dir do not have dataSource twice in path to segment storage on hdfs	2015-09-15 14:00:35 -07:00
Nishant	4681ff22ed	add task duration in response for completed tasks	2015-09-10 13:51:50 +05:30
Himanshu Gupta	fe0233adf2	removing unused imports from HadoopIndexTask	2015-09-09 11:12:01 -05:00
Nishant	47aac991ec	add null check for task context. make variable final	2015-09-04 22:19:01 +05:30
Fangjin Yang	75a582974b	Merge pull request #1639 from gianm/new-plumber New plumber	2015-09-03 18:52:57 -07:00
Gian Merlino	062a47fba4	Modify Plumbers in these ways, 1) Persist using Committer instead of Runnable. (Although the metadata object is ignored in this patch) 2) Remove the getSink method. 3) Plumbers are now responsible for time-based and hydrant-full-based periodic committing. (FireChief, RealtimeIndexTask, and IndexTask used to do this)	2015-09-03 11:13:06 -07:00
Nishant	726326abc3	Add Task Context and ability to override task specific properties override javaOpts fix compilation review comments Add Test for typecast review comments - remove unused method.	2015-09-03 23:36:32 +05:30
Gian Merlino	940e1aa3eb	Replace funky imports with standard ones. 1) Lots of Guava imports were not coming from the actual Guava 2) junit.framework.Assert should be org.junit.Assert	2015-08-28 18:02:05 -07:00
Gian Merlino	414a6fb477	Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat. Fixes #1678. IngestSegmentFirehose (and its users) need to remember which windows of which segments should actually be read, based on a timeline.	2015-08-28 07:32:41 -07:00
Himanshu Gupta	2e0dd1d792	adding UTs and addressing review comments to firehoseV2 addition to Realtime[Manager|Plumber], essential segment metadata persist support, kafka-simple-consumer-firehose extension patch	2015-08-27 20:50:46 -05:00
lvjq	2237a8cf0f	kafka 8 simple consumer firehose	2015-08-27 20:50:46 -05:00
Nishant	b306739e9c	fix convert segment task 1) fix serde 2) fix wrong parameter being passed when creating subtask remove sysout	2015-08-27 11:34:41 +05:30
Charles Allen	e38cf54bc8	Migrate TestDerbyConnector to a JUnit @Rule	2015-08-26 21:47:40 -07:00
Xavier Léauté	fdb6a6651b	Merge pull request #1669 from metamx/upgrade-dependencies Upgrade dependencies	2015-08-25 21:30:22 -07:00
Xavier Léauté	5c19ffa98c	Merge pull request #1663 from gianm/segment-insert-constraints TaskActionToolbox: Remove allowOlderVersions, lift interval constraint	2015-08-25 18:11:46 -07:00
Xavier Léauté	51f6a9a2c9	update jackson to 2.6.1	2015-08-25 16:07:01 -07:00
Gian Merlino	33681525e3	TaskActionToolbox: Remove allowOlderVersions switch, lift interval constraint. allowOlderVersions has been stuck true for a while due to a bug (introduced in `566a3a61`), but I think it's actually OK this way. I think it's reasonable to expect tasks to choose versions in some way that makes sense, so long as they don't choose one larger than their taskLock version. This is still verified. The interval constraint was introduced to force tasks to break up their segment insert lists into manageable chunks. They are already doing this, and I think it's reasonable to expect them to do so without enforcement. Lifting these constraints paves the way for transactional insertion of segments that have varying versions and may be for varying intervals.	2015-08-25 14:17:38 -07:00
Paul Otto	2301b60365	Add ability to provide taskResource for IndexTask.	2015-08-24 17:38:31 -07:00
Himanshu Gupta	15fa43dd43	changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up	2015-08-16 14:07:35 -05:00
Himanshu Gupta	4d4aa8bfc6	refactor IngestSegmentFirehoseFactory so that IngestSegmentFirehose becomes reusable Conflicts: indexing-service/src/main/java/io/druid/indexing/firehose/IngestSegmentFirehoseFactory.java	2015-08-14 14:44:22 -05:00
Gian Merlino	bc0c7dd65d	Avoid the Hadoop objectMapper in the local IndexTask. Fixes #1545.	2015-08-11 10:40:53 -07:00
Charles Allen	1ddaa3fb33	Merge pull request #1592 from metamx/clean-test-files clean temporary files	2015-08-03 11:47:20 -07:00
Nishant	2679efee7a	clean temporary files	2015-08-03 23:32:58 +05:30
Fangjin Yang	6f65e6d3ef	Merge pull request #1547 from pjain1/improve_overlord_test add test to OverlordResourceTest	2015-07-28 07:35:48 -10:00
Parag Jain	2e1b617346	add more tests	2015-07-24 15:12:08 -05:00
Fangjin Yang	97242356b4	Merge pull request #1480 from guobingkun/kill_task_test Unit tests for KillTask and MetadataTaskStorage	2015-07-20 16:31:45 -07:00
Fangjin Yang	3f7ba58227	Merge pull request #1504 from metamx/fix-1447 fix for #1447	2015-07-14 08:50:08 -07:00
Himanshu	e2ddfb7a1a	Merge pull request #1511 from pjain1/remove_test remove flaky overlord test	2015-07-13 18:38:34 -05:00
Parag Jain	59dec89f6a	remove flaky overlord test	2015-07-13 15:32:12 -05:00
Himanshu	725086cc89	Merge pull request #1506 from gianm/realtime-plumber-nulls Consider null inputRows and parse errors as unparseable during realtime ingestion.	2015-07-13 10:12:12 -05:00
Gian Merlino	9068bcd062	Consider null inputRows and parse errors as unparseable during realtime ingestion. Also, harmonize exception handling between the RealtimeIndexTask and the RealtimeManager. Conditions other than null inputRows and parse errors bubble up in both.	2015-07-11 20:40:03 -07:00
Himanshu	cac722968e	Merge pull request #1503 from metamx/fix-leaking-zk-nodes Fix leaking Status Path nodes in ZK	2015-07-10 17:40:18 -05:00
Fangjin Yang	9f19e96658	Merge pull request #1477 from pjain1/overlord_test overlord and task master test	2015-07-10 14:27:14 -07:00
Parag Jain	55c4fe64f3	overlord and task master test	2015-07-10 16:17:45 -05:00
Nishant	5fe27fe4ad	fix for #1447 fixes #1447	2015-07-09 19:05:48 +05:30
Nishant	8d7a566bae	Fix leaking Status Path nodes in ZK - remove ZK status path nodes for workers after they are removed	2015-07-09 17:20:09 +05:30
Charles Allen	c0b60c0d2f	I'm not your mom, indexing-service/test... cleanup after yourself	2015-07-01 15:00:09 -07:00
Bingkun Guo	282a0f9760	Unit tests for KillTask and MetadataTaskStorage	2015-06-29 17:55:41 -05:00
Himanshu	b5b9ca1446	Merge pull request #1470 from pjain1/rtindex_test Realtime Index Task test	2015-06-29 16:51:35 -05:00
Parag Jain	284b80b09e	Realtime Index Task test	2015-06-29 09:52:41 -05:00
nishant	fb4052d577	JavaScript Worker Select Strategy this PR adds a JavaScriptWorkerSelectStrategy which allows defining arbitrary logic for selecting workers to run task using a JavaScript function. This gives users full control to implement complex worker selection strategies based on task attributes. more tests and a complex javascript config fix for java8 modify for nashorn compatibility	2015-06-20 02:01:34 +05:30
Charles Allen	acc0a3fbf7	Add jitter to the retries for RemoteTaskActionClient	2015-06-12 17:43:25 -07:00
nishant	e9afec4a2b	fix task status issues on zk outages docs review comments fix test review comments Review comments fix compilation fix typo	2015-06-11 00:49:52 +05:30
Xavier Léauté	78d468700b	Merge pull request #1388 from metamx/fix-1360 fix race described in 1360	2015-06-10 11:59:36 -07:00
Xavier Léauté	f6b336ac3e	Merge pull request #1432 from metamx/config-fix fix passing of config from IndexTuningConfig to RealtimeTuningConfig	2015-06-10 11:42:58 -07:00
nishant	963682d696	Add check for valid rowFlushBoundary configuration and fix tests	2015-06-10 21:38:34 +05:30
nishant	191b302f6a	fix passing of config from IndexTuningConfig to RealtimeTuningConfig - pass rowFlushboundary correctly instead of using default. - fixes indexTask failing with io.druid.segment.incremental.IndexSizeExceededException when rowFlushboundary is set higher than RealtimeTuningConfig.defaultMaxRowsInMemory rename test method	2015-06-10 21:07:25 +05:30
nishant	af9ea08041	fix race described in 1360 review comments review comments review comments no need to remove fix test review comments	2015-06-10 12:19:12 +05:30
Charles Allen	056cab93ed	Add Hadoop Converter Job and task * Fixes https://github.com/druid-io/druid/issues/1363 * Add extra utils in JobHelper based on PR feedback	2015-06-09 14:47:38 -07:00
Charles Allen	ef9b67cce3	Merge pull request #1422 from metamx/fix-ec2-public-ip fix public IP not working in EC2 autoscaling	2015-06-03 16:30:51 -07:00
Xavier Léauté	4ebdfea76f	fix public IP not working in EC2 autoscaling	2015-06-03 16:05:59 -07:00

1056 Commits