druid

Commit Graph

Author	SHA1	Message	Date
Jihoon Son	a2e6de4b16	Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924 ) * Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor * Fix docs and javadoc * Add unit tests for large or small estimated num splits * add override	2019-11-23 01:38:08 -08:00
Gian Merlino	e0eb85ace7	Add FileUtils.createTempDir() and enforce its usage. (#8932 ) * Add FileUtils.createTempDir() and enforce its usage. The purpose of this is to improve error messages. Previously, the error message on a nonexistent or unwritable temp directory would be "Failed to create directory within 10,000 attempts". * Further updates. * Another update. * Remove commons-io from benchmark. * Fix tests.	2019-11-22 19:48:49 -08:00
Rye	0514e5686e	add TsvInputFormat (#8915 ) * add TsvInputFormat * refactor code * fix grammar * use enum replace string literal * code refactor * code refactor * mark abstract for base class meant not to be instantiated * remove constructor for test	2019-11-22 18:01:40 -08:00
Chi Cao Minh	ff6217365b	Refactor parallel indexing perfect rollup partitioning (#8852 ) * Refactor parallel indexing perfect rollup partitioning Refactoring to make it easier to later add range partitioning for perfect rollup parallel indexing. This is accomplished by adding several new base classes (e.g., PerfectRollupWorkerTask) and new classes for encapsulating logic that needs to be changed for different partitioning strategies (e.g., IndexTaskInputRowIteratorBuilder). The code is functionally equivalent to before except for the following small behavior changes: 1) PartialSegmentMergeTask: Previously, this task had a priority of DEFAULT_TASK_PRIORITY. It now has a priority of DEFAULT_BATCH_INDEX_TASK_PRIORITY (via the new PerfectRollupWorkerTask base class), since it is a batch index task. 2) ParallelIndexPhaseRunner: A decorator was added to subTaskSpecIterator to ensure the subtasks are generated with unique ids. Previously, only tests (i.e., MultiPhaseParallelIndexingTest) would have this decorator, but this behavior is desired for non-test code as well. * Fix forbidden apis and pmd warnings * Fix analyze dependencies warnings * Fix IndexTask json and add IT diags * Fix parallel index supervisor<->worker serde * Fix TeamCity inspection errors/warnings * Fix TeamCity inspection errors/warnings again * Integrate changes with those from #8823 * Address review comments * Address more review comments * Fix forbidden apis * Address more review comments	2019-11-20 17:24:12 -08:00
Jihoon Son	ac6d703814	Support inputFormat and inputSource for sampler (#8901 ) * Support inputFormat and inputSource for sampler * Cleanup javadocs and names * fix style * fix timed shutoff input source reader * fix timed shutoff input source reader again * tidy up timed shutoff reader * unused imports * fix tc	2019-11-20 14:51:25 -08:00
Surekha	d628bebbd7	Make supervisor API similar to submit task API (#8810 ) * accept spec or dataSchema, tuningConfig, ioConfig while submitting task json * fix test * update docs * lgtm warning * Add original constructor back to IndexTask to minimize changes * fix indentation in docs * Allow spec to be specified in supervisor schema * undo IndexTask spec changes * update docs * Add Nullable and deprecated annotations * remove deprecated configs from SeekableStreamSupervisorSpec * remove nullable annotation	2019-11-20 10:04:41 -08:00
Clint Wylie	3fcaa1a61b	fix sql compatible null handling config work with runtime.properties (#8876 ) * fix sql compatible null handling config work with runtime.properties * fix npe * fix tests * add friendly error * comment, and friendlier still * fix compile * fix from merges	2019-11-20 03:55:29 -08:00
Atul Mohan	f5fbd0bea0	Handle missing values for delimited text files when Nullhandling is enabled (#8779 ) * Handle missing values * Fix multi value tests * Fix firehose tests * Fix conflicts	2019-11-19 22:35:22 -08:00
Jihoon Son	baefc65f80	Retrying with a backward compatible task type on unknown task type error in parallel indexing (#8905 ) * Retrying with a backward compatible task type on unknown task type error in parallel indexing * Register legacy class; add a serde test	2019-11-19 19:29:25 -08:00
Rye	d0913475b7	sampler returns nulls in CSV (#8871 ) * sampler returns nulls in CSV * fixed kafka sampler test * fix Kinesis test * sql compatibility fix * remove null to empty string conversion, use null * fix sql compatibility	2019-11-19 13:59:44 -08:00
Gian Merlino	c44452f0c1	Tidy up lifecycle, query, and ingestion logging. (#8889 ) * Tidy up lifecycle, query, and ingestion logging. The goal of this patch is to improve the clarity and usefulness of Druid's logging for cluster operators. For more information, see https://twitter.com/cowtowncoder/status/1195469299814555648. Concretely, this patch does the following: - Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the goal of reducing redundancy and improving clarity by avoiding showing rarely-useful log messages. This includes most "starting" and "stopping" messages, and most messages related to individual columns. - Adds new log4j2 templates that show operators how to enabled DEBUG logging for certain important packages. - Eliminate stack traces for query errors, unless log level is DEBUG or more. This is useful because query errors often indicate user error rather than system error, but dumping stack trace often gave operators the impression that there was a system failure. - Adds task id to Appenderator, AppenderatorDriver thread names. In the default log4j2 configuration, this will put them in log lines as well. It's very useful if a user is using the Indexer, where multiple tasks run in the same JVM. - More consistent terminology when it comes to "sequences" (sets of segments that are handed-off together by Kafka ingestion) and "offsets" (cursors in partitions). These terms had been confused in some log messages due to the fact that Kinesis calls offsets "sequence numbers". - Replaces some ugly toString calls with either the JSONification or something more operator-accessible (like a URL or segment identifier, instead of JSON object representing the same). * Adjustments. * Adjust integration test.	2019-11-19 13:57:58 -08:00
Surekha	cf6643eb9a	add sequenceName and currentCheckPoint for backwards compatibility (#8864 ) * add sequenceName and currentCheckPoint for backwards compatibility * Add serde unit test in kafka * fix checkstyle * add hashcode * update javadoc	2019-11-19 13:11:31 -08:00
Chi Cao Minh	8365bdf62a	Address security vulnerabilities (#8878 ) * Address security vulnerabilities Security vulnerabilities addressed by upgrading 3rd party libs: - Upgrade avro-ipc to 1.9.1 - sonatype-2019-0115 - Upgrade caffeine to 2.8.0 - sonatype-2019-0282 - Upgrade commons-beanutils to 1.9.4 - CVE-2014-0114 - Upgrade commons-codec to 1.13 - sonatype-2012-0050 - Upgrade commons-compress to 1.19 - CVE-2019-12402 - sonatype-2018-0293 - Upgrade hadoop-common to 2.8.5 - CVE-2018-11767 - Upgrade hadoop-mapreduce-client-core to 2.8.5 - CVE-2017-3166 - Upgrade hibernate-validator to 5.2.5 - CVE-2017-7536 - Upgrade httpclient to 4.5.10 - sonatype-2017-0359 - Upgrade icu4j to 55.1 - CVE-2014-8147 - Upgrade jackson-databind to 2.6.7.3: - CVE-2017-7525 - Upgrade jetty-http to 9.4.12: - CVE-2017-7657 - CVE-2017-7658 - CVE-2017-7656 - CVE-2018-12545 - Upgrade log4j-core to 2.8.2 - CVE-2017-5645: - Upgrade netty to 3.10.6 - CVE-2015-2156 - Upgrade netty-common to 4.1.42 - CVE-2019-9518 - Upgrade netty-codec-http to 4.1.42 - CVE-2019-16869 - Upgrade nimbus-jose-jwt to 4.41.1 - CVE-2017-12972 - CVE-2017-12974 - Upgrade plexus-utils to 3.0.24 - CVE-2017-1000487 - sonatype-2015-0173 - sonatype-2016-0398 - Upgrade postgresql to 42.2.8 - CVE-2018-10936 Note that if users are using JDBC lookups with postgres, they may need to update the JDBC jar used by the lookup extension. * Fix license for postgresql	2019-11-19 09:14:33 -08:00
Atul Mohan	8515a03c6b	Modify batch index task naming to accomodate simultaneous tasks (#8612 ) * Change hadoop task naming * Remove unused * Add timestamp * Fix build	2019-11-18 15:07:16 -08:00
Rye	ea8e4066f6	Use earliest offset on kafka newly discovered partitions (#8748 ) * Use earliest offset on kafka newly discovered partitions * resolve conflicts * remove redundant check cases * simplified unit tests * change test case * rewrite comments * add regression test * add junit ignore annotation * minor modifications * indent * override testableKafkaSupervisor and KafkaRecordSupplier to make the test runable * modified test constructor of kafkaRecordSupplier * simplify * delegated constructor	2019-11-18 11:05:31 -08:00
Jihoon Son	1611792855	Add InputSource and InputFormat interfaces (#8823 ) * Add InputSource and InputFormat interfaces * revert orc dependency * fix dimension exclusions and failing unit tests * fix tests * fix test * fix test * fix firehose and inputSource for parallel indexing task * fix tc * fix tc: remove unused method * Formattable * add needsFormat(); renamed to ObjectSource; pass metricsName for reader * address comments * fix closing resource * fix checkstyle * fix tests * remove verify from csv * Revert "remove verify from csv" This reverts commit `1ea7758489`. * address comments * fix import order and javadoc * flatMap * sampleLine * Add IntermediateRowParsingReader * Address comments * move csv reader test * remove test for verify * adjust comments * Fix InputEntityIteratingReader * rename source -> entity * address comments	2019-11-15 09:22:09 -08:00
Clint Wylie	cc54b2a9df	support for array expressions in TransformSpec with ExpressionTransform (#8744 ) * transformSpec + array expressions changes: * added array expression support to transformSpec * removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning * hijacked index task test to test changes * remove docs about being unsupported * re-arrange test assert * unused imports * imports * fix tests * preserve types * suppress warning, fixes, add test * formatting * cleanup * better list to array type conversion and tests * fix oops	2019-11-13 11:04:37 -08:00
chencb	cc2bdb5f51	Fix hadoop task jdk11 compatible (#8799 ) * Fix hadoop task jdk11 compatible * Fix HadoopTaskTest	2019-11-13 02:32:46 -08:00
Roman Leventov	5c0fc0a13a	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 ) * IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals() * Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever * Fix tests * More fixes * Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code * Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase * More test fixes * More test fixes * Add a comment to VersionedIntervalTimelineTestBase * Fix tests * Set DataSegment.size(0) in more tests * Specify DataSegment.size(0) in more places in tests * Fix more tests * Fix DruidSchemaTest * Set DataSegment's size in more tests and benchmarks * Fix HdfsDataSegmentPusherTest * Doc changes addressing comments * Extended doc for visibility * Typo * Typo 2 * Address comment	2019-11-06 11:07:04 -08:00
Giuseppe Martino	9c171e2b1f	Message rejection absolute date (#8656 ) * Add option lateMessageRejectionStartDate * Use option lateMessageRejectionStartDate * Fix tests * Add lateMessageRejectionStartDate to kafka indexing service * Update tests kafka indexing service * Fix tests for KafkaSupervisorTest * Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig * Fix var name * Update documentation * Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.	2019-10-31 15:13:02 -07:00
Jihoon Son	2363b61983	Asynchronous file copy in the shuffle of parallel indexing task (#8783 )	2019-10-30 18:00:05 -07:00
Xiaobao	b9d10473a5	fix typo (#8745 )	2019-10-25 19:21:56 +08:00
Jihoon Son	094936ca03	Remove commit() method Firehose (#8688 ) * Remove commit() method Firehose * fix javadoc	2019-10-23 16:52:02 -07:00
Jihoon Son	2518478b20	Remove deprecated parameter for Checkpoint request (#8707 ) * Remove deprecated parameter for Checkpoint request * fix wrong doc	2019-10-23 16:51:16 -07:00
Surekha	98f59ddd7e	Add `sys.supervisors` table to system tables (#8547 ) * Add supervisors table to SystemSchema * Add docs * fix checkstyle * fix test * fix CI * Add comments * Fix javadoc teamcity error * comments * fix links in docs * fix links * rename fullStatus query param to system and remove it from docs	2019-10-18 15:16:42 -07:00
Jihoon Son	30c15900be	Auto compaction based on parallel indexing (#8570 ) * Auto compaction based on parallel indexing * javadoc and doc * typo * update spell * addressing comments * address comments * fix log * fix build * fix test * increase default max input segment bytes per task * fix test	2019-10-18 13:24:14 -07:00
Mingming Qiu	2c758ef5ff	Support assign tasks to run on different categories of MiddleManagers (#7066 ) * Support assign tasks to run on different tiers of MiddleManagers * address comments * address comments * rename tier to category and docs * doc * fix doc * fix spelling errors * docs	2019-10-17 12:57:19 -07:00
Jonathan Wei	89ce6384f5	More Kinesis resharding adjustments (#8671 ) * More Kinesis resharding adjustments * Fix TC inspection * Fix comment' * Adjust comment, small refactor * Make repartition transition time configurable * Add spellcheck exclusion * Spelling fix	2019-10-15 23:19:17 -07:00
Jihoon Son	4046c86d62	Stateful auto compaction (#8573 ) * Stateful auto compaction * javaodc * add removed test back * fix test * adding indexSpec to compactionState * fix build * add lastCompactionState * address comments * extract CompactionState * fix doc * fix build and test * Add a task context to store compaction state; add javadoc * fix it test	2019-10-15 22:57:42 -07:00
Jonathan Wei	0c387c1d47	Fix Kinesis resharding issues (#8644 ) * Fix Kinesis resharding issues * PR comments * Adjust metadata error message * Remove unused method * Use sha1 for shard id hashing * Add metadata sanity check, add comment * Only use shard ID hashing for group mapping * Style fix * Fix unused import * update comment * Fix teamcity inspection	2019-10-10 00:16:44 -07:00
Jihoon Son	96d8523ecb	Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571 ) * IOConfig for compaction task * add javadoc, doc, unit test * fix webconsole test * add spelling * address comments * fix build and test * address comments	2019-10-09 11:12:00 -07:00
Chi Cao Minh	b6b5517c20	Speed up ParallelIndexSupervisorTask tests (#8633 ) Previously, some tests for ParallelIndexSupervisorTask were being run twice unnecessarily.	2019-10-08 19:56:12 -07:00
Himanshu	d91d1c8699	make TaskMonitor continue to monitor in the face of transient errors (#8625 )	2019-10-04 09:42:20 -07:00
Fokko Driesprong	82bfe86d0c	Make more package EverythingIsNonnullByDefault by default (#8198 ) * Make more package EverythingIsNonnullByDefault by default * Fixed additional voilations after pulling in master * Change iterator to list.addAll * Fix annotations	2019-09-30 18:53:18 -06:00
elloooooo	7f2b6577ef	get active task by datasource when supervisor discover tasks (#8450 ) * get active task by datasource when supervisor discover tasks * fix ut * fix ut * fix ut * remove unnecessary condition check * fix ut * remove stream in hot loop	2019-09-26 16:15:24 -07:00
Rye	f2a444321b	Added live reports for Kafka and Native batch task (#8557 ) * Added live reports for Kafka and Native batch task * Removed unused local variables * Added the missing unit test * Refine unit test logic, add implementation for HttpRemoteTaskRunner * checksytle fixes * Update doc descriptions for updated API * remove unnecessary files * Fix spellcheck complaints * More details for api descriptions	2019-09-23 21:08:36 -07:00
Chi Cao Minh	aeac0d4fd3	Adjust defaults for hashed partitioning (#8565 ) * Adjust defaults for hashed partitioning If neither the partition size nor the number of shards are specified, default to partitions of 5,000,000 rows (similar to the behavior of dynamic partitions). Previously, both could be null and cause incorrect behavior. Specifying both a partition size and a number of shards now results in an error instead of ignoring the partition size in favor of using the number of shards. This is a behavior change that makes it more apparent to the user that only one of the two properties will be honored (previously, a message was just logged when the specified partition size was ignored). * Fix test * Handle -1 as null * Add -1 as null tests for single dim partitioning * Simplify logic to handle -1 as null * Address review comments	2019-09-21 20:57:40 -07:00
Himanshu	62afbca7b9	update HRTR to account for task known to be running on a worker when it shows up (#8427 )	2019-09-19 10:19:17 -07:00
Clint Wylie	b00dd84fa2	clarify error messaging for parallel indexing task when when missing numShards or intervals (#8513 )	2019-09-11 20:47:27 -07:00
Chi Cao Minh	5f61374cb3	Fix dependency analyze warnings (#8230 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports and updated druid-forbidden-apis to prevent regressions. * Address review comments * Adjust scope for org.glassfish.jaxb:jaxb-runtime * Fix dependencies for hdfs-storage * Consolidate netty4 versions	2019-09-09 14:37:21 -07:00
Clint Wylie	c73a489335	bump master version to 0.17.0-incubating-SNAPSHOT (#8421 )	2019-08-28 01:58:36 -07:00
Jihoon Son	e5ef5ddafa	Fix the shuffle with TLS enabled for parallel indexing; add an integration test; improve unit tests (#8350 ) * Fix shuffle with tls enabled; add an integration test; improve unit tests * remove debug log * fix tests * unused import * add javadoc * rename to getContent	2019-08-26 19:27:41 -07:00
Dylan Wylie	b2821a8371	do not exclude client core jar (#8339 ) make indexing service depend on hadoop client	2019-08-26 13:48:24 -07:00
Xavier Léauté	8e0c307e54	Do not assume system classloader is URLClassLoader in Java 9+ (#8392 ) * Fallback to parsing classpath for hadoop task in Java 9+ In Java 9 and above we cannot assume that the system classloader is an instance of URLClassLoader. This change adds a fallback method to parse the system classpath in that case, and adds a unit test to validate it matches what JDK8 would do. Note: This has not been tested in an actual hadoop setup, so this is mostly to help us pass unit tests. * Remove granularity test of dubious value One of our granularity tests relies on system classloader being a URLClassLoaders to catch a bug related to class initialization and static initializers using a subclass (see #2979) This test was added to catch a potential regression, but it assumes we would add back the same type of static initializers to this specific class, so it seems to be of dubious value as a unit test and mostly serves to illustrate the bug. relates to #5589	2019-08-24 20:47:54 -04:00
Jihoon Son	95fa609615	Fix wrong partitionsSpec type names in the document (#8297 ) * Fix wrong type names for partitionsSpec * add unit tests; add json properties for backward compatibility * beautify conf names * remove maxRowsPerSegment from hashed partitionsSpec * fix doc build	2019-08-23 13:44:58 -07:00
SandishKumarHN	33f0753a70	Add Checkstyle for constant name static final (#8060 ) * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * merging with upstream * review-1 * unknow changes * unknow changes * review-2 * merging with master * review-2 1 changes * review changes-2 2 * bug fix	2019-08-23 13:13:54 +03:00
Surekha	cf2a2dd917	Add group_id to the sys.tasks table (#8304 ) * Add group_id to overlord tasks API and sys.tasks table * adjust test * modify docs * Make groupId nullable * fix integration test * fix toString * Remove groupId from TaskInfo * Modify docs and tests * modify TaskMonitorTest	2019-08-22 15:28:23 -07:00
Jihoon Son	fba92ae469	Fix to always use end sequenceNumber for reset (#8305 ) * Fix to always use end sequenceNumber for reset * fix checkstyle * fix style and add log	2019-08-22 16:51:25 -05:00
Jonathan Wei	96e2142ea3	Cleanup appenderators and segment walkers in UnifiedIndexerAppenderatorsManager (#8287 ) * Cleanup Appenderators in UnifiedIndexerAppenderatorsManager * PR comments * More PR comments * Fix test	2019-08-22 12:18:46 -07:00
Clint Wylie	b95607d31c	remove YeOldePlumberSchool.java, unused (#8347 )	2019-08-21 18:15:51 -07:00

1 2 3 4 5 ...

1669 Commits