druid

Commit Graph

Author	SHA1	Message	Date
Jihoon Son	d644a27f1a	Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025 ) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle * Set core partition set size for hash-partitioned segments properly in batch ingestion * test for both parallel and single-threaded task * unused variables * fix test * unused imports * add hash/range buckets * some test adjustment and missing json serde * centralized partition id allocation in parallel and simple tasks * remove string partition chunk * revive string partition chunk * fill numCorePartitions for hadoop * clean up hash stuffs * resolved todos * javadocs * Fix tests * add more tests * doc * unused imports	2020-06-18 18:40:43 -07:00
Aleksey Plekhanov	2c384b61ff	IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" (#9690 ) IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" Reverted checkstyle rule * Added tests to pass CI * Codestyle	2020-06-18 09:47:07 -07:00
mcbrewster	28be107a1c	add flag to flattenSpec to keep null columns (#9814 ) * add flag to flattenSpec to keep null columns * remove changes to inputFormat interface * add comment * change comment message * update web console e2e test * move keepNullColmns to JSONParseSpec * fix merge conflicts * fix tests * set keepNullColumns to false by default * fix lgtm * change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns * Add equals verifier tests	2020-05-08 21:53:39 -07:00
Suneet Saldanha	1ced3b33fb	IntelliJ inspections cleanup (#9339 ) * IntelliJ inspections cleanup * Standard Charset object can be used * Redundant Collection.addAll() call * String literal concatenation missing whitespace * Statement with empty body * Redundant Collection operation * StringBuilder can be replaced with String * Type parameter hides visible type * fix warnings in test code * more test fixes * remove string concatenation inspection error * fix extra curly brace * cleanup AzureTestUtils * fix charsets for RangerAdminClient * review comments	2020-04-10 10:04:40 -07:00
Jihoon Son	0da8ffc3ff	Bump up development version to 0.19.0-SNAPSHOT (#9586 )	2020-03-30 16:24:04 -07:00
Maytas Monsereenusorn	e97695d9da	fix Hadoop ingestion fails due to error 'JavaScript is disabled' on certain config (#9553 ) * fix Hadoop ingestion fails due to error 'JavaScript is disabled', if determine partition hadoop job is run * add test * fix checkstyle * address comments * address comments	2020-03-23 23:09:21 -07:00
Gian Merlino	c6c2282b59	Harmonization and bug-fixing for selector and filter behavior on unknown types. (#9484 ) * Harmonization and bug-fixing for selector and filter behavior on unknown types. - Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory system, and set defaultType COMPLEX so unknown types can be dynamically matched. - Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing. - Switch various methods to use convertObjectToX when casting to numbers, rather than ad-hoc and inconsistent logic. - Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return true even for 0- or 1- element arrays. - Adjust various javadocs. * Add throwParseExceptions option to Rows.objectToNumber, switch back to that. * Update tests. * Adjust moment sketch tests.	2020-03-10 07:15:57 -07:00
Clint Wylie	831ec172f1	Logging large segment list handling (#9312 ) * better handling of large segment lists in logs * more * adjust * exceptions * fixes * refactor * debug * heh * dang	2020-02-07 21:42:45 -08:00
Jihoon Son	e81230f9ab	Refactoring some codes around ingestion (#9274 ) * Refactoring codes around ingestion: - Parallel index task and simple task now use the same segment allocator implementation. This is reusable for the future implementation as well. - Added PartitionAnalysis to store the analysis of the partitioning - Move some util methods to SegmentLockHelper and rename it to TaskLockHelper * fix build * fix SingleDimensionShardSpecFactory * optimize SingledimensionShardSpecFactory * fix test * shard spec builder * import order * shardSpecBuilder -> partialShardSpec * build -> complete * fix comment; add unit tests for partitionBoundaries * add more tests and fix javadoc * fix toString(); add serde tests for HashBasedNumberedPartialShardSpec and SegmentAllocateAction * fix test * add equality test for hash and range partial shard specs	2020-02-07 16:23:07 -08:00
Suneet Saldanha	303b02eba1	intelliJ inspections cleanup (#9260 ) * intelliJ inspections cleanup - remove redundant escapes - performance warnings - access static member via instance reference - static method declared final - inner class may be static Most of these changes are aesthetic, however, they will allow inspections to be enabled as part of CI checks going forward The valuable changes in this delta are: - using StringBuilder instead of string addition in a loop indexing-hadoop/.../Utils.java processing/.../ByteBufferMinMaxOffsetHeap.java - Use class variables instead of static variables for parameterized test processing/src/.../ScanQueryLimitRowIteratorTest.java * Add intelliJ inspection warnings as errors to druid profile * one more static inner class	2020-01-29 11:50:52 -08:00
Roman Leventov	b9186f8f9f	Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306 ) * Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error * Fix brace * Import order * Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill * Fix tests * Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY * More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters * Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig * More variable and method renames * Rename MetadataSegments to SegmentsMetadata * Javadoc update * Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs * Update Javadoc of VersionedIntervalTimeline.iterateAllObjects() * Reorder imports * Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers * Complete merge * Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests * Remove MetadataSegmentManager * Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments * Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder * Fix inspections * Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest * Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods * Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator * Unused import * Optimize imports * Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata() * Unused import * Update terminology in datasource-view.tsx * Fix label in datasource-view.spec.tsx.snap * Fix lint errors in datasource-view.tsx * Doc improvements * Another attempt to please TSLint * Another attempt to please TSLint * Style fixes * Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge) * Try to fix docs build issue * Javadoc and spelling fixes * Rename SegmentsMetadata to SegmentsMetadataManager, address other comments * Address more comments	2020-01-27 11:24:29 -08:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Jonathan Wei	15884f6d10	Fix hadoop ingestion property handling when using indexers (#9059 )	2019-12-18 12:13:19 -08:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
jon-wei	dfbc066163	Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1" This reverts commit `a0f21d9b07`.	2019-11-27 23:22:43 -08:00
jon-wei	0402ff85b8	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `8ffa71e7e6`.	2019-11-27 23:22:32 -08:00
jon-wei	8ffa71e7e6	[maven-release-plugin] prepare for next development iteration	2019-11-27 23:18:48 -08:00
jon-wei	a0f21d9b07	[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1	2019-11-27 23:18:37 -08:00
Chi Cao Minh	fba876b607	Update jackson to 2.9.10 (#8940 ) Addresses security vulnerabilities: - sonatype-2016-0397: https://github.com/FasterXML/jackson-core/issues/315 - sonatype-2017-0355: https://github.com/FasterXML/jackson-core/pull/322	2019-11-26 21:41:14 -08:00
Gian Merlino	e0eb85ace7	Add FileUtils.createTempDir() and enforce its usage. (#8932 ) * Add FileUtils.createTempDir() and enforce its usage. The purpose of this is to improve error messages. Previously, the error message on a nonexistent or unwritable temp directory would be "Failed to create directory within 10,000 attempts". * Further updates. * Another update. * Remove commons-io from benchmark. * Fix tests.	2019-11-22 19:48:49 -08:00
Jihoon Son	1611792855	Add InputSource and InputFormat interfaces (#8823 ) * Add InputSource and InputFormat interfaces * revert orc dependency * fix dimension exclusions and failing unit tests * fix tests * fix test * fix test * fix firehose and inputSource for parallel indexing task * fix tc * fix tc: remove unused method * Formattable * add needsFormat(); renamed to ObjectSource; pass metricsName for reader * address comments * fix closing resource * fix checkstyle * fix tests * remove verify from csv * Revert "remove verify from csv" This reverts commit `1ea7758489`. * address comments * fix import order and javadoc * flatMap * sampleLine * Add IntermediateRowParsingReader * Address comments * move csv reader test * remove test for verify * adjust comments * Fix InputEntityIteratingReader * rename source -> entity * address comments	2019-11-15 09:22:09 -08:00
Roman Leventov	5c0fc0a13a	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 ) * IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals() * Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever * Fix tests * More fixes * Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code * Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase * More test fixes * More test fixes * Add a comment to VersionedIntervalTimelineTestBase * Fix tests * Set DataSegment.size(0) in more tests * Specify DataSegment.size(0) in more places in tests * Fix more tests * Fix DruidSchemaTest * Set DataSegment's size in more tests and benchmarks * Fix HdfsDataSegmentPusherTest * Doc changes addressing comments * Extended doc for visibility * Typo * Typo 2 * Address comment	2019-11-06 11:07:04 -08:00
Chi Cao Minh	8b2afa5c49	Use targetRowsPerSegment for single-dim partitions (#8624 ) When using single-dimension partitioning, use targetRowsPerSegment (if specified) to size segments. Previously, single-dimension partitioning would always size segments as close to the max size as possible. Also, change single-dimension partitioning to allow partitions that have a size equal to the target or max size. Previously, it would create partitions up to 1 less than those limits. Also, fix some IntelliJ inspection warnings in HadoopDruidIndexerConfig.	2019-10-17 15:55:12 -07:00
Jihoon Son	4046c86d62	Stateful auto compaction (#8573 ) * Stateful auto compaction * javaodc * add removed test back * fix test * adding indexSpec to compactionState * fix build * add lastCompactionState * address comments * extract CompactionState * fix doc * fix build and test * Add a task context to store compaction state; add javadoc * fix it test	2019-10-15 22:57:42 -07:00
Benedict Jin	bba262a4c5	Fix resource leaks and suppress an incorrect LGTM alert (#8589 ) * Fix resource leaks and suppress an incorrect alert * Replace Guava's Files	2019-10-10 22:40:45 +03:00
Jihoon Son	96d8523ecb	Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571 ) * IOConfig for compaction task * add javadoc, doc, unit test * fix webconsole test * add spelling * address comments * fix build and test * address comments	2019-10-09 11:12:00 -07:00
Fokko Driesprong	82bfe86d0c	Make more package EverythingIsNonnullByDefault by default (#8198 ) * Make more package EverythingIsNonnullByDefault by default * Fixed additional voilations after pulling in master * Change iterator to list.addAll * Fix annotations	2019-09-30 18:53:18 -06:00
Chi Cao Minh	aeac0d4fd3	Adjust defaults for hashed partitioning (#8565 ) * Adjust defaults for hashed partitioning If neither the partition size nor the number of shards are specified, default to partitions of 5,000,000 rows (similar to the behavior of dynamic partitions). Previously, both could be null and cause incorrect behavior. Specifying both a partition size and a number of shards now results in an error instead of ignoring the partition size in favor of using the number of shards. This is a behavior change that makes it more apparent to the user that only one of the two properties will be honored (previously, a message was just logged when the specified partition size was ignored). * Fix test * Handle -1 as null * Add -1 as null tests for single dim partitioning * Simplify logic to handle -1 as null * Address review comments	2019-09-21 20:57:40 -07:00
Chi Cao Minh	99b6eedab5	Rename partition spec fields (#8507 ) * Rename partition spec fields Rename partition spec fields to be consistent across the various types (hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment and maxRowsPerSegment in favor of targetPartitionSize and maxSegmentSize. Consistent and clearer names are easier for users to understand and use. Also fix various IntelliJ inspection warnings and doc spelling mistakes. * Fix test * Improve docs * Add targetRowsPerSegment to HashedPartitionsSpec	2019-09-20 14:59:18 -06:00
Chi Cao Minh	5f61374cb3	Fix dependency analyze warnings (#8230 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports and updated druid-forbidden-apis to prevent regressions. * Address review comments * Adjust scope for org.glassfish.jaxb:jaxb-runtime * Fix dependencies for hdfs-storage * Consolidate netty4 versions	2019-09-09 14:37:21 -07:00
Clint Wylie	c73a489335	bump master version to 0.17.0-incubating-SNAPSHOT (#8421 )	2019-08-28 01:58:36 -07:00
Dylan Wylie	b2821a8371	do not exclude client core jar (#8339 ) make indexing service depend on hadoop client	2019-08-26 13:48:24 -07:00
SandishKumarHN	33f0753a70	Add Checkstyle for constant name static final (#8060 ) * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * merging with upstream * review-1 * unknow changes * unknow changes * review-2 * merging with master * review-2 1 changes * review changes-2 2 * bug fix	2019-08-23 13:13:54 +03:00
Jonathan Wei	e8727dc98b	Fix DeterminePartitionsJob reducer when total rows < targetPartitionSize * SHARD_COMBINE_THRESHOLD (#8273 ) * Fix DeterminePartitionsJob reducer when rows < targetPartitionSize * use isEmpty()	2019-08-09 16:03:30 -05:00
Jihoon Son	385f492a55	Use PartitionsSpec for all task types (#8141 ) * Use partitionsSpec for all task types * fix doc * fix typos and revert to use isPushRequired * address comments * move partitionsSpec to core * remove hadoopPartitionsSpec	2019-07-30 17:24:39 -07:00
Chi Cao Minh	ab71a2e1e4	Revert "Fix dependency analyze warnings (#8128 )" (#8189 ) This reverts commit `5dd0d8e873`.	2019-07-29 11:42:16 -07:00
Chi Cao Minh	5dd0d8e873	Fix dependency analyze warnings (#8128 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports. * Fix licenses and dependencies * Fix licenses and dependencies again * Fix integration test dependency * Address review comments * Fix unit test dependencies * Fix integration test dependency * Fix integration test dependency again * Fix integration test dependency third time * Fix integration test dependency fourth time * Fix compile error * Fix assert package	2019-07-26 10:49:03 -07:00
Jihoon Son	db14946207	Add support minor compaction with segment locking (#7547 ) * Segment locking * Allow both timeChunk and segment lock in the same gruop * fix it test * Fix adding same chunk to atomicUpdateGroup * resolving todos * Fix segments to lock * fix segments to lock * fix kill task * resolving todos * resolving todos * fix teamcity * remove unused class * fix single map * resolving todos * fix build * fix SQLMetadataSegmentManager * fix findInputSegments * adding more tests * fixing task lock checks * add SegmentTransactionalOverwriteAction * changing publisher * fixing something * fix for perfect rollup * fix test * adjust package-lock.json * fix test * fix style * adding javadocs * remove unused classes * add more javadocs * unused import * fix test * fix test * Support forceTimeChunk context and force timeChunk lock for parallel index task if intervals are missing * fix travis * fix travis * unused import * spotbug * revert getMaxVersion * address comments * fix tc * add missing error handling * fix backward compatibility * unused import * Fix perf of versionedIntervalTimeline * fix timeline * fix tc * remove remaining todos * add comment for parallel index * fix javadoc and typos * typo * address comments	2019-07-24 17:35:46 -07:00
Clint Wylie	03e55d30eb	add CachingClusteredClient benchmark, refactor some stuff (#8089 ) * add CachingClusteredClient benchmark, refactor some stuff * revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select * adjust javadoc * adjustments * oops * use it * use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments * rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters	2019-07-18 13:16:28 -07:00
Himanshu	14aec7fcec	add config to optionally disable all compression in intermediate segment persists while ingestion (#7919 ) * disable all compression in intermediate segment persists while ingestion * more changes and build fix * by default retain existing indexingSpec for intermediate persisted segments * document indexSpecForIntermediatePersists index tuning config * fix build issues * update serde tests	2019-07-10 12:22:24 -07:00
Sashidhar Thallam	3bee6adcf7	Use map.putIfAbsent() or map.computeIfAbsent() as appropriate instead of containsKey() + put() (#7764 ) * https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put() * fixing indentation * Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate * fixing checkstyle * Changing the recommendation text * Reverting auto changes made by IDE * Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap * Removing unused import	2019-06-14 17:59:36 +02:00
Jihoon Son	7abfbb066a	Bump up snapshot version to 0.16.0 (#7802 )	2019-05-30 17:17:33 -07:00
Himanshu	2b7bb064b5	remove unused ObjectMapper from DatasourcePathSpec (#7754 )	2019-05-24 23:15:40 -07:00
Jonathan Wei	d99f77a01b	Add option to use YARN RM as fallback for JobHistory failure (#7673 ) * Add option to use YARN RM as fallback for job status * PR comments	2019-05-16 13:59:10 -07:00
Fokko Driesprong	e8a6575fb3	Remove Joda from indexing-hadoop (#7650 )	2019-05-13 12:31:13 -07:00
Samarth Jain	9732e04c60	Pass in segmentTable correctly (#7492 )	2019-04-17 20:07:22 -07:00
Faxian Zhao	6789438a49	make hdfs index map reduce task add jar more reasonable (#7294 )	2019-04-14 10:26:59 -07:00
Roman Leventov	bca40dcdaf	Fix some IntelliJ inspections (#7273 ) Prepare TeamCity for IntelliJ 2018.3.1 upgrade. Mostly removed redundant exceptions declarations in `throws` clauses.	2019-03-25 21:11:01 -03:00
Jihoon Son	892d1d35d6	Deprecate NoneShardSpec and drop support for automatic segment merge (#6883 ) * Deprecate noneShardSpec * clean up noneShardSpec constructor * revert unnecessary change * Deprecate mergeTask * add more doc * remove convert from indexMerger * Remove mergeTask * remove HadoopDruidConverterConfig * fix build * fix build * fix teamcity * fix teamcity * fix ServerModule * fix compilation * fix compilation	2019-03-15 23:29:25 -07:00

986 Commits