* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning
* incomplete javadoc
* Address comments
* fix tests
* fix json serde, add tests
* checkstyle
* Set core partition set size for hash-partitioned segments properly in
batch ingestion
* test for both parallel and single-threaded task
* unused variables
* fix test
* unused imports
* add hash/range buckets
* some test adjustment and missing json serde
* centralized partition id allocation in parallel and simple tasks
* remove string partition chunk
* revive string partition chunk
* fill numCorePartitions for hadoop
* clean up hash stuffs
* resolved todos
* javadocs
* Fix tests
* add more tests
* doc
* unused imports
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"
* Reverted checkstyle rule
* Added tests to pass CI
* Codestyle
* add flag to flattenSpec to keep null columns
* remove changes to inputFormat interface
* add comment
* change comment message
* update web console e2e test
* move keepNullColmns to JSONParseSpec
* fix merge conflicts
* fix tests
* set keepNullColumns to false by default
* fix lgtm
* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns
* Add equals verifier tests
* IntelliJ inspections cleanup
* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type
* fix warnings in test code
* more test fixes
* remove string concatenation inspection error
* fix extra curly brace
* cleanup AzureTestUtils
* fix charsets for RangerAdminClient
* review comments
* fix Hadoop ingestion fails due to error 'JavaScript is disabled', if determine partition hadoop job is run
* add test
* fix checkstyle
* address comments
* address comments
* Harmonization and bug-fixing for selector and filter behavior on unknown types.
- Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory
system, and set defaultType COMPLEX so unknown types can be dynamically matched.
- Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing.
- Switch various methods to use convertObjectToX when casting to numbers, rather
than ad-hoc and inconsistent logic.
- Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return
true even for 0- or 1- element arrays.
- Adjust various javadocs.
* Add throwParseExceptions option to Rows.objectToNumber, switch back to that.
* Update tests.
* Adjust moment sketch tests.
* Refactoring codes around ingestion:
- Parallel index task and simple task now use the same segment allocator implementation. This is reusable for the future implementation as well.
- Added PartitionAnalysis to store the analysis of the partitioning
- Move some util methods to SegmentLockHelper and rename it to TaskLockHelper
* fix build
* fix SingleDimensionShardSpecFactory
* optimize SingledimensionShardSpecFactory
* fix test
* shard spec builder
* import order
* shardSpecBuilder -> partialShardSpec
* build -> complete
* fix comment; add unit tests for partitionBoundaries
* add more tests and fix javadoc
* fix toString(); add serde tests for HashBasedNumberedPartialShardSpec and SegmentAllocateAction
* fix test
* add equality test for hash and range partial shard specs
* intelliJ inspections cleanup
- remove redundant escapes
- performance warnings
- access static member via instance reference
- static method declared final
- inner class may be static
Most of these changes are aesthetic, however, they will allow inspections to
be enabled as part of CI checks going forward
The valuable changes in this delta are:
- using StringBuilder instead of string addition in a loop
indexing-hadoop/.../Utils.java
processing/.../ByteBufferMinMaxOffsetHeap.java
- Use class variables instead of static variables for parameterized test
processing/src/.../ScanQueryLimitRowIteratorTest.java
* Add intelliJ inspection warnings as errors to druid profile
* one more static inner class
* Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error
* Fix brace
* Import order
* Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill
* Fix tests
* Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY
* More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters
* Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig
* More variable and method renames
* Rename MetadataSegments to SegmentsMetadata
* Javadoc update
* Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs
* Update Javadoc of VersionedIntervalTimeline.iterateAllObjects()
* Reorder imports
* Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers
* Complete merge
* Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests
* Remove MetadataSegmentManager
* Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments
* Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder
* Fix inspections
* Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest
* Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods
* Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator
* Unused import
* Optimize imports
* Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata()
* Unused import
* Update terminology in datasource-view.tsx
* Fix label in datasource-view.spec.tsx.snap
* Fix lint errors in datasource-view.tsx
* Doc improvements
* Another attempt to please TSLint
* Another attempt to please TSLint
* Style fixes
* Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge)
* Try to fix docs build issue
* Javadoc and spelling fixes
* Rename SegmentsMetadata to SegmentsMetadataManager, address other comments
* Address more comments
* Add FileUtils.createTempDir() and enforce its usage.
The purpose of this is to improve error messages. Previously, the error
message on a nonexistent or unwritable temp directory would be
"Failed to create directory within 10,000 attempts".
* Further updates.
* Another update.
* Remove commons-io from benchmark.
* Fix tests.
* IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals()
* Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever
* Fix tests
* More fixes
* Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code
* Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase
* More test fixes
* More test fixes
* Add a comment to VersionedIntervalTimelineTestBase
* Fix tests
* Set DataSegment.size(0) in more tests
* Specify DataSegment.size(0) in more places in tests
* Fix more tests
* Fix DruidSchemaTest
* Set DataSegment's size in more tests and benchmarks
* Fix HdfsDataSegmentPusherTest
* Doc changes addressing comments
* Extended doc for visibility
* Typo
* Typo 2
* Address comment
When using single-dimension partitioning, use targetRowsPerSegment (if
specified) to size segments. Previously, single-dimension partitioning
would always size segments as close to the max size as possible.
Also, change single-dimension partitioning to allow partitions that have
a size equal to the target or max size. Previously, it would create
partitions up to 1 less than those limits.
Also, fix some IntelliJ inspection warnings in HadoopDruidIndexerConfig.
* Stateful auto compaction
* javaodc
* add removed test back
* fix test
* adding indexSpec to compactionState
* fix build
* add lastCompactionState
* address comments
* extract CompactionState
* fix doc
* fix build and test
* Add a task context to store compaction state; add javadoc
* fix it test
* IOConfig for compaction task
* add javadoc, doc, unit test
* fix webconsole test
* add spelling
* address comments
* fix build and test
* address comments
* Make more package EverythingIsNonnullByDefault by default
* Fixed additional voilations after pulling in master
* Change iterator to list.addAll
* Fix annotations
* Adjust defaults for hashed partitioning
If neither the partition size nor the number of shards are specified,
default to partitions of 5,000,000 rows (similar to the behavior of
dynamic partitions). Previously, both could be null and cause incorrect
behavior.
Specifying both a partition size and a number of shards now results in
an error instead of ignoring the partition size in favor of using the
number of shards. This is a behavior change that makes it more apparent
to the user that only one of the two properties will be honored
(previously, a message was just logged when the specified partition size
was ignored).
* Fix test
* Handle -1 as null
* Add -1 as null tests for single dim partitioning
* Simplify logic to handle -1 as null
* Address review comments
* Rename partition spec fields
Rename partition spec fields to be consistent across the various types
(hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment
and maxRowsPerSegment in favor of targetPartitionSize and
maxSegmentSize. Consistent and clearer names are easier for users to
understand and use.
Also fix various IntelliJ inspection warnings and doc spelling mistakes.
* Fix test
* Improve docs
* Add targetRowsPerSegment to HashedPartitionsSpec
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.
* Address review comments
* Adjust scope for org.glassfish.jaxb:jaxb-runtime
* Fix dependencies for hdfs-storage
* Consolidate netty4 versions
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* merging with upstream
* review-1
* unknow changes
* unknow changes
* review-2
* merging with master
* review-2 1 changes
* review changes-2 2
* bug fix
* Use partitionsSpec for all task types
* fix doc
* fix typos and revert to use isPushRequired
* address comments
* move partitionsSpec to core
* remove hadoopPartitionsSpec
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.
* Fix licenses and dependencies
* Fix licenses and dependencies again
* Fix integration test dependency
* Address review comments
* Fix unit test dependencies
* Fix integration test dependency
* Fix integration test dependency again
* Fix integration test dependency third time
* Fix integration test dependency fourth time
* Fix compile error
* Fix assert package
* add CachingClusteredClient benchmark, refactor some stuff
* revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select
* adjust javadoc
* adjustments
* oops
* use it
* use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments
* rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters
* disable all compression in intermediate segment persists while ingestion
* more changes and build fix
* by default retain existing indexingSpec for intermediate persisted segments
* document indexSpecForIntermediatePersists index tuning config
* fix build issues
* update serde tests
* https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put()
* fixing indentation
* Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate
* fixing checkstyle
* Changing the recommendation text
* Reverting auto changes made by IDE
* Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap
* Removing unused import