* Support assign tasks to run on different tiers of MiddleManagers
* address comments
* address comments
* rename tier to category and docs
* doc
* fix doc
* fix spelling errors
* docs
* Stateful auto compaction
* javaodc
* add removed test back
* fix test
* adding indexSpec to compactionState
* fix build
* add lastCompactionState
* address comments
* extract CompactionState
* fix doc
* fix build and test
* Add a task context to store compaction state; add javadoc
* fix it test
* IOConfig for compaction task
* add javadoc, doc, unit test
* fix webconsole test
* add spelling
* address comments
* fix build and test
* address comments
* Make more package EverythingIsNonnullByDefault by default
* Fixed additional voilations after pulling in master
* Change iterator to list.addAll
* Fix annotations
* get active task by datasource when supervisor discover tasks
* fix ut
* fix ut
* fix ut
* remove unnecessary condition check
* fix ut
* remove stream in hot loop
* Added live reports for Kafka and Native batch task
* Removed unused local variables
* Added the missing unit test
* Refine unit test logic, add implementation for HttpRemoteTaskRunner
* checksytle fixes
* Update doc descriptions for updated API
* remove unnecessary files
* Fix spellcheck complaints
* More details for api descriptions
* Adjust defaults for hashed partitioning
If neither the partition size nor the number of shards are specified,
default to partitions of 5,000,000 rows (similar to the behavior of
dynamic partitions). Previously, both could be null and cause incorrect
behavior.
Specifying both a partition size and a number of shards now results in
an error instead of ignoring the partition size in favor of using the
number of shards. This is a behavior change that makes it more apparent
to the user that only one of the two properties will be honored
(previously, a message was just logged when the specified partition size
was ignored).
* Fix test
* Handle -1 as null
* Add -1 as null tests for single dim partitioning
* Simplify logic to handle -1 as null
* Address review comments
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.
* Address review comments
* Adjust scope for org.glassfish.jaxb:jaxb-runtime
* Fix dependencies for hdfs-storage
* Consolidate netty4 versions
* Fallback to parsing classpath for hadoop task in Java 9+
In Java 9 and above we cannot assume that the system classloader is an
instance of URLClassLoader. This change adds a fallback method to parse
the system classpath in that case, and adds a unit test to validate it matches
what JDK8 would do.
Note: This has not been tested in an actual hadoop setup, so this is mostly
to help us pass unit tests.
* Remove granularity test of dubious value
One of our granularity tests relies on system classloader being a URLClassLoaders to
catch a bug related to class initialization and static initializers using a subclass (see
#2979)
This test was added to catch a potential regression, but it assumes we would add back
the same type of static initializers to this specific class, so it seems to be of dubious value
as a unit test and mostly serves to illustrate the bug.
relates to #5589
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* check ctyle for constant field name
* merging with upstream
* review-1
* unknow changes
* unknow changes
* review-2
* merging with master
* review-2 1 changes
* review changes-2 2
* bug fix
* Add group_id to overlord tasks API and sys.tasks table
* adjust test
* modify docs
* Make groupId nullable
* fix integration test
* fix toString
* Remove groupId from TaskInfo
* Modify docs and tests
* modify TaskMonitorTest
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks
* kill runner when it's ready
* add comment
* kill run thread
* fix test
* Take closeable out of Appenderator
* add javadoc
* fix test
* fix test
* update javadoc
* add javadoc about killed task
* address comment
* Add support for parallel native indexing with shuffle for perfect rollup.
* Add comment about volatiles
* fix test
* fix test
* handling missing exceptions
* more clear javadoc for stopGracefully
* unused import
* update javadoc
* Add missing statement in javadoc
* address comments; fix doc
* add javadoc for isGuaranteedRollup
* Rename confusing variable name and fix typos
* fix typos; move fetch() to a better home; fix the expiration time
* add support https
* Fix race between canHandle() and addSegment() in StorageLocation
* add comment
* Add shuffleSegmentPusher which is a dataSegmentPusher used for writing shuffle data in local storage.
* add comments
* unused import
* add comments
* fix test
* address comments
* remove <p> tag from javadoc
* address comments
* comparingLong
* Address comments
* fix test
* Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking
* add more test
* javadoc for missingIntervalsInOverwriteMode
* Fix test
* Address comments
* avoid spotbugs
* GroupBy array-based result rows.
Fixes#8118; see that proposal for details.
Other than the GroupBy changes, the main other "interesting" classes are:
- ResultRow: The array-based result type.
- BaseQuery: T is no longer required to be Comparable.
- QueryToolChest: Adds "decorateObjectMapper" to enable query-aware serialization
and deserialization of result rows (necessary due to their positional nature).
- QueryResource: Uses the new decoration functionality.
- DirectDruidClient: Also uses the new decoration functionality.
- QueryMaker (in Druid SQL): Modifications to read ResultRows.
These classes weren't changed, but got some new javadocs:
- BySegmentQueryRunner
- FinalizeResultsQueryRunner
- Query
* Adjustments for TC stuff.
* Use partitionsSpec for all task types
* fix doc
* fix typos and revert to use isPushRequired
* address comments
* move partitionsSpec to core
* remove hadoopPartitionsSpec
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.
* Fix licenses and dependencies
* Fix licenses and dependencies again
* Fix integration test dependency
* Address review comments
* Fix unit test dependencies
* Fix integration test dependency
* Fix integration test dependency again
* Fix integration test dependency third time
* Fix integration test dependency fourth time
* Fix compile error
* Fix assert package
* doc updates and changes to use the CollectionUtils.mapValues utility method
* Add Structural Search patterns to intelliJ
* refactoring from PR comments
* put -> putIfAbsent
* do single key lookup
* disable all compression in intermediate segment persists while ingestion
* more changes and build fix
* by default retain existing indexingSpec for intermediate persisted segments
* document indexSpecForIntermediatePersists index tuning config
* fix build issues
* update serde tests
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
Make static imports forbidden in tests and remove all occurrences to be
consistent with the non-test code.
Also, various changes to files affected by above:
- Reformat to adhere to druid style guide
- Fix various IntelliJ warnings
- Fix various SonarLint warnings (e.g., the expected/actual args to
Assert.assertEquals() were flipped)
* array support for expression language for multi-value string columns
* fix tests?
* fixes
* more tests
* fixes
* cleanup
* more better, more test
* ignore inspection
* license
* license fix
* inspection
* remove dumb import
* more better
* some comments
* add expr rewrite for arrayfn args for more magic, tests
* test stuff
* more tests
* fix test
* fix test
* castfunc can deal with arrays
* needs more empty array
* more tests, make cast to long array more forgiving
* refactor
* simplify ExprMacro Expr implementations with base classes in core
* oops
* more test
* use Shuttle for Parser.flatten, javadoc, cleanup
* fixes and more tests
* unused import
* fixes
* javadocs, cleanup, refactors
* fix imports
* more javadoc
* more javadoc
* more
* more javadocs, nonnullbydefault, minor refactor
* markdown fix
* adjustments
* more doc
* move initial filter out
* docs
* map empty arg lambda, apply function argument validation
* check function args at parse time instead of eval time
* more immutable
* more more immutable
* clarify grammar
* fix docs
* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
* https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put()
* fixing indentation
* Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate
* fixing checkstyle
* Changing the recommendation text
* Reverting auto changes made by IDE
* Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap
* Removing unused import
* Add state and error tracking for seekable stream supervisors
* Fixed nits in docs
* Made inner class static and updated spec test with jackson inject
* Review changes
* Remove redundant config param in supervisor
* Style
* Applied some of Jon's recommendations
* Add transience field
* write test
* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()
* remove transience reporting and fix SeekableStreamSupervisorStateManager impl
* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests
* remove stateHistory because it wasn't adding much value, some fixes, and add more tests
* fix tests
* code review changes and add HTTP health check status
* fix test failure
* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager
* fixup after merge
* code review changes - add additional docs
* cleanup KafkaIndexTaskTest
* add additional documentation for Kinesis indexing
* remove unused throws class
* Add checkstyle for "Local variable names shouldn't start with capital"
* Adjust some local variables to constants
* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
The main motivation is that this fixes#7724, by making it so the overlord
doesn't try to create a task runner and parser when all it really wants to
do is create a task object and serialize it.
* Bump Checkstyle to 8.20
Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle
Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.
Affected versions: < 8.18
* Oops, missed one
* Oops, missed a few
* sampler initial check-in
fix checkstyle issues
add sampler fix to process CSV files from cache properly
change to composition and rename some classes
add tests and report num rows read and indexed
remove excludedByFilter flag and don't send filtered out data
fix tests to handle both settings for druid.generic.useDefaultValueForNull
* wrap sampler firehose in TimedShutoffFirehoseFactory to support timeouts
* code review changes - add additional comments, limit maxRows
* Enhnace the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication
* Improve security of enhanced HttpFirehoseFactory by not logging auth credentials
* Fix checkstyle failure in HttpFirehoseFactory.java
* Update docs and fix TeamCity build with required noinspection
* Indentation cleanup and logic modification for HttpFirehose object stream
* Remove default Empty string password provider in http firehose
* Add JavaDoc for MixIn describing its intended use
* Reverting documentation notation for json code to be inline with rest of doc
* Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs
* Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler
* Make IngestSegmentFirehoseFactory splittable for parallel ingestion
* Code review feedback
- Get rid of WindowedSegment
- Don't document 'segments' parameter or support splitting firehoses that use it
- Require 'intervals' in WindowedSegmentId (since it won't be written by hand)
* Add missing @JsonProperty
* Integration test passes
* Add unit test
* Remove two FIXME comments from CompactionTask
I'd like to leave this PR in a potentially mergeable state, but I still would
appreciate reviewer eyes on the questions I'm removing here.
* Updates from code review
* Move GCP to a core extension
* Don't provide druid-core >.<
* Keep AWS and GCP modules separate
* Move AWSModule to its own module
* Add aws ec2 extension and more modules in more places
* Fix bad imports
* Fix test jackson module
* Include AWS and GCP core in server
* Add simple empty method comment
* Update version to 15
* One more 0.13.0-->0.15.0 change
* Fix multi-binding problem
* Grep for s3-extensions and update docs
* Update extensions.md
* Fix exclusivity for start offset in kinesis indexing service
* some adjustment
* Fix SeekableStreamDataSourceMetadata
* Add missing javadocs
* Add missing comments and unit test
* fix SeekableStreamStartSequenceNumbers.plus and add comments
* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue
* Add javadocs
* fix compilation
* fix test
* remove unused variable
* Consolidate kafka consumer configs
* change the order of adding properties
* Add consumer properties to fix test
it seems kafka consumer does not reveive any message without these configs
* Use KafkaConsumerConfigs in integration test
* Update zookeeper and kafka versions in the setup.sh for the base druid image
* use version 0.2 of base druid image
* Try to fix tests in KafkaRecordSupplierTest
* unused import
* Fix tests in KafkaSupervisorTest