* Fix two SeekableStream serde issues.
1) Fix backwards-compatibility serde for SeekableStreamPartitions. It is needed
for split 0.13 / 0.14 clusters to work properly during a rolling update.
2) Abstract classes don't need JsonCreator constructors; remove them.
* Comment fixes.
* Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this.
* Un-abbreviate the method names in the calcite tests
* Fixed checkstyle errors
* Changed asserts position in the code
IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.
This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.
This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.
Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.
This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file
* delete descriptor.file when killing segments
* fix test
* Add doc for ha
* improve warning
* Support kafka transactional topics
* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests
* Remove unused import from test
* Fix compilation
* Invoke transaction api to fix a unit test
* temporary modification of travis.yml for debugging
* another attempt to get travis tasklogs
* update kafka to 2.0.1 at all places
* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes
* Add deprecated in docs for kafka-eight and kafka-simple extensions
* Remove skipOffsetGaps and code changes for transaction support
* Fix indentation
* remove skipOffsetGaps from kinesis
* Add transaction api to KafkaRecordSupplierTest
* Fix indent
* Fix test
* update kafka version to 2.1.0
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool
* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java
* Remove unnecessary comment
* Checkstyle
* Fix getFromExtensions()
* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization
* Fix UriCacheGeneratorTest
* Workaround issue with MaterializedViewQueryQueryToolChest
* Strengthen Appenderator's contract regarding concurrency
* KillTask from overlord UI now makes sure that it terminates the underlying MR job, thus saving unnecessary compute
Run in jobby is now split into 2
1. submitAndGetHadoopJobId followed by 2. run
submitAndGetHadoopJobId is responsible for submitting the job and returning the jobId as a string, run monitors this job for completion
JobHelper writes this jobId in the path provided by HadoopIndexTask which in turn is provided by the ForkingTaskRunner
HadoopIndexTask reads this path when kill task is clicked to get hte jobId and fire the kill command via the yarn api. This is taken care in the stopGracefully method which is called in SingleTaskBackgroundRunner. Have enabled `canRestore` method to return `true` for HadoopIndexTask in order for the stopGracefully method to be called
Hadoop*Job files have been changed to incorporate the changes to jobby
* Addressing PR comments
* Addressing PR comments - Fix taskDir
* Addressing PR comments - For changing the contract of Task.stopGracefully()
`SingleTaskBackgroundRunner` calls stopGracefully in stop() and then checks for canRestore condition to return the status of the task
* Addressing PR comments
1. Formatting
2. Removing `submitAndGetHadoopJobId` from `Jobby` and calling writeJobIdToFile in the job itself
* Addressing PR comments
1. POM change. Moving hadoop dependency to indexing-hadoop
* Addressing PR comments
1. stopGracefully now accepts TaskConfig as a param
Handling isRestoreOnRestart in stopGracefully for `AppenderatorDriverRealtimeIndexTask, RealtimeIndexTask, SeekableStreamIndexTask`
Changing tests to make TaskConfig param isRestoreOnRestart to true
* Use multi-guava version friendly direct executor implementation
* Don't use a singleton
* Fix strict compliation complaints
* Copy Guava's DirectExecutor
* Fix javadoc
* Imports are the devil
* created seekablestream classes
* created seekablestreamsupervisor class
* first attempt to integrate kafa indexing service to use SeekableStream
* seekablestream bug fixes
* kafkarecordsupplier
* integrated kafka indexing service with seekablestream
* implemented resume/suspend and refactored some package names
* moved kinesis indexing service into core druid extensions
* merged some changes from kafka supervisor race condition
* integrated kinesis-indexing-service with seekablestream
* unite tests for kinesis-indexing-service
* various bug fixes for kinesis-indexing-service
* refactored kinesisindexingtask
* finished up more kinesis unit tests
* more bug fixes for kinesis-indexing-service
* finsihed refactoring kinesis unit tests
* removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions
* kinesis-indexing-service code cleanup and docs
* merge #6291
merge #6337
merge #6383
* added more docs and reordered methods
* fixd kinesis tests after merging master and added docs in seekablestream
* fix various things from pr comment
* improve recordsupplier and add unit tests
* migrated to aws-java-sdk-kinesis
* merge changes from master
* fix pom files and forbiddenapi checks
* checkpoint JavaType bug fix
* fix pom and stuff
* disable checkpointing in kinesis
* fix kinesis sequence number null in closed shard
* merge changes from master
* fixes for kinesis tasks
* capitalized <partitionType, sequenceType>
* removed abstract class loggers
* conform to guava api restrictions
* add docker for travis other modules test
* address comments
* improve RecordSupplier to supply records in batch
* fix strict compile issue
* add test scope for localstack dependency
* kinesis indexing task refactoring
* comments
* github comments
* minor fix
* removed unneeded readme
* fix deserialization bug
* fix various bugs
* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix
* minor changes to kinesis
* implement deaggregate for kinesis
* Merge remote-tracking branch 'upstream/master' into seekablestream
* fix kinesis offset discrepancy with kafka
* kinesis record supplier disable getPosition
* pr comments
* mock for kinesis tests and remove docker dependency for unit tests
* PR comments
* avg lag in kafkasupervisor #6587
* refacotred SequenceMetadata in taskRunners
* small fix
* more small fix
* recordsupplier resource leak
* revert .travis.yml formatting
* fix style
* kinesis docs
* doc part2
* more docs
* comments
* comments*2
* revert string replace changes
* comments
* teamcity
* comments part 1
* comments part 2
* comments part 3
* merge #6754
* fix injection binding
* comments
* KinesisRegion refactor
* comments part idk lol
* can't think of a commit msg anymore
* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner
* commmmmmmmmmments
* extra error handling in KinesisRecordSupplier getRecords
* comments
* quickfix
* typo
* oof
* Add checkstyle rules about imports and empty lines between members
* Add suppressions
* Update Eclipse import order
* Add empty line
* Fix StatsDEmitter
* Prohibit some guava collection APIs and use JDK APIs directly
* reset files that changed by accident
* sort codestyle/druid-forbidden-apis.txt alphabetically
* remove consumer.listTopics() method
* add consumerLock and exception handling for consumer.partitionFor() and remove some useless checks
* add check in case consumer.partitionsFor() returns null
* fix CI failure
* fix failed UT
* Revert "fix CI failure"
This reverts commit f839d09e1e.
* revert unless commit and re-commit the useful part to fix failed UT
* Use NodeType enum instead of Strings
* Make NodeType constants uppercase
* Fix CommonCacheNotifier and NodeType/ServerType comments
* Reconsidering comment
* Fix import
* Add a comment to CommonCacheNotifier.NODE_TYPES
* Prevent failed KafkaConsumer creation from blocking overlord startup
* PR comments
* Fix random task ID length
* Adjust test timer
* Use Integer.SIZE
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
- Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
- Generified `ComplexMetricExtractor`
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'
* move overlord console status to own column in supervisor table so does not look like garbage
* spacing
* padding
* other kind of spacing
* fix rebase fail
* fix more better
* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests
* fix log
* resolves#5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask
* address review comments
* changes due to review
* merge fail