* https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put()
* fixing indentation
* Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate
* fixing checkstyle
* Changing the recommendation text
* Reverting auto changes made by IDE
* Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap
* Removing unused import
* Add state and error tracking for seekable stream supervisors
* Fixed nits in docs
* Made inner class static and updated spec test with jackson inject
* Review changes
* Remove redundant config param in supervisor
* Style
* Applied some of Jon's recommendations
* Add transience field
* write test
* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()
* remove transience reporting and fix SeekableStreamSupervisorStateManager impl
* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests
* remove stateHistory because it wasn't adding much value, some fixes, and add more tests
* fix tests
* code review changes and add HTTP health check status
* fix test failure
* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager
* fixup after merge
* code review changes - add additional docs
* cleanup KafkaIndexTaskTest
* add additional documentation for Kinesis indexing
* remove unused throws class
* Add checkstyle for "Local variable names shouldn't start with capital"
* Adjust some local variables to constants
* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
The main motivation is that this fixes#7724, by making it so the overlord
doesn't try to create a task runner and parser when all it really wants to
do is create a task object and serialize it.
* Bump Checkstyle to 8.20
Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle
Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.
Affected versions: < 8.18
* Oops, missed one
* Oops, missed a few
* sampler initial check-in
fix checkstyle issues
add sampler fix to process CSV files from cache properly
change to composition and rename some classes
add tests and report num rows read and indexed
remove excludedByFilter flag and don't send filtered out data
fix tests to handle both settings for druid.generic.useDefaultValueForNull
* wrap sampler firehose in TimedShutoffFirehoseFactory to support timeouts
* code review changes - add additional comments, limit maxRows
* Enhnace the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication
* Improve security of enhanced HttpFirehoseFactory by not logging auth credentials
* Fix checkstyle failure in HttpFirehoseFactory.java
* Update docs and fix TeamCity build with required noinspection
* Indentation cleanup and logic modification for HttpFirehose object stream
* Remove default Empty string password provider in http firehose
* Add JavaDoc for MixIn describing its intended use
* Reverting documentation notation for json code to be inline with rest of doc
* Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs
* Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler
* Make IngestSegmentFirehoseFactory splittable for parallel ingestion
* Code review feedback
- Get rid of WindowedSegment
- Don't document 'segments' parameter or support splitting firehoses that use it
- Require 'intervals' in WindowedSegmentId (since it won't be written by hand)
* Add missing @JsonProperty
* Integration test passes
* Add unit test
* Remove two FIXME comments from CompactionTask
I'd like to leave this PR in a potentially mergeable state, but I still would
appreciate reviewer eyes on the questions I'm removing here.
* Updates from code review
* Move GCP to a core extension
* Don't provide druid-core >.<
* Keep AWS and GCP modules separate
* Move AWSModule to its own module
* Add aws ec2 extension and more modules in more places
* Fix bad imports
* Fix test jackson module
* Include AWS and GCP core in server
* Add simple empty method comment
* Update version to 15
* One more 0.13.0-->0.15.0 change
* Fix multi-binding problem
* Grep for s3-extensions and update docs
* Update extensions.md
* Fix exclusivity for start offset in kinesis indexing service
* some adjustment
* Fix SeekableStreamDataSourceMetadata
* Add missing javadocs
* Add missing comments and unit test
* fix SeekableStreamStartSequenceNumbers.plus and add comments
* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue
* Add javadocs
* fix compilation
* fix test
* remove unused variable
* Consolidate kafka consumer configs
* change the order of adding properties
* Add consumer properties to fix test
it seems kafka consumer does not reveive any message without these configs
* Use KafkaConsumerConfigs in integration test
* Update zookeeper and kafka versions in the setup.sh for the base druid image
* use version 0.2 of base druid image
* Try to fix tests in KafkaRecordSupplierTest
* unused import
* Fix tests in KafkaSupervisorTest
* Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality
* Fix DruidCoordinatorTest; Renamed DruidCoordinator.getReplicationStatus() to computeUnderReplicationCountsPerDataSourcePerTier()
* More Javadocs, typos, refactor DruidCoordinatorRuntimeParams.createAvailableSegmentsSet()
* Style
* typo
* Disable StaticPseudoFunctionalStyleMethod inspection because of too much false positives
* Fixes
* Logic adjustments to SeekableStreamIndexTaskRunner.
A mix of simplifications and bug fixes. They are intermingled because
some of the bugs were made difficult to fix, and also more likely to
happen in the first place, by how the code was structured. I tried to
keep restructuring to a minimum. The changes are:
- Remove "initialOffsetsSnapshot", which was used to determine when to
skip start offsets. Replace it with "lastReadOffsets", which I hope
is more intuitive. (There is a connection: start offsets must be
skipped if and only if they have already been read, either by a
previous task or by a previous sequence in the same task, post-restoring.)
- Remove "isStartingSequenceOffsetsExclusive", because it should always
be the opposite of isEndOffsetExclusive. The reason is that starts are
exclusive exactly when the prior ends are inclusive: they must match
up in that way for adjacent reads to link up properly.
- Don't call "seekToStartingSequence" after the initial seek. There is
no reason to, since we expect to read continuous message streams
throughout the task. And calling it makes offset-tracking logic
trickier, so better to avoid the need for trickiness. I believe the
call being here was causing a bug in Kinesis ingestion where a
message might get double-read.
- Remove the "continue" calls in the main read loop. They are bad
because they prevent keeping currOffsets and lastReadOffsets up to
date, and prevent us from detecting that we have finished reading.
- Rework "verifyInitialRecordAndSkipExclusivePartition" into
"verifyRecordInRange". It no longer has side effects. It does a sanity
check on the message offset and also makes sure that it is not past
the endOffsets.
- Rework "assignPartitions" to replace inline comparisons with
"isRecordAlreadyRead" and "isMoreToReadBeforeReadingRecord" calls. I
believe this fixes an off-by-one error with Kinesis where the last
record would not get read. It also makes the logic easier to read.
- When doing the final publish, only adjust end offsets of the final
sequence, rather than potentially adjusting any unpublished sequence.
Adjusting sequences other than the last one is a mistake since it
will extend their endOffsets beyond what they actually read. (I'm not
sure if this was an issue in practice, since I'm not sure if real
world situations would have more than one unpublished sequence.)
- Rename "isEndSequenceOffsetsExclusive" to "isEndOffsetExclusive". It's
shorter and more clear, I think.
- Add equals/hashCode/toString methods to OrderedSequenceNumber.
Kafka test changes:
- Added a Kafka "testRestoreAtEndOffset" test to verify that restores at
the very end of the task lifecycle still work properly.
Kinesis test changes:
- Renamed "testRunOnNothing" to "testRunOnSingletonRange". I think that
given Kinesis semantics, the right behavior when start offset equals
end offset (and there aren't exclusive partitions set) is to read that
single offset. This is because they are both meant to be treated as
inclusive.
- Adjusted "testRestoreAfterPersistingSequences" to expect one more
message read. I believe the old test was wrong; it expected the task
not to read message number 5.
- Adjusted "testRunContextSequenceAheadOfStartingOffsets" to use a
checkpoint starting from 1 rather than 2. I believe the old test was
wrong here too; it was expecting the task to start reading from the
checkpointed offset, but it actually should have started reading from
one past the checkpointed offset.
- Adjusted "testIncrementalHandOffReadsThroughEndOffsets" to expect
11 messages read instead of 12. It's starting at message 0 and reading
up to 10, which should be 11 messages.
* Changes from code review.
* Throw caught exception.
* Throw caught exceptions.
* Related checkstyle rule is added to prevent further bugs.
* RuntimeException() is used instead of Throwables.propagate().
* Missing import is added.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.
* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.
* Throwable is kept before firing a Runtime Exception.
* Fix unused assignments.
* wip
* fix tests, stop reading if we are at end offset
* fix build
* remove restore at end offsets fix in favor of a separate PR
* use typereference from method for serialization too
* Reduce # of max subTasks to 2
* fix typo and add more doc
* add more doc and link
* change default and add warning
* fix doc
* add test
* fix it test
* Fix two SeekableStream serde issues.
1) Fix backwards-compatibility serde for SeekableStreamPartitions. It is needed
for split 0.13 / 0.14 clusters to work properly during a rolling update.
2) Abstract classes don't need JsonCreator constructors; remove them.
* Comment fixes.
* Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this.
* Un-abbreviate the method names in the calcite tests
* Fixed checkstyle errors
* Changed asserts position in the code
IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.
This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.
This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.
Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.
This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file
* delete descriptor.file when killing segments
* fix test
* Add doc for ha
* improve warning
* index_parallel: support !appendToExisting with no explicit intervals
This enables ParallelIndexSupervisorTask to dynamically request locks at runtime
if it is run without explicit intervals in the granularity spec and with
appendToExisting set to false. Previously, it behaved as if appendToExisting
was set to true, which was undocumented and inconsistent with IndexTask and
Hadoop indexing.
Also, when ParallelIndexSupervisorTask allocates segments in the explicit
interval case, fail if its locks on the interval have been revoked.
Also make a few other additions/clarifications to native ingestion docs.
Fixes#6989.
* Review feedback.
PR description on GitHub updated to match.
* Make native batch ingestion partitions start at 0
* Fix to previous commit
* Unit test. Verified to fail without the other commits on this branch.
* Another round of review
* Slightly scarier warning
* document middle manager api
* re-arrange
* correction
* document more missing overlord api calls, minor re-arrange of some code i was referencing
* fix it
* this will fix it
* fixup
* link to other docs
* Support kafka transactional topics
* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests
* Remove unused import from test
* Fix compilation
* Invoke transaction api to fix a unit test
* temporary modification of travis.yml for debugging
* another attempt to get travis tasklogs
* update kafka to 2.0.1 at all places
* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes
* Add deprecated in docs for kafka-eight and kafka-simple extensions
* Remove skipOffsetGaps and code changes for transaction support
* Fix indentation
* remove skipOffsetGaps from kinesis
* Add transaction api to KafkaRecordSupplierTest
* Fix indent
* Fix test
* update kafka version to 2.1.0
* Fix:
1. hadoop-common dependency for druid-hdfs and druid-kerberos extensions
Refactoring:
2. Hadoop config call in the inner static class to avoid class path conflicts for stopGracefully kill
* Fix:
1. hadoop-common test dependency
* Fix:
1. Avoid issue of kill command once the job is actually completed
* Improper equals override is fixed to prevent NullPointerException
* Fixed curly brace indentation.
* Test method is added for equals method of TaskLockPosse class.
Native batch indexing doesn't yet support the maxParseExceptions,
maxSavedParseExceptions, and logParseExceptions tuning config options, so
ParallelIndexSupervisorTask logs if these are set. But the default value for
maxParseExceptions is Integer.MAX_VALUE, which means that you'll get the
maxParseExceptions flavor of this warning even if you don't configure
maxParseExceptions.
This PR changes all three warnings to occur if you change the settings from the
default; this mostly affects the maxParseExceptions warning.
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool
* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java
* Remove unnecessary comment
* Checkstyle
* Fix getFromExtensions()
* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization
* Fix UriCacheGeneratorTest
* Workaround issue with MaterializedViewQueryQueryToolChest
* Strengthen Appenderator's contract regarding concurrency
* Adding new web console.
* fixed css
* fix form height
* fix typo
* do import custom react-table css
* added repo field so npm does not complain
* ask travis for node 10
* move indexing-service/src/main/resources/indexer_static into web-console
* fix resource names and paths
* add licenses
* fix exclude file
* add licenses to misc files and tidy up
* remove rebase marker
* fix link
* updated env variable name
* tidy up licenses and surface errors
* cleanup
* remove unused code, fix missing await
* TeamCity does not like the name aux
* add more links to tasks view
* rm pages
* update gitignore
* update readme to be accurate
* make clean script
* removed old console dependancy
* update Jetty routes
* add a comment for welcome files for coordinator
* do not show inital notifaction for now
* renamed overlord console back to console.html
* fix coordinator console
* rename coordinator-console.html to index.html
* * Add few methods about base64 into StringUtils
* Use `java.util.Base64` instead of others
* Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis
* Rename encodeBase64String & decodeBase64String
* Update druid-forbidden-apis
* KillTask from overlord UI now makes sure that it terminates the underlying MR job, thus saving unnecessary compute
Run in jobby is now split into 2
1. submitAndGetHadoopJobId followed by 2. run
submitAndGetHadoopJobId is responsible for submitting the job and returning the jobId as a string, run monitors this job for completion
JobHelper writes this jobId in the path provided by HadoopIndexTask which in turn is provided by the ForkingTaskRunner
HadoopIndexTask reads this path when kill task is clicked to get hte jobId and fire the kill command via the yarn api. This is taken care in the stopGracefully method which is called in SingleTaskBackgroundRunner. Have enabled `canRestore` method to return `true` for HadoopIndexTask in order for the stopGracefully method to be called
Hadoop*Job files have been changed to incorporate the changes to jobby
* Addressing PR comments
* Addressing PR comments - Fix taskDir
* Addressing PR comments - For changing the contract of Task.stopGracefully()
`SingleTaskBackgroundRunner` calls stopGracefully in stop() and then checks for canRestore condition to return the status of the task
* Addressing PR comments
1. Formatting
2. Removing `submitAndGetHadoopJobId` from `Jobby` and calling writeJobIdToFile in the job itself
* Addressing PR comments
1. POM change. Moving hadoop dependency to indexing-hadoop
* Addressing PR comments
1. stopGracefully now accepts TaskConfig as a param
Handling isRestoreOnRestart in stopGracefully for `AppenderatorDriverRealtimeIndexTask, RealtimeIndexTask, SeekableStreamIndexTask`
Changing tests to make TaskConfig param isRestoreOnRestart to true
* Replacing Math.random() with ThreadLocalRandom.current().nextDouble()
* Added java.lang.Math#random() in forbidden-apis.txt
* Minor change in the message - druid-forbidden-apis.txt
* Use multi-guava version friendly direct executor implementation
* Don't use a singleton
* Fix strict compliation complaints
* Copy Guava's DirectExecutor
* Fix javadoc
* Imports are the devil
* created seekablestream classes
* created seekablestreamsupervisor class
* first attempt to integrate kafa indexing service to use SeekableStream
* seekablestream bug fixes
* kafkarecordsupplier
* integrated kafka indexing service with seekablestream
* implemented resume/suspend and refactored some package names
* moved kinesis indexing service into core druid extensions
* merged some changes from kafka supervisor race condition
* integrated kinesis-indexing-service with seekablestream
* unite tests for kinesis-indexing-service
* various bug fixes for kinesis-indexing-service
* refactored kinesisindexingtask
* finished up more kinesis unit tests
* more bug fixes for kinesis-indexing-service
* finsihed refactoring kinesis unit tests
* removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions
* kinesis-indexing-service code cleanup and docs
* merge #6291
merge #6337
merge #6383
* added more docs and reordered methods
* fixd kinesis tests after merging master and added docs in seekablestream
* fix various things from pr comment
* improve recordsupplier and add unit tests
* migrated to aws-java-sdk-kinesis
* merge changes from master
* fix pom files and forbiddenapi checks
* checkpoint JavaType bug fix
* fix pom and stuff
* disable checkpointing in kinesis
* fix kinesis sequence number null in closed shard
* merge changes from master
* fixes for kinesis tasks
* capitalized <partitionType, sequenceType>
* removed abstract class loggers
* conform to guava api restrictions
* add docker for travis other modules test
* address comments
* improve RecordSupplier to supply records in batch
* fix strict compile issue
* add test scope for localstack dependency
* kinesis indexing task refactoring
* comments
* github comments
* minor fix
* removed unneeded readme
* fix deserialization bug
* fix various bugs
* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix
* minor changes to kinesis
* implement deaggregate for kinesis
* Merge remote-tracking branch 'upstream/master' into seekablestream
* fix kinesis offset discrepancy with kafka
* kinesis record supplier disable getPosition
* pr comments
* mock for kinesis tests and remove docker dependency for unit tests
* PR comments
* avg lag in kafkasupervisor #6587
* refacotred SequenceMetadata in taskRunners
* small fix
* more small fix
* recordsupplier resource leak
* revert .travis.yml formatting
* fix style
* kinesis docs
* doc part2
* more docs
* comments
* comments*2
* revert string replace changes
* comments
* teamcity
* comments part 1
* comments part 2
* comments part 3
* merge #6754
* fix injection binding
* comments
* KinesisRegion refactor
* comments part idk lol
* can't think of a commit msg anymore
* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner
* commmmmmmmmmments
* extra error handling in KinesisRecordSupplier getRecords
* comments
* quickfix
* typo
* oof
* make logs that are only useful for debugging be at debug level so log volume is much more chill
* info level messages for total merge buffer allocated/free
* more chill compaction logs
* FilteredRequestLogger: Fix start/stop, invalid delegate behavior.
Fixes two bugs:
1) FilteredRequestLogger did not start/stop the delegate.
2) FilteredRequestLogger would ignore an invalid delegate type, and
instead silently substitute the "noop" logger. This was due to a larger
problem with RequestLoggerProvider setup in general; the fix here is
to remove "defaultImpl" from the RequestLoggerProvider interface, and
instead have JsonConfigurator be responsible for creating the
default implementations. It is stricter about things than the old system
was, and is only willing to make a noop logger if it doesn't see any
request logger configs. Otherwise, it'll raise a provision error.
* Remove unneeded annotations.
* autosize processing buffers based on direct memory sizing
* remove oops, more test
* max 1gb autosize buffers, test, start of docs
* fix oops
* revert accidental change
* print buffer size in exception
* change the things
* remove AbstractResourceFilter.isApplicable because it is not, add tests for OverlordResource.doShutdown and OverlordResource.shutdownTasksForDatasource
* cleanup
* 1. added support for unused DateTime start parameter in getRecentlyFinishedTaskInfoSince method:
HeapMemoryTaskStorage.getRecentlyFinishedTaskInfoSince return the finished tasks by comparing TaskStuff.createdDate with the start time
2. added filtering by status complete to TaskStuff list stream in HeapMemoryTaskStorage.getNRecentlyFinishedTaskInfo method.
3. changed names of methods and parameters to present that public API method OverlordResource.getTasks return the list of completed tasks, which createdDate, not date of completion, belongs to the interval parameter.
* 1. added support for unused DateTime start parameter in getRecentlyFinishedTaskInfoSince method:
HeapMemoryTaskStorage.getRecentlyFinishedTaskInfoSince return the finished tasks by comparing TaskStuff.createdDate with the start time
2. added filtering by status complete to TaskStuff list stream in HeapMemoryTaskStorage.getNRecentlyFinishedTaskInfo method.
3. changed names of methods and parameters to present that public API method OverlordResource.getTasks return the list of completed tasks, which createdDate, not date of completion, belongs to the interval parameter.
* Fixed OverlordResourceTest to Support changed methods names
* Changed methods and parameters names to make them more obvious to understand.
* Changed String.replace() for the StringUtils.replace()(#6607)
* Fixed checkstyle error
* Add checkstyle rules about imports and empty lines between members
* Add suppressions
* Update Eclipse import order
* Add empty line
* Fix StatsDEmitter
* Prohibit some guava collection APIs and use JDK APIs directly
* reset files that changed by accident
* sort codestyle/druid-forbidden-apis.txt alphabetically
* 1. Mysql default transaction isolation is REPEATABLE_READ, treat it as READ_COMMITTED will reduce insert id conflict.
2. Add an index to 'dataSource used end' is work well for the most of scenarios(get recently segments), and it will speed up sync add pending segments in DB.
3. 'select and insert' is not need within transaction.
* Use TaskLockbox.doInCriticalSection instead of synchronized syntax to speed up insert pending segments.
* fix typo for NullPointerException
* Use NodeType enum instead of Strings
* Make NodeType constants uppercase
* Fix CommonCacheNotifier and NodeType/ServerType comments
* Reconsidering comment
* Fix import
* Add a comment to CommonCacheNotifier.NODE_TYPES
* Prevent failed KafkaConsumer creation from blocking overlord startup
* PR comments
* Fix random task ID length
* Adjust test timer
* Use Integer.SIZE
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
- Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
- Generified `ComplexMetricExtractor`
* Replace statusCode with status (#6333)
Also changed runnerStatusCode to runnerStatus to keep things consistent
* Add unit test
* Add status param to TaskStatusPlus
Revert to statusCode and runnerStatusCode
* Add additional status member to TaskStatusPlus
* Change TaskResponseObject to match overlord's response object
* Address PR comments
* address comments
* Add runtime exception after logging error
* Remove (deprecated)status member variable from TaskStatusPlus
* Minor change
* Add support targetCompactionSizeBytes for compactionTask
* fix test
* fix a bug in keepSegmentGranularity
* fix wrong noinspection comment
* address comments
* Adding licenses and enable apache-rat-plugi.
Change-Id: I4685a2d9f1e147855dba69329b286f2d5bee3c18
* restore the copywrite of demo_table and add it to the list of allowed ones
Change-Id: I2a9efde6f4b984bc1ac90483e90d98e71f818a14
* revirew comments
Change-Id: I0256c930b7f9a5bb09b44b5e7a149e6ec48cb0ca
* more fixup
Change-Id: I1355e8a2549e76cd44487abec142be79bec59de2
* align
Change-Id: I70bc47ecb577bdf6b91639dd91b6f5642aa6b02f
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'
* move overlord console status to own column in supervisor table so does not look like garbage
* spacing
* padding
* other kind of spacing
* fix rebase fail
* fix more better
* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests
* fix log
* resolves#5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask
* address review comments
* changes due to review
* merge fail
* Rename io.druid to org.apache.druid.
* Fix META-INF files and remove some benchmark results.
* MonitorsConfig update for metrics package migration.
* Reorder some dimensions in inner queries for some reason.
* Fix protobuf tests.
* Fix all inspection errors currently reported.
TeamCity builds on master are reporting inspection errors, possibly
because there was a while where it was not running due to the Apache
migration, and there was some drift.
* Fix one more location.
* Fix tests.
* Another fix.
* 'shutdownAllTasks' API for a dataSource
Change-Id: I30d14390457d39e0427d23a48f4f224223dc5777
* fix api path and return
Change-Id: Ib463f31ee2c4cb168cf2697f149be845b57c42e5
* optimize implementation
Change-Id: I50a8dcd44dd9d36c9ecbfa78e103eb9bff32eab9
* Fix three bugs with segment publishing.
1. In AppenderatorImpl: always use a unique path if requested, even if the segment
was already pushed. This is important because if we don't do this, it causes
the issue mentioned in #6124.
2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return
a "not published" result instead of throwing an exception, when there was one
metadata update failure, followed by some random exception. This is done by
resetting the AtomicBoolean that tracks what case we're in, each time the
callback runs.
3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false
publish result. Skip killing if we just got some exception. The reason for this
is that we want to avoid killing segments if they are in an unknown state.
Two other changes to clarify the contracts a bit and hopefully prevent future bugs:
1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it
more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result
must indicate that the publish _definitely_ did not happen. Unknown states must be
exceptions. This helps BaseAppenderatorDriver do the right thing.
* Remove javadoc-only import.
* Updates.
* Fix test.
* Fix tests.
* Cache: Add maxEntrySize config.
The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.
Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.
The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.
* Add CachePopulatorStats.
* Fix whitespace.
* Fix docs.
* Fix various tests.
* Add tests.
* Fix tests.
* Better tests
* Remove conflict markers.
* Fix licenses.
* Native parallel indexing without shuffle
* fix build
* fix ci
* fix ingestion without intervals
* fix retry
* fix retry
* add it test
* use chat handler
* fix build
* add docs
* fix ITUnionQueryTest
* fix failures
* disable metrics reporting
* working
* Fix split of static-s3 firehose
* Add endpoints to supervisor task and a unit test for endpoints
* increase timeout in test
* Added doc
* Address comments
* Fix overlapping locks
* address comments
* Fix static s3 firehose
* Fix test
* fix build
* fix test
* fix typo in docs
* add missing maxBytesInMemory to doc
* address comments
* fix race in test
* fix test
* Rename to ParallelIndexSupervisorTask
* fix teamcity
* address comments
* Fix license
* addressing comments
* addressing comments
* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator
* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner
* Add more javadocs
* use StringUtils.nonStrictFormat for logging
* fix typo and remove unused class
* fix tests
* change package
* fix strict build
* tmp
* Fix overlord api according to the recent change in master
* Fix it test
* Remove some unnecessary task storage internal APIs.
- Remove MetadataStorageActionHandler's getInactiveStatusesSince and getActiveEntriesWithStatus.
- Remove TaskStorage's getCreatedDateTimeAndDataSource.
- Remove TaskStorageQueryAdapter's getCreatedTime, and getCreatedDateAndDataSource.
- Migrated all callers to getActiveTaskInfo and getCompletedTaskInfo.
This has one side effect: since getActiveTaskInfo (new) warns and continues when it
sees unreadable tasks, but getActiveEntriesWithStatus threw an exception when it
encountered those, it means that after this patch bad tasks will be ignored when
syncing from metadata storage rather than causing an exception to be thrown.
IMO, this is an improvement, since the most likely reason for bad tasks is either:
- A new version introduced an additional validation, and a pre-existing task doesn't
pass it.
- You are rolling back from a newer version to an older version.
In both cases, I believe you would want to skip tasks that can't be deserialized,
rather than blocking overlord startup.
* Remove unused import.
* Fix formatting.
* Fix formatting.
* Various changes about druid-services module
* Patch improvements from reviewer
* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile
* Fix ArraysAsListWithZeroOrOneArgument
* Fix conflict
* Fix ToArrayCallWithZeroLengthArrayArgument
* Fix AliEqualsAvoidNull
* Remove blank line
* Remove unused import clauses
* Fix code style in TopNQueryRunnerTest
* Fix conflict
* Don't use Collections.singletonList when converting the type of array type
* Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase
* Roll back the latest commit
* Add java.io.File#toURL() into druid-forbidden-apis
* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord
* Add a new regexp element into stylecode xml file
* Fix style error for new regexp
* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING
* Fix style error for new regexp
* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile
* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR
* Add toArray(new Object[0]) regexp into checkstyle config file & fix them
* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it
* Add a comment for string equals regexp in checkstyle config
* Fix code format
* Add RedundantTypeArguments as ERROR level inspection
* Fix cannot resolve symbol datasource
* Add support to filter on datasource for active tasks
* Added datasource filter to sql query for active tasks
* Fixed unit tests
* Address PR comments
* Fix NPE while handling CheckpointNotice
* fix code style
* Fix test
* fix test
* add a log for creating a new taskGroup
* fix backward compatibility in KafkaIOConfig