* index_parallel: support !appendToExisting with no explicit intervals
This enables ParallelIndexSupervisorTask to dynamically request locks at runtime
if it is run without explicit intervals in the granularity spec and with
appendToExisting set to false. Previously, it behaved as if appendToExisting
was set to true, which was undocumented and inconsistent with IndexTask and
Hadoop indexing.
Also, when ParallelIndexSupervisorTask allocates segments in the explicit
interval case, fail if its locks on the interval have been revoked.
Also make a few other additions/clarifications to native ingestion docs.
Fixes#6989.
* Review feedback.
PR description on GitHub updated to match.
* Make native batch ingestion partitions start at 0
* Fix to previous commit
* Unit test. Verified to fail without the other commits on this branch.
* Another round of review
* Slightly scarier warning
* document middle manager api
* re-arrange
* correction
* document more missing overlord api calls, minor re-arrange of some code i was referencing
* fix it
* this will fix it
* fixup
* link to other docs
* Support kafka transactional topics
* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests
* Remove unused import from test
* Fix compilation
* Invoke transaction api to fix a unit test
* temporary modification of travis.yml for debugging
* another attempt to get travis tasklogs
* update kafka to 2.0.1 at all places
* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes
* Add deprecated in docs for kafka-eight and kafka-simple extensions
* Remove skipOffsetGaps and code changes for transaction support
* Fix indentation
* remove skipOffsetGaps from kinesis
* Add transaction api to KafkaRecordSupplierTest
* Fix indent
* Fix test
* update kafka version to 2.1.0
* Fix:
1. hadoop-common dependency for druid-hdfs and druid-kerberos extensions
Refactoring:
2. Hadoop config call in the inner static class to avoid class path conflicts for stopGracefully kill
* Fix:
1. hadoop-common test dependency
* Fix:
1. Avoid issue of kill command once the job is actually completed
* Improper equals override is fixed to prevent NullPointerException
* Fixed curly brace indentation.
* Test method is added for equals method of TaskLockPosse class.
Native batch indexing doesn't yet support the maxParseExceptions,
maxSavedParseExceptions, and logParseExceptions tuning config options, so
ParallelIndexSupervisorTask logs if these are set. But the default value for
maxParseExceptions is Integer.MAX_VALUE, which means that you'll get the
maxParseExceptions flavor of this warning even if you don't configure
maxParseExceptions.
This PR changes all three warnings to occur if you change the settings from the
default; this mostly affects the maxParseExceptions warning.
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool
* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java
* Remove unnecessary comment
* Checkstyle
* Fix getFromExtensions()
* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization
* Fix UriCacheGeneratorTest
* Workaround issue with MaterializedViewQueryQueryToolChest
* Strengthen Appenderator's contract regarding concurrency
* Adding new web console.
* fixed css
* fix form height
* fix typo
* do import custom react-table css
* added repo field so npm does not complain
* ask travis for node 10
* move indexing-service/src/main/resources/indexer_static into web-console
* fix resource names and paths
* add licenses
* fix exclude file
* add licenses to misc files and tidy up
* remove rebase marker
* fix link
* updated env variable name
* tidy up licenses and surface errors
* cleanup
* remove unused code, fix missing await
* TeamCity does not like the name aux
* add more links to tasks view
* rm pages
* update gitignore
* update readme to be accurate
* make clean script
* removed old console dependancy
* update Jetty routes
* add a comment for welcome files for coordinator
* do not show inital notifaction for now
* renamed overlord console back to console.html
* fix coordinator console
* rename coordinator-console.html to index.html
* * Add few methods about base64 into StringUtils
* Use `java.util.Base64` instead of others
* Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis
* Rename encodeBase64String & decodeBase64String
* Update druid-forbidden-apis
* KillTask from overlord UI now makes sure that it terminates the underlying MR job, thus saving unnecessary compute
Run in jobby is now split into 2
1. submitAndGetHadoopJobId followed by 2. run
submitAndGetHadoopJobId is responsible for submitting the job and returning the jobId as a string, run monitors this job for completion
JobHelper writes this jobId in the path provided by HadoopIndexTask which in turn is provided by the ForkingTaskRunner
HadoopIndexTask reads this path when kill task is clicked to get hte jobId and fire the kill command via the yarn api. This is taken care in the stopGracefully method which is called in SingleTaskBackgroundRunner. Have enabled `canRestore` method to return `true` for HadoopIndexTask in order for the stopGracefully method to be called
Hadoop*Job files have been changed to incorporate the changes to jobby
* Addressing PR comments
* Addressing PR comments - Fix taskDir
* Addressing PR comments - For changing the contract of Task.stopGracefully()
`SingleTaskBackgroundRunner` calls stopGracefully in stop() and then checks for canRestore condition to return the status of the task
* Addressing PR comments
1. Formatting
2. Removing `submitAndGetHadoopJobId` from `Jobby` and calling writeJobIdToFile in the job itself
* Addressing PR comments
1. POM change. Moving hadoop dependency to indexing-hadoop
* Addressing PR comments
1. stopGracefully now accepts TaskConfig as a param
Handling isRestoreOnRestart in stopGracefully for `AppenderatorDriverRealtimeIndexTask, RealtimeIndexTask, SeekableStreamIndexTask`
Changing tests to make TaskConfig param isRestoreOnRestart to true
* Replacing Math.random() with ThreadLocalRandom.current().nextDouble()
* Added java.lang.Math#random() in forbidden-apis.txt
* Minor change in the message - druid-forbidden-apis.txt
* Use multi-guava version friendly direct executor implementation
* Don't use a singleton
* Fix strict compliation complaints
* Copy Guava's DirectExecutor
* Fix javadoc
* Imports are the devil
* created seekablestream classes
* created seekablestreamsupervisor class
* first attempt to integrate kafa indexing service to use SeekableStream
* seekablestream bug fixes
* kafkarecordsupplier
* integrated kafka indexing service with seekablestream
* implemented resume/suspend and refactored some package names
* moved kinesis indexing service into core druid extensions
* merged some changes from kafka supervisor race condition
* integrated kinesis-indexing-service with seekablestream
* unite tests for kinesis-indexing-service
* various bug fixes for kinesis-indexing-service
* refactored kinesisindexingtask
* finished up more kinesis unit tests
* more bug fixes for kinesis-indexing-service
* finsihed refactoring kinesis unit tests
* removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions
* kinesis-indexing-service code cleanup and docs
* merge #6291
merge #6337
merge #6383
* added more docs and reordered methods
* fixd kinesis tests after merging master and added docs in seekablestream
* fix various things from pr comment
* improve recordsupplier and add unit tests
* migrated to aws-java-sdk-kinesis
* merge changes from master
* fix pom files and forbiddenapi checks
* checkpoint JavaType bug fix
* fix pom and stuff
* disable checkpointing in kinesis
* fix kinesis sequence number null in closed shard
* merge changes from master
* fixes for kinesis tasks
* capitalized <partitionType, sequenceType>
* removed abstract class loggers
* conform to guava api restrictions
* add docker for travis other modules test
* address comments
* improve RecordSupplier to supply records in batch
* fix strict compile issue
* add test scope for localstack dependency
* kinesis indexing task refactoring
* comments
* github comments
* minor fix
* removed unneeded readme
* fix deserialization bug
* fix various bugs
* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix
* minor changes to kinesis
* implement deaggregate for kinesis
* Merge remote-tracking branch 'upstream/master' into seekablestream
* fix kinesis offset discrepancy with kafka
* kinesis record supplier disable getPosition
* pr comments
* mock for kinesis tests and remove docker dependency for unit tests
* PR comments
* avg lag in kafkasupervisor #6587
* refacotred SequenceMetadata in taskRunners
* small fix
* more small fix
* recordsupplier resource leak
* revert .travis.yml formatting
* fix style
* kinesis docs
* doc part2
* more docs
* comments
* comments*2
* revert string replace changes
* comments
* teamcity
* comments part 1
* comments part 2
* comments part 3
* merge #6754
* fix injection binding
* comments
* KinesisRegion refactor
* comments part idk lol
* can't think of a commit msg anymore
* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner
* commmmmmmmmmments
* extra error handling in KinesisRecordSupplier getRecords
* comments
* quickfix
* typo
* oof
* make logs that are only useful for debugging be at debug level so log volume is much more chill
* info level messages for total merge buffer allocated/free
* more chill compaction logs
* FilteredRequestLogger: Fix start/stop, invalid delegate behavior.
Fixes two bugs:
1) FilteredRequestLogger did not start/stop the delegate.
2) FilteredRequestLogger would ignore an invalid delegate type, and
instead silently substitute the "noop" logger. This was due to a larger
problem with RequestLoggerProvider setup in general; the fix here is
to remove "defaultImpl" from the RequestLoggerProvider interface, and
instead have JsonConfigurator be responsible for creating the
default implementations. It is stricter about things than the old system
was, and is only willing to make a noop logger if it doesn't see any
request logger configs. Otherwise, it'll raise a provision error.
* Remove unneeded annotations.
* autosize processing buffers based on direct memory sizing
* remove oops, more test
* max 1gb autosize buffers, test, start of docs
* fix oops
* revert accidental change
* print buffer size in exception
* change the things
* remove AbstractResourceFilter.isApplicable because it is not, add tests for OverlordResource.doShutdown and OverlordResource.shutdownTasksForDatasource
* cleanup
* 1. added support for unused DateTime start parameter in getRecentlyFinishedTaskInfoSince method:
HeapMemoryTaskStorage.getRecentlyFinishedTaskInfoSince return the finished tasks by comparing TaskStuff.createdDate with the start time
2. added filtering by status complete to TaskStuff list stream in HeapMemoryTaskStorage.getNRecentlyFinishedTaskInfo method.
3. changed names of methods and parameters to present that public API method OverlordResource.getTasks return the list of completed tasks, which createdDate, not date of completion, belongs to the interval parameter.
* 1. added support for unused DateTime start parameter in getRecentlyFinishedTaskInfoSince method:
HeapMemoryTaskStorage.getRecentlyFinishedTaskInfoSince return the finished tasks by comparing TaskStuff.createdDate with the start time
2. added filtering by status complete to TaskStuff list stream in HeapMemoryTaskStorage.getNRecentlyFinishedTaskInfo method.
3. changed names of methods and parameters to present that public API method OverlordResource.getTasks return the list of completed tasks, which createdDate, not date of completion, belongs to the interval parameter.
* Fixed OverlordResourceTest to Support changed methods names
* Changed methods and parameters names to make them more obvious to understand.
* Changed String.replace() for the StringUtils.replace()(#6607)
* Fixed checkstyle error
* Add checkstyle rules about imports and empty lines between members
* Add suppressions
* Update Eclipse import order
* Add empty line
* Fix StatsDEmitter
* Prohibit some guava collection APIs and use JDK APIs directly
* reset files that changed by accident
* sort codestyle/druid-forbidden-apis.txt alphabetically
* 1. Mysql default transaction isolation is REPEATABLE_READ, treat it as READ_COMMITTED will reduce insert id conflict.
2. Add an index to 'dataSource used end' is work well for the most of scenarios(get recently segments), and it will speed up sync add pending segments in DB.
3. 'select and insert' is not need within transaction.
* Use TaskLockbox.doInCriticalSection instead of synchronized syntax to speed up insert pending segments.
* fix typo for NullPointerException
* Use NodeType enum instead of Strings
* Make NodeType constants uppercase
* Fix CommonCacheNotifier and NodeType/ServerType comments
* Reconsidering comment
* Fix import
* Add a comment to CommonCacheNotifier.NODE_TYPES
* Prevent failed KafkaConsumer creation from blocking overlord startup
* PR comments
* Fix random task ID length
* Adjust test timer
* Use Integer.SIZE
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
- Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
- Generified `ComplexMetricExtractor`
* Replace statusCode with status (#6333)
Also changed runnerStatusCode to runnerStatus to keep things consistent
* Add unit test
* Add status param to TaskStatusPlus
Revert to statusCode and runnerStatusCode
* Add additional status member to TaskStatusPlus
* Change TaskResponseObject to match overlord's response object
* Address PR comments
* address comments
* Add runtime exception after logging error
* Remove (deprecated)status member variable from TaskStatusPlus
* Minor change
* Add support targetCompactionSizeBytes for compactionTask
* fix test
* fix a bug in keepSegmentGranularity
* fix wrong noinspection comment
* address comments
* Adding licenses and enable apache-rat-plugi.
Change-Id: I4685a2d9f1e147855dba69329b286f2d5bee3c18
* restore the copywrite of demo_table and add it to the list of allowed ones
Change-Id: I2a9efde6f4b984bc1ac90483e90d98e71f818a14
* revirew comments
Change-Id: I0256c930b7f9a5bb09b44b5e7a149e6ec48cb0ca
* more fixup
Change-Id: I1355e8a2549e76cd44487abec142be79bec59de2
* align
Change-Id: I70bc47ecb577bdf6b91639dd91b6f5642aa6b02f
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'
* move overlord console status to own column in supervisor table so does not look like garbage
* spacing
* padding
* other kind of spacing
* fix rebase fail
* fix more better
* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests
* fix log
* resolves#5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask
* address review comments
* changes due to review
* merge fail
* Rename io.druid to org.apache.druid.
* Fix META-INF files and remove some benchmark results.
* MonitorsConfig update for metrics package migration.
* Reorder some dimensions in inner queries for some reason.
* Fix protobuf tests.
* Fix all inspection errors currently reported.
TeamCity builds on master are reporting inspection errors, possibly
because there was a while where it was not running due to the Apache
migration, and there was some drift.
* Fix one more location.
* Fix tests.
* Another fix.
* 'shutdownAllTasks' API for a dataSource
Change-Id: I30d14390457d39e0427d23a48f4f224223dc5777
* fix api path and return
Change-Id: Ib463f31ee2c4cb168cf2697f149be845b57c42e5
* optimize implementation
Change-Id: I50a8dcd44dd9d36c9ecbfa78e103eb9bff32eab9
* Fix three bugs with segment publishing.
1. In AppenderatorImpl: always use a unique path if requested, even if the segment
was already pushed. This is important because if we don't do this, it causes
the issue mentioned in #6124.
2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return
a "not published" result instead of throwing an exception, when there was one
metadata update failure, followed by some random exception. This is done by
resetting the AtomicBoolean that tracks what case we're in, each time the
callback runs.
3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false
publish result. Skip killing if we just got some exception. The reason for this
is that we want to avoid killing segments if they are in an unknown state.
Two other changes to clarify the contracts a bit and hopefully prevent future bugs:
1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it
more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result
must indicate that the publish _definitely_ did not happen. Unknown states must be
exceptions. This helps BaseAppenderatorDriver do the right thing.
* Remove javadoc-only import.
* Updates.
* Fix test.
* Fix tests.
* Cache: Add maxEntrySize config.
The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.
Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.
The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.
* Add CachePopulatorStats.
* Fix whitespace.
* Fix docs.
* Fix various tests.
* Add tests.
* Fix tests.
* Better tests
* Remove conflict markers.
* Fix licenses.
* Native parallel indexing without shuffle
* fix build
* fix ci
* fix ingestion without intervals
* fix retry
* fix retry
* add it test
* use chat handler
* fix build
* add docs
* fix ITUnionQueryTest
* fix failures
* disable metrics reporting
* working
* Fix split of static-s3 firehose
* Add endpoints to supervisor task and a unit test for endpoints
* increase timeout in test
* Added doc
* Address comments
* Fix overlapping locks
* address comments
* Fix static s3 firehose
* Fix test
* fix build
* fix test
* fix typo in docs
* add missing maxBytesInMemory to doc
* address comments
* fix race in test
* fix test
* Rename to ParallelIndexSupervisorTask
* fix teamcity
* address comments
* Fix license
* addressing comments
* addressing comments
* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator
* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner
* Add more javadocs
* use StringUtils.nonStrictFormat for logging
* fix typo and remove unused class
* fix tests
* change package
* fix strict build
* tmp
* Fix overlord api according to the recent change in master
* Fix it test
* Remove some unnecessary task storage internal APIs.
- Remove MetadataStorageActionHandler's getInactiveStatusesSince and getActiveEntriesWithStatus.
- Remove TaskStorage's getCreatedDateTimeAndDataSource.
- Remove TaskStorageQueryAdapter's getCreatedTime, and getCreatedDateAndDataSource.
- Migrated all callers to getActiveTaskInfo and getCompletedTaskInfo.
This has one side effect: since getActiveTaskInfo (new) warns and continues when it
sees unreadable tasks, but getActiveEntriesWithStatus threw an exception when it
encountered those, it means that after this patch bad tasks will be ignored when
syncing from metadata storage rather than causing an exception to be thrown.
IMO, this is an improvement, since the most likely reason for bad tasks is either:
- A new version introduced an additional validation, and a pre-existing task doesn't
pass it.
- You are rolling back from a newer version to an older version.
In both cases, I believe you would want to skip tasks that can't be deserialized,
rather than blocking overlord startup.
* Remove unused import.
* Fix formatting.
* Fix formatting.
* Various changes about druid-services module
* Patch improvements from reviewer
* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile
* Fix ArraysAsListWithZeroOrOneArgument
* Fix conflict
* Fix ToArrayCallWithZeroLengthArrayArgument
* Fix AliEqualsAvoidNull
* Remove blank line
* Remove unused import clauses
* Fix code style in TopNQueryRunnerTest
* Fix conflict
* Don't use Collections.singletonList when converting the type of array type
* Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase
* Roll back the latest commit
* Add java.io.File#toURL() into druid-forbidden-apis
* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord
* Add a new regexp element into stylecode xml file
* Fix style error for new regexp
* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING
* Fix style error for new regexp
* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile
* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR
* Add toArray(new Object[0]) regexp into checkstyle config file & fix them
* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it
* Add a comment for string equals regexp in checkstyle config
* Fix code format
* Add RedundantTypeArguments as ERROR level inspection
* Fix cannot resolve symbol datasource