* upgrade Airline to Airline 2
https://github.com/airlift/airline is no longer maintained, updating to
https://github.com/rvesse/airline (Airline 2) to use an actively
maintained version, while minimizing breaking changes.
Note, this is a backwards incompatible change, and extensions relying on
the CliCommandCreator extension point will also need to be updated.
* fix dependency checks where jakarta.inject is now resolved first instead
of javax.inject, due to Airline 2 using jakarta
Problem:
- When a kinesis stream is resharded, the original shards are closed.
Any intermediate shard created in the process is eventually closed as well.
- If a shard is closed before any record is put into it, it can be safely ignored for ingestion.
- It is expensive to determine if a closed shard is empty, since it requires a call to the Kinesis cluster.
Changes:
- Maintain a cache of closed empty and closed non-empty shards in `KinesisSupervisor`
- Add config `skipIngorableShards` to `KinesisSupervisorTuningConfig`
- The caches are used and updated only when `skipIgnorableShards = true`
This fixes intermittent, spurious failures that we've observed in
the Kinesis sharding integration tests due to Kinesis taking
longer than the code expected to start a sharding operation. The
method that's changed is part of the integration test suite and
only used by the test cases that we've seen are flaky.
Prior to this change, the tests expected a sharding operation to
start in 9 seconds (30 retries * 300ms delay/retry). This change
bumps the number of retries to 100, giving Kinesis 30 seconds to
start the sharding.
This PR also makes a small, clarifying change to the condition
used to determine if sharding has started. Instead of checking if
the number of shards has increased (which was technically correct
even if the test is reducing the number of shards due to a Kinesis
implementation detail), we now just check if the shard count has
changed.
Makes kinesis ingestion resilient to `LimitExceededException` caused by resharding.
Replace `describeStream` with `listShards` (recommended) to get shard related info.
`describeStream` has a limit (100) to the number of shards returned per call and a low default TPS limit of 10.
`listShards` returns the info for at most 1000 shards and has a higher TPS limit of 100 as well.
Key changed/added classes in this PR
* `KinesisRecordSupplier`
* `KinesisAdminClient`
* Use Druid's extension loading for integration test instead of maven
* fix maven command
* override config path
* load input format extensions and kafka by default; add prepopulated-data group
* all docker-composes are overridable
* fix s3 configs
* override config for all
* fix docker_compose_args
* fix security tests
* turn off debug logs for overlord api calls
* clean up stuff
* revert docker-compose.yml
* fix override config for query error test; fix circular dependency in docker compose
* add back some dependencies in docker compose
* new maven profile for integration test
* example file filter
* Add jsonPath functions support
* Add jsonPath function test for Avro
* Add jsonPath function length() to Orc
* Add jsonPath function length() to Parquet
* Add more tests to ORC format
* update doc
* Fix exception during ingestion
* Add IT test case
* Revert "Fix exception during ingestion"
This reverts commit 5a5484b9ea.
* update IT test case
* Add 'keys()'
* Commit IT test case
* Fix UT
* Make nodeRole available during binding; add support for dynamic registration of DruidService
* fix checkstyle and test
* fix customRole test
* address comments
* add more javadoc
* Code cleanup from query profile project
* Fix spelling errors
* Fix Javadoc formatting
* Abstract out repeated test code
* Reuse constants in place of some string literals
* Fix up some parameterized types
* Reduce warnings reported by Eclipse
* Reverted change due to lack of tests
* Use 404 instead of 400
* Use 404 instead of 400
* Add UT test cases
* Add IT testcases
* add UT for task resource filter
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Using org.testing.Assert instead of org.junit.Assert
* Resolve comments and fix test
* Fix test
* Fix tests
* Resolve comments
* add impl
* fix checkstyle
* add test
* add test
* add unit tests
* fix unit tests
* fix unit tests
* fix unit tests
* add IT
* add IT
* add comments
* fix spelling
* Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list.
Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input
files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like
ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the
4-segment case will always happen. Providing the file list in a specific order ensures that this
will happen as expected by the tests.
I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so
changing it to a List was the simplest way to achieve the consistent ordering. I think it will
also make the behavior make more sense if someone does specify the same input file multiple
times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This
is consistent with the behavior of other input sources like S3, GCS, HTTP.
* Sort files in LocalFirehoseFactory.
* Use a simple class to sanitize sanitizable errors and log them
The purpose of this is to sanitize JDBC errors, but can sanitize other errors
if they implement SanitizableError Interface
add a class to log errors and sanitize them
added a simple test that tests out that the error gets sanitized
add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy
* return less information as part of too many connections, and instead only log specific details
This is so an end user gets relevant information but not too much info since they might now how
many brokers they have
* return only runtime exceptions
added new error types that need to be sanitized
also sanitize deprecated and unsupported exceptions.
* dont reqrewite exceptions unless necessary for checked exceptions
add docs
avoid blanket turning all exceptions into runtime exceptions
* address comments, to fix up docs.
add more javadocs
add support UOE sanitization
* use try catch instead and sanitize at public methods
* checkstyle fixes
* throw noSuchStatement and NoSuchConnection as Avatica is affected by those
* address comments. move log error back to druid meta
clean up bad formatting and commented code. add missed catch for NoSuchStatementException
clean up comments for error handler and add comment explainging not wanting to santize avatica exceptions
* alter test to reflect new error message
* revert ColumnAnalysis type, add typeSignature and use it for DruidSchema
* review stuffs
* maybe null
* better maybe null
* Update docs/querying/segmentmetadataquery.md
* Update docs/querying/segmentmetadataquery.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* fix null right
* sad
* oops
* Update batch_hadoop_queries.json
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
* Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)"
This reverts commit f2d6100124.
* Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)"
This reverts commit 6779c4652d.
* Fix docs for the reverted commits
* Fix and restore deleted tests
* Fix and restore SystemSchemaTest
* Increment retry count to add more time for tests to pass
* Re-enable ITHttpInputSourceTest
* Restore original count
* This test is about input source, hash partitioning takes longer and not required thus changing to dynamic
* Further simplify by removing sketches
Follow up PR for #11680
Description
Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE
authorization even if they are purely informative.
Changes
Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks"
Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history
Check Datasource for all Indexing Task APIs
* Add handoff wait time to ingestion stats report. Refactor some code for batch handoff
* fix checkstyle
* Add assertion to AbstractITBatchIndexTask to make sure report reflects wait for segments happened
* add docs to the task reports section of doc
* initial work
* reduce lock in sqlLifecycle
* Integration test for sql canceling
* javadoc, cleanup, more tests
* log level to debug
* fix test
* checkstyle
* fix flaky test; address comments
* rowTransformer
* cancelled state
* use lock
* explode instead of noop
* oops
* unused import
* less aggressive with state
* fix calcite charset
* don't emit metrics when you are not authorized
* Configurable maxStreamLength for doubles sketches
* fix equals/hashcode and it test failure
* fix test
* fix it test
* benchmark
* doc
* grouping key
* fix comment
* dependency check
* Update docs/development/extensions-core/datasketches-quantiles.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Fixes#11297.
Description
Description and design in the proposal #11297
Key changed/added classes in this PR
*DataSegmentPusher
*ShuffleClient
*PartitionStat
*PartitionLocation
*IntermediaryDataManager
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* fix test
* fix test
This PR refactors the code for QueryRunnerFactory#mergeRunners to accept a new interface called QueryProcessingPool instead of ExecutorService for concurrent execution of query runners. This interface will let custom extensions inject their own implementation for deciding which query-runner to prioritize first. The default implementation is the same as today that takes the priority of query into account. QueryProcessingPool can also be used as a regular executor service. It has a dedicated method for accepting query execution work so implementations can differentiate between regular async tasks and query execution tasks. This dedicated method also passes the QueryRunner object as part of the task information. This hook will let custom extensions carry any state from QuerySegmentWalker to QueryProcessingPool#mergeRunners which is not possible currently.
* Do not stop retrying when an exception is encountered. Save & propagate last exception if retry count is exceeded.
* Add one more log message to help with debugging
* Limit schema registry heap to attempt to control OOMs
* Add ability to wait for segment availability for batch jobs
* IT updates
* fix queries in legacy hadoop IT
* Fix broken indexing integration tests
* address an lgtm flag
* spell checker still flagging for hadoop doc. adding under that file header too
* fix compaction IT
* Updates to wait for availability method
* improve unit testing for patch
* fix bad indentation
* refactor waitForSegmentAvailability
* Fixes based off of review comments
* cleanup to get compile after merging with master
* fix failing test after previous logic update
* add back code that must have gotten deleted during conflict resolution
* update some logging code
* fixes to get compilation working after merge with master
* reset interrupt flag in catch block after code review pointed it out
* small changes following self-review
* fixup some issues brought on by merge with master
* small changes after review
* cleanup a little bit after merge with master
* Fix potential resource leak in AbstractBatchIndexTask
* syntax fix
* Add a Compcation TuningConfig type
* add docs stipulating the lack of support by Compaction tasks for the new config
* Fixup compilation errors after merge with master
* Remove erreneous newline
* DruidInputSource: Fix issues in column projection, timestamp handling.
DruidInputSource, DruidSegmentReader changes:
1) Remove "dimensions" and "metrics". They are not necessary, because we
can compute which columns we need to read based on what is going to
be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
the timestamp of the returned InputRows was set to the `__time` column
of the input datasource.
(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.
(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.
(1) and (3) are breaking changes.
Web console changes:
1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
compatibility with the new behavior.
Other changes:
1) Add ColumnsFilter, a new class that allows input readers to determine
which columns they need to read. Currently, it's only used by the
DruidInputSource, but it could be used by other columnar input sources
in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.
* Various fixups.
* Uncomment incorrectly commented lines.
* Move TransformSpecTest to the proper module.
* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.
* Fix.
* Fix build.
* Checkstyle.
* Misc fixes.
* Fix test.
* Move config.
* Fix imports.
* Fixup.
* Fix ShuffleResourceTest.
* Add import.
* Smarter exclusions.
* Fixes based on tests.
Also, add TIME_COLUMN constant in the web console.
* Adjustments for tests.
* Reorder test data.
* Update docs.
* Update docs to say Druid 0.22.0 instead of 0.21.0.
* Fix test.
* Fix ITAutoCompactionTest.
* Changes from review & from merging.
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* address comments
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* address comments
* address comments
* address comments
* address comments
* address comments
* Suppress logging for some exceptions to reduce excessive stack trace messages
Signed-off-by: frank chen <frank.chen021@outlook.com>
* log message for channel disconnected exception
Signed-off-by: frank chen <frank.chen021@outlook.com>
* druid task auto scale based on kafka lag
* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig
* druid task auto scale based on kafka lag
* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig
* test dynamic auto scale done
* auto scale tasks tested on prd cluster
* auto scale tasks tested on prd cluster
* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20
* rename test fiel function
* change codes and add docs based on capistrant reviewed
* midify test docs
* modify docs
* modify docs
* modify docs
* merge from master
* Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable.
* fix ci failed
* revert msic.xml
* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable
* add more uts
* fix inner class check
* add IT for kafka ingestion with autoscaler
* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler
* review change
* code review
* remove unused imports
* fix NLP
* fix docs and UTs
* revert misc.xml
* use jackson to build autoScaleConfig with default values
* add uts
* use jackson to init AutoScalerConfig in IOConfig instead of Map<>
* autoscalerConfig interface and provide a defaultAutoScalerConfig
* modify uts
* modify docs
* fix checkstyle
* revert misc.xml
* modify uts
* reviewed code change
* reviewed code change
* code reviewed
* code review
* log changed
* do StringUtils.encodeForFormat when create allocationExec
* code review && limit taskCountMax to partitionNumbers
* modify docs
* code review
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
* add query granularity to compaction task
* fix checkstyle
* fix checkstyle
* fix test
* fix test
* add tests
* fix test
* fix test
* cleanup
* rename class
* fix test
* fix test
* add test
* fix test
* Keep query granularity of compacted segments after compaction
* Protect against null isRollup
* Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment
* Make sure that NONE is also included when comparing for the finer granularity
* Update integration test check for segment size due to query granularity propagation affecting size
* Minor code cleanup
* Added functional test to verify queryGranlarity after compaction
* Minor style fix
* Update unit tests
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* resolve conflict
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* fix tests
* fix more tests
* fix checkstyle
* add unit tests
* fix checkstyle
* fix checkstyle
* fix checkstyle
* add unit tests
* add integration tests
* fix checkstyle
* fix checkstyle
* fix failing tests
* address comments
* address comments
* fix tests
* fix tests
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* Tidy up query error codes
* fix tests
* Restore query exception type in JsonParserIterator
* address review comments; add a comment explaining the ugly switch
* fix test
* Multiphase merge for IndexMergerV9
* JSON fix
* Cleanup temp files
* Docs
* Address logging and add IT
* Fix spelling and test unloader datasource name
* Make some additions to IT suite to make Hadoop related testing more understandable
* add start.hadoop.docker to mvn arg tips in doc
* fix issues preventing ITIndexHadoopTest from running in local mode
* integration test for coordinator and overlord leadership, added sys.servers is_leader column
* docs
* remove not needed
* fix comments
* fix compile heh
* oof
* revert unintended
* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster
* fixes
* style
* dang
* heh
* scripts are hard
* fix spelling
* fix thing that must not matter since was already wrong ip, log when test fails
* needs more heap
* fix merge
* less aggro
* fix race condition with DruidSchema tables and dataSourcesNeedingRebuild
* rework to see if it passes analysis
* more better
* maybe this
* re-arrange and comments
* Fixes and tests related to the Indexer process.
Three bugs fixed:
1) Indexers would not announce themselves as segment servers if they
did not have storage locations defined. This used to work, but was
broken in #9971. Fixed this by adding an "isSegmentServer" method
to ServerType and updating SegmentLoadDropHandler to always announce
if this method returns true.
2) Certain batch task types were written in a way that assumed "isReady"
would be called before "run", which is not guaranteed. In particular,
they relied on it in order to initialize "taskLockHelper". Fixed this
by updating AbstractBatchIndexTask to ensure "isReady" is called
before "run" for these tasks.
3) UnifiedIndexerAppenderatorsManager did not properly handle complex
datasources. Introduced DataSourceAnalysis in order to fix this.
Test changes:
1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
Indexer instead of a MiddleManager.
2) Introduce a "USE_INDEXER" environment variable that determines if
docker-compose will start up an Indexer or a MiddleManager.
3) Duplicate all the jdk8 tests and run them in both MiddleManager and
Indexer mode.
4) Various adjustments to encourage fail-fast errors in the Docker
build scripts.
5) Various adjustments to speed up integration tests and reduce memory
usage.
6) Add another Mac-specific approach to determining a machine's own IP.
This was useful on my development machine.
7) Update segment-count check in ITCompactionTaskTest to eliminate a
race condition (it was looking for 6 segments, which only exist
together briefly, until the older 4 are marked unused).
Javadoc updates:
1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
that make it clear when taskLockHelper will be initialized as a side
effect. (Related to the second bug above.)
2) Task: Clarified that "isReady" is not guaranteed to be called before
"run". It was already implied, but now it's explicit.
3) ZkCoordinator: Clarified deprecation message.
4) DataSegmentServerAnnouncer: Clarified deprecation message.
* Fix stop_cluster script.
* Fix sanity check in script.
* Fix hashbang lines.
* Test and doc adjustments.
* Additional tests, and adjustments for tests.
* Split ITs back out.
* Revert change to druid_coordinator_period_indexingPeriod.
* Set Indexer capacity to match MM.
* Bump up Historical memory.
* Bump down coordinator, overlord memory.
* Bump up Broker memory.
1) Accelerate coordinator runs to speed up segment load after publishing.
2) For streaming ingestion tests, Instead of waiting 3 minutes for data to
load, wait until the expected number of rows is loaded.
Also updates segment-count check in ITCompactionTaskTest to eliminate a
race condition (it was looking for 6 segments, which only exist together
briefly, until the older 4 are marked unused).
* fix flaky IT Compaction test
* fix flaky IT Compaction test
* test
* test
* test
* test
* Fix compaction integration test CI timeout
* address comments
* test
* test
* Add print logs
* add error msg
* add taskId to logging
* fix JSON format
* Change all columns in sys segments to be JSON
* Change all columns in sys segments to be JSON
* add tests
* fix failing tests
* fix failing tests
* Proposed changes for making joins cacheable
* Add unit tests
* Fix tests
* simplify logic
* Pull empty byte array logic out of CachingQueryRunner
* remove useless null check
* Minor refactor
* Fix tests
* Fix segment caching on Broker
* Move join cache key computation in Broker
Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient
* Fix compilation
* Review comments
* Add more tests
* Fix inspection errors
* Pushed condition analysis to JoinableFactory
* review comments
* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable
* Remove commented lines
* Fix populateCache
* Disable caching for selective datasources
Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided
* query context
* fix tests; add more test
* javadoc
* docs and more tests
* remove default and hadoop tests
* consistent name and fix javadoc
* spelling and field name
* default function for partitionsSpec
* other comments
* address comments
* fix tests and spelling
* test
* doc
* Working
* add test
* doc
* fix test
* split other integration test
* exclude other-index from other tests
* doc anchor fix
* adjust task slots and number of merge tasks
* spell check
* reduce maxNumConcurrentSubTasks to 1
* maxNumConcurrentSubtasks for range partitinoing
* reduce memory for historical
* change group name
* Add IndexMergerRollupTest
This changelist adds a test to merge indexes with StringFirst/StringLast aggregator.
* Fix StringFirstAggregateCombiner/StringLastAggregateCombiner
The segment-level type for stringFirst/stringLast is SerializablePairLongString,
not String. This changelist fixes it.
* Fix EarliestLatestAnySqlAggregator to handle COMPLEX type
This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX
type as an operand. For its return type, we set it to VARCHAR, since
COMPLEX column is only generated by stringFirst/stringLast during ingestion
rollup.
* Return value with smaller timestamp in StringFirstAggregatorFactory.combine function
* Add integration tests for stringFirst/stringLast during ingestion
* Use one EarliestLatestReturnTypeInference instance
Co-authored-by: Joy Kent <joy@automonic.ai>