* Add ability to wait for segment availability for batch jobs
* IT updates
* fix queries in legacy hadoop IT
* Fix broken indexing integration tests
* address an lgtm flag
* spell checker still flagging for hadoop doc. adding under that file header too
* fix compaction IT
* Updates to wait for availability method
* improve unit testing for patch
* fix bad indentation
* refactor waitForSegmentAvailability
* Fixes based off of review comments
* cleanup to get compile after merging with master
* fix failing test after previous logic update
* add back code that must have gotten deleted during conflict resolution
* update some logging code
* fixes to get compilation working after merge with master
* reset interrupt flag in catch block after code review pointed it out
* small changes following self-review
* fixup some issues brought on by merge with master
* small changes after review
* cleanup a little bit after merge with master
* Fix potential resource leak in AbstractBatchIndexTask
* syntax fix
* Add a Compcation TuningConfig type
* add docs stipulating the lack of support by Compaction tasks for the new config
* Fixup compilation errors after merge with master
* Remove erreneous newline
* DruidInputSource: Fix issues in column projection, timestamp handling.
DruidInputSource, DruidSegmentReader changes:
1) Remove "dimensions" and "metrics". They are not necessary, because we
can compute which columns we need to read based on what is going to
be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
the timestamp of the returned InputRows was set to the `__time` column
of the input datasource.
(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.
(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.
(1) and (3) are breaking changes.
Web console changes:
1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
compatibility with the new behavior.
Other changes:
1) Add ColumnsFilter, a new class that allows input readers to determine
which columns they need to read. Currently, it's only used by the
DruidInputSource, but it could be used by other columnar input sources
in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.
* Various fixups.
* Uncomment incorrectly commented lines.
* Move TransformSpecTest to the proper module.
* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.
* Fix.
* Fix build.
* Checkstyle.
* Misc fixes.
* Fix test.
* Move config.
* Fix imports.
* Fixup.
* Fix ShuffleResourceTest.
* Add import.
* Smarter exclusions.
* Fixes based on tests.
Also, add TIME_COLUMN constant in the web console.
* Adjustments for tests.
* Reorder test data.
* Update docs.
* Update docs to say Druid 0.22.0 instead of 0.21.0.
* Fix test.
* Fix ITAutoCompactionTest.
* Changes from review & from merging.
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap
* address comments
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* address comments
* address comments
* address comments
* address comments
* address comments
* Suppress logging for some exceptions to reduce excessive stack trace messages
Signed-off-by: frank chen <frank.chen021@outlook.com>
* log message for channel disconnected exception
Signed-off-by: frank chen <frank.chen021@outlook.com>
* druid task auto scale based on kafka lag
* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig
* druid task auto scale based on kafka lag
* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig
* test dynamic auto scale done
* auto scale tasks tested on prd cluster
* auto scale tasks tested on prd cluster
* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20
* rename test fiel function
* change codes and add docs based on capistrant reviewed
* midify test docs
* modify docs
* modify docs
* modify docs
* merge from master
* Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable.
* fix ci failed
* revert msic.xml
* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable
* add more uts
* fix inner class check
* add IT for kafka ingestion with autoscaler
* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler
* review change
* code review
* remove unused imports
* fix NLP
* fix docs and UTs
* revert misc.xml
* use jackson to build autoScaleConfig with default values
* add uts
* use jackson to init AutoScalerConfig in IOConfig instead of Map<>
* autoscalerConfig interface and provide a defaultAutoScalerConfig
* modify uts
* modify docs
* fix checkstyle
* revert misc.xml
* modify uts
* reviewed code change
* reviewed code change
* code reviewed
* code review
* log changed
* do StringUtils.encodeForFormat when create allocationExec
* code review && limit taskCountMax to partitionNumbers
* modify docs
* code review
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
* add query granularity to compaction task
* fix checkstyle
* fix checkstyle
* fix test
* fix test
* add tests
* fix test
* fix test
* cleanup
* rename class
* fix test
* fix test
* add test
* fix test
* Keep query granularity of compacted segments after compaction
* Protect against null isRollup
* Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment
* Make sure that NONE is also included when comparing for the finer granularity
* Update integration test check for segment size due to query granularity propagation affecting size
* Minor code cleanup
* Added functional test to verify queryGranlarity after compaction
* Minor style fix
* Update unit tests
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* resolve conflict
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* fix tests
* fix more tests
* fix checkstyle
* add unit tests
* fix checkstyle
* fix checkstyle
* fix checkstyle
* add unit tests
* add integration tests
* fix checkstyle
* fix checkstyle
* fix failing tests
* address comments
* address comments
* fix tests
* fix tests
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* Tidy up query error codes
* fix tests
* Restore query exception type in JsonParserIterator
* address review comments; add a comment explaining the ugly switch
* fix test
* Multiphase merge for IndexMergerV9
* JSON fix
* Cleanup temp files
* Docs
* Address logging and add IT
* Fix spelling and test unloader datasource name
* Make some additions to IT suite to make Hadoop related testing more understandable
* add start.hadoop.docker to mvn arg tips in doc
* fix issues preventing ITIndexHadoopTest from running in local mode
* integration test for coordinator and overlord leadership, added sys.servers is_leader column
* docs
* remove not needed
* fix comments
* fix compile heh
* oof
* revert unintended
* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster
* fixes
* style
* dang
* heh
* scripts are hard
* fix spelling
* fix thing that must not matter since was already wrong ip, log when test fails
* needs more heap
* fix merge
* less aggro
* fix race condition with DruidSchema tables and dataSourcesNeedingRebuild
* rework to see if it passes analysis
* more better
* maybe this
* re-arrange and comments
* Fixes and tests related to the Indexer process.
Three bugs fixed:
1) Indexers would not announce themselves as segment servers if they
did not have storage locations defined. This used to work, but was
broken in #9971. Fixed this by adding an "isSegmentServer" method
to ServerType and updating SegmentLoadDropHandler to always announce
if this method returns true.
2) Certain batch task types were written in a way that assumed "isReady"
would be called before "run", which is not guaranteed. In particular,
they relied on it in order to initialize "taskLockHelper". Fixed this
by updating AbstractBatchIndexTask to ensure "isReady" is called
before "run" for these tasks.
3) UnifiedIndexerAppenderatorsManager did not properly handle complex
datasources. Introduced DataSourceAnalysis in order to fix this.
Test changes:
1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
Indexer instead of a MiddleManager.
2) Introduce a "USE_INDEXER" environment variable that determines if
docker-compose will start up an Indexer or a MiddleManager.
3) Duplicate all the jdk8 tests and run them in both MiddleManager and
Indexer mode.
4) Various adjustments to encourage fail-fast errors in the Docker
build scripts.
5) Various adjustments to speed up integration tests and reduce memory
usage.
6) Add another Mac-specific approach to determining a machine's own IP.
This was useful on my development machine.
7) Update segment-count check in ITCompactionTaskTest to eliminate a
race condition (it was looking for 6 segments, which only exist
together briefly, until the older 4 are marked unused).
Javadoc updates:
1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
that make it clear when taskLockHelper will be initialized as a side
effect. (Related to the second bug above.)
2) Task: Clarified that "isReady" is not guaranteed to be called before
"run". It was already implied, but now it's explicit.
3) ZkCoordinator: Clarified deprecation message.
4) DataSegmentServerAnnouncer: Clarified deprecation message.
* Fix stop_cluster script.
* Fix sanity check in script.
* Fix hashbang lines.
* Test and doc adjustments.
* Additional tests, and adjustments for tests.
* Split ITs back out.
* Revert change to druid_coordinator_period_indexingPeriod.
* Set Indexer capacity to match MM.
* Bump up Historical memory.
* Bump down coordinator, overlord memory.
* Bump up Broker memory.
1) Accelerate coordinator runs to speed up segment load after publishing.
2) For streaming ingestion tests, Instead of waiting 3 minutes for data to
load, wait until the expected number of rows is loaded.
Also updates segment-count check in ITCompactionTaskTest to eliminate a
race condition (it was looking for 6 segments, which only exist together
briefly, until the older 4 are marked unused).
* fix flaky IT Compaction test
* fix flaky IT Compaction test
* test
* test
* test
* test
* Fix compaction integration test CI timeout
* address comments
* test
* test
* Add print logs
* add error msg
* add taskId to logging
* fix JSON format
* Change all columns in sys segments to be JSON
* Change all columns in sys segments to be JSON
* add tests
* fix failing tests
* fix failing tests
* Proposed changes for making joins cacheable
* Add unit tests
* Fix tests
* simplify logic
* Pull empty byte array logic out of CachingQueryRunner
* remove useless null check
* Minor refactor
* Fix tests
* Fix segment caching on Broker
* Move join cache key computation in Broker
Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient
* Fix compilation
* Review comments
* Add more tests
* Fix inspection errors
* Pushed condition analysis to JoinableFactory
* review comments
* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable
* Remove commented lines
* Fix populateCache
* Disable caching for selective datasources
Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided
* query context
* fix tests; add more test
* javadoc
* docs and more tests
* remove default and hadoop tests
* consistent name and fix javadoc
* spelling and field name
* default function for partitionsSpec
* other comments
* address comments
* fix tests and spelling
* test
* doc
* Working
* add test
* doc
* fix test
* split other integration test
* exclude other-index from other tests
* doc anchor fix
* adjust task slots and number of merge tasks
* spell check
* reduce maxNumConcurrentSubTasks to 1
* maxNumConcurrentSubtasks for range partitinoing
* reduce memory for historical
* change group name