* Allow only HTTP and HTTPS protocols for the HTTP inputSource
* rename
* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* fix http firehose and update doc
* HDFS inputSource
* add configs for allowed protocols
* fix checkstyle and doc
* more checkstyle
* remove stale doc
* remove more doc
* Apply doc suggestions from code review
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* update hdfs address in docs
* fix test
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* druid task auto scale based on kafka lag
* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig
* test dynamic auto scale done
* auto scale tasks tested on prd cluster
* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20
* rename test file function
* change code and add docs based on capistrant's review
* modify test docs
* modify docs
* modify docs
* modify docs
* merge from master
* Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable.
* fix ci failed
* revert misc.xml
* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable
* add more uts
* fix inner class check
* add IT for kafka ingestion with autoscaler
* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler
* review change
* code review
* remove unused imports
* fix NLP
* fix docs and UTs
* revert misc.xml
* use jackson to build autoScaleConfig with default values
* add uts
* use jackson to init AutoScalerConfig in IOConfig instead of Map<>
* autoscalerConfig interface and provide a defaultAutoScalerConfig
* modify uts
* modify docs
* fix checkstyle
* revert misc.xml
* modify uts
* reviewed code change
* reviewed code change
* code reviewed
* code review
* log changed
* do StringUtils.encodeForFormat when create allocationExec
* code review && limit taskCountMax to partitionNumbers
* modify docs
* code review
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
* where filter left first draft
* Revert changes in calcite test
* Refactor a bit
* Fixing the Tests
* Changes
* Adding tests
* Add tests for correlated queries
* Add comment
* Fix typos
* add query granularity to compaction task
* fix checkstyle
* fix checkstyle
* fix test
* fix test
* add tests
* fix test
* fix test
* cleanup
* rename class
* fix test
* fix test
* add test
* fix test
* Granularity: Introduce primitive-typed bucketStart, increment methods.
Saves creation of unnecessary DateTime objects in timestamp_floor and
timestamp_ceil expressions.
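
A rough illustration of the idea above, not Druid's actual implementation: a fixed-duration granularity can compute bucket boundaries directly on epoch milliseconds, so no intermediate DateTime objects are needed. The class name `FixedDurationBuckets` is hypothetical, and the sketch assumes a fixed-length, UTC-aligned bucket (no time zones or calendar periods).

```java
// Hypothetical sketch: primitive-typed bucketing for a fixed-duration granularity.
// Names are illustrative and are not taken from the Druid codebase.
final class FixedDurationBuckets
{
  private final long durationMillis;

  FixedDurationBuckets(long durationMillis)
  {
    if (durationMillis <= 0) {
      throw new IllegalArgumentException("durationMillis must be positive");
    }
    this.durationMillis = durationMillis;
  }

  // Start of the bucket containing the given epoch-millis timestamp.
  // Math.floorMod keeps the result correct for timestamps before the epoch.
  long bucketStart(long timeMillis)
  {
    return timeMillis - Math.floorMod(timeMillis, durationMillis);
  }

  // Start of the bucket that follows the one beginning at bucketStartMillis.
  long increment(long bucketStartMillis)
  {
    return bucketStartMillis + durationMillis;
  }

  public static void main(String[] args)
  {
    FixedDurationBuckets hourly = new FixedDurationBuckets(3_600_000L);
    long start = hourly.bucketStart(3_600_001L);
    System.out.println(start);                   // prints 3600000
    System.out.println(hourly.increment(start)); // prints 7200000
  }
}
```

Working on `long` values keeps the hot path free of allocation, which is the saving the commit message describes for timestamp_floor and timestamp_ceil.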
* Fix style.
* Amp up the test coverage.
* Support segmentGranularity for auto-compaction
* resolve conflict
* fix tests
* fix more tests
* fix checkstyle
* add unit tests
* fix checkstyle
* fix checkstyle
* fix checkstyle
* add unit tests
* add integration tests
* fix checkstyle
* fix checkstyle
* fix failing tests
* address comments
* address comments
* fix tests
* fix tests
* fix test
* before i leaped i should've seen, the view from halfway down
* fixes
* fixes, more test
* rename
* fix style
* further refactoring
* review stuffs
* rename
* more javadoc and comments
* add offsetFetchPeriod to kinesis ingestion doc
* Remove jackson dependencies from extensions
* Use fixed delay for lag collection
* Metrics reset after finishing processing
* comments
* Broaden the list of exceptions to retry for
* Unit tests
* Add more tests
* Refactoring
* re-order metrics
* Doc suggestions
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* Add tests
* Prevent interval materialization for UniformGranularitySpec inside the overlord
* Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval>
* Javadoc update, respect inputIntervals contract
* Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec
* Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity
* Fix Travis style & other checks
* Refactor TreeSet to facilitate re-use in UniformGranularitySpec
* Make sure intervals are unique when there is no segment granularity
* Style/bugspot fixes...
* More travis checks
* Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method
* Style & PR feedback
* Fixed failing test
* Fixed bug in IntervalsByGranularity iterator that caused it to return repeated elements (see added unit tests that were broken before this change)
* Refactor so that we can get the condensed buckets without materializing the intervals
* Get rid of GranularitySpec::condensedInputIntervals ... not needed
* Travis failures fixes
* Travis checkstyle fix
* Edited/added javadoc comments and a method name (code review feedback)
* Fixed jacoco coverage by moving class and adding more coverage
* Avoid materializing the condensed intervals when locking
* Deal with overlapping intervals
* Remove code and use library code instead
* Refactor intervals by granularity using the FluentIterable, add sanity checks
* Change !hasNext() to inputIntervals().isEmpty()
* Remove redundant lambda
* Use materialized intervals here since this is outside the overlord (for performance)
* Name refactor to reflect the fact that bucket intervals are sorted.
* Style fixes
* Removed redundant method and have condensedIntervalIterator throw IAE when an element is null, for consistency with other methods in this class (a null interval also makes no sense when condensing)
* Remove forbidden api
* Move helper class inside common base class to reduce public space pollution
* Fix byte calculation for maxBytesInMemory to take into account Sink/Hydrant object overhead
* fix checkstyle
* fix test
* fix test
* add log
* address comments
* fix checkstyle
* fix checkstyle
* add config to skip overhead memory calculation
* add test for the skipBytesInMemoryOverheadCheck config
* add docs
* fix checkstyle
* fix checkstyle
* fix spelling
* address comments
* fix travis
* address comments
* ready to test
* tested on dev cluster
* tested
* code review
* add UTs
* add UTs
* ut passed
* ut passed
* optimize imports
* done
* done
* fix checkstyle
* modify uts
* modify logs
* changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment
* merge from master
* modify import orders
* merge from master
* merge from master
* modify logs
* modify docs
* modify logs to rerun ci
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
* Add a config for monitorScheduler type
* check interrupted
* null check
* do not schedule monitor if the previous one is still running
* checkstyle
* clean up names
* change default back to basic
* fix test
* Tidy up query error codes
* fix tests
* Restore query exception type in JsonParserIterator
* address review comments; add a comment explaining the ugly switch
* fix test
* allow the LogUsedSegments duty to be skipped
* Fixes for TravisCI coverage checks and documentation spell checking
* parameterize DruidCoordinatorTest in order to achieve coverage
* update config name to remove duty ref and improve documentation
* refine documentation for new config with reviewer advice
* add default column to docs for new config
* remove legacy code in LogUsedSegments and remove config to disable duty
* fix makeHistoricalManagementDuties now that the returned list is always the same
* Coordinator Dynamic Config changes to ease upgrading with new config value
* change a log to debug level following review
* changes based on review feedback
* fix checkstyle
* Remove redundant IncrementalIndex.Builder
* Parametrize incremental index tests and benchmarks
- Reveal and fix a bug in OffheapIncrementalIndex
* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
* Fix Intellij errors: declared exception is never thrown
* Add documentation and validate before closing objects on tearDown.
* Add documentation to OffheapIncrementalIndexTestSpec
* Doc corrections and minor changes.
* Add logging for generated rows.
* Refactor new tests/benchmarks.
* Improve IncrementalIndexCreator documentation
* Add required tests for DataGenerator
* Revert "rollupOpportunity" to be a string
* Multiphase merge for IndexMergerV9
* JSON fix
* Cleanup temp files
* Docs
* Address logging and add IT
* Fix spelling and test unloader datasource name
* dynamic coord config adding more balancing control
add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This
config caps the number of segments that are iterated over when selecting
a segment to move. The default value combined with current balancing
strategies will still iterate over all provided segments. However,
setting this value to something > 0 will cap the number of segments
visited. This could make sense in cases where a cluster has a very large
number of segments and the admins prefer fewer iterations over a thorough
consideration of all segments provided.
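
The capping behavior described above can be sketched as a reservoir-style pick that stops after a fixed number of candidates. This is an illustrative sketch only, not the Druid ReservoirSegmentSampler: the class and method names are hypothetical, and treating a value of 0 or less as "no cap" is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a capped reservoir pick; names are illustrative only.
final class CappedReservoirPick
{
  // Picks one element uniformly at random from the first maxToConsider
  // elements. A value <= 0 is treated here as "no cap" (an assumption).
  static <T> T pick(Iterable<T> items, int maxToConsider, Random random)
  {
    T chosen = null;
    int seen = 0;
    Iterator<T> it = items.iterator();
    while (it.hasNext() && (maxToConsider <= 0 || seen < maxToConsider)) {
      T candidate = it.next();
      seen++;
      // Replace the current choice with probability 1/seen, so every
      // visited element ends up chosen with equal probability.
      if (random.nextInt(seen) == 0) {
        chosen = candidate;
      }
    }
    // Null when there was nothing to iterate over.
    return chosen;
  }

  public static void main(String[] args)
  {
    List<String> segments = Arrays.asList("seg-a", "seg-b", "seg-c", "seg-d");
    Random random = new Random();
    // Only "seg-a" and "seg-b" can be returned when the cap is 2.
    System.out.println(pick(segments, 2, random));
    // With no cap, any of the four segments can be returned.
    System.out.println(pick(segments, 0, random));
  }
}
```

The trade-off is the one the commit message names: a cap bounds the work per move, but segments beyond the cap are never candidates.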
* fix checkstyle failure
* Make doc more detailed for admin to understand when/why to use new config
* refactor PR to use a % of segments instead of raw number
* update the docs
* remove bad doc line
* fix typo in name of new dynamic config
* update ReservoirSegmentSampler to gracefully deal with values > 100%
* add handler for <= 0 in ReservoirSegmentSampler
* fixup CoordinatorDynamicConfigTest naming and argument ordering
* fix items in docs after spellcheck flags
* Fix lgtm flag on missing space in string literal
* improve documentation for new config
* Add default value to config docs and add advice in cluster tuning doc
* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog
* update jest snapshot after console change
* fix spell checker errors
* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider
* add new config back to web console module after merge with master
* fix ReservoirSegmentSamplerTest
* fix line breaks in coordinator console dialog
* Add a test that helps ensure no regressions for percentOfSegmentsToConsiderPerMove
* Make improvements based off of feedback in review
* additional cleanup coming from review
* Add a warning log if limit on segments to consider for move can't be calculated
* remove unused import
* fix tests for CoordinatorDynamicConfig
* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
* add query through to server selector
* add nullable extensions, deprecate old methods with defaults
* style changes
* add nullable to ServerSelectorStrategy
* fix test coverage
* missing override in test
* add null check
* Fixes and tests related to the Indexer process.
Three bugs fixed:
1) Indexers would not announce themselves as segment servers if they
did not have storage locations defined. This used to work, but was
broken in #9971. Fixed this by adding an "isSegmentServer" method
to ServerType and updating SegmentLoadDropHandler to always announce
if this method returns true.
2) Certain batch task types were written in a way that assumed "isReady"
would be called before "run", which is not guaranteed. In particular,
they relied on it in order to initialize "taskLockHelper". Fixed this
by updating AbstractBatchIndexTask to ensure "isReady" is called
before "run" for these tasks.
3) UnifiedIndexerAppenderatorsManager did not properly handle complex
datasources. Introduced DataSourceAnalysis in order to fix this.
Test changes:
1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
Indexer instead of a MiddleManager.
2) Introduce a "USE_INDEXER" environment variable that determines if
docker-compose will start up an Indexer or a MiddleManager.
3) Duplicate all the jdk8 tests and run them in both MiddleManager and
Indexer mode.
4) Various adjustments to encourage fail-fast errors in the Docker
build scripts.
5) Various adjustments to speed up integration tests and reduce memory
usage.
6) Add another Mac-specific approach to determining a machine's own IP.
This was useful on my development machine.
7) Update segment-count check in ITCompactionTaskTest to eliminate a
race condition (it was looking for 6 segments, which only exist
together briefly, until the older 4 are marked unused).
Javadoc updates:
1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
that make it clear when taskLockHelper will be initialized as a side
effect. (Related to the second bug above.)
2) Task: Clarified that "isReady" is not guaranteed to be called before
"run". It was already implied, but now it's explicit.
3) ZkCoordinator: Clarified deprecation message.
4) DataSegmentServerAnnouncer: Clarified deprecation message.
* Fix stop_cluster script.
* Fix sanity check in script.
* Fix hashbang lines.
* Test and doc adjustments.
* Additional tests, and adjustments for tests.
* Split ITs back out.
* Revert change to druid_coordinator_period_indexingPeriod.
* Set Indexer capacity to match MM.
* Bump up Historical memory.
* Bump down coordinator, overlord memory.
* Bump up Broker memory.
* fix to allow custom storage location selector strategy
* add test cases to check instance of selector strategy
* update doc
* code format
* resolve code review comments
* inject StorageLocation
* fix CI
* fix mismatched license item reported by CI
* change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy
* using a helper method to bind to correct property path
* Two fixes related to encoding of % symbols.
1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments()
returns already-decoded strings. Applying StringUtils.urlDecode on
top of that causes erroneous behavior with '%' characters.
2) Update various ThreadFactoryBuilder name formats to escape '%'
characters. This fixes situations where substrings starting with '%'
are erroneously treated as format specifiers.
ITs are updated to include a '%' in extra.datasource.name.suffix.
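
To make the second fix concrete, here is a minimal sketch of escaping '%' before a string is used as part of a format pattern. The helper name `escapePercent` is hypothetical and is not claimed to match the Druid helper; the task id in the example is made up.

```java
import java.util.Locale;

// Hypothetical sketch; the helper name and the sample id are illustrative only.
final class FormatEscaping
{
  // Doubles every '%' so the result is safe to embed in a format pattern.
  // Returns null for null input.
  static String escapePercent(String s)
  {
    if (s == null) {
      return null;
    }
    StringBuilder sb = new StringBuilder(s.length() + 4);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '%') {
        sb.append("%%");
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    String taskId = "index_wiki%d_2020";
    // Without escaping, the "%d" inside the id would be read as a format specifier.
    String nameFormat = escapePercent(taskId) + "-worker-%d";
    System.out.println(String.format(Locale.ROOT, nameFormat, 3));
    // prints index_wiki%d_2020-worker-3
  }
}
```

The same escaping applies to any pattern later handed to a formatter, such as a thread name format with a trailing `%d` counter.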
* Avoid String.replace.
* Work around surefire bug.
* Fix xml encoding.
* Another try at the proper encoding.
* Give up on the emojis.
* Less ambitious testing.
* Fix an additional problem.
* Adjust encodeForFormat to return null if the input is null.
These caused certain APIs to not actually be properly forbidden.
Also removed two MoreExecutors entries for methods that don't exist in
our version of Guava.
* Move common methods that are used in HadoopTuningConfig and in AppenderatorConfig to TuningConfig
* Rename rowFlushBoundary in HadoopTuningConfig to maxRowsInMemory to match TuningConfig API
* Add new coordinator metrics for duty runtimes
* fix spelling for a constant variable value
* add comment clarifying why the global runtime metric is emitted where it is
* Remove duty alias in lieu of using the class name for metrics
* fix docs
* CoordinatorStats tests + add duty stats to accumulate() logic
* support multi-line text
* add test cases
* split json text into lines case by case
* improve exception handling
* fix CI
* use IntermediateRowParsingReader as base of JsonReader
* update doc
* ignore the non-immutable field in test case
* add more test cases
* mark `lineSplittable` as final
* fix testcases
* fix doc
* add a test case for SqlReader
* return all raw columns when exception occurs
* fix CI
* fix test cases
* resolve review comments
* handle ParseException returned by index.add
* apply Iterables.getOnlyElement
* fix CI
* fix test cases
* improve code in a more graceful way
* fix test cases
* fix test cases
* add a test case to check multiple JSON strings in one text block
* fix inspection check