* Avoid mapping hydrants in create segments phase for native ingestion
* Drop queryable indices after a given sink is fully merged
* Do not drop memory mappings for realtime ingestion
* Style fixes
* Renamed to match use case better
* Rollback memoization code and use the real time flag instead
* Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations
* Style
* Log some count stats
* Make sure sinks size is obtained at the right time
* BatchAppenderator unit test
* Fix comment typos
* Renamed methods to make them more readable
* Move persisted metadata from the FireHydrant class to AppenderatorImpl. Remove superfluous differences, fix a comment typo, and remove the custom comparator
* Missing dependency
* Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments.
* Replaced concurrent variables with normal ones
* Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" to make the code fall back to the previous code path.
* Style fix.
* Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment.
* Forgot to commit this edited documentation message
* lay the groundwork for throttling replicant loads per RunRules execution
* Add dynamic coordinator config to control new replicant threshold.
* remove redundant line
* add some unit tests
* fix checkstyle error
* add documentation for new dynamic config
* improve docs and logs
* Alter how null is handled for new config. If null, manually set as default
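As a rough illustration of the throttle introduced above, the sketch below caps how many non-primary replicants may be assigned in a single RunRules pass; the class and field names are placeholders, not the actual coordinator code.

```java
import java.util.List;

// Illustrative only: a per-RunRules-execution cap on non-primary replicant assignments.
class ReplicationThrottleSketch
{
  private final int maxNonPrimaryReplicantsToLoad; // hypothetical dynamic config value
  private int replicantsLoadedThisRun = 0;

  ReplicationThrottleSketch(int maxNonPrimaryReplicantsToLoad)
  {
    this.maxNonPrimaryReplicantsToLoad = maxNonPrimaryReplicantsToLoad;
  }

  void assignReplicants(List<String> segmentIds)
  {
    for (String segmentId : segmentIds) {
      if (replicantsLoadedThisRun >= maxNonPrimaryReplicantsToLoad) {
        // Budget for this run is spent; remaining segments wait for the next coordinator cycle.
        break;
      }
      // ... hand the segment to a historical for loading here ...
      replicantsLoadedThisRun++;
    }
  }
}
```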
* ARRAY_AGG sql aggregator function
* add javadoc
* spelling
* review feedback; return null instead of an empty array when the input is null
* review stuff
* Update sql.md
* use type inference for finalize, refactor some things
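For context, a minimal JDBC client exercising the new ARRAY_AGG function is sketched below; the broker URL, datasource, and column names (wikipedia, channel, user) are placeholders. Per the review note above, the aggregator returns null rather than an empty array when the input group has no values.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class ArrayAggExample
{
  public static void main(String[] args) throws Exception
  {
    // Placeholder broker address; requires the Avatica JDBC driver on the classpath.
    String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";
    try (Connection connection = DriverManager.getConnection(url, new Properties());
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(
             "SELECT channel, ARRAY_AGG(\"user\") AS users FROM wikipedia GROUP BY channel")) {
      while (resultSet.next()) {
        // Groups with no non-null input produce NULL rather than an empty array.
        System.out.println(resultSet.getString("channel") + " -> " + resultSet.getObject("users"));
      }
    }
  }
}
```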
* Add feature to automatically remove rules based on retention period
* Add feature to automatically remove rules based on retention period
* address comments
* DataSchema: Improve duplicate-column error message.
Now, when duplicate columns are specified, the error message will include
information about where those duplicate columns were seen. Also, if there
are multiple duplicate columns, all will be listed in the error message
instead of just the first one encountered.
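A rough sketch of what the improved check amounts to: collect every output name together with where it was declared, then report all duplicates and their origins at once. The class and method names below are illustrative, not the actual DataSchema code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateColumnCheckSketch
{
  /**
   * Records each column name together with where it was declared, then reports
   * every name that appears more than once, instead of only the first clash.
   */
  static void validate(Map<String, List<String>> columnsBySource)
  {
    Map<String, List<String>> sourcesByColumn = new LinkedHashMap<>();
    columnsBySource.forEach(
        (source, columns) -> columns.forEach(
            column -> sourcesByColumn.computeIfAbsent(column, k -> new ArrayList<>()).add(source)
        )
    );

    List<String> problems = new ArrayList<>();
    sourcesByColumn.forEach((column, sources) -> {
      if (sources.size() > 1) {
        problems.add("[" + column + "] seen in: " + sources);
      }
    });

    if (!problems.isEmpty()) {
      throw new IllegalArgumentException("Cannot specify a column more than once: " + problems);
    }
  }

  public static void main(String[] args)
  {
    Map<String, List<String>> spec = new LinkedHashMap<>();
    spec.put("dimensions", List.of("page", "country", "added"));
    spec.put("metrics", List.of("added", "deleted"));
    // Throws: Cannot specify a column more than once: [[added] seen in: [dimensions, metrics]]
    validate(spec);
  }
}
```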
* Fix style for checkstyle.
* Further improve error message.
* Add ability to wait for segment availability for batch jobs
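In outline, the feature is a polling loop that waits until the task's published segments become queryable or a configured timeout elapses. The sketch below uses hypothetical helper names (allSegmentsAvailable, waitTimeoutMillis) rather than the real task code.

```java
import java.util.concurrent.TimeUnit;

class SegmentAvailabilityWaitSketch
{
  /** Hypothetical hook; in practice this would ask the coordinator for segment load status. */
  interface SegmentAvailabilityChecker
  {
    boolean allSegmentsAvailable();
  }

  /**
   * Polls until the published segments are loaded and queryable, or the wait
   * budget runs out. Returns true if everything became available in time.
   */
  static boolean waitForSegmentAvailability(
      SegmentAvailabilityChecker checker,
      long waitTimeoutMillis,
      long pollIntervalMillis
  )
  {
    final long deadline = System.currentTimeMillis() + waitTimeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (checker.allSegmentsAvailable()) {
        return true;
      }
      try {
        TimeUnit.MILLISECONDS.sleep(pollIntervalMillis);
      }
      catch (InterruptedException e) {
        // Restore the interrupt status for callers instead of swallowing it.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```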
* IT updates
* fix queries in legacy hadoop IT
* Fix broken indexing integration tests
* address an LGTM flag
* spell checker still flagging the Hadoop doc; adding the word under that file's header too
* fix compaction IT
* Updates to wait for availability method
* improve unit testing for patch
* fix bad indentation
* refactor waitForSegmentAvailability
* Fixes based off of review comments
* cleanup to get compile after merging with master
* fix failing test after previous logic update
* add back code that must have gotten deleted during conflict resolution
* update some logging code
* fixes to get compilation working after merge with master
* reset interrupt flag in catch block after code review pointed it out
* small changes following self-review
* fixup some issues brought on by merge with master
* small changes after review
* cleanup a little bit after merge with master
* Fix potential resource leak in AbstractBatchIndexTask
* syntax fix
* Add a Compaction TuningConfig type
* add docs stipulating the lack of support by Compaction tasks for the new config
* Fixup compilation errors after merge with master
* Remove erroneous newline
* JavaScript script engine support was removed in JDK 15: skip those tests for JDKs without it
* Fix flaky HTTP client tests with Java 15
* Switch from CMS to G1GC in integration tests, since CMS is no longer available in JDK 15
* Add parameter to loadstatus API to compute under-replication against cluster view
This change adds a query parameter `computeUsingClusterView` to the loadstatus APIs. When it is
specified, the coordinator computes under-replication for a segment based on the number of services
in the cluster that the segment can actually be replicated on, instead of the replication count
configured in the load rule. A default load rule is created in all clusters specifying that every
segment should be replicated 2 times. Because replicas are forced onto separate nodes, the
loadstatus API reports under-replicated segments when the cluster has only 1 data server; in that
case, calling the loadstatus API without this new query parameter will always return a response
indicating under-replication of segments.
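The essence of the computation, sketched with made-up names: with computeUsingClusterView set, the replica target for a segment is capped at the number of servers that could actually hold a replica, so a single-data-server cluster with a 2x load rule no longer reports under-replication.

```java
class UnderReplicationSketch
{
  /**
   * Returns how many replicas a segment is missing. With the cluster view, the target is
   * capped at the number of eligible servers, since replicas of the same segment must live
   * on different nodes.
   */
  static int missingReplicas(
      int configuredReplicas,
      int loadedReplicas,
      int eligibleServers,
      boolean computeUsingClusterView
  )
  {
    int target = computeUsingClusterView
                 ? Math.min(configuredReplicas, eligibleServers)
                 : configuredReplicas;
    return Math.max(0, target - loadedReplicas);
  }

  public static void main(String[] args)
  {
    // Default rule asks for 2 replicas, but only one data server exists.
    System.out.println(missingReplicas(2, 1, 1, false)); // 1 -> reported as under-replicated
    System.out.println(missingReplicas(2, 1, 1, true));  // 0 -> healthy under the cluster view
  }
}
```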
* fix exception mapper
* Address review comments
* update external API docs
* Apply suggestions from code review
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* update more external docs
* update javadoc
* Apply suggestions from code review
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* request logs through kafka emitter
* travis fixes
* review comments
* kafka emitter unit test
* new line
* travis checks
* checkstyle fix
* count requests lost when the request topic is null
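Roughly, the emitter sends each request log to the configured request topic and counts the event as lost when no topic is set; the sketch below is illustrative and not the actual KafkaEmitter code.

```java
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

class RequestLogKafkaSketch
{
  private final KafkaProducer<String, String> producer;
  private final String requestTopic; // may be null if request logging is not configured
  private final AtomicLong requestLostCount = new AtomicLong();

  RequestLogKafkaSketch(KafkaProducer<String, String> producer, String requestTopic)
  {
    this.producer = producer;
    this.requestTopic = requestTopic;
  }

  void emitRequestLog(String requestLogJson)
  {
    if (requestTopic == null) {
      // No topic configured for request logs: count the event as lost instead of sending it.
      requestLostCount.incrementAndGet();
      return;
    }
    producer.send(new ProducerRecord<>(requestTopic, requestLogJson));
  }
}
```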
* DruidInputSource: Fix issues in column projection, timestamp handling.
DruidInputSource, DruidSegmentReader changes:
1) Remove "dimensions" and "metrics". They are not necessary, because we
can compute which columns we need to read based on what is going to
be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
the timestamp of the returned InputRows was set to the `__time` column
of the input datasource.
(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.
(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.
(1) and (3) are breaking changes.
Web console changes:
1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
compatibility with the new behavior.
Other changes:
1) Add ColumnsFilter, a new class that allows input readers to determine
which columns they need to read. Currently, it's only used by the
DruidInputSource, but it could be used by other columnar input sources
in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.
* Various fixups.
* Uncomment incorrectly commented lines.
* Move TransformSpecTest to the proper module.
* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.
* Fix.
* Fix build.
* Checkstyle.
* Misc fixes.
* Fix test.
* Move config.
* Fix imports.
* Fixup.
* Fix ShuffleResourceTest.
* Add import.
* Smarter exclusions.
* Fixes based on tests.
Also, add TIME_COLUMN constant in the web console.
* Adjustments for tests.
* Reorder test data.
* Update docs.
* Update docs to say Druid 0.22.0 instead of 0.21.0.
* Fix test.
* Fix ITAutoCompactionTest.
* Changes from review & from merging.
* Fix Auto-compaction with segment granularity retrieving incomplete segments from the timeline when intervals overlap
* Fix Auto-compaction with segment granularity retrieving incomplete segments from the timeline when intervals overlap
* Fix Auto-compaction with segment granularity retrieving incomplete segments from the timeline when intervals overlap
* Fix Auto-compaction with segment granularity retrieving incomplete segments from the timeline when intervals overlap
* address comments
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity
* address comments
* address comments
* address comments
* address comments
* address comments
* Fix regression introduced by #11008
* Add back and tweak the check to not inspect resources for authorization when AllowAllAuthorizer is configured.
Add a unit test to validate that the change doesn't introduce new behavior.
Size HashMap and HashSet appropriately. Perf analysis of the queries
revealed that over 25% of the query time was spent in resizing HashMap and HashSet
collections. Also, prevent the need to examine and authorize all resources when
AllowAllAuthorizer is the configured authorizer.
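The two optimizations described above, sketched with placeholder types: skip per-resource authorization entirely when the configured authorizer allows everything, and presize the collections so they are not repeatedly rehashed while filtering.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AuthorizationFilterSketch
{
  interface Authorizer
  {
    boolean isAllowAll();

    boolean allows(String resource);
  }

  static List<String> filterAuthorized(Authorizer authorizer, List<String> resources)
  {
    if (authorizer.isAllowAll()) {
      // Nothing to inspect: every resource is permitted, so don't walk them at all.
      return resources;
    }
    // Presize to the known upper bound so the HashSet/ArrayList are not resized repeatedly
    // (the resizing was where a large share of the query time went).
    Set<String> seen = new HashSet<>((int) (resources.size() / 0.75f) + 1);
    List<String> allowed = new ArrayList<>(resources.size());
    for (String resource : resources) {
      if (seen.add(resource) && authorizer.allows(resource)) {
        allowed.add(resource);
      }
    }
    return allowed;
  }
}
```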
* log more info when deleting segments && add deleteSegments-related UT
* revert misc.xml
* code review
* use log.debugSegments
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
* Allow only HTTP and HTTPS protocols for the HTTP inputSource
* rename
* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* fix http firehose and update doc
* HDFS inputSource
* add configs for allowed protocols
* fix checkstyle and doc
* more checkstyle
* remove stale doc
* remove more doc
* Apply doc suggestions from code review
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* update hdfs address in docs
* fix test
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
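A sketch of the kind of validation this change adds, with hypothetical names: URIs handed to the HTTP input source are rejected unless their scheme is in the configured allow-list (http and https by default).

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class AllowedProtocolsSketch
{
  private final Set<String> allowedProtocols;

  AllowedProtocolsSketch(Set<String> allowedProtocols)
  {
    // Compare case-insensitively so "HTTP://..." matches "http".
    this.allowedProtocols = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    this.allowedProtocols.addAll(allowedProtocols);
  }

  void validate(List<URI> uris)
  {
    for (URI uri : uris) {
      String scheme = uri.getScheme();
      if (scheme == null || !allowedProtocols.contains(scheme)) {
        throw new IllegalArgumentException(
            "Only " + allowedProtocols + " protocols are allowed, got [" + uri + "]"
        );
      }
    }
  }

  public static void main(String[] args)
  {
    AllowedProtocolsSketch validator = new AllowedProtocolsSketch(Set.of("http", "https"));
    validator.validate(List.of(URI.create("https://example.com/data.json"))); // ok
    validator.validate(List.of(URI.create("file:///etc/passwd")));            // throws
  }
}
```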
* druid task auto scale based on kafka lag
* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig
* druid task auto scale based on kafka lag
* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig
* test dynamic auto scale done
* auto scale tasks tested on prd cluster
* auto scale tasks tested on prd cluster
* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20
* rename test file function
* change code and add docs based on capistrant's review
* modify test docs
* modify docs
* modify docs
* modify docs
* merge from master
* Extract the autoScale logic out of SeekableStreamSupervisor to avoid putting more logic in it && make the autoscaling algorithm configurable and extensible.
* fix CI failure
* revert misc.xml
* add uts to test autoscaler creation && scale out/in, and kafka ingestion with scaling enabled
* add more uts
* fix inner class check
* add IT for kafka ingestion with autoscaler
* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler
* review change
* code review
* remove unused imports
* fix NPE
* fix docs and UTs
* revert misc.xml
* use jackson to build autoScaleConfig with default values
* add uts
* use jackson to init AutoScalerConfig in IOConfig instead of Map<>
* autoscalerConfig interface and provide a defaultAutoScalerConfig
* modify uts
* modify docs
* fix checkstyle
* revert misc.xml
* modify uts
* reviewed code change
* reviewed code change
* code reviewed
* code review
* log changed
* do StringUtils.encodeForFormat when creating allocationExec
* code review && limit taskCountMax to partitionNumbers
* modify docs
* code review
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
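In rough terms, the lag-based autoscaler compares aggregate consumer lag against scale-out/scale-in thresholds on each cycle and adjusts the task count within [taskCountMin, taskCountMax], with the maximum additionally capped by the partition count. The sketch below is illustrative; the field names follow the commits above but are not the exact AutoScalerConfig.

```java
class LagBasedAutoScalerSketch
{
  private final long scaleOutThreshold;  // total lag above which tasks are added
  private final long scaleInThreshold;   // total lag below which tasks are removed
  private final int taskCountMin;
  private final int taskCountMax;        // additionally capped by the partition count
  private final int partitionCount;

  LagBasedAutoScalerSketch(
      long scaleOutThreshold,
      long scaleInThreshold,
      int taskCountMin,
      int taskCountMax,
      int partitionCount
  )
  {
    this.scaleOutThreshold = scaleOutThreshold;
    this.scaleInThreshold = scaleInThreshold;
    this.taskCountMin = taskCountMin;
    this.taskCountMax = taskCountMax;
    this.partitionCount = partitionCount;
  }

  /** Computes the desired task count for the next supervisor cycle. */
  int desiredTaskCount(int currentTaskCount, long totalLag)
  {
    int upperBound = Math.min(taskCountMax, partitionCount);
    if (totalLag > scaleOutThreshold && currentTaskCount < upperBound) {
      return Math.min(currentTaskCount * 2, upperBound);
    }
    if (totalLag < scaleInThreshold && currentTaskCount > taskCountMin) {
      return Math.max(currentTaskCount / 2, taskCountMin);
    }
    return currentTaskCount;
  }
}
```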
* where filter left first draft
* Revert changes in calcite test
* Refactor a bit
* Fixing the Tests
* Changes
* Adding tests
* Add tests for correlated queries
* Add comment
* Fix typos
* add query granularity to compaction task
* fix checkstyle
* fix checkstyle
* fix test
* fix test
* add tests
* fix test
* fix test
* cleanup
* rename class
* fix test
* fix test
* add test
* fix test
* Granularity: Introduce primitive-typed bucketStart, increment methods.
Saves creation of unnecessary DateTime objects in timestamp_floor and
timestamp_ceil expressions.
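The point of the change, sketched for a simple fixed-width granularity (no time zone or period handling): operating on epoch millis directly avoids allocating a DateTime per evaluated row.

```java
class BucketStartSketch
{
  /**
   * Floors a UTC timestamp to the start of its fixed-width bucket using long arithmetic
   * only; no DateTime objects are created per call.
   */
  static long bucketStart(long timestampMillis, long bucketWidthMillis)
  {
    return timestampMillis - Math.floorMod(timestampMillis, bucketWidthMillis);
  }

  /** Start of the next bucket (a primitive-typed "increment"). */
  static long increment(long bucketStartMillis, long bucketWidthMillis)
  {
    return bucketStartMillis + bucketWidthMillis;
  }

  public static void main(String[] args)
  {
    long hour = 3_600_000L;
    long t = 1_614_556_800_000L + 123_456L; // some instant within an hour
    System.out.println(bucketStart(t, hour));                  // floors to the hour boundary
    System.out.println(increment(bucketStart(t, hour), hour)); // start of the next hour
  }
}
```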
* Fix style.
* Amp up the test coverage.
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* resolve conflict
* Support segmentGranularity for auto-compaction
* Support segmentGranularity for auto-compaction
* fix tests
* fix more tests
* fix checkstyle
* add unit tests
* fix checkstyle
* fix checkstyle
* fix checkstyle
* add unit tests
* add integration tests
* fix checkstyle
* fix checkstyle
* fix failing tests
* address comments
* address comments
* fix tests
* fix tests
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test
* fix test