Commit Graph

3003 Commits

Author SHA1 Message Date
Gian Merlino 823f620ede
Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976)
* Add IS [NOT] DISTINCT FROM to SQL and join matchers.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.

* Main changes:

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.

* Remove unused "wrap" method.

* Fixes.

* Remove methods we do not need.

* Fix bug with INPUT_REF.
2023-09-20 10:44:32 -07:00
Gian Merlino 4f498e6469
SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)
* SQL: Plan non-equijoin conditions as cross join followed by filter.

Druid has previously refused to execute joins with non-equality-based
conditions. This was well-intentioned: the idea was to push people to
write their queries in a different, hopefully more performant way.

But as we're moving towards fuller SQL support, it makes more sense to
allow these conditions to go through with the best plan we can come up
with: a cross join followed by a filter. In some cases this will allow
the query to run, and people will be happy with that. In other cases,
it will run into resource limits during execution. But we should at
least give the query a chance.

This patch also updates the documentation to explain how people can
tell whether their queries are being planned this way.

* cartesian is a word.

* Adjust tests.

* Update docs/querying/datasource.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-09-19 10:23:42 -07:00
George Shiqi Wu f773d83914
Mixed task runner for migration to mm-less ingestion (#14918)
* save work

* Working

* Fix runner constructor

* Working runner

* extra log lines

* try using lifecycle for everything

* clean up configs

* cleanup /workers call

* Use a single config

* Allow selecting runner

* debug changes

* Work on composite task runner

* Unit tests running

* Add documentation

* Add some javadocs

* Fix spelling

* Use standard libraries

* code review

* fix

* fix

* use taskRunner as string

* checkstyl

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-09-11 18:09:46 -07:00
317brian 3a453f7a3c
docs: add note about transparent_reconnection (#14953)
* add note about transparent_reconnection

* Update docs/api-reference/sql-jdbc.md
2023-09-11 11:58:39 -07:00
317brian 09f7dfe327
docs: update docusaurus 2 stuff (#14864) 2023-09-08 14:19:15 -07:00
Kashif Faraz 647686aee2
Add test and metrics for KillStalePendingSegments duty (#14951)
Changes:
- Add new metric `kill/pendingSegments/count` with dimension `dataSource`
- Add tests for `KillStalePendingSegments`
- Reduce no-op logs that spit out for each datasource even when no pending
segments have been deleted. This can get particularly noisy at low values of `indexingPeriod`.
- Refactor the code in `KillStalePendingSegments` for readability and add javadocs
2023-09-08 10:33:47 +05:30
Hardik Bajaj e100b18e86
Updated documentation for OshiSysMonitor (#14912) 2023-09-07 16:54:33 +05:30
Laksh Singla 6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
A new monitor SubqueryCountStatsMonitor which emits the metrics corresponding to the subqueries and their execution is now introduced. Moreover, the user can now also use the auto mode to automatically set the number of bytes available per query for the inlining of its subquery's results.
2023-09-06 05:47:19 +00:00
Adarsh Sanjeev 959148ad37
Add code to wait for segments generated to be loaded on historicals (#14322)
Currently, after an MSQ query, the web console is responsible for waiting for the segments to load. It does so by checking if there are any segments loading into the datasource ingested into, which can cause some issues, like in cases where the segments would never be loaded, or would end up waiting for other ingests as well.

This PR shifts this responsibility to the controller, which would have the list of segments created.
2023-09-06 10:35:57 +05:30
Clint Wylie 706b57c0b2
fixup array and mvd sql docs (#14928) 2023-09-05 16:17:00 -07:00
Jill Osborne 425ebaa387
Query tips doc (#14922)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-09-05 14:16:01 -07:00
Kashif Faraz ec630e3671
Remove deprecated coordinator dynamic configs (#14923)
Changes:

[A] Remove config `decommissioningMaxPercentOfMaxSegmentsToMove`
- It is a complicated config 😅 , 
- It is always desirable to prioritize move from decommissioning servers so that
they can be terminated quickly, so this should always be 100%
- It is already handled by `smartSegmentLoading` (enabled by default)

[B] Remove config `maxNonPrimaryReplicantsToLoad`
This was added in #11135 to address two requirements:
- Prevent coordinator runs from getting stuck assigning too many segments to historicals
- Prevent load of replicas from competing with load of unavailable segments

Both of these requirements are now already met thanks to:
- Round-robin segment assignment
- Prioritization in the new coordinator
- Modifications to `replicationThrottleLimit`
- `smartSegmentLoading` (enabled by default)
2023-09-04 11:54:36 +05:30
John Gerassimou d201ea0ece
prometheus-emitter: add extraLabels parameter (#14728)
* prometheus-emitter: add extraLabels parameter

* prometheus-emitter: update readme to include the extraLabels parameter

* prometheus-emitter: remove nullable and surface label name issues

* remove import to make linter happy
2023-08-29 12:02:22 -07:00
benkrug 8885805bb3
Update filters.md (#14917) 2023-08-28 15:29:00 -07:00
Kashif Faraz d6565f46b0
Increase the computed value of replicationThrottleLimit (#14913)
Changes
- Increase value of `replicationThrottleLimit` computed by `smartSegmentLoading` from
2% to 5% of total number of used segments.
- Assign replicas to a tier even when some replicas are already being loaded in that tier
- Limit the total number of replicas in load queue at start of run + replica assignments in
the run to the `replicationThrottleLimit`.

i.e. for every tier,
    num loading replicas at start of run + num replicas assigned in run <= replicationThrottleLimit
2023-08-28 18:20:22 +05:30
Victoria Lim 9142f4b8d7
docs: update note in automatic compaction doc (#14908) 2023-08-25 14:14:29 -07:00
Kashif Faraz e51181957c
Use num cores to determine balancerComputeThreads (#14902)
Changes:
- Determine the default value of balancerComputeThreads based on number of
coordinator cpus rather than number of segments. Even if the number of segments
is low and we create more balancer threads, it doesn't hurt the system as threads
would mostly be idle.
- Remove unused field from SegmentLoadQueueManager

Expected values:
- Clusters with ~1M segments typically work with Coordinators having 16 cores or more.
This would give us 8 balancer threads, which is the same as the current maximum.
- On small clusters, even a single thread is enough to do the required balancing work.
2023-08-25 08:15:27 +05:30
Abhishek Agarwal 3c7b237c22
Add docs for ingesting Kafka topic name (#14894)
Add documentation on how to extract the Kafka topic name and ingest it into the data.
2023-08-24 19:19:59 +05:30
Clint Wylie 36e659a501
remove group-by v1 (#14866)
* remove group-by v1

* docs

* remove unused configs, fix test

* fix test

* adjustments

* why not

* adjust

* review stuff
2023-08-23 12:44:06 -07:00
zachjsh 0c76df1c7d
Enable Continuous auto kill (#14831)
### Description

This change enables the `KillUnusedSegments` coordinator duty to be scheduled continuously. Things that prevented this, or made this difficult before were the following:

1. If scheduled at fast enough rate, the duty would find the same intervals to kill for the same datasources, while kill tasks submitted for those same datasources and intervals were already underway, thus wasting task slots on duplicated work.

2. The task resources used by auto kill were previously unbounded.  Each duty run period, if unused
 segments were found for any datasource, a kill task would be submitted to kill them.

This pr solves for both of these issues:

1. The duty keeps track of the end time of the last interval found when killing unused segments for each datasource, in a in memory map. The end time for each datasource, if found, is used as the start time lower bound, when searching for unused intervals for that same datasource. Each duty run, we remove any datasource keys from this map that are no longer found to match datasources in the system, or in whitelist, and also remove a datasource entry, if there is found to be no unused segments for the datasource, which happens when we fail to find an interval which includes unused segments. Removing the datasource entry from the map,  allows for searching for unusedSegments in the datasource from the beginning of time once again

2. The unbounded task resource usage can be mitigated with coordinator dynamic config added as part of ba957a9b97


Operators can configure continous auto kill by providing coordinator runtime properties similar to the following:

```
druid.coordinator.period.indexingPeriod=PT60S
druid.coordinator.kill.period=PT60S
```

And providing sensible limits to the killTask usage via coordinator dynamic properties.
2023-08-23 09:23:08 -04:00
Adarsh Sanjeev dfb5a98888
Add coordinator API for unused segments (#14846)
There is a current issue due to inconsistent metadata between worker and controller in MSQ. A controller can receive one set of segments, which are then marked as unused by, say, a compaction job. The worker would be unable to get the segment information as MetadataResource.
2023-08-23 14:51:25 +05:30
Giulio Talarico 76e5048aab
fix supervisor spec api submission commands (#14877) 2023-08-23 14:38:09 +05:30
Zoltan Haindrich e806d09309
Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848) 2023-08-22 22:50:19 -07:00
Clint Wylie fb053c399c
consolidate json and auto indexers, remove v4 nested column serializer (#14456) 2023-08-22 18:50:11 -07:00
Soumyava 6817de9376
Doc changes for avatica transparent reconnection (#14896) 2023-08-22 11:58:17 -07:00
Clint Wylie 194a9c9abc
set druid.expressions.useStrictBooleans to true by default (#14734) 2023-08-22 00:19:56 -07:00
Benedict Jin 18f7cb6926
Fixed broken URL of python api tutorial (#14881) 2023-08-22 09:53:41 +05:30
Clint Wylie 5d1412949e
enable sql compatible null handling mode by default (#14792)
* enable sql compatible null handling mode by default
* fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false
2023-08-21 20:07:13 -07:00
Katya Macedo 5f74ef56f1
Clean up Kafka supervisor topic (#14651)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-21 11:55:38 -07:00
Nhi Pham 9fe7c01c16
Automatic compaction API documentation refactor (#14740)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-08-21 11:34:41 -07:00
Kashif Faraz 097b645005
Clean up after add kill bufferPeriod (#14868)
Follow up changes to #12599 

Changes:
- Rename column `used_flag_last_updated` to `used_status_last_updated`
- Remove new CLI tool `UpdateTables`.
  - We already have a `CreateTables` with similar functionality, which should be able to
  handle update cases too.
  - Any user running the cluster for the first time should either just have `connector.createTables`
  enabled or run `CreateTables` which should create tables at the latest version.
  - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has
  been added to Druid, and users would have to run `CreateTables` anyway.
- Remove `upgrade-prep.md` and include that info in `metadata-init.md`.
- Fix log messages to adhere to Druid style
- Use lambdas
2023-08-19 00:00:04 +05:30
Lucas Capistrant 9c124f2cde
Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599)
* Add new configurable buffer period to create gap between mark unused and kill of segment

* Changes after testing

* fixes and improvements

* changes after initial self review

* self review changes

* update sql statement that was lacking last_used

* shore up some code in SqlMetadataConnector after self review

* fix derby compatibility and improve testing/docs

* fix checkstyle violations

* Fixes post merge with master

* add some unit tests to improve coverage

* ignore test coverage on new UpdateTools cli tool

* another attempt to ignore UpdateTables in coverage check

* change column name to used_flag_last_updated

* fix a method signature after column name switch

* update docs spelling

* Update spelling dictionary

* Fixing up docs/spelling and integrating altering tasks table with my alteration code

* Update NULL values for used_flag_last_updated in the background

* Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod

* remove unneeded things now that the new column is automatically updated

* Test new background row updater method

* fix broken tests

* fix create table statement

* cleanup DDL formatting

* Revert adding columns to entry table by default

* fix compilation issues after merge with master

* discovered and fixed metastore inserts that were breaking integration tests

* fixup forgotten insert by using pattern of sharing now timestamp across columns

* fix issue introduced by merge

* fixup after merge with master

* add some directions to docs in the case of segment table validation issues
2023-08-17 19:32:51 -05:00
Abhishek Radhakrishnan 37db5d9b81
Reset offsets supervisor API (#14772)
* Add supervisor /resetOffsets API.

- Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets
which accepts DataSourceMetadata as a body parameter.
- Update logs, unit tests and docs.

* Add a new interface method for backwards compatibility.

* Rename

* Adjust tests and javadocs.

* Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules

* UT fix

* Doc updates.

* remove extraneous debugging logs.

* Remove the boolean setting; only ResetHandle() and resetInternal()

* Relax constraints and add a new ResetOffsetsNotice; cleanup old logic.

* A separate ResetOffsetsNotice and some cleanup.

* Minor cleanup

* Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers

* Add unit tests for the no op implementations for test coverage

* CodeQL fix

* checkstyle from merge conflict

* Doc changes

* DOCUSAURUS code tabs fix. Thanks, Brian!
2023-08-17 14:13:10 -07:00
Abhishek Agarwal b97cc45d81
Add clarification to the docs for multi-topic Kafka ingestion (#14847)
Follow-up to #14828. Added some more clarification about how topicPattern is used.
2023-08-17 12:52:06 +05:30
317brian 6b4dda964d
Docusaurus2 upgrade for master (#14411)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-16 19:01:21 -07:00
YongGang 3954685aae
Report more metrics to monitor K8s task runner (#14771)
* Report pod running metrics to monitor K8s task runner

* refine method definition

* fix checkstyle

* implement task metrics

* more comment

* address comments

* update doc for the new metrics reported

* fix checkstyle

* refine method definition

* minor refine
2023-08-16 14:03:53 -04:00
Abhishek Agarwal 7911a04064
Refactoring of multi-topic kafka ingestion docs (#14828)
In this PR, I have gotten rid of multiTopic parameter and instead added a topicPattern parameter. Kafka supervisor will pass topicPattern or topic as the stream name to the core ingestion engine. There is validation to ensure that only one of topic or topicPattern will be set. This new setting is easier to understand than overloading the topic field that earlier could be interpreted differently depending on the value of some other field.
2023-08-16 18:00:11 +05:30
Nhi Pham 8fa78594ea
Druid SQL API documentation refactor (#14711) 2023-08-15 13:45:25 -07:00
Nhi Pham a38579ab3c
Retention rules API documentation refactor (#14623) 2023-08-15 13:44:44 -07:00
Abhishek Agarwal 30b5dd4ca7
Add support to read from multiple kafka topics in same supervisor (#14424)
This PR adds support to read from multiple Kafka topics in the same supervisor. A multi-topic ingestion can be useful in scenarios where a cluster admin has no control over input streams. Different teams in an org may create different input topics that they can write the data to. However, the cluster admin wants all this data to be queryable in one data source.
2023-08-14 22:24:49 +05:30
Soumyava afe22907a5
Calcite upgrade 1.35 (#14510)
* Update to Calcite 1.35.0
* Update from.ftl for Calcite 1.35.0.
* Fixed tests in Calcite upgrade by doing the following:
1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules
2. Refactored the CorrelateUnnestRule
3. Updated CorrelateUnnestRel accordingly
4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column
5. Additional test cases for fixes in 2,3,4
6. Update to StringListAggregator to fail a query if separators are not propagated appropriately
* Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects
* Added a literalSqlAggregator as the upgraded Calcite involved changes to subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing an useless literal dimension with a post agg. Additionally a test with COUNT DISTINCT and FILTER which was failing with Calcite 1.21 is added here which passes with 1.35
* Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal
* Added a wrapper segment ref to use for unnest and filter segment reference
2023-08-11 12:47:16 -07:00
zachjsh 82d82dfbd6
Add stats to KillUnusedSegments coordinator duty (#14782)
### Description

Added the following metrics, which are calculated from the `KillUnusedSegments` coordinatorDuty

`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill. 

#### Release note
NEW: metrics added for auto kill

`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill.
2023-08-10 18:36:53 -04:00
Laksh Singla 8f102f9031
Introduce StorageConnector for Azure (#14660)
The Azure connector is introduced and MSQ's fault tolerance and durable storage can now be used with Microsoft Azure's blob storage. Also, the results of newly introduced queries from deep storage can now store and fetch the results from Azure's blob storage.
2023-08-09 12:25:27 +00:00
Tejaswini Bandlamudi a45b25fa1d
Removes support for Hadoop 2 (#14763)
Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop
2023-08-09 17:47:52 +05:30
Clint Wylie e57f880020
document new filters and stuff (#14760) 2023-08-08 16:01:06 -07:00
Clint Wylie 667e4dab5e
document expression aggregator (#14497) 2023-08-08 15:49:29 -07:00
317brian 8a4dabc431
docs: remove experimental from schema auto-discoery (#14759) 2023-08-08 12:45:44 -07:00
zachjsh 660e6cfa01
Allow for task limit on kill tasks spawned by auto kill coordinator duty (#14769)
### Description

Previously, the `KillUnusedSegments` coordinator duty, in charge of periodically deleting unused segments, could spawn an unlimited number of kill tasks for unused segments. This change adds 2 new coordinator dynamic configs that can be used to control the limit of tasks spawned by this coordinator duty

`killTaskSlotRatio`: Ratio of total available task slots, including autoscaling if applicable that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is 1, which allows all available tasks to be used, which is the existing behavior

`maxKillTaskSlots`: Maximum number of tasks that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is INT.MAX, which essentially allows for unbounded number of tasks, which is the existing behavior. 

Realize that we can effectively get away with just the one `killTaskSlotRatio`, but following similarly to the compaction config, which has similar properties; I thought it was good to have some control of the upper limit regardless of ratio provided.

#### Release note
NEW: `killTaskSlotRatio`  and `maxKillTaskSlots` coordinator dynamic config properties added that allow control of task resource usage spawned by `KillUnusedSegments` coordinator task (auto kill)
2023-08-08 08:40:55 -04:00
Suneet Saldanha 2af0ab2425
Metric to report time spent fetching and analyzing segments (#14752)
* Metric to report time spent fetching and analyzing segments

* fix test

* spell check

* fix tests

* checkstyle

* remove unused variable

* Update docs/operations/metrics.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/metrics.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/metrics.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-08-07 18:32:48 -07:00
Abhishek Radhakrishnan bff8f9e12e
Update kinesis docs (#14768)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-08-07 17:08:34 -07:00
Victoria Lim 7d7813372a
Docs: Include EARLIEST_BY and LATEST_BY as supported aggregation functions (#14280) 2023-08-07 09:59:12 -07:00
Kashif Faraz 2d8e0f28f3
Refactor: Cleanup coordinator duties for metadata cleanup (#14631)
Changes
- Add abstract class `MetadataCleanupDuty`
- Make `KillAuditLogs`, `KillCompactionConfig`, etc extend `MetadataCleanupDuty` 
- Improve log and error messages
- Cleanup tests
- No functional change
2023-08-05 13:08:23 +05:30
Suneet Saldanha 62ddeaf16f
Additional dimensions for service/heartbeat (#14743)
* Additional dimensions for service/heartbeat

* docs

* review

* review
2023-08-04 11:01:07 -07:00
Suneet Saldanha 590734b5eb
Update tutorial-kafka.md (#14749) 2023-08-04 10:56:33 -07:00
Laksh Singla d6c73ca6e5
Cleanup the documentation for deep storage 2023-08-04 10:20:01 +00:00
317brian 3b5b6c6a41
docs: query from deep storage (#14609)
* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate #14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2023-08-04 11:10:08 +05:30
zachjsh ba957a9b97
Add ability to limit the number of segments killed in kill task (#14662)
### Description

Previously, the `maxSegments` configured for auto kill could be ignored if an interval of data for a  given datasource had more than this number of unused segments, causing the kill task spawned with the task of deleting unused segments in that given interval of data to delete more than the `maxSegments` configured. Now each kill task spawned by the auto kill coordinator duty, will kill at most `limit` segments. This is done by adding a new config property to the `KillUnusedSegmentTask` which allows users to specify this limit.
2023-08-03 22:17:04 -04:00
George Shiqi Wu 174053f4fd
Add readme for kubernetes-overlord-extensions and update docs (#14674)
* Add readme for kubernetes task scheduler

* clean up uneeded stuff

* Update extensions-contrib/kubernetes-overlord-extensions/README.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Move documentation into main page

* indentation

* cleanup spellcheck errors

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update extensions-contrib/kubernetes-overlord-extensions/README.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* PR comments

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-08-01 13:29:44 -07:00
Kashif Faraz 10328c0743
Rename metadatacache and serverview metrics (#14716) 2023-08-01 14:18:20 +05:30
Gian Merlino 5387f1bac0
Remove chatAsync parameter, so chat is always async. (#14692)
* Remove chatAsync parameter, so chat is always async.

chatAsync has been made default in Druid 26. I have seen good
battle-testing of it in production, and am comfortable removing the
older sync client.

This was the last remaining usage of IndexTaskClient, so this patch
deletes all that stuff too.

* Remove unthrown exception.

* Remove unthrown exception.

* No more TimeoutException.
2023-07-31 19:42:51 -07:00
Jason Koch 44d5c1a15f
split KillUnusedSegmentsTask to processing in smaller chunks (#14642)
split KillUnusedSegmentsTask to smaller batches

Processing in smaller chunks allows the task execution to yield the TaskLockbox lock,
which allows the overlord to continue being responsive to other tasks and users while
this particular kill task is executing.

* introduce KillUnusedSegmentsTask batchSize parameter to control size of batching

* provide an explanation for kill task batchSize parameter

* add logging details for kill batch progress
2023-07-31 12:56:27 -07:00
Nhi Pham 53733d2542
JSON-querying API documentation refactor (#14589)
Co-authored-by: Jill Osborne <jill.osborne@imply.io>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-07-28 10:55:17 -07:00
Nhi Pham ee9cfc7e18
Tasks API documentation spacing update (#14633) 2023-07-27 14:24:55 -07:00
Nhi Pham 482def788f
Supervisor API documentation refactor (#14579) 2023-07-27 12:58:37 -07:00
Katya Macedo 0b9e4af443
Clean up some of the descriptions (#14661)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-07-27 12:11:58 -07:00
Nhi Pham dd204e596d
Refresh the OS Druid web console screenshots (#14397)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-26 16:32:03 -07:00
slfan1989 d69edb7723
Docs: Fix some typos. (#14663)
---------

Co-authored-by: slfan1989 <louj1988@@>
2023-07-26 21:24:18 +05:30
Katya Macedo 4804630c78
Clean up Kinesis doc (#14529) 2023-07-25 19:24:36 -07:00
Nhi Pham 2dc3e94a9a
Service status API documentation refactor (#14528)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-07-25 18:42:47 -07:00
Karan Kumar 77e0c16bce
Sql statement api error messaging fixes. (#14629)
* Error messaging fixes.

* Static check fix

* Review comments
2023-07-20 22:48:44 +05:30
Gian Merlino bac5ef347c
Add ingest/input/bytes metric and Kafka consumer metrics. (#14582)
* Add ingest/input/bytes metric and Kafka consumer metrics.

New metrics:

1) ingest/input/bytes. Equivalent to processedBytes in the task reports.

2) kafka/consumer/bytesConsumed: Equivalent to the Kafka consumer
   metric "bytes-consumed-total". Only emitted for Kafka tasks.

3) kafka/consumer/recordsConsumed: Equivalent to the Kafka consumer
   metric "records-consumed-total". Only emitted for Kafka tasks.

* Fix anchor.

* Fix KafkaConsumerMonitor.

* Interface updates.

* Doc changes.

* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTask.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-07-20 10:56:22 +08:00
Jaehui Lee 1f4ee5e21b
Docs: Change default value of "maxRowsInMemory" in tuningConfig (#14618)
Reflecting fixes from https://github.com/apache/druid/pull/13939
2023-07-19 23:14:15 +05:30
Abhishek Radhakrishnan f4d0ea7bc8
Add support for earliest `aggregatorMergeStrategy` (#14598)
* Add EARLIEST aggregator merge strategy.

- More unit tests.
- Include the aggregators analysis type by default in tests.

* Docs.

* Some comments and a test

* Collapse into individual code blocks.
2023-07-18 12:37:10 -07:00
Kashif Faraz cab93fb817
Docs: Minor change missed in #14590 (#14604)
Changes:
- Rephrased the description of `smartSegmentLoading`
- Moved detail about value computation outside of quoted block as it is important.
2023-07-18 19:58:37 +05:30
Kashif Faraz 88dc330da2
Docs: Changes for coordinator improvements done in #13197 (#14590) 2023-07-18 14:22:00 +05:30
Atul Mohan 03d6d395a0
Extension to read and ingest iceberg data files (#14329)
This adds a new contrib extension: druid-iceberg-extensions which can be used to ingest data stored in Apache Iceberg format. It adds a new input source of type iceberg that connects to a catalog and retrieves the data files associated with an iceberg table and provides these data file paths to either an S3 or HDFS input source depending on the warehouse location.

Two important dependencies associated with Apache Iceberg tables are:

Catalog : This extension supports reading from either a Hive Metastore catalog or a Local file-based catalog. Support for AWS Glue is not available yet.
Warehouse : This extension supports reading data files from either HDFS or S3. Adapters for other cloud object locations should be easy to add by extending the AbstractInputSourceAdapter.
2023-07-18 08:59:57 +05:30
Abhishek Radhakrishnan 1f6507dd60
Remove the deprecated `InsertCannotOrderByDescending` MSQ fault (#14588)
The deprecated MSQ fault, InsertCannotOrderByDescending, is removed.
2023-07-17 09:23:39 +00:00
Gian Merlino 95ca43034f
Change default handoffConditionTimeout to 15 minutes. (#14539)
* Change default handoffConditionTimeout to 15 minutes.

Most of the time, when handoff is taking this long, it's because something
is preventing Historicals from loading new data. In this case, we have
two choices:

1) Stop making progress on ingestion, wait for Historicals to load stuff,
   and keep the waiting-for-handoff segments available on realtime tasks.
   (handoffConditionTimeout = 0, the current default)

2) Continue making progress on ingestion, by exiting the realtime tasks
   that were waiting for handoff. Once the Historicals get their act
   together, the segments will be loaded, as they are still there on
   deep storage. They will just not be continuously available.
   (handoffConditionTimeout > 0)

I believe most users would prefer [2], because [1] risks ingestion falling
behind the stream, which causes many other problems. It can cause data loss
if the stream ages-out data before we have a chance to ingest it.

Due to the way tuningConfigs are serialized -- defaults are baked into the
serialized form that is written to the database -- this default change will
not change anyone's existing supervisors. It will take effect for newly
created supervisors.

* Fix tests.

* Update docs/development/extensions-core/kafka-supervisor-reference.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 13:17:14 -07:00
Abhishek Radhakrishnan f4ee58eaa8
Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560)
* Add aggregatorMergeStrategy property to SegmentMetadaQuery.

- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.

* Don't explicitly set strict strategy in tests

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/querying/segmentmetadataquery.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 12:37:36 -04:00
Katya Macedo 12ce187ae4
Update slack text (#14578) 2023-07-12 12:08:48 -07:00
cristian-popa d21c54fb73
Cross reference backpressure info (#14508)
Co-authored-by: Jill Osborne <jill.osborne@imply.io>
Co-authored-by: Cristian Popa <cristian.popa@imply.io>
2023-07-12 10:02:04 -07:00
Gian Merlino 3ff51487b7
Add ZooKeeper connection state alerts and metrics. (#14333)
* Add ZooKeeper connection state alerts and metrics.

- New metric "zk/connected" is an indicator showing 1 when connected,
  0 when disconnected.
- New metric "zk/disconnected/time" measures time spent disconnected.
- New alert when Curator connection state enters LOST or SUSPENDED.

* Use right GuardedBy.

* Test fixes, coverage.

* Adjustment.

* Fix tests.

* Fix ITs.

* Improved injection.

* Adjust metric name, add tests.
2023-07-12 09:34:28 -07:00
hqx871 7142b0c39e
Enable result level cache for GroupByStrategyV2 on broker (#11595)
Cache is disabled for GroupByStrategyV2 on broker since the pr #3820 [groupBy v2: Results not fully merged when caching is enabled on the broker]. But we can enable the result-level cache on broker for GroupByStrategyV2 and keep the segment-level cache disabled.
2023-07-12 15:00:01 +05:30
Nhi Pham d76903f10b
Tasks API documentation refactor (#14492)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-07-11 13:19:39 -07:00
Abhishek Radhakrishnan 854ef98235
Minor doc fixes. (#14565)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-07-11 13:12:40 -07:00
Nhi Pham a764ed7fde
Update Jupyter notebook tutorial instructions for ARM devices (#14459)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-07-11 10:01:20 -07:00
Laksh Singla 5ce536355e
Fix planning bug while using sort merge frame processor (#14450)
sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.
2023-07-11 09:58:44 +00:00
Gian Merlino 63ee69b4e8
Claim full support for Java 17. (#14384)
* Claim full support for Java 17.

No production code has changed, except the startup scripts.

Changes:

1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK.

2) Include the full list of opens and exports on both Java 11 and 17.

3) Document that Java 17 is both supported and preferred.

4) Switch some tests from Java 11 to 17 to get better coverage on the
   preferred version.

* Doc update.

* Update errorprone.

* Update docker_build_containers.sh.

* Update errorprone in licenses.yaml.

* Add some more run-javas.

* Additional run-javas.

* Update errorprone.

* Suppress new errorprone error.

* Add exports and opens in ForkingTaskRunner for Java 11+.

Test, doc changes.

* Additional errorprone updates.

* Update for errorprone.

* Restore old fomatting in LdapCredentialsValidator.

* Copy bin/ too.

* Fix Java 15, 17 build line in docker_build_containers.sh.

* Update busybox image.

* One more java command.

* Fix interpolation.

* IT commandline refinements.

* Switch to busybox 1.34.1-glibc.

* POM adjustments, build and test one IT on 17.

* Additional debugging.

* Fix silly thing.

* Adjust command line.

* Add exports and opens one more place.

* Additional harmonization of strong encapsulation parameters.
2023-07-07 12:52:35 -07:00
Katya Macedo 5f94a2a9c2
Add link to Slack channel (#14553)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-07-07 10:09:15 -07:00
Kashif Faraz 87bb1b9709
Fix bug during initialization of HttpServerInventoryView (#14517)
If a server is removed during `HttpServerInventoryView.serverInventoryInitialized`,
the initialization gets stuck as this server is never synced. The method eventually times
out (default 250s).

Fix: Mark a server as stopped if it is removed. `serverInventoryInitialized` only waits for
non-stopped servers to sync.

Other changes:
- Add new metrics for better debugging of slow broker/coordinator startup
  - `segment/serverview/sync/healthy`: whether the server view is syncing properly with a server
  - `segment/serverview/sync/unstableTime`: time for which sync with a server has been unstable  
- Clean up logging in `HttpServerInventoryView` and `ChangeRequestHttpSyncer`
- Minor refactor for readability
- Add utility class `Stopwatch`
- Add tests and stubs
2023-07-06 13:04:53 +05:30
Kashif Faraz a6547febaf
Remove unused coordinator dynamic configs (#14524)
After #13197 , several coordinator configs are now redundant as they are not being
used anymore, neither with `smartSegmentLoading` nor otherwise.

Changes:
- Remove dynamic configs `emitBalancingStats`: balancer error stats are always
emitted, debug stats can be logged by using `debugDimensions`
- `useBatchedSegmentSampler`, `percentOfSegmentsToConsiderPerMove`:
batched segment sampling is always used
- Add test to verify deserialization with unknown properties
- Update `CoordinatorRunStats` to always track stats, this can be optimized later.
2023-07-06 12:11:10 +05:30
Victoria Lim 50b7e5d20e
docs: fix links (#14504) 2023-07-05 12:29:47 -07:00
Jakub Matyszewski cc159f4317
docs: k8s-jobs role needs batch apigroup (#14343) 2023-07-04 14:34:20 +05:30
Clint Wylie 277aaa5c57
remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations (#14500)
* combine string column implementations
changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
2023-07-02 19:37:15 -07:00
Gian Merlino e10e35aa2c
Add REGEXP_REPLACE function. (#14460)
* Add REGEXP_REPLACE function.

Replaces all instances of a pattern with a replacement string.

* Fixes.

* Improve test coverage.

* Adjust behavior.
2023-06-29 13:47:57 -07:00
Adarsh Sanjeev 0335aaa279
Add query results directory and prevent the auto cleaner from cleaning it (#14446)
Adds support for automatic cleaning of a "query-results" directory in durable storage. This directory will be cleaned up only if the task id is not known to the overlord. This will allow the storage of query results after the task has finished running.
2023-06-28 10:14:04 +05:30
Laksh Singla f546cd64a9
MSQ: Ensure that the allocated segment aligns with the requested granularity (#14475)
Changes:
- Throw an `InsertCannotAllocateSegmentFault` if the allocated segment is not aligned with
the requested granularity.
- Tests to verify new behaviour
2023-06-27 09:25:32 +05:30
Abhishek Radhakrishnan 79bff4bbf7
Improvements to `EXPLAIN PLAN` attributes (#14441)
* Updates: use the target table directly, sanitized replace time chunks and clustered by cols.

* Add DruidSqlParserUtil and tests.

* minor refactor

* Use SqlUtil.isLiteral

* Throw ValidationException if CLUSTERED BY column descending order is specified.

- Fails query planning

* Some more tests.

* fixup existing comment

* Update comment

* checkstyle fix: remove unused imports

* Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme.

* minor naming

* move deprecated field to the bottom

* update docs.

* add one more example.

* Collapsible query and result

* checkstyle fixes

* Code cleanup

* order by changes

* conditionally set attributes only for explain queries.

* Cleaner ordinal check.

* Add limit test and update javadoc.

* Commentary and minor adjustments.

* Checkstyle fixes.

* One more checkArg.

* add unexpected kind to exception.
2023-06-26 23:01:11 -04:00
Nhi Pham 579b93f282
API reference refactor (#14372)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-06-26 15:48:54 -07:00
Katya Macedo fc08617e9e
[Docs] Clean up druid.processing.intermediaryData.storage.type description (#14431)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-06-26 11:46:54 -07:00