Commit Graph

3162 Commits

Author SHA1 Message Date
George Shiqi Wu d5a25a94b8
Docs: Clarify that all supervisors can support early handoff (#16588) 2024-06-13 08:43:22 +05:30
YongGang 46dbc74053
Support Dynamic Peon Pod Template Selection in K8s extension (#16510)
* initial commit

* add Javadocs

* refine JSON input config

* more test and fix build

* extract existing behavior as default strategy

* change template mapping fallback

* add docs

* update doc

* fix doc

* address comments

* define Matcher interface

* fix test coverage

* use lower case for endpoint path

* update Json name

* add more tests

* refactoring Selector class
2024-06-12 15:27:10 -07:00
Andreas Maechler fec48432d4
docs: Correct some outdated module names (#16584)
* Fix module names

* Better spacing

* Some spacing

* Suggestions from code review

Thanks Abhishek.

* More links

* Roll-up time

* Remove logs

* More spelling
2024-06-11 14:17:40 -07:00
Andreas Maechler 24056b90b5
Bring back missing property in indexer documentation (#16582)
* Bring back druid.peon.taskActionClient.retry.minWait

* Update docs/configuration/index.md

* Consistent italics

Thanks Abhishek.

* Update docs/configuration/index.md

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Consistent list style

* Remove extra space

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-06-10 16:52:54 -07:00
Kashif Faraz e4fdf1055b
Update default value of `druid.indexer.tasklock.batchAllocationWaitTime` to zero (#16578)
Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0.
Thus, a segment allocation request is processed immediately unless there are already some requests queued before this one. While in queue, a segment allocation request may get clubbed together with other similar requests into a batch to reduce load on the metadata store.
2024-06-10 20:07:23 +05:30
317brian 8e11adfc6f
docs: remove outdated druidversion var from a page (#16570)
Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-10 15:30:36 +08:00
Gian Merlino b837ce565b
Simplify serialized form of JsonInputFormat. (#15691)
* Simplify serialized form of JsonInputFormat.

Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and
useJsonNodeReader. Because the default value of keepNullColumns is
variable, we store the original configured value rather than the
derived value, and include if the original value is nonnull.

* Fix test.
2024-06-05 20:01:14 -07:00
Katya Macedo 7aecc09230
Docs: Remove circular link (#16553) 2024-06-05 11:07:36 -07:00
Charles Smith c100ae0ecc
Add a tutorial for LATEST_BY to get most recent data (#16515)
Co-authored-by: Will Xu <2bethere@gmail.com>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-06-04 17:00:25 -07:00
Jill Osborne 8b5802d4cd
docs: add maxSubqueryBytes limit to migration guide landing page (#16547)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-06-04 12:52:06 -07:00
Amit 540d3e6af5
Added new use cases and description of the use case - 5/14/24 (#16451)
Thanks for your contribution @amit-git-account

* Added new use cases and description of the use case - 5/14/24

The use case listing is not changed in a long time. While speaking with users, I came across several other use cases not listed here in the index. So I added new use cases and also added description against the use cases.

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* update spelling file

* Update docs/design/index.md

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-06-04 09:47:49 -07:00
Charles Smith 8f78c901e7
docs: add lookups to the sidebar (#16530)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-06-03 16:04:15 -07:00
Charles Smith b1568fb95b
docs: Adds a redirect for flatten-json which was removed (#16263) 2024-05-31 16:16:12 -07:00
Katya Macedo f70ef1f434
Update front coding text (#16491)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-05-31 15:13:10 -07:00
Katya Macedo 92e660dd21
Add Druid 30.0.0 upgrade notes (#16522) 2024-05-31 13:23:22 -07:00
Atul Mohan b53d75758f
IcebergInputSource : Add option to toggle case sensitivity while reading columns from iceberg catalog (#16496)
* Toggle case sensitivity while reading columns from iceberg

* Fix tests

* Drop case check and set unconditionally
2024-05-31 10:18:52 -07:00
George Shiqi Wu 0936798122
Add limit to task payload size (#16512)
* Add limit to task payload size

* Change to a warning

* Remove test

* Fix unit tests

* Optionally throw alert

* PR comments

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* PR comments

* Reject large payloads

* Update docs/configuration/index.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-05-31 09:17:36 -07:00
Jill Osborne 3c72ec8413
docs: Migration guide for subquery limit (#16519)
Adds a migration guide for Druid 30 to help users understand the new byte-based subquery limit property maxSubqueryBytes
2024-05-31 09:26:07 +05:30
Charles Smith 92e565e3b8
Adds a migration guide overview page to the release-info section (#16506)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Katya Macedo <katya.macedo@imply.io>
2024-05-30 09:50:30 -07:00
Adithya Chakilam a9044ac235
Add cgroup cpu/mem/disk usage metrics (#16472)
* Add cgroup cpu/mem usage metrics

* checks

* comments

* docs fix

* add disk metrics

* fapi check

* checkstyle

* issues

* spelling

* change asserts

* checks

* use proc builder instead of runtime

* specify charset

* spotbug
2024-05-29 12:44:37 -07:00
George Shiqi Wu b3b62ac431
Update azure input source docs (#16508)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-05-29 10:00:46 -07:00
Vadim Ogievetsky 10ea88e5bf
Web console: more robust durable storage setting detection (#16493)
* more robust durable storage setting

* add test
2024-05-22 15:47:20 -07:00
Vadim Ogievetsky a124c6cbbd
fix typo in extension name (#16466) 2024-05-20 09:47:22 +08:00
George Shiqi Wu ed9881df88
Cleanup logic from handoff API (#16457)
* Cleanup logic from handoff API

* Fix test

* Fix checkstyle

* Update docs
2024-05-16 08:42:44 -07:00
Gian Merlino 72432c2e78
Speed up SQL IN using SCALAR_IN_ARRAY. (#16388)
* Speed up SQL IN using SCALAR_IN_ARRAY.

Main changes:

1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY, when the size of
   the IN is above inFunctionThreshold. The default value of inFunctionThreshold
   is 100. Users can restore the prior behavior by setting it to Integer.MAX_VALUE.

2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular
   expression, when the size of the SEARCH is above inFunctionExprThreshold. The default
   value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting
   it to Integer.MAX_VALUE.

3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is
   greater than inFunctionThreshold.

* Revert test.

* Additional coverage.

* Update docs/querying/sql-query-context.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* New test.

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-05-14 08:09:27 -07:00
George Shiqi Wu c1bf4fed90
API for stopping streaming tasks early (#16310)
* Try stopping task early

* Fix checkstyle

* Add unit test

* Add a couple more tests

* PR changes

* Use notice

* fix checkstyle

* PR changes

* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Change payload

* Remove quotes

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2024-05-14 06:39:50 -07:00
Alberic Liu 811dcd1726
update protobuf.md (#16434) 2024-05-11 17:52:54 +08:00
Charles Smith 2d0b4e5f1e
Update sidebar to organize tutorials + other minor improvements (#16184)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-05-09 08:57:43 -07:00
Adarsh Sanjeev 269e035e76
Add validation for reindex with realtime sources (#16390)
Add validation for reindex with realtime sources.

With the addition of concurrent compaction, it is possible to ingest data while querying from realtime sources with MSQ into the same datasource. This could potentially lead to issues if the interval that is ingested into is replaced by an MSQ job, which has queried only some of the data from the realtime task.

This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.
2024-05-07 10:32:15 +05:30
Misha b5958b6b07
Feature configurable calcite bloat (#16248)
* Configurable bloat for calcite ProjectMergeRule implemented

* Comment added

* Default bloat value increased to 1000

* Implemented bloat configuration from QueryContext

* Code refactored, docs updated

---------

Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>
2024-05-06 20:43:39 +05:30
Alberic Liu 92fb0ff718
upgrade mysql:mysql-connector-java to 8.2.0 (#16024)
* upgrade mysql:mysql-connector-java to 8.2.0

* fix the check errors

* remove unused comment
2024-05-06 21:58:37 +08:00
Rishabh Singh c61c3785a0
Followup changes to 15817 (Segment schema publishing and polling) (#16368)
* Fix build

* Nit changes in KillUnreferencedSegmentSchema

* Replace reference to the abbreviation SMQ with Metadata Query, rename inTransit maps in schema cache

* nitpicks

* Remove reference to smq abbreviation from integration-tests

* Remove reference to smq abbreviation from integration-tests

* minor change

* Update index.md

* Add delimiter while computing schema fingerprint hash
2024-05-03 19:13:52 +05:30
Kashif Faraz 51104e8bb3
Docs: Remove references to Zk-based segment loading (#16360)
Follow up to #15705

Changes:
- Remove references to ZK-based segment loading in the docs
- Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`
2024-05-01 08:06:00 +05:30
Abhishek Radhakrishnan 1d7595f3f7
Support for filters in the Druid Delta Lake connector (#16288)
* Delta Lake support for filters.

* Updates

* cleanup comments

* Docs

* Remmove Enclosed runner

* Rename

* Cleanup test

* Serde test for the Delta input source and fix jackson annotation.

* Updates and docs.

* Update error messages to be clearer

* Fixes

* Handle NumberFormatException to provide a nicer error message.

* Apply suggestions from code review

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Doc fixes based on feedback

* Yes -> yes in docs; reword slightly.

* Update docs/ingestion/input-sources.md

Co-authored-by: Laksh Singla <lakshsingla@gmail.com>

* Update docs/ingestion/input-sources.md

Co-authored-by: Laksh Singla <lakshsingla@gmail.com>

* Documentation, javadoc and more updates.

* Not with an or expression end-to-end test.

* Break up =, >, >=, <, <= into its own types instead of sub-classing.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
2024-04-29 11:31:36 -07:00
Adithya Chakilam f8015eb02a
Add config lagAggregate to LagBasedAutoScalerConfig (#16334)
Changes:
- Add new config `lagAggregate` to `LagBasedAutoScalerConfig`
- Add field `aggregateForScaling` to `LagStats`
- Use the new field/config to determine which aggregate to use to compute lag
- Remove method `Supervisor.computeLagForAutoScaler()`
2024-04-29 22:20:41 +05:30
Gian Merlino db82adcdfd
SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. (#16311)
* Four changes to scalar_in_array as follow-ups to #16306:

1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.

2) Rename the class to more closely match the function name.

3) Add a specialization for constant arrays, where we build a `HashSet`.

4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
   Additional tests verify comparisons between LONG and DOUBLE are now
   handled properly.

* Fix spelling.

* Adjustments from review.
2024-04-26 16:01:17 -07:00
Atul Mohan 77333e56fa
Docs: Add missing kafka emitter config (#16332) 2024-04-25 10:37:14 +05:30
Katya Macedo ceb6646dec
Add supervisor actions (#16276)
* Add supervisor actions

* Update text

* Update text

* Update after review

* Update after review
2024-04-24 13:14:01 -07:00
Rishabh Singh e30790e013
Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817)
Issue: #14989

The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information.

This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.
2024-04-24 22:22:53 +05:30
Charles Smith 65412f80ab
remove additional column marks (#16319) 2024-04-22 19:41:54 -07:00
Sree Charan Manamala ad5701e891
new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306)
* scalar_in function

* api doc

* refactor
2024-04-18 21:15:15 -07:00
Gian Merlino 4285a5e2c6
Update documentation for exceptions to subquery limit. (#16295)
The true exception for groupBy is somewhat more narrow than the docs
suggest.
2024-04-17 21:04:43 -07:00
Hardik Bajaj 0bf5e7745d
Add configurable parameters for statsd client (#16283)
Statsd client sometimes drops metrics when this queueSize of statsd client with max unprocessed messages is completely full. This causes some high cardinality metrics like per partition lag being droppped.
There are multiple parameters of statsdclient that can be initialized and can help increase the load/capacity of client to not to drop metrics more frequently.
Properties like queueSize, poolSize, processorWorkers and senderWorkers will now be configurable at runtime
2024-04-17 18:35:31 +05:30
Nikhil Rao a805c5612e
Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277)
* Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page

* Updates intervals in Native Query to remove excess Time part in timestamp

* Moves Druid SQL section above Native query because sql used more often by users

* removes old Druid SQL sections

* Adds TopN Druid SQL query using ORDER BY and LIMIT

* Adds table for Druid SQL variance and standard deviation functions

* Update docs/development/extensions-core/stats.md

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

---------

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-04-15 08:05:34 -07:00
Adarsh Sanjeev 3df00aef9d
Add manifest file for MSQ export (#15953)
Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines
2024-04-15 11:37:31 +05:30
Abhishek Radhakrishnan 041d0bff5e
Set default `KillUnusedSegments` duty to coordinator's indexing period & `killTaskSlotRatio` to 0.1 (#16247)
The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior.
The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that that kill tasks take up only 1 task slot right out-of-the-box instead of taking up all the task slots.

* Remove stale comment and inline canDutyRun()

* druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set.

- Remove the default P1D value for druid.coordinator.kill.period. Instead default
  druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set
  to if the former config isn't specified.
- If druid.coordinator.kill.period is set, the value will take precedence over
  druid.coordinator.period.indexingPeriod

* Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java

* Fix checkstyle error

* Clarify comment

* Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java

* Put back canDutyRun()

* Default killTaskSlotsRatio to 0.1 instead of 1.0 (all slots)

* Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS

* Remove unused test method.

* Update default value of killTaskSlotsRatio in docs and web-console default mock

* Move initDuty() after params and config setup.
2024-04-14 18:56:17 -07:00
Katya Macedo 7f06a53cb1
[Docs] Fix API placeholder formatting (#16240) 2024-04-12 09:19:13 -07:00
YongGang da9feb4430
Introduce TaskContextReport for reporting task context (#16041)
Changes:
- Add `TaskContextEnricher` interface to improve task management and monitoring
- Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord
- Add `TaskContextReport` to write out task context information in reports
2024-04-12 08:57:49 +05:30
Pranav fc2600b8e2
Adding jvmVersion dimension in JVM Monitor (#16262) 2024-04-11 15:44:56 -07:00
317brian df9e1bb97b
Docs: Fix typo in tutorial (#16254) 2024-04-10 08:59:52 +05:30