Commit Graph

3284 Commits

Author SHA1 Message Date
Rishabh Singh 74422b58f5
Emit disk spill and merge buffer utilisation metrics for GroupBy queries (#17360)
This change emits the following metrics as part of the GroupByStatsMonitor:
mergeBuffer/used -> Number of merge buffers used.
mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer.
mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers.
groupBy/spilledQueries -> Number of queries that spilled onto the disk.
groupBy/spilledBytes -> Spilled bytes on the disk.
groupBy/mergeDictionarySize -> Size of the merging dictionary.
2024-11-22 14:22:03 +05:30
Katya Macedo bd93d0046d
Docs: update text and example (#17480)
* Docs: update text and example

* Update after review

* Update the spelling file

* Update text for clarity

* Update after review
2024-11-21 08:40:41 -08:00
Akshat Jain 17215cd677
Remove support for Java 8 (#17466)
All JDK 8 based CI checks have been removed.
Images used in Dockerfile(s) have been updated to Java 17 based images.
Documentation has been updated accordingly.
2024-11-21 15:33:08 +05:30
Adithya Chakilam 6f436301be
supervisor: make rejection periods work with stopTasksCount (#17442)
* kafka-indexing: Report consumer io time

* commit

* backward

* tests

* remove unwanted changes

* comments

* comments

* coverage

* change name

* fixes

* fixes

* comments
2024-11-18 13:12:24 -08:00
Katya Macedo 75d9ece665
Docs: update descriptions and default values (#17473) 2024-11-13 16:29:27 -08:00
Kiran Gadhave 1dbd005df6
updated docs with behavior for empty collections in pod template selector config (#17464) 2024-11-12 13:21:27 -08:00
zachjsh 1f3b1f85f9
Add documentation for Druid's catalog extension (#17459)
* SQL syntax error should target USER persona

* * revert change to queryHandler and related tests, based on review comments

* * add test

* Add documentation for druid-catalog extension

* * fix error

* * fix error

* Apply suggestions from code review

Co-authored-by: Andreas Maechler <amaechler@gmail.com>

* * fix spelling error

* * fix spelling

---------

Co-authored-by: Andreas Maechler <amaechler@gmail.com>
2024-11-12 14:50:55 -05:00
Shekhar Prasad Rajak ae049a4bab
AWS Glue Catalog for Iceberg ingest extension (#17392)
* iceberg glue catalog dependencies added

* GlueIcebergCatalog added in druid module

* default version of iceberg glue catalog implementation - basics

* basic tests added

* removed dependency iceberg-aws-bundle

* glue catalog support - docs update for iceberg

* Update IcebergDruidModule.java

* Update IcebergDruidModule.java

* updates in dependencies and warehousePath must be under catalogProp

* removed some dependencies that are not required

* only glue sdk added

* update license

* avro exclusion removed

* doc update

* doc update

* set the type to glue

* minor change

* minor change

* fixing codestyle

* checkstyle fixes

* checkstyle fixes

* checkstyle fixes

* dependency check fixes

* update pom for ignore warning for glue catalog

* compile scope needed - iceberg-aws and awssdk

* updates pom with comment

* minor change

* mvn dependency check in iceberg extension

* revert pom.xml changes

* aws sdk sts and s3 for gluecatalog initialize

* dependency check - ignore aws sdk s3 and sts

---------

Co-authored-by: SHEKHAR PRASAD RAJAK <shekhar_rajak@apple.com>
2024-11-10 18:43:55 -08:00
George Shiqi Wu 5764183d4e
k8s-based-ingestion: Wait for task lifecycles to enter RUNNING state before returning from KubernetesTaskRunner.start (#17446)
* Add a wait on start() for task lifecycle to go into running

* handle exceptions

* Fix logging messages

* Don't pass in the settable future as an arg

* add some unit tests
2024-11-08 11:13:35 -05:00
Virushade ba76264244
Update build documentation (#17444)
Add build instructions for developers
Follow up from issue #17375, add instructions solely for distribution profile. Note that this build command is mostly used by me, everyone is welcome to add further optimizations for a faster distribution build.

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Update docs/development/build.md

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Update docs/development/build.md

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-11-04 18:31:46 -08:00
Ashwin Tumma d5bb7de5cf
Fix Map Lookup Introspection Endpoints and update doc for Globally Cached Lookups (#17436)
Map Lookup Introspection API endpoints /keys and /values no longer return an invalid JSON object.
Also, update documentation to clarify the version returned by the /version introspection endpoint.

---------

Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-10-30 08:23:22 -07:00
Ashwin Tumma 1be2b852e9
[Kafka Ingestion Tutorial] Update docs for Schema Config (#17409)
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-10-29 08:23:20 -07:00
Adarsh Sanjeev b7c661b801
Make tempStorageDirectory configuration optional and rely on task dir instead (#17015)
Currently, durable storage and export both require configuring a temporary directory to be used using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir.

Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set.

Note that on tasks, preference is given to the user-configured druid.*.storage.temp*Dir. If that is not configured, the task's temporary directory is used.

Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.
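The fallback described above can be sketched as a small resolution function. This is a minimal illustration, not Druid's implementation: `resolve_temp_dir` and its parameters are hypothetical names, and the error for services without a task directory is an assumption based on the note that Overlords and Brokers still require the configuration.

```python
from pathlib import Path
from typing import Optional


def resolve_temp_dir(configured: Optional[str], task_dir: Optional[str]) -> Path:
    """Pick a temporary directory for a storage connector.

    Hypothetical helper: an explicitly configured directory wins,
    otherwise the task directory is used as the default.
    """
    if configured:
        # User-configured druid.*.storage.temp*Dir takes preference.
        return Path(configured)
    if task_dir:
        # Tasks already have a temporary directory; reuse it.
        return Path(task_dir)
    # Services without a task directory (for example Overlord or Broker)
    # have nothing to fall back to, so the config stays mandatory there.
    raise ValueError("temp directory must be configured on this service")
```

On a task, `resolve_temp_dir(None, "/var/druid/task/abc")` would return the task directory, while an Overlord calling it with neither value set would get the error.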
2024-10-29 13:36:59 +05:30
Benjamin Hopp b59317e42b
Fix typo in security.md (#17413)
No longer using Azure Blog storage, moving to Blobs instead.
2024-10-25 13:43:58 -07:00
Kashif Faraz 9dfb378711
Remove unused coordinator dynamic configs mergeSegmentsLimit, mergeBytesLimit (#17384)
* Remove unused coordinator dynamic configs

* Update docs and web-console
2024-10-22 09:03:46 +05:30
317brian d1b81f312a
docs: msq autocompaction (#16681)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Vishesh Garg <vishesh.garg@imply.io>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-10-17 10:40:53 -07:00
Shivam Garg 6898a5a359
Removed Microsecond from Extract function (#17247) 2024-10-11 05:32:26 +02:00
anny-imply dca69c5761
update line in architecture md (#17289) 2024-10-08 11:51:47 -07:00
Charles Smith 5ed68622c3
[Docs] Update known issues for window functions (#17097)
* draft update to known issues

* Update known issues

Remove addressed known issues. Clarify the issue with SELECT * queries.
2024-10-08 08:47:13 -07:00
Edgar Melendrez a67a3c8e0a
[docs] update tutorial for Theta sketches (#16953)
* from start to step 3 of Ingest data using Theta sketches

* updated up to "Query the Theta sketch column"

* fixed sentence

* another typo

* using sql ingestion instead of batch-sql

* waiting for explanations on DS_THETA

* Revert "using sql ingestion instead of batch-sql"

This reverts commit b95fcb9b32.

* Revert "using sql ingestion instead of batch-sql"

This reverts commit b95fcb9b32.

* just copy and pasting to where I was

* updated tutorial

* fixing images, and removing unused

* slightly updating explanation

* Update docs/tutorials/tutorial-sketches-theta.md

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* addressing comments in review

* made filter clause consistent with other instances

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-10-08 10:44:37 +08:00
317brian 9932f2e70a
docs: concurrent append and replace is GA (#17269) 2024-10-08 07:55:55 +05:30
Clint Wylie 04fe56835d
add druid.expressions.allowVectorizeFallback and default to false (#17248)
changes:

adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until the few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future
2024-10-05 12:42:42 +05:30
Charles Smith acd973273f
Docs: adds MSQ examples to front coded dict. migration (#17236)
* add msq example

* adjust json formatting
2024-10-03 16:33:34 -07:00
317brian 1fc82a96bd
docs: update future development blurbs (#16939)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-10-01 15:02:05 -07:00
Sree Charan Manamala 661614129e
Window Functions : Context Parameter to Enable Transfer of RACs over wire (#17150) 2024-09-28 08:04:22 +02:00
Victoria Lim 203d6345af
docs: Separate section on ingesting MVDs in migration guide (#17109) 2024-09-25 14:45:25 -07:00
Atul Mohan c1f8ae25b5
Support Iceberg ingestion from REST based catalogs (#17124)
Adds support to the iceberg input source to read from Iceberg REST Catalogs.
2024-09-23 22:13:24 -07:00
Adithya Chakilam 8eaac2c051
cgroup monitors: Add mem/disk/cpu usage metrics for V2 (#16905)
* cgroup monitors: Add mem/disk/cpu usage metrics for V2

* intellij inspection

* docs and checks

* fix-dos

* add comments

* comments
2024-09-23 20:32:01 -07:00
Sree Charan Manamala 67d361c9bf
Window Functions : Remove enable windowing flag (#17087) 2024-09-23 08:24:26 +02:00
Abhishek Radhakrishnan 635e418131
Support to parse numbers in text-based input formats (#17082)
Text-based input formats like csv and tsv currently parse inputs only as strings, following the RFC4180Parser spec.
To work around this, the web-console and other tools need to further inspect the sample data returned by the Druid sampler API to parse them as numbers.

This patch introduces a new optional config, tryParseNumbers, for the csv and tsv input formats. If enabled, any numbers present in the input will be parsed in the following manner -- long data type for integer types and double for floating-point numbers, and if parsing fails for whatever reason, the input is treated as a string. By default, this configuration is set to false, so numeric strings will be treated as strings.
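The parsing order described above (integer first, then floating point, then fall back to the original string) can be sketched as follows. This is a rough illustration using Python's own number parsing, which is not identical to Druid's; `try_parse_number` and `parse_row` are hypothetical names, not Druid APIs.

```python
from typing import List, Union

Value = Union[int, float, str]


def try_parse_number(value: str) -> Value:
    """Try integer, then floating point, else keep the string."""
    try:
        return int(value)      # stands in for Druid's long type
    except ValueError:
        pass
    try:
        return float(value)    # stands in for Druid's double type
    except ValueError:
        return value           # parsing failed: treat as a string


def parse_row(fields: List[str], try_parse_numbers: bool = False) -> List[Value]:
    """Parse one delimited row; numbers are only parsed when opted in."""
    if not try_parse_numbers:
        # Default behaviour: numeric strings stay strings.
        return list(fields)
    return [try_parse_number(f) for f in fields]
```

With the flag off, `parse_row(["42", "3.14", "abc"])` leaves every field as a string; with it on, the first two fields become an integer and a float and the third stays a string.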
2024-09-19 13:21:18 -07:00
Pranav d1bd6a8156
Update doc for allowedHeaders (#17045)
Update doc for allowedHeaders and make allowedHeaders more restrictive
2024-09-19 08:37:39 +05:30
Abhishek Radhakrishnan 39723e5401
Update note about `sys.tasks` table (#17096)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-09-18 11:02:45 -07:00
Edgar Melendrez 64a4d115c5
[Docs] adding admonition for div (#17093)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-09-17 13:54:49 -07:00
Katya Macedo 490211f2b1
Docs - update streaming ingestion terminology for Kafka and Kinesis (#17003) 2024-09-17 09:49:24 -07:00
Lasse Mammen 307b8e3357
feat: json_merge expression and sql function (#17081) 2024-09-17 18:27:34 +05:30
Victoria Lim 2e2f3cf66a
docs: Refresh docs for SQL input source (#17031)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-09-16 15:52:37 -07:00
Adithya Chakilam 6ef8d5d8e1
OshiSysMonitor: Add ability to skip emitting metrics (#16972)
* OshiSysMonitor: Add ability to skip emitting metrics

* comments

* static checks

* remove oshi
2024-09-12 11:32:31 -04:00
George Shiqi Wu 428f58cf15
Support maxColumnsToMerge in supervisor tuningConfig (#17030)
* support maxColumnsToMerge in supervisor specs

* remove log line

* fix style

* add docs

* fix unit tests
2024-09-11 18:00:13 -04:00
aho135 2427972c10
Implement segment range threshold for automatic query prioritization (#17009)
Implements threshold based automatic query prioritization using the time period of the actual segments scanned. This differs from the current implementation of durationThreshold which uses the duration in the user supplied query. There are some usability constraints with using durationThreshold from the user supplied query, especially when using SQL. For example, if a client does not explicitly specify both start and end timestamps then the duration is extremely large and will always exceed the configured durationThreshold. This is one example interval from a query that specifies no end timestamp:
"interval":["2024-08-30T08:05:41.944Z/146140482-04-24T15:36:27.903Z"]. This interval is generated from a query like SELECT * FROM table WHERE __time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR. Using the time period of the actual segments scanned allows proper prioritization without explicitly having to specify start and end timestamps. This PR adds onto #9493
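The idea of prioritizing by the time range of the segments actually scanned, rather than by the interval in the query, can be sketched as below. This is only an illustration under assumptions: the function names, the way the range is measured (earliest segment start to latest segment end), and lowering the priority by a fixed adjustment are all hypothetical stand-ins for whatever the strategy really does.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end) of one segment


def segment_range(segments: List[Interval]) -> timedelta:
    """Span from the earliest segment start to the latest segment end."""
    start = min(s for s, _ in segments)
    end = max(e for _, e in segments)
    return end - start


def adjusted_priority(
    base_priority: int,
    segments: List[Interval],
    range_threshold: timedelta,
    adjustment: int,
) -> int:
    """Lower the priority when the scanned segments span too much time."""
    if segments and segment_range(segments) > range_threshold:
        return base_priority - adjustment
    return base_priority
```

For the open-ended query in the example, only about 15 hours of segments exist, so a one-day threshold would leave the priority unchanged even though the nominal query interval runs millions of years into the future.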
2024-09-10 15:01:52 +05:30
Abhishek Radhakrishnan aa833a711c
Support for reading Delta Lake table snapshots (#17004)
Problem
Currently, the delta input source only supports reading from the latest snapshot of the given Delta Lake table. This is a known documented limitation.

Description
Add support for reading Delta snapshot. By default, the Druid-Delta connector reads the latest snapshot of the Delta table in order to preserve compatibility. Users can specify a snapshotVersion to ingest change data events from Delta tables into Druid.

In the future, we can also add support for time-based snapshot reads. The Delta API to read time-based snapshots is not clear currently.
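As a rough sketch of what selecting a snapshot might look like, the helper below builds an input source spec as a Python dict. Only `snapshotVersion` comes from the description above; the `type` and `tablePath` field names and the `delta_input_source` helper are assumptions for illustration and should be checked against the Delta Lake extension documentation.

```python
import json
from typing import Optional


def delta_input_source(table_path: str, snapshot_version: Optional[int] = None) -> dict:
    """Build a (hypothetical) delta input source spec.

    When snapshot_version is omitted, the field is left out so the
    connector would read the latest snapshot, preserving the default.
    """
    spec = {"type": "delta", "tablePath": table_path}  # assumed field names
    if snapshot_version is not None:
        spec["snapshotVersion"] = snapshot_version
    return spec


if __name__ == "__main__":
    # Print a spec that pins the read to snapshot version 3.
    print(json.dumps(delta_input_source("/delta-table/foo", 3), indent=2))
```

Leaving the version out keeps the spec identical to one written before this change, which matches the stated goal of preserving compatibility.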
2024-09-09 14:12:48 +05:30
Edgar Melendrez 48a758ee08
[docs] reverting changes for sql-functions.md (#17019) 2024-09-06 16:07:32 -07:00
Katya Macedo 94b0705109
Docs - Update the architecture diagram (#17007) 2024-09-06 12:21:27 -07:00
Edgar Melendrez 2d9e92ce78
[docs] Batch11 date and time functions (#16926)
* first draft of functions

* minor improvements

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-scalar.md

* Apply suggestions from code review

Accepted as is

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* applying next round of suggestions

* fixing missing column name

* addressing floor and ceil functions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* re-wording TIMESTAMPADD

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-09-06 12:20:47 -07:00
Edgar Melendrez ed811262e3
[docs] Batch13 IP functions (#16947)
* new datasource

* reviewing before pr

* Update docs/querying/sql-functions.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Applying suggestions to IPV4_PARSE

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-09-06 12:19:36 -07:00
Virushade 476b205efa
Docs: Fix language in Schema Design docs (#17010) 2024-09-06 08:48:00 +05:30
Edgar Melendrez c49dc83b22
[docs] batch 12: reduction functions (#16930)
* [docs] batch 12: reduction functions

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* applying suggestions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-09-05 17:02:45 -07:00
Jill Osborne b4d83a86c2
Middle Manager wording update in docs (#17005) 2024-09-05 10:25:30 -07:00
Hugh Evans 9162339fa8
Replace dsql instructions in example (#16977) 2024-09-04 12:45:58 -07:00
Katya Macedo 03c37b3143
Fix spelling (#17001) 2024-09-04 13:33:17 -04:00
Hardik Bajaj 2ef936be40
Update Documentation on mergeBuffer/pendingRequests for Real-time nodes (#16992)
#15025 adds the mergeBuffer/pendingRequests metric in QueryCountStatsMonitor. Since real-time nodes also use the same merge buffers for queries and have QueryCountStatsMonitor, the documentation is being updated to include this metric.
2024-09-04 00:25:09 +05:30