druid

Author	SHA1	Message	Date
Rishabh Singh	74422b58f5	Emit disk spill and merge buffer utilisation metrics for GroupBy queries (#17360 ) This change is to emit following metrics as part of GroupByStatsMonitor monitor, mergeBuffer/used -> Number of merge buffers used. mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer. mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers. groupBy/spilledQueries -> Number of queries that spilled onto the disk. groupBy/spilledBytes -> Spilled bytes on the disk. groupBy/mergeDictionarySize -> Size of the merging dictionary.	2024-11-22 14:22:03 +05:30
Katya Macedo	bd93d0046d	Docs: update text and example (#17480 ) * Docs: update text and example * Update after review * Update the spelling file * Update text for clarity * Update after review	2024-11-21 08:40:41 -08:00
Akshat Jain	17215cd677	Remove support for Java 8 (#17466 ) All JDK 8 based CI checks have been removed. Images used in Dockerfile(s) have been updated to Java 17 based images. Documentation has been updated accordingly.	2024-11-21 15:33:08 +05:30
Adithya Chakilam	6f436301be	supervisor: make rejection periods work with stopTasksCount (#17442 ) * kafka-indexing: Report consumer io time * commit * backward * tests * remove unwanted changes * comments * comments * coverage * change name * fixes * fixes * comments	2024-11-18 13:12:24 -08:00
Katya Macedo	75d9ece665	Docs: update descriptions and default values (#17473 )	2024-11-13 16:29:27 -08:00
Kiran Gadhave	1dbd005df6	updated docs with behavior for empty collections in pod template selector config (#17464 )	2024-11-12 13:21:27 -08:00
zachjsh	1f3b1f85f9	Add documentation for Druid's catalog extension (#17459 ) * SQL syntax error should target USER persona * * revert change to queryHandler and related tests, based on review comments * * add test * Add documentation for druid-catalog extension * * fix error * * fix error * Apply suggestions from code review Co-authored-by: Andreas Maechler <amaechler@gmail.com> * * fix spelling error * * fix spelling --------- Co-authored-by: Andreas Maechler <amaechler@gmail.com>	2024-11-12 14:50:55 -05:00
Shekhar Prasad Rajak	ae049a4bab	AWS Glue Catalog for Iceberg ingest extension (#17392 ) * iceberg glue catalog dependencies added * GlueIcebergCatalog added in druid module * default version of iceberg glue catalog implementation - basics * basic tests added * removed dependecy iceberg-aws-bundle * glue catalog support - docs update for iceberg * Update IcebergDruidModule.java * Update IcebergDruidModule.java * updates in dependencies and warehousePath must be under catalogProp * removed some dependencies - which not required * only glue sdk added * update license * avro exclusion removed * doc update * doc update * set the type to glue * minor change * minor change * fixing codestyle * checkstyle fixes * checkstyle fixes * checkstyle fixes * dependency check fixes * update pom for ignore warning for glue catalog * compile scope needed - iceberg-aws and awssdk * updates pom with comment * minor change * mvn dependency check in iceberg extension * revert pom.xml changes * aws sdk sts and s3 for gluecatalog initialize * dependency check - ignore aws sdk s3 and sts --------- Co-authored-by: SHEKHAR PRASAD RAJAK <shekhar_rajak@apple.com>	2024-11-10 18:43:55 -08:00
George Shiqi Wu	5764183d4e	k8s-based-ingestion: Wait for task lifecycles to enter RUNNING state before returning from KubernetesTaskRunner.start (#17446 ) * Add a wait on start() for task lifecycle to go into running * handle exceptions * Fix logging messages * Don't pass in the settable future as a arg * add some unit tests	2024-11-08 11:13:35 -05:00
Virushade	ba76264244	Update build documentation (#17444 ) Add build instructions for developers Follow up from issue #17375, add instructions solely for distribution profile. Note that this build command is mostly used by me, everyone is welcome to add further optimizations for a faster distribution build. Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Update docs/development/build.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Update docs/development/build.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-11-04 18:31:46 -08:00
Ashwin Tumma	d5bb7de5cf	Fix Map Lookup Introspection Endpoints and update doc for Globally Cached Lookups (#17436 ) Map Lookup Introspection API endpoints /keys and /values no longer return an invalid JSON object. Also, update documentation to clarify the version returned by the /version introspection endpoint. --------- Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>	2024-10-30 08:23:22 -07:00
Ashwin Tumma	1be2b852e9	[Kafka Ingestion Tutorial] Update docs for Schema Config (#17409 ) Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>	2024-10-29 08:23:20 -07:00
Adarsh Sanjeev	b7c661b801	Make tempStorageDirectory configuration optional and rely on task dir instead (#17015 ) Currently, durable storage and export both require configuring a temporary directory to be used using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir. Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set. Please note that preference would be given to the user configured, druid..storage.tempDir, on the tasks. If that is not configured, we then use the configured temporary directory. Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.	2024-10-29 13:36:59 +05:30
Benjamin Hopp	b59317e42b	Fix typo in security.md (#17413 ) No longer using Azure Blog storage, moving to Blobs instead.	2024-10-25 13:43:58 -07:00
Kashif Faraz	9dfb378711	Remove unused coordinator dynamic configs mergeSegmentsLimit, mergeBytesLimit (#17384 ) * Remove unused coordinator dynamic configs * Update docs and web-console	2024-10-22 09:03:46 +05:30
317brian	d1b81f312a	docs: msq autocompaction (#16681 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Vishesh Garg <vishesh.garg@imply.io> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-10-17 10:40:53 -07:00
Shivam Garg	6898a5a359	Removed Microsecond from Extract function (#17247 )	2024-10-11 05:32:26 +02:00
anny-imply	dca69c5761	update line in architecture md (#17289 )	2024-10-08 11:51:47 -07:00
Charles Smith	5ed68622c3	[Docs] Update known issues for window functions (#17097 ) * draft update to known issues * Update known issues Remove addressed known issues. Clarify the issue with SELECT * queries.	2024-10-08 08:47:13 -07:00
Edgar Melendrez	a67a3c8e0a	[docs] update tutorial for Theta sketches (#16953 ) * from start to step 3 of Ingest data using Theta sketche * updated upto "Query the Theta sketch column" * fixed sentence * another typo * using sql ingestion instead of batch-sql * waiting for explanations on DS_THETA * Revert "using sql ingestion instead of batch-sql" This reverts commit `b95fcb9b32`. * Revert "using sql ingestion instead of batch-sql" This reverts commit `b95fcb9b32`. * just copy and pasting to where I was * updated tutorial * fixing images, and removing unused * slightly updating explanatio * Update docs/tutorials/tutorial-sketches-theta.md * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * addressing comments in review * made filter clause consitent with other instances * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-10-08 10:44:37 +08:00
317brian	9932f2e70a	docs: concurrent append and replace is gA (#17269 )	2024-10-08 07:55:55 +05:30
Clint Wylie	04fe56835d	add druid.expressions.allowVectorizeFallback and default to false (#17248 ) changes: adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types) add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future	2024-10-05 12:42:42 +05:30
Charles Smith	acd973273f	Docs: adds MSQ examples to front coded dict. migration (#17236 ) * add msq example * adjust json formatting	2024-10-03 16:33:34 -07:00
317brian	1fc82a96bd	docs: update future development blurbs (#16939 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-10-01 15:02:05 -07:00
Sree Charan Manamala	661614129e	Window Functions : Context Parameter to Enable Transfer of RACs over wire (#17150 )	2024-09-28 08:04:22 +02:00
Victoria Lim	203d6345af	docs: Separate section on ingesting MVDs in migration guide (#17109 )	2024-09-25 14:45:25 -07:00
Atul Mohan	c1f8ae25b5	Support Iceberg ingestion from REST based catalogs (#17124 ) Adds support to the iceberg input source to read from Iceberg REST Catalogs.	2024-09-23 22:13:24 -07:00
Adithya Chakilam	8eaac2c051	cgroup monitors: Add mem/disk/cpu usage metrics for V2 (#16905 ) * cgroup monitors: Add mem/disk/cpu usage metrics for V2 * intellij inspection * docs and checks * fix-dos * add comments * comments	2024-09-23 20:32:01 -07:00
Sree Charan Manamala	67d361c9bf	Window Functions : Remove enable windowing flag (#17087 )	2024-09-23 08:24:26 +02:00
Abhishek Radhakrishnan	635e418131	Support to parse numbers in text-based input formats (#17082 ) Text-based input formats like csv and tsv currently parse inputs only as strings, following the RFC4180Parser spec. To work around this, the web-console and other tools need to further inspect the sample data returned by the Druid sampler API to parse them as numbers. This patch introduces a new optional config, tryParseNumbers, for the csv and tsv input formats. If enabled, any numbers present in the input will be parsed in the following manner: long data type for integer types and double for floating-point numbers, and if parsing fails for whatever reason, the input is treated as a string. By default, this configuration is set to false, so numeric strings will be treated as strings.	2024-09-19 13:21:18 -07:00
Pranav	d1bd6a8156	Update doc for allowedHeaders (#17045 ) Update doc for allowedHeaders and make allowedHeaders more restrictive	2024-09-19 08:37:39 +05:30
Abhishek Radhakrishnan	39723e5401	Update note about `sys.tasks` table (#17096 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-09-18 11:02:45 -07:00
Edgar Melendrez	64a4d115c5	[Docs] adding admonition for div (#17093 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-09-17 13:54:49 -07:00
Katya Macedo	490211f2b1	Docs - update streaming ingestion terminology for Kafka and Kinesis (#17003 )	2024-09-17 09:49:24 -07:00
Lasse Mammen	307b8e3357	feat: json_merge expression and sql function (#17081 )	2024-09-17 18:27:34 +05:30
Victoria Lim	2e2f3cf66a	docs: Refresh docs for SQL input source (#17031 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-09-16 15:52:37 -07:00
Adithya Chakilam	6ef8d5d8e1	OshiSysMonitor: Add ability to skip emitting metrics (#16972 ) * OshiSysMonitor: Add ability to skip emitting metrics * comments * static checks * remove oshi	2024-09-12 11:32:31 -04:00
George Shiqi Wu	428f58cf15	Support maxColumnsToMerge in supervisor tuningConfig (#17030 ) * support maxColumnsToMerge in supervisor specs * remove log line * fix style * add docs * fix unit tests	2024-09-11 18:00:13 -04:00
aho135	2427972c10	Implement segment range threshold for automatic query prioritization (#17009 ) Implements threshold based automatic query prioritization using the time period of the actual segments scanned. This differs from the current implementation of durationThreshold which uses the duration in the user supplied query. There are some usability constraints with using durationThreshold from the user supplied query, especially when using SQL. For example, if a client does not explicitly specify both start and end timestamps then the duration is extremely large and will always exceed the configured durationThreshold. This is one example interval from a query that specifies no end timestamp: "interval":["2024-08-30T08:05:41.944Z/146140482-04-24T15:36:27.903Z"]. This interval is generated from a query like SELECT * FROM table WHERE __time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR. Using the time period of the actual segments scanned allows proper prioritization without explicitly having to specify start and end timestamps. This PR adds onto #9493	2024-09-10 15:01:52 +05:30
Abhishek Radhakrishnan	aa833a711c	Support for reading Delta Lake table snapshots (#17004 ) Problem Currently, the delta input source only supports reading from the latest snapshot of the given Delta Lake table. This is a known documented limitation. Description Add support for reading Delta snapshot. By default, the Druid-Delta connector reads the latest snapshot of the Delta table in order to preserve compatibility. Users can specify a snapshotVersion to ingest change data events from Delta tables into Druid. In the future, we can also add support for time-based snapshot reads. The Delta API to read time-based snapshots is not clear currently.	2024-09-09 14:12:48 +05:30
Edgar Melendrez	48a758ee08	[docs] reverting changes for sql-functions.md (#17019 )	2024-09-06 16:07:32 -07:00
Katya Macedo	94b0705109	Docs - Update the architecture diagram (#17007 )	2024-09-06 12:21:27 -07:00
Edgar Melendrez	2d9e92ce78	[docs] Batch11 date and time functions (#16926 ) * first draft of functions * minor improvments * Update docs/querying/sql-functions.md * Update docs/querying/sql-scalar.md * Apply suggestions from code review Accepted as is Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * applying next round of suggestions * fixing missing column name * addressing floor and ceil functions * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * re-wording TIMESTAMPADD --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-09-06 12:20:47 -07:00
Edgar Melendrez	ed811262e3	[docs] Batch13 IP functions (#16947 ) * new datasource * reviewing before pr * Update docs/querying/sql-functions.md * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Applying suggestions to IPV4_PARSE --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-09-06 12:19:36 -07:00
Virushade	476b205efa	Docs: Fix language in Schema Design docs (#17010 )	2024-09-06 08:48:00 +05:30
Edgar Melendrez	c49dc83b22	[docs] batch 12: reduction functions (#16930 ) * [docs] batch 12: reduction functions * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md * applying suggestions * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-09-05 17:02:45 -07:00
Jill Osborne	b4d83a86c2	Middle Manager wording update in docs (#17005 )	2024-09-05 10:25:30 -07:00
Hugh Evans	9162339fa8	Replace dsql instructions in example (#16977 )	2024-09-04 12:45:58 -07:00
Katya Macedo	03c37b3143	Fix spelling (#17001 )	2024-09-04 13:33:17 -04:00
Hardik Bajaj	2ef936be40	Update Documentation on mergeBuffer/pendingRequests for Real-time nodes (#16992 ) #15025 adds mergeBuffer/pendingRequests metric in QueryCountStatsMonitor. Since real-time nodes also use the same merge buffers for queries and have QueryCountStatsMonitor, the documentation is being updated to include this metric.	2024-09-04 00:25:09 +05:30
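The tryParseNumbers rule from #17082 (integers become long, floating-point values become double, anything that fails to parse stays a string, and the flag defaults to false) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Druid's parser: the class and method names are hypothetical, and it leans on `Long.parseLong` and `Double.parseDouble`, which also accept inputs such as `1e3` or `NaN` that Druid's real implementation may treat differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the csv/tsv tryParseNumbers rule described in #17082.
 * Not Druid's actual code; names are illustrative only.
 */
public class TryParseNumbersSketch {

    static Object parseValue(String raw, boolean tryParseNumbers) {
        if (!tryParseNumbers || raw == null) {
            return raw; // flag off (the default): numeric strings stay strings
        }
        try {
            return Long.parseLong(raw); // integer -> long
        } catch (NumberFormatException notALong) {
            // not an integer; fall through to the floating-point attempt
        }
        try {
            return Double.parseDouble(raw); // floating point -> double
        } catch (NumberFormatException notADouble) {
            return raw; // parsing failed: keep the original string
        }
    }

    public static void main(String[] args) {
        String[] row = {"42", "3.14", "abc", "1e3"};
        List<Object> parsed = new ArrayList<>();
        for (String cell : row) {
            parsed.add(parseValue(cell, true));
        }
        System.out.println(parsed); // prints [42, 3.14, abc, 1000.0]
    }
}
```

Trying `long` before `double` keeps whole numbers exact instead of widening every value to floating point, which matches the order the commit message describes.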

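Commit #17009 bases automatic query prioritization on the time period of the segments actually scanned rather than on the query's own interval, which for an open-ended filter such as `__time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR` is effectively unbounded. The Java sketch below illustrates that idea under assumptions: it measures the scanned period as the span from the earliest segment start to the latest segment end and lowers priority by a fixed amount when the span exceeds a threshold. The record, method names, and the span and adjustment logic are hypothetical and are not taken from Druid's implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical sketch of segment-range-based query prioritization (#17009). */
public class SegmentRangePrioritySketch {

    /** Assumed shape of a scanned segment's interval: [start, end). */
    record SegmentInterval(Instant start, Instant end) {}

    /** Span from the earliest segment start to the latest segment end. */
    static Duration scannedRange(List<SegmentInterval> segments) {
        Instant min = null;
        Instant max = null;
        for (SegmentInterval s : segments) {
            if (min == null || s.start().isBefore(min)) min = s.start();
            if (max == null || s.end().isAfter(max)) max = s.end();
        }
        return (min == null) ? Duration.ZERO : Duration.between(min, max);
    }

    /** Lower the priority only when the scanned range exceeds the threshold. */
    static int adjustPriority(int basePriority, List<SegmentInterval> segments,
                              Duration threshold, int adjustment) {
        return scannedRange(segments).compareTo(threshold) > 0
                ? basePriority - adjustment
                : basePriority;
    }

    public static void main(String[] args) {
        // Two hourly segments, 15 hours apart end to end.
        List<SegmentInterval> segments = List.of(
                new SegmentInterval(Instant.parse("2024-08-30T08:00:00Z"),
                                    Instant.parse("2024-08-30T09:00:00Z")),
                new SegmentInterval(Instant.parse("2024-08-30T22:00:00Z"),
                                    Instant.parse("2024-08-30T23:00:00Z")));
        System.out.println(scannedRange(segments)); // prints PT15H
        System.out.println(adjustPriority(0, segments, Duration.ofDays(1), 1)); // prints 0
    }
}
```

Because the check uses the segments' own intervals, a query with no explicit end timestamp is only deprioritized when the data it touches really spans more than the threshold.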

3284 Commits