druid

Commit Graph

Author	SHA1	Message	Date
Charles Smith	2a42b11660	remove legacy Jupyter tutorial files (#15834 ) * remove legacy files * redirection for the jupyter tutorial page * remove tutorial from sidebar * remove redirection	2024-02-12 13:45:47 -08:00
Abhishek Radhakrishnan	51fd79ee58	Clean up kafka emitter tests, add more validations and code coverage. (#15878 ) * Clean up kafka emitter tests a bit and add more validations. The test wasn't validating what events were sent, but simply the dropped counters, which aren't that useful. Additionally, this module has fewer tests, so folks often run into code coverage issue in this extension. Hopefully this change helps with that too. * Change things to feed-based rather than topic-based. * Another test for shared topic * Switch to DruidException, add test dependencies and sad path config tests. * missing test dependency * minor renames. * Add more tests - to test unknown events and drop when queue is full	2024-02-12 16:22:19 -05:00
Gian Merlino	7fea34abdd	LOOKUP docs: clarify behavior of replaceMissingValueWith. (#15879 ) Clarify behavior when expr is null.	2024-02-11 13:11:00 -08:00
zachjsh	f9ee2c353b	Extend the PARTITION BY clause to accept string literals for the time partitioning (#15836 ) This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner https://github.com/apache/druid/pull/13686 from @paul-rogers, extending the PARTITION BY clause to accept string literals for the time partitioning	2024-02-09 11:45:38 -05:00
Vishesh Garg	6e9eee4c5f	Add failure check (#15873 )	2024-02-09 08:27:10 -08:00
Lasse Mammen	4255711b3e	fix: handle BOOKMARK events in kubernetes pod discovery (#15819 )	2024-02-09 18:50:04 +05:30
Tom	11a8624ef1	allow for kafka-emitter to have extra dimensions be set for each event it emits (#15845 ) * allow for kafka-emitter to have extra dimensions be set for each event it emits * fix checkstyle issue in kafkaemitterconfig * make changes to fix docs, and cleanup copy paste error in #toString() * undo formatting to markdown table * add more branches so test passes * fix checkstyle issue	2024-02-08 22:55:24 -08:00
George Shiqi Wu	d703b2c709	Add azure kill test (#15833 ) * Add kill test * Extra line * Don't need toString * Add back test * Remove newline * move kill verification into main test	2024-02-08 16:15:30 -05:00
Sree Charan Manamala	57e12df352	Sql Single Value Aggregator for scalar queries (#15700 ) Executing single value correlated queries will throw an exception today since single_value function is not available in druid. With these added classes, this provides druid, the capability to plan and run such queries.	2024-02-08 19:20:30 +05:30
Soumyava	f3996b96ff	Fixes for safe_divide with vectorize and datatypes (#15839 ) * Fix for safe_divide with vectorize * More fixes * Update to use expr.eval(null) for both cases when denominator is 0	2024-02-08 14:40:42 +05:30
Abhishek Radhakrishnan	1a5b57df84	Update `groupId` for delta-lake and iceberg extensions (#15843 ) * Update the group id to org.apache.druid.extensions.contrib for contrib exts. * Note iceberg and delta lake extensions in extensions.md * properties and shell backticks * Update groupId in distribution/pom.xml * remove delta-lake from dist. * Add note on downloading extension.	2024-02-07 23:54:06 -08:00
Vadim Ogievetsky	26815d425b	Web console: add system fields UI (#15858 ) This PR adds console support for configuring system fields in the batch data loader.	2024-02-08 11:08:55 +05:30
Gian Merlino	21a97f4c61	Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. (#15860 ) * Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. The prior code from #15162 was reading only the low-order byte of an int representing the size of a coupon set. As a result, it would erroneously believe that a coupon set with a multiple of 256 elements was empty.	2024-02-08 08:14:28 +05:30
Adarsh Sanjeev	514b3b4d01	Add export capabilities to MSQ with SQL syntax (#15689 ) * Add test * Parser changes to support export statements * Fix builds * Address comments * Add frame processor * Address review comments * Fix builds * Update syntax * Webconsole workaround * Refactor * Refactor * Change export file path * Update docs * Remove webconsole changes * Fix spelling mistake * Parser changes, add tests * Parser changes, resolve build warnings * Fix failing test * Fix failing test * Fix IT tests * Add tests * Cleanup * Fix unparse * Fix forbidden API * Update docs * Update docs * Address review comments * Address review comments * Fix tests * Address review comments * Fix insert unparse * Add external write resource action * Fix tests * Add resource check to overlord resource * Fix tests * Add IT * Update syntax * Update tests * Update permission * Address review comments * Address review comments * Address review comments * Add tests * Add check for runtime parameter for bucket and path * Add check for runtime parameter for bucket and path * Add tests * Update docs * Fix NPE * Update docs, remove deadcode * Fix formatting	2024-02-07 22:08:50 +05:30
Vadim Ogievetsky	f2b242b6e6	update console to core Druid changes (#15854 )	2024-02-07 19:44:25 +05:30
Clint Wylie	23d4fade90	use NullFilter for SQL rewrite of MV_CONTAINS and MV_OVERLAP for null array elements (#15855 ) Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).	2024-02-07 19:40:41 +05:30
Bartosz Mikulski	45c26e8682	Fix Inspection Check in DirectDruidClientTest (#15857 )	2024-02-07 02:56:26 -08:00
Zoltan Haindrich	fdc7cec271	Support Window operators in decoupled planning (#15815 )	2024-02-07 04:09:48 -05:00
Bartosz Mikulski	43a1c96cd1	Fix query-cancellation-executor memory leak (#15754 ) This PR fixes #15069 by resolving a memory leak caused by a thread leak in query-cancellation-executor.	2024-02-07 10:54:38 +05:30
Pramod Immaneni	59bca0951a	Parallelize storage of incremental segments (#13982 ) During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period etc). In the case where there are a lot of dimensions and metrics (1000+) it was observed that creating and serializing the incremental segment file format and persisting the file took a while, and that this blocked ingestion of new data. This affected real-time ingestion. The serialization and persistence can be parallelized across the different time chunks, and this update aims to do that. The patch adds a simple configuration parameter to the ingestion tuning configuration to specify the number of persistence threads. The default value is 1 if it is not specified, which keeps the behavior the same as it is today.	2024-02-07 10:43:05 +05:30
Sam Wheating	4c58856f10	Fix incorrect ordering of args in log statement (#15846 )	2024-02-06 16:12:04 -08:00
Abhishek Radhakrishnan	1affa35b29	Bump up Delta Lake Kernel to 3.1.0 (#15842 ) This patch bumps Delta Lake Kernel dependency from 3.0.0 to 3.1.0, which released last week - please see https://github.com/delta-io/delta/releases/tag/v3.1.0 for release notes. There were a few "breaking" API changes in 3.1.0, you can find the rationale for some of those changes here. Next-up in this extension: add and expose filter predicates.	2024-02-06 21:25:17 +05:30
317brian	2dc71c7874	docs: fix rendering (#15835 )	2024-02-06 07:18:43 -08:00
Gian Merlino	54b30646f3	Add sqlReverseLookupThreshold for ReverseLookupRule. (#15832 ) If lots of keys map to the same value, reversing a LOOKUP call can slow things down unacceptably. To protect against this, this patch introduces a parameter sqlReverseLookupThreshold representing the maximum size of an IN filter that will be created as part of lookup reversal. If inSubQueryThreshold is set to a smaller value than sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead. This allows users to use that single parameter to control IN sizes if they wish.	2024-02-06 16:32:05 +05:30
Soumyava	b86f31f2c0	Addressing shapeshifting issues with window functions (#15807 ) Addressing shapeshifting issues with window functions	2024-02-06 11:12:20 +05:30
Zoltan Haindrich	392d585ff8	Identify not range filters without negating subexpressions (#15766 ) * Identify not range filters without negating subexpressions Earlier betweenish (range/bounds) filters were identified thru a process of negating the subexpressions which may have not performed that well. (it could have dominated the runtime in some cases) This patch makes that unnecessary as its able to create the negate expression directly. * add test;fix for multiple intervals	2024-02-05 19:12:58 -08:00
George Shiqi Wu	edb1ac1b71	Update azure console tile (#15820 ) * Save web console changes * Working new input type * fix tests	2024-02-05 13:11:39 -08:00
Clint Wylie	358892e5b0	add nested array index support, fix some bugs (#15752 ) This PR wires up ValueIndexes and ArrayElementIndexes for nested arrays, ValueIndexes for nested long and double columns, and fixes a handful of bugs I found after adding nested columns to the filter test gauntlet.	2024-02-05 15:12:09 +05:30
Laksh Singla	ee78a0367d	Fix serialization bug in PassthroughAggregatorFactory (#15830 ) PassthroughAggregatorFactory overrides a deprecated method in the AggregatorFactory, on which it relies on for serializing one of its fields complexTypeName. This was accidentally removed, leading to a bug in the factory, where the type name doesn't get serialized properly, and places null in the type name. This PR revives that method with a different name and adds tests for the same.	2024-02-05 15:11:10 +05:30
Rishabh Singh	de959e513d	Add QueryLifecycle#authorize for grpc-query-extension (#15816 ) Proposal #13469 Original PR #14024 A new method is being added in QueryLifecycle class to authorise a query based on authentication result. This method is required since we authenticate the query by intercepting it in the grpc extension and pass down the authentication result.	2024-02-02 21:49:57 +05:30
Zoltan Haindrich	8f5b7522c7	Strict window frame checks (#15746 ) introduce checks to ensure that window frame is supported added check to ensure that no expressions are set as bounds added logic to detect following/following like cases - described in Window function fails to demarcate if 2 following are used #15739 currently RANGE frames are only supported correctly if both endpoints are unbounded or current row Offset based window range support #15767 added windowingStrictValidation context key to provide a way to override the check	2024-02-02 16:21:53 +05:30
Atul Mohan	2e46a98024	Add range filtering support for iceberg ingestion (#15782 ) * Add range filtering support for iceberg ingestion * Docs formatting * Spelling	2024-02-01 23:32:30 -08:00
Aru Raghuwanshi	223f29d64c	Update input-sources.md for fixing the warehouse path example under S3 (#15823 )	2024-02-01 23:32:05 -08:00
317brian	6d617c34d2	docs: revise concurrent append and replace (#15760 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-02-01 11:03:36 -08:00
PANKAJ KUMAR	65857dc0e7	pac4j: fix incompatible dependencies + authorization regression (#15753 ) - After upgrading the pac4j version in: https://github.com/apache/druid/pull/15522. We were not able to access the druid ui. - Upgraded the Nimbus libraries version to a compatible version to pac4j. - In the older pac4j version, when we return RedirectAction there we also update the webcontext Response status code and add the authentication URL to the header. But in the newer pac4j version, we just simply return the RedirectAction. So that's why it was not getting redirected to the generated authentication URL. - To fix the above, I have updated the NOOP_HTTP_ACTION_ADAPTER to JEE_HTTP_ACTION_ADAPTER and it updates the HTTP Response in context as per the HTTP Action.	2024-02-01 09:35:23 -08:00
George Shiqi Wu	50bae96e8b	Add azure integration tests (#15799 )	2024-02-01 09:18:49 -05:00
Vishesh Garg	5de39c6251	Resolve CVE issues (#15814 ) * Resolve CVE issues * Update license.yaml	2024-02-01 14:10:12 +05:30
Laksh Singla	7d65caf0c5	Update the docs for EARLIEST_BY/LATEST_BY aggregators with the newly added numeric capabilities (#15670 )	2024-02-01 10:24:43 +05:30
Vadim Ogievetsky	fcd65c9801	Web console: use arrayIngestMode: array (#15588 ) * Adapt to new array mode * Feedback fixes * fixing type detection and highlighting * goodies * add docs * feedback fixes * finish array work * update snapshots * typo fix * color fixes * small fix * make MVDs default for now * better sqlStringifyArrays default * fix spec converter * fix tests	2024-01-31 20:19:29 -08:00
George Shiqi Wu	5edfa9429f	Batch kill in azure (#15770 ) * Multi kill * add some unit tests * Fix param * Fix deleteBatchFiles * Fix unit tests * Add tests * Save work on batch kill * add tests * Fix unit tests * Update extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentKiller.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Fix unit tests * Update extensions-core/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureStorageTest.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * fix test * fix test * Add test --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2024-01-31 13:41:15 -05:00
Vadim Ogievetsky	0089f6b905	Web console: Don't force waitUntilSegmentLoad to true (#15781 ) * Don't force setting waitUntilSegmentsLoad * delete irrelevant code	2024-01-31 16:16:36 +05:30
Vishesh Garg	37d1650ccf	Benchmark for query planning time for IN queries (#15688 ) Adds a set of benchmark queries for measuring the planning time with the IN operator. Current results indicate that with the recent optimizations, the IN planning time with 100K expressions in the IN clause is just 3s and with 1M is 46s. For IN clause paired with OR <col>=<val> expr, the numbers are 10s and 155s for 100K and 1M, resp.	2024-01-31 15:40:31 +05:30
Vishesh Garg	2a250a4e6e	Fix GHA logs dir and make tar and upload conditional on web console test failures (#15810 ) The PR makes 2 changes: (1) Correct the current logs directory tarred in GHA static checks to log. (2) Make the steps of logs tar-ing and uploading conditional on web console test failures; these steps currently run on any step failure in the static checks workflow. Sample logs before this change for failed static checks: https://github.com/apache/druid/actions/runs/7719743853/job/21043502498	2024-01-31 15:39:56 +05:30
Zoltan Haindrich	f701197224	Enable ArrayListRowsAndColumns to StorageAdapter conversion (#15735 )	2024-01-31 02:36:58 -05:00
Abhishek Radhakrishnan	9f95a691f7	Extension to read and ingest Delta Lake tables (#15755 ) * something * test commit * compilation fix * more compilation fixes (fixme placeholders) * Comment out druid-kereberos build since it conflicts with newly added transitive deps from delta-lake Will need to sort out the dependencies later. * checkpoint * remove snapshot schema since we can get schema from the row * iterator bug fix * json json json * sampler flow * empty impls for read(InputStats) and sample() * conversion? * conversion, without timestamp * Web console changes to show Delta Lake * Asset bug fix and tile load * Add missing pieces to input source info, etc. * fix stuff * Use a different delta lake asset * Delta lake extension dependencies * Cleanup * Add InputSource, module init and helper code to process delta files. * Test init * Checkpoint changes * Test resources and updates * some fixes * move to the correct package * More tests * Test cleanup * TODOs * Test updates * requirements and javadocs * Adjust dependencies * Update readme * Bump up version * fixup typo in deps * forbidden api and checkstyle checks * Trim down dependencies * new lines * Fixup Intellij inspections. * Add equals() and hashCode() * chain splits, intellij inspections * review comments and todo placeholder * fix up some docs * null table path and test dependencies. Fixup broken link. * run prettify * Different test; fixes * Upgrade pyspark and delta-spark to latest (3.5.0 and 3.0.0) and regenerate tests * yank the old test resource. * add a couple of sad path tests * Updates to readme based on latest. * Version support * Extract Delta DateTime converstions to DeltaTimeUtils class and add test * More comprehensive split tests. * Some test renames. * Cleanup and update instructions. * add pruneSchema() optimization for table scans. * Oops, missed the parquet files. * Update default table and rename schema constants. * Test setup and misc changes. 
* Add class loader logic as the context class loader is unaware about extension classes * change some table client creation logic. * Add hadoop-aws, hadoop-common and related exclusions. * Remove org.apache.hadoop:hadoop-common * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Add entry to .spelling to fix docs static check --------- Co-authored-by: abhishekagarwal87 <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-01-30 21:53:50 -08:00
Benjamin Hopp	6177f6efd7	Fixing formatting of Iceberg Catalog Object (#15748 )	2024-01-30 20:17:38 -08:00
AmatyaAvadhanula	d9e8448c50	Close open segments when a newer segment with higher version is allocated (#15727 )	2024-01-31 09:11:00 +05:30
George Shiqi Wu	dbcfb2bb8b	Allow null values for account when injecting (#15777 )	2024-01-30 16:55:45 -05:00
Abhishek Radhakrishnan	dbdfae3011	Fix up typo </br /> -> <br /> and adjust interpolated exception msg in InvalidNullByteFault. (#15804 )	2024-01-30 12:44:51 -08:00
317brian	62886e23ac	docs: add mermaid diagram support (#15771 )	2024-01-30 11:24:15 -08:00
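
Note on 21a97f4c61 (#15860): the commit message says the prior code read only the low-order byte of an int holding the coupon-set size, so a size that was a multiple of 256 looked empty. The sketch below reproduces that class of bug in isolation. It is not the Druid code: the function names are hypothetical, and the little-endian 4-byte layout is an assumption made only for illustration.

```python
import struct


def looks_empty_low_byte_only(coupon_count: int) -> bool:
    """Buggy check (hypothetical): inspects only the low-order byte of the int."""
    raw = struct.pack("<i", coupon_count)  # assumed little-endian 4-byte int
    return raw[0] == 0


def looks_empty_full_int(coupon_count: int) -> bool:
    """Corrected check (hypothetical): decodes all four bytes before comparing."""
    raw = struct.pack("<i", coupon_count)
    (value,) = struct.unpack("<i", raw)
    return value == 0


if __name__ == "__main__":
    for count in (0, 1, 255, 256, 512, 257):
        print(count, looks_empty_low_byte_only(count), looks_empty_full_int(count))
```

With the low-byte check, 256 and 512 are reported as empty even though they are not; the full-int check reports empty only for 0.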


13679 Commits
