druid

Commit Graph

Author	SHA1	Message	Date
Bartosz Mikulski	45c26e8682	Fix Inspection Check in DirectDruidClientTest (#15857 )	2024-02-07 02:56:26 -08:00
Zoltan Haindrich	fdc7cec271	Support Window operators in decoupled planning (#15815 )	2024-02-07 04:09:48 -05:00
Bartosz Mikulski	43a1c96cd1	Fix query-cancellation-executor memory leak (#15754 ) This PR fixes #15069 by resolving a memory leak caused by a thread leak in query-cancellation-executor.	2024-02-07 10:54:38 +05:30
Pramod Immaneni	59bca0951a	Parallelize storage of incremental segments (#13982 ) During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period, etc.). When there are a lot of dimensions and metrics (1000+), it was observed that creating and serializing the incremental segment file format and persisting the file took a while and blocked ingestion of new data. This affected real-time ingestion. This serialization and persistence can be parallelized across the different time chunks. This update aims to do that. The patch adds a simple configuration parameter to the ingestion tuning configuration to specify the number of persistence threads. The default value is 1 if it is not specified, which keeps the behavior the same as it is today.	2024-02-07 10:43:05 +05:30
Sam Wheating	4c58856f10	Fix incorrect ordering of args in log statement (#15846 )	2024-02-06 16:12:04 -08:00
Abhishek Radhakrishnan	1affa35b29	Bump up Delta Lake Kernel to 3.1.0 (#15842 ) This patch bumps Delta Lake Kernel dependency from 3.0.0 to 3.1.0, which released last week - please see https://github.com/delta-io/delta/releases/tag/v3.1.0 for release notes. There were a few "breaking" API changes in 3.1.0, you can find the rationale for some of those changes here. Next-up in this extension: add and expose filter predicates.	2024-02-06 21:25:17 +05:30
317brian	2dc71c7874	docs: fix rendering (#15835 )	2024-02-06 07:18:43 -08:00
Gian Merlino	54b30646f3	Add sqlReverseLookupThreshold for ReverseLookupRule. (#15832 ) If lots of keys map to the same value, reversing a LOOKUP call can slow things down unacceptably. To protect against this, this patch introduces a parameter sqlReverseLookupThreshold representing the maximum size of an IN filter that will be created as part of lookup reversal. If inSubQueryThreshold is set to a smaller value than sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead. This allows users to use that single parameter to control IN sizes if they wish.	2024-02-06 16:32:05 +05:30
Soumyava	b86f31f2c0	Addressing shapeshifting issues with window functions (#15807 )	2024-02-06 11:12:20 +05:30
Zoltan Haindrich	392d585ff8	Identify not range filters without negating subexpressions (#15766 ) * Identify not range filters without negating subexpressions Earlier, betweenish (range/bounds) filters were identified through a process of negating the subexpressions, which may not have performed that well (it could have dominated the runtime in some cases). This patch makes that unnecessary, as it is able to create the negated expression directly. * add test; fix for multiple intervals	2024-02-05 19:12:58 -08:00
George Shiqi Wu	edb1ac1b71	Update azure console tile (#15820 ) * Save web console changes * Working new input type * fix tests	2024-02-05 13:11:39 -08:00
Clint Wylie	358892e5b0	add nested array index support, fix some bugs (#15752 ) This PR wires up ValueIndexes and ArrayElementIndexes for nested arrays, ValueIndexes for nested long and double columns, and fixes a handful of bugs I found after adding nested columns to the filter test gauntlet.	2024-02-05 15:12:09 +05:30
Laksh Singla	ee78a0367d	Fix serialization bug in PassthroughAggregatorFactory (#15830 ) PassthroughAggregatorFactory overrides a deprecated method in the AggregatorFactory, on which it relies for serializing one of its fields, complexTypeName. This was accidentally removed, leading to a bug in the factory where the type name doesn't get serialized properly and null is placed in the type name. This PR revives that method with a different name and adds tests for the same.	2024-02-05 15:11:10 +05:30
Rishabh Singh	de959e513d	Add QueryLifecycle#authorize for grpc-query-extension (#15816 ) Proposal #13469 Original PR #14024 A new method is being added in QueryLifecycle class to authorise a query based on authentication result. This method is required since we authenticate the query by intercepting it in the grpc extension and pass down the authentication result.	2024-02-02 21:49:57 +05:30
Zoltan Haindrich	8f5b7522c7	Strict window frame checks (#15746 ) * introduce checks to ensure that window frame is supported * added check to ensure that no expressions are set as bounds * added logic to detect following/following like cases - described in "Window function fails to demarcate if 2 following are used" #15739 * currently RANGE frames are only supported correctly if both endpoints are unbounded or current row; see "Offset based window range support" #15767 * added windowingStrictValidation context key to provide a way to override the check	2024-02-02 16:21:53 +05:30
Atul Mohan	2e46a98024	Add range filtering support for iceberg ingestion (#15782 ) * Add range filtering support for iceberg ingestion * Docs formatting * Spelling	2024-02-01 23:32:30 -08:00
Aru Raghuwanshi	223f29d64c	Update input-sources.md for fixing the warehouse path example under S3 (#15823 )	2024-02-01 23:32:05 -08:00
317brian	6d617c34d2	docs: revise concurrent append and replace (#15760 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-02-01 11:03:36 -08:00
PANKAJ KUMAR	65857dc0e7	pac4j: fix incompatible dependencies + authorization regression (#15753 ) - After upgrading the pac4j version in https://github.com/apache/druid/pull/15522, we were not able to access the Druid UI. - Upgraded the Nimbus libraries to a version compatible with pac4j. - In the older pac4j version, when we return RedirectAction we also update the webcontext Response status code and add the authentication URL to the header. But in the newer pac4j version, we simply return the RedirectAction, so the request was not getting redirected to the generated authentication URL. - To fix the above, I have updated the NOOP_HTTP_ACTION_ADAPTER to JEE_HTTP_ACTION_ADAPTER, which updates the HTTP Response in context as per the HTTP Action.	2024-02-01 09:35:23 -08:00
George Shiqi Wu	50bae96e8b	Add azure integration tests (#15799 )	2024-02-01 09:18:49 -05:00
Vishesh Garg	5de39c6251	Resolve CVE issues (#15814 ) * Resolve CVE issues * Update license.yaml	2024-02-01 14:10:12 +05:30
Laksh Singla	7d65caf0c5	Update the docs for EARLIEST_BY/LATEST_BY aggregators with the newly added numeric capabilities (#15670 )	2024-02-01 10:24:43 +05:30
Vadim Ogievetsky	fcd65c9801	Web console: use arrayIngestMode: array (#15588 ) * Adapt to new array mode * Feedback fixes * fixing type detection and highlighting * goodies * add docs * feedback fixes * finish array work * update snapshots * typo fix * color fixes * small fix * make MVDs default for now * better sqlStringifyArrays default * fix spec converter * fix tests	2024-01-31 20:19:29 -08:00
George Shiqi Wu	5edfa9429f	Batch kill in azure (#15770 ) * Multi kill * add some unit tests * Fix param * Fix deleteBatchFiles * Fix unit tests * Add tests * Save work on batch kill * add tests * Fix unit tests * Update extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentKiller.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Fix unit tests * Update extensions-core/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureStorageTest.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * fix test * fix test * Add test --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2024-01-31 13:41:15 -05:00
Vadim Ogievetsky	0089f6b905	Web console: Don't force waitUntilSegmentLoad to true (#15781 ) * Don't force setting waitUntilSegmentsLoad * delete irrelevant code	2024-01-31 16:16:36 +05:30
Vishesh Garg	37d1650ccf	Benchmark for query planning time for IN queries (#15688 ) Adds a set of benchmark queries for measuring the planning time with the IN operator. Current results indicate that with the recent optimizations, the IN planning time with 100K expressions in the IN clause is just 3s and with 1M is 46s. For IN clause paired with OR <col>=<val> expr, the numbers are 10s and 155s for 100K and 1M, resp.	2024-01-31 15:40:31 +05:30
Vishesh Garg	2a250a4e6e	Fix GHA logs dir and make tar and upload conditional on web console test failures (#15810 ) The PR makes 2 changes: (1) Correct the current logs directory tarred in GHA static checks to log. (2) Make the steps of logs tar-ing and uploading conditional on web console test failures; currently they happen on any step failure in the static checks workflow. Sample logs before this change for failed static checks: https://github.com/apache/druid/actions/runs/7719743853/job/21043502498	2024-01-31 15:39:56 +05:30
Zoltan Haindrich	f701197224	Enable ArrayListRowsAndColumns to StorageAdapter conversion (#15735 )	2024-01-31 02:36:58 -05:00
Abhishek Radhakrishnan	9f95a691f7	Extension to read and ingest Delta Lake tables (#15755 ) * something * test commit * compilation fix * more compilation fixes (fixme placeholders) * Comment out druid-kereberos build since it conflicts with newly added transitive deps from delta-lake Will need to sort out the dependencies later. * checkpoint * remove snapshot schema since we can get schema from the row * iterator bug fix * json json json * sampler flow * empty impls for read(InputStats) and sample() * conversion? * conversion, without timestamp * Web console changes to show Delta Lake * Asset bug fix and tile load * Add missing pieces to input source info, etc. * fix stuff * Use a different delta lake asset * Delta lake extension dependencies * Cleanup * Add InputSource, module init and helper code to process delta files. * Test init * Checkpoint changes * Test resources and updates * some fixes * move to the correct package * More tests * Test cleanup * TODOs * Test updates * requirements and javadocs * Adjust dependencies * Update readme * Bump up version * fixup typo in deps * forbidden api and checkstyle checks * Trim down dependencies * new lines * Fixup Intellij inspections. * Add equals() and hashCode() * chain splits, intellij inspections * review comments and todo placeholder * fix up some docs * null table path and test dependencies. Fixup broken link. * run prettify * Different test; fixes * Upgrade pyspark and delta-spark to latest (3.5.0 and 3.0.0) and regenerate tests * yank the old test resource. * add a couple of sad path tests * Updates to readme based on latest. * Version support * Extract Delta DateTime converstions to DeltaTimeUtils class and add test * More comprehensive split tests. * Some test renames. * Cleanup and update instructions. * add pruneSchema() optimization for table scans. * Oops, missed the parquet files. * Update default table and rename schema constants. * Test setup and misc changes. * Add class loader logic as the context class loader is unaware about extension classes * change some table client creation logic. * Add hadoop-aws, hadoop-common and related exclusions. * Remove org.apache.hadoop:hadoop-common * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Add entry to .spelling to fix docs static check --------- Co-authored-by: abhishekagarwal87 <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-01-30 21:53:50 -08:00
Benjamin Hopp	6177f6efd7	Fixing formatting of Iceberg Catalog Object (#15748 )	2024-01-30 20:17:38 -08:00
AmatyaAvadhanula	d9e8448c50	Close open segments when a newer segment with higher version is allocated (#15727 )	2024-01-31 09:11:00 +05:30
George Shiqi Wu	dbcfb2bb8b	Allow null values for account when injecting (#15777 )	2024-01-30 16:55:45 -05:00
Abhishek Radhakrishnan	dbdfae3011	Fix up typo </br /> -> <br /> and adjust interpolated exception msg in InvalidNullByteFault. (#15804 )	2024-01-30 12:44:51 -08:00
317brian	62886e23ac	docs: add mermaid diagram support (#15771 )	2024-01-30 11:24:15 -08:00
AmatyaAvadhanula	ef46d88200	Release unneeded append locks after acquiring a new superseding append lock (#15682 ) * Fix segment transactional append when publishing with multiple overlapping locks	2024-01-30 16:51:56 +05:30
Vadim Ogievetsky	497e2123f0	Web console: Make table driven query modification actions work with slices. (#15779 ) * Make table driven query modification actions work with slices. * cleanup found query prefix * fix regex complexity	2024-01-29 20:09:46 -08:00
Gian Merlino	38a1e827ab	Fix up value types when creating range filters. (#15778 ) Fixes a bug introduced in #15609, where queries involving filters on TIME_FLOOR could encounter ClassCastException when comparing RangeValue in CombineAndSimplifyBounds. Prior to #15609, CombineAndSimplifyBounds would remove, rebuild, and re-add all numeric range filters as part of consolidating numeric range filters for the same column under the least restrictive type. #15609 included a change to only rebuild numeric range filters when a consolidation opportunity actually arises. The bug was introduced because the unconditional rebuild, as a side effect, masked the fact that in some cases range filters would be created with string match values and a LONG match value type. This patch changes the fixup to happen at the time the range filter is initially created, rather than in CombineAndSimplifyBounds.	2024-01-29 13:30:47 -08:00
AmatyaAvadhanula	54d0e482dc	Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction (#15699 )	2024-01-29 19:18:43 +05:30
Clint Wylie	01fa5c7ea6	add null value index wiring for nested column to speed up is null/is not null (#15687 ) Nested columns maintain a null value bitmap for which rows are nulls, however I forgot to wire up a ColumnIndexSupplier to nested columns when filtering the 'raw' data itself, so these were not able to be used. This PR fixes that by adding a supplier that can return NullValueIndex to be used by the NullFilter, which should speed up is null and is not null filters on json columns. I haven't spent the time to measure the difference yet, but I imagine it should be a significant speed increase. Note that I only wired this up if druid.generic.useDefaultValueForNull=false (sql compatible mode), the reason being that the SQL planner still uses selector filter, which is unable to properly handle any arrays or complex types (including json, even checking for nulls). The reason for this is so that the behavior is consistent between using the index and using the value matcher, otherwise we get into a situation where using the index has correct behavior but using the value matcher does not, which I was trying to avoid.	2024-01-29 12:34:50 +05:30
Abhishek Agarwal	989a8f7874	Better error message for date_trunc operators (#15759 ) IAEs are not bubbled up and instead show up to the user as runtime failures, which is not helpful. See https://apachedruidworkspace.slack.com/archives/C0303FDCZEZ/p1706185796975109 for one such example. This change will fix that.	2024-01-27 11:22:39 +05:30
Abhishek Radhakrishnan	f58fd5b75f	Remove TestObjectMapper in favor of DefaultObjectMapper. (#15769 ) Remove dilemma on what object mapper class to use in tests since the DefaultObjectMapper class provides all the same settings and goodies.	2024-01-26 16:35:12 -08:00
317brian	ba07965580	docs: clean up some rolling updates stuff (#15762 )	2024-01-26 14:10:53 -08:00
zachjsh	ae6afc0751	Extend unused segment metadata api response to include created date and last used updated time (#15738 ) ### Description The unusedSegment api response was extended to include the original DataSegment object with the creation time and last used update time added to it. A new object `DataSegmentPlus` was created for this purpose, and the metadata queries used were updated as needed. example response: ``` [ { "dataSegment": { "dataSource": "inline_data", "interval": "2023-01-02T00:00:00.000Z/2023-01-03T00:00:00.000Z", "version": "2024-01-25T16:06:42.835Z", "loadSpec": { "type": "local", "path": "/Users/zachsherman/projects/opensrc-druid/distribution/target/apache-druid-30.0.0-SNAPSHOT/var/druid/segments/inline_data/2023-01-02T00:00:00.000Z_2023-01-03T00:00:00.000Z/2024-01-25T16:06:42.835Z/0/index/" }, "dimensions": "str_dim,double_measure1,double_measure2", "metrics": "", "shardSpec": { "type": "numbered", "partitionNum": 0, "partitions": 1 }, "binaryVersion": 9, "size": 1402, "identifier": "inline_data_2023-01-02T00:00:00.000Z_2023-01-03T00:00:00.000Z_2024-01-25T16:06:42.835Z" }, "createdDate": "2024-01-25T16:06:44.196Z", "usedStatusLastUpdatedDate": "2024-01-25T16:07:34.909Z" } ] ```	2024-01-26 15:47:40 -05:00
Abhishek Radhakrishnan	a7918be268	Temporarily bump up the delay in auth IT from 5s to 10s. (#15765 ) A more ideal/permanent fix would be to have status checks exposed by the services, but that'll require more code changes. So temporarily bump it to unblock CI now.	2024-01-26 11:52:27 -05:00
Gian Merlino	00cb0a2900	Fix extractionFns on number-wrapping dimension selectors. (#15761 ) When an ExtractionFn was used on top of a numeric column, it wasn't applied to nulls (nulls were returned as-is). This patch fixes it.	2024-01-26 19:56:13 +05:30
Vadim Ogievetsky	45ad47cc66	allow segment table to sort on start and end when grouped (#15720 )	2024-01-26 18:59:23 +08:00
Sensor	4e50a14d50	fix router page value inconsistent issue (#15742 ) * fix router page value inconsistent issue * make the fix more universal as suggested * minor	2024-01-26 11:02:30 +08:00
George Shiqi Wu	3e512249e3	Azure multi read options (#15630 ) * Include new dependencies * Mostly implemented * More azure fixes * Tests passing * Unit tests running * Test running after removing storage exception * Happy with coverage now * Add more tests * fix client factory * cleanup from testing * Remove old client * update docs * Exclude from spellcheck * Add licenses * Fix identity version * Save work * Add azure clients * add licenses * typos * Add dependencies * Exception is not thrown * Fix intellij check * Don't need to override * specify length * urldecode * encode path * Fix checks * Revert urlencode changes * Urlencode with azure library * Update docs/development/extensions-core/azure.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * PR changes * Update docs/development/extensions-core/azure.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Add config for multiple storage accounts * Deprecate AzureTaskLogsConfig.maxRetries * Clean up azure retry block * logic update to reuse clients * fix comments * Create container conditionally * Fix key auth * save work * Fix unit tests * Revert old azure input type * Separate input source * save work * Add support for app registrations * Fix unit tests * clean up spacing * Add coverage * fixes from testing * cleanup some caching behavior * Add docs * Fix spelling issues * fix more spelling errors' * Fix intellij inspections * add simple changes from pr * save work on fixing bug * Fix unit tests * Add more testing * Fix unit test * Add tests * Add annotation for azureStorage * Fix up docs * Add comment for list method * Fix tests * Remove uneeded toString * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * PR changes * fix injection of StorageConnector * Fix checkstyle * clean up unit tests * More pr fixes --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-25 13:29:16 -05:00
Katya Macedo	867c636629	Document pivot and unpivot operators (#15669 )	2024-01-25 09:53:39 -08:00
Parth Agrawal	ed6df26a91	update salt size (#15758 ) As part of becoming FIPS compliant, we are seeing this error: "salt must be at least 128 bits" when we run the Druid code against FIPS-compliant cryptographic security providers. This PR fixes the salt size used in Pac4jSessionStore.java	2024-01-25 17:05:53 +05:30
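
Commit 54b30646f3 above says that sqlReverseLookupThreshold caps the size of the IN filter created during lookup reversal, and that a smaller inSubQueryThreshold takes precedence. One way to read that rule is as taking the smaller of the two limits. The following is a minimal Python sketch of that reading, not Druid's implementation: the function names, the inclusive comparison, and the example values are assumptions made for illustration; only the two parameter names come from the commit message.

```python
def effective_reverse_lookup_limit(sql_reverse_lookup_threshold: int,
                                   in_sub_query_threshold: int) -> int:
    """Largest IN filter that lookup reversal may create (hypothetical helper).

    Per the commit message, a smaller inSubQueryThreshold is used instead of
    sqlReverseLookupThreshold, so the effective limit is the minimum of both.
    """
    return min(sql_reverse_lookup_threshold, in_sub_query_threshold)


def may_reverse_lookup(matching_key_count: int,
                       sql_reverse_lookup_threshold: int,
                       in_sub_query_threshold: int) -> bool:
    """Return True if reversing the LOOKUP call stays within the limit.

    Assumption: the limit is inclusive ("maximum size of an IN filter").
    """
    limit = effective_reverse_lookup_limit(sql_reverse_lookup_threshold,
                                           in_sub_query_threshold)
    return matching_key_count <= limit


if __name__ == "__main__":
    # Example values are made up for illustration only.
    print(effective_reverse_lookup_limit(10000, 20))
    print(may_reverse_lookup(21, 10000, 20))
```

Under this reading, a user who wants a single knob for IN sizes can lower inSubQueryThreshold alone, which matches the intent stated in the commit message.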


13813 Commits
