druid

Commit Graph

Author	SHA1	Message	Date
Zoltan Haindrich	f8645de341	Remove incorrect utf8 conversion of ResultCache keys (#16569 )	2024-06-12 13:12:05 -07:00
Clint Wylie	fee509df2e	fix NestedDataColumnIndexerV4 to not report cardinality (#16507 ) * fix NestedDataColumnIndexerV4 to not report cardinality changes: * fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem * fix some issues with sql benchmarks	2024-06-11 20:58:12 -07:00
zachjsh	3f5f5921e0	Fix sql syntax error user (#16583 ) This fixes an issue where in some cases, a SQL syntax error encountered when parsing / planning a query results in an error returned to the user with persona `admin` when it should instead be `user`.	2024-06-11 18:08:35 -04:00
Andreas Maechler	fec48432d4	docs: Correct some outdated module names (#16584 ) * Fix module names * Better spacing * Some spacing * Suggestions from code review Thanks Abhishek. * More links * Roll-up time * Remove logs * More spelling	2024-06-11 14:17:40 -07:00
Andreas Maechler	24056b90b5	Bring back missing property in indexer documentation (#16582 ) * Bring back druid.peon.taskActionClient.retry.minWait * Update docs/configuration/index.md * Consistent italics Thanks Abhishek. * Update docs/configuration/index.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Consistent list style * Remove extra space --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-06-10 16:52:54 -07:00
Kashif Faraz	e4fdf1055b	Update default value of `druid.indexer.tasklock.batchAllocationWaitTime` to zero (#16578 ) Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0. Thus, a segment allocation request is processed immediately unless there are already some requests queued before this one. While in queue, a segment allocation request may get clubbed together with other similar requests into a batch to reduce load on the metadata store.	2024-06-10 20:07:23 +05:30
317brian	8e11adfc6f	docs: remove outdated druidversion var from a page (#16570 ) Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-10 15:30:36 +08:00
Clint Wylie	3fb6ba22e8	fix expression column capabilities to not report dictionary encoded unless input is string (#16577 )	2024-06-08 13:05:19 -07:00
Andreas Maechler	40ba429c5f	More validation for Azure account config (#16561 ) * Mark `account` as NotNull * Remove account test Handled by annotation now * Cleanup account config * Mark container as not-null.	2024-06-07 13:24:15 -07:00
Andreas Maechler	e6a82e8a11	Only create container in `AzureStorage` for write operations (#16558 ) * Remove unused constants * Refactor getBlockBlobLength * Better link * Upper-case log * Mark defaultStorageAccount nullable This is the case if you do not use Azure for deep-storage but ingest from Azure blobs. * Do not always create a new container if it doesn't exist Specifically, only create a container if uploading a blob or writing a blob stream * Add lots of comments, group methods * Revert "Mark defaultStorageAccount nullable" * Add mockito for junit * Add extra test * Add comment Thanks George. * Pass blockSize as Long * Test more branches...	2024-06-07 09:47:51 -07:00
Vadim Ogievetsky	efe9079f0a	Web console: fix pagination and filtering regression in supervisor view (#16571 ) * fix pagination and filtering in supervisor view * update snapshot	2024-06-07 21:09:51 +05:30
razinbouzar	844b2177de	Fix 2 coordinators elected as leader (#16528 ) Changes: - Recreate the leader latch when connection to zookeeper is lost - Do not become leader if leader latch is already closed	2024-06-07 15:07:30 +05:30
Akshat Jain	03a38be446	Optimize S3 storage writing for MSQ durable storage (#16481 ) * Optimise S3 storage writing for MSQ durable storage * Get rid of static ConcurrentHashMap * Fix static checks * Fix tests * Remove unused constructor parameter chunkValidation + relevant cleanup * Assert etags as String instead of Integer * Fix flaky test * Inject executor service * Make threadpool size dynamic based on number of cores * Fix S3StorageDruidModuleTest * Fix S3StorageConnectorProviderTest * Fix injection issues * Add S3UploadConfig to manage maximum number of concurrent chunks dynamically based on chunk size * Address the minor review comments * Refactor S3UploadConfig + ExecutorService into S3UploadManager * Address review comments * Make updateChunkSizeIfGreater() synchronized instead of recomputeMaxConcurrentNumChunks() * Address the minor review comments * Fix intellij-inspections check * Refactor code to use futures for maxNumConcurrentChunks. Also use executor service with blocking queue for backpressure semantics. * Update javadoc * Get rid of cyclic dependency injection between S3UploadManager and S3OutputConfig * Fix RetryableS3OutputStreamTest * Remove unnecessary synchronization parts from RetryableS3OutputStream * Update javadoc * Add S3UploadManagerTest * Revert back to S3StorageConnectorProvider extends S3OutputConfig * Address Karan's review comments * Address Kashif's review comments * Change a log message to debug * Address review comments * Fix intellij-inspections check * Fix checkstyle --------- Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-07 11:33:16 +05:30
Andreas Maechler	e9f723344b	Disable event hubs when kafka extensions isn't loaded (#16559 )	2024-06-06 16:59:26 -07:00
Rishabh Singh	423c91f9e4	Revert log line to debug (#16565 )	2024-06-06 14:00:31 +05:30
Kashif Faraz	e4f59e00b2	Fix backwards compatibility with centralized schema config in partial_index_merge tasks (#16556 ) * Handle null values of centralized schema config in PartialMergeTask * Fix checkstyle * Do not pass centralized schema config from supervisor task to sub-tasks * Do not pass ObjectMapper in constructor of task * Fix logs * Fix tests	2024-06-06 13:44:04 +05:30
Gian Merlino	277006446d	Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. (#16366 ) * Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr. In addition: - Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well. - In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes it easier to identify tests that can now vectorize. - Fix a null-matcher bug in StringObjectVectorValueMatcher. * Fix tests. * Fixes. * Fix tests. * Fix test. * Fix test.	2024-06-05 20:03:02 -07:00
Gian Merlino	2534a42539	Fix serde for ArrayOfDoublesSketchConstantPostAggregator. (#16550 ) * Fix serde for ArrayOfDoublesSketchConstantPostAggregator. The version originally added in #13819 was missing an annotation for the "value" property. Fixes #16539. Line endings for ArrayOfDoublesSketchConstantPostAggregator.java are changed from \r\n to \n. Adds a serde test, and improves various other datasketches post-aggregator serde tests to deserialize into PostAggregator. This verifies that the type information is set up correctly. * Fix excessive imports. * Fix equals, hashCode.	2024-06-05 20:01:51 -07:00
Gian Merlino	b837ce565b	Simplify serialized form of JsonInputFormat. (#15691 ) * Simplify serialized form of JsonInputFormat. Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and useJsonNodeReader. Because the default value of keepNullColumns is variable, we store the original configured value rather than the derived value, and include if the original value is nonnull. * Fix test.	2024-06-05 20:01:14 -07:00
Gian Merlino	717e634156	Router: Authorize permissionless internal requests. (#16419 ) * Router: Authorize permissionless internal requests. Router-internal requests like /proxy/enabled and errors for invalid requests should not require permissions, but they still need to be authorized in order to satisfy the PreResponseAuthorizationCheckFilter. This patch adds authorization checks that do not require any particular permissions. * Fix tests.	2024-06-05 15:31:02 -07:00
Gian Merlino	1040a29bc5	Fix capabilities reported by UnnestStorageAdapter. (#16551 ) UnnestStorageAdapter and its cursors did not return capabilities correctly for the output column. This patch fixes two problems: 1) UnnestStorageAdapter returned the capabilities of the unnest virtual column prior to unnesting. It should return the post-unnest capabilities. 2) UnnestColumnValueSelectorCursor passed through isDictionaryEncoded from the unnest virtual column. This is incorrect, because the dimension selector created by this class never has a dictionary. This is the cause of #16543.	2024-06-05 15:19:42 -07:00
Akshat Jain	6d7d2ffa63	Add interface method for returning canonical lookup name (#16557 ) * Add interface method for returning canonical lookup name * Address review comment * Add test in LookupReferencesManagerTest for coverage check * Add test in LookupSerdeModuleTest for coverage check	2024-06-05 14:33:18 -07:00
Katya Macedo	7aecc09230	Docs: Remove circular link (#16553 )	2024-06-05 11:07:36 -07:00
Bünyamin	30c59042e0	Add new metrics from v30 to prometheus-emitter (#16345 ) Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-05 10:51:48 +05:30
Charles Smith	c100ae0ecc	Add a tutorial for LATEST_BY to get most recent data (#16515 ) Co-authored-by: Will Xu <2bethere@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-04 17:00:25 -07:00
Jill Osborne	8b5802d4cd	docs: add maxSubqueryBytes limit to migration guide landing page (#16547 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-04 12:52:06 -07:00
Abhishek Radhakrishnan	b9ba286423	Fix task bootstrapping & simplify segment load/drop flows (#16475 ) * Fix task bootstrap locations. * Remove dependency of SegmentCacheManager from SegmentLoadDropHandler. - The load drop handler code talks to the local cache manager via SegmentManager. * Clean up unused imports and stuff. * Test fixes. * Intellij inspections and test bind. * Clean up dependencies some more * Extract test load spec and factory to its own class. * Cleanup test util * Pull SegmentForTesting out to TestSegmentUtils. * Fix up. * Minor changes to infoDir * Replace server announcer mock and verify that. * Add tests. * Update javadocs. * Address review comments. * Separate methods for download and bootstrap load * Clean up return types and exception handling. * No callback for loadSegment(). * Minor cleanup * Pull out the test helpers into its own static class so it can have better state control. * LocalCacheManager stuff * Fix build. * Fix build. * Address some CI warnings. * Minor updates to javadocs and test code. * Address some CodeQL test warnings and checkstyle fix. * Pass a Consumer<DataSegment> instead of boolean & rename variables. * Small updates * Remove one test constructor. * Remove the other constructor that wasn't initializing fully and update usages. * Cleanup withInfoDir() builder and unnecessary test hooks. * Remove mocks and elaborate on comments. * Commentary * Fix a few Intellij inspection warnings. * Suppress corePoolSize intellij-inspect warning. The intellij-inspect tool doesn't seem to correctly inspect lambda usages. See ScheduledExecutors. * Update docs and add more tests. * Use hamcrest for asserting order on expectation. * Shutdown bootstrap exec. * Fix checkstyle	2024-06-04 10:44:46 -07:00
Vadim Ogievetsky	0b4ac78a7b	Web console: fix delta sorting in the explore view table (#16542 ) * more robust query naming * make order by delta work * fix tests * fix type imports * tidy up	2024-06-04 10:15:35 -07:00
Amit	540d3e6af5	Added new use cases and description of the use case - 5/14/24 (#16451 ) Thanks for your contribution @amit-git-account * Added new use cases and description of the use case - 5/14/24 The use case listing is not changed in a long time. While speaking with users, I came across several other use cases not listed here in the index. So I added new use cases and also added description against the use cases. * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update spelling file * Update docs/design/index.md --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-06-04 09:47:49 -07:00
Andreas Maechler	b0f2a07c40	Add README with link to docs (#16540 )	2024-06-04 07:41:01 -07:00
Andreas Maechler	02caa50fd0	Remove unused interface from Azure extension (#16541 )	2024-06-04 08:21:26 +05:30
Andreas Maechler	6c7443c93a	Update Azure extension tests to JUnit 5 (#16521 ) Changes: - Loosely followed the steps in the migration guide at https://junit.org/junit5/docs/current/user-guide/#migrating-from-junit4 - Updated POM to add JUnit 5 dependencies - Updated imports to JUnit 5 packages - Updated annotations (Lifecycle annotations like `@BeforeEach`) - Updated exception testing (`assertThrows`) - Updated temporary path handling (use `@TempDir` annotation) - Various other updates (replace other `Rule` usages, make sure to use JUnit 5 assertions)	2024-06-04 08:19:48 +05:30
Charles Smith	8f78c901e7	docs: add lookups to the sidebar (#16530 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-03 16:04:15 -07:00
Kashif Faraz	1974a38bc9	Clean up allocation and supervisor logs for easier debugging (#16535 ) Changes: - Use string taskGroup consistently to easily search for a task group - Clean up other logs - No change in any logic	2024-06-03 16:41:04 +05:30
Karan Kumar	d0916865d0	Fix race in AzureClient factory fetch (#16525 ) * Fix race in AzureClient factory fetch * Fixing forbidden check. * Renaming variable.	2024-06-01 22:50:44 +05:30
Charles Smith	b1568fb95b	docs: Adds a redirect for flatten-json which was removed (#16263 )	2024-05-31 16:16:12 -07:00
Katya Macedo	f70ef1f434	Update front coding text (#16491 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-05-31 15:13:10 -07:00
Katya Macedo	92e660dd21	Add Druid 30.0.0 upgrade notes (#16522 )	2024-05-31 13:23:22 -07:00
Atul Mohan	b53d75758f	IcebergInputSource : Add option to toggle case sensitivity while reading columns from iceberg catalog (#16496 ) * Toggle case sensitivity while reading columns from iceberg * Fix tests * Drop case check and set unconditionally	2024-05-31 10:18:52 -07:00
George Shiqi Wu	0936798122	Add limit to task payload size (#16512 ) * Add limit to task payload size * Change to a warning * Remove test * Fix unit tests * Optionally throw alert * PR comments * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * PR comments * Reject large payloads * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-05-31 09:17:36 -07:00
Kashif Faraz	b5b900b6a0	Do minor cleanup of AutoCompactionSnapshot.Builder (#16523 ) Changes: - Use `final` modifier for immutable - Use builder methods for chaining - Shorter lambda syntax	2024-05-31 16:06:53 +05:30
Jill Osborne	3c72ec8413	docs: Migration guide for subquery limit (#16519 ) Adds a migration guide for Druid 30 to help users understand the new byte-based subquery limit property maxSubqueryBytes	2024-05-31 09:26:07 +05:30
Charles Smith	92e565e3b8	Adds a migration guide overview page to the release-info section (#16506 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Katya Macedo <katya.macedo@imply.io>	2024-05-30 09:50:30 -07:00
Adithya Chakilam	a9044ac235	Add cgroup cpu/mem/disk usage metrics (#16472 ) * Add cgroup cpu/mem usage metrics * checks * comments * docs fix * add disk metrics * fapi check * checkstyle * issues * spelling * change asserts * checks * use proc builder instead of runtime * specify charset * spotbug	2024-05-29 12:44:37 -07:00
Abhishek Radhakrishnan	75937c98e8	Upgrade delta kernel from 3.1.0 to 3.2.0 (#16513 ) Upstream release: https://github.com/delta-io/delta/releases/tag/v3.2.0 - Upgrade kernel dependency to 3.2.0 - Notable breaking changes introduced in upstream that affects the Druid extension: - Rename TableClient -> Engine - Rename DefaultTableClient -> DefaultEngine - Exceptions moved to a separate package - Table.getPath() doesn't throw TableNotFoundException. Instead the exception is thrown when getting snapshot info from the Table object	2024-05-29 10:46:30 -07:00
George Shiqi Wu	b3b62ac431	Update azure input source docs (#16508 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-05-29 10:00:46 -07:00
Sree Charan Manamala	6bbf9613f8	Throw soft exception in case of empty signature while building Scan Query (#16502 )	2024-05-29 09:41:54 +02:00
Sree Charan Manamala	27cfe12f4a	Enable reordering of window operators (#16482 ) This commit aims to enable the re-ordering of window operators in order to optimise the sort and partition operators. Example : ``` SELECT m1, m2, SUM(m1) OVER(PARTITION BY m2) as sum1, SUM(m2) OVER() as sum2 from numFoo GROUP BY m1,m2 ``` In order to compute this query, we can order the operators as to first compute the operators corresponding to sum2 and then place the operators corresponding to sum1 which would help us in reducing one sort operator if we order our operators by sum1 and then sum2.	2024-05-29 12:17:12 +05:30
George Shiqi Wu	f7013e012c	Add new test for handoff API (#16492 ) * Add new test for handoff API * Add new method * fix test * Update test	2024-05-28 12:57:51 -07:00
Adarsh Sanjeev	21f725f33e	Add octet streaming of sketchs in MSQ (#16269 ) There are a few issues with using Jackson serialization in sending datasketches between controller and worker in MSQ. This caused a blowup due to holding multiple copies of the sketch being stored. This PR aims to resolve this by switching to deserializing the sketch payload without Jackson. The PR adds a new query parameter used during communication between controller and worker while fetching sketches, "sketchEncoding". If the value of this parameter is OCTET, the sketch is returned as a binary encoding, done by ClusterByStatisticsSnapshotSerde. If the value is not the above, the sketch is encoded by Jackson as before.	2024-05-28 18:12:38 +05:30


14293 Commits, All Branches
