Commit Graph

14293 Commits

Author SHA1 Message Date
Zoltan Haindrich f8645de341
Remove incorrect utf8 conversion of ResultCache keys (#16569) 2024-06-12 13:12:05 -07:00
Clint Wylie fee509df2e
fix NestedDataColumnIndexerV4 to not report cardinality (#16507)
* fix NestedDataColumnIndexerV4 to not report cardinality
changes:
* fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of value. This should be less common than the auto indexer problem
* fix some issues with sql benchmarks
2024-06-11 20:58:12 -07:00
zachjsh 3f5f5921e0
Fix sql syntax error user (#16583)
This fixes an issue where, in some cases, a SQL syntax error encountered while parsing or planning a query results in an error returned to the user with persona `admin` when it should instead be `user`.
2024-06-11 18:08:35 -04:00
Andreas Maechler fec48432d4
docs: Correct some outdated module names (#16584)
* Fix module names

* Better spacing

* Some spacing

* Suggestions from code review

Thanks Abhishek.

* More links

* Roll-up time

* Remove logs

* More spelling
2024-06-11 14:17:40 -07:00
Andreas Maechler 24056b90b5
Bring back missing property in indexer documentation (#16582)
* Bring back druid.peon.taskActionClient.retry.minWait

* Update docs/configuration/index.md

* Consistent italics

Thanks Abhishek.

* Update docs/configuration/index.md

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Consistent list style

* Remove extra space

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-06-10 16:52:54 -07:00
Kashif Faraz e4fdf1055b
Update default value of `druid.indexer.tasklock.batchAllocationWaitTime` to zero (#16578)
Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0.
Thus, a segment allocation request is processed immediately unless other requests are already queued ahead of it. While queued, a segment allocation request may be grouped with other similar requests into a batch to reduce load on the metadata store.
2024-06-10 20:07:23 +05:30
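The queueing behavior described in #16578 can be pictured with a small sketch. This is a simplified model and not Druid's implementation: `AllocationRequest`, `batch_key`, and `drain_batches` are hypothetical names, and treating requests as "similar" when datasource and interval match is an assumption made only for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationRequest:
    # Hypothetical request shape, for illustration only.
    datasource: str
    interval: str
    task_id: str


def batch_key(request):
    # Assumption: requests are "similar" when datasource and interval match.
    return (request.datasource, request.interval)


def drain_batches(queued_requests):
    """Group queued requests by key, preserving first-arrival order of keys."""
    batches = OrderedDict()
    for request in queued_requests:
        batches.setdefault(batch_key(request), []).append(request)
    return list(batches.values())


queue = [
    AllocationRequest("wiki", "2024-06-10/2024-06-11", "task-a"),
    AllocationRequest("wiki", "2024-06-10/2024-06-11", "task-b"),
    AllocationRequest("logs", "2024-06-10/2024-06-11", "task-c"),
]
batches = drain_batches(queue)
print([len(batch) for batch in batches])  # [2, 1]
```

In this model, a request that arrives at an empty queue forms a batch of one, which matches the "processed immediately" case; grouping only has an effect when requests are already waiting.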
317brian 8e11adfc6f
docs: remove outdated druidversion var from a page (#16570)
Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-10 15:30:36 +08:00
Clint Wylie 3fb6ba22e8
fix expression column capabilities to not report dictionary encoded unless input is string (#16577) 2024-06-08 13:05:19 -07:00
Andreas Maechler 40ba429c5f
More validation for Azure account config (#16561)
* Mark `account` as NotNull

* Remove account test

Handled by annotation now

* Cleanup account config

* Mark container as not-null.
2024-06-07 13:24:15 -07:00
Andreas Maechler e6a82e8a11
Only create container in `AzureStorage` for write operations (#16558)
* Remove unused constants

* Refactor getBlockBlobLength

* Better link

* Upper-case log

* Mark defaultStorageAccount nullable

This is the case if you do not use Azure for deep-storage but ingest from Azure blobs.

* Do not always create a new container if it doesn't exist

Specifically, only create a container if uploading a blob or writing a blob stream

* Add lots of comments, group methods

* Revert "Mark defaultStorageAccount nullable"

* Add mockito for junit

* Add extra test

* Add comment

Thanks George.

* Pass blockSize as Long

* Test more branches...
2024-06-07 09:47:51 -07:00
Vadim Ogievetsky efe9079f0a
Web console: fix pagination and filtering regression in supervisor view (#16571)
* fix pagination and filtering in supervisor view

* update snapshot
2024-06-07 21:09:51 +05:30
razinbouzar 844b2177de
Fix 2 coordinators elected as leader (#16528)
Changes:
- Recreate the leader latch when connection to zookeeper is lost
- Do not become leader if leader latch is already closed
2024-06-07 15:07:30 +05:30
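The two changes listed for #16528 can be modeled as a small state machine. This is a sketch under stated assumptions, not the coordinator's code: `LeaderLatch` here is a stand-in that tracks only open or closed state, `CoordinatorCandidate` and its method names are hypothetical, and relinquishing leadership on connection loss is an assumption of the model.

```python
class LeaderLatch:
    """Stand-in for a ZooKeeper leader latch; tracks only open/closed state."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class CoordinatorCandidate:
    def __init__(self):
        self.latch = LeaderLatch()
        self.is_leader = False

    def on_connection_lost(self):
        # Change 1: close the old latch and create a fresh one.
        # Assumption: leadership is also relinquished at this point.
        self.latch.close()
        self.is_leader = False
        self.latch = LeaderLatch()

    def on_elected(self, latch):
        # Change 2: ignore election callbacks that arrive for a closed latch.
        if latch.closed:
            return False
        self.is_leader = True
        return True


candidate = CoordinatorCandidate()
stale_latch = candidate.latch
candidate.on_connection_lost()
print(candidate.on_elected(stale_latch), candidate.on_elected(candidate.latch))
```

The point of the guard in `on_elected` is that a late callback for the old latch can no longer make a node believe it is leader while another node has been elected through a newer latch.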
Akshat Jain 03a38be446
Optimize S3 storage writing for MSQ durable storage (#16481)
* Optimise S3 storage writing for MSQ durable storage

* Get rid of static ConcurrentHashMap

* Fix static checks

* Fix tests

* Remove unused constructor parameter chunkValidation + relevant cleanup

* Assert etags as String instead of Integer

* Fix flaky test

* Inject executor service

* Make threadpool size dynamic based on number of cores

* Fix S3StorageDruidModuleTest

* Fix S3StorageConnectorProviderTest

* Fix injection issues

* Add S3UploadConfig to manage maximum number of concurrent chunks dynamically based on chunk size

* Address the minor review comments

* Refactor S3UploadConfig + ExecutorService into S3UploadManager

* Address review comments

* Make updateChunkSizeIfGreater() synchronized instead of recomputeMaxConcurrentNumChunks()

* Address the minor review comments

* Fix intellij-inspections check

* Refactor code to use futures for maxNumConcurrentChunks. Also use executor service with blocking queue for backpressure semantics.

* Update javadoc

* Get rid of cyclic dependency injection between S3UploadManager and S3OutputConfig

* Fix RetryableS3OutputStreamTest

* Remove unnecessary synchronization parts from RetryableS3OutputStream

* Update javadoc

* Add S3UploadManagerTest

* Revert back to S3StorageConnectorProvider extends S3OutputConfig

* Address Karan's review comments

* Address Kashif's review comments

* Change a log message to debug

* Address review comments

* Fix intellij-inspections check

* Fix checkstyle

---------

Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-07 11:33:16 +05:30
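The backpressure idea mentioned in #16481 (a bounded number of concurrent chunks, derived from the chunk size) can be sketched as follows. This is not the S3UploadManager implementation: the class and parameter names are hypothetical, the memory-budget formula is an assumption, and a semaphore is used here in place of the executor with a blocking queue that the commit describes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class UploadManager:
    """Toy upload manager: bounded in-flight chunks plus a shared thread pool."""

    def __init__(self, memory_budget_bytes, chunk_size_bytes, workers=4):
        # Assumption: the chunk limit is derived from a fixed memory budget.
        self.max_concurrent_chunks = max(1, memory_budget_bytes // chunk_size_bytes)
        self._slots = threading.BoundedSemaphore(self.max_concurrent_chunks)
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit_chunk(self, upload_fn, chunk):
        # Blocks the producer while max_concurrent_chunks are already in flight.
        self._slots.acquire()

        def run():
            try:
                return upload_fn(chunk)
            finally:
                # Free the slot whether the upload succeeded or failed.
                self._slots.release()

        try:
            return self._pool.submit(run)
        except BaseException:
            self._slots.release()
            raise

    def close(self):
        self._pool.shutdown(wait=True)


manager = UploadManager(memory_budget_bytes=100, chunk_size_bytes=25)
# len() stands in for a real upload call and returns the chunk size.
futures = [manager.submit_chunk(len, b"x" * n) for n in range(1, 7)]
sizes = [future.result() for future in futures]
manager.close()
print(manager.max_concurrent_chunks, sizes)
```

Larger chunks lower `max_concurrent_chunks`, so a writer producing chunks faster than they upload is paused in `submit_chunk` instead of accumulating chunks without bound.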
Andreas Maechler e9f723344b
Disable event hubs when the Kafka extension isn't loaded (#16559) 2024-06-06 16:59:26 -07:00
Rishabh Singh 423c91f9e4
Revert log line to debug (#16565) 2024-06-06 14:00:31 +05:30
Kashif Faraz e4f59e00b2
Fix backwards compatibility with centralized schema config in partial_index_merge tasks (#16556)
* Handle null values of centralized schema config in PartialMergeTask

* Fix checkstyle

* Do not pass centralized schema config from supervisor task to sub-tasks

* Do not pass ObjectMapper in constructor of task

* Fix logs

* Fix tests
2024-06-06 13:44:04 +05:30
Gian Merlino 277006446d
Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. (#16366)
* Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr.

This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable
operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr.

In addition:

- Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition
  to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well.

- In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes
  it easier to identify tests that can now vectorize.

- Fix a null-matcher bug in StringObjectVectorValueMatcher.

* Fix tests.

* Fixes.

* Fix tests.

* Fix test.

* Fix test.
2024-06-05 20:03:02 -07:00
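The adapter idea behind FallbackVectorProcessor in #16366, turning a non-vectorizable operation into a vectorizable one, can be illustrated with a minimal sketch. The names `fallback_vectorize` and `concat_upper` are hypothetical, and plain Python lists stand in for Druid's vector types; the real processor is not shown in the commit message and may work differently.

```python
def fallback_vectorize(scalar_fn):
    """Wrap a row-at-a-time function so it accepts equal-length input vectors."""

    def vector_fn(*input_vectors):
        if not input_vectors:
            raise ValueError("at least one input vector is required")
        size = len(input_vectors[0])
        if any(len(vector) != size for vector in input_vectors):
            raise ValueError("input vectors must have the same length")
        # Evaluate the scalar function once per row position.
        return [scalar_fn(*row) for row in zip(*input_vectors)]

    return vector_fn


def concat_upper(left, right):
    # Stand-in for a function with no native vectorized implementation.
    return (left + right).upper()


vectorized = fallback_vectorize(concat_upper)
print(vectorized(["a", "b"], ["x", "y"]))  # ['AX', 'BY']
```

A fallback like this does not make the wrapped function faster; it only lets the function take part in a vectorized pipeline instead of forcing the whole query onto the non-vectorized path.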
Gian Merlino 2534a42539
Fix serde for ArrayOfDoublesSketchConstantPostAggregator. (#16550)
* Fix serde for ArrayOfDoublesSketchConstantPostAggregator.

The version originally added in #13819 was missing an annotation for
the "value" property. Fixes #16539.

Line endings for ArrayOfDoublesSketchConstantPostAggregator.java are changed
from \r\n to \n.

Adds a serde test, and improves various other datasketches post-aggregator
serde tests to deserialize into PostAggregator. This verifies that the type
information is set up correctly.

* Fix excessive imports.

* Fix equals, hashCode.
2024-06-05 20:01:51 -07:00
Gian Merlino b837ce565b
Simplify serialized form of JsonInputFormat. (#15691)
* Simplify serialized form of JsonInputFormat.

Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and
useJsonNodeReader. Because the default value of keepNullColumns is
variable, we store the original configured value rather than the
derived value, and include if the original value is nonnull.

* Fix test.
2024-06-05 20:01:14 -07:00
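The approach described in #15691, storing the originally configured value and serializing it only when it is non-null, can be sketched like this. It is an illustration under assumptions, not the JsonInputFormat code: `InputFormatConfig` and `context_default` are hypothetical, with `context_default` standing in for whatever makes the default of keepNullColumns variable.

```python
import json


class InputFormatConfig:
    def __init__(self, keep_null_columns=None, context_default=False):
        # Keep the value exactly as configured; None means "not set".
        self.configured_keep_null_columns = keep_null_columns
        # Hypothetical stand-in for whatever makes the default variable.
        self._context_default = context_default

    @property
    def keep_null_columns(self):
        # Derived value used at runtime.
        if self.configured_keep_null_columns is None:
            return self._context_default
        return self.configured_keep_null_columns

    def to_json(self):
        fields = {"type": "json"}
        # Serialize only what the user configured, never the derived value.
        if self.configured_keep_null_columns is not None:
            fields["keepNullColumns"] = self.configured_keep_null_columns
        return json.dumps(fields, sort_keys=True)


print(InputFormatConfig(context_default=True).to_json())
print(InputFormatConfig(keep_null_columns=False, context_default=True).to_json())
```

Serializing the derived value instead would pin a default that was only valid in the original context, which is why the sketch keeps the configured and derived values separate.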
Gian Merlino 717e634156
Router: Authorize permissionless internal requests. (#16419)
* Router: Authorize permissionless internal requests.

Router-internal requests like /proxy/enabled and errors for invalid
requests should not require permissions, but they still need to be
authorized in order to satisfy the PreResponseAuthorizationCheckFilter.
This patch adds authorization checks that do not require any particular
permissions.

* Fix tests.
2024-06-05 15:31:02 -07:00
Gian Merlino 1040a29bc5
Fix capabilities reported by UnnestStorageAdapter. (#16551)
UnnestStorageAdapter and its cursors did not return capabilities correctly
for the output column. This patch fixes two problems:

1) UnnestStorageAdapter returned the capabilities of the unnest virtual
   column prior to unnesting. It should return the post-unnest capabilities.

2) UnnestColumnValueSelectorCursor passed through isDictionaryEncoded from
   the unnest virtual column. This is incorrect, because the dimension selector
   created by this class never has a dictionary. This is the cause of #16543.
2024-06-05 15:19:42 -07:00
Akshat Jain 6d7d2ffa63
Add interface method for returning canonical lookup name (#16557)
* Add interface method for returning canonical lookup name

* Address review comment

* Add test in LookupReferencesManagerTest for coverage check

* Add test in LookupSerdeModuleTest for coverage check
2024-06-05 14:33:18 -07:00
Katya Macedo 7aecc09230
Docs: Remove circular link (#16553) 2024-06-05 11:07:36 -07:00
Bünyamin 30c59042e0
Add new metrics from v30 to prometheus-emitter (#16345)
Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-05 10:51:48 +05:30
Charles Smith c100ae0ecc
Add a tutorial for LATEST_BY to get most recent data (#16515)
Co-authored-by: Will Xu <2bethere@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-06-04 17:00:25 -07:00
Jill Osborne 8b5802d4cd
docs: add maxSubqueryBytes limit to migration guide landing page (#16547)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-06-04 12:52:06 -07:00
Abhishek Radhakrishnan b9ba286423
Fix task bootstrapping & simplify segment load/drop flows (#16475)
* Fix task bootstrap locations.

* Remove dependency of SegmentCacheManager from SegmentLoadDropHandler.

- The load drop handler code talks to the local cache manager via
SegmentManager.

* Clean up unused imports and stuff.

* Test fixes.

* Intellij inspections and test bind.

* Clean up dependencies some more

* Extract test load spec and factory to its own class.

* Cleanup test util

* Pull SegmentForTesting out to TestSegmentUtils.

* Fix up.

* Minor changes to infoDir

* Replace server announcer mock and verify that.

* Add tests.

* Update javadocs.

* Address review comments.

* Separate methods for download and bootstrap load

* Clean up return types and exception handling.

* No callback for loadSegment().

* Minor cleanup

* Pull out the test helpers into its own static class so it can have better state control.

* LocalCacheManager stuff

* Fix build.

* Fix build.

* Address some CI warnings.

* Minor updates to javadocs and test code.

* Address some CodeQL test warnings and checkstyle fix.

* Pass a Consumer<DataSegment> instead of boolean & rename variables.

* Small updates

* Remove one test constructor.

* Remove the other constructor that wasn't initializing fully and update usages.

* Cleanup withInfoDir() builder and unnecessary test hooks.

* Remove mocks and elaborate on comments.

* Commentary

* Fix a few Intellij inspection warnings.

* Suppress corePoolSize intellij-inspect warning.

The intellij-inspect tool doesn't seem to correctly inspect
lambda usages. See ScheduledExecutors.

* Update docs and add more tests.

* Use hamcrest for asserting order on expectation.

* Shutdown bootstrap exec.

* Fix checkstyle
2024-06-04 10:44:46 -07:00
Vadim Ogievetsky 0b4ac78a7b
Web console: fix delta sorting in the explore view table (#16542)
* more robust query naming

* make order by delta work

* fix tests

* fix type imports

* tidy up
2024-06-04 10:15:35 -07:00
Amit 540d3e6af5
Added new use cases and description of the use case - 5/14/24 (#16451)
Thanks for your contribution @amit-git-account

* Added new use cases and description of the use case - 5/14/24

The use case listing has not changed in a long time. While speaking with users, I came across several other use cases not listed here in the index. So I added new use cases and also added a description for each use case.

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* update spelling file

* Update docs/design/index.md

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-06-04 09:47:49 -07:00
Andreas Maechler b0f2a07c40
Add README with link to docs (#16540) 2024-06-04 07:41:01 -07:00
Andreas Maechler 02caa50fd0
Remove unused interface from Azure extension (#16541) 2024-06-04 08:21:26 +05:30
Andreas Maechler 6c7443c93a
Update Azure extension tests to JUnit 5 (#16521)
Changes:
- Loosely followed the steps in the migration guide at
https://junit.org/junit5/docs/current/user-guide/#migrating-from-junit4
- Updated POM to add JUnit 5 dependencies
- Updated imports to JUnit 5 packages
- Updated annotations (Lifecycle annotations like `@BeforeEach`)
- Updated exception testing (`assertThrows`)
- Updated temporary path handling (use `@TempDir` annotation)
- Various other updates (replace other `Rule` usages, make sure to use JUnit 5 assertions)
2024-06-04 08:19:48 +05:30
Charles Smith 8f78c901e7
docs: add lookups to the sidebar (#16530)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-06-03 16:04:15 -07:00
Kashif Faraz 1974a38bc9
Clean up allocation and supervisor logs for easier debugging (#16535)
Changes:
- Use string taskGroup consistently to easily search for a task group
- Clean up other logs
- No change in any logic
2024-06-03 16:41:04 +05:30
Karan Kumar d0916865d0
Fix race in AzureClient factory fetch (#16525)
* Fix race in AzureClient factory fetch

* Fixing forbidden check.

* Renaming variable.
2024-06-01 22:50:44 +05:30
Charles Smith b1568fb95b
docs: Adds a redirect for flatten-json which was removed (#16263) 2024-05-31 16:16:12 -07:00
Katya Macedo f70ef1f434
Update front coding text (#16491)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-05-31 15:13:10 -07:00
Katya Macedo 92e660dd21
Add Druid 30.0.0 upgrade notes (#16522) 2024-05-31 13:23:22 -07:00
Atul Mohan b53d75758f
IcebergInputSource : Add option to toggle case sensitivity while reading columns from iceberg catalog (#16496)
* Toggle case sensitivity while reading columns from iceberg

* Fix tests

* Drop case check and set unconditionally
2024-05-31 10:18:52 -07:00
George Shiqi Wu 0936798122
Add limit to task payload size (#16512)
* Add limit to task payload size

* Change to a warning

* Remove test

* Fix unit tests

* Optionally throw alert

* PR comments

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* PR comments

* Reject large payloads

* Update docs/configuration/index.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-05-31 09:17:36 -07:00
Kashif Faraz b5b900b6a0
Do minor cleanup of AutoCompactionSnapshot.Builder (#16523)
Changes:
- Use `final` modifier for immutable
- Use builder methods for chaining
- Shorter lambda syntax
2024-05-31 16:06:53 +05:30
Jill Osborne 3c72ec8413
docs: Migration guide for subquery limit (#16519)
Adds a migration guide for Druid 30 to help users understand the new byte-based subquery limit property maxSubqueryBytes
2024-05-31 09:26:07 +05:30
Charles Smith 92e565e3b8
Adds a migration guide overview page to the release-info section (#16506)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Katya Macedo <katya.macedo@imply.io>
2024-05-30 09:50:30 -07:00
Adithya Chakilam a9044ac235
Add cgroup cpu/mem/disk usage metrics (#16472)
* Add cgroup cpu/mem usage metrics

* checks

* comments

* docs fix

* add disk metrics

* fapi check

* checkstyle

* issues

* spelling

* change asserts

* checks

* use proc builder instead of runtime

* specify charset

* spotbug
2024-05-29 12:44:37 -07:00
Abhishek Radhakrishnan 75937c98e8
Upgrade delta kernel from 3.1.0 to 3.2.0 (#16513)
Upstream release: https://github.com/delta-io/delta/releases/tag/v3.2.0

- Upgrade kernel dependency to 3.2.0
- Notable breaking changes introduced upstream that affect the Druid extension:
 - Rename TableClient -> Engine
 - Rename DefaultTableClient -> DefaultEngine
 - Exceptions moved to a separate package
 - Table.getPath() doesn't throw TableNotFoundException. Instead the exception is thrown
   when getting snapshot info from the Table object
2024-05-29 10:46:30 -07:00
George Shiqi Wu b3b62ac431
Update azure input source docs (#16508)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-05-29 10:00:46 -07:00
Sree Charan Manamala 6bbf9613f8
Throw soft exception in case of empty signature while building Scan Query (#16502) 2024-05-29 09:41:54 +02:00
Sree Charan Manamala 27cfe12f4a
Enable reordering of window operators (#16482)
This commit enables the reordering of window operators in order to optimize
the sort and partition operators.
Example:
```
SELECT m1, m2,
SUM(m1) OVER(PARTITION BY m2) as sum1,
SUM(m2) OVER() as sum2
from numFoo
GROUP BY m1,m2
```

To compute this query, we can first place the operators corresponding to sum2
and then the operators corresponding to sum1, which saves one sort operator
compared with ordering the operators by sum1 and then sum2.
2024-05-29 12:17:12 +05:30
George Shiqi Wu f7013e012c
Add new test for handoff API (#16492)
* Add new test for handoff API

* Add new method

* fix test

* Update test
2024-05-28 12:57:51 -07:00
Adarsh Sanjeev 21f725f33e
Add octet streaming of sketches in MSQ (#16269)
There are a few issues with using Jackson serialization to send datasketches between controller and worker in MSQ. It caused a blowup because multiple copies of the sketch were held at the same time.

This PR aims to resolve this by switching to deserializing the sketch payload without Jackson.

The PR adds a new query parameter used during communication between controller and worker while fetching sketches, "sketchEncoding".

- If the value of this parameter is OCTET, the sketch is returned in a binary encoding produced by ClusterByStatisticsSnapshotSerde.
- Otherwise, the sketch is encoded by Jackson as before.
2024-05-28 18:12:38 +05:30
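The encoding switch described in #16269 can be sketched as a pair of functions keyed on the "sketchEncoding" parameter. This is an assumption-laden illustration: the payload (a flat list of doubles), the binary layout, and the function names are hypothetical and unrelated to the actual format written by ClusterByStatisticsSnapshotSerde; JSON stands in for the Jackson path.

```python
import json
import struct


def encode_sketch(values, sketch_encoding=None):
    if sketch_encoding == "OCTET":
        # Hypothetical layout: big-endian count, then that many doubles.
        return struct.pack(f">I{len(values)}d", len(values), *values)
    # Any other value falls back to the JSON path.
    return json.dumps({"values": values}).encode("utf-8")


def decode_sketch(payload, sketch_encoding=None):
    if sketch_encoding == "OCTET":
        (count,) = struct.unpack_from(">I", payload, 0)
        return list(struct.unpack_from(f">{count}d", payload, 4))
    return json.loads(payload.decode("utf-8"))["values"]


sketch = [1.5, 2.5, 4.0]
binary = encode_sketch(sketch, "OCTET")
text = encode_sketch(sketch)
print(len(binary), decode_sketch(binary, "OCTET") == decode_sketch(text))
```

Keeping the non-OCTET branch as the default mirrors the commit's description that any other parameter value keeps the earlier behavior.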