Commit Graph

14116 Commits

Author SHA1 Message Date
Rishabh Singh 4eced9b3c9
Fix CentralizedDatasourceSchema group IT failure (#16636)
* Fix build

* Update datasource name in ITSystemTableBatchIndexTaskTest
2024-06-21 15:40:12 -07:00
Suneet Saldanha 4e0ea7823b
Update docs for K8s TaskRunner Dynamic Config (#16600)
* Update docs for K8s TaskRunner Dynamic Config

* touchups

* code review

* npe

* oopsies
2024-06-21 06:01:59 -07:00
Akshat Jain cd438b1918
Emit metrics for S3UploadThreadPool (#16616)
* Emit metrics for S3UploadThreadPool

* Address review comments

* Revert unnecessary formatting change

* Revert unnecessary formatting change in metrics.md file

* Address review comments

* Add metric for task duration

* Minor fix in metrics.md

* Add s3Key and uploadId in the log message

* Address review comments

* Create new instance of ServiceMetricEvent.Builder for thread safety

* Address review comments

* Address review comments
2024-06-21 11:36:47 +05:30
Adithya Chakilam 35709de549
CgroupCpuSetMonitor: Initialize the cgroup discoverer (#16621) 2024-06-20 10:23:59 -07:00
Andreas Maechler ae70e18bc8
docs: Update Azure extension (#16585)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-06-20 09:31:29 -07:00
Abhishek Radhakrishnan b20c3dbadf
Fix malformed period throwing `ADMIN` persona error (#16626)
* Turn invalid periods into user-facing exception providing more context.

The current exception is targeting the ADMIN persona. Catch that and turn
it into a USER persona instead. Also, provide more context in the error
message.

* Review comment: pass the wrapping expression and stringify.

* Update processing/src/main/java/org/apache/druid/query/expression/ExprUtils.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2024-06-20 08:40:28 -07:00
Sree Charan Manamala 7ac0862287
Grouping Engine fix when a limit spec with different order by columns is applied (#16534) 2024-06-20 11:35:58 +02:00
Rishabh Singh 169a8dbd1a
Disable TestValidateIncompatibleCentralizedDatasourceSchemaConfig (#16627)
* Fix build

* Ignore test
2024-06-18 17:50:46 -07:00
Maytas Monsereenusorn 44268e7fad
Pass requestBufferSize from Config to Proxy servlet (#16611) 2024-06-19 02:42:16 +07:00
AmatyaAvadhanula be3593f099
Optimize unused segment query for segment allocation (#16623) 2024-06-18 20:45:04 +05:30
Sam Rash a10310388f
Add Conditional Helpers to DruidException / InvalidInput (#16470)
Adds versions of 

DruidException.defensive(String, Object...)
InvalidInput.exception(String, Object...)
InvalidInput.exception(Throwable, String, Object...)

the versions add a boolean as the first arg and only create and throw
an exception if it's false. It can be used similar to
Preconditions.checkState/checkArgument
2024-06-18 14:05:43 +05:30
AmatyaAvadhanula 4c8932e00e
Fix attempts to publish the same pending segments multiple times (#16605)
* Fix attempts to publish the same pending segments multiple times
2024-06-18 12:02:13 +05:30
Abhishek Radhakrishnan 51b2f6cb45
Fix retry logic in `BrokerClient` and flakey `BrokerClientTest` (#16618)
* Fix flakey BrokerClientTest.

The testError() method reliably fails in the IDE. This is because the
the test runner has

<surefire.rerunFailingTestsCount>3</surefire.rerunFailingTestsCount> is set to 3, so maven
retries this "flaky test" multiple times and the test code returns a successful response
in the third attempt.

The exception handling in BrokerClientTest was broken:
- All non-2xx errors were being turned as 5xx errors. Remove that block of
code. If we need to handle retries of more specific 5xx error codes, that should be
hanlded explicitly. Or if there's a source of 4xx class error that needs to be 5xx,
fix that in the source of error.

* Fix CodeQL warning for unused parameter.
2024-06-17 12:48:15 -07:00
Maytas Monsereenusorn d6c7d868cd
Fix peon startup with non string property value (#16612) 2024-06-16 07:48:44 +05:30
Jill Osborne aec1d5ddd6
Link fix (#16596)
* Link fix

* Update docs/operations/auth.md

Co-authored-by: Andreas Maechler <amaechler@gmail.com>

---------

Co-authored-by: Andreas Maechler <amaechler@gmail.com>
2024-06-14 11:40:53 -07:00
Virushade eb842d3dda
Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed (#16595)
* Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed

* Rollback mistake
2024-06-14 15:21:07 +05:30
Laksh Singla da1e293a57
Deserialize dimensions in group by queries to their respective types when reading from their serialized format (#16511)
* init

* tests, pair groupable

* framework change

* tests

* update benchmarks

* comments

* add javadoc for the jsonMapper

* remove extra deserialization

* add special serde for map based result rows

* revert unnecessary change

---------

Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-14 16:27:47 +08:00
317brian e1926e2549
docs: fix redirect (#16548)
* doc: cleanup unnecessary redirect

(cherry picked from commit d86aaadbc78cc51345f768ee66c9a8b2cbf13f27)

* restore redirect file entry. delete md file
2024-06-14 09:54:16 +08:00
Alberic Liu ea2de517b2
Update the youtube link for druid presentations page (#16601)
* Update the link to lambda architectures with Druid

* update the youtube link
2024-06-14 09:47:46 +08:00
Victoria Lim 836cdb48a5
docs: Migration guide for MVDs to arrays (#16516)
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-06-13 13:05:58 -07:00
Zoltan Haindrich ac19b148c2
Upgrade calcite to 1.37.0 (#16504)
* contains Make a full copy of the parser and apply our modifications to it #16503
* some minor api changes pair/entry
* some unnecessary aggregation was removed from a set of queries in `CalciteSubqueryTest`
* `AliasedOperatorConversion` was detecting `CHAR_LENGTH` as not a function ; I've removed the check
  * the field it was using doesn't look maintained that much
  * the `kind` is passed for the created `SqlFunction` so I don't think this check is actually needed
* some decoupled test cases become broken - will be fixed later
* some aggregate related changes: due to the fact that SUM() and COUNT() of no inputs are different
* upgrade avatica to 1.25.0
* `CalciteQueryTest#testExactCountDistinctWithFilter` is now executable

Close apache/druid#16503
2024-06-13 08:47:50 +02:00
George Shiqi Wu d5a25a94b8
Docs: Clarify that all supervisors can support early handoff (#16588) 2024-06-13 08:43:22 +05:30
YongGang 46dbc74053
Support Dynamic Peon Pod Template Selection in K8s extension (#16510)
* initial commit

* add Javadocs

* refine JSON input config

* more test and fix build

* extract existing behavior as default strategy

* change template mapping fallback

* add docs

* update doc

* fix doc

* address comments

* define Matcher interface

* fix test coverage

* use lower case for endpoint path

* update Json name

* add more tests

* refactoring Selector class
2024-06-12 15:27:10 -07:00
Zoltan Haindrich f8645de341
Remove incorrect utf8 conversion of ResultCache keys (#16569) 2024-06-12 13:12:05 -07:00
Clint Wylie fee509df2e
fix NestedDataColumnIndexerV4 to not report cardinality (#16507)
* fix NestedDataColumnIndexerV4 to not report cardinality
changes:
* fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem
* fix some issues with sql benchmarks
2024-06-11 20:58:12 -07:00
zachjsh 3f5f5921e0
Fix sql syntax error user (#16583)
This fixes an issue where in some cases, a SQL syntax error encountered when parsing / planning a query results in an error returned to the user with persona a `admin` when it should instead be `user`.
2024-06-11 18:08:35 -04:00
Andreas Maechler fec48432d4
docs: Correct some outdated module names (#16584)
* Fix module names

* Better spacing

* Some spacing

* Suggestions from code review

Thanks Abhishek.

* More links

* Roll-up time

* Remove logs

* More spelling
2024-06-11 14:17:40 -07:00
Andreas Maechler 24056b90b5
Bring back missing property in indexer documentation (#16582)
* Bring back druid.peon.taskActionClient.retry.minWait

* Update docs/configuration/index.md

* Consistent italics

Thanks Abhishek.

* Update docs/configuration/index.md

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Consistent list style

* Remove extra space

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-06-10 16:52:54 -07:00
Kashif Faraz e4fdf1055b
Update default value of `druid.indexer.tasklock.batchAllocationWaitTime` to zero (#16578)
Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0.
Thus, a segment allocation request is processed immediately unless there are already some requests queued before this one. While in queue, a segment allocation request may get clubbed together with other similar requests into a batch to reduce load on the metadata store.
2024-06-10 20:07:23 +05:30
317brian 8e11adfc6f
docs: remove outdated druidversion var from a page (#16570)
Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-10 15:30:36 +08:00
Clint Wylie 3fb6ba22e8
fix expression column capabilities to not report dictionary encoded unless input is string (#16577) 2024-06-08 13:05:19 -07:00
Andreas Maechler 40ba429c5f
More validation for Azure account config (#16561)
* Mark `account` as NotNull

* Remove account test

Handled by annotation now

* Cleanup account config

* Mark container as not-null.
2024-06-07 13:24:15 -07:00
Andreas Maechler e6a82e8a11
Only create container in `AzureStorage` for write operations (#16558)
* Remove unused constants

* Refactor getBlockBlobLength

* Better link

* Upper-case log

* Mark defaultStorageAccount nullable

This is the case if you do not use Azure for deep-storage but ingest from Azure blobs.

* Do not always create a new container if it doesn't exist

Specifically, only create a container if uploading a blob or writing a blob stream

* Add lots of comments, group methods

* Revert "Mark defaultStorageAccount nullable"

* Add mockito for junit

* Add extra test

* Add comment

Thanks George.

* Pass blockSize as Long

* Test more branches...
2024-06-07 09:47:51 -07:00
Vadim Ogievetsky efe9079f0a
Web console: fix pagination and filtering regression in supervisor view (#16571)
* fix pagination and filtering in supervisor view

* update snapshot
2024-06-07 21:09:51 +05:30
razinbouzar 844b2177de
Fix 2 coordinators elected as leader (#16528)
Changes:
- Recreate the leader latch when connection to zookeeper is lost
- Do not become leader if leader latch is already closed
2024-06-07 15:07:30 +05:30
Akshat Jain 03a38be446
Optimize S3 storage writing for MSQ durable storage (#16481)
* Optimise S3 storage writing for MSQ durable storage

* Get rid of static ConcurrentHashMap

* Fix static checks

* Fix tests

* Remove unused constructor parameter chunkValidation + relevant cleanup

* Assert etags as String instead of Integer

* Fix flaky test

* Inject executor service

* Make threadpool size dynamic based on number of cores

* Fix S3StorageDruidModuleTest

* Fix S3StorageConnectorProviderTest

* Fix injection issues

* Add S3UploadConfig to manage maximum number of concurrent chunks dynamically based on chunk size

* Address the minor review comments

* Refactor S3UploadConfig + ExecutorService into S3UploadManager

* Address review comments

* Make updateChunkSizeIfGreater() synchronized instead of recomputeMaxConcurrentNumChunks()

* Address the minor review comments

* Fix intellij-inspections check

* Refactor code to use futures for maxNumConcurrentChunks. Also use executor service with blocking queue for backpressure semantics.

* Update javadoc

* Get rid of cyclic dependency injection between S3UploadManager and S3OutputConfig

* Fix RetryableS3OutputStreamTest

* Remove unnecessary synchronization parts from RetryableS3OutputStream

* Update javadoc

* Add S3UploadManagerTest

* Revert back to S3StorageConnectorProvider extends S3OutputConfig

* Address Karan's review comments

* Address Kashif's review comments

* Change a log message to debug

* Address review comments

* Fix intellij-inspections check

* Fix checkstyle

---------

Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-07 11:33:16 +05:30
Andreas Maechler e9f723344b
Disable event hubs when kafka extensions isn't loaded (#16559) 2024-06-06 16:59:26 -07:00
Rishabh Singh 423c91f9e4
Revert log line to debug (#16565) 2024-06-06 14:00:31 +05:30
Kashif Faraz e4f59e00b2
Fix backwards compatibility with centralized schema config in partial_index_merge tasks (#16556)
* Handle null values of centralized schema config in PartialMergeTask

* Fix checkstyle

* Do not pass centralized schema config from supervisor task to sub-tasks

* Do not pass ObjectMapper in constructor of task

* Fix logs

* Fix tests
2024-06-06 13:44:04 +05:30
Gian Merlino 277006446d
Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. (#16366)
* Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr.

This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable
operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr.

In addition:

- Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition
  to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well.

- In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes
  it easier to identify tests that can now vectorize.

- Fix a null-matcher bug in StringObjectVectorValueMatcher.

* Fix tests.

* Fixes.

* Fix tests.

* Fix test.

* Fix test.
2024-06-05 20:03:02 -07:00
Gian Merlino 2534a42539
Fix serde for ArrayOfDoublesSketchConstantPostAggregator. (#16550)
* Fix serde for ArrayOfDoublesSketchConstantPostAggregator.

The version originally added in #13819 was missing an annotation for
the "value" property. Fixes #16539.

Line endings for ArrayOfDoublesSketchConstantPostAggregator.java are changed
from \r\n to \n.

Adds a serde test, and improves various other datasketches post-aggregator
serde tests to deserialize into PostAggregator. This verifies that the type
information is set up correctly.

* Fix excessive imports.

* Fix equals, hashCode.
2024-06-05 20:01:51 -07:00
Gian Merlino b837ce565b
Simplify serialized form of JsonInputFormat. (#15691)
* Simplify serialized form of JsonInputFormat.

Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and
useJsonNodeReader. Because the default value of keepNullColumns is
variable, we store the original configured value rather than the
derived value, and include if the original value is nonnull.

* Fix test.
2024-06-05 20:01:14 -07:00
Gian Merlino 717e634156
Router: Authorize permissionless internal requests. (#16419)
* Router: Authorize permissionless internal requests.

Router-internal requests like /proxy/enabled and errors for invalid
requests should not require permissions, but they still need to be
authorized in order to satisfy the PreResponseAuthorizationCheckFilter.
This patch adds authorization checks that do not require any particular
permissions.

* Fix tests.
2024-06-05 15:31:02 -07:00
Gian Merlino 1040a29bc5
Fix capabilities reported by UnnestStorageAdapter. (#16551)
UnnestStorageAdapter and its cursors did not return capabilities correctly
for the output column. This patch fixes two problems:

1) UnnestStorageAdapter returned the capabilities of the unnest virtual
   column prior to unnesting. It should return the post-unnest capabilities.

2) UnnestColumnValueSelectorCursor passed through isDictionaryEncoded from
   the unnest virtual column. This is incorrect, because the dimension selector
   created by this class never has a dictionary. This is the cause of #16543.
2024-06-05 15:19:42 -07:00
Akshat Jain 6d7d2ffa63
Add interface method for returning canonical lookup name (#16557)
* Add interface method for returning canonical lookup name

* Address review comment

* Add test in LookupReferencesManagerTest for coverage check

* Add test in LookupSerdeModuleTest for coverage check
2024-06-05 14:33:18 -07:00
Katya Macedo 7aecc09230
Docs: Remove circular link (#16553) 2024-06-05 11:07:36 -07:00
Bünyamin 30c59042e0
Add new metrics from v30 to prometheus-emitter (#16345)
Co-authored-by: asdf2014 <asdf2014@apache.org>
2024-06-05 10:51:48 +05:30
Charles Smith c100ae0ecc
Add a tutorial for LATEST_BY to get most recent data (#16515)
Co-authored-by: Will Xu <2bethere@gmail.com>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-06-04 17:00:25 -07:00
Jill Osborne 8b5802d4cd
docs: add maxSubqueryBytes limit to migration guide landing page (#16547)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-06-04 12:52:06 -07:00
Abhishek Radhakrishnan b9ba286423
Fix task bootstrapping & simplify segment load/drop flows (#16475)
* Fix task bootstrap locations.

* Remove dependency of SegmentCacheManager from SegmentLoadDropHandler.

- The load drop handler code talks to the local cache manager via
SegmentManager.

* Clean up unused imports and stuff.

* Test fixes.

* Intellij inspections and test bind.

* Clean up dependencies some more

* Extract test load spec and factory to its own class.

* Cleanup test util

* Pull SegmentForTesting out to TestSegmentUtils.

* Fix up.

* Minor changes to infoDir

* Replace server announcer mock and verify that.

* Add tests.

* Update javadocs.

* Address review comments.

* Separate methods for download and bootstrap load

* Clean up return types and exception handling.

* No callback for loadSegment().

* Minor cleanup

* Pull out the test helpers into its own static class so it can have better state control.

* LocalCacheManager stuff

* Fix build.

* Fix build.

* Address some CI warnings.

* Minor updates to javadocs and test code.

* Address some CodeQL test warnings and checkstyle fix.

* Pass a Consumer<DataSegment> instead of boolean & rename variables.

* Small updates

* Remove one test constructor.

* Remove the other constructor that wasn't initializing fully and update usages.

* Cleanup withInfoDir() builder and unnecessary test hooks.

* Remove mocks and elaborate on comments.

* Commentary

* Fix a few Intellij inspection warnings.

* Suppress corePoolSize intellij-inspect warning.

The intellij-inspect tool doesn't seem to correctly inspect
lambda usages. See ScheduledExecutors.

* Update docs and add more tests.

* Use hamcrest for asserting order on expectation.

* Shutdown bootstrap exec.

* Fix checkstyle
2024-06-04 10:44:46 -07:00