Commit Graph

13005 Commits

Author SHA1 Message Date
Kashif Faraz d04521d58f
Improve description field when emitting metric for broadcast failure (#14703)
Changes:
- Emit descriptions such as `Load queue is full`, `No disk space` etc. instead of `Unknown error`
- Rewrite `BroadcastDistributionRuleTest`
2023-08-01 10:13:55 +05:30
Abhishek Agarwal 5c96b60162
Increase heap size for router (#14699) 2023-08-01 08:58:48 +05:30
Gian Merlino 5387f1bac0
Remove chatAsync parameter, so chat is always async. (#14692)
* Remove chatAsync parameter, so chat is always async.

chatAsync has been made default in Druid 26. I have seen good
battle-testing of it in production, and am comfortable removing the
older sync client.

This was the last remaining usage of IndexTaskClient, so this patch
deletes all that stuff too.

* Remove unthrown exception.

* Remove unthrown exception.

* No more TimeoutException.
2023-07-31 19:42:51 -07:00
Jason Koch 44d5c1a15f
split KillUnusedSegmentsTask to processing in smaller chunks (#14642)
split KillUnusedSegmentsTask to smaller batches

Processing in smaller chunks allows the task execution to yield the TaskLockbox lock,
which allows the overlord to continue being responsive to other tasks and users while
this particular kill task is executing.

* introduce KillUnusedSegmentsTask batchSize parameter to control size of batching

* provide an explanation for kill task batchSize parameter

* add logging details for kill batch progress
2023-07-31 12:56:27 -07:00
Adarsh Sanjeev 339b8d959f
Change the default format from OBJECT to OBJECTLINES (#14700) 2023-07-31 18:39:58 +00:00
Adarsh Sanjeev 21d023b62b
Handle taskIds which are not found in the overlord correctly (#14706)
This PR has fixes a bug in the SqlStatementAPI where if the task is not found on the overlord, the response status is 500.
This changes the response to invalid input since the queryID passed is not valid.
2023-07-31 21:38:14 +05:30
Kashif Faraz 844a9c7ffb
Cancel loads of unused segments (#14644) 2023-07-31 18:01:50 +05:30
Kashif Faraz e9b4f1e95c
Fix reported replication factor of segment with zero required replicas (#14701) 2023-07-31 14:51:01 +05:30
AmatyaAvadhanula c648b1cb36
Add task toolbox to DruidInputSource (#14507)
Add task toolbox to DruidInputSource
2023-07-31 13:12:01 +05:30
Vadim Ogievetsky 9e1650e327
Web console: add durable storage selector (#14669) 2023-07-31 05:33:24 +00:00
Laksh Singla 8232c03667
[MSQ] Handle dimensionless group by queries with partitioning, and multiple workers (#14678)
* fixup

* add ut

* review
2023-07-29 07:15:17 +05:30
Will Xu 25df122b41
Releasenote notebooks 26 (#14410)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
2023-07-28 16:43:35 -07:00
Clint Wylie 5f72f4f37d
fixes for nested virtual column array element vector selectors and fixes for variant and nested variant numeric columns
* fix issue with nested virtual column array element vector selectors when input is numeric array but output is non-numeric
* add vector value selector for mixed numeric type variant and nested variant fields, tests
2023-07-28 15:14:29 -07:00
Gian Merlino 6517fc2796
Save a metadata call when reading files from CloudObjectInputSource. (#14677)
* Save a metadata call when reading files from CloudObjectInputSource.

The call to createSplits(inputFormat, null) in formattableReader would
use the default split hint spec, MaxSizeSplitHintSpec, which makes
getObjectMetadata calls in order to compute its splits. This isn't
necessary; we're just trying to unpack the files inside the input
source.

To fix this, use FilePerSplitHintSpec to extract files without any
funny business.

* Adjust call.

* Fix constant.

* Test coverage.
2023-07-28 13:31:03 -07:00
Nhi Pham 53733d2542
JSON-querying API documentation refactor (#14589)
Co-authored-by: Jill Osborne <jill.osborne@imply.io>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-07-28 10:55:17 -07:00
Gian Merlino 46ecc6b900
Frames support for string arrays that are null. (#14653)
* Frames support for string arrays that are null.

The row format represents null arrays as 0x0001, which older readers
would interpret as an empty array. This provides compatibility with
older readers, which is useful during updates.

The column format represents null arrays by writing -(actual length) - 1
instead of the length, and using FrameColumnWriters.TYPE_STRING_ARRAY for
the type code for string arrays generally. Older readers will report this
as an unrecognized type code. Column format is only used by the operator
query, which is currently experimental, so the impact isn't too severe.

* Remove unused import.

* Return Object[] instead of List from frame array selectors.

Update MSQSelectTest and MSQInsertTest to reflect the fact that null
arrays are possible.

Add a bunch of javadocs to object selectors describing expected behavior,
including the requirement that array selectors return Object[].

* update test case.

* Update test cases.
2023-07-28 10:23:39 -07:00
Kashif Faraz 22290fd632
Test: Simplify test impl of LoadQueuePeon (#14684)
Changes
- Rename `LoadQueuePeonTester` to `TestLoadQueuePeon`
- Simplify `TestLoadQueuePeon` by removing dependency on `CuratorLoadQueuePeon`
- Remove usages of mock peons in `LoadRuleTest` and use `TestLoadQueuePeon` instead
2023-07-28 16:14:23 +05:30
Clint Wylie d406bafdfc
fix issues with equality and range filters matching double values to long typed inputs (#14654)
* fix issues with equality and range filters matching double values to long typed inputs
* adjust to ensure we never homogenize null, [], and [null] into [null] for expressions on real array columns
2023-07-27 16:01:21 -07:00
TSFenwick 9a9038c7ae
Speed up kill tasks by deleting segments in batch (#14131)
* allow for batched delete of segments instead of deleting segment data one by one

create new batchdelete method in datasegment killer that has default functionality
of iterating through all segments and calling delete on them. This will enable
a slow rollout of other deepstorage implementations to move to a batched delete
on their own time

* cleanup batchdelete segments

* batch delete with the omni data deleter

cleaned up code
just need to add tests and docs for this functionality

* update java doc to explain how it will try to use batch if function is overwritten

* rename killBatch to kill
add unit tests

* add omniDataSegmentKillerTest for deleting multiple segments at a time. fix checkstyle

* explain test peculiarity better

* clean up batch kill in s3.

* remove unused return value. cleanup comments and fix checkstyle

* default to batch delete. more specific java docs. list segments that couldn't be deleted
if there was a client error or server error

* simplify error handling

* add tests where an exception is thrown when killing multiple s3 segments

* add test for failing to delete two calls with the s3 client

* fix javadoc for kill(List<DataSegment> segments) clean up tests remove feature flag

* fix typo in javadocs

* fix test failure

* fix checkstyle and improve tests

* fix intellij inspections issues

* address comments, make delete multiple segments not assume same bucket

* fix test errors

* better grammar and punctuation. fix test. and better logging for exception

* remove unused code

* avoid extra arraylist instantiation

* fix broken test

* fix broken test

* fix tests to use assert.throws
2023-07-27 15:34:44 -07:00
Nhi Pham ee9cfc7e18
Tasks API documentation spacing update (#14633) 2023-07-27 14:24:55 -07:00
Gian Merlino 986a271a7d
Merge core CoordinatorClient with MSQ CoordinatorServiceClient. (#14652)
* Merge core CoordinatorClient with MSQ CoordinatorServiceClient.

Continuing the work from #12696, this patch merges the MSQ
CoordinatorServiceClient into the core CoordinatorClient, yielding a single
interface that serves both needs and is based on the ServiceClient RPC
system rather than DruidLeaderClient.

Also removes the backwards-compatibility code for the handoff API in
CoordinatorBasedSegmentHandoffNotifier, because the new API was added
in 0.14.0. That's long enough ago that we don't need backwards
compatibility for rolling updates.

* Fixups.

* Trigger GHA.

* Remove unnecessary retrying in DruidInputSource. Add "about an hour"
retry policy and h

* EasyMock
2023-07-27 13:23:37 -07:00
Nhi Pham 482def788f
Supervisor API documentation refactor (#14579) 2023-07-27 12:58:37 -07:00
Katya Macedo 0b9e4af443
Clean up some of the descriptions (#14661)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-07-27 12:11:58 -07:00
YongGang 9b88b78ba4
Fix race condition in KubernetesTaskRunner when task is added to the map (#14643)
Changes:
- Fix race condition in KubernetesTaskRunner introduced by #14435 
- Perform addition and removal from map inside a synchronized block
- Update tests
2023-07-27 12:34:36 +05:30
Kashif Faraz 7634ac896e
Quick fix for SegmentLoadDropHandler bug (#14670) 2023-07-27 11:53:58 +05:30
Nhi Pham dd204e596d
Refresh the OS Druid web console screenshots (#14397)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-26 16:32:03 -07:00
Gian Merlino 4a68f8a294
Fix maxCompletedTasks parameter in OverlordClientImpl. (#14667)
It was sent to the server as "maxCompletedTasks", but the server expects
"max". This caused it to be ignored. This bug was introduced in #14581.
2023-07-26 15:12:20 -07:00
dependabot[bot] add9796d46
Bump qs from 6.5.2 to 6.5.3 in /website (#13510)
Bumps [qs](https://github.com/ljharb/qs) from 6.5.2 to 6.5.3.
- [Release notes](https://github.com/ljharb/qs/releases)
- [Changelog](https://github.com/ljharb/qs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ljharb/qs/compare/v6.5.2...v6.5.3)

---
updated-dependencies:
- dependency-name: qs
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-26 22:17:36 +02:00
dependabot[bot] 915cea7586
Bump decode-uri-component from 0.2.0 to 0.2.2 in /web-console (#13481)
Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases)
- [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2)

---
updated-dependencies:
- dependency-name: decode-uri-component
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-26 09:07:09 -07:00
slfan1989 d69edb7723
Docs: Fix some typos. (#14663)
---------

Co-authored-by: slfan1989 <louj1988@@>
2023-07-26 21:24:18 +05:30
dependabot[bot] e99bab2fd3
Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.3 (#14641)
Bumps [org.xerial.snappy:snappy-java](https://github.com/xerial/snappy-java) from 1.1.10.1 to 1.1.10.3.
- [Release notes](https://github.com/xerial/snappy-java/releases)
- [Commits](https://github.com/xerial/snappy-java/compare/v1.1.10.1...v1.1.10.3)

---
updated-dependencies:
- dependency-name: org.xerial.snappy:snappy-java
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-26 06:27:55 -07:00
dependabot[bot] b2a07c5db1
Bump word-wrap from 1.2.3 to 1.2.4 in /web-console (#14613)
Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.4.
- [Release notes](https://github.com/jonschlinkert/word-wrap/releases)
- [Commits](https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4)

---
updated-dependencies:
- dependency-name: word-wrap
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-26 06:27:06 -07:00
Adarsh Sanjeev 6a42a24426
Fix a comment in the Calcite UT testExactCountDistinctWithFilter (#14628) 2023-07-26 06:32:26 +00:00
Abhishek Agarwal 52fbc6939e
Fix the bug in string last vector aggregation (#14655) 2023-07-26 09:18:03 +05:30
Katya Macedo 4804630c78
Clean up Kinesis doc (#14529) 2023-07-25 19:24:36 -07:00
Nhi Pham 2dc3e94a9a
Service status API documentation refactor (#14528)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-07-25 18:42:47 -07:00
Sergio Ferragut 6b229f5118
new environment vars for external druid and kafka + jupyter template (#14592) 2023-07-25 11:02:16 -07:00
AmatyaAvadhanula 6566bda57c
Suppress CVEs (#14648)
CVE-2023-34462 - (Allows malicious allocation of resources without throttling) Not applicable as the Netty requests in Druid are internal, and not user facing.
CVE-2016-2402 - (Man in the middle with okhttp by sending certificate chains) Not applicable as okhttp requests in Druid are also internal
2023-07-25 13:37:50 +05:30
Gian Merlino 2f9619a96f
Use OverlordClient for all Overlord RPCs. (#14581)
* Use OverlordClient for all Overlord RPCs.

Continuing the work from #12696, this patch removes HttpIndexingServiceClient
and the IndexingService flavor of DruidLeaderClient completely. All remaining
usages are migrated to OverlordClient.

Supporting changes include:

1) Add a variety of methods to OverlordClient.

2) Update MetadataTaskStorage to skip the complete-task lookup when
   the caller requests zero completed tasks. This helps performance of
   the "get active tasks" APIs, which don't want to see complete ones.

* Use less forbidden APIs.

* Fixes from CI.

* Add test coverage.

* Two more tests.

* Fix test.

* Updates from CR.

* Remove unthrown exceptions.

* Refactor to improve testability and test coverage.

* Add isNil tests.

* Remove unnecessary "deserialize" methods.
2023-07-24 21:14:27 -07:00
George Shiqi Wu f742bb7376
Get task location should be stored on the lifecycle object (#14649)
* Fix issue with long data source names

* Use the regular library

* Save location and tls enabled

* Null out before running

* add another comment
2023-07-24 18:36:19 -07:00
George Shiqi Wu 28914bbab8
Fix issue with long data source names (#14620)
* Fix issue with long data source names

* Use the regular library

* fix overlord utils test
2023-07-24 08:45:10 -07:00
aho135 607f511767
Improve logging in CoordinatorBasedSegmentHandoffNotifier (#14640) 2023-07-24 18:04:21 +05:30
AmatyaAvadhanula 536e491d00
Suppress ambari metrics CVEs (#14645)
* Suppress ambari metrics CVEs
2023-07-24 18:01:56 +05:30
Jason Koch 54f29fedce
Use PreparedBatch while deleting segments (#14639)
Related to #14634 

Changes:
- Update `IndexerSQLMetadataStorageCoordinator.deleteSegments` to use
JDBI PreparedBatch instead of issuing single DELETE statements
2023-07-23 22:55:04 +05:30
Abhishek Agarwal efb32810c4
Clean up the core API required for Iceberg extension (#14614)
Changes:
- Replace `AbstractInputSourceBuilder` with `InputSourceFactory`
- Move iceberg specific logic to `IcebergInputSource`
2023-07-21 13:01:33 +05:30
Vadim Ogievetsky f5784e66d3
Web console: add explore view (#14602)
This PR adds a simple, stateless, SQL backed, data exploration view to the web console. The idea is to let users explore data in Druid with point-and-click interaction and visualizations (instead of writing SQL and looking at a table). This can provide faster time-to-value for a user new to Druid and can allow a Druid veteran to quickly chart some data that they care about.
2023-07-21 11:19:23 +05:30
Vadim Ogievetsky 295653648b
Web console: make typing fun again (#14632)
* extract common function

* make typing fun again
2023-07-20 16:22:41 -07:00
Karan Kumar 77e0c16bce
Sql statement api error messaging fixes. (#14629)
* Error messaging fixes.

* Static check fix

* Review comments
2023-07-20 22:48:44 +05:30
Abhishek Agarwal 1ddbaa8744
Reserve threads for non-query requests without using laning (#14576)
This PR uses the QoSFilter available in Jetty to park the query requests that exceed a configured limit. This is done so that other HTTP requests such as health check calls do not get blocked if the query server is busy serving long-running queries. The same mechanism can also be used in the future to isolate interactive queries from long-running select queries from interactive queries within the same broker.

Right now, you can still get that isolation by setting druid.query.scheduler.numThreads to a value lowe than druid.server.http.numThreads. That enables total laning but the side effect is that excess requests are not queued and rejected outright that leads to a bad user experience.

Parked requests are timed out after 30 seconds by default. I overrode that to the maxQueryTimeout in this PR.
2023-07-20 15:03:48 +05:30
Clint Wylie 024ce40f1a
reduce heap footprint of ingesting auto typed columns by pushing compression and index generation into writeTo (#14615) 2023-07-20 00:54:58 -07:00