13312 Commits

Author SHA1 Message Date
Adarsh Sanjeev
959148ad37
Add code to wait for segments generated to be loaded on historicals (#14322)
Currently, after an MSQ query, the web console is responsible for waiting for the segments to load. It does so by checking whether any segments are still loading into the ingested datasource. This can cause issues: the wait may never finish if the segments are never loaded, or it may end up waiting on other ingestions as well.

This PR shifts this responsibility to the controller, which has the list of segments created.
2023-09-06 10:35:57 +05:30
Clint Wylie
706b57c0b2
fixup array and mvd sql docs (#14928) 2023-09-05 16:17:00 -07:00
Jill Osborne
425ebaa387
Query tips doc (#14922)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-09-05 14:16:01 -07:00
Soumyava
8088a763a6
Vectorize earliest aggregator for both numeric and string types (#14408)
* Vectorizing earliest for numeric

* Vectorizing earliest string aggregator

* checkstyle fix

* Removing unnecessary exceptions

* Ignoring tests in MSQ as earliest is not supported for numeric there

* Fixing benchmarks

* Updating tests as MSQ does not support earliest for some cases

* Addressing review comments by adding the following:
1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests

* Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string

* Adding a flag for multi value dimension selector

* Addressing comments

* 1 more change

* Handling review comments part 1

* Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order

* Updating numeric first vector agg

* Revert "Updating numeric first vector agg"

This reverts commit 429170990192883e51812311c49d2e461e6db732.

* Updating code for correctness issues

* fixing an issue with latest agg

* Adding more comments and removing an unnecessary check

* Addressing null checks for tie selector and only vectorize false for quantile sketches
2023-09-05 08:41:42 -07:00
Abhishek Radhakrishnan
9d6ca61ac1
Verify statsd mock client interaction in unit test (#14939) 2023-09-05 07:34:22 -07:00
Kashif Faraz
289ee1e011
Refactor: Cleanup NoopTask (#14938)
Changes:
- Simplify static `create` methods for `NoopTask`
- Remove `FirehoseFactory`, `IsReadyResult`, `readyTime` from `NoopTask`
as these fields were not being used anywhere
- Update tests
2023-09-05 09:15:41 +05:30
panhongan
d4e972e1e4
Add checking for new checkpoint (#14353)
Check that a checkpoint is non-empty before adding it to the checkpoint sequence 
in a SeekableStreamSupervisor
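The check described above can be sketched roughly as follows. This is only an illustration: `CheckpointSequenceSketch` and `addIfNonEmpty` are hypothetical names, and the partition and offset types are simplified to `String` and `Long`; none of this is the actual `SeekableStreamSupervisor` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the supervisor's checkpoint bookkeeping; not Druid's actual class. */
final class CheckpointSequenceSketch {
    // Each checkpoint maps a partition id to an offset (types simplified for illustration).
    private final List<Map<String, Long>> checkpoints = new ArrayList<>();

    /** Appends the checkpoint only if it is non-empty; returns whether it was added. */
    boolean addIfNonEmpty(Map<String, Long> checkpoint) {
        if (checkpoint == null || checkpoint.isEmpty()) {
            return false; // skip empty checkpoints instead of recording them
        }
        checkpoints.add(Map.copyOf(checkpoint)); // defensive copy
        return true;
    }

    int size() {
        return checkpoints.size();
    }
}
```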
2023-09-04 13:18:55 +05:30
Kashif Faraz
ec630e3671
Remove deprecated coordinator dynamic configs (#14923)
Changes:

[A] Remove config `decommissioningMaxPercentOfMaxSegmentsToMove`
- It is a complicated config 😅
- It is always desirable to prioritize move from decommissioning servers so that
they can be terminated quickly, so this should always be 100%
- It is already handled by `smartSegmentLoading` (enabled by default)

[B] Remove config `maxNonPrimaryReplicantsToLoad`
This was added in #11135 to address two requirements:
- Prevent coordinator runs from getting stuck assigning too many segments to historicals
- Prevent load of replicas from competing with load of unavailable segments

Both of these requirements are now already met thanks to:
- Round-robin segment assignment
- Prioritization in the new coordinator
- Modifications to `replicationThrottleLimit`
- `smartSegmentLoading` (enabled by default)
2023-09-04 11:54:36 +05:30
Kashif Faraz
7f26b80e21
Simplify ServiceMetricEvent.Builder (#14933)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime , setMetricAndValue to the builder
2023-09-01 11:30:45 +05:30
Clint Wylie
dea9d4f1a7
cleaning DruidProcessingConfig bindings (#14927) 2023-08-30 22:35:08 -07:00
Vadim Ogievetsky
680669fd3a
show execution dialog in task view (#14930) 2023-08-30 15:59:34 -07:00
Vadim Ogievetsky
04a1153d0f
line chart fix others not mapping correctly (#14931) 2023-08-30 15:59:26 -07:00
Sébastien
42cfb999cd
Added brush to time-chart (#14929) 2023-08-30 10:36:50 -07:00
Vadim Ogievetsky
d295b9158f
Web console: dynamic query parameters UI (#14921)
* fix nvl in table

* add query parameter dialog

* pre-wrap in the tables

* fix typo
2023-08-29 23:14:25 -07:00
Kashif Faraz
8263f0d1e9
Reduce coordinator logs when operating normally (#14926)
Changes:
- Reduce log level of some coordinator stats, which only denote normal coordinator operation.
These stats are still emitted and can be logged by setting debugDimensions in the coordinator
dynamic config.
- Initialize SegmentLoadingConfig only for historical management duties. This config is not
needed in other duties and initializing it creates logs which are misleading.
2023-08-30 11:30:38 +05:30
John Gerassimou
d201ea0ece
prometheus-emitter: add extraLabels parameter (#14728)
* prometheus-emitter: add extraLabels parameter

* prometheus-emitter: update readme to include the extraLabels parameter

* prometheus-emitter: remove nullable and surface label name issues

* remove import to make linter happy
2023-08-29 12:02:22 -07:00
Gian Merlino
004cd012e1
HttpClient: Include error handler on all connection attempts. (#14915)
Currently we have an error handler for https connection attempts, but
not for plaintext connection attempts. This leads to warnings like the
following for plaintext connection errors:

  EXCEPTION, please implement org.jboss.netty.handler.codec.http.HttpContentDecompressor.exceptionCaught() for proper handling.

This happens because if we don't add our own error handler, the last
handler in the chain during a connection attempt is HttpContentDecompressor,
which doesn't handle errors.

The new error handler for plaintext doesn't do much: it just closes
the channel.
2023-08-29 14:28:04 +05:30
benkrug
8885805bb3
Update filters.md (#14917) 2023-08-28 15:29:00 -07:00
Kashif Faraz
d6565f46b0
Increase the computed value of replicationThrottleLimit (#14913)
Changes
- Increase value of `replicationThrottleLimit` computed by `smartSegmentLoading` from
2% to 5% of total number of used segments.
- Assign replicas to a tier even when some replicas are already being loaded in that tier
- Limit the total number of replicas in load queue at start of run + replica assignments in
the run to the `replicationThrottleLimit`.

i.e. for every tier,
    num loading replicas at start of run + num replicas assigned in run <= replicationThrottleLimit
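As a rough illustration of the inequality above, the per-tier bookkeeping might look like the sketch below. The class and method names are hypothetical, and any minimum value or rounding that Druid applies to the computed limit is not shown; only the 5% figure and the inequality come from this change.

```java
/** Hypothetical sketch of the per-tier replica cap; not Druid's actual implementation. */
final class ReplicationThrottleSketch {
    /** 5% of the total number of used segments (integer arithmetic, rounded down). */
    static int computeLimit(int numUsedSegments) {
        return (int) (numUsedSegments * 5L / 100);
    }

    /**
     * Number of replicas that may still be assigned in this run for one tier, so that
     * loadingAtStart + assignedInRun never exceeds the limit.
     */
    static int remainingAssignments(int limit, int loadingAtStart, int assignedInRun) {
        return Math.max(0, limit - loadingAtStart - assignedInRun);
    }

    /** True if one more replica can be assigned to the tier without breaking the cap. */
    static boolean canAssignReplica(int limit, int loadingAtStart, int assignedInRun) {
        return remainingAssignments(limit, loadingAtStart, assignedInRun) > 0;
    }
}
```

Under these assumptions, a tier that starts a run with replicas already loading can still receive new replica assignments, as long as the sum stays within the limit.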
2023-08-28 18:20:22 +05:30
Karan Kumar
9fcbf05c5d
Adjusting SqlStatementResource and SqlTaskResource to set request attribute via a new method. (#14878) 2023-08-26 10:59:47 +00:00
Vadim Ogievetsky
30c49c4cfc
Web console: misc fixes and SQL query re-formatting (#14906)
* better dialog formatting

* use CSS to render triangle

* can flatten in kafka also

* better formatting

* better format

* fill in empty values in line chart

* more fp

* add show others
2023-08-25 15:18:37 -07:00
Victoria Lim
9142f4b8d7
docs: update note in automatic compaction doc (#14908) 2023-08-25 14:14:29 -07:00
George Shiqi Wu
95b0de61d1
Move some lifecycle management from doTask -> shutdown for the mm-less task runner (#14895)
* save work

* Add synchronized

* Don't shutdown in run

* Adding unit tests

* Cleanup lifecycle

* Fix tests

* remove newline
2023-08-25 10:50:38 -06:00
George Shiqi Wu
ad32f84586
Fix capacity response in mm-less ingestion (#14888)
Changes:
- Fix capacity response in mm-less ingestion.
- Add field usedClusterCapacity to the GET /totalWorkerCapacity response.
This API should be used to get the total ingestion capacity on the overlord.
- Remove method `isK8sTaskRunner` from interface `TaskRunner`
2023-08-25 08:17:38 +05:30
Kashif Faraz
e51181957c
Use num cores to determine balancerComputeThreads (#14902)
Changes:
- Determine the default value of balancerComputeThreads based on number of
coordinator cpus rather than number of segments. Even if the number of segments
is low and we create more balancer threads, it doesn't hurt the system as threads
would mostly be idle.
- Remove unused field from SegmentLoadQueueManager

Expected values:
- Clusters with ~1M segments typically work with Coordinators having 16 cores or more.
This would give us 8 balancer threads, which is the same as the current maximum.
- On small clusters, even a single thread is enough to do the required balancing work.
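One formula consistent with the expected values above is half the number of cores, clamped between 1 and 8. This is an assumption made for illustration, not necessarily the exact computation in the patch, and the names below are hypothetical.

```java
/** Hypothetical sketch of a core-based default; the exact formula in Druid may differ. */
final class BalancerThreadsSketch {
    // The commit message describes 8 threads as the current maximum.
    static final int MAX_BALANCER_THREADS = 8;

    /** Assumed rule: half the cores, with at least 1 and at most 8 threads. */
    static int defaultBalancerComputeThreads(int numCores) {
        return Math.max(1, Math.min(MAX_BALANCER_THREADS, numCores / 2));
    }
}
```

A caller could pass `Runtime.getRuntime().availableProcessors()` as `numCores`. With this rule, a 16-core Coordinator gets 8 threads and a single-core one gets 1.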
2023-08-25 08:15:27 +05:30
Tejaswini Bandlamudi
388d5ecf78
Fix reported CVEs (#14882)
Suppress CVEs from dependencies with no available fix or false positives
hadoop-annotations: CVE-2022-25168, CVE-2021-33036
hadoop-client-runtime: CVE-2023-1370, CVE-2023-37475
okio: CVE-2023-3635
Upgrade grpc version to fix CVE-2023-33953
2023-08-24 19:28:55 +05:30
Abhishek Agarwal
3c7b237c22
Add docs for ingesting Kafka topic name (#14894)
Add documentation on how to extract the Kafka topic name and ingest it into the data.
2023-08-24 19:19:59 +05:30
Zoltan Haindrich
54336e2a3e
Improve on incremental compilation (#14860)
This patch fixes a few issues toward #14858

1. some phony classes were added to enable maven to track the compilation of those classes
2. cyclonedx 2.7.9 seems to handle incremental compilation better; it had a PR relating to that
3. needed to update root pom to 25
4. update antlr to 4.5.3; the older one didn't really work incrementally, and 4.5.3 works much better
2023-08-24 16:06:16 +05:30
Laksh Singla
f9f734cde5
Display the output column name in InvalidNullByteException (#14780)
This PR maps the query column to the output column name when surfacing the fault, since the output column name is what the user readily sees while executing the query.
2023-08-24 04:24:41 +00:00
Clint Wylie
36e659a501
remove group-by v1 (#14866)
* remove group-by v1

* docs

* remove unused configs, fix test

* fix test

* adjustments

* why not

* adjust

* review stuff
2023-08-23 12:44:06 -07:00
zachjsh
0c76df1c7d
Enable Continuous auto kill (#14831)
### Description

This change enables the `KillUnusedSegments` coordinator duty to be scheduled continuously. Previously, the following issues prevented this or made it difficult:

1. If scheduled at a fast enough rate, the duty would find the same intervals to kill for the same datasources while kill tasks submitted for those same datasources and intervals were already underway, thus wasting task slots on duplicated work.

2. The task resources used by auto kill were previously unbounded. Each duty run period, if unused segments were found for any datasource, a kill task would be submitted to kill them.

This PR solves both of these issues:

1. The duty keeps track of the end time of the last interval found when killing unused segments for each datasource, in an in-memory map. The end time for each datasource, if found, is used as the lower bound on the start time when searching for unused intervals for that same datasource. On each duty run, we remove any datasource keys from this map that no longer match datasources in the system or in the whitelist. We also remove a datasource entry if no unused segments are found for the datasource, which happens when we fail to find an interval that includes unused segments. Removing the datasource entry from the map allows searching for unused segments in the datasource from the beginning of time once again.

2. The unbounded task resource usage can be mitigated with coordinator dynamic config added as part of ba957a9b97
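The in-memory map described in item 1 could be sketched along these lines. This is a simplified illustration: `KillTrackerSketch` and its methods are hypothetical names, `java.time.Instant` stands in for whatever time type the duty uses, and `Instant.MIN` represents "the beginning of time".

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the per-datasource kill bookkeeping; not the actual duty code. */
final class KillTrackerSketch {
    // End time of the last unused interval found, keyed by datasource.
    private final Map<String, Instant> lastKillEndByDatasource = new HashMap<>();

    /** Lower bound on the start time for the next search of unused intervals. */
    Instant searchStart(String datasource) {
        return lastKillEndByDatasource.getOrDefault(datasource, Instant.MIN);
    }

    /** Drops entries for datasources that no longer exist or are not in the whitelist. */
    void retainOnly(Set<String> eligibleDatasources) {
        lastKillEndByDatasource.keySet().retainAll(eligibleDatasources);
    }

    /**
     * Records the outcome of one duty run. A null end time means no interval with
     * unused segments was found, so the entry is removed and the next search
     * starts from the beginning of time again.
     */
    void recordRun(String datasource, Instant unusedIntervalEnd) {
        if (unusedIntervalEnd == null) {
            lastKillEndByDatasource.remove(datasource);
        } else {
            lastKillEndByDatasource.put(datasource, unusedIntervalEnd);
        }
    }
}
```

Keeping only the last end time per datasource is what lets consecutive runs skip intervals that already have kill tasks in flight.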


Operators can configure continuous auto kill by providing coordinator runtime properties similar to the following:

```
druid.coordinator.period.indexingPeriod=PT60S
druid.coordinator.kill.period=PT60S
```

Operators should also set sensible limits on kill task usage via the coordinator dynamic config.
2023-08-23 09:23:08 -04:00
Adarsh Sanjeev
dfb5a98888
Add coordinator API for unused segments (#14846)
There is a current issue due to inconsistent metadata between worker and controller in MSQ. A controller can receive one set of segments, which are then marked as unused by, say, a compaction job. The worker would be unable to get the segment information as MetadataResource.
2023-08-23 14:51:25 +05:30
Atul Mohan
989ed8d0c2
Fix null check for JWT claims (#14872) 2023-08-23 14:39:23 +05:30
Giulio Talarico
76e5048aab
fix supervisor spec api submission commands (#14877) 2023-08-23 14:38:09 +05:30
Zoltan Haindrich
e806d09309
Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848) 2023-08-22 22:50:19 -07:00
Clint Wylie
7b5012ea6e
override retry attempts for InputEntityIteratingReaderTest for much faster test run (#14897) 2023-08-22 22:01:47 -07:00
Clint Wylie
fb053c399c
consolidate json and auto indexers, remove v4 nested column serializer (#14456) 2023-08-22 18:50:11 -07:00
Soumyava
6817de9376
Doc changes for avatica transparent reconnection (#14896) 2023-08-22 11:58:17 -07:00
Zoltan Haindrich
b9a33949fd
Fix aggregation filter expression processing in the absence of projection (#14893)
* test

* fix

* add 33 test

* crap

* Revert "crap"

This reverts commit 2751198debdcf3ee0c0ab9f56a8dfa7477308d93.

* cleanup test

* cleanup

* rename test
2023-08-22 10:17:14 -07:00
Kashif Faraz
9376d8d6e1
Refactor: Move UpdateCoordinatorStateAndPrepareCluster duty out of DruidCoordinator (#14845)
Motivation:
- Clean up `DruidCoordinator` and move methods to classes where they are most relevant

Changes:
- No functional change
- Add duty `PrepareBalancerAndLoadQueues` to replace `UpdateCoordinatorState`
- Move map of `LoadQueuePeon` from `DruidCoordinator` to `LoadQueueTaskMaster`
- Make `BalancerStrategyFactory` an abstract class and keep the balancer executor here
- Move reporting of used segment stats and historical capacity stats from
`CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues`
- Move reporting of unavailable and under-replicated segment stats from
`CollectSegmentAndServerStats` to `UpdateReplicationStatus` duty
2023-08-22 19:50:41 +05:30
Zoltan Haindrich
14c1aff150
Fix error messages relating to OVERWRITE keyword (#14870)
OVERWRITE should not be a fully reserved keyword
2023-08-22 16:17:49 +05:30
AmatyaAvadhanula
bd505062de
Improve streaming ingestion completion timeout error message (#14636)
* Improve streaming ingestion completion timeout error message

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-08-22 14:33:28 +05:30
Clint Wylie
194a9c9abc
set druid.expressions.useStrictBooleans to true by default (#14734) 2023-08-22 00:19:56 -07:00
Tejaswini Bandlamudi
d87056e708
Upgrade guava version to 31.1-jre (#14767)
Currently, Druid uses Guava version 16.0.1. Upgrading to 31.1-jre fixes the following issues:

- CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). Druid does not use Java or GWT serialization, so this is a false positive, but it still causes red security scans on the Druid distribution.
- The latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks "Update google client apis to latest version" (#14414).
2023-08-22 12:09:53 +05:30
Benedict Jin
18f7cb6926
Fixed broken URL of python api tutorial (#14881) 2023-08-22 09:53:41 +05:30
Clint Wylie
5d1412949e
enable sql compatible null handling mode by default (#14792)
* enable sql compatible null handling mode by default
* fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false
2023-08-21 20:07:13 -07:00
Katya Macedo
5f74ef56f1
Clean up Kafka supervisor topic (#14651)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-21 11:55:38 -07:00
Nhi Pham
9fe7c01c16
Automatic compaction API documentation refactor (#14740)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-08-21 11:34:41 -07:00
Peter Marshall
0dfd99e381
202307-notebook-unionall (#14726)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-21 10:55:58 -07:00
Vadim Ogievetsky
631dc3b589
add Kafka topic column controls (#14865) 2023-08-21 21:33:23 +05:30