13206 Commits

Author SHA1 Message Date
Pranav
883c2692d2
Adding new function decode_base64_utf8 and expr macro (#14943)
* Adding new function decode_base64_utf8 and expr macro

* using BaseScalarUnivariateMacroFunctionExpr

* Print stack trace in case of debug in ChainedExecutionQueryRunner

* fix static check
2023-09-20 17:06:34 -07:00
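As a rough illustration of what a function named decode_base64_utf8 would be expected to do (commit 883c2692d2 above), the Python sketch below Base64-decodes a string and reads the resulting bytes as UTF-8. This is an assumption drawn from the function name only. It is not the Druid implementation, and the null handling shown is hypothetical.

```python
import base64
from typing import Optional


def decode_base64_utf8(value: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in: Base64-decode `value`, then decode the bytes as UTF-8."""
    if value is None:
        # Assumption: a null input produces a null output.
        return None
    raw = base64.b64decode(value)  # Base64 text -> raw bytes
    return raw.decode("utf-8")     # raw bytes -> UTF-8 string


if __name__ == "__main__":
    print(decode_base64_utf8("aGVsbG8="))
```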
Xavier Léauté
22abc10f24
update RoaringBitmap to 0.9.49 (#15006)
* update RoaringBitmap to 0.9.49

update RoaringBitmap from 0.9.0 to 0.9.49

Many optimizations and improvements have gone into recent releases of
RoaringBitmap. It seems worthwhile to incorporate those.

* implement workaround for BatchIterator interface change

* add test case for BatchIteratorAdapter.advanceIfNeeded
2023-09-20 15:52:27 -07:00
Laksh Singla
82e809c8d0
fix (#15017) 2023-09-20 15:48:26 -07:00
Zoltan Haindrich
08cf290da2
Configure caching for static-check actions (#15010)
* some stuff

* some stuff

* don't change it.sh

* some stuff

* updates

* add missing

* add 1 more

* setup-java
2023-09-20 14:11:39 -07:00
Gian Merlino
823f620ede
Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976)
* Add IS [NOT] DISTINCT FROM to SQL and join matchers.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.

* Main changes:

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.

* Remove unused "wrap" method.

* Fixes.

* Remove methods we do not need.

* Fix bug with INPUT_REF.
2023-09-20 10:44:32 -07:00
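The difference between "equals" and "is not distinct from" described in commit 823f620ede above can be sketched in Python, with None standing in for SQL NULL. The function names and the `include_null` flag are illustrative only, loosely mirroring the "includeNull" parameter mentioned in the commit message. The sketch assumes SQL-compatible three-valued null handling and is not Druid's code.

```python
from typing import Any, Optional


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL-style `a = b`: the result is unknown (None) if either side is null."""
    if a is None or b is None:
        return None
    return a == b


def not_distinct_from(a: Any, b: Any) -> bool:
    """`a IS NOT DISTINCT FROM b`: two nulls match, and the result is never null."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def is_distinct_from(a: Any, b: Any) -> bool:
    """`a IS DISTINCT FROM b`: the negation of the function above."""
    return not not_distinct_from(a, b)


def join_matches(left: Any, right: Any, include_null: bool) -> bool:
    """Hypothetical join matcher with an `include_null` switch."""
    if include_null:
        return not_distinct_from(left, right)
    # In "equals" mode an unknown comparison does not produce a match.
    return sql_equals(left, right) is True
```

With `include_null=False` a null key on both sides does not join, while with `include_null=True` it does.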
Zoltan Haindrich
79f882f48c
Fix exception cause logging in QueryResultPusher (#14975) 2023-09-20 15:44:02 +05:30
Zoltan Haindrich
e8773f4d0f
Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996) 2023-09-20 15:42:52 +05:30
Sam Wheating
73bab2f020
Add option to copy query results directly to clipboard (#14889)
* Add option to copy query results to clipboard

* Refactor, allow copying in all formats

---------

Co-authored-by: Sam Wheating <sam.wheating@reddit.com>
2023-09-19 10:25:39 -07:00
Gian Merlino
4f498e6469
SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)
* SQL: Plan non-equijoin conditions as cross join followed by filter.

Druid has previously refused to execute joins with non-equality-based
conditions. This was well-intentioned: the idea was to push people to
write their queries in a different, hopefully more performant way.

But as we're moving towards fuller SQL support, it makes more sense to
allow these conditions to go through with the best plan we can come up
with: a cross join followed by a filter. In some cases this will allow
the query to run, and people will be happy with that. In other cases,
it will run into resource limits during execution. But we should at
least give the query a chance.

This patch also updates the documentation to explain how people can
tell whether their queries are being planned this way.

* cartesian is a word.

* Adjust tests.

* Update docs/querying/datasource.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-09-19 10:23:42 -07:00
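The plan shape described in commit 4f498e6469 above, a cross join followed by a filter, can be sketched in a few lines of Python. The row layout and function name are hypothetical, and this is a conceptual sketch rather than Druid's planner or execution engine.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, Tuple

Row = Dict[str, int]


def cross_join_then_filter(
    left: Iterable[Row],
    right: Iterable[Row],
    condition: Callable[[Row, Row], bool],
) -> Iterator[Tuple[Row, Row]]:
    """Evaluate an arbitrary join condition over every (left, right) pair."""
    # product() enumerates the full Cartesian product, so the work grows with
    # len(left) * len(right); the condition is then applied as a plain filter.
    for l, r in product(left, right):
        if condition(l, r):
            yield l, r


if __name__ == "__main__":
    events = [{"value": 1}, {"value": 5}]
    limits = [{"threshold": 3}]
    # A non-equality condition: value > threshold.
    for pair in cross_join_then_filter(events, limits, lambda l, r: l["value"] > r["threshold"]):
        print(pair)
```

Because every pair is considered before filtering, large inputs can exhaust resources, which is the trade-off the commit message acknowledges.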
George Shiqi Wu
d459df8d6e
Fix log syntax (#15004) 2023-09-18 10:40:02 -07:00
Karan Kumar
973fbaf962
Adding additional logging for taskIdReady in MSQ for debugging lock races. (#14998) 2023-09-17 20:11:58 +00:00
Rohan Garg
39d95955f5
Do not eagerly close inner iterators in CloseableIterator#flatMap (#14986) 2023-09-15 15:14:20 +05:30
Laksh Singla
0fc5d5405a
Tweak GHA runner label for MSQ (#14992) 2023-09-15 05:44:21 +00:00
Soumyava
279b3818f0
Make Unnest work with nullif operator (#14993)
The recursive filter creation in the unnest storage adapter did not behave correctly when a filter had no children. This PR addresses the issue.
2023-09-15 09:54:14 +05:30
Gian Merlino
3ae5e97801
Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977)
They are not quite the same as "x == true", "x != true", etc. These
functions never return null, even when "x" itself is null.
2023-09-14 09:19:09 -07:00
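The point made in commit 3ae5e97801 above, that IS [NOT] TRUE and IS [NOT] FALSE never return null while "x == true" can, is illustrated by the Python sketch below. None stands in for SQL NULL, and the inputs are assumed to be True, False, or None. The function names are illustrative and are not Druid's native function names.

```python
from typing import Optional


def equals_true(x: Optional[bool]) -> Optional[bool]:
    """Three-valued `x == true`: null in, null out."""
    return None if x is None else x is True


def is_true(x: Optional[bool]) -> bool:
    """`x IS TRUE`: null counts as "not true", so the result is never null."""
    return x is True


def is_not_true(x: Optional[bool]) -> bool:
    """`x IS NOT TRUE`: holds for both false and null."""
    return x is not True


def is_false(x: Optional[bool]) -> bool:
    """`x IS FALSE`: null counts as "not false"."""
    return x is False


def is_not_false(x: Optional[bool]) -> bool:
    """`x IS NOT FALSE`: holds for both true and null."""
    return x is not False
```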
AmatyaAvadhanula
0e3df2d2e9
Clean up stale locks if segment allocation fails (#14966)
* Clean up stale locks if segment allocation fails due to an exception
2023-09-14 14:58:02 +05:30
Soumyava
7bbefd5741
Updating version in from.ftl (#14982) 2023-09-14 05:11:36 +00:00
Soumyava
5c42ac8c4d
Fix for latest agg to handle nulls in time column. Also adding optimi… (#14911)
* Fix for latest agg to handle nulls in time column. Also adding optimization for dictionary encoded string columns

* One minor fix

* Adding more tests for the new class

* Changing the init to a putInt
2023-09-13 17:37:26 -07:00
Soumyava
bf99d2c7b2
Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924)
* Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly

* Fixing a failed test

* Updating numericNilAgg

* Moving to use default values in case of nil agg

* Adding the same for first agg

* Fixing a test

* fixing vectorized string agg for last/first with cast if numeric

* Updating tests to remove mockito and cover the case of string first/last on non string columns

* Updating a test to vectorize

* Addressing review comments: Name change to NilVectorAggregator and using static variables now

* fixing intellij inspections
2023-09-13 13:15:14 -07:00
Benedict Jin
7f757e33f0
Fix the created property in DOAP RDF file (#14971) 2023-09-13 06:12:35 -07:00
Laksh Singla
4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Tejaswini Bandlamudi
b7bb5ee1db
Upload docker and druid service logs as artifacts on GitHub Actions IT run failure (#14967)
With this PR, docker and druid service logs are uploaded as artifacts onto GitHub when an IT job fails so that we can later download them for investigation.
2023-09-13 11:32:04 +05:30
Clint Wylie
23b78c0f95
use mmap for nested column value to dictionary id lookup for more chill heap usage during serialization (#14919) 2023-09-12 21:01:18 -07:00
Kashif Faraz
286eecad7c
Simplify DruidCoordinatorConfig and binding of metadata cleanup duties (#14891)
Changes:
- Move the following configs from `CliCoordinator` to `DruidCoordinatorConfig`:
  - `druid.coordinator.kill.on`
  - `druid.coordinator.kill.pendingSegments.on`
  - `druid.coordinator.kill.supervisors.on`
  - `druid.coordinator.kill.rules.on`
  - `druid.coordinator.kill.audit.on`
  - `druid.coordinator.kill.datasource.on`
  - `druid.coordinator.kill.compaction.on`
- In the Coordinator style used by historical management duties, always instantiate all
 the metadata cleanup duties but execute only if enabled. In the existing code, they are
instantiated only when enabled by using optional binding with Guice.
- Add a wrapper `MetadataManager` which contains handles to all the different
metadata managers for rules, supervisors, segments, etc.
- Add a `CoordinatorConfigManager` to simplify read and update of coordinator configs
- Remove persistence related methods from `CoordinatorCompactionConfig` and
`CoordinatorDynamicConfig` as these are config classes.
- Remove annotations `@CoordinatorIndexingServiceDuty`,
`@CoordinatorMetadataStoreManagementDuty`
2023-09-13 09:06:57 +05:30
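The pattern described in commit 286eecad7c above, always instantiating the metadata cleanup duties but executing them only when enabled, might look like the following Python sketch. The class and function names are hypothetical, and the sketch does not reproduce Druid's Guice bindings or its duty classes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CleanupDuty:
    """Hypothetical duty: always constructed, but a no-op unless enabled."""
    name: str
    enabled: bool
    action: Callable[[], None]

    def run(self) -> bool:
        if not self.enabled:
            # The duty exists either way; the flag only gates execution.
            return False
        self.action()
        return True


def run_all(duties: Iterable[CleanupDuty]) -> List[str]:
    """Run every duty and return the names of those that actually executed."""
    return [duty.name for duty in duties if duty.run()]


if __name__ == "__main__":
    duties = [
        CleanupDuty("rules", enabled=True, action=lambda: None),
        CleanupDuty("audit", enabled=False, action=lambda: None),
    ]
    print(run_all(duties))
```

Compared with binding a duty only when its flag is set, this keeps construction uniform and moves the enabled check to a single place.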
Suneet Saldanha
6371721e17
Add DOAP file for Druid (#14954)
The DOAP file is a standard RDF file that describes a project's metadata.
The Apache Software Foundation uses DOAP files for projects to keep project
listing information at https://projects.apache.org/projects.html

This change just introduces basic information about the project. Future
changes can add more information like each release that goes out.

The descriptions were pulled from the website and the README in this repo.
2023-09-12 17:40:21 -07:00
Clint Wylie
891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Zoltan Haindrich
5d16d0edf0
Count distinct returned incorrect results without useApproximateCountDistinct (#14748)
* fix grouping engine handling of summaries when result set is empty
2023-09-12 13:57:54 -07:00
Abhishek Radhakrishnan
0f38a37b9d
Tweak GHA runner label. (#14963)
- processing/** can be ingestion, querying or neither. Removing it
for now.
- Also, add msq extension for the querying label.
2023-09-11 20:09:26 -07:00
Clint Wylie
5cecf6ce8f
fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948) 2023-09-12 10:54:35 +08:00
Suneet Saldanha
757603a773
Set task location as k8sPodName for mm-less ingestion (#14959)
* Set task location as k8sPodName for mm-less ingestion

* tests
2023-09-11 19:44:26 -07:00
George Shiqi Wu
f773d83914
Mixed task runner for migration to mm-less ingestion (#14918)
* save work

* Working

* Fix runner constructor

* Working runner

* extra log lines

* try using lifecycle for everything

* clean up configs

* cleanup /workers call

* Use a single config

* Allow selecting runner

* debug changes

* Work on composite task runner

* Unit tests running

* Add documentation

* Add some javadocs

* Fix spelling

* Use standard libraries

* code review

* fix

* fix

* use taskRunner as string

* checkstyle

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-09-11 18:09:46 -07:00
317brian
3a453f7a3c
docs: add note about transparent_reconnection (#14953)
* add note about transparent_reconnection

* Update docs/api-reference/sql-jdbc.md
2023-09-11 11:58:39 -07:00
Kashif Faraz
7871e633c6
Fix bug in KillStalePendingSegments (#14961) 2023-09-11 15:18:15 +05:30
Tejaswini Bandlamudi
dec6a0aa14
Update google client apis to latest version (#14414)
Druid currently uses version 1.26.0 of the Google APIs client, and google-oauth-client-1.26.0.jar in particular brings in CVE-2020-7692 and CVE-2021-22573. Although these CVEs are false positives, they cause security scans of the Druid distribution to fail. This change therefore updates the client to the latest version, which includes fixes for these CVEs.
2023-09-11 12:27:23 +05:30
Clint Wylie
2b7f2c5119
use VectorValueSelector instead of BaseLongVectorValueSelector for StringFirstAggregatorFactory.factorizeVector (#14957) 2023-09-09 04:03:05 -07:00
317brian
09f7dfe327
docs: update docusaurus 2 stuff (#14864) 2023-09-08 14:19:15 -07:00
Zoltan Haindrich
699893bcff
Fix StringLastAggregatorFactory equals/toString (#14907)
* update test

* update test

* format

* test

* fix0

* Revert "fix0"

This reverts commit 44992cb3932158c1253134bc689884abd4650fd3.

* ok resultset

* add plan

* update test

* before rewind

* test

* fix toString/compare/test

* move test

* add timeColumn to hashCode
2023-09-08 09:20:54 -07:00
Kashif Faraz
647686aee2
Add test and metrics for KillStalePendingSegments duty (#14951)
Changes:
- Add new metric `kill/pendingSegments/count` with dimension `dataSource`
- Add tests for `KillStalePendingSegments`
- Reduce no-op logs that are emitted for each datasource even when no pending
segments have been deleted. These can get particularly noisy at low values of `indexingPeriod`.
- Refactor the code in `KillStalePendingSegments` for readability and add javadocs
2023-09-08 10:33:47 +05:30
Abhishek Radhakrishnan
f9cf500a69
Extend GHA autolabeler to other areas (#14903)
* Automate adding labels.

* Add metrics/event emitting label

* ingestion and segment format
2023-09-07 20:25:37 -07:00
Hardik Bajaj
e100b18e86
Updated documentation for OshiSysMonitor (#14912) 2023-09-07 16:54:33 +05:30
Kashif Faraz
88f3c9baed
Fix bug in computed value of balancerComputeThreads (#14947)
In smartSegmentLoading mode, use computed value of balancerComputeThreads
rather than configured value.
2023-09-07 01:14:05 +05:30
Soumyava
a8fa979115
Unnest don't push down not (#14942)
* Not pushing down not filters

* New test case

* Updating tests

* Removing a stale comment
2023-09-06 08:57:03 -07:00
Zoltan Haindrich
23308c050d
Remove DruidAggregateCaseToFilterRule (#14940)
The issue due to which the custom rule was added has been fixed as a part of https://issues.apache.org/jira/browse/CALCITE-3763 and accommodated during Calcite upgrade
2023-09-06 19:11:58 +05:30
Laksh Singla
6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
This change introduces a new monitor, SubqueryCountStatsMonitor, which emits metrics about subqueries and their execution. Users can also now use the auto mode to automatically set the number of bytes available per query for inlining its subqueries' results.
2023-09-06 05:47:19 +00:00
Adarsh Sanjeev
959148ad37
Add code to wait for segments generated to be loaded on historicals (#14322)
Currently, after an MSQ query, the web console is responsible for waiting for the segments to load. It does so by checking whether any segments are still loading into the target datasource, which can cause issues: the segments might never be loaded, or the console might end up waiting on other ingestions as well.

This PR shifts this responsibility to the controller, which would have the list of segments created.
2023-09-06 10:35:57 +05:30
Clint Wylie
706b57c0b2
fixup array and mvd sql docs (#14928) 2023-09-05 16:17:00 -07:00
Jill Osborne
425ebaa387
Query tips doc (#14922)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-09-05 14:16:01 -07:00
Soumyava
8088a763a6
Vectorize earliest aggregator for both numeric and string types (#14408)
* Vectorizing earliest for numeric

* Vectorizing earliest string aggregator

* checkstyle fix

* Removing unnecessary exceptions

* Ignoring tests in MSQ as earliest is not supported for numeric there

* Fixing benchmarks

* Updating tests as MSQ does not support earliest for some cases

* Addressing review comments by adding the following:
1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests

* Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string

* Adding a flag for multi value dimension selector

* Addressing comments

* 1 more change

* Handling review comments part 1

* Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order

* Updating numeric first vector agg

* Revert "Updating numeric first vector agg"

This reverts commit 429170990192883e51812311c49d2e461e6db732.

* Updating code for correctness issues

* fixing an issue with latest agg

* Adding more comments and removing an unnecessary check

* Addressing null checks for tie selector and only vectorize false for quantile sketches
2023-09-05 08:41:42 -07:00
Abhishek Radhakrishnan
9d6ca61ac1
Verify statsd mock client interaction in unit test (#14939) 2023-09-05 07:34:22 -07:00
Kashif Faraz
289ee1e011
Refactor: Cleanup NoopTask (#14938)
Changes:
- Simplify static `create` methods for `NoopTask`
- Remove `FirehoseFactory`, `IsReadyResult`, `readyTime` from `NoopTask`
as these fields were not being used anywhere
- Update tests
2023-09-05 09:15:41 +05:30