Commit Graph

13353 Commits

Author SHA1 Message Date
Zoltan Haindrich 08cf290da2
Configure caching for static-check actions (#15010)
* some stuff

* some stuff

* dont change it.sh

* some stuff

* updates

* add missing

* add 1 more

* setup-java
2023-09-20 14:11:39 -07:00
Gian Merlino 823f620ede
Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976)
* Add IS [NOT] DISTINCT FROM to SQL and join matchers.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.

* Main changes:

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.

* Remove unused "wrap" method.

* Fixes.

* Remove methods we do not need.

* Fix bug with INPUT_REF.
2023-09-20 10:44:32 -07:00
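The difference between plain equality and the null-safe comparison described in the IS [NOT] DISTINCT FROM commit can be sketched as follows. This is a minimal illustration of standard SQL semantics, not Druid's implementation: the function names are hypothetical, Python's `None` stands in for SQL NULL, and `include_null` only mirrors the idea of the "includeNull" join matcher parameter named in the commit message.

```python
from typing import Any, Optional


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL '=': the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a: Any, b: Any) -> bool:
    """Null-safe equality: two NULLs match, and the result is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def is_distinct_from(a: Any, b: Any) -> bool:
    """Negation of the null-safe equality; also never NULL."""
    return not is_not_distinct_from(a, b)


def join_matches(left: Any, right: Any, include_null: bool) -> bool:
    """Hypothetical matcher: include_null=True selects 'is not distinct from' mode,
    include_null=False selects plain 'equals' mode, where NULL keys never match."""
    if include_null:
        return is_not_distinct_from(left, right)
    return sql_equals(left, right) is True
```

In "equals" mode a NULL key on either side never produces a match, while in "is not distinct from" mode two NULL keys match each other.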
Zoltan Haindrich 79f882f48c
Fix exception cause logging in QueryResultPusher (#14975) 2023-09-20 15:44:02 +05:30
Zoltan Haindrich e8773f4d0f
Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996) 2023-09-20 15:42:52 +05:30
Sam Wheating 73bab2f020
Add option to copy query results directly to clipboard (#14889)
* Add option to copy query results to clipboard

* Refactor, allow copying in all formats

---------

Co-authored-by: Sam Wheating <sam.wheating@reddit.com>
2023-09-19 10:25:39 -07:00
Gian Merlino 4f498e6469
SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)
* SQL: Plan non-equijoin conditions as cross join followed by filter.

Druid has previously refused to execute joins with non-equality-based
conditions. This was well-intentioned: the idea was to push people to
write their queries in a different, hopefully more performant way.

But as we're moving towards fuller SQL support, it makes more sense to
allow these conditions to go through with the best plan we can come up
with: a cross join followed by a filter. In some cases this will allow
the query to run, and people will be happy with that. In other cases,
it will run into resource limits during execution. But we should at
least give the query a chance.

This patch also updates the documentation to explain how people can
tell whether their queries are being planned this way.

* cartesian is a word.

* Adjust tests.

* Update docs/querying/datasource.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-09-19 10:23:42 -07:00
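The plan shape described in the non-equijoin commit, a cross join followed by a filter, can be sketched in a few lines. This is an illustration of the general technique only, assuming small in-memory row lists; the function name, tables, and columns are hypothetical and say nothing about how Druid executes the plan.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def cross_join_then_filter(
    left_rows: List[Row],
    right_rows: List[Row],
    condition: Callable[[Row, Row], bool],
) -> List[Tuple[Row, Row]]:
    """Evaluate every (left, right) pair, then keep the pairs that satisfy the condition.

    The number of candidate pairs is len(left_rows) * len(right_rows), which is
    why such a plan can run into resource limits on large inputs.
    """
    return [(l, r) for l, r in product(left_rows, right_rows) if condition(l, r)]


# Hypothetical data: a non-equality join condition (amount >= min_amount).
orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}]
tiers = [{"tier": "small", "min_amount": 0}, {"tier": "large", "min_amount": 100}]

matches = cross_join_then_filter(
    orders, tiers, lambda o, t: o["amount"] >= t["min_amount"]
)
```

With two orders and two tiers, four candidate pairs are examined and three survive the filter.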
George Shiqi Wu d459df8d6e
Fix log syntax (#15004) 2023-09-18 10:40:02 -07:00
Karan Kumar 973fbaf962
Adding additional logging for taskIdReady in MSQ for debugging lock races. (#14998) 2023-09-17 20:11:58 +00:00
Rohan Garg 39d95955f5
Do not eagerly close inner iterators in CloseableIterator#flatMap (#14986) 2023-09-15 15:14:20 +05:30
Laksh Singla 0fc5d5405a
Tweak GHA runner label for MSQ (#14992) 2023-09-15 05:44:21 +00:00
Soumyava 279b3818f0
Make Unnest work with nullif operator (#14993)
This is due to the recursive filter creation in the unnest storage adapter not performing correctly when there are no children. This PR addresses the issue.
2023-09-15 09:54:14 +05:30
Gian Merlino 3ae5e97801
Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977)
They are not quite the same as "x == true", "x != true", etc. These
functions never return null, even when "x" itself is null.
2023-09-14 09:19:09 -07:00
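The point made in the IS [NOT] TRUE / IS [NOT] FALSE commit, that these functions never return null, can be illustrated with three-valued logic. This is a sketch under stated assumptions: Python's `True`, `False`, and `None` stand in for SQL TRUE, FALSE, and NULL, and the function names are hypothetical rather than Druid's native function names.

```python
from typing import Optional


def equals_true(x: Optional[bool]) -> Optional[bool]:
    """'x == true': propagates NULL, so a NULL input gives a NULL result."""
    return None if x is None else x is True


def is_true(x: Optional[bool]) -> bool:
    """'x IS TRUE': NULL input gives False, never NULL."""
    return x is True


def is_not_true(x: Optional[bool]) -> bool:
    """'x IS NOT TRUE': NULL input gives True, never NULL."""
    return x is not True


def is_false(x: Optional[bool]) -> bool:
    """'x IS FALSE': NULL input gives False, never NULL."""
    return x is False


def is_not_false(x: Optional[bool]) -> bool:
    """'x IS NOT FALSE': NULL input gives True, never NULL."""
    return x is not False
```

The only input where the two families differ is NULL: `equals_true(None)` stays unknown, while `is_true(None)` and `is_not_true(None)` each give a definite answer.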
AmatyaAvadhanula 0e3df2d2e9
Clean up stale locks if segment allocation fails (#14966)
* Clean up stale locks if segment allocation fails due to an exception
2023-09-14 14:58:02 +05:30
Soumyava 7bbefd5741
Updating version in from.ftl (#14982) 2023-09-14 05:11:36 +00:00
Soumyava 5c42ac8c4d
Fix for latest agg to handle nulls in time column. Also adding optimi… (#14911)
* Fix for latest agg to handle nulls in time column. Also adding optimization for dictionary encoded string columns

* One minor fix

* Adding more tests for the new class

* Changing the init to a putInt
2023-09-13 17:37:26 -07:00
Soumyava bf99d2c7b2
Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924)
* Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly

* Fixing a failed test

* Updating numericNilAgg

* Moving to use default values in case of nil agg

* Adding the same for first agg

* Fixing a test

* fixing vectorized string agg for last/first with cast if numeric

* Updating tests to remove mockito and cover the case of string first/last on non string columns

* Updating a test to vectorize

* Addressing review comments: Name change to NilVectorAggregator and using static variables now

* fixing intellij inspections
2023-09-13 13:15:14 -07:00
Benedict Jin 7f757e33f0
Fix the created property in DOAP RDF file (#14971) 2023-09-13 06:12:35 -07:00
Laksh Singla 4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Tejaswini Bandlamudi b7bb5ee1db
Upload docker and druid service logs as artifacts on GitHub Actions IT run failure (#14967)
With this PR, docker and druid service logs are uploaded as artifacts onto GitHub when an IT job fails so that we can later download them for investigation.
2023-09-13 11:32:04 +05:30
Clint Wylie 23b78c0f95
use mmap for nested column value to dictionary id lookup for more chill heap usage during serialization (#14919) 2023-09-12 21:01:18 -07:00
Kashif Faraz 286eecad7c
Simplify DruidCoordinatorConfig and binding of metadata cleanup duties (#14891)
Changes:
- Move following configs from `CliCoordinator` to `DruidCoordinatorConfig`:
  - `druid.coordinator.kill.on`
  - `druid.coordinator.kill.pendingSegments.on`
  - `druid.coordinator.kill.supervisors.on`
  - `druid.coordinator.kill.rules.on`
  - `druid.coordinator.kill.audit.on`
  - `druid.coordinator.kill.datasource.on`
  - `druid.coordinator.kill.compaction.on`
- In the Coordinator style used by historical management duties, always instantiate all
  the metadata cleanup duties but execute them only if enabled. In the existing code, they
  are instantiated only when enabled, by using optional binding with Guice.
- Add a wrapper `MetadataManager` which contains handles to all the different
metadata managers for rules, supervisors, segments, etc.
- Add a `CoordinatorConfigManager` to simplify read and update of coordinator configs
- Remove persistence related methods from `CoordinatorCompactionConfig` and
`CoordinatorDynamicConfig` as these are config classes.
- Remove annotations `@CoordinatorIndexingServiceDuty`,
`@CoordinatorMetadataStoreManagementDuty`
2023-09-13 09:06:57 +05:30
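The "always instantiate, execute only if enabled" pattern from the DruidCoordinatorConfig commit can be sketched as below. This is a hypothetical illustration, not Druid code: the `CleanupDuty` class, the `build_duties` helper, and the default of `False` for a missing key are all assumptions; only the `druid.coordinator.kill.*.on` property names come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CleanupDuty:
    """Hypothetical duty that always exists but does work only when enabled."""
    name: str
    enabled: bool
    action: Callable[[], None]

    def run(self) -> bool:
        """Run the cleanup action if enabled; return whether it ran."""
        if not self.enabled:
            return False
        self.action()
        return True


def build_duties(config: Dict[str, bool], log: List[str]) -> List[CleanupDuty]:
    """Create every duty unconditionally; the config decides only whether it runs."""
    names = ["pendingSegments", "supervisors", "rules", "audit", "datasource", "compaction"]
    return [
        CleanupDuty(
            name=n,
            # Assumption: a missing key is treated as disabled.
            enabled=config.get(f"druid.coordinator.kill.{n}.on", False),
            # Bind n as a default argument so each duty records its own name.
            action=lambda n=n: log.append(n),
        )
        for n in names
    ]
```

Because every duty object is created regardless of configuration, no conditional binding is needed; the enabled flag is checked at execution time instead.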
Suneet Saldanha 6371721e17
Add DOAP file for Druid (#14954)
The DOAP file is a standard RDF file that describes a project's metadata.
The Apache Software Foundation uses DOAP files for projects to keep project
listing information at https://projects.apache.org/projects.html

This change just introduces basic information about the project. Future
changes can add more information like each release that goes out.

The descriptions were pulled from the website and the README in this repo.
2023-09-12 17:40:21 -07:00
Clint Wylie 891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config 'druid.indexing.formats' for default column format settings, with property 'druid.indexing.formats.nestedColumnFormatVersion' to specify the system-level preferred nested column format, for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Zoltan Haindrich 5d16d0edf0
Count distinct returned incorrect results without useApproximateCountDistinct (#14748)
* fix grouping engine handling of summaries when result set is empty
2023-09-12 13:57:54 -07:00
Abhishek Radhakrishnan 0f38a37b9d
Tweak GHA runner label. (#14963)
- processing/** can be ingestion, querying or neither. Removing it
for now.
- Also, add msq extension for the querying label.
2023-09-11 20:09:26 -07:00
Clint Wylie 5cecf6ce8f
fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948) 2023-09-12 10:54:35 +08:00
Suneet Saldanha 757603a773
Set task location as k8sPodName for mm-less ingestion (#14959)
* Set task location as k8sPodName for mm-less ingestion

* tests
2023-09-11 19:44:26 -07:00
George Shiqi Wu f773d83914
Mixed task runner for migration to mm-less ingestion (#14918)
* save work

* Working

* Fix runner constructor

* Working runner

* extra log lines

* try using lifecycle for everything

* clean up configs

* cleanup /workers call

* Use a single config

* Allow selecting runner

* debug changes

* Work on composite task runner

* Unit tests running

* Add documentation

* Add some javadocs

* Fix spelling

* Use standard libraries

* code review

* fix

* fix

* use taskRunner as string

* checkstyle

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-09-11 18:09:46 -07:00
317brian 3a453f7a3c
docs: add note about transparent_reconnection (#14953)
* add note about transparent_reconnection

* Update docs/api-reference/sql-jdbc.md
2023-09-11 11:58:39 -07:00
Kashif Faraz 7871e633c6
Fix bug in KillStalePendingSegments (#14961) 2023-09-11 15:18:15 +05:30
Tejaswini Bandlamudi dec6a0aa14
Update google client apis to latest version (#14414)
Druid currently uses version 1.26.0 of the Google APIs client, and google-oauth-client-1.26.0.jar in particular brings in CVE-2020-7692 and CVE-2021-22573. Although these CVEs are false positives, they cause red security scans on the Druid distribution. This change therefore updates to the latest version, which includes fixes for these CVEs.
2023-09-11 12:27:23 +05:30
Clint Wylie 2b7f2c5119
use VectorValueSelector instead of BaseLongVectorValueSelector for StringFirstAggregatorFactory.factorizeVector (#14957) 2023-09-09 04:03:05 -07:00
317brian 09f7dfe327
docs: update docusaurus 2 stuff (#14864) 2023-09-08 14:19:15 -07:00
Zoltan Haindrich 699893bcff
Fix StringLastAggregatorFactory equals/toString (#14907)
* update test

* update test

* format

* test

* fix0

* Revert "fix0"

This reverts commit 44992cb393.

* ok resultset

* add plan

* update test

* before rewind

* test

* fix toString/compare/test

* move test

* add timeColumn to hashCode
2023-09-08 09:20:54 -07:00
Kashif Faraz 647686aee2
Add test and metrics for KillStalePendingSegments duty (#14951)
Changes:
- Add new metric `kill/pendingSegments/count` with dimension `dataSource`
- Add tests for `KillStalePendingSegments`
- Reduce no-op logs that are emitted for each datasource even when no pending
segments have been deleted. These can get particularly noisy at low values of `indexingPeriod`.
- Refactor the code in `KillStalePendingSegments` for readability and add javadocs
2023-09-08 10:33:47 +05:30
Abhishek Radhakrishnan f9cf500a69
Extend GHA autolabeler to other areas (#14903)
* Automate adding labels.

* Add metrics/event emitting label

* ingestion and segment format
2023-09-07 20:25:37 -07:00
Hardik Bajaj e100b18e86
Updated documentation for OshiSysMonitor (#14912) 2023-09-07 16:54:33 +05:30
Kashif Faraz 88f3c9baed
Fix bug in computed value of balancerComputeThreads (#14947)
In smartSegmentLoading mode, use computed value of balancerComputeThreads
rather than configured value.
2023-09-07 01:14:05 +05:30
Soumyava a8fa979115
Unnest dont push down not (#14942)
* Not pushing down not filters

* New test case

* Updating tests

* Removing a stale comment
2023-09-06 08:57:03 -07:00
Zoltan Haindrich 23308c050d
Remove DruidAggregateCaseToFilterRule (#14940)
The issue due to which the custom rule was added has been fixed as a part of https://issues.apache.org/jira/browse/CALCITE-3763 and accommodated during Calcite upgrade
2023-09-06 19:11:58 +05:30
Laksh Singla 6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
This PR introduces a new monitor, SubqueryCountStatsMonitor, which emits metrics about subqueries and their execution. Users can also use the auto mode to automatically set the number of bytes available per query for inlining its subquery results.
2023-09-06 05:47:19 +00:00
Adarsh Sanjeev 959148ad37
Add code to wait for segments generated to be loaded on historicals (#14322)
Currently, after an MSQ query, the web console is responsible for waiting for the segments to load. It does so by checking if there are any segments loading into the datasource ingested into, which can cause some issues, like in cases where the segments would never be loaded, or would end up waiting for other ingests as well.

This PR shifts this responsibility to the controller, which would have the list of segments created.
2023-09-06 10:35:57 +05:30
Clint Wylie 706b57c0b2
fixup array and mvd sql docs (#14928) 2023-09-05 16:17:00 -07:00
Jill Osborne 425ebaa387
Query tips doc (#14922)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-09-05 14:16:01 -07:00
Soumyava 8088a763a6
Vectorize earliest aggregator for both numeric and string types (#14408)
* Vectorizing earliest for numeric

* Vectorizing earliest string aggregator

* checkstyle fix

* Removing unnecessary exceptions

* Ignoring tests in MSQ as earliest is not supported for numeric there

* Fixing benchmarks

* Updating tests as MSQ does not support earliest for some cases

* Addressing review comments by adding the following:
1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests

* Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string

* Adding a flag for multi value dimension selector

* Addressing comments

* 1 more change

* Handling review comments part 1

* Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order

* Updating numeric first vector agg

* Revert "Updating numeric first vector agg"

This reverts commit 4291709901.

* Updating code for correctness issues

* fixing an issue with latest agg

* Adding more comments and removing an unnecessary check

* Addressing null checks for tie selector and only vectorize false for quantile sketches
2023-09-05 08:41:42 -07:00
Abhishek Radhakrishnan 9d6ca61ac1
Verify statsd mock client interaction in unit test (#14939) 2023-09-05 07:34:22 -07:00
Kashif Faraz 289ee1e011
Refactor: Cleanup NoopTask (#14938)
Changes:
- Simplify static `create` methods for `NoopTask`
- Remove `FirehoseFactory`, `IsReadyResult`, `readyTime` from `NoopTask`
as these fields were not being used anywhere
- Update tests
2023-09-05 09:15:41 +05:30
panhongan d4e972e1e4
Add checking for new checkpoint (#14353)
Check that a checkpoint is non-empty before adding it to the checkpoint sequence 
in a SeekableStreamSupervisor
2023-09-04 13:18:55 +05:30
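The guard described in the checkpoint commit, rejecting an empty checkpoint before it is recorded, can be sketched as follows. This is a hypothetical illustration only: the function name, the dict-based data shapes, and the partition-to-offset mapping are assumptions and do not reflect the actual SeekableStreamSupervisor code.

```python
from typing import Dict


def add_checkpoint(
    sequence: Dict[int, Dict[str, int]],
    sequence_id: int,
    checkpoint: Dict[str, int],
) -> bool:
    """Add a checkpoint (partition -> offset) to the sequence only if it is non-empty.

    Returns True if the checkpoint was added, False if it was rejected.
    """
    if not checkpoint:
        # An empty checkpoint carries no offsets, so it is not recorded.
        return False
    # Store a copy so later changes by the caller do not alter the sequence.
    sequence[sequence_id] = dict(checkpoint)
    return True
```

Returning a boolean lets the caller log or handle a rejected checkpoint without raising an exception; whether the real code logs, throws, or skips silently is not stated in the commit message.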
Kashif Faraz ec630e3671
Remove deprecated coordinator dynamic configs (#14923)
Changes:

[A] Remove config `decommissioningMaxPercentOfMaxSegmentsToMove`
- It is a complicated config 😅
- It is always desirable to prioritize move from decommissioning servers so that
they can be terminated quickly, so this should always be 100%
- It is already handled by `smartSegmentLoading` (enabled by default)

[B] Remove config `maxNonPrimaryReplicantsToLoad`
This was added in #11135 to address two requirements:
- Prevent coordinator runs from getting stuck assigning too many segments to historicals
- Prevent load of replicas from competing with load of unavailable segments

Both of these requirements are now already met thanks to:
- Round-robin segment assignment
- Prioritization in the new coordinator
- Modifications to `replicationThrottleLimit`
- `smartSegmentLoading` (enabled by default)
2023-09-04 11:54:36 +05:30
Kashif Faraz 7f26b80e21
Simplify ServiceMetricEvent.Builder (#14933)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime, setMetricAndValue to the builder
2023-09-01 11:30:45 +05:30