13572 Commits

Author SHA1 Message Date
George Shiqi Wu
8e22a178cc
Support getTaskLocation for mixed task runner (#15033)
The KubernetesAndWorkerTaskRunner currently doesn't implement getTaskLocation, so tasks run by it will show a unknown TaskLocation in the druid console after a task has completed.

Fix bug in KubernetesAndWorkerTaskRunner that manifests as missing information in the druid Web Console.
2023-09-27 08:57:36 +05:30
Gian Merlino
3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalze
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
2023-09-27 08:51:14 +05:30
YongGang
7301e60a9c
Add metrics for number of segments generated per task in MSQ (#14980)
Add ingest/tombstones/count and ingest/segments/count metrics in MSQ.
2023-09-26 02:46:33 +05:30
Soumyava
75af741a96
Revert "SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)" (#15029)
This reverts commit 4f498e64691ecd22eaa2c940d1d0d57e769ee9e7.
2023-09-25 11:35:44 -07:00
Abhishek Radhakrishnan
ba6101ad75
Remove EOFException catch block from the Avro decoders (#15018)
* Remove stale comment since we're on avro version 1.11.1

* Update exception blocks. With 1.11.1, read() only throws IOException.

* Unit tests

* Cleanup and add more tests.
2023-09-25 08:38:41 -07:00
AmatyaAvadhanula
f7a549123b
Commit segments only when they are covered by active locks (#15027)
* Commit segments only when they are covered by active locks
2023-09-25 13:45:42 +05:30
Tejaswini Bandlamudi
48b6d2abf9
skip org.owasp:dependency-check on extensions-contrib modules and suppress false-positive gRPC CVEs (#15026) 2023-09-25 12:14:42 +05:30
Gian Merlino
0850e615b2
Remove istrue, isfalse vectorized impls. (#14991)
These were added in #14977, but the implementations are incorrect, because they return null when the input arg is null. They should return false when the input is null. Remove them for now, rather than fixing them, since they're so new that they might as well never have existed.
2023-09-25 11:34:24 +05:30
Soumyava
c184b5250f
Unnest now works on MSQ (#14886)
This entails:
    Removing the enableUnnest flag and additional machinery
    Updating the datasource plan and frame processors to support unnest
    Adding support in MSQ for UnnestDataSource and FilteredDataSource
    CalciteArrayTest now has a MSQ test component
    Additional tests for Unnest on MSQ
2023-09-25 09:19:21 +05:30
AmatyaAvadhanula
c62193c4d7
Add support for concurrent batch Append and Replace (#14407)
Changes:
- Add task context parameter `taskLockType`. This determines the type of lock used by a batch task.
- Add new task actions for transactional replace and append of segments
- Add methods StorageCoordinator.commitAppendSegments and commitReplaceSegments
- Upgrade segments to appropriate versions when performing replace and append
- Add new metadata table `upgradeSegments` to track segments that need to be upgraded
- Add tests
2023-09-25 07:06:37 +05:30
Kashif Faraz
d7c152c82c
Add a TaskReport for "kill" tasks (#15023)
- Add `KillTaskReport` that contains stats for `numSegmentsKilled`,
`numBatchesProcessed`, `numSegmentsMarkedAsUnused`
- Fix bug where exception message had no formatter but was was still being passed some args.
- Add some comments regarding deprecation of `markAsUnused` flag.
2023-09-23 07:44:27 +05:30
YongGang
be3f93e3cf
Restore tasks when lifecycle start (#14909)
* K8s tasks restore should be from lifecycle start

* add test

* add more tests

* fix test

* wait tasks restore finish when start

* fix style

* revert previous change and add comment
2023-09-22 12:03:34 -07:00
Karan Kumar
5cee9f6148
Allow cancellation of MSQ tasks if they are waiting for segments to load (#15000)
With PR #14322 , MSQ insert/Replace q's will wait for segment to be loaded on the historical's before finishing.

The patch introduces a bug where in the main thread had a thread.sleep() which could not be interrupted via the cancel calls from the overlord.

This new patch addressed that problem by moving the thread.sleep inside a thread of its own. Thus the main thread is now waiting on the future object of this execution.

The cancel call can now shutdown the executor service via another method thus unblocking the main thread to proceed.
2023-09-22 11:21:04 +05:30
Kashif Faraz
409bffe7f2
Rename IMSC.announceHistoricalSegments to commitSegments (#15021)
This commit pulls out some changes from #14407 to simplify that PR.

Changes:
- Rename `IndexerMetadataStorageCoordinator.announceHistoricalSegments` to `commitSegments`
- Rename the overloaded method to `commitSegmentsAndMetadata`
- Fix some typos
2023-09-21 16:19:03 +05:30
Zoltan Haindrich
e76962f453
Use annotation to mark DecoupleIgnore (#15005) 2023-09-21 12:36:52 +05:30
Laksh Singla
ebb794632a
Allow users with STATE permissions to read and write the state APIs for querying with deep storage (#14944)
Currently, only the user who has submitted the async query has permission to interact with the status APIs for that async query. However, often we want an administrator to interact with these resources as well.
Druid handles these with the STATE resource traditionally, and if the requesting user has necessary permissions on it as well, alternatively, they should be allowed to interact with the status APIs, irrespective of whether they are the submitter of the query.
2023-09-21 06:55:07 +05:30
Pranav
883c2692d2
Adding new function decode_base64_utf8 and expr macro (#14943)
* Adding new function decode_base64_utf8 and expr macro

* using BaseScalarUnivariateMacroFunctionExpr

* Print stack trace in case of debug in ChainedExecutionQueryRunner

* fix static check
2023-09-20 17:06:34 -07:00
Xavier Léauté
22abc10f24
update RoaringBitmap to 0.9.49 (#15006)
* update RoaringBitmap to 0.9.49

update RoaringBitmap from 0.9.0 to 0.9.49

Many optimizations and improvements have gone into recent releases of
RoaringBitmap. It seems worthwhile to incorporate those.

* implement workaround for BatchIterator interface change

* add test case for BatchIteratorAdapter.advanceIfNeeded
2023-09-20 15:52:27 -07:00
Laksh Singla
82e809c8d0
fix (#15017) 2023-09-20 15:48:26 -07:00
Zoltan Haindrich
08cf290da2
Configure caching for static-check actions (#15010)
* some stuff

* some stuff

* dont change it.sh

* some stuff

* updates

* add missing

* add 1 more

* setup-java
2023-09-20 14:11:39 -07:00
Gian Merlino
823f620ede
Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976)
* Add IS [NOT] DISTINCT FROM to SQL and join matchers.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.

* Main changes:

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.

* Remove unused "wrap" method.

* Fixes.

* Remove methods we do not need.

* Fix bug with INPUT_REF.
2023-09-20 10:44:32 -07:00
Zoltan Haindrich
79f882f48c
Fix exception cause logging in QueryResultPusher (#14975) 2023-09-20 15:44:02 +05:30
Zoltan Haindrich
e8773f4d0f
Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996) 2023-09-20 15:42:52 +05:30
Sam Wheating
73bab2f020
Add option to copy query results directly to clipboard (#14889)
* Add option to copy query results to clipboard

* Refactor, allow copying in all formats

---------

Co-authored-by: Sam Wheating <sam.wheating@reddit.com>
2023-09-19 10:25:39 -07:00
Gian Merlino
4f498e6469
SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)
* SQL: Plan non-equijoin conditions as cross join followed by filter.

Druid has previously refused to execute joins with non-equality-based
conditions. This was well-intentioned: the idea was to push people to
write their queries in a different, hopefully more performant way.

But as we're moving towards fuller SQL support, it makes more sense to
allow these conditions to go through with the best plan we can come up
with: a cross join followed by a filter. In some cases this will allow
the query to run, and people will be happy with that. In other cases,
it will run into resource limits during execution. But we should at
least give the query a chance.

This patch also updates the documentation to explain how people can
tell whether their queries are being planned this way.

* cartesian is a word.

* Adjust tests.

* Update docs/querying/datasource.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-09-19 10:23:42 -07:00
George Shiqi Wu
d459df8d6e
Fix log syntax (#15004) 2023-09-18 10:40:02 -07:00
Karan Kumar
973fbaf962
Adding addition logging for taskIdReady in MSQ for debugging lock races. (#14998) 2023-09-17 20:11:58 +00:00
Rohan Garg
39d95955f5
Do not eagerly close inner iterators in CloseableIterator#flatMap (#14986) 2023-09-15 15:14:20 +05:30
Laksh Singla
0fc5d5405a
Tweak GHA runner label for MSQ (#14992) 2023-09-15 05:44:21 +00:00
Soumyava
279b3818f0
Make Unnest work with nullif operator (#14993)
This is due to the recursive filter creation in unnest storage adapter not performing correctly in case of an empty children. This PR addresses the issue
2023-09-15 09:54:14 +05:30
Gian Merlino
3ae5e97801
Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977)
They are not quite the same as "x == true", "x != true", etc. These
functions never return null, even when "x" itself is null.
2023-09-14 09:19:09 -07:00
AmatyaAvadhanula
0e3df2d2e9
Clean up stale locks if segment allocation fails (#14966)
* Clean up stale locks if segment allocation fails due to an exception
2023-09-14 14:58:02 +05:30
Soumyava
7bbefd5741
Updating version in from.ftl (#14982) 2023-09-14 05:11:36 +00:00
Soumyava
5c42ac8c4d
Fix for latest agg to handle nulls in time column. Also adding optimi… (#14911)
* Fix for latest agg to handle nulls in time column. Also adding optimization for dictionary encoded string columns

* One minor fix

* Adding more tests for the new class

* Changing the init to a putInt
2023-09-13 17:37:26 -07:00
Soumyava
bf99d2c7b2
Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924)
* Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly

* Fixing a failed test

* Updating numericNilAgg

* Moving to use default values in case of nil agg

* Adding the same for first agg

* Fixing a test

* fixing vectorized string agg for last/first with cast if numeric

* Updating tests to remove mockito and cover the case of string first/last on non string columns

* Updating a test to vectorize

* Addressing review comments: Name change to NilVectorAggregator and using static variables now

* fixing intellij inspections
2023-09-13 13:15:14 -07:00
Benedict Jin
7f757e33f0
Fix the created property in DOAP RDF file (#14971) 2023-09-13 06:12:35 -07:00
Laksh Singla
4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Tejaswini Bandlamudi
b7bb5ee1db
Upload docker and druid service logs as artifacts on GitHub Actions IT run failure (#14967)
With this PR, docker and druid service logs are uploaded as artifacts onto GitHub when an IT job fails so that we can later download them for investigation.
2023-09-13 11:32:04 +05:30
Clint Wylie
23b78c0f95
use mmap for nested column value to dictionary id lookup for more chill heap usage during serialization (#14919) 2023-09-12 21:01:18 -07:00
Kashif Faraz
286eecad7c
Simplify DruidCoordinatorConfig and binding of metadata cleanup duties (#14891)
Changes:
- Move following configs from `CliCoordinator` to `DruidCoordinatorConfig`:
  - `druid.coordinator.kill.on`
  - `druid.coordinator.kill.pendingSegments.on`
  - `druid.coordinator.kill.supervisors.on`
  - `druid.coordinator.kill.rules.on`
  - `druid.coordinator.kill.audit.on`
  - `druid.coordinator.kill.datasource.on`
  - `druid.coordinator.kill.compaction.on`
- In the Coordinator style used by historical management duties, always instantiate all
 the metadata cleanup duties but execute only if enabled. In the existing code, they are
instantiated only when enabled by using optional binding with Guice.
- Add a wrapper `MetadataManager` which contains handles to all the different
metadata managers for rules, supervisors, segments, etc.
- Add a `CoordinatorConfigManager` to simplify read and update of coordinator configs
- Remove persistence related methods from `CoordinatorCompactionConfig` and
`CoordinatorDynamicConfig` as these are config classes.
- Remove annotations `@CoordinatorIndexingServiceDuty`,
`@CoordinatorMetadataStoreManagementDuty`
2023-09-13 09:06:57 +05:30
Suneet Saldanha
6371721e17
Add DOAP file for Druid (#14954)
The DOAP file is a standard RDF file that describes a project's metadata.
The Apache Software Foundation uses DOAP files for projects to keep project
listing information at https://projects.apache.org/projects.html

This change just introduces basic information about the project. Future
changes can add more information like each release that goes out.

The descriptions were pulled from the website and the README in this repo.
2023-09-12 17:40:21 -07:00
Clint Wylie
891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Zoltan Haindrich
5d16d0edf0
Count distinct returned incorrect results without useApproximateCountDistinct (#14748)
* fix grouping engine handling of summaries when result set is empty
2023-09-12 13:57:54 -07:00
Abhishek Radhakrishnan
0f38a37b9d
Tweak GHA runner label. (#14963)
- processing/** can be ingestion, querying or neither. Removing it
for now.
- Also, add msq extension for the querying label.
2023-09-11 20:09:26 -07:00
Clint Wylie
5cecf6ce8f
fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948) 2023-09-12 10:54:35 +08:00
Suneet Saldanha
757603a773
Set task location as k8sPodName for mm-less ingestion (#14959)
* Set task location as k8sPodName for mm-less ingestion

* tests
2023-09-11 19:44:26 -07:00
George Shiqi Wu
f773d83914
Mixed task runner for migration to mm-less ingestion (#14918)
* save work

* Working

* Fix runner constructor

* Working runner

* extra log lines

* try using lifecycle for everything

* clean up configs

* cleanup /workers call

* Use a single config

* Allow selecting runner

* debug changes

* Work on composite task runner

* Unit tests running

* Add documentation

* Add some javadocs

* Fix spelling

* Use standard libraries

* code review

* fix

* fix

* use taskRunner as string

* checkstyl

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-09-11 18:09:46 -07:00
317brian
3a453f7a3c
docs: add note about transparent_reconnection (#14953)
* add note about transparent_reconnection

* Update docs/api-reference/sql-jdbc.md
2023-09-11 11:58:39 -07:00
Kashif Faraz
7871e633c6
Fix bug in KillStalePendingSegments (#14961) 2023-09-11 15:18:15 +05:30
Tejaswini Bandlamudi
dec6a0aa14
Update google client apis to latest version (#14414)
Currently Druid is using google apis client 1.26.0 version and google-oauth-client-1.26.0.jar in particular is bringing following CVEs CVE-2020-7692, CVE-2021-22573. Despite the CVEs being false positives, they're causing red security scans on Druid distribution. Hence updating the version to latest version with these CVE fixes.
2023-09-11 12:27:23 +05:30