Commit Graph

13221 Commits

Author SHA1 Message Date
Gian Merlino 3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalze
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
2023-09-27 08:51:14 +05:30
YongGang 7301e60a9c
Add metrics for number of segments generated per task in MSQ (#14980)
Add ingest/tombstones/count and ingest/segments/count metrics in MSQ.
2023-09-26 02:46:33 +05:30
Soumyava 75af741a96
Revert "SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)" (#15029)
This reverts commit 4f498e6469.
2023-09-25 11:35:44 -07:00
Abhishek Radhakrishnan ba6101ad75
Remove `EOFException` catch block from the Avro decoders (#15018)
* Remove stale comment since we're on avro version 1.11.1

* Update exception blocks. With 1.11.1, read() only throws IOException.

* Unit tests

* Cleanup and add more tests.
2023-09-25 08:38:41 -07:00
AmatyaAvadhanula f7a549123b
Commit segments only when they are covered by active locks (#15027)
* Commit segments only when they are covered by active locks
2023-09-25 13:45:42 +05:30
Tejaswini Bandlamudi 48b6d2abf9
skip org.owasp:dependency-check on extensions-contrib modules and suppress false-positive gRPC CVEs (#15026) 2023-09-25 12:14:42 +05:30
Gian Merlino 0850e615b2
Remove istrue, isfalse vectorized impls. (#14991)
These were added in #14977, but the implementations are incorrect, because they return null when the input arg is null. They should return false when the input is null. Remove them for now, rather than fixing them, since they're so new that they might as well never have existed.
2023-09-25 11:34:24 +05:30
Soumyava c184b5250f
Unnest now works on MSQ (#14886)
This entails:
    Removing the enableUnnest flag and additional machinery
    Updating the datasource plan and frame processors to support unnest
    Adding support in MSQ for UnnestDataSource and FilteredDataSource
    CalciteArrayTest now has a MSQ test component
    Additional tests for Unnest on MSQ
2023-09-25 09:19:21 +05:30
AmatyaAvadhanula c62193c4d7
Add support for concurrent batch Append and Replace (#14407)
Changes:
- Add task context parameter `taskLockType`. This determines the type of lock used by a batch task.
- Add new task actions for transactional replace and append of segments
- Add methods StorageCoordinator.commitAppendSegments and commitReplaceSegments
- Upgrade segments to appropriate versions when performing replace and append
- Add new metadata table `upgradeSegments` to track segments that need to be upgraded
- Add tests
2023-09-25 07:06:37 +05:30
Kashif Faraz d7c152c82c
Add a TaskReport for "kill" tasks (#15023)
- Add `KillTaskReport` that contains stats for `numSegmentsKilled`,
`numBatchesProcessed`, `numSegmentsMarkedAsUnused`
- Fix bug where exception message had no formatter but was was still being passed some args.
- Add some comments regarding deprecation of `markAsUnused` flag.
2023-09-23 07:44:27 +05:30
YongGang be3f93e3cf
Restore tasks when lifecycle start (#14909)
* K8s tasks restore should be from lifecycle start

* add test

* add more tests

* fix test

* wait tasks restore finish when start

* fix style

* revert previous change and add comment
2023-09-22 12:03:34 -07:00
Karan Kumar 5cee9f6148
Allow cancellation of MSQ tasks if they are waiting for segments to load (#15000)
With PR #14322 , MSQ insert/Replace q's will wait for segment to be loaded on the historical's before finishing.

The patch introduces a bug where in the main thread had a thread.sleep() which could not be interrupted via the cancel calls from the overlord.

This new patch addressed that problem by moving the thread.sleep inside a thread of its own. Thus the main thread is now waiting on the future object of this execution.

The cancel call can now shutdown the executor service via another method thus unblocking the main thread to proceed.
2023-09-22 11:21:04 +05:30
Kashif Faraz 409bffe7f2
Rename IMSC.announceHistoricalSegments to commitSegments (#15021)
This commit pulls out some changes from #14407 to simplify that PR.

Changes:
- Rename `IndexerMetadataStorageCoordinator.announceHistoricalSegments` to `commitSegments`
- Rename the overloaded method to `commitSegmentsAndMetadata`
- Fix some typos
2023-09-21 16:19:03 +05:30
Zoltan Haindrich e76962f453
Use annotation to mark DecoupleIgnore (#15005) 2023-09-21 12:36:52 +05:30
Laksh Singla ebb794632a
Allow users with STATE permissions to read and write the state APIs for querying with deep storage (#14944)
Currently, only the user who has submitted the async query has permission to interact with the status APIs for that async query. However, often we want an administrator to interact with these resources as well.
Druid handles these with the STATE resource traditionally, and if the requesting user has necessary permissions on it as well, alternatively, they should be allowed to interact with the status APIs, irrespective of whether they are the submitter of the query.
2023-09-21 06:55:07 +05:30
Pranav 883c2692d2
Adding new function decode_base64_utf8 and expr macro (#14943)
* Adding new function decode_base64_utf8 and expr macro

* using BaseScalarUnivariateMacroFunctionExpr

* Print stack trace in case of debug in ChainedExecutionQueryRunner

* fix static check
2023-09-20 17:06:34 -07:00
Xavier Léauté 22abc10f24
update RoaringBitmap to 0.9.49 (#15006)
* update RoaringBitmap to 0.9.49

update RoaringBitmap from 0.9.0 to 0.9.49

Many optimizations and improvements have gone into recent releases of
RoaringBitmap. It seems worthwhile to incorporate those.

* implement workaround for BatchIterator interface change

* add test case for BatchIteratorAdapter.advanceIfNeeded
2023-09-20 15:52:27 -07:00
Laksh Singla 82e809c8d0
fix (#15017) 2023-09-20 15:48:26 -07:00
Zoltan Haindrich 08cf290da2
Configure caching for static-check actions (#15010)
* some stuff

* some stuff

* dont change it.sh

* some stuff

* updates

* add missing

* add 1 more

* setup-java
2023-09-20 14:11:39 -07:00
Gian Merlino 823f620ede
Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976)
* Add IS [NOT] DISTINCT FROM to SQL and join matchers.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.

* Main changes:

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.

* Remove unused "wrap" method.

* Fixes.

* Remove methods we do not need.

* Fix bug with INPUT_REF.
2023-09-20 10:44:32 -07:00
Zoltan Haindrich 79f882f48c
Fix exception cause logging in QueryResultPusher (#14975) 2023-09-20 15:44:02 +05:30
Zoltan Haindrich e8773f4d0f
Enable already passing tests in DecoupledPlanningCalciteQueryTest (#14996) 2023-09-20 15:42:52 +05:30
Sam Wheating 73bab2f020
Add option to copy query results directly to clipboard (#14889)
* Add option to copy query results to clipboard

* Refactor, allow copying in all formats

---------

Co-authored-by: Sam Wheating <sam.wheating@reddit.com>
2023-09-19 10:25:39 -07:00
Gian Merlino 4f498e6469
SQL: Plan non-equijoin conditions as cross join followed by filter. (#14978)
* SQL: Plan non-equijoin conditions as cross join followed by filter.

Druid has previously refused to execute joins with non-equality-based
conditions. This was well-intentioned: the idea was to push people to
write their queries in a different, hopefully more performant way.

But as we're moving towards fuller SQL support, it makes more sense to
allow these conditions to go through with the best plan we can come up
with: a cross join followed by a filter. In some cases this will allow
the query to run, and people will be happy with that. In other cases,
it will run into resource limits during execution. But we should at
least give the query a chance.

This patch also updates the documentation to explain how people can
tell whether their queries are being planned this way.

* cartesian is a word.

* Adjust tests.

* Update docs/querying/datasource.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-09-19 10:23:42 -07:00
George Shiqi Wu d459df8d6e
Fix log syntax (#15004) 2023-09-18 10:40:02 -07:00
Karan Kumar 973fbaf962
Adding addition logging for taskIdReady in MSQ for debugging lock races. (#14998) 2023-09-17 20:11:58 +00:00
Rohan Garg 39d95955f5
Do not eagerly close inner iterators in CloseableIterator#flatMap (#14986) 2023-09-15 15:14:20 +05:30
Laksh Singla 0fc5d5405a
Tweak GHA runner label for MSQ (#14992) 2023-09-15 05:44:21 +00:00
Soumyava 279b3818f0
Make Unnest work with nullif operator (#14993)
This is due to the recursive filter creation in unnest storage adapter not performing correctly in case of an empty children. This PR addresses the issue
2023-09-15 09:54:14 +05:30
Gian Merlino 3ae5e97801
Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977)
They are not quite the same as "x == true", "x != true", etc. These
functions never return null, even when "x" itself is null.
2023-09-14 09:19:09 -07:00
AmatyaAvadhanula 0e3df2d2e9
Clean up stale locks if segment allocation fails (#14966)
* Clean up stale locks if segment allocation fails due to an exception
2023-09-14 14:58:02 +05:30
Soumyava 7bbefd5741
Updating version in from.ftl (#14982) 2023-09-14 05:11:36 +00:00
Soumyava 5c42ac8c4d
Fix for latest agg to handle nulls in time column. Also adding optimi… (#14911)
* Fix for latest agg to handle nulls in time column. Also adding optimization for dictionary encoded string columns

* One minor fix

* Adding more tests for the new class

* Changing the init to a putInt
2023-09-13 17:37:26 -07:00
Soumyava bf99d2c7b2
Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924)
* Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly

* Fixing a failed test

* Updating numericNilAgg

* Moving to use default values in case of nil agg

* Adding the same for first agg

* Fixing a test

* fixing vectorized string agg for last/first with cast if numeric

* Updating tests to remove mockito and cover the case of string first/last on non string columns

* Updating a test to vectorize

* Addressing review comments: Name change to NilVectorAggregator and using static variables now

* fixing intellij inspections
2023-09-13 13:15:14 -07:00
Benedict Jin 7f757e33f0
Fix the created property in DOAP RDF file (#14971) 2023-09-13 06:12:35 -07:00
Laksh Singla 4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Tejaswini Bandlamudi b7bb5ee1db
Upload docker and druid service logs as artifacts on GitHub Actions IT run failure (#14967)
With this PR, docker and druid service logs are uploaded as artifacts onto GitHub when an IT job fails so that we can later download them for investigation.
2023-09-13 11:32:04 +05:30
Clint Wylie 23b78c0f95
use mmap for nested column value to dictionary id lookup for more chill heap usage during serialization (#14919) 2023-09-12 21:01:18 -07:00
Kashif Faraz 286eecad7c
Simplify DruidCoordinatorConfig and binding of metadata cleanup duties (#14891)
Changes:
- Move following configs from `CliCoordinator` to `DruidCoordinatorConfig`:
  - `druid.coordinator.kill.on`
  - `druid.coordinator.kill.pendingSegments.on`
  - `druid.coordinator.kill.supervisors.on`
  - `druid.coordinator.kill.rules.on`
  - `druid.coordinator.kill.audit.on`
  - `druid.coordinator.kill.datasource.on`
  - `druid.coordinator.kill.compaction.on`
- In the Coordinator style used by historical management duties, always instantiate all
 the metadata cleanup duties but execute only if enabled. In the existing code, they are
instantiated only when enabled by using optional binding with Guice.
- Add a wrapper `MetadataManager` which contains handles to all the different
metadata managers for rules, supervisors, segments, etc.
- Add a `CoordinatorConfigManager` to simplify read and update of coordinator configs
- Remove persistence related methods from `CoordinatorCompactionConfig` and
`CoordinatorDynamicConfig` as these are config classes.
- Remove annotations `@CoordinatorIndexingServiceDuty`,
`@CoordinatorMetadataStoreManagementDuty`
2023-09-13 09:06:57 +05:30
Suneet Saldanha 6371721e17
Add DOAP file for Druid (#14954)
The DOAP file is a standard RDF file that describes a project's metadata.
The Apache Software Foundation uses DOAP files for projects to keep project
listing information at https://projects.apache.org/projects.html

This change just introduces basic information about the project. Future
changes can add more information like each release that goes out.

The descriptions were pulled from the website and the README in this repo.
2023-09-12 17:40:21 -07:00
Clint Wylie 891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Zoltan Haindrich 5d16d0edf0
Count distinct returned incorrect results without useApproximateCountDistinct (#14748)
* fix grouping engine handling of summaries when result set is empty
2023-09-12 13:57:54 -07:00
Abhishek Radhakrishnan 0f38a37b9d
Tweak GHA runner label. (#14963)
- processing/** can be ingestion, querying or neither. Removing it
for now.
- Also, add msq extension for the querying label.
2023-09-11 20:09:26 -07:00
Clint Wylie 5cecf6ce8f
fix issue with segment metadata cache and complex types when doing out of order upgrades from 0.22 (#14948) 2023-09-12 10:54:35 +08:00
Suneet Saldanha 757603a773
Set task location as k8sPodName for mm-less ingestion (#14959)
* Set task location as k8sPodName for mm-less ingestion

* tests
2023-09-11 19:44:26 -07:00
George Shiqi Wu f773d83914
Mixed task runner for migration to mm-less ingestion (#14918)
* save work

* Working

* Fix runner constructor

* Working runner

* extra log lines

* try using lifecycle for everything

* clean up configs

* cleanup /workers call

* Use a single config

* Allow selecting runner

* debug changes

* Work on composite task runner

* Unit tests running

* Add documentation

* Add some javadocs

* Fix spelling

* Use standard libraries

* code review

* fix

* fix

* use taskRunner as string

* checkstyl

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-09-11 18:09:46 -07:00
317brian 3a453f7a3c
docs: add note about transparent_reconnection (#14953)
* add note about transparent_reconnection

* Update docs/api-reference/sql-jdbc.md
2023-09-11 11:58:39 -07:00
Kashif Faraz 7871e633c6
Fix bug in KillStalePendingSegments (#14961) 2023-09-11 15:18:15 +05:30
Tejaswini Bandlamudi dec6a0aa14
Update google client apis to latest version (#14414)
Currently Druid is using google apis client 1.26.0 version and google-oauth-client-1.26.0.jar in particular is bringing following CVEs CVE-2020-7692, CVE-2021-22573. Despite the CVEs being false positives, they're causing red security scans on Druid distribution. Hence updating the version to latest version with these CVE fixes.
2023-09-11 12:27:23 +05:30
Clint Wylie 2b7f2c5119
use VectorValueSelector instead of BaseLongVectorValueSelector for StringFirstAggregatorFactory.factorizeVector (#14957) 2023-09-09 04:03:05 -07:00