1566 Commits

Author SHA1 Message Date
Zoltan Haindrich 4157a8f105 add/.etc 2024-07-30 10:16:03 +00:00
Zoltan Haindrich 8bb38a04a5 fix FIXME 2024-07-25 03:33:33 +00:00
Zoltan Haindrich d705c2759b cleanup 2024-07-25 03:05:04 +00:00
Zoltan Haindrich a489e19242 move to new file 2024-07-24 17:26:07 +00:00
Zoltan Haindrich 0be1f81d7e remove druidPrettyprinter 2024-07-24 17:17:15 +00:00
Zoltan Haindrich 7cfbfdc3ee add DruidPrettyPrinter 2024-07-24 17:14:30 +00:00
Zoltan Haindrich e60a200d95 format/etc 2024-07-24 15:16:39 +00:00
Zoltan Haindrich 31e97324ce x 2024-07-19 11:36:51 +00:00
Zoltan Haindrich 361149b097 m 2024-07-19 07:29:50 +00:00
Zoltan Haindrich bc7174cb6a cleanup 2024-07-19 04:30:15 +00:00
Zoltan Haindrich 9cf723adae rename 2024-07-19 04:29:05 +00:00
Zoltan Haindrich 7a34b6e092 cleanup 2024-07-19 04:28:02 +00:00
Zoltan Haindrich eb4fd9f66c removedup 2024-07-18 07:24:56 +00:00
Zoltan Haindrich 47aeb016df Merge branch 'quidem-record' into quidem-msq 2024-07-18 05:48:32 +00:00
Zoltan Haindrich 06b68b6c89 Merge remote-tracking branch 'apache/master' into quidem-record 2024-07-18 05:48:13 +00:00
Akshat Jain b53c26f5c5
Fix issues with partitioning boundaries for MSQ window functions (#16729)
* Fix issues with partitioning boundaries for MSQ window functions

* Address review comments

* Address review comments

* Add test for coverage check failure

* Address review comment

* Remove DruidWindowQueryTest and WindowQueryTestBase, move those tests to DrillWindowQueryTest

* Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java

* Address review comments

* Add test for equals and hashcode for WindowOperatorQueryFrameProcessorFactory

* Address review comment

* Fix checkstyle

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-07-18 10:05:09 +08:00
Zoltan Haindrich 70ff2a3e97 add exploratory msqPlan cmd 2024-07-17 19:48:08 +00:00
Zoltan Haindrich 8b26e490e9 fix types/resultset/etc 2024-07-17 19:30:33 +00:00
Zoltan Haindrich c59f1adcc8 updates 2024-07-17 16:42:22 +00:00
Zoltan Haindrich 95ca0a9f5d cleanup 2024-07-17 16:41:09 +00:00
Zoltan Haindrich b100e982a4 make/etc 2024-07-17 16:40:30 +00:00
Zoltan Haindrich 0811d801fb make query run 2024-07-17 16:33:10 +00:00
Zoltan Haindrich 97c32ca3de less crappy way to run it 2024-07-17 16:19:08 +00:00
Zoltan Haindrich 6790f9cf8b move stuff 2024-07-17 16:08:32 +00:00
Zoltan Haindrich 51d465df6d make engine load via injector for msqdrill 2024-07-17 16:04:14 +00:00
Zoltan Haindrich f3cf778115 some stuff 2024-07-17 15:48:36 +00:00
Zoltan Haindrich 42b3086512 msq-test-0 2024-07-17 15:38:50 +00:00
Zoltan Haindrich 8ada2ff238 picked akshat's 3e0202811e05dcd07db5ab47791151fab5dd5772 2024-07-17 14:44:27 +00:00
Zoltan Haindrich 2a590eb3ae Merge commit 'apache/master^^^' into quidem-record 2024-07-17 13:27:54 +00:00
trompa ebf216829d
#16717 defer provider instantiation in Kubernetes Module (#16726)
* #16717 defer provider instantiation

* add license header

* fix style, ignore new class in jacoco as it is still initialization code

---------

Co-authored-by: Alberto Lago Alvarado <albl@sitecore.net>
2024-07-16 13:05:28 -07:00
Sree Charan Manamala 78a4a09d01
Window Function offset correction for RAC (#16718)
* When an ArrayList RAC creates a child RAC, the child's start and end offsets need to be shifted by the parent's start offset
* Defaults the second window bound to CURRENT ROW when only a single bound is specified
* Removes the windowingStrictValidation warning and throws a hard exception when ORDER BY is used alongside a RANGE clause and the bounds are not both UNBOUNDED or CURRENT ROW
2024-07-15 12:43:27 +02:00
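The offset correction in the entry above can be pictured with a minimal sketch. `ListBackedRows` and its methods are hypothetical stand-ins invented for illustration, not Druid's actual RAC classes; the only point shown is that a child range given relative to its parent has to be shifted by the parent's start offset.

```java
import java.util.List;

// Hypothetical stand-in for a list-backed rows-and-columns view; not Druid's actual class.
final class ListBackedRows {
    private final List<String> rows; // backing list shared by parent and child views
    private final int start;         // inclusive, absolute index into rows
    private final int end;           // exclusive, absolute index into rows

    ListBackedRows(List<String> rows) {
        this(rows, 0, rows.size());
    }

    private ListBackedRows(List<String> rows, int start, int end) {
        this.rows = rows;
        this.start = start;
        this.end = end;
    }

    int numRows() {
        return end - start;
    }

    String get(int rowNum) {
        if (rowNum < 0 || rowNum >= numRows()) {
            throw new IndexOutOfBoundsException("row " + rowNum);
        }
        return rows.get(start + rowNum);
    }

    // from/to are relative to this view, so both are shifted by this view's start.
    ListBackedRows child(int from, int to) {
        if (from < 0 || to < from || to > numRows()) {
            throw new IllegalArgumentException("bad range [" + from + ", " + to + ")");
        }
        return new ListBackedRows(rows, start + from, start + to);
    }
}
```

With a backing list of a, b, c, d, e, calling `child(1, 4)` and then `child(1, 3)` on the result yields a view over c and d. Without the `start +` shift, the second call would wrongly expose b and c.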
Kashif Faraz 656667ee89
Tests: Add utility class TuningConfigBuilder to make IndexTask tests more readable and concise (#16732)
Changes:
- No functional change
- Add class `TuningConfigBuilder` to build `IndexTuningConfig`, `CompactionTuningConfig`
- Remove old class `ParallelIndexTestingFactory.TuningConfigBuilder`
- Remove some unused fields and methods
2024-07-15 10:13:06 +05:30
Kashif Faraz a618c5dd0d
Refactor: Miscellaneous batch task cleanup (#16730)
Changes
- No functional change
- Remove unused method `IndexTuningConfig.withPartitionsSpec()`
- Remove unused method `ParallelIndexTuningConfig.withPartitionsSpec()`
- Remove redundant method `CompactionTask.emitIngestionModeMetrics()`
- Remove Clock argument from `CompactionTask.createDataSchemasForInterval()` as it was only needed
for one test which was just verifying the value passed by the test itself. The code now uses a `Stopwatch`
instead and the test simply verifies that the metric has been emitted.
- Other minor cleanup changes
2024-07-13 08:12:51 +05:30
Laksh Singla 3a1b437056
Improve the fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes (#16679)
Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes:
a. We don't touch the subquery sequence till we know that we can materialize the result as frames
2024-07-12 21:49:12 +05:30
Vishesh Garg 197c54f673
Auto-Compaction using Multi-Stage Query Engine (#16291)
Description:
Compaction operations issued by the Coordinator currently run using the native query engine.
As the majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative
that we support compaction on MSQ to make compaction more robust and possibly faster.
For instance, we have seen OOM errors in native compaction that MSQ could have handled by its
auto-calculation of tuning parameters. 

This commit enables compaction on MSQ to remove the dependency on native engine. 

Main changes:
* `DataSourceCompactionConfig` now has an additional field `engine` that can be one of 
`[native, msq]` with `native` being the default.
* If the engine is MSQ, the `CompactSegments` duty assigns all available compaction task slots to the
launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which
could happen in case a fraction of the tasks were allotted and they eventually fell short of the number
of tasks required by the MSQ engine to run the compaction.
* `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field.
* `CompactionTask` now has `CompactionRunner` interface instance with its implementations
`NativeCompactionRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension.
The ObjectMapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the
`CompactionRunner` instance that is mapped to the specified type [`native`, `msq`].
* `CompactionTask` uses the `CompactionRunner` instance it receives to create the indexing tasks.
* `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in 
the segment schema. If present, the task is created with a native group-by query; if not, the task is
issued with a scan query. The `storeCompactionState` flag is set in the context.
* Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the
final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask`
since otherwise, the workers will be unable to determine the controller task's location for communication
(as they haven't been launched via the overlord).
2024-07-12 16:40:20 +05:30
Sree Charan Manamala eb981d855f
Correct aggregators violating names (#16615)
For a few aggregators, such as BloomSqlAggregator and BaseVarianceSqlAggregator, the aggName is being updated from a0 to a0:agg. This breaches the contract, since the aggName is expected to be the name that was passed in, and it causes a mismatch while creating a column accessor.

This commit corrects the SQL aggregators that violate this contract.
2024-07-12 09:18:09 +02:00
Adarsh Sanjeev 7c625356c5
Add logging for sketches on workers (#16697)
Improve the logging of sketches on workers.
2024-07-09 14:37:43 +05:30
Adarsh Sanjeev af5399cd9d
Fixes a bug when running queries with a limit clause (#16643)
Add a shuffling based on the resultShuffleSpecFactory after a limit processor depending on the query destination. LimitFrameProcessors currently do not update the partition boosting column, so we also add the boost column to the previous stage, if one is required.
2024-07-09 14:29:12 +05:30
Kashif Faraz 7c6f2b1e20
Minor log cleanup in K8sDruidNodeDiscoveryProvider (#16701) 2024-07-08 18:32:39 +05:30
Abhishek Radhakrishnan 35b970935f
Better error handling when retrieving Avro schemas from registry (#16684)
* Handle RestClientException separately, instead of returning a generic error.

- Add tests
- Clean up the tests; remove the legacy expected exception pattern
- Better test assertions

* Rename tests

* checkstyle fixes
2024-07-02 16:48:34 -07:00
Akshat Jain 34c80ee3de
Add MSQ engine support for window function drill tests (#16665)
* Add MSQ engine support for window function drill tests

* Address review comments

* Revert formatting changes in TestDataBuilder
2024-06-28 11:14:17 +05:30
Abhishek Radhakrishnan 82117e8101
Add MSQ query context `maxNumSegments` (#16637)
* Add MSQ query context maxNumSegments.

- Default is MAX_INT (unbounded).
- When set, and a time chunk contains more segments than the limit in the
  query context, the MSQ task will fail with a TooManySegments fault.

* Fixup hashCode().

* Rename and checkpoint.

* Add some insert and replace happy and sad path tests.

* Update error msg.

* Commentary

* Adjust the default to be null (meaning no max bound on number of segments).

Also fix formatter.

* Fix CodeQL warnings and minor cleanup.

* Assert on maxNumSegments tuning config.

* Minor test cleanup.

* Use null default for the MultiStageQueryContext as well

* Review feedback

* Review feedback

* Move logic to common function getPartitionsByBucket shared by INSERT and REPLACE.

* Rename to validateNumSegmentsPerBucketOrThrow() for consistency.

* Add segmentGranularity to error message.
2024-06-26 09:29:51 -07:00
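The `maxNumSegments` check described in the entry above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the method signature, the map from time chunk to segment count, and the use of `IllegalStateException` are all hypothetical. The commit itself reports the failure as a TooManySegments fault and names its method `validateNumSegmentsPerBucketOrThrow()`, whose real signature is not shown in this log.

```java
import java.util.Map;

// Hypothetical sketch; not the code from the commit.
final class SegmentLimitCheck {
    private SegmentLimitCheck() {}

    /**
     * Throws if any time chunk holds more segments than maxNumSegments.
     * A null limit means no upper bound, matching the default described in the commit message.
     */
    static void validateNumSegmentsPerBucket(Map<String, Integer> segmentsPerBucket, Integer maxNumSegments) {
        if (maxNumSegments == null) {
            return; // unbounded: nothing to check
        }
        for (Map.Entry<String, Integer> entry : segmentsPerBucket.entrySet()) {
            if (entry.getValue() > maxNumSegments) {
                // Assumption: a plain exception stands in for the MSQ fault.
                throw new IllegalStateException(
                    "Time chunk " + entry.getKey() + " has " + entry.getValue()
                    + " segments, exceeding maxNumSegments=" + maxNumSegments
                );
            }
        }
    }
}
```

A count equal to the limit passes; only a strictly larger count fails, in line with "more segments than the limit" in the message.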
Laksh Singla 71b3b5ab5d
Add query context parameter to remove null bytes when writing frames (#16579)
MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. A 'removeNullBytes' context parameter has been added, which sanitizes the input string fields by removing these null bytes.
2024-06-26 15:00:30 +05:30
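As a rough illustration of the sanitization the entry above describes, the sketch below strips NUL characters from a string. `NullByteSanitizer` is a hypothetical helper written for this note; it is not the frame-writing code touched by the commit, which is not shown in this log.

```java
// Hypothetical helper; illustrates the idea only.
final class NullByteSanitizer {
    private NullByteSanitizer() {}

    /** Returns the input with every NUL character removed; a null input stays null. */
    static String removeNullBytes(String value) {
        if (value == null) {
            return null;
        }
        // Skip the copy when there is nothing to strip.
        return value.indexOf('\0') < 0 ? value : value.replace("\0", "");
    }
}
```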
Tom 52c9929019
Column name in parse exceptions (#16529)
* first pass

* more changes

* fix tests and formatting

* fix kinesis failing tests

* fix kafka tests

* add dimension name to float parse errors

* double and convertToType handling of dimensionName can report parse errors with dimension name

* fix checkstyle issue

* fix tests

* more cases to have better parse exception messages

* fix test

* fix tests

* partially address comments

* annotate method parameter with nullable

* address comments

* fix tests

* let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils

* fix compilation error and clean up formatting

* clean up whitespace

* address feedback. undo change, pass down report parse exception for convertToType

* fix test
2024-06-25 13:42:52 -07:00
Zoltan Haindrich 1a5faf1afb more pomxml stuff 2024-06-25 08:13:31 +00:00
Zoltan Haindrich 3dfe5c4a05 add reflections 2024-06-25 07:08:01 +00:00
Kashif Faraz f1043d20bc
Support csv input format in Kafka ingestion with header (#16630)
* Support ListBasedInputRow in Kafka ingestion with header
* Fix up buildBlendedEventMap
* Add new test for KafkaInputFormat with csv value and headers
* Do not use forbidden APIs
* Move utility method to TestUtils
2024-06-25 11:50:01 +05:30
Clint Wylie 37a50e6803
Remove index_realtime and index_realtime_appenderator tasks (#16602)
index_realtime tasks were removed from the documentation in #13107. Even
at that time, they weren't really documented per se, just mentioned. They
existed solely to support Tranquility, which is an obsolete ingestion
method that predates migration of Druid to ASF and is no longer being
maintained. Tranquility docs were also de-linked from the sidebars and
the other doc pages in #11134. Only a stub remains, so people with
links to the page can see that it's no longer recommended.

index_realtime_appenderator tasks existed in the code base, but were
never documented, nor as far as I am aware were they used for any purpose.

This patch removes both task types completely, as well as removes all
supporting code that was otherwise unused. It also updates the stub
doc for Tranquility to be firmer that it is not compatible. (Previously,
the stub doc said it wasn't recommended, and pointed out that it is
built against an ancient 0.9.2 version of Druid.)

ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-06-24 20:13:33 -07:00
Adarsh Sanjeev 1a883ba1f7
Fix complex columns with export (#16572)
This PR fixes a few bugs with MSQ export. The main change is calling SqlResults#coerce before writing the column. This allows sketches and JSON to be correctly deserialized. The format of the exported complex columns is similar to that produced by Async MSQ queries with CSV format.

Notes:

- Fix printing of complex columns during export. Sketches and JSON are now correctly formatted during export.
- Fix an NPE if the writer has not been initialized. Empty export queries will create an empty file at the location.
- Fix a bug with counters for MSQ export, where rows were reported for only the first partition.
2024-06-24 09:03:30 +05:30
Akshat Jain 641f739a47
Fix flaky test in RetryableS3OutputStreamTest (#16639)
As part of #16481, we have started uploading the chunks in parallel.
That means that it's not necessary for the part that finished uploading last
to be less than or equal to the chunkSize (as the final part could've been uploaded earlier).

This made a test in RetryableS3OutputStreamTest flaky where we were
asserting that the final part should be smaller than chunk size.

This commit fixes the test, and also adds another test where the file size
is such that all chunks would be of equal size.
2024-06-24 08:13:47 +05:30