14308 Commits

Author SHA1 Message Date
Zoltan Haindrich d227029b6b undo unrelated change 2024-07-19 19:16:46 +00:00
Zoltan Haindrich f7247e1bb7 use entryset 2024-07-19 15:13:17 +00:00
Zoltan Haindrich b38935a450 add test; fb 2024-07-19 11:44:23 +00:00
Zoltan Haindrich e2a54b5758 update 2024-07-19 08:42:58 +00:00
Zoltan Haindrich d216b934fc Merge remote-tracking branch 'kgyrtkirk/quidem-record' into quidem-record 2024-07-18 11:41:21 +00:00
Zoltan Haindrich 76ff3f26e1 add suppress 2024-07-18 07:25:19 +00:00
Benedict Jin e388140b2a
Apply suggestions from code review 2024-07-18 15:06:59 +08:00
Zoltan Haindrich 06b68b6c89 Merge remote-tracking branch 'apache/master' into quidem-record 2024-07-18 05:48:13 +00:00
Akshat Jain b53c26f5c5
Fix issues with partitioning boundaries for MSQ window functions (#16729)
* Fix issues with partitioning boundaries for MSQ window functions

* Address review comments

* Address review comments

* Add test for coverage check failure

* Address review comment

* Remove DruidWindowQueryTest and WindowQueryTestBase, move those tests to DrillWindowQueryTest

* Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java

* Address review comments

* Add test for equals and hashcode for WindowOperatorQueryFrameProcessorFactory

* Address review comment

* Fix checkstyle

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-07-18 10:05:09 +08:00
Vadim Ogievetsky 44b3f8e588
Web console: fix a few console bugs (#16735)
* remove __time from min max query shortcut

* fix scrolling in retention rules dialog

* actions menus should have titles

* change term

* correctly name sort/shuffle
2024-07-17 14:51:17 -07:00
Kashif Faraz 89066b72cf
Fix bug in TaskStorageQueryAdapter (#16750)
Changes:
- Do not hold a reference to `TaskQueue` in `TaskStorageQueryAdapter`
- Use `TaskStorage` instead of `TaskStorageQueryAdapter` in `IndexerMetadataStorageAdapter`
- Rename `TaskStorageQueryAdapter` to `TaskQueryTool`
- Fix newly added task actions `RetrieveUpgradedFromSegmentIds` and `RetrieveUpgradedToSegmentIds`
by removing `isAudited` method.
2024-07-17 23:17:41 +05:30
Zoltan Haindrich 82436df585 fix test;disable dep-check for module 2024-07-17 14:34:33 +00:00
Zoltan Haindrich 2a590eb3ae Merge commit 'apache/master^^^' into quidem-record 2024-07-17 13:27:54 +00:00
Sree Charan Manamala 40ef9fc4ec
Bug fix for array type selector causing array aggregation over window frame fail (#16653) 2024-07-17 14:09:56 +02:00
Kashif Faraz 9f6ce6ddc0
Remove task action audit logging and druid_taskLog metadata table (#16309)
Description:
Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368.

As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. 
- Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments`
which returns the list of segments created by a task.
- The use case is really narrow and no prod clusters really use this information.
- There can be better ways of obtaining this information, such as the metric
`segment/added/bytes` which reports both the segment ID and task ID
when a segment is committed by a task. We could also include committed segment IDs in task reports.
- A task persisting several segments would bloat up the audit logs table putting unnecessary strain
on metadata storage.

Changes:
- Remove `TaskAuditLogConfig`
- Remove method `TaskAction.isAudited()`. No task action is audited anymore.
- Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction`
is the new incarnation which has been in use for a while.
- Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore
but need to be retained for backward compatibility of extensions.
- Do not create `druid_taskLog` metadata table anymore.
2024-07-17 17:09:00 +05:30
trompa ebf216829d
#16717 defer provider instantiation in Kubernetes Module (#16726)
* #16717 defer provider instantiation

* add license header

* fix style, ignore new class in jacoco as it is still initialization code

---------

Co-authored-by: Alberto Lago Alvarado <albl@sitecore.net>
2024-07-16 13:05:28 -07:00
Kashif Faraz 01d67ae543
Allow CompactionSegmentIterator to have custom priority (#16737)
Changes:
- Break `NewestSegmentFirstIterator` into two parts
  - `DatasourceCompactibleSegmentIterator` - this contains all the code from `NewestSegmentFirstIterator`
  but now handles a single datasource and allows a priority to be specified
  - `PriorityBasedCompactionSegmentIterator` - contains separate iterator for each datasource and
  combines the results into a single queue to be used by a compaction search policy
- Update `NewestSegmentFirstPolicy` to use the above new classes
- Cleanup `CompactionStatistics` and `AutoCompactionSnapshot`
- Cleanup `CompactSegments`
- Remove unused methods from `Tasks`
- Remove unneeded `TasksTest`
- Move tests from `NewestSegmentFirstIteratorTest` to `CompactionStatusTest`
and `DatasourceCompactibleSegmentIteratorTest`
2024-07-16 19:54:49 +05:30
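The split described in #16737 above (one iterator per datasource, with the results combined into a single priority-ordered queue) might look roughly like the following Python sketch. The names `datasource_iterator` and `priority_merged`, the tuple layout, and the sample data are hypothetical illustrations, not Druid's classes or its actual ordering rules.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# Hypothetical candidate: (priority, datasource, segment_id).
# In this sketch a lower priority value means "compact sooner".
Candidate = Tuple[int, str, str]


def datasource_iterator(datasource: str, segments: List[Tuple[int, str]]) -> Iterator[Candidate]:
    """Yield candidates for a single datasource in priority order."""
    for priority, segment_id in sorted(segments):
        yield (priority, datasource, segment_id)


def priority_merged(per_datasource: Dict[str, List[Tuple[int, str]]]) -> Iterator[Candidate]:
    """Combine the per-datasource iterators into one priority-ordered stream."""
    iterators = [datasource_iterator(ds, segs) for ds, segs in per_datasource.items()]
    # heapq.merge lazily merges inputs that are already sorted.
    return heapq.merge(*iterators)


candidates = list(priority_merged({
    "wiki": [(2, "wiki_2024-07-01"), (1, "wiki_2024-07-02")],
    "koalas": [(3, "koalas_2024-07-01"), (1, "koalas_2024-07-02")],
}))
```

Keeping each datasource behind its own iterator means the ordering rule lives in one place (the priority value), while the merge step stays unaware of how any single datasource produced its candidates.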
Adithya Chakilam 6cf6838eb9
kubernetes-overlord-extension: Fix tasks not being shutdown (#16711) 2024-07-15 14:35:11 -07:00
AmatyaAvadhanula 6891866c43
Process retrieval of parent and child segment ids in batches (#16734) 2024-07-15 18:24:23 +05:30
Sree Charan Manamala 78a4a09d01
Window Function offset correction for RAC (#16718)
* When an ArrayList RAC creates a child RAC, the child's start and end offsets need to be shifted by the parent's start offset
* Defaults the 2nd window bound to CURRENT ROW when only a single bound is specified
* Removes the windowingStrictValidation warning and throws a hard exception when Order By alongside RANGE clause is not provided with UNBOUNDED or CURRENT ROW as both bounds
2024-07-15 12:43:27 +02:00
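The offset correction in #16718 above can be illustrated with a small Python sketch. `RowsView` is a hypothetical stand-in for a view over a shared row list and is not the ArrayList RAC implementation; it only shows why a child's bounds, given relative to its parent, have to be shifted by the parent's start offset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowsView:
    """Hypothetical view over a shared row list, covering [start, end)."""
    rows: List[int]
    start: int
    end: int

    def child(self, rel_start: int, rel_end: int) -> "RowsView":
        # Child bounds are relative to this view, so both are shifted by
        # this view's own start offset to index the shared list correctly.
        if not (0 <= rel_start <= rel_end <= self.end - self.start):
            raise ValueError("child range outside parent view")
        return RowsView(self.rows, self.start + rel_start, self.start + rel_end)

    def values(self) -> List[int]:
        return self.rows[self.start:self.end]


parent = RowsView(list(range(10)), 4, 9)  # covers rows 4..8
child = parent.child(1, 3)                # second and third rows of the parent
```

Without the shift, `child` would cover rows 1..2 of the shared list rather than rows 5..6 of the parent's window.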
Rishabh Singh 64104533ac
Enable querying entirely cold datasources (#16676)
Add ability to query entirely cold datasources.
2024-07-15 15:02:59 +05:30
Laksh Singla 209f8a9546
Deserialize complex dimensions in group by queries to their respective types when reading from spilled files and cached results (#16620)
Like #16511, but for keys that have been spilled or cached during the grouping process
2024-07-15 15:00:17 +05:30
AmatyaAvadhanula d6c760f7ce
Do not kill segments with referenced load specs from deep storage (#16667)
Do not kill segments with referenced load specs from deep storage
2024-07-15 14:07:53 +05:30
Kashif Faraz 656667ee89
Tests: Add utility class TuningConfigBuilder to make IndexTask tests more readable and concise (#16732)
Changes:
- No functional change
- Add class `TuningConfigBuilder` to build `IndexTuningConfig`, `CompactionTuningConfig`
- Remove old class `ParallelIndexTestingFactory.TuningConfigBuilder`
- Remove some unused fields and methods
2024-07-15 10:13:06 +05:30
Kashif Faraz a618c5dd0d
Refactor: Miscellaneous batch task cleanup (#16730)
Changes:
- No functional change
- Remove unused method `IndexTuningConfig.withPartitionsSpec()`
- Remove unused method `ParallelIndexTuningConfig.withPartitionsSpec()`
- Remove redundant method `CompactTask.emitIngestionModeMetrics()`
- Remove Clock argument from `CompactionTask.createDataSchemasForInterval()` as it was only needed
for one test which was just verifying the value passed by the test itself. The code now uses a `Stopwatch`
instead and test simply verifies that the metric has been emitted.
- Other minor cleanup changes
2024-07-13 08:12:51 +05:30
Laksh Singla 3a1b437056
Improve the fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes (#16679)
Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes:
a. We don't touch the subquery sequence till we know that we can materialize the result as frames
2024-07-12 21:49:12 +05:30
Vishesh Garg 197c54f673
Auto-Compaction using Multi-Stage Query Engine (#16291)
Description:
Compaction operations issued by the Coordinator currently run using the native query engine.
As the majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative
that we support compaction on MSQ to make Compaction more robust and possibly faster. 
For instance, we have seen OOM errors in native compaction that MSQ could have handled by its
auto-calculation of tuning parameters. 

This commit enables compaction on MSQ to remove the dependency on native engine. 

Main changes:
* `DataSourceCompactionConfig` now has an additional field `engine` that can be one of 
`[native, msq]` with `native` being the default.
* If engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the
launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which
could happen in case a fraction of the tasks were allotted and they eventually fell short of the number
of tasks required by the MSQ engine to run the compaction.
* `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field.
* `CompactionTask` now has `CompactionRunner` interface instance with its implementations
`NativeCompactionRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension.
The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the
`CompactionRunner` instance that is mapped to the specified type [`native`, `msq`]. 
* `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks.
* `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in 
the segment schema. If present, the task is created with a native group-by query; if not, the task is
issued with a scan query. The `storeCompactionState` flag is set in the context.
* Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the
final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask`
since otherwise, the workers will be unable to determine the controller task's location for communication
(as they haven't been launched via the overlord).
2024-07-12 16:40:20 +05:30
Sree Charan Manamala eb981d855f
Correct aggregators violating names (#16615)
For a few aggregators, for example BloomSqlAggregator, BaseVarianceSqlAggregator, etc., the aggName is being updated from a0 to a0:agg, breaching the contract, since we would expect the aggName to be the name which is passed. This causes a mismatch while creating a column accessor.

This commit aims to correct those violating sql aggregators.
2024-07-12 09:18:09 +02:00
Clint Wylie dca31d466c
minor adjustments for performance (#16714)
changes:
* switch to stop using some string.format
* switch some streams to classic loops
2024-07-11 16:57:15 -07:00
Vadim Ogievetsky 307b8849de
Web console: better sql data loader reset (#16696)
* better sql data loader reset

* snapshot

* fix destination pane sizing

* clean doc links

* update doc links

* more doc links

* extract getClusterCapacity

* update snapshots

* allow submit suspended

* some renaming

* diff with current

* Do delta
2024-07-11 14:45:04 -07:00
Clint Wylie b3c238457f
fix unnest bugs (#16723)
changes:
* fixes a bug with unnest storage adapter not preserving the underlying column's dictionary uniqueness when allowing dimension selector cursor
* fixes a bug with unnest on realtime segments with empty rows incorrectly specifying index 0 as the row dictionary value
2024-07-11 13:48:15 -07:00
Sree Charan Manamala 760d70312f
Window Drill tests coverage improvement (#16722)
Window Drill tests coverage improvement
2024-07-11 19:11:36 +05:30
Clint Wylie d6c07270a5
fix issues with join filter pushdown and virtual column resolution (#16702) 2024-07-11 04:26:07 -07:00
YongGang 4b293fc2a9
Docs: Fix k8s dynamic config URL (#16720) 2024-07-11 10:05:47 +05:30
Kashif Faraz 616ae631c6
Fix NPE in CompactSegments (#16713) 2024-07-10 14:51:52 +08:00
Adarsh Sanjeev 7c625356c5
Add logging for sketches on workers (#16697)
Improve the logging of sketches on workers.
2024-07-09 14:37:43 +05:30
Adarsh Sanjeev af5399cd9d
Fixes a bug when running queries with a limit clause (#16643)
Add a shuffling based on the resultShuffleSpecFactory after a limit processor depending on the query destination. LimitFrameProcessors currently do not update the partition boosting column, so we also add the boost column to the previous stage, if one is required.
2024-07-09 14:29:12 +05:30
Zoltan Haindrich a9bd0eea2a
Fix queries filtering for the same condition with both an IN and EQUALS to not return empty results (#16597)
temp fix until CALCITE-6435 gets fixed (released&upgraded to)
added a custom rule (FixIncorrectInExpansionTypes) to fix-up types of the affected literals
added a testcase which will alert on upgrade
2024-07-09 12:28:21 +05:30
Clint Wylie 09e0eefdc3
modify equality and typed in filter behavior for numeric match values on string columns (#16593)
* fix equality and typed in filter behavior for numeric match values on string columns
changes:
* EqualityFilter and TypedInFilter numeric match values against string columns will now cast strings to numeric values instead of converting the numeric values directly to string for pure string equality, which is consistent with the casts which are eaten in the SQL layer, as well as classic Druid behavior
* added tests to cover numeric equality matching. Double match values in particular would fail to match the string values since `1.0` would become `'1.0'` which does not match `'1'`.
2024-07-08 10:58:05 -07:00
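The difference described in #16593 above (`1.0` becoming `'1.0'`, which does not match `'1'`) can be reproduced with a short Python sketch. The function names are hypothetical and this is not the Druid filter code; it only contrasts stringifying the numeric match value with casting the string column value to a number.

```python
from typing import Optional


def to_double(value: str) -> Optional[float]:
    """Try to read a string column value as a number; None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def matches_as_string(column_value: str, match_value: float) -> bool:
    # Comparison described as the earlier behavior: stringify the numeric
    # match value and compare the two strings.
    return column_value == str(match_value)


def matches_as_number(column_value: str, match_value: float) -> bool:
    # Comparison described as the new behavior: cast the string column
    # value to a number and compare numerically.
    parsed = to_double(column_value)
    return parsed is not None and parsed == match_value
```

With a column value of `'1'` and a match value of `1.0`, `matches_as_string` compares `'1'` against `'1.0'` and fails, while `matches_as_number` compares `1.0` against `1.0` and succeeds.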
Kashif Faraz 7c6f2b1e20
Minor log cleanup in K8sDruidNodeDiscoveryProvider (#16701) 2024-07-08 18:32:39 +05:30
Abhishek Radhakrishnan bf2be938a9
Refactor `SegmentLoadDropHandler` code (#16685)
Motivation:
- Improve code hygiene
- Make `SegmentLoadDropHandler` easily extensible

Changes:
- Add `SegmentBootstrapper`
- Move code for bootstrapping segments already cached on disk and fetched from coordinator to
`SegmentBootstrapper`.
- No functional change
- Use separate executor service in `SegmentBootstrapper`
- Bind `SegmentBootstrapper` to `ManageLifecycle` explicitly in `CliBroker`, `CliHistorical` etc.
2024-07-08 09:29:55 +05:30
Alberic Liu c6c2652c89
unified the code format in NestedDataOperatorConversions (#16695) 2024-07-08 10:06:24 +08:00
Lars Francke 586c713d12
Updates build documentation to not mention explicit Java version as it was out of sync with the dedicated Java page. (#16674)
This means there is one less place to keep information in sync.
2024-07-03 20:53:15 +05:30
Virushade f290cf083a
Update examples/bin/dsql scripts to accept Python 3 (#16677)
* Update examples/bin/dsql scripts to accept Python 3

Remove redundant urllib import

Translating to Python3: Changing xrange to range

Translating to Python3: Changing long to int

Translating to Python3: Change urllib2 methods, and fix encoding/decoding issues

Remove unnecessary import

Add option for Python2

Rename files

* Update examples/bin/dsql

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Resolve PR comments

Add comment in files indicating updates need to be made in both places

Update examples/bin/dsql

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Update error output when using Python 2.

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-07-03 15:52:57 +08:00
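The translations listed in #16677 above (`xrange` to `range`, `long` to `int`, `urllib2` to its Python 3 equivalent, and explicit encoding/decoding) follow a common pattern, sketched below in Python 3. This is not the contents of the dsql script; the host, port, and query are assumptions chosen for illustration.

```python
import json
import urllib.request

# Python 2's xrange() is range() in Python 3 (lazy in both cases).
row_numbers = list(range(3))

# Python 2's long is folded into int in Python 3; int has arbitrary precision.
big = int("12345678901234567890")

# urllib2.Request / urllib2.urlopen live in urllib.request in Python 3,
# and request bodies must be bytes, so the JSON text is encoded explicitly.
payload = json.dumps({"query": "SELECT 1"}).encode("utf-8")
request = urllib.request.Request(
    "http://localhost:8082/druid/v2/sql/",  # assumed host and port, for illustration only
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Responses are bytes as well and need decoding before use, for example:
#   body = urllib.request.urlopen(request).read().decode("utf-8")
```

The request is only constructed here, not sent, so the sketch runs without a Druid cluster.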
Kashif Faraz 6c87b1637b
Revert "Downgrade the version of Apache Curator from 5.5.0 to 5.3.0 to avoid a bug in the new version (#16425)" (#16688)
This reverts commit cb7c2c1e37.
2024-07-03 11:18:50 +05:30
Abhishek Radhakrishnan 35b970935f
Better error handling when retrieving Avro schemas from registry (#16684)
* Handle RestClientException separately, instead of returning a generic error.

- Add tests
- Clean up the tests; remove the legacy expected exception pattern
- Better test assertions

* Rename tests

* checkstyle fixes
2024-07-02 16:48:34 -07:00
317brian d65e015c94
docs: nit for link format (#16687) 2024-07-02 16:45:09 -07:00
Victoria Lim adde024e11
docs: Subtitle updates in migration guide overview (#16683) 2024-07-02 12:56:05 -07:00
zachjsh 5e05858ff7
Catalog granularity accepts query format (#16680)
Previously, the segment granularity for tables in the catalog had to be defined in period format, i.e. `'PT1H'`, `'P1D'`, etc. This disallows a user from defining segment granularity of `'ALL'` for a table in the catalog, which may be a valid use case. This change makes it so that a user may define the segment granularity of a table in the catalog as any string that results in a valid granularity using either the `Granularity.fromString(str)` method or `new PeriodGranularity(new Period(value), null, null)`, provided that granularity maps to a standard supported granularity, where `GranularityType.isStandard(granularity)` returns true. As a result, a user who wants to assign a catalog table's segment granularity to be hourly may assign the segment granularity property of the table to be either `PT1H` or `HOUR`. These are the same formats accepted at query time.
2024-07-02 12:14:28 -04:00
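The acceptance rule from #16680 above (a value is valid if it resolves to a standard granularity, whether written as a name or as a period) can be sketched in Python as follows. The `STANDARD` table is a hypothetical three-entry subset, `resolve_granularity` is an illustrative name, and the case-insensitive name matching is an assumption of this sketch; none of it is the Druid catalog code.

```python
# Hypothetical subset of standard granularities, keyed by name, with the
# ISO-8601 period each one corresponds to (ALL has no period).
STANDARD = {"HOUR": "PT1H", "DAY": "P1D", "ALL": None}


def resolve_granularity(value: str) -> str:
    """Return the standard granularity name for a name or a period string."""
    name = value.upper()
    if name in STANDARD:  # query-style name, e.g. HOUR
        return name
    for std_name, period in STANDARD.items():  # period format, e.g. PT1H
        if period is not None and period == value:
            return std_name
    raise ValueError(f"not a standard granularity: {value!r}")
```

Under these assumptions `PT1H` and `HOUR` resolve to the same granularity, `ALL` is accepted even though it has no period form, and a non-standard period such as `PT7M` is rejected.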
Jill Osborne bd49ecfd29
Addition to subquery limit migration guide (#16671)
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-07-01 14:22:47 -07:00