14673 Commits

Author SHA1 Message Date
Kashif Faraz 24e5d8a9e8
Refactor: Minor cleanup of segment allocation flow (#17524)
Changes
--------
- Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment
- Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null
- Miscellaneous cleanup
2024-12-13 07:46:57 +05:30
Katya Macedo b86ea4d5c4
[Docs] Improve druid.coordinator.kill.on description (#17538)
* Docs: improve druid.coordinator.kill.on description

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update description for durationToRetain

* Update docs/configuration/index.md

* Update after review

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-12-12 16:58:38 -08:00
George Shiqi Wu aca56d6bb8
reject publishing actions with a retriable error code if an earlier task is still publishing (#17509)
* Working queuing of publishing

* fix style

* Add unit tests

* add tests

* retry within the connector

* fix unit tests

* Update indexing-service/src/main/java/org/apache/druid/indexing/common/actions/LocalTaskActionClient.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Add comment

* fix style

* Fix unit tests

* style fix

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-12-12 10:37:53 -05:00
Zoltan Haindrich 1a38434d8d
Restore usage of filtered SUM (#17378) 2024-12-12 10:30:42 +01:00
Ashwin Tumma 05c3cbce08
Docs: Update SQL metrics documentation to include dimension engine (#17554)
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-12-12 08:56:42 +05:30
Peter Marshall ccadfd071d
Docs: Update partitioning.md to fix a typo (#17555)
Quick fix to point the links to `dimensionsSpec` to the correct section of the ingestion spec doc.
2024-12-12 08:56:05 +05:30
Clint Wylie 3c1b488cb7
remove druid.sql.planner.serializeComplexValues config in favor of always serializing complex values (#17549) 2024-12-11 13:07:56 -08:00
Andy Tsai f3d7f1aa96
Adding 3 sets of SQL tests in quidem (#17548)
Description

Migrate the initial 3 sets of SQL tests to quidem.  These 3 sets cover numeric, string, and datetime scalar functions.
These tests use the existing kttm dataset.  They aim to exercise SQL queries in a more comprehensive way:

Each scalar function is exercised in 3 different query shapes:
  simple query
  subquery
  group by query
Each query covers all operators in its predicates.
All queries are select count(*) queries.  They are designed to all return the same result for easy maintenance and debugging.

These are the initial sets of tests.  More tests to cover the rest of the scalar and aggregation functions will come later.
2024-12-11 12:57:37 -08:00
Katya Macedo a51061fa43
[Docs] Improve Bloom filter topic (#17547)
* [Docs] Improve Bloom filter topic

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update spelling file

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-12-10 11:43:56 -08:00
Jill Osborne 61d986a179
Filters doc fix (#17553) 2024-12-10 09:34:43 -08:00
Akshat Jain 7705694481
Increase heap size for integration-tests (#17551) 2024-12-10 09:24:58 +05:30
Clint Wylie 80d2cd3632
snapshot column capabilities for realtime cursors (#17386)
* snapshot column capabilities for realtime cursors

changes:
* adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan)
* `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` uses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, allowing expression selectors on realtime queries to no longer be required to treat selectors from `StringDimensionIndexer` as multi-valued unless they truly are multi-valued. this fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single element array from the column being handled as multi-valued
* `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set
* `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct)
* fixed a mistake where expression selectors for realtime queries with no null values could not use dictionary encoded selectors

* hmm

* test changes

* cleanup

* add test coverage

* fix test

* fixes

* cleanup
2024-12-09 08:44:54 -08:00
Rohan Garg ae4ea51352
Rewrite S3StorageConnectorTest using testcontainers and MinIO (#17539) 2024-12-09 09:48:38 -05:00
Akshat Jain b114807560
Fix cron job ITs by using jdk17 as the runtime_jdk (#17544)
This PR changes runtime_jdk to 17 from 21.0.4 to fix the cron job ITs.
2024-12-06 14:36:27 -08:00
zachjsh 3b6a3ae222
Add taskStatus dimension to service/heartbeat metric (#17488)
* SQL syntax error should target USER persona

* * revert change to queryHandler and related tests, based on review comments

* * add test

* * add taskStatus dimension to `service/heartbeat` metric

* * address review comments

* * fix compilation error from merge

* * improve test coverage

* Address review comments

* * remove unused import

* * address remaining comments
2024-12-06 17:18:59 -05:00
George Shiqi Wu 7736228f37
Separate stop/start logic for LeaderLatch (#17546) 2024-12-06 16:01:28 -05:00
Virushade f61ec0af85
Reduce occurrences of failed IT builds (#17543)
Reduce occurrences of failed IT builds: break up the setup command and add a few retries to improve resiliency.
2024-12-06 09:57:38 -08:00
Abhishek Radhakrishnan 3a2220c68d
Refactor: Move some classes from `sql` to `processing` & `server` for reusability (#17542)
This PR contains non-functional / refactoring changes of the following classes in the sql module:

1. Move ExplainPlan and ExplainAttributes from sql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule that provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package itself which provides all the clients as well.

This is done so that we can reuse these classes in #17353 without bringing in Calcite and other dependencies to the Overlord.
2024-12-06 09:32:03 -08:00
TessaIO 93c123a482
docs: fix cached lookup module documentation (#17527)
* docs: fix loading lookup documentation

Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>

* docs: fix indentation and punctuation

Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>

---------

Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-12-06 00:09:37 -08:00
Kashif Faraz 3de46746ca
Fix NPE in segment allocation when reduceMetadataIO is true (#17537) 2024-12-05 12:58:47 +05:30
Karan Kumar 0eb8d733d4
Adding leader and not being leader logging on the overlord. (#17519) 2024-12-03 22:36:53 +05:30
Clint Wylie 9ef46fc92d
suppress kafka cve for ranger extension (#17531) 2024-12-02 21:25:39 -08:00
Zoltan Haindrich c1ef38b052
Minor fixes and enhancements in UnionQuery handling (#17483)
* plan consistently with either UnionDataSource or UnionQuery for decoupled mode
* expose errors
* move decoupled related setting from PlannerConfig to QueryContexts
2024-11-28 10:05:12 +01:00
Vadim Ogievetsky ddbb985369
Web console: refactor and improve the segment timeline (try 2) (#17521)
* refactor and improve the segment timeline

* use consistent state

* type cleanup

* add shpitz

* better bubble

* Update web-console/src/components/segment-timeline/segment-bar-chart-render.tsx

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2024-11-27 19:30:40 -08:00
Charles Smith 0325f62af2
[Docs] Remove ambiguous advice regarding TopN correctness (#17522) 2024-11-27 11:41:28 -08:00
Vadim Ogievetsky f3e1f1e586
Revert "Web console: refactor and improve the segment timeline (#17508)" (#17520)
This reverts commit 09432c099b.
2024-11-27 09:38:48 -08:00
Vadim Ogievetsky 09432c099b
Web console: refactor and improve the segment timeline (#17508)
* refactor and improve the segment timeline

* use consistent state

* type cleanup

* add shpitz

* better bubble
2024-11-27 09:37:01 -08:00
Vishesh Garg 1b9a6dde9f
Fix compilation error for MSQCompactionRunnerTest (#17516) 2024-11-27 12:46:30 +01:00
Gian Merlino 80d6763e39
ServerSelector: Synchronize getAllServers(). (#17499)
This method was missing some required synchronization. This patch also
adds GuardedBy annotations to historicalServers and realtimeServers, which
would have caught it.
2024-11-27 13:31:00 +05:30
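To illustrate the kind of fix described in the ServerSelector commit above (#17499), here is a minimal sketch of a getter that takes the same lock as the writers, with the guarded fields annotated. It is not Druid's code: `ServerRegistry`, its methods, and the locally declared `GuardedBy` annotation are hypothetical stand-ins (real projects typically import the annotation from a library, and a static checker that understands it can flag unsynchronized access).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a library-provided GuardedBy annotation,
// declared locally so that this sketch compiles without dependencies.
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface GuardedBy {
  String value();
}

// Hypothetical class for illustration only; not Druid's ServerSelector.
class ServerRegistry {
  // Both sets may only be read or written while holding the monitor on `this`.
  @GuardedBy("this")
  private final Set<String> historicalServers = new HashSet<>();

  @GuardedBy("this")
  private final Set<String> realtimeServers = new HashSet<>();

  synchronized void addHistorical(String server) {
    historicalServers.add(server);
  }

  synchronized void addRealtime(String server) {
    realtimeServers.add(server);
  }

  // The getter is synchronized as well, and returns a copy so that callers
  // never iterate over the guarded sets outside the lock.
  synchronized List<String> getAllServers() {
    List<String> all = new ArrayList<>(historicalServers);
    all.addAll(realtimeServers);
    return all;
  }
}
```

Returning a copy rather than a view is a design choice of this sketch: it keeps all access to the guarded sets inside synchronized methods.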
Vishesh Garg 5333c53d71
Support non time order in MSQ compaction (#17318)
This patch adds support to MSQ compaction for sorting segments by non-time columns (a feature added in #16849).
Specifically, if `forceSegmentSortByTime` is set in the data schema, either via the user-supplied
compaction config or in the inferred schema, the following steps are taken:
- Skip adding `__time` explicitly as the first column to the dimension schema since it already comes
as part of the schema
- Ensure column mappings propagate `__time` in the order specified by the schema
- Set `forceSegmentSortByTime` in the MSQ context.
2024-11-27 13:26:10 +05:30
Clint Wylie 2831d79871
update kafka dependency version to 3.9.0 (#17513)
* update kafka dependency version to 3.9.0

* update licenses.yaml
2024-11-27 12:14:05 +05:30
Akshat Jain dd46c7722d
Remove pre-java-11 profile (#17511)
Support for Java 8 was removed in #17466. This PR removes the unused profile pre-java-11, which was activated for JDK < 11.
2024-11-26 08:43:20 +01:00
Kashif Faraz 207ad16f07
Reduce metadata IO during segment allocation (#17496)
Changes
---------
- Add Overlord runtime property `druid.indexer.tasklock.batchAllocationReduceMetadataIO`
- Setting this flag to true (default value) allows the Overlord to fetch only necessary segment
payloads during segment allocation
- Setting this flag to false restores original segment allocation behaviour
2024-11-26 11:40:09 +05:30
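As a config sketch for the commit above (#17496): the property name and its default come from the commit message, while placing it in the Overlord's runtime.properties file is an assumption about the deployment layout.

```properties
# Overlord runtime.properties (file location is an assumption)
# true is the default per the commit message; false restores the
# original segment allocation behaviour.
druid.indexer.tasklock.batchAllocationReduceMetadataIO=false
```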
Clint Wylie ede9e4077a
add support for aggregate only projections (#17484) 2024-11-25 09:22:46 -08:00
Zoltan Haindrich 20aea29a51
Rename d1/d2 columns in tests (#17471) 2024-11-22 14:58:56 +01:00
Rishabh Singh 74422b58f5
Emit disk spill and merge buffer utilisation metrics for GroupBy queries (#17360)
This change emits the following metrics as part of the GroupByStatsMonitor monitor:
mergeBuffer/used -> Number of merge buffers used.
mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer.
mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers.
groupBy/spilledQueries -> Number of queries that spilled onto the disk.
groupBy/spilledBytes -> Spilled bytes on the disk.
groupBy/mergeDictionarySize -> Size of the merging dictionary.
2024-11-22 14:22:03 +05:30
Adarsh Sanjeev df649c0bbd
Refactors (#17498)
Follow-up PR to #17493 to address pending unaddressed comments.
2024-11-22 09:22:38 +05:30
Katya Macedo bd93d0046d
Docs: update text and example (#17480)
* Docs: update text and example

* Update after review

* Update the spelling file

* Update text for clarity

* Update after review
2024-11-21 08:40:41 -08:00
Vivek Dhiman bb44f85bb6
Updated error response to hide error stack in case of JsonMappingException (#16821)
Added flag druid.server.http.showDetailedJsonMappingError, similar to druid.server.http.showDetailedJettyError, to configure error message detail.
2024-11-21 19:11:48 +05:30
Adarsh Sanjeev 2726c6f388
Minor refactors to processing
Some refactors across druid to clean up the code and add utility functions where required.
2024-11-21 15:37:55 +05:30
Akshat Jain 17215cd677
Remove support for Java 8 (#17466)
- All JDK 8 based CI checks have been removed.
- Images used in Dockerfile(s) have been updated to Java 17 based images.
- Documentation has been updated accordingly.
2024-11-21 15:33:08 +05:30
Adithya Chakilam c1d6328249
StreamingTaskRunner: Close the rejection period updater executor service (#17490) 2024-11-19 12:49:20 -08:00
zachjsh 8853c7e5c6
Add `ingest/notices/queueSize` and `ingest/pause/time` to statsd emitter (#17487)
* SQL syntax error should target USER persona

* * revert change to queryHandler and related tests, based on review comments

* * add test

* * add `ingest/notices/queueSize` and `ingest/pause/time` to statsd emitter

* * add taskStatus dimension to `service/heartbeat` metric

* Revert "* add taskStatus dimension to `service/heartbeat` metric"

This reverts commit cfb02a2813.
2024-11-18 20:58:00 -05:00
Adithya Chakilam 6f436301be
supervisor: make rejection periods work with stopTasksCount (#17442)
* kafka-indexing: Report consumer io time

* commit

* backward

* tests

* remove unwanted changes

* comments

* comments

* coverage

* change name

* fixes

* fixes

* comments
2024-11-18 13:12:24 -08:00
Clint Wylie 24a1fafaa7
projection segment merge fixes (#17460)
changes:
* fix issue when merging projections from multiple-incremental persists which was hoping that some 'dim conversion' buffers were not closed, but they already were (by the merging iterator). fix involves selectively persisting these conversion buffers to temp files in the segment write out directory and mapping them and tying them to the segment level closer so that they are available after the lifetime of the parent merger
* modify auto column serializers to use segment write out directory for temp files instead of java.io.tmpdir
* fix queryable index projection to not put the time-like column as a dimension, instead only adding it as __time
* use smoosh for temp files so can safely write any Serializer to a temp smoosh
2024-11-15 16:46:04 -08:00
Rishabh Singh 7f335ff486
Resolve CVEs: Upgrade jetty version and suppress azure cve (#17385) 2024-11-15 10:55:02 +05:30
Katya Macedo 75d9ece665
Docs: update descriptions and default values (#17473) 2024-11-13 16:29:27 -08:00
zachjsh b0c73d7c2a
Add 'ingest/notices/time' metric to statsd emitter (#17468)
* SQL syntax error should target USER persona

* * revert change to queryHandler and related tests, based on review comments

* * add test

* Add 'ingest/notices/time' metric to statsd emitter

This metric gives the milliseconds taken to process a notice by the supervisor.
2024-11-13 12:17:01 -05:00
Akshat Jain 390c2d68c8
Remove `intellij-inspections` check from CI (#17469) 2024-11-13 18:58:17 +05:30
Kiran Gadhave 1dbd005df6
updated docs with behavior for empty collections in pod template selector config (#17464) 2024-11-12 13:21:27 -08:00