Commit Graph

4174 Commits

Author SHA1 Message Date
Jonathan Wei 5d1e66b8f9
Allow broker to use catalog for datasource schemas for SQL queries (#15469)
* Allow broker to use catalog for datasource schemas

* More PR comments

* PR comments
2024-01-08 13:46:08 -06:00
AmatyaAvadhanula 63bfb3e6c9
Handle half-eternity intervals while fetching segments with created dates (#15608)
* Handle all intervals while fetching segments with created dates
2024-01-08 12:07:11 +05:30
George Shiqi Wu 9fe67958be
Increase ServerConnector accept queue size (#15596)
* Allow overwriting ServerConnector accept queue size

* Use a single config

* Fix spacing

* fix spacing

* fixed value

* read value from environment

* fix spacing

* Unpack value before reading

* check somaxconn on linux only
2024-01-05 12:04:15 -05:00
Gian Merlino e40b96e026
Reverse lookup fixes and enhancements. (#15611)
* Reverse lookup fixes and enhancements.

1) Add a "mayIncludeUnknown" parameter to DimFilter#optimize. This is important
   because otherwise the reverse-lookup optimization is done improperly when
   the "in" filter appears under a "not", and the lookup extractionFn may return
   null for some possible values of the filtered column. The "includeUnknown" test
   cases in InDimFilterTest illustrate the difference in behavior.

2) Enhance InDimFilter#optimizeLookup to handle "mayIncludeUnknown", and to be able
   to do a reverse lookup in a wider variety of cases.

3) Make "unapply" protected in LookupExtractor, and move callers to "unapplyAll".
   The main reason is that MapLookupExtractor, a common implementation, lacks a
   reverse mapping and therefore does a scan of the map for each call to "unapply".
   For performance sake these calls need to be batched.

* Remove optimize call from BloomDimFilter.

* Follow the law.

* Fix tests.

* Fix imports.

* Switch function.

* Fix tests.

* More tests.
2024-01-03 13:28:44 -08:00
Abhishek Radhakrishnan f0f428274a
Prometheus config property doc fixup (#15613)
* Minor fixes

* Update docs/development/extensions-contrib/prometheus.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-01-02 16:28:42 -08:00
kaisun2000 a5e9b14be0
Add delay before the peon drops the segments after publishing them (#15373)
Currently in the realtime ingestion (Kafka/Kinesis) case, after publishing the segments, upon acknowledgement from the coordinator that the segments are already placed in some historicals, the peon would unannounce the segments (basically saying the segments are not in this peon anymore to the whole cluster) and drop the segments from cache and sink timeline in one shot.

The in transit queries from the brokers that still thinks the segments are in the peon can get a NullPointer exception when the peon is unsetting the hydrants in the sinks.

The fix would let the peon to wait for a configurable delay period before dropping segments, remove segments from cache etc after the peon unannounce the segments.

This delayed approach is similar to how the historicals handle segments moving out.
2024-01-02 11:08:28 +05:30
Kashif Faraz 9f568858ef
Add logging implementation for AuditManager and audit more endpoints (#15480)
Changes
- Add `log` implementation for `AuditManager` alongwith `SQLAuditManager`
- `LoggingAuditManager` simply logs the audit event. Thus, it returns empty for
all `fetchAuditHistory` calls.
- Add new config `druid.audit.manager.type` which can take values `log`, `sql` (default)
- Add new config `druid.audit.manager.logLevel` which can take values `DEBUG`, `INFO`, `WARN`.
This gets activated only if `type` is `log`.
- Remove usage of `ConfigSerde` from `AuditManager` as audit is not just limited to configs
- Add `AuditSerdeHelper` for a single implementation of serialization/deserialization of
audit payload and other utility methods.
2023-12-19 13:14:04 +05:30
Kashif Faraz feeb4f0fb0
Allocate pending segments at latest committed version (#15459)
The segment allocation algorithm reuses an already allocated pending segment if the new allocation request is made for the same parameters:

datasource
sequence name
same interval
same value of skipSegmentLineageCheck (false for batch append, true for streaming append)
same previous segment id (used only when skipSegmentLineageCheck = false)
The above parameters can thus uniquely identify a pending segment (enforced by the UNIQUE constraint on the sequence_name_prev_id_sha1 column in druid_pendingSegments metadata table).

This reuse is done in order to

allow replica tasks (in case of streaming ingestion) to use the same set of segment IDs.
allow re-run of a failed batch task to use the same segment ID and prevent unnecessary allocations
2023-12-14 16:18:39 +05:30
AmatyaAvadhanula 91ca8e73d6
Skip compaction for datasources with partial-eternity segments (#15542)
This PR builds on #13304 to skip compaction for datasources with segments that have their interval start or end coinciding with Eternity interval end-points.

This is needed in order to prevent an issue similar to #13208 as the Coordinator tries to iterate over a large number of intervals when trying to compact an interval with infinite start or end.
2023-12-12 15:06:45 +05:30
zachjsh ab7d9bc6ec
Add api for Retrieving unused segments (#15415)
### Description

This pr adds an api for retrieving unused segments for a particular datasource. The api supports pagination by the addition of `limit` and `lastSegmentId` parameters. The resulting unused segments are returned with optional `sortOrder`, `ASC` or `DESC` with respect to the matching segments `id`, `start time`, and `end time`, or not returned in any guarenteed order if `sortOrder` is not specified

`GET /druid/coordinator/v1/datasources/{dataSourceName}/unusedSegments?interval={interval}&limit={limit}&lastSegmentId={lastSegmentId}&sortOrder={sortOrder}`

Returns a list of unused segments for a datasource in the cluster contained within an optionally specified interval.
Optional parameters for limit and lastSegmentId can be given as well, to limit results and enable paginated results.
The results may be sorted in either ASC, or DESC order depending on specifying the sortOrder parameter.

`dataSourceName`: The name of the datasource
`interval`:                 the specific interval to search for unused segments for.
`limit`:                      the maximum number of unused segments to return information about. This property helps to
                                 support pagination
`lastSegmentId`:     the last segment id from which to search for results. All segments returned are > this segment 
                                 lexigraphically if sortOrder is null or ASC, or < this segment lexigraphically if sortOrder is DESC.
`sortOrder`:            Specifies the order with which to return the matching segments by start time, end time. A null
                                value indicates that order does not matter.

This PR has:

- [x] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
2023-12-11 16:32:18 -05:00
Abhishek Radhakrishnan 96be82a3e6
Clean up duty for non-overlapping eternity tombstones (#15281)
* Add initial draft of MarkDanglingTombstonesAsUnused duty.

* Use overshadowed segments instead of all used segments.

* Add unit test for MarkDanglingSegmentsAsUnused duty.

* Add mock call

* Simplify code.

* Docs

* shorter lines formatting

* metric doc

* More tests, refactor and fix up some logic.

* update javadocs; other review comments.

* Make numCorePartitions as 0 in the TombstoneShardSpec.

* fix up test

* Add tombstone core partition tests

* Update docs/design/coordinator.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* review comment

* Minor cleanup

* Only consider tombstones with 0 core partitions

* Need to register the test shard type to make jackson happy

* test comments

* checkstyle

* fixup misc typos in comments

* Update logic to use overshadowed segments

* minor cleanup

* Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone.

* Address review feedback.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-12-11 08:57:15 -08:00
Clint Wylie 557f3f6f57
add array column type support to EXTEND operator (#15458) 2023-12-06 23:21:35 -08:00
Rishabh Singh 6a64f72c67
Lookup on incomplete partition set in SegmentMetadataQuerySegmentWalker (#15496)
Description
With CentralizedDatasourceSchema (#14989) feature enabled, metadata for appended segments was not being refreshed. This caused numRows to be 0 for the new segments and would probably cause the datasource schema to not include columns from the new segments.

Analysis
The problem turned out in the new QuerySegmentWalker implementation in the Coordinator. It first finds the segment to be queried in the Coordinator timeline. Then it creates a new timeline of the segments present in the timeline.
The problem was that it is looking up complete partition set in the new timeline. Since the appended segments by themselves do not make a complete partition set, no SegmentMetadataQuery were executed.
2023-12-06 15:25:28 +05:30
Abhishek Radhakrishnan f4949afdd7
clarify and fixup typos related to unused segments in docs and javadocs. (#15498) 2023-12-05 22:30:32 -08:00
Vishesh Garg 326b7b731d
Upgrade zookeeper from 3.5.10 to 3.8.3 (#15477)
Upgrade zookeeper from 3.5.10 to 3.8.3
2023-12-05 18:57:56 +05:30
Rishabh Singh d968bb3f43
Rename config for enabling CentralizedDatasourceSchema feature (#15476)
* Rename property to druid.centralizedDatasourceSchema.enabled
* Update config name in docker-compose
2023-12-05 16:57:25 +05:30
Jan Werner 8cc256b079
update guava to 32.0.1-jre to address CVEs (#15482)
Update guava to 32.0.1-jre to address two CVEs: CVE-2020-8908, CVE-2023-2976
This change requires a minor test change to remove assumptions about ordering.

---------

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2023-12-04 13:18:42 -08:00
Pranav 9f3b26676d
Log full stack when exception message is null (#15467) 2023-11-30 16:47:37 -08:00
Kashif Faraz 8ddb847658
Fix message when skipping compaction (#15460) 2023-11-30 14:57:13 +05:30
Vadim Ogievetsky 31fa63e789
Web console: better management proxy detection (#15453)
* better management proxy detection

* fix checkstyle issue

* add test

* test should read the body also

* use ObjectMapper

* assert read ammount
2023-11-29 21:43:42 -08:00
Kashif Faraz 58a724c7e4
Use StubServiceEmitter in tests (#15426)
* Use StubServiceEmitter in tests
* Remove unthrown exception from declaration
2023-11-28 09:43:09 +05:30
Karan Kumar a0188192de
Fixing failing compaction/parallel index jobs during upgrade due to new actions being available on the overlord. (#15430)
* Fixing failing compaction/parallel index jobs during upgrade due to new actions not available on the overlord.

* Fixing build

* Removing extra space.

* Fixing json getter.

* Review comments.
2023-11-25 13:50:29 +05:30
Gian Merlino 5ccd79d62b
Fix NPE caused by realtime segment closing race, fix possible missing-segment retry bug. (#15260)
* Fix NPE caused by realtime segment closing race, fix possible missing-segment retry bug.

Fixes #12168, by returning empty from FireHydrant when the segment is
swapped to null. This causes the SinkQuerySegmentWalker to use
ReportTimelineMissingSegmentQueryRunner, which causes the Broker to look
for the segment somewhere else.

In addition, this patch changes SinkQuerySegmentWalker to acquire references
to all hydrants (subsegments of a sink) at once, and return a
ReportTimelineMissingSegmentQueryRunner if *any* of them could not be acquired.
I suspect, although have not confirmed, that the prior behavior could lead to
segments being reported as missing even though results from some hydrants were
still included.

* Some more test coverage.
2023-11-20 10:48:14 -08:00
Abhishek Radhakrishnan 470c8ed7b0
Make `numCorePartitions` as 0 for tombstones (#15379)
* Make numCorePartitions as 0 in the TombstoneShardSpec.

* fix up test

* Add tombstone core partition tests

* review comment

* Need to register the test shard type to make jackson happy
2023-11-20 09:42:51 -08:00
Rishabh Singh 03a092f4ec
Document Nuances in SegmentMetadataCache Behaviour (#15367)
* Document segment metadata cache behaviour

* Fix typo

* Minor update

* Minor change
2023-11-15 18:41:20 +05:30
Rishabh Singh 5446494e63
Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355)
In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information.

This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.
2023-11-14 12:52:33 +05:30
AmatyaAvadhanula 895e53555c
Optimize mark segments as unused (#15352) 2023-11-09 15:13:45 +05:30
Pranav e2fde8c516
Refactor lookups behavior while loading/dropping the containers (#14806) 2023-11-07 10:07:28 -08:00
Abhishek Radhakrishnan 2136dc3591
Batch segment retrieval from the metadata store (#15305)
* Add a unit test that fails when used segments with too many intervals are retrieved.

- This is a failing test case that needs to be ignored.

* Batch the intervals (use 100 as it's consistent with batching in other places).

* move the filtering inside the batch

* Account for limit cross the batch splits.

* Adjustments

* Fixup and add tests

* small refactor

* add more tests.

* remove wrapper.

* Minor edits

* assert out of range
2023-11-06 11:30:24 -08:00
Rishabh Singh 8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
Gian Merlino 98f1eb8ede
Use filters for pruning properly for hash-joins. (#15299)
* Use filters for pruning properly for hash-joins.

Native used them too aggressively: it might use filters for the RHS
to prune the LHS. MSQ used them not at all. Now, both use them properly,
pruning based on base (LHS) columns only.

* Fix tests.

* Fix style.

* Clear filterFields too.

* Update.
2023-11-03 07:29:16 -07:00
Gian Merlino d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
Laksh Singla b82ad59dfe
Better logging in ServiceClientImpl (#15269)
ServiceClientImpl logs the cause of every retry, even though we are retrying the connection attempt. This leads to slight pollution in the logs because a lot of the time, the reason for retrying is the same. This is seen primarily in MSQ, when the worker task hasn't launched yet however controller attempts to connect to the worker task, which can lead to scary-looking messages (with INFO log level), even though they are normal.
This PR changes the logging logic to log every 10 (arbitrary number) retries instead of every retry, to reduce the pollution of the logs.
Note: If there are no retries left, the client returns an exception, which would get thrown up by the caller, and therefore this change doesn't hide any important information.
2023-11-02 11:32:49 +05:30
Gian Merlino 6b6d73b5d4
Use min of scheduler threads and server threads for subquery guardrails. (#15295)
* Use min of scheduler threads and server threads for subquery guardrails.

This allows more memory to be used for subqueries when the query scheduler
is configured to limit queries below the number of server threads. The patch
also refactors the code so SubqueryGuardrailHelper is provided by a Guice
Provider rather than being created by ClientQuerySegmentWalker, to achieve
better separation of concerns.

* Exclude provider from coverage.
2023-11-01 22:34:53 -07:00
Gian Merlino 37e158c2c4
Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. (#15300)
* Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN.

Prior to this patch, columnar frames would always write multi-valued columns if
the input column had hasMultipleValues = UNKNOWN. This had the effect of flipping
UNKNOWN to TRUE when copying data into frames, which is problematic because TRUE
causes expressions to assume that string inputs must be treated as arrays.

We now avoid this by flipping UNKNOWN to FALSE if no multi-valuedness
is encountered, and flipping it to TRUE if multi-valuedness is encountered.

* Add regression test case.
2023-11-01 22:05:53 -07:00
Suneet Saldanha e6b7c36e74
LoadRules with 0 replicas should be treated as handoff complete (#15274)
* LoadRules with 0 replicas should be treated as handoff complete

* fix it

* pr feedback

* fixit
2023-10-30 10:42:58 -07:00
George Shiqi Wu 3173093415
Handle status failures for streaming supervisors (#15174)
* Cleanup logic

* newline

* remove whitespace

* Fix log message

* Add test class

* PR changes
2023-10-30 10:21:23 -07:00
kaisun2000 60c2ad597a
Enhance json parser error logging to better track Istio Proxy error message (#15176)
Currently the inter Druid communication via rest endpoints is based on json formatted payload. Upon parsing error, there is only a generic exception stating expected json token type and current json token type. There is no detailed error log about the content of the payload causing the violation.

In the micro-service world, the trend is to deploy the Druid servers in k8 with the mesh network. Often the istio proxy or other proxies is used to intercept the network connection between Druid servers. The proxy may give error messages for various reasons. These error messages are not expected by the json parser. The generic error message from Druid can be very misleading as the user may think the message is based on the response from the other Druid server.

For example, this is an example of mysterious error message

QueryInterruptedException{msg=Next token wasn't a START_ARRAY, was[VALUE_STRING] from url[http://xxxxx:8088/druid/v2/], code=Unknown exception, class=org.apache.druid.java.util.common.IAE, host=xxxxx:8088}"

While the context of the message is the following from the proxy when it can't tunnel the network connection.

pstream connect error or disconnect/reset before header

So this very simple PR is just to enhance the logging and get the real underlying message printed out. This would save a lot of head scratching time if Druid is deployed with mesh network.

Co-authored-by: Kai Sun <kai.sun@salesforce.com>
2023-10-27 14:20:19 +05:30
AmatyaAvadhanula 65b69cded4
Filter pending segments upgraded with transactional replace (#15169)
* Filter pending segments upgraded with transactional replace

* Push sequence name filter to metadata query
2023-10-23 21:18:47 +05:30
AmatyaAvadhanula a8febd457c
A Replacing task must read segments created before it acquired its lock (#15085)
* Replacing tasks must read segments created before they acquired their locks
2023-10-19 11:13:07 +05:30
AmatyaAvadhanula d25caaefa4
Add support for streaming ingestion with concurrent replace (#15039)
Add support for streaming ingestion with concurrent replace

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-10-13 09:09:03 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
Laksh Singla 95bf331c08
Rename the default setting of 'maxSubqueryBytes' from 'unlimited' to 'disabled' (#15108)
The default setting of 'maxSubqueryBytes' is renamed from 'unlimited' to 'disabled'.
2023-10-10 02:03:29 +05:30
Adarsh Sanjeev 7a35ce886d
Add ability for MSQ tasks to query realtime tasks (#15024)
This PR aims to add the capabilities to:
1. Fetch the realtime segment metadata from the coordinator server view,
2. Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.
2023-10-09 15:14:03 +05:30
kaisun2000 e2cc1c4ad1
Add metric -- count of queries waiting for merge buffers (#15025)
Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools.
2023-10-09 12:56:23 +05:30
Pranav f1edd671fb
Exposing optional replaceMissingValueWith in lookup function and macros (#14956)
* Exposing optional replaceMissingValueWith in lookup function and macros

* args range validation

* Updating docs

* Addressing comments

* Update docs/querying/sql-scalar.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/querying/sql-functions.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Addressing comments

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-10-02 17:09:23 -07:00
Parth Agrawal d038237ece
memcached cache: switch to AWS elasticache-java-cluster-client and add TLS support (#14827)
This PR updates the library used for Memcached client to AWS Elasticache Client : https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-java

This enables us to use the option of encrypting data in transit:
Amazon ElastiCache for Memcached now supports encryption of data in transit

For clusters running the Memcached engine, ElastiCache supports Auto Discovery—the ability for client programs to automatically identify all of the nodes in a cache cluster, and to initiate and maintain connections to all of these nodes.
Benefits of Auto Discovery - Amazon ElastiCache

AWS has forked spymemcached 2.12.1, and has since added all the patches included in 2.12.2 and 2.12.3 as part of the 1.2.0 release. So, this can now be considered as an equivalent drop-in replacement.

GitHub - awslabs/aws-elasticache-cluster-client-memcached-for-java: Amazon ElastiCache Cluster Client for Java - enhanced library to connect to ElastiCache clusters.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticache/AmazonElastiCacheClient.html#AmazonElastiCacheClient--

How to enable TLS with Elasticache

On server side:
https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/in-transit-encryption-mc.html#in-transit-encryption-enable-existing-mc

On client side:
GitHub - awslabs/aws-elasticache-cluster-client-memcached-for-java: Amazon ElastiCache Cluster Client for Java - enhanced library to connect to ElastiCache clusters.
2023-10-02 12:51:05 -07:00
Gian Merlino 3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalze
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
2023-09-27 08:51:14 +05:30
AmatyaAvadhanula c62193c4d7
Add support for concurrent batch Append and Replace (#14407)
Changes:
- Add task context parameter `taskLockType`. This determines the type of lock used by a batch task.
- Add new task actions for transactional replace and append of segments
- Add methods StorageCoordinator.commitAppendSegments and commitReplaceSegments
- Upgrade segments to appropriate versions when performing replace and append
- Add new metadata table `upgradeSegments` to track segments that need to be upgraded
- Add tests
2023-09-25 07:06:37 +05:30
Kashif Faraz d7c152c82c
Add a TaskReport for "kill" tasks (#15023)
- Add `KillTaskReport` that contains stats for `numSegmentsKilled`,
`numBatchesProcessed`, `numSegmentsMarkedAsUnused`
- Fix bug where exception message had no formatter but was was still being passed some args.
- Add some comments regarding deprecation of `markAsUnused` flag.
2023-09-23 07:44:27 +05:30