Commit Graph

2960 Commits

Author SHA1 Message Date
Clint Wylie e5661a394c
refactor front-coded into static classes instead of using functional interfaces (#14572)
* refactor front-coded into static classes instead of using functional interfaces

* shared v0 static method instead of copy
2023-08-04 10:52:36 -07:00
Soumyava 0d73480c8f
Latest aggregator factories should accept time as VectorValueSelecto… (#14753)
Fix the queries that have latest aggregator with an expression as time column
2023-08-04 13:04:25 +05:30
imply-cheddar 748874405c
Minimize PostAggregator computations (#14708)
* Minimize PostAggregator computations

Since a change back in 2014, the topN query has been computing
all PostAggregators on all intermediate responses from leaf nodes
to brokers.  This generates significant slow downs for queries
with relatively expensive PostAggregators.  This change rewrites
the query that is pushed down to only have the minimal set of
PostAggregators such that it is impossible for downstream
processing to do too much work.  The final PostAggregators are
applied at the very end.
2023-08-04 00:04:31 +05:30
Clint Wylie 94fb41a4df
fix nested field virtual column array column element vector object selector (#14729)
Fixes a case I missed in #14688 when the return type is STRING but its coming from a top level array typed column instead of a nested array column while making a vector object selector.

Also while here I noticed that the internal JSON_VALUE functions for array types were named inconsistently with the non-array functions, so I renamed them. These are not documented so it should not be disruptive in any way, since they are only used internally for rewrites while planning to make the correctly virtual column.

JSON_VALUE_RETURNING_ARRAY_VARCHAR -> JSON_VALUE_ARRAY_VARCHAR
JSON_VALUE_RETURNING_ARRAY_BIGINT -> JSON_VALUE_ARRAY_BIGINT
JSON_VALUE_RETURNING_ARRAY_DOUBLE -> JSON_VALUE_ARRAY_DOUBLE
The internal non-array functions are JSON_VALUE_VARCHAR, JSON_VALUE_BIGINT, and JSON_VALUE_DOUBLE.
2023-08-02 17:08:24 +05:30
Clint Wylie 2e456d25ae
fix issue with ExprEval.bestEffortOf and mixed type arrays containing ARRAY<COMPLEX<json>> and other complicated casts (#14710) 2023-08-01 09:25:14 -07:00
Adarsh Sanjeev 21d023b62b
Handle taskIds which are not found in the overlord correctly (#14706)
This PR has fixes a bug in the SqlStatementAPI where if the task is not found on the overlord, the response status is 500.
This changes the response to invalid input since the queryID passed is not valid.
2023-07-31 21:38:14 +05:30
Clint Wylie 5f72f4f37d
fixes for nested virtual column array element vector selectors and fixes for variant and nested variant numeric columns
* fix issue with nested virtual column array element vector selectors when input is numeric array but output is non-numeric
* add vector value selector for mixed numeric type variant and nested variant fields, tests
2023-07-28 15:14:29 -07:00
Gian Merlino 6517fc2796
Save a metadata call when reading files from CloudObjectInputSource. (#14677)
* Save a metadata call when reading files from CloudObjectInputSource.

The call to createSplits(inputFormat, null) in formattableReader would
use the default split hint spec, MaxSizeSplitHintSpec, which makes
getObjectMetadata calls in order to compute its splits. This isn't
necessary; we're just trying to unpack the files inside the input
source.

To fix this, use FilePerSplitHintSpec to extract files without any
funny business.

* Adjust call.

* Fix constant.

* Test coverage.
2023-07-28 13:31:03 -07:00
Gian Merlino 46ecc6b900
Frames support for string arrays that are null. (#14653)
* Frames support for string arrays that are null.

The row format represents null arrays as 0x0001, which older readers
would interpret as an empty array. This provides compatibility with
older readers, which is useful during updates.

The column format represents null arrays by writing -(actual length) - 1
instead of the length, and using FrameColumnWriters.TYPE_STRING_ARRAY for
the type code for string arrays generally. Older readers will report this
as an unrecognized type code. Column format is only used by the operator
query, which is currently experimental, so the impact isn't too severe.

* Remove unused import.

* Return Object[] instead of List from frame array selectors.

Update MSQSelectTest and MSQInsertTest to reflect the fact that null
arrays are possible.

Add a bunch of javadocs to object selectors describing expected behavior,
including the requirement that array selectors return Object[].

* update test case.

* Update test cases.
2023-07-28 10:23:39 -07:00
Clint Wylie d406bafdfc
fix issues with equality and range filters matching double values to long typed inputs (#14654)
* fix issues with equality and range filters matching double values to long typed inputs
* adjust to ensure we never homogenize null, [], and [null] into [null] for expressions on real array columns
2023-07-27 16:01:21 -07:00
TSFenwick 9a9038c7ae
Speed up kill tasks by deleting segments in batch (#14131)
* allow for batched delete of segments instead of deleting segment data one by one

create new batchdelete method in datasegment killer that has default functionality
of iterating through all segments and calling delete on them. This will enable
a slow rollout of other deepstorage implementations to move to a batched delete
on their own time

* cleanup batchdelete segments

* batch delete with the omni data deleter

cleaned up code
just need to add tests and docs for this functionality

* update java doc to explain how it will try to use batch if function is overwritten

* rename killBatch to kill
add unit tests

* add omniDataSegmentKillerTest for deleting multiple segments at a time. fix checkstyle

* explain test peculiarity better

* clean up batch kill in s3.

* remove unused return value. cleanup comments and fix checkstyle

* default to batch delete. more specific java docs. list segments that couldn't be deleted
if there was a client error or server error

* simplify error handling

* add tests where an exception is thrown when killing multiple s3 segments

* add test for failing to delete two calls with the s3 client

* fix javadoc for kill(List<DataSegment> segments) clean up tests remove feature flag

* fix typo in javadocs

* fix test failure

* fix checkstyle and improve tests

* fix intellij inspections issues

* address comments, make delete multiple segments not assume same bucket

* fix test errors

* better grammar and punctuation. fix test. and better logging for exception

* remove unused code

* avoid extra arraylist instantiation

* fix broken test

* fix broken test

* fix tests to use assert.throws
2023-07-27 15:34:44 -07:00
Abhishek Agarwal 52fbc6939e
Fix the bug in string last vector aggregation (#14655) 2023-07-26 09:18:03 +05:30
Gian Merlino 2f9619a96f
Use OverlordClient for all Overlord RPCs. (#14581)
* Use OverlordClient for all Overlord RPCs.

Continuing the work from #12696, this patch removes HttpIndexingServiceClient
and the IndexingService flavor of DruidLeaderClient completely. All remaining
usages are migrated to OverlordClient.

Supporting changes include:

1) Add a variety of methods to OverlordClient.

2) Update MetadataTaskStorage to skip the complete-task lookup when
   the caller requests zero completed tasks. This helps performance of
   the "get active tasks" APIs, which don't want to see complete ones.

* Use less forbidden APIs.

* Fixes from CI.

* Add test coverage.

* Two more tests.

* Fix test.

* Updates from CR.

* Remove unthrown exceptions.

* Refactor to improve testability and test coverage.

* Add isNil tests.

* Remove unnecessary "deserialize" methods.
2023-07-24 21:14:27 -07:00
Abhishek Agarwal efb32810c4
Clean up the core API required for Iceberg extension (#14614)
Changes:
- Replace `AbstractInputSourceBuilder` with `InputSourceFactory`
- Move iceberg specific logic to `IcebergInputSource`
2023-07-21 13:01:33 +05:30
Karan Kumar 77e0c16bce
Sql statement api error messaging fixes. (#14629)
* Error messaging fixes.

* Static check fix

* Review comments
2023-07-20 22:48:44 +05:30
Clint Wylie 024ce40f1a
reduce heap footprint of ingesting auto typed columns by pushing compression and index generation into writeTo (#14615) 2023-07-20 00:54:58 -07:00
Gian Merlino bac5ef347c
Add ingest/input/bytes metric and Kafka consumer metrics. (#14582)
* Add ingest/input/bytes metric and Kafka consumer metrics.

New metrics:

1) ingest/input/bytes. Equivalent to processedBytes in the task reports.

2) kafka/consumer/bytesConsumed: Equivalent to the Kafka consumer
   metric "bytes-consumed-total". Only emitted for Kafka tasks.

3) kafka/consumer/recordsConsumed: Equivalent to the Kafka consumer
   metric "records-consumed-total". Only emitted for Kafka tasks.

* Fix anchor.

* Fix KafkaConsumerMonitor.

* Interface updates.

* Doc changes.

* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTask.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-07-20 10:56:22 +08:00
Clint Wylie 68fd22169f
remove extractionFn from equality, null, and range filters (#14612)
* remove extractionFn from equality, null, and range filters
changes:
* EqualityFilter, NullFilter, and RangeFilter no longer support extractionFn
* SQL planner will use ExpressionFilter in the small number of cases where an extractionFn would have been used if sqlUseBoundsAndSelectors is set to false instead of equality/null/range filters
* fix bugs and add tests with serde, equals, and cache key for null, equality, and range filters

* test coverage fixes bugs

* adjust

* adjust again

* so persnickety
2023-07-19 10:37:57 -07:00
Abhishek Radhakrishnan f4d0ea7bc8
Add support for earliest `aggregatorMergeStrategy` (#14598)
* Add EARLIEST aggregator merge strategy.

- More unit tests.
- Include the aggregators analysis type by default in tests.

* Docs.

* Some comments and a test

* Collapse into individual code blocks.
2023-07-18 12:37:10 -07:00
Clint Wylie 913416c669
add equality, null, and range filter (#14542)
changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
* javadoc for sql test filter functions
* range filter support for arrays, tons more tests, fixes
* add dimension selector tests for mixed type roots
* support json equality
* rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes
* add cooler equality index maker, ValueIndexes 
* fix missing string utf8 index supplier
* expression array comparator stuff
2023-07-18 12:15:22 -07:00
AmatyaAvadhanula 0412f40d36
Prepare master branch for next release, 28.0.0 (#14595)
* Prepare master branch for next release, 28.0.0
2023-07-18 09:22:30 +05:30
Atul Mohan 03d6d395a0
Extension to read and ingest iceberg data files (#14329)
This adds a new contrib extension: druid-iceberg-extensions which can be used to ingest data stored in Apache Iceberg format. It adds a new input source of type iceberg that connects to a catalog and retrieves the data files associated with an iceberg table and provides these data file paths to either an S3 or HDFS input source depending on the warehouse location.

Two important dependencies associated with Apache Iceberg tables are:

Catalog : This extension supports reading from either a Hive Metastore catalog or a Local file-based catalog. Support for AWS Glue is not available yet.
Warehouse : This extension supports reading data files from either HDFS or S3. Adapters for other cloud object locations should be easy to add by extending the AbstractInputSourceAdapter.
2023-07-18 08:59:57 +05:30
Laksh Singla c1c7dff2ad
Using DruidExceptions in MSQ (changes related to the Broker) (#14534)
MSQ engine returns correct error codes for invalid user inputs in the query context. Also, using DruidExceptions for MSQ related errors happening in the Broker with improved error messages.
2023-07-13 19:08:49 +00:00
Abhishek Radhakrishnan f4ee58eaa8
Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560)
* Add aggregatorMergeStrategy property to SegmentMetadaQuery.

- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.

* Don't explicitly set strict strategy in tests

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/querying/segmentmetadataquery.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 12:37:36 -04:00
Sam Rash 0dcb19f7e3
Add Continuous Profiling to Unit Tests (#14506)
Uses a custom continusou jfr profiler.

Modifies the github actions for tests to do profiling only in the case
of jdk17, as the profiler requires jdk17+ to use the JFR streaming API
plus a few other language features in the code.

Continuous Profiling service is provided to the Apache Druid project
free of charge by Imply and any committer can request free access to
the UI.
2023-07-12 17:50:38 -07:00
imply-cheddar 65e1b27aa7
Fix a resource leak with Window processing (#14573)
* Fix a resource leak with Window processing

Additionally, in order to find the leak, there were
adjustments to the StupidPool to track leaks a bit better.
It would appear that the pool objects get GC'd during testing
for some reason which was causing some incorrect identification
of leaks from objects that had been returned but were GC'd along
with the pool.

* Suppress unused warning
2023-07-12 17:25:42 -05:00
Gian Merlino 3ff51487b7
Add ZooKeeper connection state alerts and metrics. (#14333)
* Add ZooKeeper connection state alerts and metrics.

- New metric "zk/connected" is an indicator showing 1 when connected,
  0 when disconnected.
- New metric "zk/disconnected/time" measures time spent disconnected.
- New alert when Curator connection state enters LOST or SUSPENDED.

* Use right GuardedBy.

* Test fixes, coverage.

* Adjustment.

* Fix tests.

* Fix ITs.

* Improved injection.

* Adjust metric name, add tests.
2023-07-12 09:34:28 -07:00
Gian Merlino 3711c0d987
Reduce heap footprint of GenericIndexed. (#14563)
Two changes:

1) Intern DecompressingByteBufferObjectStrategy. Saves ~32 bytes per column.

2) Split GenericIndexed into GenericIndexed.V1 and GenericIndexed.V2. The
   major benefit here is isolating out the ByteBuffers that are only needed
   for V2. This saves ~80 bytes for V1 (one buffer instead of two).
2023-07-12 08:11:41 -07:00
Gian Merlino cc8b210e4c
AggregatorFactory: Use guessAggregatorHeapFootprint when factorizeWithSize is not implemented. (#14567)
There are two ways of estimating heap footprint of an Aggregator:

1) AggregatorFactory#guessAggregatorHeapFootprint
2) AggregatorFactory#factorizeWithSize + Aggregator#aggregateWithSize

When the second path is used, the default implementation of factorizeWithSize
is now updated to delegate to guessAggregatorHeapFootprint, making these equivalent.
The old logic used getMaxIntermediateSize, which is less accurate.

Also fixes a bug where, when using the second path, calling factorizeWithSize
on PassthroughAggregatorFactory would fail because getMaxIntermediateSize was
not implemented. (There is no buffer aggregator, so there would be no need.)
2023-07-12 07:33:27 -07:00
hqx871 7142b0c39e
Enable result level cache for GroupByStrategyV2 on broker (#11595)
Cache is disabled for GroupByStrategyV2 on broker since the pr #3820 [groupBy v2: Results not fully merged when caching is enabled on the broker]. But we can enable the result-level cache on broker for GroupByStrategyV2 and keep the segment-level cache disabled.
2023-07-12 15:00:01 +05:30
Kashif Faraz 58a35bf07e
Deprecate EntryExistsException in Druid 27 and remove in Druid 28 (#14554)
Also deprecate UnknownSegmentIdsException.
2023-07-08 15:40:14 +05:30
Gian Merlino 63ee69b4e8
Claim full support for Java 17. (#14384)
* Claim full support for Java 17.

No production code has changed, except the startup scripts.

Changes:

1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK.

2) Include the full list of opens and exports on both Java 11 and 17.

3) Document that Java 17 is both supported and preferred.

4) Switch some tests from Java 11 to 17 to get better coverage on the
   preferred version.

* Doc update.

* Update errorprone.

* Update docker_build_containers.sh.

* Update errorprone in licenses.yaml.

* Add some more run-javas.

* Additional run-javas.

* Update errorprone.

* Suppress new errorprone error.

* Add exports and opens in ForkingTaskRunner for Java 11+.

Test, doc changes.

* Additional errorprone updates.

* Update for errorprone.

* Restore old fomatting in LdapCredentialsValidator.

* Copy bin/ too.

* Fix Java 15, 17 build line in docker_build_containers.sh.

* Update busybox image.

* One more java command.

* Fix interpolation.

* IT commandline refinements.

* Switch to busybox 1.34.1-glibc.

* POM adjustments, build and test one IT on 17.

* Additional debugging.

* Fix silly thing.

* Adjust command line.

* Add exports and opens one more place.

* Additional harmonization of strong encapsulation parameters.
2023-07-07 12:52:35 -07:00
Karan Kumar afa8c7b8ab
Adding Ability for MSQ to write select results to durable storage. (#14527)
One of the most requested features in druid is to have an ability to download big result sets.
As part of #14416 , we added an ability for MSQ to be queried via a query friendly endpoint. This PR builds upon that work and adds the ability for MSQ to write select results to durable storage.

We write the results to the durable storage location <prefix>/results/<queryId> in the druid frame format. This is exposed to users by
/v2/sql/statements/:queryId/results.
2023-07-07 20:49:48 +05:30
Gian Merlino dd78e00dc5
Fix ColumnSignature error message and jdk17 test issue. (#14538)
* Fix ColumnSignature error message and jdk17 test issue.

On jdk17, the "problem" part of the error message could change from
NullPointerException to:

  Cannot invoke "String.length()" because "s" is null

Due to the new more-helpful NPEs in Java 17. This broke the expectation
and led to test failures on this case.

This patch fixes the problem by improving the error message so it isn't
a generic NullPointerException.

* Fix format.
2023-07-06 15:10:59 -07:00
imply-cheddar 5fc122a144
Add window-focused tests from Drill (#13773)
This commit borrows some test definitions from Drill's test suite
and tries to use them to flesh out the full validation of window
function capbilities.

In order to be able to run these tests, we also add the ability to
run a Scan operation against segments, which also meant an
implementation of RowsAndColumns for frames.
2023-07-06 09:20:32 -07:00
imply-cheddar 277b357256
Optimize IntervalIterator (#14530)
UniformGranularityTest's test to test a large number of intervals
runs through 10 years of 1 second intervals.  This pushes a lot of
stuff through IntervalIterator and shows up in terms of test
runtime as one of the hottest tests.  Most of the time is going to
constructing jodatime objects because it is doing things with
DateTime objects instead of millis.  Change the calls to use
millis instead and things go faster.
2023-07-06 14:44:23 +05:30
Kashif Faraz 87bb1b9709
Fix bug during initialization of HttpServerInventoryView (#14517)
If a server is removed during `HttpServerInventoryView.serverInventoryInitialized`,
the initialization gets stuck as this server is never synced. The method eventually times
out (default 250s).

Fix: Mark a server as stopped if it is removed. `serverInventoryInitialized` only waits for
non-stopped servers to sync.

Other changes:
- Add new metrics for better debugging of slow broker/coordinator startup
  - `segment/serverview/sync/healthy`: whether the server view is syncing properly with a server
  - `segment/serverview/sync/unstableTime`: time for which sync with a server has been unstable  
- Clean up logging in `HttpServerInventoryView` and `ChangeRequestHttpSyncer`
- Minor refactor for readability
- Add utility class `Stopwatch`
- Add tests and stubs
2023-07-06 13:04:53 +05:30
Clint Wylie 277aaa5c57
remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations (#14500)
* combine string column implementations
changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
2023-07-02 19:37:15 -07:00
Gian Merlino 67fbd8e7fc
Add "stringEncoding" parameter to DataSketches HLL. (#11201)
* Add "stringEncoding" parameter to DataSketches HLL.

Builds on the concept from #11172 and adds a way to feed HLL sketches
with UTF-8 bytes.

This must be an option rather than always-on, because prior to this
patch, HLL sketches used UTF-16LE encoding when hashing strings. To
remain compatible with sketch images created prior to this patch -- which
matters during rolling updates and when reading sketches that have been
written to segments -- we must keep UTF-16LE as the default.

Not currently documented, because I'm not yet sure how best to expose
this functionality to users. I think the first place would be in the SQL
layer: we could have it automatically select UTF-8 or UTF-16LE when
building sketches at query time. We need to be careful about this, though,
because UTF-8 isn't always faster. Sometimes, like for the results of
expressions, UTF-16LE is faster. I expect we will sort this out in
future patches.

* Fix benchmark.

* Fix style issues, improve test coverage.

* Put round back, to make IT updates easier.

* Fix test.

* Fix issue with filtered aggregators and add test.

* Use DS native update(ByteBuffer) method. Improve test coverage.

* Add another suppression.

* Fix ITAutoCompactionTest.

* Update benchmarks.

* Updates.

* Fix conflict.

* Adjustments.
2023-06-30 12:45:55 -07:00
Gian Merlino e10e35aa2c
Add REGEXP_REPLACE function. (#14460)
* Add REGEXP_REPLACE function.

Replaces all instances of a pattern with a replacement string.

* Fixes.

* Improve test coverage.

* Adjust behavior.
2023-06-29 13:47:57 -07:00
Gian Merlino a6cabbe10f
SQL: Avoid "intervals" for non-table-based datasources. (#14336)
In these other cases, stick to plain "filter". This simplifies lots of
logic downstream, and doesn't hurt since we don't have intervals-specific
optimizations outside of tables.

Fixes an issue where we couldn't properly filter on a column from an
external datasource if it was named __time.
2023-06-29 09:57:11 +05:30
Gian Merlino 82fbb31c7c
Properly read SQL-compatible segments in default-value mode. (#14142)
* Properly read SQL-compatible segments in default-value mode.

Main changes:

1) Dictionary-encoded and front-coded string columns: in default-value
   mode, detect cases where a dictionary has the empty string in it, then
   either combine it with null (if null is present) or replace it with
   null (if null is not present).

2) Numeric nullable columns: in default-value mode, ignore the null
   value bitmap. This causes all null numbers to be read as zeroes.

Testing strategy:

1) Add a mmappedWithSqlCompatibleNulls case to BaseFilterTest that
   writes segments under SQL-compatible mode, and reads them under
   default-value mode.

2) Unit tests for the new wrapper classes (CombineFirstTwoEntriesIndexed,
   CombineFirstTwoValuesColumnarInts, CombineFirstTwoValuesColumnarMultiInts,
   CombineFirstTwoValuesIndexedInts).

* Fix a mistake, use more singlethreadedness.

* WIP

* Tests, improvements.

* Style.

* See Spot bug.

* Remove unused method.

* Address review comments.

1) Read bitmaps even if we don't retain them.
2) Combine StringFrontCodedDictionaryEncodedColumn and ScalarStringDictionaryEncodedColumn.

* Add missing tests.
2023-06-28 10:30:27 -07:00
Karan Kumar cb3a9d2b57
Adding Interactive API's for MSQ engine (#14416)
This PR aims to expose a new API called
"@path("/druid/v2/sql/statements/")" which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.
2023-06-28 17:51:58 +05:30
imply-cheddar fd20bbd30e
Fix another infinite loop and remove Mockito usage (#14493)
* Fix another infinite loop and remove Mockito usage

The ConfigManager objects were `started()` without ever being
stopped.  This scheduled a poll call that never-ended, to make
matters worse, the poll interval was set to 0 ms, making an
infinite poll with 0 sleep, i.e. an infinite loop.

Also introduce test classes and remove usage of mocks

* Checkstyle
2023-06-27 21:49:27 -07:00
Adarsh Sanjeev 0335aaa279
Add query results directory and prevent the auto cleaner from cleaning it (#14446)
Adds support for automatic cleaning of a "query-results" directory in durable storage. This directory will be cleaned up only if the task id is not known to the overlord. This will allow the storage of query results after the task has finished running.
2023-06-28 10:14:04 +05:30
Abhishek Radhakrishnan 2cfb00b1de
Add missing `isNull()` implementation to `FilteredAggregator` (#14465) 2023-06-27 16:35:15 -07:00
Gian Merlino c78d885b80
Cache parsed expressions and binding analysis in more places. (#14124)
* Cache parsed expressions and binding analysis in more places.

Main changes:

1) Cache parsed and analyzed expressions within PlannerContext for a
   single SQL query.

2) Cache parsed expressions together with input binding analysis using
   a new class AnalyzeExpr.

This speeds up SQL planning, because SQL planning involves parsing
analyzing the same expression strings over and over again.

* Fixes.

* Fix style.

* Fix test.

* Simplify: get rid of AnalyzedExpr, focus on caching.

* Rename parse -> parseExpression.
2023-06-27 13:40:35 -07:00
imply-cheddar 2f0a43790c
Make GuavaUtilsTest use less CPU (#14487) 2023-06-26 21:45:29 -07:00
Clint Wylie 6ba10c8b6c
fix bug with json_value expression array extraction (#14461) 2023-06-26 21:02:44 -07:00
Laksh Singla f546cd64a9
MSQ: Ensure that the allocated segment aligns with the requested granularity (#14475)
Changes:
- Throw an `InsertCannotAllocateSegmentFault` if the allocated segment is not aligned with
the requested granularity.
- Tests to verify new behaviour
2023-06-27 09:25:32 +05:30
Laksh Singla 114380749d
MSQ: Improve the parse exception errors and the handling of null UTF characters in Strings in Frames (#14398) 2023-06-26 18:14:29 +05:30
Laksh Singla 1647d5f4a0
Limit the subquery results by memory usage (#13952)
Users can now add a guardrail to prevent subquery’s results from exceeding the set number of bytes by setting druid.server.http.maxSubqueryRows in Broker's config or maxSubqueryRows in the query context. This feature is experimental for now and would default back to row-based limiting in case it fails to get the accurate size of the results consumed by the query.
2023-06-26 18:12:28 +05:30
Gian Merlino 970288067a
Fix flaky HttpEmitterConfigTest and ParametrizedUriEmitterConfigTest. (#14481)
Recently, we have seen flakiness in these two tests, apparently due to
computations based on Runtime.getRuntime().maxMemory() differing during
static initialization and in the actual tests. I can't think of a reason
why this would be happening, but anyway, this patch switches the tests to
use the statics instead of recomputing Runtime.getRuntime().maxMemory().
2023-06-23 16:27:11 -07:00
imply-cheddar 7e2cf35d7b
Fix compatibility issue with SqlTaskResource (#14466)
* Fix compatibility issue with SqlTaskResource

The DruidException changes broke the response format
for errors coming back from the SqlTaskResource, so fix those
2023-06-23 01:15:32 -07:00
Clint Wylie 31b9d5695d
Extend InitializedNullHandlingTest instead of NullHandlingTest (#14467)
NullHandlingTest is an actual test, it shouldn't be used as a base class
2023-06-22 15:01:50 +05:30
Hardik Bajaj 1ea9158a50
Added new SysMonitorOshi v0 using Oshi library (#14359)
Added a new monitor SysMonitorOshi to replace SysMonitor. The new monitor has a wider support for different machine architectures including ARM instances. Please switch to SysMonitorOshi as SysMonitor is now deprecated and will be removed in future releases.
2023-06-20 20:57:58 +05:30
Kashif Faraz 50461c3bd5
Enable smartSegmentLoading on the Coordinator (#13197)
This commit does a complete revamp of the coordinator to address problem areas:
- Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items
- Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats`
- Configuration: Add dynamic config `smartSegmentLoading` to automatically set
optimal values for all segment loading configs such as `maxSegmentsToMove`,
`replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`.

Changed classes:
- Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move
- Add `SegmentAction` to distinguish between load, replicate, drop and move operations
- Add `SegmentReplicationStatus` to capture current state of replication of all used segments
- Add `SegmentLoadingConfig` to contain recomputed dynamic config values
- Simplify classes `LoadRule`, `BroadcastRule`
- Simplify the `BalancerStrategy` and `CostBalancerStrategy`
- Add several new methods to `ServerHolder` to track loaded and queued segments
- Refactor `DruidCoordinator`

Impact:
- Enable `smartSegmentLoading` by default. With this enabled, none of the following
dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`,
`maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`,
`emitBalancingStats` and `replicantLifetime`.
- Coordinator reports richer metrics and produces cleaner and more informative logs
- Coordinator uses an unlimited load queue for all serves, and makes better assignment decisions
2023-06-19 14:27:35 +05:30
imply-cheddar cfd07a95b7
Errors take 3 (#14004)
Introduce DruidException, an exception whose goal in life is to be delivered to a user.

DruidException itself has javadoc on it to describe how it should be used.  This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used.

This work was a 3rd iteration on top of work that was started by Paul Rogers.  I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.
2023-06-19 01:11:13 -07:00
Adarsh Sanjeev 128133fadc
Add column replication_factor column to sys.segments table (#14403)
Description:
Druid allows a configuration of load rules that may cause a used segment to not be loaded
on any historical. This status is not tracked in the sys.segments table on the broker, which
makes it difficult to determine if the unavailability of a segment is expected and if we should
not wait for it to be loaded on a server after ingestion has finished.

Changes:
- Track replication factor in `SegmentReplicantLookup` during evaluation of load rules
- Update API `/druid/coordinator/v1metadata/segments` to return replication factor
- Add column `replication_factor` to the sys.segments virtual table and populate it in
`MetadataSegmentView`
- If this column is 0, the segment is not assigned to any historical and will not be loaded.
2023-06-18 10:02:21 +05:30
George Shiqi Wu 64af9bfe5b
Add groupId to metrics (#14402)
* Add group id as a dimension

* Revert changes

* Add to forking task runner

* Add missing metrics

* Fix indenting

* revert metrics

* Fix indentation
2023-06-16 09:28:16 -07:00
Clint Wylie 359bd63cc9
allow expression "best effort" type determination to better handle mixed type arrays (#14438) 2023-06-16 00:02:43 -07:00
Clint Wylie ff5ae4db6c
fix kafka input format reader schema discovery and partial schema discovery (#14421)
* fix kafka input format reader schema discovery and partial schema discovery to actually work right, by re-using dimension filtering logic of MapInputRowParser
2023-06-15 00:11:04 -07:00
Clint Wylie ca116cf886
adjust broker parallel merge to help managed blocking be more well behaved (#14427) 2023-06-15 00:10:31 -07:00
Pranav e426d370ea
Start with solo accumulator and empty partition (#14426)
* Starting parallel merge with solo accumulator and empty partitions

* shutshown pool in test
2023-06-14 16:20:48 -07:00
Clint Wylie 8454cc619a
auto columns fixes (#14422)
changes:
* auto columns no longer participate in generic 'null column' handling, this was a mistake to try to support and caused ingestion failures due to mismatched ColumnFormat, and will be replaced in the future with nested common format constant column functionality (not in this PR)
* fix bugs with auto columns which contain empty objects, empty arrays, or primitive types mixed with either of these empty constructs
* fix bug with bound filter when upper is null equivalent but is strict
2023-06-14 08:57:06 -07:00
Abhishek Radhakrishnan be5a6593a9
Reset `RuntimeInfo` to fix flaky test `ParametrizedUriEmitterConfigTest`. (#14405)
* Add injector so JVM settings are correctly set up and bound for the test.

* Add VisibleForTesting IDE annotation.

* spacing
2023-06-13 18:07:51 -07:00
Clint Wylie 61120dc49a
fix Kafka input format to throw ParseException if timestamp is missing (#14413) 2023-06-13 09:00:11 -07:00
Adarsh Sanjeev 267cbac6ff
Add logs for deleting files using storage connector (#14350)
* Add logs for deleting files using storage connector

* Address review comments

* Update log message format
2023-06-11 21:24:30 +05:30
Kashif Faraz 6e158704cb
Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271)
Changes
- Add a `DruidException` which contains a user-facing error message, HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
2023-06-10 12:15:44 +05:30
imply-cheddar 87149d5975
Remove AbstractIndex (#14388)
The class apparently only exists to add a toString()
method to Indexes, which basically just crashes any debugger
on any meaningfully sized index.  It's a pointless
abstract class that basically only causes pain.
2023-06-08 19:52:16 -07:00
Harini Rajendran 4ff6026d30
Adding SegmentMetadataEvent and publishing them via KafkaEmitter (#14281)
In this PR, we are enhancing KafkaEmitter, to emit metadata about published segments (SegmentMetadataEvent) into a Kafka topic. This segment metadata information that gets published into Kafka, can be used by any other downstream services to query Druid intelligently based on the segments published. The segment metadata gets published into kafka topic in json string format similar to other events.
2023-06-02 21:28:26 +05:30
zachjsh e75fb8e8e3
Account for data format and compression in MSQ auto taskAssignment (#14307)
### Description

This change allows for consideration of the input format and compression  when computing how to split the input files among available tasks, in MSQ ingestion, when considering the value of the  `maxInputBytesPerWorker` query context parameter. This query parameter allows users to control the maximum number of bytes, with granularity of input file / object, that ingestion tasks will be assigned to ingest. With this change, this context parameter now denotes the estimated weighted size in bytes of the input to split on, with consideration for input format and compression format, rather than the actual file size, reported by the file system.  We assume uncompressed newline delimited json as a baseline, with scaling factor of `1`. This means that when computing the byte weight that a file has towards the input splitting, we take the file size as is, if uncompressed json, 1:1. It was found during testing that gzip compressed json, and parquet, has scale factors of `4` and `8` respectively, meaning that each byte of data is weighted 4x and 8x respectively, when computing input splits. This weighted byte scaling is only considered for MSQ ingestion that uses either LocalInputSource or CloudObjectInputSource at the moment. The default value of the `maxInputBytesPerWorker` query context parameter has been updated from 10 GiB, to 512 MiB
2023-06-01 12:53:49 -07:00
Clint Wylie 4096f51f0b
add configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319)
This PR adds a new interface to control how SegmentMetadataCache chooses ColumnType when faced with differences between segments for SQL schemas which are computed, exposed as druid.sql.planner.metadataColumnTypeMergePolicy and adds a new 'least restrictive type' mode to allow choosing the type that data across all segments can best be coerced into and sets this as the default behavior.

This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.
2023-05-24 20:32:51 +05:30
Soumyava 22ba457d29
Expr getCacheKey now delegates to children (#14287)
* Expr getCacheKey now delegates to children

* Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it

* Adding an unit test

* Update processing/src/main/java/org/apache/druid/math/expr/Expr.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-23 14:49:38 -07:00
Abhishek Radhakrishnan a5e04d95a4
Add `TYPE_NAME` to the complex serde classes and replace the hardcoded names. (#14317)
* Add TYPE_NAME to the serde classes and reuse them instead of hardcoded strings.

* Static check fixes.
2023-05-23 00:54:47 -05:00
Clint Wylie d92b9fbfac
more resilient segment metadata, dont parallel merge internal segment metadata queries (#14296) 2023-05-17 04:12:55 -07:00
Clint Wylie b038a11280
fix issues with handling arrays with all null elements and arrays of booleans in strict mode (#14297) 2023-05-17 01:33:44 -07:00
Soumyava 96a3c00754
Fixing an issue with filtering on a single dimension by converting In… (#14277)
* Fixing an issue with filtering on a single dimension by converting In filter to a selector filter as needed with Filters.toFilter

* Adding a test so that any future refactoring does not break this behavior

* Made comment a bit more meaningful
2023-05-15 20:10:36 -07:00
imply-cheddar f9861808bc
Be able to load segments on Peons (#14239)
* Be able to load segments on Peons

This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task.  Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask

The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
2023-05-12 16:51:00 -07:00
Kashif Faraz ba11b3d462
Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235)
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
  - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
  - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
2023-05-12 22:39:56 +05:30
Clint Wylie 9875090bee
fix segment metadata queries for auto ingested columns that had all null values (#14262) 2023-05-11 20:58:06 -07:00
Soumyava f128b9b666
Updates to filter processing for inner query in Joins (#14237) 2023-05-11 17:21:41 +05:30
Clint Wylie a58cebe491
add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays (#14236) 2023-05-11 04:43:28 -07:00
Kashif Faraz 64e6283eca
Do not allow retention rules to be null (#14223)
Changes:
- Do not allow retention rules for any datasource or cluster to be null
- Allow empty rules at the datasource level but not at the cluster level
- Add validation to ensure that `druid.manager.rules.defaultRule` is always set correctly
- Minor style refactors
2023-05-11 14:33:56 +05:30
Clint Wylie aaaff74740
fix npe regression in json_value when filtering non-existent paths (#14250)
* fix npe regression in json_value when filtering non-existent paths

* more coverage
2023-05-10 22:39:22 -07:00
Clint Wylie 6db11bfc60
suppress some cves and fix javadoc build when using java 17 (#14241) 2023-05-10 15:47:10 -07:00
Clint Wylie 8805d8d7db
fix issues with filtering nulls on values coerced to numeric types (#14139)
* fix issues with filtering nulls on values coerced to numeric types
* fix issues with 'auto' type numeric columns in default value mode
* optimize variant typed columns without nested data
* more tests for 'auto' type column ingestion
2023-05-08 13:19:02 -07:00
Clint Wylie a7a4bfd331
modify QueryScheduler to lazily acquire lanes when executing queries to avoid leaks (#14184)
This PR fixes an issue that could occur if druid.query.scheduler.numThreads is configured and any exception occurs after QueryScheduler.run has been called to create a Sequence. This would result in total and/or lane specific locks being acquired, but because the sequence was not actually being evaluated, the "baggage" which typically releases these locks was not being executed. An example of how this can happen is if a group-by having filter, which wraps and transforms this sequence happens to explode while wrapping the sequence. The end result is that the locks are acquired, but never released, eventually halting the ability to execute any queries.
2023-05-08 11:42:05 +05:30
Clint Wylie 90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+.

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
Suneet Saldanha 84c11df980
Make LoggingEmitter more useful by using Markers (#14121)
* Make LoggingEmitter more useful

* Skip code coverage for facade classes

* fix spellcheck

* code review

* fix dependency

* logging.md

* fix checkstyle

* Add back jacoco version to main pom
2023-04-27 15:06:06 -07:00
Adarsh Sanjeev 5aa119dfda
Add retry to opening retrying stream (#14126)
* Add retry to opening retrying stream
* Add retry to S3Entity for network issues

* Fix tests and clean up code
2023-04-27 16:52:22 +05:30
Gian Merlino 42c8c84eb6
TimeBoundary: Use cursor when datasource is not a regular table. (#14151)
* TimeBoundary: Use cursor when datasource is not a regular table.

Fixes a bug where TimeBoundary could return incorrect results with
INNER Join or inline data.

* Addl Javadocs.
2023-04-26 17:00:13 -07:00
Gian Merlino 752475b799
Fix two concurrency issues with segment fetching. (#14042)
* Fix two concurrency issues with segment fetching.

1) SegmentLocalCacheManager: Fix a concurrency issue where certain directory
   cleanup happened outside of directoryWriteRemoveLock. This created the
   possibility that segments would be deleted by one thread, while being
   actively downloaded by another thread.

2) TaskDataSegmentProcessor (MSQ): Fix a concurrency issue when two stages
   in the same process both use the same segment. For example: a self-join
   using distributed sort-merge. Prior to this change, the two stages could
   delete each others' segments.

3) ReferenceCountingResourceHolder: increment() returns a new ResourceHolder,
   rather than a Releaser. This allows it to be passed to callers without them
   having to hold on to both the original ResourceHolder *and* a Releaser.

4) Simplify various interfaces and implementations by using ResourceHolder
   instead of Pair and instead of split-up fields.

* Add test.

* Fix style.

* Remove Releaser.

* Updates from master.

* Add some GuardedBys.

* Use the correct GuardedBy.

* Adjustments.
2023-04-25 20:49:27 -07:00
Gian Merlino 2dfb693d4c
Improved handling for zero-length intervals. (#14136)
* Improved handling for zero-length intervals.

1) Return an empty list from VersionedIntervalTimeline.lookup when
   provided with an empty interval. (The logic doesn't quite work when
   intervals are empty, which led to #14129.)

2) Don't return zero-length intervals from JodaUtils.condenseIntervals.

3) Detect "incorrect" comparator in JodaUtils.condenseIntervals, and
   recreate the SortedSet if needed. (Not strictly related to the theme
   of this patch. Just another thing in the same file.)

4) Remove unused method JodaUtils.containOverlappingIntervals.

Fixes #14129.

* Fix TimewarpOperatorTest.
2023-04-25 17:12:56 -07:00
Gian Merlino 89e7948159
MSQ: Subclass CalciteJoinQueryTest, other supporting changes. (#14105)
* MSQ: Subclass CalciteJoinQueryTest, other supporting changes.

The main change is the new tests: we now subclass CalciteJoinQueryTest
in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for
SortMerge.

Two supporting production changes for default-value mode:

1) InputNumberDataSource is marked as concrete, to allow leftFilter to
   be pushed down to it.

2) In default-value mode, numeric frame field readers can now return nulls.
   This is necessary when stacking joins on top of joins: nulls must be
   preserved for semantics that match broadcast joins and native queries.

3) In default-value mode, StringFieldReader.isNull returns true on empty
   strings in addition to nulls. This is more consistent with the behavior
   of the selectors, which map empty strings to null as well in that mode.

As an effect of change (2), the InsertTimeNull change from #14020 (to
replace null timestamps with default timestamps) is reverted. IMO, this
is fine, as either behavior is defensible, and the change from #14020
hasn't been released yet.

* Adjust tests.

* Style fix.

* Additional tests.
2023-04-25 12:10:23 -07:00
Gian Merlino 73f050027b
MSQ: Preserve original ParseException when writing frames. (#14122) 2023-04-25 11:47:15 +05:30
Nicholas Lippis 9d4cc501f7
return task status reported by peon (#14040)
* return task status reported by peon

* Write TaskStatus to file in AbstractTask.cleanUp

* Get TaskStatus from task log

* Fix merge conflicts in AbstractTaskTest

* Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage

* Add license headerss

* Fix style

* Remove unknown exception declarations
2023-04-24 12:05:39 -07:00
TSFenwick accd5536df
Allow for Log4J to be configured for peons but still ensure console logging is enforced (#14094)
* Allow for Log4J to be configured for peons but still ensure console logging is enforced

This change will allow for log4j to be configured for peons but require console logging is still
configured for them to ensure peon logs are saved to deep storage.

Also fixed the test ConsoleLoggingEnforcementTest to use a valid appender for the non console
Config as the previous config was incorrect and would never return a logger.

* fix checkstyle

* add warning to logger when it overwrites all loggers to be console

* optimize calls for altering logging config for ConsoleLoggingEnforcementConfigurationFactory

add getName to the druid logger class

* update docs, and error message

* edit docs to be more clear

* fix checkstyle issues

* CI fixes - LoggerTest code coverage and fix spelling issue for logging docs
2023-04-24 10:41:56 -07:00
Soumyava 8d60edcfcb
Updating segment map function for QueryDataSource to ensure group by … (#14112)
* Updating segment map function for QueryDataSource to ensure group by of group by of join data source gets into proper segment map function path

* Adding unit tests for the failed case

* There you go coverage bot, be happy now
2023-04-20 13:22:29 -07:00
Gian Merlino 9436ee8a63
Nicer error message for CSV with no properties. (#14093)
* Nicer error message for CSV with no properties.

* Take two.

* Adjustments from review, and test fixes.

* Fix test.

* Fix static check.
2023-04-18 12:52:02 -07:00
Clint Wylie e7d2e8b914
fix bug filtering nested columns with expression filters (#14096) 2023-04-17 14:21:32 -07:00
Gian Merlino facd82b493
Add HLLC tests for empty strings that don't pass. (#14085)
I believe the test case illustrates the cause of the problem in #13950.
2023-04-17 15:46:42 +05:30
Gian Merlino 0884a22c41
MSQ: Support for querying lookup and inline data directly. (#14048)
* MSQ: Support for querying lookup and inline data directly.

Main changes:

1) Add of LookupInputSpec and DataSourcePlan.forLookup.

2) Add InlineInputSpec, and modify of DataSourcePlan.forInline to use
   this instead of an ExternalInputSpec with JSON. This allows the inline
   data to act as the right-hand side of a join, if needed.

Supporting changes:

1) Modify JoinDataSource's leftFilter validation to be a little less
   strict: it's now OK with leftFilter being attached to any concrete
   leaf (no children) datasource, rather than requiring it be a table.
   This allows MSQ to create JoinDataSource with InputNumberDataSource
   as the base.

2) Add SegmentWranglerModule to CliIndexer, CliPeon. This allows them to
   query lookups and inline data directly.

* Updates based on CI.

* Additional tests.

* Style fix.

* Remove unused import.
2023-04-14 14:04:02 -07:00
Clint Wylie 179e2e8108
adjust useSchemaDiscovery to also include the behavior of includeAllDimensions to support partial schema declaration without having to set two flags (#14076) 2023-04-12 23:12:49 -07:00
Gian Merlino 81074411a9
MSQ: Support multiple result columns with the same name. (#14025)
* MSQ: Support multiple result columns with the same name.

This is allowed in SQL, and is supported by the regular SQL endpoint.
We retain a validation that INSERT ... SELECT does not allow multiple
columns with the same name, because column names in segments must be
unique.
2023-04-13 11:09:39 +05:30
Clint Wylie 9ed8beca5e
bug fixes and add support for boolean inputs to classic long dimension indexer (#14069)
changes:
* adds support for boolean inputs to the classic long dimension indexer, which plays nice with LONG being the semi official boolean type in Druid, and even nicer when druid.expressions.useStrictBooleans is set to true, since the sampler when using the new 'auto' schema when 'useSchemaDiscovery' is specified on the dimensions spec will call the type out as LONG
* fix bugs with sampler response and new schema discovery stuff incorrectly using classic 'json' type for the logical schema instead of the new 'auto' type
2023-04-11 20:49:52 -07:00
Clint Wylie 29652bd246
fix NPE that can happen when merging all null nested v4 format columns (#14068) 2023-04-11 19:04:51 -07:00
Clint Wylie d61bd7f8f1
fix bug in nested v4 format merger from refactoring (#14053) 2023-04-10 20:38:58 -07:00
Clint Wylie 1aef72aa7e
Bump up the version in pom to 27.0.0 in preparation of release (#14051) 2023-04-10 14:56:59 +05:30
Gian Merlino d52bc333aa
Frames: Ensure nulls are read as default values when appropriate. (#14020)
* Frames: Ensure nulls are read as default values when appropriate.

Fixes a bug where LongFieldWriter didn't write a properly transformed
zero when writing out a null. This had no meaningful effect in SQL-compatible
null handling mode, because the field would get treated as a null anyway.
But it does have an effect in default-value mode: it would cause Long.MIN_VALUE
to get read out instead of zero.

Also adds NullHandling checks to the various frame-based column selectors,
allowing reading of nullable frames by servers in default-value mode.
2023-04-10 05:28:46 +05:30
Clint Wylie f41468fd46
fix off by one error in FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter getCardinality method (#14047)
* fix off by one error in FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter getCardinality method
2023-04-07 03:11:15 -07:00
zachjsh 5c0221375c
Allow for Input source security in native task layer (#14003)
Fixes #13837.

### Description

This change allows for input source type security in the native task layer.

To enable this feature, the user must set the following property to true:

`druid.auth.enableInputSourceSecurity=true`

The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource.

When this config is enabled, the users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource.

`new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}, Action.READ`

where `{INPUT_SOURCE_TYPE}` is the type of the input source being used;, http, inline, s3, etc..

Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.
2023-04-06 13:13:09 -04:00
Abhishek Agarwal 92912a6a2b
JOIN or UNNEST queries over tombstone segment can fail (#14021)
Join,Unnest queries over tombstone segment can fail
2023-04-06 16:55:58 +05:30
Clint Wylie b11c0bc249
smarter nested column index utilization (#13977)
* smarter nested column index utilization
changes:
* adds skipValueRangeIndexScale and skipValuePredicateIndexScale to ColumnConfig (e.g. DruidProcessingConfig) available as system config via druid.processing.indexes.skipValueRangeIndexScale and druid.processing.indexes.skipValuePredicateIndexScale
* NestedColumnIndexSupplier uses skipValueRangeIndexScale and skipValuePredicateIndexScale to multiply by the total number of rows to be processed to determine the threshold at which we should no longer consider using bitmap indexes because it will be too many operations
* Default values for skipValueRangeIndexScale and skipValuePredicateIndexScale have been initially set to 0.08, but are separate to allow independent tuning
* these are not documented on purpose yet because they are kind of hard to explain, the mainly exist to help conduct larger scale experiments than the jmh benchmarks used to derive the initial set of values
* these changes provide a pretty sweet performance boost for filter processing on nested columns
2023-04-06 04:09:24 -07:00
Gian Merlino 319f99db05
Always use file sizes when determining batch ingest splits (#13955)
* Always use file sizes when determining batch ingest splits.

Main changes:

1) Update CloudObjectInputSource and its subclasses (S3, GCS,
   Azure, Aliyun OSS) to use SplitHintSpecs in all cases. Previously, they
   were only used for prefixes, not uris or objects.

2) Update ExternalInputSpecSlicer (MSQ) to consider file size. Previously,
   file size was ignored; all files were treated as equal weight when
   determining splits.

A side effect of these changes is that we'll make additional network
calls to find the sizes of objects when users specify URIs or objects
as opposed to prefixes. IMO, this is worth it because it's the only way
to respect the user's split hint and task assignment settings.

Secondary changes:

1) S3, Aliyun OSS: Use getObjectMetadata instead of listObjects to get
   metadata for a single object. This is a simpler call that is also
   expected to be less expensive.

2) Azure: Fix a bug where getBlobLength did not populate blob
   reference attributes, and therefore would not actually retrieve the
   blob length.

3) MSQ: Align dynamic slicing logic between ExternalInputSpecSlicer and
   TableInputSpecSlicer.

4) MSQ: Adjust WorkerInputs to ensure there is always at least one
   worker, even if it has a nil slice.

* Add msqCompatible to testGroupByWithImpossibleTimeFilter.

* Fix tests.

* Add additional tests.

* Remove unused stuff.

* Remove more unused stuff.

* Adjust thresholds.

* Remove irrelevant test.

* Fix comments.

* Fix bug.

* Updates.
2023-04-05 08:54:01 -07:00
Clint Wylie d21babc5b8
remix nested columns (#14014)
changes:
* introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation
* introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>.
* revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their version pre #13803 behavior (v4) for backwards compatibility
* fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)
2023-04-04 17:51:59 -07:00
Karan Kumar 217b0f6832
Eagerly fetching remote s3 files leading to out of disk (OOD) (#13981)
* Eagerly fetching remote s3 files leading to OOD.
2023-04-03 14:10:37 +05:30
Clint Wylie 518698a952
lower segment heap footprint and fix bug with expression type coercion (#14002) 2023-03-31 13:53:22 -07:00
Clint Wylie e3211e3be0
actually backwards compatible frontCoded string encoding strategy (#13996) 2023-03-31 02:24:12 -07:00
Soumyava 1eeecf5fb2
Fixing regression issues on unnest (#13976)
* select sum(c) on an unnested column now does not return 'Type mismatch' error and works properly
* Making sure an inner join query works properly
* Having on unnested column with a group by now works correctly
* count(*) on an unnested query now works correctly
2023-03-31 09:06:43 +05:30
Karan Kumar 8dce3ca4d5
OOM fix for running MSQ jobs with `intermediateSuperSorterStorageMaxLocalBytes` set (#13974)
While using intermediateSuperSorterStorageMaxLocalBytes the super sorter was retaining references of the memory allocator.

The fix clears the current outputChannel when close() is called on the ComposingWritableFrameChannel.java
2023-03-29 18:00:00 +05:30
Clint Wylie 2219e68fa3
add backwards compat mode for frontCoded stringEncodingStrategy (#13988) 2023-03-28 14:44:44 -07:00
Paul Rogers 76fe26d4ba
Fix typos, add tests for http() function (#13954) 2023-03-28 14:41:06 -07:00
Karan Kumar c2fe6a4956
Reworking s3 connector with various improvements (#13960)
* Reworking s3 connector with
1. Adding retries
2. Adding max fetch size
3. Using s3Utils for most of the api's
4. Fixing bugs in DurableStorageCleaner
5. Moving to Iterator for listDir call
2023-03-28 17:05:16 +05:30
Clint Wylie d5b1b5bc8e
nested columns + arrays = array columns! (#13803)
array columns!
changes:
* add support for storing nested arrays of string, long, and double values as specialized nested columns instead of breaking them into separate element columns
* nested column type mimic behavior means that columns ingested with only root arrays of primitive values will be ARRAY typed columns
* neat test refactor stuff
* add v4 segment test
* add array element indexes
* add tests for unnest and array columns
* fix unnest column value selector cursor handling of null and empty arrays
2023-03-27 12:42:35 -07:00
abhagraw c52d15d65d
Fixing security vulnerability check errors (#13956)
* Fixing security vulnerability check errors

* Updating javax.el to jakarta.el

* Adding cron job trigger on changes to suppressions file
2023-03-23 11:10:06 +05:30
Soumyava 2ad133c06e
Unnest changes for moving the filter on right side of correlate to inside the unnest datasource (#13934)
* Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as
1. filter on unnested column which involves a left filter rewrite
2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite
3. not filters
4. SQL functions applied on top of unnested column
5. null present in first row of the column to be unnested
2023-03-22 18:24:00 -07:00
Clint Wylie f4392a3155
expression transform improvements and fixes (#13947)
changes:
* fixes inconsistent handling of byte[] values between ExprEval.bestEffortOf and ExprEval.ofType, which could cause byte[] values to end up as java toString values instead of base64 encoded strings in ingest time transforms
* improved ExpressionTransform binding to re-use ExprEval.bestEffortOf when evaluating a binding instead of throwing it away
* improved ExpressionTransform array handling, added RowFunction.evalDimension that returns List<String> to back Row.getDimension and remove the automatic coercing of array types that would typically happen to expression transforms unless using Row.getDimension
* added some tests for ExpressionTransform with array inputs
* improved ExpressionPostAggregator to use partial type information from decoration
* migrate some test uses of InputBindings.forMap to use other methods
2023-03-21 23:26:53 -07:00
Clint Wylie ed57c5c853
better FrontCodedIndexed (#13854)
* Adds new implementation of 'frontCoded' string encoding strategy, which writes out a v1 FrontCodedIndexed which stores buckets on a prefix of the previous value instead of the first value in the bucket
2023-03-14 18:14:11 -07:00
somu-imply a7ba361666
Refactoring and bug fixes on top of unnest. The allowList now is not passed … (#13922)
* Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as
1. filter on unnested column which involves a left filter rewrite
2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite
3. not filters
4. SQL functions applied on top of unnested column
5. null present in first row of the column to be unnested
2023-03-14 16:05:56 -07:00
Gian Merlino 4b1ffbc452
Various changes and fixes to UNNEST. (#13892)
* Various changes and fixes to UNNEST.

Native changes:

1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn".
   This enables pushing expressions into the datasource. This in turn
   allows us to do the next thing...

2) UnnestStorageAdapter: Logically apply query-level filters and virtual
   columns after the unnest operation. (Physically, filters are pulled up,
   when possible.) This is beneficial because it allows filters and
   virtual columns to reference the unnested column, and because it is
   consistent with how the join datasource works.

3) Various documentation updates, including declaring "unnest" as an
   experimental feature for now.

SQL changes:

1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel
   is simplified: it only handles the UNNEST part of a correlated join.
   Constant UNNESTs are handled with regular inline rels.

2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from
   the left side up above the Correlate. New test testUnnestTwice verifies
   that this works even when two UNNESTs are stacked on the same table.

3) Include ProjectCorrelateTransposeRule from Calcite to encourage
   pushing mappings down below the left-hand side of the Correlate.

4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule
   to handle pulling Filters up above the Correlate. New tests
   testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify
   this behavior.

5) Require a context feature flag for SQL UNNEST, since it's undocumented.
   As part of this, also cleaned up how we handle feature flags in SQL.
   They're now hooked into EngineFeatures, which is useful because not
   all engines support all features.
2023-03-10 16:42:08 +05:30
imply-cheddar 6b90a320cf
Add back function signature for compat (#13914)
* Add back function signature for compat

* Suppress IntelliJ Error
2023-03-09 21:06:34 -08:00
Laksh Singla 5b0b3a9b2c
Add a readOnly() method for PartitionedOutputChannel (#13755)
With SuperSorter using the PartitionedOutputChannels for sorting, it might OOM on inputs of reasonable size because the channel consists of both the writable frame channel and the frame allocator, both of which are not required once the output channel has been written to.
This change adds a readOnly to the output channel which contains only the readable channel, due to which unnecessary memory references to the writable channel and the memory allocator are lost once the output channel has been written to, preventing the OOM.
2023-03-10 06:58:00 +05:30
Gian Merlino bf39b4d313
Window planning: use collation traits, improve subquery logic. (#13902)
* Window planning: use collation traits, improve subquery logic.

SQL changes:

1) Attach RelCollation (sorting) trait to any PartialDruidQuery
   that ends in AGGREGATE or AGGREGATE_PROJECT. This allows planning to
   take advantage of the fact that Druid sorts by dimensions when
   doing aggregations.

2) Windowing: inspect RelCollation trait from input, and insert naiveSort
   if, and only if, necessary.

3) Windowing: add support for Project after Window, when the Project
   is a simple mapping. Helps eliminate subqueries.

4) DruidRules: update logic for considering subqueries to reflect that
   subqueries are not required to be GroupBys, and that we have a bunch
   of new Stages now. With all of this evolution that has happened, the
   old logic didn't quite make sense.

Native changes:

1) Use merge sort (stable) rather than quicksort when sorting
   RowsAndColumns. Makes it easier to write test cases for plans that
   involve re-sorting the data.

* Changes from review.

* Mark the bad test as failing.

* Additional update.

* Fix failingTest.

* Fix tests.

* Mark a var final.
2023-03-09 15:48:13 -08:00
Gian Merlino fe9d0c46d5
Improve memory efficiency of WrappedRoaringBitmap. (#13889)
* Improve memory efficiency of WrappedRoaringBitmap.

Two changes:

1) Use an int[] for sizes 4 or below.
2) Remove the boolean compressRunOnSerialization. Doesn't save much
   space, but it does save a little, and it isn't adding a ton of value
   to have it be configurable. It was originally configurable in case
   anything broke when enabling it, but it's been a while and nothing
   has broken.

* Slight adjustment.

* Adjust for inspection.

* Updates.

* Update snaps.

* Update test.

* Adjust test.

* Fix snaps.
2023-03-09 15:48:02 -08:00
Clint Wylie 48ac5ce50b
use native nvl expression for SQL NVL and 2 argument COALESCE (#13897)
* use custom case operator conversion instead of direct operator conversion, to produce native nvl expression for SQL NVL and 2 argument COALESCE, and add optimization for certain case filters from coalesce and nvl statements
2023-03-09 05:46:17 -08:00
Gian Merlino 82f7a56475
Sort-merge join and hash shuffles for MSQ. (#13506)
* Sort-merge join and hash shuffles for MSQ.

The main changes are in the processing, multi-stage-query, and sql modules.

processing module:

1) Rename SortColumn to KeyColumn, replace boolean descending with KeyOrder.
   This makes it nicer to model hash keys, which use KeyOrder.NONE.

2) Add nullability checkers to the FieldReader interface, and an
   "isPartiallyNullKey" method to FrameComparisonWidget. The join
   processor uses this to detect null keys.

3) Add WritableFrameChannel.isClosed and OutputChannel.isReadableChannelReady
   so callers can tell which OutputChannels are ready for reading and which
   aren't.

4) Specialize FrameProcessors.makeCursor to return FrameCursor, a random-access
   implementation. The join processor uses this to rewind when it needs to
   replay a set of rows with a particular key.

5) Add MemoryAllocatorFactory, which is embedded inside FrameWriterFactory
   instead of a particular MemoryAllocator. This allows FrameWriterFactory
   to be shared in more scenarios.

multi-stage-query module:

1) ShuffleSpec: Add hash-based shuffles. New enum ShuffleKind helps callers
   figure out what kind of shuffle is happening. The change from SortColumn
   to KeyColumn allows ClusterBy to be used for both hash-based and sort-based
   shuffling.

2) WorkerImpl: Add ability to handle hash-based shuffles. Refactor the logic
   to be more readable by moving the work-order-running code to the inner
   class RunWorkOrder, and the shuffle-pipeline-building code to the inner
   class ShufflePipelineBuilder.

3) Add SortMergeJoinFrameProcessor and factory.

4) WorkerMemoryParameters: Adjust logic to reserve space for output frames
   for hash partitioning. (We need one frame per partition.)

sql module:

1) Add sqlJoinAlgorithm context parameter; can be "broadcast" or
   "sortMerge". With native, it must always be "broadcast", or it's a
   validation error. MSQ supports both. Default is "broadcast" in
   both engines.

2) Validate that MSQs do not use broadcast join with RIGHT or FULL join,
   as results are not correct for broadcast join with those types. Allow
   this in native for two reasons: legacy (the docs caution against it,
   but it's always been allowed), and the fact that it actually *does*
   generate correct results in native when the join is processed on the
   Broker. It is much less likely that MSQ will plan in such a way that
   generates correct results.

3) Remove subquery penalty in DruidJoinQueryRel when using sort-merge
   join, because subqueries are always required, so there's no reason
   to penalize them.

4) Move previously-disabled join reordering and manipulation rules to
   FANCY_JOIN_RULES, and enable them when using sort-merge join. Helps
   get to better plans where projections and filters are pushed down.

* Work around compiler problem.

* Updates from static analysis.

* Fix @param tag.

* Fix declared exception.

* Fix spelling.

* Minor adjustments.

* wip

* Merge fixups

* fixes

* Fix CalciteSelectQueryMSQTest

* Empty keys are sortable.

* Address comments from code review. Rename mux -> mix.

* Restore inspection config.

* Restore original doc.

* Reorder imports.

* Adjustments

* Fix.

* Fix imports.

* Adjustments from review.

* Update header.

* Adjust docs.
2023-03-08 14:19:39 -08:00
Abhishek Agarwal 52bd9e6adb
Improved error message when topic name changes within same supervisor (#13815)
Improved error message when topic name changes within same supervisor

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2023-03-07 18:10:18 -08:00
Gian Merlino fcfb7b8ff6
Add warning comments to Granularity.getIterable. (#13888)
This function is notorious for causing memory exhaustion and excessive
CPU usage; so much so that it was valuable to work around it in the
SQL planner in #13206. Hopefully, a warning comment will encourage
developers to stay away and come up with solutions that do not involve
computing all possible buckets.
2023-03-06 22:57:10 -08:00
Anshu Makkar a10e4150d5
Add Post Aggregators for Tuple Sketches (#13819)
You can now do the following operations with TupleSketches in Post Aggregation Step

Get the Sketch Output as Base64 String
Provide a constant Tuple Sketch in post-aggregation step that can be used in Set Operations
Get the Estimated Value(Sum) of Summary/Metrics Objects associated with Tuple Sketch
2023-03-03 09:32:09 +05:30
Tejaswini Bandlamudi 7103cb4b9d
Removes FiniteFirehoseFactory and its implementations (#12852)
The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead.
Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.
2023-03-02 18:07:17 +05:30
Clint Wylie 6cf754b0e0
move numeric null value coercion out of expression processing engine (#13809)
* move numeric null value coercion out of expression processing engine
* add ExprEval.valueOrDefault() to allow consumers to automatically coerce to default values
* rename Expr.buildVectorized as Expr.asVectorProcessor more consistent naming with Function and ApplyFunction; javadocs for some stuff
2023-02-28 18:10:07 -08:00
Clint Wylie 1d8fff4096
sampler + type detection = bff (#13711)
* sampler + type detection = bff
* split logical and physical dimensions, tidy up
2023-02-28 04:14:30 -08:00
hqx871 79f04e71a1
Hadoop based batch ingestion support range partition (#13303)
This pr implements range partitioning for hadoop-based ingestion. For detail about multi dimension range partition can be seen #11848.
2023-02-23 11:38:03 +05:30
Kashif Faraz 3a67a43c8a
Add method SegmentTimeline.addSegments (#13831) 2023-02-21 23:58:01 -08:00
Clint Wylie 614205f3bc
fix some intellij inspections in druid-processing (#13823)
fix some intellij inspections in druid-processing
2023-02-21 09:02:02 +05:30
Gian Merlino 882ae9f002
Speed up composite key joins on IndexedTable. (#13516)
* Speed up composite key joins on IndexedTable.

Prior to this patch, IndexedTable indexes are sorted IntList. This works
great when we have a single-column join key: we simply retrieve the list
and we know what rows match. However, when we have a composite key, we
need to merge the sorted lists. This is inefficient when one is very dense
and others are very sparse.

This patch switches from sorted IntList to IntSortedSet, and changes
to the following intersection algorithm:

1) Initialize the intersection set to the smallest matching set from the
   various parts of the composite key.

2) For each element in that smallest set, check other sets for that element.
   If any do *not* include it, then remove the element from the intersection
   set.

This way, complexity scales with the size of the smallest set, not the
largest one.

* RangeIntSet stuff.
2023-02-17 22:01:01 -08:00
Clint Wylie 08b5951cc5
merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698)
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
2023-02-17 14:27:41 -08:00
Paul Rogers 333196d207
Code cleanup & message improvements (#13778)
* Misc cleanup edits

Correct spacing
Add type parameters
Add toString() methods to formats so tests compare correctly
IT doc revisions
Error message edits
Display UT query results when tests fail

* Edit

* Build fix

* Build fixes
2023-02-15 15:22:54 +05:30
Suneet Saldanha f67abf2e99
Better logs for query errors (#13776)
* Better logs for query errors

* checkstyle
2023-02-14 15:55:58 -08:00