Commit Graph

2789 Commits

Author SHA1 Message Date
zachjsh e75fb8e8e3
Account for data format and compression in MSQ auto taskAssignment (#14307)
### Description

This change allows for consideration of the input format and compression  when computing how to split the input files among available tasks, in MSQ ingestion, when considering the value of the  `maxInputBytesPerWorker` query context parameter. This query parameter allows users to control the maximum number of bytes, with granularity of input file / object, that ingestion tasks will be assigned to ingest. With this change, this context parameter now denotes the estimated weighted size in bytes of the input to split on, with consideration for input format and compression format, rather than the actual file size, reported by the file system.  We assume uncompressed newline delimited json as a baseline, with scaling factor of `1`. This means that when computing the byte weight that a file has towards the input splitting, we take the file size as is, if uncompressed json, 1:1. It was found during testing that gzip compressed json, and parquet, has scale factors of `4` and `8` respectively, meaning that each byte of data is weighted 4x and 8x respectively, when computing input splits. This weighted byte scaling is only considered for MSQ ingestion that uses either LocalInputSource or CloudObjectInputSource at the moment. The default value of the `maxInputBytesPerWorker` query context parameter has been updated from 10 GiB, to 512 MiB
2023-06-01 12:53:49 -07:00
Clint Wylie 4096f51f0b
add configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319)
This PR adds a new interface to control how SegmentMetadataCache chooses ColumnType when faced with differences between segments for SQL schemas which are computed, exposed as druid.sql.planner.metadataColumnTypeMergePolicy and adds a new 'least restrictive type' mode to allow choosing the type that data across all segments can best be coerced into and sets this as the default behavior.

This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.
2023-05-24 20:32:51 +05:30
Soumyava 22ba457d29
Expr getCacheKey now delegates to children (#14287)
* Expr getCacheKey now delegates to children

* Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it

* Adding an unit test

* Update processing/src/main/java/org/apache/druid/math/expr/Expr.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-23 14:49:38 -07:00
Abhishek Radhakrishnan a5e04d95a4
Add `TYPE_NAME` to the complex serde classes and replace the hardcoded names. (#14317)
* Add TYPE_NAME to the serde classes and reuse them instead of hardcoded strings.

* Static check fixes.
2023-05-23 00:54:47 -05:00
Clint Wylie d92b9fbfac
more resilient segment metadata, dont parallel merge internal segment metadata queries (#14296) 2023-05-17 04:12:55 -07:00
Clint Wylie b038a11280
fix issues with handling arrays with all null elements and arrays of booleans in strict mode (#14297) 2023-05-17 01:33:44 -07:00
Soumyava 96a3c00754
Fixing an issue with filtering on a single dimension by converting In… (#14277)
* Fixing an issue with filtering on a single dimension by converting In filter to a selector filter as needed with Filters.toFilter

* Adding a test so that any future refactoring does not break this behavior

* Made comment a bit more meaningful
2023-05-15 20:10:36 -07:00
imply-cheddar f9861808bc
Be able to load segments on Peons (#14239)
* Be able to load segments on Peons

This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task.  Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask

The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
2023-05-12 16:51:00 -07:00
Kashif Faraz ba11b3d462
Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235)
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
  - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
  - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
2023-05-12 22:39:56 +05:30
Clint Wylie 9875090bee
fix segment metadata queries for auto ingested columns that had all null values (#14262) 2023-05-11 20:58:06 -07:00
Soumyava f128b9b666
Updates to filter processing for inner query in Joins (#14237) 2023-05-11 17:21:41 +05:30
Clint Wylie a58cebe491
add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays (#14236) 2023-05-11 04:43:28 -07:00
Kashif Faraz 64e6283eca
Do not allow retention rules to be null (#14223)
Changes:
- Do not allow retention rules for any datasource or cluster to be null
- Allow empty rules at the datasource level but not at the cluster level
- Add validation to ensure that `druid.manager.rules.defaultRule` is always set correctly
- Minor style refactors
2023-05-11 14:33:56 +05:30
Clint Wylie aaaff74740
fix npe regression in json_value when filtering non-existent paths (#14250)
* fix npe regression in json_value when filtering non-existent paths

* more coverage
2023-05-10 22:39:22 -07:00
Clint Wylie 6db11bfc60
suppress some cves and fix javadoc build when using java 17 (#14241) 2023-05-10 15:47:10 -07:00
Clint Wylie 8805d8d7db
fix issues with filtering nulls on values coerced to numeric types (#14139)
* fix issues with filtering nulls on values coerced to numeric types
* fix issues with 'auto' type numeric columns in default value mode
* optimize variant typed columns without nested data
* more tests for 'auto' type column ingestion
2023-05-08 13:19:02 -07:00
Clint Wylie a7a4bfd331
modify QueryScheduler to lazily acquire lanes when executing queries to avoid leaks (#14184)
This PR fixes an issue that could occur if druid.query.scheduler.numThreads is configured and any exception occurs after QueryScheduler.run has been called to create a Sequence. This would result in total and/or lane specific locks being acquired, but because the sequence was not actually being evaluated, the "baggage" which typically releases these locks was not being executed. An example of how this can happen is if a group-by having filter, which wraps and transforms this sequence happens to explode while wrapping the sequence. The end result is that the locks are acquired, but never released, eventually halting the ability to execute any queries.
2023-05-08 11:42:05 +05:30
Clint Wylie 90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+.

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
Suneet Saldanha 84c11df980
Make LoggingEmitter more useful by using Markers (#14121)
* Make LoggingEmitter more useful

* Skip code coverage for facade classes

* fix spellcheck

* code review

* fix dependency

* logging.md

* fix checkstyle

* Add back jacoco version to main pom
2023-04-27 15:06:06 -07:00
Adarsh Sanjeev 5aa119dfda
Add retry to opening retrying stream (#14126)
* Add retry to opening retrying stream
* Add retry to S3Entity for network issues

* Fix tests and clean up code
2023-04-27 16:52:22 +05:30
Gian Merlino 42c8c84eb6
TimeBoundary: Use cursor when datasource is not a regular table. (#14151)
* TimeBoundary: Use cursor when datasource is not a regular table.

Fixes a bug where TimeBoundary could return incorrect results with
INNER Join or inline data.

* Addl Javadocs.
2023-04-26 17:00:13 -07:00
Gian Merlino 752475b799
Fix two concurrency issues with segment fetching. (#14042)
* Fix two concurrency issues with segment fetching.

1) SegmentLocalCacheManager: Fix a concurrency issue where certain directory
   cleanup happened outside of directoryWriteRemoveLock. This created the
   possibility that segments would be deleted by one thread, while being
   actively downloaded by another thread.

2) TaskDataSegmentProcessor (MSQ): Fix a concurrency issue when two stages
   in the same process both use the same segment. For example: a self-join
   using distributed sort-merge. Prior to this change, the two stages could
   delete each others' segments.

3) ReferenceCountingResourceHolder: increment() returns a new ResourceHolder,
   rather than a Releaser. This allows it to be passed to callers without them
   having to hold on to both the original ResourceHolder *and* a Releaser.

4) Simplify various interfaces and implementations by using ResourceHolder
   instead of Pair and instead of split-up fields.

* Add test.

* Fix style.

* Remove Releaser.

* Updates from master.

* Add some GuardedBys.

* Use the correct GuardedBy.

* Adjustments.
2023-04-25 20:49:27 -07:00
Gian Merlino 2dfb693d4c
Improved handling for zero-length intervals. (#14136)
* Improved handling for zero-length intervals.

1) Return an empty list from VersionedIntervalTimeline.lookup when
   provided with an empty interval. (The logic doesn't quite work when
   intervals are empty, which led to #14129.)

2) Don't return zero-length intervals from JodaUtils.condenseIntervals.

3) Detect "incorrect" comparator in JodaUtils.condenseIntervals, and
   recreate the SortedSet if needed. (Not strictly related to the theme
   of this patch. Just another thing in the same file.)

4) Remove unused method JodaUtils.containOverlappingIntervals.

Fixes #14129.

* Fix TimewarpOperatorTest.
2023-04-25 17:12:56 -07:00
Gian Merlino 89e7948159
MSQ: Subclass CalciteJoinQueryTest, other supporting changes. (#14105)
* MSQ: Subclass CalciteJoinQueryTest, other supporting changes.

The main change is the new tests: we now subclass CalciteJoinQueryTest
in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for
SortMerge.

Two supporting production changes for default-value mode:

1) InputNumberDataSource is marked as concrete, to allow leftFilter to
   be pushed down to it.

2) In default-value mode, numeric frame field readers can now return nulls.
   This is necessary when stacking joins on top of joins: nulls must be
   preserved for semantics that match broadcast joins and native queries.

3) In default-value mode, StringFieldReader.isNull returns true on empty
   strings in addition to nulls. This is more consistent with the behavior
   of the selectors, which map empty strings to null as well in that mode.

As an effect of change (2), the InsertTimeNull change from #14020 (to
replace null timestamps with default timestamps) is reverted. IMO, this
is fine, as either behavior is defensible, and the change from #14020
hasn't been released yet.

* Adjust tests.

* Style fix.

* Additional tests.
2023-04-25 12:10:23 -07:00
Gian Merlino 73f050027b
MSQ: Preserve original ParseException when writing frames. (#14122) 2023-04-25 11:47:15 +05:30
Nicholas Lippis 9d4cc501f7
return task status reported by peon (#14040)
* return task status reported by peon

* Write TaskStatus to file in AbstractTask.cleanUp

* Get TaskStatus from task log

* Fix merge conflicts in AbstractTaskTest

* Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage

* Add license headerss

* Fix style

* Remove unknown exception declarations
2023-04-24 12:05:39 -07:00
TSFenwick accd5536df
Allow for Log4J to be configured for peons but still ensure console logging is enforced (#14094)
* Allow for Log4J to be configured for peons but still ensure console logging is enforced

This change will allow for log4j to be configured for peons but require console logging is still
configured for them to ensure peon logs are saved to deep storage.

Also fixed the test ConsoleLoggingEnforcementTest to use a valid appender for the non console
Config as the previous config was incorrect and would never return a logger.

* fix checkstyle

* add warning to logger when it overwrites all loggers to be console

* optimize calls for altering logging config for ConsoleLoggingEnforcementConfigurationFactory

add getName to the druid logger class

* update docs, and error message

* edit docs to be more clear

* fix checkstyle issues

* CI fixes - LoggerTest code coverage and fix spelling issue for logging docs
2023-04-24 10:41:56 -07:00
Soumyava 8d60edcfcb
Updating segment map function for QueryDataSource to ensure group by … (#14112)
* Updating segment map function for QueryDataSource to ensure group by of group by of join data source gets into proper segment map function path

* Adding unit tests for the failed case

* There you go coverage bot, be happy now
2023-04-20 13:22:29 -07:00
Gian Merlino 9436ee8a63
Nicer error message for CSV with no properties. (#14093)
* Nicer error message for CSV with no properties.

* Take two.

* Adjustments from review, and test fixes.

* Fix test.

* Fix static check.
2023-04-18 12:52:02 -07:00
Clint Wylie e7d2e8b914
fix bug filtering nested columns with expression filters (#14096) 2023-04-17 14:21:32 -07:00
Gian Merlino facd82b493
Add HLLC tests for empty strings that don't pass. (#14085)
I believe the test case illustrates the cause of the problem in #13950.
2023-04-17 15:46:42 +05:30
Gian Merlino 0884a22c41
MSQ: Support for querying lookup and inline data directly. (#14048)
* MSQ: Support for querying lookup and inline data directly.

Main changes:

1) Add of LookupInputSpec and DataSourcePlan.forLookup.

2) Add InlineInputSpec, and modify of DataSourcePlan.forInline to use
   this instead of an ExternalInputSpec with JSON. This allows the inline
   data to act as the right-hand side of a join, if needed.

Supporting changes:

1) Modify JoinDataSource's leftFilter validation to be a little less
   strict: it's now OK with leftFilter being attached to any concrete
   leaf (no children) datasource, rather than requiring it be a table.
   This allows MSQ to create JoinDataSource with InputNumberDataSource
   as the base.

2) Add SegmentWranglerModule to CliIndexer, CliPeon. This allows them to
   query lookups and inline data directly.

* Updates based on CI.

* Additional tests.

* Style fix.

* Remove unused import.
2023-04-14 14:04:02 -07:00
Clint Wylie 179e2e8108
adjust useSchemaDiscovery to also include the behavior of includeAllDimensions to support partial schema declaration without having to set two flags (#14076) 2023-04-12 23:12:49 -07:00
Gian Merlino 81074411a9
MSQ: Support multiple result columns with the same name. (#14025)
* MSQ: Support multiple result columns with the same name.

This is allowed in SQL, and is supported by the regular SQL endpoint.
We retain a validation that INSERT ... SELECT does not allow multiple
columns with the same name, because column names in segments must be
unique.
2023-04-13 11:09:39 +05:30
Clint Wylie 9ed8beca5e
bug fixes and add support for boolean inputs to classic long dimension indexer (#14069)
changes:
* adds support for boolean inputs to the classic long dimension indexer, which plays nice with LONG being the semi official boolean type in Druid, and even nicer when druid.expressions.useStrictBooleans is set to true, since the sampler when using the new 'auto' schema when 'useSchemaDiscovery' is specified on the dimensions spec will call the type out as LONG
* fix bugs with sampler response and new schema discovery stuff incorrectly using classic 'json' type for the logical schema instead of the new 'auto' type
2023-04-11 20:49:52 -07:00
Clint Wylie 29652bd246
fix NPE that can happen when merging all null nested v4 format columns (#14068) 2023-04-11 19:04:51 -07:00
Clint Wylie d61bd7f8f1
fix bug in nested v4 format merger from refactoring (#14053) 2023-04-10 20:38:58 -07:00
Clint Wylie 1aef72aa7e
Bump up the version in pom to 27.0.0 in preparation of release (#14051) 2023-04-10 14:56:59 +05:30
Gian Merlino d52bc333aa
Frames: Ensure nulls are read as default values when appropriate. (#14020)
* Frames: Ensure nulls are read as default values when appropriate.

Fixes a bug where LongFieldWriter didn't write a properly transformed
zero when writing out a null. This had no meaningful effect in SQL-compatible
null handling mode, because the field would get treated as a null anyway.
But it does have an effect in default-value mode: it would cause Long.MIN_VALUE
to get read out instead of zero.

Also adds NullHandling checks to the various frame-based column selectors,
allowing reading of nullable frames by servers in default-value mode.
2023-04-10 05:28:46 +05:30
Clint Wylie f41468fd46
fix off by one error in FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter getCardinality method (#14047)
* fix off by one error in FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter getCardinality method
2023-04-07 03:11:15 -07:00
zachjsh 5c0221375c
Allow for Input source security in native task layer (#14003)
Fixes #13837.

### Description

This change allows for input source type security in the native task layer.

To enable this feature, the user must set the following property to true:

`druid.auth.enableInputSourceSecurity=true`

The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource.

When this config is enabled, the users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource.

`new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}, Action.READ`

where `{INPUT_SOURCE_TYPE}` is the type of the input source being used;, http, inline, s3, etc..

Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.
2023-04-06 13:13:09 -04:00
Abhishek Agarwal 92912a6a2b
JOIN or UNNEST queries over tombstone segment can fail (#14021)
Join,Unnest queries over tombstone segment can fail
2023-04-06 16:55:58 +05:30
Clint Wylie b11c0bc249
smarter nested column index utilization (#13977)
* smarter nested column index utilization
changes:
* adds skipValueRangeIndexScale and skipValuePredicateIndexScale to ColumnConfig (e.g. DruidProcessingConfig) available as system config via druid.processing.indexes.skipValueRangeIndexScale and druid.processing.indexes.skipValuePredicateIndexScale
* NestedColumnIndexSupplier uses skipValueRangeIndexScale and skipValuePredicateIndexScale to multiply by the total number of rows to be processed to determine the threshold at which we should no longer consider using bitmap indexes because it will be too many operations
* Default values for skipValueRangeIndexScale and skipValuePredicateIndexScale have been initially set to 0.08, but are separate to allow independent tuning
* these are not documented on purpose yet because they are kind of hard to explain, the mainly exist to help conduct larger scale experiments than the jmh benchmarks used to derive the initial set of values
* these changes provide a pretty sweet performance boost for filter processing on nested columns
2023-04-06 04:09:24 -07:00
Gian Merlino 319f99db05
Always use file sizes when determining batch ingest splits (#13955)
* Always use file sizes when determining batch ingest splits.

Main changes:

1) Update CloudObjectInputSource and its subclasses (S3, GCS,
   Azure, Aliyun OSS) to use SplitHintSpecs in all cases. Previously, they
   were only used for prefixes, not uris or objects.

2) Update ExternalInputSpecSlicer (MSQ) to consider file size. Previously,
   file size was ignored; all files were treated as equal weight when
   determining splits.

A side effect of these changes is that we'll make additional network
calls to find the sizes of objects when users specify URIs or objects
as opposed to prefixes. IMO, this is worth it because it's the only way
to respect the user's split hint and task assignment settings.

Secondary changes:

1) S3, Aliyun OSS: Use getObjectMetadata instead of listObjects to get
   metadata for a single object. This is a simpler call that is also
   expected to be less expensive.

2) Azure: Fix a bug where getBlobLength did not populate blob
   reference attributes, and therefore would not actually retrieve the
   blob length.

3) MSQ: Align dynamic slicing logic between ExternalInputSpecSlicer and
   TableInputSpecSlicer.

4) MSQ: Adjust WorkerInputs to ensure there is always at least one
   worker, even if it has a nil slice.

* Add msqCompatible to testGroupByWithImpossibleTimeFilter.

* Fix tests.

* Add additional tests.

* Remove unused stuff.

* Remove more unused stuff.

* Adjust thresholds.

* Remove irrelevant test.

* Fix comments.

* Fix bug.

* Updates.
2023-04-05 08:54:01 -07:00
Clint Wylie d21babc5b8
remix nested columns (#14014)
changes:
* introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation
* introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>.
* revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their version pre #13803 behavior (v4) for backwards compatibility
* fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)
2023-04-04 17:51:59 -07:00
Karan Kumar 217b0f6832
Eagerly fetching remote s3 files leading to out of disk (OOD) (#13981)
* Eagerly fetching remote s3 files leading to OOD.
2023-04-03 14:10:37 +05:30
Clint Wylie 518698a952
lower segment heap footprint and fix bug with expression type coercion (#14002) 2023-03-31 13:53:22 -07:00
Clint Wylie e3211e3be0
actually backwards compatible frontCoded string encoding strategy (#13996) 2023-03-31 02:24:12 -07:00
Soumyava 1eeecf5fb2
Fixing regression issues on unnest (#13976)
* select sum(c) on an unnested column now does not return 'Type mismatch' error and works properly
* Making sure an inner join query works properly
* Having on unnested column with a group by now works correctly
* count(*) on an unnested query now works correctly
2023-03-31 09:06:43 +05:30
Karan Kumar 8dce3ca4d5
OOM fix for running MSQ jobs with `intermediateSuperSorterStorageMaxLocalBytes` set (#13974)
While using intermediateSuperSorterStorageMaxLocalBytes the super sorter was retaining references of the memory allocator.

The fix clears the current outputChannel when close() is called on the ComposingWritableFrameChannel.java
2023-03-29 18:00:00 +05:30