Commit Graph

1128 Commits

Author SHA1 Message Date
Gian Merlino 1c7a03a47b
Lower default maxRowsInMemory for realtime ingestion. (#13939)
* Lower default maxRowsInMemory for realtime ingestion.

The thinking here is that for best ingestion throughput, we want
intermediate persists to be as big as possible without using up all
available memory. So, we rely mainly on maxBytesInMemory. The default
maxRowsInMemory (1 million) is really just a safety: in case we have
a large number of very small rows, we don't want to get overwhelmed
by per-row overheads.

However, maximum ingestion throughput isn't necessarily the primary
goal for realtime ingestion. Query performance is also important. And
because query performance is not as good on the in-memory dataset, it's
helpful to keep it from growing too large. 150k seems like a reasonable
balance here. It means that for a typical 5 million row segment, we
won't trigger more than 33 persists due to this limit, which is a
reasonable number of persists.

* Update tests.

* Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Fix test.

* Fix link.

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-03-21 10:36:36 -07:00
Adarsh Sanjeev 143fdcfacf
Change test name so it triggers in CI (#13844)
As the name of the class did not end or start with "Test", CalciteSelectQueryMSQTest was not triggered in CI. This PR renames the test.
2023-03-20 15:55:52 +05:30
Karan Kumar bf13156b55
Regression bug fix where ever LimitFrameProcessor's were used. (#13941) 2023-03-16 09:18:18 -07:00
Karan Kumar 67df1324ee
Undocumenting certain context parameter in MSQ. (#13928)
* Removing intermediateSuperSorterStorageMaxLocalBytes, maxInputBytesPerWorker, composedIntermediateSuperSorterStorageEnabled, clusterStatisticsMergeMode from docs

* Adding documentation in the context class.
2023-03-16 17:56:44 +05:30
Tejaswini Bandlamudi 6837289cb0
Fixes parquet uint_32 datatype conversion (#13935)
After parquet ingestion, uint_32 parquet datatypes are stored as null values in the dataSource. This PR fixes this conversion bug.
2023-03-16 15:27:38 +05:30
Gian Merlino 4b1ffbc452
Various changes and fixes to UNNEST. (#13892)
* Various changes and fixes to UNNEST.

Native changes:

1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn".
   This enables pushing expressions into the datasource. This in turn
   allows us to do the next thing...

2) UnnestStorageAdapter: Logically apply query-level filters and virtual
   columns after the unnest operation. (Physically, filters are pulled up,
   when possible.) This is beneficial because it allows filters and
   virtual columns to reference the unnested column, and because it is
   consistent with how the join datasource works.

3) Various documentation updates, including declaring "unnest" as an
   experimental feature for now.

SQL changes:

1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel
   is simplified: it only handles the UNNEST part of a correlated join.
   Constant UNNESTs are handled with regular inline rels.

2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from
   the left side up above the Correlate. New test testUnnestTwice verifies
   that this works even when two UNNESTs are stacked on the same table.

3) Include ProjectCorrelateTransposeRule from Calcite to encourage
   pushing mappings down below the left-hand side of the Correlate.

4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule
   to handle pulling Filters up above the Correlate. New tests
   testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify
   this behavior.

5) Require a context feature flag for SQL UNNEST, since it's undocumented.
   As part of this, also cleaned up how we handle feature flags in SQL.
   They're now hooked into EngineFeatures, which is useful because not
   all engines support all features.
2023-03-10 16:42:08 +05:30
Gian Merlino 90d8f67e3d
Avoid creating new RelDataTypeFactory during SQL planning. (#13904)
* Avoid creating new RelDataTypeFactory during SQL planning.

Reduces unnecessary CPU cycles.

* Fix.
2023-03-08 21:55:49 -08:00
Laksh Singla dc67296e9d
Fix for OOM in the Tombstone generating logic in MSQ (#13893)
fix OOMs using a different logic for generating tombstones

---------

Co-authored-by: Paul Rogers <paul-rogers@users.noreply.github.com>
2023-03-08 21:38:08 -08:00
Clint Wylie c7f4bb5056
fix KafkaInputFormat when used with Sampler API (#13900)
* fix KafkaInputFormat when used with Sampler API

* handle key format sampling the same as value format sampling
2023-03-08 16:23:24 -08:00
Gian Merlino 82f7a56475
Sort-merge join and hash shuffles for MSQ. (#13506)
* Sort-merge join and hash shuffles for MSQ.

The main changes are in the processing, multi-stage-query, and sql modules.

processing module:

1) Rename SortColumn to KeyColumn, replace boolean descending with KeyOrder.
   This makes it nicer to model hash keys, which use KeyOrder.NONE.

2) Add nullability checkers to the FieldReader interface, and an
   "isPartiallyNullKey" method to FrameComparisonWidget. The join
   processor uses this to detect null keys.

3) Add WritableFrameChannel.isClosed and OutputChannel.isReadableChannelReady
   so callers can tell which OutputChannels are ready for reading and which
   aren't.

4) Specialize FrameProcessors.makeCursor to return FrameCursor, a random-access
   implementation. The join processor uses this to rewind when it needs to
   replay a set of rows with a particular key.

5) Add MemoryAllocatorFactory, which is embedded inside FrameWriterFactory
   instead of a particular MemoryAllocator. This allows FrameWriterFactory
   to be shared in more scenarios.

multi-stage-query module:

1) ShuffleSpec: Add hash-based shuffles. New enum ShuffleKind helps callers
   figure out what kind of shuffle is happening. The change from SortColumn
   to KeyColumn allows ClusterBy to be used for both hash-based and sort-based
   shuffling.

2) WorkerImpl: Add ability to handle hash-based shuffles. Refactor the logic
   to be more readable by moving the work-order-running code to the inner
   class RunWorkOrder, and the shuffle-pipeline-building code to the inner
   class ShufflePipelineBuilder.

3) Add SortMergeJoinFrameProcessor and factory.

4) WorkerMemoryParameters: Adjust logic to reserve space for output frames
   for hash partitioning. (We need one frame per partition.)

sql module:

1) Add sqlJoinAlgorithm context parameter; can be "broadcast" or
   "sortMerge". With native, it must always be "broadcast", or it's a
   validation error. MSQ supports both. Default is "broadcast" in
   both engines.

2) Validate that MSQs do not use broadcast join with RIGHT or FULL join,
   as results are not correct for broadcast join with those types. Allow
   this in native for two reasons: legacy (the docs caution against it,
   but it's always been allowed), and the fact that it actually *does*
   generate correct results in native when the join is processed on the
   Broker. It is much less likely that MSQ will plan in such a way that
   generates correct results.

3) Remove subquery penalty in DruidJoinQueryRel when using sort-merge
   join, because subqueries are always required, so there's no reason
   to penalize them.

4) Move previously-disabled join reordering and manipulation rules to
   FANCY_JOIN_RULES, and enable them when using sort-merge join. Helps
   get to better plans where projections and filters are pushed down.

* Work around compiler problem.

* Updates from static analysis.

* Fix @param tag.

* Fix declared exception.

* Fix spelling.

* Minor adjustments.

* wip

* Merge fixups

* fixes

* Fix CalciteSelectQueryMSQTest

* Empty keys are sortable.

* Address comments from code review. Rename mux -> mix.

* Restore inspection config.

* Restore original doc.

* Reorder imports.

* Adjustments

* Fix.

* Fix imports.

* Adjustments from review.

* Update header.

* Adjust docs.
2023-03-08 14:19:39 -08:00
Adarsh Sanjeev ef82756176
Add validation for aggregations on __time (#13793)
* Add validation for aggregations on __time
2023-03-07 17:16:36 -08:00
Karan Kumar 94cfabea18
Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown. (#13846)
* Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown.

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-03-06 18:00:36 +05:30
Karan Kumar 65c3954942
Adding forbidden api for Properties#get() and Properties#getOrDefault() (#13882)
Properties#getOrDefault method does not check the default map for values where as Properties#getProperty() does.
2023-03-06 10:42:04 +05:30
Rohan Garg f33898ed6d
Fix durable storage cleanup (#13853) 2023-03-06 09:49:14 +05:30
Nicholas Lippis b68180fc44
use getProperty in MSQDurableStorageModule (#13881) 2023-03-04 11:56:43 -08:00
Anshu Makkar a10e4150d5
Add Post Aggregators for Tuple Sketches (#13819)
You can now do the following operations with TupleSketches in Post Aggregation Step

Get the Sketch Output as Base64 String
Provide a constant Tuple Sketch in post-aggregation step that can be used in Set Operations
Get the Estimated Value(Sum) of Summary/Metrics Objects associated with Tuple Sketch
2023-03-03 09:32:09 +05:30
Tejaswini Bandlamudi 7103cb4b9d
Removes FiniteFirehoseFactory and its implementations (#12852)
The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead.
Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.
2023-03-02 18:07:17 +05:30
Laksh Singla ca68fd93a6
Generate tombstones when running MSQ's replace (#13706)
*When running REPLACE queries, the segments which contain no data are dropped (marked as unused). This PR aims to generate tombstones in place of segments which contain no data to mark their deletion, as is the behavior with the native ingestion.

This will cause InsertCannotReplaceExistingSegmentFault to be removed since it was generated if the interval to be marked unused didn't fully overlap one of the existing segments to replace.
2023-03-01 12:01:30 +05:30
Clint Wylie 1d8fff4096
sampler + type detection = bff (#13711)
* sampler + type detection = bff
* split logical and physical dimensions, tidy up
2023-02-28 04:14:30 -08:00
Gian Merlino aeb1187a7d
Fix NPE in KinesisSupervisor#setupRecordSupplier. (#13859)
* Fix NPE in KinesisSupervisor#setupRecordSupplier.

PR #13539 refactored record supplier creation and introduced a bug:
this method would throw NPE when recordsPerFetch was not provided
by the user. recordsPerFetch isn't needed in this context at all,
since the supervisor-side supplier doesn't fetch records. So this
patch sets it to zero.

* Remove unused imports.
2023-02-27 19:55:28 -08:00
Karan Kumar 6bb5effa7b
Better logging for MSQ worker task (#13790)
* Adding more logs to MSQ worker implementation which makes it easier to debug.
2023-02-26 03:24:24 +05:30
Paul Rogers 914eebb4b7
Wire up the catalog resolver (#13788)
Introduces the catalog resolver interface
Wires the resolver up to the planner factory
Refactors planner factory
2023-02-22 11:42:32 -08:00
Abhishek Agarwal d2dbb8b2c0
Fix infinite checkpointing between tasks and overlord (#13825)
If the intermediate handoff period is less than the task duration and there is no new data in the input topic, task will continuously checkpoint the same offsets again and again. This PR fixes that bug by resetting the checkpoint time even when the task receives the same end offset request again.
2023-02-22 19:25:59 +05:30
Abhishek Radhakrishnan 9e9976001c
Add ANSI_QUOTES propety to DBI init in lookups. (#13826) 2023-02-21 15:13:22 -08:00
Clint Wylie 08b5951cc5
merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698)
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
2023-02-17 14:27:41 -08:00
Paul Rogers 842ee554de
Refinements to input-source specific table functions (#13780)
Refinements to table functions

Fixes various bugs
Improves the structure of the table function classes
Adds unit and integration tests
2023-02-13 16:21:27 -08:00
Adarsh Sanjeev d7a15be9bc
Add assertions for counters from reports (#13726)
Adds assertions for counters to MSQ unit tests
2023-02-08 16:33:37 +05:30
imply-cheddar f684df4c22
Use an HllSketchHolder object to enable optimized merge (#13737)
* Use an HllSketchHolder object to enable optimized merge

HllSketchAggregatorFactory.combine had been implemented using a
pure pair-wise, "make a union -> add 2 things to union -> get sketch"
algorithm.  This algorithm does 2 things that was CPU

1) The Union object always builds an HLL_8 sketch regardless of the
  target type.  This means that when the target type is not HLL_8, we
  spent CPU cycles converting to HLL_8 and back over and over again
2) By throwing away the Union object and converting back to the
  HllSketch only to build another Union object, we do lots and lots
  of copy+conversions of the HllSketch

This change introduces an HllSketchHolder object which can hold onto
a Union object and delay conversion back into an HllSketch until
it is actually needed.  This follows the same pattern as the
SketchHolder object for theta sketches.
2023-02-07 13:57:48 -08:00
Rohan Garg a0f8889f23
Robust handling and management of S3 streams for MSQ shuffle storage (#13741) 2023-02-07 14:17:37 +05:30
Clint Wylie 2d3bee8545
various nested column (and other) fixes (#13732)
changes:
* modified druid schema column type compution to special case COMPLEX<json> handling to choose COMPLEX<json> if any column in any segment is COMPLEX<json>
* NestedFieldVirtualColumn can now work correctly on any type of column, returning either a column selector if a root path, or nil selector if not
* fixed a random bug with NilVectorSelector when using a vector size larger than the default and druid.generic.useDefaultValueForNull=false would have the nulls vector set to all false instead of true
* fixed an overly aggressive check in ExprEval.ofType when handling complex types which would try to treat any string as base64 without gracefully falling back if it was not in fact base64 encoded, along with special handling for complex<json>
* added ExpressionVectorSelectors.castValueSelectorToObject and ExpressionVectorSelectors.castObjectSelectorToNumeric as convience methods to cast vector selectors using cast expressions without the trouble of constructing an expression. the polymorphic nature of the non-vectorized engine (and significantly larger overhead of non-vectorized expression processing) made adding similar methods for non-vectorized selectors less attractive and so have not been added at this time
* fix inconsistency between nested column indexer and serializer in handling values (coerce non primitive and non arrays of primitives using asString)
* ExprEval best effort mode now handles byte[] as string
* added test for ExprEval.bestEffortOf, and add missing conversion cases that tests uncovered
* more tests more better
2023-02-06 19:48:02 -08:00
imply-cheddar 9c5b61e114
Fallback virtual column (#13739)
* Fallback virtual column

This virtual columns enables falling back to another column if
the original column doesn't exist.  This is useful when doing
column migrations and you have some old data with column X,
new data with column Y and you want to use Y if it exists, X
otherwise so that you can run a consistent query against all of
the data.
2023-02-06 19:36:50 -08:00
Laksh Singla 9100a61bf6
Fix NPE in postCleanupStage if stage doesn't exist (#13742)
With fault tolerance enabled in MSQ, not all the work orders might be populated if the worker is restarted. In case it gets the request for cleaning up the stage which is not present in the worker's map, it can throw an NPE. Added a check to ensure that the stage is present in the map before cleaning it up, or else logging it as a warning.
2023-02-06 19:13:39 +05:30
Rohan Garg c5835c29a1
Use durable super sorter intermediate storage only with composable storage (#13748)
* This enables usage of durable storage connector only in case the composable storage feature is enabled.
2023-02-06 18:59:18 +05:30
somu-imply 74ff848ce5
Fixing incorrect filtering of nulls in an array when ingesting for JSON and Avro (#13712) 2023-02-01 04:15:08 -08:00
Adarsh Sanjeev 51dfde0284
Add maxInputBytesPerWorker as query context parameter (#13707)
* Add maxInputBytesPerWorker as query context parameter

* Move documenation to msq specific docs

* Update tests

* Spacing

* Address review comments

* Fix test

* Update docs/multi-stage-query/reference.md

* Correct spelling mistake

---------

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2023-01-31 20:55:28 +05:30
Rohan Garg f76acccff2
Allow using composed storage for SuperSorter intermediate data (#13368) 2023-01-24 01:02:03 +05:30
Laksh Singla a516eb1a41
Port Calcite's tests to run with MSQ (#13625)
* SQL test framework extensions

* Capture planner artifacts: logical plan, etc.
* Planner test builder validates the logical plan
* Validation for the SQL resut schema (we already have
  validation for the Druid row signature)
* Better Guice integration: properties, reuse Guice modules
* Avoid need for hand-coded expr, macro tables
* Retire some of the test-specific query component creation
* Fix query log hook race condition

Co-authored-by: Paul Rogers <progers@apache.org>
2023-01-19 08:51:11 -08:00
Clint Wylie fb26a1093d
discover nested columns when using nested column indexer for schemaless ingestion (#13672)
* discover nested columns when using nested column indexer for schemaless
* move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec
2023-01-18 12:57:28 -08:00
Paul Rogers 22630b0aab
Much improved table functions (#13627)
Much improved table functions

* Revises properties, definitions in the catalog
* Adds a "table function" abstraction to model such functions
* Specific functions for HTTP, inline, local and S3.
* Extended SQL types in the catalog
* Restructure external table definitions to use table functions
* EXTEND syntax for Druid's extern table function
* Support for array-valued table function parameters
* Support for array-valued SQL query parameters
* Much new documentation
2023-01-17 08:41:57 -08:00
Gian Merlino 182c4fad29
Kinesis: More robust default fetch settings. (#13539)
* Kinesis: More robust default fetch settings.

1) Default recordsPerFetch and recordBufferSize based on available memory
   rather than using hardcoded numbers. For this, we need an estimate
   of record size. Use 10 KB for regular records and 1 MB for aggregated
   records. With 1 GB heaps, 2 processors per task, and nonaggregated
   records, recordBufferSize comes out to the same as the old
   default (10000), and recordsPerFetch comes out slightly lower (1250
   instead of 4000).

2) Default maxRecordsPerPoll based on whether records are aggregated
   or not (100 if not aggregated, 1 if aggregated). Prior default was 100.

3) Default fetchThreads based on processors divided by task count on
   Indexers, rather than overall processor count.

4) Additionally clean up the serialized JSON a bit by adding various
   JsonInclude annotations.

* Updates for tests.

* Additional important verify.
2023-01-13 11:03:54 +05:30
Adarsh Sanjeev cb16a7f6a9
Fix behaviour of downsampling buckets to a single key (#13663) 2023-01-12 21:24:24 +05:30
Adarsh Sanjeev afb3d91777
Add unit test for complex column grouping (#13650)
* Add unit test for complex column grouping

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2023-01-12 15:25:01 +05:30
Maytas Monsereenusorn 7f54ebbf47
Fix Parquet Parser missing column when reading parquet file (#13612)
* fix parquet reader

* fix checkstyle

* fix bug

* fix inspection

* refactor

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add test

* fix checkstyle

* fix tests

* add IT

* add IT

* add more tests

* fix checkstyle

* fix stuff

* fix stuff

* add more tests

* add more tests
2023-01-11 20:08:48 -10:00
Karan Kumar 56076d33fb
Worker retry for MSQ task (#13353)
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
2023-01-11 07:38:29 +05:30
Abhishek Radhakrishnan 41fdf6eafb
Quote and escape literals in JDBC lookup to allow reserved identifiers. (#13632)
* Quote and escape table, key and column names.

* fix typo.

* More select statements.

* Derby lookup tests create quoted identifiers so it's compatible.

* Use Stringutils.replace() utility.

* quote the filter string.

* Squish doubly quote usage into a single function.

* Add parameterized test with reserved identifiers.

* few changes.
2023-01-10 12:11:54 +05:30
imply-cheddar f1821a7c18
Add Sort Operator for Window Functions (#13619)
* Addition of NaiveSortMaker and Default implementation

Add the NaiveSortMaker which makes a sorter
object and a default implementation of the
interface.

This also allows us to plan multiple different window 
definitions on the same query.
2023-01-06 00:27:18 -08:00
imply-cheddar a8ecc48ffe
Validate response headers and fix exception logging (#13609)
* Validate response headers and fix exception logging

A class of QueryException were throwing away their
causes making it really hard to determine what's
going wrong when something goes wrong in the SQL
planner specifically.  Fix that and adjust tests
 to do more validation of response headers as well.

We allow 404s and 307s to be returned even without 
authorization validated, but others get converted to 403
2023-01-05 14:15:15 -08:00
imply-cheddar 7b92b85168
Unify DummyRequest with MockHttpServletRequest (#13602)
We had 2 different classes both creating fake
instances of an HttpServletRequest, this makes
it to that we only have one in a common location
2022-12-21 20:15:08 -08:00
Kashif Faraz c1e2656644
Fix scope of dependencies in protobuf-extensions pom (#13593) 2022-12-19 13:56:55 +05:30
Clint Wylie d9e5245ff0
allow string dimension indexer to handle byte[] as base64 strings (#13573)
This PR expands `StringDimensionIndexer` to handle conversion of `byte[]` to base64 encoded strings, rather than the current behavior of calling java `toString`. 

This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to directly convert stuff to java types, resulting in `bytes` typed values being converted as `byte[]` instead of a base64 string which the previous JSON based conversion created. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as complex types serde), when fed to a `StringDimensionIndexer`, it resulted in an ugly java `toString` because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of calling `row.getDimension(..)` which does do this coercion (and why many tests on binary types appeared to be doing the expected thing).

I added some protobuf `bytes` tests, but they don't really hit the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly, and call `getDimension` to validate stuff. The parser based version still uses the old conversion mechanisms, so when not using a flattener incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now, if we either update the parser to use the new flattener or just .. remove parsers we can remove this test stuff.
2022-12-16 14:50:17 +05:30