druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	0bd13bcd51	Projections prototype (#17214 )	2024-10-05 04:38:57 -07:00
Clint Wylie	04fe56835d	add druid.expressions.allowVectorizeFallback and default to false (#17248 ) changes: adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types) add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future	2024-10-05 12:42:42 +05:30
Sree Charan Manamala	67d361c9bf	Window Functions : Remove enable windowing flag (#17087 )	2024-09-23 08:24:26 +02:00
Abhishek Agarwal	78775ad398	Prepare master for 32.0.0 release (#17022 )	2024-09-10 11:01:20 +05:30
Clint Wylie	f57cd6f7af	transition away from StorageAdapter (#16985 ) * transition away from StorageAdapter changes: * CursorHolderFactory has been renamed to CursorFactory and moved off of StorageAdapter, instead fetched directly from the segment via 'asCursorFactory'. The previous deprecated CursorFactory interface has been merged into StorageAdapter * StorageAdapter is no longer used by any engines or tests and has been marked as deprecated with default implementations of all methods that throw exceptions indicating the new methods to call instead * StorageAdapter methods not covered by CursorFactory (CursorHolderFactory prior to this change) have been moved into interfaces which are retrieved by Segment.as, the primary classes are the previously existing Metadata, as well as new interfaces PhysicalSegmentInspector and TopNOptimizationInspector * added UnnestSegment and FilteredSegment that extend WrappedSegmentReference since their StorageAdapter implementations were previously provided by WrappedSegmentReference * added PhysicalSegmentInspector which covers some of the previous StorageAdapter functionality which was primarily used for segment metadata queries and other metadata uses, and is implemented for QueryableIndexSegment and IncrementalIndexSegment * added TopNOptimizationInspector to cover the oddly specific StorageAdapter.hasBuiltInFilters implementation, which is implemented for HashJoinSegment, UnnestSegment, and FilteredSegment * Updated all engines and tests to no longer use StorageAdapter	2024-09-09 14:55:29 -07:00
Clint Wylie	f8301a314f	generic block compressed complex columns (#16863 ) changes: * Adds new `CompressedComplexColumn`, `CompressedComplexColumnSerializer`, `CompressedComplexColumnSupplier` based on `CompressedVariableSizedBlobColumn` used by JSON columns * Adds `IndexSpec.complexMetricCompression` which can be used to specify compression for the generic compressed complex column. Defaults to uncompressed because compressed columns are not backwards compatible. * Adds new definition of `ComplexMetricSerde.getSerializer` which accepts an `IndexSpec` argument when creating a serializer. The old signature has been marked `@Deprecated` and has a default implementation that returns `null`, but it will be used by the default implementation of the new version if it is implemented to return a non-null value. The default implementation of the new method will use a `CompressedComplexColumnSerializer` if `IndexSpec.complexMetricCompression` is not null/none/uncompressed, or will use `LargeColumnSupportedComplexColumnSerializer` otherwise. * Removed all duplicate generic implementations of `ComplexMetricSerde.getSerializer` and `ComplexMetricSerde.deserializeColumn` into default implementations `ComplexMetricSerde` instead of being copied all over the place. The default implementation of `deserializeColumn` will check if the first byte indicates that the new compression was used, otherwise will use the `GenericIndexed` based supplier. * Complex columns with custom serializers/deserializers are unaffected and may continue doing whatever it is they do, either with specialized compression or whatever else, this new stuff is just to provide generic implementations built around `ObjectStrategy`. * add ObjectStrategy.readRetainsBufferReference so CompressedComplexColumn only copies on read if required * add copyValueOnRead flag down to CompressedBlockReader to avoid buffer duplicate if the value needs copied anyway	2024-08-27 00:34:41 -07:00
Clint Wylie	4283b270e3	rework cursor creation (#16533 ) changes: * Added `CursorBuildSpec` which captures all of the 'interesting' stuff that goes into producing a cursor as a replacement for the method arguments of `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor` * added new interface `CursorHolder` and new interface `CursorHolderFactory` as a replacement for `CursorFactory`, with method `makeCursorHolder`, which takes a `CursorBuildSpec` as an argument and replaces `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor` * `CursorFactory.makeCursors` previously returned a `Sequence<Cursor>` corresponding to the query granularity buckets, with a separate `Cursor` per bucket. `CursorHolder.asCursor` instead returns a single `Cursor` (equivalent to 'ALL' granularity), and a new `CursorGranularizer` has been added for query engines to iterate over the cursor and divide into granularity buckets. This makes the non-vectorized engine behave the same way as the vectorized query engine (with its `VectorCursorGranularizer`), and simplifies a lot of stuff that has to read segments particularly if it does not care about bucketing the results into granularities. * Deprecated `CursorFactory`, `CursorFactory.canVectorize`, `CursorFactory.makeCursors`, and `CursorFactory.makeVectorCursor` * updated all `StorageAdapter` implementations to implement `makeCursorHolder`, transitioned direct `CursorFactory` implementations to instead implement `CursorMakerFactory`. `StorageAdapter` being a `CursorMakerFactory` is intended to be a transitional thing, ideally will not be released in favor of moving `CursorMakerFactory` to be fetched directly from `Segment`, however this PR was already large enough so this will be done in a follow-up. * updated all query engines to use `makeCursorHolder`, granularity based engines to use `CursorGranularizer`.	2024-08-16 11:34:10 -07:00
Zoltan Haindrich	26e3c44f4b	Quidem record (#16624 ) * enables to launch a fake broker based on test resources (druidtest uri) * could record queries into new testfiles during usage * instead of re-purpose Calcite's Hook migrates to use DruidHook which we can add further keys * added a quidem-ut module which could be the place for tests which could iteract with modules/etc	2024-08-05 14:58:32 +02:00
Akshat Jain	6a2348b78b	Preemptive restriction for queries with approximate count distinct on complex columns of unsupported type (#16682 ) This PR aims to check if the complex column being queried aligns with the supported types in the aggregator and aggregator factories, and throws a user-friendly error message if they don't.	2024-07-22 21:34:06 +05:30
Laksh Singla	3a1b437056	Improve the fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes (#16679 ) Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes: a. We don't touch the subquery sequence till we know that we can materialize the result as frames	2024-07-12 21:49:12 +05:30
Sree Charan Manamala	eb981d855f	Correct aggregators violating names (#16615 ) In case of few aggregators for example BloomSqlAggregator, BaseVarianceSqlAggregator etc, the aggName is being updated from a0 to a0:agg, breaching the contract as we would expect the aggName as the name which is passed. This is causing a mismatch while creating a column accessor. This commit aims to correct those violating sql aggregators.	2024-07-12 09:18:09 +02:00
Gian Merlino	277006446d	Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. (#16366 ) * Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr. In addition: - Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well. - In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes it easier to identify tests that can now vectorize. - Fix a null-matcher bug in StringObjectVectorValueMatcher. * Fix tests. * Fixes. * Fix tests. * Fix test. * Fix test.	2024-06-05 20:03:02 -07:00
Gian Merlino	2534a42539	Fix serde for ArrayOfDoublesSketchConstantPostAggregator. (#16550 ) * Fix serde for ArrayOfDoublesSketchConstantPostAggregator. The version originally added in #13819 was missing an annotation for the "value" property. Fixes #16539. Line endings for ArrayOfDoublesSketchConstantPostAggregator.java are changed from \r\n to \n. Adds a serde test, and improves various other datasketches post-aggregator serde tests to deserialize into PostAggregator. This verifies that the type information is set up correctly. * Fix excessive imports. * Fix equals, hashCode.	2024-06-05 20:01:51 -07:00
Zoltan Haindrich	1811674753	Enable quidem tests to use different suppliers (#16382 ) * enable quidem uri support for `druidtest:///?ComponentSupplier=Nested` and similar * changes the way `SqlTestFrameworkConfig` is being applied; all options will have their own annotation (its kinda impossible to detect that an annotation has a set value or its the default) * enables hierarchical processing of config annotation (was needed to enable class level supplier annotation) * moves uri processing related string2config stuff into `SqlTestFrameworkConfig`	2024-05-09 09:21:02 +02:00
Adarsh Sanjeev	9a2d7c28bc	Prepare master branch for 31.0.0 release (#16333 )	2024-04-26 09:22:43 +05:30
Zoltan Haindrich	9c0bd56f5b	Make QueryComponentSupliers independent from test classes (#16275 )	2024-04-25 02:12:07 -04:00
Gian Merlino	8a5cc976a9	ArrayOfDoublesSketchBuildAggregator: Fix NPE in get() for empty sketch. (#16330 ) Fixes a bug introduced in #16296, where the sketch might not be initialized if get() is called without calling aggregate(). Also adds a test for this case.	2024-04-25 00:59:59 -04:00
Gian Merlino	274ccbfd85	Reset buffer aggregators when resetting Groupers. (#16296 ) Buffer aggregators can contain some cached objects within them, such as Memory references or HLL Unions. Prior to this patch, various Grouper implementations were not releasing this state when resetting their own internal state, which could lead to excessive memory use. This patch renames AggregatorAdapater#close to "reset", and updates Grouper implementations to call this reset method whenever they reset their internal state. The base method on BufferAggregator and VectorAggregator remains named "close", for compatibility with existing extensions, but the contract is adjusted to say that the aggregator may be reused after the method is called. All existing implementations in core already adhere to this new contract, except for the ArrayOfDoubles build flavors, which are updated in this patch to adhere. Additionally, this patch harmonizes buffer sketch helpers to call their clear method "clear" rather than a mix of "clear" and "close". (Others were already using "clear".)	2024-04-24 05:39:24 -04:00
Gian Merlino	9f358f5f4a	SQL tests: avoid mixing skip and cannot vectorize. (#16251 ) * SQL tests: avoid mixing skip and cannot vectorize. skipVectorize switches off vectorization tests completely, and cannotVectorize turns vectorization tests into negative tests. It doesn't make sense to use them together, so this patch makes it an error to do so, and cleans up cases where both are mentioned. This patch also has the effect of changing various tests from skipVectorize to cannotVectorize, because in the past when both were mentioned, skipVectorize would take priority. * Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannt. * Fix tests.	2024-04-11 15:06:11 -07:00
Soumyava	4bea865697	Restore context flag for window functions (#16229 )	2024-04-03 13:57:13 +05:30
zachjsh	9b52c909e0	fix complex types returning UNKNOWN as their SQL type inference (#16216 ) * * fix * * fix * * address review comments	2024-04-02 14:36:01 -04:00
Soumyava	524842a3bb	Window function on msq (#15470 ) This PR aims to introduce Window functions on MSQ by doing the following: Introduce a Window querykit for handling window queries along with its factory and a processor for window queries If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage In presence of empty OVER() clause lets all operators loose on a single rac In presence of no empty OVER() clause, breaks down each window into individual stages Associated machinery to handle window functions in MSQ Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators Guardrails around materialization Comprehensive UTs	2024-03-28 14:58:34 +05:30
Zoltan Haindrich	0a42342cef	Update CalciteTest to use junit5 (#16106 ) Update CalciteTest to use junit5 change the way temp dirs are handled * add openrewrite workflow to safeguard upgrade * replace junitparamrunner with standard junit5 parametered tests * update a few rules to junit5 api * lots of boring changes * cleanup QueryLogHook * cleanup * fix compile error: ARRAYS_DATASOURCE * fix test * remove enclosed * empty +TEST:TDigestSketchSqlAggregatorTest,HllSketchSqlAggregatorTest,DoublesSketchSqlAggregatorTest,ThetaSketchSqlAggregatorTest,ArrayOfDoublesSketchSqlAggregatorTest,BloomFilterSqlAggregatorTest,BloomDimFilterSqlTest,CatalogIngestionTest,CatalogQueryTest,FixedBucketsHistogramQuantileSqlAggregatorTest,QuantileSqlAggregatorTest,MSQArraysTest,MSQDataSketchesTest,MSQExportTest,MSQFaultsTest,MSQInsertTest,MSQLoadedSegmentTests,MSQParseExceptionsTest,MSQReplaceTest,MSQSelectTest,InsertLockPreemptedFaultTest,MSQWarningsTest,SqlMSQStatementResourcePostTest,SqlStatementResourceTest,CalciteSelectJoinQueryMSQTest,CalciteSelectQueryMSQTest,CalciteUnionQueryMSQTest,MSQTestBase,VarianceSqlAggregatorTest,SleepSqlTest,SqlRowTransformerTest,DruidAvaticaHandlerTest,DruidStatementTest,BaseCalciteQueryTest,CalciteArraysQueryTest,CalciteCorrelatedQueryTest,CalciteExplainQueryTest,CalciteExportTest,CalciteIngestionDmlTest,CalciteInsertDmlTest,CalciteJoinQueryTest,CalciteLookupFunctionQueryTest,CalciteMultiValueStringQueryTest,CalciteNestedDataQueryTest,CalciteParameterQueryTest,CalciteQueryTest,CalciteReplaceDmlTest,CalciteScanSignatureTest,CalciteSelectQueryTest,CalciteSimpleQueryTest,CalciteSubqueryTest,CalciteSysQueryTest,CalciteTableAppendTest,CalciteTimeBoundaryQueryTest,CalciteUnionQueryTest,CalciteWindowQueryTest,DecoupledPlanningCalciteJoinQueryTest,DecoupledPlanningCalciteQueryTest,DecoupledPlanningCalciteUnionQueryTest,DrillWindowQueryTest,DruidPlannerResourceAnalyzeTest,IngestTableFunctionTest,QueryTestRunner,SqlTestFrameworkConfig,SqlAggregationModuleTest,ExpressionsTest,GreatestExpressionTest,IPv4AddressMatchExpressionTest,IPv4AddressParseExpressionTest,IPv4AddressStringifyExpressionTest,LeastExpressionTest,TimeFormatOperatorConversionTest,CombineAndSimplifyBoundsTest,FiltrationTest,SqlQueryTest,CalcitePlannerModuleTest,CalcitesTest,DruidCalciteSchemaModuleTest,DruidSchemaNoDataInitTest,InformationSchemaTest,NamedDruidSchemaTest,NamedLookupSchemaTest,NamedSystemSchemaTest,RootSchemaProviderTest,SystemSchemaTest,CalciteTestBase,SqlResourceTest * use @Nested * add rule to remove enclosed; upgrade surefire * remove enclosed * cleanup * add comment about surefire exclude	2024-03-19 04:05:12 -07:00
Zoltan Haindrich	8252d72e2a	Pull up literals in InputAccessor (#16033 ) * Pull up literals in InputAccessor * pull up literals in `InputAccessor` * remove the need to pass `constants` of `Window` operator Fixes #15353 * update test * enable relax_nulls	2024-03-12 09:14:31 -07:00
Gian Merlino	0f6a895372	Rework ExprMacro base classes to simplify implementations. (#15622 ) * Rework ExprMacro base classes to simplify implementations. This patch removes BaseScalarUnivariateMacroFunctionExpr, adds BaseMacroFunctionExpr at the top of the hierarchy (a suitable base class for ExprMacros that take either arrays or scalars), and adds an implementation for "visit" to BaseMacroFunctionExpr. The effect on implementations is generally cleaner code: - Exprs no longer need to implement "visit". - Exprs no longer need to implement "stringify", even if they don't use all of their args at runtime, because BaseMacroFunctionExpr has access to even unused args. - Exprs that accept arrays can extend BaseMacroFunctionExpr and inherit a bunch of useful methods. The only one they need to implement themselves that scalar exprs don't is "supplyAnalyzeInputs". * Make StringDecodeBase64UTFExpression a static class. * Remove unused import. * Formatting, annotation changes.	2024-02-12 15:50:45 -08:00
Gian Merlino	21a97f4c61	Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. (#15860 ) * Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. The prior code from #15162 was reading only the low-order byte of an int representing the size of a coupon set. As a result, it would erroneously believe that a coupon set with a multiple of 256 elements was empty.	2024-02-08 08:14:28 +05:30
Karan Kumar	c4990f56d6	Prepare main branch for next 30.0.0 release. (#15707 )	2024-01-23 15:55:54 +05:30
Zoltan Haindrich	d6a12c4389	Add ability to enable ResultCache in tests (#15465 )	2024-01-22 09:02:59 -05:00
Zoltan Haindrich	8a43db9395	Range support in window expressions (support them as groups) (#15365 ) * support groups windowing mode; which is a close relative of ranges (but not in the standard) * all windows with range expressions will be executed wit it groups * it will be 100% correct in case for both bounds its true that: isCurrentRow() \|\| isUnBounded() * this covers OVER ( ORDER BY COL ) * for other cases it will have some chances of getting correct results...	2024-01-17 00:05:21 -06:00
Adarsh Sanjeev	254a8eb7e0	Add null checks for HllSketchHolder (#15502 ) Fixes a potential NPE which could occur while folding the HllSketchAggregator. If the sketch is null, druid could return a null HllSketchHolder object. Adding a null check here could help here Resolves a null pointer exception in HllSketchAggregatorFactory	2023-12-08 11:43:04 +05:30
Magnus Henoch	67f45fa7bf	Fix histograms for sketches where min and max are equal (#15381 ) There is a problem with Quantiles sketches and KLL Quantiles sketches. Queries using the histogram post-aggregator fail if: - the sketch contains at least one value, and - the values in the sketch are all equal, and - the splitPoints argument is not passed to the post-aggregator, and - the numBins argument is greater than 2 (or not specified, which leads to the default of 10 being used) In that case, the query fails and returns this error: { "error": "Unknown exception", "errorClass": "org.apache.datasketches.common.SketchesArgumentException", "host": null, "errorCode": "legacyQueryException", "persona": "OPERATOR", "category": "RUNTIME_FAILURE", "errorMessage": "Values must be unique, monotonically increasing and not NaN.", "context": { "host": null, "errorClass": "org.apache.datasketches.common.SketchesArgumentException", "legacyErrorCode": "Unknown exception" } } This behaviour is undesirable, since the caller doesn't necessarily know in advance whether the sketch has values that are diverse enough. With this change, the post-aggregators return [N, 0, 0...] instead of crashing, where N is the number of values in the sketch, and the length of the list is equal to numBins. That is what they already returned for numBins = 2. Here is an example of a query that would fail: {"queryType":"timeseries", "dataSource": { "type": "inline", "columnNames": ["foo", "bar"], "rows": [ ["abc", 42.0], ["def", 42.0] ] }, "intervals":["0000/3000"], "granularity":"all", "aggregations":[ {"name":"the_sketch", "fieldName":"bar", "type":"quantilesDoublesSketch"}], "postAggregations":[ {"name":"the_histogram", "type":"quantilesDoublesSketchToHistogram", "field":{"type":"fieldAccess","fieldName":"the_sketch"}, "numBins": 3}]} I believe this also fixes issue #10585.	2023-11-16 12:31:22 +05:30
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Adarsh Sanjeev	9576fd3141	HllSketch Merge Aggregator optimizations (#15162 ) * Null byte serde for empty sketches * Cache for HllSketchMerge * Check for empty sketches * Address review comments * Revert changes to HllSketchHolder * Handle null sketch holders instead of null sketches * Add unit test for MSQ HllSketch * Add comments * Fix style	2023-11-03 11:01:22 +08:00
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Laksh Singla	2ea7177f15	Allow casted literal values in SQL functions accepting literals (#15282 ) Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.	2023-11-01 10:38:48 +05:30
Alexander Saydakov	f1132d20c5	use datasketches-java 4.2.0 (#15257 ) * use datasketches-java 4.2.0 * use exclusive mode * fixed issues raised by CodeQL * fixed issue raised by spotbugs * fixed issues raised by intellij * added missing import * Update QuantilesSketchKeyCollector search mode and adjust tests. * Update sizeOf functions and add unit tests * Add unit tests --------- Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com> Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>	2023-10-26 16:28:33 -07:00
Clint Wylie	d0f64608eb	sql compatible three-valued logic native filters (#15058 ) * sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true * log.warn if non-default configurations are used to guide operators towards SQL complaint behavior	2023-10-12 00:06:23 -07:00
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Zoltan Haindrich	b5a87fd89b	Support constant args in window functions (#15071 ) Instead of passing the constants around in a new parameter; InputAccessor was introduced to take care of transparently handling the constants - this new class started picking up some copy-paste debris around field accesses; and made them a little bit more readble.	2023-10-08 12:14:25 +05:30
Zoltan Haindrich	7b869fd37a	Change type of AVG aggregates to double (#15089 ) The sql standard is not very restrictive regarding this: If AVG is specified and DT is exact numeric, then the declared type of the result is an implemen- tation-defined exact numeric type with precision not less than the precision of DT and scale not less than the scale of DT. so; using the same type is also ok (without patch); however the avg of 0 and 1 is 0 right now because of the retention of the integer typ Postgres,MySql and Oracle and Drill seem to increase precision ; mssql returns 0 http://sqlfiddle.com/#!9/6f7248/1 I think we should also increase precision as its already calculated more precisely	2023-10-07 18:01:09 +05:30
Pranav	06c5527c85	Allow aliasing of Macros and add new alias for complex decode 64 (#15034 ) * Add AliasExprMacro to allow aliasing of native expression macros * Add decode_base64_complex alias for complex_decode_base64	2023-10-05 16:24:36 -07:00
Gian Merlino	3dabfead05	Fix getResultType for HLL, quantiles aggregators. (#15043 ) The aggregators had incorrect types for getResultType when shouldFinalze is false. They had the finalized type, but they should have had the intermediate type. Also includes a refactor of how ExprMacroTable is handled in tests, to make it easier to add tests for this to the MSQ module. The bug was originally noticed because the incorrect result types caused MSQ queries with DS_HLL to behave erratically.	2023-09-27 08:51:14 +05:30
Clint Wylie	36e659a501	remove group-by v1 (#14866 ) * remove group-by v1 * docs * remove unused configs, fix test * fix test * adjustments * why not * adjust * review stuff	2023-08-23 12:44:06 -07:00
Soumyava	afe22907a5	Calcite upgrade 1.35 (#14510 ) * Update to Calcite 1.35.0 * Update from.ftl for Calcite 1.35.0. * Fixed tests in Calcite upgrade by doing the following: 1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules 2. Refactored the CorrelateUnnestRule 3. Updated CorrelateUnnestRel accordingly 4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column 5. Additional test cases for fixes in 2,3,4 6. Update to StringListAggregator to fail a query if separators are not propagated appropriately * Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects * Added a literalSqlAggregator as the upgraded Calcite involved changes to subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing an useless literal dimension with a post agg. Additionally a test with COUNT DISTINCT and FILTER which was failing with Calcite 1.21 is added here which passes with 1.35 * Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal * Added a wrapper segment ref to use for unnest and filter segment reference	2023-08-11 12:47:16 -07:00
Clint Wylie	913416c669	add equality, null, and range filter (#14542 ) changes: * new filters that preserve match value typing to better handle filtering different column types * sql planner uses new filters by default in sql compatible null handling mode * remove isFilterable from column capabilities * proper handling of array filtering, add array processor to column processors * javadoc for sql test filter functions * range filter support for arrays, tons more tests, fixes * add dimension selector tests for mixed type roots * support json equality * rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes * add cooler equality index maker, ValueIndexes * fix missing string utf8 index supplier * expression array comparator stuff	2023-07-18 12:15:22 -07:00
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Pranav	8087aa2b80	Adding the null check in combine and fold in doublesSketch (#14568 )	2023-07-11 14:28:34 +05:30
imply-cheddar	66cac08a52	Refactor HllSketchBuildAggregatorFactory (#14544 ) * Refactor HllSketchBuildAggregatorFactory The usage of ColumnProcessors and HllSketchBuildColumnProcessorFactory made it very difficult to figure out what was going on from just looking at the AggregatorFactory or Aggregator code. It also didn't properly double check that you could use UTF8 ahead of time, even though it's entirely possible to validate it before trying to use it. This refactor makes keeps the general indirection that had been implemented by the Consumer<Supplier<HllSketch>> but centralizes the decision logic and makes it easier to understand the code. * Test fixes * Add test that validates the types are maintained * Add back indirection to avoid buffer calls * Cover floats and doubles are the same thing * Static checks	2023-07-10 09:57:09 -07:00
Gian Merlino	67fbd8e7fc	Add "stringEncoding" parameter to DataSketches HLL. (#11201 ) * Add "stringEncoding" parameter to DataSketches HLL. Builds on the concept from #11172 and adds a way to feed HLL sketches with UTF-8 bytes. This must be an option rather than always-on, because prior to this patch, HLL sketches used UTF-16LE encoding when hashing strings. To remain compatible with sketch images created prior to this patch -- which matters during rolling updates and when reading sketches that have been written to segments -- we must keep UTF-16LE as the default. Not currently documented, because I'm not yet sure how best to expose this functionality to users. I think the first place would be in the SQL layer: we could have it automatically select UTF-8 or UTF-16LE when building sketches at query time. We need to be careful about this, though, because UTF-8 isn't always faster. Sometimes, like for the results of expressions, UTF-16LE is faster. I expect we will sort this out in future patches. * Fix benchmark. * Fix style issues, improve test coverage. * Put round back, to make IT updates easier. * Fix test. * Fix issue with filtered aggregators and add test. * Use DS native update(ByteBuffer) method. Improve test coverage. * Add another suppression. * Fix ITAutoCompactionTest. * Update benchmarks. * Updates. * Fix conflict. * Adjustments.	2023-06-30 12:45:55 -07:00
Gian Merlino	3d19b748fb	SQL OperatorConversions: Introduce.aggregatorBuilder, allow CAST-as-literal. (#14249 ) * SQL OperatorConversions: Introduce.aggregatorBuilder, allow CAST-as-literal. Four main changes: 1) Provide aggregatorBuilder, a more consistent way of defining the SqlAggFunction we need for all of our SQL aggregators. The mechanism is analogous to the one we already use for SQL functions (OperatorConversions.operatorBuilder). 2) Allow CASTs of constants to be considered as "literalOperands". This fixes an issue where various of our operators are defined with OperandTypes.LITERAL as part of their checkers, which doesn't allow casts. However, in these cases we generally _do_ want to allow casts. The important piece is that the value must be reducible to a constant, not that the SQL text is literally a literal. 3) Update DataSketches SQL aggregators to use the new aggregatorBuilder functionality. The main user-visible effect here is [2]: the aggregators would now accept, for example, "CAST(0.99 AS DOUBLE)" as a literal argument. Other aggregators could be updated in a future patch. 4) Rename "requiredOperands" to "requiredOperandCount", because the old name was confusing. (It rhymes with "literalOperands" but the arguments mean different things.) * Adjust method calls.	2023-06-23 16:25:04 -07:00

1 2 3 4 5 ...

286 Commits