* Segment backed broadcast join IndexedTable
* fix comments
* fix tests
* sharing is caring
* fix test
* i hope this doesnt fix it
* filter by schema to maybe fix test
* changes
* close join stuffs so it does not leak, allow table to directly make selector factory
* oops
* update comment
* review stuffs
* better check
* remove DruidLeaderClient.goAsync(..) that does not follow redirect.
Replace its usage by DruidLeaadereClient.go(..) with
InputStreamFullResponseHandler
* remove ByteArrayResponseHolder dependency from JsonParserIterator
* add UT to cover lines in InputStreamFullResponseHandler
* refactor SystemSchema to reduce branches
* further reduce branches
* Revert "add UT to cover lines in InputStreamFullResponseHandler"
This reverts commit 330aba3dd9.
* UTs for InputStreamFullResponseHandler
* remove unused imports
* Add "offset" parameter to the Scan query.
It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.
* Fix constructor call.
* Fix up JSONs.
* Fix call to ScanQuery.
* Doc update.
* Fix javadocs.
* Spotbugs, LGTM suppressions.
* Javadocs.
* Fix suppression.
* Stabilize Scan query result order, add tests.
* Update LGTM comment.
* Fixup.
* Test different batch sizes too.
* Nicer tests.
* Fix comment.
* LongMaxVectorAggregator support and test case.
* DoubleMinVectorAggregator and test cases.
* DoubleMaxVectorAggregator and unit test.
* FloatMinVectorAggregator and FloatMaxVectorAggregator.
* Documentation update to include the other vector aggregators.
* Bug fix.
* checkstyle formatting fixes.
* CalciteQueryTest cases update.
* Separate test classes for FloatMaxAggregation and FloatMniAggregation.
* remove the cannotVectorize for float max/min aggregator in test.
* Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.
* Add "offset" parameter to GroupBy query.
It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.
* Stabilize GroupBy sorts.
* Fix inspections.
* Fix suppression.
* Fixups.
* Move TopNSequence to druid-core.
* Addl comments.
* NumberedElement equals verification.
* Changes from review.
* Fix minor formatting in docs.
* Add Nullhandling initialization for test to run from IDE.
* Vectorize longMin aggregator.
- A new vectorized class for the vectorized long min aggregator.
- Changes to AggregatorFactory to support vectorize functionality.
- Few changes to schema evolution test to add LongMinAggregatorFactory.
* Add longSum to the supported vectorized aggregator implementations.
* Add MIN() long min to calcite query test that can vectorize.
* Add simple long aggregations test.
* Fixup formatting per checkstyle guide.
* fixup and add more tests for long min aggregator.
* Override test for groupBy since timestamps are handled differently.
* Null compatibility check in test.
* Review comment: Add a test case to LongMinAggregationTest.
* Fix timeseries query constructor when postAggregator has an expression reading timestamp result column
* fix npe
* Fix postAgg referencing timestampResultField and add a test for it
* fix test
* doc
* revert doc
* new average aggregator
* method to create count aggregator factory
* test everything
* update other usages
* fix style
* fix more tests
* fix datasketches tests
* Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter
* Remove aggressive equality check that was used for testing
* Use Suppliers.memoize
* Checkstyle
* ROUND and having comparators correctly handle doubles
Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.
This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.
The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.
* Add tests and fix spellcheck
* update error message in ExpressionsTest
* Address comments
* fix up round for infinity
* round non numeric doubles returns a double
* fix spotbugs
* Update docs/misc/math-expr.md
* Update docs/querying/sql.md
* make joinables closeable
* tests and adjustments
* refactor to make join stuffs impelement ReferenceCountedObject instead of Closable, more tests
* fixes
* javadocs and stuff
* fix bugs
* more test
* fix lgtm alert
* simplify
* fixup javadoc
* review stuffs
* safeguard against exceptions
* i hate this checkstyle rule
* make IndexedTable extend Closeable
* fix groupBy with literal in subquery grouping
* fix groupBy with literal in subquery grouping
* fix groupBy with literal in subquery grouping
* address comments
* update javadocs
* Fix join
* Fix Subquery could not be converted to groupBy query
* Fix Subquery could not be converted to groupBy query
* Fix Subquery could not be converted to groupBy query
* Fix Subquery could not be converted to groupBy query
* Fix Subquery could not be converted to groupBy query
* Fix Subquery could not be converted to groupBy query
* Fix Subquery could not be converted to groupBy query
* Fix Subquery could not be converted to groupBy query
* add tests
* address comments
* fix failing tests
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.
- Add REGEXP_LIKE function that returns a boolean, and is useful in
WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
(previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.
* Changes based on PR review.
* Fix arg check.
* Important fixes!
* Add speller.
* wip
* Additional tests.
* Fix up tests.
* Add validation error tests.
* Additional tests.
* Remove useless call.
* Refactor JoinFilterAnalyzer
This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.
To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.
This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Fitlers with associated tests.
* fix tests
* Refactor JoinFilterAnalyzer - part 2
This change introduces the following components:
* RhsRewriteCandidates - a wrapper for a list of candidates and associated
functions to operate on the set of candidates.
* JoinableClauses - a wrapper for the list of JoinableClause that represent
a join condition and the associated functions to operate on the clauses.
* Equiconditions - a wrapper representing the equiconditions that are used
in the join condition.
And associated test changes.
This refactoring surfaced 2 bugs:
- Missing equals and hashcode implementation for RhsRewriteCandidate, thus
allowing potential duplicates in the rhs rewrite candidates
- Missing Filter#supportsRequiredColumnRewrite check in
analyzeJoinFilterClause, which could result in UnsupportedOperationException
being thrown by the filter
* fix compile error
* remove unused class
* Refactor JoinFilterAnalyzer - Correlations
Move the correlation related code out into it's own class so it's easier
to maintain.
Another patch should follow this one so that the query path uses the
correlation object instead of it's underlying maps.
* Optimize join queries where filter matches nothing
Fixes#9787
This PR changes the Joinable interface to return an Optional set of correlated
values for a column.
This allows the JoinFilterAnalyzer to differentiate between the case where the
column has no matching values and when the column could not find matching
values.
This PR chose not to distinguish between cases where correlated values could
not be computed because of a config that has this behavior disabled or because
of user error - like a column that could not be found. The reasoning was that
the latter is likely an error and the non filter pushdown path will surface the
error if it is.
* Add ingestion specs for CalciteQueryTests
This PR introduces ingestion specs that can be used for local testing
so that CalciteQueryTests can be built on a druid cluster.
* Add README
* Update sql/src/test/resources/calcite/tests/README.md
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* address comments
* address comments
* fix checkstyle
* address comments
* address comments
* Fix potential NPEs in joins
intelliJ reported issues with potential NPEs. This was first hit in testing
with a filter being pushed down to the left hand table when joining against
an indexed table.
* More null check cleanup
* Optimize filter value rewrite for IndexedTable
* Add unit tests for LookupJoinable
* Add tests for IndexedTableJoinable
* Add non null assert for dimension selector
* Supress null warning in LookupJoinMatcher
* remove some null checks on hot path
* fix issue with group by limit pushdown for extractionFn, expressions, joins, etc
* remove unused
* fix test
* revert unintended change
* more tests
* consider capabilities for StringGroupByColumnSelectorStrategy
* fix test
* fix and more test
* revert because im scared
* SQL: More straightforward handling of join planning.
Two changes that simplify how joins are planned:
1) Stop using JoinProjectTransposeRule as a way of guiding subquery
removal. Instead, add logic to DruidJoinRule that identifies removable
subqueries and removes them at the point of creating a DruidJoinQueryRel.
This approach reduces the size of the planning space and allows the
planner to complete quickly.
2) Remove rules that reorder joins. Not because of an impact on the
planning time (it seems minimal), but because the decisions that the
planner was making in the new tests were sometimes worse than the
user-provided order. I think we'll need to go with the user-provided
order for now, and revisit reordering when we can add more smarts to
the cost estimator.
A third change updates numeric ExprEval classes to store their
value as a boxed type that corresponds to what it is supposed to be.
This is useful because it affects the behavior of "asString", and
is included in this patch because it is needed for the new test
"testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This
test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an
actual double would.
Fixes#9646.
* Fix comments.
* Fix tests.
* SQL support for joins on subqueries.
Changes to SQL module:
- DruidJoinRule: Allow joins on subqueries (left/right are no longer
required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
version of Calcite. Some of these are useful for optimizing joins on
subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
relevant constants in CostEstimates.
Other changes:
- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
of isComplete is enough to let callers know that columns might have
multiple values, and explicitly setting it to true causes
ExpressionSelectors to think it definitely has multiple values, and
treat the inputs as arrays. This behavior interfered with some of the
new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
tests.
* Fixes for tests.
* Adjustments.
* Broker: Add ability to inline subqueries.
The main changes:
- ClientQuerySegmentWalker: Add ability to inline queries.
- Query: Add "getSubQueryId" and "withSubQueryId" methods.
- QueryMetrics: Add "subQueryId" dimension.
- ServerConfig: Add new "maxSubqueryRows" parameter, which is used by
ClientQuerySegmentWalker to limit how many rows can be inlined per
query.
- IndexedTableJoinMatcher: Allow creating keys on top of unknown types,
by assuming they are strings. This is useful because not all types are
known for fields in query results.
- InlineDataSource: Store RowSignature rather than component parts. Add
more zealous "equals" and "hashCode" methods to ease testing.
- Moved QuerySegmentWalker test code from CalciteTests and
SpecificSegmentsQueryWalker in druid-sql to QueryStackTests in
druid-server. Use this to spin up a new ClientQuerySegmentWalkerTest.
* Adjustments from CI.
* Fix integration test.
* Match GREATEST/LEAST function behavior
Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.
* Match postgres behavior & handle more SQL types
* Fix imports
* Move RowSignature from druid-sql to druid-processing and make use of it.
1) Moved (most of) RowSignature from sql to processing. Left behind the SQL-specific
stuff in a RowSignatures utility class. It also picked up some new convenience
methods along the way.
2) There were a lot of places in the code where Map<String, ValueType> was used to
associate columns with type info. These are now all replaced with RowSignature.
3) QueryToolChest's resultArrayFields method is replaced with resultArraySignature,
and it now provides type info.
* Fix up extensions.
* Various fixes