Commit Graph

461 Commits

Author SHA1 Message Date
zhangyue19921010 8c6153d511
[Bug Fix] Broker will not wait for its SQL metadata view to fully initialize before starting up, even though set awaitInitializationOnStart true (#10779)
* enhance the logic of Start up DruidSchema immediately if there are no segments.

* add UT to test DruidSchema init

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-22 08:48:21 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Jihoon Son 149306c9db
Tidy up HTTP status codes for query errors (#10746)
* Tidy up query error codes

* fix tests

* Restore query exception type in JsonParserIterator

* address review comments; add a comment explaining the ugly switch

* fix test
2021-01-13 17:20:00 -08:00
Clint Wylie 8c3c9b4060
fix limited queries with subtotals (#10743)
* i put my thing down, flip it and reverse it

* oops
2021-01-13 12:55:24 -08:00
Franklyn Dsouza 045b29fa95
Correctly handle null values in time column results (#10642)
* handle null case

* test this case

* test sql resource

* fix style
2021-01-04 22:22:46 -08:00
Clint Wylie da0eabaa01
integration test for coordinator and overlord leadership client (#10680)
* integration test for coordinator and overlord leadership, added sys.servers is_leader column

* docs

* remove not needed

* fix comments

* fix compile heh

* oof

* revert unintended

* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster

* fixes

* style

* dang

* heh

* scripts are hard

* fix spelling

* fix thing that must not matter since was already wrong ip, log when test fails

* needs more heap

* fix merge

* less aggro
2020-12-17 22:50:12 -08:00
Abhishek Agarwal 796c25532e
Fix post-aggregator computation when used with subtotals (#10653)
* Fix post-aggregator computation

* remove commented code

* Fix numeric null handling

* Add test when subquery returns null long
2020-12-17 20:10:26 -08:00
Clint Wylie 64f97e7003
fix DruidSchema incorrectly listing tables with no segments (#10660)
* fix race condition with DruidSchema tables and dataSourcesNeedingRebuild

* rework to see if it passes analysis

* more better

* maybe this

* re-arrange and comments
2020-12-11 14:14:00 -08:00
Abhishek Agarwal 26d74b3580
Add grouping_id function (#10518)
* First draft of grouping_id function

* Add more tests and documentation

* Add calcite tests

* Fix travis failures

* bit of a change

* Add documentation

* Fix typos

* typo fix
2020-12-07 11:46:29 -08:00
Gian Merlino b7641f644c
Two fixes related to encoding of % symbols. (#10645)
* Two fixes related to encoding of % symbols.

1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments()
   returns already-decoded strings. Applying StringUtils.urlDecode on
   top of that causes erroneous behavior with '%' characters.

2) Update various ThreadFactoryBuilder name formats to escape '%'
   characters. This fixes situations where substrings starting with '%'
   are erroneously treated as format specifiers.

ITs are updated to include a '%' in extra.datasource.name.suffix.

* Avoid String.replace.

* Work around surefire bug.

* Fix xml encoding.

* Another try at the proper encoding.

* Give up on the emojis.

* Less ambitious testing.

* Fix an additional problem.

* Adjust encodeForFormat to return null if the input is null.
2020-12-06 22:35:11 -08:00
frank chen d7d2c804ad
Add zero period support to TIMESTAMPADD (#10550)
* Allow zero period for TIMESTAMPADD

* update test cases

* add empty zone test case

* add unit test cases for TimestampShiftMacro
2020-11-18 18:26:53 -08:00
Atul Mohan 6ccddedb7a
Improved exception handling in case of query timeouts (#10464)
* Separate timeout exceptions

* Add more tests

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-11-03 09:00:33 -06:00
Himanshu 4de4d4d111
remove ServerDiscoverySelector from DruidLeaderClient (#10537) 2020-10-28 10:55:11 -07:00
Clint Wylie d0821de854
support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499)
* support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions

* inspector

* changes

* more test

* clean
2020-10-26 19:55:24 -07:00
Maytas Monsereenusorn 3538abd5d0
Make sure all fields in sys.segments are JSON-serialized (#10481)
* fix JSON format

* Change all columns in sys segments to be JSON

* Change all columns in sys segments to be JSON

* add tests

* fix failing tests

* fix failing tests
2020-10-14 13:49:46 -07:00
Clint Wylie 207ef310f2
vectorized group by support for nullable numeric columns (#10441)
* vectorized group by support for numeric null columns

* revert unintended change

* adjust

* review stuffs
2020-10-05 21:53:53 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Clint Wylie 9ec5c08e2a
fix array types from escaping into wider query engine (#10460)
* fix array types from escaping into wider query engine

* oops

* adjust

* fix lgtm
2020-10-03 15:30:34 -07:00
Clint Wylie 753bce324b
vectorize constant expressions with optimized selectors (#10440) 2020-09-29 13:19:06 -07:00
Clint Wylie 1d6cb624f4
add vectorizeVirtualColumns query context parameter (#10432)
* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
2020-09-28 18:48:34 -07:00
Clint Wylie 3d700a5e31
vectorize remaining math expressions (#10429)
* vectorize remaining math expressions

* fixes

* remove cannotVectorize() where no longer true

* disable vectorized groupby for numeric columns with nulls

* fixes
2020-09-26 23:30:14 -07:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Suneet Saldanha f71ba6f2c2
Vectorized ANY aggregators (#10338)
* WIP vectorized ANY aggregators

* tests

* fix aggs

* cleanup

* code review + tests

* docs

* use NilVectorSelector when needed

* fix spellcheck

* dont instantiate vectors

* cleanup
2020-09-14 19:44:58 -07:00
Clint Wylie 184b202411
add computed Expr output types (#10370)
* push down ValueType to ExprType conversion, tidy up

* determine expr output type for given input types

* revert unintended name change

* add nullable

* tidy up

* fixup

* more better

* fix signatures

* naming things is hard

* fix inspection

* javadoc

* make default implementation of Expr.getOutputType that returns null

* rename method

* more test

* add output for contains expr macro, split operation and function auto conversion
2020-09-14 18:18:56 -07:00
Abhishek Agarwal f5e2645bbb
Support SearchQueryDimFilter in sql via new methods (#10350)
* Support SearchQueryDimFilter in sql via new methods

* Contains is a reserved word

* revert unnecessary change

* Fix toDruidExpression method

* rename methods

* java docs

* Add native functions

* revert change in dockerfile

* remove changes from dockerfile

* More tests

* travis fix

* Handle null values better
2020-09-14 09:57:54 -07:00
Joy Kent e5f0da30ae
Fix stringFirst/stringLast rollup during ingestion (#10332)
* Add IndexMergerRollupTest

This changelist adds a test to merge indexes with StringFirst/StringLast aggregator.

* Fix StringFirstAggregateCombiner/StringLastAggregateCombiner

The segment-level type for stringFirst/stringLast is SerializablePairLongString,
not String. This changelist fixes it.

* Fix EarliestLatestAnySqlAggregator to handle COMPLEX type

This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX
type as an operand. For its return type, we set it to VARCHAR, since
COMPLEX column is only generated by stringFirst/stringLast during ingestion
rollup.

* Return value with smaller timestamp in StringFirstAggregatorFactory.combine function

* Add integration tests for stringFirst/stringLast during ingestion

* Use one EarliestLatestReturnTypeInference instance

Co-authored-by: Joy Kent <joy@automonic.ai>
2020-09-08 17:36:04 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00
Gian Merlino 5cd7610fb6
SQL support for union datasources. (#10324)
* SQL support for union datasources.

Exposed via the "UNION ALL" operator. This means that there are now two
different implementations of UNION ALL: one at the top level of a query
that works by concatenating subquery results, and one at the table level
that works by creating a UnionDataSource.

The SQL documentation is updated to discuss these two use cases and how
they behave.

Future work could unify these by building support for a native datasource
that represents the union of multiple subqueries. (Today, UnionDataSource
can only represent the union of tables, not subqueries.)

* Fixes.

* Error message for sanity check.

* Additional test fixes.

* Add some error messages.
2020-08-28 07:57:06 -07:00
Clint Wylie ab60661008
refactor internal type system (#9638)
* better type tracking: add typed postaggs, finalized types for agg factories

* more javadoc

* adjustments

* transition to getTypeName to be used exclusively for complex types

* remove unused fn

* adjust

* more better

* rename getTypeName to getComplexTypeName

* setup expression post agg for type inference existing

* more javadocs

* fixup

* oops

* more test

* more test

* more comments/javadoc

* nulls

* explicitly handle only numeric and complex aggregators for incremental index

* checkstyle

* more tests

* adjust

* more tests to showcase difference in behavior

* timeseries longsum array
2020-08-26 10:53:44 -07:00
Gian Merlino 0910d22f48
Add SQL "OFFSET" clause. (#10279)
* Add SQL "OFFSET" clause.

Under the hood, this uses the new offset features from #10233 (Scan)
and #10235 (GroupBy). Since Timeseries and TopN queries do not currently
have an offset feature, SQL planning will switch from one of those to
Scan or GroupBy if users add an OFFSET.

Includes a refactoring to harmonize offset and limit planning using an
OffsetLimit wrapper class. This is useful because it ensures that the
various places that need to deal with offset and limit collapsing all
behave the same way, using its "andThen" method.

* Fix test and add another test.
2020-08-21 14:11:54 -07:00
Clint Wylie 7620b0c54e
Segment backed broadcast join IndexedTable (#10224)
* Segment backed broadcast join IndexedTable

* fix comments

* fix tests

* sharing is caring

* fix test

* i hope this doesnt fix it

* filter by schema to maybe fix test

* changes

* close join stuffs so it does not leak, allow table to directly make selector factory

* oops

* update comment

* review stuffs

* better check
2020-08-20 14:12:39 -07:00
Himanshu 12ae84165e
remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler (#9717)
* remove DruidLeaderClient.goAsync(..) that does not follow redirect.
Replace its usage by DruidLeaadereClient.go(..) with
InputStreamFullResponseHandler

* remove ByteArrayResponseHolder dependency from JsonParserIterator

* add UT to cover lines in InputStreamFullResponseHandler

* refactor SystemSchema to reduce branches

* further reduce branches

* Revert "add UT to cover lines in InputStreamFullResponseHandler"

This reverts commit 330aba3dd9.

* UTs for InputStreamFullResponseHandler

* remove unused imports
2020-08-14 10:51:18 -07:00
Gian Merlino 6cca7242de
Add "offset" parameter to the Scan query. (#10233)
* Add "offset" parameter to the Scan query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Fix constructor call.

* Fix up JSONs.

* Fix call to ScanQuery.

* Doc update.

* Fix javadocs.

* Spotbugs, LGTM suppressions.

* Javadocs.

* Fix suppression.

* Stabilize Scan query result order, add tests.

* Update LGTM comment.

* Fixup.

* Test different batch sizes too.

* Nicer tests.

* Fix comment.
2020-08-13 14:56:24 -07:00
Jihoon Son a61263b4a9
Allow forceLimitPushDown in SQL (#10253)
* Allow forceLimitPushDown in SQL

* fix test

* fix test

* review comments

* fix test
2020-08-13 13:30:41 -07:00
Abhishek Radhakrishnan dc16abae34
Vectorization support for long, double, float min & max aggregators. (#10260)
* LongMaxVectorAggregator support and test case.

* DoubleMinVectorAggregator and test cases.

* DoubleMaxVectorAggregator and unit test.

* FloatMinVectorAggregator and FloatMaxVectorAggregator.

* Documentation update to include the other vector aggregators.

* Bug fix.

* checkstyle formatting fixes.

* CalciteQueryTest cases update.

* Separate test classes for FloatMaxAggregation and FloatMniAggregation.

* remove the cannotVectorize for float max/min aggregator in test.

* Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.
2020-08-10 15:18:55 -07:00
Gian Merlino b6aaf59e8c
Add "offset" parameter to GroupBy query. (#10235)
* Add "offset" parameter to GroupBy query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Stabilize GroupBy sorts.

* Fix inspections.

* Fix suppression.

* Fixups.

* Move TopNSequence to druid-core.

* Addl comments.

* NumberedElement equals verification.

* Changes from review.
2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan 34a4113752
Add vectorization support for the longMin aggregator. (#10211)
* Fix minor formatting in docs.

* Add Nullhandling initialization for test to run from IDE.

* Vectorize longMin aggregator.

- A new vectorized class for the vectorized long min aggregator.
- Changes to AggregatorFactory to support vectorize functionality.
- Few changes to schema evolution test to add LongMinAggregatorFactory.

* Add longSum to the supported vectorized aggregator implementations.

* Add MIN() long min to calcite query test that can vectorize.

* Add simple long aggregations test.

* Fixup formatting per checkstyle guide.

* fixup and add more tests for long min aggregator.

* Override test for groupBy since timestamps are handled differently.

* Null compatibility check in test.

* Review comment: Add a test case to LongMinAggregationTest.
2020-08-01 15:32:09 -07:00
Maytas Monsereenusorn 574b062f1f
Cluster wide default query context setting (#10208)
* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
2020-07-29 15:19:18 -07:00
Jihoon Son 63c1746fe4
Fix timeseries query constructor when postAggregator has an expression reading timestamp result column (#10198)
* Fix timeseries query constructor when postAggregator has an expression reading timestamp result column

* fix npe

* Fix postAgg referencing timestampResultField and add a test for it

* fix test

* doc

* revert doc
2020-07-27 10:54:44 -07:00
Jihoon Son 26d099f39b
Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments (#10183)
* Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments

* fix tests and add missing tests

* revert null handling fix

* unused import

* move out util methods from DiscoveryDruidNode
2020-07-21 17:52:51 -07:00
Franklyn Dsouza 1b9aacb1cd
Fix avg sql aggregator (#10135)
* new average aggregator

* method to create count aggregator factory

* test everything

* update other usages

* fix style

* fix more tests

* fix datasketches tests
2020-07-08 08:38:56 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Jonathan Wei ed981ef88e
Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056)
* Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter

* Remove aggressive equality check that was used for testing

* Use Suppliers.memoize

* Checkstyle
2020-07-01 22:26:17 -07:00
Maytas Monsereenusorn 1676ba22e3
Fix Stack overflow with infinite loop in ReduceExpressionsRule of HepProgram (#10120)
* Fix Stack overflow with SELECT ARRAY ['Hello', NULL]

* address comments
2020-07-01 17:48:09 -07:00
Yuanli Han fc555980e8
Remove payload field from table sys.segment (#9883)
* remove payload field from table sys.segments

* update doc

* fix test

* fix CI failure

* add necessary fields

* fix doc

* fix comment
2020-06-29 22:20:23 -07:00
Clint Wylie ec1f443a5c
update avatica to handle additional character sets over jdbc (#10074)
* update avatica to handle additional character sets over jdbc

* update license yaml, fix test

* oops
2020-06-24 19:58:34 -07:00
Clint Wylie c2f5d453f8
fix topn on string columns with non-sorted or non-unique dictionaries (#10053)
* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures
2020-06-19 11:35:18 -07:00
Jonathan Wei 37e150c075
Fix join filter rewrites with nested queries (#10015)
* Fix join filter rewrites with nested queries

* Fix test, inspection, coverage

* Remove clauses from group key

* Fix import order

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-06-18 21:32:29 -07:00
Clint Wylie b5e6569d2c
global table only if joinable (#10041)
* global table if only joinable

* oops

* fix style, add more tests

* Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java

* better information schema columns, distinguish broadcast from joinable

* fix javadoc

* fix mistake

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-06-18 17:32:10 -07:00
Samarth Jain 3527458f85
Druid Avatica - Handle escaping of search characters correctly (#10040)
Fix Avatica based metadata queries by appending ESCAPE '\' clause to the LIKE expressions
2020-06-17 20:01:31 -07:00
Clint Wylie 68aa384190
global table datasource for broadcast segments (#10020)
* global table datasource for broadcast segments

* tests

* fix

* fix test

* comments and javadocs

* review stuffs

* use generated equals and hashcode
2020-06-16 17:58:05 -07:00
Suneet Saldanha 4e483a70b4
ROUND and having comparators correctly handle special double values (#10014)
* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md
2020-06-16 16:09:46 -07:00
Clint Wylie 96eb69e475
ignore brokers in broker views (#10017) 2020-06-10 12:29:30 -07:00
Clint Wylie f8b643ec72
make joinables closeable (#9982)
* make joinables closeable

* tests and adjustments

* refactor to make join stuffs impelement ReferenceCountedObject instead of Closable, more tests

* fixes

* javadocs and stuff

* fix bugs

* more test

* fix lgtm alert

* simplify

* fixup javadoc

* review stuffs

* safeguard against exceptions

* i hate this checkstyle rule

* make IndexedTable extend Closeable
2020-06-09 20:12:36 -07:00
Jonathan Wei 771870ae2d
Load broadcast datasources on broker and tasks (#9971)
* Load broadcast datasources on broker and tasks

* Add javadocs

* Support HTTP segment management

* Fix indexer maxSize

* inspection fix

* Make segment cache optional on non-historicals

* Fix build

* Fix inspections, some coverage, failed tests

* More tests

* Add CliIndexer to MainTest

* Fix inspection

* Rename UnprunedDataSegment to LoadableDataSegment

* Address PR comments

* Fix
2020-06-08 20:15:59 -07:00
Maytas Monsereenusorn 9738a03c83
Fix groupBy with literal in subquery grouping (#9986)
* fix groupBy with literal in subquery grouping

* fix groupBy with literal in subquery grouping

* fix groupBy with literal in subquery grouping

* address comments

* update javadocs
2020-06-04 13:28:05 -10:00
Maytas Monsereenusorn 790e9482ea
Fix Subquery could not be converted to groupBy query (#9959)
* Fix join

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* add tests

* address comments

* fix failing tests
2020-06-03 16:46:28 -07:00
Gian Merlino 3dfd7c30c0
Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893)
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.

- Add REGEXP_LIKE function that returns a boolean, and is useful in
  WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
  filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
  (previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
  non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.

* Changes based on PR review.

* Fix arg check.

* Important fixes!

* Add speller.

* wip

* Additional tests.

* Fix up tests.

* Add validation error tests.

* Additional tests.

* Remove useless call.
2020-06-03 14:31:37 -07:00
Maytas Monsereenusorn 0d22462e07
Document unsupported Join on multi-value column (#9948)
* Document Unsupported Join on multi-value column

* Document Unsupported Join on multi-value column

* address comments

* Add unit tests

* address comments

* add tests
2020-06-03 09:55:52 -10:00
Maytas Monsereenusorn 821c5d5a5c
Prevent JOIN reducing to a JOIN with constant in the ON condition (#9941)
* Prevent Join reducing to on constant condition

* Prevent Join reducing to on constant condition

* addreess comments

* set queryContext in tests
2020-06-01 09:39:06 -07:00
Suneet Saldanha e03d38b6c8
Optimize join queries where filter matches nothing (#9931)
* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Fitlers with associated tests.

* fix tests

* Refactor JoinFilterAnalyzer - part 2

This change introduces the following components:
 * RhsRewriteCandidates - a wrapper for a list of candidates and associated
     functions to operate on the set of candidates.
 * JoinableClauses - a wrapper for the list of JoinableClause that represent
     a join condition and the associated functions to operate on the clauses.
 * Equiconditions - a wrapper representing the equiconditions that are used
     in the join condition.

And associated test changes.

This refactoring surfaced 2 bugs:
 - Missing equals and hashcode implementation for RhsRewriteCandidate, thus
   allowing potential duplicates in the rhs rewrite candidates
 - Missing Filter#supportsRequiredColumnRewrite check in
   analyzeJoinFilterClause, which could result in UnsupportedOperationException
   being thrown by the filter

* fix compile error

* remove unused class

* Refactor JoinFilterAnalyzer - Correlations

Move the correlation related code out into it's own class so it's easier
to maintain.
Another patch should follow this one so that the query path uses the
correlation object instead of it's underlying maps.

* Optimize join queries where filter matches nothing

Fixes #9787

This PR changes the Joinable interface to return an Optional set of correlated
values for a column.
This allows the JoinFilterAnalyzer to differentiate between the case where the
column has no matching values and when the column could not find matching
values.

This PR chose not to distinguish between cases where correlated values could
not be computed because of a config that has this behavior disabled or because
of user error - like a column that could not be found. The reasoning was that
the latter is likely an error and the non filter pushdown path will surface the
error if it is.
2020-05-29 16:53:03 -07:00
Suneet Saldanha cbd587dbd6
Add parameterized Calcite tests for join queries (#9923)
* Add parameterized Calcite tests for join queries

* new tests

* review comments
2020-05-28 19:10:26 -07:00
Suneet Saldanha b5dfa5322b
Make it easier for devs to add CalciteQueryTests (#9922)
* Add ingestion specs for CalciteQueryTests

This PR introduces ingestion specs that can be used for local testing
so that CalciteQueryTests can be built on a druid cluster.

* Add README

* Update sql/src/test/resources/calcite/tests/README.md
2020-05-27 13:03:53 -07:00
Maytas Monsereenusorn 0a8bf83bc5
Bad plan for table-lookup-lookup join with filter on first lookup and outer limit (#9773)
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* address comments

* address comments

* fix checkstyle

* address comments

* address comments
2020-05-14 16:56:40 -07:00
Suneet Saldanha b0167295d7
Fail incorrectly constructed join queries (#9830)
* Fail incorrectly constructed join queries

* wip annotation for equals implementations

* Add equals tests

* fix tests

* Actually fix the tests

* Address review comments

* prohibit Pattern.hashCode()
2020-05-13 14:23:04 -07:00
Jihoon Son 6674d721bc
Avoid sorting values in InDimFilter if possible (#9800)
* Avoid sorting values in InDimFilter if possible

* tests

* more tests

* fix and and or filters

* fix build

* false and true vector matchers

* fix vector matchers

* checkstyle

* in filter null handling

* remove wrong test

* address comments

* remove unnecessary null check

* redundant separator

* address comments

* typo

* tests
2020-05-06 15:26:36 -07:00
Clint Wylie 9a293d554d
remove UnionMergeRule rules from SQL planner (#9797) 2020-05-01 12:50:11 -07:00
Suneet Saldanha 7510e6e722
Fix potential NPEs in joins (#9760)
* Fix potential NPEs in joins

intelliJ reported issues with potential NPEs. This was first hit in testing
with a filter being pushed down to the left hand table when joining against
an indexed table.

* More null check cleanup

* Optimize filter value rewrite for IndexedTable

* Add unit tests for LookupJoinable

* Add tests for IndexedTableJoinable

* Add non null assert for dimension selector

* Supress null warning in LookupJoinMatcher

* remove some null checks on hot path
2020-04-29 11:03:13 -07:00
Jihoon Son b8f7128b2d
Revert "remove ServerDiscoverySelector from DruidLeaderClient (#9481)" (#9702)
* Revert "remove ServerDiscoverySelector from DruidLeaderClient (#9481)"

This reverts commit 072bbe210f.

* fix build
2020-04-14 20:42:56 -07:00
Clint Wylie 0ff926b1a1
fix issue with group by limit pushdown for extractionFn, expressions, joins, etc (#9662)
* fix issue with group by limit pushdown for extractionFn, expressions, joins, etc

* remove unused

* fix test

* revert unintended change

* more tests

* consider capabilities for StringGroupByColumnSelectorStrategy

* fix test

* fix and more test

* revert because im scared
2020-04-11 01:18:11 -07:00
Gian Merlino 75c543b50f
SQL: More straightforward handling of join planning. (#9648)
* SQL: More straightforward handling of join planning.

Two changes that simplify how joins are planned:

1) Stop using JoinProjectTransposeRule as a way of guiding subquery
removal. Instead, add logic to DruidJoinRule that identifies removable
subqueries and removes them at the point of creating a DruidJoinQueryRel.
This approach reduces the size of the planning space and allows the
planner to complete quickly.

2) Remove rules that reorder joins. Not because of an impact on the
planning time (it seems minimal), but because the decisions that the
planner was making in the new tests were sometimes worse than the
user-provided order. I think we'll need to go with the user-provided
order for now, and revisit reordering when we can add more smarts to
the cost estimator.

A third change updates numeric ExprEval classes to store their
value as a boxed type that corresponds to what it is supposed to be.
This is useful because it affects the behavior of "asString", and
is included in this patch because it is needed for the new test
"testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This
test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an
actual double would.

Fixes #9646.

* Fix comments.

* Fix tests.
2020-04-09 16:21:43 -07:00
Jihoon Son 0da8ffc3ff
Bump up development version to 0.19.0-SNAPSHOT (#9586) 2020-03-30 16:24:04 -07:00
Clint Wylie fa5da6693c
add lane enforcement for joinish queries (#9563)
* add lane enforcement for joinish queries

* oops

* style

* review stuffs
2020-03-30 11:58:16 -07:00
Clint Wylie 2bc29543e5
modify QueryCapacityExceededException to provide better messaging (#9547)
* modify QueryCapacityExceededException to provide better messaging

* style
2020-03-23 20:05:11 -07:00
Gian Merlino 54c9325256
SQL support for joins on subqueries. (#9545)
* SQL support for joins on subqueries.

Changes to SQL module:

- DruidJoinRule: Allow joins on subqueries (left/right are no longer
  required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
  handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
  it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
  version of Calcite. Some of these are useful for optimizing joins on
  subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
  relevant constants in CostEstimates.

Other changes:

- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
  of isComplete is enough to let callers know that columns might have
  multiple values, and explicitly setting it to true causes
  ExpressionSelectors to think it definitely has multiple values, and
  treat the inputs as arrays. This behavior interfered with some of the
  new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
  tests.

* Fixes for tests.

* Adjustments.
2020-03-22 16:43:55 -07:00
Gian Merlino 1ef25a438f
Broker: Add ability to inline subqueries. (#9533)
* Broker: Add ability to inline subqueries.

The main changes:

- ClientQuerySegmentWalker: Add ability to inline queries.
- Query: Add "getSubQueryId" and "withSubQueryId" methods.
- QueryMetrics: Add "subQueryId" dimension.
- ServerConfig: Add new "maxSubqueryRows" parameter, which is used by
  ClientQuerySegmentWalker to limit how many rows can be inlined per
  query.
- IndexedTableJoinMatcher: Allow creating keys on top of unknown types,
  by assuming they are strings. This is useful because not all types are
  known for fields in query results.
- InlineDataSource: Store RowSignature rather than component parts. Add
  more zealous "equals" and "hashCode" methods to ease testing.
- Moved QuerySegmentWalker test code from CalciteTests and
  SpecificSegmentsQueryWalker in druid-sql to QueryStackTests in
  druid-server. Use this to spin up a new ClientQuerySegmentWalkerTest.

* Adjustments from CI.

* Fix integration test.
2020-03-18 15:06:45 -07:00
Jonathan Wei b1847364b0
More efficient join filter rewrites (#9516)
* More efficient join filter rewrites

* Rebase

* Remove unused functions

* PR comments, fix compile

* Adjust comment

* Allow filter rewrite when join condition has LHS expression

* Fix inspections

* Fix tests
2020-03-16 22:16:14 -07:00
Clint Wylie 6afd55c8f4
threshold based automatic query prioritization (#9493)
* threshold based automatic query prioritization

* fixes

* spelling and fixes

* fix docs

* spelling

* checkstyle

* adjustments

* doc fix
2020-03-13 01:41:54 -07:00
Chi Cao Minh 6b02991464
Match GREATEST/LEAST function behavior to other DBs (#9488)
* Match GREATEST/LEAST function behavior

Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.

* Match postgres behavior & handle more SQL types

* Fix imports
2020-03-12 15:10:11 -07:00
Gian Merlino ff59d2e78b
Move RowSignature from druid-sql to druid-processing and make use of it. (#9508)
* Move RowSignature from druid-sql to druid-processing and make use of it.

1) Moved (most of) RowSignature from sql to processing. Left behind the SQL-specific
   stuff in a RowSignatures utility class. It also picked up some new convenience
   methods along the way.
2) There were a lot of places in the code where Map<String, ValueType> was used to
   associate columns with type info. These are now all replaced with RowSignature.
3) QueryToolChest's resultArrayFields method is replaced with resultArraySignature,
   and it now provides type info.

* Fix up extensions.

* Various fixes
2020-03-12 11:06:44 -07:00
Gian Merlino 2ef5c17441
Link up row-based datasources to serving layer. (#9503)
* Link up row-based datasources to serving layer.

- Add SegmentWrangler interface that allows linking of DataSources to Segments.
- Add LocalQuerySegmentWalker that uses SegmentWranglers to compute queries on
  data that is available locally.
- Modify ClientQuerySegmentWalker to use LocalQuerySegmentWalker when the base
  datasource is concrete and not a table.
- Add SegmentWranglerModule to the Broker so it has them available and can
  properly instantiate . LocalQuerySegmentWalkers.
- Set InlineDataSource and LookupDataSource to concrete, since they can be
  directly queried now.

* Fix tests.
2020-03-11 11:32:27 -07:00
Clint Wylie 8b9fe6f584
query laning and load shedding (#9407)
* prototype

* merge QueryScheduler and QueryManager

* everything in its right place

* adjustments

* docs

* fixes

* doc fixes

* use resilience4j instead of semaphore

* more tests

* simplify

* checkstyle

* spelling

* oops heh

* remove unused

* simplify

* concurrency tests

* add SqlResource tests, refactor error response

* add json config tests

* use LongAdder instead of AtomicLong

* remove test only stuffs from scheduler

* javadocs, etc

* style

* partial review stuffs

* adjust

* review stuffs

* more javadoc

* error response documentation

* spelling

* preserve user specified lane for NoSchedulingStrategy

* more test, why not

* doc adjustment

* style

* missed review for make a thing a constant

* fixes and tests

* fix test

* Update docs/configuration/index.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc update

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-03-10 02:57:16 -07:00
Jihoon Son 75e2051195
Convert array_contains() and array_overlaps() into native filters if possible (#9487)
* Convert array_contains() and array_overlaps() into native filters if
possible

* make spotbugs happy and fix null results when null compatible
2020-03-09 22:50:38 -07:00
Clint Wylie f8b1f2f7f3
fix issue when distinct grouping dimensions are optimized into the same virtual column expression (#9429)
* fix issue when distinct grouping dimensions are optimized into the same virtual column expression

* fix tests

* more better

* fixes
2020-03-09 17:48:29 -07:00
Jonathan Wei 0136dba95d
Add option to control join filter rewrites (#9472)
* Add option to control join filter rewrites

* Fix inspections
2020-03-09 17:36:07 -07:00
Himanshu 072bbe210f
remove ServerDiscoverySelector from DruidLeaderClient (#9481) 2020-03-09 12:13:59 -07:00
Gian Merlino c9faf3e148
Add SQL GROUPING SETS support. (#9122)
* Add SQL GROUPING SETS support.

Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:

- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
  each grouping set. This is more in line with SQL standard behavior. I think it is okay
  to make this change, since the old behavior was not documented, so users should
  hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
  should not have been.

Also fixes two bugs in query equality checking:

- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
  latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.

* Fix bugs.

* Fix tests.

* PR updates.

* Grouping class hygiene.
2020-02-26 08:52:39 -08:00
Clint Wylie 6d8dd5ec10
string -> expression -> string -> expression (#9367)
* add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays

* oops, macros are expressions too

* style

* spotbugs

* qualified type arrays

* review stuffs

* simplify grammar

* more permissive array parsing

* reuse expr joiner

* fix it
2020-02-21 15:43:02 -08:00
Srinivas Reddy 05258dca37
Improved the readability and fixed few java warnings (#9163)
* Improved the readability and fixed few java warnings

* Fix the checkstyle

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-02-22 07:30:11 +09:00
Clint Wylie b408a6d774
sql support for dynamic parameters (#6974)
* sql support for dynamic parameters

* fixup

* javadocs

* fixup from merge

* formatting

* fixes

* fix it

* doc fix

* remove druid fallback self-join parameterized test

* unused imports

* ignore test for now

* fix imports

* fixup

* fix merge

* merge fixup

* fix test that cannot vectorize

* fixup and more better

* dependency thingo

* fix docs

* tweaks

* fix docs

* spelling

* unused imports after merge

* review stuffs

* add comment

* add ignore text

* review stuffs
2020-02-19 13:09:20 -08:00
Lucas Capistrant 5befd40638
Issue 4909 popped up again. I applied PR 5451 liberally to all new Calcite test classes introduced in PR 9279 to fix (#9324) 2020-02-16 22:29:43 -08:00
Clint Wylie b1be88d79c
fix Expressions.toQueryGranularity to be more correct, improve javadocs of Expr.getIdentifierIfIdentifier and Expr.getBindingIfIdentifier (#9363) 2020-02-16 08:36:40 -08:00
Suneet Saldanha b1f38131af
Fix timestamp extract fn to match postgreSQL (#9337)
* Fix timestamp extract fn to match postgres

Update the timestamp extract function so that it matches the PostgreSQL docs.
Examples from the PostgreSQL docs were added as tests for DECADE, CENTURY
and MILLENIUM extraction.

There were bugs in CENTURY and MILLENIUM that were spotted because of intelliJ
inspections - 'Integer division in floating point context'

* Update CalciteQueryTest

* remove useless round

* mark integer division as an error
2020-02-12 15:39:19 -08:00
Maytas Monsereenusorn c30579e47b
ANY Aggregator should not skip null values implementation (#9317)
* ANY Aggregator should not skip null values implementation

* add tests

* add more tests

* Update documentation

* add more tests

* address review comments

* optimize StringAnyBufferAggregator

* fix failing tests

* address pr comments
2020-02-12 14:01:41 -08:00
Manish Gill d268ff7297
Use ExecutorService instead of ScheduledExecutorService where necessary (#9325)
* Use ExecutorService instead of ScheduledExecutorService where necessary - #9286

* Added inspection rule to prohibit ScheduledExecutorService assignment to ExecutorService
2020-02-11 19:05:48 -08:00
Jonathan Wei b2c00b3a79
Add query context option to disable join filter push down (#9335) 2020-02-11 15:31:34 -08:00
Jonathan Wei 57765a499b
Allow overriding default JoinableFactory in SpecificSegmentsQuerySegmentWalker (#9330) 2020-02-06 18:35:26 -08:00
Gian Merlino 475b90c3a6
Remove EasyMock dependency from CalciteTests. (#9310)
* Remove EasyMock dependency from CalciteTests.

Useful because CalciteTests is used by other modules (e.g. druid-benchmarks)
and we don't want them to have to pull in EasyMock.

* CalciteTests no longer needs curator-x-discovery either.
2020-02-04 22:10:17 -08:00
Aditya 868fdeb384
GREATEST/LEAST post-aggregators in SQL (#8719)
* implement shell for greatest sql aggregator with hardcoded long values

* implement functional long greatest aggregator for direct access columns

* implement greatest & least sql aggregators for long & double types using abstract base class

* add javadocs, unit tests & handling for floats for greatest/least postaggregations

* minor checkstyle fix

* improve naming for the test cases

* make inner class static

* remove blank lines to retest travis build

* change trivial text to rerun travis build

* implement suggested updates for greatest/least sql aggs & fix checkstyle issues

* fix stale comments in greatest/least sql aggs abstract base

* Update sql.md

* improve sql function definitions for greatest/least sql aggs

* add more tests for greatest/least sql aggs

* add tests to cover invalid greatest/least sql expressions

* rename & reorder greatest least sql tests
2020-02-04 17:08:53 -08:00