Commit Graph

2458 Commits

Author SHA1 Message Date
frank chen d7d2c804ad
Add zero period support to TIMESTAMPADD (#10550)
* Allow zero period for TIMESTAMPADD

* update test cases

* add empty zone test case

* add unit test cases for TimestampShiftMacro
2020-11-18 18:26:53 -08:00
frank chen e83d5cb59e
Fix ingestion failure of pretty-formatted JSON message (#10383)
* support multi-line text

* add test cases

* split json text into lines case by case

* improve exception handling

* fix CI

* use IntermediateRowParsingReader as base of JsonReader

* update doc

* ignore the non-immutable field in test case

* add more test cases

* mark `lineSplittable` as final

* fix testcases

* fix doc

* add a test case for SqlReader

* return all raw columns when exception occurs

* fix CI

* fix test cases

* resolve review comments

* handle ParseException returned by index.add

* apply Iterables.getOnlyElement

* fix CI

* fix test cases

* improve code in more graceful way

* fix test cases

* fix test cases

* add a test case to check multiple json string in one text block

* fix inspection check
2020-11-13 13:59:23 -08:00
Atul Mohan 6ccddedb7a
Improved exception handling in case of query timeouts (#10464)
* Separate timeout exceptions

* Add more tests

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-11-03 09:00:33 -06:00
Clint Wylie d0821de854
support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499)
* support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions

* inspector

* changes

* more test

* clean
2020-10-26 19:55:24 -07:00
Liran Funaro f3a2903218
Configurable Index Type (#10335)
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
2020-10-23 18:34:26 -07:00
Abhishek Agarwal 567e381705
Any virtual column on "__time" should be a pre-join virtual column (#10451)
* Virtual column on __time should be in pre-join

* Add unit test
2020-10-12 13:04:55 -07:00
Abhishek Agarwal 4d2a92f46a
Add caching support to join queries (#10366)
* Proposed changes for making joins cacheable

* Add unit tests

* Fix tests

* simplify logic

* Pull empty byte array logic out of CachingQueryRunner

* remove useless null check

* Minor refactor

* Fix tests

* Fix segment caching on Broker

* Move join cache key computation in Broker

Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient

* Fix compilation

* Review comments

* Add more tests

* Fix inspection errors

* Pushed condition analysis to JoinableFactory

* review comments

* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable

* Remove commented lines

* Fix populateCache

* Disable caching for selective datasources

Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
2020-10-09 17:42:30 -07:00
Jihoon Son 1deed9fbcd
Close aggregators in HashVectorGrouper.close() (#10452)
* Close aggregators in HashVectorGrouper.close()

* reuse grouper

* Add missing dependency
2020-10-06 10:17:33 -07:00
Clint Wylie 207ef310f2
vectorized group by support for nullable numeric columns (#10441)
* vectorized group by support for numeric null columns

* revert unintended change

* adjust

* review stuffs
2020-10-05 21:53:53 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Clint Wylie 9ec5c08e2a
fix array types from escaping into wider query engine (#10460)
* fix array types from escaping into wider query engine

* oops

* adjust

* fix lgtm
2020-10-03 15:30:34 -07:00
Clint Wylie 753bce324b
vectorize constant expressions with optimized selectors (#10440) 2020-09-29 13:19:06 -07:00
Gian Merlino 2be1ae128f
RowBasedIndexedTable: Add specialized index types for long keys. (#10430)
* RowBasedIndexedTable: Add specialized index types for long keys.

Two new index types are added:

1) Use an int-array-based index in cases where the difference between
   the min and max values isn't too large, and keys are unique.

2) Use a Long2ObjectOpenHashMap (instead of the prior Java HashMap) in
   all other cases.

In addition:

1) RowBasedIndexBuilder, a new class, is responsible for picking which
   index implementation to use.

2) The IndexedTable.Index interface is extended to support using
   unboxed primitives in the unique-long-keys case, and callers are
   updated to use the new functionality.

Other key types continue to use indexes backed by Java HashMaps.

* Fixup logic.

* Add tests.
2020-09-29 10:46:47 -07:00
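The index-selection rule in the commit above can be sketched in plain Java. This is a minimal illustration, not Druid's RowBasedIndexBuilder. The class name `LongKeyIndexSketch` and the `MAX_RANGE` cutoff are hypothetical, and `java.util.HashMap` stands in for fastutil's `Long2ObjectOpenHashMap` so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of choosing an index implementation for long join keys.
final class LongKeyIndexSketch {
  // Assumed cutoff for "the difference between min and max isn't too large".
  static final long MAX_RANGE = 1_000_000L;

  private final long min;
  private final long max;
  // Array-backed index: slot (key - min) holds the one row with that key, or -1.
  private final int[] arrayIndex;
  // Fallback hash index: key -> every row with that key.
  private final Map<Long, List<Integer>> hashIndex;

  LongKeyIndexSketch(long[] keys) {
    long lo = Long.MAX_VALUE;
    long hi = Long.MIN_VALUE;
    boolean unique = true;
    Set<Long> seen = new HashSet<>();
    for (long k : keys) {
      lo = Math.min(lo, k);
      hi = Math.max(hi, k);
      unique &= seen.add(k);
    }
    // hi - lo wraps negative if the true range exceeds Long.MAX_VALUE, so check both bounds.
    long range = hi - lo;
    boolean useArray = keys.length > 0 && unique && range >= 0 && range < MAX_RANGE;
    min = lo;
    max = hi;
    if (useArray) {
      arrayIndex = new int[(int) range + 1];
      Arrays.fill(arrayIndex, -1);
      for (int row = 0; row < keys.length; row++) {
        arrayIndex[(int) (keys[row] - lo)] = row;
      }
      hashIndex = null;
    } else {
      arrayIndex = null;
      hashIndex = new HashMap<>();
      for (int row = 0; row < keys.length; row++) {
        hashIndex.computeIfAbsent(keys[row], key -> new ArrayList<>()).add(row);
      }
    }
  }

  boolean isArrayBacked() {
    return arrayIndex != null;
  }

  // Returns the row numbers whose key equals `key` (empty if none).
  List<Integer> find(long key) {
    if (arrayIndex != null) {
      // Bounds-check the key itself so that key - min cannot overflow.
      if (key < min || key > max || arrayIndex[(int) (key - min)] < 0) {
        return List.of();
      }
      return List.of(arrayIndex[(int) (key - min)]);
    }
    return hashIndex.getOrDefault(key, List.of());
  }
}
```

The array path pays memory proportional to the key range but answers lookups with a subtraction and one array read, which is why it is restricted to unique keys in a narrow range.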
Gian Merlino 599aacce0f
Remove Expr.visit. (#10437)
* Remove Expr.visit.

It isn't used and doesn't have tests.

* Remove Visitor too.
2020-09-28 22:13:10 -07:00
Clint Wylie 1d6cb624f4
add vectorizeVirtualColumns query context parameter (#10432)
* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
2020-09-28 18:48:34 -07:00
Clint Wylie 3d700a5e31
vectorize remaining math expressions (#10429)
* vectorize remaining math expressions

* fixes

* remove cannotVectorize() where no longer true

* disable vectorized groupby for numeric columns with nulls

* fixes
2020-09-26 23:30:14 -07:00
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Gian Merlino 1af2eace41
Include Sequence-building time in CPU time metric. (#10377)
* Include Sequence-building time in CPU time metric.

Meaningful work can be done while building Sequences, and we should
count this work. On the Broker, this includes subquery processing
work done by the mergeResults call of the GroupByQueryQueryToolChest.

* Add test.
2020-09-23 14:33:55 +08:00
Dylan Wylie f3eb0cfb3b
Avoid large limits causing int overflow in buffer size checks (#10356)
* Avoid large limits causing int overflow in buffer size checks

* fix lgtm overflow warning

Co-authored-by: Dylan <dwylie@spotx.tv>
2020-09-18 13:08:49 -07:00
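The class of bug named in this commit can be illustrated in isolation. The sketch below is hypothetical. The class name, the method names, and the idea of multiplying a row limit by a per-row byte size are assumptions, not the actual Druid check. The point is that the product of two `int`s can silently wrap, so a capacity check should widen to `long` before multiplying.

```java
// Hypothetical illustration of an int-overflow-safe buffer size check.
final class BufferSizeCheck {
  // Buggy version: limit * bytesPerRow is computed in int arithmetic and can wrap negative.
  static boolean fitsUnsafe(int limit, int bytesPerRow, int capacityBytes) {
    return limit * bytesPerRow <= capacityBytes;
  }

  // Fixed version: widen to long before multiplying; two non-negative ints
  // multiply to less than 2^62, so the long product cannot overflow.
  static boolean fitsSafe(int limit, int bytesPerRow, int capacityBytes) {
    return (long) limit * bytesPerRow <= capacityBytes;
  }
}
```

With `limit = Integer.MAX_VALUE` and 2 bytes per row, the `int` product wraps to -2, so the unsafe check wrongly reports that the rows fit in a 1024-byte buffer.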
Suneet Saldanha f71ba6f2c2
Vectorized ANY aggregators (#10338)
* WIP vectorized ANY aggregators

* tests

* fix aggs

* cleanup

* code review + tests

* docs

* use NilVectorSelector when needed

* fix spellcheck

* dont instantiate vectors

* cleanup
2020-09-14 19:44:58 -07:00
Clint Wylie e012d5c41b
allow vectorized query engines to utilize vectorized virtual columns (#10388)
* allow vectorized query engines to utilize vectorized virtual column implementations

* javadoc, refactor, checkstyle

* intellij inspection and more javadoc

* better

* review stuffs

* fix incorrect refactor, thanks tests

* minor adjustments
2020-09-14 19:29:35 -07:00
Clint Wylie 184b202411
add computed Expr output types (#10370)
* push down ValueType to ExprType conversion, tidy up

* determine expr output type for given input types

* revert unintended name change

* add nullable

* tidy up

* fixup

* more better

* fix signatures

* naming things is hard

* fix inspection

* javadoc

* make default implementation of Expr.getOutputType that returns null

* rename method

* more test

* add output for contains expr macro, split operation and function auto conversion
2020-09-14 18:18:56 -07:00
Abhishek Agarwal f5e2645bbb
Support SearchQueryDimFilter in sql via new methods (#10350)
* Support SearchQueryDimFilter in sql via new methods

* Contains is a reserved word

* revert unnecessary change

* Fix toDruidExpression method

* rename methods

* java docs

* Add native functions

* revert change in dockerfile

* remove changes from dockerfile

* More tests

* travis fix

* Handle null values better
2020-09-14 09:57:54 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Joy Kent e5f0da30ae
Fix stringFirst/stringLast rollup during ingestion (#10332)
* Add IndexMergerRollupTest

This changelist adds a test to merge indexes with StringFirst/StringLast aggregator.

* Fix StringFirstAggregateCombiner/StringLastAggregateCombiner

The segment-level type for stringFirst/stringLast is SerializablePairLongString,
not String. This changelist fixes it.

* Fix EarliestLatestAnySqlAggregator to handle COMPLEX type

This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX
type as an operand. For its return type, we set it to VARCHAR, since
COMPLEX column is only generated by stringFirst/stringLast during ingestion
rollup.

* Return value with smaller timestamp in StringFirstAggregatorFactory.combine function

* Add integration tests for stringFirst/stringLast during ingestion

* Use one EarliestLatestReturnTypeInference instance

Co-authored-by: Joy Kent <joy@automonic.ai>
2020-09-08 17:36:04 -07:00
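The combine rule for stringFirst keeps the value with the smaller timestamp, and the mirror rule for stringLast keeps the larger. A minimal sketch, assuming a simple `TimedString` class as a stand-in for Druid's `SerializablePairLongString`, whose real API is not reproduced here. The tie-breaking choices and null handling are also assumptions.

```java
// Hypothetical stand-in for a (timestamp, string) pair such as SerializablePairLongString.
final class TimedString {
  final long timestamp;
  final String value;

  TimedString(long timestamp, String value) {
    this.timestamp = timestamp;
    this.value = value;
  }
}

final class FirstLastCombine {
  // stringFirst: keep the pair with the earlier timestamp (left-hand pair wins ties here).
  static TimedString combineFirst(TimedString lhs, TimedString rhs) {
    if (lhs == null) {
      return rhs;
    }
    if (rhs == null) {
      return lhs;
    }
    return rhs.timestamp < lhs.timestamp ? rhs : lhs;
  }

  // stringLast: keep the pair with the later timestamp (right-hand pair wins ties here).
  static TimedString combineLast(TimedString lhs, TimedString rhs) {
    if (lhs == null) {
      return rhs;
    }
    if (rhs == null) {
      return lhs;
    }
    return rhs.timestamp >= lhs.timestamp ? rhs : lhs;
  }
}
```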
Suneet Saldanha 91a153820e
fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator (#10325)
* fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator

* Add tests

* javadocs
2020-09-04 13:23:40 -07:00
Gian Merlino d7fcff3aba
StringFirstAggregatorFactory: Fix incorrect "combine" method. (#10351)
* StringFirstAggregatorFactory: Fix incorrect "combine" method.

There was a test, but it was wrong.

* Fix superclass.
2020-09-03 20:03:26 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00
Gian Merlino 21703d81ac
Fix handling of 'join' on top of 'union' datasources. (#10318)
* Fix handling of 'join' on top of 'union' datasources.

The problem is that unions are typically rewritten into a series of
individual queries on the underlying tables, but this isn't done when
the union is wrapped in a join.

The main changes are in UnionQueryRunner:

1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis.
2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource".

Together, these enable UnionQueryRunner to "see through" a join.

* Tests.

* Adjust heap sizes for integration tests.

* Different approach, more tests.

* Tweak.

* Styling.
2020-08-26 14:23:54 -07:00
Jihoon Son b9ff3483ac
Add support for all partitioning schemes for auto compaction (#10307)
* Add support for all partitioning schemes for auto compaction

* annotate last compaction state for multi phase parallel indexing

* fix build and tests

* test

* better home
2020-08-26 13:19:18 -07:00
Clint Wylie ab60661008
refactor internal type system (#9638)
* better type tracking: add typed postaggs, finalized types for agg factories

* more javadoc

* adjustments

* transition to getTypeName to be used exclusively for complex types

* remove unused fn

* adjust

* more better

* rename getTypeName to getComplexTypeName

* setup expression post agg for type inference existing

* more javadocs

* fixup

* oops

* more test

* more test

* more comments/javadoc

* nulls

* explicitly handle only numeric and complex aggregators for incremental index

* checkstyle

* more tests

* adjust

* more tests to showcase difference in behavior

* timeseries longsum array
2020-08-26 10:53:44 -07:00
Suneet Saldanha a9de00d43a
Remove NUMERIC_HASHING_THRESHOLD (#10313)
* Make NUMERIC_HASHING_THRESHOLD configurable

Change the default numeric hashing threshold to 1 and make it configurable.

Benchmarks attached to this PR show that binary searches are not faster
than doing a set contains check. The attached flamegraph shows the amount of
time a query spent in the binary search. Given the benchmarks, we can expect
to see roughly a 2x speedup in this part of the query, which works out to
a roughly 10% faster query in this instance.

* Remove NUMERIC_HASHING_THRESHOLD

* Remove stale docs
2020-08-25 20:05:39 -07:00
Gian Merlino f53785c52c
ExpressionFilter: Use index for expressions of single multi-value columns. (#10320)
Previously, this was disallowed, because expressions treated multi-values
as nulls. But now, if there's a single multi-value column that can be
mapped over, it's okay to use the index. Expression selectors already do
this.
2020-08-24 23:29:31 -07:00
Suneet Saldanha 707b5aae2b
Optimize large InDimFilters (#10312)
* Optimize large InDimFilters

For large InDimFilters, in default mode, the filter does a linear check of the
set to see if it contains either an empty string or null. If it does, the empty
strings are converted to nulls by passing through the entire list again.

Instead of this, in default mode, we attempt to remove an empty string from the
values that are passed to the InDimFilter. If an empty string was removed, we
add null to the set.

* code review

* Revert "code review"

This reverts commit 61fe33ebf7.

* code review - less brittle
2020-08-24 16:39:27 -07:00
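The default-mode normalization described in this commit replaces two passes with a single removal attempt. A minimal sketch, assuming the values arrive in a mutable `HashSet`, which permits `null` (not every `Set` implementation does). The class and method names are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of the default-mode "" -> null normalization for an IN filter's values.
final class InValuesNormalizer {
  // In default (non-SQL-compatible) null handling, "" and null are treated alike.
  // A single remove() both detects and drops the empty string; if it was present,
  // null is added in its place. Mutates and returns the given set.
  static Set<String> normalizeDefaultMode(Set<String> values) {
    if (values.remove("")) {
      values.add(null);
    }
    return values;
  }
}
```

For a hash set, `remove` is a constant-time lookup, so the cost no longer grows with the number of values in the filter.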
Clint Wylie 7620b0c54e
Segment backed broadcast join IndexedTable (#10224)
* Segment backed broadcast join IndexedTable

* fix comments

* fix tests

* sharing is caring

* fix test

* i hope this doesnt fix it

* filter by schema to maybe fix test

* changes

* close join stuffs so it does not leak, allow table to directly make selector factory

* oops

* update comment

* review stuffs

* better check
2020-08-20 14:12:39 -07:00
Gian Merlino 6cca7242de
Add "offset" parameter to the Scan query. (#10233)
* Add "offset" parameter to the Scan query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Fix constructor call.

* Fix up JSONs.

* Fix call to ScanQuery.

* Doc update.

* Fix javadocs.

* Spotbugs, LGTM suppressions.

* Javadocs.

* Fix suppression.

* Stabilize Scan query result order, add tests.

* Update LGTM comment.

* Fixup.

* Test different batch sizes too.

* Nicer tests.

* Fix comment.
2020-08-13 14:56:24 -07:00
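The broker-side offset described in this commit can be expressed with Java streams: run the query as normal, then skip the first `offset` rows before applying the limit. A minimal sketch with a hypothetical `applyOffsetAndLimit` helper, where rows are modeled as plain list elements. It assumes the upstream result order is deterministic, which is what the "Stabilize Scan query result order" change in this PR is for. Otherwise the skipped rows could differ between runs.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of applying an offset after the query has produced its rows.
final class OffsetSketch {
  // Discards the first `offset` rows, then keeps at most `limit` of the remainder.
  // Stream.skip and Stream.limit throw IllegalArgumentException for negative arguments.
  static <T> List<T> applyOffsetAndLimit(List<T> rows, long offset, long limit) {
    return rows.stream().skip(offset).limit(limit).collect(Collectors.toList());
  }
}
```

Because the rows are discarded only at the end, the rows before the offset are still computed and transferred, so large offsets are not free.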
Clint Wylie e053348f74
add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219)
* add isNullable to ColumnCapabilities, ColumnAnalysis

* better builder

* fix segment metadata queries in integration tests

* adjustments

* cleanup

* fix spotbugs

* treat unknown as true in segmentmetadata

* rename to hasNulls, add docs

* fixup

* test the dim indexer selector isNull fix for numeric columns

* fixes

* oof
2020-08-13 14:55:32 -07:00
Jihoon Son a61263b4a9
Allow forceLimitPushDown in SQL (#10253)
* Allow forceLimitPushDown in SQL

* fix test

* fix test

* review comments

* fix test
2020-08-13 13:30:41 -07:00
Gian Merlino 89860b7d6a
Fix javadoc mistake in DefaultLimitSpec. (#10269)
Javadoc for getLimit should say it's a limit, not an offset.
2020-08-13 12:17:26 -07:00
Gian Merlino e273264332
Fix two id-over-maxId errors in StringDimensionIndexer. (#10245)
1) lookupId could return IDs beyond maxId if called with a recently added value.
2) getRow could return an ID for null beyond maxId, if null was recently
   encountered in a dimension that initially didn't appear at all. (In this case,
   the dictionary ID for null can be > 0).

Also add a comment explaining how this stuff is supposed to work.
2020-08-11 20:32:10 -07:00
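The invariant behind the first fix can be illustrated with a growing dictionary read through a fixed snapshot. A reader that captured `maxId` earlier must treat any ID above it as absent, even if the writer has since added that value. This is a hypothetical, single-threaded sketch with invented class and method names. Real concurrent code would also need proper memory visibility between writer and readers, and this is not the StringDimensionIndexer implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical append-only string dictionary, read through snapshots bounded by maxId.
final class GrowingDictionary {
  private final Map<String, Integer> idsByValue = new HashMap<>();
  private final List<String> valuesById = new ArrayList<>();

  // Writer side: returns the existing ID or assigns the next one.
  int add(String value) {
    Integer id = idsByValue.get(value);
    if (id != null) {
      return id;
    }
    int newId = valuesById.size();
    valuesById.add(value);
    idsByValue.put(value, newId);
    return newId;
  }

  int currentMaxId() {
    return valuesById.size() - 1;
  }

  // Reader side: IDs above the snapshot's maxId were assigned after the snapshot,
  // so they are reported as absent (-1) instead of leaking out.
  int lookupId(String value, int snapshotMaxId) {
    Integer id = idsByValue.get(value);
    return id == null || id > snapshotMaxId ? -1 : id;
  }
}
```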
Clint Wylie c72f96a4ba
fix bug with expressions on sparse string realtime columns without explicit null valued rows (#10248)
* fix bug with realtime expressions on sparse string columns

* fix test

* add comment back

* push capabilities for dimensions to dimension indexers since they know things

* style

* style

* fixes

* getting a bit carried away

* missed one

* fix it

* benchmark build fix

* review stuffs

* javadoc and comments

* add comment

* more strict check

* fix missed usage of impl instead of interface
2020-08-11 11:07:17 -07:00
Abhishek Radhakrishnan dc16abae34
Vectorization support for long, double, float min & max aggregators. (#10260)
* LongMaxVectorAggregator support and test case.

* DoubleMinVectorAggregator and test cases.

* DoubleMaxVectorAggregator and unit test.

* FloatMinVectorAggregator and FloatMaxVectorAggregator.

* Documentation update to include the other vector aggregators.

* Bug fix.

* checkstyle formatting fixes.

* CalciteQueryTest cases update.

* Separate test classes for FloatMaxAggregation and FloatMinAggregation.

* remove the cannotVectorize for float max/min aggregator in test.

* Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.
2020-08-10 15:18:55 -07:00
Gian Merlino 170031744e
Combine InDimFilter, InFilter. (#10119)
* Combine InDimFilter, InFilter.

There are two motivations:

1. Ensure that when HashJoinSegmentStorageAdapter compares its Filter
   to the original one, and it is an "in" type, the comparison is by
   reference and does not need to check deep equality. This is useful
   when the "in" filter is very large.
2. Simplify things. (There isn't a great reason for the DimFilter and
   Filter logic to be separate, and combining them reduces some
   duplication.)

* Fix test.
2020-08-06 18:34:21 -07:00
Gian Merlino b6aaf59e8c
Add "offset" parameter to GroupBy query. (#10235)
* Add "offset" parameter to GroupBy query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Stabilize GroupBy sorts.

* Fix inspections.

* Fix suppression.

* Fixups.

* Move TopNSequence to druid-core.

* Addl comments.

* NumberedElement equals verification.

* Changes from review.
2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan 34a4113752
Add vectorization support for the longMin aggregator. (#10211)
* Fix minor formatting in docs.

* Add Nullhandling initialization for test to run from IDE.

* Vectorize longMin aggregator.

- A new vectorized class for the vectorized long min aggregator.
- Changes to AggregatorFactory to support vectorize functionality.
- Few changes to schema evolution test to add LongMinAggregatorFactory.

* Add longSum to the supported vectorized aggregator implementations.

* Add MIN() long min to calcite query test that can vectorize.

* Add simple long aggregations test.

* Fixup formatting per checkstyle guide.

* fixup and add more tests for long min aggregator.

* Override test for groupBy since timestamps are handled differently.

* Null compatibility check in test.

* Review comment: Add a test case to LongMinAggregationTest.
2020-08-01 15:32:09 -07:00
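Vectorized aggregators process a batch of rows per call instead of one row at a time. The sketch below shows the core loop of a long-min aggregation over a batch. It is a hypothetical, self-contained class, not Druid's vector aggregator interface. It assumes that a `null` null-mask means "no nulls in this batch" and that null rows are simply skipped.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a vectorized long-min aggregator storing its state in a ByteBuffer.
final class LongMinVectorSketch {
  // Identity for min: any real value replaces it.
  static void init(ByteBuffer buf, int position) {
    buf.putLong(position, Long.MAX_VALUE);
  }

  // Folds rows [startRow, endRow) of one batch into the stored minimum.
  static void aggregate(ByteBuffer buf, int position, long[] vector, boolean[] nulls,
                        int startRow, int endRow) {
    long min = buf.getLong(position);
    for (int i = startRow; i < endRow; i++) {
      if (nulls == null || !nulls[i]) {
        min = Math.min(min, vector[i]);
      }
    }
    buf.putLong(position, min);
  }

  static long get(ByteBuffer buf, int position) {
    return buf.getLong(position);
  }
}
```

Reading the state once, looping over the whole batch in a local variable, and writing it back once is what makes the vectorized path cheaper than per-row buffer updates.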
frank chen 646fa84d04
Support unit on byte-related properties (#10203)
* support unit suffix on byte-related properties

* add doc

* change default value of byte-related properties in example files

* fix coding style

* fix doc

* fix CI

* suppress spelling errors

* improve code according to comments

* rename Bytes to HumanReadableBytes

* add getBytesInInt to get value safely

* improve doc

* fix problem reported by CI

* fix problem reported by CI

* resolve code review comments

* improve error message

* improve code & doc according to comments

* fix CI problem

* improve doc

* suppress spelling check errors
2020-07-31 09:58:48 +08:00
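Accepting unit suffixes on byte-sized properties can be sketched as a small parser. This is an illustration under stated assumptions, not the HumanReadableBytes implementation. The suffix table below (decimal `K`/`M`/`G` as powers of 1000, binary `KiB`/`MiB`/`GiB` as powers of 1024) and all names are assumptions; the Druid configuration docs define the formats actually accepted. The `parseAsInt` helper follows the intent of the `getBytesInInt` bullet by failing loudly instead of silently truncating.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser for byte sizes with unit suffixes; the suffix table is an assumption.
final class ByteSizeParser {
  private static final Map<String, Long> UNITS = new LinkedHashMap<>();

  static {
    UNITS.put("KIB", 1L << 10);
    UNITS.put("MIB", 1L << 20);
    UNITS.put("GIB", 1L << 30);
    UNITS.put("K", 1_000L);
    UNITS.put("M", 1_000_000L);
    UNITS.put("G", 1_000_000_000L);
  }

  // Parses strings such as "512", "64K" or "1GiB" into a byte count.
  static long parse(String text) {
    String s = text.trim().toUpperCase(Locale.ROOT);
    for (Map.Entry<String, Long> unit : UNITS.entrySet()) {
      if (s.endsWith(unit.getKey())) {
        String number = s.substring(0, s.length() - unit.getKey().length()).trim();
        long factor = unit.getValue();
        // multiplyExact throws ArithmeticException instead of wrapping on overflow.
        return Math.multiplyExact(Long.parseLong(number), factor);
      }
    }
    return Long.parseLong(s);
  }

  // Narrows to int, throwing ArithmeticException rather than silently overflowing.
  static int parseAsInt(String text) {
    return Math.toIntExact(parse(text));
  }
}
```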
Maytas Monsereenusorn 574b062f1f
Cluster wide default query context setting (#10208)
* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
2020-07-29 15:19:18 -07:00
Jihoon Son 63c1746fe4
Fix timeseries query constructor when postAggregator has an expression reading timestamp result column (#10198)
* Fix timeseries query constructor when postAggregator has an expression reading timestamp result column

* fix npe

* Fix postAgg referencing timestampResultField and add a test for it

* fix test

* doc

* revert doc
2020-07-27 10:54:44 -07:00
Jihoon Son 6fdce36e41
Add integration tests for query retry on missing segments (#10171)
* Add integration tests for query retry on missing segments

* add missing dependencies; fix travis conf

* address comments

* Integration tests extension

* remove unused dependency

* remove druid_main

* fix java agent port
2020-07-22 22:30:35 -07:00
Jihoon Son 41982116f4
Report missing segments when there is no segment for the query datasource in historicals (#10199)
* Report missing segments when there is no segment for the query datasource in historicals

* test

* missing part for test

* another test
2020-07-20 21:02:52 -07:00
Nishant Bangarwa 971d8a353b
Add groupBy limitSpec to queryCache key (#10093)
* Add groupBy limitSpec to queryCache key

* Only add limitSpec to cache key if pushdown is set to true

* review comment
2020-07-13 19:15:09 -07:00
Jihoon Son 53a2550571
Follow-up for RetryQueryRunner fix (#10144)
* address comments; use guice instead of query context

* typo

* QueryResource tests

* address comments

* catch queryException

* fix spell check
2020-07-08 13:28:11 -07:00
Clint Wylie 010fe047e1
AbstractOptimizableDimFilter should be public (#10142) 2020-07-06 15:19:32 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Jonathan Wei ed981ef88e
Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056)
* Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter

* Remove aggressive equality check that was used for testing

* Use Suppliers.memoize

* Checkstyle
2020-07-01 22:26:17 -07:00
Samarth Jain e2c5bcc22d
Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning … (#10123)
* Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning message to indicate failure in deserializing.
2020-07-01 20:06:23 -07:00
Samarth Jain 3e92cdf1cf
Revert "Fix UnknownTypeComplexColumn#makeVectorObjectSelector" (#10121)
This reverts commit 7bb7489afc.
2020-07-01 14:33:17 -07:00
Jihoon Son 657f8ee80f
Fix RetryQueryRunner to actually do the job (#10082)
* Fix RetryQueryRunner to actually do the job

* more javadoc

* fix test and checkstyle

* don't combine for testing

* address comments

* fix unit tests

* always initialize response context in cachingClusteredClient

* fix subquery

* address comments

* fix test

* query id for builders

* make queryId optional in the builders and ClusterQueryResult

* fix test

* suppress tests and unused methods

* exclude groupBy builder

* fix jacoco exclusion

* add tests for builders

* address comments

* don't truncate
2020-07-01 14:02:21 -07:00
samarthjain 7bb7489afc Fix UnknownTypeComplexColumn#makeVectorObjectSelector 2020-07-01 12:02:23 -07:00
Gian Merlino 5faa897a34
Join filter pre-analysis simplifications and sanity checks. (#10104)
* Join filter pre-analysis simplifications and sanity checks.

- At pre-analysis time, only compute pre-analysis for the innermost
  root query, since this is the one that will run on the join that involves
  the base datasource. Previously, pre-analyses were computed for multiple
  levels of the query, some of which were unnecessary.
- Remove JoinFilterPreAnalysisGroup and join query level gathering code,
  since they existed to support precomputation of multiple pre-analyses.
- Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to
  sanity check at processing time that the correct pre-analysis was done.

Tangentially related changes:

- Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker.
  The computed priority and lanes were not being used.
- Add "getBaseQuery" method to DataSourceAnalysis to support identification
  of the proper subquery for filter pre-analysis.

* Fix compilation errors.

* Adjust tests.
2020-06-30 19:14:22 -07:00
Samarth Jain 2c1b45842f
Prevent unknown complex types from breaking DruidSchema refresh (#9422) 2020-06-30 14:06:17 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that allows a user to choose which HTTP methods to allow against
their Druid server.

Druid will only accept HTTP requests with the methods GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config.

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore code coverage for initialization classes

* code review
2020-06-29 16:59:31 -07:00
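The allow-list rule in this commit, as finally merged with OPTIONS excluded from the always-supported set, can be sketched as a predicate. The class, method, and parameter names are hypothetical. `extraAllowedMethods` stands in for `ServerConfig#allowedHttpMethods` and `allowUnauthenticatedOptions` for `AuthConfig#allowUnauthenticatedHttpOptions`. Tying OPTIONS to that flag is this sketch's reading of the commit, not a confirmed description of the servlet filter.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of an HTTP-method allow-list check.
final class HttpMethodPolicy {
  // Methods the commit describes as always accepted.
  private static final Set<String> ALWAYS_ALLOWED = Set.of("GET", "PUT", "POST", "DELETE");

  // extraAllowedMethods is assumed to hold upper-case method names.
  static boolean isAllowed(String method, Set<String> extraAllowedMethods,
                           boolean allowUnauthenticatedOptions) {
    String m = method.toUpperCase(Locale.ROOT);
    if (ALWAYS_ALLOWED.contains(m)) {
      return true;
    }
    // Assumption: OPTIONS is admitted only when the OPTIONS-related auth setting is enabled.
    if (m.equals("OPTIONS") && allowUnauthenticatedOptions) {
      return true;
    }
    return extraAllowedMethods.contains(m);
  }
}
```

For example, the web console e2e tests mentioned above would pass `Set.of("HEAD")` as the extra allowed methods.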
chenyuzhi459 a4c6d5f37e
fix query memory leak (#10027)
* fix query memory leak

* roll up .idea

* roll up .idea

* clean code

* optimize style

* optimize cancel function

* optimize style

* add concurrentGroupTest test case

* add test case

* add unit test

* fix code style

* optimize cancel method use

* format code

* revert code

* optimize cancelAll

* clean code

* add comment
2020-06-26 23:30:59 -07:00
Maytas Monsereenusorn 9be5039f68
Enable query vectorization by default (#10065)
* Enable query vectorization by default

* update docs
2020-06-24 13:08:49 -07:00
Maytas Monsereenusorn f80c02da02
Fix HyperUniquesAggregatorFactory.estimateCardinality null handling to respect output type (#10063)
* fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator

* address comments

* address comments
2020-06-23 15:54:37 -10:00
Clint Wylie eee99ff0d5
minor rework of topn algorithm selection for clarity and more javadocs (#10058)
* minor refactor of topn engine algorithm selection for clarity

* adjust

* more javadoc
2020-06-22 09:08:50 -07:00
Clint Wylie c2f5d453f8
fix topn on string columns with non-sorted or non-unique dictionaries (#10053)
* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures
2020-06-19 11:35:18 -07:00
Jonathan Wei 37e150c075
Fix join filter rewrites with nested queries (#10015)
* Fix join filter rewrites with nested queries

* Fix test, inspection, coverage

* Remove clauses from group key

* Fix import order

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-06-18 21:32:29 -07:00
Clint Wylie b5e6569d2c
global table only if joinable (#10041)
* global table if only joinable

* oops

* fix style, add more tests

* Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java

* better information schema columns, distinguish broadcast from joinable

* fix javadoc

* fix mistake

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-06-18 17:32:10 -07:00
Aleksey Plekhanov 2c384b61ff
IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" (#9690)
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"

* Reverted checkstyle rule

* Added tests to pass CI

* Codestyle
2020-06-18 09:47:07 -07:00
Maytas Monsereenusorn 7569ee3ec6
All aggregators should check if column can be vectorized (#10026)
* All aggregators should use vectorization-aware column processor

* All aggregators should use vectorization-aware column processor

* fix canVectorize

* fix canVectorize

* add tests

* revert back default

* address comment

* address comments

* address comment

* address comment
2020-06-17 01:52:02 -10:00
Clint Wylie 68aa384190
global table datasource for broadcast segments (#10020)
* global table datasource for broadcast segments

* tests

* fix

* fix test

* comments and javadocs

* review stuffs

* use generated equals and hashcode
2020-06-16 17:58:05 -07:00
Suneet Saldanha 4e483a70b4
ROUND and having comparators correctly handle special double values (#10014)
* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they cannot be converted to BigDecimal and instead
throw a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that cannot be converted to a
BigDecimal.

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md
2020-06-16 16:09:46 -07:00
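The failure mode and guard described in this commit can be reproduced in a few lines. `BigDecimal.valueOf(Double.NaN)` throws `NumberFormatException`, so non-finite values must be detected before the conversion. The sketch below is hypothetical: the class and method names and the `HALF_UP` rounding mode are assumptions. For non-finite inputs it follows the commit's description of falling back to `Math.round` and returning a double, which may not match the final merged behavior exactly.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch of a ROUND(x, scale) that tolerates NaN and infinities.
final class SafeRound {
  static double round(double value, int scale) {
    if (Double.isNaN(value) || Double.isInfinite(value)) {
      // BigDecimal cannot represent these values, so fall back to Math.round
      // (NaN -> 0, +/-Infinity -> Long.MAX_VALUE / Long.MIN_VALUE) and return a double.
      return (double) Math.round(value);
    }
    // Finite values: round to `scale` decimal places (a negative scale rounds left of the point).
    return BigDecimal.valueOf(value).setScale(scale, RoundingMode.HALF_UP).doubleValue();
  }
}
```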
Gian Merlino 9330ca9717
Remove LegacyDataSource. (#10037)
* Remove LegacyDataSource.

Its purpose was to enable deserialization of strings into TableDataSources.
But we can do this more straightforwardly with Jackson annotations.

* Slight test improvement.
2020-06-16 14:40:35 -07:00
Clint Wylie 9468df4721
make phaser of ReferenceCountingCloseableObject protected instead of private so subclasses can do stuff with it (#10035) 2020-06-15 19:56:49 -07:00
Stefan Birkner 7282e2f2f9
Simplify CompressedVSizeColumnarIntsSupplierTest (#10003)
The parameters generator uses CompressionStrategy.noNoneValues() instead
of CompressionStrategyTest.compressionStrategies() which wrapped each
strategy in a single element array. This improves readability of the
test.
2020-06-10 09:32:00 -07:00
Clint Wylie f8b643ec72
make joinables closeable (#9982)
* make joinables closeable

* tests and adjustments

* refactor to make join stuffs implement ReferenceCountedObject instead of Closeable, more tests

* fixes

* javadocs and stuff

* fix bugs

* more test

* fix lgtm alert

* simplify

* fixup javadoc

* review stuffs

* safeguard against exceptions

* i hate this checkstyle rule

* make IndexedTable extend Closeable
2020-06-09 20:12:36 -07:00
Clint Wylie 1c9ca55247
remove incorrect and unnecessary overrides from BooleanVectorValueMatcher (#9994)
* remove incorrect and unnecessary overrides from BooleanVectorValueMatcher

* add test case

* add unit tests for ... part of VectorValueMatcherColumnProcessorFactory

* Update VectorValueMatcherColumnProcessorFactoryTest.java
2020-06-09 19:32:16 -07:00
Clint Wylie c5d6163c76
add a GeneratorInputSource to fill up a cluster with generated data for testing (#9946)
* move benchmark data generator into druid-processing, add a GeneratorInputSource to fill up a cluster with data

* newlines

* make test coverage not fail maybe

* remove useless test

* Update pom.xml

* Update GeneratorInputSourceTest.java

* less passive aggressive test names
2020-06-09 19:31:04 -07:00
Clint Wylie 7f51e44b00
fix NilVectorSelector filter optimization (#9989) 2020-06-08 17:40:29 -07:00
Clint Wylie 77dd5b06ae
ColumnCapabilities.hasMultipleValues refactor (#9731)
* transition ColumnCapabilities.hasMultipleValues to Capable enum, remove ColumnCapabilities.isComplete

* remove artificial, always multi-value capabilities from IncrementalIndexStorageAdapter and fix up fallout from that, fix ColumnCapabilities merge in index merger

* fix typo

* remove unused method

* review stuffs, revert IncrementalIndexStorageAdapter capabilities change, plumb lame workaround to SegmentAnalyzer

* more comment

* use volatile booleans

* fix line length

* correctly handle missing columns for vector processors

* return ColumnCapabilities.Capable for BitmapIndexSelector.hasMultipleValues, fix vector processor selection for complex

* false on non-existent
2020-06-04 23:52:37 -07:00
Maytas Monsereenusorn 9738a03c83
Fix groupBy with literal in subquery grouping (#9986)
* fix groupBy with literal in subquery grouping

* address comments

* update javadocs
2020-06-04 13:28:05 -10:00
Maytas Monsereenusorn 790e9482ea
Fix Subquery could not be converted to groupBy query (#9959)
* Fix join

* Fix Subquery could not be converted to groupBy query

* add tests

* address comments

* fix failing tests
2020-06-03 16:46:28 -07:00
Gian Merlino 3dfd7c30c0
Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893)
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.

- Add REGEXP_LIKE function that returns a boolean, and is useful in
  WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
  filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
  (previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
  non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.
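The helpers below are a hypothetical sketch and are not Druid's code. They illustrate two points from the list above: an empty pattern always matches, and REGEXP_EXTRACT needs a nullable return type. The sketch assumes that "matches" means the pattern is found anywhere in the input, which is Matcher.find semantics rather than a whole-string match. Null inputs are not modeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers; assumption: partial-match (Matcher.find) semantics.
final class RegexpSketch
{
  // True if the pattern occurs anywhere in the input. An empty pattern
  // matches the empty substring at position 0, so it always returns true.
  static boolean regexpLike(String input, String pattern)
  {
    return Pattern.compile(pattern).matcher(input).find();
  }

  // Returns the requested capture group of the first match, or null when
  // there is no match (hence a nullable return type).
  static String regexpExtract(String input, String pattern, int group)
  {
    Matcher m = Pattern.compile(pattern).matcher(input);
    return m.find() ? m.group(group) : null;
  }
}
```

Because `regexpExtract` can return null, a planner that assumed a non-null result could wrongly simplify away a filter such as `IS NULL`.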

* Changes based on PR review.

* Fix arg check.

* Important fixes!

* Add speller.

* wip

* Additional tests.

* Fix up tests.

* Add validation error tests.

* Additional tests.

* Remove useless call.
2020-06-03 14:31:37 -07:00
Maytas Monsereenusorn 0d22462e07
Document unsupported Join on multi-value column (#9948)
* Document Unsupported Join on multi-value column

* address comments

* Add unit tests

* address comments

* add tests
2020-06-03 09:55:52 -10:00
Gian Merlino 3d81564a14
Fix various processing buffer leaks and simplify BlockingPool. (#9928)
* - GroupByQueryEngineV2: Fix leak of intermediate processing buffer when
  exceptions are thrown before result sequence is created.
- PooledTopNAlgorithm: Fix leak of intermediate processing buffer when
  exceptions are thrown before the PooledTopNParams object is created.
- BlockingPool: Remove unused "take" methods.
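The pattern behind these leak fixes can be sketched generically. A resource (for example a pooled buffer) is handed to code that builds an object meant to own it. If that code throws before ownership transfers, no one else will release the resource, so the caller must. The names below are hypothetical and do not come from GroupByQueryEngineV2 or PooledTopNAlgorithm.

```java
import java.util.function.Function;

// Hypothetical sketch of the release-on-failure pattern; names are illustrative.
final class LeakSafeSketch
{
  // Passes `resource` to `builder`, which is expected to return an object that
  // takes ownership of it. If `builder` throws first, the resource has no owner,
  // so it is closed here before the exception propagates.
  static <R extends AutoCloseable, T> T buildOwning(R resource, Function<R, T> builder)
  {
    try {
      return builder.apply(resource);
    }
    catch (RuntimeException | Error e) {
      try {
        resource.close();
      }
      catch (Exception closeFailure) {
        e.addSuppressed(closeFailure); // keep the original failure primary
      }
      throw e;
    }
  }
}
```

On success the resource is deliberately left open, because the returned object now owns it.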

* Add tests to verify that buffers have been returned.
2020-06-02 18:26:18 -07:00
Gian Merlino 309fc04d54
Fix various Yielder leaks. (#9934)
* Fix various Yielder leaks.

- CombiningSequence leaked the input yielder from "toYielder" if it ran
  into an exception while accumulating the last value from the input
  yielder.
- MergeSequence leaked input yielders from "toYielder" if it ran into
  an exception while building the initial priority queue.
- ScanQueryRunnerFactory leaked the input yielder in its
  "priorityQueueSortAndLimit" strategy if it ran into an exception
  while scanning and sorting.
- YieldingSequenceBase.accumulate chomped IOExceptions thrown in
  "accumulate" during yielder closing.

* Add tests.

* Fix braces.
2020-06-02 18:26:06 -07:00
Xavier Léauté 4ecf1900c3
fix nullhandling exceptions related to test ordering (#9964)
follow-up to https://github.com/apache/druid/pull/9570
2020-06-02 10:13:54 -07:00
Clint Wylie c690d10a7d
support customized factory.json via IndexSpec for segment persist (#9957)
* support customized factory.json via IndexSpec for segment persist

* equals verifier
2020-06-01 16:36:32 -07:00
Suneet Saldanha e03d38b6c8
Optimize join queries where filter matches nothing (#9931)
* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Filters with associated tests.

* fix tests

* Refactor JoinFilterAnalyzer - part 2

This change introduces the following components:
 * RhsRewriteCandidates - a wrapper for a list of candidates and associated
     functions to operate on the set of candidates.
 * JoinableClauses - a wrapper for the list of JoinableClause that represent
     a join condition and the associated functions to operate on the clauses.
 * Equiconditions - a wrapper representing the equiconditions that are used
     in the join condition.

And associated test changes.

This refactoring surfaced 2 bugs:
 - Missing equals and hashcode implementation for RhsRewriteCandidate, thus
   allowing potential duplicates in the rhs rewrite candidates
 - Missing Filter#supportsRequiredColumnRewrite check in
   analyzeJoinFilterClause, which could result in UnsupportedOperationException
   being thrown by the filter
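The first bug is easy to reproduce. Without `equals`/`hashCode`, a HashSet compares elements by identity, so two logically identical candidates are both kept. The classes below are simplified, hypothetical stand-ins and are not Druid's RhsRewriteCandidate.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in with identity-based equality (the buggy case).
final class CandidateWithoutEquals
{
  final String column;

  CandidateWithoutEquals(String column)
  {
    this.column = column;
  }
}

// Same data, with value-based equals/hashCode (the fixed case).
final class CandidateWithEquals
{
  final String column;

  CandidateWithEquals(String column)
  {
    this.column = column;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CandidateWithEquals)) {
      return false;
    }
    return column.equals(((CandidateWithEquals) o).column);
  }

  @Override
  public int hashCode()
  {
    return Objects.hash(column);
  }
}
```

Adding two `new CandidateWithoutEquals("x")` instances to a HashSet yields size 2. The same operation with `CandidateWithEquals` yields size 1.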

* fix compile error

* remove unused class

* Refactor JoinFilterAnalyzer - Correlations

Move the correlation-related code out into its own class so it's easier
to maintain.
Another patch should follow this one so that the query path uses the
correlation object instead of its underlying maps.

* Optimize join queries where filter matches nothing

Fixes #9787

This PR changes the Joinable interface to return an Optional set of correlated
values for a column.
This allows the JoinFilterAnalyzer to differentiate between the case where the
column has no matching values and the case where matching values could not be
computed.

This PR chose not to distinguish between cases where correlated values could
not be computed because of a config that has this behavior disabled or because
of user error - like a column that could not be found. The reasoning was that
the latter is likely an error and the non-filter-pushdown path will surface the
error if it is.
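A minimal sketch of the distinction, assuming a simple in-memory index. The method and map layout are hypothetical and are not the actual Joinable interface. `Optional.empty()` means "could not compute", which covers both a disabled feature and an unknown column. `Optional.of(emptySet)` means "computed, and nothing matches", which lets the analyzer treat the filter as matching nothing.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; not Druid's Joinable API.
final class CorrelatedValuesSketch
{
  // index: column -> (join key value -> correlated values)
  static Optional<Set<String>> correlatedValues(
      Map<String, Map<String, Set<String>>> index,
      String column,
      String value,
      boolean enabled
  )
  {
    if (!enabled || !index.containsKey(column)) {
      // Could not compute: feature disabled or column not found.
      return Optional.empty();
    }
    // Computed: an empty set means no rows can match.
    return Optional.of(index.get(column).getOrDefault(value, Collections.emptySet()));
  }
}
```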
2020-05-29 16:53:03 -07:00
Suneet Saldanha 9c40bebc02
Refactor JoinFilterAnalyzer - part 2 (#9929)
* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Filters with associated tests.

* fix tests

* Refactor JoinFilterAnalyzer - part 2

This change introduces the following components:
 * RhsRewriteCandidates - a wrapper for a list of candidates and associated
     functions to operate on the set of candidates.
 * JoinableClauses - a wrapper for the list of JoinableClause that represent
     a join condition and the associated functions to operate on the clauses.
 * Equiconditions - a wrapper representing the equiconditions that are used
     in the join condition.

And associated test changes.

This refactoring surfaced 2 bugs:
 - Missing equals and hashcode implementation for RhsRewriteCandidate, thus
   allowing potential duplicates in the rhs rewrite candidates
 - Missing Filter#supportsRequiredColumnRewrite check in
   analyzeJoinFilterClause, which could result in UnsupportedOperationException
   being thrown by the filter

* fix compile error

* remove unused class
2020-05-29 15:03:35 -07:00
Suneet Saldanha faef31a0af
Refactor JoinFilterAnalyzer (#9921)
* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Filters with associated tests.

* fix tests
2020-05-28 22:32:09 -07:00
Suneet Saldanha b0167295d7
Fail incorrectly constructed join queries (#9830)
* Fail incorrectly constructed join queries

* wip annotation for equals implementations

* Add equals tests

* fix tests

* Actually fix the tests

* Address review comments

* prohibit Pattern.hashCode()
2020-05-13 14:23:04 -07:00
Jonathan Wei 16d293d6e0
Directly rewrite filters on RHS join columns into LHS equivalents (#9818)
* Directly rewrite filters on RHS join columns into LHS equivalents

* PR comments

* Fix inspection

* Revert unnecessary ExprMacroTable change

* Fix build after merge

* Address PR comments
2020-05-08 23:45:35 -07:00
mcbrewster 28be107a1c
add flag to flattenSpec to keep null columns (#9814)
* add flag to flattenSpec to keep null columns

* remove changes to inputFormat interface

* add comment

* change comment message

* update web console e2e test

* move keepNullColumns to JSONParseSpec

* fix merge conflicts

* fix tests

* set keepNullColumns to false by default

* fix lgtm

* change Boolean to boolean, add keepNullColumns to hash, add tests for keepNullColumns false + true with no null columns

* Add equals verifier tests
2020-05-08 21:53:39 -07:00
Maytas Monsereenusorn accd710115
Add equivalent test coverage for all RHS join impls (#9831)
* Add equivalent test coverage for all RHS join impls

* address comments
2020-05-06 16:10:41 -07:00
Jihoon Son 6674d721bc
Avoid sorting values in InDimFilter if possible (#9800)
* Avoid sorting values in InDimFilter if possible

* tests

* more tests

* fix AND and OR filters

* fix build

* false and true vector matchers

* fix vector matchers

* checkstyle

* in filter null handling

* remove wrong test

* address comments

* remove unnecessary null check

* redundant separator

* address comments

* typo

* tests
2020-05-06 15:26:36 -07:00
Suneet Saldanha 1e857c5303
Ignore druid-processing benchmarks in tests (#9821) 2020-05-06 08:59:48 -07:00
Jihoon Son c6caae9a24
Fix filtering on boolean values in transformation (#9812)
* Fix filter on boolean value in Transform

* assert

* more descriptive test

* remove assert

* add assert for cached string; disable tests

* typo
2020-05-04 18:47:10 -07:00
Jian Wang 85dfbb64cb
Update documentation for metricCompression (#9811) 2020-05-03 12:56:48 -07:00
Suneet Saldanha 7510e6e722
Fix potential NPEs in joins (#9760)
* Fix potential NPEs in joins

IntelliJ reported issues with potential NPEs. This was first hit in testing
with a filter being pushed down to the left hand table when joining against
an indexed table.

* More null check cleanup

* Optimize filter value rewrite for IndexedTable

* Add unit tests for LookupJoinable

* Add tests for IndexedTableJoinable

* Add non null assert for dimension selector

* Suppress null warning in LookupJoinMatcher

* remove some null checks on hot path
2020-04-29 11:03:13 -07:00
Jonathan Wei fe000a9e4b
Adjust string comparators used for ingestion (#9742)
* Adjust string comparators used for ingestion

* Small tweak

* Fix inspection, more javadocs

* Address PR comment

* Add rollup comment

* Add ordering test

* Fix IncrementalIndexRowCompTest
2020-04-25 13:47:07 -07:00
BIGrey c5bfe36011
Optimize FileWriteOutBytes to avoid high system cpu usage (#9722)
* optimize FileWriteOutBytes to avoid high sys cpu

* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException

* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size

* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size"

This reverts commit 965f7421

* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException"

This reverts commit 149e08c0

* optimize FileWriteOutBytes to avoid high sys cpu -- avoid never-thrown IOException check

* Fix size counting to handle IOE in FileWriteOutBytes + tests

* remove unused throws IOException in WriteOutBytes.size()

* Remove redundant throws IOException clauses

* Parameterize IndexMergeBenchmark

Co-authored-by: huanghui.bigrey <huanghui.bigrey@bytedance.com>
Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>
2020-04-23 20:18:42 -07:00
Clint Wylie 68cc0b2e1c
fixes for inline subqueries when multi-value dimension is present (#9698)
* fixes for inline subqueries when multi-value dimension is present

* fix test

* allow missing capabilities for vectorized group by queries to be treated as single dims since it means that column doesn't exist

* add comment
2020-04-21 18:44:26 -07:00
Jenson b9ad250c00
Fix misuse of Integer.SIZE in FileWriteOutBytes.writeInt (#9723)
* change Integer.SIZE to Integer.BYTES in FileWriteOutBytes#writeInt
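For context, `Integer.SIZE` is the width of an int in bits (32), while `Integer.BYTES` is its width in bytes (4). Using `SIZE` where a byte count is expected overstates the count eightfold. The helper below is a hypothetical illustration and is not FileWriteOutBytes' code. It sizes a buffer for one serialized int.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: serialize one int into exactly Integer.BYTES bytes.
final class IntSizeSketch
{
  static byte[] writeInt(int v)
  {
    // Integer.BYTES == 4; Integer.SIZE == 32 would allocate 28 unused bytes.
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
    buf.putInt(v); // big-endian by default
    return buf.array();
  }
}
```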

* Add ASF header

Co-authored-by: jenson <junstan@paypal.com>
2020-04-19 18:16:53 +08:00
Clint Wylie e677c62484
document useFilterCNF query context parameter (#9647)
* document useFilterCNF query context parameter

* move context key to QueryContexts

* Update .spelling
2020-04-16 22:12:20 -07:00
Clint Wylie b89ad49396
disable group by config applyLimitPushDownToSegment by default (#9711)
* disable group by config applyLimitPushDownToSegment by default

* document
2020-04-16 03:03:35 -07:00
Clint Wylie 0ff926b1a1
fix issue with group by limit pushdown for extractionFn, expressions, joins, etc (#9662)
* fix issue with group by limit pushdown for extractionFn, expressions, joins, etc

* remove unused

* fix test

* revert unintended change

* more tests

* consider capabilities for StringGroupByColumnSelectorStrategy

* fix test

* fix and more test

* revert because im scared
2020-04-11 01:18:11 -07:00
Gian Merlino 5249155284
Fix off-by-one in IndexedTableJoinMatcher.getCardinality. (#9674)
* Fix off-by-one in IndexedTableJoinMatcher.getCardinality.

It would report a cardinality that is one lower than the actual cardinality.
The missing value is the phantom null that can be generated by outer joins.

* Fix tests.
2020-04-10 18:11:05 -07:00
Suneet Saldanha 332ca19621
Fix potential integer overflow issues (#9609)
ApproximateHistogram - seems unlikely
SegmentAnalyzer - unclear if this is an actual issue
GenericIndexedWriter - unclear if this is an actual issue
IncrementalIndexRow and OnheapIncrementalIndex are non-issues because it's very
unlikely for the number of dims to be large enough to hit the overflow
condition
2020-04-10 11:47:08 -07:00
Suneet Saldanha 1ced3b33fb
IntelliJ inspections cleanup (#9339)
* IntelliJ inspections cleanup

* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type

* fix warnings in test code

* more test fixes

* remove string concatenation inspection error

* fix extra curly brace

* cleanup AzureTestUtils

* fix charsets for RangerAdminClient

* review comments
2020-04-10 10:04:40 -07:00
Jihoon Son e157fb089a
Fix wrong cardinality computation in BufferArrayGrouper (#9655)
* Fix wrong cardinality computation in BufferArrayGrouper

* fix javadoc
2020-04-10 09:05:38 -07:00
Suneet Saldanha 65de636893
Fix potential integer overflow in BufferArrayGrouper (#9605)
This change fixes a potential integer overflow in BufferArrayGrouper that
was flagged by LGTM. It also adds a check that the vectorized arrays are
initialized before aggregateVector is called.

The changes in HashTableUtils should not have any effect since the numbers
being multiplied are small, but the change will remove the warnings from
being flagged in LGTM.
2020-04-09 17:46:15 -07:00
Jihoon Son a6790ff22a
More optimize CNF conversion of filters (#9634)
* More optimize CNF conversion of filters

* update license

* fix build

* checkstyle

* remove unnecessary code

* split helper

* license

* checkstyle

* add comments on cnf conversion
2020-04-08 21:31:17 -07:00
Abhishek Radhakrishnan 08851c0198
Preserve the null values for numeric type dimensions post-compaction. (#9622)
* Add selector null check to preserve null values as-is.

* Fix typo.

* add wrapping dimension selector test.

* Address review comments.

* nit: replace exception type.

* uh, float is indeed NOT a special case.
2020-04-08 18:56:06 -07:00
Jihoon Son 82ce60b5c1
Reuse transformer in stream indexing (#9625)
* Reuse transformer in stream indexing

* remove unused method

* memoize compiled pattern
2020-04-06 16:36:08 -07:00
Jihoon Son 40e84a171b
Eliminate common subfilters when converting it to a CNF (#9608) 2020-04-05 22:29:41 -07:00
Jihoon Son 0da8ffc3ff
Bump up development version to 0.19.0-SNAPSHOT (#9586) 2020-03-30 16:24:04 -07:00
Himanshu 839379246a
remove commons-lang3 usage from DoubleMeanAggregatorFactoryTest (#9578) 2020-03-30 14:31:50 -07:00
Stanislav Poryadnyi 9081b5f25c
fix MAX_INTERMEDIATE_SIZE for DoubleMeanHolder (#9568)
* fix MAX_INTERMEDIATE_SIZE for DoubleMeanHolder

* byte[] type handling in deserialize and finalizeComputation for DoubleMeanAggregatorFactory

* DoubleMeanAggregatorFactory tests: Max Intermediate Size, Deserialize, finalizeComputation

* moved byte[] check to first position

Co-authored-by: Stanislav <S.Poryadnyi@abcconsulting.ru>
2020-03-27 22:26:31 -07:00
Xavier Léauté b4ad3d0d88
fix nullhandling exceptions related to test ordering (#9570)
* fix nullhandling exceptions related to test ordering

Tests might get executed in different order depending on the maven
version and the test environment. This may lead to "NullHandling module
not initialized" errors for some tests where we do not initialize
null-handling explicitly.

* use InitializedNullHandlingTest
2020-03-27 09:46:31 -07:00
Clint Wylie 2c49f6d89a
error on value counter overflow instead of writing sad segments (#9559) 2020-03-26 16:54:48 -07:00
Clint Wylie bf85ea19b2
roaring bitmaps by default (#9548)
* it is finally time

* fix it

* more docs

* fix doc
2020-03-23 18:15:57 -07:00
Gian Merlino 54c9325256
SQL support for joins on subqueries. (#9545)
* SQL support for joins on subqueries.

Changes to SQL module:

- DruidJoinRule: Allow joins on subqueries (left/right are no longer
  required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
  handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
  it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
  version of Calcite. Some of these are useful for optimizing joins on
  subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
  relevant constants in CostEstimates.

Other changes:

- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
  of isComplete is enough to let callers know that columns might have
  multiple values, and explicitly setting it to true causes
  ExpressionSelectors to think it definitely has multiple values, and
  treat the inputs as arrays. This behavior interfered with some of the
  new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
  tests.

* Fixes for tests.

* Adjustments.
2020-03-22 16:43:55 -07:00
Gian Merlino 1ef25a438f
Broker: Add ability to inline subqueries. (#9533)
* Broker: Add ability to inline subqueries.

The main changes:

- ClientQuerySegmentWalker: Add ability to inline queries.
- Query: Add "getSubQueryId" and "withSubQueryId" methods.
- QueryMetrics: Add "subQueryId" dimension.
- ServerConfig: Add new "maxSubqueryRows" parameter, which is used by
  ClientQuerySegmentWalker to limit how many rows can be inlined per
  query.
- IndexedTableJoinMatcher: Allow creating keys on top of unknown types,
  by assuming they are strings. This is useful because not all types are
  known for fields in query results.
- InlineDataSource: Store RowSignature rather than component parts. Add
  more zealous "equals" and "hashCode" methods to ease testing.
- Moved QuerySegmentWalker test code from CalciteTests and
  SpecificSegmentsQueryWalker in druid-sql to QueryStackTests in
  druid-server. Use this to spin up a new ClientQuerySegmentWalkerTest.

* Adjustments from CI.

* Fix integration test.
2020-03-18 15:06:45 -07:00
Jonathan Wei b1847364b0
More efficient join filter rewrites (#9516)
* More efficient join filter rewrites

* Rebase

* Remove unused functions

* PR comments, fix compile

* Adjust comment

* Allow filter rewrite when join condition has LHS expression

* Fix inspections

* Fix tests
2020-03-16 22:16:14 -07:00
Clint Wylie 6afd55c8f4
threshold based automatic query prioritization (#9493)
* threshold based automatic query prioritization

* fixes

* spelling and fixes

* fix docs

* spelling

* checkstyle

* adjustments

* doc fix
2020-03-13 01:41:54 -07:00
Gian Merlino ff59d2e78b
Move RowSignature from druid-sql to druid-processing and make use of it. (#9508)
* Move RowSignature from druid-sql to druid-processing and make use of it.

1) Moved (most of) RowSignature from sql to processing. Left behind the SQL-specific
   stuff in a RowSignatures utility class. It also picked up some new convenience
   methods along the way.
2) There were a lot of places in the code where Map<String, ValueType> was used to
   associate columns with type info. These are now all replaced with RowSignature.
3) QueryToolChest's resultArrayFields method is replaced with resultArraySignature,
   and it now provides type info.

* Fix up extensions.

* Various fixes
2020-03-12 11:06:44 -07:00
Jonathan Wei 3082b9289a
Fix NPE when using IndexedTable and all left rows are filtered out (#9490)
* Fix NPE when using IndexedTable and all left rows are filtered out

* Fix compile

* Add constant for uninitialized current row

* Fix checkstyle
2020-03-11 19:23:05 -07:00
Gian Merlino 2ef5c17441
Link up row-based datasources to serving layer. (#9503)
* Link up row-based datasources to serving layer.

- Add SegmentWrangler interface that allows linking of DataSources to Segments.
- Add LocalQuerySegmentWalker that uses SegmentWranglers to compute queries on
  data that is available locally.
- Modify ClientQuerySegmentWalker to use LocalQuerySegmentWalker when the base
  datasource is concrete and not a table.
- Add SegmentWranglerModule to the Broker so it has them available and can
  properly instantiate LocalQuerySegmentWalkers.
- Set InlineDataSource and LookupDataSource to concrete, since they can be
  directly queried now.

* Fix tests.
2020-03-11 11:32:27 -07:00
Gian Merlino 4f085896c6
Ability to directly query row-based datasources. (#9502)
* Ability to directly query row-based datasources.

Includes:

- Foundational classes RowBasedSegment, RowBasedStorageAdapter,
  RowBasedCursor provide a queryable interface on top of a
  RowBasedColumnSelectorFactory.
- Add LookupSegment: A RowBasedSegment that is built on lookup data.
- Improve capability reporting in RowBasedColumnSelectorFactory.

* Fix import.

* Remove unthrown IOException.
2020-03-10 20:39:01 -07:00
Samarth Jain c74749f0f4
Don't exclude null dimension values from the map based query response (#9438) 2020-03-10 15:06:03 -07:00
Gian Merlino c6c2282b59
Harmonization and bug-fixing for selector and filter behavior on unknown types. (#9484)
* Harmonization and bug-fixing for selector and filter behavior on unknown types.

- Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory
  system, and set defaultType COMPLEX so unknown types can be dynamically matched.
- Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing.
- Switch various methods to use convertObjectToX when casting to numbers, rather
  than ad-hoc and inconsistent logic.
- Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return
  true even for 0- or 1- element arrays.
- Adjust various javadocs.

* Add throwParseExceptions option to Rows.objectToNumber, switch back to that.

* Update tests.

* Adjust moment sketch tests.
2020-03-10 07:15:57 -07:00
Clint Wylie 8b9fe6f584
query laning and load shedding (#9407)
* prototype

* merge QueryScheduler and QueryManager

* everything in its right place

* adjustments

* docs

* fixes

* doc fixes

* use resilience4j instead of semaphore

* more tests

* simplify

* checkstyle

* spelling

* oops heh

* remove unused

* simplify

* concurrency tests

* add SqlResource tests, refactor error response

* add json config tests

* use LongAdder instead of AtomicLong

* remove test only stuffs from scheduler

* javadocs, etc

* style

* partial review stuffs

* adjust

* review stuffs

* more javadoc

* error response documentation

* spelling

* preserve user specified lane for NoSchedulingStrategy

* more test, why not

* doc adjustment

* style

* missed review for make a thing a constant

* fixes and tests

* fix test

* Update docs/configuration/index.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc update

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-03-10 02:57:16 -07:00
Jihoon Son 75e2051195
Convert array_contains() and array_overlaps() into native filters if possible (#9487)
* Convert array_contains() and array_overlaps() into native filters if
possible

* make spotbugs happy and fix null results when null compatible
2020-03-09 22:50:38 -07:00
Jonathan Wei 0136dba95d
Add option to control join filter rewrites (#9472)
* Add option to control join filter rewrites

* Fix inspections
2020-03-09 17:36:07 -07:00
Clint Wylie a677664811
allow optimization of single multi-value column input expr with repeated identifier (#9425)
* allow optimization of single multi-value column input expr with repeated identifier

* add test
2020-03-06 12:53:32 -08:00
Julian Jaffe eda03630d0
Add OnHeapMemorySegmentWriteOutMediumFactory (#9454)
* Add OnHeapMemorySegmentWriteOutMediumFactory

Add a factory for OnHeapMemorySegmentWriteOutMedium to support direct writing via Spark.

* Register OnHeapMemorySegmentWriteOutMediumFactory.

Register OnHeapMemorySegmentWriteOutMediumFactory with SegmentWriteOutMediumFactory.

* Remove unnecessary throws

The base `makeSegmentWriteOutMedium` throws an IOException, but the particular implementation of OnHeapMemorySegmentWriteOutMediumFactory does not throw a checked exception.

* Update SegmentWriteOutMedium docs to include onHeapMemory

Update the SegmentWriteOutMedium section of the indexing docs to include a description of the new OnHeapSegmentMediumWriteOut option.
2020-03-05 22:34:08 -08:00
Jihoon Son 3016057178
Make Transform an ExtensionPoint (#9319)
* Make Transform an ExtensionPoint

* Add transform to the list of documented extensions

* Add example transform implementation
2020-03-04 12:13:14 -08:00
Gian Merlino 1fd865b7c1
BufferArrayGrouper: Fix potential overflow in requiredBufferCapacity. (#9435)
* BufferArrayGrouper: Fix potential overflow in requiredBufferCapacity.

If cardinality was high, the computation could overflow an int. There
were tests for this, but the tests were wrong.
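The general failure mode is that multiplying two ints in Java wraps silently once the product exceeds Integer.MAX_VALUE. The sketch below uses a hypothetical formula (cardinality times bytes per entry) and is not BufferArrayGrouper's actual computation. It shows the wrap-around next to two standard remedies: widening to long before multiplying, or failing fast with `Math.multiplyExact`.

```java
// Hypothetical capacity formula; not BufferArrayGrouper's real code.
final class CapacitySketch
{
  static int capacityUnsafe(int cardinality, int bytesPerEntry)
  {
    return cardinality * bytesPerEntry; // int multiply: wraps silently on overflow
  }

  static long capacityWidened(int cardinality, int bytesPerEntry)
  {
    return (long) cardinality * bytesPerEntry; // widen one operand before multiplying
  }

  static int capacityChecked(int cardinality, int bytesPerEntry)
  {
    return Math.multiplyExact(cardinality, bytesPerEntry); // throws ArithmeticException on overflow
  }
}
```

With a cardinality of 1,000,000,000 and 4 bytes per entry, the unsafe version returns -294,967,296 instead of 4,000,000,000.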

* Nicer.
2020-02-28 14:27:52 -08:00
Gian Merlino 81d8be6e39
CacheStrategy: Improve Javadocs. (#9280)
* CacheStrategy: Improve Javadocs.

* Update processing/src/main/java/org/apache/druid/query/CacheStrategy.java

Co-Authored-By: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>

Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
2020-02-28 11:30:58 -08:00
Gian Merlino ef3d24e886
Add javadocs for enableFilterPushDown. (#9423) 2020-02-26 22:07:33 -08:00
Gian Merlino c9faf3e148
Add SQL GROUPING SETS support. (#9122)
* Add SQL GROUPING SETS support.

Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:

- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
  each grouping set. This is more in line with SQL standard behavior. I think it is okay
  to make this change, since the old behavior was not documented, so users should
  hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
  should not have been.

Also fixes two bugs in query equality checking:

- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
  latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.
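The BaseQuery fix follows a general rule: `equals` and `hashCode` should read a lazily initialized field through its getter, not directly. Otherwise two equal objects can compare unequal just because only one of them has computed the field so far. The class below is a hypothetical, heavily simplified stand-in and is not Druid's BaseQuery. Its `equals` compares only the duration to keep the example small.

```java
import java.util.Objects;

// Hypothetical, simplified class; not Druid's BaseQuery.
final class LazyQuerySketch
{
  private final long startMillis;
  private final long endMillis;
  private Long duration; // computed on first getDuration() call; null until then

  LazyQuerySketch(long startMillis, long endMillis)
  {
    this.startMillis = startMillis;
    this.endMillis = endMillis;
  }

  long getDuration()
  {
    if (duration == null) {
      duration = endMillis - startMillis;
    }
    return duration;
  }

  // Buggy variant: reads the raw field, which may be null on only one side.
  boolean equalsByField(LazyQuerySketch other)
  {
    return Objects.equals(duration, other.duration);
  }

  // Fixed variant: the getter forces initialization on both sides.
  // (A real equals would also compare the other fields.)
  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (!(o instanceof LazyQuerySketch)) {
      return false;
    }
    return getDuration() == ((LazyQuerySketch) o).getDuration();
  }

  @Override
  public int hashCode()
  {
    return Long.hashCode(getDuration());
  }
}
```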

* Fix bugs.

* Fix tests.

* PR updates.

* Grouping class hygiene.
2020-02-26 08:52:39 -08:00
Jonathan Wei 5ce9c81b68
Add join prefix duplicate/shadowing check (#9384)
* Add join prefix duplicate/shadowing check

* Fix format string

* PR comments

* PR comment

* Optimize loop PR comment
2020-02-25 18:17:23 -08:00
Clint Wylie 6d8dd5ec10
string -> expression -> string -> expression (#9367)
* add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays

* oops, macros are expressions too

* style

* spotbugs

* qualified type arrays

* review stuffs

* simplify grammar

* more permissive array parsing

* reuse expr joiner

* fix it
2020-02-21 15:43:02 -08:00
Jonathan Wei cab08f941d
Fix join filter push down post-join virtual column handling (#9373)
* Fix join filter push down post-join virtual column handling

* Remove unused adapter param, update javadocs

* Fix TC

* Update processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java

Co-Authored-By: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>

* Address PR comments

Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
2020-02-19 15:51:05 -08:00
Chi Cao Minh e7eb45e648
Run IntelliJ inspections on Travis (#9179)
* Run IntelliJ inspections on Travis

Running IntelliJ inspections currently takes about 90 minutes, but they
can be run in about 30 minutes on Travis.

* Restore assert statements
2020-02-19 11:34:19 +03:00
Jonathan Wei 73a0181e34
Fix handling for columns that appear multiple times in join conditions (#9362)
* Fix handling for columns that appear multiple times in join conditions

* Remove unneeded comment

* Fix test
2020-02-17 10:54:04 -08:00
Suneet Saldanha b1f38131af
Fix timestamp extract fn to match postgreSQL (#9337)
* Fix timestamp extract fn to match postgres

Update the timestamp extract function so that it matches the PostgreSQL docs.
Examples from the PostgreSQL docs were added as tests for DECADE, CENTURY
and MILLENNIUM extraction.

There were bugs in CENTURY and MILLENNIUM that were spotted because of IntelliJ
inspections - 'Integer division in floating point context'
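The inspection flags code like `Math.ceil(year / 100)`. There, `year / 100` is integer division and is already truncated before `ceil` runs, so the ceiling has no effect. PostgreSQL counts centuries from year 1. Its documentation gives CENTURY 20 for 2000-12-16 and 21 for 2001-02-16, and says the third millennium began on 2001-01-01. The helpers below are a hypothetical sketch for positive (AD) years only and are not Druid's code.

```java
// Hypothetical helpers for AD years; not Druid's TimestampExtract implementation.
final class CenturySketch
{
  // Buggy: integer division truncates first, so ceil() is a no-op.
  static int centuryBuggy(int year)
  {
    return (int) Math.ceil(year / 100);
  }

  // Floating-point division lets ceil() round up: 2001 -> 21, 2000 -> 20.
  static int century(int year)
  {
    return (int) Math.ceil(year / 100.0);
  }

  static int millennium(int year)
  {
    return (int) Math.ceil(year / 1000.0);
  }

  // DECADE is the year divided by 10 (truncating), e.g. 2001 -> 200.
  static int decade(int year)
  {
    return year / 10;
  }
}
```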

* Update CalciteQueryTest

* remove useless round

* mark integer division as an error
2020-02-12 15:39:19 -08:00