Commit Graph

1976 Commits

Author SHA1 Message Date
Gian Merlino 6c0c6e60b3
Vectorized theta sketch aggregator + rework of VectorColumnProcessorFactory. (#10767)
* Vectorized theta sketch aggregator.

Also a refactoring of BufferAggregator and VectorAggregator such that
they share a common interface, BaseBufferAggregator. This allows
implementing both in the same file with an abstract + dual subclass
structure.

* Rework implementation to use composition instead of inheritance.

* Rework things to enable working properly for both complex types and
regular types.

Involved finally moving makeVectorProcessor from DimensionHandlerUtils
into ColumnProcessors and harmonizing the two things.

* Add missing method.

* Style and name changes.

* Fix issues from inspections.

* Fix style issue.
2021-01-29 09:30:09 -08:00
Clint Wylie cd6af93274
add leftover tests from #10743 (#10766) 2021-01-22 09:20:48 -08:00
Gian Merlino 8b808c4879
Retain order of AND, OR filter children. (#10758)
* Retain order of AND, OR filter children.

If we retain the order, it enables short-circuiting. People can put a
more selective filter earlier in the list and lower the chance that
later filters will need to be evaluated.

Short-circuiting was working before #9608, which switched to unordered
sets to solve a different problem. This patch tries to solve that
problem a different way.

This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.

This patch also removes code from CalciteCnfHelper that is no longer
necessary because Filters.and and Filters.or are now doing the work.

* Fixes for inspections.

* Fix tests.

* Back to a Set.
2021-01-20 08:59:20 -08:00
zhangyue19921010 2590ad4f67
Historical unloads damaged segments automatically when lazy on start. (#10688)
* ready to test

* tested on dev cluster

* tested

* code review

* add UTs

* add UTs

* ut passed

* ut passed

* opti imports

* done

* done

* fix checkstyle

* modify uts

* modify logs

* changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment

* merge from master

* modify import orders

* merge from master

* merge from master

* modify logs

* modify docs

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-16 19:53:30 -08:00
Gian Merlino 2b24dc3764
SegmentAnalyzer: Properly close column after retrieving it. (#10772) 2021-01-16 19:26:34 -08:00
Gian Merlino a82910e065
OrFilter: Properly handle child matchers that return the original mask. (#10754)
* OrFilter: Properly handle child matchers that return the original mask.

This happens when a child matcher is literally true (for example,
BooleanVectorValueMatcher). In this case, OrFilter would throw this
exception from its call to removeAll while processing the next filter:

  java.lang.IllegalStateException: 'other' must be a different instance from 'this'

Also update the javadocs for VectorValueMatcher to call out that the
returned object may be the same as the input mask.

* Fix style.
2021-01-14 23:28:13 -08:00
Gian Merlino 7354953b1b
VectorMatch: Disallow "copyFrom", "addAll" on self; improve tests. (#10755)
No existing code relies on being able to call these methods in this way.

The new tests exhaustively test all vectors up to size 7, and also test
behavior the run-on-self behavior that has been adjusted by this patch.
2021-01-14 18:29:13 -08:00
Gian Merlino 2bbf89db81
Remove FalseVectorMatcher, TrueVectorMatcher in favor of BooleanVectorValueMatcher. (#10757) 2021-01-14 18:28:25 -08:00
Jihoon Son 149306c9db
Tidy up HTTP status codes for query errors (#10746)
* Tidy up query error codes

* fix tests

* Restore query exception type in JsonParserIterator

* address review comments; add a comment explaining the ugly switch

* fix test
2021-01-13 17:20:00 -08:00
Clint Wylie 8c3c9b4060
fix limited queries with subtotals (#10743)
* i put my thing down, flip it and reverse it

* oops
2021-01-13 12:55:24 -08:00
Clint Wylie 9362dc7968
re-use expression vector evaluation results for the same offset in expression vector selectors (#10614)
* cache expression selector results by associating vector expression bindings to underlying vector offset

* better coverage, fix floats

* style

* stupid bot

* stupid me

* more test

* intellij threw me under the bus when it generated those junit methods

* narrow interface instead of passing around offset
2021-01-13 12:44:56 -08:00
秦臻 c62b7c19c3
javascript filter result convert to java boolean (#10721)
* javascript filter result convert to java boolean

* use type convert replace script convert, and add more unit test

Co-authored-by: qinzhen <qinzhen@kuaishou.com>
2021-01-08 14:30:09 -08:00
Gian Merlino 6eef0e4c9f
Fix collision between #10689 and #10593. (#10738) 2021-01-08 09:52:27 -08:00
Aleksey Plekhanov 26bcd47e51
Thread-safety for ResponseContext.REGISTERED_KEYS (#9667) 2021-01-08 00:37:49 -08:00
Liran Funaro 08ab82f55c
IncrementalIndex Tests and Benchmarks Parametrization (#10593)
* Remove redundant IncrementalIndex.Builder

* Parametrize incremental index tests and benchmarks

- Reveal and fix a bug in OffheapIncrementalIndex

* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]

* Fix Intellij errors: declared exception is never thrown

* Add documentation and validate before closing objects on tearDown.

* Add documentation to OffheapIncrementalIndexTestSpec

* Doc corrections and minor changes.

* Add logging for generated rows.

* Refactor new tests/benchmarks.

* Improve IncrementalIndexCreator documentation

* Add required tests for DataGenerator

* Revert "rollupOpportunity" to be a string
2021-01-07 22:18:47 -08:00
Gian Merlino 48e576a307
Scan query: More accurate error message when segment per time chunk limit is exceeded. (#10630)
* Scan query: More accurate error message when segment per time chunk limit is exceeded.

* Add guardrail test.
2021-01-06 14:11:28 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00
Abhishek Agarwal 796c25532e
Fix post-aggregator computation when used with subtotals (#10653)
* Fix post-aggregator computation

* remove commented code

* Fix numeric null handling

* Add test when subquery returns null long
2020-12-17 20:10:26 -08:00
Abhishek Agarwal 26d74b3580
Add grouping_id function (#10518)
* First draft of grouping_id function

* Add more tests and documentation

* Add calcite tests

* Fix travis failures

* bit of a change

* Add documentation

* Fix typos

* typo fix
2020-12-07 11:46:29 -08:00
Maytas Monsereenusorn 7eb5f59a9a
Fix string byte calculation in StringDimensionIndexer (#10623)
* fix string byte calculation

* fix tests

* fix test
2020-12-04 00:51:48 -08:00
Himanshu 813e18774e
make dimension column extensible with COMPLEX type (#10277)
* make dimension column extensible with COMPLEX type

* more changes

Change-Id: I9707dd644b8d71030b74a8c1d6fff0c0020d960d

* processing module changes for build fix

Change-Id: I146f95a41b79d20edb1721be13f0e9641f788e0e

* rename ColumnCapabilities.getTypeName() to getComplexTypeName()

* rename ColumnBuilder.setTypeName(..) -> ColumnBuilder.setComplexTypeName(..)
2020-12-03 08:58:17 -08:00
Lucas Capistrant 2e02eebd9d
Add context dimension to DefaultQueryMetrics (#10578)
* Add context dimension to DefaultQueryMetrics

* remove redundant addition of context dimension from DruidMetrics now that QueryMetrics adds it by default

* update SearchQueryMetrics to reflect the same pattern as other default dimensions in QueryMetrics

* add PublicApi annotation for context in QueryMetrics Interface
2020-12-01 18:34:03 -08:00
Lucas Capistrant 2560bf0a19
Add new coordinator metrics for coordinator duty runtimes (#10603)
* Add new coordinator metrics for duty runtimes

* fix spelling for a constant variable value

* add comment clarifying why the global runtime metric is emitted where it is

* Remove duty alias in lieu of using the class name for metrics

* fix docs

* CoordinatorStats tests + add duty stats to accumulate() logic
2020-11-29 14:47:35 -08:00
frank chen fe693a4f01
Improve doc and exception message for invalid user configurations (#10598)
* improve doc and exception message

* add spelling check rules and remove unused import

* add a test to improve test coverage
2020-11-23 15:03:13 -08:00
frank chen d7d2c804ad
Add zero period support to TIMESTAMPADD (#10550)
* Allow zero period for TIMESTAMPADD

* update test cases

* add empty zone test case

* add unit test cases for TimestampShiftMacro
2020-11-18 18:26:53 -08:00
frank chen e83d5cb59e
Fix ingestion failure of pretty-formatted JSON message (#10383)
* support multi-line text

* add test cases

* split json text into lines case by case

* improve exception handle

* fix CI

* use IntermediateRowParsingReader as base of JsonReader

* update doc

* ignore the non-immutable field in test case

* add more test cases

* mark `lineSplittable` as final

* fix testcases

* fix doc

* add a test case for SqlReader

* return all raw columns when exception occurs

* fix CI

* fix test cases

* resolve review comments

* handle ParseException returned by index.add

* apply Iterables.getOnlyElement

* fix CI

* fix test cases

* improve code in more graceful way

* fix test cases

* fix test cases

* add a test case to check multiple json string in one text block

* fix inspection check
2020-11-13 13:59:23 -08:00
Atul Mohan 6ccddedb7a
Improved exception handling in case of query timeouts (#10464)
* Separate timeout exceptions

* Add more tests

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-11-03 09:00:33 -06:00
Clint Wylie d0821de854
support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499)
* support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions

* inspector

* changes

* more test

* clean
2020-10-26 19:55:24 -07:00
Liran Funaro f3a2903218
Configurable Index Type (#10335)
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
2020-10-23 18:34:26 -07:00
Abhishek Agarwal 567e381705
Any virtual column on "__time" should be a pre-join virtual column (#10451)
* Virtual column on __time should be in pre-join

* Add unit test
2020-10-12 13:04:55 -07:00
Abhishek Agarwal 4d2a92f46a
Add caching support to join queries (#10366)
* Proposed changes for making joins cacheable

* Add unit tests

* Fix tests

* simplify logic

* Pull empty byte array logic out of CachingQueryRunner

* remove useless null check

* Minor refactor

* Fix tests

* Fix segment caching on Broker

* Move join cache key computation in Broker

Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient

* Fix compilation

* Review comments

* Add more tests

* Fix inspection errors

* Pushed condition analysis to JoinableFactory

* review comments

* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable

* Remove commented lines

* Fix populateCache

* Disable caching for selective datasources

Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
2020-10-09 17:42:30 -07:00
Jihoon Son 1deed9fbcd
Close aggregators in HashVectorGrouper.close() (#10452)
* Close aggregators in HashVectorGrouper.close()

* reuse grouper

* Add missing dependency
2020-10-06 10:17:33 -07:00
Clint Wylie 207ef310f2
vectorized group by support for nullable numeric columns (#10441)
* vectorized group by support for numeric null columns

* revert unintended change

* adjust

* review stuffs
2020-10-05 21:53:53 -07:00
Clint Wylie 9ec5c08e2a
fix array types from escaping into wider query engine (#10460)
* fix array types from escaping into wider query engine

* oops

* adjust

* fix lgtm
2020-10-03 15:30:34 -07:00
Clint Wylie 753bce324b
vectorize constant expressions with optimized selectors (#10440) 2020-09-29 13:19:06 -07:00
Gian Merlino 2be1ae128f
RowBasedIndexedTable: Add specialized index types for long keys. (#10430)
* RowBasedIndexedTable: Add specialized index types for long keys.

Two new index types are added:

1) Use an int-array-based index in cases where the difference between
   the min and max values isn't too large, and keys are unique.

2) Use a Long2ObjectOpenHashMap (instead of the prior Java HashMap) in
   all other cases.

In addition:

1) RowBasedIndexBuilder, a new class, is responsible for picking which
   index implementation to use.

2) The IndexedTable.Index interface is extended to support using
   unboxed primitives in the unique-long-keys case, and callers are
   updated to use the new functionality.

Other key types continue to use indexes backed by Java HashMaps.

* Fixup logic.

* Add tests.
2020-09-29 10:46:47 -07:00
Gian Merlino 599aacce0f
Remove Expr.visit. (#10437)
* Remove Expr.visit.

It isn't used and doesn't have tests.

* Remove Visitor too.
2020-09-28 22:13:10 -07:00
Clint Wylie 1d6cb624f4
add vectorizeVirtualColumns query context parameter (#10432)
* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
2020-09-28 18:48:34 -07:00
Clint Wylie 3d700a5e31
vectorize remaining math expressions (#10429)
* vectorize remaining math expressions

* fixes

* remove cannotVectorize() where no longer true

* disable vectorized groupby for numeric columns with nulls

* fixes
2020-09-26 23:30:14 -07:00
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Gian Merlino 1af2eace41
Include Sequence-building time in CPU time metric. (#10377)
* Include Sequence-building time in CPU time metric.

Meaningful work can be done while building Sequences, and we should
count this work. On the Broker, this includes subquery processing
work done by the mergeResults call of the GroupByQueryQueryToolChest.

* Add test.
2020-09-23 14:33:55 +08:00
Dylan Wylie f3eb0cfb3b
Avoid large limits causing int overflow in buffer size checks (#10356)
* Avoid large limits causing int overflow in buffer size checks

* fix lgtm overflow warning

Co-authored-by: Dylan <dwylie@spotx.tv>
2020-09-18 13:08:49 -07:00
Suneet Saldanha f71ba6f2c2
Vectorized ANY aggregators (#10338)
* WIP vectorized ANY aggregators

* tests

* fix aggs

* cleanup

* code review + tests

* docs

* use NilVectorSelector when needed

* fix spellcheck

* dont instantiate vectors

* cleanup
2020-09-14 19:44:58 -07:00
Clint Wylie e012d5c41b
allow vectorized query engines to utilize vectorized virtual columns (#10388)
* allow vectorized query engines to utilize vectorized virtual column implementations

* javadoc, refactor, checkstyle

* intellij inspection and more javadoc

* better

* review stuffs

* fix incorrect refactor, thanks tests

* minor adjustments
2020-09-14 19:29:35 -07:00
Clint Wylie 184b202411
add computed Expr output types (#10370)
* push down ValueType to ExprType conversion, tidy up

* determine expr output type for given input types

* revert unintended name change

* add nullable

* tidy up

* fixup

* more better

* fix signatures

* naming things is hard

* fix inspection

* javadoc

* make default implementation of Expr.getOutputType that returns null

* rename method

* more test

* add output for contains expr macro, split operation and function auto conversion
2020-09-14 18:18:56 -07:00
Abhishek Agarwal f5e2645bbb
Support SearchQueryDimFilter in sql via new methods (#10350)
* Support SearchQueryDimFilter in sql via new methods

* Contains is a reserved word

* revert unnecessary change

* Fix toDruidExpression method

* rename methods

* java docs

* Add native functions

* revert change in dockerfile

* remove changes from dockerfile

* More tests

* travis fix

* Handle null values better
2020-09-14 09:57:54 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Joy Kent e5f0da30ae
Fix stringFirst/stringLast rollup during ingestion (#10332)
* Add IndexMergerRollupTest

This changelist adds a test to merge indexes with StringFirst/StringLast aggregator.

* Fix StringFirstAggregateCombiner/StringLastAggregateCombiner

The segment-level type for stringFirst/stringLast is SerializablePairLongString,
not String. This changelist fixes it.

* Fix EarliestLatestAnySqlAggregator to handle COMPLEX type

This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX
type as an operand. For its return type, we set it to VARCHAR, since
COMPLEX column is only generated by stringFirst/stringLast during ingestion
rollup.

* Return value with smaller timestamp in StringFirstAggregatorFactory.combine function

* Add integration tests for stringFirst/stringLast during ingestion

* Use one EarliestLatestReturnTypeInference instance

Co-authored-by: Joy Kent <joy@automonic.ai>
2020-09-08 17:36:04 -07:00
Suneet Saldanha 91a153820e
fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator (#10325)
* fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator

* Add tests

* javadocs
2020-09-04 13:23:40 -07:00