druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	8b808c4879	Retain order of AND, OR filter children. (#10758 ) * Retain order of AND, OR filter children. If we retain the order, it enables short-circuiting. People can put a more selective filter earlier in the list and lower the chance that later filters will need to be evaluated. Short-circuiting was working before #9608, which switched to unordered sets to solve a different problem. This patch tries to solve that problem a different way. This patch moves filter simplification logic from "optimize" to "toFilter", because that allows the code to be shared with Filters.and and Filters.or. The simplification has become more complicated and so it's useful to share it. This patch also removes code from CalciteCnfHelper that is no longer necessary because Filters.and and Filters.or are now doing the work. * Fixes for inspections. * Fix tests. * Back to a Set.	2021-01-20 08:59:20 -08:00
zhangyue19921010	2590ad4f67	Historical unloads damaged segments automatically when lazy on start. (#10688 ) * ready to test * tested on dev cluster * tested * code review * add UTs * add UTs * ut passed * ut passed * opti imports * done * done * fix checkstyle * modify uts * modify logs * changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment * merge from master * modify import orders * merge from master * merge from master * modify logs * modify docs * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-01-16 19:53:30 -08:00
Gian Merlino	2b24dc3764	SegmentAnalyzer: Properly close column after retrieving it. (#10772 )	2021-01-16 19:26:34 -08:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Gian Merlino	a82910e065	OrFilter: Properly handle child matchers that return the original mask. (#10754 ) * OrFilter: Properly handle child matchers that return the original mask. This happens when a child matcher is literally true (for example, BooleanVectorValueMatcher). In this case, OrFilter would throw this exception from its call to removeAll while processing the next filter: java.lang.IllegalStateException: 'other' must be a different instance from 'this' Also update the javadocs for VectorValueMatcher to call out that the returned object may be the same as the input mask. * Fix style.	2021-01-14 23:28:13 -08:00
Gian Merlino	7354953b1b	VectorMatch: Disallow "copyFrom", "addAll" on self; improve tests. (#10755 ) No existing code relies on being able to call these methods in this way. The new tests exhaustively test all vectors up to size 7, and also test the run-on-self behavior that has been adjusted by this patch.	2021-01-14 18:29:13 -08:00
Gian Merlino	2bbf89db81	Remove FalseVectorMatcher, TrueVectorMatcher in favor of BooleanVectorValueMatcher. (#10757 )	2021-01-14 18:28:25 -08:00
Jihoon Son	149306c9db	Tidy up HTTP status codes for query errors (#10746 ) * Tidy up query error codes * fix tests * Restore query exception type in JsonParserIterator * address review comments; add a comment explaining the ugly switch * fix test	2021-01-13 17:20:00 -08:00
Clint Wylie	8c3c9b4060	fix limited queries with subtotals (#10743 ) * i put my thing down, flip it and reverse it * oops	2021-01-13 12:55:24 -08:00
Clint Wylie	9362dc7968	re-use expression vector evaluation results for the same offset in expression vector selectors (#10614 ) * cache expression selector results by associating vector expression bindings to underlying vector offset * better coverage, fix floats * style * stupid bot * stupid me * more test * intellij threw me under the bus when it generated those junit methods * narrow interface instead of passing around offset	2021-01-13 12:44:56 -08:00
秦臻	c62b7c19c3	javascript filter result convert to java boolean (#10721 ) * javascript filter result convert to java boolean * use type convert replace script convert, and add more unit test Co-authored-by: qinzhen <qinzhen@kuaishou.com>	2021-01-08 14:30:09 -08:00
Gian Merlino	6eef0e4c9f	Fix collision between #10689 and #10593 . (#10738 )	2021-01-08 09:52:27 -08:00
Aleksey Plekhanov	26bcd47e51	Thread-safety for ResponseContext.REGISTERED_KEYS (#9667 )	2021-01-08 00:37:49 -08:00
Liran Funaro	08ab82f55c	IncrementalIndex Tests and Benchmarks Parametrization (#10593 ) * Remove redundant IncrementalIndex.Builder * Parametrize incremental index tests and benchmarks - Reveal and fix a bug in OffheapIncrementalIndex * Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale] * Fix Intellij errors: declared exception is never thrown * Add documentation and validate before closing objects on tearDown. * Add documentation to OffheapIncrementalIndexTestSpec * Doc corrections and minor changes. * Add logging for generated rows. * Refactor new tests/benchmarks. * Improve IncrementalIndexCreator documentation * Add required tests for DataGenerator * Revert "rollupOpportunity" to be a string	2021-01-07 22:18:47 -08:00
Gian Merlino	48e576a307	Scan query: More accurate error message when segment per time chunk limit is exceeded. (#10630 ) * Scan query: More accurate error message when segment per time chunk limit is exceeded. * Add guardrail test.	2021-01-06 14:11:28 -08:00
Jonathan Wei	68bb038b31	Multiphase segment merge for IndexMergerV9 (#10689 ) * Multiphase merge for IndexMergerV9 * JSON fix * Cleanup temp files * Docs * Address logging and add IT * Fix spelling and test unloader datasource name	2021-01-05 22:19:09 -08:00
Abhishek Agarwal	796c25532e	Fix post-aggregator computation when used with subtotals (#10653 ) * Fix post-aggregator computation * remove commented code * Fix numeric null handling * Add test when subquery returns null long	2020-12-17 20:10:26 -08:00
Abhishek Agarwal	26d74b3580	Add grouping_id function (#10518 ) * First draft of grouping_id function * Add more tests and documentation * Add calcite tests * Fix travis failures * bit of a change * Add documentation * Fix typos * typo fix	2020-12-07 11:46:29 -08:00
Maytas Monsereenusorn	7eb5f59a9a	Fix string byte calculation in StringDimensionIndexer (#10623 ) * fix string byte calculation * fix tests * fix test	2020-12-04 00:51:48 -08:00
Himanshu	813e18774e	make dimension column extensible with COMPLEX type (#10277 ) * make dimension column extensible with COMPLEX type * more changes Change-Id: I9707dd644b8d71030b74a8c1d6fff0c0020d960d * processing module changes for build fix Change-Id: I146f95a41b79d20edb1721be13f0e9641f788e0e * rename ColumnCapabilities.getTypeName() to getComplexTypeName() * rename ColumnBuilder.setTypeName(..) -> ColumnBuilder.setComplexTypeName(..)	2020-12-03 08:58:17 -08:00
Lucas Capistrant	2e02eebd9d	Add context dimension to DefaultQueryMetrics (#10578 ) * Add context dimension to DefaultQueryMetrics * remove redundant addition of context dimension from DruidMetrics now that QueryMetrics adds it by default * update SearchQueryMetrics to reflect the same pattern as other default dimensions in QueryMetrics * add PublicApi annotation for context in QueryMetrics Interface	2020-12-01 18:34:03 -08:00
Lucas Capistrant	2560bf0a19	Add new coordinator metrics for coordinator duty runtimes (#10603 ) * Add new coordinator metrics for duty runtimes * fix spelling for a constant variable value * add comment clarifying why the global runtime metric is emitted where it is * Remove duty alias in lieu of using the class name for metrics * fix docs * CoordinatorStats tests + add duty stats to accumulate() logic	2020-11-29 14:47:35 -08:00
frank chen	fe693a4f01	Improve doc and exception message for invalid user configurations (#10598 ) * improve doc and exception message * add spelling check rules and remove unused import * add a test to improve test coverage	2020-11-23 15:03:13 -08:00
frank chen	d7d2c804ad	Add zero period support to TIMESTAMPADD (#10550 ) * Allow zero period for TIMESTAMPADD * update test cases * add empty zone test case * add unit test cases for TimestampShiftMacro	2020-11-18 18:26:53 -08:00
frank chen	e83d5cb59e	Fix ingestion failure of pretty-formatted JSON message (#10383 ) * support multi-line text * add test cases * split json text into lines case by case * improve exception handle * fix CI * use IntermediateRowParsingReader as base of JsonReader * update doc * ignore the non-immutable field in test case * add more test cases * mark `lineSplittable` as final * fix testcases * fix doc * add a test case for SqlReader * return all raw columns when exception occurs * fix CI * fix test cases * resolve review comments * handle ParseException returned by index.add * apply Iterables.getOnlyElement * fix CI * fix test cases * improve code in more graceful way * fix test cases * fix test cases * add a test case to check multiple json string in one text block * fix inspection check	2020-11-13 13:59:23 -08:00
Atul Mohan	6ccddedb7a	Improved exception handling in case of query timeouts (#10464 ) * Separate timeout exceptions * Add more tests Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-11-03 09:00:33 -06:00
Clint Wylie	d0821de854	support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499 ) * support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions * inspector * changes * more test * clean	2020-10-26 19:55:24 -07:00
Liran Funaro	f3a2903218	Configurable Index Type (#10335 ) * Introduce a Configurable Index Type * Change to @UnstableApi * Add AppendableIndexSpecTest * Update doc * Add spelling exception * Add tests coverage * Revert some of the changes to reduce diff * Minor fixes * Update getMaxBytesInMemoryOrDefault() comment * Fix typo, remove redundant interface * Remove off-heap spec (postponed to a later PR) * Add javadocs to AppendableIndexSpec * Describe testCreateTask() * Add tests for AppendableIndexSpec within TuningConfig * Modify hashCode() to conform with equals() * Add comment where building incremental-index * Add "EqualsVerifier" tests * Revert some of the API back to AppenderatorConfig * Don't use multi-line comments * Remove knob documentation (deferred)	2020-10-23 18:34:26 -07:00
Abhishek Agarwal	567e381705	Any virtual column on "__time" should be a pre-join virtual column (#10451 ) * Virtual column on __time should be in pre-join * Add unit test	2020-10-12 13:04:55 -07:00
Abhishek Agarwal	4d2a92f46a	Add caching support to join queries (#10366 ) * Proposed changes for making joins cacheable * Add unit tests * Fix tests * simplify logic * Pull empty byte array logic out of CachingQueryRunner * remove useless null check * Minor refactor * Fix tests * Fix segment caching on Broker * Move join cache key computation in Broker Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient * Fix compilation * Review comments * Add more tests * Fix inspection errors * Pushed condition analysis to JoinableFactory * review comments * Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable * Remove commented lines * Fix populateCache * Disable caching for selective datasources Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes	2020-10-09 17:42:30 -07:00
Jihoon Son	1deed9fbcd	Close aggregators in HashVectorGrouper.close() (#10452 ) * Close aggregators in HashVectorGrouper.close() * reuse grouper * Add missing dependency	2020-10-06 10:17:33 -07:00
Clint Wylie	207ef310f2	vectorized group by support for nullable numeric columns (#10441 ) * vectorized group by support for numeric null columns * revert unintended change * adjust * review stuffs	2020-10-05 21:53:53 -07:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Clint Wylie	9ec5c08e2a	fix array types from escaping into wider query engine (#10460 ) * fix array types from escaping into wider query engine * oops * adjust * fix lgtm	2020-10-03 15:30:34 -07:00
Clint Wylie	753bce324b	vectorize constant expressions with optimized selectors (#10440 )	2020-09-29 13:19:06 -07:00
Gian Merlino	2be1ae128f	RowBasedIndexedTable: Add specialized index types for long keys. (#10430 ) * RowBasedIndexedTable: Add specialized index types for long keys. Two new index types are added: 1) Use an int-array-based index in cases where the difference between the min and max values isn't too large, and keys are unique. 2) Use a Long2ObjectOpenHashMap (instead of the prior Java HashMap) in all other cases. In addition: 1) RowBasedIndexBuilder, a new class, is responsible for picking which index implementation to use. 2) The IndexedTable.Index interface is extended to support using unboxed primitives in the unique-long-keys case, and callers are updated to use the new functionality. Other key types continue to use indexes backed by Java HashMaps. * Fixup logic. * Add tests.	2020-09-29 10:46:47 -07:00
Gian Merlino	599aacce0f	Remove Expr.visit. (#10437 ) * Remove Expr.visit. It isn't used and doesn't have tests. * Remove Visitor too.	2020-09-28 22:13:10 -07:00
Clint Wylie	1d6cb624f4	add vectorizeVirtualColumns query context parameter (#10432 ) * add vectorizeVirtualColumns query context parameter * oops * spelling * default to false, more docs * fix test * fix spelling	2020-09-28 18:48:34 -07:00
Clint Wylie	3d700a5e31	vectorize remaining math expressions (#10429 ) * vectorize remaining math expressions * fixes * remove cannotVectorize() where no longer true * disable vectorized groupby for numeric columns with nulls * fixes	2020-09-26 23:30:14 -07:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Clint Wylie	19c4b16640	vectorized expressions and expression virtual columns (#10401 ) * vectorized expression virtual columns * cleanup * fixes * preserve float if explicitly specified * oops * null handling fixes, more tests * what is an expression planner? * better names * remove unused method, add pi * move vector processor builders into static methods * reduce boilerplate * oops * more naming adjustments * changes * nullable * missing hex * more	2020-09-23 13:56:38 -07:00
Gian Merlino	1af2eace41	Include Sequence-building time in CPU time metric. (#10377 ) * Include Sequence-building time in CPU time metric. Meaningful work can be done while building Sequences, and we should count this work. On the Broker, this includes subquery processing work done by the mergeResults call of the GroupByQueryQueryToolChest. * Add test.	2020-09-23 14:33:55 +08:00
Dylan Wylie	f3eb0cfb3b	Avoid large limits causing int overflow in buffer size checks (#10356 ) * Avoid large limits causing int overflow in buffer size checks * fix lgtm overflow warning Co-authored-by: Dylan <dwylie@spotx.tv>	2020-09-18 13:08:49 -07:00
Suneet Saldanha	f71ba6f2c2	Vectorized ANY aggregators (#10338 ) * WIP vectorized ANY aggregators * tests * fix aggs * cleanup * code review + tests * docs * use NilVectorSelector when needed * fix spellcheck * dont instantiate vectors * cleanup	2020-09-14 19:44:58 -07:00
Clint Wylie	e012d5c41b	allow vectorized query engines to utilize vectorized virtual columns (#10388 ) * allow vectorized query engines to utilize vectorized virtual column implementations * javadoc, refactor, checkstyle * intellij inspection and more javadoc * better * review stuffs * fix incorrect refactor, thanks tests * minor adjustments	2020-09-14 19:29:35 -07:00
Clint Wylie	184b202411	add computed Expr output types (#10370 ) * push down ValueType to ExprType conversion, tidy up * determine expr output type for given input types * revert unintended name change * add nullable * tidy up * fixup * more better * fix signatures * naming things is hard * fix inspection * javadoc * make default implementation of Expr.getOutputType that returns null * rename method * more test * add output for contains expr macro, split operation and function auto conversion	2020-09-14 18:18:56 -07:00
Abhishek Agarwal	f5e2645bbb	Support SearchQueryDimFilter in sql via new methods (#10350 ) * Support SearchQueryDimFilter in sql via new methods * Contains is a reserved word * revert unnecessary change * Fix toDruidExpression method * rename methods * java docs * Add native functions * revert change in dockerfile * remove changes from dockerfile * More tests * travis fix * Handle null values better	2020-09-14 09:57:54 -07:00
Jihoon Son	8f14ac814e	More structured way to handle parse exceptions (#10336 ) * More structured way to handle parse exceptions * checkstyle; add more tests * forbidden api; test * address comment; new test * address review comments * javadoc for parseException; remove redundant parseException in streaming ingestion * fix tests * unnecessary catch * unused imports * appenderator test * unused import	2020-09-11 16:31:10 -07:00
Joy Kent	e5f0da30ae	Fix stringFirst/stringLast rollup during ingestion (#10332 ) * Add IndexMergerRollupTest This changelist adds a test to merge indexes with StringFirst/StringLast aggregator. * Fix StringFirstAggregateCombiner/StringLastAggregateCombiner The segment-level type for stringFirst/stringLast is SerializablePairLongString, not String. This changelist fixes it. * Fix EarliestLatestAnySqlAggregator to handle COMPLEX type This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX type as an operand. For its return type, we set it to VARCHAR, since COMPLEX column is only generated by stringFirst/stringLast during ingestion rollup. * Return value with smaller timestamp in StringFirstAggregatorFactory.combine function * Add integration tests for stringFirst/stringLast during ingestion * Use one EarliestLatestReturnTypeInference instance Co-authored-by: Joy Kent <joy@automonic.ai>	2020-09-08 17:36:04 -07:00
Suneet Saldanha	91a153820e	fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator (#10325 ) * fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator * Add tests * javadocs	2020-09-04 13:23:40 -07:00
Gian Merlino	d7fcff3aba	StringFirstAggregatorFactory: Fix incorrect "combine" method. (#10351 ) * StringFirstAggregatorFactory: Fix incorrect "combine" method. There was a test, but it was wrong. * Fix superclass.	2020-09-03 20:03:26 -07:00
Gian Merlino	8ab1979304	Remove implied profanity from error messages. (#10270 ) i.e. WTF, WTH.	2020-08-28 11:38:50 -07:00
Gian Merlino	21703d81ac	Fix handling of 'join' on top of 'union' datasources. (#10318 ) * Fix handling of 'join' on top of 'union' datasources. The problem is that unions are typically rewritten into a series of individual queries on the underlying tables, but this isn't done when the union is wrapped in a join. The main changes are in UnionQueryRunner: 1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis. 2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource". Together, these enable UnionQueryRunner to "see through" a join. * Tests. * Adjust heap sizes for integration tests. * Different approach, more tests. * Tweak. * Styling.	2020-08-26 14:23:54 -07:00
Jihoon Son	b9ff3483ac	Add support for all partitioning schemes for auto compaction (#10307 ) * Add support for all partitioning schemes for auto compaction * annotate last compaction state for multi phase parallel indexing * fix build and tests * test * better home	2020-08-26 13:19:18 -07:00
Clint Wylie	ab60661008	refactor internal type system (#9638 ) * better type tracking: add typed postaggs, finalized types for agg factories * more javadoc * adjustments * transition to getTypeName to be used exclusively for complex types * remove unused fn * adjust * more better * rename getTypeName to getComplexTypeName * setup expression post agg for type inference existing * more javadocs * fixup * oops * more test * more test * more comments/javadoc * nulls * explicitly handle only numeric and complex aggregators for incremental index * checkstyle * more tests * adjust * more tests to showcase difference in behavior * timeseries longsum array	2020-08-26 10:53:44 -07:00
Suneet Saldanha	a9de00d43a	Remove NUMERIC_HASHING_THRESHOLD (#10313 ) * Make NUMERIC_HASHING_THRESHOLD configurable Change the default numeric hashing threshold to 1 and make it configurable. Benchmarks attached to this PR show that binary searches are not faster than doing a set contains check. The attached flamegraph shows the amount of time a query spent in the binary search. Given the benchmarks, we can expect to see roughly a 2x speed up in this part of the query, which works out to a ~10% faster query in this instance. * Remove NUMERIC_HASHING_THRESHOLD * Remove stale docs	2020-08-25 20:05:39 -07:00
Gian Merlino	f53785c52c	ExpressionFilter: Use index for expressions of single multi-value columns. (#10320 ) Previously, this was disallowed, because expressions treated multi-values as nulls. But now, if there's a single multi-value column that can be mapped over, it's okay to use the index. Expression selectors already do this.	2020-08-24 23:29:31 -07:00
Suneet Saldanha	707b5aae2b	Optimize large InDimFilters (#10312 ) * Optimize large InDimFilters For large InDimFilters, in default mode, the filter does a linear check of the set to see if it contains either an empty or null. If it does, the empties are converted to nulls by passing through the entire list again. Instead of this, in default mode, we attempt to remove an empty string from the values that are passed to the InDimFilter. If an empty string was removed, we add null to the set * code review * Revert "code review" This reverts commit `61fe33ebf7`. * code review - less brittle	2020-08-24 16:39:27 -07:00
Clint Wylie	7620b0c54e	Segment backed broadcast join IndexedTable (#10224 ) * Segment backed broadcast join IndexedTable * fix comments * fix tests * sharing is caring * fix test * i hope this doesnt fix it * filter by schema to maybe fix test * changes * close join stuffs so it does not leak, allow table to directly make selector factory * oops * update comment * review stuffs * better check	2020-08-20 14:12:39 -07:00
Gian Merlino	6cca7242de	Add "offset" parameter to the Scan query. (#10233 ) * Add "offset" parameter to the Scan query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Fix constructor call. * Fix up JSONs. * Fix call to ScanQuery. * Doc update. * Fix javadocs. * Spotbugs, LGTM suppressions. * Javadocs. * Fix suppression. * Stabilize Scan query result order, add tests. * Update LGTM comment. * Fixup. * Test different batch sizes too. * Nicer tests. * Fix comment.	2020-08-13 14:56:24 -07:00
Clint Wylie	e053348f74	add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219 ) * add isNullable to ColumnCapabilities, ColumnAnalysis * better builder * fix segment metadata queries in integration tests * adjustments * cleanup * fix spotbugs * treat unknown as true in segmentmetadata * rename to hasNulls, add docs * fixup * test the dim indexer selector isNull fix for numeric columns * fixes * oof	2020-08-13 14:55:32 -07:00
Jihoon Son	a61263b4a9	Allow forceLimitPushDown in SQL (#10253 ) * Allow forceLimitPushDown in SQL * fix test * fix test * review comments * fix test	2020-08-13 13:30:41 -07:00
Gian Merlino	89860b7d6a	Fix javadoc mistake in DefaultLimitSpec. (#10269 ) Javadoc for getLimit should say it's a limit, not an offset.	2020-08-13 12:17:26 -07:00
Gian Merlino	e273264332	Fix two id-over-maxId errors in StringDimensionIndexer. (#10245 ) 1) lookupId could return IDs beyond maxId if called with a recently added value. 2) getRow could return an ID for null beyond maxId, if null was recently encountered in a dimension that initially didn't appear at all. (In this case, the dictionary ID for null can be > 0). Also add a comment explaining how this stuff is supposed to work.	2020-08-11 20:32:10 -07:00
Clint Wylie	c72f96a4ba	fix bug with expressions on sparse string realtime columns without explicit null valued rows (#10248 ) * fix bug with realtime expressions on sparse string columns * fix test * add comment back * push capabilities for dimensions to dimension indexers since they know things * style * style * fixes * getting a bit carried away * missed one * fix it * benchmark build fix * review stuffs * javadoc and comments * add comment * more strict check * fix missed usage of impl instead of interface	2020-08-11 11:07:17 -07:00
Abhishek Radhakrishnan	dc16abae34	Vectorization support for long, double, float min & max aggregators. (#10260 ) * LongMaxVectorAggregator support and test case. * DoubleMinVectorAggregator and test cases. * DoubleMaxVectorAggregator and unit test. * FloatMinVectorAggregator and FloatMaxVectorAggregator. * Documentation update to include the other vector aggregators. * Bug fix. * checkstyle formatting fixes. * CalciteQueryTest cases update. * Separate test classes for FloatMaxAggregation and FloatMinAggregation. * remove the cannotVectorize for float max/min aggregator in test. * Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.	2020-08-10 15:18:55 -07:00
Gian Merlino	170031744e	Combine InDimFilter, InFilter. (#10119 ) * Combine InDimFilter, InFilter. There are two motivations: 1. Ensure that when HashJoinSegmentStorageAdapter compares its Filter to the original one, and it is an "in" type, the comparison is by reference and does not need to check deep equality. This is useful when the "in" filter is very large. 2. Simplify things. (There isn't a great reason for the DimFilter and Filter logic to be separate, and combining them reduces some duplication.) * Fix test.	2020-08-06 18:34:21 -07:00
Gian Merlino	b6aaf59e8c	Add "offset" parameter to GroupBy query. (#10235 ) * Add "offset" parameter to GroupBy query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Stabilize GroupBy sorts. * Fix inspections. * Fix suppression. * Fixups. * Move TopNSequence to druid-core. * Addl comments. * NumberedElement equals verification. * Changes from review.	2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan	34a4113752	Add vectorization support for the longMin aggregator. (#10211 ) * Fix minor formatting in docs. * Add Nullhandling initialization for test to run from IDE. * Vectorize longMin aggregator. - A new vectorized class for the vectorized long min aggregator. - Changes to AggregatorFactory to support vectorize functionality. - Few changes to schema evolution test to add LongMinAggregatorFactory. * Add longSum to the supported vectorized aggregator implementations. * Add MIN() long min to calcite query test that can vectorize. * Add simple long aggregations test. * Fixup formatting per checkstyle guide. * fixup and add more tests for long min aggregator. * Override test for groupBy since timestamps are handled differently. * Null compatibility check in test. * Review comment: Add a test case to LongMinAggregationTest.	2020-08-01 15:32:09 -07:00
frank chen	646fa84d04	Support unit on byte-related properties (#10203 ) * support unit suffix on byte-related properties * add doc * change default value of byte-related properties in example files * fix coding style * fix doc * fix CI * suppress spelling errors * improve code according to comments * rename Bytes to HumanReadableBytes * add getBytesInInt to get value safely * improve doc * fix problem reported by CI * fix problem reported by CI * resolve code review comments * improve error message * improve code & doc according to comments * fix CI problem * improve doc * suppress spelling check errors	2020-07-31 09:58:48 +08:00
Maytas Monsereenusorn	574b062f1f	Cluster wide default query context setting (#10208 ) * Cluster wide default query context setting * Cluster wide default query context setting * Cluster wide default query context setting * add docs * fix docs * update props * fix checkstyle * fix checkstyle * fix checkstyle * update docs * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix NPE	2020-07-29 15:19:18 -07:00
Jihoon Son	63c1746fe4	Fix timeseries query constructor when postAggregator has an expression reading timestamp result column (#10198 ) * Fix timeseries query constructor when postAggregator has an expression reading timestamp result column * fix npe * Fix postAgg referencing timestampResultField and add a test for it * fix test * doc * revert doc	2020-07-27 10:54:44 -07:00
Jihoon Son	6fdce36e41	Add integration tests for query retry on missing segments (#10171 ) * Add integration tests for query retry on missing segments * add missing dependencies; fix travis conf * address comments * Integration tests extension * remove unused dependency * remove druid_main * fix java agent port	2020-07-22 22:30:35 -07:00
Jihoon Son	41982116f4	Report missing segments when there is no segment for the query datasource in historicals (#10199 ) * Report missing segments when there is no segment for the query datasource in historicals * test * missing part for test * another test	2020-07-20 21:02:52 -07:00
Nishant Bangarwa	971d8a353b	Add groupBy limitSpec to queryCache key (#10093 ) * Add groupBy limitSpec to queryCache key * Only add limitSpec to cache key if pushdown is set to true * review comment	2020-07-13 19:15:09 -07:00
Jihoon Son	53a2550571	Follow-up for RetryQueryRunner fix (#10144 ) * address comments; use guice instead of query context * typo * QueryResource tests * address comments * catch queryException * fix spell check	2020-07-08 13:28:11 -07:00
Clint Wylie	010fe047e1	AbstractOptimizableDimFilter should be public (#10142 )	2020-07-06 15:19:32 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
Jonathan Wei	ed981ef88e	Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056 ) * Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter * Remove aggressive equality check that was used for testing * Use Suppliers.memoize * Checkstyle	2020-07-01 22:26:17 -07:00
Samarth Jain	e2c5bcc22d	Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning … (#10123 ) * Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning message to indicate failure in deserializing.	2020-07-01 20:06:23 -07:00
Samarth Jain	3e92cdf1cf	Revert "Fix UnknownTypeComplexColumn#makeVectorObjectSelector" (#10121 ) This reverts commit `7bb7489afc`.	2020-07-01 14:33:17 -07:00
Jihoon Son	657f8ee80f	Fix RetryQueryRunner to actually do the job (#10082 ) * Fix RetryQueryRunner to actually do the job * more javadoc * fix test and checkstyle * don't combine for testing * address comments * fix unit tests * always initialize response context in cachingClusteredClient * fix subquery * address comments * fix test * query id for builders * make queryId optional in the builders and ClusterQueryResult * fix test * suppress tests and unused methods * exclude groupBy builder * fix jacoco exclusion * add tests for builders * address comments * don't truncate	2020-07-01 14:02:21 -07:00
samarthjain	7bb7489afc	Fix UnknownTypeComplexColumn#makeVectorObjectSelector	2020-07-01 12:02:23 -07:00
Gian Merlino	5faa897a34	Join filter pre-analysis simplifications and sanity checks. (#10104 ) * Join filter pre-analysis simplifications and sanity checks. - At pre-analysis time, only compute pre-analysis for the innermost root query, since this is the one that will run on the join that involves the base datasource. Previously, pre-analyses were computed for multiple levels of the query, some of which were unnecessary. - Remove JoinFilterPreAnalysisGroup and join query level gathering code, since they existed to support precomputation of multiple pre-analyses. - Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to sanity check at processing time that the correct pre-analysis was done. Tangentially related changes: - Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker. The computed priority and lanes were not being used. - Add "getBaseQuery" method to DataSourceAnalysis to support identification of the proper subquery for filter pre-analysis. * Fix compilation errors. * Adjust tests.	2020-06-30 19:14:22 -07:00
Samarth Jain	2c1b45842f	Prevent unknown complex types from breaking DruidSchema refresh (#9422 )	2020-06-30 14:06:17 -07:00
Suneet Saldanha	15a0b4ffe2	Filter http requests by http method (#10085 ) * Filter http requests by http method Add a config that allows a user to choose which http methods to allow against their Druid server. Druid will only accept http requests with the method: GET, PUT, POST, DELETE and OPTIONS. If a Druid admin wants to allow other methods, they can do so by using the ServerConfig#allowedHttpMethods config. If a Druid user would like to disallow OPTIONS, this can be done by changing the AuthConfig#allowUnauthenticatedHttpOptions config * Exclude OPTIONS from always supported HTTP methods Add HEAD as an allowed method for web console e2e tests * fix docs * fix security IT * Actually fix the web console e2e tests * Ignore code coverage for initialization classes * code review	2020-06-29 16:59:31 -07:00
chenyuzhi459	a4c6d5f37e	fix query memory leak (#10027 ) * fix query memory leak * rollup ./idea * roll up .idea * clean code * optimize style * optimize cancel function * optimize style * add concurrentGroupTest test case * add test case * add unit test * fix code style * optimize cancell method use * format code * reback code * optimize cancelAll * clean code * add comment	2020-06-26 23:30:59 -07:00
Maytas Monsereenusorn	9be5039f68	Enable query vectorization by default (#10065 ) * Enable query vectorization by default * update docs	2020-06-24 13:08:49 -07:00
Maytas Monsereenusorn	f80c02da02	Fix HyperUniquesAggregatorFactory.estimateCardinality null handling to respect output type (#10063 ) * fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator * address comments * address comments	2020-06-23 15:54:37 -10:00
Clint Wylie	eee99ff0d5	minor rework of topn algorithm selection for clarity and more javadocs (#10058 ) * minor refactor of topn engine algorithm selection for clarity * adjust * more javadoc	2020-06-22 09:08:50 -07:00
Clint Wylie	c2f5d453f8	fix topn on string columns with non-sorted or non-unique dictionaries (#10053 ) * fix topn on string columns with non-sorted or non-unique dictionaries * fix metadata tests * refactor, clarify comments and code, fix ci failures	2020-06-19 11:35:18 -07:00
Jonathan Wei	37e150c075	Fix join filter rewrites with nested queries (#10015 ) * Fix join filter rewrites with nested queries * Fix test, inspection, coverage * Remove clauses from group key * Fix import order Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2020-06-18 21:32:29 -07:00
Clint Wylie	b5e6569d2c	global table only if joinable (#10041 ) * global table if only joinable * oops * fix style, add more tests * Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java * better information schema columns, distinguish broadcast from joinable * fix javadoc * fix mistake Co-authored-by: Jihoon Son <jihoonson@apache.org>	2020-06-18 17:32:10 -07:00
Aleksey Plekhanov	2c384b61ff	IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" (#9690 ) IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" Reverted checkstyle rule * Added tests to pass CI * Codestyle	2020-06-18 09:47:07 -07:00
Maytas Monsereenusorn	7569ee3ec6	All aggregators should check if column can be vectorize (#10026 ) * All aggregators should use vectorization-aware column processor * All aggregators should use vectorization-aware column processor * fix canVectorize * fix canVectorize * add tests * revert back default * address comment * address comments * address comment * address comment	2020-06-17 01:52:02 -10:00
Clint Wylie	68aa384190	global table datasource for broadcast segments (#10020 ) * global table datasource for broadcast segments * tests * fix * fix test * comments and javadocs * review stuffs * use generated equals and hashcode	2020-06-16 17:58:05 -07:00
Suneet Saldanha	4e483a70b4	ROUND and having comparators correctly handle special double values (#10014 ) * ROUND and having comparators correctly handle doubles Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real numbers. Because of this, they can not be converted to BigDecimal and instead throw a NumberFormatException. This change adds support for calculations that produce these numbers either for use in the `ROUND` function or the HavingSpecMetricComparator by not attempting to convert the number to a BigDecimal. The bug in ROUND was first introduced in #7224 where we added the ability to round to any decimal place. This PR changes the behavior back to using `Math.round` if we recognize a number that can not be converted to a BigDecimal. * Add tests and fix spellcheck * update error message in ExpressionsTest * Address comments * fix up round for infinity * round non numeric doubles returns a double * fix spotbugs * Update docs/misc/math-expr.md * Update docs/querying/sql.md	2020-06-16 16:09:46 -07:00
Gian Merlino	9330ca9717	Remove LegacyDataSource. (#10037 ) * Remove LegacyDataSource. Its purpose was to enable deserialization of strings into TableDataSources. But we can do this more straightforwardly with Jackson annotations. * Slight test improvement.	2020-06-16 14:40:35 -07:00
Clint Wylie	9468df4721	make phaser of ReferenceCountingCloseableObject protected instead of private so subclasses can do stuff with it (#10035 )	2020-06-15 19:56:49 -07:00
Stefan Birkner	7282e2f2f9	Simplify CompressedVSizeColumnarIntsSupplierTest (#10003 ) The parameters generator uses CompressionStrategy.noNoneValues() instead of CompressionStrategyTest.compressionStrategies() which wrapped each strategy in a single element array. This improves readability of the test.	2020-06-10 09:32:00 -07:00

2431 Commits