druid

Commit Graph

Author	SHA1	Message	Date
kaijianding	e39ff44481	improve groupBy query granularity translation with 2x query performance improvement when issued from sql layer (#11379 ) * improve groupBy query granularity translation when issued from sql layer * fix style * use virtual column to determine timestampResult granularity * don't apply postaggregators on compute nodes * relocate constants * fix order by correctness issue * fix ut * use easier-to-understand code in DefaultLimitSpec * address comment * rollback use virtual column to determine timestampResult granularity * fix style * fix style * address the comment * add more detailed documentation to explain the tradeoff * address the comment * address the comment	2021-07-11 10:22:47 -07:00
Clint Wylie	17efa6f556	add single input string expression dimension vector selector and better expression planning (#11213 ) * add single input string expression dimension vector selector and better expression planning * better * fixes * oops * rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing * oops * javadocs, renaming * more javadocs * benchmarks * use string expression vector processor with vector size 1 instead of expr.eval * better logging * javadocs, surprising number of the the * more * simplify	2021-07-06 11:20:49 -07:00
Abhishek Agarwal	03a6a6d6e1	Replace Processing ExecutorService with QueryProcessingPool (#11382 ) This PR refactors the code for QueryRunnerFactory#mergeRunners to accept a new interface called QueryProcessingPool instead of ExecutorService for concurrent execution of query runners. This interface will let custom extensions inject their own implementation for deciding which query-runner to prioritize first. The default implementation is the same as today that takes the priority of query into account. QueryProcessingPool can also be used as a regular executor service. It has a dedicated method for accepting query execution work so implementations can differentiate between regular async tasks and query execution tasks. This dedicated method also passes the QueryRunner object as part of the task information. This hook will let custom extensions carry any state from QuerySegmentWalker to QueryProcessingPool#mergeRunners which is not possible currently.	2021-07-01 16:03:08 +05:30
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
Clint Wylie	df9b57aa1a	bitwise aggregators, better null handling options for expression agg (#11280 ) * bitwise aggregators, better nulls for expression agg * correct behavior * rework deserialize, better names * fix json, share mask	2021-06-25 16:51:16 -07:00
Xavier Léauté	712f2a5d00	upgrade error-prone to 2.7.1 and support checks with Java 11+ (#11363 ) * upgrade error-prone to 2.7.1 and support checks with Java 11+ - upgrade error-prone to 2.7.1 - support running error-prone with Java 11 and above using -Xplugin instead of custom compiler - add compiler arguments to ignore warnings/errors in Java 15/16 - introduce strictCompile property to enable strict profiles since we now need multiple strict profiles for Java 8 - properly exclude all generated source files from error-prone - fix druid-processing overriding annotation processors from parent pom - fix druid-core disabling most non-default checks - align plugin and annotation errorprone versions - fix / suppress additional issues found by error-prone: * fix bug in SeekableStreamSupervisor initializing ArrayList size with the taskGroupdId * fix missing @Override annotations - remove outdated compiler plugin in benchmarks - remove deleted ParameterPackage error-prone rule - re-enable checks on benchmark module as well * fix IntelliJ inspections * disable LongFloatConversion due to bug in error-prone with JDK 8 * add comment about InsecureCrypto	2021-06-16 12:55:34 -07:00
Clint Wylie	bfbd7ec432	fix bugs related to SQL type inference return type nullability (#11327 ) * fix a bunch of type inference nullability bugs * fixes * style * fix test * fix concat	2021-06-15 12:26:59 -07:00
Clint Wylie	920aa414ca	enrich expression cache key information to support expressions which depend on external state (#11358 ) * enrich expression cache key information to support expressions which depend on external state such as lookups * cache rules everything around me * low carb * rename	2021-06-14 17:26:43 -07:00
Clint Wylie	50327b8f63	ignore bySegment query context for SQL queries (#11352 ) * ignore bySegment query context for SQL queries * revert unintended change	2021-06-11 13:49:03 -07:00
Clint Wylie	6b272c857f	adjust topn heap algorithm to only use known cardinality path when dictionary is unique (#11186 ) * adjust topn heap algorithm to only use known cardinality path when dictionary is unique * better check and add comment * adjust comment more	2021-06-10 18:32:22 -05:00
Jihoon Son	51f983101e	Fix wrong encoding in PredicateFilteredDimensionSelector.getRow (#11339 )	2021-06-10 09:14:34 -07:00
dependabot[bot]	167044f715	Bump fastutil from 8.2.3 to 8.5.4 (#11347 ) * Bump fastutil from 8.2.3 to 8.5.4 Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4. - [Release notes](https://github.com/vigna/fastutil/releases) - [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES) - [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4) --- updated-dependencies: - dependency-name: it.unimi.dsi:fastutil dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update maven dependency list for -core and -extra libraries to pass maven dependency checks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2021-06-10 07:43:18 -07:00
Maria Sitkovets	259207753d	Fix is null selector returning incorrect value for Long data type (#11170 ) * Fix is null selector returning incorrect value for Long data type * Fix style errors * Refactor getObject method to also cache null column values * Make lastInput variable nullable * Refactor unit test * Use new boolean lastInputIsNull instead of Long for lastInput to avoid boxing * Refactor to remove Long for input variable * Make a separate null caching variable * Cleaner null caching implementation	2021-05-19 20:47:02 -07:00
Clint Wylie	6d08a7051e	fix bug with aggregator expressions on realtime index with string columns always producing 0 values (#11185 ) * fix bug with aggregator expressions on realtime index with string columns always producing 0 values * more test * rework some stuff * javadocs	2021-05-17 11:59:13 -07:00
Clint Wylie	790262e5d0	add estimated byte size limit enforcement for heap based expression aggregator (#11236 )	2021-05-12 01:21:50 -07:00
Clint Wylie	f6662b4893	fix count and average SQL aggregators on constant virtual columns (#11208 ) * fix count and average SQL aggregators on constant virtual columns * style * even better, why are we tracking virtual columns in aggregations at all if we have a virtual column registry * oops missed a few * remove unused * this will fix it	2021-05-10 13:41:48 -07:00
Clint Wylie	691d7a1d54	SQL timeseries no longer skip empty buckets with all granularity (#11188 ) * SQL timeseries no longer skip empty buckets with all granularity * add comment, fix tests * the ol switcheroo * revert unintended change * docs and more tests * style * make checkstyle happy * docs fixes and more tests * add docs, tests for array_agg * fixes * oops * doc stuffs * fix compile, match doc style	2021-05-10 10:13:37 -07:00
Gian Merlino	a1f850d707	Fix vectorized cardinality bug on certain string columns. (#11199 ) * Fix vectorized cardinality bug on certain string columns. Fixes a bug introduced in #11182, related to the fact that in some cases, ColumnProcessors.makeVectorProcessor will call "makeObjectProcessor" instead of "makeSingleValueDimensionProcessor" or "makeMultiValueDimensionProcessor". CardinalityVectorProcessorFactory improperly ignored calls to "makeObjectProcessor". In addition to fixing the bug, I added this detail to the javadocs for VectorColumnProcessorFactory, to prevent others from running into the same thing in the future. They do not currently call out this case. * Improve test coverage. * Additional fixes.	2021-05-07 08:37:10 -07:00
Clint Wylie	554f1ffeee	ARRAY_AGG sql aggregator function (#11157 ) * ARRAY_AGG sql aggregator function * add javadoc * spelling * review stuff, return null instead of empty when nil input * review stuff * Update sql.md * use type inference for finalize, refactor some things	2021-05-03 22:17:10 -07:00
Gian Merlino	bef7cc911f	Vectorize the cardinality aggregator. (#11182 ) * Vectorize the cardinality aggregator. Does not include a byRow implementation, so if byRow is true then the aggregator still goes through the non-vectorized path. Testing strategy: - New tests that exercise both styles of "aggregate" for supported types. - Some existing tests have also become active (note the deleted "cannotVectorize" lines). * Adjust whitespace.	2021-05-03 20:27:02 -07:00
Gian Merlino	809e001939	Vectorize the DataSketches quantiles aggregator. (#11183 ) * Vectorize the DataSketches quantiles aggregator. Also removes synchronization for the BufferAggregator and VectorAggregator implementations, since it is not necessary (similar to #11115). Extends DoublesSketchAggregatorTest and DoublesSketchSqlAggregatorTest to run all test cases in vectorized mode. * Style fix.	2021-05-02 16:14:21 -07:00
Gian Merlino	046069f35a	Add a way to retrieve UTF-8 bytes directly via DimensionDictionarySelector. (#11172 ) * Add a way to retrieve UTF-8 bytes directly via DimensionDictionarySelector. The idea is that certain operations (like count distinct on strings) will be faster if they are able to run directly on UTF-8 bytes instead of on Java Strings decoded by "lookupName". * Add license header. * Updates suggested by robots.	2021-04-30 10:56:11 -07:00
Gian Merlino	6d82c3cbf1	StringComparators: No need to convert to UTF-8 for lexicographic comparison. (#11171 ) Lexicographic ordering of UTF-8 byte sequences and in-memory UTF-16 strings are equivalent. So, we can skip the (expensive) conversion and get an equivalent result. Thank you, Unicode!	2021-04-30 10:54:20 -07:00
Gian Merlino	7d808e357c	InDimFilter: Fix cache key computation to avoid collisions. (#11168 ) The prior code did not include separation between values, and encoded null ambiguously. This patch fixes both of those issues by encoding strings as length + value instead of just value. I think cache key computation was OK prior to #9800. Prior to that patch, the cache key was computed using CacheKeyBuilder.appendStrings, which encodes strings as UTF-8 and inserts a separator byte (0xff) between them that cannot appear in a UTF-8 stream.	2021-04-28 17:28:29 -07:00
Gian Merlino	ad028de538	InDimFilter: Fix NPE involving certain Set types. (#11169 ) * InDimFilter: Fix NPE involving certain Set types. Normally, InDimFilters that come from JSON have HashSets for "values". However, programmatically-generated filters (like the ones from #11068) may use other set types. Some set types, like TreeSets with natural ordering, will throw NPE on "contains(null)", which causes the InDimFilter's ValueMatcher to throw NPE if it encounters a null value. This patch adds code to detect if the values set can support contains(null), and if not, wrap that in a null-checking lambda. Also included: - Remove unneeded NullHandling.needsEmptyToNull method. - Update IndexedTableJoinable to generate a TreeSet that does not require lambda-wrapping. (This particular TreeSet is how I noticed the bug in the first place.) * Test fixes. * Improve test coverage	2021-04-28 14:13:42 -07:00
Harini Rajendran	8a3be6bccc	Fix TimeSeriesUnionQueryRunnerTest by extending InitializedNullHandlingTest (#11154 )	2021-04-23 08:56:03 -07:00
Clint Wylie	57ff1f9cdb	expression aggregator (#11104 ) * add experimental expression aggregator * add test * fix lgtm * fix test * adjust test * use not null constant * array_set_concat docs * add equals and hashcode and tostring * fix it * spelling * do multi-value magic for expression agg, more javadocs, tests * formatting * fix inspection * more better * nullable	2021-04-22 18:30:16 -07:00
Gian Merlino	202c78c8f3	Enable rewriting certain inner joins as filters. (#11068 ) * Enable rewriting certain inner joins as filters. The main logic for doing the rewrite is in JoinableFactoryWrapper's segmentMapFn method. The requirements are: - It must be an inner equi-join. - The right-hand columns referenced by the condition must not contain any duplicate values. (If they did, the inner join would not be guaranteed to return at most one row for each left-hand-side row.) - No columns from the right-hand side can be used by anything other than the join condition itself. HashJoinSegmentStorageAdapter is also modified to pass through to the base adapter (even allowing vectorization!) in the case where 100% of join clauses could be rewritten as filters. In support of this goal: - Add Query getRequiredColumns() method to help us figure out whether the right-hand side of a join datasource is being used or not. - Add JoinConditionAnalysis getRequiredColumns() method to help us figure out if the right-hand side of a join is being used by later join clauses acting on the same base. - Add Joinable getNonNullColumnValuesIfAllUnique method to enable retrieving the set of values that will form the "in" filter. - Add LookupExtractor canGetKeySet() and keySet() methods to support LookupJoinable in its efforts to implement the new Joinable method. - Add "enableRewriteJoinToFilter" feature flag to JoinFilterRewriteConfig. The default is disabled. * Test improvements. * Test fixes. * Avoid slow size() call. * Remove invalid test. * Fix style. * Fix mistaken default. * Small fixes. * Fix logic error.	2021-04-14 10:49:27 -07:00
Maytas Monsereenusorn	f968400170	Introduce a new configuration that skips storing audit payload if payload size exceeds limit and skips storing null fields for audit payload (#11078 ) * Add config to skip storing audit payload if exceed limit * fix checkstyle * change config name * skip null fields for audit payload * fix checkstyle * address comments * fix guice * fix test * add tests * address comments * address comments * address comments * fix checkstyle * address comments * fix test * fix test * address comments * Address comments Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-04-13 20:18:28 -07:00
Clint Wylie	08d3786738	improve bitmap vector offset to report contiguous groups (#11039 ) * improve bitmap vector offset to report contiguous groups * benchmark style * check for contiguous in getOffsets, tests for exceptions	2021-04-13 11:47:01 -07:00
Gian Merlino	c158207ab6	Rename BitmapOperationTest base class to avoid flaky test. (#11102 ) PR #10936 renamed BitmapBenchmark, the parent of a couple of bitmap tests, to BitmapOperationTest. This patch renames it to BitmapOperationTestBase so JUnit doesn't pick it up as a test case. When JUnit picks it up, it becomes a flaky test, since its behavior and correctness depends on whether it runs before or after its subclasses.	2021-04-13 08:01:15 -07:00
Gian Merlino	c8e394015d	LongsLongEncodingReader: Implement "duplicate", fixing concurrency bug. (#11098 ) Regression introduced in #11004 due to overzealous optimization. Even though we replaced stateful usage of ByteBuffer with stateless usage of Memory, we still need to create a new object on "duplicate" due to semantics of setBuffer.	2021-04-13 08:01:01 -07:00
Jihoon Son	25db8787b3	Fix CAST being ignored when aggregating on strings after cast (#11083 ) * Fix CAST being ignored when aggregating on strings after cast * fix checkstyle and dependency * unused import	2021-04-12 22:21:24 -07:00
BIGrey	d33fdd093b	Nested GroupBy query got wrong/empty result when using virtual column and filter (#11081 ) * fix nested groupby got empty result when using virtual column * move to query.getVirtualColumns().wrap instead of new VirtualizedColumnSelectorFactory * move test to GroupByQueryRunnerTest * Update processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-04-10 21:29:41 -07:00
Clint Wylie	338886fd5f	vector group by support for string expressions (#11010 ) * vector group by support for string expressions * fix test * comments, javadoc	2021-04-08 19:23:39 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
Clint Wylie	c0e6d1c7f8	vectorize 'auto' long decoding (#11004 ) * Vectorize LongDeserializers. Also, add many more tests. * more faster * more more faster * more cleanup * fixes * forbidden * benchmark style * idk why * adjust * add preconditions for value >= 0 for writers * add 64 bit exception Co-authored-by: Gian Merlino <gian@imply.io>	2021-03-26 18:39:13 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Maytas Monsereenusorn	f19c2e9ce4	If ingested data has sparse columns, the ingested data with forceGuaranteedRollup=true can result in imperfect rollup and final dimension ordering can be different from dimensionSpec ordering in the ingestionSpec (#10948 ) * add IT * add IT * add the fix * fix checkstyle * fix compile * fix compile * fix test * fix test * address comments	2021-03-18 17:04:28 -07:00
Xavier Léauté	1061faa6ba	prefer string concatenation over String.format in performance sensitive code (#10997 ) String.format relies on regex parsing, which makes these calls expensive at higher request volumes.	2021-03-16 22:06:26 -07:00
Clint Wylie	4cd4a22f87	expression filter support for vectorized query engines (#10613 ) * expression filter support for vectorized query engines * remove unused codes * more tests * refactor, more tests * suppress * more * more * more * oops, i was wrong * comment * remove decorate, object dimension selector, more javadocs * style	2021-03-16 11:46:50 -07:00
Xavier Léauté	d26e1bc70d	update code check plugins for Java 15 support (#10978 ) * update maven-forbidden-api plugin to 3.1 * update maven-pmd-plugin to 3.14 * update spotbugs to 4.2.2 * fixes validation failures newly caught by those updates - fix SpotBugs NP_NONNULL_PARAM_VIOLATION - fix PMD UnnecessaryFullyQualifiedName	2021-03-11 07:31:41 -08:00
frank chen	b808fd2ef9	Fix NPE in the constructor of TopNQuery (#10969 ) * fix NPE * Add unit tests to cover parameter checking	2021-03-11 00:04:49 -08:00
Clint Wylie	58294329b7	fix SQL issue for group by queries with time filter that gets optimized to false (#10968 ) * fix SQL issue for group by queries with time filter that gets optimized to false * short circuit always false in CombineAndSimplifyBounds * adjust * javadocs * add preconditions for and/or filters to ensure they have children * add comments, remove preconditions	2021-03-09 19:41:16 -08:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Jihoon Son	2c30f8b3b7	Migrate bitmap benchmarks to JMH (#10936 ) * Migrate bitmap benchmarks to JMH * add concise	2021-03-04 12:50:55 -08:00
Abhishek Agarwal	1a15987432	Supporting filters in the left base table for join datasources (#10697 ) * where filter left first draft * Revert changes in calcite test * Refactor a bit * Fixing the Tests * Changes * Adding tests * Add tests for correlated queries * Add comment * Fix typos	2021-03-04 10:39:21 -08:00
Gian Merlino	87a2abff79	Fix runtime error when IndexedTableJoinMatcher matches long selector to unique string index. (#10942 ) * Fix runtime error when IndexedTableJoinMatcher matches long selector to unique string index. The issue arises when matching against a long selector on the left-hand side to a string typed Index on the right-hand side, and when that Index also returns true from areKeysUnique. In this case, IndexedTableJoinMatcher would generate a ConditionMatcher that implements matchSingleRow by calling findUniqueLong on the Index. This is inappropriate because the Index is actually string typed. The fix is to check the type of the Index before deciding how to implement the ConditionMatcher. The patch adds "testMatchSingleRowToUniqueStringIndex" to IndexedTableJoinMatcherTest, which explores this case. * Update tests.	2021-03-04 00:57:59 -08:00
Gian Merlino	07902f607b	Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904 ) * Granularity: Introduce primitive-typed bucketStart, increment methods. Saves creation of unnecessary DateTime objects in timestamp_floor and timestamp_ceil expressions. * Fix style. * Amp up the test coverage.	2021-02-25 07:59:20 -08:00
Gian Merlino	b7e9f5bc85	BoundDimFilter: Simplify the various DruidLongPredicates. (#10906 ) They all use Long.compare, but they don't need to. Changing to regular comparisons simplifies the code and also removes branches. (Internally, Long.compare has two branches.)	2021-02-19 16:44:56 -08:00
Abhishek Agarwal	8718155f8f	Allow for empty keys in hash map (#10869 ) * allow for empty keys in hash map * fix serde test	2021-02-10 11:19:57 -08:00
Jihoon Son	ac41e41232	Update doc for query errors and add unit tests for JsonParserIterator (#10833 ) * Update doc for query errors and add unit tests for JsonParserIterator * static constructor for convenience * rename method	2021-02-05 02:55:32 -08:00
Jihoon Son	3f8f00a231	Fix CVE-2021-25646 (#10818 )	2021-02-04 11:21:43 -08:00
Gian Merlino	6c0c6e60b3	Vectorized theta sketch aggregator + rework of VectorColumnProcessorFactory. (#10767 ) * Vectorized theta sketch aggregator. Also a refactoring of BufferAggregator and VectorAggregator such that they share a common interface, BaseBufferAggregator. This allows implementing both in the same file with an abstract + dual subclass structure. * Rework implementation to use composition instead of inheritance. * Rework things to enable working properly for both complex types and regular types. Involved finally moving makeVectorProcessor from DimensionHandlerUtils into ColumnProcessors and harmonizing the two things. * Add missing method. * Style and name changes. * Fix issues from inspections. * Fix style issue.	2021-01-29 09:30:09 -08:00
Clint Wylie	cd6af93274	add leftover tests from #10743 (#10766 )	2021-01-22 09:20:48 -08:00
Gian Merlino	8b808c4879	Retain order of AND, OR filter children. (#10758 ) * Retain order of AND, OR filter children. If we retain the order, it enables short-circuiting. People can put a more selective filter earlier in the list and lower the chance that later filters will need to be evaluated. Short-circuiting was working before #9608, which switched to unordered sets to solve a different problem. This patch tries to solve that problem a different way. This patch moves filter simplification logic from "optimize" to "toFilter", because that allows the code to be shared with Filters.and and Filters.or. The simplification has become more complicated and so it's useful to share it. This patch also removes code from CalciteCnfHelper that is no longer necessary because Filters.and and Filters.or are now doing the work. * Fixes for inspections. * Fix tests. * Back to a Set.	2021-01-20 08:59:20 -08:00
zhangyue19921010	2590ad4f67	Historical unloads damaged segments automatically when lazy on start. (#10688 ) * ready to test * tested on dev cluster * tested * code review * add UTs * add UTs * ut passed * ut passed * opti imports * done * done * fix checkstyle * modify uts * modify logs * changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment * merge from master * modify import orders * merge from master * merge from master * modify logs * modify docs * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-01-16 19:53:30 -08:00
Gian Merlino	2b24dc3764	SegmentAnalyzer: Properly close column after retrieving it. (#10772 )	2021-01-16 19:26:34 -08:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Gian Merlino	a82910e065	OrFilter: Properly handle child matchers that return the original mask. (#10754 ) * OrFilter: Properly handle child matchers that return the original mask. This happens when a child matcher is literally true (for example, BooleanVectorValueMatcher). In this case, OrFilter would throw this exception from its call to removeAll while processing the next filter: java.lang.IllegalStateException: 'other' must be a different instance from 'this' Also update the javadocs for VectorValueMatcher to call out that the returned object may be the same as the input mask. * Fix style.	2021-01-14 23:28:13 -08:00
Gian Merlino	7354953b1b	VectorMatch: Disallow "copyFrom", "addAll" on self; improve tests. (#10755 ) No existing code relies on being able to call these methods in this way. The new tests exhaustively test all vectors up to size 7, and also test the run-on-self behavior that has been adjusted by this patch.	2021-01-14 18:29:13 -08:00
Gian Merlino	2bbf89db81	Remove FalseVectorMatcher, TrueVectorMatcher in favor of BooleanVectorValueMatcher. (#10757 )	2021-01-14 18:28:25 -08:00
Jihoon Son	149306c9db	Tidy up HTTP status codes for query errors (#10746 ) * Tidy up query error codes * fix tests * Restore query exception type in JsonParserIterator * address review comments; add a comment explaining the ugly switch * fix test	2021-01-13 17:20:00 -08:00
Clint Wylie	8c3c9b4060	fix limited queries with subtotals (#10743 ) * i put my thing down, flip it and reverse it * oops	2021-01-13 12:55:24 -08:00
Clint Wylie	9362dc7968	re-use expression vector evaluation results for the same offset in expression vector selectors (#10614 ) * cache expression selector results by associating vector expression bindings to underlying vector offset * better coverage, fix floats * style * stupid bot * stupid me * more test * intellij threw me under the bus when it generated those junit methods * narrow interface instead of passing around offset	2021-01-13 12:44:56 -08:00
秦臻	c62b7c19c3	javascript filter result convert to java boolean (#10721 ) * javascript filter result convert to java boolean * use type convert replace script convert, and add more unit test Co-authored-by: qinzhen <qinzhen@kuaishou.com>	2021-01-08 14:30:09 -08:00
Gian Merlino	6eef0e4c9f	Fix collision between #10689 and #10593 . (#10738 )	2021-01-08 09:52:27 -08:00
Aleksey Plekhanov	26bcd47e51	Thread-safety for ResponseContext.REGISTERED_KEYS (#9667 )	2021-01-08 00:37:49 -08:00
Liran Funaro	08ab82f55c	IncrementalIndex Tests and Benchmarks Parametrization (#10593 ) * Remove redundant IncrementalIndex.Builder * Parametrize incremental index tests and benchmarks - Reveal and fix a bug in OffheapIncrementalIndex * Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale] * Fix Intellij errors: declared exception is never thrown * Add documentation and validate before closing objects on tearDown. * Add documentation to OffheapIncrementalIndexTestSpec * Doc corrections and minor changes. * Add logging for generated rows. * Refactor new tests/benchmarks. * Improve IncrementalIndexCreator documentation * Add required tests for DataGenerator * Revert "rollupOpportunity" to be a string	2021-01-07 22:18:47 -08:00
Gian Merlino	48e576a307	Scan query: More accurate error message when segment per time chunk limit is exceeded. (#10630 ) * Scan query: More accurate error message when segment per time chunk limit is exceeded. * Add guardrail test.	2021-01-06 14:11:28 -08:00
Jonathan Wei	68bb038b31	Multiphase segment merge for IndexMergerV9 (#10689 ) * Multiphase merge for IndexMergerV9 * JSON fix * Cleanup temp files * Docs * Address logging and add IT * Fix spelling and test unloader datasource name	2021-01-05 22:19:09 -08:00
Abhishek Agarwal	796c25532e	Fix post-aggregator computation when used with subtotals (#10653 ) * Fix post-aggregator computation * remove commented code * Fix numeric null handling * Add test when subquery returns null long	2020-12-17 20:10:26 -08:00
Abhishek Agarwal	26d74b3580	Add grouping_id function (#10518 ) * First draft of grouping_id function * Add more tests and documentation * Add calcite tests * Fix travis failures * bit of a change * Add documentation * Fix typos * typo fix	2020-12-07 11:46:29 -08:00
Maytas Monsereenusorn	7eb5f59a9a	Fix string byte calculation in StringDimensionIndexer (#10623 ) * fix string byte calculation * fix tests * fix test	2020-12-04 00:51:48 -08:00
Himanshu	813e18774e	make dimension column extensible with COMPLEX type (#10277 ) * make dimension column extensible with COMPLEX type * more changes Change-Id: I9707dd644b8d71030b74a8c1d6fff0c0020d960d * processing module changes for build fix Change-Id: I146f95a41b79d20edb1721be13f0e9641f788e0e * rename ColumnCapabilities.getTypeName() to getComplexTypeName() * rename ColumnBuilder.setTypeName(..) -> ColumnBuilder.setComplexTypeName(..)	2020-12-03 08:58:17 -08:00
Lucas Capistrant	2e02eebd9d	Add context dimension to DefaultQueryMetrics (#10578 ) * Add context dimension to DefaultQueryMetrics * remove redundant addition of context dimension from DruidMetrics now that QueryMetrics adds it by default * update SearchQueryMetrics to reflect the same pattern as other default dimensions in QueryMetrics * add PublicApi annotation for context in QueryMetrics Interface	2020-12-01 18:34:03 -08:00
Lucas Capistrant	2560bf0a19	Add new coordinator metrics for coordinator duty runtimes (#10603 ) * Add new coordinator metrics for duty runtimes * fix spelling for a constant variable value * add comment clarifying why the global runtime metric is emitted where it is * Remove duty alias in lieu of using the class name for metrics * fix docs * CoordinatorStats tests + add duty stats to accumulate() logic	2020-11-29 14:47:35 -08:00
frank chen	fe693a4f01	Improve doc and exception message for invalid user configurations (#10598 ) * improve doc and exception message * add spelling check rules and remove unused import * add a test to improve test coverage	2020-11-23 15:03:13 -08:00
frank chen	d7d2c804ad	Add zero period support to TIMESTAMPADD (#10550 ) * Allow zero period for TIMESTAMPADD * update test cases * add empty zone test case * add unit test cases for TimestampShiftMacro	2020-11-18 18:26:53 -08:00
frank chen	e83d5cb59e	Fix ingestion failure of pretty-formatted JSON message (#10383 ) * support multi-line text * add test cases * split json text into lines case by case * improve exception handle * fix CI * use IntermediateRowParsingReader as base of JsonReader * update doc * ignore the non-immutable field in test case * add more test cases * mark `lineSplittable` as final * fix testcases * fix doc * add a test case for SqlReader * return all raw columns when exception occurs * fix CI * fix test cases * resolve review comments * handle ParseException returned by index.add * apply Iterables.getOnlyElement * fix CI * fix test cases * improve code in more graceful way * fix test cases * fix test cases * add a test case to check multiple json string in one text block * fix inspection check	2020-11-13 13:59:23 -08:00
Atul Mohan	6ccddedb7a	Improved exception handling in case of query timeouts (#10464 ) * Separate timeout exceptions * Add more tests Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-11-03 09:00:33 -06:00
Clint Wylie	d0821de854	support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499 ) * support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions * inspector * changes * more test * clean	2020-10-26 19:55:24 -07:00
Liran Funaro	f3a2903218	Configurable Index Type (#10335 ) * Introduce a Configurable Index Type * Change to @UnstableApi * Add AppendableIndexSpecTest * Update doc * Add spelling exception * Add tests coverage * Revert some of the changes to reduce diff * Minor fixes * Update getMaxBytesInMemoryOrDefault() comment * Fix typo, remove redundant interface * Remove off-heap spec (postponed to a later PR) * Add javadocs to AppendableIndexSpec * Describe testCreateTask() * Add tests for AppendableIndexSpec within TuningConfig * Modify hashCode() to conform with equals() * Add comment where building incremental-index * Add "EqualsVerifier" tests * Revert some of the API back to AppenderatorConfig * Don't use multi-line comments * Remove knob documentation (deferred)	2020-10-23 18:34:26 -07:00
Abhishek Agarwal	567e381705	Any virtual column on "__time" should be a pre-join virtual column (#10451 ) * Virtual column on __time should be in pre-join * Add unit test	2020-10-12 13:04:55 -07:00
Abhishek Agarwal	4d2a92f46a	Add caching support to join queries (#10366 ) * Proposed changes for making joins cacheable * Add unit tests * Fix tests * simplify logic * Pull empty byte array logic out of CachingQueryRunner * remove useless null check * Minor refactor * Fix tests * Fix segment caching on Broker * Move join cache key computation in Broker Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient * Fix compilation * Review comments * Add more tests * Fix inspection errors * Pushed condition analysis to JoinableFactory * review comments * Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable * Remove commented lines * Fix populateCache * Disable caching for selective datasources Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes	2020-10-09 17:42:30 -07:00
Jihoon Son	1deed9fbcd	Close aggregators in HashVectorGrouper.close() (#10452 ) * Close aggregators in HashVectorGrouper.close() * reuse grouper * Add missing dependency	2020-10-06 10:17:33 -07:00
Clint Wylie	207ef310f2	vectorized group by support for nullable numeric columns (#10441 ) * vectorized group by support for numeric null columns * revert unintended change * adjust * review stuffs	2020-10-05 21:53:53 -07:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Clint Wylie	9ec5c08e2a	fix array types from escaping into wider query engine (#10460 ) * fix array types from escaping into wider query engine * oops * adjust * fix lgtm	2020-10-03 15:30:34 -07:00
Clint Wylie	753bce324b	vectorize constant expressions with optimized selectors (#10440 )	2020-09-29 13:19:06 -07:00
Gian Merlino	2be1ae128f	RowBasedIndexedTable: Add specialized index types for long keys. (#10430 ) * RowBasedIndexedTable: Add specialized index types for long keys. Two new index types are added: 1) Use an int-array-based index in cases where the difference between the min and max values isn't too large, and keys are unique. 2) Use a Long2ObjectOpenHashMap (instead of the prior Java HashMap) in all other cases. In addition: 1) RowBasedIndexBuilder, a new class, is responsible for picking which index implementation to use. 2) The IndexedTable.Index interface is extended to support using unboxed primitives in the unique-long-keys case, and callers are updated to use the new functionality. Other key types continue to use indexes backed by Java HashMaps. * Fixup logic. * Add tests.	2020-09-29 10:46:47 -07:00
Gian Merlino	599aacce0f	Remove Expr.visit. (#10437 ) * Remove Expr.visit. It isn't used and doesn't have tests. * Remove Visitor too.	2020-09-28 22:13:10 -07:00
Clint Wylie	1d6cb624f4	add vectorizeVirtualColumns query context parameter (#10432 ) * add vectorizeVirtualColumns query context parameter * oops * spelling * default to false, more docs * fix test * fix spelling	2020-09-28 18:48:34 -07:00
Clint Wylie	3d700a5e31	vectorize remaining math expressions (#10429 ) * vectorize remaining math expressions * fixes * remove cannotVectorize() where no longer true * disable vectorized groupby for numeric columns with nulls * fixes	2020-09-26 23:30:14 -07:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Clint Wylie	19c4b16640	vectorized expressions and expression virtual columns (#10401 ) * vectorized expression virtual columns * cleanup * fixes * preserve float if explicitly specified * oops * null handling fixes, more tests * what is an expression planner? * better names * remove unused method, add pi * move vector processor builders into static methods * reduce boilerplate * oops * more naming adjustments * changes * nullable * missing hex * more	2020-09-23 13:56:38 -07:00
Gian Merlino	1af2eace41	Include Sequence-building time in CPU time metric. (#10377 ) * Include Sequence-building time in CPU time metric. Meaningful work can be done while building Sequences, and we should count this work. On the Broker, this includes subquery processing work done by the mergeResults call of the GroupByQueryQueryToolChest. * Add test.	2020-09-23 14:33:55 +08:00
Dylan Wylie	f3eb0cfb3b	Avoid large limits causing int overflow in buffer size checks (#10356 ) * Avoid large limits causing int overflow in buffer size checks * fix lgtm overflow warning Co-authored-by: Dylan <dwylie@spotx.tv>	2020-09-18 13:08:49 -07:00
Suneet Saldanha	f71ba6f2c2	Vectorized ANY aggregators (#10338 ) * WIP vectorized ANY aggregators * tests * fix aggs * cleanup * code review + tests * docs * use NilVectorSelector when needed * fix spellcheck * dont instantiate vectors * cleanup	2020-09-14 19:44:58 -07:00
Clint Wylie	e012d5c41b	allow vectorized query engines to utilize vectorized virtual columns (#10388 ) * allow vectorized query engines to utilize vectorized virtual column implementations * javadoc, refactor, checkstyle * intellij inspection and more javadoc * better * review stuffs * fix incorrect refactor, thanks tests * minor adjustments	2020-09-14 19:29:35 -07:00

2486 Commits