druid


Author	SHA1	Message	Date
Maytas Monsereenusorn	9be5039f68	Enable query vectorization by default (#10065 ) * Enable query vectorization by default * update docs	2020-06-24 13:08:49 -07:00
Maytas Monsereenusorn	f80c02da02	Fix HyperUniquesAggregatorFactory.estimateCardinality null handling to respect output type (#10063 ) * fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator * address comments * address comments	2020-06-23 15:54:37 -10:00
Clint Wylie	eee99ff0d5	minor rework of topn algorithm selection for clarity and more javadocs (#10058 ) * minor refactor of topn engine algorithm selection for clarity * adjust * more javadoc	2020-06-22 09:08:50 -07:00
Clint Wylie	c2f5d453f8	fix topn on string columns with non-sorted or non-unique dictionaries (#10053 ) * fix topn on string columns with non-sorted or non-unique dictionaries * fix metadata tests * refactor, clarify comments and code, fix ci failures	2020-06-19 11:35:18 -07:00
Jonathan Wei	37e150c075	Fix join filter rewrites with nested queries (#10015 ) * Fix join filter rewrites with nested queries * Fix test, inspection, coverage * Remove clauses from group key * Fix import order Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2020-06-18 21:32:29 -07:00
Clint Wylie	b5e6569d2c	global table only if joinable (#10041 ) * global table if only joinable * oops * fix style, add more tests * Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java * better information schema columns, distinguish broadcast from joinable * fix javadoc * fix mistake Co-authored-by: Jihoon Son <jihoonson@apache.org>	2020-06-18 17:32:10 -07:00
Aleksey Plekhanov	2c384b61ff	IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" (#9690 ) IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" Reverted checkstyle rule * Added tests to pass CI * Codestyle	2020-06-18 09:47:07 -07:00
Maytas Monsereenusorn	7569ee3ec6	All aggregators should check if column can be vectorized (#10026 ) * All aggregators should use vectorization-aware column processor * All aggregators should use vectorization-aware column processor * fix canVectorize * fix canVectorize * add tests * revert back default * address comment * address comments * address comment * address comment	2020-06-17 01:52:02 -10:00
Clint Wylie	68aa384190	global table datasource for broadcast segments (#10020 ) * global table datasource for broadcast segments * tests * fix * fix test * comments and javadocs * review stuffs * use generated equals and hashcode	2020-06-16 17:58:05 -07:00
Suneet Saldanha	4e483a70b4	ROUND and having comparators correctly handle special double values (#10014 ) * ROUND and having comparators correctly handle doubles Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real numbers. Because of this, they can not be converted to BigDecimal and instead throw a NumberFormatException. This change adds support for calculations that produce these numbers either for use in the `ROUND` function or the HavingSpecMetricComparator by not attempting to convert the number to a BigDecimal. The bug in ROUND was first introduced in #7224 where we added the ability to round to any decimal place. This PR changes the behavior back to using `Math.round` if we recognize a number that can not be converted to a BigDecimal. * Add tests and fix spellcheck * update error message in ExpressionsTest * Address comments * fix up round for infinity * round non numeric doubles returns a double * fix spotbugs * Update docs/misc/math-expr.md * Update docs/querying/sql.md	2020-06-16 16:09:46 -07:00
Gian Merlino	9330ca9717	Remove LegacyDataSource. (#10037 ) * Remove LegacyDataSource. Its purpose was to enable deserialization of strings into TableDataSources. But we can do this more straightforwardly with Jackson annotations. * Slight test improvement.	2020-06-16 14:40:35 -07:00
Clint Wylie	9468df4721	make phaser of ReferenceCountingCloseableObject protected instead of private so subclasses can do stuff with it (#10035 )	2020-06-15 19:56:49 -07:00
Stefan Birkner	7282e2f2f9	Simplify CompressedVSizeColumnarIntsSupplierTest (#10003 ) The parameters generator uses CompressionStrategy.noNoneValues() instead of CompressionStrategyTest.compressionStrategies() which wrapped each strategy in a single element array. This improves readability of the test.	2020-06-10 09:32:00 -07:00
Clint Wylie	f8b643ec72	make joinables closeable (#9982 ) * make joinables closeable * tests and adjustments * refactor to make join stuffs implement ReferenceCountedObject instead of Closeable, more tests * fixes * javadocs and stuff * fix bugs * more test * fix lgtm alert * simplify * fixup javadoc * review stuffs * safeguard against exceptions * i hate this checkstyle rule * make IndexedTable extend Closeable	2020-06-09 20:12:36 -07:00
Clint Wylie	1c9ca55247	remove incorrect and unnecessary overrides from BooleanVectorValueMatcher (#9994 ) * remove incorrect and unnecessary overrides from BooleanVectorValueMatcher * add test case * add unit tests for ... part of VectorValueMatcherColumnProcessorFactory * Update VectorValueMatcherColumnProcessorFactoryTest.java	2020-06-09 19:32:16 -07:00
Clint Wylie	c5d6163c76	add a GeneratorInputSource to fill up a cluster with generated data for testing (#9946 ) * move benchmark data generator into druid-processing, add a GeneratorInputSource to fill up a cluster with data * newlines * make test coverage not fail maybe * remove useless test * Update pom.xml * Update GeneratorInputSourceTest.java * less passive aggressive test names	2020-06-09 19:31:04 -07:00
Clint Wylie	7f51e44b00	fix NilVectorSelector filter optimization (#9989 )	2020-06-08 17:40:29 -07:00
Clint Wylie	77dd5b06ae	ColumnCapabilities.hasMultipleValues refactor (#9731 ) * transition ColumnCapabilities.hasMultipleValues to Capable enum, remove ColumnCapabilities.isComplete * remove artificial, always multi-value capabilities from IncrementalIndexStorageAdapter and fix up fallout from that, fix ColumnCapabilities merge in index merger * fix typo * remove unused method * review stuffs, revert IncrementalIndexStorageAdapter capabilities change, plumb lame workaround to SegmentAnalyzer * more comment * use volatile booleans * fix line length * correctly handle missing columns for vector processors * return ColumnCapabilities.Capable for BitmapIndexSelector.hasMultipleValues, fix vector processor selection for complex * false on non-existent	2020-06-04 23:52:37 -07:00
Maytas Monsereenusorn	9738a03c83	Fix groupBy with literal in subquery grouping (#9986 ) * fix groupBy with literal in subquery grouping * fix groupBy with literal in subquery grouping * fix groupBy with literal in subquery grouping * address comments * update javadocs	2020-06-04 13:28:05 -10:00
Maytas Monsereenusorn	790e9482ea	Fix Subquery could not be converted to groupBy query (#9959 ) * Fix join * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * add tests * address comments * fix failing tests	2020-06-03 16:46:28 -07:00
Gian Merlino	3dfd7c30c0	Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893 ) * Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT. - Add REGEXP_LIKE function that returns a boolean, and is useful in WHERE clauses. - Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect filter elision). - Fix REGEXP_EXTRACT behavior for empty patterns: should always match (previously, they threw errors). - Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed non-literal patterns. - Improve documentation of REGEXP_EXTRACT. * Changes based on PR review. * Fix arg check. * Important fixes! * Add speller. * wip * Additional tests. * Fix up tests. * Add validation error tests. * Additional tests. * Remove useless call.	2020-06-03 14:31:37 -07:00
Maytas Monsereenusorn	0d22462e07	Document unsupported Join on multi-value column (#9948 ) * Document Unsupported Join on multi-value column * Document Unsupported Join on multi-value column * address comments * Add unit tests * address comments * add tests	2020-06-03 09:55:52 -10:00
Gian Merlino	3d81564a14	Fix various processing buffer leaks and simplify BlockingPool. (#9928 ) * - GroupByQueryEngineV2: Fix leak of intermediate processing buffer when exceptions are thrown before result sequence is created. - PooledTopNAlgorithm: Fix leak of intermediate processing buffer when exceptions are thrown before the PooledTopNParams object is created. - BlockingPool: Remove unused "take" methods. * Add tests to verify that buffers have been returned.	2020-06-02 18:26:18 -07:00
Gian Merlino	309fc04d54	Fix various Yielder leaks. (#9934 ) * Fix various Yielder leaks. - CombiningSequence leaked the input yielder from "toYielder" if it ran into an exception while accumulating the last value from the input yielder. - MergeSequence leaked input yielders from "toYielder" if it ran into an exception while building the initial priority queue. - ScanQueryRunnerFactory leaked the input yielder in its "priorityQueueSortAndLimit" strategy if it ran into an exception while scanning and sorting. - YieldingSequenceBase.accumulate chomped IOExceptions thrown in "accumulate" during yielder closing. * Add tests. * Fix braces.	2020-06-02 18:26:06 -07:00
Xavier Léauté	4ecf1900c3	fix nullhandling exceptions related to test ordering (#9964 ) follow-up to https://github.com/apache/druid/pull/9570	2020-06-02 10:13:54 -07:00
Clint Wylie	c690d10a7d	support customized factory.json via IndexSpec for segment persist (#9957 ) * support customized factory.json via IndexSpec for segment persist * equals verifier	2020-06-01 16:36:32 -07:00
Suneet Saldanha	e03d38b6c8	Optimize join queries where filter matches nothing (#9931 ) * Refactor JoinFilterAnalyzer This patch attempts to make it easier to follow the join filter analysis code with the hope of making it easier to add rewrite optimizations in the future. To keep the patch small and easy to review, this is the first of at least 2 patches that are planned. This patch adds a builder to the Pre-Analysis, so that it is easier to instantiate the preAnalysis. It also moves some of the filter normalization code out to Filters with associated tests. * fix tests * Refactor JoinFilterAnalyzer - part 2 This change introduces the following components: * RhsRewriteCandidates - a wrapper for a list of candidates and associated functions to operate on the set of candidates. * JoinableClauses - a wrapper for the list of JoinableClause that represent a join condition and the associated functions to operate on the clauses. * Equiconditions - a wrapper representing the equiconditions that are used in the join condition. And associated test changes. This refactoring surfaced 2 bugs: - Missing equals and hashcode implementation for RhsRewriteCandidate, thus allowing potential duplicates in the rhs rewrite candidates - Missing Filter#supportsRequiredColumnRewrite check in analyzeJoinFilterClause, which could result in UnsupportedOperationException being thrown by the filter * fix compile error * remove unused class * Refactor JoinFilterAnalyzer - Correlations Move the correlation related code out into its own class so it's easier to maintain. Another patch should follow this one so that the query path uses the correlation object instead of its underlying maps. * Optimize join queries where filter matches nothing Fixes #9787 This PR changes the Joinable interface to return an Optional set of correlated values for a column. This allows the JoinFilterAnalyzer to differentiate between the case where the column has no matching values and when the column could not find matching values. This PR chose not to distinguish between cases where correlated values could not be computed because of a config that has this behavior disabled or because of user error - like a column that could not be found. The reasoning was that the latter is likely an error and the non filter pushdown path will surface the error if it is.	2020-05-29 16:53:03 -07:00
Suneet Saldanha	9c40bebc02	Refactor JoinFilterAnalyzer - part 2 (#9929 ) * Refactor JoinFilterAnalyzer This patch attempts to make it easier to follow the join filter analysis code with the hope of making it easier to add rewrite optimizations in the future. To keep the patch small and easy to review, this is the first of at least 2 patches that are planned. This patch adds a builder to the Pre-Analysis, so that it is easier to instantiate the preAnalysis. It also moves some of the filter normalization code out to Filters with associated tests. * fix tests * Refactor JoinFilterAnalyzer - part 2 This change introduces the following components: * RhsRewriteCandidates - a wrapper for a list of candidates and associated functions to operate on the set of candidates. * JoinableClauses - a wrapper for the list of JoinableClause that represent a join condition and the associated functions to operate on the clauses. * Equiconditions - a wrapper representing the equiconditions that are used in the join condition. And associated test changes. This refactoring surfaced 2 bugs: - Missing equals and hashcode implementation for RhsRewriteCandidate, thus allowing potential duplicates in the rhs rewrite candidates - Missing Filter#supportsRequiredColumnRewrite check in analyzeJoinFilterClause, which could result in UnsupportedOperationException being thrown by the filter * fix compile error * remove unused class	2020-05-29 15:03:35 -07:00
Suneet Saldanha	faef31a0af	Refactor JoinFilterAnalyzer (#9921 ) * Refactor JoinFilterAnalyzer This patch attempts to make it easier to follow the join filter analysis code with the hope of making it easier to add rewrite optimizations in the future. To keep the patch small and easy to review, this is the first of at least 2 patches that are planned. This patch adds a builder to the Pre-Analysis, so that it is easier to instantiate the preAnalysis. It also moves some of the filter normalization code out to Filters with associated tests. * fix tests	2020-05-28 22:32:09 -07:00
Suneet Saldanha	b0167295d7	Fail incorrectly constructed join queries (#9830 ) * Fail incorrectly constructed join queries * wip annotation for equals implementations * Add equals tests * fix tests * Actually fix the tests * Address review comments * prohibit Pattern.hashCode()	2020-05-13 14:23:04 -07:00
Jonathan Wei	16d293d6e0	Directly rewrite filters on RHS join columns into LHS equivalents (#9818 ) * Directly rewrite filters on RHS join columns into LHS equivalents * PR comments * Fix inspection * Revert unnecessary ExprMacroTable change * Fix build after merge * Address PR comments	2020-05-08 23:45:35 -07:00
mcbrewster	28be107a1c	add flag to flattenSpec to keep null columns (#9814 ) * add flag to flattenSpec to keep null columns * remove changes to inputFormat interface * add comment * change comment message * update web console e2e test * move keepNullColumns to JSONParseSpec * fix merge conflicts * fix tests * set keepNullColumns to false by default * fix lgtm * change Boolean to boolean, add keepNullColumns to hash, add tests for keepNullColumns false + true with no null columns * Add equals verifier tests	2020-05-08 21:53:39 -07:00
Maytas Monsereenusorn	accd710115	Add equivalent test coverage for all RHS join impls (#9831 ) * Add equivalent test coverage for all RHS join impls * address comments	2020-05-06 16:10:41 -07:00
Jihoon Son	6674d721bc	Avoid sorting values in InDimFilter if possible (#9800 ) * Avoid sorting values in InDimFilter if possible * tests * more tests * fix and and or filters * fix build * false and true vector matchers * fix vector matchers * checkstyle * in filter null handling * remove wrong test * address comments * remove unnecessary null check * redundant separator * address comments * typo * tests	2020-05-06 15:26:36 -07:00
Suneet Saldanha	1e857c5303	Ignore druid-processing benchmarks in tests (#9821 )	2020-05-06 08:59:48 -07:00
Jihoon Son	c6caae9a24	Fix filtering on boolean values in transformation (#9812 ) * Fix filter on boolean value in Transform * assert * more descriptive test * remove assert * add assert for cached string; disable tests * typo	2020-05-04 18:47:10 -07:00
Jian Wang	85dfbb64cb	Update documentation for metricCompression (#9811 )	2020-05-03 12:56:48 -07:00
Suneet Saldanha	7510e6e722	Fix potential NPEs in joins (#9760 ) * Fix potential NPEs in joins IntelliJ reported issues with potential NPEs. This was first hit in testing with a filter being pushed down to the left-hand table when joining against an indexed table. * More null check cleanup * Optimize filter value rewrite for IndexedTable * Add unit tests for LookupJoinable * Add tests for IndexedTableJoinable * Add non-null assert for dimension selector * Suppress null warning in LookupJoinMatcher * remove some null checks on hot path	2020-04-29 11:03:13 -07:00
Jonathan Wei	fe000a9e4b	Adjust string comparators used for ingestion (#9742 ) * Adjust string comparators used for ingestion * Small tweak * Fix inspection, more javadocs * Address PR comment * Add rollup comment * Add ordering test * Fix IncrementalIndexRowCompTest	2020-04-25 13:47:07 -07:00
BIGrey	c5bfe36011	Optimize FileWriteOutBytes to avoid high system cpu usage (#9722 ) * optimize FileWriteOutBytes to avoid high sys cpu * optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException * optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size * Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size" This reverts commit `965f7421` * Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException" This reverts commit `149e08c0` * optimize FileWriteOutBytes to avoid high sys cpu -- avoid IOException never thrown check * Fix size counting to handle IOE in FileWriteOutBytes + tests * remove unused throws IOException in WriteOutBytes.size() * Remove redundant throws IOException clauses * Parameterize IndexMergeBenchmark Co-authored-by: huanghui.bigrey <huanghui.bigrey@bytedance.com> Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>	2020-04-23 20:18:42 -07:00
Clint Wylie	68cc0b2e1c	fixes for inline subqueries when multi-value dimension is present (#9698 ) * fixes for inline subqueries when multi-value dimension is present * fix test * allow missing capabilities for vectorized group by queries to be treated as single dims since it means that column doesn't exist * add comment	2020-04-21 18:44:26 -07:00
Jenson	b9ad250c00	Fix misuse of Integer.SIZE in FileWriteOutBytes.writeInt (#9723 ) * change Integer.SIZE to Integer.BYTES in FileWriteOutBytes#writeInt * Add ASF header Co-authored-by: jenson <junstan@paypal.com>	2020-04-19 18:16:53 +08:00
Clint Wylie	e677c62484	document useFilterCNF query context parameter (#9647 ) * document useFilterCNF query context parameter * move context key to QueryContexts * Update .spelling	2020-04-16 22:12:20 -07:00
Clint Wylie	b89ad49396	disable group by config applyLimitPushDownToSegment by default (#9711 ) * disable group by config applyLimitPushDownToSegment by default * document	2020-04-16 03:03:35 -07:00
Clint Wylie	0ff926b1a1	fix issue with group by limit pushdown for extractionFn, expressions, joins, etc (#9662 ) * fix issue with group by limit pushdown for extractionFn, expressions, joins, etc * remove unused * fix test * revert unintended change * more tests * consider capabilities for StringGroupByColumnSelectorStrategy * fix test * fix and more test * revert because I'm scared	2020-04-11 01:18:11 -07:00
Gian Merlino	5249155284	Fix off-by-one in IndexedTableJoinMatcher.getCardinality. (#9674 ) * Fix off-by-one in IndexedTableJoinMatcher.getCardinality. It would report a cardinality that is one lower than the actual cardinality. The missing value is the phantom null that can be generated by outer joins. * Fix tests.	2020-04-10 18:11:05 -07:00
Suneet Saldanha	332ca19621	Fix potential integer overflow issues (#9609 ) ApproximateHistogram - seems unlikely; SegmentAnalyzer - unclear if this is an actual issue; GenericIndexedWriter - unclear if this is an actual issue; IncrementalIndexRow and OnheapIncrementalIndex are non-issues because it's very unlikely for the number of dims to be large enough to hit the overflow condition	2020-04-10 11:47:08 -07:00
Suneet Saldanha	1ced3b33fb	IntelliJ inspections cleanup (#9339 ) * IntelliJ inspections cleanup * Standard Charset object can be used * Redundant Collection.addAll() call * String literal concatenation missing whitespace * Statement with empty body * Redundant Collection operation * StringBuilder can be replaced with String * Type parameter hides visible type * fix warnings in test code * more test fixes * remove string concatenation inspection error * fix extra curly brace * cleanup AzureTestUtils * fix charsets for RangerAdminClient * review comments	2020-04-10 10:04:40 -07:00
Jihoon Son	e157fb089a	Fix wrong cardinality computation in BufferArrayGrouper (#9655 ) * Fix wrong cardinality computation in BufferArrayGrouper * fix javadoc	2020-04-10 09:05:38 -07:00
Suneet Saldanha	65de636893	Fix potential integer overflow in BufferArrayGrouper (#9605 ) This change fixes a potential integer overflow in BufferArrayGrouper that was flagged by LGTM. It also adds a check that the vectorized arrays are initialized before aggregateVector is called. The changes in HashTableUtils should not have any effect since the numbers being multiplied are small, but the change will remove the warnings from being flagged in LGTM.	2020-04-09 17:46:15 -07:00
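The ROUND fix in #10014 rests on one property of `java.math.BigDecimal`: it has no representation for NaN or the infinities, so `BigDecimal.valueOf(double)` throws `NumberFormatException` for them. The sketch below is a hypothetical helper (`RoundSketch` is not a Druid class) showing the guard-before-convert pattern the commit message describes. Returning non-finite inputs unchanged is an assumption made for illustration, not necessarily the exact value Druid's ROUND returns.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical sketch, not Druid's implementation of ROUND(x, scale). */
final class RoundSketch {
  private RoundSketch() {}

  static double round(double value, int scale) {
    // BigDecimal cannot represent NaN or +/-Infinity, so check before converting.
    if (Double.isNaN(value) || Double.isInfinite(value)) {
      // Assumption: pass non-finite values through unchanged.
      return value;
    }
    return BigDecimal.valueOf(value).setScale(scale, RoundingMode.HALF_UP).doubleValue();
  }

  /** Shows the failure mode the guard avoids: true if BigDecimal rejects the value. */
  static boolean bigDecimalRejects(double value) {
    try {
      BigDecimal.valueOf(value);
      return false;
    } catch (NumberFormatException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(round(3.14159, 2));                  // 3.14
    System.out.println(round(Double.NaN, 2));               // NaN
    System.out.println(round(Double.POSITIVE_INFINITY, 2)); // Infinity
    System.out.println(bigDecimalRejects(Double.NaN));      // true
  }
}
```

Checking `Double.isNaN`/`Double.isInfinite` up front is cheaper and clearer than catching `NumberFormatException` on every call.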
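The #9723 fix swaps `Integer.SIZE` for `Integer.BYTES` in `FileWriteOutBytes#writeInt`. The two constants use different units: `Integer.SIZE` is the width of an `int` in bits (32), while `Integer.BYTES` is its width in bytes (4). The hypothetical `IntWriteSketch` below is not Druid code, and it does not reproduce how `writeInt` actually uses the constant. It only shows how mixing the two up in a buffer-capacity check misreports the available space.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration, not Druid's FileWriteOutBytes. */
final class IntWriteSketch {
  private IntWriteSketch() {}

  /** Correct check: an int occupies Integer.BYTES (4) bytes. */
  static boolean hasRoomForInt(ByteBuffer buf) {
    return buf.remaining() >= Integer.BYTES;
  }

  /** Buggy check: Integer.SIZE is 32 bits, so this demands 32 free bytes. */
  static boolean hasRoomForIntBuggy(ByteBuffer buf) {
    return buf.remaining() >= Integer.SIZE;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(8);         // room for exactly two ints
    System.out.println(hasRoomForInt(buf));          // true
    System.out.println(hasRoomForIntBuggy(buf));     // false, although 8 bytes are free
    buf.putInt(42);
    System.out.println(buf.position());              // 4, i.e. Integer.BYTES
  }
}
```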


2294 Commits