druid

mirror of https://github.com/apache/druid.git synced 2025-03-01 14:59:08 +00:00

Author	SHA1	Message	Date
Clint Wylie	c72f96a4ba	fix bug with expressions on sparse string realtime columns without explicit null valued rows (#10248 ) * fix bug with realtime expressions on sparse string columns * fix test * add comment back * push capabilities for dimensions to dimension indexers since they know things * style * style * fixes * getting a bit carried away * missed one * fix it * benchmark build fix * review stuffs * javadoc and comments * add comment * more strict check * fix missed usage of impl instead of interface	2020-08-11 11:07:17 -07:00
Abhishek Radhakrishnan	dc16abae34	Vectorization support for long, double, float min & max aggregators. (#10260 ) * LongMaxVectorAggregator support and test case. * DoubleMinVectorAggregator and test cases. * DoubleMaxVectorAggregator and unit test. * FloatMinVectorAggregator and FloatMaxVectorAggregator. * Documentation update to include the other vector aggregators. * Bug fix. * checkstyle formatting fixes. * CalciteQueryTest cases update. * Separate test classes for FloatMaxAggregation and FloatMinAggregation. * remove the cannotVectorize for float max/min aggregator in test. * Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.	2020-08-10 15:18:55 -07:00
Gian Merlino	170031744e	Combine InDimFilter, InFilter. (#10119 ) * Combine InDimFilter, InFilter. There are two motivations: 1. Ensure that when HashJoinSegmentStorageAdapter compares its Filter to the original one, and it is an "in" type, the comparison is by reference and does not need to check deep equality. This is useful when the "in" filter is very large. 2. Simplify things. (There isn't a great reason for the DimFilter and Filter logic to be separate, and combining them reduces some duplication.) * Fix test.	2020-08-06 18:34:21 -07:00
Gian Merlino	b6aaf59e8c	Add "offset" parameter to GroupBy query. (#10235 ) * Add "offset" parameter to GroupBy query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Stabilize GroupBy sorts. * Fix inspections. * Fix suppression. * Fixups. * Move TopNSequence to druid-core. * Addl comments. * NumberedElement equals verification. * Changes from review.	2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan	34a4113752	Add vectorization support for the longMin aggregator. (#10211 ) * Fix minor formatting in docs. * Add Nullhandling initialization for test to run from IDE. * Vectorize longMin aggregator. - A new vectorized class for the vectorized long min aggregator. - Changes to AggregatorFactory to support vectorize functionality. - Few changes to schema evolution test to add LongMinAggregatorFactory. * Add longSum to the supported vectorized aggregator implementations. * Add MIN() long min to calcite query test that can vectorize. * Add simple long aggregations test. * Fixup formatting per checkstyle guide. * fixup and add more tests for long min aggregator. * Override test for groupBy since timestamps are handled differently. * Null compatibility check in test. * Review comment: Add a test case to LongMinAggregationTest.	2020-08-01 15:32:09 -07:00
frank chen	646fa84d04	Support unit on byte-related properties (#10203 ) * support unit suffix on byte-related properties * add doc * change default value of byte-related properties in example files * fix coding style * fix doc * fix CI * suppress spelling errors * improve code according to comments * rename Bytes to HumanReadableBytes * add getBytesInInt to get value safely * improve doc * fix problem reported by CI * fix problem reported by CI * resolve code review comments * improve error message * improve code & doc according to comments * fix CI problem * improve doc * suppress spelling check errors	2020-07-31 09:58:48 +08:00
Maytas Monsereenusorn	574b062f1f	Cluster wide default query context setting (#10208 ) * Cluster wide default query context setting * Cluster wide default query context setting * Cluster wide default query context setting * add docs * fix docs * update props * fix checkstyle * fix checkstyle * fix checkstyle * update docs * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix NPE	2020-07-29 15:19:18 -07:00
Jihoon Son	63c1746fe4	Fix timeseries query constructor when postAggregator has an expression reading timestamp result column (#10198 ) * Fix timeseries query constructor when postAggregator has an expression reading timestamp result column * fix npe * Fix postAgg referencing timestampResultField and add a test for it * fix test * doc * revert doc	2020-07-27 10:54:44 -07:00
Jihoon Son	6fdce36e41	Add integration tests for query retry on missing segments (#10171 ) * Add integration tests for query retry on missing segments * add missing dependencies; fix travis conf * address comments * Integration tests extension * remove unused dependency * remove druid_main * fix java agent port	2020-07-22 22:30:35 -07:00
Jihoon Son	41982116f4	Report missing segments when there is no segment for the query datasource in historicals (#10199 ) * Report missing segments when there is no segment for the query datasource in historicals * test * missing part for test * another test	2020-07-20 21:02:52 -07:00
Nishant Bangarwa	971d8a353b	Add groupBy limitSpec to queryCache key (#10093 ) * Add groupBy limitSpec to queryCache key * Only add limitSpec to cache key if pushdown is set to true * review comment	2020-07-13 19:15:09 -07:00
Jihoon Son	53a2550571	Follow-up for RetryQueryRunner fix (#10144 ) * address comments; use guice instead of query context * typo * QueryResource tests * address comments * catch queryException * fix spell check	2020-07-08 13:28:11 -07:00
Clint Wylie	010fe047e1	AbstractOptimizableDimFilter should be public (#10142 )	2020-07-06 15:19:32 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
Jonathan Wei	ed981ef88e	Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056 ) * Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter * Remove aggressive equality check that was used for testing * Use Suppliers.memoize * Checkstyle	2020-07-01 22:26:17 -07:00
Samarth Jain	e2c5bcc22d	Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning … (#10123 ) * Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning message to indicate failure in deserializing.	2020-07-01 20:06:23 -07:00
Samarth Jain	3e92cdf1cf	Revert "Fix UnknownTypeComplexColumn#makeVectorObjectSelector" (#10121 ) This reverts commit 7bb7489afc7a2cc496be93ae69681b6ab13a7c66.	2020-07-01 14:33:17 -07:00
Jihoon Son	657f8ee80f	Fix RetryQueryRunner to actually do the job (#10082 ) * Fix RetryQueryRunner to actually do the job * more javadoc * fix test and checkstyle * don't combine for testing * address comments * fix unit tests * always initialize response context in cachingClusteredClient * fix subquery * address comments * fix test * query id for builders * make queryId optional in the builders and ClusterQueryResult * fix test * suppress tests and unused methods * exclude groupBy builder * fix jacoco exclusion * add tests for builders * address comments * don't truncate	2020-07-01 14:02:21 -07:00
samarthjain	7bb7489afc	Fix UnknownTypeComplexColumn#makeVectorObjectSelector	2020-07-01 12:02:23 -07:00
Gian Merlino	5faa897a34	Join filter pre-analysis simplifications and sanity checks. (#10104 ) * Join filter pre-analysis simplifications and sanity checks. - At pre-analysis time, only compute pre-analysis for the innermost root query, since this is the one that will run on the join that involves the base datasource. Previously, pre-analyses were computed for multiple levels of the query, some of which were unnecessary. - Remove JoinFilterPreAnalysisGroup and join query level gathering code, since they existed to support precomputation of multiple pre-analyses. - Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to sanity check at processing time that the correct pre-analysis was done. Tangentially related changes: - Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker. The computed priority and lanes were not being used. - Add "getBaseQuery" method to DataSourceAnalysis to support identification of the proper subquery for filter pre-analysis. * Fix compilation errors. * Adjust tests.	2020-06-30 19:14:22 -07:00
Samarth Jain	2c1b45842f	Prevent unknown complex types from breaking DruidSchema refresh (#9422 )	2020-06-30 14:06:17 -07:00
Suneet Saldanha	15a0b4ffe2	Filter http requests by http method (#10085 ) * Filter http requests by http method Add a config that allows a user to specify which http methods to allow against their Druid server. Druid will only accept http requests with the method: GET, PUT, POST, DELETE and OPTIONS. If a Druid admin wants to allow other methods, they can do so by using the ServerConfig#allowedHttpMethods config. If a Druid user would like to disallow OPTIONS, this can be done by changing the AuthConfig#allowUnauthenticatedHttpOptions config * Exclude OPTIONS from always supported HTTP methods Add HEAD as an allowed method for web console e2e tests * fix docs * fix security IT * Actually fix the web console e2e tests * Ignore code coverage for initialization classes * code review	2020-06-29 16:59:31 -07:00
chenyuzhi459	a4c6d5f37e	fix query memory leak (#10027 ) * fix query memory leak * rollup ./idea * roll up .idea * clean code * optimize style * optimize cancel function * optimize style * add concurrentGroupTest test case * add test case * add unit test * fix code style * optimize cancel method use * format code * reback code * optimize cancelAll * clean code * add comment	2020-06-26 23:30:59 -07:00
Maytas Monsereenusorn	9be5039f68	Enable query vectorization by default (#10065 ) * Enable query vectorization by default * update docs	2020-06-24 13:08:49 -07:00
Maytas Monsereenusorn	f80c02da02	Fix HyperUniquesAggregatorFactory.estimateCardinality null handling to respect output type (#10063 ) * fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator * address comments * address comments	2020-06-23 15:54:37 -10:00
Clint Wylie	eee99ff0d5	minor rework of topn algorithm selection for clarity and more javadocs (#10058 ) * minor refactor of topn engine algorithm selection for clarity * adjust * more javadoc	2020-06-22 09:08:50 -07:00
Clint Wylie	c2f5d453f8	fix topn on string columns with non-sorted or non-unique dictionaries (#10053 ) * fix topn on string columns with non-sorted or non-unique dictionaries * fix metadata tests * refactor, clarify comments and code, fix ci failures	2020-06-19 11:35:18 -07:00
Jonathan Wei	37e150c075	Fix join filter rewrites with nested queries (#10015 ) * Fix join filter rewrites with nested queries * Fix test, inspection, coverage * Remove clauses from group key * Fix import order Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2020-06-18 21:32:29 -07:00
Clint Wylie	b5e6569d2c	global table only if joinable (#10041 ) * global table if only joinable * oops * fix style, add more tests * Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java * better information schema columns, distinguish broadcast from joinable * fix javadoc * fix mistake Co-authored-by: Jihoon Son <jihoonson@apache.org>	2020-06-18 17:32:10 -07:00
Aleksey Plekhanov	2c384b61ff	IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" (#9690 ) IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" Reverted checkstyle rule * Added tests to pass CI * Codestyle	2020-06-18 09:47:07 -07:00
Maytas Monsereenusorn	7569ee3ec6	All aggregators should check if column can be vectorized (#10026 ) * All aggregators should use vectorization-aware column processor * All aggregators should use vectorization-aware column processor * fix canVectorize * fix canVectorize * add tests * revert back default * address comment * address comments * address comment * address comment	2020-06-17 01:52:02 -10:00
Clint Wylie	68aa384190	global table datasource for broadcast segments (#10020 ) * global table datasource for broadcast segments * tests * fix * fix test * comments and javadocs * review stuffs * use generated equals and hashcode	2020-06-16 17:58:05 -07:00
Suneet Saldanha	4e483a70b4	ROUND and having comparators correctly handle special double values (#10014 ) * ROUND and having comparators correctly handle doubles Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real numbers. Because of this, they can not be converted to BigDecimal and instead throw a NumberFormatException. This change adds support for calculations that produce these numbers either for use in the `ROUND` function or the HavingSpecMetricComparator by not attempting to convert the number to a BigDecimal. The bug in ROUND was first introduced in #7224 where we added the ability to round to any decimal place. This PR changes the behavior back to using `Math.round` if we recognize a number that can not be converted to a BigDecimal. * Add tests and fix spellcheck * update error message in ExpressionsTest * Address comments * fix up round for infinity * round non numeric doubles returns a double * fix spotbugs * Update docs/misc/math-expr.md * Update docs/querying/sql.md	2020-06-16 16:09:46 -07:00
Gian Merlino	9330ca9717	Remove LegacyDataSource. (#10037 ) * Remove LegacyDataSource. Its purpose was to enable deserialization of strings into TableDataSources. But we can do this more straightforwardly with Jackson annotations. * Slight test improvement.	2020-06-16 14:40:35 -07:00
Clint Wylie	9468df4721	make phaser of ReferenceCountingCloseableObject protected instead of private so subclasses can do stuff with it (#10035 )	2020-06-15 19:56:49 -07:00
Stefan Birkner	7282e2f2f9	Simplify CompressedVSizeColumnarIntsSupplierTest (#10003 ) The parameters generator uses CompressionStrategy.noNoneValues() instead of CompressionStrategyTest.compressionStrategies() which wrapped each strategy in a single element array. This improves readability of the test.	2020-06-10 09:32:00 -07:00
Clint Wylie	f8b643ec72	make joinables closeable (#9982 ) * make joinables closeable * tests and adjustments * refactor to make join stuffs implement ReferenceCountedObject instead of Closeable, more tests * fixes * javadocs and stuff * fix bugs * more test * fix lgtm alert * simplify * fixup javadoc * review stuffs * safeguard against exceptions * i hate this checkstyle rule * make IndexedTable extend Closeable	2020-06-09 20:12:36 -07:00
Clint Wylie	1c9ca55247	remove incorrect and unnecessary overrides from BooleanVectorValueMatcher (#9994 ) * remove incorrect and unnecessary overrides from BooleanVectorValueMatcher * add test case * add unit tests for ... part of VectorValueMatcherColumnProcessorFactory * Update VectorValueMatcherColumnProcessorFactoryTest.java	2020-06-09 19:32:16 -07:00
Clint Wylie	c5d6163c76	add a GeneratorInputSource to fill up a cluster with generated data for testing (#9946 ) * move benchmark data generator into druid-processing, add a GeneratorInputSource to fill up a cluster with data * newlines * make test coverage not fail maybe * remove useless test * Update pom.xml * Update GeneratorInputSourceTest.java * less passive aggressive test names	2020-06-09 19:31:04 -07:00
Clint Wylie	7f51e44b00	fix NilVectorSelector filter optimization (#9989 )	2020-06-08 17:40:29 -07:00
Clint Wylie	77dd5b06ae	ColumnCapabilities.hasMultipleValues refactor (#9731 ) * transition ColumnCapabilities.hasMultipleValues to Capable enum, remove ColumnCapabilities.isComplete * remove artificial, always multi-value capabilities from IncrementalIndexStorageAdapter and fix up fallout from that, fix ColumnCapabilities merge in index merger * fix typo * remove unused method * review stuffs, revert IncrementalIndexStorageAdapter capabilities change, plumb lame workaround to SegmentAnalyzer * more comment * use volatile booleans * fix line length * correctly handle missing columns for vector processors * return ColumnCapabilities.Capable for BitmapIndexSelector.hasMultipleValues, fix vector processor selection for complex * false on non-existent	2020-06-04 23:52:37 -07:00
Maytas Monsereenusorn	9738a03c83	Fix groupBy with literal in subquery grouping (#9986 ) * fix groupBy with literal in subquery grouping * fix groupBy with literal in subquery grouping * fix groupBy with literal in subquery grouping * address comments * update javadocs	2020-06-04 13:28:05 -10:00
Maytas Monsereenusorn	790e9482ea	Fix Subquery could not be converted to groupBy query (#9959 ) * Fix join * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * add tests * address comments * fix failing tests	2020-06-03 16:46:28 -07:00
Gian Merlino	3dfd7c30c0	Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893 ) * Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT. - Add REGEXP_LIKE function that returns a boolean, and is useful in WHERE clauses. - Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect filter elision). - Fix REGEXP_EXTRACT behavior for empty patterns: should always match (previously, they threw errors). - Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed non-literal patterns. - Improve documentation of REGEXP_EXTRACT. * Changes based on PR review. * Fix arg check. * Important fixes! * Add speller. * wip * Additional tests. * Fix up tests. * Add validation error tests. * Additional tests. * Remove useless call.	2020-06-03 14:31:37 -07:00
Maytas Monsereenusorn	0d22462e07	Document unsupported Join on multi-value column (#9948 ) * Document Unsupported Join on multi-value column * Document Unsupported Join on multi-value column * address comments * Add unit tests * address comments * add tests	2020-06-03 09:55:52 -10:00
Gian Merlino	3d81564a14	Fix various processing buffer leaks and simplify BlockingPool. (#9928 ) * - GroupByQueryEngineV2: Fix leak of intermediate processing buffer when exceptions are thrown before result sequence is created. - PooledTopNAlgorithm: Fix leak of intermediate processing buffer when exceptions are thrown before the PooledTopNParams object is created. - BlockingPool: Remove unused "take" methods. * Add tests to verify that buffers have been returned.	2020-06-02 18:26:18 -07:00
Gian Merlino	309fc04d54	Fix various Yielder leaks. (#9934 ) * Fix various Yielder leaks. - CombiningSequence leaked the input yielder from "toYielder" if it ran into an exception while accumulating the last value from the input yielder. - MergeSequence leaked input yielders from "toYielder" if it ran into an exception while building the initial priority queue. - ScanQueryRunnerFactory leaked the input yielder in its "priorityQueueSortAndLimit" strategy if it ran into an exception while scanning and sorting. - YieldingSequenceBase.accumulate chomped IOExceptions thrown in "accumulate" during yielder closing. * Add tests. * Fix braces.	2020-06-02 18:26:06 -07:00
Xavier Léauté	4ecf1900c3	fix nullhandling exceptions related to test ordering (#9964 ) follow-up to https://github.com/apache/druid/pull/9570	2020-06-02 10:13:54 -07:00
Clint Wylie	c690d10a7d	support customized factory.json via IndexSpec for segment persist (#9957 ) * support customized factory.json via IndexSpec for segment persist * equals verifier	2020-06-01 16:36:32 -07:00
Suneet Saldanha	e03d38b6c8	Optimize join queries where filter matches nothing (#9931 ) * Refactor JoinFilterAnalyzer This patch attempts to make it easier to follow the join filter analysis code with the hope of making it easier to add rewrite optimizations in the future. To keep the patch small and easy to review, this is the first of at least 2 patches that are planned. This patch adds a builder to the Pre-Analysis, so that it is easier to instantiate the preAnalysis. It also moves some of the filter normalization code out to Filters with associated tests. * fix tests * Refactor JoinFilterAnalyzer - part 2 This change introduces the following components: * RhsRewriteCandidates - a wrapper for a list of candidates and associated functions to operate on the set of candidates. * JoinableClauses - a wrapper for the list of JoinableClause that represent a join condition and the associated functions to operate on the clauses. * Equiconditions - a wrapper representing the equiconditions that are used in the join condition. And associated test changes. This refactoring surfaced 2 bugs: - Missing equals and hashcode implementation for RhsRewriteCandidate, thus allowing potential duplicates in the rhs rewrite candidates - Missing Filter#supportsRequiredColumnRewrite check in analyzeJoinFilterClause, which could result in UnsupportedOperationException being thrown by the filter * fix compile error * remove unused class * Refactor JoinFilterAnalyzer - Correlations Move the correlation related code out into its own class so it's easier to maintain. Another patch should follow this one so that the query path uses the correlation object instead of its underlying maps. * Optimize join queries where filter matches nothing Fixes #9787 This PR changes the Joinable interface to return an Optional set of correlated values for a column. This allows the JoinFilterAnalyzer to differentiate between the case where the column has no matching values and when the column could not find matching values. This PR chose not to distinguish between cases where correlated values could not be computed because of a config that has this behavior disabled or because of user error - like a column that could not be found. The reasoning was that the latter is likely an error and the non filter pushdown path will surface the error if it is.	2020-05-29 16:53:03 -07:00
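
The distinction described in the last entry above (e03d38b6c8) can be illustrated with a minimal Java sketch. Every name here (`CorrelatedValuesSketch`, `Plan`, `correlatedValues`, `plan`) is hypothetical and chosen for illustration only; this is not Druid's Joinable or JoinFilterAnalyzer code. It only shows how an `Optional<Set<String>>` can separate "values were computed and there are none" from "values could not be computed".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration; not Druid's actual API.
final class CorrelatedValuesSketch {
    // Hypothetical outcomes a filter analyzer might choose between.
    enum Plan { MATCHES_NOTHING, PUSH_DOWN_IN_FILTER, NO_PUSH_DOWN }

    // Assumption: a missing map entry stands for "could not be computed"
    // (for example, unknown column or feature disabled), while a present
    // but empty set stands for "computed, and no values match".
    static Optional<Set<String>> correlatedValues(Map<String, Set<String>> index, String column) {
        return Optional.ofNullable(index.get(column));
    }

    static Plan plan(Optional<Set<String>> correlated) {
        if (!correlated.isPresent()) {
            // Unknown: fall back to the path without filter pushdown.
            return Plan.NO_PUSH_DOWN;
        }
        // Known: an empty set means the filter can never match.
        return correlated.get().isEmpty() ? Plan.MATCHES_NOTHING : Plan.PUSH_DOWN_IN_FILTER;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("country", Collections.singleton("US"));
        index.put("city", Collections.<String>emptySet());

        System.out.println(plan(correlatedValues(index, "country")));
        System.out.println(plan(correlatedValues(index, "city")));
        System.out.println(plan(correlatedValues(index, "missing")));
    }
}
```

With a plain `Set` return type, the empty-set case and the could-not-compute case would be indistinguishable, which is presumably why the commit message mentions changing the return type to an Optional.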

2317 Commits