druid

Commit Graph

Author	SHA1	Message	Date
Paul Rogers	f4dcc52dac	Redesign QueryContext class (#13071 ) We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true, two other configuration options become available: druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of keys to allow. All other keys go through the existing context key security checks. druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks. Both are set using JSON list format: druid.auth.securedContextKeys=["secretKey1", "secretKey2"] You generally set one value or the other. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys. In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally: sqlQueryId sqlStringifyArrays	2022-10-15 11:02:11 +05:30
Rohan Garg	45dfd679e9	Composite approach for checking in-filter values set in column dictionary (#13133 )	2022-10-13 12:32:48 +05:30
Kashif Faraz	346fbf133f	Make DimensionDictionary abstract (#13215 ) This is in preparation for eventually retiring the flag `useMaxMemoryEstimates`, after which the footprint of a value in the dimension dictionary will always be estimated using the `estimateSizeOfValue()` method.	2022-10-13 07:18:46 +05:30
Abhishek Agarwal	548d0d0bb2	Add more information to exceptions occurred while writing temporary data (#13217 ) * Add more information to exceptions when writing tmp data to disk * Better error message	2022-10-13 08:23:51 +08:00
Clint Wylie	6eff6c9ae4	fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode (#13214 ) * fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode changes: * json_value 'returning' decimal will now plan to native double typed query instead of ending up with default string typing, allowing decimal vector math expressions to work with this type * vector math expressions now zero out 'null' values even in 'default' mode (druid.generic.useDefaultValueForNull=true) to prevent downstream things that do not check the null vector from producing incorrect results * more better * test and why not vectorize * more test, more fix	2022-10-12 16:28:41 -07:00
Clint Wylie	59e2afc566	use object[] instead of string[] for vector expressions to be consistent with vector object selectors (#13209 ) * use object[] instead of string[] for vector expressions to be consistent with vector object selectors * simplify	2022-10-12 02:53:43 -07:00
Clint Wylie	9688674ea8	fix issue with nested column null value index incorrectly matching non-null values (#13211 )	2022-10-11 15:54:36 -07:00
Adarsh Sanjeev	92d2633ae6	Update ClusterByStatisticsCollectorImpl to use bytes instead of keys (#12998 ) * Update clusterByStatistics to use bytes instead of keys * Address review comments * Resolve checkstyle * Increase test coverage * Update test * Update thresholds * Update retained keys function * Update docs * Fix spelling	2022-10-03 12:08:23 +05:30
Clint Wylie	a0e0fbe1b3	nested column serializer performance improvement for sparse columns (#13101 )	2022-09-19 14:07:48 +05:30
Clint Wylie	5ece870634	split up NestedDataColumnSerializer into separate files (#13096 ) * split up NestedDataColumnSerializer into separate files * fix it	2022-09-16 01:28:47 -07:00
Frank Chen	fd6c05eee8	Avoid ClassCastException when getting values from `QueryContext` (#13022 ) * Use safe conversion methods * Rename method * Add getContextAsBoolean * Update test case * Remove generic from getContextValue * Update catch-handler * Add test * Resolve comments * Replace 'getContextXXX' to 'getQueryContext().getAsXXXX'	2022-09-13 18:00:09 +08:00
imply-cheddar	5ba0075c0c	Expose HTTP Response headers from SqlResource (#13052 ) * Expose HTTP Response headers from SqlResource This change makes the SqlResource expose HTTP response headers in the same way that the QueryResource exposes them. Fundamentally, the change is to pipe the QueryResponse object all the way through to the Resource so that it can populate response headers. There is also some code cleanup around DI, as there was a superfluous FactoryFactory class muddying things up.	2022-09-12 01:40:06 -07:00
Gian Merlino	e29e7a8434	Add ARRAY_QUANTILE function. (#13061 ) * Add ARRAY_QUANTILE function. Expected usage is like: ARRAY_QUANTILE(ARRAY_AGG(x), 0.9). * Fix test.	2022-09-09 11:29:20 -07:00
Clint Wylie	6438f4198d	improve nested column serializer (#13051 ) changes: * long and double value columns are now written directly, at the same time as writing out the 'intermediary' dictionaryid column with unsorted ids * remove reverse value lookup from GlobalDictionaryIdLookup since it is no longer needed	2022-09-08 18:32:53 -07:00
Rohan Garg	2f156b3610	Disallow timeseries queries with ETERNITY interval and non-ALL granularity (#12944 )	2022-09-07 16:45:08 +05:30
Rohan Garg	7aa8d7f987	Add query/time metric for SQL queries from router (#12867 ) * Add query/time metric for SQL queries from router * Fix query cancel bug when user has overridden native query-id in a SQL query	2022-09-07 13:54:46 +05:30
Clint Wylie	a3a377e570	more consistent expression error messages (#12995 ) * more consistent expression error messages * review stuff * add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff * fixes * add expression transform name to transformer failure, better parse_json error messaging	2022-09-06 23:21:38 -07:00
sr	ed26e2d634	Improve String Last/First Storage Efficiency (#12879 ) -Add classes for writing cell values in LZ4 block compressed format. Payloads are indexed by element number for efficient random lookup -update SerializablePairLongStringComplexMetricSerde to use block compression -SerializablePairLongStringComplexMetricSerde also uses delta encoding of the Long by doing 2-pass encoding: buffers first to find min/max numbers and delta-encodes as integers if possible Entry points for doing block-compressed storage of byte[] payloads are the CellWriter and CellReader class. See SerializablePairLongStringComplexMetricSerde for how these are used along with how to do full column-based storage (delta encoding here) which includes 2-pass encoding to compute a column header	2022-09-06 20:00:54 -07:00
Gian Merlino	2450b96ac8	FrameFile: Java 17 compatibility. (#12987 ) * FrameFile: Java 17 compatibility. DataSketches Memory.map is not Java 17 compatible, and from discussions with the team, is challenging to make compatible with 17 while also retaining compatibility with 8 and 11. So, in this patch, we switch away from Memory.map and instead use the builtin JDK mmap functionality. Since it only supports maps up to Integer.MAX_VALUE, we also implement windowing in FrameFile, such that we can still handle large files. Other changes: 1) Add two new "map" functions to FileUtils, which we use in this patch. 2) Add a footer checksum to the FrameFile format. Individual frames already have checksums, but the footer was missing one. * Changes for static analysis. * wip * Fixes.	2022-08-30 11:13:47 -07:00
Gian Merlino	414176fb97	Fix accounting of bytesAdded in ReadableByteChunksFrameChannel. (#12988 ) * Fix accounting of bytesAdded in ReadableByteChunksFrameChannel. Could cause WorkerInputChannelFactory to get into an infinite loop when reading the footer of a frame file. * Additional tests.	2022-08-29 18:25:28 -07:00
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975 ) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Kashif Faraz	9843355ddd	Throw parse exception for multi-valued numeric dims (#12953 ) During ingestion, if a row containing multiple values for a numeric dimension is encountered, the whole ingestion task fails. Ideally, this should just be registered as a parse exception. Changes: - Remove `instanceof List` check from `LongDimensionIndexer`, `FloatDimensionIndexer` and `DoubleDimensionIndexer`. Any invalid type, including list, throws a parse exception in `DimensionHandlerUtils.convertObjectToXXX` methods. `ParseException` is already handled in `OnHeapIncrementalIndex` and does not fail the entire task.	2022-08-29 10:33:48 +05:30
Clint Wylie	16f5ac5bd5	json_value adjustments (#12968 ) * json_value adjustments changes: * native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type * rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument * fix json_value_any to be usable by humans too, coverage * fix bug * checkstyle * checkstyle * review stuff * validate that options to json_value are the supported options rather than ignore them * remove more legacy undocumented functions	2022-08-27 07:15:47 -07:00
Alexander Saydakov	7e2371bbde	KLL sketch (#12498 ) * KLL sketch * added documentation * direct static refs * direct static refs * fixed test * addressed review points * added KLL sketch related terms * return a copy from get * Copy unions when returning them from "get". * Remove redundant "final". Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2022-08-26 21:19:24 -07:00
Clint Wylie	72aba00e09	add json function support for paths with negative array indexes (#12972 )	2022-08-25 17:11:28 -07:00
Clint Wylie	82ad927087	tighten up array handling, fix bug with array_slice output type inference (#12914 )	2022-08-25 00:48:49 -07:00
Adarsh Sanjeev	3b58a01c7c	Correct spelling in messages and variable names. (#12932 )	2022-08-24 11:06:31 +05:30
Clint Wylie	289e43281e	stricter behavior for parse_json, add try_parse_json, remove to_json (#12920 )	2022-08-22 18:41:07 -07:00
Clint Wylie	7fb1153bba	add virtual columns to search query cache key (#12907 ) * add virtual columns to search query cache key	2022-08-17 20:26:01 -07:00
Gian Merlino	d3015d0f8e	DruidQuery: Return a copy from withScanSignatureIfNeeded, as promised. (#12906 ) The method wasn't following its contract, leading to pollution of the overall planner context, when really we just want to create a new context for a specific query.	2022-08-16 13:23:14 -07:00
Clint Wylie	e42e025296	inject @Json ObjectMapper for to_json_string and parse_json expressions (#12900 ) * inject @Json ObjectMapper for to_json_string and parse_json expressions * fix npe * better	2022-08-15 08:44:24 -07:00
Gian Merlino	846345669d	Error handling improvements for frame channels. (#12895 ) * Error handling improvements for frame channels. Two changes: 1) Send errors down in-memory channels (BlockingQueueFrameChannel) on failure. This ensures that in situations where a chain of processors has been set up on a single machine, all processors see the root cause error. In particular, this means the final processor in the chain reports the root cause error, which ensures that someone with a handle to the final processor will get the proper error. 2) Update FrameFileHttpResponseHandler to expect that the final fetch, rather than being simply empty, is also empty with a special header. This ensures that the handler is able to tell the difference between an empty fetch due to being at EOF, and an empty fetch due to a truncated HTTP response (after the 200 OK and headers are sent down, but before any content appears). * Fix tests, imports. * Checkstyle!	2022-08-15 11:31:55 +05:30
Rohan Garg	b26ab678b9	Do not create filters on right side table columns during join to filter conversion (#12899 )	2022-08-14 08:35:23 -07:00
Paul Rogers	41712b7a3a	Refactor SqlLifecycle into statement classes (#12845 ) * Refactor SqlLifecycle into statement classes Create direct & prepared statements Remove redundant exceptions from tests Tidy up Calcite query tests Make PlannerConfig more testable * Build fixes * Added builder to SqlQueryPlus * Moved Calcites system properties to saffron.properties * Build fix * Resolve merge conflict * Fix IntelliJ inspection issue * Revisions from reviews Backed out a revision to Calcite tests that didn't work out as planned * Build fix * Fixed spelling errors * Fixed failed test Prepare now enforces security; before it did not. * Rebase and fix IntelliJ inspections issue * Clean up exception handling * Fix handling of JDBC auth errors * Build fix * More tweaks to security messages	2022-08-14 00:44:08 -07:00
Clint Wylie	f4e0909e92	fix bug with json_object expression not fully unwrapping inputs (#12893 )	2022-08-13 21:15:19 -07:00
Rohan Garg	5394838030	Enable conversion of join to filter by default (#12868 )	2022-08-13 20:37:43 +05:30
Rohan Garg	af700bba0c	Fix hasBuiltInFilters for joins (#12894 )	2022-08-13 16:26:24 +05:30
Lucas Capistrant	3a3271eddc	Introduce defaultOnDiskStorage config for Group By (#12833 ) * Introduce defaultOnDiskStorage config for groupBy * add debug log to groupby query config * Apply config change suggestion from review * Remove accidental new lines * update default value of new default disk storage config * update debug log to have more descriptive text * Make maxOnDiskStorage and defaultOnDiskStorage HumanReadableBytes * improve test coverage * Provide default implementation to new default method on advice of reviewer	2022-08-12 09:40:21 -07:00
Karan Kumar	2f2d8ded5a	Introducing Storage connector Interface (#12874 ) In the current Druid code base, we have the interface DataSegmentPusher, which allows us to push segments to the appropriate deep storage without the extension being worried about the semantics of how to push to deep storage. While working on #12262, part of whose code will go in as an extension, I realized that we do not have an interface that allows us to do basic "write, get, delete, deleteAll" operations on the appropriate deep storage without, let's say, pulling the s3-storage-extension dependency into the custom extension. Hence, the idea of StorageConnector was born, where the storage connector sits inside the Druid core so all extensions have access to it. Each deep storage implementation, e.g. S3 or GCS, will implement this interface. Now, with some Jackson magic, we bind the correct deep storage implementation at runtime using a type variable.	2022-08-12 16:11:49 +05:30
Suneet Saldanha	267b32c2e2	Set druid.processing.fifo to true by default (#12571 )	2022-08-08 10:18:24 -07:00
Gian Merlino	01d555e47b	Adjust "in" filter null behavior to match "selector". (#12863 ) * Adjust "in" filter null behavior to match "selector". Now, both of them match numeric nulls if constructed with a "null" value. This is consistent as far as native execution goes, but doesn't match the behavior of SQL = and IN. So, to address that, this patch also updates the docs to clarify that the native filters do match nulls. This patch also updates the SQL docs to describe how Boolean logic is handled in addition to how NULL values are handled. Fixes #12856. * Fix test.	2022-08-08 09:08:36 -07:00
Karan Kumar	607b0b9310	Adding withName implementation to AggregatorFactory (#12862 ) * Adding agg factory with name impl * Adding test cases * Fixing test case * Fixing test case * Updated java docs.	2022-08-08 18:31:56 +05:30
Jonathan Wei	2045a1345c	Fix NPE when applying a transform that outputs to __time (#12870 )	2022-08-07 19:21:47 +05:30
Gian Merlino	ca4e64aea3	Frame processing and channels. (#12848 ) * Frame processing and channels. Follow-up to #12745. This patch adds three new concepts: 1) Frame channels are interfaces for doing nonblocking reads and writes of frames. 2) Frame processors are interfaces for doing nonblocking processing of frames received from input channels and sent to output channels. 3) Cluster-by keys, which can be used for sorting or partitioning. The patch also adds SuperSorter, a user of these concepts, both to illustrate how they are used, and also because it is going to be useful in future work. Central classes: - ReadableFrameChannel. Implementations include BlockingQueueFrameChannel (in-memory channel that implements both interfaces), ReadableFileFrameChannel (file-based channel), ReadableByteChunksFrameChannel (byte-stream-based channel), and others. - WritableFrameChannel. Implementations include BlockingQueueFrameChannel and WritableStreamFrameChannel (byte-stream-based channel). - ClusterBy, a sorting or partitioning key. - FrameProcessor, nonblocking processor of frames. Implementations include FrameChannelBatcher, FrameChannelMerger, and FrameChannelMuxer. - FrameProcessorExecutor, an executor service that runs FrameProcessors. - SuperSorter, a class that uses frame channels and processors to do parallel external merge sort of any amount of data (as long as there is enough disk space). * Additional tests, fixes. * Changes from review. * Better implementation for ReadableInputStreamFrameChannel. * Rename getFrameFileReference -> newFrameFileReference. * Add InterruptedException to runIncrementally; add more tests. * Cancellation adjustments. * Review adjustments. * Refactor BlockingQueueFrameChannel, rename doneReading and doneWriting to close. * Additional changes from review. * Additional changes. * Fix test. * Adjustments. * Adjustments.	2022-08-04 21:29:04 -07:00
Clint Wylie	73cfc4e5d0	fix expression plan type inference to correctly handle complex types (#12857 )	2022-08-04 02:56:05 -07:00
Paul Rogers	a618458bf0	Tidy up construction of the Guice Injectors (#12816 ) * Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments	2022-08-04 00:05:07 -07:00
Gian Merlino	ef6811ef88	Improved Java 17 support and Java runtime docs. (#12839 ) * Improved Java 17 support and Java runtime docs. 1) Add a "Java runtime" doc page with information about supported Java versions, garbage collection, and strong encapsulation. 2) Update asm and equalsverifier to versions that support Java 17. 3) Add additional "--add-opens" lines to surefire configuration, so tests can pass successfully under Java 17. 4) Switch openjdk15 tests to openjdk17. 5) Update FrameFile to specifically mention Java runtime incompatibility as the cause of not being able to use Memory.map. 6) Update SegmentLoadDropHandler to log an error for Errors too, not just Exceptions. This is important because an IllegalAccessError is encountered when the correct "--add-opens" line is not provided, which would otherwise be silently ignored. 7) Update example configs to use druid.indexer.runner.javaOptsArray instead of druid.indexer.runner.javaOpts. (The latter is deprecated.) * Adjustments. * Use run-java in more places. * Add run-java. * Update .gitignore. * Exclude hadoop-client-api. Brought in when building on Java 17. * Swap one more usage of java. * Fix the run-java script. * Fix flag. * Include link to Temurin. * Spelling. * Update examples/bin/run-java Co-authored-by: Xavier Léauté <xl+github@xvrl.net> Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2022-08-03 23:16:05 -07:00
Clint Wylie	6981b1cc12	fix bugs with nested column jsonpath parser (#12831 )	2022-08-02 11:38:25 -07:00
Clint Wylie	6046a392b6	add DictionaryEncodedStringValueIndex implementation to NestedFieldLiteralColumnIndexSupplier (#12837 )	2022-08-01 21:40:35 -07:00
Rohan Garg	7ae6cc6e60	Fix string first/last aggregator comparator (#12773 )	2022-08-01 20:54:15 +05:30
Clint Wylie	d96a9c1e6f	add missing selectors for explicit null columns (#12834 )	2022-07-29 19:08:58 -07:00
Clint Wylie	189e8b9d18	add NumericRangeIndex interface and BoundFilter support (#12830 ) add NumericRangeIndex interface and BoundFilter support changes: * NumericRangeIndex interface, like LexicographicalRangeIndex but for numbers * BoundFilter now uses NumericRangeIndex if comparator is numeric and there is no extractionFn * NestedFieldLiteralColumnIndexSupplier.java now supports supplying NumericRangeIndex for single typed numeric nested literal columns * better faster stronger and (ever so slightly) more understandable * more tests, fix bug * fix style	2022-07-29 18:58:49 -07:00
Maytas Monsereenusorn	24c345cdf0	Allow dictionary encoded column to use a more generic index interface (#12826 )	2022-07-27 15:23:00 -07:00
Maytas Monsereenusorn	5417aa2055	Fix: ParseException swallow cause Exception (#12810 ) * add impl * add impl * fix checkstyle	2022-07-22 13:46:28 -07:00
Clint Wylie	1e0542626b	add nested column query benchmarks (#12786 )	2022-07-14 18:16:30 -07:00
Clint Wylie	05b2e967ed	druid nested data column type (#12753 ) * add new druid nested data column type * fixes and such * fixes * adjustments, more tests * self review * oops * fix and test * more better * style	2022-07-14 12:07:23 -07:00
Rohan Garg	bb953be09b	Refactor usage of JoinableFactoryWrapper + more test coverage (#12767 ) Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled	2022-07-12 06:25:36 -07:00
Gian Merlino	97207cdcc7	Automatic sizing for GroupBy dictionaries. (#12763 ) * Automatic sizing for GroupBy dictionary sizes. Merging and selector dictionary sizes currently both default to 100MB. This is not optimal, because it can lead to OOM on small servers and insufficient resource utilization on larger servers. It also invites end users to try to tune it when queries run out of dictionary space, which can make things worse if the end user sets it too high. So, this patch: - Adds automatic tuning for selector and merge dictionaries. Selectors use up to 15% of the heap and merge buffers use up to 30% of the heap (aggregate across all queries). - Updates out-of-memory error messages to emphasize enabling disk spilling vs. increasing memory parameters. With the memory parameters automatically sized, it is more likely that an end user will get benefit from enabling disk spilling. - Removes the query context parameters that allow lowering of configured dictionary sizes. These complicate the calculation, and I don't see a reasonable use case for them. * Adjust tests. * Review adjustments. * Additional comment. * Remove unused import.	2022-07-11 08:20:50 -07:00
Gian Merlino	864b77e91a	SpillingGrouper: Make DISK_FULL sticky. (#12764 ) When we return DISK_FULL to a processing thread, it skips the rest of the segment and the query is canceled. However, it's possible that the next segment starts processing before cancellation can kick in. We want that one, if it occurs, to see DISK_FULL too.	2022-07-09 06:45:38 -07:00
Gian Merlino	edfbcc8455	Preserve column order in DruidSchema, SegmentMetadataQuery. (#12754 ) * Preserve column order in DruidSchema, SegmentMetadataQuery. Instead of putting columns in alphabetical order. This is helpful because it makes query order better match ingestion order. It also allows tools, like the reindexing flow in the web console, to more easily do follow-on ingestions using a column order that matches the pre-existing column order. We prefer the order from the latest segments. The logic takes all columns from the latest segments in the order they appear, then adds on columns from older segments after those. * Additional test adjustments. * Adjust imports.	2022-07-08 22:04:11 -07:00
Gian Merlino	9c925b4f09	Frame format for data transfer and short-term storage. (#12745 ) * Frame format for data transfer and short-term storage. As we move towards query execution plans that involve more transfer of data between servers, it's important to have a data format that provides for doing this more efficiently than the options available to us today. This patch adds: - Columnar frames, which support fast querying. - Row-based frames, which support fast sorting via memory comparison and fast whole-row copies via memory copying. - Frame files, a container format that can be stored on disk or transferred between servers. The idea is we should use row-based frames when data is expected to be sorted, and columnar frames when data is expected to be queried. The code in this patch is not used in production yet. Therefore, the patch involves minimal changes outside of the org.apache.druid.frame package. The main ones are adjustments to SqlBenchmark to add benchmarks for queries on frames, and the addition of a "forEach" method to Sequence. * Fixes based on tests, static analysis. * Additional fixes. * Skip DS mapping tests on JDK 14+ * Better JDK checking in tests. * Fix imports. * Additional comment. * Adjustments from code review. * Update test case.	2022-07-08 20:42:06 -07:00
Rohan Garg	bcff35f798	Pushdown join filter with right side referencing columns (#12749 )	2022-07-08 19:59:41 +05:30
Jianhuan Liu	4574dea5e9	Use MXBeans to get GC metrics #12476 (#12481 ) * jvm gc to mxbeans * add zgc and shenandoah #12476 * remove tryCreateGcCounter * separate the space collector * blend GcGenerationCollector into GcCollector * add jdk surefire argLine	2022-07-08 14:32:06 +08:00
Gian Merlino	49feffff1b	Add comment about double-close in ColumnSelectorColumnIndexSelector. (#12735 )	2022-07-06 00:50:35 -07:00
Clint Wylie	36e38b319b	add virtual column support to search query (#12720 )	2022-07-04 21:58:10 -07:00
imply-cheddar	e3128e3fa3	Poison stupid pool (#12646 ) * Poison StupidPool and fix resource leaks There are various resource leaks from test setup as well as some corners in query processing. We poison the StupidPool to start failing tests when the leaks come and fix any issues uncovered from that so that we can start from a clean baseline. Unfortunately, because of how poisoning works, we can only fail future checkouts from the same pool, which means that there is a natural race between a leak happening -> GC occurs -> leak detected -> pool poisoned. This race means that, depending on interleaving of tests, if the very last time that an object is checked out from the pool leaks, then it won't get caught. At some point in the future, something will catch it, however and from that point on it will be deterministic. * Remove various things left over from iterations * Clean up FilterAnalysis and add javadoc on StupidPool * Revert changes to .idea/misc.xml that accidentally got pushed * Style and test branches * Stylistic woes	2022-07-03 14:36:22 -07:00
Clint Wylie	48731710fb	precursor changes for nested columns to minimize files changed (#12714 ) * precursor changes for nested columns to minimize files changed * inspection fix * visibility * adjustment * unnecessary change	2022-07-01 02:27:19 -07:00
Abhishek Agarwal	dbd45daf33	Flakiness and exceptions during tests (#12705 )	2022-06-28 10:36:23 +05:30
Tejaswini Bandlamudi	1fc2f6e4b0	Throw BadQueryContextException if context params cannot be parsed (#12680 )	2022-06-24 09:21:25 +05:30
Gian Merlino	818974f6e4	ScanQuery: Fix JsonIgnore for isLegacy. (#12674 ) True, false, and null have different meanings: true/false mean "legacy" and "not legacy"; null means use the default set by ScanQueryConfig. So, we need to respect this in the JsonIgnore setup.	2022-06-18 15:55:54 -07:00
Gian Merlino	e76a5077ef	Fix self-referential shape inspection in BaseExpressionColumnValueSelector. (#12669 ) * Fix self-referential shape inspection in BaseExpressionColumnValueSelector. The new test would throw StackOverflowError on the old code. * Restore prior test.	2022-06-17 16:15:50 -07:00
Clint Wylie	18937ffee2	split out null value index (#12627 ) * split out null value index * gg spotbugs * fix stuff	2022-06-17 15:29:23 -07:00
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
Paul Rogers	45e3111549	Clean up query contexts (#12633 ) * Clean up query contexts Uses constants in place of literal strings for context keys. Moves some QueryContext methods to QueryContexts for reuse. * Revisions from review comments	2022-06-15 11:31:22 -07:00
Rohan Garg	28f2c8e112	Support LoadScope for Peons + Access Modifier Updates (#12640 ) * Support LoadScope for Peons * Update access modifiers for GroupByEngineV2	2022-06-14 21:52:50 -07:00
Rohan Garg	afaea251f2	Push join build table values as filter in case of duplicates (#12225 ) * Push join build table values as filter * Add tests for JoinableFactoryWrapper * fixup! Push join build table values as filter * fixup! Add tests for JoinableFactoryWrapper * fixup! Push join build table values as filter	2022-06-13 17:18:27 -07:00
Abhishek Agarwal	59a0c10c47	Add remedial information in error message when type is unknown (#12612 ) Users often submit queries and ingestion specs that work only if the relevant extension is loaded. However, the error is too technical for users and doesn't suggest that they check for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is because of a bug.	2022-06-07 20:22:45 +05:30
Gian Merlino	abf0e0a159	CompressionStrategyTest: Fix thread-unsafe Closer usage. (#12605 ) Closer is not thread-safe, so we need one per thread in the concurrency tests.	2022-06-04 10:57:13 -07:00
Clint Wylie	98f6bca2cd	fix regression with ipv4_match and prefixes (#12542 ) * fix issue with ipv4_match and prefixes	2022-06-01 14:03:08 -07:00
Clint Wylie	31f988ec76	fix backwards compatibility for explicit null columns (#12585 )	2022-06-01 12:39:48 -07:00
Clint Wylie	0640c9c9ac	fix compression-strategy-test (#12575 ) fixes an issue caused by a test modification in #12408 that was closing buffers allocated by the compression strategy instead of allowing the closer to do it	2022-05-31 11:48:32 -07:00
Gian Merlino	02ae3e74ff	RowBasedColumnSelectorFactory: Add "useStringValueOfNullInLists" parameter. (#12578 ) RowBasedColumnSelectorFactory inherited strange behavior from Rows.objectToStrings for nulls that appear in lists: instead of being left as a null, it is replaced with the string "null". Some callers may need compatibility with this strange behavior, but it should be opt-in. Query-time call sites are changed to opt-out of this behavior, since it is not consistent with query-time expectations. The IncrementalIndex ingestion-time call site retains the old behavior, as this is traditionally when Rows.objectToStrings would be used.	2022-05-31 11:38:56 -07:00
Gian Merlino	6d2ff796a3	Add RowIdSupplier to ColumnSelectorFactory. (#12577 ) * Add RowIdSupplier to ColumnSelectorFactory. This enables virtual columns to cache their outputs in case they are called multiple times on the same underlying row. This is common for numeric selectors, where the common pattern is to call isNull() and then follow with getLong(), getFloat(), or getDouble(). Here, output caching reduces the number of expression evals by half. * Fix tests.	2022-05-31 11:38:03 -07:00
Clint Wylie	b746bf9129	fix virtual column cycle bug, sql virtual column optimize bug (#12576 ) * fix virtual column cycle bug, sql virtual column optimize bug * more test	2022-05-30 23:51:21 -07:00
Dr. Sizzles	7291c92f4f	Adding zstandard compression library (#12408 ) * Adding zstandard compression library * 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct. 2. Cleaned up checkstyle issues. * Fixing zstandard version to latest stable version in poms and updating license files * Removing zstd from benchmarks and adding to processing (poms) * fix the intellij inspection issue * Removing the prefix v for the version in the license check for zstd * Fixing license checks Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>	2022-05-28 17:01:44 -07:00
Clint Wylie	d0c9c37e35	make query context changes backwards compatible (#12564 ) Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.	2022-05-25 15:24:41 +05:30
Karan Kumar	9f9faeec81	object[] handling for DimensionHandlers for arrays (#12552 ) Description Fixes a bug when running q's like SELECT cntarray, Count() FROM (SELECT dim1, dim2, Array_agg(cnt) AS cntarray FROM (SELECT dim1, dim2, dim3, Count() AS cnt FROM foo GROUP BY 1, 2, 3) GROUP BY 1, 2) GROUP BY 1 This generates an error: org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx] Because it's an array of numbers it looks like it does the convertToList call, which looks like: @Nullable public static ComparableList convertToList(Object obj) { if (obj == null) { return null; } if (obj instanceof List) { return new ComparableList((List) obj); } if (obj instanceof ComparableList) { return (ComparableList) obj; } throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName()); } I.e. it doesn't know about arrays. Added the array handling as part of this PR.	2022-05-25 15:24:18 +05:30
Agustin Gonzalez	2f3d7a4c07	Emit state of replace and append for native batch tasks (#12488 ) * Emit state of replace and append for native batch tasks * Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE) * Add metric to compaction job * Avoid null ptr exc when null emitter * Coverage * Emit tombstone & segment counts * Tasks need a type * Spelling * Integrate BatchIngestionMode in batch ingestion tasks functionality * Typos * Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage. * Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test. * Spelling * Avoid polluting the Task interface * Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.	2022-05-23 12:32:47 -07:00
Gian Merlino	37853f8de4	ConcurrentGrouper: Add mergeThreadLocal option, fix bug around the switch to spilling. (#12513 ) * ConcurrentGrouper: Add option to always slice up merge buffers thread-locally. Normally, the ConcurrentGrouper shares merge buffers across processing threads until spilling starts, and then switches to a thread-local model. This minimizes memory use and reduces likelihood of spilling, which is good, but it creates thread contention. The new mergeThreadLocal option causes a query to start in thread-local mode immediately, and allows us to experiment with the relative performance of the two modes. * Fix grammar in docs. * Fix race in ConcurrentGrouper. * Fix issue with timeouts. * Remove unused import. * Add "tradeoff" to dictionary.	2022-05-21 10:28:54 -07:00
Gian Merlino	69aac6c8dd	Direct UTF-8 access for "in" filters. (#12517 ) * Direct UTF-8 access for "in" filters. Directly related: 1) InDimFilter: Store stored Strings (in ValuesSet) plus sorted UTF-8 ByteBuffers (in valuesUtf8). Use valuesUtf8 whenever possible. If necessary, the input set is copied into a ValuesSet. Much logic is simplified, because we always know what type the values set will be. I think that there won't even be an efficiency loss in most cases. InDimFilter is most frequently created by deserialization, and this patch updates the JsonCreator constructor to deserialize directly into a ValuesSet. 2) Add Utf8ValueSetIndex, which InDimFilter uses to avoid UTF-8 decodes during index lookups. 3) Add unsigned comparator to ByteBufferUtils and use it in GenericIndexed.BYTE_BUFFER_STRATEGY. This is important because UTF-8 bytes can be compared as bytes if, and only if, the comparison is unsigned. 4) Add specialization to GenericIndexed.singleThreaded().indexOf that avoids needless ByteBuffer allocations. 5) Clarify that objects returned by ColumnIndexSupplier.as are not thread-safe. DictionaryEncodedStringIndexSupplier now calls singleThreaded() on all relevant GenericIndexed objects, saving a ByteBuffer allocation per access. Also: 1) Fix performance regression in LikeFilter: since #12315, it applied the suffix matcher to all values in range even for type MATCH_ALL. 2) Add ObjectStrategy.canCompare() method. This fixes LikeFilterBenchmark, which was broken due to calls to strategy.compare in GenericIndexed.fromIterable. * Add like-filter implementation tests. * Add in-filter implementation tests. * Add tests, fix issues. * Fix style. * Adjustments from review.	2022-05-20 01:51:28 -07:00
machine424	90531fd53f	Do not alter query timeout in ScanQueryEngine (#12271 ) Add test to detect timeout mutability	2022-05-19 09:24:42 -07:00
Gian Merlino	4631cff2a9	Free ByteBuffers in tests and fix some bugs. (#12521 ) * Ensure ByteBuffers allocated in tests get freed. Many tests had problems where a direct ByteBuffer would be allocated and then not freed. This is bad because it causes flaky tests. To fix this: 1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder. This makes it easy to free the direct buffer. Currently, it's only used in tests, because production code seems OK. 2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either to ByteBuffer.allocate (on-heap, which are garbaged collected), or to ByteBufferUtils.allocateDirect (wherever it seemed like there was a good reason for the buffer to be off-heap). Make sure to close all direct holders when done. * Changes based on CI results. * A different approach. * Roll back BitmapOperationTest stuff. * Try additional surefire memory. * Revert "Roll back BitmapOperationTest stuff." This reverts commit `49f846d9e3`. * Add TestBufferPool. * Revert Xmx change in tests. * Better behaved NestedQueryPushDownTest. Exit tests on OOME. * Fix TestBufferPool. * Remove T1C from ARM tests. * Somewhat safer. * Fix tests. * Fix style stuff. * Additional debugging. * Reset null / expr configs better. * ExpressionLambdaAggregatorFactory thread-safety. * Alter forkNode to try to get better info when a JVM crashes. * Fix buffer retention in ExpressionLambdaAggregatorFactory. * Remove unused import.	2022-05-19 07:42:29 -07:00
Gian Merlino	5b6727f319	Enable vectorized virtual column processing by default. (#12520 ) In the majority of cases, this improves performance. There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing. IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue	2022-05-16 15:43:53 +05:30
Gian Merlino	ff253fd8a3	Add setProcessingThreadNames context parameter. (#12514 ) setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.	2022-05-16 13:42:00 +05:30
Abhishek Radhakrishnan	9177515be2	Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. (#11634 ) * Add ipaddress library as dependency. * IPv4 functions to use the inet.ipaddr package. * Remove unused imports. * Add new function. * Minor rename. * Add more unit tests. * IPv4 address expr utils unit tests and address options. * Adjust the IPv4Util functions. * Move the UTs a bit around. * Javadoc comments. * Add license info for IPAddress. * Fix groupId, artifact and version in license.yaml. * Remove redundant subnet in messages - fixes UT. * Remove unused commons-net dependency for /processing project. * Make class and methods public so it can be accessed. * Add initial version of benchmark * Add subnetutils package for benchmarks. * Auto generate ip addresses. * Add more v4 address representations in setup to avoid bias. * Use ThreadLocalRandom to avoid forbidden API usage. * Adjust IPv4AddressBenchmark to adhere to codestyle rules. * Update ipaddress library to latest 5.3.4 * Add ipaddress package dependency to benchmarks project.	2022-05-11 22:06:20 -07:00
Clint Wylie	9e5a940cf1	remake column indexes and query processing of filters (#12388 ) Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filters consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible, eventually allowing us to model any type of index we can imagine, and providing the machinery to build the filters that use them, while also allowing for other column implementations to implement the built-in index types to provide adapters to make use of indexing in the current set of filters that Druid provides.	2022-05-11 11:57:08 +05:30
Rohan Garg	75836a5a06	Add feature flag for sql planning of TimeBoundary queries (#12491 ) * Add feature flag for sql planning of TimeBoundary queries * fixup! Add feature flag for sql planning of TimeBoundary queries * Add documentation for enableTimeBoundaryPlanning * fixup! Add documentation for enableTimeBoundaryPlanning	2022-05-10 15:23:42 +05:30
somu-imply	c68388ebcd	Vectorized version of string last aggregator (#12493 ) * Vectorized version of string last aggregator * Updating string last and adding testcases * Updating code and adding testcases for serializable pairs * Addressing review comments	2022-05-09 17:02:38 -07:00
Rohan Garg	2dd073c2cd	Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484 ) * Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * fixup! Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * Document vectorized dimension	2022-05-09 10:40:17 -07:00
Gian Merlino	529b983ad0	GroupBy: Reduce allocations by reusing entry and key holders. (#12474 ) * GroupBy: Reduce allocations by reusing entry and key holders. Two main changes: 1) Reuse Entry objects returned by various implementations of Grouper.iterator. 2) Reuse key objects contained within those Entry objects. This is allowed by the contract, which states that entries must be processed and immediately discarded. However, not all call sites respected this, so this patch also updates those call sites. One particularly sneaky way that the old code retained entries too long is due to Guava's MergingIterator and CombiningIterator. Internally, these both advance to the next value prior to returning the current value. So, this patch addresses that in two ways: 1) For merging, we have our own implementation MergeIterator already, although it had the same problem. So, this patch updates our implementation to return the current item prior to advancing to the next item. It also adds a forbidden-api entry to ensure that this safer implementation is used instead of Guava's. 2) For combining, we address the problem in a different way: by copying the key when creating the new, combined entry. * Attempt to fix test. * Remove unused import.	2022-04-28 23:21:13 -07:00

2699 Commits