druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	5d39b94149	allow compaction to work with spatial dimensions (#15321 )	2023-11-03 11:27:50 -07:00
Laksh Singla	0cc8839a60	Allow casted literal values in SQL functions accepting literals (Part 2) (#15316 )	2023-11-03 21:22:19 +05:30
Tts-233	f39a778f7d	Fix 404 URL about native query (#15324 )	2023-11-03 08:39:59 -07:00
Gian Merlino	98f1eb8ede	Use filters for pruning properly for hash-joins. (#15299 ) * Use filters for pruning properly for hash-joins. Native used them too aggressively: it might use filters for the RHS to prune the LHS. MSQ used them not at all. Now, both use them properly, pruning based on base (LHS) columns only. * Fix tests. * Fix style. * Clear filterFields too. * Update.	2023-11-03 07:29:16 -07:00
Karan Kumar	5036af6fb3	Doc fixes for query from deep storage and MSQ (#15313 ) Minor updates to the documentation. Added prerequisites. Removed a known issue in MSQ since it's no longer valid. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-03 10:52:20 +05:30
Adarsh Sanjeev	9576fd3141	HllSketch Merge Aggregator optimizations (#15162 ) * Null byte serde for empty sketches * Cache for HllSketchMerge * Check for empty sketches * Address review comments * Revert changes to HllSketchHolder * Handle null sketch holders instead of null sketches * Add unit test for MSQ HllSketch * Add comments * Fix style	2023-11-03 11:01:22 +08:00
cristian-popa	fb260f3e41	docs: LDAP trust store property clarification (#15028 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-02 13:00:08 -07:00
Gian Merlino	d87d92bc43	Add system fields to input sources. (#15276 ) * Add system fields to input sources. Main changes: 1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity. 2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this. * Fix various LocalInputSource calls. * Fix style stuff. * Fixups. * Fix tests and coverage.	2023-11-02 10:31:28 -07:00
AmatyaAvadhanula	dc3213b05d	Fix used segment retrieval in Kill tasks (#15306 ) Fix used segment retrieval in Kill tasks	2023-11-02 19:07:17 +05:30
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Adarsh Sanjeev	22443ab87e	Fix an issue with passing order by and limit to realtime tasks (#15301 ) While running queries on real time tasks using MSQ, there is an issue with queries with certain order by columns. If the query specifies a non time column, the query is planned, as it is supported by MSQ. However, this throws an exception when passed to real time tasks, as the native query stack does not support it. This PR resolves this by removing the ordering from the query before contacting real time tasks. Fixes a bug with MSQ while reading data from real time tasks with non time ordering	2023-11-02 11:38:26 +05:30
Laksh Singla	b82ad59dfe	Better logging in ServiceClientImpl (#15269 ) ServiceClientImpl logs the cause of every retry, even though we are retrying the connection attempt. This leads to slight pollution in the logs because a lot of the time, the reason for retrying is the same. This is seen primarily in MSQ, when the worker task hasn't launched yet however controller attempts to connect to the worker task, which can lead to scary-looking messages (with INFO log level), even though they are normal. This PR changes the logging logic to log every 10 (arbitrary number) retries instead of every retry, to reduce the pollution of the logs. Note: If there are no retries left, the client returns an exception, which would get thrown up by the caller, and therefore this change doesn't hide any important information.	2023-11-02 11:32:49 +05:30
Gian Merlino	6b6d73b5d4	Use min of scheduler threads and server threads for subquery guardrails. (#15295 ) * Use min of scheduler threads and server threads for subquery guardrails. This allows more memory to be used for subqueries when the query scheduler is configured to limit queries below the number of server threads. The patch also refactors the code so SubqueryGuardrailHelper is provided by a Guice Provider rather than being created by ClientQuerySegmentWalker, to achieve better separation of concerns. * Exclude provider from coverage.	2023-11-01 22:34:53 -07:00
Gian Merlino	37e158c2c4	Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. (#15300 ) * Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. Prior to this patch, columnar frames would always write multi-valued columns if the input column had hasMultipleValues = UNKNOWN. This had the effect of flipping UNKNOWN to TRUE when copying data into frames, which is problematic because TRUE causes expressions to assume that string inputs must be treated as arrays. We now avoid this by flipping UNKNOWN to FALSE if no multi-valuedness is encountered, and flipping it to TRUE if multi-valuedness is encountered. * Add regression test case.	2023-11-01 22:05:53 -07:00
Charles Smith	de557a62ad	Suggest adoption of Google Style guide (#14905 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-11-01 13:31:03 -07:00
Charles Smith	3860052de0	remove references to Jupyter notebooks within the Druid repo (#15143 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-11-01 13:17:06 -07:00
Katya Macedo	935050bf43	docs: Dynamic config cleanup (#15265 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-01 11:22:33 -07:00
Sergio Ferragut	c9c3df204e	Redirect to new jupyter notebook project (#15136 )	2023-11-01 08:38:40 -07:00
Laksh Singla	2ea7177f15	Allow casted literal values in SQL functions accepting literals (#15282 ) Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.	2023-11-01 10:38:48 +05:30
George Shiqi Wu	49e0cba7ba	Fix dockerfile for druid image (#15264 ) Fixes docker image build issues with apache/druid.	2023-11-01 09:55:54 +05:30
317brian	436ded3d78	docs: durable storage azure cleanup (#15120 ) Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2023-10-31 15:20:38 -07:00
Katya Macedo	a43ffbdf2b	[Docs] Improvements to JSON-based batch Ingestion page (#15286 )	2023-10-31 14:50:45 -07:00
317brian	87695410ac	docs: blurb about msq union all (#15223 )	2023-10-31 14:15:38 -07:00
Suneet Saldanha	e6b7c36e74	LoadRules with 0 replicas should be treated as handoff complete (#15274 ) * LoadRules with 0 replicas should be treated as handoff complete * fix it * pr feedback * fixit	2023-10-30 10:42:58 -07:00
George Shiqi Wu	3173093415	Handle status failures for streaming supervisors (#15174 ) * Cleanup logic * newline * remove whitespace * Fix log message * Add test class * PR changes	2023-10-30 10:21:23 -07:00
Vishesh Garg	a27598a487	Segregate advance and advanceUninterruptibly flow in postJoinCursor to allow for interrupts in advance (#15222 ) Currently advance function in postJoinCursor calls advanceUninterruptibly which in turn keeps calling baseCursor.advanceUninterruptibly until the post join condition matches, without checking for interrupts. This causes the CPU to hit 100% without getting a chance for query to be cancelled. With this change, the call flow of advance and advanceUninterruptibly is separated out so that they call baseCursor.advance and baseCursor.advanceUninterruptibly in them, respectively, giving a chance for interrupts in the former case between successive calls to baseCursor.advance.	2023-10-30 14:39:15 +05:30
Ben Sykes	275c1ec64c	Fix error assuming a Complex Type that is a Number is a double (#15272 ) * Fix error assuming a Complex Type that is a Number is a double In the case where a complex type is a number, it may not be castable to double. It can safely be cast to Number first to get to the doubleValue.	2023-10-30 09:52:52 +05:30
Vishesh Garg	039b05585c	Add worker status and duration metrics in live and task reports (#15180 ) Add worker status and duration metrics in live and task reports for tracking.	2023-10-30 09:43:22 +05:30
Zoltan Haindrich	f4a74710e6	Process pure ordering changes with windowing operators (#15241 ) - adds a new query build path: DruidQuery#toScanAndSortQuery which: - builds a ScanQuery without considering the current ordering - builds an operator to execute the sort - fixes a null string to "null" literal string conversion in the frame serializer code - fixes some DrillWindowQueryTest cases - fix NPE in NaiveSortOperator in case there was no input - enables back CoreRules.AGGREGATE_REMOVE - adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts - earlier window expressions on top of a subquery with an offset may have ignored the offset	2023-10-29 16:40:49 +05:30
317brian	737947754d	docs: add concurrent compaction docs (#15218 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-10-27 10:29:34 -07:00
kaisun2000	60c2ad597a	Enhance json parser error logging to better track Istio Proxy error message (#15176 ) Currently the inter Druid communication via rest endpoints is based on json formatted payload. Upon parsing error, there is only a generic exception stating expected json token type and current json token type. There is no detailed error log about the content of the payload causing the violation. In the micro-service world, the trend is to deploy the Druid servers in k8 with the mesh network. Often the istio proxy or other proxies are used to intercept the network connection between Druid servers. The proxy may give error messages for various reasons. These error messages are not expected by the json parser. The generic error message from Druid can be very misleading as the user may think the message is based on the response from the other Druid server. For example, this is a mysterious error message: QueryInterruptedException{msg=Next token wasn't a START_ARRAY, was[VALUE_STRING] from url[http://xxxxx:8088/druid/v2/], code=Unknown exception, class=org.apache.druid.java.util.common.IAE, host=xxxxx:8088} while the actual content of the message, sent by the proxy when it can't tunnel the network connection, is the following: upstream connect error or disconnect/reset before header So this very simple PR is just to enhance the logging and get the real underlying message printed out. This would save a lot of head scratching time if Druid is deployed with mesh network. Co-authored-by: Kai Sun <kai.sun@salesforce.com>	2023-10-27 14:20:19 +05:30
Laksh Singla	7c8e841362	Suppress CVE's in master (#15231 )	2023-10-27 09:29:18 +05:30
Simon Hofbauer	e9b7e4a0eb	fix JSON flaky tests (#15261 ) Co-authored-by: simonh5 <simonh5@illinois.edu>	2023-10-26 20:27:09 -07:00
Alexander Saydakov	f1132d20c5	use datasketches-java 4.2.0 (#15257 ) * use datasketches-java 4.2.0 * use exclusive mode * fixed issues raised by CodeQL * fixed issue raised by spotbugs * fixed issues raised by intellij * added missing import * Update QuantilesSketchKeyCollector search mode and adjust tests. * Update sizeOf functions and add unit tests * Add unit tests --------- Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com> Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>	2023-10-26 16:28:33 -07:00
David Christle	fc0b940f78	Document the allowed range of announcer maxBytesPerNode (#15063 )	2023-10-26 14:51:01 -07:00
Pranav	e7b8e6569b	Updating plugin which has fix for corrupt nodejs pkg (#15259 )	2023-10-25 21:49:58 -07:00
Zoltan Haindrich	f48263bbb3	Report function name for unknown exceptions during execution (#14987 ) * provide function name when unknown exceptions are encountered * fix keywords/etc * fix keyword order - regex exercise * add test * add check&fix keywords * decoupledIgnore * Revert "decoupledIgnore" This reverts commit `e922c820a7`. * unpatch Function * move to a different location * checkstyle	2023-10-25 13:37:30 -07:00
YongGang	7a25ee4fd9	Ability to send task types to k8s or worker task runner (#15196 ) * Ability to send task types to k8s or worker task runner * add more tests * use runnerStrategy to determine task runner * minor refine * refine runner strategy config * move workerType config to upper level * validate config when application start	2023-10-25 09:55:56 -07:00
Laksh Singla	207398a47d	Initialize null handling in CompressedBigDecimalAggregatorTimeseriesTestBase to fix failing test(#15252 )	2023-10-25 20:26:46 +05:30
Adarsh Sanjeev	c5fa649ea5	Rename segment load wait parameter (#15251 )	2023-10-25 18:08:37 +05:30
Zoltan Haindrich	6784e9c507	Fix summary row issues in case postaggregations are happening (#15232 ) * fix-1/2 * add message v1 * extend test to cover for IOB issue * move stuff around * change message * fix testcase string * compute postaggs (thank you Clint!) * enable feature for test * ignore tests in msq --------- Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>	2023-10-24 20:33:59 -07:00
Soumyava	06f40a0019	remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (#15237 ) * Fixing nested group by query with order by in outer query * Adding examples	2023-10-24 15:30:13 -07:00
Clint Wylie	4149c9422c	cleanup temp files for nested column serializer (#15236 ) * cleanup temp files for nested column serializer * fix style * fix tests in default value mode	2023-10-24 15:30:00 -07:00
Abhishek Radhakrishnan	63e3e9531d	Update S3 retry logic to account for the underlying cause in case of `IOException` (#15238 ) * Update S3 retry logic based on the underlying cause in case of IOException. 4xx and other errors wrapped in IOException for instance aren't retriable. * Fix CI	2023-10-24 15:04:42 -07:00
AmatyaAvadhanula	65b69cded4	Filter pending segments upgraded with transactional replace (#15169 ) * Filter pending segments upgraded with transactional replace * Push sequence name filter to metadata query	2023-10-23 21:18:47 +05:30
Zoltan Haindrich	2e31cb2901	DrillWindowQueryTest: use proper way to decide if the query is ordered (#15118 )	2023-10-23 10:54:28 -04:00
Zoltan Haindrich	b95035f183	Fix VirtualColumn related issues in window expressions (#15119 ) For some exotic queries like: SELECT '_'||dim1, MIN(cast(0 as double)) OVER (), MIN(cast((cnt||cnt) as bigint)) OVER () FROM foo the compilation resulted in NPEs, mostly because VirtualColumns were not handled properly	2023-10-23 14:05:59 +05:30
Clint Wylie	c8e458452d	Fix native is boolean filter cache key tests to test the right thing (#15216 )	2023-10-23 11:24:46 +05:30
AmatyaAvadhanula	33fdd770f7	Consider only supervisors with append lock for concurrent transactional replace (#15220 ) A SegmentTransactionReplaceAction must only update the mapping of tasks with append locks that are running concurrently. To ensure this, we return the supervisor id only if it has the taskLockType as APPEND in its context.	2023-10-22 14:12:36 +05:30
Zoltan Haindrich	fbbb9c7730	Allow DESC ordering in window expressions (#15195 )	2023-10-20 07:55:28 -04:00

13362 Commits

All Branches