druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	dbed1b0f50	Defer more expressions in vectorized groupBy. (#16338 ) * Defer more expressions in vectorized groupBy. This patch adds a way for columns to provide GroupByVectorColumnSelectors, which controls how the groupBy engine operates on them. This mechanism is used by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector that uses the inputs of an expression as the grouping key. The actual expression evaluation is deferred until the grouped ResultRow is created. A new context parameter "deferExpressionDimensions" allows users to control when this deferred selector is used. The default is "fixedWidthNonNumeric", which is a behavioral change from the prior behavior. Users can get the prior behavior by setting this to "singleString". * Fix style. * Add deferExpressionDimensions to SqlExpressionBenchmark. * Fix style. * Fix inspections. * Add more testing. * Use valueOrDefault. * Compute exprKeyBytes a bit lighter-weight.	2024-06-26 17:28:36 -07:00
Clint Wylie	d4f2636325	fix greatest/least function non-vectorized processing to ignore null argument types (#16649 )	2024-06-26 12:59:42 -07:00
Andreas Maechler	ab76d851ad	Update docs contribution with correct script (#16581 ) * Spacing * Fix ordering * npm run start	2024-06-26 10:30:52 -07:00
Abhishek Radhakrishnan	82117e8101	Add MSQ query context `maxNumSegments` (#16637 ) * Add MSQ query context maxNumSegments. - Default is MAX_INT (unbounded). - When set and if a time chunk contains more number of segments than set in the query context, the MSQ task will fail with TooManySegments fault. * Fixup hashCode(). * Rename and checkpoint. * Add some insert and replace happy and sad path tests. * Update error msg. * Commentary * Adjust the default to be null (meaning no max bound on number of segments). Also fix formatter. * Fix CodeQL warnings and minor cleanup. * Assert on maxNumSegments tuning config. * Minor test cleanup. * Use null default for the MultiStageQueryContext as well * Review feedback * Review feedback * Move logic to common function getPartitionsByBucket shared by INSERT and REPLACE. * Rename to validateNumSegmentsPerBucketOrThrow() for consistency. * Add segmentGranularity to error message.	2024-06-26 09:29:51 -07:00
Rahul Bansal	b772277d3b	Update intellij-setup.md (#16655 ) updating typing mistakes	2024-06-26 17:38:37 +05:30
Laksh Singla	71b3b5ab5d	Add query context parameter to remove null bytes when writing frames (#16579 ) MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. 'removeNullBytes' context parameter has been added which sanitizes the input string fields by removing these null bytes.	2024-06-26 15:00:30 +05:30
Kashif Faraz	d9bd02256a	Refactor: Rename UsedSegmentChecker and cleanup task actions (#16644 ) Changes: - Rename `UsedSegmentChecker` to `PublishedSegmentsRetriever` - Remove deprecated single `Interval` argument from `RetrieveUsedSegmentsAction` as it is now unused and has been deprecated since #1988 - Return `Set` of segments instead of a `Collection` from `IndexerMetadataStorageCoordinator.retrieveUsedSegments()`	2024-06-26 10:48:59 +05:30
Tom	52c9929019	Column name in parse exceptions (#16529 ) * first pass * more changes * fix tests and formatting * fix kinesis failing tests * fix kafka tests * add dimension name to float parse errors * double and convertToType handling of dimensionName can report parse errors with dimension name * fix checkstyle issue * fix tests * more cases to have better parse exception messages * fix test * fix tests * partially address comments * annotate method parameter with nullable * address comments * fix tests * let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils * fix compilation error and clean up formatting * clean up whitespace * address feedback. undo change, pass down report parse exception for convertToType * fix test	2024-06-25 13:42:52 -07:00
Abhishek Radhakrishnan	e01f155209	Add missing `delta-storage` dependency and class loader workaround to Delta table ingestion (#16648 ) * Workaround to ingesting from Delta table in 3.2.0. With the upgrade to Kernel 3.2.0, the Druid Delta connector extension isn't able to ingest from Delta tables successfully. Please see https://github.com/delta-io/delta/issues/3299 The underlying problem seems to be coming from https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/logstore/LogStoreProvider.java#L99 This patch works around the problem by setting the thread class loader explicitly. The Kernel community may consider a fix in the next release as it has affected another connector as well. * Address review comment: clear the CL after the Thread CL is set.	2024-06-25 09:16:13 -07:00
Edgar Melendrez	b43f4063c5	Docs: update link and title of quickstart (#16638 ) * update link and title * Discard changes to website/package.json * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-06-25 09:07:00 -07:00
Abhishek Radhakrishnan	2979f73e89	Fix Intellij inspection (#16651 )	2024-06-25 04:32:43 -07:00
Kashif Faraz	f1043d20bc	Support csv input format in Kafka ingestion with header (#16630 ) * Support ListBasedInputRow in Kafka ingestion with header * Fix up buildBlendedEventMap * Add new test for KafkaInputFormat with csv value and headers * Do not use forbidden APIs * Move utility method to TestUtils	2024-06-25 11:50:01 +05:30
Clint Wylie	37a50e6803	Remove index_realtime and index_realtime_appenderator tasks (#16602 ) index_realtime tasks were removed from the documentation in #13107. Even at that time, they weren't really documented per se— just mentioned. They existed solely to support Tranquility, which is an obsolete ingestion method that predates migration of Druid to ASF and is no longer being maintained. Tranquility docs were also de-linked from the sidebars and the other doc pages in #11134. Only a stub remains, so people with links to the page can see that it's no longer recommended. index_realtime_appenderator tasks existed in the code base, but were never documented, nor as far as I am aware were they used for any purpose. This patch removes both task types completely, as well as removes all supporting code that was otherwise unused. It also updates the stub doc for Tranquility to be firmer that it is not compatible. (Previously, the stub doc said it wasn't recommended, and pointed out that it is built against an ancient 0.9.2 version of Druid.) ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2024-06-24 20:13:33 -07:00
317brian	2131917f16	docs: added front-coded dictionaries to upgrade notes (#16647 ) * docs: add front-coded dictionaries to upgrade notes * add it to release notes template	2024-06-24 10:52:26 -07:00
Abhishek Radhakrishnan	7463589b07	Support for bootstrap segments (#16609 ) * Initial support for bootstrap segments. - Adds a new API in the coordinator. - All processes that have storage locations configured (including tasks) talk to the coordinator if they can, and fetch bootstrap segments from it. - Then load the segments onto the segment cache as part of startup. - This addresses the segment bootstrapping logic required by processes before they can start serving queries or ingesting. This patch also lays the foundation to speed up upgrades. * Fail open by default if there are any errors talking to the coordinator. * Add test for failure scenario and cleanup logs. * Cleanup and add debug log * Assert the events so we know the list exactly. * Revert RunRules test. The rules aren't evaluated if there are no clusters. * Revert RunRulesTest too. * Remove debug info. * Make the API POST and update log. * Fix up UTs. * Throw 503 from MetadataResource; clean up exception handling and DruidException. * Remove unused logger, add verification of metrics and docs. * Update error message * Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Adjust test metric expectations with the rename. * Add BootstrapSegmentResponse container in the response for future extensibility. * Rename to BootstrapSegmentsInfo for internal consistency. * Remove unused log. * Use a member variable for broadcast segments instead of segmentAssigner. * Minor cleanup * Add test for loadable bootstrap segments and clarify comment. * Review suggestions. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-06-24 09:27:17 -07:00
Misha	354a3bea0b	The default `WHERE` filter for automatically generated SQL queries is returned (#16608 ) * Returned the default `WHERE` filter for auto-generated SQL queries * Checkstyle fix --------- Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>	2024-06-24 08:52:35 -07:00
Sree Charan Manamala	990fd5f5fb	Make use group iterator for all window frames & support for same bound kinds (#16603 ) Fixes apache/druid#15739	2024-06-24 15:52:41 +02:00
Kashif Faraz	0fe6a2af68	Fix replica task failures with metadata inconsistency while running concurrent append replace (#16614 ) Changes: - Add new task action `RetrieveSegmentsByIdAction` - Use new task action to retrieve segments irrespective of their visibility - During rolling upgrades, this task action would fail as Overlord would be on old version - If new action fails, fall back to just fetching used segments as before	2024-06-24 09:56:04 +05:30
Adarsh Sanjeev	1a883ba1f7	Fix complex columns with export (#16572 ) This PR fixes a few bugs with MSQ export. The main change is calling SqlResults#coerce before writing the column. This allows sketches and json to be correctly deserialized. The format of the exported complex columns are similar to those produced by Async MSQ queries with CSV format. Notes: Fix printing of complex columns during export. Sketches and JSON are now correctly formatted during export. Fix an NPE if the writer has not been initialized. Empty export queries will create an empty file at the location. Fix a bug with counters for MSQ export, where rows were reported for only the first partition.	2024-06-24 09:03:30 +05:30
Akshat Jain	641f739a47	Fix flaky test in RetryableS3OutputStreamTest (#16639 ) As part of #16481, we have started uploading the chunks in parallel. That means that it's not necessary for the part that finished uploading last to be less than or equal to the chunkSize (as the final part could've been uploaded earlier). This made a test in RetryableS3OutputStreamTest flaky where we were asserting that the final part should be smaller than chunk size. This commit fixes the test, and also adds another test where the file size is such that all chunk sizes would be of equal size.	2024-06-24 08:13:47 +05:30
Laksh Singla	00c96432af	Materialize scan results correctly when columns are not present in the segments (#16619 ) Fixes a bug causing maxSubqueryBytes not to work when segments have missing columns.	2024-06-23 23:15:45 +05:30
Rishabh Singh	a63c12bf34	Upload tasklogs along with service logs on Standard IT failure (#16631 ) * Fix build * Push tasklogs along with service logs * temp changes to run standard its without unit test results * test * minor change * test * test * Update datasource name for ITSystemTableBatchIndexTaskTest * Publish task logs * Revert other changes * update standard-it yaml	2024-06-22 11:45:54 +05:30
Vadim Ogievetsky	51c73b5a4e	Web console: show formatted JSON value (#16632 ) * show formatted json value * update snapshot * window functions * count star can also have a window * better edit query context	2024-06-21 18:33:15 -07:00
Rishabh Singh	4eced9b3c9	Fix CentralizedDatasourceSchema group IT failure (#16636 ) * Fix build * Update datasource name in ITSystemTableBatchIndexTaskTest	2024-06-21 15:40:12 -07:00
Suneet Saldanha	4e0ea7823b	Update docs for K8s TaskRunner Dynamic Config (#16600 ) * Update docs for K8s TaskRunner Dynamic Config * touchups * code review * npe * oopsies	2024-06-21 06:01:59 -07:00
Akshat Jain	cd438b1918	Emit metrics for S3UploadThreadPool (#16616 ) * Emit metrics for S3UploadThreadPool * Address review comments * Revert unnecessary formatting change * Revert unnecessary formatting change in metrics.md file * Address review comments * Add metric for task duration * Minor fix in metrics.md * Add s3Key and uploadId in the log message * Address review comments * Create new instance of ServiceMetricEvent.Builder for thread safety * Address review comments * Address review comments	2024-06-21 11:36:47 +05:30
Adithya Chakilam	35709de549	CgroupCpuSetMonitor: Initialize the cgroup discoverer (#16621 )	2024-06-20 10:23:59 -07:00
Andreas Maechler	ae70e18bc8	docs: Update Azure extension (#16585 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-06-20 09:31:29 -07:00
Abhishek Radhakrishnan	b20c3dbadf	Fix malformed period throwing `ADMIN` persona error (#16626 ) * Turn invalid periods into user-facing exception providing more context. The current exception is targeting the ADMIN persona. Catch that and turn it into a USER persona instead. Also, provide more context in the error message. * Review comment: pass the wrapping expression and stringify. * Update processing/src/main/java/org/apache/druid/query/expression/ExprUtils.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2024-06-20 08:40:28 -07:00
Sree Charan Manamala	7ac0862287	Grouping Engine fix when a limit spec with different order by columns is applied (#16534 )	2024-06-20 11:35:58 +02:00
Rishabh Singh	169a8dbd1a	Disable TestValidateIncompatibleCentralizedDatasourceSchemaConfig (#16627 ) * Fix build * Ignore test	2024-06-18 17:50:46 -07:00
Maytas Monsereenusorn	44268e7fad	Pass requestBufferSize from Config to Proxy servlet (#16611 )	2024-06-19 02:42:16 +07:00
AmatyaAvadhanula	be3593f099	Optimize unused segment query for segment allocation (#16623 )	2024-06-18 20:45:04 +05:30
Sam Rash	a10310388f	Add Conditional Helpers to DruidException / InvalidInput (#16470 ) Adds versions of DruidException.defensive(String, Object...) InvalidInput.exception(String, Object...) InvalidInput.exception(Throwable, String, Object...) the versions add a boolean as the first arg and only create and throw an exception if it's false. It can be used similar to Preconditions.checkState/checkArgument	2024-06-18 14:05:43 +05:30
AmatyaAvadhanula	4c8932e00e	Fix attempts to publish the same pending segments multiple times (#16605 ) * Fix attempts to publish the same pending segments multiple times	2024-06-18 12:02:13 +05:30
Abhishek Radhakrishnan	51b2f6cb45	Fix retry logic in `BrokerClient` and flaky `BrokerClientTest` (#16618 ) * Fix flaky BrokerClientTest. The testError() method reliably fails in the IDE. This is because the test runner has <surefire.rerunFailingTestsCount>3</surefire.rerunFailingTestsCount> set, so maven retries this "flaky test" multiple times and the test code returns a successful response in the third attempt. The exception handling in BrokerClientTest was broken: - All non-2xx errors were being turned into 5xx errors. Remove that block of code. If we need to handle retries of more specific 5xx error codes, that should be handled explicitly. Or if there's a source of 4xx class error that needs to be 5xx, fix that in the source of error. * Fix CodeQL warning for unused parameter.	2024-06-17 12:48:15 -07:00
Maytas Monsereenusorn	d6c7d868cd	Fix peon startup with non string property value (#16612 )	2024-06-16 07:48:44 +05:30
Jill Osborne	aec1d5ddd6	Link fix (#16596 ) * Link fix * Update docs/operations/auth.md Co-authored-by: Andreas Maechler <amaechler@gmail.com> --------- Co-authored-by: Andreas Maechler <amaechler@gmail.com>	2024-06-14 11:40:53 -07:00
Virushade	eb842d3dda	Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed (#16595 ) * Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed * Rollback mistake	2024-06-14 15:21:07 +05:30
Laksh Singla	da1e293a57	Deserialize dimensions in group by queries to their respective types when reading from their serialized format (#16511 ) * init * tests, pair groupable * framework change * tests * update benchmarks * comments * add javadoc for the jsonMapper * remove extra deserialization * add special serde for map based result rows * revert unnecessary change --------- Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-14 16:27:47 +08:00
317brian	e1926e2549	docs: fix redirect (#16548 ) * doc: cleanup unnecessary redirect (cherry picked from commit d86aaadbc78cc51345f768ee66c9a8b2cbf13f27) * restore redirect file entry. delete md file	2024-06-14 09:54:16 +08:00
Alberic Liu	ea2de517b2	Update the youtube link for druid presentations page (#16601 ) * Update the link to lambda architectures with Druid * update the youtube link	2024-06-14 09:47:46 +08:00
Victoria Lim	836cdb48a5	docs: Migration guide for MVDs to arrays (#16516 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-13 13:05:58 -07:00
Zoltan Haindrich	ac19b148c2	Upgrade calcite to 1.37.0 (#16504 ) * contains Make a full copy of the parser and apply our modifications to it #16503 * some minor api changes pair/entry * some unnecessary aggregation was removed from a set of queries in `CalciteSubqueryTest` * `AliasedOperatorConversion` was detecting `CHAR_LENGTH` as not a function ; I've removed the check * the field it was using doesn't look maintained that much * the `kind` is passed for the created `SqlFunction` so I don't think this check is actually needed * some decoupled test cases become broken - will be fixed later * some aggregate related changes: due to the fact that SUM() and COUNT() of no inputs are different * upgrade avatica to 1.25.0 * `CalciteQueryTest#testExactCountDistinctWithFilter` is now executable Close apache/druid#16503	2024-06-13 08:47:50 +02:00
George Shiqi Wu	d5a25a94b8	Docs: Clarify that all supervisors can support early handoff (#16588 )	2024-06-13 08:43:22 +05:30
YongGang	46dbc74053	Support Dynamic Peon Pod Template Selection in K8s extension (#16510 ) * initial commit * add Javadocs * refine JSON input config * more test and fix build * extract existing behavior as default strategy * change template mapping fallback * add docs * update doc * fix doc * address comments * define Matcher interface * fix test coverage * use lower case for endpoint path * update Json name * add more tests * refactoring Selector class	2024-06-12 15:27:10 -07:00
Zoltan Haindrich	f8645de341	Remove incorrect utf8 conversion of ResultCache keys (#16569 )	2024-06-12 13:12:05 -07:00
Clint Wylie	fee509df2e	fix NestedDataColumnIndexerV4 to not report cardinality (#16507 ) * fix NestedDataColumnIndexerV4 to not report cardinality changes: * fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem * fix some issues with sql benchmarks	2024-06-11 20:58:12 -07:00
zachjsh	3f5f5921e0	Fix sql syntax error user (#16583 ) This fixes an issue where, in some cases, a SQL syntax error encountered when parsing / planning a query results in an error returned to the user with persona `admin` when it should instead be `user`.	2024-06-11 18:08:35 -04:00
Andreas Maechler	fec48432d4	docs: Correct some outdated module names (#16584 ) * Fix module names * Better spacing * Some spacing * Suggestions from code review Thanks Abhishek. * More links * Roll-up time * Remove logs * More spelling	2024-06-11 14:17:40 -07:00
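
Several commits in this list add query context parameters: `deferExpressionDimensions` (#16338), `maxNumSegments` (#16637) and `removeNullBytes` (#16579). The sketch below shows one way such parameters might be assembled into a SQL request body. The parameter names and the values "fixedWidthNonNumeric" and "singleString" come from the commit messages; the helper name `build_sql_request`, the `{"query": ..., "context": ...}` payload shape and the sample queries are assumptions for illustration, and the first parameter applies to vectorized groupBy while the other two apply to MSQ, so a real request would normally carry only the relevant ones.

```python
import json


def build_sql_request(query, defer_expression_dimensions=None,
                      max_num_segments=None, remove_null_bytes=None):
    """Assemble a request body with only the context keys that were set.

    Hypothetical helper: the context keys are named in the commit messages,
    everything else here is an illustrative assumption.
    """
    context = {}
    if defer_expression_dimensions is not None:
        # Commit #16338 names "fixedWidthNonNumeric" (new default) and
        # "singleString" (prior behavior); other values may exist.
        context["deferExpressionDimensions"] = defer_expression_dimensions
    if max_num_segments is not None:
        # Commit #16637: unset (null) means no upper bound on segments.
        if max_num_segments <= 0:
            raise ValueError("max_num_segments must be positive when set")
        context["maxNumSegments"] = max_num_segments
    if remove_null_bytes is not None:
        # Commit #16579: strips null bytes from string fields when enabled.
        context["removeNullBytes"] = bool(remove_null_bytes)
    return {"query": query, "context": context}


# Restore the pre-#16338 groupBy behavior for one query.
groupby_request = build_sql_request(
    "SELECT UPPER(dim1), COUNT(*) FROM tbl GROUP BY 1",
    defer_expression_dimensions="singleString",
)

# An MSQ-style ingestion request with a segment cap and null-byte removal.
msq_request = build_sql_request(
    "REPLACE INTO tbl OVERWRITE ALL SELECT * FROM src PARTITIONED BY DAY",
    max_num_segments=100,
    remove_null_bytes=True,
)

print(json.dumps(groupby_request, indent=2))
print(json.dumps(msq_request, indent=2))
```

Leaving a parameter out of the context keeps the server-side default, which for `maxNumSegments` means no bound according to the commit message.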

14289 Commits
