druid

Commit Graph

Author	SHA1	Message	Date
Kashif Faraz	f5b5cb93ea	Fix expiry timeout bug in LocalIntermediateDataManager (#12722 ) The expiry timeout is compared against the current time but the condition is reversed. This means that as soon as a supervisor task finishes, its partitions are cleaned up, irrespective of the specified `intermediaryPartitionTimeout` period. After these changes, the `intermediaryPartitionTimeout` will start getting honored. Changes * Fix the condition * Add tests to verify the new correct behaviour * Reduce the default expiry timeout from P1D to PT5M to retain current behaviour in case of default configs.	2022-07-01 16:29:22 +05:30
Clint Wylie	48731710fb	precursor changes for nested columns to minimize files changed (#12714 ) * precursor changes for nested columns to minimize files changed * inspection fix * visibility * adjustment * unnecessary change	2022-07-01 02:27:19 -07:00
Clint Wylie	d30efb1c1e	fix bug when rewriting sql virtual column registry (#12718 )	2022-07-01 02:24:00 -07:00
Rohan Garg	c09b5a2294	Fix skipTests build flag (#12716 ) * fix skipTests * Skip console UTs with skipTests * Use skipTests in skip-tests profile	2022-06-29 21:59:26 -07:00
Rui Chen	068bea6334	deps: upgrade mysql-connector-java to v5.1.49 (#12704 )	2022-06-29 23:15:46 +08:00
Abhishek Agarwal	dbd45daf33	Flakiness and exceptions during tests (#12705 )	2022-06-28 10:36:23 +05:30
Paul Rogers	f83fab699e	Add IT-related changes pulled out of PR #12368 (#12673 ) This commit contains changes made to the existing ITs to support the new ITs. Changes: - Make the "custom node role" code usable by the new ITs. - Use flag `-DskipITs` to skip the integration tests but run the unit tests. - Use flag `-DskipUTs` to skip the unit tests but run the "new" integration tests. - Expand the existing Druid profile, `-P skip-tests`, to skip both ITs and UTs.	2022-06-26 02:13:59 +05:30
Paul Rogers	f7caee3b25	Revert changes from #12672 (#12703 ) * Revert changes from #12672 * Reverted more conflicting changes Changes are not needed given previous reversions.	2022-06-25 09:10:44 +05:30
Gian Merlino	679ccffe0f	Revert "SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600 )" (#12679 ) This reverts commit `8fbf92e047`.	2022-06-25 09:08:26 +05:30
William Hyun	2aadd69f54	Update ORC to 1.7.5 (#12667 )	2022-06-24 16:08:42 -07:00
Gian Merlino	d5abd06b96	Fix flaky KafkaIndexTaskTest. (#12657 ) * Fix flaky KafkaIndexTaskTest. The testRunTransactionModeRollback case had many race conditions. Most notably, it would commit a transaction and then immediately check to see that the results were not indexed. This is racy because it relied on the indexing thread being slower than the test thread. Now, the case waits for the transaction to be processed by the indexing thread before checking the results. * Changes from review.	2022-06-24 13:53:51 -07:00
Didip Kerabat	6ddb828c7a	Able to filter Cloud objects with glob notation. (#12659 ) In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable. Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where one of the JSON specs was 16MB, and that's simply too much for the Overlord. This patch allows users to pass {"filter": "*.parquet"} and lets Druid perform the filtering of the input files. I am using the glob notation to be consistent with the LocalFirehose syntax.	2022-06-24 11:40:08 +05:30
Tejaswini Bandlamudi	1fc2f6e4b0	Throw BadQueryContextException if context params cannot be parsed (#12680 )	2022-06-24 09:21:25 +05:30
Gian Merlino	d29343cbe3	Disable autokill of segments by default. (#12693 ) Also add clarifying commentary to the documentation about how durationToRetain works.	2022-06-23 17:17:11 -07:00
Paul Rogers	ffcb996468	Cleanup changes pulled out of PR #12368 (#12672 ) This commit contains the cleanup needed for the new integration test framework. Changes: - Fix log lines, misspellings, docs, etc. - Allow the use of some of Druid's "JSON config" objects in tests - Fix minor bug in `BaseNodeRoleWatcher`	2022-06-23 23:19:50 +05:30
Jihoon Son	3d9e3dbad9	Fix hadoop library location for integration tests (#12497 )	2022-06-23 10:39:54 -05:00
Gian Merlino	4d892483ca	Fix thread-unsafe emitter usage in SeekableStreamSupervisorStateTest. (#12658 ) The TestEmitter is used from different threads without concurrency control. This patch makes the emitter thread-safe.	2022-06-22 22:29:16 -07:00
Kashif Faraz	b6f8d7a1b3	Add query context param `forceExpressionVirtualColumns` to always use "expression"-type virtual columns in query plan (#12583 ) SQL expressions such as those containing `MV_FILTER_ONLY` and `MV_FILTER_NONE` are planned as specialized virtual columns instead of the default `expression`-type virtual columns. This commit adds a new context parameter to force the `expression`-type virtual columns. Changes - Add query context param `forceExpressionVirtualColumns` - Use context param to determine if specialized virtual columns should be used or not - Moved some tests into `CalciteExplainQueryTest`	2022-06-22 15:33:50 +05:30
AmatyaAvadhanula	6bcb778eeb	Add CVEs for Hadoop3 (#12336 ) * Add CVEs * Move CVEs under hadoop3 section	2022-06-22 14:12:17 +05:30
Tejaswini Bandlamudi	99e1b4efee	Update default value of `inputSegmentSizeBytes` in configuration docs (#12678 )	2022-06-22 09:05:03 +05:30
Gian Merlino	0099940808	Add TIME_IN_INTERVAL SQL operator. (#12662 ) * Add TIME_IN_INTERVAL SQL operator. The operator is implemented as a convertlet rather than an OperatorConversion, because this allows it to be equivalent to using the >= and < operators directly. * SqlParserPos cannot be null here. * Remove unused import. * Doc updates. * Add words to dictionary.	2022-06-21 13:05:37 -07:00
AmatyaAvadhanula	eccdec9139	Reduce interval creation cost for segment cost computation (#12670 ) Changes: - Reuse created interval in `SegmentId.getInterval()` - Intern intervals to save on memory footprint	2022-06-21 17:39:43 +05:30
Tejaswini Bandlamudi	a85b1d8985	Lazy Initialisation of Orc extensions module (#12663 ) * Lazy initialization of Orc extension * nit * moving initialize method to OrcInputFormat	2022-06-21 11:13:10 +05:30
Gian Merlino	818974f6e4	ScanQuery: Fix JsonIgnore for isLegacy. (#12674 ) True, false, and null have different meanings: true/false mean "legacy" and "not legacy"; null means use the default set by ScanQueryConfig. So, we need to respect this in the JsonIgnore setup.	2022-06-18 15:55:54 -07:00
Gian Merlino	e76a5077ef	Fix self-referential shape inspection in BaseExpressionColumnValueSelector. (#12669 ) * Fix self-referential shape inspection in BaseExpressionColumnValueSelector. The new test would throw StackOverflowError on the old code. * Restore prior test.	2022-06-17 16:15:50 -07:00
Clint Wylie	18937ffee2	split out null value index (#12627 ) * split out null value index * gg spotbugs * fix stuff	2022-06-17 15:29:23 -07:00
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
Jill Osborne	f050069767	Segments doc update (#12344 ) * Corrected heading levels in segments doc * IMPLY-18394: Updated Segments doc * Update docs/design/segments.md * Update segments.md * Updated links to changed headings in Segments doc * Corrected spelling error * Update segments.md Incorporated suggestions from Paul Rogers. * Update index.md * Update compaction.md * Update docs/design/segments.md fix typo * Update docs/ingestion/compaction.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-06-16 13:25:17 -07:00
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404 ) The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API) Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid ) The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30
Lucas Capistrant	602d95d865	Add a builder class for TestDruidCoordinatorConfig (#12624 ) * Add a builder class for TestDruidCoordinatorConfig * updates after review * Fix formatting	2022-06-16 09:11:31 -05:00
Victoria Lim	94564b6ce6	Update screenshots for Druid console doc (#12593 ) * druid console doc updates * remove extra image * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * updated screenshot labels Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-06-15 16:42:20 -07:00
Gian Merlino	70f3b13621	ForkingTaskRunner: Set ActiveProcessorCount for tasks. (#12592 ) * ForkingTaskRunner: Set ActiveProcessorCount for tasks. This prevents various automatically-sized thread pools from being unreasonably large (we don't want each task to size its pools as if it is the only thing on the entire machine). * Fix tests. * Add missing LifecycleStart annotation. * ForkingTaskRunner needs ManageLifecycle.	2022-06-15 15:56:32 -07:00
Paul Rogers	45e3111549	Clean up query contexts (#12633 ) * Clean up query contexts Uses constants in place of literal strings for context keys. Moves some QueryContext methods to QueryContexts for reuse. * Revisions from review comments	2022-06-15 11:31:22 -07:00
Rohan Garg	28f2c8e112	Support LoadScope for Peons + Access Modifier Updates (#12640 ) * Support LoadScope for Peons * Update access modifiers for GroupByEngineV2	2022-06-14 21:52:50 -07:00
Gian Merlino	283249c51b	NettyHttpClient: Fix double-return on certain exceptions. (#12626 ) The "exceptionCaught" handler may get called multiple times. We should only return the channel to the pool the first time. Returning it more than once leads to a warning like "Resource at key[%s] was returned multiple times?"	2022-06-14 21:40:47 -07:00
Gian Merlino	1f6e888472	Add QoSFilters first in the chain. (#12625 ) * Add QoSFilters first in the chain. When a request is suspended and later resumed due to QoS constraints, its filter chain is restarted. Placing QoSFilters first in the chain avoids double-execution of other filters. Fixes an issue where requests deferred by QoS would report 403 Forbidden due to double-execution of SecuritySanityCheckFilter. * Smaller changes. * Add QoS filters in BaseJettyTest. * Remove unused parameter.	2022-06-14 13:37:00 -07:00
Gian Merlino	ceb4ace118	NettyHttpClient: Replace ReadTimeoutException with our own exception. (#12635 ) * NettyHttpClient: Replace ReadTimeoutException with our own exception. * Replace exception with same type. * Remove unused import.	2022-06-14 13:34:46 -07:00
Vadim Ogievetsky	6f7fa334fd	Web console: totalNumMergeTasks can be set on range also (#12648 ) * totalNumMergeTasks can be set on range also * fix formatting	2022-06-14 11:18:17 -07:00
Atul Mohan	68bae6eafb	Fix version in master (#12644 )	2022-06-14 11:32:46 +05:30
Rohan Garg	afaea251f2	Push join build table values as filter in case of duplicates (#12225 ) * Push join build table values as filter * Add tests for JoinableFactoryWrapper * fixup! Push join build table values as filter * fixup! Add tests for JoinableFactoryWrapper * fixup! Push join build table values as filter	2022-06-13 17:18:27 -07:00
317brian	27e8b43673	fix: update footer copyright year (#12594 )	2022-06-13 16:29:58 -07:00
Gian Merlino	1ace7336cd	Update node to 14.19.3. (#12632 )	2022-06-10 10:18:12 -07:00
Victoria Lim	353475bd36	Docs for automatic compaction (#12569 ) * docs for auto-compaction * fix broken links * another link * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> * reorg content for skipOffset * Update docs/ingestion/automatic-compaction.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-06-09 14:55:12 -07:00
TSFenwick	a3603ad6b0	Use DefaultQueryConfig in SqlLifecycle to correctly populate request logs (#12613 ) Fixes an issue where sql query request logs do not include the default query context values set via `druid.query.default.context.xyz` runtime properties. # Change summary * Inject `DefaultQueryConfig` into `SqlLifecycleFactory` * Add params from `DefaultQueryConfig` to the query context in `SqlLifecycle` # Description - This change does not affect query execution. This is because the `DefaultQueryConfig` was already being used in `QueryLifecycle`, which is initialized when the SQL is translated to a native query. - This also handles any potential use case where a context parameter should be handled at the SQL stage itself.	2022-06-08 12:52:50 +05:30
Gian Merlino	8fbf92e047	SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600 ) * SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. Segments with endpoints prior to year 0 or after year 9999 may overlap the search intervals but not match the generated SQL conditions. So, we need to add an additional OR condition to catch these. I checked a real, live MySQL metadata store to confirm that the query still uses metadata store indexes. It does. * Add comments.	2022-06-07 11:33:46 -07:00
Abhishek Agarwal	59a0c10c47	Add remedial information in error message when type is unknown (#12612 ) Users often submit queries and ingestion specs that work only if the relevant extension is loaded. However, the error is too technical for users and doesn't suggest that they check for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is because of a bug.	2022-06-07 20:22:45 +05:30
Laksh Singla	81c37c6515	Add validation for invalid partitioned by granularities (#12589 ) * Add validation for invalid partitioned by granularities * review comments * improve error message, change location of the method * remove imports * use StringUtils.lowercase Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>	2022-06-06 22:00:29 +05:30
Adarsh Sanjeev	5a283964ca	Improve SQL validation error messages (#12611 ) Update the SQL validation error message to specify whether the ingest is INSERT or REPLACE for better user experience.	2022-06-06 16:14:28 +05:30
Gian Merlino	abf0e0a159	CompressionStrategyTest: Fix thread-unsafe Closer usage. (#12605 ) Closer is not thread-safe, so we need one per thread in the concurrency tests.	2022-06-04 10:57:13 -07:00
Gian Merlino	a503683a4a	Add caching and CSP response headers. (#12609 ) * Add caching and CSP response headers. * Fix tests. * Fix checkstyle issues Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-06-04 21:46:49 +05:30
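
Commit f5b5cb93ea above describes a reversed expiry comparison that caused partitions to be cleaned up as soon as the supervisor task finished. The following is a minimal sketch of that kind of bug and its fix, not the actual LocalIntermediateDataManager code: the class ExpirySketch and both method names are hypothetical, and only the PT5M default comes from the commit message.

```java
import java.time.Duration;
import java.time.Instant;

public class ExpirySketch {
    // Intended check: partitions expire only once finishedAt + timeout has passed.
    static boolean isExpired(Instant finishedAt, Duration timeout, Instant now) {
        return now.isAfter(finishedAt.plus(timeout));
    }

    // Reversed comparison, shown only to illustrate the class of bug described:
    // it reports "expired" during the whole period in which data should be kept.
    static boolean isExpiredReversed(Instant finishedAt, Duration timeout, Instant now) {
        return finishedAt.plus(timeout).isAfter(now);
    }

    public static void main(String[] args) {
        Instant finishedAt = Instant.parse("2022-07-01T00:00:00Z");
        Duration timeout = Duration.ofMinutes(5); // PT5M, the new default named in the commit
        Instant oneSecondLater = finishedAt.plusSeconds(1);
        System.out.println("correct check:  " + isExpired(finishedAt, timeout, oneSecondLater));
        System.out.println("reversed check: " + isExpiredReversed(finishedAt, timeout, oneSecondLater));
    }
}
```

With the reversed comparison, a cleanup pass that runs one second after the task finishes already treats the partitions as expired, which matches the symptom in the commit message.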
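
Commit 6ddb828c7a above lets users pass {"filter": "*.parquet"} so that Druid filters input objects by glob. As a rough illustration of glob filtering in general, the sketch below uses the JDK's java.nio PathMatcher as a stand-in; it does not show how Druid itself implements the filter. The class GlobFilterSketch, the method filterByGlob, and the sample object keys are all hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class GlobFilterSketch {
    // Keeps only the keys whose final path segment matches the glob.
    // The JDK glob "*" does not cross directory separators, so the match
    // is applied to the file name rather than to the whole key.
    static List<String> filterByGlob(List<String> keys, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return keys.stream()
            .filter(key -> {
                Path name = Paths.get(key).getFileName();
                return name != null && matcher.matches(name);
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up object keys for illustration only.
        List<String> keys = Arrays.asList(
            "input/2022/part-0.parquet",
            "input/2022/_SUCCESS",
            "input/2022/notes.json");
        System.out.println(filterByGlob(keys, "*.parquet"));
    }
}
```

Filtering on the server side like this keeps the ingestion spec small, which is the motivation the commit message gives for avoiding a fully enumerated file list.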
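
Commit 283249c51b above notes that the "exceptionCaught" handler may be called more than once, and that the channel should be returned to the pool only the first time. One common way to get that behaviour is a compare-and-set guard, sketched below. This is an assumption about the general technique, not the NettyHttpClient code: the class ReturnOnceSketch and its Runnable callback are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ReturnOnceSketch {
    private final AtomicBoolean returned = new AtomicBoolean(false);
    private final Runnable returnToPool; // hypothetical stand-in for "give the channel back"

    ReturnOnceSketch(Runnable returnToPool) {
        this.returnToPool = returnToPool;
    }

    // Runs the callback on the first call only; later calls are no-ops.
    // compareAndSet makes the guard safe even if handlers race on different threads.
    boolean returnOnce() {
        if (returned.compareAndSet(false, true)) {
            returnToPool.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger returns = new AtomicInteger();
        ReturnOnceSketch guard = new ReturnOnceSketch(returns::incrementAndGet);
        guard.returnOnce(); // first exception: resource goes back to the pool
        guard.returnOnce(); // repeated exception: ignored
        System.out.println("times returned: " + returns.get());
    }
}
```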


11969 Commits
