druid

Commit Graph

Author	SHA1	Message	Date
Maytas Monsereenusorn	1558ef471c	Add some debug tips for debugging peons (#12697 ) * add some debug tips * address comments * fix typo	2022-07-09 01:47:25 -07:00
Didip Kerabat	48fd2e6400	Add missing metrics into statsd-reporter. (#12762 )	2022-07-08 23:13:06 -07:00
Gian Merlino	edfbcc8455	Preserve column order in DruidSchema, SegmentMetadataQuery. (#12754 ) * Preserve column order in DruidSchema, SegmentMetadataQuery. Instead of putting columns in alphabetical order. This is helpful because it makes query order better match ingestion order. It also allows tools, like the reindexing flow in the web console, to more easily do follow-on ingestions using a column order that matches the pre-existing column order. We prefer the order from the latest segments. The logic takes all columns from the latest segments in the order they appear, then adds on columns from older segments after those. * Additional test adjustments. * Adjust imports.	2022-07-08 22:04:11 -07:00
Gian Merlino	9c925b4f09	Frame format for data transfer and short-term storage. (#12745 ) * Frame format for data transfer and short-term storage. As we move towards query execution plans that involve more transfer of data between servers, it's important to have a data format that provides for doing this more efficiently than the options available to us today. This patch adds: - Columnar frames, which support fast querying. - Row-based frames, which support fast sorting via memory comparison and fast whole-row copies via memory copying. - Frame files, a container format that can be stored on disk or transferred between servers. The idea is we should use row-based frames when data is expected to be sorted, and columnar frames when data is expected to be queried. The code in this patch is not used in production yet. Therefore, the patch involves minimal changes outside of the org.apache.druid.frame package. The main ones are adjustments to SqlBenchmark to add benchmarks for queries on frames, and the addition of a "forEach" method to Sequence. * Fixes based on tests, static analysis. * Additional fixes. * Skip DS mapping tests on JDK 14+ * Better JDK checking in tests. * Fix imports. * Additional comment. * Adjustments from code review. * Update test case.	2022-07-08 20:42:06 -07:00
Rohan Garg	bcff35f798	Pushdown join filter with right side referencing columns (#12749 )	2022-07-08 19:59:41 +05:30
Gian Merlino	378fea9517	Retain CSP configuration in ServerConfig constructor. (#12755 ) Without this change, CliIndexer would not apply custom CSP headers and would revert to the default.	2022-07-08 19:19:14 +05:30
Jianhuan Liu	4574dea5e9	Use MXBeans to get GC metrics #12476 (#12481 ) * jvm gc to mxbeans * add zgc and shenandoah #12476 * remove tryCreateGcCounter * separate the space collector * blend GcGenerationCollector into GcCollector * add jdk surefire argLine	2022-07-08 14:32:06 +08:00
Gian Merlino	e82890fde4	Mark specific nimbus.lang.tag.version. (#12751 ) * Mark specific nimbus.lang.tag.version. * Add ignoredUnusedDeclaredDependencies.	2022-07-07 09:58:35 +05:30
PJ Fanning	059aba781a	issue-12628: upgrade jetty to 9.4.41.v20210516 due to CVE (#12629 ) * upgrade jetty to 9.4.41.v20210516 due to cve * Update licenses.yaml	2022-07-07 00:20:01 +08:00
Rohan Garg	d732de9948	Allow adding calcite rules from extensions (#12715 ) * Allow adding calcite rules from extensions * fixup! Allow adding calcite rules from extensions * Move Rules to CalciteRulesManager * fixup! Move Rules to CalciteRulesManager	2022-07-06 19:32:35 +05:30
Gian Merlino	49feffff1b	Add comment about double-close in ColumnSelectorColumnIndexSelector. (#12735 )	2022-07-06 00:50:35 -07:00
Jill Osborne	682ea7f32d	IMPLY-12348: Update description of UNION ALL in SQL syntax doc (#12710 ) * IMPLY-12348: Updated description of UNION ALL * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update sql.md * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-05 13:08:01 -07:00
Didip Kerabat	06251c5d2a	Add EIGHT_HOUR into possible list of Granularities. (#12717 ) * Add EIGHT_HOUR into possible list of Granularities. * Add the missing definition. * fix test. * Fix another test. * Stylecheck finally passed. Co-authored-by: Didip Kerabat <didip@apple.com>	2022-07-05 11:05:37 -07:00
Gian Merlino	2b330186e2	Mid-level service client and updated high-level clients. (#12696 ) * Mid-level service client and updated high-level clients. Our servers talk to each other over HTTP. We have a low-level HTTP client (HttpClient) that is super-asynchronous and super-customizable through its handlers. It's also proven to be quite robust: we use it for Broker -> Historical communication over the wide variety of query types and workloads we support. But the low-level client has no facilities for service location or retries, which means we have a variety of high-level clients that implement these in their own ways. Some high-level clients do a better job than others. This patch adds a mid-level ServiceClient that makes it easier for high-level clients to be built correctly and harmoniously, and migrates some of the high-level logic to use ServiceClients. Main changes: 1) Add ServiceClient org.apache.druid.rpc package. That package also contains supporting stuff like ServiceLocator and RetryPolicy interfaces, and a DiscoveryServiceLocator based on DruidNodeDiscoveryProvider. 2) Add high-level OverlordClient in org.apache.druid.rpc.indexing. 3) Indexing task client creator in TaskServiceClients. It uses SpecificTaskServiceLocator to find the tasks. This improves on ClientInfoTaskProvider by caching task locations for up to 30 seconds across calls, reducing load on the Overlord. 4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient instead of extending IndexTaskClient. 5) Rework RemoteTaskActionClient to use a ServiceClient instead of DruidLeaderClient. 6) Rework LocalIntermediaryDataManager, TaskMonitor, and ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and Overlord no longer need IndexingServiceClient (which internally used DruidLeaderClient). 
There are some concrete benefits over the prior logic, namely: - DruidLeaderClient does retries in its "go" method, but only retries exactly 5 times, does not sleep between retries, and does not retry retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.) ServiceClient handles retries in a more reasonable way. - DruidLeaderClient's methods are all synchronous, whereas ServiceClient methods are asynchronous. This is used in one place so far: the SpecificTaskServiceLocator, so we don't need to block a thread trying to locate a task. It can be used in other places in the future. - HttpIndexingServiceClient does not properly handle all server errors. In some cases, it tries to parse a server error as a successful response (for example: in getTaskStatus). - IndexTaskClient currently makes an Overlord call on every task-to-task HTTP request, as a way to find where the target task is. ServiceClient, through SpecificTaskServiceLocator, caches these target locations for a period of time. * Style adjustments. * For the coverage. * Adjustments. * Better behaviors. * Fixes.	2022-07-05 09:43:26 -07:00
Clint Wylie	36e38b319b	add virtual column support to search query (#12720 )	2022-07-04 21:58:10 -07:00
Rohan Garg	97a926fb29	Suppress CVE-2022-33915 (#12740 )	2022-07-04 22:48:08 +05:30
Tejaswini Bandlamudi	d559773a0e	sets Hadoop conf ClassLoader (#12738 )	2022-07-04 17:07:39 +05:30
imply-cheddar	e3128e3fa3	Poison stupid pool (#12646 ) * Poison StupidPool and fix resource leaks There are various resource leaks from test setup as well as some corners in query processing. We poison the StupidPool to start failing tests when the leaks come and fix any issues uncovered from that so that we can start from a clean baseline. Unfortunately, because of how poisoning works, we can only fail future checkouts from the same pool, which means that there is a natural race between a leak happening -> GC occurs -> leak detected -> pool poisoned. This race means that, depending on interleaving of tests, if the very last time that an object is checked out from the pool leaks, then it won't get caught. At some point in the future, something will catch it, however and from that point on it will be deterministic. * Remove various things left over from iterations * Clean up FilterAnalysis and add javadoc on StupidPool * Revert changes to .idea/misc.xml that accidentally got pushed * Style and test branches * Stylistic woes	2022-07-03 14:36:22 -07:00
Clint Wylie	bbbb6e1c3f	fix DruidSchema issue where datasources with no segments can become stuck in tables list indefinitely (#12727 )	2022-07-01 18:54:01 -07:00
Kashif Faraz	f5b5cb93ea	Fix expiry timeout bug in LocalIntermediateDataManager (#12722 ) The expiry timeout is compared against the current time but the condition is reversed. This means that as soon as a supervisor task finishes, its partitions are cleaned up, irrespective of the specified `intermediaryPartitionTimeout` period. After these changes, the `intermediaryPartitionTimeout` will start getting honored. Changes * Fix the condition * Add tests to verify the new correct behaviour * Reduce the default expiry timeout from P1D to PT5M to retain current behaviour in case of default configs.	2022-07-01 16:29:22 +05:30
Clint Wylie	48731710fb	precursor changes for nested columns to minimize files changed (#12714 ) * precursor changes for nested columns to minimize files changed * inspection fix * visibility * adjustment * unecessary change	2022-07-01 02:27:19 -07:00
Clint Wylie	d30efb1c1e	fix bug when rewriting sql virtual column registry (#12718 )	2022-07-01 02:24:00 -07:00
Rohan Garg	c09b5a2294	Fix skipTests build flag (#12716 ) * fix skipTests * Skip console UTs with skipTests * Use skipTests in skip-tests profile	2022-06-29 21:59:26 -07:00
Rui Chen	068bea6334	deps: upgrade mysql-connector-java to v5.1.49 (#12704 )	2022-06-29 23:15:46 +08:00
Abhishek Agarwal	dbd45daf33	Flakiness and exceptions during tests (#12705 )	2022-06-28 10:36:23 +05:30
Paul Rogers	f83fab699e	Add IT-related changes pulled out of PR #12368 (#12673 ) This commit contains changes made to the existing ITs to support the new ITs. Changes: - Make the "custom node role" code usable by the new ITs. - Use flag `-DskipITs` to skips the integration tests but runs unit tests. - Use flag `-DskipUTs` skips unit tests but runs the "new" integration tests. - Expand the existing Druid profile, `-P skip-tests` to skip both ITs and UTs.	2022-06-26 02:13:59 +05:30
Paul Rogers	f7caee3b25	Revert changes from #12672 (#12703 ) * Revert changes from #12672 * Reverted more conflicting changes Changes are not needed given previous reversions.	2022-06-25 09:10:44 +05:30
Gian Merlino	679ccffe0f	Revert "SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600 )" (#12679 ) This reverts commit `8fbf92e047`.	2022-06-25 09:08:26 +05:30
William Hyun	2aadd69f54	Update ORC to 1.7.5 (#12667 )	2022-06-24 16:08:42 -07:00
Gian Merlino	d5abd06b96	Fix flaky KafkaIndexTaskTest. (#12657 ) * Fix flaky KafkaIndexTaskTest. The testRunTransactionModeRollback case had many race conditions. Most notably, it would commit a transaction and then immediately check to see that the results were not indexed. This is racey because it relied on the indexing thread being slower than the test thread. Now, the case waits for the transaction to be processed by the indexing thread before checking the results. * Changes from review.	2022-06-24 13:53:51 -07:00
Didip Kerabat	6ddb828c7a	Able to filter Cloud objects with glob notation. (#12659 ) In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable. Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord. This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files. I am using the glob notation to be consistent with the LocalFirehose syntax.	2022-06-24 11:40:08 +05:30
Tejaswini Bandlamudi	1fc2f6e4b0	Throw BadQueryContextException if context params cannot be parsed (#12680 )	2022-06-24 09:21:25 +05:30
Gian Merlino	d29343cbe3	Disable autokill of segments by default. (#12693 ) Also add clarifying commentary to the documentation about how durationToRetain works.	2022-06-23 17:17:11 -07:00
Paul Rogers	ffcb996468	Cleanup changes pulled out of PR #12368 (#12672 ) This commit contains the cleanup needed for the new integration test framework. Changes: - Fix log lines, misspellings, docs, etc. - Allow the use of some of Druid's "JSON config" objects in tests - Fix minor bug in `BaseNodeRoleWatcher`	2022-06-23 23:19:50 +05:30
Jihoon Son	3d9e3dbad9	Fix hadoop library location for integration tests (#12497 )	2022-06-23 10:39:54 -05:00
Gian Merlino	4d892483ca	Fix thread-unsafe emitter usage in SeekableStreamSupervisorStateTest. (#12658 ) The TestEmitter is used from different threads without concurrency control. This patch makes the emitter thread-safe.	2022-06-22 22:29:16 -07:00
Kashif Faraz	b6f8d7a1b3	Add query context param `forceExpressionVirtualColumns` to always use "expression"-type virtual columns in query plan (#12583 ) SQL expressions such as those containing `MV_FILTER_ONLY` and `MV_FILTER_NONE` are planned as specialized virtual columns instead of the default `expression`-type virtual columns. This commit adds a new context parameter to force the `expression`-type virtual columns. Changes - Add query context param `forceExpressionVirtualColumns` - Use context param to determine if specialized virtual columns should be used or not - Moved some tests into `CalciteExplainQueryTest`	2022-06-22 15:33:50 +05:30
AmatyaAvadhanula	6bcb778eeb	Add CVEs for Hadoop3 (#12336 ) * Add CVEs * Move CVEs under hadoop3 section	2022-06-22 14:12:17 +05:30
Tejaswini Bandlamudi	99e1b4efee	Update default value of `inputSegmentSizeBytes` in configuration docs (#12678 )	2022-06-22 09:05:03 +05:30
Gian Merlino	0099940808	Add TIME_IN_INTERVAL SQL operator. (#12662 ) * Add TIME_IN_INTERVAL SQL operator. The operator is implemented as a convertlet rather than an OperatorConversion, because this allows it to be equivalent to using the >= and < operators directly. * SqlParserPos cannot be null here. * Remove unused import. * Doc updates. * Add words to dictionary.	2022-06-21 13:05:37 -07:00
AmatyaAvadhanula	eccdec9139	Reduce interval creation cost for segment cost computation (#12670 ) Changes: - Reuse created interval in `SegmentId.getInterval()` - Intern intervals to save on memory footprint	2022-06-21 17:39:43 +05:30
Tejaswini Bandlamudi	a85b1d8985	Lazy Initialisation of Orc extensions module (#12663 ) * Lazy initialization of Orc extension * nit * moving intialize method to OrcInputFormat	2022-06-21 11:13:10 +05:30
Gian Merlino	818974f6e4	ScanQuery: Fix JsonIgnore for isLegacy. (#12674 ) True, false, and null have different meanings: true/false mean "legacy" and "not legacy"; null means use the default set by ScanQueryConfig. So, we need to respect this in the JsonIgnore setup.	2022-06-18 15:55:54 -07:00
Gian Merlino	e76a5077ef	Fix self-referential shape inspection in BaseExpressionColumnValueSelector. (#12669 ) * Fix self-referential shape inspection in BaseExpressionColumnValueSelector. The new test would throw StackOverflowError on the old code. * Restore prior test.	2022-06-17 16:15:50 -07:00
Clint Wylie	18937ffee2	split out null value index (#12627 ) * split out null value index * gg spotbugs * fix stuff	2022-06-17 15:29:23 -07:00
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
Jill Osborne	f050069767	Segments doc update (#12344 ) * Corrected heading levels in segments doc * IMPLY-18394: Updated Segments doc * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update segments.md * Updated links to changed headings in Segments doc * Corrected spelling error * Update segments.md Incorporated suggestions from Paul Rogers. * Update index.md * Update segments.md * Update segments.md * Update segments.md * Update compaction.md * Update docs/design/segments.md fix typo * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim 
<vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-06-16 13:25:17 -07:00
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404 ) The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API) Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid ) The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30
Lucas Capistrant	602d95d865	Add a builder class for TestDruidCoordinatorConfig (#12624 ) * Add a builder class for TestDruidCoordinatorConfig * updates after review * Fix formatting	2022-06-16 09:11:31 -05:00
Victoria Lim	94564b6ce6	Update screenshots for Druid console doc (#12593 ) * druid console doc updates * remove extra image * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * updated screenshot labels Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-06-15 16:42:20 -07:00
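
Note on edfbcc8455 (#12754): the commit message describes the column-ordering rule as "takes all columns from the latest segments in the order they appear, then adds on columns from older segments after those." A minimal Python sketch of that rule follows. It is not Druid's Java implementation. The function name `merge_column_order` and the newest-first list-of-lists input are assumptions made for illustration only.

```python
from typing import Iterable, List, Sequence


def merge_column_order(segments_newest_first: Iterable[Sequence[str]]) -> List[str]:
    """Merge per-segment column lists into one ordered list.

    Hypothetical helper: the input is assumed to be sorted newest segment
    first. Columns keep the position of their first appearance, so the
    newest segment's order wins and columns that exist only in older
    segments are appended afterwards.
    """
    seen = set()
    merged: List[str] = []
    for columns in segments_newest_first:
        for column in columns:
            if column not in seen:
                seen.add(column)
                merged.append(column)
    return merged


if __name__ == "__main__":
    newest = ["__time", "page", "added"]
    older = ["__time", "user", "page"]
    print(merge_column_order([newest, older]))
```

With the sample input above, `user` appears only in the older segment, so it is placed after the newest segment's columns rather than being sorted alphabetically.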

12038 Commits