druid

Commit Graph

Author	SHA1	Message	Date
Adarsh Sanjeev	24f8f9e1ab	Add check for eternity time segment to SqlSegmentsMetadataQuery (#12844 ) * Add check for eternity time segment to SqlSegmentsMetadataQuery * Add check for half eternities * Add multiple segments test * Add failing test to document known issue	2022-08-04 22:33:08 -07:00
Gian Merlino	ca4e64aea3	Frame processing and channels. (#12848 ) * Frame processing and channels. Follow-up to #12745. This patch adds three new concepts: 1) Frame channels are interfaces for doing nonblocking reads and writes of frames. 2) Frame processors are interfaces for doing nonblocking processing of frames received from input channels and sent to output channels. 3) Cluster-by keys, which can be used for sorting or partitioning. The patch also adds SuperSorter, a user of these concepts, both to illustrate how they are used, and also because it is going to be useful in future work. Central classes: - ReadableFrameChannel. Implementations include BlockingQueueFrameChannel (in-memory channel that implements both interfaces), ReadableFileFrameChannel (file-based channel), ReadableByteChunksFrameChannel (byte-stream-based channel), and others. - WritableFrameChannel. Implementations include BlockingQueueFrameChannel and WritableStreamFrameChannel (byte-stream-based channel). - ClusterBy, a sorting or partitioning key. - FrameProcessor, nonblocking processor of frames. Implementations include FrameChannelBatcher, FrameChannelMerger, and FrameChannelMuxer. - FrameProcessorExecutor, an executor service that runs FrameProcessors. - SuperSorter, a class that uses frame channels and processors to do parallel external merge sort of any amount of data (as long as there is enough disk space). * Additional tests, fixes. * Changes from review. * Better implementation for ReadableInputStreamFrameChannel. * Rename getFrameFileReference -> newFrameFileReference. * Add InterruptedException to runIncrementally; add more tests. * Cancellation adjustments. * Review adjustments. * Refactor BlockingQueueFrameChannel, rename doneReading and doneWriting to close. * Additional changes from review. * Additional changes. * Fix test. * Adjustments. * Adjustments.	2022-08-04 21:29:04 -07:00
Katya Macedo	c6dd9dd4af	Fix typo in compaction.md (#12774 )	2022-08-04 14:47:22 -07:00
Clint Wylie	73cfc4e5d0	fix expression plan type inference to correctly handle complex types (#12857 )	2022-08-04 02:56:05 -07:00
Paul Rogers	a618458bf0	Tidy up construction of the Guice Injectors (#12816 ) * Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments	2022-08-04 00:05:07 -07:00
Gian Merlino	ef6811ef88	Improved Java 17 support and Java runtime docs. (#12839 ) * Improved Java 17 support and Java runtime docs. 1) Add a "Java runtime" doc page with information about supported Java versions, garbage collection, and strong encapsulation.. 2) Update asm and equalsverifier to versions that support Java 17. 3) Add additional "--add-opens" lines to surefire configuration, so tests can pass successfully under Java 17. 4) Switch openjdk15 tests to openjdk17. 5) Update FrameFile to specifically mention Java runtime incompatibility as the cause of not being able to use Memory.map. 6) Update SegmentLoadDropHandler to log an error for Errors too, not just Exceptions. This is important because an IllegalAccessError is encountered when the correct "--add-opens" line is not provided, which would otherwise be silently ignored. 7) Update example configs to use druid.indexer.runner.javaOptsArray instead of druid.indexer.runner.javaOpts. (The latter is deprecated.) * Adjustments. * Use run-java in more places. * Add run-java. * Update .gitignore. * Exclude hadoop-client-api. Brought in when building on Java 17. * Swap one more usage of java. * Fix the run-java script. * Fix flag. * Include link to Temurin. * Spelling. * Update examples/bin/run-java Co-authored-by: Xavier Léauté <xl+github@xvrl.net> Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2022-08-03 23:16:05 -07:00
Clint Wylie	623b075d12	fix nested column sql operator return type inference (#12851 ) * fix nested column sql operator return type inference * oops, final	2022-08-03 15:39:08 -07:00
刘小辉	6f5c1434b8	fix get task may be null (#12100 )	2022-08-03 09:23:48 -07:00
AmatyaAvadhanula	fbd1a07e7e	Fix kinesis IT flakiness (#12821 )	2022-08-03 17:16:16 +05:30
Peter Marshall	0a4ed3ba61	Readme - link fix to build guide (#12849 )	2022-08-03 19:32:37 +08:00
Karan Kumar	3290b49754	Log4j bump to 2.18 due to [LOG4J2-3419] (#12847 ) * Log4j bump to 2.18 due to [LOG4J2-3419] * Fixing license issues	2022-08-02 23:25:40 -07:00
Gian Merlino	2912a36a20	Use nonzero default value of maxQueuedBytes. (#12840 ) * Use nonzero default value of maxQueuedBytes. The purpose of this parameter is to prevent the Broker from running out of memory. The prior default is unlimited; this patch changes it to a relatively conservative 25MB. This may be too low for larger clusters. The risk is that throughput can decrease for queries with large resultsets or large amounts of intermediate data. However, I think this is better than the risk of the prior default, which is that these queries can cause the Broker to go OOM. * Alter calculation.	2022-08-02 17:57:27 -07:00
Gian Merlino	0ca37c20a6	Python 3 support for post-index-task. (#12841 ) * Python 3 support for post-index-task. Useful when running on macOS or any other system that doesn't have Python 2. * Encode JSON returned by read_task_file. * Adjust. * Skip needless loads. * Add a decode. * Additional decodes needed.	2022-08-02 17:53:34 -07:00
Clint Wylie	6981b1cc12	fix bugs with nested column jsonpath parser (#12831 )	2022-08-02 11:38:25 -07:00
Rohan Garg	eabce8a159	Fix flakiness in query-retry ITs (#12818 )	2022-08-02 17:20:16 +05:30
Tejaswini Bandlamudi	cceb2e849e	Perform lazy initialization of parquet extensions module (#12827 ) Historicals and middle managers crash with an `UnknownHostException` on trying to load `druid-parquet-extensions` with an ephemeral Hadoop cluster. This happens because the `fs.defaultFS` URI value cannot be resolved at start up time as the hadoop cluster may not exist at startup time. This commit fixes the error by performing initialization of the filesystem in `ParquetInputFormat.createReader()` whenever a new reader is requested.	2022-08-02 13:41:12 +05:30
Clint Wylie	6046a392b6	add DictionaryEncodedStringValueIndex implementation to NestedFieldLiteralColumnIndexSupplier (#12837 )	2022-08-01 21:40:35 -07:00
Rohan Garg	7ae6cc6e60	Fix string first/last aggregator comparator (#12773 )	2022-08-01 20:54:15 +05:30
317brian	553ff47616	fix: fix broken link to Class TTest (#12836 )	2022-07-31 10:18:14 +08:00
Clint Wylie	d96a9c1e6f	add missing selectors for explicit null columns (#12834 )	2022-07-29 19:08:58 -07:00
Clint Wylie	189e8b9d18	add NumericRangeIndex interface and BoundFilter support (#12830 ) add NumericRangeIndex interface and BoundFilter support changes: * NumericRangeIndex interface, like LexicographicalRangeIndex but for numbers * BoundFilter now uses NumericRangeIndex if comparator is numeric and there is no extractionFn * NestedFieldLiteralColumnIndexSupplier.java now supports supplying NumericRangeIndex for single typed numeric nested literal columns * better faster stronger and (ever so slightly) more understandable * more tests, fix bug * fix style	2022-07-29 18:58:49 -07:00
Paul Rogers	d52abe7b38	Today is that day - Single pass through Calcite planner (#12636 ) * Druid planner now makes only one pass through Calcite planner Resolves the issue that required two parse/plan cycles: one for validate, another for plan. Creates a clone of the Calcite planner and validator to resolve the conflict that prevented the merger.	2022-07-29 18:53:21 -07:00
Charles Smith	efbb58e90e	docs: remove maxRowsPerSegment where appropriate (#12071 ) * remove maxRowsPerSegment where appropriate * fix tutorial, accept suggestions * Update docs/design/coordinator.md * additional tutorial file * fix initial index spec * accept comments * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add back comment on maxrows per segment * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md 
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rm duplicate entry * Update native-batch-simple-task.md remove ref to `maxrowspersegment` * Update native-batch.md remove ref to `maxrowspersegment` * final tenticles * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-28 16:52:13 +05:30
Maytas Monsereenusorn	24c345cdf0	Allow dictionary encoded column to use a more generic index interface (#12826 )	2022-07-27 15:23:00 -07:00
Paul Rogers	a8b155e9c6	Fixes for the Avatica JDBC driver (#12709 ) * Fixes for the Avatica JDBC driver Correctly implement regular and prepared statements Correctly implement result sets Fix race condition with contexts Clarify when parameters are used Prepare for single-pass through the planner * Addressed review comments * Addressed review comment	2022-07-27 15:22:40 -07:00
Atul Mohan	93a9a4b1c5	Add retention for file request logs (#12559 ) * Add retention for file request logs * Spelling	2022-07-27 08:17:02 -07:00
Rohan Garg	bf0886a8ab	Fix hash calcuation in RendezvousHasher (#12817 )	2022-07-27 12:16:27 +05:30
Jacques Arnoux	6b0b1d7af3	replaces hard-coded probe delays with helm values (#12805 )	2022-07-26 14:04:06 +05:30
Laksh Singla	2e616e633a	Determine type of `__time` column by RowSignature in case of External Datasource (#12770 ) Some queries like `REPLACE INTO ... SELECT TIME_PARSE("__time") AS __time FROM ...` fail at the Calcite layer because any column with name `__time` is considered to be of type `SqlTypeName.TIMESTAMP`. Changes: - Modify `RowSignatures.toRelDataType()` so that the type of `__time` column is determined by the RowSignature's type.	2022-07-26 12:09:40 +05:30
Charles Smith	d7d4314367	remove ref to plywood repo (#12809 )	2022-07-26 10:12:13 +08:00
PJ Fanning	188b5b0027	Upgrade to jetty 9.4.48.v20220622 due to CVEs (#12801 ) * Upgrade to jetty 9.4.48.v20220622 due to CVEs * Update licenses.yaml	2022-07-26 10:11:48 +08:00
Tejaswini Bandlamudi	5772dfd155	Peons should not report SysMonitor stats since MiddleManager reports them. (#12802 ) Sysmonitor stats (mem, fs, disk, net, cpu, swap, sys, tcp) are reported by all Druid processes, including Peons that are ephemeral in nature. Since Peons always run on the same host as the MiddleManager that spawned them and is unlikely to change, the SyMonitor metrics emitted by Peon are merely duplicates. This is often not a problem except when machines are super-beefy. Imagine a 64-core machine and 32 workers running on this machine. now you will have each Peon reporting metrics for each core. that's an increase of (32 * 64)x in the number of metrics. This leads to a metric explosion. This PR updates MetricsModule to check node role running while registering SysMonitor and not to load any existing SysMonitor$Stats.	2022-07-23 13:32:16 +05:30
Victoria Lim	6394ecfd21	update figure and reference (#12813 )	2022-07-22 15:54:25 -07:00
Maytas Monsereenusorn	5417aa2055	Fix: ParseException swallow cause Exception (#12810 ) * add impl * add impl * fix checkstyle	2022-07-22 13:46:28 -07:00
Kashif Faraz	6c96d09680	Suppress some false alarm CVEs (#12812 ) This commit suppresses the following CVEs: - CVE-2021-43138: false alarm for async-http-client - CVE-2021-34538: applicable to Hive server - CVE-2020-25638: requires hibernate update, which causes Hadoop ingestion failure - CVE-2021-27568: false alarm for accessors-smart which is a dependency of json-smart (already suppressed)	2022-07-22 22:27:31 +05:30
Kashif Faraz	9e5f0109fd	Fix CVE-2022-2048 (jetty) and CVE-2022-31159 (aws-java-sdk-s3) (#12807 ) Changes: - Upgrade aws sdk version from `1.12.37` to `1.12.264` - Upgrade jetty version from `9.4.41.v20210516` to `9.4.47.v20220610`	2022-07-21 13:08:18 +05:30
Katya Macedo	a2be685824	Remove the time bit, fix headings (#12808 ) * Remove the time bit, fix headings * Adopt review suggestions * Edits * Update smoosh file description * Adopt review suggestions * Update spelling	2022-07-20 15:37:57 -07:00
Maytas Monsereenusorn	3bf1e699ff	GREATEST/LEAST function is incorrectly specifying that it cannot return null (#12804 )	2022-07-20 14:41:24 +05:30
Katya Macedo	809bf161ce	Add a note about setting the value of maxNumConcurrentSubTasks (#12772 ) * Add clarification for combining input source * Update inputFormat note * Update maxNumConcurrentSubTasks note * Fix broken link * Update docs/ingestion/native-batch-input-source.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-07-19 15:34:21 -07:00
Adarsh Sanjeev	f3272a25f9	Add check for sqlOuterLimit to ingest queries (#12799 ) * Add check for sqlOuterLimit to ingest queries * Fix checkstyle * Add comment	2022-07-19 09:02:43 -07:00
Tejaswini Bandlamudi	cc1ff56ca5	Unregisters `RealtimeMetricsMonitor`, `TaskRealtimeMetricsMonitor` on Indexers after task completion (#12743 ) Few indexing tasks register RealtimeMetricsMonitor or TaskRealtimeMetricsMonitor with the process’s MonitorScheduler when they start. These monitors never unregister themselves (they always return true, they'd need to return false to unregister). Each of these monitors emits a set of metrics once every druid.monitoring.emissionPeriod. As a result, after executing several tasks for a while, Indexer emits metrics of these tasks even after they're long gone. Proposed Solution Since one should be able to obtain the last round of ingestion metrics after the task unregisters the monitor, introducing lastRoundMetricsToBePushed variable to keep track of the same and overriding the AbstractMonitor.monitor method in RealtimeMetricsMonitor, TaskRealtimeMetricsMonitor to implement the new logic.	2022-07-18 14:34:18 +05:30
Atul Mohan	75045970cd	S3 Ingestion from non-default endpoints (#11798 ) * Add endpoint support for s3inputsource * Changes to tests * Fix docs * Fix config * Fix inspections * Fix spelling * Remove password from toString	2022-07-15 11:03:34 -07:00
Jianhuan Liu	d4403c15aa	Upgrade prometheus version, add more labels to PrometheusEmitter (#12769 ) Changes: - Upgrade prometheus to version 0.16.0 - Add optional labels `druid_service` and `host_name` to `PrometheusEmitter`	2022-07-15 14:43:12 +05:30
Vadim Ogievetsky	f2a7970a6c	reindex flow should take order from Druid (#12790 )	2022-07-14 20:03:33 -07:00
Clint Wylie	1e0542626b	add nested column query benchmarks (#12786 )	2022-07-14 18:16:30 -07:00
Paul Rogers	ee15c238cc	Clone Calcite planner to access validator (#12708 ) Done in preparation for the "single-pass" planner.	2022-07-14 18:10:33 -07:00
Yuanli Han	50f1f5840d	show json and add search box (#12784 )	2022-07-14 17:01:30 -07:00
Yuanli Han	82315779ff	fix segment timeline bar chart (#12782 )	2022-07-14 16:58:24 -07:00
Vadim Ogievetsky	14e5b8325c	make tick formatting more robust (#12788 )	2022-07-14 16:56:53 -07:00
Clint Wylie	e25ba00470	fix bug in ObjectFlatteners.toMap which caused null values in avro-stream/avro-ocf/parquet/orc to be converted to {} instead of null in web-console sampler UI (#12785 ) * fix bug in ObjectFlatteners.toMap which caused null values in avro-stream/avro-ocf/parquet/orc to be converted to {} instead of null * fix parquet test that expected wrong behavior, my bad heh	2022-07-14 16:52:01 -07:00

11953 Commits, All Branches