druid

Commit Graph

Author	SHA1	Message	Date
Rishabh Singh	db95c375a6	Increase historical heap for standard IT (#15337 ) Lately, Query IT has been failing due to historical server running out of memory (OOM). We are investigating the historical heap dump from the test. Until the issue is resolved, we are increasing the heap size of historical server.	2023-11-08 15:21:30 +05:30
Abhishek Agarwal	4b64a5693b	Move service specific JVM parameters to the right in tests (#15325 ) Historical OOMs were not getting dumped into /shared/logs because common JVM flags will override service-specific JVM flags. This PR fixes that and also removes unnecessary overrides in historical.	2023-11-06 15:45:59 +05:30
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Xavier Léauté	352702bb25	run some integration tests with Java 21 (#15104 ) * use setup-java everywhere for consistency * add Java 21 to integration test matrix * simplify docker build containers script + add Java 21 * fix for Java versions reporting 21-ea	2023-10-20 11:18:13 +08:00
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Rishabh Singh	ebb9724c26	Pass jvm option to write heap dump on out of memory (#15053)	2023-09-29 17:54:53 +05:30
Zoltan Haindrich	5f3b310115	Build reliablity fixes (#15048 ) * disable parallel builds; enable batch mode to get rid of transfer progress * restore .m2 from setup-java if not found * some change to sql * add ws * fix quote * fix quote * undo querytest change * nullhandling in mvtest * init more * skip commitid plugin * add-back 1.0C to build ; remove redundant skip-s from copy-resources; add comment	2023-09-28 12:27:52 -07:00
Soumyava	8088a763a6	Vectorize earliest aggregator for both numeric and string types (#14408 ) * Vectorizing earliest for numeric * Vectorizing earliest string aggregator * checkstyle fix * Removing unnecessary exceptions * Ignoring tests in MSQ as earliest is not supported for numeric there * Fixing benchmarks * Updating tests as MSQ does not support earliest for some cases * Addressing review comments by adding the following: 1. Checking capabilities first before creating selectors 2. Removing mockito in tests for numeric first aggs 3. Removing unnecessary tests * Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string * Adding a flag for multi value dimension selector * Addressing comments * 1 more change * Handling review comments part 1 * Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order * Updating numeric first vector agg * Revert "Updating numeric first vector agg" This reverts commit `4291709901`. * Updating code for correctness issues * fixing an issue with latest agg * Adding more comments and removing an unnecessary check * Addressing null checks for tie selector and only vectorize false for quantile sketches	2023-09-05 08:41:42 -07:00
Clint Wylie	5d1412949e	enable sql compatible null handling mode by default (#14792 ) * enable sql compatible null handling mode by default * fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false	2023-08-21 20:07:13 -07:00
Kashif Faraz	097b645005	Clean up after add kill bufferPeriod (#14868 ) Follow up changes to #12599 Changes: - Rename column `used_flag_last_updated` to `used_status_last_updated` - Remove new CLI tool `UpdateTables`. - We already have a `CreateTables` with similar functionality, which should be able to handle update cases too. - Any user running the cluster for the first time should either just have `connector.createTables` enabled or run `CreateTables` which should create tables at the latest version. - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has been added to Druid, and users would have to run `CreateTables` anyway. - Remove `upgrade-prep.md` and include that info in `metadata-init.md`. - Fix log messages to adhere to Druid style - Use lambdas	2023-08-19 00:00:04 +05:30
Lucas Capistrant	9c124f2cde	Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599 ) * Add new configurable buffer period to create gap between mark unused and kill of segment * Changes after testing * fixes and improvements * changes after initial self review * self review changes * update sql statement that was lacking last_used * shore up some code in SqlMetadataConnector after self review * fix derby compatibility and improve testing/docs * fix checkstyle violations * Fixes post merge with master * add some unit tests to improve coverage * ignore test coverage on new UpdateTools cli tool * another attempt to ignore UpdateTables in coverage check * change column name to used_flag_last_updated * fix a method signature after column name switch * update docs spelling * Update spelling dictionary * Fixing up docs/spelling and integrating altering tasks table with my alteration code * Update NULL values for used_flag_last_updated in the background * Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod * remove unneeded things now that the new column is automatically updated * Test new background row updater method * fix broken tests * fix create table statement * cleanup DDL formatting * Revert adding columns to entry table by default * fix compilation issues after merge with master * discovered and fixed metastore inserts that were breaking integration tests * fixup forgotten insert by using pattern of sharing now timestamp across columns * fix issue introduced by merge * fixup after merge with master * add some directions to docs in the case of segment table validation issues	2023-08-17 19:32:51 -05:00
Rishabh Singh	0dc305f9e4	Upgrade hibernate validator version to fix CVE-2019-10219 (#14757)	2023-08-14 11:50:51 +05:30
Tejaswini Bandlamudi	a45b25fa1d	Removes support for Hadoop 2 (#14763 ) Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop	2023-08-09 17:47:52 +05:30
Kashif Faraz	2d8e0f28f3	Refactor: Cleanup coordinator duties for metadata cleanup (#14631 ) Changes - Add abstract class `MetadataCleanupDuty` - Make `KillAuditLogs`, `KillCompactionConfig`, etc extend `MetadataCleanupDuty` - Improve log and error messages - Cleanup tests - No functional change	2023-08-05 13:08:23 +05:30
AmatyaAvadhanula	5a52f7a457	Fix IT failure due to query interval (#14738)	2023-08-02 11:29:35 -07:00
Abhishek Agarwal	5c96b60162	Increase heap size for router (#14699)	2023-08-01 08:58:48 +05:30
Gian Merlino	986a271a7d	Merge core CoordinatorClient with MSQ CoordinatorServiceClient. (#14652 ) * Merge core CoordinatorClient with MSQ CoordinatorServiceClient. Continuing the work from #12696, this patch merges the MSQ CoordinatorServiceClient into the core CoordinatorClient, yielding a single interface that serves both needs and is based on the ServiceClient RPC system rather than DruidLeaderClient. Also removes the backwards-compatibility code for the handoff API in CoordinatorBasedSegmentHandoffNotifier, because the new API was added in 0.14.0. That's long enough ago that we don't need backwards compatibility for rolling updates. * Fixups. * Trigger GHA. * Remove unnecessary retrying in DruidInputSource. Add "about an hour" retry policy and h * EasyMock	2023-07-27 13:23:37 -07:00
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Gian Merlino	63ee69b4e8	Claim full support for Java 17. (#14384 ) * Claim full support for Java 17. No production code has changed, except the startup scripts. Changes: 1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK. 2) Include the full list of opens and exports on both Java 11 and 17. 3) Document that Java 17 is both supported and preferred. 4) Switch some tests from Java 11 to 17 to get better coverage on the preferred version. * Doc update. * Update errorprone. * Update docker_build_containers.sh. * Update errorprone in licenses.yaml. * Add some more run-javas. * Additional run-javas. * Update errorprone. * Suppress new errorprone error. * Add exports and opens in ForkingTaskRunner for Java 11+. Test, doc changes. * Additional errorprone updates. * Update for errorprone. * Restore old fomatting in LdapCredentialsValidator. * Copy bin/ too. * Fix Java 15, 17 build line in docker_build_containers.sh. * Update busybox image. * One more java command. * Fix interpolation. * IT commandline refinements. * Switch to busybox 1.34.1-glibc. * POM adjustments, build and test one IT on 17. * Additional debugging. * Fix silly thing. * Adjust command line. * Add exports and opens one more place. * Additional harmonization of strong encapsulation parameters.	2023-07-07 12:52:35 -07:00
Jan Werner	95115d722a	CVE fixes - update of multiple dependencies. (#14519 ) Apache Druid brings multiple direct and transitive dependencies that are affected by plethora of CVEs. This PR attempts to update all the dependencies that did not require code refactoring. This PR modifies pom files, license file and OWASP Dependency Check suppression file.	2023-07-07 20:27:30 +05:30
Gian Merlino	67fbd8e7fc	Add "stringEncoding" parameter to DataSketches HLL. (#11201 ) * Add "stringEncoding" parameter to DataSketches HLL. Builds on the concept from #11172 and adds a way to feed HLL sketches with UTF-8 bytes. This must be an option rather than always-on, because prior to this patch, HLL sketches used UTF-16LE encoding when hashing strings. To remain compatible with sketch images created prior to this patch -- which matters during rolling updates and when reading sketches that have been written to segments -- we must keep UTF-16LE as the default. Not currently documented, because I'm not yet sure how best to expose this functionality to users. I think the first place would be in the SQL layer: we could have it automatically select UTF-8 or UTF-16LE when building sketches at query time. We need to be careful about this, though, because UTF-8 isn't always faster. Sometimes, like for the results of expressions, UTF-16LE is faster. I expect we will sort this out in future patches. * Fix benchmark. * Fix style issues, improve test coverage. * Put round back, to make IT updates easier. * Fix test. * Fix issue with filtered aggregators and add test. * Use DS native update(ByteBuffer) method. Improve test coverage. * Add another suppression. * Fix ITAutoCompactionTest. * Update benchmarks. * Updates. * Fix conflict. * Adjustments.	2023-06-30 12:45:55 -07:00
Adarsh Sanjeev	233233c92d	Add query context parameter to control limiting select rows (#14476 ) * Add query context parameter to control limiting select rows * Add unit tests * Address review comments * Address review comments * Address review comments	2023-06-28 17:54:24 +05:30
Abhishek Agarwal	f8f2fe8b7b	Skip tests based on files changed in the PR (#14445 ) Our CI system has a lot of tests. And much of this testing is really unnecessary for most of the PRs. This PR adds some checks so we can skip these expensive tests when we know they are not necessary.	2023-06-22 12:27:23 +05:30
Kashif Faraz	50461c3bd5	Enable smartSegmentLoading on the Coordinator (#13197 ) This commit does a complete revamp of the coordinator to address problem areas: - Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items - Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats` - Configuration: Add dynamic config `smartSegmentLoading` to automatically set optimal values for all segment loading configs such as `maxSegmentsToMove`, `replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`. Changed classes: - Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move - Add `SegmentAction` to distinguish between load, replicate, drop and move operations - Add `SegmentReplicationStatus` to capture current state of replication of all used segments - Add `SegmentLoadingConfig` to contain recomputed dynamic config values - Simplify classes `LoadRule`, `BroadcastRule` - Simplify the `BalancerStrategy` and `CostBalancerStrategy` - Add several new methods to `ServerHolder` to track loaded and queued segments - Refactor `DruidCoordinator` Impact: - Enable `smartSegmentLoading` by default. With this enabled, none of the following dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`, `maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`, `emitBalancingStats` and `replicantLifetime`. - Coordinator reports richer metrics and produces cleaner and more informative logs - Coordinator uses an unlimited load queue for all serves, and makes better assignment decisions	2023-06-19 14:27:35 +05:30
imply-cheddar	cfd07a95b7	Errors take 3 (#14004 ) Introduce DruidException, an exception whose goal in life is to be delivered to a user. DruidException itself has javadoc on it to describe how it should be used. This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used. This work was a 3rd iteration on top of work that was started by Paul Rogers. I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.	2023-06-19 01:11:13 -07:00
Adarsh Sanjeev	128133fadc	Add column replication_factor column to sys.segments table (#14403 ) Description: Druid allows a configuration of load rules that may cause a used segment to not be loaded on any historical. This status is not tracked in the sys.segments table on the broker, which makes it difficult to determine if the unavailability of a segment is expected and if we should not wait for it to be loaded on a server after ingestion has finished. Changes: - Track replication factor in `SegmentReplicantLookup` during evaluation of load rules - Update API `/druid/coordinator/v1metadata/segments` to return replication factor - Add column `replication_factor` to the sys.segments virtual table and populate it in `MetadataSegmentView` - If this column is 0, the segment is not assigned to any historical and will not be loaded.	2023-06-18 10:02:21 +05:30
Laksh Singla	4935f2470a	Limit results generated by SELECT queries in MSQ (#14370 ) * Limit select results in MSQ * reduce number of files in test * add truncated flag * avoid materializing select results to list, use iterable instead * javadocs	2023-06-15 13:13:11 +05:30
Abhishek Radhakrishnan	1c76ebad3b	Minor doc updates. (#14409 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-06-12 15:24:48 -07:00
Tejaswini Bandlamudi	914c006b8e	increase middlemanager heap server size in tests (#14345)	2023-05-29 10:45:34 +05:30
Tejaswini Bandlamudi	9e0708f5e6	update heap size of coordinator, overlord services in docker IT environment (#14214)	2023-05-12 23:19:48 +05:30
Tejaswini Bandlamudi	774073b2e7	Update Hadoop3 as default build version (#14005 ) Hadoop 2 often causes red security scans on Druid distribution because of the dependencies it brings. We want to move away from Hadoop 2 and provide Hadoop 3 distribution available. Switch druid to building with Hadoop 3 by default. Druid will still be compatible with Hadoop 2 and users can build hadoop-2 compatible distribution using hadoop2 profile.	2023-04-26 12:52:51 +05:30
Gian Merlino	a7d4162195	Compaction: Block input specs not aligned with segmentGranularity. (#14127 ) * Compaction: Block input specs not aligned with segmentGranularity. When input intervals are not aligned with segmentGranularity, data may be overshadowed if it lies in the space between the input intervals and the output segmentGranularity. In MSQ REPLACE, this is a validation error. IMO the same behavior makes sense for compaction tasks. In case anyone was depending on the ability to compact nonaligned intervals, a configuration parameter allowNonAlignedInterval is provided. I don't expect it to be used much. * Remove unused. * ITCompactionTaskTest uses non-aligned intervals.	2023-04-25 17:06:16 -07:00
Atul Mohan	e3c160f2f2	Add start_time column to sys.servers (#13358 ) Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.	2023-04-14 15:23:34 +05:30
Gian Merlino	81074411a9	MSQ: Support multiple result columns with the same name. (#14025 ) * MSQ: Support multiple result columns with the same name. This is allowed in SQL, and is supported by the regular SQL endpoint. We retain a validation that INSERT ... SELECT does not allow multiple columns with the same name, because column names in segments must be unique.	2023-04-13 11:09:39 +05:30
Clint Wylie	1aef72aa7e	Bump up the version in pom to 27.0.0 in preparation of release (#14051)	2023-04-10 14:56:59 +05:30
AdheipSingh	64b67c22c4	add latest version of druid operator to integeration tests (#13883 ) * add latest version of druid operator to integeration tests * Update integration-tests/k8s/tiny-cluster.yaml	2023-03-10 16:11:25 +05:30
Gian Merlino	fe9d0c46d5	Improve memory efficiency of WrappedRoaringBitmap. (#13889 ) * Improve memory efficiency of WrappedRoaringBitmap. Two changes: 1) Use an int[] for sizes 4 or below. 2) Remove the boolean compressRunOnSerialization. Doesn't save much space, but it does save a little, and it isn't adding a ton of value to have it be configurable. It was originally configurable in case anything broke when enabling it, but it's been a while and nothing has broken. * Slight adjustment. * Adjust for inspection. * Updates. * Update snaps. * Update test. * Adjust test. * Fix snaps.	2023-03-09 15:48:02 -08:00
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852 ) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
Tejaswini Bandlamudi	e2461c21c4	fix flaky BatchIndex IT failures. (#13855)	2023-02-27 17:23:14 -08:00
hqx871	79f04e71a1	Hadoop based batch ingestion support range partition (#13303 ) This pr implements range partitioning for hadoop-based ingestion. For detail about multi dimension range partition can be seen #11848.	2023-02-23 11:38:03 +05:30
Abhishek Radhakrishnan	8595271b55	Fixup typos in integration-test README. (#13828)	2023-02-21 15:12:37 -08:00
Tejaswini Bandlamudi	e788f1ae6b	Add option to run standard & revised ITs manually on PRs (#13814 ) Create the docker image in case of maven dependencies cache restore failure too as env.sh file is removed on maven rebuild. Increase java heap size for security IT failing with error	2023-02-20 16:15:15 +05:30
Clint Wylie	08b5951cc5	merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698 ) * merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything * fix poms and license stuff * mockito is evil * allow reset of JvmUtils RuntimeInfo if tests used static injection to override	2023-02-17 14:27:41 -08:00
Abhishek Agarwal	8d03ace1b4	Use K3S instead of minikube for integration tests (#13782 ) We are seeing failures on GHA while using minikube so switching to K3S instead.	2023-02-17 23:06:30 +05:30
Paul Rogers	333196d207	Code cleanup & message improvements (#13778 ) * Misc cleanup edits Correct spacing Add type parameters Add toString() methods to formats so tests compare correctly IT doc revisions Error message edits Display UT query results when tests fail * Edit * Build fix * Build fixes	2023-02-15 15:22:54 +05:30
Tejaswini Bandlamudi	c95a26cae3	Migrate ITs from Travis to GHA (#13681)	2023-02-01 03:31:29 -08:00
Maytas Monsereenusorn	7f54ebbf47	Fix Parquet Parser missing column when reading parquet file (#13612 ) * fix parquet reader * fix checkstyle * fix bug * fix inspection * refactor * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * add test * fix checkstyle * fix tests * add IT * add IT * add more tests * fix checkstyle * fix stuff * fix stuff * add more tests * add more tests	2023-01-11 20:08:48 -10:00
abhagraw	5ef689fc3f	Cloud deep storage tests in new IT framework (S3, GCS, Azure) (#13535 ) * MSQ s3 deep storage tests * Fix license check * Getting config values from env variables * Added s3TestUtils * Merged AbstractITSQLBasedIngestionTest with AbstractITBatchIndexTest * Fixing license issues * Fixing checkstyle errors * Fix spotbug errors * Update s3util name in other files * GCS and Azure deep storage tests * Fix license and checkstyle errors * Fix dependency error * fix intellij check errors * Copy credentials file in all containers * Refactor and gcs file upload fix * Fixing dependency check errors and codeQL warnings * Fixing checkstyle errors * Fixing intellij inspection errors * Removing unrequired exceptions * Addressing comments	2023-01-11 09:43:44 +05:30
Karan Kumar	56076d33fb	Worker retry for MSQ task (#13353 ) * Initial commit. * Fixing error message in retry exceeded exception * Cleaning up some code * Adding some test cases. * Adding java docs. * Finishing up state test cases. * Adding some more java docs and fixing spot bugs, intellij inspections * Fixing intellij inspections and added tests * Documenting error codes * Migrate current integration batch tests to equivalent MSQ tests (#13374) * Migrate current integration batch tests to equivalent MSQ tests using new IT framework * Fix build issues * Trigger Build * Adding more tests and addressing comments * fixBuildIssues * fix dependency issues * Parameterized the test and addressed comments * Addressing comments * fixing checkstyle errors * Adressing comments * Adding ITTest which kills the worker abruptly * Review comments phase one * Adding doc changes * Adjusting for single threaded execution. * Adding Sequential Merge PR state handling * Merge things * Fixing checkstyle. * Adding new context param for fault tolerance. Adding stale task handling in sketchFetcher. Adding UT's. * Merge things * Merge things * Adding parameterized tests Created separate module for faultToleranceTests * Adding missed files * Review comments and fixing tests. * Documentation things. * Fixing IT * Controller impl fix. * Fixing racy WorkerSketchFetcherTest.java exception handling. Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com> Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>	2023-01-11 07:38:29 +05:30
imply-cheddar	0efd0879a8	Unify the handling of HTTP between SQL and Native (#13564 ) * Unify the handling of HTTP between SQL and Native The SqlResource and QueryResource have been using independent logic for things like error handling and response context stuff. This became abundantly clear and painful during a change I was making for Window Functions, so I unified them into using the same code for walking the response and serializing it. Things are still not perfectly unified (it would be the absolute best if the SqlResource just took SQL, planned it and then delegated the query run entirely to the QueryResource), but this refactor doesn't take that fully on. The new code leverages async query processing from our jetty container, the different interaction model with the Resource means that a lot of tests had to be adjusted to align with the async query model. The semantics of the tests remain the same with one exception: the SqlResource used to not log requests that failed authorization checks, now it does.	2022-12-19 00:25:33 -08:00

575 Commits