druid

Commit Graph

Author	SHA1	Message	Date
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations, along with classes used solely by them, like Fetcher (used by PrefetchableTextFilesFirehoseFactory). Refactors classes, including tests, that use FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser is less trivial, as many classes that aren't deprecated depend on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
Tejaswini Bandlamudi	e2461c21c4	fix flaky BatchIndex IT failures. (#13855)	2023-02-27 17:23:14 -08:00
hqx871	79f04e71a1	Hadoop based batch ingestion support range partition (#13303) This PR implements range partitioning for Hadoop-based ingestion. For details about multi-dimension range partitioning, see #11848.	2023-02-23 11:38:03 +05:30
Abhishek Radhakrishnan	8595271b55	Fixup typos in integration-test README. (#13828)	2023-02-21 15:12:37 -08:00
Tejaswini Bandlamudi	e788f1ae6b	Add option to run standard & revised ITs manually on PRs (#13814) Also create the Docker image when the Maven dependency cache restore fails, as the env.sh file is removed on a Maven rebuild. Increase Java heap size for security IT failing with error	2023-02-20 16:15:15 +05:30
Clint Wylie	08b5951cc5	merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) * merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything * fix poms and license stuff * mockito is evil * allow reset of JvmUtils RuntimeInfo if tests used static injection to override	2023-02-17 14:27:41 -08:00
Abhishek Agarwal	8d03ace1b4	Use K3S instead of minikube for integration tests (#13782) We are seeing failures on GHA while using minikube, so we are switching to K3S instead.	2023-02-17 23:06:30 +05:30
Paul Rogers	333196d207	Code cleanup & message improvements (#13778) * Misc cleanup edits Correct spacing Add type parameters Add toString() methods to formats so tests compare correctly IT doc revisions Error message edits Display UT query results when tests fail * Edit * Build fix * Build fixes	2023-02-15 15:22:54 +05:30
Tejaswini Bandlamudi	c95a26cae3	Migrate ITs from Travis to GHA (#13681)	2023-02-01 03:31:29 -08:00
Maytas Monsereenusorn	7f54ebbf47	Fix Parquet Parser missing column when reading parquet file (#13612) * fix parquet reader * fix checkstyle * fix bug * fix inspection * refactor * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * add test * fix checkstyle * fix tests * add IT * add IT * add more tests * fix checkstyle * fix stuff * fix stuff * add more tests * add more tests	2023-01-11 20:08:48 -10:00
abhagraw	5ef689fc3f	Cloud deep storage tests in new IT framework (S3, GCS, Azure) (#13535) * MSQ s3 deep storage tests * Fix license check * Getting config values from env variables * Added s3TestUtils * Merged AbstractITSQLBasedIngestionTest with AbstractITBatchIndexTest * Fixing license issues * Fixing checkstyle errors * Fix spotbug errors * Update s3util name in other files * GCS and Azure deep storage tests * Fix license and checkstyle errors * Fix dependency error * fix intellij check errors * Copy credentials file in all containers * Refactor and gcs file upload fix * Fixing dependency check errors and codeQL warnings * Fixing checkstyle errors * Fixing intellij inspection errors * Removing unrequired exceptions * Addressing comments	2023-01-11 09:43:44 +05:30
Karan Kumar	56076d33fb	Worker retry for MSQ task (#13353) * Initial commit. * Fixing error message in retry exceeded exception * Cleaning up some code * Adding some test cases. * Adding java docs. * Finishing up state test cases. * Adding some more java docs and fixing spot bugs, intellij inspections * Fixing intellij inspections and added tests * Documenting error codes * Migrate current integration batch tests to equivalent MSQ tests (#13374) * Migrate current integration batch tests to equivalent MSQ tests using new IT framework * Fix build issues * Trigger Build * Adding more tests and addressing comments * fixBuildIssues * fix dependency issues * Parameterized the test and addressed comments * Addressing comments * fixing checkstyle errors * Addressing comments * Adding ITTest which kills the worker abruptly * Review comments phase one * Adding doc changes * Adjusting for single threaded execution. * Adding Sequential Merge PR state handling * Merge things * Fixing checkstyle. * Adding new context param for fault tolerance. Adding stale task handling in sketchFetcher. Adding UTs. * Merge things * Merge things * Adding parameterized tests Created separate module for faultToleranceTests * Adding missed files * Review comments and fixing tests. * Documentation things. * Fixing IT * Controller impl fix. * Fixing racy WorkerSketchFetcherTest.java exception handling. Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com> Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>	2023-01-11 07:38:29 +05:30
imply-cheddar	0efd0879a8	Unify the handling of HTTP between SQL and Native (#13564) * Unify the handling of HTTP between SQL and Native The SqlResource and QueryResource have been using independent logic for things like error handling and response context stuff. This became abundantly clear and painful during a change I was making for Window Functions, so I unified them into using the same code for walking the response and serializing it. Things are still not perfectly unified (it would be the absolute best if the SqlResource just took SQL, planned it and then delegated the query run entirely to the QueryResource), but this refactor doesn't take that fully on. The new code leverages async query processing from our jetty container; the different interaction model with the Resource means that a lot of tests had to be adjusted to align with the async query model. The semantics of the tests remain the same with one exception: the SqlResource used to not log requests that failed authorization checks, now it does.	2022-12-19 00:25:33 -08:00
abhagraw	f6f625ee08	MSQ Reindex IT (#13433) * MSQ Reindex IT * Fixing checkstyle errors * Addressing comments * Addressing comments	2022-12-01 12:13:23 +05:30
Kashif Faraz	7cf761cee4	Prepare master branch for next release, 26.0.0 (#13401) * Prepare master branch for next release, 26.0.0 * Use docker image for druid 24.0.1 * Fix version in druid-it-cases pom.xml	2022-11-22 15:31:01 +05:30
abhagraw	5172d76a67	Migrate current integration batch tests to equivalent MSQ tests (#13374) * Migrate current integration batch tests to equivalent MSQ tests using new IT framework * Fix build issues * Trigger Build * Adding more tests and addressing comments * fixBuildIssues * fix dependency issues * Parameterized the test and addressed comments * Addressing comments * fixing checkstyle errors * Addressing comments	2022-11-21 09:12:02 +05:30
Rohan Garg	6ccf31490e	Allow injection of node-role set to all non base modules (#13371)	2022-11-18 12:12:03 +05:30
Paul Rogers	7e600d2c63	Enhancements to the Calcite test framework (#13283) * Enhancements to the Calcite test framework * Standardize "Unauthorized" messages * Additional test framework extension points * Resolved joinable factory dependency issue	2022-11-08 14:28:49 -08:00
Gian Merlino	8f90589ce5	Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. (#13247) * Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. These aggregation functions are documented as creating sketches. However, they are planned into native aggregators that include finalization logic to convert the sketch to a number of some sort. This creates an inconsistency: the functions sometimes return sketches, and sometimes return numbers, depending on where they lie in the native query plan. This patch changes these SQL aggregators to _never_ finalize, by using the "shouldFinalize" feature of the native aggregators. It already existed for theta sketches. This patch adds the feature for hll and quantiles sketches. As to impact, Druid finalizes aggregators in two cases: - When they appear in the outer level of a query (not a subquery). - When they are used as input to an expression or finalizing-field-access post-aggregator (not any other kind of post-aggregator). With this patch, the functions will no longer be finalized in these cases. The second item is not likely to matter much. The SQL functions all declare return type OTHER, which would be usable as an input to any other function that makes sense and that would be planned into an expression. So, the main effect of this patch is the first item. To provide backwards compatibility with anyone that was depending on the old behavior, the patch adds a "sqlFinalizeOuterSketches" query context parameter that restores the old behavior. Other changes: 1) Move various argument-checking logic from runtime to planning time in DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker. 2) Add various JsonIgnores to the sketches to simplify their JSON representations. 3) Allow chaining of ExpressionPostAggregators and other PostAggregators in the SQL layer. 4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer, now that expressions can operate on complex inputs. 5) Adjust return type to thetaSketch (instead of OTHER) in ThetaSketchSetBaseOperatorConversion. * Fix benchmark class. * Fix compilation error. * Fix ThetaSketchSqlAggregatorTest. * Hopefully fix ITAutoCompactionTest. * Adjustment to ITAutoCompactionTest.	2022-11-03 09:43:00 -07:00
Kashif Faraz	fd7864ae33	Improve run time of coordinator duty MarkAsUnusedOvershadowedSegments (#13287) In clusters with a large number of segments, the duty `MarkAsUnusedOvershadowedSegments` can take a very long time to finish. This is because of the costly invocation of `timeline.isOvershadowed`, which is done for every used segment in every coordinator run. Changes - Use `DataSourceSnapshot.getOvershadowedSegments` to get all overshadowed segments - Iterate over this set instead of all used segments to identify segments that can be marked as unused - Mark segments as unused in the DB in batches rather than one at a time - Refactor: Add class `SegmentTimeline` for ease of use and readability while using a `VersionedIntervalTimeline` of segments.	2022-11-01 20:19:52 +05:30
Paul Rogers	86e6e61e88	Modular Calcite Test Framework (#12965) * Refactor Calcite test "framework" for planner tests Refactors the current Calcite tests to make it a bit easier to adjust the set of runtime objects used within a test. * Move data creation out of CalciteTests into TestDataBuilder * Move "framework" creation out of CalciteTests into a QueryFramework * Move injector-dependent functions from CalciteTests into QueryFrameworkUtils * Wrapper around the planner factory, etc. to allow customization. * Bulk of the "framework" created once per class rather than once per test. * Refactor tests to use a test builder * Change all testQuery() methods to use the test builder. Move test execution & verification into a test runner.	2022-10-20 15:45:44 -07:00
Paul Rogers	f4dcc52dac	Redesign QueryContext class (#13071) We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true, two other configuration options become available: druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of keys to allow. All other keys go through the existing context key security checks. druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks. Both are set using JSON list format: druid.auth.securedContextKeys=["secretKey1", "secretKey2"] You generally set one or the other value. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys. In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally: sqlQueryId sqlStringifyArrays	2022-10-15 11:02:11 +05:30
Tejaswini Bandlamudi	3e13584e0e	Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144) * Idle Seekable stream supervisor changes. * nit * nit * nit * Adds unit tests * Supervisor decides its idle state instead of AutoScaler * docs update * nit * nit * docs update * Adds Kafka unit test * Adds Kafka Integration test. * Updates travis config. * Updates kafka-indexing-service dependencies. * updates previous offsets snapshot & doc * Doesn't act if supervisor is suspended. * Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes. * Reverts Kinesis Supervisor idle behaviour changes. * nit * nit * Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests. * Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too * Adds Kafka Supervisor UT * Improves test coverage in druid-server * Corrects IT override config * Doc updates and Syntactic changes * nit * supervisorSpec.ioConfig.idleConfig changes	2022-10-12 18:31:08 +05:30
Frank Chen	d30cf8c308	Dependency cleanup (#13194) * Clean up dependency in extensions * Bump protobuf/aws.sdk * Bump aws-sdk to 1.12.317 * Fix CI * Fix CI * Update license * Update license	2022-10-10 20:34:38 +08:00
Laksh Singla	728745a1d3	Add IT for MSQ task engine using the new IT framework (#12992) * first test, serde causing problems * serde working * insert and select check * Add cluster annotations for MSQ test cases * Add cluster config for MSQ * Add MSQ config to the pom.xml * cleanup unnecessary changes * Remove model classes * Comments, checkstyle, check queries from file * fixup test case name * build failure fix * review changes * build failure fix * Trigger Build * Log the mismatch in QueryResultsVerifier * Trigger Build * Change the signature of the results verifier * review changes * LGTM fix * build, change pom * Trigger Build * Trigger Build * trigger build with minimal pom changes * guice fix in tests * travis.yml	2022-09-22 16:09:47 +05:30
Vadim Ogievetsky	b9edfe34a4	be consistent about referring to the web console by its name (#13118)	2022-09-19 15:02:17 -07:00
Frank Chen	b8dd822f32	Some improvements about Docker (#13059)	2022-09-16 09:25:52 +08:00
Adam Peck	ee22663dd3	Add interpolation to JsonConfigurator (#13023) * Add interpolation to JsonConfigurator * Fix checkstyle * Fix tests by removing commons-text override * Add back commons-text without version * Remove unused hadoopDir configs * Move some stuff to hopefully pass coverage	2022-09-07 12:48:01 +05:30
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Paul Rogers	cfed036091	Add the new integration test framework (#12368) This commit is a first draft of the revised integration test framework which provides: - A new directory, integration-tests-ex that holds the new integration test structure. (For now, the existing integration-tests is left unchanged.) - Maven module druid-it-tools to hold code placed into the Docker image. - Maven module druid-it-image to build the Druid-only test image from the tarball produced in distribution. (Dependencies live in their "official" image.) - Maven module druid-it-cases that holds the revised tests and the framework itself. The framework includes file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes. The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured to how the Docker Compose scripts are structured to test configuration. An extensive set of "readme" files explains those details. Rather than repeat that material here, please consult those files for explanations.	2022-08-24 17:03:23 +05:30
Xavier Léauté	752e42a312	fix running integration tests on macos aarch64 (#12913) * add osx-aarch_64 netty-transport-native-kqueue native dependency * align docker-java dependency versions using bom and update to 3.2.13	2022-08-17 18:03:24 +02:00
Abhishek Agarwal	adbebc174a	Fix flaky tests in SeekableStreamSupervisorStateTest (#12875) * Fix flaky test in SeekableStreamSupervisorStateTest * Fix for flaky security IT Test * fix tests * retry queries if there is some flakiness	2022-08-16 18:38:03 +05:30
Paul Rogers	41712b7a3a	Refactor SqlLifecycle into statement classes (#12845) * Refactor SqlLifecycle into statement classes Create direct & prepared statements Remove redundant exceptions from tests Tidy up Calcite query tests Make PlannerConfig more testable * Build fixes * Added builder to SqlQueryPlus * Moved Calcite's system properties to saffron.properties * Build fix * Resolve merge conflict * Fix IntelliJ inspection issue * Revisions from reviews Backed out a revision to Calcite tests that didn't work out as planned * Build fix * Fixed spelling errors * Fixed failed test Prepare now enforces security; before it did not. * Rebase and fix IntelliJ inspections issue * Clean up exception handling * Fix handling of JDBC auth errors * Build fix * More tweaks to security messages	2022-08-14 00:44:08 -07:00
AmatyaAvadhanula	d294404924	Kinesis ingestion with empty shards (#12792) Kinesis ingestion requires all shards to have at least 1 record at the required position in druid. Even if this is satisfied initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic. Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens, UNREAD_TRIM_HORIZON and UNREAD_LATEST, to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively. These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard. If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset. These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number. However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation: The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, a current UNREAD token indicates that any sequence number in the shard is valid (despite the ordering). Kinesis sequence numbers are inclusive, i.e., if current sequence == end sequence, there are more records left to read. However, the equality check is exclusive when dealing with UNREAD tokens.	2022-08-05 22:38:58 +05:30
Paul Rogers	a618458bf0	Tidy up construction of the Guice Injectors (#12816) * Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments	2022-08-04 00:05:07 -07:00
AmatyaAvadhanula	fbd1a07e7e	Fix kinesis IT flakiness (#12821)	2022-08-03 17:16:16 +05:30
Rohan Garg	eabce8a159	Fix flakiness in query-retry ITs (#12818)	2022-08-02 17:20:16 +05:30
Paul Rogers	d52abe7b38	Today is that day - Single pass through Calcite planner (#12636) * Druid planner now makes only one pass through Calcite planner Resolves the issue that required two parse/plan cycles: one for validate, another for plan. Creates a clone of the Calcite planner and validator to resolve the conflict that prevented the merger.	2022-07-29 18:53:21 -07:00
Paul Rogers	a8b155e9c6	Fixes for the Avatica JDBC driver (#12709) * Fixes for the Avatica JDBC driver Correctly implement regular and prepared statements Correctly implement result sets Fix race condition with contexts Clarify when parameters are used Prepare for single-pass through the planner * Addressed review comments * Addressed review comment	2022-07-27 15:22:40 -07:00
Rohan Garg	bb953be09b	Refactor usage of JoinableFactoryWrapper + more test coverage (#12767) Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled	2022-07-12 06:25:36 -07:00
Kashif Faraz	8dc4a155c7	Fix flaky IT: ITPerfectRollupParallelBatchIndexTest (#12737) * Increase worker.intermediaryPartitionTimeout in ITs to 30 mins * Update timeout to 60 mins * Remove timeout change from indexer	2022-07-09 17:15:51 +05:30
Maytas Monsereenusorn	1558ef471c	Add some debug tips for debugging peons (#12697) * add some debug tips * address comments * fix typo	2022-07-09 01:47:25 -07:00
Clint Wylie	bbbb6e1c3f	fix DruidSchema issue where datasources with no segments can become stuck in tables list indefinitely (#12727)	2022-07-01 18:54:01 -07:00
Abhishek Agarwal	dbd45daf33	Flakiness and exceptions during tests (#12705)	2022-06-28 10:36:23 +05:30
Paul Rogers	f83fab699e	Add IT-related changes pulled out of PR #12368 (#12673) This commit contains changes made to the existing ITs to support the new ITs. Changes: - Make the "custom node role" code usable by the new ITs. - Use flag `-DskipITs` to skip the integration tests but run unit tests. - Use flag `-DskipUTs` to skip unit tests but run the "new" integration tests. - Expand the existing Druid profile, `-P skip-tests`, to skip both ITs and UTs.	2022-06-26 02:13:59 +05:30
Jihoon Son	3d9e3dbad9	Fix hadoop library location for integration tests (#12497)	2022-06-23 10:39:54 -05:00
Tejaswini Bandlamudi	99e1b4efee	Update default value of `inputSegmentSizeBytes` in configuration docs (#12678)	2022-06-22 09:05:03 +05:30
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404) The web-console (indirectly) calls the Overlord's GET tasks API to fetch the tasks' summary, which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API). Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store, as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid). The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30
superivaj	f9bdb3b236	Fix usage of maxColumnsToMerge in auto-compaction tuning config (#12551) Issue: Even though `CompactionTuningConfig` allows a `maxColumnsToMerge` config (to optimize memory usage, particularly for datasources with many dimensions), the corresponding client object `ClientCompactionTaskQueryTuningConfig` (used by the coordinator duty `CompactSegments` to trigger auto-compaction) does not contain this field. Thus, the value of `maxColumnsToMerge` specified in any datasource compaction config is ignored. Changes: - Add field `maxColumnsToMerge` in `ClientCompactionTaskQueryTuningConfig` and `UserCompactionTaskQueryTuningConfig` - Fix tests	2022-05-20 22:23:08 +05:30


538 Commits