druid

Commit Graph

Author	SHA1	Message	Date
Kashif Faraz	8ff1b2d5d4	Revert "Add filter in cloud object input source for backward compatibility (#13437)" (#13450) This reverts commit `b12e5f300e`.	2022-11-30 16:33:05 +05:30
Tejaswini Bandlamudi	b12e5f300e	Add filter in cloud object input source for backward compatibility (#13437) The PR https://github.com/apache/druid/pull/13027 replaces the `filter` parameter with `objectGlob` in the ingestion input source. However, this causes existing ingestion jobs that already use `filter` to fail. This PR adds the old filter functionality alongside objectGlob to preserve backward compatibility.	2022-11-28 23:04:33 +05:30
Didip Kerabat	56d5c9780d	Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects (#13027) * Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects. Removed: import org.apache.commons.io.FilenameUtils; Add: import java.nio.file.FileSystems; import java.nio.file.PathMatcher; import java.nio.file.Paths; * Forgot to update CloudObjectInputSource as well. * Fix tests. * Removed unused exceptions. * Able to reduce user mistakes, by removing the protocol and the bucket on filter. * add 1 more test. * add comment on filterWithoutProtocolAndBucket * Fix lint issue. * Fix another lint issue. * Replace all mention of filter -> objectGlob per convo here: https://github.com/apache/druid/pull/13027#issuecomment-1266410707 * fix 1 bad constructor. * Fix the documentation. * Don’t do anything clever with the object path. * Remove unused imports. * Fix spelling error. * Fix incorrect search and replace. * Addressing Gian’s comment. * add filename on .spelling * Fix documentation. * fix documentation again Co-authored-by: Didip Kerabat <didip@apple.com>	2022-11-10 23:46:40 -08:00
Abhishek Agarwal	e3f9a0ed44	Lazy initialization of segment killers, movers and archivers (#13170) * Lazy initialization of segment killers, movers and archivers * Add test for lazy killer * Add more tests * Intellij fixes	2022-10-04 15:55:46 +05:30
Karan Kumar	275f834b2a	Race in Task report/log streamer (#12931) * Fixing RACE in HTTP remote task Runner * Changes in the interface * Updating documentation * Adding test cases to SwitchingTaskLogStreamer * Adding more tests	2022-08-25 17:56:01 -07:00
Didip Kerabat	6ddb828c7a	Able to filter Cloud objects with glob notation. (#12659) In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable. Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where one of the JSON specs was 16MB, and that's simply too much for the Overlord. This patch allows users to pass {"filter": "*.parquet"} and lets Druid perform the filtering of the input files. I am using the glob notation to be consistent with the LocalFirehose syntax.	2022-06-24 11:40:08 +05:30
Karan Kumar	b94390ba33	Adding Shared Access resource support for azure (#12266) Azure Blob storage has multiple modes of authentication. One of them is Shared access resource. This is very useful in cases when we do not want to add the account key in the druid properties.	2022-02-22 18:27:43 +05:30
Jihoon Son	ab3d994a17	Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207) * working * Lazily load segmentKillers, segmentMovers, and segmentArchivers * more tests * test-jar plugin * more coverage * lazy client * clean up changes * checkstyle * i did not change the branch condition * adjust failure rate to run tests faster * javadocs * checkstyle	2022-02-08 13:02:06 -08:00
Gian Merlino	babf00f8e3	Migrate File.mkdirs to FileUtils.mkdirp. (#11879) * Migrate File.mkdirs to FileUtils.mkdirp. * Remove unused imports. * Fix LookupReferencesManager. * Simplify. * Also migrate usages of forceMkdir. * Fix var name. * Fix incorrect call. * Update test.	2021-11-09 11:10:49 -08:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507) Fixes #11297. Description: description and design are in the proposal #11297. Key changed/added classes in this PR: DataSegmentPusher, ShuffleClient, PartitionStat, PartitionLocation, *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
Jihoon Son	b5b3e6ecce	Add maxNumFiles to splitHintSpec (#10243) * Add maxNumFiles to splitHintSpec * missing link * fix build failure; use maxNumFiles for integration tests * spelling * lower default * Update docs/ingestion/native-batch.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * address comments; change default maxSplitSize * spelling * typos and doc * same change for segments splitHintSpec * fix build * fix build Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2020-08-21 09:43:58 -07:00
Suneet Saldanha	1ced3b33fb	IntelliJ inspections cleanup (#9339) * IntelliJ inspections cleanup * Standard Charset object can be used * Redundant Collection.addAll() call * String literal concatenation missing whitespace * Statement with empty body * Redundant Collection operation * StringBuilder can be replaced with String * Type parameter hides visible type * fix warnings in test code * more test fixes * remove string concatenation inspection error * fix extra curly brace * cleanup AzureTestUtils * fix charsets for RangerAdminClient * review comments	2020-04-10 10:04:40 -07:00
zachjsh	e855c7fe1b	Allow Cloud Deep Storage configs without segment bucket or path specified (#9588) * Allow Cloud SegmentKillers to be instantiated without segment bucket or path. This change fixes a recently introduced bug that causes ingestion to fail if data is ingested from one of the supported cloud storages (Azure, Google, S3) while the user is using another type of storage for deep storage. In this case all segment killer implementations are instantiated. A recent change forced a dependency between the SegmentKiller classes of the supported cloud storage types and the deep storage configuration for that storage type being set, which forced the deep storage bucket and prefix to be non-null. This caused a NullPointerException to be thrown when instantiating the SegmentKiller classes during ingestion. To fix this issue, the respective deep storage segment configs for the cloud storage types supported in druid are now allowed to have nullable bucket and prefix configurations. * Allow google deep storage bucket to be null	2020-04-01 11:57:32 -07:00
zachjsh	4870ad7b56	Azure deep storage does not work with datasource name containing non-ASCII chars (#9525) * Azure deep storage does not work with datasource name containing non-ASCII chars. Fixed a bug where recording the segment file location fails when using Azure Deep Storage, if the datasource has any special characters. * update jacoco thresholds * resolve merge conflicts * address review comments	2020-03-19 12:32:35 -07:00
zachjsh	b18dd2b7a9	Ability to Delete task logs and segments from Azure Storage (#9523) * Ability to Delete task logs and segments from Azure Storage * implement ability to delete all task logs or all task logs written before a particular date when written to Azure storage * implement ability to delete all segments from Azure deep storage * Address review comments	2020-03-18 17:59:17 -07:00
Jihoon Son	9466ac7c9b	Skip empty files for local, hdfs, and cloud input sources (#9450) * Skip empty files for local, hdfs, and cloud input sources * split hint spec doc * doc for skipping empty files * fix typo; adjust tests * unnecessary fluent iterable * address comments * fix test * use the right lists * fix test * fix test	2020-03-03 20:51:06 -08:00
zachjsh	d771b42ed1	Move Azure extension into Core (#9394) * Move Azure extension into Core. Moving the azure extension into Core. * Fix build failure * Add The MIT License (MIT) to list of compatible licenses * Address review comments * change reference to contrib azure to core azure * Fix spelling mistakes.	2020-02-25 17:49:16 -08:00

17 Commits
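
Commits 6ddb828c7a and 56d5c9780d describe filtering cloud objects with glob notation and, later, switching to the `java.nio.file` classes so that matching stops at folder boundaries. The following is a minimal sketch of how glob matching with `PathMatcher` behaves. It is not Druid's actual implementation: the class `GlobSketch`, the helper `matchesGlob`, and the sample object paths are hypothetical names chosen for illustration, and the sketch assumes a Unix-like default file system.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobSketch {
    // Hypothetical helper (not part of Druid): returns true when objectPath
    // matches the glob pattern under java.nio.file glob rules.
    public static boolean matchesGlob(String glob, String objectPath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(objectPath));
    }

    public static void main(String[] args) {
        // "*" matches within a single path component only.
        System.out.println(matchesGlob("*.parquet", "data.parquet"));
        // "*" does not cross a directory boundary.
        System.out.println(matchesGlob("*.parquet", "2022/data.parquet"));
        // "**" is allowed to cross directory boundaries.
        System.out.println(matchesGlob("**.parquet", "2022/11/data.parquet"));
    }
}
```

In this sketch a pattern such as `*.parquet` only matches objects at the top level, while `**.parquet` also matches objects in nested folders, which is one reading of "stop at the correct folder structure" in the commit message.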