druid

Commit Graph

Author	SHA1	Message	Date
Didip Kerabat	6ddb828c7a	Able to filter Cloud objects with glob notation. (#12659 ) In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable. Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord. This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files. I am using the glob notation to be consistent with the LocalFirehose syntax.	2022-06-24 11:40:08 +05:30
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
Jihoon Son	ab3d994a17	Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207 ) * working * Lazily load segmentKillers, segmentMovers, and segmentArchivers * more tests * test-jar plugin * more coverage * lazy client * clean up changes * checkstyle * i did not change the branch condition * adjust failure rate to run tests faster * javadocs * checkstyle	2022-02-08 13:02:06 -08:00
Gian Merlino	babf00f8e3	Migrate File.mkdirs to FileUtils.mkdirp. (#11879 ) * Migrate File.mkdirs to FileUtils.mkdirp. * Remove unused imports. * Fix LookupReferencesManager. * Simplify. * Also migrate usages of forceMkdir. * Fix var name. * Fix incorrect call. * Update test.	2021-11-09 11:10:49 -08:00
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description Description and design in the proposal #11297 Key changed/added classes in this PR DataSegmentPusher ShuffleClient PartitionStat PartitionLocation *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
Xavier Léauté	a1c20d7457	update jackson dependencies to use bom (#11353 ) Switching to the bom dependency declaration simplifies managing jackson dependencies. It also removes the need to override individual library versions for CVE fixes, since the bom takes care of that internally. This change aligns our jackson dependency versions on 2.10.5(.x): - updates jackson libraries from 2.10.2 to 2.10.5 - jackson-databind remains at 2.10.5.1 as defined in the bom Release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10	2021-06-16 18:37:30 -07:00
frank chen	1d79ca906a	Bump aliyun SDK to 3.11.3 (#11044 )	2021-03-29 20:33:24 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Jihoon Son	b5b3e6ecce	Add maxNumFiles to splitHintSpec (#10243 ) * Add maxNumFiles to splitHintSpec * missing link * fix build failure; use maxNumFiles for integration tests * spelling * lower default * Update docs/ingestion/native-batch.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * address comments; change default maxSplitSize * spelling * typos and doc * same change for segments splitHintSpec * fix build * fix build Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2020-08-21 09:43:58 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
frank chen	60c6bd5b4c	support Aliyun OSS service as deep storage (#9898 ) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>	2020-07-01 22:20:53 -07:00

14 Commits