From cc21df7ee86f4d4482b15333acd972fdac1a8ea6 Mon Sep 17 00:00:00 2001
From: Asif Sohail Mohammed
Date: Tue, 22 Aug 2023 15:28:13 -0500
Subject: [PATCH] S3 scan doc update (#4848)

* Updated s3 scan docs

Signed-off-by: Asif Sohail Mohammed

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Asif Sohail Mohammed
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../pipelines/configuration/sources/s3.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md
index 23e224ce..5624bed4 100644
--- a/_data-prepper/pipelines/configuration/sources/s3.md
+++ b/_data-prepper/pipelines/configuration/sources/s3.md
@@ -187,27 +187,27 @@ The following parameters allow you to scan S3 objects. All options can be config
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
-`start_time` | No | String | The start of the time range during which to scan objects from all the buckets. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`, or it can be configured to the keyword that represents the current `LocalDateTime`, such as `now`. To define a time range, use `start_time` with either `end_time` or `range`.
-`end_time` | No | String | The end of the time range during which to scan objects from all the buckets. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`, or it can be configured to the keyword that represents the current `LocalDateTime`, such as `now`. To define a time range, use `end_time` with either `start_time` or `range`.
-`range` | No | String | The time range from which objects are scanned from all buckets. If configured with `start_time`, defines a last modified time range with `start_time` + `range`. If configured with `end_time`, defines a last modified time range with `end_time` - `range`. Supports ISO_8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`).
+`start_time` | No | String | The time from which to start scanning objects. Only objects modified after the given `start_time` are scanned. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `end_time` is configured along with `start_time`, all objects modified after `start_time` and before `end_time` are processed. `start_time` and `range` cannot be used together.
+`end_time` | No | String | The time after which no objects are scanned. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `start_time` is configured along with `end_time`, all objects modified after `start_time` and before `end_time` are processed. `end_time` and `range` cannot be used together.
+`range` | No | String | The time range from which objects are scanned from all buckets. Supports ISO_8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`). `start_time` and `end_time` cannot be used with `range`. For example, a `range` of `PT12H` scans all objects modified in the 12 hours before the pipeline started.
 `buckets` | Yes | List | A list of [buckets](#bucket) to scan.
 `scheduling` | No | List | The configuration for scheduling periodic scans on all buckets. `start_time`, `end_time` and `range` can not be used if scheduling is configured.
 
 ### bucket
 
 Option | Required | Type | Description
-:--- | :--- | :--- | :---
-`bucket` | Yes | String list | Provides options for each bucket.
+:--- | :--- |:-----| :---
+`bucket` | Yes | Map | Provides options for each bucket.
 
-You can configure the following options inside the bucket setting.
+You can configure the following options inside the [bucket](#bucket) setting.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
 `name` | Yes | String | The string representing the S3 bucket name to scan.
 `filter` | No | [Filter](#filter) | Provides the filter configuration.
-`start_time` | No | String | The start of the time range during which to scan objects from all the buckets. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`, or it can be configured to the keyword that represents the current `LocalDateTime`, such as `now`. To define a time range, use `start_time` with either `end_time` or `range`.
-`end_time` | No | String | The end of the time range during which to scan objects from all the buckets. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`, or it can be configured to the keyword that represents the current `LocalDateTime`, such as `now`. To define a time range, use `end_time` with either `start_time` or `range`.
-`range` | No | String | The time range from which objects are scanned from all buckets. If configured with `start_time`, defines a last modified time range with `start_time` + `range`. If configured with `end_time`, defines a last modified time range with `end_time` - `range`. Supports ISO_8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`).
+`start_time` | No | String | The time from which to start scanning objects. Only objects modified after the given `start_time` are scanned. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `end_time` is configured along with `start_time`, all objects modified after `start_time` and before `end_time` are processed. `start_time` and `range` cannot be used together. This overrides the `start_time` at the [scan](#scan) level.
+`end_time` | No | String | The time after which no objects are scanned. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `start_time` is configured along with `end_time`, all objects modified after `start_time` and before `end_time` are processed. This overrides the `end_time` at the [scan](#scan) level.
+`range` | No | String | The time range from which objects are scanned from all buckets. Supports ISO_8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`). `start_time` and `end_time` cannot be used with `range`. For example, a `range` of `PT12H` scans all objects modified in the 12 hours before the pipeline started. This overrides the `range` at the [scan](#scan) level.
 
 ### filter