From 8651556df8c7bd36a99c55a05d70e13c40f4898b Mon Sep 17 00:00:00 2001
From: Chen <19492223+chenqi0805@users.noreply.github.com>
Date: Thu, 28 Apr 2022 19:42:00 -0500
Subject: [PATCH 1/3] MAINT: update docs
Signed-off-by: Chen <19492223+chenqi0805@users.noreply.github.com>
---
.../data-prepper/data-prepper-reference.md | 43 +++++++------
_clients/data-prepper/pipelines.md | 61 +++++++++++++++++++
2 files changed, 86 insertions(+), 18 deletions(-)
diff --git a/_clients/data-prepper/data-prepper-reference.md b/_clients/data-prepper/data-prepper-reference.md
index 8936e489..24366454 100644
--- a/_clients/data-prepper/data-prepper-reference.md
+++ b/_clients/data-prepper/data-prepper-reference.md
@@ -37,22 +37,23 @@ Sources define where your data comes from.
Source for the OpenTelemetry Collector.
-Option | Required | Type | Description
-:--- | :--- | :--- | :---
-port | No | Integer | The port OTel trace source is running on. Default is `21890`.
-request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
-health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
-proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
-unframed_requests | No | Boolean | Enable requests not framed using the gRPC wire protocol.
-thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
-max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
-ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
-sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
-sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
-useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
-acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
-awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
-authentication | No | Object| An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+Option | Required | Type | Description
+:--- |:--------------|:--------| :---
+port | No | Integer | The port OTel trace source is running on. Default is `21890`.
+request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
+health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
+proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
+unframed_requests | No | Boolean | Enable requests not framed using the gRPC wire protocol.
+thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
+max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
+ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
+sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
+sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
+useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
+acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
+awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
+authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+record_type | No | String | The record data type written into the buffer plugin. The value must be either `otlp` or `event`. Default is `otlp`.
- `otlp`: `otel_trace_source` writes each incoming `ExportTraceServiceRequest` into the buffer as a single record.
- `event`: `otel_trace_source` decodes each incoming `ExportTraceServiceRequest` into a collection of Data Prepper internal spans, each of which becomes a buffer item. For better performance in this mode, set the buffer capacity in proportion to the estimated number of spans in the incoming request payload.
### http_source
@@ -116,13 +117,19 @@ Prior to Data Prepper 1.3, Processors were named Preppers. Starting in Data Prep
### otel_trace_raw_prepper
-Converts OpenTelemetry data to OpenSearch-compatible JSON documents.
+Converts OpenTelemetry data to OpenSearch-compatible JSON documents and fills in the trace-group-related fields in those documents. It requires `record_type` to be set to `otlp` in `otel_trace_source`.
Option | Required | Type | Description
:--- | :--- | :--- | :---
-root_span_flush_delay | No | Integer | Represents the time interval in seconds to flush all the root spans in the processor together with their descendants. Default is 30.
trace_flush_interval | No | Integer | Represents the time interval in seconds to flush all the descendant spans without any root span. Default is 180.
+### otel_trace_raw
+
+This processor is the event-record-type-compatible version of `otel_trace_raw_prepper`. It fills in the trace-group-related fields in all incoming Data Prepper span records. It requires `record_type` to be set to `event` in `otel_trace_source`.
+
+Option | Required | Type | Description
+:--- | :--- | :--- | :---
+trace_flush_interval | No | Integer | Represents the time interval in seconds to flush all the descendant spans without any root span. Default is 180.
### service_map_stateful
diff --git a/_clients/data-prepper/pipelines.md b/_clients/data-prepper/pipelines.md
index b664d98a..8eb084a4 100644
--- a/_clients/data-prepper/pipelines.md
+++ b/_clients/data-prepper/pipelines.md
@@ -75,6 +75,8 @@ This example uses weak security. We strongly recommend securing all plugins whic
The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index trace and the service map documents for the dashboard plugin.
+#### Classic
+
```yml
entry-pipeline:
delay: "100"
@@ -115,6 +117,65 @@ service-map-pipeline:
trace_analytics_service_map: true
```
+#### Event record type
+
+Starting with Data Prepper 1.4, the trace analytics pipeline source, buffer, and processors support the event record type.
+
+```yml
+entry-pipeline:
+ delay: "100"
+ source:
+ otel_trace_source:
+ ssl: false
+ record_type: event
+ buffer:
+ bounded_blocking:
+ buffer_size: 10240
+ batch_size: 160
+ sink:
+ - pipeline:
+ name: "raw-pipeline"
+ - pipeline:
+ name: "service-map-pipeline"
+raw-pipeline:
+ source:
+ pipeline:
+ name: "entry-pipeline"
+ buffer:
+ bounded_blocking:
+ buffer_size: 10240
+ batch_size: 160
+ processor:
+ - otel_trace_raw:
+ sink:
+ - opensearch:
+ hosts: ["https://localhost:9200"]
+ insecure: true
+ username: admin
+ password: admin
+ trace_analytics_raw: true
+service-map-pipeline:
+ delay: "100"
+ source:
+ pipeline:
+ name: "entry-pipeline"
+ buffer:
+ bounded_blocking:
+ buffer_size: 10240
+ batch_size: 160
+ processor:
+ - service_map_stateful:
+ sink:
+ - opensearch:
+ hosts: ["https://localhost:9200"]
+ insecure: true
+ username: admin
+ password: admin
+ trace_analytics_service_map: true
+```
+
+We recommend scaling `buffer_size` and `batch_size` by the estimated maximum batch size in the client request payload to maintain ingestion throughput and latency similar to the [Classic](#classic) pipeline.
+
## Migrating from Logstash
Data Prepper supports Logstash configuration files for a limited set of plugins. Simply use the logstash config to run Data Prepper.
From de422a31420460632d7649711bc6149504bf7e18 Mon Sep 17 00:00:00 2001
From: Chen <19492223+chenqi0805@users.noreply.github.com>
Date: Fri, 29 Apr 2022 11:08:37 -0500
Subject: [PATCH 2/3] MAINT: add user recommendation
Signed-off-by: Chen <19492223+chenqi0805@users.noreply.github.com>
---
_clients/data-prepper/pipelines.md | 2 ++
1 file changed, 2 insertions(+)
diff --git a/_clients/data-prepper/pipelines.md b/_clients/data-prepper/pipelines.md
index 8eb084a4..3937b599 100644
--- a/_clients/data-prepper/pipelines.md
+++ b/_clients/data-prepper/pipelines.md
@@ -77,6 +77,8 @@ The following example demonstrates how to build a pipeline that supports the [Tr
#### Classic
+This pipeline definition will be deprecated in 2.0. We recommend using the [Event record type](#event-record-type) pipeline definition instead.
+
```yml
entry-pipeline:
delay: "100"
From a032aa13ae220b998c6e499dd1d86dad5e8b8911 Mon Sep 17 00:00:00 2001
From: Chen <19492223+chenqi0805@users.noreply.github.com>
Date: Fri, 29 Apr 2022 19:15:52 -0500
Subject: [PATCH 3/3] MAINT: update working
Signed-off-by: Chen <19492223+chenqi0805@users.noreply.github.com>
---
_clients/data-prepper/data-prepper-reference.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/_clients/data-prepper/data-prepper-reference.md b/_clients/data-prepper/data-prepper-reference.md
index 24366454..4d869bc1 100644
--- a/_clients/data-prepper/data-prepper-reference.md
+++ b/_clients/data-prepper/data-prepper-reference.md
@@ -52,7 +52,7 @@ sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the se
useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
-authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
record_type | No | String | The record data type written into the buffer plugin. The value must be either `otlp` or `event`. Default is `otlp`.
- `otlp`: `otel_trace_source` writes each incoming `ExportTraceServiceRequest` into the buffer as a single record.
- `event`: `otel_trace_source` decodes each incoming `ExportTraceServiceRequest` into a collection of Data Prepper internal spans, each of which becomes a buffer item. For better performance in this mode, set the buffer capacity in proportion to the estimated number of spans in the incoming request payload.
### http_source
@@ -66,7 +66,7 @@ request_timeout | No | Integer | The request timeout in millis. Default is `10_0
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
max_pending_requests | No | Integer | The maximum number of allowed tasks in ScheduledThreadPool work queue. Default is `1024`.
-authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
+authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
### file