From 8651556df8c7bd36a99c55a05d70e13c40f4898b Mon Sep 17 00:00:00 2001
From: Chen <19492223+chenqi0805@users.noreply.github.com>
Date: Thu, 28 Apr 2022 19:42:00 -0500
Subject: [PATCH 1/3] MAINT: update docs

Signed-off-by: Chen <19492223+chenqi0805@users.noreply.github.com>
---
 .../data-prepper/data-prepper-reference.md | 43 +++++++------
 _clients/data-prepper/pipelines.md         | 61 +++++++++++++++++++
 2 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/_clients/data-prepper/data-prepper-reference.md b/_clients/data-prepper/data-prepper-reference.md
index 8936e489..24366454 100644
--- a/_clients/data-prepper/data-prepper-reference.md
+++ b/_clients/data-prepper/data-prepper-reference.md
@@ -37,22 +37,23 @@ Sources define where your data comes from.
 
 Source for the OpenTelemetry Collector.
 
-Option | Required | Type | Description
-:--- | :--- | :--- | :---
-port | No | Integer | The port OTel trace source is running on. Default is `21890`.
-request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
-health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
-proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
-unframed_requests | No | Boolean | Enable requests not framed using the gRPC wire protocol.
-thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
-max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
-ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
-sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
-sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
-useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
-acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
-awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
-authentication | No | Object| An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+Option | Required | Type | Description
+:--- |:--------------|:--------| :---
+port | No | Integer | The port OTel trace source is running on. Default is `21890`.
+request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
+health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
+proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
+unframed_requests | No | Boolean | Enable requests not framed using the gRPC wire protocol.
+thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
+max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
+ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
+sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
+sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
+useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
+acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
+awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
+authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+record_type | No | String | A string represents the supported record data type that will be written into the buffer plugin. Its value takes either `otlp` or `event`. Default is `otlp`.
 
 ### http_source
 
@@ -116,13 +117,19 @@ Prior to Data Prepper 1.3, Processors were named Preppers. Starting in Data Prep
 
 ### otel_trace_raw_prepper
 
-Converts OpenTelemetry data to OpenSearch-compatible JSON documents.
+Converts OpenTelemetry data to OpenSearch-compatible JSON documents and fills in trace-group-related fields in those JSON documents. It requires `record_type` to be set to `otlp` in `otel_trace_source`.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
-root_span_flush_delay | No | Integer | Represents the time interval in seconds to flush all the root spans in the processor together with their descendants. Default is 30.
 trace_flush_interval | No | Integer | Represents the time interval in seconds to flush all the descendant spans without any root span. Default is 180.
 
+### otel_trace_raw
+
+This processor is a version of `otel_trace_raw_prepper` that is compatible with the Data Prepper event record type. It fills in trace-group-related fields in all incoming Data Prepper span records. It requires `record_type` to be set to `event` in `otel_trace_source`.
+
+Option | Required | Type | Description
+:--- | :--- | :--- | :---
+trace_flush_interval | No | Integer | Represents the time interval in seconds to flush all the descendant spans without any root span. Default is 180.
 
 ### service_map_stateful
 
diff --git a/_clients/data-prepper/pipelines.md b/_clients/data-prepper/pipelines.md
index b664d98a..8eb084a4 100644
--- a/_clients/data-prepper/pipelines.md
+++ b/_clients/data-prepper/pipelines.md
@@ -75,6 +75,8 @@ This example uses weak security. We strongly recommend securing all plugins whic
 
 The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index trace and the service map documents for the dashboard plugin.
 
+#### Classic
+
 ```yml
 entry-pipeline:
   delay: "100"
@@ -115,6 +117,65 @@ service-map-pipeline:
         trace_analytics_service_map: true
 ```
 
+#### Event record type
+
+Starting with Data Prepper 1.4, the trace analytics pipeline source, buffer, and processors support the event record type.
+
+```yml
+entry-pipeline:
+  delay: "100"
+  source:
+    otel_trace_source:
+      ssl: false
+      record_type: event
+  buffer:
+    bounded_blocking:
+      buffer_size: 10240
+      batch_size: 160
+  sink:
+    - pipeline:
+        name: "raw-pipeline"
+    - pipeline:
+        name: "service-map-pipeline"
+raw-pipeline:
+  source:
+    pipeline:
+      name: "entry-pipeline"
+  buffer:
+    bounded_blocking:
+      buffer_size: 10240
+      batch_size: 160
+  processor:
+    - otel_trace_raw:
+  sink:
+    - opensearch:
+        hosts: ["https://localhost:9200"]
+        insecure: true
+        username: admin
+        password: admin
+        trace_analytics_raw: true
+service-map-pipeline:
+  delay: "100"
+  source:
+    pipeline:
+      name: "entry-pipeline"
+  buffer:
+    bounded_blocking:
+      buffer_size: 10240
+      batch_size: 160
+  processor:
+    - service_map_stateful:
+  sink:
+    - opensearch:
+        hosts: ["https://localhost:9200"]
+        insecure: true
+        username: admin
+        password: admin
+        trace_analytics_service_map: true
+```
+
+We recommend scaling `buffer_size` and `batch_size` by the estimated maximum batch size of the client request payload to maintain ingestion throughput and latency similar to the [Classic](#classic) pipeline.
+
 ## Migrating from Logstash
 
 Data Prepper supports Logstash configuration files for a limited set of plugins. Simply use the logstash config to run Data Prepper.

From de422a31420460632d7649711bc6149504bf7e18 Mon Sep 17 00:00:00 2001
From: Chen <19492223+chenqi0805@users.noreply.github.com>
Date: Fri, 29 Apr 2022 11:08:37 -0500
Subject: [PATCH 2/3] MAINT: add user recommendation

Signed-off-by: Chen <19492223+chenqi0805@users.noreply.github.com>
---
 _clients/data-prepper/pipelines.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/_clients/data-prepper/pipelines.md b/_clients/data-prepper/pipelines.md
index 8eb084a4..3937b599 100644
--- a/_clients/data-prepper/pipelines.md
+++ b/_clients/data-prepper/pipelines.md
@@ -77,6 +77,8 @@ The following example demonstrates how to build a pipeline that supports the [Tr
 
 #### Classic
 
+This pipeline definition will be deprecated in Data Prepper 2.0. We recommend using the [Event record type](#event-record-type) pipeline definition instead.
+
 ```yml
 entry-pipeline:
   delay: "100"

From a032aa13ae220b998c6e499dd1d86dad5e8b8911 Mon Sep 17 00:00:00 2001
From: Chen <19492223+chenqi0805@users.noreply.github.com>
Date: Fri, 29 Apr 2022 19:15:52 -0500
Subject: [PATCH 3/3] MAINT: update working

Signed-off-by: Chen <19492223+chenqi0805@users.noreply.github.com>
---
 _clients/data-prepper/data-prepper-reference.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_clients/data-prepper/data-prepper-reference.md b/_clients/data-prepper/data-prepper-reference.md
index 24366454..4d869bc1 100644
--- a/_clients/data-prepper/data-prepper-reference.md
+++ b/_clients/data-prepper/data-prepper-reference.md
@@ -52,7 +52,7 @@ sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the se
 useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
 acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
 awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
-authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
+authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin which implements: [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
 record_type | No | String | A string represents the supported record data type that will be written into the buffer plugin. Its value takes either `otlp` or `event`. Default is `otlp`.
 
 ### http_source
@@ -66,7 +66,7 @@ request_timeout | No | Integer | The request timeout in millis. Default is `10_0
 thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
 max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
 max_pending_requests | No | Integer | The maximum number of allowed tasks in ScheduledThreadPool work queue. Default is `1024`.
-authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication define the `http_basic` plugin with a `username` and `password`. To provide customer authentication use or create a plugin which implements: [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
+authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin which implements: [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
 
 ### file
 
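
Note on the `authentication` rows touched in PATCH 3/3: the description names the `http_basic` plugin and its `username` and `password` settings but shows no configuration. A minimal sketch of how that might look on `otel_trace_source` follows. The nesting of `http_basic` under `authentication` is an assumption drawn from the option description, not something this patch confirms, and the credentials are placeholders.

```yml
# Source fragment only; buffer, processor, and sink sections are omitted.
# Assumption: the http_basic plugin is nested directly under authentication.
source:
  otel_trace_source:
    record_type: event
    authentication:
      http_basic:
        username: my-user      # placeholder, not a default value
        password: my_s3cr3t    # placeholder, not a default value
```

With no `authentication` block at all, the option table says the source creates an unauthenticated server for the pipeline.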