Merge pull request #330 from opensearch-project/data-prepper-1.2
Data prepper 1.2
commit
4d9dc70a43
|
@ -48,7 +48,7 @@ collections:
|
|||
replication-plugin:
|
||||
permalink: /:collection/:path/
|
||||
output: true
|
||||
observability-plugins:
|
||||
observability:
|
||||
permalink: /:collection/:path/
|
||||
output: true
|
||||
monitoring-plugins:
|
||||
|
@ -90,8 +90,8 @@ just_the_docs:
|
|||
replication-plugin:
|
||||
name: Replication plugin
|
||||
nav_fold: true
|
||||
observability-plugins:
|
||||
name: Observability plugins
|
||||
observability:
|
||||
name: Observability
|
||||
nav_fold: true
|
||||
monitoring-plugins:
|
||||
name: Monitoring plugins
|
||||
|
|
|
@ -1,26 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: About Observability
|
||||
nav_order: 1
|
||||
has_children: false
|
||||
redirect_from:
|
||||
- /observability-plugins/
|
||||
---
|
||||
|
||||
# About Observability
|
||||
OpenSearch Dashboards
|
||||
{: .label .label-yellow :}
|
||||
|
||||
The Observability plugins are a collection of plugins that let you visualize data-driven events by using Piped Processing Language to explore, discover, and query data stored in OpenSearch.
|
||||
|
||||
Your experience of exploring data might differ, but if you're new to exploring data to create visualizations, we recommend trying a workflow like the following:
|
||||
|
||||
1. Explore data over a certain timeframe using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugins/ppl/index).
|
||||
1. Use [event analytics]({{site.url}}{{site.baseurl}}/observability-plugins/event-analytics) to turn data-driven events into visualizations.
|
||||
![Sample Event Analytics View]({{site.url}}{{site.baseurl}}/images/event-analytics.png)
|
||||
1. Create [operational panels]({{site.url}}{{site.baseurl}}/observability-plugins/operational-panels) and add visualizations to compare data the way you like.
|
||||
![Sample Operational Panel View]({{site.url}}{{site.baseurl}}/images/operational-panel.png)
|
||||
1. Use [trace analytics]({{site.url}}{{site.baseurl}}/observability-plugins/trace/index) to create traces and dive deep into your data.
|
||||
![Sample Trace Analytics View]({{site.url}}{{site.baseurl}}/images/observability-trace.png)
|
||||
1. Leverage [notebooks]({{site.url}}{{site.baseurl}}/observability-plugins/notebooks) to combine different visualizations and code blocks that you can share with team members.
|
||||
![Sample Notebooks View]({{site.url}}{{site.baseurl}}/images/notebooks.png)
|
|
@ -1,195 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: Configuration reference
|
||||
parent: Trace analytics
|
||||
nav_order: 25
|
||||
---
|
||||
|
||||
# Data Prepper configuration reference
|
||||
|
||||
This page lists all supported Data Prepper sources, buffers, preppers, and sinks, along with their associated options. For example configuration files, see [Data Prepper]({{site.url}}{{site.baseurl}}/observability-plugins/trace/data-prepper/).
|
||||
|
||||
|
||||
## Data Prepper server options
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
ssl | No | Boolean, indicating whether TLS should be used for server APIs. Defaults to true.
|
||||
keyStoreFilePath | No | String, path to a .jks or .p12 keystore file. Required if ssl is true.
|
||||
keyStorePassword | No | String, password for keystore. Optional, defaults to empty string.
|
||||
privateKeyPassword | No | String, password for private key within keystore. Optional, defaults to empty string.
|
||||
serverPort | No | Integer, port number to use for server APIs. Defaults to 4900.
|
||||
|
||||
|
||||
## General pipeline options
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
workers | No | Integer, default 1. Essentially the number of application threads. As a starting point for your use case, try setting this value to the number of CPU cores on the machine.
|
||||
delay | No | Integer (milliseconds), default 3,000. How long workers wait between buffer read attempts.
|
||||
|
||||
|
||||
## Sources
|
||||
|
||||
Sources define where your data comes from.
|
||||
|
||||
|
||||
### otel_trace_source
|
||||
|
||||
Source for the OpenTelemetry Collector.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
port | No | Integer, the port OTel trace source is running on. Default is `21890`.
|
||||
request_timeout | No | Integer, the request timeout in milliseconds. Default is `10_000`.
|
||||
health_check_service | No | Boolean, enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
|
||||
proto_reflection_service | No | Boolean, enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
|
||||
unframed_requests | No | Boolean, enable requests not framed using the gRPC wire protocol.
|
||||
thread_count | No | Integer, the number of threads to keep in the ScheduledThreadPool. Default is `200`.
|
||||
max_connection_count | No | Integer, the maximum allowed number of open connections. Default is `500`.
|
||||
ssl | No | Boolean, enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
|
||||
sslKeyCertChainFile | Conditionally | String, file-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
|
||||
sslKeyFile | Conditionally | String, file-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
|
||||
useAcmCertForSSL | No | Boolean, enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
|
||||
acmCertificateArn | Conditionally | String, represents the ACM certificate ARN. An ACM certificate takes precedence over an S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
|
||||
awsRegion | Conditionally | String, represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
|
||||
|
||||
|
||||
### file
|
||||
|
||||
Source for flat file input.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
path | Yes | String, path to the input file (e.g. `logs/my-log.log`).
|
||||
|
||||
|
||||
### pipeline
|
||||
|
||||
Source for reading from another pipeline.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
name | Yes | String, name of the pipeline to read from.
|
||||
|
||||
|
||||
### stdin
|
||||
|
||||
Source for console input. Can be useful for testing. No options.
|
||||
|
||||
|
||||
## Buffers
|
||||
|
||||
Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory-based (better performance) or disk-based (larger).
|
||||
|
||||
|
||||
### bounded_blocking
|
||||
|
||||
The default buffer. Memory-based.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
buffer_size | No | Integer, default 512. The maximum number of records the buffer accepts.
|
||||
batch_size | No | Integer, default 8. The maximum number of records the buffer drains after each read.
|
||||
|
||||
|
||||
## Preppers
|
||||
|
||||
Preppers perform some action on your data: filter, transform, enrich, etc.
|
||||
|
||||
|
||||
### otel_trace_raw_prepper
|
||||
|
||||
Converts OpenTelemetry data to OpenSearch-compatible JSON documents.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
root_span_flush_delay | No | Integer, representing the time interval in seconds to flush all the root spans in the prepper together with their descendants. Defaults to 30.
|
||||
trace_flush_interval | No | Integer, representing the time interval in seconds to flush all the descendant spans without any root span. Defaults to 180.
|
||||
|
||||
|
||||
### service_map_stateful
|
||||
|
||||
Uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
window_duration | No | Integer, representing the fixed time window in seconds to evaluate service-map relationships. Defaults to 180.
|
||||
|
||||
### peer_forwarder
|
||||
|
||||
Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
time_out | No | Integer, forwarded request timeout in seconds. Defaults to 3 seconds.
|
||||
span_agg_count | No | Integer, batch size for number of spans per request. Defaults to 48.
|
||||
target_port | No | Integer, the destination port to forward requests to. Defaults to `21890`.
|
||||
discovery_mode | No | String, peer discovery mode to be used. Allowable values are `static`, `dns`, and `aws_cloud_map`. Defaults to `static`.
|
||||
static_endpoints | No | List, containing string endpoints of all Data Prepper instances.
|
||||
domain_name | No | String, single domain name to query DNS against. Typically used by creating multiple DNS A Records for the same domain.
|
||||
ssl | No | Boolean, indicating whether TLS should be used. Default is true.
|
||||
awsCloudMapNamespaceName | Conditionally | String, name of your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
|
||||
awsCloudMapServiceName | Conditionally | String, service name within your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
|
||||
sslKeyCertChainFile | Conditionally | String, represents the SSL certificate chain file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to `true`.
|
||||
useAcmCertForSSL | No | Boolean, enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
|
||||
awsRegion | Conditionally | String, represents the AWS region to use ACM, S3, or CloudMap. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
|
||||
acmCertificateArn | Conditionally | String, represents the ACM certificate ARN. An ACM certificate takes precedence over an S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
|
||||
|
||||
### string_converter
|
||||
|
||||
Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own prepper.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
upper_case | No | Boolean, whether to convert to uppercase (`true`) or lowercase (`false`).
|
||||
|
||||
|
||||
## Sinks
|
||||
|
||||
Sinks define where Data Prepper writes your data to.
|
||||
|
||||
|
||||
### opensearch
|
||||
|
||||
Sink for an OpenSearch cluster.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
hosts | Yes | List of OpenSearch hosts to write to (e.g. `["https://localhost:9200", "https://remote-cluster:9200"]`).
|
||||
cert | No | String, path to the security certificate (e.g. `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin.
|
||||
username | No | String, username for HTTP basic authentication.
|
||||
password | No | String, password for HTTP basic authentication.
|
||||
aws_sigv4 | No | Boolean, whether to use IAM signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credentials`, etc.).
|
||||
aws_region | No | String, AWS region (e.g. `"us-east-1"`) for the domain if you are connecting to Amazon OpenSearch Service.
|
||||
aws_sts_role | No | String, IAM role that the sink plugin assumes to sign requests to Amazon OpenSearch Service. If not provided, the plugin uses the default credentials.
|
||||
trace_analytics_raw | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics OpenSearch Dashboards plugin.
|
||||
trace_analytics_service_map | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics OpenSearch Dashboards plugin.
|
||||
index | No | String, name of the index to export to. Only required if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
|
||||
template_file | No | String, the path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (e.g. `/your/local/template-file.json`) if you do not use the `trace_analytics_raw` or `trace_analytics_service_map` presets. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
|
||||
document_id_field | No | String, the field from the source data to use for the OpenSearch document ID (e.g. `"my-field"`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
|
||||
dlq_file | No | String, the path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster.
|
||||
bulk_size | No | Integer (long), default 5. The maximum size (in MiB) of bulk requests to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually.
|
||||
|
||||
|
||||
### file
|
||||
|
||||
Sink for flat file output.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
path | Yes | String, path for the output file (e.g. `logs/my-transformed-log.log`).
|
||||
|
||||
|
||||
### pipeline
|
||||
|
||||
Sink for writing to another pipeline.
|
||||
|
||||
Option | Required | Description
|
||||
:--- | :--- | :---
|
||||
name | Yes | String, name of the pipeline to write to.
|
||||
|
||||
|
||||
### stdout
|
||||
|
||||
Sink for console output. Can be useful for testing. No options.
|
|
@ -1,135 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: Data Prepper
|
||||
parent: Trace analytics
|
||||
nav_order: 20
|
||||
---
|
||||
|
||||
# Data Prepper
|
||||
|
||||
Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
|
||||
|
||||
|
||||
## Install Data Prepper
|
||||
|
||||
To use the Docker image, pull it like any other image:
|
||||
|
||||
```bash
|
||||
docker pull opensearchproject/data-prepper:latest
|
||||
```
|
||||
|
||||
Otherwise, [download](https://opensearch.org/downloads.html) the appropriate archive for your operating system and unzip it.
|
||||
|
||||
|
||||
## Configure pipelines
|
||||
|
||||
To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks:
|
||||
|
||||
```yml
|
||||
sample-pipeline:
|
||||
workers: 4 # the number of workers
|
||||
delay: 100 # in milliseconds, how long workers wait between read attempts
|
||||
source:
|
||||
otel_trace_source:
|
||||
ssl: true
|
||||
sslKeyCertChainFile: "config/demo-data-prepper.crt"
|
||||
sslKeyFile: "config/demo-data-prepper.key"
|
||||
buffer:
|
||||
bounded_blocking:
|
||||
buffer_size: 1024 # max number of records the buffer accepts
|
||||
batch_size: 256 # max number of records the buffer drains after each read
|
||||
prepper:
|
||||
- otel_trace_raw_prepper:
|
||||
sink:
|
||||
- opensearch:
|
||||
hosts: ["https://localhost:9200"]
|
||||
cert: "config/root-ca.pem"
|
||||
username: "ta-user"
|
||||
password: "ta-password"
|
||||
trace_analytics_raw: true
|
||||
```
|
||||
|
||||
- Sources define where your data comes from. In this case, the source is the OpenTelemetry Collector (`otel_trace_source`) with some optional SSL settings.
|
||||
|
||||
- Buffers store data as it passes through the pipeline.
|
||||
|
||||
By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.
|
||||
|
||||
- Preppers perform some action on your data: filter, transform, enrich, etc.
|
||||
|
||||
You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `otel_trace_raw_prepper` prepper converts OpenTelemetry data into OpenSearch-compatible JSON documents.
|
||||
|
||||
- Sinks define where your data goes. In this case, the sink is an OpenSearch cluster.
|
||||
|
||||
Pipelines can act as the source for other pipelines. In the following example, a pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks:
|
||||
|
||||
```yml
|
||||
entry-pipeline:
|
||||
delay: "100"
|
||||
source:
|
||||
otel_trace_source:
|
||||
ssl: true
|
||||
sslKeyCertChainFile: "config/demo-data-prepper.crt"
|
||||
sslKeyFile: "config/demo-data-prepper.key"
|
||||
sink:
|
||||
- pipeline:
|
||||
name: "raw-pipeline"
|
||||
- pipeline:
|
||||
name: "service-map-pipeline"
|
||||
raw-pipeline:
|
||||
source:
|
||||
pipeline:
|
||||
name: "entry-pipeline"
|
||||
prepper:
|
||||
- otel_trace_raw_prepper:
|
||||
sink:
|
||||
- opensearch:
|
||||
hosts: ["https://localhost:9200"]
|
||||
cert: "config/root-ca.pem"
|
||||
username: "ta-user"
|
||||
password: "ta-password"
|
||||
trace_analytics_raw: true
|
||||
service-map-pipeline:
|
||||
delay: "100"
|
||||
source:
|
||||
pipeline:
|
||||
name: "entry-pipeline"
|
||||
prepper:
|
||||
- service_map_stateful:
|
||||
sink:
|
||||
- opensearch:
|
||||
hosts: ["https://localhost:9200"]
|
||||
cert: "config/root-ca.pem"
|
||||
username: "ta-user"
|
||||
password: "ta-password"
|
||||
trace_analytics_service_map: true
|
||||
```
|
||||
|
||||
To learn more, see the [Data Prepper configuration reference]({{site.url}}{{site.baseurl}}/observability-plugins/trace/data-prepper-reference/).
|
||||
|
||||
## Configure the Data Prepper server
|
||||
Data Prepper itself provides administrative HTTP endpoints such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port which serves these endpoints, as well as TLS configuration, is specified by a separate YAML file. Example:
|
||||
|
||||
```yml
|
||||
ssl: true
|
||||
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
|
||||
keyStorePassword: "password"
|
||||
privateKeyPassword: "other_password"
|
||||
serverPort: 1234
|
||||
```
|
||||
|
||||
## Start Data Prepper
|
||||
|
||||
**Docker**
|
||||
|
||||
```bash
|
||||
docker run --name data-prepper --expose 21890 -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:latest
|
||||
```
|
||||
|
||||
**macOS and Linux**
|
||||
|
||||
```bash
|
||||
./data-prepper-tar-install.sh config/pipelines.yaml config/data-prepper-config.yaml
|
||||
```
|
||||
|
||||
For production workloads, you likely want to run Data Prepper on a dedicated machine, which makes connectivity a concern. Data Prepper uses port 21890 and must be able to connect to both the OpenTelemetry Collector and the OpenSearch cluster. In the [sample applications](https://github.com/opensearch-project/Data-Prepper/tree/main/examples), you can see that all components use the same Docker network and expose the appropriate ports.
|
|
@ -0,0 +1,231 @@
|
|||
---
|
||||
layout: default
|
||||
title: Configuration reference
|
||||
parent: Data Prepper
|
||||
nav_order: 3
|
||||
---
|
||||
|
||||
# Data Prepper configuration reference
|
||||
|
||||
This page lists all supported Data Prepper server options, sources, buffers, preppers, and sinks, along with their associated options. For example configuration files, see [Data Prepper]({{site.url}}{{site.baseurl}}/observability/data-prepper/pipelines/).
|
||||
|
||||
## Data Prepper server options
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
ssl | No | Boolean | Indicates whether TLS should be used for server APIs. Defaults to true.
|
||||
keyStoreFilePath | No | String | Path to a .jks or .p12 keystore file. Required if ssl is true.
|
||||
keyStorePassword | No | String | Password for keystore. Optional, defaults to empty string.
|
||||
privateKeyPassword | No | String | Password for private key within keystore. Optional, defaults to empty string.
|
||||
serverPort | No | Integer | Port number to use for server APIs. Defaults to 4900.
|
||||
metricRegistries | No | List | Metrics registries for publishing the generated metrics. Currently supports Prometheus and CloudWatch. Defaults to Prometheus.
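
As a rough illustration of the server options above, a server configuration file might look like the following sketch. The keystore path and passwords are placeholders, and the exact spelling of the registry name (`Prometheus`) is an assumption based on the table description, so check it against your Data Prepper version.

```yml
# Sketch of a Data Prepper server configuration file (data-prepper-config.yaml).
# All values are placeholders for illustration, not recommendations.
ssl: true
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"  # .jks or .p12 keystore
keyStorePassword: "changeit"      # placeholder
privateKeyPassword: "changeit"    # placeholder
serverPort: 4900                  # the documented default
metricRegistries: [Prometheus]    # assumed spelling of the registry name
```

Because `ssl` defaults to true, a keystore path is needed unless you explicitly set `ssl: false`.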
|
||||
|
||||
## General pipeline options
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
workers | No | Integer | Essentially the number of application threads. As a starting point for your use case, try setting this value to the number of CPU cores on the machine. Default is 1.
|
||||
delay | No | Integer | Amount of time in milliseconds workers wait between buffer read attempts. Default is 3,000.
|
||||
|
||||
|
||||
## Sources
|
||||
|
||||
Sources define where your data comes from.
|
||||
|
||||
|
||||
### otel_trace_source
|
||||
|
||||
Source for the OpenTelemetry Collector.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
port | No | Integer | The port OTel trace source is running on. Default is `21890`.
|
||||
request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
|
||||
health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
|
||||
proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
|
||||
unframed_requests | No | Boolean | Enable requests not framed using the gRPC wire protocol.
|
||||
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
|
||||
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
|
||||
ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
|
||||
sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if ssl is set to `true`.
|
||||
sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if ssl is set to `true`.
|
||||
useAcmCertForSSL | No | Boolean | Enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
|
||||
acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. An ACM certificate takes precedence over an S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
|
||||
awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
|
||||
authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
|
||||
|
||||
### http_source
|
||||
|
||||
A source plugin that supports the HTTP protocol. Currently, it supports only the JSON UTF-8 codec for incoming requests, e.g. `[{"key1": "value1"}, {"key2": "value2"}]`.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
port | No | Integer | The port the source is running on. Default is `2021`. Valid options are between `0` and `65535`.
|
||||
request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
|
||||
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
|
||||
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
|
||||
max_pending_requests | No | Integer | The maximum number of allowed tasks in ScheduledThreadPool work queue. Default is `1024`.
|
||||
authentication | No | Object | An authentication configuration. By default, this runs an unauthenticated server. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
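
The following is a minimal sketch of an HTTP source protected with basic authentication, as described in the `authentication` row. The pipeline name, username, and password are hypothetical, and the plugin key `http` is an assumption (the heading above refers to the plugin as `http_source`), so verify the key against your Data Prepper version.

```yml
log-pipeline:                    # hypothetical pipeline name
  source:
    http:                        # assumed plugin key; the heading calls it http_source
      port: 2021                 # the documented default
      authentication:
        http_basic:
          username: my-user      # placeholder
          password: my_s3cr3t    # placeholder
  sink:
    - stdout:                    # console output, convenient for testing
```

With a configuration along these lines, clients would need to send matching credentials with each request; leaving out `authentication` runs an unauthenticated server.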
|
||||
|
||||
### file
|
||||
|
||||
Source for flat file input.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
path | Yes | String | Path to the input file (e.g. `logs/my-log.log`).
|
||||
format | No | String | Format of each line in the file. Valid options are `json` or `plain`. Default is `plain`.
|
||||
record_type | No | String | The record type that will be stored. Valid options are `string` or `event`. Default is `string`. If you would like to use the file source for log analytics use cases like grok, set this option to `event`.
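
As an illustration of the `format` and `record_type` options, the sketch below reads JSON lines from a file and stores them as events, which is the setting the table suggests for log analytics use cases. The pipeline name is hypothetical.

```yml
file-pipeline:                   # hypothetical pipeline name
  source:
    file:
      path: logs/my-log.log      # input file
      format: json               # each line is parsed as JSON (default is plain)
      record_type: event         # store records as events rather than strings
  sink:
    - stdout:                    # console output, convenient for testing
```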
|
||||
|
||||
### pipeline
|
||||
|
||||
Source for reading from another pipeline.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
name | Yes | String | Name of the pipeline to read from.
|
||||
|
||||
|
||||
### stdin
|
||||
|
||||
Source for console input. Can be useful for testing. No options.
|
||||
|
||||
|
||||
## Buffers
|
||||
|
||||
Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory-based (better performance) or disk-based (larger).
|
||||
|
||||
|
||||
### bounded_blocking
|
||||
|
||||
The default buffer. Memory-based.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
buffer_size | No | Integer | The maximum number of records the buffer accepts. Default is 512.
|
||||
batch_size | No | Integer | The maximum number of records the buffer drains after each read. Default is 8.
|
||||
|
||||
|
||||
## Preppers
|
||||
|
||||
Preppers perform some action on your data: filter, transform, enrich, etc.
|
||||
|
||||
|
||||
### otel_trace_raw_prepper
|
||||
|
||||
Converts OpenTelemetry data to OpenSearch-compatible JSON documents.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
root_span_flush_delay | No | Integer | Represents the time interval in seconds to flush all the root spans in the prepper together with their descendants. Default is 30.
|
||||
trace_flush_interval | No | Integer | Represents the time interval in seconds to flush all the descendant spans without any root span. Default is 180.
|
||||
|
||||
|
||||
### service_map_stateful
|
||||
|
||||
Uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
window_duration | No | Integer | Represents the fixed time window in seconds to evaluate service-map relationships. Default is 180.
|
||||
|
||||
### peer_forwarder
|
||||
|
||||
Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
time_out | No | Integer | Forwarded request timeout in seconds. Defaults to 3 seconds.
|
||||
span_agg_count | No | Integer | Batch size for number of spans per request. Defaults to 48.
|
||||
target_port | No | Integer | The destination port to forward requests to. Defaults to `21890`.
|
||||
discovery_mode | No | String | Peer discovery mode to be used. Allowable values are `static`, `dns`, and `aws_cloud_map`. Defaults to `static`.
|
||||
static_endpoints | No | List | List containing string endpoints of all Data Prepper instances.
|
||||
domain_name | No | String | Single domain name to query DNS against. Typically used by creating multiple DNS A Records for the same domain.
|
||||
ssl | No | Boolean | Indicates whether TLS should be used. Default is true.
|
||||
awsCloudMapNamespaceName | Conditionally | String | Name of your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
|
||||
awsCloudMapServiceName | Conditionally | String | Service name within your CloudMap Namespace. Required if `discovery_mode` is set to `aws_cloud_map`.
|
||||
sslKeyCertChainFile | Conditionally | String | Represents the SSL certificate chain file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to `true`.
|
||||
useAcmCertForSSL | No | Boolean | Enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
|
||||
awsRegion | Conditionally | String | Represents the AWS region to use ACM, S3, or CloudMap. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
|
||||
acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. An ACM certificate takes precedence over an S3 or local file system certificate. Required if `useAcmCertForSSL` is set to `true`.
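
A minimal sketch of a `peer_forwarder` configuration using static discovery might look like the following. The host names are hypothetical, and TLS is turned off only to keep the example short; with `ssl: true` (the default) you would also need `sslKeyCertChainFile` or the ACM options.

```yml
prepper:
  - peer_forwarder:
      discovery_mode: static     # the documented default
      static_endpoints: ["data-prepper-1", "data-prepper-2"]  # hypothetical host names
      target_port: 21890         # the documented default
      ssl: false                 # for local testing only
```

For `dns` discovery you would set `domain_name` instead of `static_endpoints`, and for `aws_cloud_map` the two CloudMap options plus `awsRegion`.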
|
||||
|
||||
### string_converter
|
||||
|
||||
Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own prepper.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
upper_case | No | Boolean | Whether to convert to uppercase (`true`) or lowercase (`false`).
|
||||
|
||||
### grok_prepper
|
||||
|
||||
Takes unstructured data and uses pattern matching to extract important keys, making the data more structured and queryable.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
match | No | Map | Specifies which keys to match specific patterns against. Default is an empty body.
|
||||
keep_empty_captures | No | Boolean | Enables preserving `null` captures. Default value is `false`.
|
||||
named_captures_only | No | Boolean | Specifies whether to keep only named captures. Default value is `true`.
|
||||
break_on_match | No | Boolean | Specifies whether to match all patterns or stop once the first successful match is found. Default is `true`.
|
||||
keys_to_overwrite | No | List | Specifies which existing keys are to be overwritten if there is a capture with the same key value. Default is `[]`.
|
||||
pattern_definitions | No | Map | Allows for custom pattern use inline. Default value is an empty body.
|
||||
patterns_directories | No | List | Specifies the path of directories that contain customer pattern files. Default value is an empty list.
|
||||
pattern_files_glob | No | String | Specifies which pattern files to use from the directories specified for `pattern_directories`. Default is `*`.
|
||||
target_key | No | String | Specifies a parent level key to store all captures. Default value is `null`.
|
||||
timeout_millis | No | Integer | Maximum amount of time that should take place for the matching. Setting to `0` disables the timeout. Default value is `30,000`.
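
The options above might be combined as in the following minimal sketch. It assumes the plugin is referenced as `grok` under `processor`, as in the log ingestion pipeline example in this documentation, and that incoming records carry a `log` key. The pipeline name and the `stdout` sink are illustrative choices, not requirements.

```yml
grok-sample-pipeline:          # hypothetical pipeline name
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]   # match the `log` key against this pattern
        break_on_match: true              # stop after the first successful match (the default)
        keep_empty_captures: false        # drop null captures (the default)
        timeout_millis: 30000             # give up on a match after 30 seconds (the default)
  sink:
    - stdout:
```

Because every value except `match` restates a documented default, removing those lines should leave the behavior unchanged; they are listed only to show where each option goes.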
|
||||
|
||||
## Sinks
|
||||
|
||||
Sinks define where Data Prepper writes your data to.
|
||||
|
||||
|
||||
### OpenSearch
|
||||
|
||||
Sink for an OpenSearch cluster.
|
||||
|
||||
Option | Required | Type | Description
:--- | :--- | :--- | :---
hosts | Yes | List | List of OpenSearch hosts to write to (e.g. `["https://localhost:9200", "https://remote-cluster:9200"]`).
cert | No | String | Path to the security certificate (e.g. `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin.
username | No | String | Username for HTTP basic authentication.
password | No | String | Password for HTTP basic authentication.
aws_sigv4 | No | Boolean | Whether to use IAM signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credentials`, etc.). Default is `false`.
aws_region | No | String | AWS region (e.g. `"us-east-1"`) for the domain if you are connecting to Amazon OpenSearch Service.
aws_sts_role_arn | No | String | IAM role that the sink plugin assumes to sign requests to Amazon OpenSearch Service. If not provided, the plugin uses the default credentials.
socket_timeout | No | Integer | The timeout in milliseconds for waiting for data (or, put differently, a maximum period of inactivity between two consecutive data packets). A timeout value of zero is interpreted as an infinite timeout. If this timeout value is either negative or not set, the underlying Apache HttpClient relies on operating system settings for managing socket timeouts.
connect_timeout | No | Integer | The timeout in milliseconds used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout. If this timeout value is either negative or not set, the underlying Apache HttpClient relies on operating system settings for managing connection timeouts.
insecure | No | Boolean | Whether to skip SSL certificate verification. If set to `true`, CA certificate verification is disabled and insecure HTTP requests are sent instead. Default is `false`.
proxy | No | String | The address of a [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is "<host name or IP>:<port>". Examples: "example.com:8100", "http://example.com:8100", "112.112.112.112:8100". Port number cannot be omitted.
trace_analytics_raw | No | Boolean | Deprecated in favor of `index_type`. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics OpenSearch Dashboards plugin. Default is `false`.
trace_analytics_service_map | No | Boolean | Deprecated in favor of `index_type`. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics OpenSearch Dashboards plugin. Default is `false`.
index | No | String | Name of the index to export to. Only required if you don't use the `trace-analytics-raw` or `trace-analytics-service-map` presets. In other words, this parameter is applicable and required only if `index_type` is explicitly `custom` or defaults to `custom`.
index_type | No | String | Tells the sink plugin what type of data it is handling. Valid values: `custom`, `trace-analytics-raw`, `trace-analytics-service-map`. Default is `custom`.
template_file | No | String | Path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (e.g. `/your/local/template-file.json`) if you do not use the `trace_analytics_raw` or `trace_analytics_service_map` presets. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
document_id_field | No | String | The field from the source data to use for the OpenSearch document ID (e.g. `"my-field"`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
dlq_file | No | String | The path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster.
bulk_size | No | Integer (long) | The maximum size (in MiB) of bulk requests to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually. Default is 5.
ism_policy_file | No | String | The absolute file path for an ISM (Index State Management) policy JSON file. This policy file is effective only when there is no built-in policy file for the index type. For example, the `custom` index type is currently the only one without a built-in policy file, so it uses the policy file provided through this parameter. For more information, see [ISM policies]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/).
number_of_shards | No | Integer | The number of primary shards that an index should have on the destination OpenSearch server. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built-in. If this parameter is set, it overrides the value in the index template file. For more information, see [create index]({{site.url}}{{site.baseurl}}/opensearch/rest-api/index-apis/create-index/).
number_of_replicas | No | Integer | The number of replica shards each primary shard should have on the destination OpenSearch server. For example, if you have 4 primary shards and set `number_of_replicas` to 3, the index has 12 replica shards. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built-in. If this parameter is set, it overrides the value in the index template file. For more information, see [create index]({{site.url}}{{site.baseurl}}/opensearch/rest-api/index-apis/create-index/).
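
As a rough illustration of how these options fit together, the following `sink` fragment sketches two OpenSearch sinks that use `index_type` rather than the deprecated Boolean flags. The host, credentials, and file paths are placeholder values taken from the examples in the table, and the index name `apache_logs` is an assumption for illustration only.

```yml
sink:
  # Custom index: `index` is required because index_type is `custom`.
  - opensearch:
      hosts: [ "https://localhost:9200" ]
      cert: "config/root-ca.pem"
      username: admin
      password: admin
      index_type: custom
      index: apache_logs               # assumed index name
      dlq_file: /your/local/dlq-file   # failed documents are written here
      bulk_size: 5                     # MiB; matches the documented default
  # Trace data: index_type takes the place of the deprecated trace_analytics_raw flag.
  - opensearch:
      hosts: [ "https://localhost:9200" ]
      cert: "config/root-ca.pem"
      username: admin
      password: admin
      index_type: trace-analytics-raw
```

The second sink omits `index` because, per the table, that option applies only when `index_type` is `custom`.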
|
||||
|
||||
### file
|
||||
|
||||
Sink for flat file output.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
path | Yes | String | Path for the output file (e.g. `logs/my-transformed-log.log`).
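
A minimal sketch of a pipeline that writes to a flat file might look like the following. The pipeline name is hypothetical, the `random` source is borrowed from the sample pipelines in this documentation, and the path is the example value from the table.

```yml
file-sample-pipeline:        # hypothetical pipeline name
  source:
    random:
  sink:
    - file:
        path: logs/my-transformed-log.log
```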
|
||||
|
||||
|
||||
### pipeline
|
||||
|
||||
Sink for writing to another pipeline.
|
||||
|
||||
Option | Required | Type | Description
|
||||
:--- | :--- | :--- | :---
|
||||
name | Yes | String | Name of the pipeline to write to.
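
The following sketch shows one way to chain two pipelines, mirroring the pattern used in the Trace Analytics pipeline example: the first pipeline names the second in its `pipeline` sink, and the second names the first in its `pipeline` source. Both pipeline names are hypothetical.

```yml
first-pipeline:              # hypothetical pipeline name
  source:
    random:
  sink:
    - pipeline:
        name: "second-pipeline"   # write to the pipeline defined below
second-pipeline:             # hypothetical pipeline name
  source:
    pipeline:
      name: "first-pipeline"      # read from the pipeline defined above
  sink:
    - stdout:
```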
|
||||
|
||||
|
||||
### stdout
|
||||
|
||||
Sink for console output. Can be useful for testing. No options.
|
|
@ -0,0 +1,63 @@
|
|||
---
|
||||
layout: default
|
||||
title: Get Started
|
||||
parent: Data Prepper
|
||||
nav_order: 1
|
||||
---
|
||||
|
||||
# Get started with Data Prepper
|
||||
|
||||
Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
|
||||
|
||||
## 1. Install Data Prepper
|
||||
|
||||
To use the Docker image, pull it like any other image:
|
||||
|
||||
```bash
|
||||
docker pull opensearchproject/data-prepper:latest
|
||||
```
|
||||
|
||||
## 2. Define a pipeline
|
||||
|
||||
Create a Data Prepper pipeline file, `pipelines.yaml`, with the following configuration:
|
||||
|
||||
```yml
simple-sample-pipeline:
  workers: 2
  delay: "5000"
  source:
    random:
  sink:
    - stdout:
```
|
||||
|
||||
## 3. Start Data Prepper
|
||||
|
||||
Run the following command with your pipeline configuration YAML.
|
||||
|
||||
```bash
docker run --name data-prepper \
    -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
    opensearchproject/opensearch-data-prepper:latest
```
|
||||
|
||||
The sample pipeline configuration above demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`). For more examples and details on more advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/observability/data-prepper/pipelines).
|
||||
|
||||
After starting Data Prepper, you should see log output and some UUIDs after a few seconds:
|
||||
|
||||
```
|
||||
2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
|
||||
2021-09-30T20:19:44,681 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
|
||||
2021-09-30T20:19:45,183 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
|
||||
2021-09-30T20:19:45,687 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
|
||||
2021-09-30T20:19:46,191 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
|
||||
2021-09-30T20:19:46,694 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
|
||||
2021-09-30T20:19:47,200 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
|
||||
2021-09-30T20:19:49,181 [simple-test-pipeline-processor-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - simple-test-pipeline Worker: Processing 6 records from buffer
|
||||
07dc0d37-da2c-447e-a8df-64792095fb72
|
||||
5ac9b10a-1d21-4306-851a-6fb12f797010
|
||||
99040c79-e97b-4f1d-a70b-409286f2a671
|
||||
5319a842-c028-4c17-a613-3ef101bd2bdd
|
||||
e51e700e-5cab-4f6d-879a-1c3235a77d18
|
||||
b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90
|
||||
```
|
|
@ -0,0 +1,15 @@
|
|||
---
|
||||
layout: default
|
||||
title: Data Prepper
|
||||
nav_order: 80
|
||||
has_children: true
|
||||
has_toc: false
|
||||
---
|
||||
|
||||
# Data Prepper
|
||||
|
||||
Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization.
|
||||
|
||||
Data Prepper lets users build custom pipelines to improve the operational view of applications. Two common uses for Data Prepper are trace and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/observability/trace/index/) can help you visualize the flow of events and identify performance problems, and [log analytics]({{site.url}}{{site.baseurl}}/observability/log-analytics/) can improve searching and analysis and provide insights into your application.
|
||||
|
||||
To get started building your own custom pipelines with Data Prepper, see the [Get Started]({{site.url}}{{site.baseurl}}/observability/data-prepper/get-started/) guide.
|
|
@ -0,0 +1,153 @@
|
|||
---
|
||||
layout: default
|
||||
title: Pipelines
|
||||
parent: Data Prepper
|
||||
nav_order: 2
|
||||
---
|
||||
|
||||
# Pipelines
|
||||
|
||||
![Data Prepper Pipeline]({{site.url}}{{site.baseurl}}/images/data-prepper-pipeline.png)
|
||||
|
||||
To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks. For example:
|
||||
|
||||
```yml
simple-sample-pipeline:
  workers: 2 # the number of workers
  delay: 5000 # in milliseconds, how long workers wait between read attempts
  source:
    random:
  buffer:
    bounded_blocking:
      buffer_size: 1024 # max number of records the buffer accepts
      batch_size: 256 # max number of records the buffer drains after each read
  processor:
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```
|
||||
|
||||
- Sources define where your data comes from. In this case, the source is a random UUID generator (`random`).
|
||||
|
||||
- Buffers store data as it passes through the pipeline.
|
||||
|
||||
By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.
|
||||
|
||||
- Preppers perform some action on your data: filter, transform, enrich, etc.
|
||||
|
||||
You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `string_converter` prepper transforms the strings by making them uppercase.
|
||||
|
||||
- Sinks define where your data goes. In this case, the sink is `stdout`.
|
||||
|
||||
## Examples
|
||||
|
||||
This section provides some pipeline examples that you can use to start creating your own pipelines. For more information, see the [Data Prepper configuration reference]({{site.url}}{{site.baseurl}}/observability/data-prepper/data-prepper-reference/) guide.
|
||||
|
||||
The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.
|
||||
|
||||
### Log ingestion pipeline
|
||||
|
||||
The following example demonstrates how to use the HTTP source and Grok prepper plugins to process unstructured log data.
|
||||
|
||||
```yml
log-pipeline:
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: [ "https://opensearch:9200" ]
        insecure: true
        username: admin
        password: admin
        index: apache_logs
```
|
||||
|
||||
This example uses weak security. We strongly recommend securing all plugins which open external ports in production environments.
|
||||
{: .note}
|
||||
|
||||
### Trace Analytics pipeline
|
||||
|
||||
The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index the trace and service map documents for the dashboard plugin.
|
||||
|
||||
```yml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_raw: true
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_service_map: true
```
|
||||
|
||||
## Migrating from Logstash
|
||||
|
||||
Data Prepper supports Logstash configuration files for a limited set of plugins. To use this feature, run Data Prepper with your Logstash configuration file.
|
||||
|
||||
```bash
docker run --name data-prepper \
    -v /full/path/to/logstash.conf:/usr/share/data-prepper/pipelines.conf \
    opensearchproject/opensearch-data-prepper:latest
```
|
||||
|
||||
Support is limited to the Logstash plugins that have equivalents in Data Prepper. As of the Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported:
|
||||
|
||||
- HTTP Input plugin
|
||||
- Grok Filter plugin
|
||||
- Elasticsearch Output plugin
|
||||
- Amazon Elasticsearch Output plugin
|
||||
|
||||
## Configure the Data Prepper server
|
||||
|
||||
Data Prepper itself provides administrative HTTP endpoints such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port that serves these endpoints and its TLS configuration are specified in a separate YAML file. By default, these endpoints are secured by Data Prepper Docker images. We strongly recommend providing your own configuration file for securing production environments. Here is an example `data-prepper-config.yaml`:
|
||||
|
||||
```yml
|
||||
ssl: true
|
||||
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
|
||||
keyStorePassword: "password"
|
||||
privateKeyPassword: "other_password"
|
||||
serverPort: 1234
|
||||
```
|
||||
|
||||
To configure the Data Prepper server, run Data Prepper with the additional YAML file.
|
||||
|
||||
```bash
docker run --name data-prepper \
    -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
    -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
    opensearchproject/opensearch-data-prepper:latest
```
|
|
@ -6,7 +6,7 @@ nav_order: 10
|
|||
|
||||
# Event analytics
|
||||
|
||||
Event analytics in observability is where you can use [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugins/ppl/index) (PPL) queries to build and view different visualizations of your data.
|
||||
Event analytics in observability is where you can use [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability/ppl/index) (PPL) queries to build and view different visualizations of your data.
|
||||
|
||||
## Get started with event analytics
|
||||
|
||||
|
@ -24,10 +24,10 @@ source = opensearch_dashboards_sample_data_logs | fields host | stats count()
|
|||
|
||||
By default, Dashboards shows results from the last 15 minutes of your data. To see data from a different timeframe, use the date and time selector.
|
||||
|
||||
For more information about building PPL queries, see [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugins/ppl/index).
|
||||
For more information about building PPL queries, see [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability/ppl/index).
|
||||
|
||||
## Save a visualization
|
||||
|
||||
After Dashboards generates a visualization, you must save it if you want to return to it at a later time or if you want to add it to an [operational panel]({{site.url}}{{site.baseurl}}/observability-plugins/operational-panels).
|
||||
After Dashboards generates a visualization, you must save it if you want to return to it at a later time or if you want to add it to an [operational panel]({{site.url}}{{site.baseurl}}/observability/operational-panels).
|
||||
|
||||
To save a visualization, expand the save dropdown menu next to **Run**, enter a name for your visualization, then choose **Save**. You can reopen any saved visualizations on the event analytics page.
|
|
@ -0,0 +1,28 @@
|
|||
---
|
||||
layout: default
|
||||
title: About Observability
|
||||
nav_order: 1
|
||||
has_children: false
|
||||
redirect_from:
|
||||
- /observability/
|
||||
- /observability/
|
||||
---
|
||||
|
||||
# About Observability
|
||||
OpenSearch Dashboards
|
||||
{: .label .label-yellow :}
|
||||
|
||||
Observability is a collection of plugins and applications that let you visualize data-driven events by using Piped Processing Language to explore, discover, and query data stored in OpenSearch.
|
||||
|
||||
Your experience of exploring data might differ, but if you're new to exploring data to create visualizations, we recommend trying a workflow like the following:
|
||||
|
||||
1. Explore data over a certain timeframe using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability/ppl/index).
|
||||
2. Use [event analytics]({{site.url}}{{site.baseurl}}/observability/event-analytics) to turn data-driven events into visualizations.
|
||||
![Sample Event Analytics View]({{site.url}}{{site.baseurl}}/images/event-analytics.png)
|
||||
3. Create [operational panels]({{site.url}}{{site.baseurl}}/observability/operational-panels) and add visualizations to compare data the way you like.
|
||||
![Sample Operational Panel View]({{site.url}}{{site.baseurl}}/images/operational-panel.png)
|
||||
4. Use [log analytics]({{site.url}}{{site.baseurl}}/observability/log-analytics) to transform unstructured log data.
|
||||
5. Use [trace analytics]({{site.url}}{{site.baseurl}}/observability/trace/index) to create traces and dive deep into your data.
|
||||
![Sample Trace Analytics View]({{site.url}}{{site.baseurl}}/images/observability-trace.png)
|
||||
6. Leverage [notebooks]({{site.url}}{{site.baseurl}}/observability/notebooks) to combine different visualizations and code blocks that you can share with team members.
|
||||
![Sample Notebooks View]({{site.url}}{{site.baseurl}}/images/notebooks.png)
|
|
@ -0,0 +1,93 @@
|
|||
---
|
||||
layout: default
|
||||
title: Log analytics
|
||||
nav_order: 70
|
||||
---
|
||||
|
||||
# Log Ingestion
|
||||
|
||||
Log ingestion provides a way to transform unstructured log data into structured data and ingest it into OpenSearch. Structured log data allows for improved queries and filtering based on the data format when searching logs for an event.
|
||||
|
||||
## Get started with log ingestion
|
||||
|
||||
OpenSearch Log Ingestion consists of three components---[Data Prepper]({{site.url}}{{site.baseurl}}/observability/data-prepper/index/), [OpenSearch]({{site.url}}{{site.baseurl}}/) and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/)---that fit into the OpenSearch ecosystem. The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.
|
||||
|
||||
### Basic flow of data
|
||||
|
||||
![Log data flow diagram from a distributed application to OpenSearch]({{site.url}}{{site.baseurl}}/images/la.png)
|
||||
|
||||
1. Log Ingestion relies on you adding log collection to your application's environment to gather and send log data.
|
||||
|
||||
(In the [example](#example) below, [Fluent Bit](https://docs.fluentbit.io/manual/) is used as a log collector that collects log data from a file and sends the log data to Data Prepper.)
|
||||
|
||||
2. [Data Prepper]({{site.url}}{{site.baseurl}}/observability/data-prepper/index/) receives the log data, transforms the data into a structured format, and indexes it on an OpenSearch cluster.
|
||||
|
||||
3. The data can then be explored through OpenSearch search queries or the **Discover** page in OpenSearch Dashboards.
|
||||
|
||||
### Example
|
||||
|
||||
This example mimics the writing of log entries to a log file that are then processed by Data Prepper and stored in OpenSearch.
|
||||
|
||||
Download or clone the [Data Prepper repository](https://github.com/opensearch-project/data-prepper). Then navigate to `examples/log-ingestion/` and open `docker-compose.yml` in a text editor. This file defines a container for each of the following:
|
||||
|
||||
- [Fluent Bit](https://docs.fluentbit.io/manual/) (`fluent-bit`)
|
||||
- Data Prepper (`data-prepper`)
|
||||
- A single-node OpenSearch cluster (`opensearch`)
|
||||
- OpenSearch Dashboards (`opensearch-dashboards`)
|
||||
|
||||
Close the file and run `docker-compose up --build` to start the containers.
|
||||
|
||||
After the containers start, your ingestion pipeline is set up and ready to ingest log data. The `fluent-bit` container is configured to read log data from `test.log`. Run the following command to generate log data to send to the log ingestion pipeline.
|
||||
|
||||
```
|
||||
echo '63.173.168.120 - - [04/Nov/2021:15:07:25 -0500] "GET /search/tag/list HTTP/1.0" 200 5003' >> test.log
|
||||
```
|
||||
|
||||
Fluent Bit collects the log data and sends it to Data Prepper:
|
||||
|
||||
```
[2021/12/02 15:35:41] [ info] [output:http:http.0] data-prepper:2021, HTTP status=200
200 OK
```
|
||||
|
||||
Data Prepper will process the log and index it:
|
||||
|
||||
```
|
||||
2021-12-02T15:35:44,499 [log-pipeline-processor-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - log-pipeline Worker: Processing 1 records from buffer
|
||||
```
|
||||
|
||||
This should result in a single document being written to the OpenSearch cluster in the `apache_logs` index as defined in the `log_pipeline.yaml` file.
|
||||
|
||||
Run the following command to see one of the raw documents in the OpenSearch cluster:
|
||||
|
||||
```bash
|
||||
curl -X GET -u 'admin:admin' -k 'https://localhost:9200/apache_logs/_search?pretty&size=1'
|
||||
```
|
||||
|
||||
The response should show the parsed log data:
|
||||
|
||||
```
"hits" : [
  {
    "_index" : "apache_logs",
    "_type" : "_doc",
    "_id" : "yGrJe30BgI2EWNKtDZ1g",
    "_score" : 1.0,
    "_source" : {
      "date" : 1.638459307042312E9,
      "log" : "63.173.168.120 - - [04/Nov/2021:15:07:25 -0500] \"GET /search/tag/list HTTP/1.0\" 200 5003",
      "request" : "/search/tag/list",
      "auth" : "-",
      "ident" : "-",
      "response" : "200",
      "bytes" : "5003",
      "clientip" : "63.173.168.120",
      "verb" : "GET",
      "httpversion" : "1.0",
      "timestamp" : "04/Nov/2021:15:07:25 -0500"
    }
  }
]
```
|
||||
|
||||
The same data can be viewed in OpenSearch Dashboards by visiting the **Discover** page and searching the `apache_logs` index. Remember, you must create the index pattern in OpenSearch Dashboards if this is your first time searching for the index.
|
|
@ -6,7 +6,7 @@ nav_order: 30
|
|||
|
||||
# Operational panels
|
||||
|
||||
Operational panels in OpenSearch Dashboards are collections of visualizations generated using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugins/ppl/index) (PPL) queries.
|
||||
Operational panels in OpenSearch Dashboards are collections of visualizations generated using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability/ppl/index) (PPL) queries.
|
||||
|
||||
## Get started with operational panels
|
||||
|
||||
|
@ -16,7 +16,7 @@ If you want to start using operational panels without adding any data, expand th
|
|||
|
||||
To create an operational panel and add visualizations:
|
||||
|
||||
1. From the **Add Visualization** dropdown menu, choose **Select Existing Visualization** or **Create New Visualization**, which takes you to the [event analytics]({{site.url}}{{site.baseurl}}/observability-plugins/event-analytics) explorer, where you can use PPL to create visualizations.
|
||||
1. From the **Add Visualization** dropdown menu, choose **Select Existing Visualization** or **Create New Visualization**, which takes you to the [event analytics]({{site.url}}{{site.baseurl}}/observability/event-analytics) explorer, where you can use PPL to create visualizations.
|
||||
1. If you're adding already existing visualizations, choose a visualization from the dropdown menu.
|
||||
1. Choose **Add**.
|
||||
|
|
@ -9,7 +9,6 @@ nav_order: 1
|
|||
|
||||
OpenSearch Trace Analytics consists of two components---Data Prepper and the Trace Analytics OpenSearch Dashboards plugin---that fit into the OpenTelemetry and OpenSearch ecosystems. The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.
|
||||
|
||||
|
||||
## Basic flow of data
|
||||
|
||||
![Data flow diagram from a distributed application to OpenSearch]({{site.url}}{{site.baseurl}}/images/ta.svg)
|
||||
|
@ -20,10 +19,9 @@ OpenSearch Trace Analytics consists of two components---Data Prepper and the Tra
|
|||
|
||||
1. The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/) receives data from the application and formats it into OpenTelemetry data.
|
||||
|
||||
1. [Data Prepper]({{site.url}}{{site.baseurl}}/observability-plugins/trace/data-prepper/) processes the OpenTelemetry data, transforms it for use in OpenSearch, and indexes it on an OpenSearch cluster.
|
||||
|
||||
1. The [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugins/trace/ta-dashboards/) displays the data in near real-time as a series of charts and tables, with an emphasis on service architecture, latency, error rate, and throughput.
|
||||
1. [Data Prepper]({{site.url}}{{site.baseurl}}/observability/data-prepper/index/) processes the OpenTelemetry data, transforms it for use in OpenSearch, and indexes it on an OpenSearch cluster.
|
||||
|
||||
1. The [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability/trace/ta-dashboards/) displays the data in near real-time as a series of charts and tables, with an emphasis on service architecture, latency, error rate, and throughput.
|
||||
|
||||
## Jaeger HotROD
|
||||
|
||||
|
@ -80,4 +78,4 @@ curl -X GET -u 'admin:admin' -k 'https://localhost:9200/otel-v1-apm-span-000001/
|
|||
|
||||
Navigate to `http://localhost:5601` in a web browser and choose **Trace Analytics**. You can see the results of your single click in the Jaeger HotROD web interface: the number of traces per API and HTTP method, latency trends, a color-coded map of the service architecture, and a list of trace IDs that you can use to drill down on individual operations.
|
||||
|
||||
If you don't see your trace, adjust the timeframe in OpenSearch Dashboards. For more information on using the plugin, see [OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugins/trace/ta-dashboards/).
|
||||
If you don't see your trace, adjust the timeframe in OpenSearch Dashboards. For more information on using the plugin, see [OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability/trace/ta-dashboards/).
|
|
@ -262,4 +262,4 @@ You can use wildcards to delete more than one data stream.
|
|||
|
||||
We recommend deleting data from a data stream using an ISM policy.
|
||||
|
||||
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/) and [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/observability-plugins/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.
|
||||
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/) and [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/observability/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.
|
||||
|
|
|
@ -0,0 +1 @@
|
|||
<mxfile host="Electron" modified="2021-12-14T14:26:24.213Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/15.8.7 Chrome/91.0.4472.164 Electron/13.6.2 Safari/537.36" etag="0EV8uYp45YbKkp4qsNFr" version="15.8.7" type="device"><diagram id="qG4h7x6C5n9358YNV1Zm" name="Page-1">7VhNc9owEP01HMPYCH8dCyHtIZ1mSjtpj8JebDWy5ZHlGPLrK2P5UwaSGUh6KBest2shvbfPKzNBy3j3meM0+soCoJOZEewm6HYym5mmYcuvEtlXiOc5FRByEqikFliTF1CgodCcBJD1EgVjVJC0D/osScAXPQxzzop+2pbR/q+mOAQNWPuY6ugjCURUoa5ltPgXIGEkmg2rSIzrZAVkEQ5Y0YHQaoKWnDFRXcW7JdCSvJqX6r67I9FmYRwS8Zob1oviwVyFN4/ff76Y3g+SxzncqFmeMc3VhtVixb5mgLM8CaCcxJygRRERAesU+2W0kJpLLBIxVeFMcPbUMCX3uFA/AFzA7ujKzYYPWUjAYhB8L1PaKqpuUTU0r9kvOorUNEddNepErKogbOZuiZIXiqs38GZrNEEg60YNGRcRC1mC6apFFy2Rhhy1OfeMpYq+PyDEXpkA54L1yd2yRKig6Z0gu1zKaarlylnOfTixQ1UIAvMQxIk8NC4dB4oFee6v4+IyzLTyXVf70tShVD4x4HwBn+d4SyhdMsr4YWYUYHC3fpPZidi+C5vthSxgDSzg6hZocroWcK/lAOcjHCDZ4vtf6v7D4Hc5mFr18HbXDd7u1ehdnYNe6Zz5RzoHac5Z5NstcE3VAGdRK1kuKElkkdfN1ujrgykJE3ntS9LkXGhB8QboA8uIIKwXKN1AZJu9HyRsmBAs7iR8UlOKskQWsomm5cLiXVieN6YFbKisoWy6qVav2VMyeGfeafZMWAIjdXEBpyJv4FRPd6pjTOe6VW2JGt2PfSXt3f/WPWXJV1jX+kjrzjXrPnBI0xHv1m5JOfMhy853vw32n8KD1N8qpx/pis1DYfzMN2yRFrjBfKxFurMNsu3LGM9C541njp0S7Wu1SEs/nZDk6R3PJmBK6p0x4j3bQfhCxDvzf+1s4ukOISkc6nlIvty2GHuFGTaKDrEK0nrdsGPFJAjoMVn7z9Oen67QlMyBRJajS2SPKITerpActu+0h1jnnwG0+gs=</diagram></mxfile>
|
Binary file not shown.
After Width: | Height: | Size: 25 KiB |
|
@ -0,0 +1 @@
|
|||
<mxfile host="Electron" modified="2021-12-02T23:36:50.889Z" agent="5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/13.6.2 Chrome/83.0.4103.122 Electron/9.2.0 Safari/537.36" etag="TGQ9dVb-OumTgvDfhW7r" version="13.6.2" type="device"><diagram id="raLMG7nUvYtrXQTlhIq5" name="Page-1">3VdNj5swEP01HFciQBxy7CbZrdStulIO7a1y8ADuGgYZk4/++ppgvkSSTdSsmvSC7DdjZvyenxGWO0u2z5Jm8VdkICzHZlvLnVuOM/WIfpbArgKIb1dAJDmroFELLPlvMGCdVnAGeS9RIQrFsz4YYJpCoHoYlRI3/bQQRb9qRiMYAMuAiiH6nTMVG3Rk223gM/AoNqX9sQkktE42QB5ThpsO5C4sdyYRVTVKtjMQJXc1L9W6pyPRpjEJqTpnwY+HsEi+2Nmb8lZ2Aovtz/nzg1u9ZU1FYTb8glEpF1XUtK12NRfANDVmilLFGGFKxaJFHyUWKYOyoK1nbc4LYqbBkQZ/gVI7ozMtFGooVokw0apmWejoLg2UYyEDOLG1+rRQGYE6kec0WugzDJiAkju9zpzXWj0Jgiq+7rdFzeGKmmXNm16R64bbFAzDXLfREUgPOgVbaC/bBRKOBhJ+yjLBA90upgMNW4VKujcxV7DM6J7IjTZwX42QCzFDgXK/1mUU/DDQeK4kvkEnQgIfVmGj3xqkgu1pBYeMmwUN5bUC9VWw6fivzok71iP2cVV6pF/K8Pj/NYlzpkm8OzeJM5DwSY9S9cjVQMF3TNF30D+xiHdzFpkM+P2WQboEKoPYcojQ9R9XUo+icsQwKBK91/zO3eOd6R5y5+7xBurO91ef/Sohy0Be10Jj8Jl3yEK+s3IJuY6FyM1ZaDog+c7dQc50h39YqLPt8Fesk4surkAUubrL8+7f3Hn3TzJvz2ker5BKlh/4ftSxqwoBIy3F5JAQUzJx6QcJ4Y4/Tgg9bf8vq49B+5PuLv4A</diagram></mxfile>
|
Binary file not shown.
After Width: | Height: | Size: 21 KiB |