---
layout: default
title: Pipelines
parent: Data Prepper
nav_order: 2
---

# Pipelines

![Data Prepper Pipeline]({{site.url}}{{site.baseurl}}/images/data-prepper-pipeline.png)

To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks. For example:

```yml
simple-sample-pipeline:
  workers: 2 # the number of workers
  delay: 5000 # in milliseconds, how long workers wait between read attempts
  source:
    random:
  buffer:
    bounded_blocking:
      buffer_size: 1024 # max number of records the buffer accepts
      batch_size: 256 # max number of records the buffer drains after each read
  processor:
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```

- Sources define where your data comes from. In this case, the source is a random UUID generator (`random`).

- Buffers store data as it passes through the pipeline.

  By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.

- Preppers perform some action on your data: filter, transform, enrich, etc.

  You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `string_converter` prepper transforms strings by converting them to uppercase.

- Sinks define where your data goes. In this case, the sink is stdout.

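Because preppers run in sequence, you can compose them. The following sketch is not one of this page's samples; it combines the `http` source and `grok` prepper shown in the log ingestion example below with the `string_converter` prepper from above, so that each log line is parsed first and uppercased second:

```yml
log-to-upper-pipeline:
  source:
    http:
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```
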
## Examples

This section provides some pipeline examples that you can use to start creating your own pipelines. For more information, see the [Data Prepper configuration reference]({{site.url}}{{site.baseurl}}/observability/data-prepper/data-prepper-reference/) guide.

The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.

### Log ingestion pipeline

The following example demonstrates how to use the HTTP source and Grok prepper plugins to process unstructured log data.

```yaml
log-pipeline:
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: [ "https://opensearch:9200" ]
        insecure: true
        username: admin
        password: admin
        index: apache_logs
```

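To try a pipeline like this, you can POST a log line to the HTTP source. The port (`2021`) and path (`/log/ingest`) below are assumptions based on the `http` source plugin's defaults; verify them against your Data Prepper version:

```bash
# Send one Apache-style log line to the pipeline's HTTP source.
# Assumption: the http source listens on its default port 2021 at /log/ingest.
curl -s -X POST "http://localhost:2021/log/ingest" \
  -H "Content-Type: application/json" \
  -d '[{"log": "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326"}]'
```
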
Note: This example uses weak security settings. We strongly recommend securing all plugins that open external ports in production environments.

### Trace Analytics pipeline

The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index trace documents and service map documents for the dashboard plugin.

```yml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_raw: true
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_service_map: true
```

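To feed a pipeline like this, point the OpenTelemetry Collector at the `otel_trace_source` port. The fragment below is a sketch, not a complete Collector configuration: it assumes the source's default port of 21890 and a plaintext connection (matching `ssl: false` above), and the exporter still needs to be referenced from a `traces` pipeline in the Collector's `service` section:

```yaml
# OpenTelemetry Collector exporter fragment (assumption: otel_trace_source
# listens on its default port 21890, with TLS disabled as in the example above).
exporters:
  otlp/data-prepper:
    endpoint: "localhost:21890"
    tls:
      insecure: true
```
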
## Migrating from Logstash

Data Prepper supports Logstash configuration files for a limited set of plugins. To use this support, pass your Logstash configuration file to Data Prepper when you run it.

```bash
docker run --name data-prepper \
  -v /full/path/to/logstash.conf:/usr/share/data-prepper/pipelines.conf \
  opensearchproject/opensearch-data-prepper:latest
```

This feature is limited by the feature parity of Data Prepper. As of the Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported:

- HTTP Input plugin
- Grok Filter plugin
- Elasticsearch Output plugin
- Amazon Elasticsearch Output plugin

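As an illustration, a minimal Logstash configuration that stays within the supported plugin set might look like the following sketch; the port, hostname, and index name are placeholders, not values from this page:

```conf
input {
  http {
    port => 8080
  }
}

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "apache_logs"
  }
}
```
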
## Configure the Data Prepper server

Data Prepper itself provides administrative HTTP endpoints, such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port that serves these endpoints has a TLS configuration and is specified by a separate YAML file. The Data Prepper Docker image secures these endpoints by default. We strongly recommend providing your own configuration file to secure production environments. Here is an example `data-prepper-config.yaml`:

```yml
ssl: true
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
keyStorePassword: "password"
privateKeyPassword: "other_password"
serverPort: 1234
```

To configure the Data Prepper server, run Data Prepper with the additional YAML file.

```bash
docker run --name data-prepper \
  -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
  -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
  opensearchproject/opensearch-data-prepper:latest
```
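Once the server is up, you can check the administrative endpoints. This sketch assumes the `serverPort: 1234` value and `ssl: true` setting from the example configuration above, so it uses `https` with `-k` to accept a self-signed keystore certificate:

```bash
# List the running pipelines (serverPort 1234 comes from the example config above).
curl -k "https://localhost:1234/list"

# Scrape Prometheus-compatible metrics.
curl -k "https://localhost:1234/metrics/prometheus"
```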