---
layout: default
title: Pipelines
parent: Data Prepper
nav_order: 2
---
# Pipelines
![Data Prepper Pipeline]({{site.url}}{{site.baseurl}}/images/data-prepper-pipeline.png)
To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks. For example:
```yml
simple-sample-pipeline:
  workers: 2 # the number of workers
  delay: 5000 # in milliseconds, how long workers wait between read attempts
  source:
    random:
  buffer:
    bounded_blocking:
      buffer_size: 1024 # max number of records the buffer accepts
      batch_size: 256 # max number of records the buffer drains after each read
  processor:
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```
- Sources define where your data comes from. In this case, the source is a random UUID generator (`random`).
- Buffers store data as it passes through the pipeline. By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.
- Preppers perform some action on your data: filter, transform, enrich, etc. You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `string_converter` prepper transforms strings by converting them to uppercase.
- Sinks define where your data goes. In this case, the sink is stdout.
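To try this sample pipeline, save it as `pipelines.yaml` and mount it into the Data Prepper Docker image. This is a quick sketch; the mount path matches the one used in the examples later on this page:

```bash
# Run Data Prepper with the sample pipeline definition mounted in
docker run --name data-prepper \
    -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
    opensearchproject/opensearch-data-prepper:latest
```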
## Examples
This section provides some pipeline examples that you can use to start creating your own pipelines. For more information, see the [Data Prepper configuration reference]({{site.url}}{{site.baseurl}}/observability/data-prepper/data-prepper-reference/) guide.
The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.
### Log ingestion pipeline
The following example demonstrates how to use the HTTP source and Grok prepper plugins to process unstructured log data.
```yml
log-pipeline:
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: [ "https://opensearch:9200" ]
        insecure: true
        username: admin
        password: admin
        index: apache_logs
```
This example uses weak security. We strongly recommend securing all plugins that open external ports in production environments.
{: .note}
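With the pipeline running, you can send it a test log line. The following sketch assumes the `http` source is using its defaults, port 2021 and path `/log/ingest`, and that the request body is a JSON array of log events:

```bash
# Post one Apache common log line to the http source
# (assumes the default port 2021 and path /log/ingest)
curl -X POST "http://localhost:2021/log/ingest" \
    -H "Content-Type: application/json" \
    -d '[{"log": "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326"}]'
```

The `grok` prepper then parses the `log` field against the `%{COMMONAPACHELOG}` pattern before the event is indexed into `apache_logs`.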
### Trace Analytics pipeline
The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index the raw trace documents and the service map documents for the dashboard plugin.
```yml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_raw: true
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_service_map: true
```
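For reference, an OpenTelemetry Collector can forward spans to this pipeline with an OTLP gRPC exporter. The following Collector configuration fragment is a sketch, assuming `otel_trace_source` is listening on its default gRPC port, 21890:

```yml
# OpenTelemetry Collector configuration fragment (assumed setup:
# Data Prepper reachable at localhost, otel_trace_source on its default port 21890)
exporters:
  otlp/data-prepper:
    endpoint: "localhost:21890"
    tls:
      insecure: true # matches ssl: false on the otel_trace_source above
```

Add the `otlp/data-prepper` exporter to the Collector's traces pipeline so that spans flow into `entry-pipeline`.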
## Migrating from Logstash
Data Prepper supports Logstash configuration files for a limited set of plugins. To use one, run Data Prepper with the Logstash configuration file mounted as its pipeline configuration:
```bash
docker run --name data-prepper \
    -v /full/path/to/logstash.conf:/usr/share/data-prepper/pipelines.conf \
    opensearchproject/opensearch-data-prepper:latest
```
This feature is limited by the current feature parity of Data Prepper. As of the Data Prepper 1.2 release, the following Logstash plugins are supported:
- HTTP Input plugin
- Grok Filter plugin
- Elasticsearch Output plugin
- Amazon Elasticsearch Output plugin
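For example, a `logstash.conf` that uses only these supported plugins, and that Data Prepper can therefore run, might look like the following sketch (the port, field name, and index name are illustrative):

```
input {
  http {
    port => 8080
  }
}
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    user => "admin"
    password => "admin"
    index => "apache_logs"
  }
}
```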
## Configure the Data Prepper server
Data Prepper itself provides administrative HTTP endpoints, such as `/list`, which lists the running pipelines, and `/metrics/prometheus`, which provides Prometheus-compatible metrics data. The port that serves these endpoints has a TLS configuration and is specified in a separate YAML file. These endpoints are secured by default in the Data Prepper Docker images, but we strongly recommend providing your own configuration file to secure production environments. Here is an example `data-prepper-config.yaml`:
```yml
ssl: true
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
keyStorePassword: "password"
privateKeyPassword: "other_password"
serverPort: 1234
```
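The keystore referenced above is a standard Java keystore. For local testing, you might generate a self-signed one with `keytool`; this is a sketch in which the alias, validity, and distinguished name are arbitrary choices, while the passwords match the example configuration above:

```bash
# Generate a self-signed keystore whose passwords match data-prepper-config.yaml
keytool -genkeypair \
    -alias data-prepper \
    -keyalg RSA -keysize 2048 \
    -validity 365 \
    -dname "CN=localhost" \
    -keystore keystore.jks \
    -storepass password \
    -keypass other_password
```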
To configure the Data Prepper server, run Data Prepper with the additional YAML file mounted into the container:
```bash
docker run --name data-prepper \
    -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
    -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
    opensearchproject/opensearch-data-prepper:latest
```
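
Once the server is up, you can verify the administrative endpoints. With the example configuration above (`ssl: true` and `serverPort: 1234`), the following requests should work; the `-k` flag skips certificate verification, which is appropriate only for a self-signed keystore:

```bash
# List the running pipelines
curl -k https://localhost:1234/list

# Scrape Prometheus-compatible metrics
curl -k https://localhost:1234/metrics/prometheus
```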