---
layout: default
title: Pipelines
parent: Data Prepper
nav_order: 2
---

# Pipelines

![Data Prepper Pipeline]({{site.url}}{{site.baseurl}}/images/data-prepper-pipeline.png)

To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks. For example:

```yml
simple-sample-pipeline:
  workers: 2 # the number of workers
  delay: 5000 # in milliseconds, how long workers wait between read attempts
  source:
    random:
  buffer:
    bounded_blocking:
      buffer_size: 1024 # max number of records the buffer accepts
      batch_size: 256 # max number of records the buffer drains after each read
  processor:
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```

- Sources define where your data comes from. In this case, the source is a random UUID generator (`random`).

- Buffers store data as it passes through the pipeline.

  By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.

- Preppers perform some action on your data: filter, transform, enrich, etc.

  You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `string_converter` prepper transforms the strings by making them uppercase.

- Sinks define where your data goes. In this case, the sink is stdout.
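
Because `bounded_blocking` is the default buffer, the sample pipeline above can be written without a `buffer` section. The following is a minimal sketch of that shorter form. The pipeline name `minimal-sample-pipeline` is made up for illustration, and the sketch assumes that the default buffer, worker, and delay settings are acceptable for your workload.

```yml
minimal-sample-pipeline: # hypothetical pipeline name
  source:
    random: # same random UUID generator as in the example above
  # no buffer section: Data Prepper falls back to the bounded_blocking buffer
  processor:
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```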

## Examples

This section provides pipeline examples that you can use as a starting point for your own pipelines. For more information, see the [Data Prepper configuration reference]({{site.url}}{{site.baseurl}}/clients/data-prepper/data-prepper-reference/) guide.

The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started.

### Log ingestion pipeline

The following example demonstrates how to use the HTTP source and Grok prepper plugins to process unstructured log data.

```yml
log-pipeline:
  source:
    http:
      ssl: false
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: [ "https://opensearch:9200" ]
        insecure: true
        username: admin
        password: admin
        index: apache_logs
```

This example uses weak security. We strongly recommend securing all plugins that open external ports in production environments.
{: .note}
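
While developing a pipeline like this one, it can help to see the parsed events locally as well as in OpenSearch. Because a pipeline accepts one or more sinks, one option is to list the `stdout` sink next to the `opensearch` sink. The following sketch assumes that each sink in the list receives the processed events. The pipeline name `log-debug-pipeline` is hypothetical, and the connection settings are the same placeholder values used in the example above.

```yml
log-debug-pipeline: # hypothetical pipeline name
  source:
    http:
      ssl: false # weak security, for local testing only
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - stdout: # print each processed event to the console
    - opensearch:
        hosts: [ "https://opensearch:9200" ]
        insecure: true
        username: admin
        password: admin
        index: apache_logs
```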

### Trace Analytics pipeline

The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two pipelines index the trace documents and the service map documents for the dashboard plugin.

```yml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_raw: true
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        insecure: true
        username: admin
        password: admin
        trace_analytics_service_map: true
```
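
The way these pipelines are connected does not depend on tracing: the upstream pipeline names a downstream pipeline in a `pipeline` sink, and the downstream pipeline names the upstream one in its `pipeline` source. The following is a minimal sketch of that wiring, reusing the `random` source and `stdout` sink from the first example. The pipeline names `upstream-pipeline` and `downstream-pipeline` are made up for illustration.

```yml
upstream-pipeline: # hypothetical name
  source:
    random:
  sink:
    - pipeline:
        name: "downstream-pipeline" # must match the downstream pipeline's key
downstream-pipeline: # hypothetical name
  source:
    pipeline:
      name: "upstream-pipeline" # must match the upstream pipeline's key
  processor:
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```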

## Migrating from Logstash

Data Prepper supports Logstash configuration files for a limited set of plugins. To use one, run Data Prepper with your Logstash configuration file:

```bash
docker run --name data-prepper \
    -v /full/path/to/logstash.conf:/usr/share/data-prepper/pipelines.conf \
    opensearchproject/opensearch-data-prepper:latest
```

This feature is limited to the Logstash plugins that have an equivalent in Data Prepper. As of the Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported:

- HTTP Input plugin
- Grok Filter plugin
- Elasticsearch Output plugin
- Amazon Elasticsearch Output plugin

## Configure the Data Prepper server

Data Prepper itself provides administrative HTTP endpoints, such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port that serves these endpoints and its TLS settings are configured in a separate YAML file. By default, the Data Prepper Docker images secure these endpoints. We strongly recommend providing your own configuration file to secure production environments. The following is an example `data-prepper-config.yaml`:

```yml
ssl: true
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
keyStorePassword: "password"
privateKeyPassword: "other_password"
serverPort: 1234
```
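
For local experiments where TLS is not needed, the same file can be reduced to the two keys that do not involve a keystore. This is a sketch only: it assumes that setting `ssl` to `false` turns off TLS for the administrative endpoints, and the port number `4900` is an arbitrary choice for illustration. Do not use this configuration in production.

```yml
ssl: false # assumption: serves the administrative endpoints without TLS
serverPort: 4900 # arbitrary port chosen for this sketch
```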

To configure the Data Prepper server, run Data Prepper with the additional YAML file mounted alongside the pipeline configuration:

```bash
# Mount both the pipeline definition and the server configuration into the container.
docker run --name data-prepper \
    -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
    -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
    opensearchproject/opensearch-data-prepper:latest
```