opensearch-docs-cn/_data-prepper/getting-started.md
Caroline 0249991f76
Data Prepper ToC Update (#2514)

Signed-off-by: carolxob <carolxob@amazon.com>
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Co-authored-by: Naarcha-AWS <naarcha@amazon.com>
2023-02-03 15:06:10 -07:00


layout: default
title: Getting started
nav_order: 5
redirect_from: /clients/data-prepper/get-started/

Getting started with Data Prepper

Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.

If you are migrating from Open Distro Data Prepper, visit the Migrating from Open Distro page.

1. Installing Data Prepper

There are two ways to install Data Prepper:

  1. Run the Docker image.
  2. Build from source.

The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have Docker available.

You can pull the Docker image:

docker pull opensearchproject/data-prepper:latest

If your requirements call for building from source, or if you want to contribute, see the Developer Guide.

2. Configuring Data Prepper

You must configure Data Prepper with a pipeline before running it.

You will configure two files:

  • data-prepper-config.yaml
  • pipelines.yaml

Several guides cover configuring Data Prepper for specific use cases.
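This page names data-prepper-config.yaml but does not show one. As a minimal sketch for local experiments, the following writes a configuration that turns off TLS on the Data Prepper server and sets its port to 4900, the port used by the commands on this page. The `ssl` and `serverPort` option names are assumptions here, so confirm them against the configuration reference for your version.

```shell
# Write a minimal data-prepper-config.yaml into the current directory.
# Assumption: `ssl` and `serverPort` are the option names for your version.
# Disabling TLS is only appropriate for local testing.
cat > data-prepper-config.yaml <<'EOF'
ssl: false
serverPort: 4900
EOF
```

With TLS turned off, a plain http:// URL can reach the server on port 4900.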

3. Defining a pipeline

Create a Data Prepper pipeline file, pipelines.yaml, with the following configuration:

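simple-sample-pipeline:
  workers: 2      # number of worker threads
  delay: "5000"   # milliseconds between worker runs
  source:
    random:       # generates random UUID strings
  sink:
    - stdout:     # prints each record to standard output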
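Before starting the container, it can help to run a quick structural check on the file. The helper below is a hypothetical sketch, not part of Data Prepper: it only looks for tab characters and for indented `source:` and `sink:` entries, and it does not parse the YAML.

```shell
# check_pipeline_file FILE
# Hypothetical helper: returns 0 if FILE passes a few basic text checks.
check_pipeline_file() {
  file="$1"
  tab="$(printf '\t')"

  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  # Flag any tab character; YAML does not allow tabs for indentation.
  if grep -q "$tab" "$file"; then
    echo "tab character found in $file" >&2
    return 1
  fi
  # The pipeline needs an indented source entry and an indented sink entry.
  for key in source sink; do
    if ! grep -Eq "^[[:space:]]+${key}:" "$file"; then
      echo "no '${key}:' entry in $file" >&2
      return 1
    fi
  done
  echo "ok: $file"
}
```

For example, `check_pipeline_file pipelines.yaml` prints `ok: pipelines.yaml` when all checks pass and returns a nonzero status otherwise.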

4. Running Data Prepper

Run the following command, replacing /full/path/to/pipelines.yaml with the absolute path to your pipeline configuration file.

docker run --name data-prepper \
    -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
    opensearchproject/data-prepper:latest

The preceding pipeline configuration demonstrates a simple pipeline with a source (random) sending data to a sink (stdout). For examples of more advanced pipeline configurations, see Pipelines.

After starting Data Prepper, you should see log output and some UUIDs after a few seconds:

2021-09-30T20:19:44,147 [main] INFO  com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
2021-09-30T20:19:44,681 [random-source-pool-0] INFO  com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:45,183 [random-source-pool-0] INFO  com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:45,687 [random-source-pool-0] INFO  com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:46,191 [random-source-pool-0] INFO  com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:46,694 [random-source-pool-0] INFO  com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:47,200 [random-source-pool-0] INFO  com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:49,181 [simple-sample-pipeline-processor-worker-1-thread-1] INFO  com.amazon.dataprepper.pipeline.ProcessWorker -  simple-sample-pipeline Worker: Processing 6 records from buffer
07dc0d37-da2c-447e-a8df-64792095fb72
5ac9b10a-1d21-4306-851a-6fb12f797010
99040c79-e97b-4f1d-a70b-409286f2a671
5319a842-c028-4c17-a613-3ef101bd2bdd
e51e700e-5cab-4f6d-879a-1c3235a77d18
b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90
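To confirm that records are reaching the stdout sink, you can count the UUID lines in the container output. The function below is a hypothetical helper that reads log text from standard input. It assumes the records look like the lowercase UUIDs shown above.

```shell
# count_uuid_lines: reads text on stdin and prints how many lines consist
# solely of a UUID (8-4-4-4-12 lowercase hexadecimal digits).
count_uuid_lines() {
  # grep -c exits with status 1 when the count is 0, so mask that status.
  grep -Ec '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' || true
}

# Usage, assuming the container is named data-prepper as in the command above:
#   docker logs data-prepper 2>&1 | count_uuid_lines
```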

The remainder of this page provides examples for running Data Prepper from the Docker image. If you built from source, refer to the Developer Guide for more information.

However you configure your pipeline, you will run Data Prepper the same way. You run the Docker image and supply both the pipelines.yaml and data-prepper-config.yaml files.

For Data Prepper 2.0 or later, use this command:

docker run --name data-prepper -p 4900:4900 \
    -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
    -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml \
    opensearchproject/data-prepper:latest

For Data Prepper before version 2.0, use this command:

docker run --name data-prepper -p 4900:4900 \
    -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
    -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
    opensearchproject/data-prepper:1.x
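The two commands differ only in the image tag and in where the files are mounted inside the container. As a sketch, the hypothetical helper below prints the two `-v` arguments for a given major version, using the container paths from the commands above.

```shell
# data_prepper_mounts MAJOR_VERSION
# Hypothetical helper: prints one -v argument per line for the given major
# version. Host files are expected in the current directory.
data_prepper_mounts() {
  major="$1"
  if [ "$major" -ge 2 ]; then
    # 2.0 and later: separate pipelines/ and config/ directories.
    pipelines_target=/usr/share/data-prepper/pipelines/pipelines.yaml
    config_target=/usr/share/data-prepper/config/data-prepper-config.yaml
  else
    # Before 2.0: both files sit directly in the home directory.
    pipelines_target=/usr/share/data-prepper/pipelines.yaml
    config_target=/usr/share/data-prepper/data-prepper-config.yaml
  fi
  printf '%s\n' \
    "-v ${PWD}/pipelines.yaml:${pipelines_target}" \
    "-v ${PWD}/data-prepper-config.yaml:${config_target}"
}
```

One way to use it is `docker run --name data-prepper -p 4900:4900 $(data_prepper_mounts 2) opensearchproject/data-prepper:latest`. The unquoted substitution relies on word splitting, so this only works when the current directory path contains no spaces.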

Data Prepper processes data until it is shut down. When you are done, shut it down with the following command:

curl -X POST http://localhost:4900/shutdown

Additional configurations

For Data Prepper 2.0 or later, the Log4j 2 configuration file is read from config/log4j2.properties in the application's home directory. By default, it uses log4j2-rolling.properties in the shared-config directory.

For Data Prepper 1.5 or earlier, you can pass a custom Log4j 2 properties file by adding "-Dlog4j.configurationFile=config/log4j2.properties" to the command. If no properties file is provided, Data Prepper defaults to the log4j2.properties file in the shared-config directory.
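As an illustration of a custom logging setup, the following sketch writes a console-only Log4j 2 properties file. The appender name, pattern, and log levels are illustrative choices, not Data Prepper defaults.

```shell
# Write a console-only Log4j 2 configuration into the current directory.
# The appender name (STDOUT), the pattern, and the levels are example values.
cat > log4j2.properties <<'EOF'
status = warn
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} [%t] %-5p %c - %m%n
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = STDOUT
EOF
```

For Data Prepper 2.0 or later in Docker, one option is to mount this file with `-v ${PWD}/log4j2.properties:/usr/share/data-prepper/config/log4j2.properties`. That target path is an assumption: it takes /usr/share/data-prepper to be the application's home directory, as the mount paths in the run commands suggest.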

Next steps

Trace Analytics is an important Data Prepper use case. If you haven't yet configured it, see Trace Analytics.

Log Ingestion is also an important Data Prepper use case. To learn more, see the Log Ingestion Documentation.

To learn how to run Data Prepper with a Logstash configuration, see the Logstash Migration Guide.

For information on how to monitor Data Prepper, see the Monitoring page.

Other examples

We have several other Docker examples that allow you to run Data Prepper in different scenarios.