opensearch-docs-cn/_data-prepper/getting-started.md

161 lines
8.5 KiB
Markdown
Raw Normal View History

---
layout: default
Restructure Data Prepper plugins documentation (#2073) * Removed content from Data Prepper reference and broke out into separate pages. Signed-off-by: carolxob <carolxob@amazon.com> * Checking in file to make sure it's the right version. Signed-off-by: carolxob <carolxob@amazon.com> * Minor update. Signed-off-by: carolxob <carolxob@amazon.com> * Updated files. Signed-off-by: carolxob <carolxob@amazon.com> * Adding Sinks file. Signed-off-by: carolxob <carolxob@amazon.com> * Added file to PR. Signed-off-by: carolxob <carolxob@amazon.com> * Corrected TOC hierarchy. Signed-off-by: carolxob <carolxob@amazon.com> * Added images and reorganized files. Signed-off-by: carolxob <carolxob@amazon.com> * Reconfigured some content based on David's feedback. Signed-off-by: carolxob <carolxob@amazon.com> * Modified reference page. Signed-off-by: carolxob <carolxob@amazon.com> * Fixed minor heading issue. Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits. Signed-off-by: carolxob <carolxob@amazon.com> * Major edits, files created, moved, and content broken out from main config page. Signed-off-by: carolxob <carolxob@amazon.com> * Adding key value and processors pages to the PR. Signed-off-by: carolxob <carolxob@amazon.com> * Basic ToC reorg. Signed-off-by: carolxob <carolxob@amazon.com> * ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor update.: Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits to ToC again. Signed-off-by: carolxob <carolxob@amazon.com> * Minor updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor TOC update Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC changes. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC changes. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC changes. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC changes. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC changes. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC changes. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC edits. Signed-off-by: carolxob <carolxob@amazon.com> * Changed filename. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates Signed-off-by: carolxob <carolxob@amazon.com> * Making small Toc changes. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates. Signed-off-by: carolxob <carolxob@amazon.com> * Added comment blocks for Config and Metrics sections. Signed-off-by: carolxob <carolxob@amazon.com> * Minor ToC updates to add Sinks and Sources under config guide. Signed-off-by: carolxob <carolxob@amazon.com> Signed-off-by: carolxob <carolxob@amazon.com>
2022-12-27 17:58:48 -05:00
title: Getting started
nav_order: 5
redirect_from:
- /clients/data-prepper/get-started/
---
# Getting started with Data Prepper
Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
If you are migrating from Open Distro Data Prepper, see [Migrating from Open Distro]({{site.url}}{{site.baseurl}}/data-prepper/migrate-open-distro/).
{: .note}
## 1. Installing Data Prepper
There are two ways to install Data Prepper: you can run the Docker image or build from source.
The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command:
```
docker pull opensearchproject/data-prepper:latest
```
{% include copy.html %}
If you have special requirements that require you to build from source, or if you want to contribute, see the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md).
## 2. Configuring Data Prepper
Expand configuration information in Data Prepper Getting started guide. (#3298) * Expand configuration information in Data Prepper Getting started guide. Signed-off-by: carolxob <carolxob@amazon.com> * Minor copy edits. Signed-off-by: carolxob <carolxob@amazon.com> * Copy edits for the Configuration section. Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits. Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits. Signed-off-by: carolxob <carolxob@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Made edits based on doc review feedback. Signed-off-by: carolxob <carolxob@amazon.com> * Made edits based on doc review feedback. Signed-off-by: carolxob <carolxob@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> --------- Signed-off-by: carolxob <carolxob@amazon.com> Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-03-09 14:22:42 -05:00
Two configuration files are required to run a Data Prepper instance. Optionally, you can configure a Log4j 2 configuration file. See [Configuring Log4j]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-log4j/) for more information. The following list describes the purpose of each configuration file:
Expand configuration information in Data Prepper Getting started guide. (#3298) * Expand configuration information in Data Prepper Getting started guide. Signed-off-by: carolxob <carolxob@amazon.com> * Minor copy edits. Signed-off-by: carolxob <carolxob@amazon.com> * Copy edits for the Configuration section. Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits. Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits. Signed-off-by: carolxob <carolxob@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Made edits based on doc review feedback. Signed-off-by: carolxob <carolxob@amazon.com> * Made edits based on doc review feedback. Signed-off-by: carolxob <carolxob@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> --------- Signed-off-by: carolxob <carolxob@amazon.com> Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-03-09 14:22:42 -05:00
* `pipelines.yaml`: This file describes which data pipelines to run, including sources, processors, and sinks.
* `data-prepper-config.yaml`: This file contains Data Prepper server settings that allow you to interact with exposed Data Prepper server APIs.
* `log4j2-rolling.properties` (optional): This file contains Log4j 2 configuration options and can be a JSON, YAML, XML, or .properties file type.
For Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path. See the following configuration path example:
```
java -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml
```
Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory.
Starting with Data Prepper 2.0, you can launch Data Prepper by using the following `data-prepper` script that does not require any additional command line arguments:
```
bin/data-prepper
```
Configuration files are read from specific subdirectories in the application's home directory:
1. `pipelines/`: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files.
2. `config/data-prepper-config.yaml`: Used for the Data Prepper server configuration.
You can supply your own pipeline configuration file path followed by the server configuration file path. However, this method will not be supported in a future release. See the following example:
```
bin/data-prepper pipelines.yaml data-prepper-config.yaml
```
The Log4j 2 configuration file is read from the `config/log4j2.properties` file located in the application's home directory.
To configure Data Prepper, see the following information for each use case:
* [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/): Learn how to collect trace data and customize a pipeline that ingests and transforms that data.
* [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/): Learn how to set up Data Prepper for log observability.
## 3. Defining a pipeline
Expand configuration information in Data Prepper Getting started guide. (#3298) * Expand configuration information in Data Prepper Getting started guide. Signed-off-by: carolxob <carolxob@amazon.com> * Minor copy edits. Signed-off-by: carolxob <carolxob@amazon.com> * Copy edits for the Configuration section. Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits. Signed-off-by: carolxob <carolxob@amazon.com> * Minor edits. Signed-off-by: carolxob <carolxob@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Update _data-prepper/getting-started.md Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> * Made edits based on doc review feedback. Signed-off-by: carolxob <carolxob@amazon.com> * Made edits based on doc review feedback. Signed-off-by: carolxob <carolxob@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> * Update _data-prepper/getting-started.md Co-authored-by: Nathan Bower <nbower@amazon.com> --------- Signed-off-by: carolxob <carolxob@amazon.com> Co-authored-by: Aria Marble <111301581+ariamarble@users.noreply.github.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-03-09 14:22:42 -05:00
Create a Data Prepper pipeline file named `pipelines.yaml` using the following configuration:
```yml
simple-sample-pipeline:
workers: 2
delay: "5000"
source:
random:
sink:
- stdout:
```
{% include copy.html %}
## 4. Running Data Prepper
Run the following command with your pipeline configuration YAML.
```bash
docker run --name data-prepper \
-v /${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
opensearchproject/data-prepper:latest
```
{% include copy.html %}
The example pipeline configuration above demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`). For examples of more advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/clients/data-prepper/pipelines/).
After starting Data Prepper, you should see log output and some UUIDs after a few seconds:
```yml
2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
2021-09-30T20:19:44,681 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:45,183 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:45,687 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:46,191 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:46,694 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:47,200 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
2021-09-30T20:19:49,181 [simple-test-pipeline-processor-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - simple-test-pipeline Worker: Processing 6 records from buffer
07dc0d37-da2c-447e-a8df-64792095fb72
5ac9b10a-1d21-4306-851a-6fb12f797010
99040c79-e97b-4f1d-a70b-409286f2a671
5319a842-c028-4c17-a613-3ef101bd2bdd
e51e700e-5cab-4f6d-879a-1c3235a77d18
b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90
Data Prepper 2.0 documentation (#1510) * Change Data Prepper intro Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add next steps section Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add David's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix optional tags Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Address small typo * [Data Prepper 2.0]MAINT: documentation change regarding record type (#1306) * MAINT: documentation change regarding record type Signed-off-by: Chen <qchea@amazon.com> * MAINT: documentation on trace group fields Signed-off-by: Chen <qchea@amazon.com> Signed-off-by: Chen <qchea@amazon.com> * Update docs for Data Prepper 2.0 (#1404) * Update get-started Signed-off-by: Hai Yan <oeyh@amazon.com> * Update pipelines.md Signed-off-by: Hai Yan <oeyh@amazon.com> * Add peer forwarder options to references Signed-off-by: Hai Yan <oeyh@amazon.com> * Add csv processor options to refereces Signed-off-by: Hai Yan <oeyh@amazon.com> * Add docs for conditional routing Signed-off-by: Hai Yan <oeyh@amazon.com> * Add docs for json processor Signed-off-by: Hai Yan <oeyh@amazon.com> * Remove docs for peer forwarder plugin Signed-off-by: Hai Yan <oeyh@amazon.com> * Address review feedback - revise sentences, fix inaccurate info and typos Signed-off-by: Hai Yan <oeyh@amazon.com> * Add missing options for http source and peer forwarder Signed-off-by: Hai Yan <oeyh@amazon.com> * Update ssl options on peer forwarder Signed-off-by: Hai Yan <oeyh@amazon.com> Signed-off-by: Hai Yan <oeyh@amazon.com> * More updates for Data Prepper 2.0 (#1469) * Update http source and opensearch sink options Signed-off-by: Hai Yan <oeyh@amazon.com> * Update docker run command Signed-off-by: Hai Yan <oeyh@amazon.com> * Add more missing options Signed-off-by: Hai Yan <oeyh@amazon.com> * Add metadata_root_key for s3 source Signed-off-by: Hai Yan <oeyh@amazon.com> * Address review comments - tweak sentences and fix typos Signed-off-by: Hai Yan <oeyh@amazon.com> * Address review comments Signed-off-by: Hai Yan <oeyh@amazon.com> Signed-off-by: Hai Yan <oeyh@amazon.com> * Fix broken link Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Chen <qchea@amazon.com> Signed-off-by: Hai Yan <oeyh@amazon.com> Co-authored-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Qi Chen <qchea@amazon.com> Co-authored-by: Hai Yan <8153134+oeyh@users.noreply.github.com>
2022-10-11 12:36:37 -04:00
```
The remainder of this page provides examples for running Data Prepper from the Docker image. If you
built it from source, refer to the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) for more information.
However you configure your pipeline, you'll run Data Prepper the same way. You run the Docker
image and modify both the `pipelines.yaml` and `data-prepper-config.yaml` files.
For Data Prepper 2.0 or later, use this command:
```
docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest
```
{% include copy.html %}
For Data Prepper versions earlier than 2.0, use this command:
```
docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:1.x
```
{% include copy.html %}
Once Data Prepper is running, it processes data until it is shut down. Once you are done, shut it down with the following command:
```
POST /shutdown
```
{% include copy-curl.html %}
### Additional configurations
For Data Prepper 2.0 or later, the Log4j 2 configuration file is read from `config/log4j2.properties` in the application's home directory. By default, it uses `log4j2-rolling.properties` in the *shared-config* directory.
For Data Prepper 1.5 or earlier, optionally add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command if you want to pass a custom log4j2 properties file. If no properties file is provided, Data Prepper defaults to the log4j2.properties file in the *shared-config* directory.
## Next steps
Trace analytics is an important Data Prepper use case. If you haven't yet configured it, see [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/).
Log ingestion is also an important Data Prepper use case. To learn more, see [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/).
To learn how to run Data Prepper with a Logstash configuration, see [Migrating from Logstash]({{site.url}}{{site.baseurl}}/data-prepper/migrating-from-logstash-data-prepper/).
For information on how to monitor Data Prepper, see [Monitoring]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/monitoring/).
## More examples
For more examples of Data Prepper, see [examples](https://github.com/opensearch-project/data-prepper/tree/main/examples/) in the Data Prepper repo.