Merge pull request #392 from opensearch-project/add-ingest-api

Add Ingest API to reference
Naarcha-AWS 2022-02-08 11:29:25 -06:00 committed by GitHub
commit ba554dcbc1
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 396 additions and 0 deletions


@ -0,0 +1,77 @@
---
layout: default
title: Create or update ingest pipeline
parent: Ingest APIs
grand_parent: REST API reference
nav_order: 11
---
# Create or update a pipeline
The create ingest pipeline API operation creates or updates an ingest pipeline. Each pipeline requires a definition that specifies how its processors transform your documents.
## Example
```
PUT _ingest/pipeline/12345
{
  "description" : "A description for your pipeline",
  "processors" : [
    {
      "set" : {
        "field": "field-name",
        "value": "value"
      }
    }
  ]
}
```
## Path and HTTP methods
```
PUT _ingest/pipeline/{id}
```
## Request body fields
Field | Required | Type | Description
:--- | :--- | :--- | :---
description | Optional | string | Description of your ingest pipeline.
processors | Required | Array of processor objects | The processors that transform documents. Processors run in the order specified. The transformed data appears in the index after the processors have run.
```json
{
  "description" : "A description for your pipeline",
  "processors" : [
    {
      "set" : {
        "field": "field-name",
        "value": "value"
      }
    }
  ]
}
```
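As a rough illustration of the request described above, the following Python sketch assembles the `PUT _ingest/pipeline/{id}` call with the standard library only. The address `http://localhost:9200` and the helper name `build_put_pipeline_request` are assumptions made for this example and are not part of the API. The request is constructed but never sent.

```python
import json
import urllib.request


def build_put_pipeline_request(base_url, pipeline_id, processors, description=None):
    """Build (but do not send) a PUT request that creates or updates a pipeline."""
    if processors is None:
        # The request body table lists processors as a required field.
        raise ValueError("processors is required")
    body = {"processors": processors}
    if description is not None:
        body["description"] = description  # optional field
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local cluster address; adjust for your environment.
request = build_put_pipeline_request(
    "http://localhost:9200",
    "12345",
    [{"set": {"field": "field-name", "value": "value"}}],
    description="A description for your pipeline",
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. A secured cluster would also need credentials and TLS settings, which this sketch leaves out.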
## URL parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
master_timeout | time | How long to wait for a connection to the master node.
timeout | time | How long to wait for the request to return.
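The optional URL parameters are appended to the path as a query string. A minimal sketch, assuming a hypothetical cluster at `http://localhost:9200` and the example time values `30s` and `10s`; the helper name `pipeline_url` is illustrative only.

```python
from urllib.parse import urlencode


def pipeline_url(base_url, pipeline_id, master_timeout=None, timeout=None):
    """Return the pipeline URL with any optional URL parameters appended."""
    params = {}
    if master_timeout is not None:
        params["master_timeout"] = master_timeout
    if timeout is not None:
        params["timeout"] = timeout
    url = f"{base_url}/_ingest/pipeline/{pipeline_id}"
    # Only add "?" when at least one parameter was supplied.
    return f"{url}?{urlencode(params)}" if params else url


# Example values only; both parameters can be omitted.
print(pipeline_url("http://localhost:9200", "12345", master_timeout="30s", timeout="10s"))
```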
## Response
```json
{
  "acknowledged" : true
}
```


@ -0,0 +1,42 @@
---
layout: default
title: Delete a pipeline
parent: Ingest APIs
grand_parent: REST API reference
nav_order: 14
---
# Delete a pipeline
If you no longer want to use an ingest pipeline, use the delete ingest pipeline API operation.
## Example
```
DELETE _ingest/pipeline/12345
```
## Path and HTTP methods
Delete an ingest pipeline based on that pipeline's ID.
```
DELETE _ingest/pipeline/{id}
```
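A minimal Python sketch of the delete call, assuming a cluster at the hypothetical address `http://localhost:9200`. The helper name `build_delete_pipeline_request` is made up for this example, and the request is built without being sent.

```python
import urllib.request


def build_delete_pipeline_request(base_url, pipeline_id):
    """Build (but do not send) a DELETE request for one ingest pipeline."""
    if not pipeline_id:
        # This sketch always targets a single pipeline by its ID.
        raise ValueError("pipeline_id is required")
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline_id}",
        method="DELETE",
    )


request = build_delete_pipeline_request("http://localhost:9200", "12345")
print(request.get_method(), request.full_url)
```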
## URL parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
master_timeout | time | How long to wait for a connection to the master node.
timeout | time | How long to wait for the request to return.
## Response
```json
{
  "acknowledged" : true
}
```


@ -0,0 +1,57 @@
---
layout: default
title: Get ingest pipeline
parent: Ingest APIs
grand_parent: REST API reference
nav_order: 10
---
# Get ingest pipeline
After you create a pipeline, use the get ingest pipeline API operation to return all the information about a specific ingest pipeline.
## Example
```
GET _ingest/pipeline/12345
```
## Path and HTTP methods
Return all ingest pipelines.
```
GET _ingest/pipeline
```
Return a single ingest pipeline based on the pipeline's ID.
```
GET _ingest/pipeline/{id}
```
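The two paths above differ only in whether a pipeline ID is appended. A small sketch of that choice, where the helper name `get_pipeline_path` is an assumption of this example:

```python
def get_pipeline_path(pipeline_id=None):
    """Return the GET path for all pipelines, or for one pipeline when an ID is given."""
    base = "_ingest/pipeline"
    return f"{base}/{pipeline_id}" if pipeline_id else base


print(get_pipeline_path())         # path that returns all pipelines
print(get_pipeline_path("12345"))  # path that returns a single pipeline
```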
## URL parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
master_timeout | time | How long to wait for a connection to the master node.
## Response
```json
{
  "pipeline-id" : {
    "description" : "A description for your pipeline",
    "processors" : [
      {
        "set" : {
          "field" : "field-name",
          "value" : "value"
        }
      }
    ]
  }
}
```
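The response is keyed by pipeline ID, so a client can walk it to see which processors each pipeline runs and in what order. The sketch below parses the sample response above; the helper name `summarize_pipelines` is hypothetical, and it assumes each processor object has a single key naming its type, as in the `set` example.

```python
import json

# Sample response copied from the section above.
response_text = """
{
  "pipeline-id": {
    "description": "A description for your pipeline",
    "processors": [
      {"set": {"field": "field-name", "value": "value"}}
    ]
  }
}
"""


def summarize_pipelines(text):
    """Map each pipeline ID to the ordered list of its processor types."""
    pipelines = json.loads(text)
    summary = {}
    for pipeline_id, definition in pipelines.items():
        # Assumption: the single key of each processor object is its type.
        summary[pipeline_id] = [
            next(iter(processor)) for processor in definition.get("processors", [])
        ]
    return summary


print(summarize_pipelines(response_text))
```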


@ -0,0 +1,15 @@
---
layout: default
title: Ingest APIs
parent: REST API reference
has_children: true
nav_order: 3
redirect_from:
- /opensearch/rest-api/ingest-apis/
---
# Ingest APIs
Before you index your data, OpenSearch's ingest APIs help you transform it by creating and managing ingest pipelines. Pipelines consist of **processors**, customizable tasks that run in the order they appear in the request body. The transformed data appears in your index after each of the processors completes.
Ingest pipelines in OpenSearch can only be managed using ingest API operations. When using ingest in production environments, your cluster should contain at least one node whose node roles include `ingest`. For more information on setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/).
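To make the ordering of processors concrete, here is a toy local model written in Python. It is not how OpenSearch implements processors; it only mimics a `set` processor on a plain dictionary to show that each processor receives the output of the one before it. The field name `status` and the helper names are made up for this illustration.

```python
def apply_set(document, field, value):
    """Toy stand-in for a `set` processor: return a copy with one field set."""
    updated = dict(document)
    updated[field] = value
    return updated


def run_pipeline(document, processors):
    """Apply each processor in list order, feeding each result to the next."""
    for processor in processors:
        for processor_type, config in processor.items():
            if processor_type != "set":
                raise NotImplementedError(
                    f"toy model only handles 'set', got {processor_type!r}"
                )
            document = apply_set(document, config["field"], config["value"])
    return document


processors = [
    {"set": {"field": "status", "value": "new"}},
    {"set": {"field": "status", "value": "processed"}},  # runs second, so it wins
]
print(run_pipeline({"location": "document-name"}, processors))
```

Because the second processor runs after the first, the final document carries `"status": "processed"`.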


@ -0,0 +1,205 @@
---
layout: default
title: Simulate an ingest pipeline
parent: Ingest APIs
grand_parent: REST API reference
nav_order: 13
---
# Simulate a pipeline
Simulates an ingest pipeline with any example documents you specify.
## Example
```
POST /_ingest/pipeline/35678/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "location": "document-name"
      }
    },
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "location": "document-name"
      }
    }
  ]
}
```
## Path and HTTP methods
Simulate the last ingest pipeline created.
```
GET _ingest/pipeline/_simulate
POST _ingest/pipeline/_simulate
```
Simulate a single pipeline based on the pipeline's ID.
```
GET _ingest/pipeline/{id}/_simulate
POST _ingest/pipeline/{id}/_simulate
```
## URL parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
verbose | boolean | Verbose mode. Displays the data output of each processor in the executed pipeline.
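The simulate paths and the `verbose` parameter above combine as follows. This is only a sketch of the string assembly, and the helper name `simulate_path` is an assumption of the example.

```python
def simulate_path(pipeline_id=None, verbose=False):
    """Return the _simulate path, with the pipeline ID and verbose flag if given."""
    parts = ["_ingest", "pipeline"]
    if pipeline_id is not None:
        parts.append(str(pipeline_id))
    parts.append("_simulate")
    path = "/".join(parts)
    return f"{path}?verbose=true" if verbose else path


print(simulate_path())
print(simulate_path("35678", verbose=True))
```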
## Request body fields
Field | Required | Type | Description
:--- | :--- | :--- | :---
`pipeline` | Optional | object | The pipeline you want to simulate. When included without the pipeline `{id}` inside the request path, the response simulates the last pipeline created.
`docs` | Required | array of objects | The documents you want to use to test the pipeline.
The `docs` field can include the following subfields:
Field | Required | Type | Description
:--- | :--- | :--- | :---
`_id` | Optional | string | An optional identifier for the document. The identifier cannot be used elsewhere in the index.
`_index` | Optional | string | The index where the document's transformed data appears.
`_source` | Required | object | The document's JSON body.
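A minimal sketch of assembling a simulate request body in Python. It uses the underscore-prefixed keys (`_index`, `_id`, `_source`) that appear in the example request earlier on this page, and the helper name `build_simulate_body` is hypothetical. The checks mirror the Required column of the tables above.

```python
import json


def build_simulate_body(docs, pipeline=None):
    """Assemble a _simulate request body from documents and an optional inline pipeline."""
    if not docs:
        raise ValueError("docs is required")
    for doc in docs:
        if "_source" not in doc:
            raise ValueError("each document needs a _source object")
    body = {"docs": docs}
    if pipeline is not None:
        body["pipeline"] = pipeline  # optional inline pipeline definition
    return body


body = build_simulate_body(
    docs=[{"_index": "index", "_id": "id", "_source": {"location": "document-name"}}],
    pipeline={"processors": [{"set": {"field": "field-name", "value": "value"}}]},
)
print(json.dumps(body, indent=2))
```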
## Response
Responses vary based on which path and HTTP method you choose.
### Specify pipeline in request body
```json
{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "location" : "new-new",
          "field2" : "_value"
        },
        "_ingest" : {
          "timestamp" : "2022-02-07T18:47:57.479230835Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "location" : "new-new",
          "field2" : "_value"
        },
        "_ingest" : {
          "timestamp" : "2022-02-07T18:47:57.47933496Z"
        }
      }
    }
  ]
}
```
### Specify pipeline ID inside HTTP path
```json
{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "field-name" : "value",
          "location" : "document-name"
        },
        "_ingest" : {
          "timestamp" : "2022-02-03T21:47:05.382744877Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "field-name" : "value",
          "location" : "document-name"
        },
        "_ingest" : {
          "timestamp" : "2022-02-03T21:47:05.382803544Z"
        }
      }
    }
  ]
}
```
### Receive verbose response
With the `verbose` parameter set to `true`, the response shows how each processor transforms the specified document.
```json
{
  "docs" : [
    {
      "processor_results" : [
        {
          "processor_type" : "set",
          "status" : "success",
          "doc" : {
            "_index" : "index",
            "_type" : "_doc",
            "_id" : "id",
            "_source" : {
              "field-name" : "value",
              "location" : "document-name"
            },
            "_ingest" : {
              "pipeline" : "35678",
              "timestamp" : "2022-02-03T21:45:09.414049004Z"
            }
          }
        }
      ]
    },
    {
      "processor_results" : [
        {
          "processor_type" : "set",
          "status" : "success",
          "doc" : {
            "_index" : "index",
            "_type" : "_doc",
            "_id" : "id",
            "_source" : {
              "field-name" : "value",
              "location" : "document-name"
            },
            "_ingest" : {
              "pipeline" : "35678",
              "timestamp" : "2022-02-03T21:45:09.414093212Z"
            }
          }
        }
      ]
    }
  ]
}
```
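The response shapes above can be read with a few lines of Python. The sketch below uses trimmed copies of the sample responses and assumes every processor succeeded, so each `processor_results` entry contains a `doc`. The helper names `final_sources` and `processor_statuses` are made up for this example.

```python
def final_sources(response):
    """Return the final _source of each simulated document, for either response shape."""
    sources = []
    for entry in response["docs"]:
        if "processor_results" in entry:
            # Verbose shape: the last processor result holds the final document.
            doc = entry["processor_results"][-1]["doc"]
        else:
            doc = entry["doc"]
        sources.append(doc["_source"])
    return sources


def processor_statuses(response):
    """List (processor_type, status) pairs per document from a verbose response."""
    return [
        [(result["processor_type"], result["status"]) for result in entry["processor_results"]]
        for entry in response["docs"]
    ]


# Trimmed copies of the sample responses above.
simulated_doc = {
    "_index": "index",
    "_id": "id",
    "_source": {"field-name": "value", "location": "document-name"},
}
plain_response = {"docs": [{"doc": simulated_doc}]}
verbose_response = {
    "docs": [
        {
            "processor_results": [
                {"processor_type": "set", "status": "success", "doc": simulated_doc}
            ]
        }
    ]
}

print(processor_statuses(verbose_response))
print(final_sources(verbose_response) == final_sources(plain_response))
```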