CSV processor

The csv processor is used to parse CSVs and store them as individual fields in a document. The processor ignores empty fields.

Syntax

The following is the syntax for the csv processor:

{
  "csv": {
    "field": "field_name",
    "target_fields": ["field1, field2, ..."]
  }
}

{% include copy-curl.html %}

Configuration parameters

The following table lists the required and optional parameters for the csv processor.

Parameter	Required/Optional	Description
`field`	Required	The name of the field containing the data to be converted. Supports template snippets.
`target_fields`	Required	The name of the field in which to store the parsed data.
`description`	Optional	A brief description of the processor.
`empty_value`	Optional	Represents optional parameters that are not required or are not applicable.
`if`	Optional	A condition for running the processor.
`ignore_failure`	Optional	Specifies whether the processor continues execution even if it encounters errors. If set to `true`, failures are ignored. Default is `false`.
`ignore_missing`	Optional	Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`.
`on_failure`	Optional	A list of processors to run if the processor fails.
`quote`	Optional	The character used to quote fields in the CSV data. Default is `"`.
`separator`	Optional	The delimiter used to separate the fields in the CSV data. Default is `,`.
`tag`	Optional	An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type.
`trim`	Optional	If set to `true`, the processor trims white space from the beginning and end of the text. Default is `false`.

Using the processor

Follow these steps to use the processor in a pipeline.

Step 1: Create a pipeline

The following query creates a pipeline, named csv-processor, that splits resource_usage into three new fields named cpu_usage, memory_usage, and disk_usage:

PUT _ingest/pipeline/csv-processor
{
  "description": "Split resource usage into individual fields",
  "processors": [
    {
      "csv": {
        "field": "resource_usage",
        "target_fields": ["cpu_usage", "memory_usage", "disk_usage"],
        "separator": ","
      }
    }
  ]
}

{% include copy-curl.html %}

Step 2 (Optional): Test the pipeline

It is recommended that you test your pipeline before you ingest documents. {: .tip}

To test the pipeline, run the following query:

POST _ingest/pipeline/csv-processor/_simulate
{
  "docs": [
    {
      "_index": "testindex1",
      "_id": "1",
      "_source": {
        "resource_usage": "25,4096,10",
        "memory_usage": "4096",
        "disk_usage": "10",
        "cpu_usage": "25"
      }
    }
  ]
}

{% include copy-curl.html %}

Response

The following example response confirms that the pipeline is working as expected:

{
  "docs": [
    {
      "doc": {
        "_index": "testindex1",
        "_id": "1",
        "_source": {
          "memory_usage": "4096",
          "disk_usage": "10",
          "resource_usage": "25,4096,10",
          "cpu_usage": "25"
        },
        "_ingest": {
          "timestamp": "2023-08-22T16:40:45.024796379Z"
        }
      }
    }
  ]
}

Step 3: Ingest a document

The following query ingests a document into an index named testindex1:

PUT testindex1/_doc/1?pipeline=csv-processor
{
  "resource_usage": "25,4096,10"
}

{% include copy-curl.html %}

Step 4 (Optional): Retrieve the document

To retrieve the document, run the following query:

GET testindex1/_doc/1

{% include copy-curl.html %}

4.0 KiB Raw Blame History

CSV processor

Syntax

Configuration parameters

Using the processor

4.0 KiB

Raw Blame History