[[ingest-attachment]]
=== Ingest Attachment Processor Plugin
The ingest attachment plugin lets Elasticsearch extract the contents of file attachments in common formats
(such as PPT, XLS, and PDF) by using the Apache text extraction library http://lucene.apache.org/tika/[Tika].
You can use the ingest attachment plugin as a replacement for the mapper attachment plugin.
The source field must contain base64-encoded binary data.
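
As a rough illustration of the base64 requirement, the following Python sketch encodes raw bytes into a document body. The helper name `build_attachment_document` is hypothetical, and the field name `data` is only an assumption borrowed from the pipeline example in this section; use whatever field your pipeline names in `source_field`.

```python
import base64
import json


def build_attachment_document(raw_bytes, field="data"):
    """Return a document body whose `field` holds the base64-encoded bytes.

    `field` is an assumption: it should match the processor's `source_field`.
    """
    # b64encode returns bytes; decode to ASCII text so the value is valid JSON
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {field: encoded}


# In practice the bytes would come from a file, e.g. open(path, "rb").read()
doc = build_attachment_document(b"hello attachment")
print(json.dumps(doc))
```

The resulting JSON body is what you would index through a pipeline that contains the attachment processor.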
[[ingest-attachment-options]]
.Attachment options
[options="header"]
|======
| Name | Required | Default | Description
| `source_field` | yes | - | The field that contains the base64-encoded binary to extract from
| `target_field` | no | attachment | The field that will hold the attachment information
| `indexed_chars` | no | 100000 | The maximum number of characters used for extraction, to prevent huge fields. Use `-1` for no limit.
| `fields` | no | all | The properties to store. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
|======
[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "attachment" : {
        "source_field" : "data"
      }
    }
  ]
}
--------------------------------------------------
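
The optional settings from the options table can be added to the same processor definition. The Python sketch below builds such a pipeline body as a dictionary. The function `attachment_pipeline` and its `description` argument are hypothetical, and passing `fields` as a JSON array is an assumption, since the table lists the allowed values but not the value's type.

```python
import json

# Property names listed for the `fields` option in the table above
ALLOWED_FIELDS = {
    "content", "title", "name", "author", "keywords",
    "date", "content_type", "content_length", "language",
}


def attachment_pipeline(source_field, target_field=None,
                        indexed_chars=None, fields=None,
                        description="..."):
    """Build a pipeline body with one attachment processor.

    Options left as None are omitted so the documented defaults apply.
    """
    processor = {"source_field": source_field}
    if target_field is not None:
        processor["target_field"] = target_field
    if indexed_chars is not None:
        processor["indexed_chars"] = indexed_chars
    if fields is not None:
        unknown = set(fields) - ALLOWED_FIELDS
        if unknown:
            raise ValueError("unknown fields: %s" % sorted(unknown))
        # Assumption: `fields` is sent as a JSON array of property names
        processor["fields"] = list(fields)
    return {
        "description": description,
        "processors": [{"attachment": processor}],
    }


pipeline = attachment_pipeline("data", indexed_chars=-1,
                               fields=["content", "title"])
print(json.dumps(pipeline, indent=2))
```

Here `indexed_chars` is set to `-1` to remove the extraction limit, and only `content` and `title` are selected for storage.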
NOTE: Extracting contents from binary data is a resource-intensive operation.
It is highly recommended to run pipelines using this processor on a dedicated
ingest node.