[[ingest-attachment]]
== Ingest Attachment Processor Plugin

The ingest attachment plugin lets Elasticsearch extract the contents of file attachments in common formats (such as PPT, XLS, and PDF)
using the Apache text extraction library http://lucene.apache.org/tika/[Tika].

It can be used as a replacement for the mapper attachment plugin.

The source field must contain base64-encoded binary data.
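
As a rough illustration of the base64 requirement, the following Python sketch encodes raw file bytes into a string that could be placed in the source field. The helper name `encode_attachment` and the field name `data` are assumptions chosen for illustration; they are not defined by the plugin.

```python
import base64
import json


def encode_attachment(raw: bytes) -> str:
    """Return the base64 text form of raw file bytes (hypothetical helper)."""
    # b64encode returns bytes; decode to ASCII so the value can be embedded in JSON.
    return base64.b64encode(raw).decode("ascii")


# Hypothetical document: "data" is an assumed field name that a pipeline
# would reference through the `source_field` option.
raw_bytes = b"Hello attachment"  # stands in for the bytes of a PDF, PPT, XLS, ...
document = {"data": encode_attachment(raw_bytes)}

print(json.dumps(document))
```

In practice `raw_bytes` would come from reading the file in binary mode, for example with `open(path, "rb").read()`.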

[[ingest-attachment-options]]
.Attachment options
[options="header"]
|======
| Name | Required | Default | Description
| `source_field` | yes | - | The field containing the base64-encoded attachment data
| `target_field` | no | attachment | The field that will hold the extracted attachment information
| `indexed_chars` | no | 100000 | The maximum number of characters to extract, which prevents huge fields. Use `-1` for no limit.
| `fields` | no | all | The properties to store. Can be any of `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
|======
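
The options in the table can be combined in a single processor definition. The Python sketch below builds such a definition and sanity-checks it on the client side. The helper `build_attachment_processor` is hypothetical, the check is for illustration only and does not reflect how the plugin validates its settings, and it assumes that `fields` is passed as a JSON array.

```python
import json

# Property names listed for the `fields` option in the table above.
ALLOWED_FIELDS = {
    "content", "title", "name", "author", "keywords",
    "date", "content_type", "content_length", "language",
}


def build_attachment_processor(source_field, target_field="attachment",
                               indexed_chars=100000, fields=None):
    """Build an `attachment` processor definition as a plain dict (hypothetical helper)."""
    if not source_field:
        raise ValueError("source_field is required")
    config = {
        "source_field": source_field,
        "target_field": target_field,
        "indexed_chars": indexed_chars,  # -1 means no limit
    }
    if fields is not None:
        unknown = set(fields) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        # Assumption: the option takes a JSON array of property names.
        config["fields"] = list(fields)
    return {"attachment": config}


processor = build_attachment_processor(
    "data", indexed_chars=-1, fields=["content", "title"]
)
print(json.dumps(processor, indent=2))
```

Leaving `fields` unset keeps the default from the table, which stores all properties.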

For example, the following pipeline definition runs the attachment processor on the base64 content of the `data` field:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "attachment" : {
        "source_field" : "data"
      }
    }
  ]
}
--------------------------------------------------
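
To put such a pipeline to use, it has to be registered and then referenced when a document is indexed. The Python sketch below only prepares the two requests and sends nothing over the network. The pipeline id `attachment`, the `my_index/my_type/my_id` target, and both REST paths are assumptions based on the general ingest API, so check them against the Elasticsearch version you run.

```python
import base64
import json

PIPELINE_ID = "attachment"  # assumed pipeline name

# Same pipeline definition as in the example above.
pipeline_body = {
    "description": "...",
    "processors": [{"attachment": {"source_field": "data"}}],
}


def build_requests(raw: bytes):
    """Return (method, path, JSON body) tuples; nothing is sent over the network."""
    encoded = base64.b64encode(raw).decode("ascii")
    return [
        # Assumed path for registering the pipeline.
        ("PUT", f"_ingest/pipeline/{PIPELINE_ID}", json.dumps(pipeline_body)),
        # Assumed path and `pipeline` query parameter for indexing through it.
        ("PUT", f"my_index/my_type/my_id?pipeline={PIPELINE_ID}",
         json.dumps({"data": encoded})),
    ]


for method, path, body in build_requests(b"example file bytes"):
    print(method, path)
    print(body)
```

Keeping request construction separate from transport makes it easy to inspect the payloads before handing them to whichever HTTP client you use.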

NOTE: Extracting contents from binary data is a resource-intensive operation.
It is highly recommended to run pipelines using this processor on a dedicated ingest node.