[doc] Link mapper-attachment type documentation to its repo
As explained in elasticsearch/elasticsearch-mapper-attachments#101, we should have consistent documentation. The best option is to link the documentation in elasticsearch guide to the most recent README in the plugin repo. Closes #9756
This commit is contained in:
parent
4fbce94f44
commit
0c8da6bb84
|
@ -3,88 +3,11 @@
|
||||||
|
|
||||||
The `attachment` type allows to index different "attachment" type field
|
The `attachment` type allows to index different "attachment" type field
|
||||||
(encoded as `base64`), for example, Microsoft Office formats, open
|
(encoded as `base64`), for example, Microsoft Office formats, open
|
||||||
document formats, ePub, HTML, and so on (full list can be found
|
document formats, ePub, HTML, and so on.
|
||||||
http://lucene.apache.org/tika/0.10/formats.html[here]).
|
|
||||||
|
|
||||||
The `attachment` type is provided as a
|
The `attachment` type is provided as a
|
||||||
https://github.com/elasticsearch/elasticsearch-mapper-attachments[plugin
|
https://github.com/elasticsearch/elasticsearch-mapper-attachments[plugin
|
||||||
extension]. The plugin is a simple zip file that can be downloaded and
|
extension]. It uses http://tika.apache.org/[Apache Tika] behind the scene.
|
||||||
placed under `$ES_HOME/plugins` location. It will be automatically
|
|
||||||
detected and the `attachment` type will be added.
|
|
||||||
|
|
||||||
Note, the `attachment` type is experimental.
|
See https://github.com/elasticsearch/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch[README file]
|
||||||
|
for details.
|
||||||
Using the attachment type is simple, in your mapping JSON, simply set a
|
|
||||||
certain JSON element as attachment, for example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"person" : {
|
|
||||||
"properties" : {
|
|
||||||
"my_attachment" : { "type" : "attachment" }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
In this case, the JSON to index can be:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"my_attachment" : "... base64 encoded attachment ..."
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Or it is possible to use more elaborated JSON if content type or
|
|
||||||
resource name need to be set explicitly:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"my_attachment" : {
|
|
||||||
"_content_type" : "application/pdf",
|
|
||||||
"_name" : "resource/name/of/my.pdf",
|
|
||||||
"content" : "... base64 encoded attachment ..."
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The `attachment` type not only indexes the content of the doc, but also
|
|
||||||
automatically adds meta data on the attachment as well (when available).
|
|
||||||
The metadata supported are: `date`, `title`, `author`, and `keywords`.
|
|
||||||
They can be queried using the "dot notation", for example:
|
|
||||||
`my_attachment.author`.
|
|
||||||
|
|
||||||
Both the meta data and the actual content are simple core type mappers
|
|
||||||
(string, date, ...), thus, they can be controlled in the mappings. For
|
|
||||||
example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"person" : {
|
|
||||||
"properties" : {
|
|
||||||
"file" : {
|
|
||||||
"type" : "attachment",
|
|
||||||
"fields" : {
|
|
||||||
"file" : {"index" : "no"},
|
|
||||||
"date" : {"store" : true},
|
|
||||||
"author" : {"analyzer" : "myAnalyzer"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
In the above example, the actual content indexed is mapped under
|
|
||||||
`fields` name `file`, and we decide not to index it, so it will only be
|
|
||||||
available in the `_all` field. The other fields map to their respective
|
|
||||||
metadata names, but there is no need to specify the `type` (like
|
|
||||||
`string` or `date`) since it is already known.
|
|
||||||
|
|
||||||
The plugin uses http://lucene.apache.org/tika/[Apache Tika] to parse
|
|
||||||
attachments, so many formats are supported, listed
|
|
||||||
http://lucene.apache.org/tika/0.10/formats.html[here].
|
|
||||||
|
|
Loading…
Reference in New Issue