OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Pilato	77081e3dbf	[Doc] copy_to using attachment field type If you want to use the [copy_to](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#copy-to) feature, you need to define it on each sub-field you want to copy to another field: ```javascript PUT /test/person/_mapping { "person": { "properties": { "file": { "type": "attachment", "path": "full", "fields": { "file": { "type": "string", "copy_to": "copy" } } }, "copy": { "type": "string" } } } } ``` In this example, the extracted content will also be copied to the `copy` field. Closes #97. (cherry picked from commit f4f6b57) (cherry picked from commit 5878a62)	2015-02-11 23:13:56 +01:00
David Pilato	931be57da9	[test] Add standalone runner It can sometimes be useful to have a standalone runner to see exactly how Tika extracts content from a given file. You can run the `StandaloneRunner` class using: * `-u file://URL/TO/YOUR/DOC` * `--size` set extracted size (defaults to mapper attachment size) * `BASE64` encoded binary Example: ```sh StandaloneRunner BASE64Text StandaloneRunner -u /tmp/mydoc.pdf StandaloneRunner -u /tmp/mydoc.pdf --size 1000000 ``` It produces something like: ``` ## Extracted text --------------------- BEGIN ----------------------- This is the extracted text ---------------------- END ------------------------ ## Metadata - author: null - content_length: null - content_type: application/pdf - date: null - keywords: null - language: null - name: null - title: null ``` Closes #99. (cherry picked from commit 720b3bf) (cherry picked from commit 990fa15)	2015-02-09 17:45:07 +01:00
tlrx	a5ed51533c	update documentation with release 2.4.1	2014-11-05 20:38:24 +01:00
David Pilato	03b47d5a4c	update documentation with release 2.4.0	2014-10-08 18:50:20 +02:00
David Pilato	eef6b61806	Create branch es-1.4 for elasticsearch 1.4.0	2014-09-12 16:08:59 +02:00
David Pilato	34fe111a2b	update documentation with release 2.3.2	2014-09-01 09:53:26 +02:00
David Pilato	cc1a43b5c3	update documentation with release 2.3.1	2014-08-18 21:52:53 +02:00
David Pilato	08454d72f6	update documentation with release 2.2.1	2014-08-18 21:39:31 +02:00
David Pilato	587e6d3da2	Docs: make the welcome page more obvious Closes #79.	2014-08-18 12:38:03 +02:00
David Pilato	ad986eb2fc	Add support for multi-fields Now https://github.com/elasticsearch/elasticsearch/pull/6867 is merged in elasticsearch core code (branch 1.x - es 1.4), we can support multi fields in mapper attachment plugin. ``` DELETE /test PUT /test { "settings": { "number_of_shards": 1 } } PUT /test/person/_mapping { "person": { "properties": { "file": { "type": "attachment", "path": "full", "fields": { "file": { "type": "string", "fields": { "store": { "type": "string", "store": true } } }, "content_type": { "type": "string", "fields": { "store": { "type": "string", "store": true }, "untouched": { "type": "string", "index": "not_analyzed", "store": true } } } } } } } } PUT /test/person/1?refresh=true { "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==" } GET /test/person/_search { "fields": [ "file.store", "file.content_type.store" ], "aggs": { "store": { "terms": { "field": "file.content_type.store" } }, "untouched": { "terms": { "field": "file.content_type.untouched" } } } } ``` It gives: ```js { "took": 3, "timed_out": false, "_shards": { "total": 1, "successful": 1, "failed": 0 }, "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "test", "_type": "person", "_id": "1", "_score": 1, "fields": { "file.store": [ "\"God Save the Queen\" (alternatively \"God Save the King\"\n" ], "file.content_type.store": [ "text/plain; charset=ISO-8859-1" ] } } ] }, "aggregations": { "store": { "doc_count_error_upper_bound": 0, "buckets": [ { "key": "1", "doc_count": 1 }, { "key": "8859", "doc_count": 1 }, { "key": "charset", "doc_count": 1 }, { "key": "iso", "doc_count": 1 }, { "key": "plain", "doc_count": 1 }, { "key": "text", "doc_count": 1 } ] }, "untouched": { "doc_count_error_upper_bound": 0, "buckets": [ { "key": "text/plain; charset=ISO-8859-1", "doc_count": 1 } ] } } } ``` Note that using shorter definition works as well: ``` DELETE /test PUT /test { "settings": { "number_of_shards": 1 } } PUT /test/person/_mapping { "person": { "properties": { "file": { "type": "attachment" } } } } PUT /test/person/1?refresh=true { "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==" } GET /test/person/_search { "query": { "match": { "file": "king" } } } ``` gives: ```js { "took": 53, "timed_out": false, "_shards": { "total": 1, "successful": 1, "failed": 0 }, "hits": { "total": 1, "max_score": 0.095891505, "hits": [ { "_index": "test", "_type": "person", "_id": "1", "_score": 0.095891505, "_source": { "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==" } } ] } } ``` Closes #57. (cherry picked from commit 432d7c0)	2014-07-26 00:27:28 +02:00
David Pilato	663d4eaddb	Update to elasticsearch 1.4.0 Closes #77. (cherry picked from commit c58516f)	2014-07-26 00:26:41 +02:00
David Pilato	eaccd4383d	Deprecate `content` by `_content` When we want to force some values, we need to set those using `_field` where `field` is the field name we want to force: ``` { "file": { "_name": "myfilename.txt" } } ``` But to set the content itself, we use `content` field name. ``` { "file": { "content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu", "_name": "myfilename.txt" } } ``` For consistency, we set `_content` instead: ``` { "file": { "_content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu", "_name": "myfilename.txt" } } ``` Closes #73. (cherry picked from commit 2e6be20)	2014-07-25 18:15:37 +02:00
David Pilato	51a8f6f1a0	Fix doc typo (cherry picked from commit f70eb1d)	2014-06-03 10:13:12 +02:00
David Pilato	94cf141108	Use `_language` field instead of `language` When we want to force a language instead of using Tika language detection, we set the `language` field in documents. To be consistent with the other forced fields, `_content_type` and `_name`, we should prefix the `language` field with an underscore `_`. So `language` becomes `_language`. We first deprecate `language` in version 2.1.0 and we remove it in 2.3.0. Closes #68. (cherry picked from commit 2f46343)	2014-06-03 10:10:49 +02:00
David Pilato	7c1c2011bc	Update to elasticsearch 1.3.0 Closes #67. (cherry picked from commit d3eaac9)	2014-06-03 09:49:41 +02:00
David Pilato	c0e7795f1f	Update to elasticsearch 1.2.0 Closes #66. (cherry picked from commit fb3b288)	2014-06-03 09:49:13 +02:00
David Pilato	7f8143ff12	Add highlighting documentation Closes #54. (cherry picked from commit efdf8ef)	2014-06-03 09:35:05 +02:00
David Pilato	8855bd7ddc	Fix typo for JSON fields (cherry picked from commit 63c60b8)	2014-06-03 09:34:51 +02:00
David Pilato	e95bb18edb	Create branches according to elasticsearch versions We create branches: * es-0.90 for elasticsearch 0.90 * es-1.0 for elasticsearch 1.0 * es-1.1 for elasticsearch 1.1 * master for elasticsearch master We also check that before releasing we don't have a dependency to an elasticsearch SNAPSHOT version. Add links to each version in documentation	2014-03-28 17:47:38 +01:00
David Pilato	839c4dab16	prepare for next development iteration	2014-03-25 19:02:16 +01:00
David Pilato	74d882110d	prepare release elasticsearch-mapper-attachments-2.0.0	2014-03-25 18:47:56 +01:00
Richard Louapre	3d15cb0484	Add language detection option Based on PR #45, we add a new language detection option using the language detection feature available in Tika: https://tika.apache.org/1.4/detection.html#Language_Detection By default, language detection is disabled (`false`) as it could come with a cost. This default value can be changed by setting the `index.mapping.attachment.detect_language` setting. It can also be set per indexed document using the `_detect_language` parameter. Closes #45. Closes #44.	2014-03-25 18:26:09 +01:00
David Pilato	621995d0b4	Upgrade to Tika 1.5 Closes #56.	2014-03-19 23:20:29 +01:00
David Pilato	9d0b700b05	Add plugin release semi-automatic script Closes #58.	2014-03-19 23:04:09 +01:00
David Pilato	7fc31c89f7	prepare release elasticsearch-mapper-attachments-2.0.0.RC1	2014-01-15 23:37:44 +01:00
David Pilato	b877f1bd4f	Update to elasticsearch 1.0.0.RC1 Closes #48.	2014-01-14 14:51:32 +01:00
David Pilato	f8f647dea9	update headers	2014-01-13 22:31:14 +01:00
David Pilato	3f3fd74ee1	prepare release elasticsearch-mapper-attachments-1.9.0	2013-08-20 18:57:35 +02:00
David Pilato	62cc54a7c8	Update readme with release dates	2013-08-20 16:15:18 +02:00
David Pilato	8c340535d2	Add content_length metadata We now generate a `content_length` field based on file size. Closes #26.	2013-08-20 16:03:31 +02:00
Frédéric Camblor	019d0f9a26	Don't reject full document in case of invalid metadata From original PR #17 from @fcamblor If you try to index a document with invalid metadata, the full document is rejected. For example: ```html <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html lang="fr"> <head> <title>Hello</title> <meta name="date" content=""> <meta name="Author" content="kimchy"> <meta name="Keywords" content="elasticsearch,cool,bonsai"> </head> <body>World</body> </html> ``` has a non-parseable date. This fix adds a new option that ignores parsing errors: `"index.mapping.attachment.ignore_errors":true` (defaults to `true`). Closes #17, #38.	2013-08-20 12:26:49 +02:00
David Pilato	d2e2fb5cdf	Upgrade Tika to 1.4. Closes #36.	2013-08-14 16:57:42 +02:00
David Pilato	c0663277bc	prepare for next development iteration	2013-08-07 10:02:02 +02:00
David Pilato	0a454efe18	prepare release elasticsearch-mapper-attachments-1.8.0	2013-08-07 09:52:29 +02:00
David Pilato	d054f9a1e7	Mapper 1.7.0 does not work with elasticsearch 0.90.3 FastByteArrayInputStream has been removed in 0.90.3. Closes #34.	2013-08-07 09:47:12 +02:00
Shay Banon	7e58416506	release 1.7	2013-02-26 16:06:39 +01:00
David Pilato	942b87b763	Move to Elasticsearch 0.21.0.Beta1 Due to refactoring in 0.21.x we have to update this plugin Closes #24.	2013-02-23 12:13:51 +01:00
Martijn van Groningen	69f8bdea03	Master is now 0.20	2012-12-21 15:17:02 +01:00
Martijn van Groningen	a163fdad0f	Prepare 1.6.0 release	2012-09-28 12:00:12 +02:00
Martijn van Groningen	0a17fe2e44	Release 1.5	2012-09-19 11:33:06 +02:00
Martijn van Groningen	5c649ad226	Upgraded Tika, Testng, hamcrest, log4j and surefire plugin. Closes #12	2012-09-19 10:55:58 +02:00
Shay Banon	65043c0692	add license and repo	2012-06-10 22:14:18 +02:00
Shay Banon	66b96cb994	release 1.4.0	2012-03-25 20:10:46 +02:00
Shay Banon	c1df26e4e9	upgrade to tika 1.1	2012-03-25 20:00:45 +02:00
Shay Banon	4482a5de67	release 1.3.0	2012-03-07 22:02:49 +02:00
Shay Banon	744e3772a5	update readme	2012-03-07 21:56:48 +02:00
Shay Banon	9882a2937b	update readme	2012-03-04 11:59:22 +02:00
Shay Banon	8d2a02e7d1	release 1.2.0	2012-02-15 22:43:48 +02:00
Shay Banon	802b795289	release 1.1.0 supporting elasticsearch 0.19	2012-02-07 17:00:47 +02:00
Shay Banon	f97157da15	move to elasticsearch 0.19.0 snap and use some of its features	2012-01-31 14:38:41 +02:00
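
The forced-field convention described in commits eaccd4383d and 94cf141108 (`_content`, `_name`, `_language`) can be sketched as a small document builder. This is only an illustrative sketch: the function `build_attachment_doc` and its parameters are hypothetical names, not part of the plugin, and only the underscore-prefixed keys are taken from the commit messages.

```python
import base64
import json


def build_attachment_doc(raw, name=None, language=None, field="file"):
    """Build a document body for an `attachment` field.

    Hypothetical helper for illustration only. The `_content`, `_name`
    and `_language` keys come from the commit messages; everything else
    (function name, parameters, default field name) is an assumption.
    """
    # The attachment content is sent base64-encoded under `_content`.
    attachment = {"_content": base64.b64encode(raw).decode("ascii")}
    # Optional forced values use the same underscore prefix.
    if name is not None:
        attachment["_name"] = name
    if language is not None:
        attachment["_language"] = language
    return {field: attachment}


doc = build_attachment_doc(
    b"This is an elasticsearch mapper attachment test.",
    name="myfilename.txt",
)
print(json.dumps(doc, indent=2))
```

The resulting dictionary has the same shape as the `_content` example in commit eaccd4383d and could be serialized as the body of an index request such as `PUT /test/person/1`.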

52 Commits