We create branches:
* es-0.90 for elasticsearch 0.90
* es-1.0 for elasticsearch 1.0
* es-1.1 for elasticsearch 1.1
* master for elasticsearch master
We also check before releasing that we don't have a dependency on an elasticsearch SNAPSHOT version.
Add links to each version in the documentation.
Based on PR #45, we add a new language detection option using the language detection feature available in Tika:
https://tika.apache.org/1.4/detection.html#Language_Detection
By default, language detection is disabled (`false`), as it comes with a performance cost.
This default can be changed with the `index.mapping.attachment.detect_language` setting.
It can also be provided per indexed document using the `_detect_language` parameter.
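As an illustrative sketch, enabling the feature for a whole index might look like the following (only the `index.mapping.attachment.detect_language` setting name comes from the text above; the `settings` wrapper is the standard index settings format):

```javascript
{
  "settings": {
    "index.mapping.attachment.detect_language": true
  }
}
```

Alternatively, the `_detect_language` parameter can override this default on a single document at index time.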
Closes #45.
Closes #44.
Original request:
I am sending multiple PDF, Word, etc. attachments in one document to be indexed.
Some of them (PDF) are encrypted, and I am getting a MapperParsingException caused by org.apache.tika.exception.TikaException: Unable to extract PDF content, caused by
org.apache.pdfbox.exceptions.WrappedIOException: Error decrypting document.
I was wondering if the attachment mapper could expose a switch to ignore the documents it cannot extract?
As we now have the `ignore_errors` option, we can support this. See #38 for details on this option.
Closes #18.
Sometimes Tika may crash while parsing some files. In this case it may throw plain runtime errors (any Throwable), not just TikaException.
But there is no catch clause for Throwable in AttachmentMapper.java:
```java
String parsedContent;
try {
    // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
    parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
} catch (TikaException e) {
    throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
}
```
As a result, tika() may hang the whole application.
(We have some PDF files that hang the Elasticsearch client if you try to parse them using the mapper-attachments plugin.)
We propose the following fix:
```java
String parsedContent;
try {
    // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
    parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
} catch (Throwable e) {
    throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
}
```
(Just replace “TikaException” with “Throwable”; it works for our cases.)
Thank you!
Closes#21.
If you define some specific mapping for your file content, such as the following:
```javascript
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "date": { "type": "string" }
        }
      }
    }
  }
}
```
And then, if you ask for the mapping back, you get:
```javascript
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": { "type": "string" },
          "author": { "type": "string" },
          "title": { "type": "string" },
          "name": { "type": "string" },
          "date": { "type": "date", "format": "dateOptionalTime" },
          "keywords": { "type": "string" },
          "content_type": { "type": "string" }
        }
      }
    }
  }
}
```
All your settings have been overwritten by the mapper plugin.
See also #22, where the issue was originally reported.
Closes #39.
From original PR #17 by @fcamblor:
If you try to index a document with invalid metadata, the full document is rejected.
For example:
```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html lang="fr">
<head>
<title>Hello</title>
<meta name="date" content="">
<meta name="Author" content="kimchy">
<meta name="Keywords" content="elasticsearch,cool,bonsai">
</head>
<body>World</body>
</html>
```
has a non-parseable (empty) date.
This fix adds a new option that ignores parsing errors: `"index.mapping.attachment.ignore_errors": true` (defaults to `true`).
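For illustration, a sketch of how the option could be turned off (only the `index.mapping.attachment.ignore_errors` setting name and its default come from the text above; the `settings` wrapper is the standard index settings format):

```javascript
{
  "settings": {
    "index.mapping.attachment.ignore_errors": false
  }
}
```

Setting it to `false` restores the previous behavior of rejecting the whole document when any metadata field cannot be parsed.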
Closes #17, #38.