David Pilato
406e295c6c
In the test for #38, we should check the real file name, since we have it :-).
2013-08-20 12:34:33 +02:00
Frédéric Camblor
019d0f9a26
Don't reject full document in case of invalid metadata
...
From the original PR #17 by @fcamblor
If you try to index a document with invalid metadata, the full document is rejected.
For example:
```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd ">
<html lang="fr">
<head>
<title>Hello</title>
<meta name="date" content="">
<meta name="Author" content="kimchy">
<meta name="Keywords" content="elasticsearch,cool,bonsai">
</head>
<body>World</body>
</html>
```
has a non-parseable date.
This fix adds a new option that ignores parsing errors: `"index.mapping.attachment.ignore_errors":true` (defaults to `true`).
Closes #17, #38.
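The behaviour described above can be pictured with a minimal Python sketch. It is not the plugin's Java code: `parse_metadata`, `PARSERS` and the `%Y-%m-%d` date format are hypothetical names and choices made for illustration, and only the `ignore_errors` flag mirrors the setting named in the message.

```python
from datetime import datetime

# Hypothetical per-field parsers (assumption: only "date" needs real parsing here).
PARSERS = {"date": lambda value: datetime.strptime(value, "%Y-%m-%d")}


def parse_metadata(raw, ignore_errors=True):
    """Parse metadata values one field at a time.

    With ignore_errors=True, a field that fails to parse is skipped and the
    remaining fields are kept; with ignore_errors=False the error propagates,
    which corresponds to rejecting the whole document.
    """
    parsed = {}
    for name, value in raw.items():
        parser = PARSERS.get(name, str)
        try:
            parsed[name] = parser(value)
        except ValueError:
            if not ignore_errors:
                raise
    return parsed


# Metadata taken from the HTML example: the date is empty and cannot be parsed.
meta = {"date": "", "author": "kimchy", "keywords": "elasticsearch,cool,bonsai"}
print(parse_metadata(meta))
```

With the flag left at its default, the empty date is dropped while the author and keywords survive.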
2013-08-20 12:26:49 +02:00
David Pilato
d7a2e7e2ff
Mapper plugin overwrites multifield mapping
...
If you define some specific mapping for your file content, such as the following:
```javascript
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "type": "multifield",
            "fields": {
              "file": { "type": "string" },
              "suggest": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```
If you then retrieve the mapping, you get:
```javascript
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "type": "string"
          },
          "author": {
            "type": "string"
          },
          "title": {
            "type": "string"
          },
          "name": {
            "type": "string"
          },
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "keywords": {
            "type": "string"
          },
          "content_type": {
            "type": "string"
          }
        }
      }
    }
  }
}
```
All your settings have been overwritten by the mapper plugin.
Closes #37.
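A minimal Python sketch of the behaviour one would presumably expect instead: user-defined sub-fields take precedence and the plugin's defaults only fill the gaps. The default field list is copied from the returned mapping above; `merge_fields` is a hypothetical helper, not the plugin's actual implementation.

```python
# Default sub-fields, as shown in the mapping returned by the plugin.
DEFAULT_FIELDS = {
    "file": {"type": "string"},
    "author": {"type": "string"},
    "title": {"type": "string"},
    "name": {"type": "string"},
    "date": {"type": "date", "format": "dateOptionalTime"},
    "keywords": {"type": "string"},
    "content_type": {"type": "string"},
}


def merge_fields(user_fields, defaults=DEFAULT_FIELDS):
    """Return the defaults overlaid with the user's definitions (user wins)."""
    merged = dict(defaults)
    merged.update(user_fields)
    return merged


# The multifield definition from the mapping the user submitted.
user_fields = {
    "file": {
        "type": "multifield",
        "fields": {"file": {"type": "string"}, "suggest": {"type": "string"}},
    }
}
merged = merge_fields(user_fields)
print(merged["file"]["type"])
```

Under this merge order the `file` sub-field keeps its `multifield` type, while `author`, `title` and the other defaults are still added.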
2013-08-19 11:01:02 +02:00
David Pilato
d2e2fb5cdf
Upgrade Tika to 1.4.
...
Closes #36.
2013-08-14 16:57:42 +02:00
David Pilato
c0663277bc
prepare for next development iteration
2013-08-07 10:02:02 +02:00
David Pilato
0a454efe18
prepare release elasticsearch-mapper-attachments-1.8.0
2013-08-07 09:52:29 +02:00
David Pilato
d054f9a1e7
Mapper 1.7.0 does not work with elasticsearch 0.90.3
...
FastByteArrayInputStream has been removed in 0.90.3.
Closes #34.
2013-08-07 09:47:12 +02:00
Shay Banon
690779cf2f
move to 1.8 snap
2013-02-26 16:06:53 +01:00
Shay Banon
7e58416506
release 1.7
2013-02-26 16:06:39 +01:00
David Pilato
942b87b763
Move to Elasticsearch 0.21.0.Beta1
...
Due to refactoring in 0.21.x, we have to update this plugin.
Closes #24.
2013-02-23 12:13:51 +01:00
David Pilato
eba4da7086
NPE if "content" is missing in mapper-attachment plugin
...
Curl recreation:
curl -X DELETE "localhost:9200/test"
curl -X PUT "localhost:9200/test" -d '{
"settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'
curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&pretty=1&timeout=5s"
curl -X PUT "localhost:9200/test/attachment/_mapping" -d '{
"attachment" : {
"properties" : {
"file" : {
"type" : "attachment"
}
}
}
}'
curl -X PUT "localhost:9200/test/attachment/1" -d '{
"file" : {
"_content_type" : "application/pdf",
"_name" : "resource/name/of/my.pdf"
}
}
'
Produces a:
{"error":"NullPointerException[null]","status":500}
And in ES logs:
[2013-02-20 12:49:04,445][DEBUG][action.index ] [Drake, Frank] [test][0], node[LI6crwNKQmu1ue1u7mlqGA], [P], s[STARTED]: Failed to execute [index {[test][attachment][1], source[{
"file" : {
"_content_type" : "application/pdf",
"_name" : "resource/name/of/my.pdf"
}
}
]}]
java.lang.NullPointerException
at org.elasticsearch.common.io.FastByteArrayInputStream.<init>(FastByteArrayInputStream.java:90)
at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:309)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:507)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:449)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:486)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:430)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:318)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:203)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:531)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:429)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Closes #23
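The stack trace shows the mapper passing a missing value straight into a byte-array stream. One possible guard, sketched in Python rather than the plugin's Java, checks for the field first and fails with a descriptive error. The names `MapperParsingError` and `extract_content`, the `content` key and the error message are assumptions made for this illustration; the base64 decoding reflects how attachment content is usually submitted.

```python
import base64


class MapperParsingError(ValueError):
    """Hypothetical error type signalling a malformed attachment field."""


def extract_content(file_field):
    """Return the decoded attachment bytes, or fail with a clear message.

    Assumption: the attachment body is sent base64-encoded under "content".
    """
    content = file_field.get("content")
    if content is None:
        # Explicit check instead of letting a null reach the stream constructor.
        raise MapperParsingError("No content is provided.")
    return base64.b64decode(content)


# The document from the recreation above has no "content" key at all.
doc = {"_content_type": "application/pdf", "_name": "resource/name/of/my.pdf"}
try:
    extract_content(doc)
except MapperParsingError as err:
    print("rejected:", err)
```

A client then receives a parsing error it can act on rather than an opaque 500 with `NullPointerException[null]`.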
2013-02-23 09:40:07 +01:00
Martijn van Groningen
69f8bdea03
Master is now 0.20
2012-12-21 15:17:02 +01:00
David Pilato
30e425e209
Plugin must be 0.20.x compatible (tests fail)
2012-12-21 15:16:04 +01:00
David Pilato
17ae816a6a
Move resources in /src/test/resources
2012-12-21 15:12:15 +01:00
Martijn van Groningen
15248a9d52
Set next development version
2012-09-28 12:09:44 +02:00
Martijn van Groningen
a163fdad0f
Prepare 1.6.0 release
2012-09-28 12:00:12 +02:00
David Pilato
00d87de418
#13 : Fix dependency to tika-app
2012-09-28 10:00:45 +02:00
Martijn van Groningen
64254e621b
Removed ExtendedTika class. The maxLength change, which was the reason for having this class, made it into Tika 1.2, which we now use.
2012-09-19 12:44:48 +02:00
Martijn van Groningen
ab159355ed
Set new development version
2012-09-19 11:42:34 +02:00
Martijn van Groningen
0a17fe2e44
Release 1.5
2012-09-19 11:33:06 +02:00
Martijn van Groningen
5c649ad226
Upgraded Tika, Testng, hamcrest, log4j and surefire plugin.
...
Closes #12
2012-09-19 10:55:58 +02:00
Shay Banon
65043c0692
add license and repo
2012-06-10 22:14:18 +02:00
Shay Banon
0ae4c73386
move to 1.5.0 snap
2012-03-25 20:11:07 +02:00
Shay Banon
66b96cb994
release 1.4.0
2012-03-25 20:10:46 +02:00
Shay Banon
c1df26e4e9
upgrade to tika 1.1
2012-03-25 20:00:45 +02:00
Shay Banon
4292512f8e
rename fileName to name
2012-03-17 11:54:40 +02:00
alheim
d9a822dba8
Add a fileName field to index the attachment fileName
2012-03-17 11:52:35 +02:00
Shay Banon
911fa246d0
move to 1.4.0 snap
2012-03-07 22:03:45 +02:00
Shay Banon
4482a5de67
release 1.3.0
2012-03-07 22:02:49 +02:00
Shay Banon
744e3772a5
update readme
2012-03-07 21:56:48 +02:00
Shay Banon
0352c1436e
change the per-doc parameter to _indexed_chars, and add the index.mapping.attachment.indexed_chars setting to change it globally (per index)
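How the two levels might interact can be sketched in Python: the per-document `_indexed_chars` value wins, then the index-level setting, then a fallback. `resolve_indexed_chars` is a hypothetical helper, and the fallback of 100000 characters is an assumption rather than something stated in this log.

```python
def resolve_indexed_chars(doc_field, index_settings, default=100_000):
    """Pick the extraction limit for one document.

    Precedence: per-document "_indexed_chars", then the per-index setting,
    then `default` (assumed value, not confirmed by this commit log).
    """
    per_doc = doc_field.get("_indexed_chars")
    if per_doc is not None:
        return int(per_doc)
    return int(index_settings.get("index.mapping.attachment.indexed_chars", default))


settings = {"index.mapping.attachment.indexed_chars": 2000}
print(resolve_indexed_chars({"_indexed_chars": 500}, settings))
print(resolve_indexed_chars({}, settings))
```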
2012-03-07 21:53:41 +02:00
Shay Banon
59f38ff576
Merge branch 'master' of https://github.com/Henac/elasticsearch-mapper-attachments
2012-03-07 21:44:54 +02:00
Henac
9a26458862
Fixed issue with setting of maxStringLength applying globally to the tika instance.
...
I have extended the Tika class so that the amount of text to
extract from a document can be set on a per-call basis.
2012-03-06 22:20:04 +11:00
Shay Banon
9882a2937b
update readme
2012-03-04 11:59:22 +02:00
Shay Banon
3a72b6b2c4
update to 0.19.0
2012-03-04 11:52:47 +02:00
Henac
6a08ca673a
Added the ability to specify the amount of text to extract and index from an attachment.
2012-03-04 16:09:21 +11:00
Shay Banon
bf12b2be21
Merge pull request #6 from dadoonet/master
...
Update maven assembly plugin version to 2.3
2012-02-26 15:39:45 -08:00
David Pilato
509b467658
Update maven assembly plugin to latest version : 2.3
2012-02-26 22:34:02 +01:00
David Pilato
79d7860a72
Merge remote-tracking branch 'elasticsearch/master'
...
Conflicts:
pom.xml
2012-02-26 17:19:12 +01:00
Shay Banon
dfd0e2cc41
latest assembly
2012-02-26 10:22:12 +02:00
Shay Banon
8ffa86bb31
latest elasticsearch
2012-02-26 10:21:53 +02:00
David Pilato
623952f839
Ignore eclipse files
2012-02-25 13:38:44 +01:00
David Pilato
1bee1b3d0c
Update to elasticsearch : 0.19.0.RC3 (fix dependencies issues)
2012-02-25 13:38:15 +01:00
David Pilato
ee3c17ec8b
We should indicate each plugin version. Assembly plugin 2.2-beta-5
...
(default) works but the latest release (2.2.1) won't as we don't set the
assembly id
2012-02-25 13:34:07 +01:00
Shay Banon
c13da65bea
move to 1.3.0 snap
2012-02-15 22:44:12 +02:00
Shay Banon
8d2a02e7d1
release 1.2.0
2012-02-15 22:43:48 +02:00
Shay Banon
0c0911d541
Upgrade to tika 1.0, closes #4.
2012-02-15 22:34:08 +02:00
Shay Banon
7771070c44
move to 1.2.0 snap
2012-02-07 17:01:19 +02:00
Shay Banon
802b795289
release 1.1.0 supporting elasticsearch 0.19
2012-02-07 17:00:47 +02:00
Shay Banon
f97157da15
move to elasticsearch 0.19.0 snap and use some of its features
2012-01-31 14:38:41 +02:00