Commit Graph

107 Commits

Author SHA1 Message Date
David Pilato 406e295c6c In test for #38, we should check the real file name as we have it :-). 2013-08-20 12:34:33 +02:00
Frédéric Camblor 019d0f9a26 Don't reject full document in case of invalid metadata
From the original PR #17 by @fcamblor

If you try to index a document with invalid metadata, the whole document is rejected.

For example:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html lang="fr">
<head>
<title>Hello</title>
<meta name="date" content="">
<meta name="Author" content="kimchy">
<meta name="Keywords" content="elasticsearch,cool,bonsai">
</head>
<body>World</body>
</html>
```

has a non-parseable date (the `date` meta tag is empty).

This fix adds a new option that ignores parsing errors: `"index.mapping.attachment.ignore_errors": true` (defaults to `true`).
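
The behavior of the option described above can be illustrated with a minimal sketch. It is not the plugin's code (the plugin is written in Java); `parseMetadata`, `parseDate` and the `ignoreErrors` argument are hypothetical names chosen for illustration. The idea is that each metadata value is parsed on its own, so one bad value only costs that field when errors are ignored:

```javascript
// Hypothetical sketch: none of these names come from the plugin itself.
// Parse each metadata value independently. A failure in one field is either
// skipped (ignoreErrors = true) or rethrown so the whole document is rejected.
function parseMetadata(raw, parsers, ignoreErrors = true) {
  const parsed = {};
  for (const [name, value] of Object.entries(raw)) {
    const parse = parsers[name] || ((v) => v);
    try {
      parsed[name] = parse(value);
    } catch (err) {
      if (!ignoreErrors) {
        throw err;
      }
      // Otherwise drop only the offending field and keep going.
    }
  }
  return parsed;
}

// A date parser that rejects empty or unparseable values.
function parseDate(value) {
  const time = Date.parse(value);
  if (value === "" || Number.isNaN(time)) {
    throw new Error(`Cannot parse date: "${value}"`);
  }
  return new Date(time);
}

// Metadata modeled on the HTML example above: the date is empty.
const meta = { date: "", author: "kimchy", keywords: "elasticsearch,cool,bonsai" };
const lenient = parseMetadata(meta, { date: parseDate }, true);
```

With `ignoreErrors` set to `true`, `lenient` keeps `author` and `keywords` and simply has no `date` entry. Passing `false` makes the same call throw, which mirrors the old behavior of rejecting the full document.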

Closes #17, #38.
2013-08-20 12:26:49 +02:00
David Pilato d7a2e7e2ff Mapper plugin overwrites multifield mapping
If you define some specific mapping for your file content, such as the following:

```javascript
{
    "person": {
        "properties": {
            "file": {
                "type": "attachment",
                "path": "full",
                "fields": {
                    "file": {
                        "type": "multifield",
                        "fields": {
                            "file": { "type": "string" },
                            "suggest": { "type": "string" }
                        }
                    }
                }
            }
        }
    }
}
```

If you then retrieve the mapping, you get:

```javascript
{
   "person":{
      "properties":{
         "file":{
            "type":"attachment",
            "path":"full",
            "fields":{
               "file":{
                  "type":"string"
               },
               "author":{
                  "type":"string"
               },
               "title":{
                  "type":"string"
               },
               "name":{
                  "type":"string"
               },
               "date":{
                  "type":"date",
                  "format":"dateOptionalTime"
               },
               "keywords":{
                  "type":"string"
               },
               "content_type":{
                  "type":"string"
               }
            }
         }
      }
   }
}
```

All your settings have been overwritten by the mapper plugin.
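
The expected behavior can be sketched as a merge in which user-defined sub-fields take precedence over the defaults. This is only an illustration under assumptions, not the plugin's actual Java implementation: `mergeFields` is a hypothetical helper, and `defaultFields` is modeled on the mapping returned above.

```javascript
// Hypothetical sketch: default sub-fields modeled on the mapping returned above.
const defaultFields = {
  file: { type: "string" },
  author: { type: "string" },
  title: { type: "string" },
  name: { type: "string" },
  date: { type: "date", format: "dateOptionalTime" },
  keywords: { type: "string" },
  content_type: { type: "string" }
};

// Start from the defaults, then let every user-defined sub-field replace
// the default of the same name instead of being discarded.
function mergeFields(defaults, userFields = {}) {
  return { ...defaults, ...userFields };
}

// The user-defined multifield from the first mapping above.
const userFields = {
  file: {
    type: "multifield",
    fields: {
      file: { type: "string" },
      suggest: { type: "string" }
    }
  }
};

const merged = mergeFields(defaultFields, userFields);
```

Here `merged.file` keeps the `multifield` definition with its `suggest` sub-field, while the untouched defaults such as `date` are still present.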

Closes #37.
2013-08-19 11:01:02 +02:00
David Pilato d2e2fb5cdf Upgrade Tika to 1.4.
Closes #36.
2013-08-14 16:57:42 +02:00
David Pilato c0663277bc prepare for next development iteration 2013-08-07 10:02:02 +02:00
David Pilato 0a454efe18 prepare release elasticsearch-mapper-attachments-1.8.0 2013-08-07 09:52:29 +02:00
David Pilato d054f9a1e7 Mapper 1.7.0 does not work with elasticsearch 0.90.3
FastByteArrayInputStream has been removed in 0.90.3.
Closes #34.
2013-08-07 09:47:12 +02:00
Shay Banon 690779cf2f move to 1.8 snap 2013-02-26 16:06:53 +01:00
Shay Banon 7e58416506 release 1.7 2013-02-26 16:06:39 +01:00
David Pilato 942b87b763 Move to Elasticsearch 0.21.0.Beta1
Due to refactoring in 0.21.x we have to update this plugin
Closes #24.
2013-02-23 12:13:51 +01:00
David Pilato eba4da7086 NPE if "content" is missing in mapper-attachment plugin
Curl recreation:

        curl -X DELETE "localhost:9200/test"

        curl -X PUT "localhost:9200/test" -d '{
          "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
        }'

        curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&pretty=1&timeout=5s"

        curl -X PUT "localhost:9200/test/attachment/_mapping" -d '{
          "attachment" : {
            "properties" : {
              "file" : {
                "type" : "attachment"
              }
            }
          }
        }'

        curl -X PUT "localhost:9200/test/attachment/1" -d '{
            "file" : {
                "_content_type" : "application/pdf",
                "_name" : "resource/name/of/my.pdf"
            }
        }
        '

Produces a:

        {"error":"NullPointerException[null]","status":500}

And in ES logs:

      [2013-02-20 12:49:04,445][DEBUG][action.index             ] [Drake, Frank] [test][0], node[LI6crwNKQmu1ue1u7mlqGA], [P], s[STARTED]: Failed to execute [index {[test][attachment][1], source[{
          "file" : {
              "_content_type" : "application/pdf",
              "_name" : "resource/name/of/my.pdf"
          }
      }
      ]}]
      java.lang.NullPointerException
      	at org.elasticsearch.common.io.FastByteArrayInputStream.<init>(FastByteArrayInputStream.java:90)
      	at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:309)
      	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:507)
      	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:449)
      	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:486)
      	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:430)
      	at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:318)
      	at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:203)
      	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:531)
      	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:429)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:680)
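
The trace above shows the missing value reaching a stream constructor unchecked. A guard of the following shape would turn that into a descriptive error. This is a hedged sketch in JavaScript (Node.js), not the plugin's Java fix: `extractContent` and its error message are hypothetical, and only the `content`, `_content_type` and `_name` keys come from the report.

```javascript
// Hypothetical sketch: the function name and error message are made up.
// Validate that "content" is present before decoding it, so a missing
// value yields a clear error instead of a NullPointerException.
function extractContent(attachment) {
  const content = attachment ? attachment.content : undefined;
  if (content === undefined || content === null) {
    throw new Error('Attachment has no "content" value');
  }
  // The content is assumed to be base64-encoded.
  return Buffer.from(content, "base64");
}

// The document from the curl recreation: metadata only, no content.
const doc = { _content_type: "application/pdf", _name: "resource/name/of/my.pdf" };
let message = null;
try {
  extractContent(doc);
} catch (err) {
  message = err.message;
}
```

After this runs, `message` holds the descriptive error text, which a server could return as a client error rather than a 500.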

Closes #23
2013-02-23 09:40:07 +01:00
Martijn van Groningen 69f8bdea03 Master is now 0.20 2012-12-21 15:17:02 +01:00
David Pilato 30e425e209 Plugin must be 0.20.x compatible (tests fail) 2012-12-21 15:16:04 +01:00
David Pilato 17ae816a6a Move resources in /src/test/resources 2012-12-21 15:12:15 +01:00
Martijn van Groningen 15248a9d52 Set next development version 2012-09-28 12:09:44 +02:00
Martijn van Groningen a163fdad0f Prepare 1.6.0 release 2012-09-28 12:00:12 +02:00
David Pilato 00d87de418 #13 : Fix dependency to tika-app 2012-09-28 10:00:45 +02:00
Martijn van Groningen 64254e621b Removed the ExtendedTika class. It existed only for the maxLength change, which made it into Tika 1.2, the version we now use. 2012-09-19 12:44:48 +02:00
Martijn van Groningen ab159355ed Set new development version 2012-09-19 11:42:34 +02:00
Martijn van Groningen 0a17fe2e44 Release 1.5 2012-09-19 11:33:06 +02:00
Martijn van Groningen 5c649ad226 Upgraded Tika, Testng, hamcrest, log4j and surefire plugin.
Closes #12
2012-09-19 10:55:58 +02:00
Shay Banon 65043c0692 add license and repo 2012-06-10 22:14:18 +02:00
Shay Banon 0ae4c73386 move to 1.5.0 snap 2012-03-25 20:11:07 +02:00
Shay Banon 66b96cb994 release 1.4.0 2012-03-25 20:10:46 +02:00
Shay Banon c1df26e4e9 upgrade to tika 1.1 2012-03-25 20:00:45 +02:00
Shay Banon 4292512f8e rename fileName to name 2012-03-17 11:54:40 +02:00
alheim d9a822dba8 Add a fileName field to index the attachment fileName 2012-03-17 11:52:35 +02:00
Shay Banon 911fa246d0 move to 1.4.0 snap 2012-03-07 22:03:45 +02:00
Shay Banon 4482a5de67 release 1.3.0 2012-03-07 22:02:49 +02:00
Shay Banon 744e3772a5 update readme 2012-03-07 21:56:48 +02:00
Shay Banon 0352c1436e Change the per-document parameter to _indexed_chars, and add the index.mapping.attachment.indexed_chars setting to change it globally (per index) 2012-03-07 21:53:41 +02:00
Shay Banon 59f38ff576 Merge branch 'master' of https://github.com/Henac/elasticsearch-mapper-attachments 2012-03-07 21:44:54 +02:00
Henac 9a26458862 Fixed issue with setting of maxStringLength applying globally to the tika instance.
I have extended the Tika class to allow for setting of how much text to
extract from a document to be on a per call basis.
2012-03-06 22:20:04 +11:00
Shay Banon 9882a2937b update readme 2012-03-04 11:59:22 +02:00
Shay Banon 3a72b6b2c4 update to 0.19.0 2012-03-04 11:52:47 +02:00
Henac 6a08ca673a Added the ability to specify the amount of text to extract and index from an attachment. 2012-03-04 16:09:21 +11:00
Shay Banon bf12b2be21 Merge pull request #6 from dadoonet/master
Update maven assembly plugin version to 2.3
2012-02-26 15:39:45 -08:00
David Pilato 509b467658 Update maven assembly plugin to latest version : 2.3 2012-02-26 22:34:02 +01:00
David Pilato 79d7860a72 Merge remote-tracking branch 'elasticsearch/master'
Conflicts:
	pom.xml
2012-02-26 17:19:12 +01:00
Shay Banon dfd0e2cc41 latest assembly 2012-02-26 10:22:12 +02:00
Shay Banon 8ffa86bb31 latest elasticsearch 2012-02-26 10:21:53 +02:00
David Pilato 623952f839 Ignore eclipse files 2012-02-25 13:38:44 +01:00
David Pilato 1bee1b3d0c Update to elasticsearch : 0.19.0.RC3 (fix dependencies issues) 2012-02-25 13:38:15 +01:00
David Pilato ee3c17ec8b We should indicate each plugin version. Assembly plugin 2.2-beta-5
(default) works but the latest release (2.2.1) won't as we don't set the
assembly id
2012-02-25 13:34:07 +01:00
Shay Banon c13da65bea move to 1.3.0 snap 2012-02-15 22:44:12 +02:00
Shay Banon 8d2a02e7d1 release 1.2.0 2012-02-15 22:43:48 +02:00
Shay Banon 0c0911d541 Upgrade to tika 1.0, closes #4. 2012-02-15 22:34:08 +02:00
Shay Banon 7771070c44 move to 1.2.0 snap 2012-02-07 17:01:19 +02:00
Shay Banon 802b795289 release 1.1.0 supporting elasticsearch 0.19 2012-02-07 17:00:47 +02:00
Shay Banon f97157da15 move to elasticsearch 0.19.0 snap and use some of its features 2012-01-31 14:38:41 +02:00