David Pilato
c353936b58
Add sonatype snapshot repository
2015-01-02 19:05:18 +01:00
David Pilato
33c9828385
Depend on elasticsearch-parent
...
To simplify plugin maintenance and provide more value in the future, we are starting to build an `elasticsearch-parent` project.
This commit is the first step for this plugin to depend on this new `pom` Maven project.
2014-12-14 19:59:15 +01:00
David Pilato
c338ae0dbe
[Test] copyToByteArray has been removed in master
2014-12-03 18:42:14 +01:00
David Pilato
e3d80af54e
Test: Fix removed queryString -> queryStringQuery
2014-12-03 18:31:53 +01:00
Adrien Grand
11b1287610
Upgrade to Lucene 5.0.0-snapshot-1642891
2014-12-02 18:16:59 +01:00
Colin Goodheart-Smithe
bbd4a62e50
Updated AttachmentMapper to work with new validation in ES 2.0
2014-11-28 16:04:31 +00:00
Michael McCandless
abb03dc3d9
Upgrade to Lucene 5.0.0-snapshot-1641343
2014-11-24 05:51:40 -05:00
Michael McCandless
55042f0f23
Upgrade to Lucene 5.0.0-snapshot-1637347
2014-11-10 16:45:44 -05:00
Robert Muir
4c1b27f544
upgrade to lucene 5 snapshot
2014-11-05 16:48:10 -05:00
tlrx
a5ed51533c
update documentation with release 2.4.1
2014-11-05 20:38:24 +01:00
Jun Ohtani
94880aae3e
Tests: thread leaks detected
...
* exclude *StandaloneTest*.class from test target
* add cleanup to MultifieldAttachementMapperTests for terminating ThreadPool
* Modify MapperTestUtils.newMapperService for adding ThreadPool
Closes #88
2014-11-03 02:22:45 +09:00
Jun Ohtani
d3f2df6d62
Tests: Fix randomizedtest fail
...
Closes #90
2014-11-03 02:15:59 +09:00
Michael McCandless
4dae1879ad
Upgrade to Lucene 4.10.2
2014-10-30 05:55:35 -04:00
David Pilato
a0d7aafdac
Fix test
...
Related to #89
2014-10-27 22:18:50 +01:00
David Pilato
92bdc23c78
Fix test
...
Related to #89
2014-10-27 22:13:15 +01:00
David Pilato
faf34d745d
Fix test
...
Related to #89
2014-10-27 22:08:41 +01:00
David Pilato
d08e9c7080
Test: add a standalone tool which processes content
...
This tool is a simple main class which can be used to test what is extracted from a given binary file or from its base64 equivalent.
Pass the BASE64 content as the first argument.
Available options:
-u file:/URL/TO/YOUR/DOC (in place of BASE64 content)
-s set extracted size (defaults to the mapper attachment size)
Examples:
```
StandaloneTest BASE64Text
StandaloneTest BASE64Text -s 1000000
StandaloneTest -u /tmp/mydoc.pdf
StandaloneTest -u /tmp/mydoc.pdf -s 1000000
```
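The BASE64 argument can be produced from any local file. The following is a minimal sketch, assuming a POSIX shell with the `base64` and `tr` utilities; the helper name `to_base64` and the sample file are made up for illustration and are not part of the plugin.

```sh
#!/bin/sh
# Hypothetical helper (not part of the plugin): print the content of a file
# as a single line of base64, suitable for passing as one argument.
to_base64() {
    # Some base64 implementations wrap long output, so strip the newlines.
    base64 < "$1" | tr -d '\n'
}

# Demo on a throwaway file.
sample=$(mktemp)
printf 'This is an elasticsearch mapper attachment test.' > "$sample"
to_base64 "$sample"
echo
rm -f "$sample"
```

Stripping newlines keeps the encoded content on one line, so the shell passes it as a single argument.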
Closes #89.
2014-10-27 22:01:22 +01:00
David Pilato
c3bf3b1ce9
Tests: AnalysisService constructor signature change
...
Due to this [change](https://github.com/elasticsearch/elasticsearch/pull/8018), we need to fix our tests for elasticsearch 1.4.0 and above.
Closes #87.
(cherry picked from commit b3b0d34)
2014-10-15 13:05:41 +02:00
David Pilato
03b47d5a4c
update documentation with release 2.4.0
2014-10-08 18:50:20 +02:00
mikemccand
2ff4eb58d6
Upgrade to Lucene 4.10.1
2014-09-28 17:57:06 -04:00
Michael McCandless
67a2548441
Upgrade to Lucene 4.10.1 snapshot
2014-09-24 17:10:08 -04:00
David Pilato
eef6b61806
Create branch es-1.4 for elasticsearch 1.4.0
2014-09-12 16:08:59 +02:00
David Pilato
ba74fc2b5e
Remove netcdf support
...
Sadly, the netcdf library is not compatible with the Apache 2 License, so we should not package it anymore.
Users who want netcdf support can manually add the [netcdf libraries](http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/) to the `plugins/mapper-attachments` directory to get it back.
Closes #84.
2014-09-08 23:51:01 +02:00
David Pilato
888d79075e
Update to Lucene 4.10.0
...
Closes #85.
2014-09-08 23:47:15 +02:00
David Pilato
20ee711436
parseMultiField() method signature change in es 1.4 and master
...
As seen in https://github.com/elasticsearch/elasticsearch/pull/7474, we need to update the mapper attachment plugin to use this new signature.
Closes #83.
2014-09-04 11:23:09 +02:00
David Pilato
c0d053d283
Update to elasticsearch 1.4
...
Related to #77
(cherry picked from commit ad1742a)
2014-09-01 10:26:38 +02:00
David Pilato
34fe111a2b
update documentation with release 2.3.2
2014-09-01 09:53:26 +02:00
David Pilato
87b38c54eb
Unable to extract text from Word documents
...
With issue #80, we explicitly removed the Apache POI dependency provided by Tika and replaced it with a more recent one.
Sadly, we forgot to add this new dependency to the assembly, so the final ZIP file does not contain the POI-related jars.
Closes #82.
(cherry picked from commit 49793d5)
2014-09-01 09:41:57 +02:00
David Pilato
cc1a43b5c3
update documentation with release 2.3.1
2014-08-18 21:52:53 +02:00
David Pilato
08454d72f6
update documentation with release 2.2.1
2014-08-18 21:39:31 +02:00
David Pilato
2b172f8ff6
Update a few dependencies
...
Related to #80.
2014-08-18 17:49:36 +02:00
David Pilato
5cf20331a8
Update to elasticsearch 1.4.0
...
Related to #77.
(cherry picked from commit 7e65cfb)
2014-08-18 15:39:19 +02:00
David Pilato
75d03621aa
Update a few dependencies
...
Related to #80.
(cherry picked from commit 89d5460)
2014-08-18 15:37:03 +02:00
David Pilato
587e6d3da2
Docs: make the welcome page more obvious
...
Closes #79.
2014-08-18 12:38:03 +02:00
David Pilato
f8d2975946
Update a few dependencies
...
Closes #80.
(cherry picked from commit 930c8be)
2014-08-18 12:27:23 +02:00
David Pilato
6edf3447b1
Remove old deprecated `content` field
...
In #73, we deprecated the `content` field in favor of the `_content` field.
In plugin version 2.4.0, we can now remove the old field name.
Closes #75.
(cherry picked from commit 7a0f838)
2014-07-26 00:33:50 +02:00
David Pilato
e704f68525
Log Tika exceptions
...
Currently, Tika exceptions are swallowed with no log message.
We'd like to be able to know when/if this occurs and for what reason.
Closes #78.
(cherry picked from commit 36b0117)
2014-07-26 00:27:49 +02:00
David Pilato
ad986eb2fc
Add support for multi-fields
...
Now that https://github.com/elasticsearch/elasticsearch/pull/6867 is merged into the elasticsearch core code (branch 1.x, es 1.4),
we can support multi-fields in the mapper attachment plugin.
```
DELETE /test
PUT /test
{
  "settings": {
    "number_of_shards": 1
  }
}
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "type": "string",
            "fields": {
              "store": {
                "type": "string",
                "store": true
              }
            }
          },
          "content_type": {
            "type": "string",
            "fields": {
              "store": {
                "type": "string",
                "store": true
              },
              "untouched": {
                "type": "string",
                "index": "not_analyzed",
                "store": true
              }
            }
          }
        }
      }
    }
  }
}
PUT /test/person/1?refresh=true
{
  "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
}
GET /test/person/_search
{
  "fields": [
    "file.store",
    "file.content_type.store"
  ],
  "aggs": {
    "store": {
      "terms": {
        "field": "file.content_type.store"
      }
    },
    "untouched": {
      "terms": {
        "field": "file.content_type.untouched"
      }
    }
  }
}
```
It gives:
```js
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "person",
        "_id": "1",
        "_score": 1,
        "fields": {
          "file.store": [
            "\"God Save the Queen\" (alternatively \"God Save the King\"\n"
          ],
          "file.content_type.store": [
            "text/plain; charset=ISO-8859-1"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "store": {
      "doc_count_error_upper_bound": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 1
        },
        {
          "key": "8859",
          "doc_count": 1
        },
        {
          "key": "charset",
          "doc_count": 1
        },
        {
          "key": "iso",
          "doc_count": 1
        },
        {
          "key": "plain",
          "doc_count": 1
        },
        {
          "key": "text",
          "doc_count": 1
        }
      ]
    },
    "untouched": {
      "doc_count_error_upper_bound": 0,
      "buckets": [
        {
          "key": "text/plain; charset=ISO-8859-1",
          "doc_count": 1
        }
      ]
    }
  }
}
```
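The `store` buckets are individual terms because that sub-field is analyzed, while `untouched` is `not_analyzed` and keeps the whole value. As a rough sketch only, the splitting can be approximated in a shell by lowercasing the value and breaking it on non-alphanumeric characters. This imitates the effect seen in the response and is not the analyzer elasticsearch actually runs; the function name `approx_tokens` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: approximate how an analyzed string field breaks a
# value into terms (lowercase, split on anything that is not a letter or digit).
approx_tokens() {
    printf '%s' "$1" \
        | LC_ALL=C tr 'A-Z' 'a-z' \
        | LC_ALL=C tr -cs 'a-z0-9' '\n' \
        | LC_ALL=C sort -u
}

# Prints one term per line for the content type returned by the search.
approx_tokens 'text/plain; charset=ISO-8859-1'
```

The six terms printed correspond to the six `store` buckets, whereas the `untouched` aggregation has a single bucket holding the full string.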
Note that using a shorter definition works as well:
```
DELETE /test
PUT /test
{
  "settings": {
    "number_of_shards": 1
  }
}
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment"
      }
    }
  }
}
PUT /test/person/1?refresh=true
{
  "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
}
GET /test/person/_search
{
  "query": {
    "match": {
      "file": "king"
    }
  }
}
```
It gives:
```js
{
  "took": 53,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.095891505,
    "hits": [
      {
        "_index": "test",
        "_type": "person",
        "_id": "1",
        "_score": 0.095891505,
        "_source": {
          "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
        }
      }
    ]
  }
}
```
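To see why the `match` query on `king` finds the document, the `file` value can be decoded locally. A minimal sketch, assuming a `base64` utility that accepts `-d` for decoding (GNU coreutils does; other implementations may use a different flag):

```sh
#!/bin/sh
# The attachment payload indexed in the example above.
payload='IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=='

# Decode it back to the original text.
decoded=$(printf '%s' "$payload" | base64 -d)
printf '%s\n' "$decoded"
```

The decoded text contains the word `King`, which is what the query matches after analysis.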
Closes #57.
(cherry picked from commit 432d7c0)
2014-07-26 00:27:28 +02:00
David Pilato
663d4eaddb
Update to elasticsearch 1.4.0
...
Closes #77.
(cherry picked from commit c58516f)
2014-07-26 00:26:41 +02:00
David Pilato
eaccd4383d
Deprecate `content` in favor of `_content`
...
When we want to force some values, we need to set those using `_field` where `field` is the field name we want to force:
```
{
  "file": {
    "_name": "myfilename.txt"
  }
}
```
But to set the content itself, we use the `content` field name:
```
{
  "file": {
    "content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu",
    "_name": "myfilename.txt"
  }
}
```
For consistency, we set `_content` instead:
```
{
  "file": {
    "_content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu",
    "_name": "myfilename.txt"
  }
}
```
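A document in this shape can be assembled from plain text before indexing. The sketch below assumes a POSIX shell with `base64`; `attachment_doc` is a hypothetical helper, and it does no JSON escaping, so it only suits simple field and file names.

```sh
#!/bin/sh
# Hypothetical helper: build an attachment document that uses the `_content`
# and `_name` forced fields. Arguments: field name, file name, raw text.
attachment_doc() {
    # Encode the raw text; strip newlines in case base64 wraps its output.
    content=$(printf '%s' "$3" | base64 | tr -d '\n')
    # No JSON escaping is done here, so keep the names simple.
    printf '{"%s":{"_content":"%s","_name":"%s"}}\n' "$1" "$content" "$2"
}

attachment_doc file myfilename.txt 'This is an elasticsearch mapper attachment test.'
```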
Closes #73.
(cherry picked from commit 2e6be20)
2014-07-25 18:15:37 +02:00
David Pilato
1d1225b87c
Update to Lucene 4.9.0
...
Update to elasticsearch 1.3.0
Move to java 1.7
Related to #67.
Closed #76.
(cherry picked from commit 2303932)
2014-07-25 18:15:28 +02:00
David Pilato
310df36bfa
SLF4J dependency version problem
...
This is due to the `edu.ucar:netcdf` library, which comes from the `tika-parsers` dependency.
```
[INFO] +- org.apache.tika:tika-parsers:jar:1.5:compile
[INFO] |  +- edu.ucar:netcdf:jar:4.2-min:compile
[INFO] |  |  \- org.slf4j:slf4j-api:jar:1.5.6:compile
```
We can exclude this library from the generated ZIP artifact.
Closes #41.
2014-06-14 18:56:14 +02:00
David Pilato
51a8f6f1a0
Fix doc typo
...
(cherry picked from commit f70eb1d)
2014-06-03 10:13:12 +02:00
David Pilato
a3bb103297
Remove deprecated `language` forced field
...
With #68, we replaced the `language` field with `_language`.
We can now remove the old deprecated name.
Closes #69.
(cherry picked from commit e39f144)
2014-06-03 10:11:13 +02:00
David Pilato
94cf141108
Use `_language` field instead of `language`
...
When we want to force a language instead of using Tika language detection, we set the `language` field in documents.
To be consistent with the other forced fields, `_content_type` and `_name`, we should prefix the `language` field with an underscore `_`.
So `language` becomes `_language`.
We first deprecate `language` in version 2.1.0 and we remove it in 2.3.0.
Closes #68.
(cherry picked from commit 2f46343)
2014-06-03 10:10:49 +02:00
David Pilato
7c1c2011bc
Update to elasticsearch 1.3.0
...
Closes #67.
(cherry picked from commit d3eaac9)
2014-06-03 09:49:41 +02:00
David Pilato
c0e7795f1f
Update to elasticsearch 1.2.0
...
Closes #66.
(cherry picked from commit fb3b288)
2014-06-03 09:49:13 +02:00
David Pilato
4b35501cf3
Setting "_content_type" in indexing request has no effect
...
Example below. I set the type as text/plain but it is identified as text/html.
```sh
#!/bin/sh
echo "\n\n Delete testidx \n"
curl -XDELETE "http://localhost:9200/testidx"
echo "\n\n Create index and mapping \n"
curl -XPUT "http://localhost:9200/testidx" -d'
{
  "mappings": {
    "session": {
      "properties": {
        "Content": {
          "properties": {
            "content": {
              "type": "attachment",
              "path": "full",
              "store": "yes",
              "fields": {
                "content": {
                  "type": "string",
                  "store": "yes"
                },
                "author": {
                  "type": "string",
                  "store": "yes"
                },
                "title": {
                  "type": "string",
                  "store": "yes"
                },
                "name": {
                  "type": "string",
                  "store": "yes"
                },
                "date": {
                  "type": "date",
                  "format": "dateOptionalTime",
                  "store": "yes"
                },
                "keywords": {
                  "type": "string",
                  "store": "yes"
                },
                "content_type": {
                  "type": "string",
                  "store": "yes"
                },
                "content_length": {
                  "type": "integer",
                  "store": "yes"
                }
              }
            }
          }
        }
      }
    }
  }
}'
echo "\n\n Index document \n"
curl -XPOST "http://localhost:9200/_bulk" -d'
{"index":{"_index":"testidx","_type":"session"}}
{"Content":[{"_content_type":"text/plain","content":"BASE64ENCODED_CONTENT"}]}
'
echo "\n\n Refresh \n"
curl -XPOST "http://localhost:9200/testidx/_refresh"
echo "\n\n Get doc type \n"
curl -XPOST "http://localhost:9200/testidx/_search?pretty" -d'
{
  "fields": ["Content.content.content_type","Content.content.content_length","Content.content"]
}'
```
Closes #65.
(cherry picked from commit 38075dc)
2014-06-03 09:36:10 +02:00
David Pilato
7f8143ff12
Add highlighting documentation
...
Closes #54.
(cherry picked from commit efdf8ef)
2014-06-03 09:35:05 +02:00
David Pilato
8855bd7ddc
Fix typo for JSON fields
...
(cherry picked from commit 63c60b8)
2014-06-03 09:34:51 +02:00