OpenSearch

Commit Graph

Author	SHA1	Message	Date
Ryan Ernst	765afb655e	Fix attachment mapper to expose subfields. The content mapper is now true subfield. There is limited backcompat support for the previous behavior of indexing and querying as the main field name. While indexes pre ES 2.0 can still be read, the content must now be queried with `FIELDNAME.content`.	2015-05-11 23:21:40 -07:00
David Pilato	1c030f6f75	Update to Tika 1.8 Tika 1.8 has been released. See https://dist.apache.org/repos/dist/release/tika/CHANGES-1.8.txt We can replace: ```java public static boolean isLocaleCompatible() { String language = Locale.getDefault().getLanguage(); boolean acceptedLocale = true; if ( // We can have issues with JDK7 Patch < 80 (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION == 7 && JVM_PATCH_MAJOR_VERSION == 0 && JVM_PATCH_MINOR_VERSION < 80) \|\| // We can have issues with JDK8 Patch < 40 (JVM_MAJOR_VERSION == 1 && JVM_MINOR_VERSION == 8 && JVM_PATCH_MAJOR_VERSION == 0 && JVM_PATCH_MINOR_VERSION < 40) ) { if (language.equalsIgnoreCase("tr") \|\| language.equalsIgnoreCase("az")) { acceptedLocale = false; } } return acceptedLocale; } ``` by ```java public static boolean isLocaleCompatible() { return true; } ``` Related to https://issues.apache.org/jira/browse/TIKA-1526 and #105 Note that Content-type has changed a bit and now returns something like `application/xhtml+xml; charset=ISO-8859-1` instead of `application/xhtml+xml`. Closes #112. (cherry picked from commit bf4af47971ed07bfa126409413c435f121444c3c)	2015-05-07 10:17:59 +02:00
Robert Muir	2f457111f9	Merge pull request #128 from elastic/get_past_mapping_changes Fix the build and try to migrate past mappings changes	2015-05-05 15:10:39 -07:00
Ryan Ernst	a83f98d018	fix standalone and remove unnecessary override	2015-05-05 15:08:30 -07:00
Robert Muir	7ce86d95fe	Fix the build and try to migrate past mappings changes, but there is an @AwaitsFix test remaining with regards to copyTo behavior.	2015-05-05 13:43:31 -04:00
David Pilato	65a83e63d3	Mappings: Simplified mapper lookups Due to https://github.com/elastic/elasticsearch/pull/10705 We need to adapt the mapper attachment plugin to version 2.0.0 Closes #125.	2015-04-25 16:29:35 +02:00
David Pilato	7e2a9dbf0c	update documentation with release 2.5.0	2015-03-31 17:59:01 +02:00
David Pilato	d2c02b19fc	Don't wrap exceptions in `MapperParsingException` Some exceptions might not be serializable. It would be safer not to wrap them in a `MapperParsingException` but just create the `MapperParsingException`. Related to #113. (cherry picked from commit e58878c) (cherry picked from commit a673185)	2015-03-31 14:41:47 +02:00
David Pilato	cbad7dce76	Cleanup: Remove unsafe field in BytesStreamInput Related to https://github.com/elastic/elasticsearch/pull/10157 BytesStreamInput no longer supports `BytesStreamInput(byte[], boolean)` Closes #120.	2015-03-30 15:00:50 +02:00
David Pilato	3154510fad	Update owner to elastic Fix typo in previous commit (cherry picked from commit 5303bc0) (cherry picked from commit d3dab9b) (cherry picked from commit 3ace2bb)	2015-03-30 14:53:58 +02:00
Robert Muir	51da1fe2a1	parse java.specification.version not java.version, so that it is robust	2015-03-30 14:50:18 +02:00
David Pilato	d20c8861ca	Update owner to elastic (cherry picked from commit c4d60ed) (cherry picked from commit 450d088)	2015-03-30 11:36:10 +02:00
David Pilato	9f6519f84a	Move parent after artifact coordinates	2015-03-30 11:35:54 +02:00
Robert Muir	021483626c	Merge pull request #117 from rmuir/exclude-jhighlight Exclude jhighlight dependency, which contains LGPL-only files	2015-03-20 14:55:00 -04:00
Robert Muir	977a7247c7	Exclude jhighlight dependency, which contains LGPL-only files	2015-03-20 14:42:55 -04:00
David Pilato	e08ebe9efa	create `es-1.5` branch	2015-03-16 16:52:08 -07:00
David Pilato	208c76e45e	[Test] Fix remaining static objects after running tests The test framework detects when static objects are not released when running tests. This commit removes usage of static objects when possible.	2015-02-23 17:46:28 +01:00
David Pilato	d4d54fe744	update documentation with release 2.4.3	2015-02-23 16:56:39 +01:00
David Pilato	cfd83443f1	Add test for asciidoc format Related to #29.	2015-02-23 16:43:45 +01:00
David Pilato	4f65664916	Tika might fail depending on the Locale Tika might fail with some Locales under some JVMs. We now check that this won't happen before creating a Tika instance. That will generate a `WARN` in logs like: ``` Tika can not be initialized with the current Locale [tr] on the current JVM [1.7.0_60] ``` To check that Tika is not initialized, you can run the test suite with: ```sh mvn test -Dtests.output=always -Dtests.locale=tr ``` Closes #105. (cherry picked from commit d6d63f7) (cherry picked from commit 532bdf7)	2015-02-23 14:49:08 +01:00
David Pilato	a10c35f0eb	[Test] move test package to o.e.index.mapper.attachment Our package naming for tests is inconsistent. We should move tests from: * `o.e.index.mapper.xcontent` to `o.e.index.mapper.attachment.test.unit` * `o.e.plugin.mapper.attachments.test` to `o.e.index.mapper.attachment.test.integration` * `StandaloneRunner` class to `o.e.index.mapper.attachment.test.standalone` Also rename resource dirs to match the test name so it's definitely easier to find mappings used for each test. Closes #110.	2015-02-23 11:54:38 +01:00
David Pilato	4ffa06d773	[Doc] highlighting example is incorrect Closes #107.	2015-02-23 11:10:50 +01:00
David Pilato	6d77b085eb	[Test] Add highlighting test Closes #108. (cherry picked from commit 2c96550) (cherry picked from commit 440e534)	2015-02-23 11:10:49 +01:00
David Pilato	36344ac8b8	[Internal] Fix field mappers to always pass through index settings Caused by https://github.com/elasticsearch/elasticsearch/pull/9780 we now need to pass index settings instead of empty settings. Closes #109.	2015-02-23 11:04:35 +01:00
David Pilato	c3c9f66d0d	Indexing docx file fails I use ElasticSearch 1.4.3 with mapper-attachment plugin 2.4.2 (TIKA 1.7). I get an error when indexing specific docx file: > "[DEBUG][org.elasticsearch.index.mapper.attachment.AttachmentMapper] Failed to extract [-1] characters of text for [null]: [org.apache.poi.xwpf.usermodel.XWPFSDT.getContent()Lorg/apache/poi/xwpf/usermodel/ISDTContent;]" But if i use mapper-attachment plugin 2.4.1 (TIKA 1.5) there is no error and content is parsed successfully. Caused by this change #94. Closes #104.	2015-02-20 19:02:43 +01:00
David Pilato	1e0f03bc90	Remove DocValuesFormatService and PostingsFormatService Related to elasticsearch/elasticsearch#9741 Closes #103.	2015-02-19 19:05:59 +01:00
David Pilato	ec0de9c57d	[Test] Use fully qualified names for fields We were asking for fields by short name, but elasticsearch no longer allows short names and requires fully qualified names. ```java SearchResponse response = client().prepareSearch("test") .addField("content_type") .addField("name") .execute().get(); ``` We now need to use: ```java SearchResponse response = client().prepareSearch("test") .addField("file.content_type") .addField("file.name") .execute().get(); ``` Closes #102.	2015-02-18 20:36:25 +01:00
David Pilato	400910e53e	update documentation with release 2.4.2	2015-02-11 23:22:02 +01:00
David Pilato	77081e3dbf	[Doc] copy_to using attachment field type If you want to use the [copy_to](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#copy-to) feature, you need to define it on each sub-field you want to copy to another field: ```javascript PUT /test/person/_mapping { "person": { "properties": { "file": { "type": "attachment", "path": "full", "fields": { "file": { "type": "string", "copy_to": "copy" } } }, "copy": { "type": "string" } } } } ``` In this example, the extracted content will also be copied to the `copy` field. Closes #97. (cherry picked from commit f4f6b57) (cherry picked from commit 5878a62)	2015-02-11 23:13:56 +01:00
David Pilato	ec59d381b8	Upgrade Tika to 1.7 Closes #94. (cherry picked from commit 0ab38f3) (cherry picked from commit 96c7bb1)	2015-02-11 17:17:41 +01:00
David Pilato	931be57da9	[test] Add standalone runner It can sometimes be useful to have a standalone runner to see exactly how Tika extracts content from a given file. You can run the `StandaloneRunner` class using: * `-u file://URL/TO/YOUR/DOC` * `--size` set extracted size (defaults to mapper attachment size) * `BASE64` encoded binary Example: ```sh StandaloneRunner BASE64Text StandaloneRunner -u /tmp/mydoc.pdf StandaloneRunner -u /tmp/mydoc.pdf --size 1000000 ``` It produces something like: ``` ## Extracted text --------------------- BEGIN ----------------------- This is the extracted text ---------------------- END ------------------------ ## Metadata - author: null - content_length: null - content_type: application/pdf - date: null - keywords: null - language: null - name: null - title: null ``` Closes #99. (cherry picked from commit 720b3bf) (cherry picked from commit 990fa15)	2015-02-09 17:45:07 +01:00
David Pilato	c353936b58	Add sonatype snapshot repository	2015-01-02 19:05:18 +01:00
David Pilato	33c9828385	Depend on elasticsearch-parent To simplify plugins maintenance and provide more value in the future, we are starting to build an `elasticsearch-parent` project. This commit is the first step for this plugin to depend on this new `pom` maven project.	2014-12-14 19:59:15 +01:00
David Pilato	c338ae0dbe	[Test] copyToByteArray has been removed in master	2014-12-03 18:42:14 +01:00
David Pilato	e3d80af54e	Test: Fix removed queryString -> queryStringQuery	2014-12-03 18:31:53 +01:00
Adrien Grand	11b1287610	Upgrade to Lucene 5.0.0-snapshot-1642891	2014-12-02 18:16:59 +01:00
Colin Goodheart-Smithe	bbd4a62e50	Updated AttachmentMapper to work with new validation in ES 2.0	2014-11-28 16:04:31 +00:00
Michael McCandless	abb03dc3d9	Upgrade to Lucene 5.0.0-snapshot-1641343	2014-11-24 05:51:40 -05:00
Michael McCandless	55042f0f23	Upgrade to Lucene 5.0.0-snapshot-1637347	2014-11-10 16:45:44 -05:00
Robert Muir	4c1b27f544	upgrade to lucene 5 snapshot	2014-11-05 16:48:10 -05:00
tlrx	a5ed51533c	update documentation with release 2.4.1	2014-11-05 20:38:24 +01:00
Jun Ohtani	94880aae3e	Tests: thread leaks detected * exclude StandaloneTest.class from test target * add cleanup to MultifieldAttachementMapperTests for terminating ThreadPool * Modify MapperTestUtils.newMapperService for adding ThreadPool Closes #88	2014-11-03 02:22:45 +09:00
Jun Ohtani	d3f2df6d62	Tests: Fix randomizedtest fail Closes #90	2014-11-03 02:15:59 +09:00
Michael McCandless	4dae1879ad	Upgrade to Lucene 4.10.2	2014-10-30 05:55:35 -04:00
David Pilato	a0d7aafdac	Fix test Related to #89	2014-10-27 22:18:50 +01:00
David Pilato	92bdc23c78	Fix test Related to #89	2014-10-27 22:13:15 +01:00
David Pilato	faf34d745d	Fix test Related to #89	2014-10-27 22:08:41 +01:00
David Pilato	d08e9c7080	Test: add a standalone tool which processes content This tool is a simple main class which can be used to test what is extracted from a given binary file or from its base64 equivalent. You can give the BASE64 content as the first argument. Available options: -u file:/URL/TO/YOUR/DOC (in place of BASE64 content) -s set extracted size (defaults to mapper attachment size) Examples: ``` StandaloneTest BASE64Text StandaloneTest BASE64Text -s 1000000 StandaloneTest -u /tmp/mydoc.pdf StandaloneTest -u /tmp/mydoc.pdf -s 1000000 ``` Closes #89.	2014-10-27 22:01:22 +01:00
David Pilato	c3bf3b1ce9	Tests: AnalysisService constructor signature change Due to this [change](https://github.com/elasticsearch/elasticsearch/pull/8018), we need to fix our tests for elasticsearch 1.4.0 and above. Closes #87. (cherry picked from commit b3b0d34)	2014-10-15 13:05:41 +02:00
David Pilato	03b47d5a4c	update documentation with release 2.4.0	2014-10-08 18:50:20 +02:00

162 Commits
