Commit Graph

16 Commits

Author SHA1 Message Date
Martijn van Groningen f611f1c99e ingest: Move processors from core to ingest-common module.
Folded grok processor into ingest-common module.

The rest tests have been moved to ingest-common module as well, because these tests don't run in the rest-api-spec module but in the distribution:integ-test-zip module
and adding a test plugin there felt just wrong to me. I think this is ok. I left a tiny ingest rest test behind in that tests with an empty pipeline.

Removed messy tests, these tests were already covered in the rest tests

Added ingest test plugin in test infra so that each module testing integration with ingest doesn't need write its own plugin

Moved reindex ingest tests to qa module

Closes #18490
2016-06-07 17:32:52 +02:00
Adrien Grand 459916f5dd Remove custom Base64 implementation. #18413
This replaces o.e.common.Base64 with java.util.Base64.
2016-05-23 11:32:42 +02:00
Ryan Ernst 1d40c4bbc1 Make java9 work again
This change makes ES compile with java9 again, build 118.
* There are a handful of changes due to failure to determine types during compile.
* The attachment plugins which use tika needed to have tika upgraded in order to pickup fixes there for java 9.
* azure discovery and s3 repository indirectly depend on jaxb, which is no longer in the default modules. They now add a jaxb dependency externally, and make JarHell allow for this package.
2016-05-21 09:41:51 -07:00
polyfractal 72094feb12 [TEST] Add missing sort processor to tests, continued 2016-05-17 16:39:53 -04:00
Martijn van Groningen 7aca1389e2 ingest: Add `date_index_name` processor.
Closes #17814
2016-04-29 17:20:48 +02:00
Martijn van Groningen dd2184ab25 ingest: Streamline option naming for several processors:
* `rename` processor, renamed `to` to `target_field`
* `date` processor, renamed `match_field` to `field` and renamed `match_formats` to `formats`
* `geoip` processor, renamed `source_field` to `field` and renamed `fields` to `properties`
* `attachment` processor, renamed `source_field` to `field` and renamed `fields` to `properties`

Closes #17835
2016-04-21 13:40:43 +02:00
Alexander Reelsen da19ddf3e6 Ingest Attachment: Allow to prevent base64 conversions by using raw bytes (#16601)
CBOR is natively supported in Elasticsearch and allows for byte arrays.
This means, that by using CBOR the user can prevent base64 conversions
for the data being sent back and forth.

This PR adds support to extract data from a byte array in addition to
a string. This also required to add a ByteArrayValueSource class.
2016-04-11 14:14:56 +02:00
Simon Willnauer 6f28c173e2 [TEST] Test that all processors are available 2016-03-14 21:42:37 +01:00
Ryan Ernst 51d87d94dc Add getClassLoader perm for tika in ingest 2016-03-10 11:17:25 -08:00
David Pilato 6deabac8e8 Can not extract text from Office documents (`.docx` extension)
Add REST test for:

* `.doc`
* `.docx`

The later fails with:

```
==> Test Info: seed=DB93397128B876D4; jvm=1; suite=1
Suite: org.elasticsearch.ingest.attachment.IngestAttachmentRestIT
  2> REPRODUCE WITH: gradle :plugins:ingest-attachment:integTest -Dtests.seed=DB93397128B876D4 -Dtests.class=org.elasticsearch.ingest.attachment.IngestAttachmentRestIT -Dtests.method="test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file}" -Des.logger.level=WARN -Dtests.security.manager=true -Dtests.locale=bg -Dtests.timezone=Europe/Athens
FAILURE 4.53s | IngestAttachmentRestIT.test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file} <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [index] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parse_exception","reason":"Error parsing document in field [field1]"}],"type":"parse_exception","reason":"Error parsing document in field [field1]","caused_by":{"type":"tika_exception","reason":"Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@7f85baa5","caused_by":{"type":"illegal_state_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")","caused_by":{"type":"access_control_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")"}}}},"status":400}]
   >    at __randomizedtesting.SeedInfo.seed([DB93397128B876D4:53C706AB86441B2C]:0)
   >    at org.elasticsearch.test.rest.section.DoSection.execute(DoSection.java:107)
   >    at org.elasticsearch.test.rest.ESRestTestCase.test(ESRestTestCase.java:395)
   >    at java.lang.Thread.run(Thread.java:745)
```

Related to #16864
2016-03-10 10:57:59 +01:00
Martijn van Groningen 82d01e4315 Added ingest info to node info API, which contains a list of available processors.
Internally the put pipeline API uses this information in node info API to validate if all specified processors in a pipeline exist on all nodes in the cluster.
2016-03-07 14:44:50 +01:00
Alexander Reelsen e8d24d10dc Tests: Fix AttachmentProcessorFactoryTests to only check for existing fields 2016-02-10 15:29:16 +01:00
javanna d5969bb33a Attachment Processor: setFieldValue only once as a map 2016-02-10 12:38:39 +01:00
javanna 4e3fb69861 [TEST] rewrite testEnglishTextDocumentWithRandomFields 2016-02-10 12:34:51 +01:00
javanna fe7469dffb Attachment processor: remove unused NAME enum 2016-02-10 12:34:21 +01:00
Alexander Reelsen 0d4711c2fc Ingest: Add attachment processor
This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.

Fields can be selected in the processor as well.

Close #16303
2016-02-09 17:03:30 +01:00