Commit Graph

77 Commits

Author SHA1 Message Date
Alexander Reelsen d299bf5760 Add tests for ingesting CBOR data attachments (#49715)
Our docs specifically mention that CBOR is supported when ingesting attachments. However this is not tested anywhere.

This adds a test, that uses specifically CBOR format in its IndexRequest and another one that behaves like CBOR in the ingest attachment unit tests.
2019-12-06 14:33:39 +01:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Mark Vieira 6ab4645f4e
[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818)
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.

Closes #42042
2019-11-01 11:33:11 -07:00
Igor Motov 1818c5fa44 Ingest Attachment: Upgrade tika to v1.22 (#45575)
Upgrades:
Apache Tika: 1.19.1 -> 1.22.
pdfbox : 2.0.12 -> 2.0.16
poi : 4.0.0 -> 4.0.1
2019-08-19 18:17:16 -04:00
Mark Vieira c1816354ed
[Backport] Improve build configuration time (#42674) 2019-05-30 10:29:42 -07:00
Martijn van Groningen e5cec87697
Remove -Xlint exclusions in all plugins. (#40721)
The xlint exclusions of the following plugins were removed:
* ingest-attachment.
* mapper-size.
* transport-nio. Removing the -try exclusion required some work, because
  the NettyAdaptor implements AutoCloseable and NettyAdaptor#close() method
  could throw an InterruptedException (ChannelFuture#await() and a generic
  Exception is re-thrown, which maybe an ChannelFuture). The easiest way
  around this to me seemed that NettyAdaptor should not implement AutoCloseable,
  because it is not directly used in a try-with-resources statement.

Relates to #40366
2019-04-04 08:30:34 +02:00
Jason Tedor d02bca1314
Upgrade the bouncycastle dependency to 1.61 (#40017)
This commit upgrades the bouncycastle dependency from 1.59 to 1.61.
2019-03-14 08:54:47 -04:00
Jay Modi 54dbf9469c
Update httpclient for JDK 11 TLS engine (#37994)
The apache commons http client implementations recently released
versions that solve TLS compatibility issues with the new TLS engine
that supports TLSv1.3 with JDK 11. This change updates our code to
use these versions since JDK 11 is a supported JDK and we should
allow the use of TLSv1.3.
2019-01-30 14:24:29 -07:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Alpar Torok a7c3d5842a
Split third party audit exclusions by type (#36763) 2019-01-07 17:24:19 +02:00
Alpar Torok 59536966c2
Add a new "contains" feature (#34738)
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
2018-10-25 08:50:50 +03:00
Ioannis Kakavas 080ddd5d9c
[WIP] Ingest Attachement: Upgrade tika to v1.19.1 (#33896)
Upgrades Tika to 1.19.1 and relevant transitive dependencies

Resolves: #31456, #31305
2018-10-12 17:09:14 +01:00
Armin Braun 46774098d9
INGEST: Implement Drop Processor (#32278)
* INGEST: Implement Drop Processor
* Adjust Processor API
* Implement Drop Processor
* Closes #23726
2018-09-05 14:25:29 +02:00
Alpar Torok 3828ec60f5
Fix forbidden apis on FIPS (#33202)
- third party audit detects jar hell with JDK so we disable it
- jdk non portable in forbiddenapis detects classes being used from the
JDK ( for fips ) that are not portable, this is intended so we don't
scan for it on fips.
- different exclusion rules for third party audit on fips

Closes #33179
2018-08-29 17:43:40 +03:00
Alpar Torok 2cc611604f
Run Third party audit with forbidden APIs CLI (part3/3) (#33052)
The new implementation is functional equivalent with the old, ant based one.
It parses task standard error to get the missing classes and violations in the same way.
I considered re-using ForbiddenApisCliTask but Gradle makes it hard to build inheritance with tasks that have task actions , since the order of the task actions can't be controlled.
This inheritance isn't dully desired either as the third party audit task is much more opinionated and we don't want to expose some of the configuration.
We could probably extract a common base class without any task actions, but probably more trouble than it's worth.

Closes #31715
2018-08-28 10:03:30 +03:00
Alpar Torok cf2295b408
Add JDK11 support and enable in CI (#31644)
* Upgrade bouncycastle

Required to fix
`bcprov-jdk15on-1.55.jar; invalid manifest format `
on jdk 11

* Downgrade bouncycastle to avoid invalid manifest

* Add checksum for new jars

* Update tika permissions for jdk 11

* Mute test failing on jdk 11

* Add JDK11 to CI

* Thread#stop(Throwable) was removed

http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053536.html

* Disable failing tests #31456

* Temprorarily disable doc tests

To see if there are other failures on JDK11

* Only blacklist specific doc tests

* Disable only failing tests in ingest attachment plugin

* Mute failing HDFS tests #31498

* Mute failing lang-painless tests #31500

* Fix backwards compatability builds

Fix JAVA version to 10 for ES 6.3

* Add 6.x to bwx -> java10

* Prefix out and err from buildBwcVersion for readability

```
> Task :distribution:bwc:next-bugfix-snapshot:buildBwcVersion
  [bwc] :buildSrc:compileJava
  [bwc] WARNING: An illegal reflective access operation has occurred
  [bwc] WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/home/alpar/.gradle/wrapper/dists/gradle-4.5-all/cg9lyzfg3iwv6fa00os9gcgj4/gradle-4.5/lib/groovy-all-2.4.12.jar) to method java.lang.Object.finalize()
  [bwc] WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
  [bwc] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
  [bwc] WARNING: All illegal access operations will be denied in a future release
  [bwc] :buildSrc:compileGroovy
  [bwc] :buildSrc:writeVersionProperties
  [bwc] :buildSrc:processResources
  [bwc] :buildSrc:classes
  [bwc] :buildSrc:jar

```

* Also set RUNTIME_JAVA_HOME for bwcBuild

So that we can make sure it's not too new for the build to understand.

* Align bouncycastle dependency

* fix painles array tets

closes #31500

* Update jar checksums

* Keep 8/10 runtime/compile untill consensus builds on 11

* Only skip failing tests if running on Java 11

* Failures are dependent of compile java version not runtime

* Condition doc test exceptions on compiler java version as well

* Disable hdfs tests based on runtime java

* Set runtime java to minimum supported for bwc

* PR review

* Add comment with ticket for forbidden apis
2018-07-05 03:24:01 +00:00
Alpar Torok 08b8d11e30
Add support for switching distribution for all integration tests (#30874)
* remove left-over comment

* make sure of the property for plugins

* skip installing modules if these exist in the distribution

* Log the distrbution being ran

* Don't allow running with integ-tests-zip passed externally

* top level x-pack/qa can't run with oss distro

* Add support for matching objects in lists

Makes it possible to have a key that points to a list and assert that a
certain object is present in the list. All keys have to be present and
values have to match. The objects in the source list may have additional
fields.

example:
```
  match:  { 'nodes.$master.plugins': { name: ingest-attachment }  }
```

* Update plugin and module tests to work with other distributions

Some of the tests expected that the integration tests will always be ran
with  the `integ-test-zip` distribution so that there will be no other
plugins loaded.

With this change, we check for the presence of the plugin without
assuming exclusivity.

* Allow modules to run on other distros as well

To match the behavior of tets.distributions

* Add and use a new `contains` assertion

Replaces the  previus changes that caused `match` to do a partial match.

* Implement PR review comments
2018-06-26 06:49:03 -07:00
Jack Conradson 9efb0fe9ba Ingest Attachment: Upgrade Tika to 1.18 (#31252)
Fixes ES from hanging when a bad zip file is loaded through Tika.
2018-06-24 11:08:45 -07:00
Tanguy Leroux bf58660482
Remove all unused imports and fix CRLF (#31207)
The X-Pack opening and the recent other refactorings left a lot of 
unused imports in the codebase. This commit removes them all.
2018-06-11 15:12:12 +02:00
David Pilato 87553bba16
Add ingest-attachment support for per document `indexed_chars` limit (#28977)
We today support a global `indexed_chars` processor parameter. But in some cases, users would like to set this limit depending on the document itself.
It used to be supported in mapper-attachments plugin by extracting the limit value from a meta field in the document sent to indexation process.

We add an option which reads this limit value from the document itself
by adding a setting named `indexed_chars_field`.

Which allows running:

```
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars_field" : "size"
      }
    }
  ]
}
```

Then index either:

```
PUT index/doc/1?pipeline=attachment
{
  "data": "BASE64"
}
```

Which will use the default value (or the one defined by `indexed_chars`)

Or

```
PUT index/doc/2?pipeline=attachment
{
  "data": "BASE64",
  "size": 1000
}
```

Closes #28942
2018-03-14 19:07:20 +01:00
Christoph Büscher 231fd3c9be [Docs] Remove misleading comment
The TikaImpl#parse method comment sounds like this method is only used
in the same package for testing, but AttachmentProcessor uses it outside
of testing, so we should remove this comment.
2018-02-09 15:47:38 +01:00
Jason Tedor 641a6c9e62
Guard accessDeclaredMembers for Tika on JDK 10
Tika parsers need accessDeclaredMembers because ZipFile needs
accessDeclaredMembers on JDK 10. This commit guards adding this
permission to parsers so that the permission is only granted on JDK
10. Additionally, we add an assertion that forces us to check if the
permission is still needed in JDK 11.

Relates #28603
2018-02-09 09:08:07 -05:00
Christoph Büscher cc9cb5356a
Add missing runtime permission to TikaImpl (#28602)
Tests on jdk10 were failing because of a change in its ZipFile implementation 
that now needs `accessDeclaredMembers` permissions. This change adds 
the missing permission to the plugins security policy and TikaImpl.

Closes #28568
2018-02-09 14:41:24 +01:00
Jason Tedor aded32f48f
Fix third-party audit tasks on JDK 8
This one is interesting. The third party audit task runs inside the
Gradle JVM. This means that if Gradle is started on JDK 8, the third
party audit tasks will fail as a result of the changes to support
building Elasticsearch with the JDK 9 compiler. This commit reverts the
third party audit changes to support running this task when Gradle is
started with JDK 8.

Relates #28256
2018-01-16 22:59:29 -05:00
Jason Tedor 0a79555a12
Require JDK 9 for compilation (#28071)
This commit modifies the build to require JDK 9 for
compilation. Henceforth, we will compile with a JDK 9 compiler targeting
JDK 8 as the class file format. Optionally, RUNTIME_JAVA_HOME can be set
as the runtime JDK used for running tests. To enable this change, we
separate the meaning of the compiler Java home versus the runtime Java
home. If the runtime Java home is not set (via RUNTIME_JAVA_HOME) then
we fallback to using JAVA_HOME as the runtime Java home. This enables:
 - developers only have to set one Java home (JAVA_HOME)
 - developers can set an optional Java home (RUNTIME_JAVA_HOME) to test
   on the minimum supported runtime
 - we can test compiling with JDK 9 running on JDK 8 and compiling with
   JDK 9 running on JDK 9 in CI
2018-01-16 13:45:13 -05:00
Tal Levy 43ff38c5da
update ingest-attachment to use Tika 1.17 and newer deps (#27824)
- this pr updates tika and its dependencies
- updates the SHAs
- updates the class excludes
2017-12-15 13:47:26 -08:00
Guillaume Le Floch ac5fd6a7d9 Update Tika version to 1.15
This commit upgrades the Tika dependency to version 1.15.

Relates #25003
2017-11-09 13:16:44 -05:00
Tanguy Leroux 463e7e6fa3 Revert "Upgrade to Jackson 2.9.2 (#27032)"
This reverts commit 0b9acc5ace.
2017-10-20 08:25:41 +02:00
Tanguy Leroux f78b2e5bc9 Fix ingest-attachment yaml REST test 2017-10-19 22:20:50 +02:00
Yannick Welsch 3d8feff66e Use Java 9 FilePermission model (#26302)
This commit makes the security code aware of the Java 9 FilePermission changes (see #21534) and allows us to remove the `jdk.io.permissionsUseCanonicalPath` system property.
2017-08-22 11:22:00 +09:30
Yannick Welsch efd79882a2 Allow build to directly run under JDK 9 (#25859)
With Gradle 4.1 and newer JDK versions, we can finally invoke Gradle directly using a JDK9 JAVA_HOME without requiring a JDK8 to "bootstrap" the build. As the thirdPartyAudit task runs within the JVM that Gradle runs in, it needs to be adapted now to be JDK9 aware.

This commit also changes the `JavaCompile` tasks to only fork if necessary (i.e. when Gradle's JVM and JAVA_HOME's JVM differ).
2017-07-27 16:14:04 +02:00
Ryan Ernst 2a65bed243 Tests: Change rest test extension from .yaml to .yml (#24659)
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
2017-05-16 17:24:35 -07:00
Ryan Ernst 212f24aa27 Tests: Clean up rest test file handling (#21392)
This change simplifies how the rest test runner finds test files and
removes all leniency.  Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.

closes #20240
2017-04-18 15:07:08 -07:00
Jason Tedor 3136ed1490 Rename random ASCII helper methods
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].

Relates #23886
2017-04-04 11:04:18 -04:00
Ryan Ernst f8453aca57 Packaging: Remove classpath ordering hack (#23596)
After the removal of the joda time hack we used to have, we can cleanup
the codebase handling in security, jarhell and plugins to be more picky
about uniqueness. This was originally in #18959 which was never merged.

closes #18959
2017-03-21 12:12:16 -07:00
David Pilato 6b66e29435 Remove POTM file after merging with master branch 2017-02-03 16:20:15 +01:00
David Pilato 626faeafe7 Merge branch 'master' into fix/22077-ingest-attachment
# Conflicts:
#	plugins/ingest-attachment/src/test/resources/org/elasticsearch/ingest/attachment/test/tika-files.zip
2017-02-03 16:15:44 +01:00
David Pilato 4775f520f4 Use PathUtils instead of Paths 2017-02-03 16:08:51 +01:00
David Pilato 7a8680c1a4 Replace tika-files.zip by a tika-files dir
Let's make our life easier when debugging/testing.
Also having a flat dir helps us to compare or "synchronize" more easily with Tika project files.

Closes #22958.
2017-02-03 15:19:00 +01:00
David Pilato 2b15d20f93 Remove support for Visio and POTM files
Actually we never supported Visio files but we are failing hard (kill a node) when that kind of file is provided.
See https://github.com/elastic/elasticsearch/pull/22079#issuecomment-277035357

This commits excludes Visio parsing from Tika so it does not fail anymore but returns empty content instead.

As a side effect, it also removes support for POTM files.

Closes #22077.
2017-02-03 13:03:52 +01:00
David Pilato ee3d73dc3d Add test-outlook.msg and test-outlook2003.msg files 2017-01-25 08:53:44 +01:00
David Pilato 8701f7a3ce Add missing mime4j library
In some cases (apparently with outlook files), mime4j library is needed.
We removed it in the past which can cause elasticsearch to crash when you are using ingest-attachment (and probably mapper-attachments as well in 2.x series) with a file which requires this library.

 Similar problem as the one reported at #22077.
2017-01-24 10:25:02 +01:00
Tim Brooks a4ac29c005 Add single static instance of SpecialPermission (#22726)
This commit adds a SpecialPermission constant and uses that constant
opposed to introducing new instances everywhere.

Additionally, this commit introduces a single static method to check that
the current code has permission. This avoids all the duplicated access
blocks that exist currently.
2017-01-21 12:03:52 -06:00
Nik Everett f5f2149ff2 Remove much ceremony from parsing client yaml test suites (#22311)
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.

I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
2016-12-22 11:00:34 -05:00
Tal Levy 5a90d9d7e6 add `ignore_missing` flag to ingest plugins (#22273)
added `ignore_missing` flag to:

- Attachment Processor
- GeoIP Processor
- User-Agent Processor
2016-12-20 10:53:28 -08:00
David Pilato 7517c50698 Update to Tika 1.14
Closes #20390.
2016-11-16 11:29:14 +01:00
Ryan Ernst 7a2c984bcc Test: Remove multi process support from rest test runner (#21391)
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproduceability in tests. This change
removes the remnants of that multiplexing support.
2016-11-07 15:07:34 -08:00
Alexander Reelsen 3c2e51d831 Deps: Update ingest-attachment to latest libraries (#20710)
Also added a test to check for a with a regular PDF,
instead of only an encrypted one with expected exception.
2016-10-10 12:55:05 +02:00
David Pilato 905684fe73 Adds content-length as number
If you run Elasticsearch with the ingest-attachment plugin:

```sh
gradle plugins:ingest-attachment:run
```

And then you use it on a document:

```js
 PUT _ingest/pipeline/attachment
 {
   "description" : "Extract attachment information",
   "processors" : [
     {
       "attachment" : {
         "field" : "data"
       }
     }
   ]
 }
 PUT my_index/my_type/my_id?pipeline=attachment
 {
   "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
 }
 GET my_index/my_type/my_id
```

 You were getting this back:

```js
 # PUT _ingest/pipeline/attachment
 {
   "acknowledged": true
 }

 # PUT my_index/my_type/my_id?pipeline=attachment
 {
   "_index": "my_index",
   "_type": "my_type",
   "_id": "my_id",
   "_version": 2,
   "result": "updated",
   "_shards": {
     "total": 2,
     "successful": 1,
     "failed": 0
   },
   "created": false
 }

 # GET my_index/my_type/my_id
 {
   "_index": "my_index",
   "_type": "my_type",
   "_id": "my_id",
   "_version": 2,
   "found": true,
   "_source": {
     "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
     "attachment": {
       "content_type": "application/rtf",
       "language": "ro",
       "content": "Lorem ipsum dolor sit amet",
       "content_length": "28"
     }
   }
 }
```

With this commit you are now getting:

```
 # GET my_index/my_type/my_id
 {
   "_index": "my_index",
   "_type": "my_type",
   "_id": "my_id",
   "_version": 2,
   "found": true,
   "_source": {
     "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
     "attachment": {
       "content_type": "application/rtf",
       "language": "ro",
       "content": "Lorem ipsum dolor sit amet",
       "content_length": 28
     }
   }
 }
```

Closes #19924
2016-08-10 18:31:16 +02:00
Nik Everett 9270e8b22b Rename client yaml test infrastructure
This makes it obvious that these tests are for running the client yaml
suites. Now that there are other ways of running tests using the REST
client against a running cluster we can't go on calling the shared
client yaml tests "REST tests". They are rest tests, but they aren't
**the** rest tests.
2016-07-26 13:53:44 -04:00