Commit Graph

25 Commits

Author SHA1 Message Date
dependabot[bot] 132eabd675
Bump xz from 1.8 to 1.9 in /plugins/ingest-attachment (#3248)
* Bump xz from 1.8 to 1.9 in /plugins/ingest-attachment

Bumps xz from 1.8 to 1.9.

---
updated-dependencies:
- dependency-name: org.tukaani:xz
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updating SHAs

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
2022-05-10 11:01:57 -05:00
dependabot[bot] 112a108246
Bump commons-lang3 from 3.9 to 3.12.0 in /plugins/ingest-attachment (#3190)
* Bump commons-lang3 from 3.9 to 3.12.0 in /plugins/ingest-attachment

Bumps commons-lang3 from 3.9 to 3.12.0.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-lang3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updating SHAs

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
2022-05-04 16:15:30 -07:00
dependabot[bot] 917f9c2699
Bump xmlbeans from 3.0.1 to 5.0.3 in /plugins/ingest-attachment (#2138)
* Bump xmlbeans from 3.0.1 to 5.0.3 in /plugins/ingest-attachment

Bumps xmlbeans from 3.0.1 to 5.0.3.

---
updated-dependencies:
- dependency-name: org.apache.xmlbeans:xmlbeans
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updating SHAs

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
2022-05-04 14:44:49 -07:00
Kartik Ganesh fc0f446ab2
Upgrading ingest-attachment dependencies (#3111)
* Upgrading Tika from 1.24.1 to 2.1.0 and bumping xmlbeans version

This major version upgrade requires an explicit dependency on tika-parsers-standard-package to import the parser implementations, and an update to the namespace of RTFParser. Also, LanguageIdentifier has been deprecated and replaced by LanguageDetector.

This change includes a bump in xmlbeans version from 3.0.1 to 3.1.0

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Upgrade Tika libraries from 2.1.0 to 2.2.0

This also requires an update of Apache Commons-IO from 2.7 to 2.11.0

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Upgrade Tika libraries from 2.2.0 to 2.2.1

Also update PDFBox to 2.0.25 as per Tika release notes

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Upgraded Tika and xmlbeans libraries

Tika libraries have been upgraded from 2.2.1 to 2.3.0. xmlbeans is now a subproject of POI, so POI was upgraded from 4.1.2 to 5.2.2. With POI 5.x the ooxml-schemas library has been moved to ooxml-lite/ooxml-full. Since ooxml-schemas no longer exists, the LICENSE and NOTICE files in the licenses/ directory have been removed. Finally, xmlbeans has been updated from 3.1.0 to 5.0.2.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* (In progress) Added tika-langdetect

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Upgrading tika libraries to 2.4.0

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Switched from tika-langdetect to tika-langdetect-optimaize

To fix the license check, the mapping regex was expanded to tika-.*
This now means the tika-core LICENSE and NOTICE files are no longer needed.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* (Work in progress) Switching AttachmentProcessor to use OptimaizeLangDetector

This is a concrete implementation of LanguageDetector. Using this requires bringing in the optimaize dependency.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Manually added LICENSE and NOTICE files for Optimaize language-detector

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Move Optimaize dependency to runtimeOnly

Also bring in transitive Guava dependency. This requires manual addition of LICENSE and NOTICE files as with other plugins.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Fix Optimaize langDetector to load models first before detecting

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Fallback logic, and test updates

Following the Tika library upgrade, some fallback logic is necessary:
1. "Author" is deprecated for MSOffice document parsing. It is recommended to use CREATOR from Tika Core Properties instead.
2. EPUB parsing no longer automatically extracts keywords. The convention to fall back to SUBJECT is now manually implemented in AttachmentProcessor.

Finally, unit tests have been updated to account for non-deterministic language results across library upgrades.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Drop Guava version from 31.1 to 18.0

This is the version that Optimaize 0.6 depends on, and it allows for a smaller ignoreViolations list

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Fix ingest-attachment integration test to assert correct language

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
2022-05-04 09:51:59 -07:00
Yoann Rodière b5d5616d44
Update commons-logging to 1.2 (#2806)
* Upgrade to Apache Commons Logging 1.2

Signed-off-by: Yoann Rodière <yoann@hibernate.org>

* Clarify that Apache HTTP/commons-* dependencies are not just for tests

Signed-off-by: Yoann Rodière <yoann@hibernate.org>
2022-04-08 16:43:51 -04:00
Sarat Vemulapalli db91d2efe9
Upgrading bouncycastle to 1.70 (#1832)
2022-01-03 07:35:38 -05:00
Abbas Hussain fa8126004c
Upgrade apache commons-compress to 1.21 (#1197)
Signed-off-by: Abbas Hussain <abbas_10690@yahoo.com>
2021-09-02 08:35:42 +05:30
Tianli Feng 18625952a9
Update external library 'pdfbox' to version 2.0.24 to address a vulnerability (#883)
2021-06-25 13:18:15 -07:00
Rabi Panda 50abf6d066
[CVE] Upgrade dependencies to mitigate CVEs (#657)
This PR upgrades the following dependencies to fix CVEs.

- commons-codec:1.12 (->1.13) apache/commons-codec@48b6157
- ant:1.10.8 (->1.10.9) https://ant.apache.org/security.html
- jackson-databind:2.10.4 (->2.11.0) FasterXML/jackson-databind#2589
- jackson-dataformat-cbor:2.10.4 (->2.11.0) https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-28491
- apache-httpclient:4.5.10 (->4.5.13) https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2020-13956
- checkstyle:8.20 (->8.29) https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-10782
- junit:4.12 (->4.13.1) https://github.com/junit-team/junit4/security/advisories/GHSA-269g-pwp5-87pp
- netty:4.1.49.Final (->4.1.59) https://github.com/netty/netty/security/advisories/GHSA-5mcr-gq6c-3hq2

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-05-18 11:37:24 -07:00
Rabi Panda 0e180f4703
Update dependencies for ingest-attachment plugin. (#666)
This PR resolves the CVEs for dependencies in the ingest-attachment plugin.

tika : '1.24' -> '1.24.1' (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-9489)
pdfbox : '2.0.19' -> '2.0.23' (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-27807)
commons-io:commons-io : '2.6' -> '2.7' (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-29425)

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-05-11 10:40:33 -07:00
Ioannis Kakavas 7d4ae7d982
Upgrade Tika to 1.24 (#54130) (#54150)
Also updates commons-compress to 1.19, pdfbox to 2.0.19 and
POI to 4.1.2. Adds a compile dependency on commons-math3
3.6.1 and SparseBitSet 1.2
2020-03-25 11:03:26 +02:00
Ioannis Kakavas d9ce0e6733
Update BouncyCastle to 1.64 (#52185) (#52464)
This commit upgrades the bouncycastle dependency from 1.61 to 1.64.
2020-02-18 14:11:34 +02:00
Igor Motov 1818c5fa44
Ingest Attachment: Upgrade tika to v1.22 (#45575)
Upgrades:
Apache Tika: 1.19.1 -> 1.22.
pdfbox : 2.0.12 -> 2.0.16
poi : 4.0.0 -> 4.0.1
2019-08-19 18:17:16 -04:00
Jason Tedor d02bca1314
Upgrade the bouncycastle dependency to 1.61 (#40017)
This commit upgrades the bouncycastle dependency from 1.59 to 1.61.
2019-03-14 08:54:47 -04:00
Jay Modi 54dbf9469c
Update httpclient for JDK 11 TLS engine (#37994)
The Apache Commons HTTP client implementations recently released
versions that solve TLS compatibility issues with the new TLS engine
that supports TLSv1.3 with JDK 11. This change updates our code to
use these versions since JDK 11 is a supported JDK and we should
allow the use of TLSv1.3.
2019-01-30 14:24:29 -07:00
Ioannis Kakavas 080ddd5d9c
[WIP] Ingest Attachment: Upgrade tika to v1.19.1 (#33896)
Upgrades Tika to 1.19.1 and relevant transitive dependencies

Resolves: #31456, #31305
2018-10-12 17:09:14 +01:00
Alpar Torok cf2295b408
Add JDK11 support and enable in CI (#31644)
* Upgrade bouncycastle

Required to fix
`bcprov-jdk15on-1.55.jar; invalid manifest format `
on jdk 11

* Downgrade bouncycastle to avoid invalid manifest

* Add checksum for new jars

* Update tika permissions for jdk 11

* Mute test failing on jdk 11

* Add JDK11 to CI

* Thread#stop(Throwable) was removed

http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053536.html

* Disable failing tests #31456

* Temporarily disable doc tests

To see if there are other failures on JDK11

* Only blacklist specific doc tests

* Disable only failing tests in ingest attachment plugin

* Mute failing HDFS tests #31498

* Mute failing lang-painless tests #31500

* Fix backwards compatibility builds

Fix JAVA version to 10 for ES 6.3

* Add 6.x to bwc -> java10

* Prefix out and err from buildBwcVersion for readability

```
> Task :distribution:bwc:next-bugfix-snapshot:buildBwcVersion
  [bwc] :buildSrc:compileJava
  [bwc] WARNING: An illegal reflective access operation has occurred
  [bwc] WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/home/alpar/.gradle/wrapper/dists/gradle-4.5-all/cg9lyzfg3iwv6fa00os9gcgj4/gradle-4.5/lib/groovy-all-2.4.12.jar) to method java.lang.Object.finalize()
  [bwc] WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
  [bwc] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
  [bwc] WARNING: All illegal access operations will be denied in a future release
  [bwc] :buildSrc:compileGroovy
  [bwc] :buildSrc:writeVersionProperties
  [bwc] :buildSrc:processResources
  [bwc] :buildSrc:classes
  [bwc] :buildSrc:jar

```

* Also set RUNTIME_JAVA_HOME for bwcBuild

So that we can make sure it's not too new for the build to understand.

* Align bouncycastle dependency

* Fix painless array tests

closes #31500

* Update jar checksums

* Keep 8/10 runtime/compile until consensus builds on 11

* Only skip failing tests if running on Java 11

* Failures depend on the compile Java version, not the runtime version

* Condition doc test exceptions on compiler java version as well

* Disable hdfs tests based on runtime java

* Set runtime java to minimum supported for bwc

* PR review

* Add comment with ticket for forbidden apis
2018-07-05 03:24:01 +00:00
Jack Conradson 9efb0fe9ba
Ingest Attachment: Upgrade Tika to 1.18 (#31252)
Fixes ES hanging when a bad zip file is loaded through Tika.
2018-06-24 11:08:45 -07:00
Tal Levy 43ff38c5da
update ingest-attachment to use Tika 1.17 and newer deps (#27824)
- this pr updates tika and its dependencies
- updates the SHAs
- updates the class excludes
2017-12-15 13:47:26 -08:00
Guillaume Le Floch ac5fd6a7d9
Update Tika version to 1.15
This commit upgrades the Tika dependency to version 1.15.

Relates #25003
2017-11-09 13:16:44 -05:00
David Pilato 8701f7a3ce
Add missing mime4j library
In some cases (apparently with Outlook files), the mime4j library is needed.
We removed it in the past, which can cause Elasticsearch to crash when you use ingest-attachment (and probably mapper-attachments in the 2.x series as well) with a file that requires this library.

This is a similar problem to the one reported in #22077.
2017-01-24 10:25:02 +01:00
David Pilato 7517c50698
Update to Tika 1.14
Closes #20390.
2016-11-16 11:29:14 +01:00
Alexander Reelsen 3c2e51d831
Deps: Update ingest-attachment to latest libraries (#20710)
Also added a test that checks a regular PDF,
instead of only an encrypted one with an expected exception.
2016-10-10 12:55:05 +02:00
Ryan Ernst 1d40c4bbc1
Make java9 work again
This change makes ES compile with java9 again, build 118.
* There are a handful of changes due to failure to determine types during compile.
* The attachment plugins which use tika needed to have tika upgraded in order to pick up fixes there for java 9.
* azure discovery and s3 repository indirectly depend on jaxb, which is no longer in the default modules. They now add a jaxb dependency externally, and make JarHell allow for this package.
2016-05-21 09:41:51 -07:00
Alexander Reelsen 0d4711c2fc
Ingest: Add attachment processor
This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.

Fields can be selected in the processor as well.

Close #16303
2016-02-09 17:03:30 +01:00