* Upgrading Tika from 1.24.1 to 2.1.0 and bumping xmlbeans version
This major version upgrade requires an explicit dependency on tika-parsers-standard-package to import the parser implementations, and an update to the namespace of RTFParser. Also, LanguageIdentifier has been deprecated and replaced by LanguageDetector.
This change includes a bump in xmlbeans version from 3.0.1 to 3.1.0
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Upgrade Tika libraries from 2.1.0 to 2.2.0
This also requires an update of Apache Commons-IO from 2.7 to 2.11.0
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Upgrade Tika libraries from 2.2.0 to 2.2.1
Also update PDFBox to 2.0.25 as per Tika release notes
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Upgraded Tika and xmlbeans libraries
Tika libraries have been upgraded from 2.2.1 to 2.3.0. xmlbeans is now a subproject of POI, so POI was upgraded from 4.1.2 to 5.2.2. With POI 5.x the ooxml-schemas library has been moved to ooxml-lite/ooxml-full. Since ooxml-schemas no longer exists, the LICENSE and NOTICE files in the licenses/ directory have been removed. Finally, xmlbeans has been updated from 3.1.0 to 5.0.2
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* (In progress) Added tika-langdetect
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Upgrading tika libraries to 2.4.0
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Switched from tika-langdetect to tika-langdetect-optimaize
To fix the license check, the mapping regex was expanded to tika-.*
This now means the tika-core LICENSE and NOTICE files are no longer needed.
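The effect of widening the mapping can be sketched in plain Java. The real check is configured in the Gradle build, not in code like this. This sketch only shows the matching rule: any artifact whose full name matches `tika-.*` shares the `tika` LICENSE/NOTICE files, so `tika-core` no longer needs its own. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a dependency-to-license mapping check.
// Artifacts matching a mapping regex share that mapping's LICENSE/NOTICE prefix;
// unmapped artifacts need license files named after themselves.
final class LicenseMapping {
    private final Map<Pattern, String> mappings = new LinkedHashMap<>();

    // Registers a regex (matched against the whole artifact name) and its license prefix.
    LicenseMapping map(String regex, String licensePrefix) {
        mappings.put(Pattern.compile(regex), licensePrefix);
        return this;
    }

    // Returns the license file prefix expected for the given artifact.
    String licenseFor(String artifactName) {
        for (Map.Entry<Pattern, String> entry : mappings.entrySet()) {
            if (entry.getKey().matcher(artifactName).matches()) {
                return entry.getValue();
            }
        }
        return artifactName;
    }
}

class LicenseMappingDemo {
    public static void main(String[] args) {
        LicenseMapping mapping = new LicenseMapping().map("tika-.*", "tika");
        System.out.println(mapping.licenseFor("tika-core"));                 // tika
        System.out.println(mapping.licenseFor("tika-langdetect-optimaize")); // tika
        System.out.println(mapping.licenseFor("commons-io"));                // commons-io
    }
}
```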
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* (Work in progress) Switching AttachmentProcessor to use OptimaizeLangDetector
This is a concrete implementation of LanguageDetector. Using this requires bringing in the optimaize dependency.
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Manually added LICENSE and NOTICE files for Optimaize language-detector
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Move Optimaize dependency to runtimeOnly
Also bring in the transitive Guava dependency. As with other plugins, this requires manually adding LICENSE and NOTICE files.
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Fix Optimaize langDetector to load models before detecting
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Fallback logic, and test updates
Following the Tika library upgrade, some fallback logic is necessary:
1. "Author" is deprecated for MSOffice document parsing. It is recommended to use CREATOR from Tika Core Properties instead.
2. EPUB parsing no longer automatically extracts keywords. The convention of falling back to SUBJECT is now implemented manually in AttachmentProcessor.
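The two fallbacks can be sketched over a plain map. This is not the AttachmentProcessor code: the metadata is modeled as a `Map<String, String>`, the key strings are assumptions standing in for Tika's metadata properties, and the lookup order (primary value first, then the fallback) is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the metadata fallbacks. Metadata is a plain Map and the
// key names below are assumptions standing in for Tika metadata properties.
final class MetadataFallback {
    static final String AUTHOR = "Author";       // legacy, deprecated key (assumed name)
    static final String CREATOR = "dc:creator";  // assumed key for TikaCoreProperties.CREATOR
    static final String KEYWORDS = "keywords";   // assumed key for extracted keywords
    static final String SUBJECT = "dc:subject";  // assumed key for TikaCoreProperties.SUBJECT

    // Uses the legacy Author value if present, otherwise falls back to CREATOR.
    static String author(Map<String, String> metadata) {
        String author = metadata.get(AUTHOR);
        return author != null ? author : metadata.get(CREATOR);
    }

    // Uses extracted keywords if present, otherwise falls back to SUBJECT (e.g. for EPUB).
    static String keywords(Map<String, String> metadata) {
        String keywords = metadata.get(KEYWORDS);
        return keywords != null ? keywords : metadata.get(SUBJECT);
    }

    public static void main(String[] args) {
        Map<String, String> epub = new HashMap<>();
        epub.put(CREATOR, "Jane Doe");
        epub.put(SUBJECT, "History");
        System.out.println(author(epub));   // Jane Doe
        System.out.println(keywords(epub)); // History
    }
}
```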
Finally, unit tests have been updated to account for language detection results that are not stable across library upgrades.
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Downgrade Guava from 31.1 to 18.0
This is the version that Optimaize 0.6 depends on, and it allows for a smaller ignoreViolations list
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Fix ingest-attachment integration test to assert correct language
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
The types exist transport action can be removed now that the TransportClient has
been removed and types support has been removed.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
In preparation for re-enabling the missingJavadoc gradle task this change adds
in the missing package-info.java files to the server folder. For now general
javadocs are added to these files with the intent to clean up with better
descriptions over time.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
When building the no-JDK package distribution with
```
./gradlew :distribution:packages:no-jdk-deb:assemble
```
the JDK was still included because the flag was set to `true`. This switches the boolean to the correct value.
Fixes https://github.com/opensearch-project/OpenSearch/issues/3024
Signed-off-by: Laurent Arnoud <laurent.arnoud@platform.sh>
Refactors XContentType.fromMediaTypeOrFormat to fromMediaType so Accept headers
and Content-Type headers can be parsed separately. This allows the same parse
logic to be reused for REST Versioning API support.
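A minimal sketch of a media-type-only parser that can be called once for the Content-Type header and once for the Accept header. It is not OpenSearch's XContentType: the enum name is hypothetical, parameters after `;` are ignored, and multi-valued Accept headers (comma-separated, with q-values) are deliberately not handled.

```java
import java.util.Locale;

// Hypothetical sketch; not OpenSearch's XContentType implementation.
enum SketchContentType {
    JSON("application/json"),
    SMILE("application/smile"),
    YAML("application/yaml"),
    CBOR("application/cbor");

    private final String mediaType;

    SketchContentType(String mediaType) {
        this.mediaType = mediaType;
    }

    // Parses a single header value such as "application/json; charset=UTF-8".
    // Parameters after ';' are ignored; returns null for unknown or missing media types.
    static SketchContentType fromMediaType(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        String base = headerValue.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        for (SketchContentType type : values()) {
            if (type.mediaType.equals(base)) {
                return type;
            }
        }
        return null;
    }
}

class MediaTypeDemo {
    public static void main(String[] args) {
        // The request body and the desired response format are parsed independently.
        SketchContentType requestBody = SketchContentType.fromMediaType("application/json; charset=UTF-8");
        SketchContentType response = SketchContentType.fromMediaType("application/yaml");
        System.out.println(requestBody + " " + response); // JSON YAML
    }
}
```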
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* Replace internal usages of 'master' term in 'client' directory
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Add a unit test for NodeSelector to test the deprecated master role
Signed-off-by: Tianli Feng <ftianli@amazon.com>
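The point of such a test is that node selection must still recognize nodes that advertise the deprecated `master` role alongside the new `cluster_manager` name. A hypothetical sketch of that rule follows; the class, its methods, and the "dedicated means not a data node" definition are illustrative assumptions, not the client's NodeSelector API.

```java
import java.util.Set;

// Hypothetical sketch: decide cluster-manager eligibility while the deprecated
// "master" role name is still accepted alongside "cluster_manager".
final class ClusterManagerRoleCheck {
    static final String CLUSTER_MANAGER_ROLE = "cluster_manager";
    static final String DEPRECATED_MASTER_ROLE = "master";

    static boolean isClusterManagerEligible(Set<String> roles) {
        return roles.contains(CLUSTER_MANAGER_ROLE) || roles.contains(DEPRECATED_MASTER_ROLE);
    }

    // Example selector rule (assumed definition): a dedicated cluster-manager node
    // is eligible for the role but does not hold data.
    static boolean isDedicatedClusterManager(Set<String> roles) {
        return isClusterManagerEligible(roles) && !roles.contains("data");
    }

    public static void main(String[] args) {
        System.out.println(isDedicatedClusterManager(Set.of("master")));          // true
        System.out.println(isDedicatedClusterManager(Set.of("master", "data")));  // false
    }
}
```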
* Replace internal usages of 'master' terminology in server/src/main directory
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Restore rename DISCOVERED_MASTER in ClusterHealthResponse
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Rename two methods in unit tests
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in ClusterState
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in LeaderChecker JoinHelper JoinTaskExecutor
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in more classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in more classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in more classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in more classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in more classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in more classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in DiscoveryNodes classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Replace master word in more classes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Correct mistakes
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Apply formatting fixes with the spotlessApply task
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Change MASTER__NODE_BOOTSTRAPPED_MSG in test
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Fix SnapshotDisruptionIT by renaming to cluster-manager
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* [Remove] Type from nested fields using new metadata field mapper
Types support has been removed, yet nested documents still use the _type field to
store their path. A new _nested_path metadata field mapper is added to take the
place of the _type field, removing the type dependency from nested documents.
Backwards compatibility (BWC) is handled in the new field mapper to ensure
compatibility with older versions.
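The BWC decision amounts to choosing which field holds the nested path, depending on which version created the index. A hypothetical sketch follows: the version check is reduced to a boolean because the exact cutoff is not stated here, the stored value is simplified to the plain path, and none of the names are OpenSearch APIs.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical sketch of where a nested document's path is stored.
// The version check is modeled as a boolean because the exact cutoff is not given here.
final class NestedPathFieldSketch {
    static final String LEGACY_TYPE_FIELD = "_type";
    static final String NESTED_PATH_FIELD = "_nested_path";

    // Indices created by older versions keep the path under _type; newer ones use _nested_path.
    static String fieldName(boolean indexCreatedByOlderVersion) {
        return indexCreatedByOlderVersion ? LEGACY_TYPE_FIELD : NESTED_PATH_FIELD;
    }

    // Builds the (field, value) pair stored on a nested document, e.g. for path "comments".
    static Map.Entry<String, String> pathField(String nestedPath, boolean indexCreatedByOlderVersion) {
        return new SimpleImmutableEntry<>(fieldName(indexCreatedByOlderVersion), nestedPath);
    }

    public static void main(String[] args) {
        System.out.println(pathField("comments", false)); // _nested_path=comments
        System.out.println(pathField("comments", true));  // _type=comments
    }
}
```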
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* PR review fixes
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* Add a test that merges the same mapping with empty index settings
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* This change formalizes the notion of feature flags, and adds a "replication type" setting that will differentiate between document and segment replication, gated by a feature flag.
Since seg-rep is currently an incomplete implementation, the feature flag ensures that the setting is not visible to users without explicitly setting a system property. We can then continue to merge seg-rep related changes from the feature branch to `main` safely hidden behind the feature flag gate.
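A minimal sketch of a system-property feature flag gating a setting. The property name, the `index.replication.type` key, and all class names are assumptions for illustration; the sketch only mirrors the described behavior: without the flag the setting behaves as if it did not exist and document replication is used.

```java
import java.util.Locale;

// Hypothetical sketch of a system-property feature flag gating a setting.
final class FeatureFlagsSketch {
    // Illustrative property name; treat it as an assumption.
    static final String REPLICATION_TYPE = "opensearch.experimental.feature.replication_type.enabled";

    // Boolean.getBoolean is true only if the system property exists and equals "true" (case-insensitive).
    static boolean isEnabled(String flag) {
        return Boolean.getBoolean(flag);
    }
}

enum ReplicationTypeSketch { DOCUMENT, SEGMENT }

final class ReplicationTypeSetting {
    // Hypothetical setting key, used only in the error message here.
    static final String KEY = "index.replication.type";

    // Without the flag, any explicit value is rejected as an unknown setting,
    // and every index uses document replication.
    static ReplicationTypeSketch resolve(String requested) {
        if (requested == null) {
            return ReplicationTypeSketch.DOCUMENT;
        }
        if (!FeatureFlagsSketch.isEnabled(FeatureFlagsSketch.REPLICATION_TYPE)) {
            throw new IllegalArgumentException("unknown setting [" + KEY + "]");
        }
        return ReplicationTypeSketch.valueOf(requested.toUpperCase(Locale.ROOT));
    }
}
```

In this sketch the gate is opened by starting the JVM with `-Dopensearch.experimental.feature.replication_type.enabled=true`; without it, users never see the segment option.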
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Update security policy for testing feature flags
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Co-authored-by: Nicholas Walter Knize <nknize@apache.org>
A few places still referenced the legacy ESTestCase name. This refactors those
instances to use OpenSearchTestCase.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
OpenSearch 2.0.0 no longer needs HLRC compatibility with legacy clients. This
commit removes all logic to spoof the version as a legacy cluster.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Adds a new multi_term aggregation. The current implementation focuses on adding
the new aggregation type. Performance (latency) is suboptimal in this iteration,
mainly because of the brute-force encoding/decoding of a list of values into
bucket keys. A performance improvement will follow in a separate change.
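The brute-force idea can be sketched as "one bucket per combination of field values". This is not OpenSearch's aggregator. A `List<Object>` stands in for the encoded bucket key, documents are plain maps, and documents missing any field are skipped, which is an assumed behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of brute-force multi-term bucketing: the bucket key is the list
// of a document's values for the requested fields, in field order.
final class MultiTermsSketch {
    static Map<List<Object>, Long> aggregate(List<? extends Map<String, ?>> docs, List<String> fields) {
        Map<List<Object>, Long> buckets = new LinkedHashMap<>();
        for (Map<String, ?> doc : docs) {
            List<Object> key = new ArrayList<>(fields.size());
            boolean complete = true;
            for (String field : fields) {
                Object value = doc.get(field);
                if (value == null) {   // skip documents missing any of the fields
                    complete = false;
                    break;
                }
                key.add(value);
            }
            if (complete) {
                buckets.merge(key, 1L, Long::sum);  // count documents per value combination
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("region", "us", "host", "a"),
            Map.of("region", "us", "host", "a"),
            Map.of("region", "eu", "host", "b"),
            Map.of("region", "us"));
        System.out.println(aggregate(docs, List.of("region", "host"))); // {[us, a]=2, [eu, b]=1}
    }
}
```

Building and comparing a fresh composite key for every document is what makes this approach simple but slow, which is the cost the follow-up change is meant to address.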
Signed-off-by: Peng Huo <penghuo@gmail.com>
* Refactoring GatedAutoCloseable to AutoCloseableRefCounted
This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.
The breakdown of the plan to merge segment-replication to main is detailed in #2355
Segment replication design proposal - #2229
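A minimal sketch of the narrowed abstraction: since every wrapped type is ref-counted and shuts down via `decRef`, the wrapper can hard-code that hook. The interface and class names are hypothetical stand-ins, not OpenSearch's `RefCounted` or `AutoCloseableRefCounted`. The `AtomicBoolean` gate is an assumption that makes `close()` release the reference at most once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted resource; not OpenSearch's RefCounted interface.
interface RefCountedSketch {
    void incRef();
    void decRef();
}

// Wraps any ref-counted object so that try-with-resources releases one reference.
final class AutoCloseableRefCountedSketch<T extends RefCountedSketch> implements AutoCloseable {
    private final T ref;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    AutoCloseableRefCountedSketch(T ref) {
        this.ref = ref;
    }

    T get() {
        return ref;
    }

    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            ref.decRef(); // the shutdown hook shared by every wrapped type
        }
    }
}

// Simple counter used to observe the reference count.
final class CountingResource implements RefCountedSketch {
    final AtomicInteger refs = new AtomicInteger(1);

    @Override
    public void incRef() {
        refs.incrementAndGet();
    }

    @Override
    public void decRef() {
        refs.decrementAndGet();
    }
}
```

Usage: call `incRef()` before handing the resource out, then wrap it in `try (AutoCloseableRefCountedSketch<CountingResource> h = new AutoCloseableRefCountedSketch<>(resource)) { ... }` so the reference is released when the block exits.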
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Minor refactoring in RecoveryState
This change makes two minor updates to RecoveryState -
1. The readRecoveryState API is removed because it can be replaced by an invocation of the constructor
2. The class members of the Timer inner class are changed to private, and accesses are only through the public APIs
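Replacing a static `readX(in)` helper with a constructor means the constructor itself can act as the deserializer, for example via a method reference. OpenSearch serializes with its own stream classes; this sketch substitutes `java.io.DataInput`/`DataOutput` so it runs standalone, and `SketchReader` and `RecoveryStateSketch` are hypothetical names with made-up fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical reader abstraction, standing in for a deserializer function type.
@FunctionalInterface
interface SketchReader<T> {
    T read(DataInput in) throws IOException;
}

// Hypothetical class illustrating a deserializing constructor in place of a static readX(in) helper.
final class RecoveryStateSketch {
    private final String shardId;   // private state, exposed only through accessors
    private final long totalBytes;

    RecoveryStateSketch(String shardId, long totalBytes) {
        this.shardId = shardId;
        this.totalBytes = totalBytes;
    }

    // Reads fields in the same order writeTo writes them.
    RecoveryStateSketch(DataInput in) throws IOException {
        this.shardId = in.readUTF();
        this.totalBytes = in.readLong();
    }

    void writeTo(DataOutput out) throws IOException {
        out.writeUTF(shardId);
        out.writeLong(totalBytes);
    }

    String shardId() { return shardId; }

    long totalBytes() { return totalBytes; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        new RecoveryStateSketch("[index][0]", 1024L).writeTo(out);
        out.flush();
        SketchReader<RecoveryStateSketch> reader = RecoveryStateSketch::new; // the constructor is the reader
        RecoveryStateSketch copy = reader.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.shardId() + " " + copy.totalBytes()); // [index][0] 1024
    }
}
```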
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Update RecoveryTargetTests to test Timer subclasses deterministically
This change removes the use of RandomBoolean in testing the Timer classes and creates a dedicated unit test for each. The common test logic is shared via a private method.
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Move the RecoveryState.Timer class to a top-level class
This will eventually be reused across both replication use-cases - peer recovery and segment replication.
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Further update of timer tests in RecoveryTargetTests
Removes a non-deterministic code path around stopping the timer, and avoids assertThat (deprecated)
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Rename to ReplicationTimer
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
* Remove RecoveryTargetTests assert on a running timer
Serializing and deserializing a running Timer instance and then checking for equality leads to flaky test failures when serialization and deserialization take time, because a running timer's elapsed time keeps changing.
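Why only a stopped timer compares reliably can be shown with a small standalone timer. This is a hypothetical sketch, not OpenSearch's ReplicationTimer; the method names are assumptions. Its elapsed time is frozen once stopped but keeps growing while running. The injected clock is one way to make such tests deterministic without random branches.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a standalone timer with private state; not OpenSearch's ReplicationTimer.
final class ReplicationTimerSketch {
    private final LongSupplier clock;  // injected so tests can control time
    private long startNanos = -1;
    private long stopNanos = -1;

    ReplicationTimerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    ReplicationTimerSketch() {
        this(System::nanoTime);
    }

    void start() {
        if (startNanos != -1) {
            throw new IllegalStateException("timer already started");
        }
        startNanos = clock.getAsLong();
    }

    void stop() {
        if (startNanos == -1) {
            throw new IllegalStateException("timer not started");
        }
        if (stopNanos == -1) {
            stopNanos = clock.getAsLong();  // later calls keep the first stop time
        }
    }

    // Elapsed time: frozen once stopped, still increasing while running.
    long elapsedNanos() {
        if (startNanos == -1) {
            return 0;
        }
        long end = stopNanos == -1 ? clock.getAsLong() : stopNanos;
        return end - startNanos;
    }
}
```

With a real clock, two reads of a running timer (for example, before and after a serialization round trip) can differ, while a stopped timer always reports the same value.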
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
AllFieldMapper was deprecated in legacy 6.x. The remaining references are
removed, along with the field mapper and its corresponding tests.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>