Commit Graph

55275 Commits

Author SHA1 Message Date
Andrew Ross 12789f89a3
Close first engine instance before creating second (#1457)
When creating the second instance of an InternalEngine using the same
translog config of the default InternalEngine instance, the second
instance will attempt to delete all the existing translog files. I found
a deterministic test failure when running with the seed
`E3E6AAD95ABD299B`.

As opposed to creating a second engine instance with a different
translog location, just close the first one before creating the second.

Signed-off-by: Andrew Ross <andrross@amazon.com>
2021-10-28 01:11:05 -07:00
Owais Kazi 37ac3788a3
Run spotless and exclude checkstyle on modules module (#1442)
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
2021-10-27 13:24:43 -07:00
Vacha 3f6e1df9eb
Fixing bwc test for repository-multi-version (#1441)
Signed-off-by: Vacha <vachshah@amazon.com>
2021-10-27 10:41:13 -04:00
Kyle J. Davis 39ddb774d8
changed work-in-progress language (#1275)
Signed-off-by: Kyle Davis <kyledvs@amazon.com>
2021-10-26 10:39:25 -07:00
Andriy Redko f469d53cff
[BUG] SymbolicLinkPreservingUntarTransform fails on Windows (#1433)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-26 09:45:26 -05:00
Owais Kazi 8394f541bc
Run spotless and exclude checkstyle on libs module (#1428)
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
2021-10-26 09:45:26 -05:00
Andriy Redko a19c3736e5
Adjust CodeCache size to eliminate JVM warnings (and crashes) (#1426)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-26 09:45:26 -05:00
Daniel Doubrovkine (dB.) 2abd536f9b
Recommend Docker 3.6.0. (#1427)
Signed-off-by: dblock <dblock@dblock.org>
2021-10-26 09:45:26 -05:00
Rabi Panda c86d765e7c
Add extension point for custom TranslogDeletionPolicy in EnginePlugin. (#1404)
This commit adds a method that can be used to provide a custom TranslogDeletionPolicy
from within plugins that implement the EnginePlugin interface. This enables plugins to
provide a custom deletion policy with the current limitation that only one plugin can
override the policy. An exception will be thrown if more than one plugin overrides the
policy.
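
The "only one plugin can override" rule described above can be sketched as follows. This is a minimal illustration, not the code added by this commit: `PolicyProvider`, `customDeletionPolicy`, and `resolve` are hypothetical names, and a `String` stands in for the real policy type.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class SinglePolicyOverride {
    /** Hypothetical stand-in for a plugin that may supply a custom deletion policy. */
    interface PolicyProvider {
        Optional<String> customDeletionPolicy();
    }

    /** Returns the single override if one exists; throws if more than one provider overrides. */
    static Optional<String> resolve(List<PolicyProvider> providers) {
        List<String> overrides = providers.stream()
            .map(PolicyProvider::customDeletionPolicy)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .collect(Collectors.toList());
        if (overrides.size() > 1) {
            throw new IllegalStateException("more than one plugin overrides the deletion policy: " + overrides);
        }
        return overrides.stream().findFirst();
    }

    public static void main(String[] args) {
        PolicyProvider none = Optional::empty;                    // plugin with no override
        PolicyProvider custom = () -> Optional.of("custom-policy"); // plugin with an override
        System.out.println(resolve(List.of(none, custom)));
    }
}
```

Failing fast on a second override keeps the effective policy unambiguous instead of silently depending on plugin load order.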

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-10-26 09:45:10 -05:00
CEHENKLE 8f566123c1
adding in untriaged label to features (#1419)
Signed-off-by: CEHENKLE <henkle@amazon.com>
2021-10-22 12:42:03 -04:00
Daniel Doubrovkine (dB.) e601a68457
Fix windows build (mostly) (#1412)
* Updated developer guide with Windows specifics.

Signed-off-by: dblock <dblock@dblock.org>

* Correct windows task name.

Signed-off-by: dblock <dblock@dblock.org>

* Use Docker desktop installation on Windows.

Signed-off-by: dblock <dblock@dblock.org>

* Locate docker-compose on Windows.

Signed-off-by: dblock <dblock@dblock.org>

* Default docker-compose location.

Signed-off-by: dblock <dblock@dblock.org>
2021-10-22 07:55:41 -04:00
Andrew Ross 5fb270908a
Change comment to point to DEVELOPER_GUIDE.md (#1415)
The content being referred to by the comment is in DEVELOPER_GUIDE.md
and not CONTRIBUTING.md.

Signed-off-by: Andrew Ross <andrross@amazon.com>
2021-10-22 07:54:42 -04:00
Owais Kazi d02443a265
Run spotless and exclude checkstyle on plugins module (#1417)
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
2021-10-21 20:49:03 -04:00
Vacha 8b4a7683d5
Upgrading mockito version to make it consistent across the repo (#1410)
Signed-off-by: Vacha <vachshah@amazon.com>
2021-10-21 19:40:43 -04:00
Owais Kazi 33e70a9886
Run spotless and exclude checkstyle on client module (#1392)
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
2021-10-21 13:13:38 -07:00
Andriy Redko 119701f622
Removing Jenkinsfile (not used), replaced by opensearch-build/jenkins/opensearch/Jenkinsfile (#1408)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-21 14:45:51 -04:00
Andriy Redko acac3cc285
Fixing org.opensearch.repositories.azure.AzureBlobContainerRetriesTests and org.opensearch.action.admin.cluster.node.stats.NodeStatsTests (#1390)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-20 15:27:18 -04:00
Saurabh Singh 284968bb85
Update node attribute check to version update (1.2) check for shard indexing pressure serialization. (#1395)
This commit adds an explicit version check for shard indexing pressure
serialization, so that tests are not required to have the cluster service
initialized when asserting node attributes.

Signed-off-by: Saurabh Singh <sisurab@amazon.com>
Co-authored-by: Saurabh Singh <sisurab@amazon.com>
2021-10-20 12:18:46 -05:00
Xue Zhou 574e42c31b
Remove old ES libraries used in reindex due to CVEs (#1359)
This commit removes old ES libraries versions 090 and 176 due to CVEs

Signed-off-by: Xue Zhou <xuezhou@amazon.com>
2021-10-20 11:51:45 -05:00
Nick Knize ecac8d3c38
Add EngineConfig extensions to EnginePlugin (#1387)
This commit adds an extension point to EngineConfig through EnginePlugin using
a new EngineConfigFactory mechanism. EnginePlugin provides interface methods to
override configurations in EngineConfig. The EngineConfigFactory produces a new
instance of the EngineConfig using these overrides. Defaults are used absent
overridden configurations.

This serves as a mechanism to override Engine configurations (e.g., CodecService,
TranslogConfig) enabling Plugins to have higher fidelity for changing Engine
behavior without having to override the entire Engine (which is only permitted for
a single plugin).
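
The override-with-defaults idea can be sketched roughly as below. The names here (`Config`, `ConfigOverrides`, `build`) and the two fields are assumptions made for illustration only; they are not the actual EngineConfig, EnginePlugin, or EngineConfigFactory signatures.

```java
import java.util.Optional;

public class ConfigOverrideSketch {
    /** Hypothetical, heavily reduced stand-in for an engine configuration. */
    static final class Config {
        final String codec;
        final long translogBufferBytes;

        Config(String codec, long translogBufferBytes) {
            this.codec = codec;
            this.translogBufferBytes = translogBufferBytes;
        }
    }

    /** Hypothetical plugin-facing interface: every override is optional. */
    interface ConfigOverrides {
        default Optional<String> codec() {
            return Optional.empty();
        }

        default Optional<Long> translogBufferBytes() {
            return Optional.empty();
        }
    }

    /** Builds a new Config, using each override if present and the default otherwise. */
    static Config build(Config defaults, ConfigOverrides overrides) {
        return new Config(
            overrides.codec().orElse(defaults.codec),
            overrides.translogBufferBytes().orElse(defaults.translogBufferBytes)
        );
    }

    public static void main(String[] args) {
        Config defaults = new Config("default", 8192L);
        ConfigOverrides onlyCodec = new ConfigOverrides() {
            @Override
            public Optional<String> codec() {
                return Optional.of("best_compression");
            }
        };
        Config result = build(defaults, onlyCodec);
        System.out.println(result.codec + " " + result.translogBufferBytes);
    }
}
```

A plugin implementing such an interface only touches the settings it cares about, which is the "higher fidelity" the message refers to.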

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-10-19 23:04:28 -05:00
Sergey Nuyanzin 2da858ccb0
[typos] typos in DEVELOPER_GUIDE.md (#1381)
Signed-off-by: snuyanzin <snuyanzin@gmail.com>
2021-10-19 08:45:49 -04:00
Owais Kazi 996d33adb2
Run spotless and exclude checkstyle on server module (#1380)
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
2021-10-19 08:32:54 -04:00
Andriy Redko 05dc4bf4b3
Fixing post merge 3rd party audit issues (#1384)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-18 22:06:50 -04:00
Andriy Redko 9612fe80b5
[repository-azure] plugin should use Azure Storage SDK v12 for Java (#1302)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-18 19:48:32 -04:00
Vacha c7f5c90a5f
Upgrading netty version to 4.1.69.Final (#1363)
Signed-off-by: Vacha <vachshah@amazon.com>
2021-10-18 17:30:19 -04:00
Andriy Redko 8ea3364bc5
Modernize and consolidate JDKs usage across all stages of the build. Update JDK-14 requirement, switch to JDK-17 instead (#1368)
* Modernize and consolidate JDKs usage across all stages of the build. Update JDK-14 requirement, switch to JDK-17 instead

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Updating phrasing based on review feedback

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Fixed runtime Java version usage, it has to be respected when RUNTIME_JAVA_HOME == JAVA_HOME

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Addressing review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-15 09:04:05 -04:00
Saurabh Singh 24fd89a1fd
Minor fix for the flaky test to reduce concurrency (#1361) (#1364)
Fixes flakiness for test testReplicaThreadedThroughputDegradationAndRejection.

Reduced the number of concurrently executing threads from the initial range of 100-120 to a new range of 80-100. The previous range breached the node limit, set to 10kb, in every execution where the number of threads was greater than 110.

Signed-off-by: Saurabh Singh <sisurab@amazon.com>
2021-10-14 15:25:17 -07:00
Vacha d151082832
Upgrade hadoop dependencies for hdfs plugin (#1335)
* Upgrade hadoop dependencies for hdfs plugin

Signed-off-by: Vacha <vachshah@amazon.com>

* Fixing gradle check failures

Signed-off-by: Vacha <vachshah@amazon.com>

* Upgrading htrace-core4 to 4.1.0

Signed-off-by: Vacha <vachshah@amazon.com>
2021-10-14 14:43:49 -04:00
Romain Tartière ea0fe7bfae
Allow building on FreeBSD (#1091)
* Allow building on FreeBSD

With this set of changes, we are able to successfully run:

```
./gradlew publishToMavenLocal -Dbuild.snapshot=false
```

This step is used in the OpenSearch repository context when building
plugins in the current state of the CI.

While here, reorder OS conditions alphabetically.

Before building, the openjdk14 package was installed and the environment
was adjusted to use it:

```
sudo pkg install openjdk14
export JAVA_HOME=/usr/local/openjdk14/
export PATH=$JAVA_HOME/bin:$PATH
```

Signed-off-by: Romain Tartière <romain@blogreen.org>

* Unbreak CI with FreeBSD support

Signed-off-by: dblock <dblock@dblock.org>

Co-authored-by: dblock <dblock@dblock.org>
2021-10-14 14:42:28 -04:00
Andriy Redko 3779576c51
Modernize and consolidate JDKs usage across all stages of the build. Use JDK-17 as bundled JDK distribution to run tests (#1358)
* Modernize and consolidate JDKs usage across all stages of the build. Use JDK-17 as bundled JDK distribution to run tests

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Using -Djava.security.egd=file:/dev/urandom explicitly for cli tests

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-13 17:25:48 -04:00
CEHENKLE 5a29b4797f
initial commit to add in a dependabot.yml file (#1353)
Signed-off-by: CEHENKLE <henkle@amazon.com>
2021-10-11 09:31:57 -07:00
Andriy Redko e9635d6bfe
Replace securemock with mock-maker (test support), update Mockito to 3.12.4 (#1332)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-10-10 14:18:54 -04:00
Saurabh Singh 3665daf5d0
Add Shard Level Indexing Pressure (#1336)
Shard level indexing pressure improves on the current Indexing Pressure framework, which performs memory accounting at the node level and rejects requests accordingly. This change goes a step further and bases rejections on memory accounting at the shard level, along with other key performance factors such as throughput and last successful requests.

**Key features**
- Granular tracking of indexing task performance, at every shard level, for each node role, i.e. coordinator, primary and replica.
- Smarter rejections by discarding the requests intended only for a problematic index or shard, while still allowing others to continue (fairness in rejection).
- Rejection thresholds governed by a combination of configurable parameters (such as memory limits on the node) and dynamic parameters (such as latency increase, throughput degradation).
- Node level and shard level indexing pressure statistics exposed through the stats API.
- Integration of indexing pressure stats with plugins for metric visibility and auto-tuning in the future.
- Control knobs to tune the key performance thresholds which control rejections, to address any specific requirements or issues.
- Control knobs to run the feature in shadow-mode or enforced-mode. In shadow-mode only internal rejection breakdown metrics will be published while no actual rejections will be performed.
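
The difference between shadow-mode and enforced-mode can be sketched as follows. This is a simplified illustration under stated assumptions: the class, the single memory limit, and the counter names are hypothetical, and the real feature also weighs dynamic factors such as throughput and latency.

```java
public class ShadowModeSketch {
    enum Mode { SHADOW, ENFORCED }

    private final Mode mode;
    private final long limitBytes;
    private long usedBytes;
    private long breaches;   // limit breaches observed, counted in both modes
    private long rejections; // requests actually rejected, only in enforced mode

    ShadowModeSketch(Mode mode, long limitBytes) {
        this.mode = mode;
        this.limitBytes = limitBytes;
    }

    /** Returns true if the request is admitted, false if it is rejected. */
    boolean tryAdmit(long requestBytes) {
        boolean overLimit = usedBytes + requestBytes > limitBytes;
        if (overLimit) {
            breaches++; // always recorded, so metrics exist even in shadow mode
            if (mode == Mode.ENFORCED) {
                rejections++;
                return false;
            }
        }
        usedBytes += requestBytes;
        return true;
    }

    long breaches() {
        return breaches;
    }

    long rejections() {
        return rejections;
    }

    public static void main(String[] args) {
        ShadowModeSketch shadow = new ShadowModeSketch(Mode.SHADOW, 100);
        ShadowModeSketch enforced = new ShadowModeSketch(Mode.ENFORCED, 100);
        for (long size : new long[] {60, 60}) {
            System.out.println("shadow admits: " + shadow.tryAdmit(size)
                + ", enforced admits: " + enforced.tryAdmit(size));
        }
    }
}
```

Running in shadow mode first lets an operator see how many requests would have been rejected before turning enforcement on.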

The changes were divided into small manageable chunks as part of the following PRs against a feature branch.

- Add Shard Indexing Pressure Settings. #716
- Add Shard Indexing Pressure Tracker. #717
- Refactor IndexingPressure to allow extension. #718
- Add Shard Indexing Pressure Store #838
- Add Shard Indexing Pressure Memory Manager #945
- Add ShardIndexingPressure framework level construct and Stats #1015
- Add Indexing Pressure Service which acts as orchestrator for IP #1084
- Add plumbing logic for IndexingPressureService in Transport Actions. #1113
- Add shard indexing pressure metric/stats via rest end point. #1171
- Add shard indexing pressure integration tests. #1198

Signed-off-by: Saurabh Singh <sisurab@amazon.com>
Co-authored-by: Saurabh Singh <sisurab@amazon.com>
Co-authored-by: Rabi Panda <adnapibar@gmail.com>
2021-10-07 11:06:15 -07:00
Rishikesh Pasham 808dbfd2ec
Adding Security Reporting Instructions in README.md file Signed-off-by: Rishikesh Reddy Pasham rishireddy1159@gmail.com (#1326)
Signed-off-by: Rishikesh Pasham <rishireddy1159@gmail.com>
2021-10-02 10:41:30 -04:00
Andriy Redko d46c206f29
[BUG] ConcurrentSnapshotsIT#testAssertMultipleSnapshotsAndPrimaryFailOver fails intermittently (#1311)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-09-30 12:44:29 -04:00
Andriy Redko 180db5cd09
Support for Heap after GC stats (correction after backport to 1.2.0) (#1315)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-09-30 09:48:24 -04:00
Andriy Redko 80388a8a29
Support for Heap after GC stats (#1265)
* Support for Heap after GC stats

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Addressing code review comments

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Using the right version 2.0.0 (instead of 1.2.0) for the change

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-09-28 14:40:00 -04:00
Daniel Doubrovkine (dB.) 457c1cd6ec
fix typo (#1305)
Signed-off-by: sgbasaraner <sarpbasaraner@gmail.com>

Co-authored-by: Sarp Güney Başaraner <sarpbasaraner@gmail.com>
2021-09-28 13:54:35 -04:00
Shivansh Arora 416220f510
Making GeneralScriptException an Implementation of OpensearchWrapperException (#1066)
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
2021-09-28 12:17:43 -04:00
Nick Knize 46e0f63539
[Tests] Translog Pruning tests to MetadataCreateIndexServiceTests (#1295)
This commit adds test coverage for translog pruning setting to
MetadataCreateIndexServiceTests

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-09-25 23:05:32 -05:00
Sai 29c88c6900
Rename translog pruning setting to CCR specific setting and addressed Bug in the test case (#1243)
* Rename translog pruning setting to CCR specific setting

Signed-off-by: Sai Kumar <karanas@amazon.com>

* Rename to index.plugins.replication.translog.retention_lease.pruning.enabled as
index settings need "index." as prefix

Signed-off-by: Sai Kumar <karanas@amazon.com>

* Add deprecations to retention pruning controls

This commit adds deprecation flags to all added settings, variables, and methods
specific to ccr's retention lease pruning mechanism.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* Addressed CR comments

Signed-off-by: Sai Kumar <karanas@amazon.com>

* fix javadoc deprecation

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* fix deprecation tag in TranslogDeletionPolicy

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* Addressed test issue under translog tests

Signed-off-by: Sai Kumar <karanas@amazon.com>

Co-authored-by: Nicholas Walter Knize <nknize@apache.org>
2021-09-24 16:09:59 -07:00
Xue Zhou 82d1d0ec08
fix gradle check fail due to renaming -min in #1094 (#1289)
Signed-off-by: Xue Zhou <xuezhou@amazon.com>
2021-09-24 12:06:16 -05:00
Tianli Feng 20728c3725
Add guide for generating code coverage report in TESTING.md (#1264)
Signed-off-by: Tianli Feng <ftianli@amazon.com>
2021-09-21 19:44:08 -04:00
Xue Zhou 0ab8e34022
Rename artifact produced by the build to include -min (#1251)
Signed-off-by: Xue Zhou <xuezhou@amazon.com>
2021-09-21 19:43:35 -04:00
Andriy Redko cdbc84f09d
Update Jackson to 2.12.5 (#1247)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-09-21 18:33:20 -04:00
CEHENKLE e66b717c98
adding components to DEVELOPER_GUIDE (#1200)
* adding components to DEVELOPER_GUIDE

Signed-off-by: CEHENKLE <henkle@amazon.com>

* small tweaks

Signed-off-by: CEHENKLE <henkle@amazon.com>
2021-09-20 20:59:49 -07:00
Bukhtawar Khan 390e678f92
Handle shard over allocation during partial zone/rack or independent node failures (#1149)
The changes ensure that in the event of a partial zone failure, the surviving nodes in the minority zone don't get overloaded with shards. This is governed by a skewness limit.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
2021-09-20 10:32:23 -07:00
Bukhtawar Khan f7e2984248
Introduce FS Health HEALTHY threshold to fail stuck node (#1167)
This will cause the leader stuck on IO during publication to step down and eventually trigger a leader election.

Issue Description
---
The publication of cluster state is time-bound to 30s by the cluster.publish.timeout setting. If this time is reached before the new cluster state is committed, then the cluster state change is rejected and the leader considers itself to have failed. It stands down and starts trying to elect a new master.

There is a bug in the leader: when it tries to publish the new cluster state, it first tries to acquire a lock so that it can flush the new state to disk under a mutex. The same lock is used to cancel the publication on timeout. Below is the state of the timeout scheduler meant to cancel the publication. So if the flushing of cluster state is stuck on IO, the cancellation of the publication is stuck too, since both share the same mutex. The leader will therefore not step down, and it effectively blocks the cluster from making progress.
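
The shared-mutex problem described above can be reproduced in miniature. The sketch below is only an illustration of the failure mode, not the coordinator code and not the fix in this commit (which introduces an FS health threshold). The class and method names are hypothetical, and a latch simulates a flush stuck on IO.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SharedMutexSketch {
    /**
     * Starts a "flush" that holds flushLock while stuck, then checks whether a
     * cancel task guarded by cancelLock can acquire its lock within 100 ms.
     */
    static boolean cancelCanRun(ReentrantLock flushLock, ReentrantLock cancelLock) {
        CountDownLatch flushStarted = new CountDownLatch(1);
        CountDownLatch releaseFlush = new CountDownLatch(1);
        Thread flusher = new Thread(() -> {
            flushLock.lock();
            try {
                flushStarted.countDown();
                releaseFlush.await(); // simulates a flush stuck on IO
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                flushLock.unlock();
            }
        });
        flusher.start();
        try {
            flushStarted.await(); // make sure the flush already holds its lock
            boolean acquired = cancelLock.tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                cancelLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            releaseFlush.countDown(); // let the simulated flush finish
            try {
                flusher.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ReentrantLock shared = new ReentrantLock();
        System.out.println("shared mutex, cancel can run: " + cancelCanRun(shared, shared));
        System.out.println("separate locks, cancel can run: "
            + cancelCanRun(new ReentrantLock(), new ReentrantLock()));
    }
}
```

With one shared lock the cancel path times out while the flush is stuck, which mirrors why the timeout could not cancel the publication.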

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
2021-09-16 17:02:25 -07:00
Andriy Redko b6c8bdf872
Drop mocksocket in favour of custom security manager checks (tests only) (#1205)
* Drop mocksocket in favour of custom security manager checks (tests only)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Slightly relaxed host checks to allow all local addresses

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-09-16 17:21:47 -04:00
Nick Knize cbbf967d76
[Version] Add 1.2 for BWC testing (#1241)
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-09-15 09:07:53 -07:00