This change causes a leader that is stuck on IO during publication to step down, which eventually triggers a new leader election.
Issue Description
---
The publication of a new cluster state is bounded by the cluster.publish.timeout setting, which defaults to 30s. If this timeout elapses before the new cluster state is committed, the cluster state change is rejected and the leader considers itself to have failed: it stands down and starts trying to elect a new master.
There is a bug in the leader: when it publishes a new cluster state, it first acquires a mutex and flushes the new state to disk while holding it. The same mutex is also used to cancel the publication on timeout. Below is the state of the timeout scheduler that is meant to cancel the publication. In other words, if flushing the cluster state is stuck on IO, cancelling the publication is stuck too, because both share the same mutex. As a result, the leader does not step down and effectively blocks the cluster from making progress.
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
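The deadlock described above can be reproduced with a small model. This is a minimal sketch, not the actual Coordinator or publication code: the class and method names are hypothetical, a CountDownLatch stands in for the disk write that is stuck on IO, and a 50 ms delay stands in for cluster.publish.timeout. It contrasts a timeout handler that needs the flush mutex with one that never touches it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a leader whose state flush and timeout cancellation may share a mutex. */
public class PublicationTimeoutSketch {
    private final Object mutex = new Object();
    private final CountDownLatch steppedDown = new CountDownLatch(1);

    /** Flushes the new state while holding the mutex; blocks until the simulated IO completes. */
    void publish(CountDownLatch lockHeld, CountDownLatch ioDone) throws InterruptedException {
        synchronized (mutex) {
            lockHeld.countDown();
            ioDone.await(); // stand-in for a disk write stuck on IO
        }
    }

    /** Buggy variant: the timeout path needs the same mutex as the flush. */
    void onTimeoutSharingMutex() {
        synchronized (mutex) {
            steppedDown.countDown();
        }
    }

    /** Fixed variant: the timeout path never touches the flush mutex. */
    void onTimeoutLockFree() {
        steppedDown.countDown();
    }

    /** Returns true if the leader stepped down while the flush was still blocked. */
    static boolean stepsDownWhileFlushBlocked(boolean lockFree) throws InterruptedException {
        PublicationTimeoutSketch leader = new PublicationTimeoutSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch ioDone = new CountDownLatch(1);
        Thread publisher = new Thread(() -> {
            try {
                leader.publish(lockHeld, ioDone);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        publisher.start();
        lockHeld.await(); // the flush now owns the mutex

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Runnable onTimeout = lockFree ? leader::onTimeoutLockFree : leader::onTimeoutSharingMutex;
        scheduler.schedule(onTimeout, 50, TimeUnit.MILLISECONDS); // stand-in for the publish timeout
        boolean result = leader.steppedDown.await(500, TimeUnit.MILLISECONDS);

        ioDone.countDown(); // unblock the simulated IO so every thread can finish
        publisher.join();
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared mutex: " + stepsDownWhileFlushBlocked(false)); // false
        System.out.println("lock-free:    " + stepsDownWhileFlushBlocked(true));  // true
    }
}
```

The design point is that the cancellation path must not wait on anything the stuck IO path holds; in this sketch the lock-free handler only signals, so the step-down happens even while the flush is blocked.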
* Drop mocksocket in favour of custom security manager checks (tests only)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
* Slightly relaxed host checks to allow all local addresses
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
* Refactor the logic to control the format for code coverage report and rename the system property
* Remove outdated code that granted JaCoCo files permissions when the Java security manager is enabled
Signed-off-by: Tianli Feng <ftianli@amazon.com>
The public key has changed since the initial release. This commit fixes the
public key and uses the .sig files that are published to the artifacts site.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Most PRs do not add or update links; however, sites go down often. This change makes sure that we catch any broken link in the repository and fix it, while not blocking PRs because of unrelated broken links.
This PR updates the workflow to run every day at midnight UTC.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
* Changes to support retrieval of operations from the translog based on a specified range
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Addressed CR comments
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Added test cases for the internal engine
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Support for translog pruning based on retention leases
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Addressed CR Comments
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Addressed test case issue
Signed-off-by: Sai Kumar <karanas@amazon.com>
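The two translog features above, reading operations by sequence-number range and pruning based on retention leases, can be sketched as follows. This is a hypothetical in-memory model, not the OpenSearch Translog API: the Operation record, the method names, and the rule that an empty lease set prunes nothing are all assumptions made for illustration. Each lease is reduced to the lowest sequence number it still needs retained.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory translog keyed by sequence number. */
public class TranslogRetentionSketch {
    /** Hypothetical translog operation. */
    record Operation(long seqNo, String source) {}

    private final TreeMap<Long, Operation> operations = new TreeMap<>();

    void append(Operation op) {
        operations.put(op.seqNo(), op);
    }

    /** Returns operations with fromSeqNo <= seqNo <= toSeqNo, ordered by sequence number. */
    List<Operation> readRange(long fromSeqNo, long toSeqNo) {
        return new ArrayList<>(operations.subMap(fromSeqNo, true, toSeqNo, true).values());
    }

    /**
     * Drops every operation below the smallest sequence number any retention lease still
     * needs. With no leases, this sketch prunes nothing (a conservative assumption).
     */
    void pruneForRetentionLeases(Collection<Long> retainingSeqNos) {
        retainingSeqNos.stream()
                .mapToLong(Long::longValue)
                .min()
                .ifPresent(min -> operations.headMap(min, false).clear());
    }

    int size() {
        return operations.size();
    }

    public static void main(String[] args) {
        TranslogRetentionSketch translog = new TranslogRetentionSketch();
        for (long seqNo = 0; seqNo < 10; seqNo++) {
            translog.append(new Operation(seqNo, "doc-" + seqNo));
        }
        System.out.println(translog.readRange(3, 5).size()); // 3
        translog.pruneForRetentionLeases(List.of(4L, 7L));   // a lagging copy still needs seqNo 4
        System.out.println(translog.size());                 // 6 (seqNos 4..9 remain)
    }
}
```

Taking the minimum across leases is what makes pruning safe in this model: the slowest copy determines how much history must be kept.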
* Explicitly point out that the JDK 8 requirement applies at runtime, not at compile time.
* Clarify that the JAVAx_HOME environment variables are used for the "backwards compatibility test".
* Add an explanation of how the backwards compatibility tests get the OpenSearch distributions for a specific version.
Signed-off-by: Tianli Feng <ftianli@amazon.com>
The version framework previously only supported OpenSearch 1.x bwc with legacy
clusters. This commit adds support for v2.0, which will be the last version with
bwc support for legacy clusters (v7.10).
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* Drop mocksocket & securemock dependencies from sniffer and rest client (not needed)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
* Removing .gitignore
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
This commit stages the branch for the next 1.0.1 patch release. BWC testing needs
this even if the next revision is never actually released.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
PluginInfo should use .onOrAfter(Version.V_1_1_0) instead of
.after(Version.V_1_0_0) for the new custom folder name for plugin feature.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
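The difference between the two checks shows up with a patch release. This sketch uses a hypothetical SimpleVersion record rather than OpenSearch's Version class, and assumes, as the commit implies, that the custom plugin folder feature ships only in 1.1.0 and later. A 1.0.1 node is "after" 1.0.0, yet it does not have the feature.

```java
/** Hypothetical version comparison; not org.opensearch.Version. */
public class VersionCheckSketch {
    record SimpleVersion(int major, int minor, int revision) implements Comparable<SimpleVersion> {
        @Override
        public int compareTo(SimpleVersion other) {
            int c = Integer.compare(major, other.major);
            if (c != 0) return c;
            c = Integer.compare(minor, other.minor);
            return c != 0 ? c : Integer.compare(revision, other.revision);
        }

        boolean after(SimpleVersion other) {
            return compareTo(other) > 0;
        }

        boolean onOrAfter(SimpleVersion other) {
            return compareTo(other) >= 0;
        }
    }

    static final SimpleVersion V_1_0_0 = new SimpleVersion(1, 0, 0);
    static final SimpleVersion V_1_0_1 = new SimpleVersion(1, 0, 1);
    static final SimpleVersion V_1_1_0 = new SimpleVersion(1, 1, 0);

    public static void main(String[] args) {
        // A 1.0.1 patch node lacks the 1.1.0 feature, yet after(1.0.0) reports true.
        System.out.println(V_1_0_1.after(V_1_0_0));     // true  (wrong gate)
        System.out.println(V_1_0_1.onOrAfter(V_1_1_0)); // false (correct gate)
    }
}
```

Gating on the first version that actually contains the feature keeps the check correct no matter how many patch releases follow 1.0.0.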
Lucene 9 removes support for the SimpleFS file system format. This commit deprecates
the SimpleFS format in favor of NIOFS.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
In some cases, such as the one shared in issue #1099, maxConcurrentSearchRequests was set to 0, which
means the final value is computed during execution of the request based on the processor count. When
this computed value is less than the number of search requests in the msearch request, the requests
are executed in multiple iterations, and the test fails because it only waits for one such iteration.
This change therefore sets maxConcurrentSearchRequests explicitly to the number of search requests
added in the test to ensure correct behavior.
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
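The arithmetic behind the flaky test can be sketched with a deliberately simplified model that treats msearch execution as successive rounds of at most maxConcurrentSearchRequests requests. The real scheduling may differ (for example, starting a new request as each one completes), so this only illustrates why a computed limit smaller than the request count means more than one round, and why setting the limit to the request count yields exactly one. The class and method names are hypothetical.

```java
/** Hypothetical round-count model for msearch concurrency; not TransportMultiSearchAction. */
public class MultiSearchRoundsSketch {
    /** Rounds needed to run numRequests when at most maxConcurrent run at once. */
    static int rounds(int numRequests, int maxConcurrent) {
        if (maxConcurrent <= 0) {
            throw new IllegalArgumentException("maxConcurrent must be positive");
        }
        return (numRequests + maxConcurrent - 1) / maxConcurrent; // ceiling division
    }

    public static void main(String[] args) {
        int numRequests = 8;
        System.out.println(rounds(numRequests, 3));           // 3: a test waiting for one round would fail
        System.out.println(rounds(numRequests, numRequests)); // 1: the fix used in the test
    }
}
```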
* Support for bwc tests for plugins
Signed-off-by: Vacha <vachshah@amazon.com>
* Adding support for restart upgrades for plugins bwc
Signed-off-by: Vacha <vachshah@amazon.com>
This change refactors the circular reference check in the Grok processor class
to use a formal depth-first traversal. It also includes a logic update to
prevent a stack overflow in one scenario and a check for malformed patterns.
This bugfix addresses CVE-2021-22144.
Signed-off-by: Kartik Ganesh <85275476+kartg@users.noreply.github.com>
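A circular reference check over a pattern bank can be written as an iterative depth-first traversal, which avoids the call-stack growth of a recursive version. This is a sketch under simplifying assumptions, not the Grok processor's code: references are assumed to look like %{NAME}, %{NAME:field} or %{NAME:field:type} with NAME made of word characters, "malformed" is approximated as any "%{" that does not form a complete reference, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Grok pattern-bank validator: malformed refs, unknown refs, and cycles. */
public class GrokReferenceCheckSketch {
    // Simplified reference syntax; group 1 is the referenced pattern name.
    private static final Pattern REFERENCE = Pattern.compile("%\\{(\\w+)(?::[^}]*)?\\}");

    private enum State { IN_PROGRESS, DONE }

    /** Extracts referenced names, rejecting any "%{" that is not a complete reference. */
    static List<String> references(String name, String body) {
        List<String> names = new ArrayList<>();
        Matcher m = REFERENCE.matcher(body);
        while (m.find()) {
            names.add(m.group(1));
        }
        int opens = 0;
        for (int i = body.indexOf("%{"); i >= 0; i = body.indexOf("%{", i + 2)) {
            opens++;
        }
        if (opens != names.size()) {
            throw new IllegalArgumentException("malformed reference in pattern [" + name + "]");
        }
        return names;
    }

    /** Iterative DFS with an explicit stack, so long reference chains cannot overflow the call stack. */
    static void validate(Map<String, String> bank) {
        Map<String, State> state = new HashMap<>();
        for (String start : bank.keySet()) {
            if (state.containsKey(start)) continue;
            Deque<String> path = new ArrayDeque<>();
            Deque<Iterator<String>> pending = new ArrayDeque<>();
            state.put(start, State.IN_PROGRESS);
            path.push(start);
            pending.push(references(start, bank.get(start)).iterator());
            while (!pending.isEmpty()) {
                Iterator<String> it = pending.peek();
                if (!it.hasNext()) { // every reference of the top pattern has been checked
                    pending.pop();
                    state.put(path.pop(), State.DONE);
                    continue;
                }
                String next = it.next();
                State s = state.get(next);
                if (s == State.IN_PROGRESS) { // back edge: next is already on the current path
                    throw new IllegalArgumentException("circular reference involving [" + next + "]");
                }
                if (s == null) {
                    String body = bank.get(next);
                    if (body == null) {
                        throw new IllegalArgumentException("unknown pattern [" + next + "]");
                    }
                    state.put(next, State.IN_PROGRESS);
                    path.push(next);
                    pending.push(references(next, body).iterator());
                }
                // s == DONE: already fully validated, nothing to do
            }
        }
    }

    static boolean isValid(Map<String, String> bank) {
        try {
            validate(bank);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(Map.of(
                "IP_PORT", "%{IP:ip}:%{PORT:port}", "IP", "\\d+(\\.\\d+){3}", "PORT", "\\d+"))); // true
        System.out.println(isValid(Map.of("A", "%{B}", "B", "%{A}"))); // false: cycle
        System.out.println(isValid(Map.of("A", "%{B")));               // false: malformed
    }
}
```

Marking a pattern IN_PROGRESS while it is on the current path, and DONE once all of its references are checked, lets the traversal tell a real cycle apart from a pattern that is simply referenced from several places.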
* Part 1: Support for cancel_after_timeinterval parameter in search and msearch request
This commit introduces a new request-level parameter to configure the time interval after which
a search request is cancelled. For msearch requests, the parameter is supported both on the parent
request and on the child search requests. If it is provided at the parent level and a child search
request doesn't set it, the parent-level value is applied to that child request. The parent-level
msearch value is not used to cancel the parent request itself, since it can be tricky to choose a
correct value when child search requests have different runtimes.
TEST: Added test for ser/de with new parameter
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Part 2: Support for cancel_after_timeinterval parameter in search and msearch request
This commit adds handling for the new request-level parameter and schedules the cancellation task. It
also adds a cluster setting for a global search cancellation timeout, which is used when no
request-level timeout is set.
TEST: Added new tests in SearchCancellationIT
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Address Review feedback for Part 1
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Address review feedback for Part 2
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Update CancellableTask to remove the cancelOnTimeout boolean flag
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Replace the search.cancellation.timeout cluster setting with search.enforce_server.timeout.cancellation, which controls whether the cluster-level cancel_after_time_interval takes precedence over the request-level cancel_after_time_interval value
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Remove the search.enforce_server.timeout.cancellation cluster setting and keep only the search.cancel_after_time_interval setting, with the request-level parameter taking precedence.
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
Co-authored-by: Sorabh Hamirwasia <hsorabh@amazon.com>
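The resulting behavior of the commits above can be sketched in one place: child msearch requests inherit the parent value when they have none, the request-level value takes precedence over the cluster-level setting, and a cancellation is scheduled after the effective interval. This is a hypothetical model built on ScheduledExecutorService; ChildSearch, SearchTask, and the method names are illustrative and are not OpenSearch's CancellableTask or TransportSearchAction APIs, and a null Duration is assumed to mean "not set".

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of cancel_after_time_interval resolution and scheduling. */
public class CancelAfterTimeIntervalSketch {
    /** Hypothetical child search request; a null cancelAfter means the parameter was not set. */
    record ChildSearch(String name, Duration cancelAfter) {}

    /** Hypothetical cancellable task that only records whether it was cancelled. */
    static final class SearchTask {
        private final AtomicBoolean cancelled = new AtomicBoolean();
        void cancel() { cancelled.set(true); }
        boolean isCancelled() { return cancelled.get(); }
    }

    /** Children without their own value inherit the parent msearch value. */
    static List<ChildSearch> inheritParentValue(Duration parent, List<ChildSearch> children) {
        if (parent == null) return children;
        List<ChildSearch> result = new ArrayList<>();
        for (ChildSearch child : children) {
            result.add(child.cancelAfter() != null ? child : new ChildSearch(child.name(), parent));
        }
        return result;
    }

    /** The request-level value takes precedence; the cluster-level setting is only a fallback. */
    static Duration effectiveTimeout(Duration requestLevel, Duration clusterLevel) {
        return requestLevel != null ? requestLevel : clusterLevel;
    }

    /** Schedules cancellation after the effective interval, or returns null if none applies. */
    static ScheduledFuture<?> scheduleCancellation(ScheduledExecutorService scheduler, SearchTask task,
                                                   Duration requestLevel, Duration clusterLevel) {
        Duration timeout = effectiveTimeout(requestLevel, clusterLevel);
        if (timeout == null) return null;
        return scheduler.schedule(task::cancel, timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        List<ChildSearch> children = inheritParentValue(Duration.ofSeconds(30),
                List.of(new ChildSearch("a", null), new ChildSearch("b", Duration.ofSeconds(5))));
        System.out.println(children); // "a" inherits 30s, "b" keeps its own 5s

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        SearchTask task = new SearchTask();
        // No request-level value, so the (hypothetical) 50 ms cluster-level value applies.
        scheduleCancellation(scheduler, task, null, Duration.ofMillis(50));
        Thread.sleep(300);
        System.out.println("cancelled: " + task.isCancelled());
        scheduler.shutdown();
    }
}
```

If the search finishes first, the returned ScheduledFuture can be cancelled so the timer never fires; the sketch leaves that to the caller.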
* add jacoco plugin
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* config aggregated jacoco report
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Skip generating report if no test in subproject
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Add jacoco plugin into root project
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* test aggregate code coverage report
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Can generate aggregated unit test coverage report
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Can generate aggregated test report with source file linked
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Some cleanup, but jacocoReport does not depend on Test
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* cleanup
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* get unit test code coverage report
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* change 'enabled == true' to 'enabled'
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* add a comment for selectedsubprojects
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* add tasks to generate code coverage report for unit test and integration test
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* fix typo in variable
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Correct the task to get code coverage for unit test and integtest
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Make code coverage report task run after test
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* apply Gradle configuration avoidance API
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* apply jacoco plugin in BuildPlugin in all cases
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Put file path list of integration test exec data in a variable
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Apply Gradle configuration avoidance API to register task instead of create task
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* merge 2 jacocoreport configurations
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Add some comments in gradle script
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* Attach code coverage report task to Gradle check task
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* add a space
Signed-off-by: Tianli Feng <ftianli@amazon.com>
* get code coverage report after check task on demand and get report in html format on demand
Signed-off-by: Tianli Feng <ftianli@amazon.com>
On February 3, 2021, JFrog [announced](https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/) the shutdown of JCenter. Later, on April 27, 2021, JFrog provided an update: the repository will remain available only as read-only, and new packages and versions are no longer accepted on JCenter. This means we should no longer use JCenter as our central artifact repository.
This change replaces JCenter with Maven Central as per the Gradle recommendation - https://blog.gradle.org/jcenter-shutdown
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This change fixes the issue where the sources and javadoc artifacts were not built and included with the publish.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
_cat/master is a fundamental API for finding the master node of the cluster. Given that RestClusterState is already exempt from tripping the circuit breaker, it doesn't make sense for RestMasterAction to trip it.
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
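The exemption idea can be sketched as a per-handler opt-out that the dispatcher consults before reserving request bytes against an in-flight breaker. This is a simplified, hypothetical model: Handler, InFlightBreaker, and dispatch are illustrative stand-ins, not OpenSearch's RestHandler or CircuitBreaker APIs. Exempt handlers still have their bytes accounted for, but they are never rejected.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of handlers opting out of an in-flight request circuit breaker. */
public class CircuitBreakerExemptionSketch {
    /** Simplified handler contract; handlers trip the breaker unless they opt out. */
    interface Handler {
        String handle();

        default boolean canTripCircuitBreaker() {
            return true;
        }
    }

    /** Hypothetical breaker tracking in-flight request bytes against a fixed limit. */
    static final class InFlightBreaker {
        private final long limitBytes;
        private final AtomicLong usedBytes;

        InFlightBreaker(long limitBytes, long alreadyUsedBytes) {
            this.limitBytes = limitBytes;
            this.usedBytes = new AtomicLong(alreadyUsedBytes);
        }

        /** Reserves bytes, or returns false (trips) if the limit would be exceeded. */
        boolean tryReserve(long bytes) {
            if (usedBytes.addAndGet(bytes) > limitBytes) {
                usedBytes.addAndGet(-bytes); // roll back the failed reservation
                return false;
            }
            return true;
        }

        void reserveWithoutBreaking(long bytes) {
            usedBytes.addAndGet(bytes);
        }

        void release(long bytes) {
            usedBytes.addAndGet(-bytes);
        }
    }

    static String dispatch(Handler handler, InFlightBreaker breaker, long requestBytes) {
        if (handler.canTripCircuitBreaker()) {
            if (!breaker.tryReserve(requestBytes)) {
                return "rejected: circuit breaker tripped";
            }
        } else {
            breaker.reserveWithoutBreaking(requestBytes); // still accounted for, never rejected
        }
        try {
            return handler.handle();
        } finally {
            breaker.release(requestBytes);
        }
    }

    public static void main(String[] args) {
        InFlightBreaker nearlyFull = new InFlightBreaker(1_000, 990);
        Handler ordinary = () -> "ordinary response";
        Handler catMaster = new Handler() { // hypothetical stand-in for an exempt handler
            @Override
            public String handle() {
                return "master node id";
            }

            @Override
            public boolean canTripCircuitBreaker() {
                return false;
            }
        };
        System.out.println(dispatch(ordinary, nearlyFull, 100));  // rejected: circuit breaker tripped
        System.out.println(dispatch(catMaster, nearlyFull, 100)); // master node id
    }
}
```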