This will cause a leader that is stuck on IO during publication to step down, eventually triggering a leader election.
Issue Description
---
The publication of cluster state is time-bound to 30s by the cluster.publish.timeout setting. If this time is reached before the new cluster state is committed, the cluster state change is rejected and the leader considers itself to have failed. It stands down and starts trying to elect a new master.
There is a bug in the leader: when it publishes a new cluster state, it first acquires a mutex to flush the new state to disk. The same mutex is used to cancel the publication on timeout. Below is the state of the timeout scheduler meant to cancel the publication. So if the flushing of cluster state is stuck on IO, the cancellation of the publication is stuck too, since both share the same mutex. The leader will therefore not step down and will effectively block the cluster from making progress.
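The shared-mutex problem described here can be illustrated with a minimal sketch. All names below are hypothetical and are not the actual coordinator classes; the sketch also uses a timed `tryLock` so it terminates, whereas the cancellation described above blocks indefinitely on the mutex.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not the actual OpenSearch coordination code.
final class SharedMutexSketch {
    private final ReentrantLock mutex = new ReentrantLock();

    /** Simulates a flush that holds the mutex until the "IO" latch is released. */
    Thread startStuckFlush(CountDownLatch ioDone, CountDownLatch lockHeld) {
        Thread t = new Thread(() -> {
            mutex.lock();
            try {
                lockHeld.countDown();
                ioDone.await(); // stands in for a write that is blocked on disk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                mutex.unlock();
            }
        });
        t.start();
        return t;
    }

    /** Cancellation needs the same mutex; returns false if it cannot get it in time. */
    boolean tryCancel(long waitMillis) {
        try {
            if (!mutex.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return true; // the publication would be cancelled here
        } finally {
            mutex.unlock();
        }
    }

    /** Returns { cancelled while IO was stuck, cancelled after IO completed }. */
    static boolean[] demo() {
        SharedMutexSketch sketch = new SharedMutexSketch();
        CountDownLatch ioDone = new CountDownLatch(1);
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread flush = sketch.startStuckFlush(ioDone, lockHeld);
        try {
            lockHeld.await();
            boolean whileStuck = sketch.tryCancel(50);
            ioDone.countDown();
            flush.join();
            boolean afterIo = sketch.tryCancel(50);
            return new boolean[] { whileStuck, afterIo };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the cancellation only succeeds once the simulated IO completes, which is the behavior the fix is meant to avoid.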
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
* Drop mocksocket in favour of custom security manager checks (tests only)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
* Slightly relaxed host checks to allow all local addresses
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
* Changes to support retrieval of operations from translog based on specified range
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Addressed CR comments
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Added testcases for internal engine
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Support for translog pruning based on retention leases
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Addressed CR Comments
Signed-off-by: Sai Kumar <karanas@amazon.com>
* Addressed test case issue
Signed-off-by: Sai Kumar <karanas@amazon.com>
The version framework only added support for OpenSearch 1.x bwc with legacy
clusters. This commit adds support for v2.0 which will be the last version with
bwc support for legacy clusters (v7.10)
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit stages the branch to the next 1.0.1 patch release. BWC testing needs
this even if the next revision is never actually released.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
PluginInfo should use .onOrAfter(Version.V_1_1_0) instead of
.after(Version.V_1_0_0) for the new custom folder name for plugin feature.
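The difference between the two checks can be seen with a simplified version type. This is a sketch only: `VersionSketch` is a hypothetical stand-in for `Version`, and the 1.0.1 constant is used purely to show where the two predicates disagree.

```java
// Hypothetical, simplified version type; not the real Version class.
final class VersionSketch implements Comparable<VersionSketch> {
    static final VersionSketch V_1_0_0 = new VersionSketch(1, 0, 0);
    static final VersionSketch V_1_0_1 = new VersionSketch(1, 0, 1);
    static final VersionSketch V_1_1_0 = new VersionSketch(1, 1, 0);

    final int major;
    final int minor;
    final int revision;

    VersionSketch(int major, int minor, int revision) {
        this.major = major;
        this.minor = minor;
        this.revision = revision;
    }

    @Override
    public int compareTo(VersionSketch other) {
        if (major != other.major) {
            return Integer.compare(major, other.major);
        }
        if (minor != other.minor) {
            return Integer.compare(minor, other.minor);
        }
        return Integer.compare(revision, other.revision);
    }

    /** Strictly newer than the given version. */
    boolean after(VersionSketch other) {
        return compareTo(other) > 0;
    }

    /** Same as or newer than the given version. */
    boolean onOrAfter(VersionSketch other) {
        return compareTo(other) >= 0;
    }
}
```

With these definitions a 1.0.1 node satisfies `after(V_1_0_0)` but not `onOrAfter(V_1_1_0)`, so only the second check limits the feature to 1.1.0 and later.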
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Lucene 9 removes support for SimpleFS File System format. This commit deprecates
the SimpleFS format in favor of NIOFS.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
In some cases, such as the one shared in issue #1099, maxConcurrentSearchRequests was chosen as 0,
which means the final value is computed during execution of the request based on processor counts.
When this computed value is less than the number of search requests in the msearch request, all the
requests are executed in multiple iterations, causing the failure since the test only waits for one
such iteration. Hence maxConcurrentSearchRequests is now set explicitly to the number of search
requests added in the test to ensure correct behavior.
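The arithmetic behind the failure can be sketched as follows. The helper names are hypothetical, and the computed default is passed in as a parameter because the exact processor-based formula is not stated here.

```java
// Hypothetical helpers; the processor-based default is supplied by the caller.
final class MsearchIterationSketch {
    /** A configured value of 0 means "use the value computed at execution time". */
    static int effectiveMaxConcurrent(int configured, int computedDefault) {
        return configured == 0 ? computedDefault : configured;
    }

    /** Number of iterations needed to run all search requests (ceiling division). */
    static int iterations(int searchRequests, int maxConcurrent) {
        if (maxConcurrent <= 0) {
            throw new IllegalArgumentException("maxConcurrent must be positive");
        }
        return (searchRequests + maxConcurrent - 1) / maxConcurrent;
    }
}
```

For example, 5 search requests with a computed limit of 2 need 3 iterations, while setting the limit to 5 keeps everything in a single iteration, which is what the test waits for.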
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Part 1: Support for cancel_after_timeinterval parameter in search and msearch request
This commit introduces a new request level parameter to configure the time interval after which
a search request will be cancelled. For an msearch request the parameter is supported both at the
parent request and at the child search requests. If it is provided at the parent level and a child
search request doesn't have it, then the parent level value is set on that child request. The parent
level value is not used to cancel the parent msearch request itself, as it may be tricky to come up
with a correct value when child search requests can have different runtimes.
TEST: Added test for ser/de with new parameter
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Part 2: Support for cancel_after_timeinterval parameter in search and msearch request
This commit adds the handling of the new request level parameter and schedules a cancellation task. It
also adds a cluster setting to set a global cancellation timeout for search requests, which will be
used in the absence of a request level timeout.
TEST: Added new tests in SearchCancellationIT
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Address Review feedback for Part 1
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Address review feedback for Part 2
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Update CancellableTask to remove the cancelOnTimeout boolean flag
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Replace search.cancellation.timeout cluster setting with search.enforce_server.timeout.cancellation to control if cluster level cancel_after_time_interval should take precedence over request level cancel_after_time_interval value
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
* Removing the search.enforce_server.timeout.cancellation cluster setting and just keeping search.cancel_after_time_interval setting with request level parameter taking the precedence.
Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
Co-authored-by: Sorabh Hamirwasia <hsorabh@amazon.com>
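The parameter resolution described in this change (child requests inheriting the parent msearch value, and the request level value taking precedence over the cluster setting) could be sketched roughly as below. The class and method names are hypothetical, and `null` is assumed to mean "not set"; the actual implementation may represent this differently.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; null stands for "not set". Not the actual OpenSearch code.
final class CancelAfterIntervalSketch {
    /** A child search request without its own value takes the parent msearch value. */
    static List<Duration> applyParentDefault(Duration parent, List<Duration> children) {
        List<Duration> resolved = new ArrayList<>();
        for (Duration child : children) {
            resolved.add(child != null ? child : parent);
        }
        return resolved;
    }

    /** The request level value wins; the cluster level setting is only a fallback. */
    static Duration effective(Duration requestLevel, Duration clusterLevel) {
        return requestLevel != null ? requestLevel : clusterLevel;
    }
}
```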
_cat/master is a fundamental API to know the master instance in the cluster. Given that RestClusterState is already exempted from tripping, it doesn't make sense for RestMasterAction to trip.
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
This change adds the initial version of a new CLI tool `opensearch-upgrade` as part of the OpenSearch distribution. This tool is meant to assist during an upgrade from an existing Elasticsearch v7.10.2/v6.8.0 node to OpenSearch. It automates the process of importing existing configurations and installing core plugins.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
* Add the "tagline" field back to "MainResponse" on the server side (not on the rest-high-level-client side), which was removed in PR #427.
* Replace with a new tagline "The OpenSearch Project: https://opensearch.org/".
* Turn the tagline into a constant in server/src/main/java/org/opensearch/action/main/MainResponse.java.
This change removes version.distribution when the version.number is
overridden with the cluster setting compatibility.override_main_response_version.
Signed-off-by: Marc Handalian <handalm@amazon.com>
This change adds a new cluster setting "compatibility.override_main_response_version"
that when enabled spoofs the version.number returned from MainResponse
for REST clients expecting legacy version 7.10.2.
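A rough sketch of the resulting behavior, combining this change with the later one that drops version.distribution when the override is active. The class, method, and map layout are hypothetical and only illustrate the decision; they are not the actual MainResponse serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the version fields returned by the main response.
final class MainResponseVersionSketch {
    static final String LEGACY_VERSION = "7.10.2";

    /** Builds the version fields; overrideEnabled mirrors the cluster setting. */
    static Map<String, String> versionFields(String actualNumber, String distribution, boolean overrideEnabled) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (overrideEnabled) {
            // Spoof the legacy version number and omit the distribution field.
            fields.put("number", LEGACY_VERSION);
        } else {
            fields.put("distribution", distribution);
            fields.put("number", actualNumber);
        }
        return fields;
    }
}
```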
Signed-off-by: Marc Handalian <handalm@amazon.com>
* Add Plugin name for verbose Plugin not found exception
* Make the plugin loading failure exception more verbose
* Throw Opensearch in place of RuntimeException for plugin load failure
* Nit fix, added ... to make logging stand out
Signed-off-by: Jayesh Hathila <sharma.jayesh52@gmail.com>
* Address a kind of issue suggested by Amazon CodeGuru Reviewer:
* Add try-with-resources blocks to automatically close resources after use and avoid resource leaks, in `SymbolicLinkPreservingTarIT`, `LicenseAnalyzer`, `SymbolicLinkPreservingUntarTransform`, `ConcurrentSeqNoVersioningIT` in `VersionProperties`, `GeoFilterIT`, `XContentHelper`, `Json` and `IndexShard` class
* Add a try-finally block to close resources after use and avoid resource leaks, in the `ServerChannelContext` class.
* Add a try-catch block to close resources when an exception occurs in the `FsBlobContainer` class (when XContentFactory.xContentType throws an exception).
* Close resources when an assertion error occurs, in the `ServerChannelContext` class.
* Version checks are incorrectly returning versions < 1.0.0.
Signed-off-by: dblock <dblock@amazon.com>
* Removed V_7_10_3 which has not been released as of time of the fork.
Signed-off-by: dblock <dblock@amazon.com>
* Update check for current version to get unreleased versions.
- no unreleased version if the current version is "1.0.0"
- add unit tests for OpenSearch 1.0.0 with legacy ES versions.
- update VersionUtils to include all legacy ES versions as released.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
Co-authored-by: Rabi Panda <adnapibar@gmail.com>
This commit fixes mixedCluster and rolling upgrades by spoofing OpenSearch
version 1.0.0 as Legacy version 7.10.2. With this commit an OpenSearch 1.x node
can join a legacy (<= 7.10.2) cluster and rolling upgrades work as expected.
Mixed clusters will not work beyond the duration of the upgrade since shards
cannot be replicated from upgraded nodes to nodes running older versions.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Co-authored-by: Shweta Thareja <tharejas@amazon.com>
This commit changes MainResponse to spoof OpenSearch 1.x version numbers as
Legacy version number 7.10.2 for legacy clients.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* An allocation constraint mechanism, that de-prioritizes nodes from getting picked for allocation if they breach certain constraints
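One way to picture de-prioritization (as opposed to exclusion) is a weight penalty added per breached constraint. The sketch below is an assumption about the general idea only: the class names, the penalty constant, and the weight formula are all hypothetical and are not taken from the actual allocator.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; weights and the penalty value are illustrative only.
final class AllocationConstraintSketch {
    static final double CONSTRAINT_PENALTY = 1_000_000.0;

    static final class Node {
        final String name;
        final double baseWeight;
        final int breachedConstraints;

        Node(String name, double baseWeight, int breachedConstraints) {
            this.name = name;
            this.baseWeight = baseWeight;
            this.breachedConstraints = breachedConstraints;
        }

        /** Lower is better; each breached constraint pushes the node down the ranking. */
        double weight() {
            return baseWeight + CONSTRAINT_PENALTY * breachedConstraints;
        }
    }

    /** Picks the lowest-weight node; breaching nodes remain eligible as a last resort. */
    static Optional<Node> pick(List<Node> candidates) {
        return candidates.stream().min(Comparator.comparingDouble(Node::weight));
    }
}
```

A node that breaches a constraint is still returned when it is the only candidate, which is the difference between de-prioritizing and filtering.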
Signed-off-by: Ashwin Pankaj <appankaj@amazon.com>
* Create group settings with fallback.
Signed-off-by: dblock <dblock@amazon.com>
* Use protected fallbackSetting in Setting.
Signed-off-by: dblock <dblock@amazon.com>
This commit adds support for data streams by adding a DataStreamFieldMapper, and making timestamp
field name configurable. Backwards compatibility is supported.
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
* Make default number of shards configurable
The default number of primary shards for a new index, used when the number of shards is not provided in the request, can be configured for the cluster.
Signed-off-by: Arunabh Singh <arunabs@amazon.com>
* Address PR comments
Signed-off-by: Arunabh Singh <arunabs@amazon.com>
Co-authored-by: Arunabh Singh <arunabs@amazon.com>
Changes the stop condition of the recursive deletion function `executeOneStaleIndexDelete()` to
be an empty `staleIndicesToDelete` queue, in the error flow as well. Otherwise the
GroupedActionListener never responds and, in the event of a few exceptions, the deletion task
gets stuck.
Alters the test case to fail to delete many snapshots in bulk at the first attempt, so that
the next successful deletion also takes care of the previously failed attempt, as the test
originally intended.
The SNAPSHOT threadpool has at most 5 threads. So if we get more than 5 exceptions, there are no
more threads to handle the deletion task while there is still one more snapshot to delete in the
queue. Thus, in the test I made the number of extra snapshots one more than the max of the
SNAPSHOT threadpool.
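The stop condition can be illustrated with a single-threaded sketch. Only the names `executeOneStaleIndexDelete` and `staleIndicesToDelete` come from the description above; the class, the `finished` flag standing in for the listener response, and the simulated failure predicate are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical, single-threaded sketch of the drain-until-empty recursion.
final class StaleIndexDeleteSketch {
    final List<String> deleted = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    boolean finished = false; // stands in for the listener being notified

    void executeOneStaleIndexDelete(Queue<String> staleIndicesToDelete, Predicate<String> deleteFails) {
        String index = staleIndicesToDelete.poll();
        if (index == null) {
            // Stop condition: the queue is empty, on the success and error paths alike.
            finished = true;
            return;
        }
        try {
            if (deleteFails.test(index)) {
                throw new IllegalStateException("simulated failure deleting " + index);
            }
            deleted.add(index);
        } catch (RuntimeException e) {
            failed.add(index);
        }
        // Keep draining after an error instead of stopping with work left in the queue.
        executeOneStaleIndexDelete(staleIndicesToDelete, deleteFails);
    }
}
```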
Signed-off-by: AmiStrn <amitai.stern@logz.io>
Instead of snapshot deletion of stale indices being a single-threaded operation, this commit makes
it a multithreaded operation and deletes multiple stale indices in parallel using the SNAPSHOT
threadpool's workers.
Signed-off-by: Piyush Daftary <piyush.besu@gmail.com>
The change in 0ba0e7cc introduced an issue where randomly selecting an incompatible version fails the test. It caused the filtering logic to incorrectly identify all ES 7.*.* versions as bad versions for joining, which should not be the case.
Additionally, split the test into two separate tests where earlier only one of them was run at random.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit fixes the Version.fromString logic to identify legacy versions. It
also adds an optional "distribution" field to the MainResponse for OpenSearch
version 1.0.0+. Any preceding versions that do not contain the distribution
label will be handled as legacy versions appropriately.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Use C1 compiler only for short-lived tasks and unit test execution. Tone
down some of the slowest unit tests.
Signed-off-by: Robert Muir <rmuir@apache.org>
This commit rebases the versioning to OpenSearch 1.0.0
Co-authored-by: Rabi Panda <adnapibar@gmail.com>
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
MergeSchedulerSettingsTests tweaks the `node.processors` setting: sets it
explicitly to values of `2` and `8`. On a machine with only `4` threads
(e.g. my 2-core thinkpad), the test fails, because it creates unexpected
warnings about `node.processors` being set higher than the number of
cpus.
The problem can be reproduced always, by pretending to be single core:
```
./gradlew ':server:test' --tests "org.opensearch.index.MergeSchedulerSettingsTests.testMaxThreadAndMergeCount" -Dtests.jvm.argline="-XX:ActiveProcessorCount=1"
```
Instead, allow the test to provoke these specific warnings.
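The idea of tolerating the warning only when it is actually expected can be sketched as a pure function of the available processor count. The helper name and the warning text are placeholders, not the real test utility or the real log message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the warning text is a placeholder, not the real message.
final class ProcessorsWarningSketch {
    /** Returns one expected warning per configured value that exceeds the available CPUs. */
    static List<String> expectedWarnings(int availableProcessors, int... configuredValues) {
        List<String> warnings = new ArrayList<>();
        for (int configured : configuredValues) {
            if (configured > availableProcessors) {
                warnings.add("node.processors=" + configured
                    + " exceeds available processors=" + availableProcessors);
            }
        }
        return warnings;
    }
}
```

With the values `2` and `8` used by the test, a 4-thread machine would expect one warning and a single-core run would expect two.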
Signed-off-by: Robert Muir <rmuir@apache.org>
This commit adds the SPDX license header and modifications copyright to security
policy files.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit adds the SPDX Apache-2.0 license header along with an additional
copyright header for all modifications.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* Fix package names in classes of the dummy plugin jars.
The `PluginsServiceTests` uses a couple of dummy plugin jars from the resources directory. The jars have classes with imports with package name `org.elasticsearch.*`. This commit recreates the jars after renaming those package names.
* Fix the failing server tests as a result of renaming metadata prefix.
As we changed the metadata prefix in `OpenSearchException` from `es.` to `opensearch.` (commit 13f6d23), the order of the keys in the `HashMap` changed. However, the tests expect a value that relies on a certain order. Ideally, these tests should not assume the order.
This commit doesn't rewrite the tests but only changes the expected order so the tests pass.
* Properly rename the data examples to fix test failure.
As part of commit 0bdd129, we renamed the data examples used in the test cases. This caused test failures in `SimpleNestedIT`, as it sorts the results and the rename changed the order of the search results. In `SearchQueryIT`, we missed renaming the term used in the query.
This commit fixes both issues.
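On the metadata-prefix fix above: a test that should not depend on `HashMap` iteration order can compare maps directly or render them through a sorted view. The sketch below is illustrative only; the class name and the metadata keys are made up and this is not how the affected tests were changed.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an order-independent comparison for metadata maps.
final class MetadataOrderSketch {
    /** Renders the metadata with keys sorted, independent of the source map's iteration order. */
    static String render(Map<String, String> metadata) {
        return new TreeMap<>(metadata).toString();
    }
}
```

Comparing the maps with `equals`, or comparing `render` output, gives the same result regardless of the key prefix or insertion order.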
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit fixes some more renaming issues and as a result fixes the failing tests,
* :qa:logging-config:test
* :example-plugins:painless-whitelist:yamlRestTest
* :modules:reindex:test
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit fixes some renaming issues which as a result fixes multiple failing unit tests in the server module.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit fixes some name issues leftover from the rename to OpenSearch work.
With this commit, the `gradlew :run` task should work.
Signed-off-by: Rabi Panda <pandarab@amazon.com>
This commit refactors instances of 'elasticsearch' with opensearch everywhere
except references to issues, and other places needed to test compatibility with
old elasticsearch clusters.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Currently the thirdPartyAudit task is failing for the test:framework module after renaming to OpenSearch. We have created an issue and temporarily suppressed the errors to unblock the precommit.
Issue: https://github.com/opensearch-project/OpenSearch/issues/420
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
Fix miscellaneous issues identified during `gradle precommit`. These issues are the side effects of the renaming to OpenSearch work.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
* Refactor module server/src/test/java/org/elasticsearch/index
Signed-off-by: Harold Wang <harowang@amazon.com>
* Update class resolution during refactor
Signed-off-by: Harold Wang <harowang@amazon.com>
* Restore unintended class path changes
Signed-off-by: Harold Wang <harowang@amazon.com>
This commit refactors remaining ES classes to OpenSearch prefix throughout the
code base. All references are also refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit refactors the o.e.watcher package. References throughout the codebase are also refactored.
Signed-off-by: Himanshu Setia <setiah@amazon.com>
This commit refactors the following test packages from the server/test module:
* o.e.monitor
* o.e.persistent
* o.e.plugins
* o.e.recovery
to the o.opensearch namespace. All references throughout the codebase have also
been refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit cleans up imports, variable names, comments, and other misc usages
of ES with the new OpenSearch name.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit refactors the remaining o.e.index and o.e.test packages in the
test/fixtures module. References throughout the codebase are also refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit refactors the following test framework packages:
* o.e.env
* o.e.geo
* o.e.http
* o.e.indices
* o.e.ingest
* o.e.plugin
* o.e.upgrades
to the o.opensearch namespace. All references throughout the test codebase have
been refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>