* Remove IndexShard dependency from Repository
In order to simplify repository testing especially for BlobStoreRepository
it's important to remove the dependency on IndexShard and reduce it to
Store and MapperService (in the snapshot case). This significantly reduces
the dependcy footprint for Repository and allows unittesting without starting
nodes or instantiate entire shard instances. This change deprecates the old
method signatures and adds a unittest for FileRepository to show the advantage
of this change.
In addition, the unittesting surfaced a bug where the internal file names that
are private to the repository were used in the recovery stats instead of the
target file names which makes it impossible to relate to the actual lucene files
in the recovery stats.
* don't delegate deprecated methods
* apply comments
* test
When using High Level Rest Client Java API to produce search query, using AggregationBuilders.topHits("th").sort("_score", SortOrder.DESC)
caused query to contain duplicate sort clauses.
A shard that is undergoing peer recovery is subject to logging warnings of the form
org.elasticsearch.action.FailedNodeException: Failed node [XYZ]
...
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in ...
These failures are actually harmless, and expected to happen while a peer recovery is ongoing (i.e.
there is an IndexShard instance, but no proper IndexCommit just yet).
As these failures are currently bubbled up to the master, they cause unnecessary reroutes and
confusion amongst users due to being logged as warnings.
Closes #40107
rp.client_secret is a required secure setting. Make sure we fail with
a SettingsException and a clear, actionable message when building
the realm, if the setting is missing.
Enhance the handling of merging the claims sets of the
ID Token and the UserInfo response. JsonObject#merge would throw a
runtime exception when attempting to merge two objects with the
same key and different values. This could happen for an OP that
returns different vales for the same claim in the ID Token and the
UserInfo response ( Google does that for profile claim ).
If a claim is contained in both sets, we attempt to merge the
values if they are objects or arrays, otherwise the ID Token claim
value takes presedence and overwrites the userinfo response.
This adds the node name where we fail to start a process via the native
controller to facilitate debugging as otherwise it might not be known
to which node the job was allocated.
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.
This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
This commit reworks and clarifies the docs for the `discovery-ec2` plugin:
- folds the tiny "Getting started with AWS" into the page on configuration
- spells out the name of each setting in full instead of noting the
`discovery.ec2` prefix at the top of the page.
- replaces each `(Secure)` marker with a sentence describing what that means in
situ
- notes some missing defaults
- clarifies the behaviour of `discovery.ec2.groups` (dependent on `.any_group`)
- clarifies what `discovery.ec2.host_type` is for
- adds `discovery.ec2.tag.TAGNAME` as a (meta-)setting rather than describing
it in a separate section
- notes that the tags mentioned in `discovery.ec2.tag.TAGNAME` cannot contain
colons (see #38406)
- clarifies the EC2-specific interface names and what they're for
- reorders and rewords the recommendations for storage
- expands on why you should not span a cluster across regions
- adds a suggestion on protecting instances against termination during scale-in
- reformat to 80 columns where possible
Fixes#38406
Simplifies the voting configuration reconfiguration logic by switching to an explicit Comparator for
the priorities. Does not make changes to the behavior of the component.
SHA256 was recently added to the Hasher class in order to be used
in the TokenService. A few tests were still using values() to get
the available algorithms from the Enum and it could happen that
SHA256 would be picked up by these.
This change adds an extra convenience method
(Hasher#getAvailableAlgoCacheHash) and enures that only this and
Hasher#getAvailableAlgoStoredHash are used for getting the list of
available password hashing algorithms in our tests.
This performs a simple restart test to move a basic licensed
cluster from no security (the default) to security & transport TLS
enabled.
Backport of: #41933
Flushing at the end of a peer recovery (if needed) can bring these
benefits:
1. Closing an index won't end up with the red state for a recovering
replica should always be ready for closing whether it performs the
verifying-before-close step or not.
2. Good opportunities to compact store (i.e., flushing and merging
Lucene, and trimming translog)
Closes#40024Closes#39588
This change verifies and aborts recovery if source and target have the
same syncId but different sequenceId. This commit also adds an upgrade
test to ensure that we always utilize syncId.
The verifying-before-close step ensures the global checkpoints on all
shard copies are in sync; thus, we don' t need to sync global
checkpoints for closed indices.
Relate #33888
There is an off-by-one error in this test. It leads to the recovery
thread never being started, and that means joining on it will wait
indefinitely. This commit addresses that by fixing the off-by-one error.
Relates #42325
I forgot to git add these before pushing, sorry. This commit fixes
compilation in IndexShardTests, they are needed here and not in master
due to differences in how Java infers types in generics between JDK 8
and JDK 11.
Today when executing an action on a primary shard under permit, we do
not enforce that the shard is in primary mode before executing the
action. This commit addresses this by wrapping actions to be executed
under permit in a check that the shard is in primary mode before
executing the action.
Today we are persisting the retention leases at least every thirty
seconds by a scheduled background sync. This sync causes an fsync to
disk and when there are a large number of shards allocated to slow
disks, these fsyncs can pile up and can severely impact the system. This
commit addresses this by only persisting and fsyncing the retention
leases if they have changed since the last time that we persisted and
fsynced the retention leases.
Re-enable muted tests and accommodate recent backend changes
that result in higher memory usage being reported for a job
at the start of its life-cycle
* Cleanup Various Uses of ActionListener
* Use shorter `map`, `runAfter` or `wrap` where functionally equivalent to anonymous class
* Use ActionRunnable where functionally equivalent
This corrects what appears to have been a copy-paste error
where the logger for `MachineLearning` and `DataFrame` was wrongly
set to be that of `XPackPlugin`.
If there are no realms that depend on the native role mapping store,
then changes should it should not perform any cache refresh.
A refresh with an empty realm array will refresh all realms.
This also fixes a spurious log warning that could occur if the
role mapping store was notified that the security index was recovered
before any realm were attached.
Backport of: #42169
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.
This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.
This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.
Rollup jobs can define how long they should wait before rolling up new documents.
However if the delay is smaller or if it's not a multiple of the rollup interval
the job can create incomplete buckets because the max boundary for a job is computed
from the time when the job started rounded to the interval minus the delay. This change
fixes this computation by applying the delay substraction before the rounding in order to ensure
that we never create a boundary that falls in a middle of a bucket.