This commit modifies the follow stats API response structure to more
clearly highlight meaning of the higher level fields. In particular,
previously the response had a top-level key for each index. Instead, we
nest the indices under an "indices" field which is now an array. The
values in this array are objects containing two fields: "index" which is
the name of the follower index, and "shards" which is an array where
each value in the array is the follower stats for that shard. That is,
we have gone from:
{
"bar": [
{
"shard_id": 0...
}...
]...
}
to
{
"indices": [
{
"index": "bar",
"shards": [
{
"shard_id": 0...
}...
]
}...
}
In the CCR docs we want to refer to the endpoint that returns following
stats as the follow stats API. This commit renames the internal
implementation of this endpoint to reflect this usage.
The "lookupUser" method on a realm facilitates the "run-as" and
"authorization_realms" features.
This commit allows a realm to be used for "lookup only", in which
case the "authenticate" method (and associated token methods) are
disabled.
It does this through the introduction of a new
"authentication.enabled" setting, which defaults to true.
Building automatons can be costly. For the most part we cache things
that use automatons so the cost is limited.
However:
- We don't (currently) do that everywhere (e.g. we don't cache role
mappings)
- It is sometimes necessary to clear some of those caches which can
cause significant CPU overhead and processing delays.
This commit introduces a new cache in the Automatons class to avoid
unnecesarily recomputing automatons.
There may be values in the thread context that ought to be preseved
for later use, even if one or more realms perform asynchronous
authentication.
This commit changes the AuthenticationService to wrap the potentially
asynchronous calls in a ContextPreservingActionListener that retains
the original thread context for the authentication.
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.
This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is a beneficial for it is going to also
be working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.
Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.
Closes#32836
Drops the last logging constructor that takes `Settings` because it is
no longer needed.
Watcher goes through a lot of effort to pass `Settings` to `Logger`
constructors and dropping `Settings` from all of those calls allowed us
to remove quite a bit of log-based ceremony from watcher.
The Security plugin authorizes actions on indices. Authorization
happens on a per index/alias basis. Therefore a request with a
Multi Index Expression (containing wildcards) has to be
first evaluated in the authorization layer, before the request is
handled. For authorization purposes, wildcards in expressions will
only be expanded to indices/aliases that are visible by the authenticated
user. However, this "constrained" evaluation has to be compatible with
the expression evaluation that a cluster without the Security plugin
would do. Therefore any change in the evaluation logic
in any of these sites has to be mirrored in the other site.
This commit mirrors the changes in core from #33518 that allowed
for Multi Index Expression in the Get Alias API, loosely speaking.
This enables Elasticsearch to use the JVM-wide configured
PKCS#11 token as a keystore or a truststore for its TLS configuration.
The JVM is assumed to be configured accordingly with the appropriate
Security Provider implementation that supports PKCS#11 tokens.
For the PKCS#11 token to be used as a keystore or a truststore for an
SSLConfiguration, the .keystore.type or .truststore.type must be
explicitly set to pkcs11 in the configuration.
The fact that the PKCS#11 token configuration is JVM wide implies that
there is only one available keystore and truststore that can be used by TLS
configurations in Elasticsearch.
The PIN for the PKCS#11 token can be set as a truststore parameter in
Elasticsearch or as a JVM parameter ( -Djavax.net.ssl.trustStorePassword).
The basic goal of enabling PKCS#11 token support is to allow PKCS#11-NSS in
FIPS mode to be used as a FIPS 140-2 enabled Security Provider.
When the cluster.routing.allocation.disk.watermark.flood_stage watermark
is breached, DiskThresholdMonitor marks the indices as read-only. This
failed when x-pack security was present as system user does not have the privilege
for update settings action("indices:admin/settings/update").
This commit adds the required privilege for the system user. Also added missing
debug logs when access is denied to help future debugging.
An assert statement is added to catch any missed privileges required for
system user.
Closes#33119
In SessionFactoryLoadBalancingTests#testRoundRobinWithFailures()
we kill ldap servers randomly and immediately bind to that port
connecting to mock server socket. This is done to avoid someone else
listening to this port. As the creation of mock socket and binding to the
port is immediate, sometimes the earlier socket would be in TIME_WAIT state
thereby having problems with either bind or connect.
This commit sets the SO_REUSEADDR explicitly to true and also sets
the linger on time to 0(as we are not writing any data) so as to
allow re-use of the port and close immediately.
Note: I could not find other places where this might be problematic
but looking at test runs and netstat output I do see lot of sockets
in TIME_WAIT. If we find that this needs to be addressed we can
wrap ServerSocketFactory to set these options and use that with in
memory ldap server configuration during tests.
Closes#32190
This commit upgrades the unboundid ldapsdk to version 4.0.8. The
primary driver for upgrading is a fix that prevents this library from
rewrapping Error instances that would normally bubble up to the
UncaughtExceptionHandler and terminate the JVM. Other notable changes
include some fixes related to connection handling in the library's
connection pool implementation.
Closes#33175
The `DnRoleMapper` class is used to map distinguished names of groups
and users to role names. This mapper builds in an internal map that
maps from a `com.unboundid.ldap.sdk.DN` to a `Set<String>`. In cases
where a lot of distinct DNs are mapped to roles, this can consume quite
a bit of memory. The majority of the memory is consumed by the DN
object. For example, a 94 character DN that has 9 relative DNs (RDN)
will retain 4KB of memory, whereas the String itself consumes less than
250 bytes.
In order to reduce memory usage, we can map from a normalized DN string
to a List of roles. The normalized string is actually how the DN class
determines equality with another DN and we can drop the overhead of
needing to keep all of the other objects in memory. Additionally the
use of a List provides memory savings as each HashSet is backed by a
HashMap, which consumes a great deal more memory than an appropriately
sized ArrayList. The uniqueness we get from a Set is maintained by
first building a set when parsing the file and then converting to a
list upon completion.
Closes#34237
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid.
No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from follower index's custom metadata. The resume follow api will remain to fail if leader index shard history UUIDs are missing.
Closes#33956
Revert "[TESTS] Pin MockWebServer to TLS1.2 (#33127)" (commit
214652d4af) and "Pin TLS1.2 in
SSLConfigurationReloaderTests" (commit
d9f5e4fd2e), which pinned the
MockWebServer used in the SSLConfigurationReloaderTests to TLSv1.2 in
order to prevent failures with JDK 11 related to ssl session
invalidation. We no longer need this pinning as the problematic code
was fixed in #34130.
The `-` and `+` as a number literal prefix are already
parsed by the rule in `valueExpression`. To accommodate
this, there are some code changes that enables the
`ExpressionBuilder` to parse Literal integers and decimals
together with the `-/+` prefix sign (if exists) and validate
them (wrong format, large numbers, etc.).
Follows: #33854
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implements `ToXContent`.
Previously, parsing an arithmetic expression with `*` and no spaces,
e.g.: `2*i` threw a parsing exception as the grammar rule for
tableIdentifier was clashing with the rule for arithmetic operator `*`.
This issue comes already in the lexer and the left part of the
expression (in our example `2*`) was recognised as a
TABLE_IDENTIFIER token.
The solution adopted is to allow the `*` wildcard in the table name
only if it's surrounded with double quotes, e.g.: `"my*index"`
Closes: #33957
Remove CamelCase to CAMEL_CASE conversion when resolving
a function. Only convert user input to upper case and then
try to match with aliases or primary names.
Keep the internal conversion FunctionName to FUNCTION__NAME
which provides flexibility when registering functions by their class
name.
Fixes: #34114
We use wrap code in `// tag` and `//end` to include it in our docs. Our
current docs style wraps code snippets in a box that is only wide enough
for 76 characters and adds a horizontal scroll bar for wider snippets
which makes the snippet much harder to read. This adds a checkstyle check
that looks for java code that is included in the docs and is wider than
that 76 characters so all snippets fit into the box. It solves many of
the failures that this catches but suppresses many more. I will clean
those up in a follow up change.
Since #34099, the FollowingEngine will skip an operation which was
already processed before. With that change, it should be okay to unmute
testFollowIndexAndCloseNode.
This change fixes a potential deadlock problem in the unit
test introduced in #34117.
It also removes a piece of debug code and corrects a docs
formatting problem that were both added in that same PR.
This arose when two commits were pushed at roughly the same time, both
of which compiled successfully against master, but not when taken
together. This commit fixes a reference in one of the commits that was
changed in the other commit.
This commit modifies the CCR stats endpoint for indices to be
/{index}/_ccr/stats. This makes this endpoint consistent with other
index-centric endpoints like indices stats.
The unfollow API changes a follower index into a regular index, so that it will accept write requests from clients.
For the unfollow api to work the index follow needs to be stopped and the index needs to be closed.
Closes#33931
This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.
Relates #33656
This can be used to restrict the amount of CPU a single
structure finder request can use.
The timeout is not implemented precisely, so requests
may run for slightly longer than the timeout before
aborting.
The default is 25 seconds, which is a little below
Kibana's default timeout of 30 seconds for calls to
Elasticsearch APIs.
Prior to following an index in the follow API, check whether current
user has sufficient privileges in the leader cluster to read and
monitor the leader index.
Also check this in the create and follow API prior to creating the
follow index.
Also introduced READ_CCR cluster privilege that include the minimal
cluster level actions that are required for ccr in the leader cluster.
So a user can follow indices in a cluster, but not use the ccr admin APIs.
Closes#33553
Co-authored-by: Jason Tedor <jason@tedor.me>
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#29989
The SSLService invalidates SSLSessions when there is a change to any of
the underlying key or trust material. However, this invalidation code
did not check for a null SSLSession being returned from the context and
assumed that the context would always return a non-null object. The
return of a null object is possible in all versions, but JDK11 seems to
return them more often due to changes for TLS 1.3. There are a number
of reasons that we get a id of a session but the context returns null
when the session with that id is requested. Some of the reasons for
this are:
* Session was evicted by session cache
* Session has timed out
* Session has been invalidated by another caller
To handle this, the SSLService now checks if the value is null before
calling invalidate on the SSLSession.
Closes#32124
This commit removes the use of ExecutableScript from watcher in favor of
custom script contexts for both watcher condition scripts and transform
scripts.
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#32293
The following changes were made:
* Added ElasticsearchSecurityException. For in the case the current user has insufficient privileges while an index is being followed. Prior to following ccr checks whether the current user has sufficient privileges and if not the follow api fails with an error.
* Added Index block exception. If the leader index gets closed, this exception is returned.
* Added ClusterBlockException service unavailable. In case for example the leader cluster is without elected master.
* Removed IndexNotFoundException. If the leader / follower index has been deleted, ccr will need to stop the shard follow tasks with an error.
Closes#33954
Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization
and deserialization. This ensures storing configs with aggregation works properly.
This also addresses a potential issue in caching when the same query contains aggregations but in
different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in
the cache.
Due to a bug, that was fixed in #33360 and commit
1de2a925ce the initial adding of a watch
could get lost, thus leaving the watcher stats count as zero despite adding a watch.
Closes#33326