Currently nio implements ip filtering at the channel context level. This
is kind of a hack as the application logic should be implemented at the
handler level. This commit moves the ip filtering into a channel
handler. This requires adding an indicator to the channel handler to
show when a channel should be closed.
This replaces the use of char[] in the password length validation
code, with the use of SecureString
Although the use of char[] is not in itself problematic, using a
SecureString encourages callers to think about the lifetime of the
password object and to clear it after use.
Backport of: #42884
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
Kibana wants to create access_token/refresh_token pair using Token
management APIs in exchange for kerberos tickets. `client_credentials`
grant_type requires every user to have `cluster:admin/xpack/security/token/create`
cluster privilege.
This commit introduces `_kerberos` grant_type for generating `access_token`
and `refresh_token` in exchange for a valid base64 encoded kerberos ticket.
In addition, `kibana_user` role now has cluster privilege to create tokens.
This allows Kibana to create access_token/refresh_token pair in exchange for
kerberos tickets.
Note:
The lifetime from the kerberos ticket is not used in ES and so even after it expires
the access_token/refresh_token pair will be valid. Care must be taken to invalidate
such tokens using token management APIs if required.
Closes#41943
* TestClusters: Convert the security plugin
This PR moves security tests to use TestClusters.
The TLS test required support in testclusters itself, so the correct
wait condition is configgured based on the cluster settings.
* PR review
The description field of xpack featuresets is optionally part of the
xpack info api, when using the verbose flag. However, this information
is unnecessary, as it is better left for documentation (and the existing
descriptions describe anything meaningful). This commit removes the
description field from feature sets.
It turns out that key rotation on the OP, can manifest as both
a BadJWSException and a BadJOSEException in nimbus-jose-jwt. As
such we cannot depend on matching only BadJWSExceptions to
determine if we should poll the remote JWKs for an update.
This has the side-effect that a remote JWKs source will be polled
exactly one additional time too for errors that have to do with
configuration, or for errors that might be caused by not synched
clocks, forged JWTs, etc. ( These will throw a BadJWTException
which extends BadJOSEException also )
WriteActionsTests#testBulk and WriteActionsTests#testIndex sometimes
fail with a pending retention lock. We might leak retention locks when
switching to async recovery. However, it's more likely that ongoing
recoveries prevent the retention lock from releasing.
This change increases the waiting time when we check for no pending
retention lock and also ensures no ongoing recovery in
WriteActionsTests.
Closes#41054
Kibana alerting is going to be built using API Keys, and should be
permitted on a basic license.
This commit moves API Keys (but not Tokens) to the Basic license
Relates: elastic/kibana#36836
Backport of: #42787
Currently, when the SSLEngine needs to produce handshake or close data,
we must manually call the nonApplicationWrite method. However, this data
is only required when something triggers the need (starting handshake,
reading from the wire, initiating close, etc). As we have a dedicated
outbound buffer, this data can be produced automatically. Additionally,
with this refactoring, we combine handshake and application mode into a
single mode. This is necessary as there are non-application messages that
are sent post handshake in TLS 1.3. Finally, this commit modifies the
SSLDriver tests to test against TLS 1.3.
Enable audit logs in docker by creating console appenders for audit loggers.
also rename field @timestamp to timestamp and add field type with value audit
The docker build contains now two log4j configuration for oss or default versions. The build now allows override the default configuration.
Also changed the format of a timestamp from ISO8601 to include time zone as per this discussion #36833 (comment)
closes#42666
backport#42671
This commit fixes the version parsing in various tests. The issue here is that
the parsing was relying on java.version. However, java.version can contain
additional characters such as -ea for early access builds. See JEP 233:
Name Syntax
------------------------------ --------------
java.version $VNUM(\-$PRE)?
java.runtime.version $VSTR
java.vm.version $VSTR
java.specification.version $VNUM
java.vm.specification.version $VNUM
Instead, we want java.specification.version.
This commit removes the TLS cluster join validator.
This validator existed to prevent v6.x nodes (which mandated
TLS) from joining an existing cluster of v5.x nodes (which did
not mandate TLS) unless the 6.x node (and by implication the
5.x nodes) was configured to use TLS.
Since 7.x nodes cannot talk to 5.x nodes, this validator is no longer
needed.
Removing the validator solves a problem where single node clusters
that were bound to local interfaces were incorrectly requiring TLS
when they recovered cluster state and joined their own cluster.
Backport of: #42826
Whether security is enabled/disabled is dependent on the combination
of the node settings and the cluster license.
This commit adds a license state listener that logs when the license
change causes security to switch state (or to be initialised).
This is primarily useful for diagnosing cluster formation issues.
Backport of: #42488
If the security index is closed, it should be treated as unavailable
for security purposes.
Prior to 8.0 (or in a mixed cluster) a closed security index has
no routing data, which would cause a NPE in the cluster change
handler, and the index state would not be updated correctly.
This commit fixes that problem
Backport of: #42191
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.
This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
This commit changes the way token ids are hashed so that the output is
url safe without requiring encoding. This follows the pattern that we
use for document ids that are autogenerated, see UUIDs and the
associated classes for additional details.
This change ensures that:
- We only attempt to refresh the remote JWKS when there is a
signature related error only ( BadJWSException instead of the
geric BadJOSEException )
- We do call OpenIDConnectAuthenticator#getUserClaims upon
successful refresh.
- We test this in OpenIdConnectAuthenticatorTests.
Without this fix, when using the OpenID Connect realm with a remote
JWKSet configured in `op.jwks_path`, the refresh would be triggered
for most configuration errors ( i.e. wrong value for `op.issuer` )
and the kibana wouldn't get a response and timeout since
`getUserClaims` wouldn't be called because
`ReloadableJWKSource#reloadAsync` wouldn't call `onResponse` on the
future.
Test was using ClockMock#rewind passing the amount of nanoseconds
in order to "strip" nanos from the time value. This was intentional
as the expiration time of the UserToken doesn't have nanosecond
precision.
However, ClockMock#rewind doesn't support nanos either, so when it's
called with a TimeValue, it rewinds the clock by the TimeValue's
millis instead. This was causing the clock to go enough millis
before token expiration time and the test was passing. Once every
few hundred times though, the TimeValue by which we attempted to
rewind the clock only had nanos and no millis, so rewind moved the
clock back just a few millis, but still after expiration time.
This change moves the clock explicitly to the same instant as expiration,
using clock.setTime and disregarding nanos.
* Safer Wait for Snapshot Success in ClusterPrivilegeTests
* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
* We should fix this race in the transport API but it's not trivial and will be part of the incoming big round of refactoring the repository interaction, this added check fixes the test for now
* closes#38030
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
rp.client_secret is a required secure setting. Make sure we fail with
a SettingsException and a clear, actionable message when building
the realm, if the setting is missing.
Enhance the handling of merging the claims sets of the
ID Token and the UserInfo response. JsonObject#merge would throw a
runtime exception when attempting to merge two objects with the
same key and different values. This could happen for an OP that
returns different vales for the same claim in the ID Token and the
UserInfo response ( Google does that for profile claim ).
If a claim is contained in both sets, we attempt to merge the
values if they are objects or arrays, otherwise the ID Token claim
value takes presedence and overwrites the userinfo response.
SHA256 was recently added to the Hasher class in order to be used
in the TokenService. A few tests were still using values() to get
the available algorithms from the Enum and it could happen that
SHA256 would be picked up by these.
This change adds an extra convenience method
(Hasher#getAvailableAlgoCacheHash) and enures that only this and
Hasher#getAvailableAlgoStoredHash are used for getting the list of
available password hashing algorithms in our tests.
This performs a simple restart test to move a basic licensed
cluster from no security (the default) to security & transport TLS
enabled.
Backport of: #41933
If there are no realms that depend on the native role mapping store,
then changes should it should not perform any cache refresh.
A refresh with an empty realm array will refresh all realms.
This also fixes a spurious log warning that could occur if the
role mapping store was notified that the security index was recovered
before any realm were attached.
Backport of: #42169
This commit changes how access tokens and refresh tokens are stored
in the tokens index.
Access token values are now hashed before being stored in the id
field of the `user_token` and before becoming part of the token
document id. Refresh token values are hashed before being stored
in the token field of the `refresh_token`. The tokens are hashed
without a salt value since these are v4 UUID values that have
enough entropy themselves. Both rainbow table attacks and offline
brute force attacks are impractical.
As a side effect of this change and in order to support multiple
concurrent refreshes as introduced in #39631, upon refreshing an
<access token, refresh token> pair, the superseding access token
and refresh tokens values are stored in the superseded token doc,
encrypted with a key that is derived from the superseded refresh
token. As such, subsequent requests to refresh the same token in
the predefined time window will return the same superseding access
token and refresh token values, without hitting the tokens index
(as this only stores hashes of the token values). AES in GCM
mode is used for encrypting the token values and the key
derivation from the superseded refresh token uses a small number
of iterations as it needs to be quick.
For backwards compatibility reasons, the new behavior is only
enabled when all nodes in a cluster are in the required version
so that old nodes can cope with the token values in a mixed
cluster during a rolling upgrade.
This commit updates the default ciphers and TLS protocols that are used
when the runtime JDK supports them. New cipher support has been
introduced in JDK 11 and 12 along with performance fixes for AES GCM.
The ciphers are ordered with PFS ciphers being most preferred, then
AEAD ciphers, and finally those with mainstream hardware support. When
available stronger encryption is preferred for a given cipher.
This is a backport of #41385 and #41808. There are known JDK bugs with
TLSv1.3 that have been fixed in various versions. These are:
1. The JDK's bundled HttpsServer will endless loop under JDK11 and JDK
12.0 (Fixed in 12.0.1) based on the way the Apache HttpClient performs
a close (half close).
2. In all versions of JDK 11 and 12, the HttpsServer will endless loop
when certificates are not trusted or another handshake error occurs. An
email has been sent to the openjdk security-dev list and #38646 is open
to track this.
3. In JDK 11.0.2 and prior there is a race condition with session
resumption that leads to handshake errors when multiple concurrent
handshakes are going on between the same client and server. This bug
does not appear when client authentication is in use. This is
JDK-8213202, which was fixed in 11.0.3 and 12.0.
4. In JDK 11.0.2 and prior there is a bug where resumed TLS sessions do
not retain peer certificate information. This is JDK-8212885.
The way these issues are addressed is that the current java version is
checked and used to determine the supported protocols for tests that
provoke these issues.
The migrate tool was added when the native realm was created, to aid
users in converting from file realms that were per node, into the
cluster managed native realm. While this tool was useful at the time,
users should now be using the native realm directly. This commit
deprecates the tool, to be removed in a followup for 8.0.
If a basic license enables security, then we should also enforce TLS
on the transport interface.
This was already the case for Standard/Gold/Platinum licenses.
For Basic, security defaults to disabled, so some of the process
around checking whether security is actuallY enabled is more complex
now that we need to account for basic licenses.
Because realms are configured at node startup, but license levels can
change dynamically, it is possible to have a running node that has a
particular realm type configured, but that realm is not permitted under
the current license.
In this case the realm is silently ignored during authentication.
This commit adds a warning in the elasticsearch logs if authentication
fails, and there are realms that have been skipped due to licensing.
This message is not intended to imply that the realms could (or would)
have successfully authenticated the user, but they may help reduce
confusion about why authentication failed if the caller was expecting
the authentication to be handled by a particular realm that is in fact
unlicensed.
Backport of: #41778
The run task is supposed to run elasticsearch with the given plugin or
module. However, for modules, this is most realistic if using the full
distribution. This commit changes the run setup to use the default or
oss as appropriate.
This is related to #27260. Currently we have a single read buffer that
is no larger than a single TLS packet. This prevents us from reading
multiple TLS packets in a single socket read call. This commit modifies
our TLS work to support reading similar to the plaintext case. The data
will be copied to a (potentially) recycled TLS packet-sized buffer for
interaction with the SSLEngine.
This commit is a refactoring of how we filter addresses on
interfaces. In particular, we refactor all of these methods into a
common private method. We also change the order of logic to first check
if an address matches our filter and then check if the interface is
up. This is to possibly avoid problems we are seeing where devices are
flapping up and down while we are checking for loopback addresses. We do
not expect the loopback device to flap up and down so by reversing the
logic here we avoid that problem on CI machines. Finally, we expand the
error message when this does occur so that we know which device is
flapping.
This is related to #27260. Currently there is a setting
http.read_timeout that allows users to define a read timeout for the
http transport. This commit implements support for this functionality
with the transport-nio plugin. The behavior here is that a repeating
task will be scheduled for the interval defined. If there have been
no requests received since the last run and there are no inflight
requests, the channel will be closed.
This is modelled on the qa test for TLS on basic.
It starts a cluster on basic with security & performs a number of
security related checks.
It also performs those same checks on a trial license.
This adds support for using security on a basic license.
It includes:
- AllowedRealmType.NATIVE realms (reserved, native, file)
- Roles / RBAC
- TLS (already supported)
It does not support:
- Audit
- IP filters
- Token Service & API Keys
- Advanced realms (AD, LDAP, SAML, etc)
- Advanced roles (DLS, FLS)
- Pluggable security
As with trial licences, security is disabled by default.
This commit does not include any new automated tests, but existing tests have been updated.