This change ensures that:
- We only attempt to refresh the remote JWKS when there is a
signature related error only ( BadJWSException instead of the
geric BadJOSEException )
- We do call OpenIDConnectAuthenticator#getUserClaims upon
successful refresh.
- We test this in OpenIdConnectAuthenticatorTests.
Without this fix, when using the OpenID Connect realm with a remote
JWKSet configured in `op.jwks_path`, the refresh would be triggered
for most configuration errors ( i.e. wrong value for `op.issuer` )
and the kibana wouldn't get a response and timeout since
`getUserClaims` wouldn't be called because
`ReloadableJWKSource#reloadAsync` wouldn't call `onResponse` on the
future.
Test was using ClockMock#rewind passing the amount of nanoseconds
in order to "strip" nanos from the time value. This was intentional
as the expiration time of the UserToken doesn't have nanosecond
precision.
However, ClockMock#rewind doesn't support nanos either, so when it's
called with a TimeValue, it rewinds the clock by the TimeValue's
millis instead. This was causing the clock to go enough millis
before token expiration time and the test was passing. Once every
few hundred times though, the TimeValue by which we attempted to
rewind the clock only had nanos and no millis, so rewind moved the
clock back just a few millis, but still after expiration time.
This change moves the clock explicitly to the same instant as expiration,
using clock.setTime and disregarding nanos.
* Safer Wait for Snapshot Success in ClusterPrivilegeTests
* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
* We should fix this race in the transport API but it's not trivial and will be part of the incoming big round of refactoring the repository interaction, this added check fixes the test for now
* closes#38030
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
rp.client_secret is a required secure setting. Make sure we fail with
a SettingsException and a clear, actionable message when building
the realm, if the setting is missing.
Enhance the handling of merging the claims sets of the
ID Token and the UserInfo response. JsonObject#merge would throw a
runtime exception when attempting to merge two objects with the
same key and different values. This could happen for an OP that
returns different vales for the same claim in the ID Token and the
UserInfo response ( Google does that for profile claim ).
If a claim is contained in both sets, we attempt to merge the
values if they are objects or arrays, otherwise the ID Token claim
value takes presedence and overwrites the userinfo response.
SHA256 was recently added to the Hasher class in order to be used
in the TokenService. A few tests were still using values() to get
the available algorithms from the Enum and it could happen that
SHA256 would be picked up by these.
This change adds an extra convenience method
(Hasher#getAvailableAlgoCacheHash) and enures that only this and
Hasher#getAvailableAlgoStoredHash are used for getting the list of
available password hashing algorithms in our tests.
This performs a simple restart test to move a basic licensed
cluster from no security (the default) to security & transport TLS
enabled.
Backport of: #41933
If there are no realms that depend on the native role mapping store,
then changes should it should not perform any cache refresh.
A refresh with an empty realm array will refresh all realms.
This also fixes a spurious log warning that could occur if the
role mapping store was notified that the security index was recovered
before any realm were attached.
Backport of: #42169
This commit changes how access tokens and refresh tokens are stored
in the tokens index.
Access token values are now hashed before being stored in the id
field of the `user_token` and before becoming part of the token
document id. Refresh token values are hashed before being stored
in the token field of the `refresh_token`. The tokens are hashed
without a salt value since these are v4 UUID values that have
enough entropy themselves. Both rainbow table attacks and offline
brute force attacks are impractical.
As a side effect of this change and in order to support multiple
concurrent refreshes as introduced in #39631, upon refreshing an
<access token, refresh token> pair, the superseding access token
and refresh tokens values are stored in the superseded token doc,
encrypted with a key that is derived from the superseded refresh
token. As such, subsequent requests to refresh the same token in
the predefined time window will return the same superseding access
token and refresh token values, without hitting the tokens index
(as this only stores hashes of the token values). AES in GCM
mode is used for encrypting the token values and the key
derivation from the superseded refresh token uses a small number
of iterations as it needs to be quick.
For backwards compatibility reasons, the new behavior is only
enabled when all nodes in a cluster are in the required version
so that old nodes can cope with the token values in a mixed
cluster during a rolling upgrade.
This commit updates the default ciphers and TLS protocols that are used
when the runtime JDK supports them. New cipher support has been
introduced in JDK 11 and 12 along with performance fixes for AES GCM.
The ciphers are ordered with PFS ciphers being most preferred, then
AEAD ciphers, and finally those with mainstream hardware support. When
available stronger encryption is preferred for a given cipher.
This is a backport of #41385 and #41808. There are known JDK bugs with
TLSv1.3 that have been fixed in various versions. These are:
1. The JDK's bundled HttpsServer will endless loop under JDK11 and JDK
12.0 (Fixed in 12.0.1) based on the way the Apache HttpClient performs
a close (half close).
2. In all versions of JDK 11 and 12, the HttpsServer will endless loop
when certificates are not trusted or another handshake error occurs. An
email has been sent to the openjdk security-dev list and #38646 is open
to track this.
3. In JDK 11.0.2 and prior there is a race condition with session
resumption that leads to handshake errors when multiple concurrent
handshakes are going on between the same client and server. This bug
does not appear when client authentication is in use. This is
JDK-8213202, which was fixed in 11.0.3 and 12.0.
4. In JDK 11.0.2 and prior there is a bug where resumed TLS sessions do
not retain peer certificate information. This is JDK-8212885.
The way these issues are addressed is that the current java version is
checked and used to determine the supported protocols for tests that
provoke these issues.
The migrate tool was added when the native realm was created, to aid
users in converting from file realms that were per node, into the
cluster managed native realm. While this tool was useful at the time,
users should now be using the native realm directly. This commit
deprecates the tool, to be removed in a followup for 8.0.
If a basic license enables security, then we should also enforce TLS
on the transport interface.
This was already the case for Standard/Gold/Platinum licenses.
For Basic, security defaults to disabled, so some of the process
around checking whether security is actuallY enabled is more complex
now that we need to account for basic licenses.
Because realms are configured at node startup, but license levels can
change dynamically, it is possible to have a running node that has a
particular realm type configured, but that realm is not permitted under
the current license.
In this case the realm is silently ignored during authentication.
This commit adds a warning in the elasticsearch logs if authentication
fails, and there are realms that have been skipped due to licensing.
This message is not intended to imply that the realms could (or would)
have successfully authenticated the user, but they may help reduce
confusion about why authentication failed if the caller was expecting
the authentication to be handled by a particular realm that is in fact
unlicensed.
Backport of: #41778
The run task is supposed to run elasticsearch with the given plugin or
module. However, for modules, this is most realistic if using the full
distribution. This commit changes the run setup to use the default or
oss as appropriate.
This is related to #27260. Currently we have a single read buffer that
is no larger than a single TLS packet. This prevents us from reading
multiple TLS packets in a single socket read call. This commit modifies
our TLS work to support reading similar to the plaintext case. The data
will be copied to a (potentially) recycled TLS packet-sized buffer for
interaction with the SSLEngine.
This commit is a refactoring of how we filter addresses on
interfaces. In particular, we refactor all of these methods into a
common private method. We also change the order of logic to first check
if an address matches our filter and then check if the interface is
up. This is to possibly avoid problems we are seeing where devices are
flapping up and down while we are checking for loopback addresses. We do
not expect the loopback device to flap up and down so by reversing the
logic here we avoid that problem on CI machines. Finally, we expand the
error message when this does occur so that we know which device is
flapping.
This is related to #27260. Currently there is a setting
http.read_timeout that allows users to define a read timeout for the
http transport. This commit implements support for this functionality
with the transport-nio plugin. The behavior here is that a repeating
task will be scheduled for the interval defined. If there have been
no requests received since the last run and there are no inflight
requests, the channel will be closed.
This is modelled on the qa test for TLS on basic.
It starts a cluster on basic with security & performs a number of
security related checks.
It also performs those same checks on a trial license.
This adds support for using security on a basic license.
It includes:
- AllowedRealmType.NATIVE realms (reserved, native, file)
- Roles / RBAC
- TLS (already supported)
It does not support:
- Audit
- IP filters
- Token Service & API Keys
- Advanced realms (AD, LDAP, SAML, etc)
- Advanced roles (DLS, FLS)
- Pluggable security
As with trial licences, security is disabled by default.
This commit does not include any new automated tests, but existing tests have been updated.
This commit introduces the `.security-tokens` and `.security-tokens-7`
alias-index pair. Because index snapshotting is at the index level granularity
(ie you cannot snapshot a subset of an index) snapshoting .`security` had
the undesirable effect of storing ephemeral security tokens. The changes
herein address this issue by moving tokens "seamlessly" (without user
intervention) to another index, so that a "Security Backup" (ie snapshot of
`.security`) would not be bloated by ephemeral data.
Today we allow adding entries from a file or from a string, yet we
internally maintain this distinction such that if you try to add a value
from a file for a setting that expects a string or add a value from a
string for a setting that expects a file, you will have a bad time. This
causes a pain for operators such that for each setting they need to know
this difference. Yet, we do not need to maintain this distinction
internally as they are bytes after all. This commit removes that
distinction and includes logic to upgrade legacy keystores.
This is related to #27260. Currently for the SSLDriver we allocate a
dedicated network write buffer and encrypt the data into that buffer one
buffer at a time. This requires constantly switching between encrypting
and flushing. This commit adds a dedicated outbound buffer for SSL
operations that will internally allocate new packet sized buffers as
they are need (for writing encrypted data). This allows us to totally
encrypt an operation before writing it to the network. Eventually it can
be hooked up to buffer recycling.
This commit also backports the following commit:
Handle WRAP ops during SSL read
It is possible that a WRAP operation can occur while decrypting
handshake data in TLS 1.3. The SSLDriver does not currently handle this
well as it does not have access to the outbound buffer during read call.
This commit moves the buffer into the Driver to fix this issue. Data
wrapped during a read call will be queued for writing after the read
call is complete.
This is related to #27260. Currently for the SSLDriver we allocate a
dedicated network write buffer and encrypt the data into that buffer one
buffer at a time. This requires constantly switching between encrypting
and flushing. This commit adds a dedicated outbound buffer for SSL
operations that will internally allocate new packet sized buffers as
they are need (for writing encrypted data). This allows us to totally
encrypt an operation before writing it to the network. Eventually it can
be hooked up to buffer recycling.
TLS 1.3 changes to the SSLEngine introduced a scenario where a UNWRAP
call during a handshake can consume a close notify alerty without
throwing an exception. This means that we continue down a codepath where
we assert that we are still in handshaking mode. Transitioning to closed
from handshaking is a valid scenario. This commit removes this
assertion.
Today the `_field_caps` API returns the list of indices where a field
is present only if this field has different types within the requested indices.
However if the request is an index pattern (or an alias, or both...) there
is no way to infer the indices if the response contains only fields that have
the same type in all indices. This commit changes the response to always return
the list of indices in the response. It also adds a way to retrieve unmapped field
in a specific section per field called `unmapped`. This section is created for each field
that is present in some indices but not all if the parameter `include_unmapped` is set to
true in the request (defaults to false).
The Has Privileges API allows to tap into the authorization process, to validate
privileges without actually running the operations to be authorized. This commit
fixes a bug, in which the Has Privilege API returned spurious results when checking
for index privileges over restricted indices (currently .security, .security-6,
.security-7). The actual authorization process is not affected by the bug.
hamcrest has some improvements in newer versions, like FileMatchers
that make assertions regarding file exists cleaner. This commit upgrades
to the latest version of hamcrest so we can start using new and improved
matchers.
The `DistinguishedNamePredicate`, used for matching users to role mapping
expressions, should handle users with null DNs. But it fails to do so (and this is
a NPE bug), if the role mapping expression contains a lucene regexp or a wildcard.
The fix simplifies `DistinguishedNamePredicate` to not handle null DNs at all, and
instead use the `ExpressionModel#NULL_PREDICATE` for the DN field, just like
any other missing user field.
When the same alias points to multiple indices we can write to only one index
with `is_write_index` value `true`. The special handling in case of the put
mapping request(to resolve authorized indices) has a check on indices size
for a concrete index. If multiple indices existed then it marked the request
as unauthorized.
The check has been modified to consider write index flag and only when the
requested index matches with the one with write index alias, the alias is considered
for authorization.
Closes#40831
This commit adds an OpenID Connect authentication realm to
elasticsearch. Elasticsearch (with the assistance of kibana or
another web component) acts as an OpenID Connect Relying
Party and supports the Authorization Code Grant and Implicit
flows as described in http://ela.st/oidc-spec. It adds support
for consuming and verifying signed ID Tokens, both RP
initiated and 3rd party initiated Single Sign on and RP
initiated signle logout.
It also adds an OpenID Connect Provider in the idp-fixture to
be used for the associated integration tests.
This is a backport of #40674
For pattern "n:localhost" PatternRule#isLocalhost() matches
any local address, loopback address.
[Note: I think for "localhost" this should not consider IP address
as a match when they are bound to network interfaces. It should just
be loopback address check unless the intent is to match all local addresses.
This class is adopted from Netty3 and I am not sure if this is intended
behavior or maybe I am missing something]
For now I have fixed this assuming the PatternRule#isLocalhost check is
correct by avoiding use of local address to check address denied.
Closes#40194
* moved hlrc parsing tests from xpack to hlrc module and removed dependency on hlrc from xpack core
* deprecated old base test class
* added deprecated jdoc tag
* split test between xpack-core part and hlrc part
* added lang-mustache test dependency, this previously came in via
hlrc dependency.
* added hlrc dependency on a qa module
* duplicated ClusterPrivilegeName class in xpack-core, since x-pack
core no longer has a dependency on hlrc.
* replace ClusterPrivilegeName usages with string literals
* moved tests to dedicated to hlrc packages in order to remove Hlrc part from the name and make sure to use imports instead of full qualified class where possible
* remove ESTestCase. from method invocation and use method directly,
because these tests indirectly extend from ESTestCase
This PR generates deprecation log entries for each Role Descriptor,
used for building a Role, when the Role Descriptor grants more privileges
for an alias compared to an index that the alias points to. This is done in
preparation for the removal of the ability to define privileges over aliases.
There is one log entry for each "role descriptor name"-"alias name" pair.
On such a notice, the administrator is expected to modify the Role Descriptor
definition so that the name pattern for index names does not cover aliases.
Caveats:
* Role Descriptors that are not used in any authorization process,
either because they are not mapped to any user or the user they are mapped to
is not used by clients, are not be checked.
* Role Descriptors are merged when building the effective Role that is used in
the authorization process. Therefore some Role Descriptors can overlap others,
so even if one matches aliases in a deprecated way, and it is reported as such,
it is not at risk from the breaking behavior in the current role mapping configuration
and index-alias configuration. It is still reported because it is a best practice to
change its definition, or remove offending aliases.
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
This opt-out query cache has an unsafe publication issue, where the
cache is exposed to another thread (namely the cluster state update
thread) before the constructor has finished execution. This exposes the
opt-out query cache to concurrency bugs. This commit addresses this by
ensuring that the opt-out query cache is not registered as a listener
for license state changes until after the constructor has returned.
* Avoid sharing source directories as it breaks intellij
* Subprojects share main project output classes directory
* Fix jar hell
* Fix sql security with ssl integ tests
* Relax dependency ordering rule so we don't explode on cycles
This adds a new security/qa test for TLS on a basic license.
It starts a 2 node cluster with a basic license, and TLS enabled
on both HTTP and Transport, and verifies the license type, x-pack
SSL usage and SSL certificates API.
It also upgrades the cluster to a trial license and performs that
same set of checks (to ensure that clusters with basic license
and TLS enabled can be upgraded to a higher feature license)
Backport of: #40714
This change updates our version of httpclient to version 4.5.8, which
contains the fix for HTTPCLIENT-1968, which is a bug where the client
started re-writing paths that contained encoded reserved characters
with their unreserved form.
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.
This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).
Relates to #40366
It is possible to have SSL enabled but security disabled if security
was dynamically disabled by the license type (e.g. trial license).
e.g. In the following configuration:
xpack.license.self_generated.type: trial
# xpack.security not set, default to disabled on trial
xpack.security.transport.ssl.enabled: true
The security feature will be reported as
available: true
enabled: false
And in this case, SSL will be active even though security is not
enabled.
This commit causes the X-Pack feature usage to report the state of the
"ssl" features unless security was explicitly disabled in the
settings.
Backport of: #40672
This adds a new `role_templates` field to role mappings that is an
alternative to the existing roles field.
These templates are evaluated at runtime to determine which roles should be
granted to a user.
For example, it is possible to specify:
"role_templates": [
{ "template":{ "source": "_user_{{username}}" } }
]
which would mean that every user is assigned to their own role based on
their username.
You may not specify both roles and role_templates in the same role
mapping.
This commit adds support for templates to the role mapping API, the role
mapping engine, the Java high level rest client, and Elasticsearch
documentation.
Due to the lack of caching in our role mapping store, it is currently
inefficient to use a large number of templated role mappings. This will be
addressed in a future change.
Backport of: #39984, #40504
This commit introduces 2 changes to application privileges:
- The validation rules now accept a wildcard in the "suffix" of an application name.
Wildcards were always accepted in the application name, but the "valid filename" check
for the suffix incorrectly prevented the use of wildcards there.
- A role may now be defined against a wildcard application (e.g. kibana-*) and this will
be correctly treated as granting the named privileges against all named applications.
This does not allow wildcard application names in the body of a "has-privileges" check, but the
"has-privileges" check can test concrete application names against roles with wildcards.
Backport of: #40398
Replicated closed indices can't be indexed into or searched, and therefore don't need a shard with
full indexing and search capabilities allocated. We can save on a lot of heap memory for those
indices by not allocating a mapper service and caching infrastructure (which preallocates a constant
amount per instance). Before this change, a 1GB ES instance could host 250 replicated closed
metricbeat indices (each index with one shard). After this change, the same instance can host 7300
replicated closed metricbeat instances (not that this would be a recommended configuration). Most
of the remaining memory is in the cluster state and the IndexSettings object.
This refactoring is in the context of the work related to moving security
tokens to a new index. In that regard, the Token Service has to work with
token documents stored in any of the two indices, albeit only as a transient
situation. I reckoned the added complexity as unmanageable,
hence this refactoring.
This is incomplete, as it fails to address the goal of minimizing .security accesses,
but I have stopped because otherwise it would've become a full blown rewrite
(if not already). I will follow-up with more targeted PRs.
In addition to being a true refactoring, some 400 errors moved to 500. Furthermore,
more stringed validation of various return result, has been implemented, notably the
one of the token document creation.
When creating API keys we check for if API key with
the same key name already exists and fail the request if it does.
The check should have been performed with XPackSecurityUser
instead of the authenticated user. This caused the request to fail
in case of the non-super user trying to create an API key.
This commit fixes by executing search action with SECURITY_ORIGIN
so it can be executed with XPackSecurityUser.
Also fixed the Rest test to avoid using a user with `super_user` role.
Closes#40029
The setup-passwords tool gives cryptic messages in case where custom discovery providers are
used (see #33580). As the URL auto-detection logic should be seen as best effort, this commit
improves the exception message to make it clearer what needs to be done to fix the issue.
Relates #33580
`SecurityIndexManager` is hardcoded to handle only the `.security`-`.security-7` alias-index pair.
This commit removes the hardcoded bits, so that the `SecurityIndexManager` can be reused
for other indices, such as the planned security tokens index (`.security-tokens-7`).
The LDAP tests attempt to bind all interfaces,
but if for some reason an interface can't be bound
the tests will stall until the suite times out.
This modifies the tests to be a bit more lenient and allow
some binding to fail so long as at least one succeeds.
This allows the test to continue even in more antagonistic
environments.
A TLS handshake requires exchanging multiple messages to initiate a
session. If one side decides to close during the handshake, it is
supposed to send a close_notify alert (similar to closing during
application data exchange). The java SSLEngine engine throws an
exception when this happens. We currently log this at the warn level if
trace logging is not enabled. This level is too high for a valid
scenario. Additionally it happens all the time in tests (quickly closing
and opened transports). This commit changes this to be logged at the
debug level if trace is not enabled. Additionally, it extracts the
transport security exception handling to a common class.
Previously all the threads were writing the received tokens to a
HashSet. In cases with many threads, sometimes (1 every ~25 tests)
calling size() on the HashSet returned 2 even though it seemed to
contain only one String and there was no evidence from logging that
threadSecurityClient.refreshToken() ever returned a different
access or refresh token.
This commit changes the test to use a ConcurrentHashMap instead,
checking that we only received one pair of access token/refresh token
eventually. It also adds a check so that we won't take into consideration
tokens that are returned after 30s, hence not in the concurrent refresh
time window.
Fixes several errors of the token retry logic:
* not checking for backoff.hasNext() before calling backoff.next()
* checking for backoff.hasNext() without calling backoff.next()
* not preserving the context on the retry
* calling scheduleWithFixedDelay instead of schedule
Today the `GroupedActionListener` accepts a `defaults` parameter but all
callers pass an empty list. Also it is permitted to pass an empty group but
this is trappy because the delegated listener is never be called in that case.
This commit removes the `defaults` parameter and forbids an empty group.
As we are moving to single type indices,
we need to address this change in security-related indexes.
To address this, we are
- updating index templates to use preferred type name `_doc`
- updating the API calls to use preferred type name `_doc`
Upgrade impact:-
In case of an upgrade from 6.x, the security index has type
`doc` and this will keep working as there is a single type and `_doc`
works as an alias to an existing type. The change is handled in the
`SecurityIndexManager` when we load mappings and settings from
the template. Previously, we used to do a `PutIndexTemplateRequest`
with the mapping source JSON with the type name. This has been
modified to remove the type name from the source.
So in the case of an upgrade, the `doc` type is updated
whereas for fresh installs `_doc` is updated. This happens as
backend handles `_doc` as an alias to the existing type name.
An optional step is to `reindex` security index and update the
type to `_doc`.
Since we do not support the security audit log index,
that template has been deleted.
Relates: #38637
This is a backport of #39631
Co-authored-by: Jay Modi jaymode@users.noreply.github.com
This change adds support for the concurrent refresh of access
tokens as described in #36872
In short it allows subsequent client requests to refresh the same token that
come within a predefined window of 60 seconds to be handled as duplicates
of the original one and thus receive the same response with the same newly
issued access token and refresh token.
In order to support that, two new fields are added in the token document. One
contains the instant (in epoqueMillis) when a given refresh token is refreshed
and one that contains a pointer to the token document that stores the new
refresh token and access token that was created by the original refresh.
A side effect of this change, that was however also a intended enhancement
for the token service, is that we needed to stop encrypting the string
representation of the UserToken while serializing. ( It was necessary as we
correctly used a new IV for every time we encrypted a token in serialization, so
subsequent serializations of the same exact UserToken would produce
different access token strings)
This change also handles the serialization/deserialization BWC logic:
In mixed clusters we keep creating tokens in the old format and
consume only old format tokens
In upgraded clusters, we start creating tokens in the new format but
still remain able to consume old format tokens (that could have been
created during the rolling upgrade and are still valid)
When reading/writing TokensInvalidationResult objects, we take into
consideration that pre 7.1.0 these contained an integer field that carried
the attempt count
Resolves#36872
Previously, the security index could be wrongfully recreated. This might
happen if the index was interpreted as missing, as in the case of a fresh
install, but the index existed and the state did not yet recover.
This fix will return HTTP SERVICE_UNAVAILABLE (503) for requests that
try to write to the security index before the state has not been recovered yet.
This is a backport of #38382
This change adds supports for the concurrent refresh of access
tokens as described in #36872
In short it allows subsequent client requests to refresh the same token that
come within a predefined window of 60 seconds to be handled as duplicates
of the original one and thus receive the same response with the same newly
issued access token and refresh token.
In order to support that, two new fields are added in the token document. One
contains the instant (in epoqueMillis) when a given refresh token is refreshed
and one that contains a pointer to the token document that stores the new
refresh token and access token that was created by the original refresh.
A side effect of this change, that was however also a intended enhancement
for the token service, is that we needed to stop encrypting the string
representation of the UserToken while serializing. ( It was necessary as we
correctly used a new IV for every time we encrypted a token in serialization, so
subsequent serializations of the same exact UserToken would produce
different access token strings)
This change also handles the serialization/deserialization BWC logic:
- In mixed clusters we keep creating tokens in the old format and
consume only old format tokens
- In upgraded clusters, we start creating tokens in the new format but
still remain able to consume old format tokens (that could have been
created during the rolling upgrade and are still valid)
Resolves#36872
Co-authored-by: Jay Modi jaymode@users.noreply.github.com
This commit adds a simple integ test that exercises the flow:
* snapshot .security
* delete .security
* restore .security
, checking that the Native Realm works as expected.
Relates #34454
Currently there are two security tests that specifically target the
netty security transport. This PR moves the client authentication tests
into `AbstractSimpleSecurityTransportTestCase` so that the nio transport
will also be tested.
Additionally the work to build transport configurations is moved out of
the netty transport and tested independently.
This changes the name of the internal security index to ".security-7",
but supports indices that were upgraded from earlier versions and use
the ".security-6" name.
In all cases, both ".security-6" and ".security-7" are considered to
be restricted index names regardless of which name is actually in use
on the cluster.
Backport of: #39337
This change is a backport of #39252
- Fixes TokenBackwardsCompatibilityIT: Existing tests seemed to made
the assumption that in the oneThirdUpgraded stage the master node
will be on the old version and in the twoThirdsUpgraded stage, the
master node will be one of the upgraded ones. However, there is no
guarantee that the master node in any of the states will or will
not be one of the upgraded ones.
This class now tests:
- That we can generate and consume tokens before we start the
rolling upgrade.
- That we can consume tokens generated in the old cluster during
all the stages of the rolling upgrade.
- That while on a mixed cluster, when/if the master node is
upgraded, we can generate, consume and refresh a token
- That after the rolling upgrade, we can consume a token
generated in an old cluster and can invalidate it so that it
can't be used any more.
- Ensures that during the rolling upgrade, the upgraded nodes have
the same configuration as the old nodes. Specifically that the
file realm we use is explicitly named `file1`. This is needed
because while attempting to refresh a token in a mixed cluster
we might create a token hitting an old node and attempt to refresh
it hitting a new node. If the file realm name is not the same, the
refresh will be seen as being made by a "different" client, and
will, thus, fail.
- Renames the Authentication variable we check while refreshing a
token to be clientAuth in order to make the code more readable.
Some of the above were possibly causing the flakiness of #37379
Currently remote compression and ping schedule settings are dynamic.
However, we do not listen for changes. This commit adds listeners for
changes to those two settings. Additionally, when those settings change
we now close existing connections and open new ones with the settings
applied.
Fixes#37201.
This change aims to fix failures in the session factory load balancing
tests that mock failure scenarios. For these tests, we randomly shut
down ldap servers and bind a client socket to the port they were
listening on. Unfortunately, we would occasionally encounter failures
in these tests where a socket was already in use and/or the port
we expected to connect to was wrong and in fact was to one of the ldap
instances that should have been shut down.
The failures are caused by the behavior of certain operating systems
when it comes to binding ports and wildcard addresses. It is possible
for a separate application to be bound to a wildcard address and still
allow our code to bind to that port on a specific address. So when we
close the server socket and open the client socket, we are still able
to establish a connection since the other application is already
listening on that port on a wildcard address. Another variant is that
the os will allow a wildcard bind of a server socket when there is
already an application listening on that port for a specific address.
In order to do our best to prevent failures in these scenarios, this
change does the following:
1. Binds a client socket to all addresses in an awaitBusy
2. Adds assumption that we could bind all valid addresses
3. In the case that we still establish a connection to an address that
we should not be able to, try to bind and expect a failure of not
being connected
Closes#32190
In most of the places we avoid creating the `.security` index (or updating the mapping)
for read/search operations. This is more of a nit for the case of the getRole call,
that fixes a possible mapping update during a get role, and removes a dead if branch
about creating the `.security` index.
This commit attempts to remove the retention leases on the leader shards
when unfollowing an index. This is best effort, since the leader might
not be available.
* Disable specific locales for tests in fips mode
The Bouncy Castle FIPS provider that we use for running our tests
in fips mode has an issue with locale sensitive handling of Dates as
described in https://github.com/bcgit/bc-java/issues/405
This causes certificate validation to fail if any given test that
includes some form of certificate validation happens to run in one
of the locales. This manifested earlier in #33081 which was
handled insufficiently in #33299
This change ensures that the problematic 3 locales
* th-TH
* ja-JP-u-ca-japanese-x-lvariant-JP
* th-TH-u-nu-thai-x-lvariant-TH
will not be used when running our tests in a FIPS 140 JVM. It also
reverts #33299
The data frame plugin allows users to create feature indexes by pivoting a source index. In a
nutshell this can be understood as reindex supporting aggregations or similar to the so called entity
centric indexing.
Full history is provided in: feature/data-frame-transforms
This commit is the first step in integrating shard history retention
leases with CCR. In this commit we integrate shard history retention
leases with recovery from remote. Before we start transferring files, we
take out a retention lease on the primary. Then during the file copy
phase, we repeatedly renew the retention lease. Finally, when recovery
from remote is complete, we disable the background renewing of the
retention lease.
Few tests failed intermittently and most of the
times due to invalidated or expired keys that were
deleted were still reported in search results.
This commit removes the test and adds enhancements
to other tests testing different scenario's.
When ExpiredApiKeysRemover is triggered, the tests
did not await its termination thereby sometimes
the results would be wrong for a search operation.
DELETE_INTERVAL setting has been further reduced to
100ms so we can trigger ExpiredApiKeysRemover faster.
Closes#38408
This change makes the writing of new usage data conditional based on
the version that is being written to. A test has also been added to
ensure serialization works as expected to an older version.
Relates #38687, #38917
This change updates the authentication service to use a consistent view
of the realms based on the license state at the start of
authentication. Without this, the license can change during
authentication of a request and it will result in a failure if the
realm that extracted the token is no longer in the realm list. This
manifests in some tests as an authentication failure that should never
really happen; one example would be the test framework's transport
client user should always have a succesful authentication but in the
LicensingTests this can fail and will show up as a
NoNodeAvailableException.
Additionally, the licensing tests have been updated to ensure that
there is consistency when changing the license. The license is changed
by modifying the internal xpack license state on each node, which has
no protection against be changed by some pending cluster action. The
methods to disable and enable now ensure we have a green cluster and
that the cluster is consistent before returning.
Closes#30301
Right now there is no way to determine whether the
token service or API key service is enabled or not.
This commit adds support for the enabled status of
token and API key service to the security feature set
usage API `/_xpack/usage`.
Closes#38535
* Enhance parsing of StatusCode in SAML Responses
<Status> elements in a failed response might contain two nested
<StatusCode> elements. We currently only parse the first one in
order to create a message that we attach to the Exception we return
and log. However this is generic and only gives out informarion
about whether the SAML IDP believes it's an error with the
request or if it couldn't handle the request for other reasons. The
encapsulated StatusCode has a more interesting error message that
potentially gives out the actual error as in Invalid nameid policy,
authentication failure etc.
This change ensures that we print that information also, and removes
Message and Details fields from the message when these are not
part of the Status element (which quite often is the case)
In #38333 and #38350 we moved away from the `discovery.zen` settings namespace
since these settings have an effect even though Zen Discovery itself is being
phased out. This change aligns the documentation and the names of related
classes and methods with the newly-introduced naming conventions.
This commit adds an authentication cache for API keys that caches the
hash of an API key with a faster hash. This will enable better
performance when API keys are used for bulk or heavy searching.
I have not been able to reproduce the failing
test scenario locally for #38408 and there are other similar
tests which are running fine in the same test class.
I am re-enabling the test with additional logs so
that we can debug further on what's happening.
I will keep the issue open for now and look out for the builds
to see if there are any related failures.
For some users, the built in authorization mechanism does not fit their
needs and no feature that we offer would allow them to control the
authorization process to meet their needs. In order to support this,
a concept of an AuthorizationEngine is being introduced, which can be
provided using the security extension mechanism.
An AuthorizationEngine is responsible for making the authorization
decisions about a request. The engine is responsible for knowing how to
authorize and can be backed by whatever mechanism a user wants. The
default mechanism is one backed by roles to provide the authorization
decisions. The AuthorizationEngine will be called by the
AuthorizationService, which handles more of the internal workings that
apply in general to authorization within Elasticsearch.
In order to support external authorization services that would back an
authorization engine, the entire authorization process has become
asynchronous, which also includes all calls to the AuthorizationEngine.
The use of roles also leaked out of the AuthorizationService in our
existing code that is not specifically related to roles so this also
needed to be addressed. RequestInterceptor instances sometimes used a
role to ensure a user was not attempting to escalate their privileges.
Addressing this leakage of roles meant that the RequestInterceptor
execution needed to move within the AuthorizationService and that
AuthorizationEngines needed to support detection of whether a user has
more privileges on a name than another. The second area where roles
leaked to the user is in the handling of a few privilege APIs that
could be used to retrieve the user's privileges or ask if a user has
privileges to perform an action. To remove the leakage of roles from
these actions, the AuthorizationService and AuthorizationEngine gained
methods that enabled an AuthorizationEngine to return the response for
these APIs.
Ultimately this feature is the work included in:
#37785#37495#37328#36245#38137#38219Closes#32435
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page:
> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document
We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem.
This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.
Relates to #1078
With this change we no longer support pluggable discovery implementations. No
known implementations of `DiscoveryPlugin` actually override this method, so in
practice this should have no effect on the wider world. However, we were using
this rather extensively in tests to provide the `test-zen` discovery type. We
no longer need a separate discovery type for tests as we no longer need to
customise its behaviour.
Relates #38410
Authn is enabled only if `license_type` is non `basic`, but `basic` is
what the `LicenseService` generates implicitly. This commit explicitly sets
license type to `trial`, which allows for authn, in the `SecuritySettingsSource`
which is the settings configuration parameter for `InternalTestCluster`s.
The real problem, that had created tests failures like #31028 and #32685, is
that the check `licenseState.isAuthAllowed()` can change sporadically. If it were
to return `true` or `false` during the whole test there would be no problem.
The problem manifests when it turns from `true` to `false` right before `Realms.asList()`.
There are other license checks before this one (request filter, token service, etc)
that would not cause a problem if they would suddenly see the check as `false`.
But switching to `false` before `Realms.asList()` makes it appear that no installed
realms could have handled the authn token which is an authentication error, as can
be seen in the failing tests.
Closes#31028#32685
Renames the following settings to remove the mention of `zen` in their names:
- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.
This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.
The API keys:-
- by default do not have an expiration but expiration can be
configured where the API keys need to be expired after a
certain amount of time.
- when generated will keep authentication information of the user that
generated them.
- can be defined with a role describing the privileges for accessing
Elasticsearch and will be limited by the role of the user that
generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
before being deleted. The expired API keys remover task handles this.
Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`
The API keys can be used to access Elasticsearch using `Authorization`
header, where the auth scheme is `ApiKey` and the credentials, is the
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```
Closes#34383
We mention in our documentation for the token
expiration configuration maximum value is 1 hour
but do not enforce it. This commit adds max limit
to the TOKEN_EXPIRATION setting.
This commit introduces a background sync for retention leases. The idea
here is that we do a heavyweight sync when adding a new retention lease,
and then periodically we want to background sync any retention lease
renewals to the replicas. As long as the background sync interval is
significantly lower than the extended lifetime of a retention lease, it
is okay if from time to time a replica misses a sync (it will still have
an older version of the lease that is retaining more data as we assume
that renewals do not decrease the retaining sequence number). There are
two follow-ups that will come after this commit. The first is to address
the fact that we have not adapted the should periodically flush logic to
possibly flush the retention leases. We want to do something like flush
if we have not flushed in the last five minutes and there are renewed
retention leases since the last time that we flushed. An additional
follow-up will remove the syncing of retention leases when a retention
lease expires. Today this sync could be invoked in the background by a
merge operation. Rather, we will move the syncing of retention lease
expiration to be done under the background sync. The background sync
will use the heavyweight sync (write action) if a lease has expired, and
will use the lightweight background sync (replication action) otherwise.
It would be beneficial to apply some of the request interceptors even
when features are disabled. This change reworks the way we build that
list so that the interceptors we always want to use are constructed
outside of the settings check.
The culprit in #38097 is an `IndicesRequest` that has no indices,
but instead of `request.indices()` returning `null` or `String[0]`
it returned `String[] {null}` . This tripped the audit filter.
I have addressed this in two ways:
1. `request.indices()` returning `String[] {null}` is treated as `null`
or `String[0]`, i.e. no indices
2. `null` values among the roles and indices lists, which are
unexpected, will never again stumble the audit filter; `null` values
are treated as special values that will not match any policy,
i.e. their events will always be printed.
Closes#38097
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.
This is a continuation of #28667, #36137 and also fixes#37708.
The apache commons http client implementations recently released
versions that solve TLS compatibility issues with the new TLS engine
that supports TLSv1.3 with JDK 11. This change updates our code to
use these versions since JDK 11 is a supported JDK and we should
allow the use of TLSv1.3.
Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in
tests, but for 7.x nodes this setting is not required as it has no effect.
This commit removes this setting so that nodes are started with more realistic
configurations, and deprecates it.
The certgen, certutil and saml-metadata tools did not correctly return
their exit code to the calling shell.
These commands now explicitly exit with the code that was returned
from the main(args, terminal) method.
Restricted indices (currently only .security-6 and .security) are special
internal indices that require setting the `allow_restricted_indices` flag
on every index permission that covers them. If this flag is `false`
(default) the permission will not cover these and actions against them
will not be authorized.
However, the monitoring APIs were the only exception to this rule.
This exception is herein forfeited and index monitoring privileges have to be
granted explicitly, using the `allow_restricted_indices` flag on the permission,
as is the case for any other index privilege.
This commit introduces the `create_snapshot` cluster privilege and
the `snapshot_user` role.
This role is to be used by "cronable" tools that call the snapshot API
periodically without recurring to the `manage` cluster privilege. The
`create_snapshot` cluster privilege is much more limited compared to
the `manage` privilege.
The `snapshot_user` role grants the privileges to view the metadata of
all indices (including restricted ones, i.e. .security). It obviously grants the
create snapshot privilege but the repository has to be created using another
role. In addition, it grants the privileges to (only) GET repositories and
snapshots, but not create and delete them.
The role does not allow to create repositories. This distinction is important
because snapshotting equates to the `read` index privilege if the user has
control of the snapshot destination, but this is not the case in this instance,
because the role does not grant control over repository configuration.
This commit introduces retention lease syncing from the primary to its
replicas when a new retention lease is added. A follow-up commit will
add a background sync of the retention leases as well so that renewed
retention leases are synced to replicas.
* Exit batch files explictly using ERRORLEVEL
This makes sure the exit code is preserved when calling the batch
files from different contexts other than DOS
Fixes#29582
This also fixes specific error codes being masked by an explict
exit /b 1
causing the useful exitcodes from ExitCodes to be lost.
* fix line breaks for calling cli to match the bash scripts
* indent size of bash files is 2, make sure editorconfig does the same for bat files
* update indenting to match bash files
* update elasticsearch-keystore.bat indenting
* Update elasticsearch-node.bat to exit outside of endlocal
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents, requests that match more than this value will set the `"total.relation"`
to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted, they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.
Closes#33028
When we can't map the principal attribute from the configured SAML
attribute in the realm settings, we can't complete the
authentication. We return an error to the user indicating this and
we present them with a list of attributes we did get from the SAML
response to point out that the expected one was not part of that
list. This list will never contain the NameIDs though as they are
not part of the SAMLAttribute list. So we might have a NameID but
just with a different format.
This commit removes the Index Audit Output type, following its deprecation
in 6.7 by 8765a31d4e6770. It also adds the migration notice (settings notice).
In general, the problem with the index audit output is that event indexing
can be slower than the rate with which audit events are generated,
especially during the daily rollovers or the rolling cluster upgrades.
In this situation audit events will be lost which is a terrible failure situation
for an audit system.
Besides of the settings under the `xpack.security.audit.index` namespace, the
`xpack.security.audit.outputs` setting has also been deprecated and will be
removed in 7. Although explicitly configuring the logfile output does not touch
any deprecation bits, this setting is made redundant in 7 so this PR deprecates
it as well.
Relates #29881
It looks like the output of FileUserPasswdStore.parseFile shouldn't be wrapped
into another map since its output can be null. Doing this wrapping after the null
check (which potentially raises an exception) instead.
Use PEM files for the key/cert for TLS on the http layer of the
node instead of a JKS keystore so that the tests can also run
in a FIPS 140 JVM .
Resolves: #37682
Due to missing stubbing for `NativePrivilegeStore#getPrivileges`
the test `testNegativeLookupsAreCached` failed
when the superuser role name was present in the role names.
This commit adds missing stubbing.
Closes: #37657
Currently we create dedicated network threads for both the http and
transport implementations. Since these these threads should never
perform blocking operations, these threads could be shared. This commit
modifies the nio-transport to have 0 http workers be default. If the
default configs are used, this will cause the http transport to be run
on the transport worker threads. The http worker setting will still exist
in case the user would like to configure dedicated workers. Additionally,
this commmit deletes dedicated acceptor threads. We have never had these
for the netty transport and they can be added back if a need is
determined in the future.
This grants the capability to grant privileges over certain restricted
indices (.security and .security-6 at the moment).
It also removes the special status of the superuser role.
IndicesPermission.Group is extended by adding the `allow_restricted_indices`
boolean flag. By default the flag is false. When it is toggled, you acknowledge
that the indices under the scope of the permission group can cover the
restricted indices as well. Otherwise, by default, restricted indices are ignored
when granting privileges, thus rendering them hidden for authorization purposes.
This effectively adds a confirmation "check-box" for roles that might grant
privileges to restricted indices.
The "special status" of the superuser role has been removed and coded as
any other role:
```
new RoleDescriptor("superuser",
new String[] { "all" },
new RoleDescriptor.IndicesPrivileges[] {
RoleDescriptor.IndicesPrivileges.builder()
.indices("*")
.privileges("all")
.allowRestrictedIndices(true)
// this ----^
.build() },
new RoleDescriptor.ApplicationResourcePrivileges[] {
RoleDescriptor.ApplicationResourcePrivileges.builder()
.application("*")
.privileges("*")
.resources("*")
.build()
},
null, new String[] { "*" },
MetadataUtils.DEFAULT_RESERVED_METADATA,
Collections.emptyMap());
```
In the context of the Backup .security work, this allows the creation of a
"curator role" that would permit listing (get settings) for all indices
(including the restricted ones). That way the curator role would be able to
ist and snapshot all indices, but not read or restore any of them.
Supersedes #36765
Relates #34454
This change fixes failures in the SslMultiPortTests where we attempt to
connect to a profile on a port it is listening on but the connection
fails. The failure is due to the profile being bound to multiple
addresses and randomization will pick one of these addresses to
determine the listening port. However, the address we get the port for
may not be the address we are actually connecting to. In order to
resolve this, the test now sets the bind host for profiles to the
loopback address and uses the same address for connecting.
Closes#37481
This change deletes the SslNullCipherTests from our codebase since it
will have issues with newer JDK versions and it is essentially testing
JDK functionality rather than our own. The upstream JDK issue for
disabling these ciphers by default is
https://bugs.openjdk.java.net/browse/JDK-8212823.
Closes#37403
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
This commit removes the fallback for SSL settings. While this may be
seen as a non user friendly change, the intention behind this change
is to simplify the reasoning needed to understand what is actually
being used for a given SSL configuration. Each configuration now needs
to be explicitly specified as there is no global configuration or
fallback to some other configuration.
Closes#29797
Adds another field, named "request.method", to the structured logfile audit.
This field is present for all events associated with a REST request (not a
transport request) and the value is one of GET, POST, PUT, DELETE, OPTIONS,
HEAD, PATCH, TRACE and CONNECT.
This commit reorders the realm list for iteration based on the last
successful authentication for the given principal. This is an
optimization to prevent unnecessary iteration over realms if we can
make a smart guess on which realm to try first.
This commit fixes a race condition in a test introduced by #36900 that
verifies concurrent authentications get a result propagated from the
first thread that attempts to authenticate. Previously, a thread may
be in a state where it had not attempted to authenticate when the first
thread that authenticates finishes the authentication, which would
cause the test to fail as there would be an additional authentication
attempt. This change adds additional latches to ensure all threads have
attempted to authenticate before a result gets returned in the
thread that is performing authentication.
Closing a channel using TLS/SSL requires reading and writing a
CLOSE_NOTIFY message (for pre-1.3 TLS versions). Many implementations do
not actually send the CLOSE_NOTIFY message, which means we are depending
on the TCP close from the other side to ensure channels are closed. In
case there is an issue with this, we need a timeout. This commit adds a
timeout to the channel close process for TLS secured channels.
As part of this change, we need a timer service. We could use the
generic Elasticsearch timeout threadpool. However, it would be nice to
have a local to the nio event loop timer service dedicated to network needs. In
the future this service could support read timeouts, connect timeouts,
request timeouts, etc. This commit adds a basic priority queue backed
service. Since our timeout volume (channel closes) is very low, this
should be fine. However, this can be updated to something more efficient
in the future if needed (timer wheel). Everything being local to the event loop
thread makes the logic simple as no locking or synchronization is necessary.
This bug was introduced in #36893 and had the effect that
execution would continue after calling onFailure on the the
listener in checkIfTokenIsValid in the case that the token is
expired. In a case of many consecutive requests this could lead to
the unwelcome side effect of an expired access token producing a
successful authentication response.
After #30794, our caching realms limit each principal to a single auth
attempt at a time. This prevents hammering of external servers but can
cause a significant performance hit when requests need to go through a
realm that takes a long time to attempt to authenticate in order to get
to the realm that actually authenticates. In order to address this,
this change will propagate failed results to listeners if they use the
same set of credentials that the authentication attempt used. This does
prevent these stalled requests from retrying the authentication attempt
but the implementation does allow for new requests to retry the
attempt.
We run subsequent token invalidation requests and we still want to
trigger the deletion of expired tokens so we need to lower the
deleteInterval parameter significantly. Especially now that the
bwc expiration logic is removed and the invalidation process is
much shorter
Resolves#37063
- Removes bwc invalidation logic from the TokenService
- Removes bwc serialization for InvalidateTokenResponse objects as
old nodes in supported mixed clusters during upgrade will be 6.7 and
thus will know of the new format
- Removes the created field from the TokensInvalidationResult and the
InvalidateTokenResponse as it is no longer useful in > 7.0
If we don't explicitly sett the client SSLSocketFactory when
creating an InMemoryDirectoryServer and setting its SSL config, it
will result in using a TrustAllTrustManager(that extends
X509TrustManager) which is not allowed in a FIPS 140 JVM.
Instead, we get the SSLSocketFactory from the existing SSLContext
and pass that to be used.
Resolves#37013
The phrase "missing authentication token" is historic and is based
around the use of "AuthenticationToken" objects inside the Realm code.
However, now that we have a TokenService and token API, this message
would sometimes lead people in the wrong direction and they would try
and generate a "token" for authentication purposes when they would
typically just need a username:password Basic Auth header.
This change replaces the word "token" with "credentials".
In #30509 we changed the way SSL configuration is reloaded when the
content of a file changes. As a consequence of that implementation
change the LDAP realm ceased to pick up changes to CA files (or other
certificate material) if they changed.
This commit repairs the reloading behaviour for LDAP realms, and adds
a test for this functionality.
Resolves: #36923
Realm settings were changed in #30241 in a non-BWC way.
If you try and start a 7.x node using a 6.x config style, then the
default error messages do not adequately describe the cause of
the problem, or the solution.
This change detects the when realms are using the 6.x style and fails
with a specific error message.
This detection is a best-effort, and will detect issues when the
realms have not been modified to use the 7.x style, but may not detect
situations where the configuration was partially changed.
e.g. We can detect this:
xpack.security.authc:
realms.pki1.type: pki
realms.pki1.order: 3
realms.pki1.ssl.certificate_authorities: [ "ca.crt" ]
But this (where the "order" has been updated, but the "ssl.*" has not)
will fall back to the standard "unknown setting" check
xpack.security.authc:
realms.pki.pki1.order: 3
realms.pki1.ssl.certificate_authorities: [ "ca.crt" ]
Closes: #36026
This change:
- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access)tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation
After back-porting to 6.x, the `created` field will be removed from master as a field in the
response
Resolves: #35115
Relates: #34556
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.
Relates #36148
Relates #10708
For class fields of type collection whose order is not important
and for which duplicates are not permitted we declare them as `Set`s.
Usually the definition is a `HashSet` but in this case `TreeSet` is used
instead to aid testing.
This commit updates our transport settings for 7.0. It generally takes a
few approaches. First, for normal transport settings, it usestransport.
instead of transport.tcp. Second, it uses transport.tcp, http.tcp,
or network.tcp for all settings that are proxies for OS level socket
settings. Third, it marks the network.tcp.connect_timeout setting for
removal. Network service level settings are only settings that apply to
both the http and transport modules. There is no connect timeout in
http. Fourth, it moves all the transport settings to a single class
TransportSettings similar to the HttpTransportSettings class.
This commit does not actually remove any settings. It just adds the new
renamed settings and adds todos for settings that will be deprecated.
There are certain BootstrapCheck checks that may need access environment-specific
values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment
via a constructor to bypass the shortcoming in BootstrapContext. This commit
pulls in the node's environment into BootstrapContext.
Another case is found in #36519, where it is useful to check the state of the
data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on
the environment to retrieve references to things like node data paths.
This means that the BootstrapContext will have the same Settings used in the
Environment, which currently differs from the Node's settings.
`PageCacheRecycler` is the class that creates and holds pages of arrays
for various uses. `BigArrays` is just one user of these pages. This
commit moves the constants that define the page sizes for the recycler
to be on the recycler class.
This commit modifies BigArrays to take a circuit breaker name and
the circuit breaking service. The default instance of BigArrays that
is passed around everywhere always uses the request breaker. At the
network level, we want to be using the inflight request breaker. So this
change will allow that.
Additionally, as this change moves away from a single instance of
BigArrays, the class is modified to not be a Releasable anymore.
Releasing big arrays was always dispatching to the PageCacheRecycler,
so this change makes the PageCacheRecycler the class that needs to be
managed and torn-down.
Finally, this commit closes#31435 be making the serialization of
transport messages use the inflight request breaker. With this change,
we no longer push the global BigArrays instnace to the network level.
* This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.
- REST API docs
- HLRC docs and doc tests
- Handle REST actions with deprecation warnings
- Changed endpoints in rest-api-spec and relevant file names
The following updates were made:
- Add a new untyped endpoint `{index}/_explain/{id}`.
- Add deprecation warnings to Rest*Action, plus tests in Rest*ActionTests.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
This is related to #27260. In Elasticsearch all of the messages that we
serialize to write to the network are composed of heap bytes. When you
read or write to a nio socket in java, the heap memory you passed down
must be copied to/from direct memory. The JVM internally does some
buffering of the direct memory, however it is essentially unbounded.
This commit introduces a simple mechanism of buffering and copying the
memory in transport-nio. Each network event loop is given a 64kb
DirectByteBuffer. When we go to read we use this buffer and copy the
data after the read. Additionally, when we go to write, we copy the data
to the direct memory before calling write. 64KB is chosen as this is the
default receive buffer size we use for transport-netty4
(NETTY_RECEIVE_PREDICTOR_SIZE).
Since we only have one buffer per thread, we could afford larger.
However, if we the buffer is large and not all of the data is flushed in
a write call, we will do excess copies. This is something we can
explore in the future.
* Add deprecation warnings to `Rest*TermVectorsAction`, plus tests in `Rest*TermVectorsActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Update documentation (for both the REST API and Java HLRC).
* For each REST yml test, create one version without types, and another legacy version that retains types (called *_with_types.yml).
We have a few places where we register license state listeners on
transient components (i.e., resources that can be open and closed during
the lifecycle of the server). In one case (the opt-out query cache) we
were never removing the registered listener, effectively a terrible
memory leak. In another case, we were not un-registered the listener
that we registered, since we were not referencing the same instance of
Runnable. This commit does two things:
- introduces a marker interface LicenseStateListener so that it is
easier to identify these listeners in the codebase and avoid classes
that need to register a license state listener from having to
implement Runnable which carries a different semantic meaning than
we want here
- fixes the two places where we are currently leaking license state
listeners
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
Made credentials mandatory for xpack migrate tool.
Closes#29847.
The x-pack user and roles APIs aren't available unless security is enabled, so the tool should always be called with the -u and -p options specified.
This commit adds an empty CcrRepository snapshot/restore repository.
When a new cluster is registered in the remote cluster settings, a new
CcrRepository is registered for that cluster.
This is implemented using a new concept of "internal repositories".
RepositoryPlugin now allows implementations to return factories for
"internal repositories". The "internal repositories" are different from
normal repositories in that they cannot be registered through the
external repository api. Additionally, "internal repositories" are local
to a node and are not stored in the cluster state.
The repository will be unregistered if the remote cluster is removed.
This commit makes `document`, `update`, `explain`, `termvectors` and `mapping`
typeless APIs work on indices that have a type whose name is not `_doc`.
Unfortunately, this needs to be a bit of a hack since I didn't want calls with
random type names to see documents with the type name that the user had chosen
upon type creation.
The `explain` and `termvectors` do not support being called without a type for
now so the test is just using `_doc` as a type for now, we will need to fix
tests later but this shouldn't require further changes server-side since passing
`_doc` as a type name is what typeless APIs do internally anyway.
Relates #35190
Introduces a debug log message when a bind fails and a trace message
when a bind succeeds.
It may seem strange to only debug a bind failure, but failures of this
nature are relatively common in some realm configurations (e.g. LDAP
realm with multiple user templates, or additional realms configured
after an LDAP realm).
This is a follow-up to #35144. That commit made the underlying
connection opening process in TcpTransport asynchronous. However the
method still blocked on the process being complete before returning.
This commit moves the blocking to the ConnectionManager level. This is
another step towards the top-level TransportService api being async.
This commit upgrades netty. This will close#35360. Netty started
throwing an IllegalArgumentException if a CompositeByteBuf is
created with < 2 components. Netty4Utils was updated to reflect this
change.
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.
The ping interval configuration is moved to the ConnectionProfile.
The server channel now responds to pings. This makes the keepalive
pings bidirectional.
On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
Today the default for USE_ZEN2 is false and it is overridden in many places. By
defaulting it to true we can be sure that the only places in which Zen2 does
not work are those in which it is explicitly set to false.
In #30241 Realm settings were changed, but the Kerberos realm settings
were not registered correctly. This change fixes the registration of
those Kerberos settings.
Also adds a new integration test that ensures every internal realm can
be configured in a test cluster.
Also fixes the QA test for kerberos.
Resolves: #35942
Right now using the `GET /_tasks/<taskid>` API and causing a task to opt
in to saving its result after being completed requires permissions on
the `.tasks` index. When we built this we thought that that was fine,
but we've since moved towards not leaking details like "persisting task
results after the task is completed is done by saving them into an index
named `.tasks`." A more modern way of doing this would be to save the
tasks into the index "under the hood" and to have APIs to manage the
saved tasks. This is the first step down that road: it drops the
requirement to have permissions to interact with the `.tasks` index when
fetching task statuses and when persisting statuses beyond the lifetime
of the task.
In particular, this moves the concept of the "origin" of an action into
a more prominent place in the Elasticsearch server. The origin of an
action is ignored by the server, but the security plugin uses the origin
to make requests on behalf of a user in such a way that the user need
not have permissions to perform these actions. It *can* be made to be
fairly precise. More specifically, we can create an internal user just
for the tasks API that just has permission to interact with the `.tasks`
index. This change doesn't do that, instead, it uses the ubiquitus
"xpack" user which has most permissions because it is simpler. Adding
the tasks user is something I'd like to get to in a follow up change.
Instead, the majority of this change is about moving the "origin"
concept from the security portion of x-pack into the server. This should
allow any code to use the origin. To keep the change managable I've also
opted to deprecate rather than remove the "origin" helpers in the
security code. Removing them is almost entirely mechanical and I'd like
to that in a follow up as well.
Relates to #35573
Clients can use the Kerberos V5 security mechanism and when it
used this to establish security context it failed to do so as
Elasticsearch server only accepted Spengo mechanism.
This commit adds support to accept Kerberos V5 credentials
over spnego.
Closes#34763
- Add the authentication realm and lookup realm name and type in the response for the _authenticate API
- The authentication realm is set as the lookup realm too (instead of setting the lookup realm to null or empty ) when no lookup realm is used.
This commit is related to #32517. It allows an "sni_server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. Prior to this commit, this functionality
was only support for the netty transport. This commit adds this
functionality to the security nio transport.
This commit adds a test for handling correctly all they possible
`SamlPrepareAuthenticationRequest` parameter combinations that
we might get from Kibana or a custom web application talking to the
SAML APIs.
We can match the correct SAML realm based either on the realm name
or the ACS URL. If both are included in the request then both need to
match the realm configuration.
This generates a synthesized "id" for each incoming request that is
included in the audit logs (file only).
This id can be used to correlate events for the same request (e.g.
authentication success with access granted).
This request.id is specific to the audit logs and is not used for any
other purpose
The request.id is consistent across nodes if a single request requires
execution on multiple nodes (e.g. search acros multiple shards).
When assertions are enabled, a Put User action that have no effect (a
noop update) would trigger an assertion failure and shutdown the node.
This change accepts "noop" as an update result, and adds more
diagnostics to the assertion failure message.
The RestHasPrivilegesAction previously handled its own XContent
generation. This change moves that into HasPrivilegesResponse and
makes the response implement ToXContent.
This allows HasPrivilegesResponseTests to be used to test
compatibility between HLRC and X-Pack internals.
A serialization bug (cluster privs) was also fixed here.
* The port assigned to all loopback interfaces doesn't necessarily have to be the same for ipv4 and ipv6
=> use actual address from profile instead of just port + loopback in test
* Closes#35584
Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR
are as follows:
- ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully
implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched
over as soon as that is fully implemented.
- Some very few integration tests are not running yet with Zen2 for other reasons, depending on
some of the other open points in #32006.
The DefaultAuthenticationFailureHandler has a deprecated constructor
that was present to prevent a breaking change to custom realm plugin
authors in 6.x. This commit removes the constructor and its uses.
For some time, the PutUser REST API has supported storing a pre-hashed
password for a user. The change adds validation and tests around that
feature so that it can be documented & officially supported.
It also prevents the request from containing both a "password" and a "password_hash".
Many realm tests were written to use separate setting objects for
"global settings" and "realm settings".
Since #30241 there is no distinction between these settings, so these
tests can be cleaned up to use a single Settings object.
This is related to #34483. It introduces a namespaced setting for
compression that allows users to configure compression on a per remote
cluster basis. The transport.tcp.compress remains as a fallback
setting. If transport.tcp.compress is set to true, then all requests
and responses are compressed. If it is set to false, only requests to
clusters based on the cluster.remote.cluster_name.transport.compress
setting are compressed. However, after this change regardless of any
local settings, responses will be compressed if the request that is
received was compressed.
- Introduces a transport API for bootstrapping a Zen2 cluster
- Introduces a transport API for requesting the set of nodes that a
master-eligible node has discovered and for waiting until this comprises the
expected number of nodes.
- Alters ESIntegTestCase to use these APIs when forming a cluster, rather than
injecting the initial configuration directly.
There is no longer a concept of non-global "realm settings". All realm
settings should be loaded from the node's settings using standard
Setting classes.
This change renames the "globalSettings" field and method to simply be
"settings".
The file realm has not supported custom filenames/locations since at
least 5.0, but this test still tried to configure them.
Remove all configuration of file locations, and cleaned up a few other
warnings and deprecations
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and red from `META-INF`.
This is related to #29023. Additionally at other points we have
discussed a preference for removing the need to unnecessarily block
threads for opening new node connections. This commit lays the groudwork
for this by opening connections asynchronously at the transport level.
We still block, however, this work will make it possible to eventually
remove all blocking on new connections out of the TransportService
and Transport.
This moves all Realm settings to an Affix definition.
However, because different realm types define different settings
(potentially conflicting settings) this requires that the realm type
become part of the setting key.
Thus, we now need to define realm settings as:
xpack.security.authc.realms:
file.file1:
order: 0
native.native1:
order: 1
- This is a breaking change to realm config
- This is also a breaking change to custom security realms (SecurityExtension)
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
Drops the `Settings` member from `AbstractComponent`, moving it from the
base class on to the classes that use it. For the most part this is a
mechanical change that doesn't drop `Settings` accesses. The one
exception to this is naming threads where it switches from an invocation
that passes `Settings` and extracts the node name to one that explicitly
passes the node name.
This change doesn't drop the `Settings` argument from
`AbstractComponent`'s ctor because this change is big enough as is.
We'll do that in a follow up change.
The native roles store previously would issue a search if attempting to
retrieve more than a single role. This is fine when we are attempting
to retrieve all of the roles to list them in the api, but could cause
issues when attempting to find roles for a user. The search is not
prioritized over other search requests, so heavy aggregations/searches
or overloaded nodes could cause roles to be cached as missing if the
search is rejected.
When attempting to load specific roles, we know the document id for the
role that we are trying to load, which allows us to use the multi-get
api for loading these roles. This change makes use of the multi-get api
when attempting to load more than one role by name. This api is also
constrained by a threadpool but the tasks in the GET threadpool should
be quicker than searches.
See #33205
SSLTrustRestrictionsTests.testRestrictionsAreReloaded checks that the
SSL trust configuration is automatically updated reapplied if the
underlying "trust_restrictions.yml" file is modified.
Since the default resource watcher frequency is 5seconds, it could
take 10 second to run that test (as it waits for 2 reloaded).
Previously this test set that frequency to a very low value (3ms) so
that the elapsed time for the test would be reduced. However this
caused other problems, including that the resource watcher would
frequently run while the cluster was shutting down and files were
being cleaned up.
This change resets that watch frequency back to its default (5s) and
then manually calls the "notifyNow" method on the resource watcher
whenever the restrictions file is modified, so that the SSL trust
configuration is reloaded at exactly the right time.
Resolves: #34502
In order to remove Streamable from the codebase, Response objects need
to be read using the Writeable.Reader interface which this change
enables. This change enables the use of Writeable.Reader by adding the
`Action#getResponseReader` method. The default implementation simply
uses the existing `newResponse` method and the readFrom method. As
responses are migrated to the Writeable.Reader interface, Action
classes can be updated to throw an UnsupportedOperationException when
`newResponse` is called and override the `getResponseReader` method.
Relates #34389
This is related to #30876. The AbstractSimpleTransportTestCase initiates
many tcp connections. There are normally over 1,000 connections in
TIME_WAIT at the end of the test. This is because every test opens at
least two different transports that connect to each other with 13
channel connection profiles. This commit modifies the default
connection profile used by this test to 6. One connection for each
type, except for REG which gets 2 connections.
* NETWORKING: Add SSL Handler before other Handlers
* The only way to run into the issue in #33998 is for `Netty4MessageChannelHandler`
to be in the pipeline while the SslHandler is not. Adding the SslHandler before any
other handlers should ensure correct ordering here even when we handle upstream events
in our own thread pool
* Ensure that channels that were closed concurrently don't trip the assertion
* Closes#33998
* Adding stack_monitoring_agent role
* Fixing checkstyle issues
* Adding tests for new role
* Tighten up privileges around index templates
* s/stack_monitoring_user/remote_monitoring_collector/ + remote_monitoring_user
* Fixing checkstyle violation
* Fix test
* Removing unused field
* Adding missed code
* Fixing data type
* Update Integration Test for new builtin user
This fixes a bug about aliases authorization.
That is, a user might see aliases which he is not authorized to see.
This manifests when the user is not authorized to see any aliases
and the `GetAlias` request is empty which normally is a marking
that all aliases are requested. In this case, no aliases should be
returned, but due to this bug, all aliases will have been returned.
JDK11 introduced some changes with the SSLEngine. A number of error
messages were changed. Additionally, there were some behavior changes
in regard to how the SSLEngine handles closes during the handshake
process. This commit updates our tests and SSLDriver to support these
changes.
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.
This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.
Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.
This is the fixed version of #34246, which was reverted.
Relates #33205
For user/_has_privileges and user/_privileges, handle the case where
there is no user in the security context. This is likely to indicate
that the server is running with a basic license, in which case the
action will be rejected with a non-compliance exception (provided
we don't throw a NPE).
The implementation here is based on the _authenticate API.
Resolves: #34567
The logfile audit log format is no longer formed by prefix fields followed
by key value fields, it is all formed by key value fields only (JSON format).
Consequently, the following settings, which toggled some of the prefix
fields, have been renamed:
audit.logfile .prefix.emit_node_host_address
audit.logfile .prefix.emit_node_host_name
audit.logfile .prefix.emit_node_name
This API is intended as a companion to the _has_privileges API.
It returns the list of privileges that are held by the current user.
This information is difficult to reason about, and consumers should
avoid making direct security decisions based solely on this data.
For example, each of the following index privileges (as well as many
more) would grant a user access to index a new document into the
"metrics-2018-08-30" index, but clients should not try and deduce
that information from this API.
- "all" on "*"
- "all" on "metrics-*"
- "write" on "metrics-2018-*"
- "write" on "metrics-2018-08-30"
Rather, if a client wished to know if a user had "index" access to
_any_ index, it would be possible to use this API to determine whether
the user has any index privileges, and on which index patterns, and
then feed those index patterns into _has_privileges in order to
determine whether the "index" privilege had been granted.
The result JSON is modelled on the Role API, with a few small changes
to reflect how privileges are modelled when multiple roles are merged
together (multiple DLS queries, multiple FLS grants, multiple global
conditions, etc).
This reverts commit 0b4e8db1d3 as some
issues have been identified with the changed handling of a primary
shard of the security index not being available.
The token service has fairly strict validation and there are a range
of reasons why request may be rejected.
The detail is typically returned in the client exception / json body
but the ES admin can only debug that if they have access to detailed
logs from the client.
This commit adds debug & trace logging to the token service so that it
is possible to perform this debugging from the server side if
necessary.
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.
This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.
Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.
Relates #33205
Security caches the result of role lookups and negative lookups are
cached indefinitely. In the case of transient failures this leads to a
bad experience as the roles could truly exist. The CompositeRolesStore
needs to know if a failure occurred in one of the roles stores in order
to make the appropriate decision as it relates to caching. In order to
provide this information to the CompositeRolesStore, the return type of
methods to retrieve roles has changed to a new class,
RoleRetrievalResult. This class provides the ability to pass back an
exception to the roles store. This exception does not mean that a
request should be failed but instead serves as a signal to the roles
store that missing roles should not be cached and neither should the
combined role if there are missing roles.
As part of this, the negative lookup cache was also changed from an
unbounded cache to a cache with a configurable limit.
Relates #33205
PR #34290 made it impossible to use thread-context values to pass
authentication metadata out of a realm. The SAML realm used this
technique to allow the SamlAuthenticateAction to process the parsed
SAML token, and apply them to the access token that was generated.
This new method adds metadata to the AuthenticationResult itself, and
then the authentication service makes this result available on the
thread context.
Closes: #34332
ListenableFuture may run a listener on the same thread that called the
addListener method or it may execute on another thread after the future
has completed. Whenever the ListenableFuture stores the listener for
execution later, it should preserve the thread context which is what
this change does.
Since all calls to `ESLoggerFactory` outside of the logging package were
deprecated, it seemed like it'd simplify things to migrate all of the
deprecated calls and declare `ESLoggerFactory` to be package private.
This does that.
The "lookupUser" method on a realm facilitates the "run-as" and
"authorization_realms" features.
This commit allows a realm to be used for "lookup only", in which
case the "authenticate" method (and associated token methods) are
disabled.
It does this through the introduction of a new
"authentication.enabled" setting, which defaults to true.
Building automatons can be costly. For the most part we cache things
that use automatons so the cost is limited.
However:
- We don't (currently) do that everywhere (e.g. we don't cache role
mappings)
- It is sometimes necessary to clear some of those caches which can
cause significant CPU overhead and processing delays.
This commit introduces a new cache in the Automatons class to avoid
unnecesarily recomputing automatons.
There may be values in the thread context that ought to be preseved
for later use, even if one or more realms perform asynchronous
authentication.
This commit changes the AuthenticationService to wrap the potentially
asynchronous calls in a ContextPreservingActionListener that retains
the original thread context for the authentication.
The Security plugin authorizes actions on indices. Authorization
happens on a per index/alias basis. Therefore a request with a
Multi Index Expression (containing wildcards) has to be
first evaluated in the authorization layer, before the request is
handled. For authorization purposes, wildcards in expressions will
only be expanded to indices/aliases that are visible by the authenticated
user. However, this "constrained" evaluation has to be compatible with
the expression evaluation that a cluster without the Security plugin
would do. Therefore any change in the evaluation logic
in any of these sites has to be mirrored in the other site.
This commit mirrors the changes in core from #33518 that allowed
for Multi Index Expression in the Get Alias API, loosely speaking.
When the cluster.routing.allocation.disk.watermark.flood_stage watermark
is breached, DiskThresholdMonitor marks the indices as read-only. This
failed when x-pack security was present as system user does not have the privilege
for update settings action("indices:admin/settings/update").
This commit adds the required privilege for the system user. Also added missing
debug logs when access is denied to help future debugging.
An assert statement is added to catch any missed privileges required for
system user.
Closes#33119
In SessionFactoryLoadBalancingTests#testRoundRobinWithFailures()
we kill ldap servers randomly and immediately bind to that port
connecting to mock server socket. This is done to avoid someone else
listening to this port. As the creation of mock socket and binding to the
port is immediate, sometimes the earlier socket would be in TIME_WAIT state
thereby having problems with either bind or connect.
This commit sets the SO_REUSEADDR explicitly to true and also sets
the linger on time to 0(as we are not writing any data) so as to
allow re-use of the port and close immediately.
Note: I could not find other places where this might be problematic
but looking at test runs and netstat output I do see lot of sockets
in TIME_WAIT. If we find that this needs to be addressed we can
wrap ServerSocketFactory to set these options and use that with in
memory ldap server configuration during tests.
Closes#32190
This commit upgrades the unboundid ldapsdk to version 4.0.8. The
primary driver for upgrading is a fix that prevents this library from
rewrapping Error instances that would normally bubble up to the
UncaughtExceptionHandler and terminate the JVM. Other notable changes
include some fixes related to connection handling in the library's
connection pool implementation.
Closes#33175
The `DnRoleMapper` class is used to map distinguished names of groups
and users to role names. This mapper builds in an internal map that
maps from a `com.unboundid.ldap.sdk.DN` to a `Set<String>`. In cases
where a lot of distinct DNs are mapped to roles, this can consume quite
a bit of memory. The majority of the memory is consumed by the DN
object. For example, a 94 character DN that has 9 relative DNs (RDN)
will retain 4KB of memory, whereas the String itself consumes less than
250 bytes.
In order to reduce memory usage, we can map from a normalized DN string
to a List of roles. The normalized string is actually how the DN class
determines equality with another DN and we can drop the overhead of
needing to keep all of the other objects in memory. Additionally the
use of a List provides memory savings as each HashSet is backed by a
HashMap, which consumes a great deal more memory than an appropriately
sized ArrayList. The uniqueness we get from a Set is maintained by
first building a set when parsing the file and then converting to a
list upon completion.
Closes#34237
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#29989
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#32293
`Settings` is no longer required to get a `Logger` and we went to quite
a bit of effort to pass it to the `Logger` getters. This removes the
`Settings` from all of the logger fetches in security and x-pack:core.
Security previously hardcoded a default scroll keepalive of 10 seconds,
but in some cases this is not enough time as there can be network
issues or overloading of host machines. After this change, security
will now use the default keepalive timeout, which is controllable using
a setting and the default value is 5 minutes.
In order to optimize the use of the role cache, when the roles.yml file
is reloaded we now calculate the names of removed, changed, and added
roles so that they may be passed to any listeners. This allows a
listener to selectively clear cache for only the roles that have been
modified. The CompositeRolesStore has been adapted to do exactly that
so that we limit the need to reload roles from sources such as the
native roles stores or external role providers.
See #33205
This change cleans up "unused variable" warnings. There are several cases were we
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
* TESTS: Stabilize Renegotiation Test
* The second `startHandshake` is not synchronous and a read of only
50ms may fail to trigger it entirely (the failure can be reproduced reliably by setting the socket timeout to `1`)
=> fixed by retrying the read until the handshake finishes (a longer timeout would've worked too,
but retrying seemed more stable)
* Closes#33772
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
SearchGroupsResolverInMemoryTests was (rarely) fail in a way that
suggests that the server-side delay (100ms) was not enough to trigger
the client-side timeout (5ms).
The server side delay has been increased to try and overcome this.
Resolves: #32913
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
Changes the format of log events in the audit logfile.
It also changes the filename suffix from `_access` to `_audit`.
The new entry format is consistent with Elastic Common Schema.
Entries are formatted as JSON with no nested objects and field
names have a dotted syntax. Moreover, log entries themselves
are not spaced by commas and there is exactly one entry per line.
In addition, entry fields are ordered, unlike a typical JSON doc,
such that a human would not strain his eyes over jumbled
fields from one line to the other; the order is defined in the log4j2
properties file.
The implementation utilizes the log4j2's `StringMapMessage`.
This means that the application builds the log event as a map
and the log4j logic (the appender's layout) handle the format
internally. The layout, such as the set of printed fields and their
order, can be changed at runtime without restarting the node.
We have a Kerberos setting to remove realm part from the user
principal name (remove_realm_name). If this is true then
the realm name is removed to form username but in the process,
the realm name is lost. For scenarios like Kerberos cross-realm
authentication, one could make use of the realm name to determine
role mapping for users coming from different realms.
This commit adds user metadata for kerberos_realm and
kerberos_user_principal_name.
We have a test dependency on Apache Mina when using SimpleKdcServer
for testing Kerberos. When checking for LDAP backend connectivity,
the code checks for deadlocks which require additional security
permissions accessClassInPackage.sun.reflect. As this is only for
test and we do not want to add security permissions to production,
this commit moves these tests and related classes to
x-pack evil-tests where they can run with security manager disabled.
The plan is to handle the security manager exception in the upstream issue
DIRMINA-1093
and then once the release is available to run these tests with security
manager enabled.
Closes#32739
This change removes the wrapping of the created field in the put user
response. The created field was added as a top level field in #32332,
while also still being wrapped within the `user` object of the
response. Since the value is available in both formats in 6.x, we can
remove the wrapped version for 7.0.
Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`,
in many integration tests, to deal with the dynamic nature of the allocation of
ports to nodes. However #33241 allows us to use file-based discovery to achieve
the same goal, so the special test-only `MockUncasedHostsProvider` is no longer
required.
This change removes `MockUncasedHostProvider` and replaces it with file-based
discovery in tests based on `EsIntegTestCase`.
This change addresses some issues regarding thread safety around
updates and method calls on the XPackLicenseState object. There exists
a possibility that there could be a concurrent update to the
XPackLicenseState when there is a scheduled check to see if the license
is expired and a cluster state update. In order to address this, the
update method now has a synchronized block where member variables are
updated. Each method that reads these variables is now also
synchronized.
Along with the above change, there was a consistency issue around
security calls to the license state. The majority of security checks
make two calls to the license state, which could result in incorrect
behavior due to the checks being made against different license states.
The majority of this behavior was introduced for 6.3 with the inclusion
of x-pack in the default distribution. In order to resolve the majority
of these cases, the `isSecurityEnabled` method is no longer public and
the logic is also included in individual methods about security such as
`isAuthAllowed`. There were a few cases where this did not remove
multiple calls on the license state, so a new method has been added
which creates a copy of the current license state that will not change.
Callers can use this copy of the license state to make decisions based
on a consistent view of the license state.