The changes made to disable security for trial licenses unless security
is explicitly enabled caused issues when a 6.3 node attempts to join a
cluster that already has a production license installed. The new node
starts off with a trial license and `xpack.security.enabled` is not
set for the node, which causes the security code to skip attaching the
user to the request. The existing cluster has security enabled and the
lack of a user attached to the request causes it to be
rejected.
This commit changes the security code to check if the state has been
recovered yet when making the decision on whether or not to attach a
user. If the state has not yet been recovered, the code will attach
the user to the request in case security is enabled on the cluster
being joined.
Closes #31332
This is a general cleanup of channels and exception handling in http.
This commit introduces a CloseableChannel that is a superclass of
TcpChannel and HttpChannel. This allows us to unify the closing logic
between tcp and http transports. Additionally, the normal http channels
are extracted to the abstract server transport.
Finally, this commit (mostly) unifies the exception handling between nio
and netty4 http server transports.
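As a rough sketch of the shared contract this unification implies (the method names here are assumptions, not necessarily the exact API):
```
import java.io.Closeable;

import org.elasticsearch.action.ActionListener;

// Rough sketch (assumed shape) of the closing contract shared by tcp and http channels.
public interface CloseableChannel extends Closeable {

    // Notified once the channel has been fully closed.
    void addCloseListener(ActionListener<Void> listener);

    // Whether the channel is currently open.
    boolean isOpen();

    // Closing must be idempotent and must eventually fire the close listeners.
    @Override
    void close();
}
```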
Currently, when we open a new channel, we pass it an
InboundChannelBuffer. The channel buffer is preallocated a single 16kb
page. However, there is no guarantee that this channel will be read from
anytime soon. Instead, this commit does not preallocate that page. That
page will be allocated when we receive a read event.
Since #30966, Action no longer has anything but a call to the
GenericAction super constructor. This commit removes the now-empty
Action class and renames GenericAction to Action. Additionally, this
commit removes the Request generic parameter of the class, since
it was unused.
This commit makes it so that cluster state update tasks always run under the system context, only
restoring the original context when the listener that was provided with the task is called. A notable
exception is the clusterStatePublished(...) callback which will still run under system context,
because it's defined on the executor-level, and not the task level, and only called once for the
combined batch of tasks and can therefore not be uniquely identified with a task / thread context.
Relates #30603
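As a rough illustration (not the exact code; `threadPool`, `task`, `listener`, and `result` are placeholders), running a task under the system context while restoring the caller's context for the listener looks roughly like:
```
ThreadContext threadContext = threadPool.getThreadContext();
// Capture the caller's context so the task's listener can later be notified under it.
Supplier<ThreadContext.StoredContext> callerContext = threadContext.newRestorableContext(false);
try (ThreadContext.StoredContext ignore = threadContext.stashContext()) {
    threadContext.markAsSystemContext();
    runClusterStateUpdateTask(task);        // hypothetical: executes under the system context
}
// Later, when notifying the listener that was provided with the task:
try (ThreadContext.StoredContext ignore = callerContext.get()) {
    listener.onResponse(result);            // the listener sees the original caller context
}
```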
This is related to #28898. This PR implements pooling of bytes arrays
when reading from the wire in the http server transport. In order to do
this, we must integrate with netty reference counting. The way in
which this PR implements this is by making Pages in InboundChannelBuffer
reference counted. When we access the underlying page to pass to
netty, we retain the page. When netty releases its bytebuf, it releases
the underlying pages we have passed to it.
This is related to #31325. There is currently information about the
get-trial-status api on the start-trial api documentation page. It also
has the incorrect route for that api. This commit removes that
information as the start-trial page properly links to a page providing
documentation about get-trial-status.
This pull request removes the relationship between the state
of a persistent task (as stored in the cluster state) and the status
of the task (as reported by the Task APIs and used in various
places), which has been confusing for some time (#29608).
In order to do that, a new PersistentTaskState interface is added.
This interface represents the persisted state of a persistent task.
The methods used to update the state of persistent tasks are
renamed: updatePersistentStatus() becomes updatePersistentTaskState()
and now takes a PersistentTaskState as a parameter. The
Task.Status type has been changed to PersistentTaskState in all
places where it makes sense (in persistent task customs in cluster
state and all other methods that deal with the state of an allocated
persistent task).
This is related to #28898. With the addition of the http nio transport,
we now have two different modules that provide http transports.
Currently most of the http logic lives at the module level. However,
some of this logic can live in server. In particular, some of the
logic for setting headers, cors, and pipelining. This commit begins moving
in that direction by introducing lower level abstractions (HttpChannel,
HttpRequest, and HttpResponse) that are implemented by the modules. The
higher level rest request and rest channel work can live entirely in
server.
For 6.3 we renamed the `tar` and `zip` distributions to `oss-tar` and
`oss-zip`. Then we added new `tar` and `zip` distributions that contain
x-pack and are licensed under the Elastic License. Unfortunately we
accidentally generated POM files along side the new `tar` and `zip`
distributions that incorrectly claimed that they were Apache 2 licensed.
Oooops.
This fixes the license on the POMs generated for the `tar` and `zip`
distributions.
This adds a `description` to ML filters in order
to allow users to describe their filters in a human
readable form which is also editable (filter updates
to be added shortly).
Due to a runtime classpath clash, the featureAware task was failing on
JVMs higher than 1.8, since the ASM version from Painless was used
instead, which does not recognize Java 9 or 10 bytecode.
This commit excludes the ASM dependency (since it's not used by SQL
itself).
We currently have a specific REST action to retrieve all aliases, which
internally uses the get index API. This doesn't seem to be required
anymore though, as the existing RestGetAliasesAction could just as well
handle requests with no indices and aliases specified.
This commit removes the RestGetAllAliasesAction in favour of using
RestGetAliasesAction also for requests that don't specify indices nor
aliases. Similar to #31129.
x-pack/sql depends on lang-painless, which depends on ASM 5.1, while
FeatureAwareCheck needs ASM 6.
This is a hack to strip ASM 5 from the classpath for FeatureAwareCheck.
This is related to #27260. Currently when we queue a write with a
channel we set OP_WRITE and wait until the next selection loop to flush
the write. However, if the channel does not have a pending write, it
is probably ready to flush. This PR implements optimistic flush logic
that will attempt this flush.
- All rollup pages should be marked as experimental instead of just
the top page
- While the job config docs state which aggregations are allowed, adding
a section which specifically details this in one place is more convenient
for the user
- Add a clarification that the DeleteJob API does not delete the rollup
data, just the rollup job.
The parser for the Metric config was directly instantiating
the config object, rather than using the builder. That means it was
bypassing the validation logic built into the builder, and would allow
users to create invalid metric configs (like using unsupported metrics).
The job would later blow up and abort due to bad configs, but this isn't
immediately obvious to the user since the PutJob API succeeded.
This change prevents a datafeed using cross cluster search from starting if the remote cluster
does not have x-pack installed and a sufficient license. The check is made only when starting a
datafeed.
Rules allow users to supply a detector with domain
knowledge that can improve the quality of the results.
The model detects statistically anomalous results but it
has no knowledge of the meaning of the values being modelled.
For example, a detector that performs a population analysis
over IP addresses could benefit from a list of IP addresses
that the user knows to be safe. Then anomalous results for
those IP addresses will not be created and will not affect
the quantiles either.
Another example would be a detector looking for anomalies
in the median value of CPU utilization. A user might want
to inform the detector that any results where the actual
value is less than 5 are not interesting.
This commit introduces a `custom_rules` field to the `Detector`.
A detector may have multiple rules which are combined with `or`.
A rule has 3 fields: `actions`, `scope` and `conditions`.
Actions is a list of what should happen when the rule applies.
The current options include `skip_result` and `skip_model_update`.
The default value for `actions` is the `skip_result` action.
Scope is optional and allows for applying filters on any of the
partition/over/by fields. When not defined, the rule applies to
all series. The `filter_id` needs to be specified to match the id
of the filter to be used. Optionally, the `filter_type` can be specified
as either `include` (default) or `exclude`. When set to `include`
the rule applies to entities that are in the filter. When set to
`exclude` the rule only applies to entities not in the filter.
There may be zero or more conditions. A condition requires `applies_to`,
`operator` and `value` to be specified. The `applies_to` value can be
either `actual`, `typical` or `diff_from_typical` and it specifies
the numerical value to which the condition applies. The `operator`
(`lt`, `lte`, `gt`, `gte`) and `value` complete the definition.
Conditions are combined with `and` and allow specifying numerical
conditions for when a rule applies.
A rule must either have a scope or one or more conditions. Finally,
a rule with scope and conditions applies when all of them apply.
TransportAction has many variants of execute. One of those variants
executes by returning a future, which is then often blocked on by
calling get(). This commit removes this variant of execute, instead
using a helper method for tests that want to block, or having tests
pass in a PlainActionFuture directly as a listener.
Co-authored-by: Simon Willnauer <simonw@apache.org>
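For illustration, a test that previously blocked on the returned future can pass a PlainActionFuture directly as the listener instead; the action and request used below are just placeholders:
```
// A PlainActionFuture is an ActionListener, so it can be handed straight to execute().
PlainActionFuture<ClusterHealthResponse> future = PlainActionFuture.newFuture();
client.execute(ClusterHealthAction.INSTANCE, new ClusterHealthRequest(), future);

// Block until the response arrives (test-only pattern).
ClusterHealthResponse response = future.actionGet();
```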
This is in preparation of pushing the new
rules design in the `ml-cpp` side. These
tests will be switched on again after merging
in the new rules implementation.
* Support RequestedAuthnContext
This implements limited support for RequestedAuthnContext by :
- Allowing SP administrators to define a list of authnContextClassRef
to be included in the RequestedAuthnContext of a SAML Authn Request
- Verifying that the authnContext in the incoming SAML Assertion's
AuthnStatement contains one of the requested authnContextClassRef
- Only EXACT comparison is supported as the semantics of validating
the incoming authnContextClassRef are deployment dependent and
require pre-established rules for MINIMUM, MAXIMUM and BETTER
Also adds necessary AuthnStatement validation as indicated by [1] and
[2]
[1] https://docs.oasis-open.org/security/saml/v2.0/saml-core-2.0-os.pdf
3.4.1.4, line 2250-2253
[2] https://kantarainitiative.github.io/SAMLprofiles/saml2int.html
[SDP-IDP10]
Trying to post a new watch without any body currently results in a
NullPointerException. This change fixes that by validating that
Post and Put requests always have a body.
Closes #30057
Allows users of the Low Level REST client to specify which hosts a
request should be run on. They implement the `NodeSelector` interface
or reuse a built-in selector like `NOT_MASTER_ONLY` to choose which nodes
are valid. Using it looks like:
```
Request request = new Request("POST", "/foo/_search");
RequestOptions.Builder options = request.getOptions().toBuilder();
options.setNodeSelector(NodeSelector.NOT_MASTER_ONLY);
request.setOptions(options);
...
```
This introduces a new `Node` object which contains a `HttpHost` and the
metadata about the host. At this point that metadata is just `version`
and `roles` but I plan to add node attributes in a followup. The
canonical way to **get** this metadata is to use the `Sniffer` to pull
the information from the Elasticsearch cluster.
I've marked this as "breaking-java" because it breaks custom
implementations of `HostsSniffer` by renaming the interface to
`NodesSniffer` and by changing it from returning a `List<HttpHost>` to a
`List<Node>`. It *shouldn't* break anyone else though.
Because we expect to find it useful, this also implements `host_selector`
support to `do` statements in the yaml tests. Using it looks a little
like:
```
---
"example test":
  - skip:
      features: host_selector
  - do:
      host_selector:
        version: " - 7.0.0" # same syntax as skip
      apiname:
        something: true
```
The `do` section parses the `version` string into a host selector that
uses the same version comparison logic as the `skip` section. When the
`do` section is executed it passes the selector off to the `RestClient`, using
the `ElasticsearchHostsSniffer` to sniff the required metadata.
The idea is to use this in mixed version tests to target a specific
version of Elasticsearch so we can be sure about the deprecation
logging though we don't currently have any examples that need it. We do,
however, have at least one open pull request that requires something
like this to properly test it.
Closes #21888
This commit upgrades us to Netty 4.1.25. This upgrade is more
challenging than past upgrades, all because of a new object cleaner
thread that they have added. This thread requires an additional security
permission (set context class loader, needed to avoid leaks in certain
scenarios). Additionally, there is not a clean way to shutdown this
thread which means that the thread can fail thread leak control during
tests. As such, we have to filter this thread from thread leak control.
The core REST tests with security currently use a hardcoded username and
password. This is not amenable to running these tests in scenarios where
the user controls the creation of the cluster and owns the credentials
for this cluster. This commit enables running the core REST tests with
security with a custom username and password.
The goal of this commit is to address unknown licenses when producing
the dependencies info report. We have two different checks that we run
on licenses. The first check is whether or not we have stashed a copy of
the license text for a dependency in the repository. The second is to
map every dependency to a license type (e.g., BSD 3-clause). The problem
here is that the way we were handling licenses in the second check
differs from how we handle licenses in the first check. The first check
works by finding a license file with the name of the artifact followed
by the text -LICENSE.txt. Yet in some cases we allow mapping an artifact
name to another name used to check for the license (e.g., we map
lucene-.* to lucene, and opensaml-.* to shibboleth). The second check
understood the first way of looking for a license file but not the
second way. So in this commit we teach the second check about the
mappings from artifact names to license names. We do this by copying the
configuration from the dependencyLicenses task to the dependenciesInfo
task and then reusing the code from the first check in the second
check. There were some other challenges here though. For example,
dependenciesInfo was checking too many dependencies. For now, we should
only be checking direct dependencies and leaving transitive dependencies
from another org.elasticsearch artifact to that artifact (we want to do
this differently in a follow-up). We also want to disable
dependenciesInfo for projects that we do not publish; users only care
about licenses they might be exposed to if they use our assembled
products. With all of the changes in this commit we have eliminated all
unknown licenses. A follow-up will enforce that when we add a new
dependency it does not get mapped to unknown; these will be forbidden in
the future. Therefore, with this change and earlier changes we are left
with no unknown licenses and two custom licenses; custom here means it
does not map to an SPDX license type. Those two licenses are xz and
ldapsdk. A future change will not allow additional custom licenses
unless they are explicitly whitelisted. This ensures that if a new
dependency is added it is mapped to an SPDX license or mapped to custom
because it does not have an SPDX license.
* Remove DocumentFieldMappers#simpleMatchToFullName, as it is duplicative of MapperService#simpleMatchToIndexNames.
* Rename MapperService#simpleMatchToIndexNames -> simpleMatchToFullName for consistency.
* Simplify EsIntegTestCase#assertConcreteMappingsOnAll to accept concrete fields instead of wildcard patterns.
Make SAML Response Destination check compliant
Only validate the Destination element of an incoming SAML Response
if Destination is present and the SAML Response is signed.
The standard ([1] 3.5.5.2 and [2] 3.2.2) mentions that the
Destination element is optional and should only be verified when
the SAML Response is signed. Some Identity Provider implementations
are known to not set a Destination XML Attribute in their SAML
responses when those are not signed, so this change also aims to
enhance interoperability.
[1] https://docs.oasis-open.org/security/saml/v2.0/saml-bindings-2.0-os.pdf
[2] https://docs.oasis-open.org/security/saml/v2.0/saml-core-2.0-os.pdf
Use all running nodes as unicast seeds in the rolling restart tests to
avoid a race between pinging and the tests. Without this if the tests
are too fast then when a new node comes up and pings its single
configured seed node that node *might* not have a ping from the other
running node.
This is related to #27260 and #28898. This commit adds the transport-nio
plugin as a random option when running the http smoke tests. As part of
this PR, I identified an issue where cors support was not properly
enabled causing these tests to fail when using transport-nio. This
commit also fixes that issue.
This commit adjusts the indentation in the CLI scripts to give a clear
visual indication that the line being indented is a continuation of the
previous line.
SSLTrustRestrictionsTests updates the restrictions YML file during the test run to change the set of restrictions. This update was small, but it wasn't atomic.
If the yml file is reloaded while empty or invalid, then it causes all SSL certificates to be considered invalid (until it is reloaded again), which could break the sniffing/administrative client that runs underneath the tests.
A previous refactoring of the CLI scripts migrated all of the CLI tools
to shell to a common script, elasticsearch-cli. This approach is fine in
Bash where it is easy to tear arguments apart but it doesn't work so
well on Windows where quoting is insane. To avoid having to tear the
arguments apart to separate the first argument to elasticsearch-cli from
the remaining arguments, we instead choose a strategy where we can avoid
tearing the arguments apart. To do this, we will instead pass the main
class by an environment variable and then we can pass the arguments
straight through. This will let us avoid awful quoting issues on
Windows. This is the Windows side of that effort and the Bash side was
in a previous commit.
A previous refactoring of the CLI scripts migrated all of the CLI tools
to shell to a common script, elasticsearch-cli. This approach is fine in
Bash where it is easy to tear arguments apart but it doesn't work so
well on Windows where quoting is insane. To avoid having to tear the
arguments apart to separate the first argument to elasticsearch-cli from
the remaining arguments, we instead choose a strategy where we can avoid
tearing the arguments apart. To do this, we will instead pass the main
class by an environment variable and then we can pass the arguments
straight through. This will let us avoid awful quoting issues on
Windows. This is the non-Windows side of that effort and the Windows
side will be in a follow-up.
This is related to #27260. This commit combines the AcceptingSelector
and SocketSelector classes into a single NioSelector. This change
allows the same selector to handle both server and socket channels. This
is valuable as we do not necessarily want a dedicated thread running for
accepting channels.
With this change, this commit removes the configuration for dedicated
accepting selectors for the normal transport class. The accepting
workload for new node connections is likely low, meaning that there is
no need to dedicate a thread to this process.
I pushed a test that `assertBusy`s for a whole hour accidentally. I was
testing something and forgot to revert my local hack but caught it on
backport. This removes it.
This is much more realistic and can find more issues. This causes the
"mixed cluster" tests to be run twice so I had to fix the tests to work
in that case. In most cases I did as little as possible to get them
working but in a few cases I went a little beyond that to make them
easier for me to debug while getting them to work. My test changes:
1. Remove the "basic indexing" tests and replace them with a copy of the
tests used in the OSS. We have no way of sharing code between these two
projects so for now I copy.
2. Skip a few tests in the "one third" upgraded scenario:
* creating a scroll to be reused when the cluster is fully upgraded
* creating some ml data to be used when the cluster is fully upgraded
3. Drop many "assert yellow and that the cluster has two nodes"
assertions. These assertions duplicate those made by the wait condition
and they fail now that we have three nodes.
4. Switch many "assert green and that the cluster has two nodes" to 3
nodes. These assertions are unique from the wait condition and, while
I imagine they aren't required in all cases, now is not the time to
find that out. Thus, I made them work.
5. Rework the index audit trail test so it is more obvious that it is
the same test expecting different numbers based on the shape of the
cluster. The conditions for which number are expected are fairly
complex because the index audit trail is shut down until the template
for it is upgraded and the template is upgraded when a master node is
elected that has the new version of the software.
6. Add some more information to debug the index audit trail test because
it helped me figure out what was going on.
I also dropped the `waitCondition` from the `rolling-upgrade-basic`
tests because it wasn't needed.
Closes #25336
The native realm's usage stats were previously pulled from the cache,
which only contains the number of users that had authenticated in the
past 20 minutes. This commit changes this so that we pull the current
value from the security index by executing a search request. In order
to support this, the usage stats for realms are now asynchronous so that
we do not block while waiting on the search to complete.
The index audit trail allows the template index settings to be
overridden with settings specified in the conf file.
A bug will manifest when such conf file settings are specified
for templates that need to be upgraded. The bug is an endless
upgrade loop because the upgrade, although successful, is
not reckoned as such by the upgrade service.
Move the `fingerprint`, `pattern` and `standard_html_strip` analyzers
to analysis-common module. (both AnalysisProvider and PreBuiltAnalyzerProvider)
Changed PreBuiltAnalyzerProviderFactory to extend from PreConfiguredAnalysisComponent and
changed to make sure that predefined analyzers are always instantiated with the current
ES version and if an instance is requested for a different version then delegate to PreBuiltCache.
This is similar to the behaviour that exists today in AnalysisRegistry.PreBuiltAnalysis and
PreBuiltAnalyzerProviderFactory. (#31095)
Relates to #23658
This commit adds a check that any class in X-Pack that is a feature
aware custom also implements the appropriate mix-in interface in
X-Pack. These interfaces provide a default implementation of
FeatureAware#getRequiredFeature that returns that x-pack is the required
feature. By implementing this interface, this gives a consistent way for
X-Pack feature aware customs to return the appropriate required feature
and this check enforces that all such feature aware customs return the
appropriate required feature.
We should not allow the user to configure index patterns that also match
the index which stores the rollup documents.
For example, it is quite natural for a user to specify `metricbeat-*`
as the index pattern, and then store the rollups in `metricbeat-rolled`.
This will start throwing errors as soon as the rollup index is created
because the indexer will try to search it.
Note: this does not prevent the user from matching against existing
rollup indices. That should be prevented by the field-level validation
during job creation.
ObjectParser should throw XContentParseExceptions, not IAE. A dedicated parsing
exception can include the place where the error occurred.
Closes #30605
This snapshot includes:
- LUCENE-8341: Record soft deletes in SegmentCommitInfo which will resolve #30851
- LUCENE-8335: Enforce soft-deletes field up-front
Extends ActionRequestValidationException with a rollup-specific version
to make it easier to handle mapping validation issues on the client
side.
The type will now be `rollup_action_request_validation_exception`
instead of `action_request_validation_exception`
The majority of Responses inheriting from AcknowledgedResponse implement
the readFrom and writeTo serialization methods in the same way. Moving this
as a default into AcknowledgedResponse and letting the few exceptions that
need a slightly different implementation handle this themselves saves a lot
of duplication.
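A minimal sketch of what that shared default looks like (field and method bodies simplified; the exact class may differ slightly):
```
import java.io.IOException;

import org.elasticsearch.action.ActionResponse;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;

public class AcknowledgedResponse extends ActionResponse {

    protected boolean acknowledged;

    public boolean isAcknowledged() {
        return acknowledged;
    }

    @Override
    public void readFrom(StreamInput in) throws IOException {
        super.readFrom(in);
        acknowledged = in.readBoolean();    // the serialization most subclasses repeated
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        super.writeTo(out);
        out.writeBoolean(acknowledged);
    }
}
```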
With #31020 we introduced the ability for transport clients to indicate what features they support
in order to make sure we don't serialize objects to them that they don't support. This PR adapts the
serialization logic of persistent tasks to be aware of those features and not serialize tasks that
aren't supported.
Also, a version check is added for the future where we may add new task implementations and
need to be able to indicate they shouldn't be serialized both to nodes and clients.
As the implementation relies on the interface of `PersistentTaskParams`, these are no longer
optional. That's acceptable as all current implementations have them and we plan to make
`PersistentTaskParams` more central in the future.
Relates to #30731
If you invoke elasticsearch-plugin (or any other CLI script on Windows)
with a path that has a percent-encoded space (or any other
percent-encoded character) because the CLI scripts now shell into a
common shell script (elasticsearch-cli) the percent-encoded space ends
up being interpreted as a parameter. For example passing install --batch
file:/c:/encoded%20space/analysis-icu-7.0.0.zip to elasticsearch-plugin
leads to the %20 being interpreted as %2 followed by a zero. Here, the
%2 is interpreted as the second parameter (--batch) and the
InstallPluginCommand class ends up seeing
file:/c/encoded--batch0space/analysis-icu-7.0.0.zip as the path which
will not exist. This commit addresses this by escaping the %* that is
used to pass the parameters to the common CLI script so that the common
script sees the correct parameters without the %2 being substituted.
This commit introduces the ability for a client to communicate to the
server features that it can support and for these features to be used in
influencing the decisions that the server makes when communicating with
the client. To this end we carry the features from the client to the
underlying stream as we carry the version of the client today. This
enables us to enhance the logic where we make protocol decisions on the
basis of the version on the stream to also make protocol decisions on
the basis of the features on the stream. With such functionality, the
client can communicate to the server if it is a transport client, or if
it has, for example, X-Pack installed. This enables us to support
rolling upgrades from the OSS distribution to the default distribution
without breaking client connectivity as we can now elect to serialize
customs in the cluster state depending on whether or not the client
reports to us using the feature capabilities that it can understand these
customs. This means that we would avoid sending a client pieces of the
cluster state that it can not understand. However, we want to take care
and always send the full cluster state during node-to-node communication
as otherwise we would end up with different understanding of what is in
the cluster state across nodes depending on which features they reported
to have. This is why when deciding whether or not to write out a custom
we always send the custom if the client is not a transport client and
otherwise do not send the custom if the client is a transport client that
does not report to have the feature required by the custom.
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Otherwise we could end up with persistent tasks metadata in the cluster that some of the nodes
might not understand in the case where the cluster is in the middle of a rolling upgrade from the
default 6.2 to the default 6.3 distribution.
Follow-up to #30743
In case an error is returned when calling search_shards on a remote
cluster, which will lead to throwing an exception in the coordinating
node, we should make sure that the status code returned by the
coordinating node is the same as the one returned by the remote
cluster. Up until now a 500 - Internal Server Error was always
returned. This commit changes this behaviour so that for instance if an
index is not found, which causes a 404, a 404 is also returned by the
coordinating node to the client.
Closes #27461
* Retain the expiryDate for trial licenses
While updating the license signature to the new license spec, retain
the trial license expiration date as that of the existing license.
Resolves #30882
Although elasticsearch-certutil generates PKCS#12
files which are usable as both keystore and truststore
this is uncommon in practice. Settle these expectations
for the users following our security guides.
This modifies the high level rest client to allow calling code to
customize per request options for the bulk API. You do the actual
customization by passing a `RequestOptions` object to the API call
which is set on the `Request` that is generated by the high level
client. It also makes the `RequestOptions` a thing in the low level
rest client. For now that just means you use it to customize the
headers and the `httpAsyncResponseConsumerFactory` and we'll add
node selectors and per request timeouts in a follow up.
I only implemented this on the bulk API because it is the first one
in the list alphabetically and I wanted to keep the change small
enough to review. I'll convert the remaining APIs in a followup.
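As a hedged sketch (the builder method names and the header value are assumptions based on the description above), customizing a single bulk call looks roughly like:
```
// Start from the defaults and tweak them for this one call.
RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
builder.addHeader("X-Opaque-Id", "nightly-bulk-job");   // hypothetical header value
builder.setHttpAsyncResponseConsumerFactory(
    new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(30 * 1024 * 1024));

BulkResponse response = highLevelClient.bulk(bulkRequest, builder.build());
```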
* Ensure that a purposefully wrong key is used
Uses a specific keypair for tests that require a purposefully wrong
keypair instead of selecting one randomly from the same pull from
which the correct one is selected. Entropy is low because of the
small space and the same key can be randomly selected as both the
correct one and the wrong one, causing the tests to fail.
The purposefully wrong key is also used in
testSigningKeyIsReloadedForEachRequest and needs to be cleaned
up afterwards so the rest of the tests don't use that for signing.
Resolves #30970
This commit removes the RequestBuilder generic type from Action. It was
needed to be used by the newRequest method, which in turn was used by
client.prepareExecute. Both of these methods are now removed, along with
the existing users of prepareExecute constructing the appropriate
builder directly.
This commit reworks the Sniffer component to simplify it and make it possible to test it.
In particular, it no longer takes out the host that failed when sniffing on failure, but rather relies on whatever the cluster returns. This is the result of some valid comments from #27985. Taking out one single host is too naive, hard to test and debug.
A new Scheduler abstraction is introduced to abstract the tasks scheduling away and make it possible to plug in any test implementation and take out timing aspects when testing.
Concurrency aspects have also been improved, synchronized methods are no longer required. At the same time, we were able to take #27697 and #25701 into account and fix them, especially now that we can more easily add tests.
Last but not least, unit tests are added for the Sniffer component, long overdue.
Closes #27697 Closes #25701
This code is from an Apache 2.0 licensed codebase and when we imported
it into our codebase it carried the Apache 2.0 license as well. However,
during the migration of the X-Pack codebase from the internal private
repository to the elastic/elasticsearch repository, the migration tool
mistakenly changed the license on this source file from the Apache 2.0
license to the Elastic license. This commit addresses this mistake by
reapplying the Apache 2.0 license.
Currently failures to compile a script usually lead to a ScriptException, which
inherits the 500 INTERNAL_SERVER_ERROR from ElasticsearchException if it does
not contain another root cause. Instead, this should be a 400 Bad Request error.
This PR changes this more generally for script compilation errors by changing
ScriptException to return 400 (bad request) as status code.
Closes #12315
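Conceptually the change boils down to overriding the status reported by the exception, roughly:
```
// Sketch of the relevant override in ScriptException.
@Override
public RestStatus status() {
    return RestStatus.BAD_REQUEST;   // 400 instead of the inherited 500
}
```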
This commit renames methods in the PersistentTasksService, to
make obvious that the methods send requests in order to change
the state of persistent tasks.
Relates to #29608.
ML has dedicated APIs for datafeeds and jobs, yet base test classes and
some tests were relying on the cluster state for this information. This commit
removes this usage in favor of using the dedicated endpoints.
Limits the scope of the runtime dependency on
BouncyCastle so that it can be eventually removed.
* Splits functionality related to reading and generating certificates
and keys in two utility classes so that reading certificates and
keys doesn't require BouncyCastle.
* Implements a class for parsing PEM Encoded key material (which also
adds support for reading PKCS8 encoded encrypted private keys).
* Removes BouncyCastle dependency for all of our test suites (except
for the tests that explicitly test certificate generation) by using
pre-generated keys/certificates/keystores.
Removes the last remaining server dependencies from jdbc client. In order to do that it introduces the new project sql-shared-proto that contains only XContent-serializable classes. HTTP Client and JDBC now depend only on sql-shared-proto. I had to keep the original sql-proto project since it is used as a dependency by sql-cli and security integration tests.
Relates #29856
This is a bug that was identified by the kibana team. Currently on a
get-license call we do not serialize the hard-coded expiration for basic
licenses. However, the kibana team calls the x-pack info route which
still does serialize the expiration date. This commit removes that
serialization in the rest response.
Currently nio and netty modules use the CompletableFuture class for
managing listeners. This is unfortunate as that class accepts
Throwable. This commit adds a class CompletableContext that wraps
the CompletableFuture but does not accept Throwable. This allows the
modification of netty and nio logic to no longer handle Throwable.
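A rough sketch of the wrapper's shape (assumed and simplified to the listener and completion methods):
```
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Wraps a CompletableFuture but only ever exposes Exception to callers.
public class CompletableContext<T> {

    private final CompletableFuture<T> delegate = new CompletableFuture<>();

    public void addListener(BiConsumer<T, ? super Exception> listener) {
        delegate.whenComplete((value, throwable) -> {
            if (throwable == null || throwable instanceof Exception) {
                listener.accept(value, (Exception) throwable);
            } else {
                // Errors are not something callers should handle; let them escape loudly.
                throw new AssertionError("unexpected error", throwable);
            }
        });
    }

    public boolean complete(T value) {
        return delegate.complete(value);
    }

    public boolean completeExceptionally(Exception e) {
        return delegate.completeExceptionally(e);
    }
}
```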
This commit reworks the way our realms perform caching in order to
limit each principal to a single ongoing authentication per realm. In
other words, this means that multiple requests made by the same user
will not trigger more than one authentication attempt at a time if no
entry has been stored in the cache. If an entry is present in our
cache, there is no restriction on the number of concurrent
authentications performed for this user.
This change enables us to limit the load we place on an external system
like an LDAP server and also preserve resources such as CPU on
expensive operations such as BCrypt authentication.
Closes #30355
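A simplified sketch of the single-in-flight idea (the real realm code differs; the names below are hypothetical):
```
// At most one authentication per principal is in flight; concurrent requests share its result.
private final ConcurrentMap<String, CompletableFuture<User>> inflight = new ConcurrentHashMap<>();

CompletableFuture<User> authenticate(String principal, SecureString password) {
    CompletableFuture<User> future = new CompletableFuture<>();
    CompletableFuture<User> existing = inflight.putIfAbsent(principal, future);
    if (existing != null) {
        return existing;                           // someone is already authenticating this principal
    }
    expensiveAuthenticate(principal, password)     // hypothetical: e.g. LDAP bind or BCrypt check
        .whenComplete((user, e) -> {
            inflight.remove(principal);            // allow the next attempt once this one finished
            if (e != null) {
                future.completeExceptionally(e);
            } else {
                future.complete(user);
            }
        });
    return future;
}
```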
This commit fixes an issue with dynamic mapping updates when an index
operation is performed against an alias and when the user only has
permissions to the alias. Dynamic mapping updates resolve the concrete
index early to prevent issues so the information about the alias that
the triggering operation was being executed against is lost. When
security is enabled and a user only has privileges to the alias, this
dynamic mapping update would be rejected as it is executing against the
concrete index and not the alias. In order to handle this situation,
the security code needs to look at the concrete index and the
authorized indices of the user; if the concrete index is not authorized
the code will attempt to find an alias that the user has permissions to
update the mappings of.
Closes #30597
The .watcher-history-* template is currently using a plugin-custom index setting xpack.watcher.template.version,
which prevents this template from being installed in a mixed OSS / X-Pack cluster, ultimately
leading to the situation where an X-Pack node is constantly spamming an OSS master with (failed)
template updates. Other X-Pack templates (e.g. security-index-template or security_audit_log)
achieve the same versioning functionality by using a custom _meta field in the mapping instead.
This commit switches the .watcher-history-* template to use the _meta field instead.
Persistent tasks was moved from X-Pack to core in #28455.
However, registration of the named writables and named
X-content was left in X-Pack.
This change moves the registration of the named writables
and named X-content into core. Additionally, the persistent
task actions are no longer registered in the X-Pack client
plugin, as they are already registered in ActionModule.
This change adds a simple header to the transport client
that is present in the server's thread context and ensures
we can detect if a transport client talks to the server in a
specific request. This change also adds a header for xpack
to detect if the client has xpack installed.
This commit reintroduces 31251c9 and 63a5799. These commits introduced a
memory leak and were reverted. This commit brings those commits back
and fixes the memory leak by removing unnecessary retain method calls.
This reverts commit 31251c9 introduced in #30695.
We suspect this commit is causing the OOME's reported in #30811 and we will use this PR to test this assertion.
This commit adds the ability to configure how a docvalue field should be
formatted, so that it would be possible eg. to return a date field
formatted as the number of milliseconds since Epoch.
Closes #27740
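A short illustration using the Java search API; the format-aware `docValueField` overload shown here is an assumption based on the description above:
```
SearchSourceBuilder source = new SearchSourceBuilder()
    .query(QueryBuilders.matchAllQuery())
    .docValueField("@timestamp", "epoch_millis");   // assumed overload taking a format string

SearchRequest request = new SearchRequest("logs").source(source);
```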
Enables a rolling restart from the OSS distribution to the x-pack based distribution by preventing
x-pack code from installing custom metadata into the cluster state until all nodes are capable of
deserializing this metadata.
When doing a node restart using the test framework, the restarted node does not only use the
settings provided to the original node, but also additional settings provided by plugin extensions,
which does not correspond to the settings that a node would have on a true restart.
This is related to #29500. We are removing the ability to disable http
pipelining. This PR removes the references to disabling pipelining in
the integration test case.
This commit reduces the Windows CLI scripts to one-liners by moving all
of the redundant logic to an elasticsearch-cli script. This commit is
only the Windows side, a previous commit covered the Linux side.
Adding headers rather than setting them all at once seems more
user-friendly and we already do it in a similar way for parameters
(see Request#addParameter).
The new snapshot includes LUCENE-8324 which fixes missing checkpoint
after a fully deleted segment is dropped on flush. This snapshot should
resolve the failed tests in the CorruptedFileIT suite.
Closes #30741 Closes #30577
This commit changes the wait for a few netty threads so that we wait for
these threads to complete after the cluster has stopped. Previously, we were
waiting for these threads before the cluster was actually stopped; the
cluster is stopped in an AfterClass method of ESIntegTestCase, while
the wait was performed in the AfterClass of a class that extended
ESIntegTestCase, which is always executed before the AfterClass of
ESIntegTestCase.
Now, the wait is contained in an ExternalResource ClassRule that
implements the waiting for the threads to terminate in the after
method. This rule is executed after the AfterClass method in
ESIntegTestCase. The same fix has also been applied in
SecuritySingleNodeTestCase.
Closes #30563
Prior to this change a json array element with no fields would be omitted from the json array.
Nested inner hits source filtering relies on the fact that the json array element numbering
remains untouched, and without this change that causes AOOB exceptions on the ES side during
the fetch phase.
Closes #30624
This is related to #27260. The elasticsearch-nio jar is supposed to be
a library as opposed to a framework. Currently it internally logs certain
exceptions. This commit modifies it to not rely on logging. Instead
exception handlers are passed by the applications that use the jar.
This commit reduces the Linux CLI scripts to one-liners by moving all of
the redundant logic to an elasticsearch-cli script. This commit is only
the Linux side, a follow-up will do this for Windows too.
Make all bool constructs use match/should (that is a query context) as
that is controlled and changed to a filter context by ES automatically
based on the sort order (_doc, field vs _sort) and trackScores.
Fix #29685
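Roughly, the generated query now prefers the query-context form and relies on ES to run it as a filter when scores are not needed; the builder calls below are only illustrative:
```
BoolQueryBuilder bool = QueryBuilders.boolQuery()
    .should(QueryBuilders.matchQuery("gender", "M"));   // hypothetical predicate from a WHERE clause

SearchSourceBuilder source = new SearchSourceBuilder()
    .query(bool)
    .sort("_doc")          // sorting by _doc with trackScores=false lets ES skip scoring the clause
    .trackScores(false);
```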
This change is to support rolling upgrade from a pre-6.3 default
distribution (i.e. without X-Pack) to a 6.3+ default distribution
(i.e. with X-Pack).
The ML metadata is no longer eagerly added to the cluster state
as soon as the master node has X-Pack available. Instead, it
is added when the first ML job is created.
As a result all methods that get the ML metadata need to be able
to handle the situation where there is no ML metadata in the
current cluster state. They do this by behaving as though an
empty ML metadata was present. This logic is encapsulated by
always asking for the current ML metadata using a static method
on the MlMetadata class.
Relates #30731
By default ML native processes are only allowed to use
30% of RAM, so the previous 2GB setting prevented the
test passing on VMs with only 4GB RAM. This change
reduces the limit to 1200MB, which means it can now
pass on VMs with 4GB RAM.
Meta plugins existed only for a short time, in order to enable breaking
up x-pack into multiple plugins. However, now that x-pack is no longer
installed as a plugin, the need for them has disappeared. This commit
removes the meta plugins infrastructure.
As the first record is random, there's a chance it will
be aligned on a bucket start. Thus we need to check the
bucket count is in [23, 24].
Closes #30715
These tests aim to check the set model memory limit is
respected. Additionally, it was asserting counts of
partition, by, over fields in an attempt to check that
the used memory is spent meaningfully. However, this
made the tests fragile, as changes in the ml-cpp could
lead to CI failures.
This commit removes those assertions. We are working on
adding tests in ml-cpp that will compensate.
This change checks for available diskspace and creates a subfolder for storing data outside of Lucene
indexes, but as part of the ES data paths.
Details:
- tmp storage is managed and does not allow allocation if disk space is
below a threshold (5GB at the moment)
- tmp storage is supposed to be managed by the native component but in
case this fails cleanup is provided:
- on job close
- on process crash
- after node crash, on restart
- available space is re-checked for every forecast call (the native
component has to check again before writing)
Note: The 1st path that has enough space is chosen on job open (job
close/reopen triggers a new search)
Watcher tests now always fail hard when watches that a test
tried to trigger using the trigger() method could not be
triggered because they were not found on any of the
nodes in the cluster.
This change adds version information in case a native ML process crashes; the version is important for choosing the right symbol files when analyzing the crash. Adding the version combines all necessary information on one line.
relates elastic/ml-cpp#94
It is possible for state documents to be
left behind in the state index. This may be
because of bugs or uncontrollable scenarios.
In any case, those documents may take up quite
some disk space when they add up. This commit
adds a step in the expired data deletion that
is part of the daily maintenance service. The
new step searches for state documents that
do not belong to any of the current jobs and
deletes them.
Closes #30551
Adjust fast forward for token expiration test
Adjusts the maximum fast forward time for token expiration tests
to be 5 seconds before actual token expiration so that the test
won't fail even when the upper limit is randomly selected.
Resolves: #30062
The part of the history template responsible for slack attachments had a
dynamic mapping configured which could lead to problems when a string
value looking like a date was configured in the value field of an
attachment.
This commit fixes the template by setting this field always to text.
This also requires a change in the template numbering to be sure this
will be applied properly when starting watcher.
This commit removes xpack from being a meta-plugin-as-a-module.
It also fixes a couple tests which were missing task dependencies, which
failed once the gradle execution order changed.
Removes dependency for server's version from the JDBC driver code. This
should allow us to dramatically reduce driver's size by removing the
server dependency from the driver.
Relates #29856
This commit increases the logging level around search to aid in
debugging failures in LicensingTests#testSecurityActionsByLicenseType
where we are seeing all shards failed error while trying to search the
security index.
See #30301
This change adds a `listTasks` method to the high level java
ClusterClient which allows listing running tasks through the
task management API.
Related to #27205
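A rough usage sketch; the exact second argument of `listTasks` (headers vs. `RequestOptions`) depends on the client version and is an assumption here:
```
ListTasksRequest request = new ListTasksRequest();
request.setDetailed(true);              // include task descriptions in the response
request.setActions("*search*");         // hypothetical filter on action names

// Assumed synchronous variant; an async variant taking an ActionListener exists as well.
ListTasksResponse response = client.cluster().listTasks(request, RequestOptions.DEFAULT);
```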
* Refactors ClientHelper to combine header logic
This change removes all the `*ClientHelper` classes which were
repeating logic between plugins and instead adds
`ClientHelper.executeWithHeaders()` and
`ClientHelper.executeWithHeadersAsync()` methods to centralise the
logic for executing requests with stored security headers.
* Removes Watcher headers constant
When the encryption of sensitive data is enabled, test that a
scheduled watch is executed as expected and produces the correct value
from a secret in the basic auth header.
Make SSLContext reloadable
This commit replaces all custom KeyManagers and TrustManagers
(ReloadableKeyManager, ReloadableTrustManager,
EmptyKeyManager, EmptyTrustManager) with instances of
X509ExtendedKeyManager and X509ExtendedTrustManager.
This change was triggered by the effort to allow Elasticsearch to
run in a FIPS-140 environment. In JVMs running in FIPS approved
mode, only SunJSSE TrustManagers and KeyManagers can be used.
Reloadability is now ensured by a volatile instance of SSLContext
in SSLContextHolder.
SSLConfigurationReloaderTests use the reloadable SSLContext to
initialize HTTP Clients and Servers and use these for testing the
key material and trust relations.
This commit is related to #28898. It adds an nio driven http server
transport. Currently it only supports basic http features. Cors,
pipelining, and read timeouts will need to be added in future PRs.
This commit exposes the master version to the REST test context. This
will be needed in a follow-up where the master version will be used to
determine whether or not a certain warning header is expected.
Due to the way composite aggregation works, ordering in GROUP BY can be
applied only through grouped columns, which the analyzer verifier now
enforces.
Fix #29900
This commit removes the SecurityLifecycleService, relegating its former
functions of listening for cluster state updates to SecurityIndexManager
and IndexAuditTrail.
This change adds a grok_pattern field to the GET categories API
output in ML. It's calculated using the regex and examples in the
categorization result, and applying a list of candidate Grok
patterns to the bits in between the tokens that are considered to
define the category.
This can currently be considered a prototype, as the Grok patterns
it produces are not optimal. However, enough people have said it
would be useful for it to be worthwhile exposing it as experimental
functionality for interested parties to try out.
This is fixing an issue that has come up in some builds. In some
scenarios I see an assertion failure that we are trying to move to
application mode when we are not in handshake mode. What I think is
happening is that we are in handshake mode and have received the
completed handshake message AND an application message. While reading in
handshake mode we switch to application mode. However, there is still
data to be consumed so we attempt to continue to read in handshake mode.
This leads to us attempting to move to application mode again throwing
an assertion.
This commit fixes this by immediately exiting the handshake mode read
method if we are no longer in handshake mode. Additionally if we swap
modes during a read we attempt to read with the new mode to see if there
is data that needs to be handled.
This commit changes the default out-of-the-box configuration for the
number of shards from five to one. We think this will help address a
common problem of oversharding. For users with time-based indices that
need a different default, this can be managed with index templates. For
users with non-time-based indices that find they need to re-shard with
the split API in place they no longer need to resort only to
reindexing.
Since this has the impact of changing the default number of shards used
in REST tests, we want to ensure that we still have coverage for issues
that could arise from multiple shards. As such, we randomize (rarely)
the default number of shards in REST tests to two. This is managed via a
global index template. However, some tests check the templates that are
in the cluster state during the test. Since this template is randomly
there, we need a way for tests to skip adding the template used to set
the number of shards to two. For this we add the default_shards feature
skip. To avoid having to write our docs in a complicated way because
sometimes they might be behind one shard, and sometimes they might be
behind two shards we apply the default_shards feature skip to all docs
tests. That is, these tests will always run with the default number of
shards (one).
The errors were caused because release tests would use a copy of
the public key that was formatted differently. The change to the
public key format was introduced in [1].
Release tests Jenkins job has now been updated to use the correct
key format depending on the branch they run on [2]
Closes #30430
[1] https://github.com/elastic/elasticsearch/pull/30251
[2] https://github.com/elastic/infra/pull/4944
This commit adds the ability to specify a plugin from maven for a
test cluster to use. Currently, only local projects may be used as
plugins, except when testing bwc, where the coordinates of the project
are used. However, that assumes all projects always keep the same
coordinates, or are even still plugins, which is no longer the case for
x-pack. The full cluster and rolling restart tests are changed to use
this new method when pulling x-pack versions before 6.3.0.
Modifies the SQL tests to use the new `Request` object flavored methods
introduced onto the `RestClient` in #29623. We'd like to remove the old
methods eventually so we should stop using them.
Since adding back the per-watch statistics, we do not need to access
every trigger engine implementation to get the current total job count.
This commit removes the unused methods to do so.
Tweak the return data, in particular with regard to ODBC columns, to
better align with the spec.
Fix order for SYS TYPES and TABLES according to the JDBC/ODBC spec.
Fix #30386 Fix #30521
This commit cleans up some code in the FileUserPasswdStore and the
FileUserRolesStore classes. The maps used in these classes are volatile
so we need to make sure that we don't perform multiple operations with
the map unless we are sure we are using a reference to the same map.
The maps are also never null, but there were a few null checks in the
code that were not needed. These checks have been removed.
The TokenMetaData equals method compared byte arrays using `.equals` on
the arrays themselves, which is the equivalent of an `==` check. This
means that a separate byte[] with the same contents would not be
considered equivalent to the existing one, even though it should be.
The method has been updated to use `Arrays#equals` and similarly the
hashcode method has been updated to call `Arrays#hashCode` instead of
calling hashcode on the array itself.
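The underlying pitfall is plain JDK behaviour, for example:
```
import java.util.Arrays;

public class ArrayEqualityDemo {
    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        System.out.println(a.equals(b));                               // false: Object.equals is identity for arrays
        System.out.println(Arrays.equals(a, b));                       // true: element-by-element comparison
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));  // true: consistent with Arrays.equals
    }
}
```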
These tests are both in the file `watcher/stats/10_basic`, and have been
failing fairly frequently over the last month with a start-up issue.
The issue is being tracked in #30298.
Dates internally contain milliseconds (which appear when converting them
to Strings); however, parsing does not accept them (and is strict).
The parser has been changed so that the date is mandatory but the time
(including its fractions such as millis) is optional.
Fix #30002
This commit adds a general state listener to the SecurityIndexManager,
and replaces the existing health and up-to-date listeners with that. It
also moves helper methods relating to health to SecurityIndexManager
from SecurityLifecycleService.
This commit moves the generated-resources directory to be within
the build directory for the openldap-tests and saml-idp-tests
projects. Both projects create a generated-resources directory that
should have been in the build directory but were instead at the same
level as the build directory.
In conformance with best practices, this change ensures that if a
SAML Response is signed, we verify the signature before processing
it any further. We were only checking the InResponseTo and
Destination attributes before potential signature validation but
there was no reason to do that up front either.
With the opening of xpack, we still retained a run task within
:x-pack:plugin. However, the root level run task also runs with the
default distribution. This change removes the extra run task inside
xpack in favor of using the root level task, and moves the
license/configuration code for run into the main run configuration.
This commit removes the hardcoded list of unconfigured ciphers in the
SslIntegrationTests. This list may include ciphers that are not
supported on certain JVMs. This list is replaced with code that
dynamically computes the set of ciphers that are not configured for
use by default.
The HTTPClient used in watcher is based on the apache http client. The
current client is using a lot of defaults - which are not always
optimal. Two of those defaults are the maximum number of total
connections and the maximum number of connections to a single route.
If one of those limits is reached, the HTTPClient waits for a connection
to be finished thus acting in a blocking fashion. In order to prevent
this when many requests are being executed, we increase the limit of
total connections as well as the connections per route (a route is
basically an endpoint, which also contains proxy information, not
containing a URL, just hosts).
On top of that an additional option has been set to evict
long running connections, which can potentially be reused after some
time. As this requires an additional background thread, this required
some changes to ensure that the httpclient is closed properly. Also the
timeout for this can be configured.
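With the Apache client this corresponds roughly to builder settings like the following; the concrete limits are configurable in watcher and the numbers below are only placeholders:
```
CloseableHttpClient client = HttpClientBuilder.create()
    .setMaxConnTotal(100)                         // hypothetical overall connection limit
    .setMaxConnPerRoute(20)                       // hypothetical per-route (per-endpoint) limit
    .evictExpiredConnections()
    .evictIdleConnections(60, TimeUnit.SECONDS)   // runs the background eviction thread
    .build();
```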
This commit renames IndexLifecycleManager to SecurityIndexManager as it
is not actually a general purpose class, but specific to security. It
also removes indirection in code calling the lifecycle service, instead
calling the security index manager directly.
Starting watcher should wait for the watcher to be started before
marking the status as started, which is now done via a callback.
Also, reloading watcher could set the execution service to paused. This could
lead to watches not being executed when run in tests. This fix does not
change the paused flag in the execution service, just clears out the
current queue and executions.
Closes #30381
Today when processing a request for a URL path for which we can not find
a handler we send back a plain-text response. Yet, we have the accept
header in our hand and can respect the accepted media type of the
request. This commit addresses this.
This commit removes the unnecessary transport_client cluster permission
from the role that is used as an example in our documentation. This
permission is not needed to use cross cluster search.
When validating the search request, we make sure any date_histogram
aggregations have timezones that match the jobs. But we didn't
do any such validation on range queries.
While it wouldn't produce incorrect results, it would be confusing
to the user as no documents would match the aggregation (because we
add a filter clause on the timezone for the agg).
Now the user gets an exception up front, and some helpful text about
why the range query didn't match, and which timezones are acceptable.
This commit updates the multi cluster search test with security so that
the user that is simply performing a multi cluster search does not have
any cluster permissions. This is done as none are needed by this user
and excess privileges could mask a behavior change.
Upgrade to lucene-7.4.0-snapshot-1ed95c097b
This version contains:
* An Analyzer for Korean
* An IntervalQuery and IntervalsSource that retrieve minimum intervals of positional queries.
* A new API to retrieve matches (offsets and positions) of a query for a single document.
* Support for soft deletes in the index writer.
* A fixed shingle filter that handles index time synonyms.
* Support for emoji sequence in ICUTokenizer (with an upgrade to icu 61.1)
When the watcher service pauses execution due to a cluster state update,
the trigger service and its engines also need to pause properly instead
of keeping going. This is also important when the .watches index is
deleted, so that watches don't stay in a triggered mode.