License state is currently made up of boolean methods that check whether
a particular feature is allowed by the current license state. Each new
feature must copy/past boiler plate code. While that has gotten easier
with utilities like isAllowedByLicense, this is still more cumbersome
than should be necessary. This commit adds a general purpose isAllowed
method which takes a new Feature enum, where each value of the enum
defines the minimum license mode and whether the license must be active
to be allowed. Only security features are converted in this PR, in order
to keep the commit size relatively small. The rest of the features will
be converted in a followup.
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
The (de)serialization code of the async search response
cannot handle exceptions that extend ElasticsearchException (e.g. ScriptException).
This commit fixes this bug by serializing the error with the more generic
StreamInput#writeException.
EQL will require very similar functionality to async search. This PR refactors
AsyncSearchIndexService to make it reusable for EQL.
Supersedes #55119
Relates to #49638
This commit refactors geo_shape doc values, fielddata, and utility classes from
the single mapper package in x-pack spatial plugin to a package structure that
is consistent with the server module.
Previously audit messages were indexed when datafeeds that were
assigned to a node were stopped, but not datafeeds that were
unassigned at the time they were stopped.
This change adds auditing for the unassigned case.
Backport of #55656
While we were catching `TaskCancelledException` while we wait for
reindexing to complete, we missed the fact that this exception
may be wrapped in a multi-node cluster. This is the reason
we may still fail the task when stop is called while reindexing.
Some times we're lucky and the exception is thrown by the same
node that runs the job. Then the exception is not wrapped and
things work fine. But when that is not the case the exception is
wrapped, we fail to catch it, and set the task to failed.
The fix is to simply unwrap the exception when we check it it
is `TaskCancelledException`.
Closes#55068
Backport of #55659
The CacheFile.fileLock() method is used to acquire a lock
on a cache file so that the file can't be deleted (or its file
handle closed) during the execution of a read or a write
operation.
Today this lock is obtained by first acquiring the eviction
lock (the write lock of the readwrite lock), then by checking
if the cache file is evicted and the file channel still open,
and finally by obtaining the file lock (the read lock of the
readwrite lock). Acquiring the read lock while the eviction
lock is held ensures that the cache file eviction cannot
start in the meanwhile. But eviction starts (and terminations)
also acquire the eviction lock; and this lock cannot be
obtained while a read lock is held (the write lock of a
readwrite lock is exclusive).
If we were acquiring a read lock and checking the eviction
flag and file channel existence while holding the read lock
we know that no eviction can start or finish until the
read lock is released.
Backport of #55115.
Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
The Kibana CSV export feature uses a non-standard timestamp format.
This change adds it to the formats the find_file_structure endpoint
recognizes out-of-the-box, to make round-tripping data from Kibana
back to Kibana via CSV files easier.
Fixes#55586
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.
The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit commit task, and should be
considered as part of the public API for the build tools for any plugin
developer to contribute their plugin's specification.
An ignore parameter has been introduced for the task to allow specific
file to be ignored from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.
Closes#54314
Out of the box "access granted" audit events are not logged
for system users. The list of system users was stale and included
only the _system and _xpack users. This commit expands this list
with _xpack_security and _async_search, effectively reducing the
auditing noise by not logging the audit events of these system
users out of the box.
Closes#37924
The testPerformAction test has been failing periodically due to
how Hamcrest's containsStringIgnoringCase does not lowercase using
the same Locale set in the test infrastructure.
This commit falls back to explicitly lowercasing using the root
locale
This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.
This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.
Data frame analytics process currently reports progress as
an integer `progress_percent`. We parse that and report it
from the _stats API as the progress of the `analyzing` phase.
However, we want to allow the DFA process to report progress
for more than one phase. This commit prepares for this by
parsing `phase_progress` from the process, an object that
contains the `phase` name plus the `progress_percent` for that
phase.
Backport of #55580
Instead of doing our own checks against REST status, shard counts, and shard failures, this commit changes all our extractor search requests to set `.setAllowPartialSearchResults(false)`.
- Scrolls are automatically cleared when a search failure occurs with `.setAllowPartialSearchResults(false)` set.
- Code error handling is simplified
closes https://github.com/elastic/elasticsearch/issues/40793
Issue #55521 suggested that audit messages were not written when
closing an unassigned job. This is not the case, but we didn't
have a test to prove it.
Backport of #55571
The ML info endpoint returns the max_model_memory_limit setting
if one is configured. However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.
This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.
The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.
Backport of #55529
Adds a "node" field to the response from the following endpoints:
1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job
If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.
In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string. Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.
Backport of #55473
PEMUtils would incorrectly fill the encryption password with zeros
(the '\0' character) after decrypting a PKCS#8 key.
Since PEMUtils did not take ownership of this password it should not
zero it out because it does not know whether the caller will use that
password array again. This is actually what PEMKeyConfig does - it
uses the key encryption password as the password for the ephemeral
keystore that it creates in order to build a KeyManager.
Backport of: #55457
In the test after the first load event is is not known which models are cached as
loading a later one will evict an earlier one and the order is not known.
The models could have been loaded 1 or 2 times not exactly twice
This change folds the removal of the in-progress snapshot entry
into setting the safe repository generation. Outside of removing
an unnecessary cluster state update, this also has the advantage
of removing a somewhat inconsistent cluster state where the safe
repository generation points at `RepositoryData` that contains a
finished snapshot while it is still in-progress in the cluster
state, making it easier to reason about the state machine of
upcoming concurrent snapshot operations.
Today a read-only engine requires a complete history of operations, in the
sense that its local checkpoint must equal its maximum sequence number. This is
a valid check for read-only engines that were obtained by closing an index
since closing an index waits for all in-flight operations to complete. However
a snapshot may not have this property if it was taken while indexing was
ongoing, but that's ok.
This commit weakens the check for a complete history to exclude the case of a
searchable snapshot.
Relates #50999
This change ensures that we return the latest expiration time
when retrieving the response from the index.
This commit also fixes a bug that stops the garbage collection of saved responses if the async search index is deleted.
If more than 100 shard-follow tasks are trying to connect to the remote
cluster, then some of them will abort with "connect listener queue is
full". This is because we retry on ESRejectedExecutionException, but not
on RejectedExecutionException.
`updateAndGet` could actually call the internal method more than once on contention.
If I read the JavaDocs, it says:
```* @param updateFunction a side-effect-free function```
So, it could be getting multiple updates on contention, thus having a race condition where stats are double counted.
To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allows fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`.
When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`.
This should prevent race conditions while allowing high (ish) throughput in the highly contention paths in inference.
I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO).
closes https://github.com/elastic/elasticsearch/issues/54786
This paves the data layer way so that exceptionally large models are partitioned across multiple documents.
This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.
I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap.
With the chunking sizes of `16 * 1024 * 1024` its compressed string could be partitioned to 5 documents.
Supporting models 20 times this size (compressed) seems adequate for now.
* [ML] fix native ML test log spam (#55459)
This adds a dependency to ingest common. This removes the log spam resulting from basic plugins being enabled that require the common ingest processors.
* removing unnecessary changes
* removing unused imports
* removing unnecessary java setting
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover
These APIs now support a `?prefer_v2_templates=true|false` flag. This flag changes the preference
creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.
Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.
This also adds support for sending this parameter to the High Level Rest Client.
Relates to #53101
We don't really need `LinkedHashSet` here. We can assume that all the
entries are unique and just use a list and use the list utilities to
create the cheapest possible version of the list.
Also, this fixes a bug in `addSnapshot` which would mutate the existing
linked hash set on the current instance (fortunately this never caused a real world bug)
and brings the collection in line with the java docs on its getter that claim immutability.
Removing the deprecated "xpack.monitoring.enabled" setting introduced
log spam and potentially some failures in ML tests. It's possible to use
a different, non-deprecated setting to disable monitoring, so we do that
here.
Adds ranged read support for GCS repositories in order to enable searchable snapshot support
for GCS.
As part of this PR, I've extracted some of the test infrastructure to make sure that
GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests are covering
similar test (as I saw those diverging in what they cover)
This validation is not needed, as we have discovered the source of the
serialization error that was leading to some usage instances appearing
to not have a name.
This commit upgrades the ASM dependency used in the feature aware check
to 7.3.1. This gives support for JDK 14. Additionally, now that Gradle
understands JDK 13, it means we can remove a restriction on running the
feature aware check to JDK 12 and lower.
This change reworks the loading and monitoring of files that are used
for the construction of SSLContexts so that updates to these files are
not lost if the updates occur during startup. Previously, the
SSLService would parse the settings, build the SSLConfiguration
objects, and construct the SSLContexts prior to the
SSLConfigurationReloader starting to monitor these files for changes.
This allowed for a small window where updates to these files may never
be observed until the node restarted.
To remove the potential miss of a change to these files, the code now
parses the settings and builds SSLConfiguration instances prior to the
construction of the SSLService. The files back the SSLConfiguration
instances are then registered for monitoring and finally the SSLService
is constructed from the previously parse SSLConfiguration instances. As
the SSLService is not constructed when the code starts monitoring the
files for changes, a CompleteableFuture is used to obtain a reference
to the SSLService; this allows for construction of the SSLService to
complete and ensures that we do not miss any file updates during the
construction of the SSLService.
While working on this change, the SSLConfigurationReloader was also
refactored to reflect how it is currently used. When the
SSLConfigurationReloader was originally written the files that it
monitored could change during runtime. This is no longer the case as
we stopped the monitoring of files that back dynamic SSLContext
instances. In order to support the ability for items to change during
runtime, the class made use of concurrent data structures. The use of
these concurrent datastructures has been removed.
Closes#54867
Backport of #54999
Some aggregations, such as the Terms* family, will use an alternate
class to represent unmapped shard results (while the rest of the aggs
use the same object but with some form of "empty" or "nullish" values
to represent unmapped).
This was problematic with AbstractWireSerializingTestCase because it
expects the instanceReader to always match the original class. Instead,
we need to use the NamedWriteable version so that the registry
can be consulted for the proper deserialization reader.
Security features in the license state currently do a dynamic check on
whether security is enabled. This is because the license level can
change the default security enabled state. This commit splits out the
check on security being enabled, so that the combo method of security
enabled plus license allowed is no longer necessary.
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.
This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
* [ML] fix bugs with prediction field value settings (#55333)
This fixes two unreleased bugs:
1. Prediction value type of `number` might show unexpected classes
Analytics created models may have class labels like `1, 5, 10` (or some collection of discrete, whole numbers). These labels are passed to the inference model config in the `classification_labels` field.
When the predicted value format is `numeric` it should attempt to see if the classification labels are provided and are numeric. If so, use those. If not, use the underlying value.
2. When supplying an update overwrite, inference was losing the default prediction field value. This is because it was not copied over in the copy ctor in the ClassificationConfig.Builder class.
closes#55332