Currently in the x content serialization tests we compare the exception
messages that are serialized. These exceptions messages are not
equivalent because the exception often changes when serialized to x
content. This commit removes this assertion.
Currently there are two issues with serializing BulkByScrollResponse.
First, when deserializing from XContent, indexing exceptions and search
exceptions are switched. Additionally, search exceptions do no retain
the appropriate RestStatus code, so you must evaluate the status code
from the exception. However, the exception class is not always correctly
retained when serialized.
This commit adds tests in the failure case. Additionally, fixes the
swapping of failure types and adds the rest status code to the search
failure.
If a cluster sending monitoring data is unhealthy and triggers an
alert, then stops sending data the following exception [1] can occur.
This exception stops the current Watch and the behavior is actually
correct in part due to the exception. Simply fixing the exception
introduces some incorrect behavior. Now that the Watch does not
error in the this case, it will result in an incorrectly "resolved"
alert. The fix here is two parts a) fix the exception b) fix the
following incorrect behavior.
a) fixing the exception is as easy as checking the size of the
array before accessing it.
b) fixing the following incorrect behavior is a bit more intrusive
- Note - the UI depends on the success/met state for each condition
to determine an "OK" or "FIRING"
In this scenario, where an unhealthy cluster triggers an alert and
then goes silent, it should keep "FIRING" until it hears back that
the cluster is green. To keep the Watch "FIRING" either the index
action or the email action needs to fire. Since the Watch is neither
a "new" alert or a "resolved" alert, we do not want to keep sending
an email (that would be non-passive too). Without completely changing
the logic of how an alert is resolved allowing the index action to
take place would result in the alert being resolved. Since we can
not keep "FIRING" either the email or index action (since we don't
want to resolve the alert nor re-write the logic for alert resolution),
we will introduce a 3rd action. A logging action that WILL fire when
the cluster is unhealthy. Specifically will fire when there is an
unresolved alert and it can not find the cluster state.
This logging action is logged at debug, so it should be noticed much.
This logging action serves as an 'anchor' for the UI to keep the state
in an a "FIRING" status until the alert is resolved.
This presents a possible scenario where a cluster starts firing,
then goes completely silent forever, the Watch will be "FIRING"
forever. This is an edge case that already exists in some scenarios
and requires manual intervention to remove that Watch.
This changes changes to use a template-like method to populate the
version_created for the default monitoring watches. The version is
set to 7.5 since that is where this is first introduced.
Fixes#43184
The currently logic shard selecting logic selects a random shard copy
instead of selecting the local shard copy and if local copy is not
available then selecting a random shard copy. The latter is desired
behaviour for enrich.
By reusing `OperationRouting#searchShards(...)` we get the desired
behaviour and reuse the same logic that the search api is using.
This commit adds documentation for new index privilege
create_doc which only allows indexing of new documents
but no updates to existing documents via Index or Bulk APIs.
Relates: #45806
This commit adds a thread filter for gradle unit tests which omits
threads gradle may create that we have no control over shutting down.
The current example of this is for project.exec which gradle pools.
closes#47417
* Separate SLM stop/start/status API from ILM
This separates a start/stop/status API for SLM from being tied to ILM's
operation mode. These APIs look like:
```
POST /_slm/stop
POST /_slm/start
GET /_slm/status
```
This allows administrators to have fine-grained control over preventing
periodic snapshots and deletions while performing cluster maintenance.
Relates to #43663
* Allow going from RUNNING to STOPPED
* Align with the OperationMode rules
* Fix slmStopping method
* Make OperationModeUpdateTask constructor private
* Wipe snapshots better in test
Failed snapshots will eventually build up unless they are deleted. While
failures may not take up much space, they add noise to the list of
snapshots and it's desirable to remove them when they are no longer
useful.
With this change, failed snapshots are deleted using the following
strategy: `FAILED` snapshots will be kept until the configured
`expire_after` period has passed, if present, and then be deleted. If
there is no configured `expire_after` in the retention policy, then they
will be deleted if there is at least one more recent successful snapshot
from this policy (as they may otherwise be useful for troubleshooting
purposes). Failed snapshots are not counted towards either `min_count`
or `max_count`.
The "Conditionals with the Pipeline Processor" incorrectly documents
how to create a pipeline of pipelines with a failure condition. The
example as-is will always execute the fail processor. The change here
updates the documentation to correct guard the fail processor with an
if condition.
Similar to #47507. We are throwing `SnapshotException` when
you (and SLM tests) would expect a `SnapshotMissingException`
for concurrent snapshot status and snapshot delete operations
with a very low probability.
Fixed the exception type and added a test for this scenario.
Adds a check when running an Enrich policy to make sure that an Enrich index
is force merged down to one segment, and if it was not fully merged, attempts
the merge again, up to a configurable number of times.
We're needlessly wrapping a `SnapshotMissingException` which
itself is a `SnapshotException` when trying to load a missing
snapshot. This leads to failure #47442 which expects a
`SnapshotMissingException` in this case.
Closes#47442
* Add Snapshot Lifecycle Retention documentation
This commits adds API and general purpose documentation for SLM
retention.
Relates to #43663
* Fix docs tests
* Update default now that #47604 has been merged
* Update docs/reference/ilm/apis/slm-api.asciidoc
Co-Authored-By: Gordon Brown <gordon.brown@elastic.co>
* Update docs/reference/ilm/apis/slm-api.asciidoc
Co-Authored-By: Gordon Brown <gordon.brown@elastic.co>
* Update docs with feedback
When exceptions could be returned from another node, the exception
might be wrapped in a `RemoteTransportException`. In places where
we handled specific exceptions using `instanceof` we ought to unwrap
the cause first.
This commit attempts to fix this issue after searching code in the ML
plugin.
Backport of #47676
* Convert RunTask to use testclusers, remove ClusterFormationTasks
This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.
With this we can now remove ClusterFormationTasks.
Importing dangling indices with aliases risks breaking functionalities
using those aliases. For instance, writing to an alias may break if
there is no is_write_index indication on the existing alias and the
dangling index import adds a second index to the alias. Or an
application could have an assumption about the alias only ever pointing
to one index and suddenly seeing the alias also linked to an old index
could break it. With this change we strip aliases of the index meta data
found before importing a dangling index.
Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a
bad idea since it will lead to the kinds of overshoot that were otherwise fixed
in #46079. This commit deprecates this setting so it can be removed in the next
major release.
Previously when retrieving an SLM policy it would always return a 200
with `{}` in the body, even if the policy did not exist. This changes
that behavior to throw an error (similar to our other APIs) if a
policy doesn't exist.
This also adds a basic CRUD yml test for the behavior.
Resolves#47664
The example use of a scoring script was incorrectly using a filter
script query, which has no scoring, and thus no _score variable
avialable. This commit converts the example doc to using the newer
script_score query.
this commit introduces a geo-match enrich processor that looks up a specific
`geo_point` field in the enrich-index for all entries that have a geo_shape match field
that meets some specific relation criteria with the input field.
For example, the enrich index may contain documents with zipcodes and their respective
geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode
they associate according to which shape they are contained within.
this commit also refactors some of the MatchProcessor by moving a lot of the shared code to
AbstractEnrichProcessor.
Closes#42639.
If a thread pool rejection exception happens, an alternative code
path is chosen to write history and delete the trigger. If an exception
happens during deletion of the trigger an exception may be thrown and not
caught.
This commit catches the exception and provides a meaning error message.
fixes#47008
When deactivating a watch, there is a chance that it is fully deactivated
and reporting as not running but the history is not fully written yet.
There is not a tight coupling between the associated watcher history
index and the deactivation. This test assumes that once a watch is
deactivated that all history is fully written in a very short time period.
If the Watch is deactivated, but the history is slow to write it can result
in a failing test.
This change removes an assertion that assumes that the deactivation of a watch
ensured the all of the watch history was written. There is still a minor race
condition with respect to the remaining history assertions. However, if the
history is slow to be written, it will allow the test to still passing.
fixes#47503
Adds the following parameters to `outlier_detection`:
- `compute_feature_influence` (boolean): whether to compute or not
feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
to the feature values
Backport of #47600
This has ELambda and ENewArrayFunctionRef add their generated synthetic
methods to the SClass node during the semantic pass and removes this
data from the write pass. This is the first step to remove "Globals" (mutable
state) from the write pass.
Previously, we supported only the format `{fn <FUNCTION_NAME>()}`
but other DBs like MSSQL, DB2, MariaDB/MySQL alos allow whitespaces
between `{` and `fn`. Furhermore, also some applications - like PowerBI -
generate escape sequences with spaces: `select { fn name(params) } etc.`
Add support for white spaces between `{` and the escape pattern definition
like `fn`, `ts`, `d`, `guid` etc.
Closes: #47401
(cherry picked from commit 08a22d0b393f4a76c52dabc5e7b9cafcc19c30ca)
Use case:
User with `create_doc` index privilege will be allowed to only index new documents
either via Index API or Bulk API.
There are two cases that we need to think:
- **User indexing a new document without specifying an Id.**
For this ES auto generates an Id and now ES version 7.5.0 onwards defaults to `op_type` `create` we just need to authorize on the `op_type`.
- **User indexing a new document with an Id.**
This is problematic as we do not know whether a document with Id exists or not.
If the `op_type` is `create` then we can assume the user is trying to add a document, if it exists it is going to throw an error from the index engine.
Given these both cases, we can safely authorize based on the `op_type` value. If the value is `create` then the user with `create_doc` privilege is authorized to index new documents.
In the `AuthorizationService` when authorizing a bulk request, we check the implied action.
This code changes that to append the `:op_type/index` or `:op_type/create`
to indicate the implied index action.
This commit adds support to retrieve all API keys if the authenticated
user is authorized to do so.
This removes the restriction of specifying one of the
parameters (like id, name, username and/or realm name)
when the `owner` is set to `false`.
Closes#46887