4379 Commits

Author SHA1 Message Date
Dimitris Athanasiou
447bac27d2
[7.x][ML] Delete unused data frame analytics state (#50243) (#50280)
This commit adds removal of unused data frame analytics state
from the _delete_expired_data API (and in extend th ML daily
maintenance task). At the moment the potential state docs
include the progress document and state for regression and
classification analyses.

Backport of #50243
2019-12-18 12:30:11 +00:00
Yannick Welsch
82086929d7 Increase timeout on FollowIndexSecurityIT.testAutoFollowPatterns (#50282)
This test was causing test failures on slow CI runs.

Closes #50279
2019-12-18 10:37:11 +01:00
Przemysław Witek
ac974c35c0
Pass processConnectTimeout to the method that fetches C++ process' PID (#50276) (#50290) 2019-12-17 21:32:37 +01:00
Florian Kelbert
afe9ee3fa5 [DOCS] Fix typo in Create API key docs (#50233) 2019-12-17 11:19:13 -05:00
David Kyle
098f540f9d
[ML] Remove usage of base action logger in ml actions (#50074) (#50236) 2019-12-17 13:03:27 +00:00
Martijn van Groningen
2079f1cbeb
Backport: Fix ingest simulate response document order if processor executes async (#50269)
Backport #50244 to 7.x branch.

If a processor executes asynchronously and the ingest simulate api simulates with
multiple documents then the order of the documents in the response may not match
the order of the documents in the request.

Alexander Reelsen discovered this issue with the enrich processor with the following reproduction:

```
PUT cities/_doc/munich
{"zip":"80331","city":"Munich"}

PUT cities/_doc/berlin
{"zip":"10965","city":"Berlin"}

PUT /_enrich/policy/zip-policy
{
  "match": {
    "indices": "cities",
    "match_field": "zip",
    "enrich_fields": [ "city" ]
  }
}

POST /_enrich/policy/zip-policy/_execute

GET _cat/indices/.enrich-*

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
  "processors" : [
    {
      "enrich" : {
        "policy_name": "zip-policy",
        "field" : "zip",
        "target_field": "city",
        "max_matches": "1"
      }
    }
  ]
  },
  "docs": [
    { "_id": "first", "_source" : { "zip" : "80331" } } ,
    { "_id": "second", "_source" : { "zip" : "50667" } }
  ]
}
```

* fixed test compile error
2019-12-17 12:27:07 +01:00
Armin Braun
2e7b1ab375
Use ClusterState as Consistency Source for Snapshot Repositories (#49060) (#50267)
Follow up to #49729

This change removes falling back to listing out the repository contents to find the latest `index-N` in write-mounted blob store repositories.
This saves 2-3 list operations on each snapshot create and delete operation. Also it makes all the snapshot status APIs cheaper (and faster) by saving one list operation there as well in many cases.
This removes the resiliency to concurrent modifications of the repository as a result and puts a repository in a `corrupted` state in case loading `RepositoryData` failed from the assumed generation.
2019-12-17 10:55:13 +01:00
Andrei Stefan
c6fdf9ed8a Handle NULL in ResultSet's getDate() method (#50184)
(cherry picked from commit 08214eb1338fef5c8082c3f8b84c24dd53224ebe)
2019-12-17 10:03:23 +02:00
Tim Vernum
ce2aab3f2f
Add setting to restrict license types (#50252)
This adds a new "xpack.license.upload.types" setting that restricts
which license types may be uploaded to a cluster.

By default all types are allowed (excluding basic, which can only be
generated and never uploaded).
This setting does not restrict APIs that generate licenses such as the
start trial API.

This setting is not documented as it is intended to be set by
orchestrators and not end users.

Backport of: #49418
2019-12-17 14:58:58 +11:00
Julie Tibshirani
463cd414aa Bump the scroll keep-alive time in cluster upgrade tests. (#50195)
In the yaml cluster upgrade tests, we start a scroll in a mixed-version cluster,
then attempt to continue the scroll after the upgrade is complete. This test
occasionally fails because the scroll can expire before the cluster is done
upgrading.

The current scroll keep-alive time 5m. This PR bumps it to 10m, which gives a
good buffer since in failing tests the time was only exceeded by ~30 seconds.

Addresses #46529.
2019-12-16 10:58:31 -08:00
Rory Hunter
2bd3a05892
Refactor environment variable processing for Docker (#50221)
Backport of #49612.

The current Docker entrypoint script picks up environment variables and
translates them into -E command line arguments. However, since any tool
executes via `docker exec` doesn't run the entrypoint, it results in
a poorer user experience.

Therefore, refactor the env var handling so that the -E options are
generated in `elasticsearch-env`. These have to be appended to any
existing command arguments, since some CLI tools have subcommands and
-E arguments must come after the subcommand.

Also extract the support for `_FILE` env vars into a separate script, so
that it can be called from more than once place (the behaviour is
idempotent).

Finally, add noop -E handling to CronEvalTool for parity, and support
`-E` in MultiCommand before subcommands.
2019-12-16 15:39:28 +00:00
David Kyle
5542686283 [ML] Wait for green after opening job in NetworkDisruptionIT (#50232)
Closes #49908
2019-12-16 14:55:58 +00:00
David Roberts
32b2445744
Change process kill order for testclusters shutdown (#50215)
The testclusters shutdown code was killing child processes
of the ES JVM before the ES JVM.  This causes any running
ML jobs to be recorded as failed, as the ES JVM notices that
they have disconnected from it without being told to stop,
as they would if they crashed.  In many test suites this
doesn't matter because the test cluster will never be
restarted, but in the case of upgrade tests it makes it
impossible to test what happens when an ML job is running
at the time of the upgrade.

This change reverses the order of killing the ES process
tree such that the parent processes are killed before their
children.  A list of children is stored before killing the
parent so that they can subsequently be killed (if they
don't exit by themselves as a side effect of the parent
dying).

Backport of #50175
2019-12-16 14:12:36 +00:00
Dimitris Athanasiou
73add726d7
[7.x][ML] Fix exception when field is not included and excluded at the same time (#50192) (#50223)
Executing the data frame analytics _explain API with a config that contains
a field that is not in the includes list but at the same time is the excludes
list results to trying to remove the field twice from the iterator. That causes
an `IllegalStateException`. This commit fixes this issue and adds a test that
captures the scenario.

Backport of #50192
2019-12-16 11:30:06 +00:00
Armin Braun
761d6e8e4b
Remove BlobContainer Tests against Mocks (#50194) (#50220)
* Remove BlobContainer Tests against Mocks

Removing all these weird mocks as asked for by #30424.
All these tests are now part of real repository ITs and otherwise left unchanged if they had
independent tests that didn't call the `createBlobStore` method previously.
The HDFS tests also get added coverage as a side-effect because they did not have an implementation
of the abstract repository ITs.

Closes #30424
2019-12-16 11:37:09 +01:00
Ignacio Vera
3717c733ff
"CONTAINS" support for BKD-backed geo_shape and shape fields (#50141) (#50213)
Lucene 8.4 added support for "CONTAINS", therefore in this commit those
changes are integrated in Elasticsearch. This commit contains as well a
bug fix when querying with a geometry collection with "DISJOINT" relation.
2019-12-16 09:17:51 +01:00
Tim Vernum
a9d16ee895
Skip enterprise license tests in release build (#50182)
The release builds use a production license key, and our rest test load
licenses that are signed by the dev license key.

This change adds the new enterprise license Rest tests to the
blacklist on release builds.

Backport of: #50163
2019-12-16 10:11:21 +11:00
Nhat Nguyen
df46848fb0 Migrate peer recovery from translog to retention lease (#49448)
Since 7.4, we switch from translog to Lucene as the source of history
for peer recoveries. However, we reduce the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4 because existing copies do not have PPRL.

To remedy this issue, we fallback using translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease,
and the replication group hasn't fully migrated to PRRL.

Relates #45136
2019-12-15 10:24:39 -05:00
Nhat Nguyen
c151a75dfe Use retention lease in peer recovery of closed indices (#48430)
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
allows that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates #45136

Co-authored-by: David Turner <david.turner@elastic.co>
2019-12-15 10:24:34 -05:00
Benjamin Trent
4805d8ac7d
[ML][Inference] Adding a warning_field for warning msgs. (#49838) (#50183)
This adds a new field for the inference processor.

`warning_field` is a place for us to write warnings provided from the inference call. When there are warnings we are not going to write an inference result. The goal of this is to indicate that the data provided was too poor or too different for the model to make an accurate prediction.

The user could optionally include the `warning_field`. When it is not provided, it is assumed no warnings were desired to be written.

The first of these warnings is when ALL of the input fields are missing. If none of the trained fields are present, we don't bother inferencing against the model and instead provide a warning stating that the fields were missing.

Also, this adds checks to not allow duplicated fields during processor creation.
2019-12-13 10:39:51 -05:00
Benjamin Trent
41736dd6c3
[ML] retry bulk indexing of state docs (#50149) (#50185)
This exchanges the direct use of the `Client` for `ResultsPersisterService`. State doc persistence will now retry. Failures to persist state will still not throw, but will be audited and logged.
2019-12-13 10:39:34 -05:00
Dimitris Athanasiou
fe3c9e71d1
[7.x][ML] Fix DFA explain API timeout when source index is missing (#50176) (#50180)
This commit fixes a bug that caused the data frame analytics
_explain API to time out in a multi-node setup when the source
index was missing. When we try to create the extracted fields detector,
we check the index settings. If the index is missing that responds
with a failure that could be wrapped as a remote exception.
While we unwrapped correctly to check if the cause was an
`IndexNotFoundException`, we then proceeded to cast the original
exception instead of the cause.

Backport of #50176
2019-12-13 17:00:55 +02:00
Ioannis Kakavas
46376100b1
Fix testMalformedToken (#50164) (#50170)
This test was fixed as part of #49736 so that it used a
TokenService mock instance that was enabled, so that token
verification fails because the token is invalid and not because
the token service is not enabled.
When the randomly generated token we send, decodes to being of
version > 7.2 , we need to have mocked a GetResponse for the call
that TokenService#getUserTokenFromId will make, otherwise this
hangs and times out.
2019-12-13 13:46:44 +02:00
Dimitris Athanasiou
e6cbcf7f7c
[7.x] [ML] Persist/restore state for DFA classification (#50040) (#50147)
This commit adds state persist/restore for data frame analytics classification jobs.

Backport of #50040
2019-12-13 10:33:19 +02:00
Hendrik Muhs
1c3ce110bd [Transform] add actual timeout in message (#50140)
add the timeout to the message if stopping a transform times out
2019-12-13 08:10:25 +01:00
Jason Tedor
29526d0dfe
Validate exporter type is HTTP for HTTP exporter (#49992)
Today the HTTP exporter settings without the exporter type having been
configured to HTTP. When it is time to initialize the exporter, we can
blow up. Since this initialization happens on the cluster state applier
thread, it is quite problematic that we do not reject settings updates
where the type is not configured to HTTP, but there are HTTP exporter
settings configured. This commit addresses this by validating that the
exporter type is not only set, but is set to HTTP.
2019-12-12 20:01:04 -05:00
Tim Vernum
2811b97b76
Remove reserved roles for code search (#50115)
The "code_user" and "code_admin" reserved roles existed to support
code search which is no longer included in Kibana.

The "kibana_system" role included privileges to read/write from the
code search indices, but no longer needs that access.

Backport of: #50068
2019-12-13 10:22:55 +11:00
Julie Tibshirani
73c412063b Reenable the 'continue scroll' cluster upgrade test. 2019-12-12 12:34:49 -08:00
Benjamin Trent
d7ffa7f8f7
[7.x][ML] Add graceful retry for anomaly detector result indexing failures(#49508) (#50145)
* [ML] Add graceful retry for anomaly detector result indexing failures (#49508)

All results indexing now retry the amount of times configured in `xpack.ml.persist_results_max_retries`. The retries are done in a semi-random, exponential backoff.

* fixing test
2019-12-12 12:24:58 -05:00
Benjamin Trent
c043aa887f
[ML][Inference] Simplify inference processor options (#50105) (#50146)
* [ML][Inference] Simplify inference processor options

* addressing pr comments
2019-12-12 11:13:55 -05:00
David Roberts
13e47df97d [TEST] Increase timeout for ML internal cluster cleanup (#50142)
Closes #48511
2019-12-12 15:38:22 +00:00
Ignacio Vera
b5ec227de8
upgrade to lucene 8.4.0-snapshot-08b8d116f8f (#50129) (#50132) 2019-12-12 13:13:37 +01:00
David Kyle
7d4118dc4e Enable trace logging in failing ml NetworkDisruptionIT
https://github.com/elastic/elasticsearch/issues/49908
2019-12-12 11:16:01 +00:00
Tim Vernum
47e5e34f42
Support "enterprise" license types (#49474)
This adds "enterprise" as an acceptable type for a license loaded
through the PUT _license API.

Internally an enterprise license is treated as having a "platinum"
operating mode.

The handling of License types was refactored to have a new explicit
"LicenseType" enum in addition to the existing "OperatingMode" enum.

By default (in 7.x) the GET license API will return "platinum" when an
enterprise license is active in order to be compatible with existing
consumers of that API.
A new "accept_enterprise" flag has been introduced to allow clients to
opt-in to receive the correct "enterprise" type.

Backport of: #49223
2019-12-12 14:37:44 +11:00
Andrei Stefan
e9e2e5fc71 Have COUNT DISTINCT return 0 instead of NULL for no documents matching. (#50037)
(cherry picked from commit cb94731e6f41bc51c23e4aab495b64eea731a061)
2019-12-12 00:34:04 +02:00
Julie Tibshirani
277880bb4f
In sparse vector REST tests, specify the index name in searches. (#50061)
The `sparse_vector` REST tests occasionally fail on 7.x because we don't receive the expected response headers with deprecation warnings.

One theory as to what is happening is that there is an extra empty index present in addition to the test index. Since the search doesn't specify an index name, it hits both the test index and this extra empty index and shard responses from the extra index don't produce deprecation warnings. If not all shard responses contain the warning headers, then certain deprecation warnings can be lost (due to the bug described in #33936).

This PR tries to harden the `sparse_vector` tests by always specifying the index name during a search. This doesn't fix the root causes of the issue, but is good practice and can help avoid intermittent failures.

Addresses #49383.
2019-12-11 10:33:47 -08:00
Dimitris Athanasiou
03ecaae221
[7.x][ML] Avoid classification integ test training on single class (#50072) (#50078)
The `ClassificationIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet`
test was previously set up to just have 10 rows. With `training_percent`
of 50%, only 5 rows will be used for training. There is a good chance that
all 5 rows will be of one class which results to failure.

This commit increases the rows to 100. Now 50 rows should be used for training
and the chance of failure should be very small.

Backport of #50072
2019-12-11 18:50:26 +02:00
Henning Andersen
9cdabbd363 Log attachment generation failures (#50080)
Watcher logs when actions fail in ActionWrapper, but failures to
generate an email attachment are not logged and we thus only know the
type of the exception and not where/how it occurred.
2019-12-11 17:20:22 +01:00
David Turner
285eacd267
Use more specific loggers in subclasses of TMNA (#50076)
Adjusts the subclasses of `TransportMasterNodeAction` to use their own loggers
instead of the one for the base class.

Relates #50056.
Partial backport of #46431 to 7.x.
2019-12-11 15:07:47 +00:00
Przemysław Witek
9b116c8fef
A few improvements to AnalyticsProcessManager class that make the code more readable. (#50026) (#50069) 2019-12-11 09:35:05 +01:00
Ioannis Kakavas
3b613c36f4
Always return 401 for not valid tokens (#49736) (#50042)
Return a 401 in all cases when a request is submitted with an
access token that we can't consume. Before this change, we would
throw a 500 when a request came in with an access token that we
had generated but was then invalidated/expired and deleted from
the tokens index.

Resolves: #38866
Backport of #49736
2019-12-11 09:14:50 +02:00
Stuart Cam
44cd2f444c Add the REST API specifications for SLM Status / Start / Stop endpoints. (#49759)
Was originally missed in PR #47710

(cherry picked from commit 133b34c8355639ae0f699a86ffd9f37d19f73bca)
2019-12-11 13:34:13 +11:00
Adrien Grand
87e72156ce
Upgrade to lucene 8.4.0-snapshot-662c455. (#50016) (#50039)
Lucene 8.4 is about to be released so we should check it doesn't cause problems
with Elasticsearch.
2019-12-10 18:04:58 +01:00
Dimitris Athanasiou
8891f4db88
[7.x][ML] Introduce randomize_seed setting for regression and classification (#49990) (#50023)
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.

Backport of #49990
2019-12-10 15:29:19 +02:00
Yannick Welsch
a16abf921f Make elasticsearch-node tools custom metadata-aware (#48390)
The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently
unaware of plugins and will therefore drop custom metadata from the cluster state once the
state is written out again (as it skips over the custom metadata that it can't read). This commit
preserves unknown customs when editing on-disk metadata through the elasticsearch-node
command-line tools.
2019-12-10 09:58:11 +01:00
Jason Tedor
bfb2dc1353
Enable dependent settings values to be validated (#49942)
Today settings can declare dependencies on another setting. This
declaration is implemented so that if the declared setting is not set
when the declaring setting is, settings validation fails. Yet, in some
cases we want not only that the setting is set, but that it also has a
specific value. For example, with the monitoring exporter settings, if
xpack.monitoring.exporters.my_exporter.host is set, we not only want
that xpack.monitoring.exporters.my_exporter.type is set, but that it is
also set to local. This commit extends the settings infrastructure so
that this declaration is possible. The use of this in the monitoring
exporter settings will be implemented in a follow-up.
2019-12-09 12:45:50 -05:00
Marios Trivyzas
48e7420307 SQL: [Tests] Unmute Pivot from NodeSublassTests (#49925)
The `testReplaceChildren()` has been fixed for Pivot as part
of #49693.

Reverting: #49045
(cherry picked from commit 4b9b9edbcf2041a8619b65580bbe192bf424cebc)
2019-12-09 17:20:20 +01:00
Benjamin Trent
0b6ce9683c
[ML] Use query in cardinality check (#49939) (#49984)
When checking the cardinality of a field, the query should be take into account. The user might know about some bad data in their index and want to filter down to the target_field values they care about.
2019-12-09 10:14:41 -05:00
Przemysław Witek
0965a10468
[7.x] Pass prediction_field_type to C++ analytics process (#49861) (#49981) 2019-12-09 14:43:01 +01:00
Benjamin Trent
049d854360
[ML][Inference] adjust so target_field always has inference result and optionally allow new top classes field in the classification config (#49923) (#49982) 2019-12-09 08:29:45 -05:00