Commit Graph

1557 Commits

Author SHA1 Message Date
Benjamin Trent 8559ff7cee
[ML][Inference] fixing pattern compilation + unnecessary string copy (#51483) (#51487) 2020-01-27 12:12:34 -05:00
Hendrik Muhs b233e93014
[Transform] refactor naming leftovers and apply code formating (#51465) (#51470)
refactor renaming leftovers: "data frame transform" to "transforms", touch only internals (variable
names, non-public API's, doc strings, ...) and apply code-formatting (spotless). No logical changes.
2020-01-27 14:04:57 +01:00
Ioannis Kakavas ee202a642f
Enable tests in FIPS 140 in JDK 11 (#49485)
This change changes the way to run our test suites in 
JVMs configured in FIPS 140 approved mode. It does so by:

- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.

- When runtime java is 11 and higher, using BouncyCastle FIPS 
Cryptographic provider and BCJSSE in FIPS mode. These are 
used as testRuntime dependencies for unit
tests and internal clusters, and copied (relevant jars)
explicitly to the lib directory for testclusters used in REST tests

- When runtime java is 8, using BouncyCastle FIPS 
Cryptographic provider and SunJSSE in FIPS mode. 

Running the tests in FIPS 140 approved mode doesn't require an
additional configuration either in CI workers or locally and is
controlled by specifying -Dtests.fips.enabled=true
2020-01-27 11:14:52 +02:00
Przemko Robakowski fbec19c022
Centralize mocks initialization in ILM steps tests (#51384) (#51453)
* Centralize mocks initialization in ILM steps tests

This change centralizes initialization of `Client`, `AdminClient`
and `IndicesAdminClient` for all classes extending `AbstractStepTestCase`.
This removes a lot of code duplication and make it easier to write tests.
This also removes need for `AsyncActionStep#setClient`

* Unused imports removed

* Added missed tests

* Fix OpenFollowerIndexStepTests
2020-01-25 01:19:55 +01:00
Benjamin Trent bf53ca3380
[7.x] [ML] Add _cat/ml/anomaly_detectors API (#51364) (#51408)
[ML] Add _cat/ml/anomaly_detectors API (#51364)
2020-01-24 11:54:22 -05:00
Benjamin Trent fc994d9ce1
[ML][Inference] Adds validations for model PUT (#51376) (#51409)
Adds validations making sure that

* `input.field_names` is not empty
* `ensemble.trained_models` is not empty
* `tree.feature_names` is not empty

closes https://github.com/elastic/elasticsearch/issues/51354
2020-01-24 09:29:12 -05:00
Benjamin Trent 76660a5a4f
[7.x] [ML][Inference] add tags url param to GET (#51330) (#51404)
* [ML][Inference] add tags url param to GET (#51330)

Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.

This parameter allows the list of models to be further reduced to those who contain all the provided tags.
2020-01-24 08:26:58 -05:00
Przemysław Witek 8703b885c2
Move TupleMatchers class to org.elasticsearch.test.hamcrest package (#51359) (#51395) 2020-01-24 11:10:54 +01:00
Hendrik Muhs d46e8c3f7f [Transform] disallow dotted fieldnames (#51369)
adds field validation to disallow output field names starting and/or ending with a '.'. Avoids
indexing/mapping problems when starting the transform.
2020-01-24 09:05:44 +01:00
Lee Hinman 31747de2a2 Fix testHashcodeAndEquals mutation for WaitForSnapshotStep (#51379)
This fixes the test failure, it was randomness returning the same policy
rather than a new one. Switched to use `randomValueOtherThan`.

Resolves #51377
2020-01-23 16:19:50 -07:00
Hendrik Muhs 3553f68f5a [Transform] Handle permanent bulk indexing errors (#51307)
check bulk indexing error for permanent problems and ensure the state goes into failed instead of
retry. Corrects the stats API to show the real error and avoids excessive audit logging.

fixes #50122
2020-01-23 16:17:26 +01:00
Przemko Robakowski 84664e8d60
Expose master timeout for ILM actions (#51130) (#51348)
This change exposes master timeout to ILM steps through global dynamic setting.
All currently implemented steps make use of this setting as well.

Closes #44136
2020-01-23 15:28:13 +01:00
David Kyle 0ac03ac5e7
[ML] Add parsers for inference configuration classes (#51300) 2020-01-22 17:03:01 +00:00
Dimitris Athanasiou 59687a9384
[7.x][ML] Validate classification dependent_variable cardinality is at lea… (#51232) (#51309)
Data frame analytics classification currently only supports 2 classes for the
dependent variable. We were checking that the field's cardinality is not higher
than 2 but we should also check it is not less than that as otherwise the process
fails.

Backport of #51232
2020-01-22 16:51:16 +02:00
Przemysław Witek bfcfcdee33
[7.x] Do not copy mapping from dependent variable to prediction field in regression analysis (#51227) (#51288) 2020-01-22 12:36:24 +01:00
Andrei Dan 421aa14972
ILM: Make UpdateSettingsStep retryable (#51235) (#51298)
This makes the UpdateSettingsStep retryable. This step updates settings needed
during the execution of ILM actions (mark indexes as read-only, change
allocation configurations, mark indexing complete, etc)

As the index updates are idempotent in nature (PUT requests and are applied only
if the values have changed) and the settings values are seldom user-configurable
(aside from the allocate action) the testing for this change goes along the
lines of artificially simulating a setting update failure on a particular value
update, which is followed by a successful step execution (a retry) in an
environment outside of ILM (the step executions are triggered manually).

(cherry picked from commit 8391b0aba469f39532bfc2796b76148167dc0289)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-01-22 11:02:26 +00:00
Andrei Dan 123266714b
ILM wait for active shards on rolled index in a separate step (#50718) (#51296)
After we rollover the index we wait for the configured number of shards for the
rolled index to become active (based on the index.write.wait_for_active_shards
setting which might be present in a template, or otherwise in the default case,
for the primaries to become active).
This wait might be long due to disk watermarks being tripped, replicas not
being able to spring to life due to cluster nodes reconfiguration and others
and, the RolloverStep might not complete successfully due to this inherent
transient situation, albeit the rolled index having been created.

(cherry picked from commit 457a92fb4c68c55976cc3c3e2f00a053dd2eac70)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-01-22 11:01:52 +00:00
Nik Everett 977b53ab91
Fix flaky usage tracking test (#51169) (#51179)
We added tracking of index feature usage in #51031 but due to some copy
and paste errors the test fails on some seeds. This fixes those errors.
2020-01-17 16:53:13 -05:00
David Roberts 295665b1ea [ML] Add audit warning for 1000 categories found early in job (#51146)
If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created.  (This will cause a yellow warning triangle in the
ML UI's jobs list.)

Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.
2020-01-17 16:28:45 +00:00
Przemysław Witek da73c9104e
[ML] Fix tests randomly failing on CI (#51142) (#51150) 2020-01-17 14:58:58 +01:00
Przemysław Witek b1a526d5e9
[7.x] [ML] Update DFA progress document in the index the document belongs to (#51111) (#51117) 2020-01-17 08:12:54 +01:00
Hendrik Muhs 13343b15c9 [Transform] Improve force stop robustness in case of an error (#51072)
If a transform config got lost (e.g. because the internal index disappeared) tasks could not be
stopped using transform API. This change makes it possible to stop transforms without a config,
meaning to remove the background task. In order to do so force must be set to true.
2020-01-17 07:42:21 +01:00
Adrien Grand 45d7bdcfd7
Add analysis components and mapping types to the usage API. (#51062)
Knowing about used analysis components and mapping types would be incredibly
useful in order to know which ones may be deprecated or should get more love.

Some field types also act as a proxy to know about feature usage of some APIs
like the `percolator` or `completion` fields types for percolation and the
completion suggester, respectively.
2020-01-16 09:56:41 +01:00
Lee Hinman 2d1c28a45d
[7.x] Fix AllocateRoutedStepTests reusing keys for random valu… (#51058)
In these tests there was a very small chance that keys could collide,
which causes test failures.

Resolves #49307
2020-01-15 11:36:34 -07:00
Tim Vernum e41c0b1224
Deprecating kibana_user and kibana_dashboard_only_user roles (#50963)
This change adds a new `kibana_admin` role, and deprecates
the old `kibana_user` and`kibana_dashboard_only_user`roles.

The deprecation is implemented via a new reserved metadata
attribute, which can be consumed from the API and also triggers
deprecation logging when used (by a user authenticating to
Elasticsearch).

Some docs have been updated to avoid references to these
deprecated roles.

Backport of: #46456

Co-authored-by: Larry Gregory <lgregorydev@gmail.com>
2020-01-15 11:07:19 +11:00
Nik Everett fc5fde7950
Add "did you mean" to ObjectParser (#50938) (#50985)
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
  "dac": {}
}'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "x_content_parse_exception",
        "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
      }
    ],
    "type" : "x_content_parse_exception",
    "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
  },
  "status" : 400
}
```

The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
s
2020-01-14 17:53:41 -05:00
Benjamin Trent 72c270946f
[ML][Inference] Adding classification_weights to ensemble models (#50874) (#50994)
* [ML][Inference] Adding classification_weights to ensemble models

classification_weights are a way to allow models to
prefer specific classification results over others
this might be advantageous if classification value
probabilities are a known quantity and can improve
model error rates.
2020-01-14 12:40:25 -05:00
Dimitris Athanasiou 1d8cb3c741
[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914) (#50976)
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on SHAP (SHapley Additive exPlanations) method.

Backport of #50914
2020-01-14 16:46:09 +02:00
Tim Vernum 2bb7b53e41
Add certutil http command (#50952)
This adds a new "http" sub-command to the certutil CLI tool.

The http command generates certificates/CSRs for use on the http
interface of an elasticsearch node/cluster.
It is designed to be a guided tool that provides explanations and
sugestions for each of the configuration options. The generated zip
file output includes extensive "readme" documentation and sample
configuration files for core Elastic products.

Backport of: #49827
2020-01-14 21:24:21 +11:00
Tim Vernum b02b073a57
Increase Size and lower TTL on DLS BitSet Cache (#50953)
The Document Level Security BitSet Cache (see #43669) had a default
configuration of "small size, long lifetime". However, this is not
a very useful default as the cache is most valuable for BitSets that
take a long time to construct, which is (generally speaking) the same
ones that operate over a large number of documents and contain many
bytes.

This commit changes the cache to be "large size, short lifetime" so
that it can hold bitsets representing billions of documents, but
releases memory quickly.

The new defaults are 10% of heap, and 2 hours.

This also adds some logging when a single BitSet exceeds the size of
the cache and when the cache is full.

Backport of: #50535
2020-01-14 18:04:02 +11:00
Tim Vernum 33c29fb5a3
Support Client and RoleMapping in custom Realms (#50950)
Previously custom realms were limited in what services and components
they had easy access to. It was possible to work around this because a
security extension is packaged within a Plugin, so there were ways to
store this components in static/SetOnce variables and access them from
the realm, but those techniques were fragile, undocumented and
difficult to discover.

This change includes key services as an argument to most of the methods
on SecurityExtension so that custom realm / role provider authors can
have easy access to them.

Backport of: #50534
2020-01-14 15:26:41 +11:00
Tim Vernum 90ba77951a
Fix memory leak in DLS bitset cache (#50946)
The Document Level Security BitSet cache stores a secondary "lookup
map" so that it can determine which cache entries to invalidate when
a Lucene index is closed (merged, etc).

There was a memory leak because this secondary map was not cleared
when entries were naturally evicted from the cache (due to size/ttl
limits).

This has been solved by adding a cache removal listener and processing
those removal events asyncronously.

Backport of: #50635
2020-01-14 13:19:05 +11:00
Tim Vernum 1577a0e617
Validate field permissions when creating a role (#50917)
When creating a role, we do not check if the exceptions for
the field permissions are a subset of granted fields. If such
a role is assigned to a user then that user's authentication fails
for this reason.

We added a check to validate role query in #46275 and on the same lines,
this commit adds check if the exceptions for the field
permissions is a subset of granted fields when parsing the
index privileges from the role descriptor.

Backport of: #50212

Co-authored-by: Yogesh Gaikwad <bizybot@users.noreply.github.com>
2020-01-14 12:37:45 +11:00
Tim Vernum c2acb8830a
Add max_resource_units to enterprise license (#50910)
The enterprise license type must have "max_resource_units" and may not
have "max_nodes".

This change adds support for this new field, validation that the field
is present if-and-only-if the license is enterprise and bumps the
license version number to reflect the new field.

Includes a BWC layer to return "max_nodes: ${max_resource_units}" in
the GET license API.

Backport of: #50735
2020-01-14 12:37:05 +11:00
Przemko Robakowski a18736b46d
[7.x] ILM action to wait for SLM policy execution (#50454) (#50943)
* ILM action to wait for SLM policy execution (#50454)

This change add new ILM action to wait for SLM policy execution to ensure that index has snapshot before deletion.

Closes #45067

* Fix flaky TimeSeriesLifecycleActionsIT#testWaitForSnapshot test

This change adds some randomness and cleanup step to TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in attempt to make them stable.

Reletes to #50781

* Formatting changes

* Longer timeout

* Fix Map.of in Java8

* Unused import removed
2020-01-14 01:34:33 +01:00
Ioannis Kakavas ba37e3c4a0
Disable DiagnosticTrustManager in FIPS 140 (#49888)
This commit changes the default behavior for
xpack.security.ssl.diagnose.trust when running in a FIPS 140 JVM.

More specifically, when xpack.security.fips_mode.enabled is true:

- If xpack.security.ssl.diagnose.trust is not explicitly set, the
    default value of it becomes false and a log message is printed
    on info level, notifying of the fact that the TLS/SSL diagnostic
    messages are not enabled when in a FIPS 140 JVM.
- If xpack.security.ssl.diagnose.trust is explicitly set, the value of
    it is honored, even in FIPS mode.

This is relevant only for 7.x where we support Java 8 in which
SunJSSE can still be used as a FIPS 140 provider for TLS. SunJSSE
in FIPS mode, disallows the use of other TrustManager implementations
than the one shipped with SunJSSE.
2020-01-13 17:04:23 +02:00
Larry Gregory cc8aafcfc2
[7.x] - Adding GET/PUT ILM cluster privileges to `kibana_syste… (#50878)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-01-13 08:36:48 -05:00
Benjamin Trent eb8fd44836
[ML][Inference] minor fixes for created_by, and action permission (#50890) (#50911)
The system created and models we provide now use the `_xpack` user for uniformity with our other features

The `PUT` action is now an admin cluster action

And XPackClient class now references the action instance.
2020-01-13 07:59:31 -05:00
Albert Zaharovits 2b789fa3e6
Make .async-search-* a restricted namespace (#50294)
Hide the `.async-search-*` in Security by making it a restricted index namespace.
The namespace is hard-coded.
To grant privileges on restricted indices, one must explicitly toggle the
`allow_restricted_indices` flag in the indices permission in the role definition.
As is the case with any other index, if a certain user lacks all permissions for an
index, that index is effectively nonexistent for that user.
2020-01-13 12:20:54 +02:00
Benjamin Trent fa116a6d26
[7.x] [ML][Inference] PUT API (#50852) (#50887)
* [ML][Inference] PUT API (#50852)

This adds the `PUT` API for creating trained models that support our format.

This includes

* HLRC change for the API
* API creation
* Validations of model format and call

* fixing backport
2020-01-12 10:59:11 -05:00
Benjamin Trent 5afa0b71e9
[ML][Inference] Unify top_classes object field names with analytics (#50858) (#50861) 2020-01-10 12:00:37 -05:00
Dimitris Athanasiou 422422a2bc
[7.x][ML] Reuse SourceDestValidator for data frame analytics (#50841) (#50850)
This commit removes validation logic of source and dest indices
for data frame analytics and replaces it with using the common
`SourceDestValidator` class which is already used by transforms.
This way the validations and their messages become consistent
while we reduce code.

This means that where these validations fail the error messages
will be slightly different for data frame analytics.

Backport of #50841
2020-01-10 14:24:13 +02:00
Benjamin Trent 3e014d39c2
[Transform] fail to start/put on missing pipeline (#50701) (#50795)
If a pipeline referenced by a transform does not exist, we should not allow the transform to be created. 

We do allow the pipeline existence check to be skipped with defer_validations, but if the pipeline still does not exist on `_start`, the pipeline will fail to start.

relates:  #50135
2020-01-09 10:33:22 -05:00
Christoph Büscher b1b4282273 Make Multiplexer inherit filter chains analysis mode (#50662)
Currently, if an updateable synonym filter is included in a multiplexer filter,
it is not reloaded via the _reload_search_analyzers because the multiplexer
itself doesn't pass on the analysis mode of the filters it contains, so its not
recognized as "updateable" in itself. Instead we can check and merge the
AnalysisMode settings of all filters in the multiplexer and use the resulting
mode (e.g. search-time only) for the multiplexer itself, thus making any synonym
filters contained in it reloadable.  This, of course, will also make the
analyzers using the multiplexer be usable at search-time only.

Closes #50554
2020-01-08 22:12:01 +01:00
Lee Hinman 8dc6e98819
[7.x] Make InitializePolicyContextStep retryable (#50685) (#50760)
This commits makes the "init" ILM step retryable. It also adds a test
where an index is created with a non-parsable index name and then fails.

Related to #48183
2020-01-08 13:13:57 -07:00
Andrei Dan 3915d4c055
Make the UpdateRolloverLifecycleDateStep retryable (#50702) (#50730)
This makes the "update-rollover-lifecycle-date" step, which is part of the
rollover action, retryable. It also adds an integration test to check the
step is retried and it eventually succeeds.

(cherry picked from commit 5bf068522deb2b6cd2563bcf80f34fdbf459c9f2)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-01-08 11:45:26 +01:00
Benjamin Trent 060e0a6277
[ML][Inference] Add support for models shipped as resources (#50680) (#50700)
This adds support for models that are shipped as resources in the ML plugin. The first of which is the `lang_ident` model.
2020-01-07 09:21:59 -05:00
Hendrik Muhs 98ca9500e8
implement a workaround for remote cluster validation (#50460)
In 7.x an internal API used for validating remote cluster does not throw, see #50420 for the 
details. This change implements a workaround for remote cluster validation, only for 7.x branches.

fixes #50420
2020-01-07 13:51:51 +01:00
David Roberts 35453e2b0e [ML] Improve uniqueness of result document IDs (#50644)
Switch from a 32 bit Java hash to a 128 bit Murmur hash for
creating document IDs from by/over/partition field values.
The 32 bit Java hash was not sufficiently unique, and could
produce identical numbers for relatively common combinations
of by/partition field values such as L018/128 and L017/228.

Fixes #50613
2020-01-07 10:24:45 +00:00
Benjamin Trent 5ab9e75e28
[7.x] [ML][Inference] lang_ident model (#50292) (#50675)
* [ML][Inference] lang_ident model (#50292)

This PR contains a java port of Google's CLD3 compact NN model https://github.com/google/cld3

The ported model is formatted to fit within our inference model formatting and stored as a resource in the `:xpack:ml:` plugin and is under basic license.

The model is broken up into two major parts:
- Preprocessing through the custom embedding (based on CLD3's embedding layer)
- Pushing the embedded text through the two layers of fully connected shallow NN. 

Main differences between this port and CLD3:
- We take advantage of Java's internal Unicode handling where possible (i.e. codepoints, characters, decoders, etc.)
- We do not trim down input text by removing duplicated tokens
- We do not encode doubles/floats as longs/integers.
2020-01-06 16:24:03 -05:00