Commit Graph

1088 Commits

Author SHA1 Message Date
Benjamin Trent 89668c5ea0
[ML][Inference] adds new default_field_map field to trained models (#53294) (#53419)
Adds a new `default_field_map` field to trained model config objects.

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 13:49:39 -04:00
Dimitris Athanasiou 0fd0516d0d
[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300) (#53390)
Deprecates `maximum_number_trees` parameter of classification and
regression and replaces it with `max_trees`.

Backport of #53300
2020-03-11 12:45:27 +02:00
Hendrik Muhs 696aa4ddaf
[7.x][Transform] add support for script in group_by (#53167) (#53324)
add the possibility to base the group_by on the output of a script.

closes #43152
backport #53167
2020-03-10 11:12:58 +01:00
Christoph Büscher 9e561c2921 Fix AbstractBulkByScrollRequest slices parameter via Rest (#53068)
Currently the AbstractBulkByScrollRequest accepts slice values of 0 via its
`setSlices` method, denoting the "auto" slicing behaviour that is usable by
settting the "slices=auto" parameter on rest requests. When using the High Level
Rest Client, however, we send the 0 value as an integer, which is then rejected
as invalid by `AbstractBulkByScrollRequest#parseSlices`. Instead of making
parsing of the rest request more lenient, this PR opts for changing the
RequestConverter logic in the client to translate 0 values to "auto" on the rest
requests.

Closes #53044
2020-03-06 15:38:04 +01:00
Aleksandr Maus 2dc872f052
EQL: Add HLRC for EQL stats (#53043) (#53148) 2020-03-05 09:20:38 -05:00
Nik Everett 28df7ae5ed
Support multiple metrics in `top_metrics` agg (backport of #52965) (#53163)
This adds support for returning multiple metrics to the `top_metrics`
agg. It looks like:
```
POST /test/_search?filter_path=aggregations
{
  "aggs": {
    "tm": {
      "top_metrics": {
        "metrics": [
          {"field": "v"},
          {"field": "m"}
        ],
        "sort": {"s": "desc"}
      }
    }
  }
}
```
2020-03-05 08:12:01 -05:00
Aleksandr Maus b47bffba24
EQL: consistent naming for event type vs event category (#53073) (#53090)
Related to https://github.com/elastic/elasticsearch/issues/52941
2020-03-04 08:02:38 -05:00
Costin Leau 712e0c05cd EQL: Add implicit ordering on timestamp (#53004)
QL: Move Sort base class from SQL to QL
(cherry picked from commit 798015b7bbd565e9c4222724614baeb432c7c2b3)
2020-03-02 22:41:36 +02:00
Aleksandr Maus 89ed857c79
EQL: Change request parameter query to filter and rule to query (#52971) (#53006)
Related to https://github.com/elastic/elasticsearch/issues/52911
2020-03-02 09:26:23 -05:00
Dimitris Athanasiou 85b4e45093
[7.x]ML] Parse and report memory usage for DF Analytics (#52778) (#52980)
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.

Backport of #52778 and #52958
2020-02-29 13:03:40 +02:00
Dan Hermann dd44376d27
[7.x] Send the fields param in body instead of URL params (#52948) 2020-02-28 08:57:35 -06:00
Costin Leau a674085903 EQL: Disable field extraction for returned events (#52884)
Return the whole source of matching events

(cherry picked from commit 79ca586ab1d89d645fb58142b82202f14ce5d361)
2020-02-28 13:48:15 +02:00
Nik Everett 1d1956ee93
Add size support to `top_metrics` (backport of #52662) (#52914)
This adds support for returning the top "n" metrics instead of just the
very top.

Relates to #51813
2020-02-27 16:12:52 -05:00
Benjamin Trent 19a6c5d980
[7.x] [ML][Inference] Add support for multi-value leaves to the tree model (#52531) (#52901)
* [ML][Inference] Add support for multi-value leaves to the tree model (#52531)

This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.
2020-02-27 14:05:28 -05:00
Benjamin Trent eac38e9847
[ML] Add indices_options to datafeed config and update (#52793) (#52905)
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index.

This is necessary for the following use cases:
 - Reading from frozen indices
 - Allowing certain indices in multiple index patterns to not exist yet

These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.

closes https://github.com/elastic/elasticsearch/issues/48056
2020-02-27 13:43:25 -05:00
Josh Devins 68ba571f70
Adds recall@k metric to rank eval API (#52889)
This change adds the recall@k metric and refactors precision@k to match
the new metric.

Recall@k is an important metric to use for learning to rank (LTR)
use-cases. Candidate generation or first ranking phase ranking functions
are often optimized for high recall, in order to generate as many
relevant candidates in the top-k as possible for a second phase of
ranking. Adding this metric allows tuning that base query for LTR.

See: https://github.com/elastic/elasticsearch/issues/51676
Backports: https://github.com/elastic/elasticsearch/pull/52577
2020-02-27 16:04:24 +01:00
Costin Leau 40bc06f6ad EQL: Hook engine to Elasticsearch (#52828)
Add query execution and return actual results returned from
Elasticsearch inside the tests

(cherry picked from commit 3e039282bf991af87604a6d4f8eada19d5e33842)
2020-02-27 11:22:22 +02:00
Jake Landis 8d311297ca
[7.x] Smarter copying of the rest specs and tests (#52114) (#52798)
* Smarter copying of the rest specs and tests (#52114)

This PR addresses the unnecessary copying of the rest specs and allows
for better semantics for which specs and tests are copied. By default
the rest specs will get copied if the project applies
`elasticsearch.standalone-rest-test` or `esplugin` and the project
has rest tests or you configure the custom extension `restResources`.

This PR also removes the need for dozens of places where the x-pack
specs were copied by supporting copying of the x-pack rest specs too.

The plugin/task introduced here can also copy the rest tests to the
local project through a similar configuration.

The new plugin/task allows a user to minimize the surface area of
which rest specs are copied. Per project can be configured to include
only a subset of the specs (or tests). Configuring a project to only
copy the specs when actually needed should help with build cache hit
rates since we can better define what is actually in use.
However, project level optimizations for build cache hit rates are
not included with this PR.

Also, with this PR you can no longer use the includePackaged flag on
integTest task.

The following items are included in this PR:
* new plugin: `elasticsearch.rest-resources`
* new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy
* new extension 'restResources'
```
restResources {
  restApi {
    includeCore 'foo' , 'bar' //will include the core specs that start with foo and bar
    includeXpack 'baz' //will include x-pack specs that start with baz
  }
  restTests {
    includeCore 'foo', 'bar' //will include the core tests that start with foo and bar
    includeXpack 'baz' //will include the x-pack tests that start with baz
  }
}

```
2020-02-26 08:13:41 -06:00
Sachin Frayne d3c0a2f013 Improve the error message when loading text fielddata. (#52753)
Emphasize keyword over fielddata as the preferred way to use String fields for aggregations or sorting.
2020-02-25 15:45:44 -08:00
Aleksandr Maus a7bdb0b456
EQL: Add integration tests harness to test EQL feature parity with original implementation (#52248) (#52675)
The tests use the original test queries from
https://github.com/endgameinc/eql/blob/master/eql/etc/test_queries.toml
for EQL implementation correctness validation.
The file test_queries_unsupported.toml serves as a "blacklist" for the
queries that we do not support. Currently all of the queries are
blacklisted. Over the time the expectation is to eventually have an
empty "blacklist" when all of the queries are fully supported.

The tests use the original test vector from
https://raw.githubusercontent.com/endgameinc/eql/master/eql/etc/test_data.json.

Only one EQL and the response is stubbed for now to match the expected
output from that query. This part would need some tweaking after EQL is
fully wired.

Related to https://github.com/elastic/elasticsearch/issues/49581
2020-02-24 12:46:59 -05:00
Przemko Robakowski e72cb79476
Add docs for errors in GetAlias API (#51850) (#52716)
Closes #31499

Co-authored-by: Maxim <timonin.maksim@mail.ru>
2020-02-24 18:22:09 +01:00
Igor Motov e5b21a3fc6
Add HLRC for EQL search (#52550)
Adds EQL HLRC client with the search method.

Relates to #51961
2020-02-21 08:44:08 -05:00
David Kyle 7bbe5c8464
[Ml] Validate tree feature index is within range (#52514)
This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to
centre the data in the ensemble but the validation code would reject this
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
2020-02-19 14:41:43 +00:00
Nik Everett 146def8caa
Implement top_metrics agg (#51155) (#52366)
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <zach@elastic.co>
2020-02-14 11:19:11 -05:00
Nik Everett 2dac36de4d
HLRC support for string_stats (#52163) (#52297)
This adds a builder and parsed results for the `string_stats`
aggregation directly to the high level rest client. Without this the
HLRC can't access the `string_stats` API without the elastic licensed
`analytics` module.

While I'm in there this adds a few of our usual unit tests and
modernizes the parsing.
2020-02-12 19:25:05 -05:00
David Roberts 1cefafdd14 [ML] Add new categorization stats to model_size_stats (#52009)
This change adds support for the following new model_size_stats
fields:

- categorized_doc_count
- total_category_count
- frequent_category_count
- rare_category_count
- dead_category_count
- categorization_status

Backport of #51879
2020-02-10 09:10:50 +00:00
Martijn van Groningen 44ea1efd26
Tidy up GetSourceRequest class: (#51916)
* No need to implement ToXContentObject
* Made index and id fields immutable.
2020-02-10 09:42:03 +01:00
Benjamin Trent c6111eb90e
[ML][Inference] adding number_samples to TreeNode (#51937) (#52060)
in preparation for feature importance and split information gain, adding `number_samples` field to `TreeNode` definition.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-07 17:04:58 -05:00
David Kyle 8f10a7c6ca [ML] Make Ensemble feature names optional (#51996)
The featureNames field is requisite in individual models but is not required by the Ensemble.
2020-02-07 10:08:37 +00:00
Jason Tedor 25daf5f1e1
Add autoscaling API skelton (#51564)
The main purpose of this commit is to add a single autoscaling REST
endpoint skeleton, for the purpose of starting to build out the build
and testing infrastructure that will surround it. For example, rather
than commiting a fully-functioning autoscaling API, we introduce here
the skeleton so that we can start wiring up the build and testing
infrastructure, establish security roles/permissions, an so on. This
way, in a forthcoming PR that introduces actual functionality, that PR
will be smaller and have less distractions around that sort of
infrastructure.
2020-02-06 21:55:01 -05:00
Ioannis Kakavas 5092d3098d
Cleanup test user in HLRC test (#49477) (#51942)
SecurityIT.testGetUser creates a user for testing purposes, but did
not delete the user at the end of the test. This could leave the
cluster in an unexpected state for other tests.

This commit:
- Deletes the user at the end of `testGetUser`
- Adds the test-name as metadata to the users that are created in `SecurityIT`
  so that their origin is clear if they do interfere with other tests
- Enables SecurityDocumentationIT.testGetUsers on the expectation that
  the new cleanup step will resolve the unreliability of that test.

Relates: #48440

Co-authored-by: Tim Vernum <tim@adjective.org>
2020-02-06 13:05:09 +02:00
Martijn van Groningen 0610eb51ef
Change HLRC SourceExists to use GetSourceRequest instead of GetRequest (#51789) (#51913)
Originates from #50885

Co-authored-by: Maxim <timonin.maksim@mail.ru>
2020-02-05 13:27:31 +01:00
Julie Tibshirani 38ce428831
Create a class to hold field capabilities for one index. (#51844)
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in #49264, since the merged
source path field will have a different structure from the field for a single index.

Individual changes:
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
2020-02-04 11:24:57 -08:00
James Rodewig 4ea7297e1e
[DOCS] Change http://elastic.co -> https (#48479) (#51812)
Co-authored-by: Jonathan Budzenski <jon@budzenski.me>
2020-02-03 09:50:11 -05:00
Ryan Ernst 21224caeaf Remove comparison to true for booleans (#51723)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
2020-01-31 16:35:43 -08:00
Lee Hinman b9faa0733d
[7.x] Rename ILM history index enablement setting (#51698) (#51705)
* Rename ILM history index enablement setting

The previous setting was `index.lifecycle.history_index_enabled`, this commit changes it to
`indices.lifecycle.history_index_enabled` to indicate this is not an index-level setting (it's node
level).
2020-01-30 15:27:44 -07:00
Benjamin Trent 1380dd439a
[7.x] [ML][Inference] Fix weighted mode definition (#51648) (#51695)
* [ML][Inference] Fix weighted mode definition (#51648)

Weighted mode inaccurately assumed that the "max value" of the input values would be the maximum class value. This does not make sense. 

Weighted Mode should know how many classes there are. Hence the new parameter `num_classes`. This indicates what the maximum class value to be expected.
2020-01-30 15:33:25 -05:00
Hendrik Muhs b5106aa59d dump audit index to logs for better debugging (#51627)
The audit index is re-created for every testrun and therefore potential useful debug information
gets lost. This change reads out the audit index and logs the results, which makes them available
for debugging CI issues.

relates #51549
2020-01-30 11:14:56 +01:00
Albert Zaharovits f25b6cc2eb
Add new 'maintenance' index privilege #50643
This commit creates a new index privilege named `maintenance`.
The privilege grants the following actions: `refresh`, `flush` (also synced-`flush`),
and `force-merge`. Previously the actions were only under the `manage` privilege
which in some situations was too permissive.

Co-authored-by: Amir H Movahed <arhd83@gmail.com>
2020-01-30 11:59:11 +02:00
Benjamin Trent fc994d9ce1
[ML][Inference] Adds validations for model PUT (#51376) (#51409)
Adds validations making sure that

* `input.field_names` is not empty
* `ensemble.trained_models` is not empty
* `tree.feature_names` is not empty

closes https://github.com/elastic/elasticsearch/issues/51354
2020-01-24 09:29:12 -05:00
Benjamin Trent 76660a5a4f
[7.x] [ML][Inference] add tags url param to GET (#51330) (#51404)
* [ML][Inference] add tags url param to GET (#51330)

Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.

This parameter allows the list of models to be further reduced to those who contain all the provided tags.
2020-01-24 08:26:58 -05:00
Nhat Nguyen 072203cba8
Clean soft-deletes setting in ccr tests (#51113) (#51372)
We no longer need to explicitly enable soft-deletes in CCR tests.

Relates #50775
Backport of #51113
2020-01-23 16:31:47 -05:00
Martijn van Groningen 0a8d8d7ae3
Add Get Source API to the HLRC (#51342)
Backport to 7.x branch of #50885.

Relates to #47678

Co-authored-by: Maxim <timonin.maksim@mail.ru>
2020-01-23 13:16:20 +01:00
Tim Vernum e41c0b1224
Deprecating kibana_user and kibana_dashboard_only_user roles (#50963)
This change adds a new `kibana_admin` role, and deprecates
the old `kibana_user` and`kibana_dashboard_only_user`roles.

The deprecation is implemented via a new reserved metadata
attribute, which can be consumed from the API and also triggers
deprecation logging when used (by a user authenticating to
Elasticsearch).

Some docs have been updated to avoid references to these
deprecated roles.

Backport of: #46456

Co-authored-by: Larry Gregory <lgregorydev@gmail.com>
2020-01-15 11:07:19 +11:00
Benjamin Trent 72c270946f
[ML][Inference] Adding classification_weights to ensemble models (#50874) (#50994)
* [ML][Inference] Adding classification_weights to ensemble models

classification_weights are a way to allow models to
prefer specific classification results over others
this might be advantageous if classification value
probabilities are a known quantity and can improve
model error rates.
2020-01-14 12:40:25 -05:00
Dimitris Athanasiou 1d8cb3c741
[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914) (#50976)
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on SHAP (SHapley Additive exPlanations) method.

Backport of #50914
2020-01-14 16:46:09 +02:00
Nhat Nguyen fb32a55dd5 Deprecate synced flush (#50835)
A normal flush has the same effect as a synced flush on Elasticsearch 
7.6 or later. It's deprecated in 7.6 and will be removed in 8.0.

Relates #50776
2020-01-13 19:54:38 -05:00
Nhat Nguyen 05f97d5e1b Revert "Deprecate synced flush (#50835)"
This reverts commit 1a32d7142a.
2020-01-13 11:41:03 -05:00
Nhat Nguyen 1a32d7142a
Deprecate synced flush (#50835)
A normal flush has the same effect as a synced flush on Elasticsearch 
7.6 or later. It's deprecated in 7.6 and will be removed in 8.0.

Relates #50776
2020-01-13 10:58:29 -05:00
Benjamin Trent fa116a6d26
[7.x] [ML][Inference] PUT API (#50852) (#50887)
* [ML][Inference] PUT API (#50852)

This adds the `PUT` API for creating trained models that support our format.

This includes

* HLRC change for the API
* API creation
* Validations of model format and call

* fixing backport
2020-01-12 10:59:11 -05:00