By moving tokenization for categorization to Java we give users access to considerably more options for tokenizing their log messages prior to using ML to categorize them. Now all Elasticsearch analyzer functionality is available, which opens up the possibility to sensibly categorize non-English log messages.
Relates elastic/machine-learning-cpp#491
Original commit: elastic/x-pack-elasticsearch@5d61b67614
For the purpose of getting this API consumed by our UI, returning
overall buckets that match the job's largest `bucket_span` can
result in too much data. The UI only ever displays a few buckets
in the swimlane. Their span depends on the time range selected and
the screen resolution, but it will only ever be a relatively
low number.
This PR adds the ability to aggregate overall buckets in a user
specified `bucket_span`. That `bucket_span` may be equal or
greater to the largest job's `bucket_span`. The `overall_score`
of the result overall buckets is the max score of the
corresponding overall buckets with a span equal to the job's
largest `bucket_span`.
The implementation is now chunking the bucket requests
as otherwise the aggregation would fail when too many buckets
are matching.
Original commit: elastic/x-pack-elasticsearch@981f7a40e5
Adds the GET overall_buckets API.
The REST end point is: GET
/_xpack/ml/anomaly_detectors/job_id/results/overall_buckets
The API returns overall bucket results. An overall bucket
is a summarized bucket result over multiple jobs.
It has the `bucket_span` of the longest job's `bucket_span`.
It also has an `overall_score` that is the `top_n` average of the
max anomaly scores per job.
relates elastic/x-pack-elasticsearch#2693
Original commit: elastic/x-pack-elasticsearch@ba6061482d
The changes made for elastic/x-pack-elasticsearch#2369 showed that the ML security tests were seriously
weakened by the decision to grant many "minimal" privileges to all users
involved in the tests. A better solution is to override the auth header
such that a superuser runs setup actions and assertions that work by
querying raw documents in ways that an end user wouldn't. Then the ML
endpoints can be called with the privileges provided by the ML roles and
nothing else.
Original commit: elastic/x-pack-elasticsearch@4de42d9e54
This is related to elastic/x-pack-elasticsearch#1217. This PR removes the default password of
"changeme" from the reserved users.
This PR adds special behavior for authenticating the reserved users. No
ReservedRealm user can be authenticated until its password is set. The
one exception to this is the elastic user. The elastic user can be
authenticated with an empty password if the action is a rest request
originating from localhost. In this scenario where an elastic user is
authenticated with a default password, it will have metadata indicating
that it is in setup mode. An elastic user in setup mode is only
authorized to execute a change password request.
Original commit: elastic/x-pack-elasticsearch@e1e101a237
In does not make sense for the time_field in the data_description to
be used as a by/over/partition field name, nor the summary_count_field,
categorization_field or as an influencer. Therefore, configurations
where the time_field in the data_description is used in the
analysis_config are now rejected.
Additionally, it causes a problem communicating with the C++ code if
the control field name (which is '.') is used in the analysis_config,
so this is also rejected at the validation stage.
Relates elastic/x-pack-elasticsearch#1684
Original commit: elastic/x-pack-elasticsearch@e6750a2cda
* [ML] Not an error to close a job twice
* Error if job is opening
* Address review comments
* Test closed job isn’t resolved
Original commit: elastic/x-pack-elasticsearch@7da7b24c08
We didn't realise it was possible for a qa module to depend on the
test classes of the plugin module, so we duplicated a test class.
But it turns out it IS possible to declare this dependency and avoid
the duplication.
Original commit: elastic/x-pack-elasticsearch@b6a21cda28
Previously force closing a job required extra privileges. Following
the full discussion about what privileges should be required.
Original commit: elastic/x-pack-elasticsearch@4d85314b35
* Changed ML action names to allow distinguishing of admin and read-only actions
using wildcards
* Added manage_ml and monitor_ml built-in privileges as subsets of the existing
manage and monitor privileges
* Added out-of-the-box machine_learning_admin and machine_learning_user roles
* Changed machine learning results endpoints to use a NodeClient rather than an
InternalClient when searching for results so that index/document level permissions
applied to ML results are respected
Original commit: elastic/x-pack-elasticsearch@eee800aaa8