I thought QUERY_AND_FETCH was the most efficient search type for the data
extractor, but it does not work with sorting: each shard's results are
returned before they are merged and sorted, so we may get out-of-order errors.
This commit switches to the default search type.
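A minimal sketch of the change in the extractor's search request, assuming the 5.x transport-client style API; the index, query and sort field are placeholders:

```java
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortOrder;

final class SearchTypeSketch {

    /** Builds the extractor-style search without forcing a search type. */
    static SearchResponse search(Client client) {
        SearchRequestBuilder search = client.prepareSearch("my-index")   // placeholder index
                .setQuery(QueryBuilders.matchAllQuery())                 // placeholder query
                .addSort("timestamp", SortOrder.ASC);                    // placeholder sort field
        // Previously: search.setSearchType(SearchType.QUERY_AND_FETCH);
        // Leaving the default (query then fetch) lets shard results be merged
        // and sorted before they are returned, so records stay in time order.
        return search.get();
    }
}
```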
Original commit: elastic/x-pack-elasticsearch@d8a8155973
* Extract method ScheduledJob#postData
* Remove unreachable else statement
* Restrict usage of DataExtractor to a single thread
Original commit: elastic/x-pack-elasticsearch@5b9b310d9d
* prelert to ml
* Prelert to Ml
* PRELERT to ML
Exceptions:
* prelert.com - because it generally appears in links to our website, and
although these will eventually break, it will be possible for people to see
what was there using https://archive.org/web/
* PRELERT_AWS_ACCESS_KEY_ID and PRELERT_AWS_SECRET_ACCESS_KEY - because renaming
them creates a knock-on effect on infra that will be temporary anyway: once
we're in x-pack we'll use x-pack keys
* prelert-artifacts - this is the name of the S3 bucket we're currently using,
and S3 buckets cannot be renamed; as with the access keys it will become
obsolete when we merge to x-pack, so there's no point changing it now
* prelert-legacy - the name of our legacy Git repo has not changed
Original commit: elastic/x-pack-elasticsearch@720e83c7f2
and re-enabled some quantiles persistence unit tests (which can remain blocking as they aren't used on a network thread)
Original commit: elastic/x-pack-elasticsearch@cf8e78f42d
* Replace http data extractor with a client extractor
This first implementation replaces the HTTP extractor
with a client extractor that uses search & scroll.
Note that this first implementation has some limitations:
- Only reads data that are in the _source
- Does not handle aggregated searches
These limitations will be addressed in follow up PRs.
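A rough sketch of the search & scroll loop this extractor is built on, assuming the 5.x Client API; the index, query, page size and the way _source is consumed are placeholders:

```java
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

final class ScrollExtractorSketch {

    private static final TimeValue KEEP_ALIVE = TimeValue.timeValueMinutes(5);

    /** Streams every matching document's _source, page by page. */
    static void extract(Client client, String index) {
        SearchResponse response = client.prepareSearch(index)
                .setScroll(KEEP_ALIVE)
                .setQuery(QueryBuilders.matchAllQuery()) // placeholder query
                .setSize(1000)                           // placeholder page size
                .get();
        try {
            while (response.getHits().getHits().length > 0) {
                for (SearchHit hit : response.getHits().getHits()) {
                    // Limitation noted above: only fields present in _source are read.
                    consume(hit.getSourceAsString());
                }
                response = client.prepareSearchScroll(response.getScrollId())
                        .setScroll(KEEP_ALIVE)
                        .get();
            }
        } finally {
            ClearScrollRequest clear = new ClearScrollRequest();
            clear.addScrollId(response.getScrollId());
            client.clearScroll(clear).actionGet();
        }
    }

    private static void consume(String sourceJson) {
        // Placeholder for handing the data to the job's post-data call.
    }
}
```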
Relates to elastic/elasticsearch#154
Original commit: elastic/x-pack-elasticsearch@f692ed961c
* Upgrades to ES 6.0.0-alpha1-SNAPSHOT
* Kibana changes to run upgrade to 6.0.0-alpha1-SNAPSHOT
* Other version changes to 6.0.0-alpha1-SNAPSHOT
Original commit: elastic/x-pack-elasticsearch@574d8573ab
This commit contains around half of the endpoint changes Sophie and Steve
agreed with Clint:
1) Automatic job ID generation is removed
2) Job IDs must now be specified in the URL when putting a job; to avoid
breaking many test configs, job IDs may also be specified in the job config
body, but in this case the value specified must match the URL argument
3) The endpoint name for posting data is now post_data instead of job_data
4) The post_data endpoint ends with _data instead of data
5) modelsnapshots is renamed to model_snapshots in all related endpoints
6) PUT model_snapshots/description is changed to POST model_snapshots/_update
Relates elastic/elasticsearch#630
Original commit: elastic/x-pack-elasticsearch@c379a23f3c
The `influencer_field_name` field was declared twice in the results mapping: once directly in `ElasticsearchMappings.resultsMapping()` and again in `addInfluencerFieldsToMapping(XContentBuilder)`, which the `resultsMapping()` method calls.
This change removes the duplicate.
Original commit: elastic/x-pack-elasticsearch@5707a5ee53
Allow deletes to proceed even if index is missing
Also adds some tests. All non-IndexNotFound exceptions will still abort the delete.
We can revisit this if we find other edge-cases.
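A small sketch of the pattern, assuming a 5.x Client; the per-job index name is a placeholder and the real delete logic does more than drop the index:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.index.IndexNotFoundException;

final class TolerantDeleteSketch {

    /** Deletes the job's index, treating a missing index as already deleted. */
    static void deleteJobIndex(Client client, String indexName) {
        try {
            client.admin().indices().prepareDelete(indexName).get();
        } catch (IndexNotFoundException e) {
            // The index is already gone, so the delete proceeds.
        }
        // Any other exception still propagates and aborts the delete.
    }
}
```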
Original commit: elastic/x-pack-elasticsearch@823d00d8a7
and FixBlockingClientOperations in two places where blocking client calls are ok,
because these methods aren't called from a network thread.
Original commit: elastic/x-pack-elasticsearch@a6dc34651c
Merged categoryDefinition(...) into categoryDefinitions(...) as the two did similar things. The get call has been replaced with a search with a query on the _uid field and routing on category id, so that the response handling code can be reused.
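Roughly what the replacement search looks like, assuming the 5.x convention that _uid is "type#id"; the index and type names are placeholders:

```java
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

final class CategoryDefinitionSearchSketch {

    /**
     * Replaces the former get call: a search on _uid ("type#id" in 5.x),
     * routed by category id so it targets the same shard the get would have.
     */
    static SearchRequestBuilder categoryDefinitionSearch(
            Client client, String index, String type, String categoryId) {
        return client.prepareSearch(index)
                .setQuery(QueryBuilders.termQuery("_uid", type + "#" + categoryId))
                .setRouting(categoryId)
                .setSize(1);
    }
}
```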
Original commit: elastic/x-pack-elasticsearch@4243917b00
The start scheduler api waits until the scheduler state has been set to started before returning.
Before this change, the scheduler would link itself to the task created by the start scheduler api only after the scheduler state had been set to started.
If the stop scheduler api was called immediately after the start scheduler api, the stop scheduler api could cancel the task without
stopping the scheduler, because the scheduler had not yet been linked to the task.
Now the scheduler gets linked to the task before the scheduler state is set to started, fixing the problematic situation described above.
Original commit: elastic/x-pack-elasticsearch@8334ae1967
Also merged the JobProvider#getBucket(...) method into JobProvider#getBuckets(...), because
it contained a lot of similar logic; otherwise it would have had to be converted to use non-blocking client calls too.
Part of elastic/elasticsearch#127
Original commit: elastic/x-pack-elasticsearch@b1e66b62cb
There was an N-squared algorithm in the state processing code, leading
to large state persistence eventually timing out. Large state documents
are read from the network in 8KB chunks, and the old code was checking
ALL previously read chunks for separators every time a new chunk was read.
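An illustrative sketch of the fix, not the actual state processor: scan only the newly read chunk for separators instead of rescanning everything read so far:

```java
import java.io.IOException;
import java.io.InputStream;

final class SeparatorScanSketch {

    private static final int CHUNK_SIZE = 8 * 1024; // state arrives in 8KB reads
    private static final byte SEPARATOR = 0;        // placeholder separator byte

    /** Counts separators while reading, touching every byte exactly once. */
    static int countSeparators(InputStream in) throws IOException {
        byte[] chunk = new byte[CHUNK_SIZE];
        int separators = 0;
        int read;
        while ((read = in.read(chunk)) != -1) {
            // The old code re-scanned ALL previously read chunks on every read,
            // which is O(n^2) for large state documents; scanning only the
            // newly read bytes keeps the work linear.
            for (int i = 0; i < read; i++) {
                if (chunk[i] == SEPARATOR) {
                    separators++;
                }
            }
        }
        return separators;
    }
}
```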
Fixes elastic/elasticsearch#635
Original commit: elastic/x-pack-elasticsearch@c814858c2c
Deleting a job now starts a three-step process:
1. Job status updated to DELETING
2. Physical index is deleted
3. Job removed from cluster state
When jobs are in DELETING, they cannot be modified or updated at all. Only jobs that are DELETING can actually be removed from the cluster state.
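A hedged outline of the three-step sequence; the interface and method names below are hypothetical, for illustration only:

```java
final class DeleteJobSketch {

    interface JobLifecycle { // hypothetical interface, not the actual classes
        void markAsDeleting(String jobId);         // 1. job status -> DELETING (cluster state update)
        void deletePhysicalIndex(String jobId);    // 2. delete the job's physical index
        void removeFromClusterState(String jobId); // 3. finally drop the job from the cluster state
    }

    static void deleteJob(JobLifecycle lifecycle, String jobId) {
        // While the job is DELETING, updates to it are rejected; only a job
        // in DELETING may actually be removed from the cluster state.
        lifecycle.markAsDeleting(jobId);
        lifecycle.deletePhysicalIndex(jobId);
        lifecycle.removeFromClusterState(jobId);
    }
}
```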
Original commit: elastic/x-pack-elasticsearch@2cd99a240c
with the fix we also make sure that prelert metadata is taken into account when verifying the cluster state consistency
Original commit: elastic/x-pack-elasticsearch@1deaec3836
Note that the change in elasticsearch allowed us to store the scheduler
config's query and scriptFields as typed objects instead of
BytesReference.
Original commit: elastic/x-pack-elasticsearch@38c5aef2ef
* Check the bulk request contains actions before executing (see the sketch after this list).
This suppresses a validation exception about no requests being added.
* Persist bulk request before refreshing the indexes on a flush acknowledgment
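A minimal sketch of the guard from the first bullet, assuming a 5.x BulkRequestBuilder:

```java
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;

final class BulkGuardSketch {

    /** Executes the bulk request only if something was actually added to it. */
    static void executeIfNotEmpty(BulkRequestBuilder bulkRequest) {
        if (bulkRequest.numberOfActions() == 0) {
            // Executing an empty bulk request triggers a validation exception
            // about no requests being added, so simply skip it.
            return;
        }
        BulkResponse response = bulkRequest.get();
        if (response.hasFailures()) {
            // Placeholder: the real code logs/handles failures appropriately.
        }
    }
}
```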
Original commit: elastic/x-pack-elasticsearch@22543e46c8
* Put model state in the .mlstate index
* Revert results index rename
* Put ModelSnapshots in the results index
* Change state index in C++
* Fix logging
* Rename state index to ‘.ml-state’
Original commit: elastic/x-pack-elasticsearch@dbe5f6b525
* Strict parse search parts of schedulerConfig
This commit adds methods to build the typed objects
for the search parts of a scheduler config. Those are:
query, aggregations and script_fields.
As scheduler configs are stored in the cluster state and parsing
the search parts requires a SearchRequestParsers object, we cannot
store them as typed fields. Instead, they are now stored as
BytesReferences.
This change is in preparation for switching over to using a
client based data extractor.
Point summary of changes:
- query, aggregations and scriptFields are now stored as BytesReference
- adds methods to build the corresponding typed objects
- putting a scheduler now builds the search parts in order to validate that
the config is valid
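A sketch of the serialization side of this, assuming the 5.x XContent API; the query is a placeholder and the parse-back path (which needs SearchRequestParsers) is omitted:

```java
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.xcontent.ToXContent;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

import java.io.IOException;

final class SchedulerConfigSerializationSketch {

    /**
     * Stores a parsed query as a BytesReference, which is safe to keep in the
     * cluster state; turning it back into a QueryBuilder needs the
     * SearchRequestParsers available at request time, not at parse time.
     */
    static BytesReference asBytes(QueryBuilder query) throws IOException {
        XContentBuilder builder = XContentFactory.jsonBuilder();
        query.toXContent(builder, ToXContent.EMPTY_PARAMS);
        return builder.bytes();
    }

    public static void main(String[] args) throws IOException {
        BytesReference stored = asBytes(QueryBuilders.matchAllQuery()); // placeholder query
        System.out.println(stored.utf8ToString());
    }
}
```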
Relates to elastic/elasticsearch#478
Original commit: elastic/x-pack-elasticsearch@e6d5a85871
This doesn't happen initially when buckets are output by the C++, but
buckets can contain records at the moment they're sent for persistence
during normalization or during integration tests. It's safest if the
persistence code specifically doesn't persist these records.
Original commit: elastic/x-pack-elasticsearch@a93135d8c0
* Add job config option to set the index name
* Check index does not already exist if ’index_name’ is set
* Don’t create alias if ‘index_name’ is the same as ‘job_id’
* Default index_name value
Set it to job_id if null and only create the index alias if job_id != index_name (see the sketch after this list)
* Fix compile errors after rebasing
* Address review comments
* Test if the index exists by checking the cluster state
* Update comment
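A hedged sketch of the default/alias behaviour described in the bullets above; the call shapes are 5.x Client API, but the surrounding logic is illustrative only:

```java
import org.elasticsearch.client.Client;

final class JobIndexSetupSketch {

    /**
     * Illustrative only: resolve the physical index name and add the job_id
     * alias when the two differ, as described in the bullets above.
     */
    static void setUpIndex(Client client, String jobId, String indexName) {
        String physicalIndex = indexName != null ? indexName : jobId; // default index_name
        if (physicalIndex.equals(jobId) == false) {
            client.admin().indices().prepareAliases()
                    .addAlias(physicalIndex, jobId) // alias named after the job
                    .get();
        }
    }
}
```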
Original commit: elastic/x-pack-elasticsearch@a3e7f1a5bb
* Persist quantile documents with the jobId in the document Id
* Add job Id to snapshot Id
* Add job Id to categoriser state document Id
* Rename quantiles doc to start with job id as the other state docs do
* Fix restoring categoriser state
Original commit: elastic/x-pack-elasticsearch@3e5d3368b5
The threadpool that supplies the threads used for job IO cannot be
resized, so the number of jobs cannot be dynamic either
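For context, a sketch of how such a fixed, non-resizable executor is registered by a plugin in 5.x; the pool name, size, queue size and settings prefix are placeholders:

```java
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.threadpool.ExecutorBuilder;
import org.elasticsearch.threadpool.FixedExecutorBuilder;

import java.util.Collections;
import java.util.List;

public class JobIoThreadPoolSketchPlugin extends Plugin {

    @Override
    public List<ExecutorBuilder<?>> getExecutorBuilders(Settings settings) {
        // A fixed executor cannot be resized at runtime, which is why the
        // number of concurrently running jobs cannot be dynamic either.
        // "job_io_sketch", 16, 64 and the prefix are placeholder values.
        return Collections.singletonList(
                new FixedExecutorBuilder(settings, "job_io_sketch", 16, 64, "xpack.sketch.job_io"));
    }
}
```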
Original commit: elastic/x-pack-elasticsearch@c584bf7147
* Redesign the get anomaly_detectors APIs
This commit redesigns the APIs to get anomaly_detectors.
The new design has 2 GET APIs:
- An API to get the configurations: /anomaly_detectors/{job_id}
- An API to get the stats: /anomaly_detectors/{job_id}/_stats
For both APIs entering "_all" as the job_id returns results for
all jobs.
Note that page params have been removed. They were useful
when the configs were stored in an index. Now that they are part
of cluster state there is no need. Additionally, future support
for wildcard job_id expressions will give users a tool to narrow
down the GET actions to a certain subset of jobs which will be
more useful than the from/size approach.
Follow up:
- Implement similar GET APIs for schedulers
- Remove scheduler_stats from the anomaly_detectors _stats API
as it will be part of the schedulers _stats API
Closes elastic/elasticsearch#548
Original commit: elastic/x-pack-elasticsearch@046a0db8f5
* Added the Elastic copyright header to C++ files
* Added the name of the copyright holder to Java files
Original commit: elastic/x-pack-elasticsearch@aea1b5a656
Aggs does not need to be a separate member field. There can simply
be an aggs parse field whose value is then stored onto the aggregations
parse field.
Finally, retrieve_whole_source is unnecessary as we move towards
node-client-based data extraction.
Original commit: elastic/x-pack-elasticsearch@14024c2ee5
* snake_case model debug output fields
* Rename CategorizerState and BucketInfluencer types to snake_case
Original commit: elastic/x-pack-elasticsearch@da94dc7ec1
changed stop scheduler api to wait until the scheduler status has been set to STOPPED before returning a response.
Original commit: elastic/x-pack-elasticsearch@e20fcd1ae9
* Check use of mappings
* Add unit tests for JobProvider.createJobRelatedIndices
* Remove ‘index: no’ from mappings as no longer required
The entire type mapping has ‘enabled: false’
* Restore “index.analysis.analyzer.default.type” setting
* Remove include_in_all from nested mappings
* Add audit and usage mappings to the job index
* Revert ‘Restore “index.analysis.analyzer.default.type” setting’
Original commit: elastic/x-pack-elasticsearch@c7d62e0c7e
This builds on PR elastic/elasticsearch#526 to get normalization working end-to-end using the
native normalizer process.
The ShortCircuitingRenormalizer class is basically doing what the old
BlockingQueueRenormaliser class did but using the ES threadpool instead
of its own thread.
Also fixed a bug where the C++ was calling the score field of partition_score
documents normalized_probability but the Java was calling it anomaly_score.
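An illustrative sketch of the short-circuiting idea, not the actual class: renormalization work runs on an ES threadpool executor, and requests that arrive while a pass is running are collapsed so only the latest quantiles are processed. The executor name and the String quantiles type are placeholder assumptions.

```java
import org.elasticsearch.threadpool.ThreadPool;

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

final class RenormalizerSketch {

    private final ThreadPool threadPool;
    private final String executorName; // assumption: some ML-owned pool name
    private final AtomicReference<String> latestQuantiles = new AtomicReference<>();
    private final AtomicBoolean running = new AtomicBoolean();

    RenormalizerSketch(ThreadPool threadPool, String executorName) {
        this.threadPool = threadPool;
        this.executorName = executorName;
    }

    /** Remember the newest quantiles and make sure a pass is scheduled. */
    void renormalize(String quantiles) {
        latestQuantiles.set(quantiles);
        if (running.compareAndSet(false, true)) {
            // Runs on the ES threadpool instead of a dedicated thread.
            threadPool.executor(executorName).execute(this::runPass);
        }
    }

    private void runPass() {
        String quantiles;
        // Quantiles that arrived while we were busy are short-circuited:
        // only the latest snapshot is processed.
        while ((quantiles = latestQuantiles.getAndSet(null)) != null) {
            updateScores(quantiles);
        }
        running.set(false);
        // Re-check: a request may have slipped in between the last poll and
        // clearing the running flag.
        if (latestQuantiles.get() != null && running.compareAndSet(false, true)) {
            threadPool.executor(executorName).execute(this::runPass);
        }
    }

    private void updateScores(String quantiles) {
        // Placeholder for driving the native normalizer process and
        // persisting the updated scores.
    }
}
```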
Original commit: elastic/x-pack-elasticsearch@d4cecae150
These tests are just checking response / status code. It is cleaner to have these tests as yaml tests.
Original commit: elastic/x-pack-elasticsearch@61c323059e
* Test for job existence before updating its state
* Add unit tests covering expected missing job exceptions
Original commit: elastic/x-pack-elasticsearch@bcd270dafd
* Added back Normalizable classes
* Added back normalization process management classes
* Added back the scores updater
Original commit: elastic/x-pack-elasticsearch@ac8edf6ed6
This also shuffles results under /anomaly_detectors/. Note: the cluster state still refers to
"jobs" which should probably be fixed in a separate PR
Original commit: elastic/x-pack-elasticsearch@c9e634621c
Removed SchedulerStats as scheduler status is all we need, and start and end times are only needed in the start scheduler api.
Original commit: elastic/x-pack-elasticsearch@80c563cb69
* Use well-defined IDs for records and influencers
Removes the reliance on ES autogenerated UUIDs for all types that will
be renormalized
* Address some review comments
Original commit: elastic/x-pack-elasticsearch@85fde8b957
* Make ModelDebugOutput a result type
* Delete unused ElasticsearchBatchedModelDebugOutputIterator
* Add result_type field to ModelDebugOutput
* Address review comments
Original commit: elastic/x-pack-elasticsearch@a48e4cd946
The start scheduler api call will run until the scheduler has completed: either when a lookback-only scheduler completes or when the scheduler has been stopped.
The start scheduler api will first update the scheduler status from STOPPED to STARTED on the master node and then start running the scheduler.
Once a scheduled job completes it updates the scheduler status from STARTED to STOPPED and then the start scheduler api returns.
The STARTING and STOPPING statuses are no longer used, so they have been removed.
The stop scheduler api is a sugar api that uses the task list and cancel apis to stop the scheduler.
Renamed ScheduledJobService to ScheduledJobRunner
Original commit: elastic/x-pack-elasticsearch@ab504fe3d9
Make jobId and from/size mutually exclusive options.
This approach has the main properties of not allowing an invalid Request to be built, and alerting the user if they set an incorrect configuration. It has the downside that PageParams can be null so the consumer will have to check for it. Since jobId could be null before, this seemed acceptable.
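A minimal sketch of the mutual exclusion, with a hypothetical request class and a simplified stand-in for PageParams:

```java
final class PagedRequestSketch {

    /** Simplified stand-in for the real PageParams (from/size). */
    static final class PageParams {
        final int from;
        final int size;
        PageParams(int from, int size) {
            this.from = from;
            this.size = size;
        }
    }

    private String jobId;          // may be null if pageParams is used
    private PageParams pageParams; // may be null, so consumers must check for it

    void setJobId(String jobId) {
        if (pageParams != null) {
            throw new IllegalArgumentException("jobId and from/size are mutually exclusive");
        }
        this.jobId = jobId;
    }

    void setPageParams(PageParams pageParams) {
        if (jobId != null) {
            throw new IllegalArgumentException("jobId and from/size are mutually exclusive");
        }
        this.pageParams = pageParams;
    }
}
```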
Original commit: elastic/x-pack-elasticsearch@106dcdf61a
Adds a delete_list endpoint. If a list is currently in use by a job, it is not allowed to be deleted
Original commit: elastic/x-pack-elasticsearch@7d9a984b3a
This avoids the confusing situation where there is no allocation when a job hasn't been opened yet. Now it complains about the fact that the job status is closed.
Original commit: elastic/x-pack-elasticsearch@3159dc6954
* Collapse ElasticsearchBulkDeleter into JobDataDeleter
* Add blocking delete to JobDataDeleter
* Delete interim results only after all the results are parsed.
* Remove unused deleteModelSizeStats and deleteModelDebugOutput methods.
Document missing javadoc tags
Original commit: elastic/x-pack-elasticsearch@1997541673
* A job now has the following statuses: OPENING, OPENED, CLOSING, CLOSED and FAILED.
* The open job and close job APIs wait until the job gets into an OPENED or CLOSED state.
* The post data api no longer lazily opens a job and fails if the job has not been opened.
* When a job gets into a failed state also the reason is recorded in the allocation.
* Removed pause and resume APIs.
* Made `max_running_jobs` setting dynamically updatable.
Original commit: elastic/x-pack-elasticsearch@3485ec5317
* Add test to read model size stats
* Most recent model_size_stats document should have the name ‘model_size_stats’
Original commit: elastic/x-pack-elasticsearch@e192d4c34d
* Allow POST with body to get records
* Allow records endpoint to accept POST requests with body
* CategoryDefinition can accept POST requests with body parameters
Original commit: elastic/x-pack-elasticsearch@2edb7a9c47