[role="xpack"]
[testenv="platinum"]
[[ml-snapshot-resource]]
=== Model snapshot resources

Model snapshots are saved to an internal index within the Elasticsearch cluster.
By default, this occurs approximately every 3 to 4 hours and is
configurable with the `background_persist_interval` property.

By default, model snapshots are retained for one day (twenty-four hours). You
can change this behavior by updating the `model_snapshot_retention_days` for the
job. When choosing a new value, consider the following:

* Persistence enables resilience in the event of a system failure.
* Persistence enables snapshots to be reverted.
* The time taken to persist a job is proportional to the size of the model in memory.

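As a rough illustration of how the persistence interval and the retention period interact, the following Python sketch estimates how many snapshots fall inside the retention window. It assumes exactly one snapshot is written per persistence cycle, which is a simplification made for this example. The function name `estimate_retained_snapshots` is hypothetical and not part of any Elasticsearch API.

```python
def estimate_retained_snapshots(retention_days, persist_interval_hours):
    """Estimate how many snapshots fall inside the retention window.

    Assumes exactly one snapshot per persistence cycle (an assumption
    made for illustration only).
    """
    if persist_interval_hours <= 0:
        raise ValueError("persist_interval_hours must be positive")
    window_hours = retention_days * 24
    return int(window_hours // persist_interval_hours)


# Defaults described above: one day of retention, persisted every 3 to 4 hours.
low = estimate_retained_snapshots(1, 4)
high = estimate_retained_snapshots(1, 3)
print(f"roughly {low} to {high} snapshots retained")
```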
A model snapshot resource has the following properties:

`description`::
(string) An optional description of the model snapshot.

`job_id`::
(string) A numerical character string that uniquely identifies the job that
the snapshot was created for.

`min_version`::
(string) The minimum version required to be able to restore the model snapshot.

`latest_record_time_stamp`::
(date) The timestamp of the latest processed record.

`latest_result_time_stamp`::
(date) The timestamp of the latest bucket result.

`model_size_stats`::
(object) Summary information describing the model.
See <<ml-snapshot-stats,Model Size Statistics>>.

`retain`::
(boolean) If true, this snapshot will not be deleted during automatic cleanup
of snapshots older than `model_snapshot_retention_days`.
However, this snapshot will be deleted when the job is deleted.
The default value is false.

`snapshot_id`::
(string) A numerical character string that uniquely identifies the model
snapshot. For example: "1491852978".

`snapshot_doc_count`::
(long) For internal use only.

`timestamp`::
(date) The creation timestamp for the snapshot.

NOTE: All of these properties are informational with the exception of
`description` and `retain`.

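The two rules above (only `description` and `retain` can be changed, and a retained snapshot is skipped by automatic cleanup) can be sketched in Python as follows. This is a simplified reading of the property descriptions, not the actual cleanup logic in Elasticsearch, which may measure age differently. The helper names `build_update_body` and `is_cleanup_candidate` and the sample snapshot values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Per the NOTE above, only these two properties can be changed.
MUTABLE_PROPERTIES = {"description", "retain"}


def build_update_body(changes):
    """Return the requested changes, rejecting informational properties."""
    rejected = sorted(set(changes) - MUTABLE_PROPERTIES)
    if rejected:
        raise ValueError(f"informational properties cannot be changed: {rejected}")
    return dict(changes)


def is_cleanup_candidate(snapshot, retention_days, now):
    """Simplified rule: unretained snapshots older than the window are eligible."""
    if snapshot.get("retain", False):
        return False
    age = now - snapshot["timestamp"]
    return age > timedelta(days=retention_days)


# Hypothetical snapshot; the values are made up for this example, apart from
# the `snapshot_id` string quoted in the property list.
snapshot = {
    "snapshot_id": "1491852978",
    "timestamp": datetime(2017, 4, 10, 19, 36, 18, tzinfo=timezone.utc),
    "retain": False,
}
now = datetime(2017, 4, 12, tzinfo=timezone.utc)

print(is_cleanup_candidate(snapshot, retention_days=1, now=now))
snapshot.update(build_update_body({"retain": True}))
print(is_cleanup_candidate(snapshot, retention_days=1, now=now))
```

The first call prints `True` because the sample snapshot is older than the one-day window. After `retain` is set to true, the second call prints `False`.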
[float]
[[ml-snapshot-stats]]
==== Model Size Statistics

The `model_size_stats` object has the following properties:

`bucket_allocation_failures_count`::
(long) The number of buckets for which entities were not processed due to
memory limit constraints.

`job_id`::
(string) A numerical character string that uniquely identifies the job.

`log_time`::
(date) The time at which the `model_size_stats` were recorded, according to
server time.

`memory_status`::
(string) The status of the memory in relation to its `model_memory_limit`.
Contains one of the following values.
`ok`::: The internal models stayed below the configured value.
`soft_limit`::: The internal models require more than 60% of the configured
memory limit, and more aggressive pruning will
be performed in order to try to reclaim space.
`hard_limit`::: The internal models require more space than the configured
memory limit. Some incoming data could not be processed.

`model_bytes`::
(long) An approximation of the memory resources required for this analysis.

`result_type`::
(string) Internal. This value is always set to "model_size_stats".

`timestamp`::
(date) The time at which the `model_size_stats` were recorded, according to
the bucket timestamp of the data.

`total_by_field_count`::
(long) The number of _by_ field values analyzed. Note that these are counted
separately for each detector and partition.

`total_over_field_count`::
(long) The number of _over_ field values analyzed. Note that these are counted
separately for each detector and partition.

`total_partition_field_count`::
(long) The number of _partition_ field values analyzed.

NOTE: All of these properties are informational; you cannot change their values.

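To make the `memory_status` thresholds concrete, the following Python sketch restates them as a function. The status itself is reported in `model_size_stats` and does not need to be computed by the caller. Comparing `model_bytes` directly against the limit is an assumption made for illustration, and the function name `classify_memory_status` is hypothetical.

```python
SOFT_LIMIT_FRACTION = 0.60  # threshold quoted in the `soft_limit` description


def classify_memory_status(model_bytes, model_memory_limit_bytes):
    """Map memory usage to the documented `memory_status` values.

    Assumes `model_bytes` is the quantity compared against the limit;
    this is an illustrative simplification.
    """
    if model_memory_limit_bytes <= 0:
        raise ValueError("model_memory_limit_bytes must be positive")
    if model_bytes > model_memory_limit_bytes:
        return "hard_limit"
    if model_bytes > SOFT_LIMIT_FRACTION * model_memory_limit_bytes:
        return "soft_limit"
    return "ok"


# Hypothetical figures with a limit of 1000 bytes.
for used in (500, 700, 1200):
    print(used, classify_memory_status(used, 1000))
```

With these sample figures, 500 bytes maps to `ok`, 700 to `soft_limit`, and 1200 to `hard_limit`.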