//lcawley Verified example output 2017-04-11
[[ml-jobcounts]]
==== Job Counts
The get job statistics API provides information about the operational
progress of a job.

NOTE: Job count values are cumulative for the lifetime of a job. If a model
snapshot is reverted or old results are deleted, the job counts are not reset.
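
These counts are retrieved with the get job statistics API. For example, a
request along the following lines returns them for a single job (the endpoint
form and the `farequote` job name are illustrative; substitute your own job ID):

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/farequote/_stats
--------------------------------------------------
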
`data_counts`::
(object) An object that describes the number of records processed and any related error counts.
See <<ml-datacounts,data counts objects>>.
`job_id`::
(string) The unique identifier for the job.
`model_size_stats`::
(object) An object that provides information about the size and contents of the model.
See <<ml-modelsizestats,model size stats objects>>.
`state`::
(string) The status of the job, which can be one of the following values:
`open`::: The job is actively receiving and processing data.
`closed`::: The job finished successfully with its model state persisted.
The job is still available to accept further data.
`closing`::: The job close action is in progress and has not yet completed.
A closing job cannot accept further data.
`failed`::: The job did not finish successfully due to an error.
This situation can occur due to invalid input data. In this case,
sending corrected data to a failed job re-opens the job and
resets it to an open state, as shown in the example below.

NOTE: If you send data in a periodic cycle and close the job at the end of
each transaction, the job is marked as closed in the intervals between
when data is sent. For example, if data is sent every minute and it takes
1 second to process, the job has a closed state for 59 seconds.
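
For example, assuming the job accepts input through a post data endpoint of the
following form, sending corrected records to a failed job re-opens it (the
endpoint form, job name, and field names are illustrative):

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/farequote/_data
{"time": "2017-04-11T00:00:00Z", "airline": "AAL", "responsetime": 132.2}
--------------------------------------------------
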
[float]
[[ml-datacounts]]
===== Data Counts Objects
The `data_counts` object describes the number of records processed
and any related error counts. It has the following properties:
`bucket_count`::
(long) The number of bucket results produced by the job.
`earliest_record_timestamp`::
(string) The timestamp of the earliest chronologically ordered record.
The datetime string is in ISO 8601 format.
`empty_bucket_count`::
(long) The number of buckets that contained no data.
`input_bytes`::
(long) The number of raw bytes read by the job.
`input_field_count`::
(long) The total number of record fields read by the job. This count includes
fields that are not used in the analysis.
`input_record_count`::
(long) The number of data records read by the job.
`invalid_date_count`::
(long) The number of records with either a missing date field or a date that could not be parsed.
`job_id`::
(string) The unique identifier for the job.
`last_data_time`::
(date) The timestamp at which data was last analyzed, according to server time.
`latest_record_timestamp`::
(string) The timestamp of the last chronologically ordered record.
If the records are not in strict chronological order, this value might not be
the same as the timestamp of the last record.
The datetime string is in ISO 8601 format.
`latest_sparse_bucket_timestamp`::
(date) The timestamp of the last bucket that was considered sparse.
`missing_field_count`::
(long) The number of records that are missing a field that the job is configured to analyze.
Records with missing fields are still processed because it is possible that not all fields are missing.
The value of `processed_record_count` includes this count. +
+
--
NOTE: If you are using data feeds or posting data to the job in JSON format, a
high `missing_field_count` is often not an indication of data issues. It is not
necessarily a cause for concern.
--
`out_of_order_timestamp_count`::
(long) The number of records that are out of time sequence and outside of the latency window.
These records are discarded, since jobs require time series data to be in ascending chronological order.
`processed_field_count`::
(long) The total number of fields in all the records that have been processed by the job.
Only fields that are specified in the detector configuration object contribute to this count.
The time stamp is not included in this count.
`processed_record_count`::
(long) The number of records that have been processed by the job.
This value includes records with missing fields, since they are nonetheless analyzed.
+
The following records are not processed:
+
* Records not in chronological order and outside the latency window
* Records with an invalid timestamp
* Records filtered by an exclude transform
`sparse_bucket_count`::
(long) The number of buckets that contained few data points compared to the
expected number of data points.
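
Taken together, the `data_counts` object in a get job statistics response might
look like the following (all values are illustrative):

[source,js]
--------------------------------------------------
{
  "job_id": "farequote",
  "processed_record_count": 86275,
  "processed_field_count": 172550,
  "input_bytes": 6744714,
  "input_field_count": 258825,
  "invalid_date_count": 0,
  "missing_field_count": 0,
  "out_of_order_timestamp_count": 0,
  "empty_bucket_count": 0,
  "sparse_bucket_count": 0,
  "bucket_count": 1440,
  "earliest_record_timestamp": "2017-04-05T00:00:00Z",
  "latest_record_timestamp": "2017-04-10T23:59:00Z",
  "last_data_time": "2017-04-11T09:00:00Z",
  "input_record_count": 86275
}
--------------------------------------------------
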
[float]
[[ml-modelsizestats]]
===== Model Size Stats Objects
The `model_size_stats` object has the following properties:
`bucket_allocation_failures_count`::
(long) The number of buckets for which new entities in incoming data were not
processed due to insufficient model memory.
`job_id`::
(string) The unique identifier for the job.
`log_time`::
(date) The timestamp of the `model_size_stats` according to server time.
`memory_status`::
(string) The status of the mathematical models. This property can have one of the following values:
`ok`::: The models stayed below the configured memory limit.
`soft_limit`::: The models used more than 60% of the configured memory limit and older unused models will be pruned to free up space.
`hard_limit`::: The models used more space than the configured memory limit. As a result, not all incoming data was processed.
`model_bytes`::
(long) The number of bytes of memory used by the models. This is the maximum value since the
last time the model was persisted. If the job is closed, this value indicates the latest size.
`result_type`::
(string) For internal use. The type of result.
`total_by_field_count`::
(long) The number of `by` field values that were analyzed by the models. +
+
--
NOTE: The `by` field values are counted separately for each detector and partition.
--
`total_over_field_count`::
(long) The number of `over` field values that were analyzed by the models. +
+
--
NOTE: The `over` field values are counted separately for each detector and partition.
--
`total_partition_field_count`::
(long) The number of `partition` field values that were analyzed by the models.
`timestamp`::
(date) The timestamp of the `model_size_stats` according to the timestamp of the data.
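
Taken together, the `model_size_stats` object in a get job statistics response
might look like the following (all values are illustrative):

[source,js]
--------------------------------------------------
{
  "job_id": "farequote",
  "result_type": "model_size_stats",
  "model_bytes": 2501898,
  "total_by_field_count": 22,
  "total_over_field_count": 0,
  "total_partition_field_count": 21,
  "bucket_allocation_failures_count": 0,
  "memory_status": "ok",
  "log_time": "2017-04-11T09:00:00Z",
  "timestamp": "2017-04-10T23:55:00Z"
}
--------------------------------------------------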