From 314ca78e3160f4a822ae5e36b7b7683aae4dbb7f Mon Sep 17 00:00:00 2001 From: Lisa Cawley Date: Wed, 22 Apr 2020 10:58:26 -0700 Subject: [PATCH] [7.x][DOCS] Update example and nesting in get data frame analytics job stats API (#55612) --- .../apis/get-datafeed-stats.asciidoc | 23 +- .../apis/get-job-stats.asciidoc | 10 +- .../apis/get-dfanalytics-stats.asciidoc | 516 ++++++++++++++++- docs/reference/ml/ml-shared.asciidoc | 531 ++++-------------- 4 files changed, 617 insertions(+), 463 deletions(-) diff --git a/docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc b/docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc index e723e1dbc8f..b44645ab09a 100644 --- a/docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc +++ b/docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc @@ -82,17 +82,26 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=node-datafeeds] -- [%collapsible%open] ==== -`id`::: -include::{docdir}/ml/ml-shared.asciidoc[tag=node-id] - -`name`::: The node name. For example, `0-o0tOo`. +`attributes`::: +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-attributes] `ephemeral_id`::: +(string) include::{docdir}/ml/ml-shared.asciidoc[tag=node-ephemeral-id] -`transport_address`::: The host and port where transport HTTP connections are -accepted. For example, `127.0.0.1:9300`. -`attributes`::: For example, `{"ml.machine_memory": "17179869184"}`. +`id`::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-id] + +`name`::: +(string) +The node name. For example, `0-o0tOo`. 
+ +`transport_address`::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-transport-address] + ==== -- diff --git a/docs/reference/ml/anomaly-detection/apis/get-job-stats.asciidoc b/docs/reference/ml/anomaly-detection/apis/get-job-stats.asciidoc index 1b8ff976862..cd609f7df82 100644 --- a/docs/reference/ml/anomaly-detection/apis/get-job-stats.asciidoc +++ b/docs/reference/ml/anomaly-detection/apis/get-job-stats.asciidoc @@ -281,8 +281,8 @@ available only for open jobs. [%collapsible%open] ==== `attributes`::: -(object) Lists node attributes. For example, -`{"ml.machine_memory": "17179869184"}`. +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-attributes] `ephemeral_id`::: (string) @@ -293,10 +293,12 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=node-ephemeral-id] include::{docdir}/ml/ml-shared.asciidoc[tag=node-id] `name`::: -(string) The node name. +(string) +The node name. `transport_address`::: -(string) The host and port where transport HTTP connections are accepted. +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-transport-address] ==== //End node diff --git a/docs/reference/ml/df-analytics/apis/get-dfanalytics-stats.asciidoc b/docs/reference/ml/df-analytics/apis/get-dfanalytics-stats.asciidoc index fe076ae2fec..e814eb7e3e1 100644 --- a/docs/reference/ml/df-analytics/apis/get-dfanalytics-stats.asciidoc +++ b/docs/reference/ml/df-analytics/apis/get-dfanalytics-stats.asciidoc @@ -61,12 +61,442 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=size] [[ml-get-dfanalytics-stats-response-body]] ==== {api-response-body-title} -The API returns the following information: - `data_frame_analytics`:: -(array) -include::{docdir}/ml/ml-shared.asciidoc[tag=data-frame-analytics-stats] +(array) +An array of objects that contain usage information for {dfanalytics-jobs}, which +are sorted by the `id` value in ascending order. 
++ +.Properties of {dfanalytics-job} usage resources +[%collapsible%open] +==== +//Begin analysis_stats +`analysis_stats`::: +(object) +An object containing information about the analysis job. ++ +.Properties of `analysis_stats` +[%collapsible%open] +===== +//Begin classification_stats +`classification_stats`:::: +(object) +An object containing information about the {classanalysis} job. ++ +.Properties of `classification_stats` +[%collapsible%open] +====== +//Begin class_hyperparameters +`hyperparameters`:::: +(object) +An object containing the parameters of the {classanalysis} job. ++ +.Properties of `hyperparameters` +[%collapsible%open] +======= +`alpha`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-alpha] +`class_assignment_objective`:::: +(string) +Defines whether class assignment maximizes the accuracy or the minimum recall +metric. Possible values are `maximize_accuracy` and `maximize_minimum_recall`. + +`downsample_factor`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-downsample-factor] + +`eta`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta] + +`eta_growth_rate_per_tree`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta-growth] + +`feature_bag_fraction`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-feature-bag-fraction] + +`gamma`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-gamma] + +`lambda`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-lambda] + +`max_attempts_to_add_tree`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-attempts] + +`max_optimization_rounds_per_hyperparameter`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-optimization-rounds] + +`max_trees`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-trees] + +`num_folds`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-folds] + +`num_splits_per_feature`:::: +(integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-splits] + +`soft_tree_depth_limit`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-limit] + +`soft_tree_depth_tolerance`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-tolerance] +======= +//End class_hyperparameters + +`iteration`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-iteration] + +`timestamp`:::: +(date) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timestamp] + +//Begin class_timing_stats +`timing_stats`:::: +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats] ++ +.Properties of `timing_stats` +[%collapsible%open] +======= +`elapsed_time`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-elapsed] + +`iteration_time`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-iteration] +======= +//End class_timing_stats + +//Begin class_validation_loss +`validation_loss`:::: +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss] ++ +.Properties of `validation_loss` +[%collapsible%open] +======= +`fold_values`:::: +(array of strings) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-fold] + +`loss_type`:::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-type] +======= +//End class_validation_loss +====== +//End classification_stats + +//Begin outlier_detection_stats +`outlier_detection_stats`:::: +(object) +An object containing information about the {oldetection} job. ++ +.Properties of `outlier_detection_stats` +[%collapsible%open] +====== +//Begin parameters +`parameters`:::: +(object) +The list of job parameters specified by the user or determined by algorithmic +heuristics. ++ +.Properties of `parameters` +[%collapsible%open] +======= +`compute_feature_influence`:::: +(boolean) +If true, feature influence calculation is enabled. 
+ +`feature_influence_threshold`:::: +(double) +The minimum {olscore} that a document needs to have to calculate its feature +influence score. + +`method`:::: +(string) +The method that {oldetection} uses. Possible values are `lof`, `ldof`, +`distance_kth_nn`, `distance_knn`, and `ensemble`. + +`n_neighbors`:::: +(integer) +The value for how many nearest neighbors each method of {oldetection} uses to +calculate its outlier score. + +`outlier_fraction`:::: +(double) +The proportion of the data set that is assumed to be outlying prior to +{oldetection}. + +`standardization_enabled`:::: +(boolean) +If true, then the following operation is performed on the columns before +computing {olscores}: (x_i - mean(x_i)) / sd(x_i). +======= +//End parameters + +`timestamp`:::: +(date) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timestamp] + +//Begin od_timing_stats +`timing_stats`:::: +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats] ++ +.Property of `timing_stats` +[%collapsible%open] +======= +`elapsed_time`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-elapsed] +======= +//End od_timing_stats +====== +//End outlier_detection_stats + +//Begin regression_stats +`regression_stats`:::: +(object) +An object containing information about the {reganalysis}. ++ +.Properties of `regression_stats` +[%collapsible%open] +====== +//Begin reg_hyperparameters +`hyperparameters`:::: +(object) +An object containing the parameters of the {reganalysis}. 
++ +.Properties of `hyperparameters` +[%collapsible%open] +======= +`alpha`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-alpha] + +`downsample_factor`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-downsample-factor] + +`eta`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta] + +`eta_growth_rate_per_tree`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta-growth] + +`feature_bag_fraction`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-feature-bag-fraction] + +`gamma`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-gamma] + +`lambda`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-lambda] + +`max_attempts_to_add_tree`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-attempts] + +`max_optimization_rounds_per_hyperparameter`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-optimization-rounds] + +`max_trees`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-trees] + +`num_folds`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-folds] + +`num_splits_per_feature`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-splits] + +`soft_tree_depth_limit`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-limit] + +`soft_tree_depth_tolerance`:::: +(double) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-tolerance] +======= +//End reg_hyperparameters + +`iteration`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-iteration] + +`timestamp`:::: +(date) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timestamp] + +//Begin reg_timing_stats +`timing_stats`:::: +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats] ++ +.Properties of `timing_stats` +[%collapsible%open] +======= +`elapsed_time`:::: +(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-elapsed] + +`iteration_time`:::: 
+(integer) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-iteration] +======= +//End reg_timing_stats + +//Begin reg_validation_loss +`validation_loss`:::: +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss] ++ +.Properties of `validation_loss` +[%collapsible%open] +======= +`fold_values`:::: +(array of strings) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-fold] + +`loss_type`:::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-type] +======= +//End reg_validation_loss +====== +//End regression_stats +===== +//End analysis_stats + + +`assignment_explanation`::: +(string) +For running jobs only, contains messages relating to the selection of a node to +run the job. + +//Begin data_counts +`data_counts`::: +(object) +An object that provides counts for the quantity of documents skipped, used in +training, or available for testing. ++ +.Properties of `data_counts` +[%collapsible%open] +===== +`skipped_docs_count`::: +(integer) +The number of documents that are skipped during the analysis because they +contained values that are not supported by the analysis. For example, +{oldetection} does not support missing fields so it skips documents with missing +fields. Likewise, all types of analysis skip documents that contain arrays with +more than one element. + +`test_docs_count`::: +(integer) +The number of documents that are not used for training the model and can be used +for testing. + +`training_docs_count`::: +(integer) +The number of documents that are used for training the model. +===== +//End data_counts + +`id`::: +(string) +The unique identifier of the {dfanalytics-job}. + +`memory_usage`::: +(Optional, object) +An object describing memory usage of the analytics. It is present only after the +job is started and memory usage is reported. 
++ +.Properties of `memory_usage` +[%collapsible%open] +===== +`peak_usage_bytes`::: +(long) +The number of bytes used at the highest peak of memory usage. + +`timestamp`::: +(date) +The timestamp when memory usage was calculated. +===== + +`node`::: +(object) +Contains properties for the node that runs the job. This information is +available only for running jobs. ++ +.Properties of `node` +[%collapsible%open] +===== +`attributes`::: +(object) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-attributes] + +`ephemeral_id`::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-ephemeral-id] + +`id`::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-id] + +`name`::: +(string) +The node name. + +`transport_address`::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=node-transport-address] +===== + +`progress`::: +(array) The progress report of the {dfanalytics-job} by phase. ++ +.Properties of phase objects +[%collapsible%open] +===== +`phase`::: +(string) Defines the phase of the {dfanalytics-job}. Possible phases: +`reindexing`, `loading_data`, `analyzing`, and `writing_results`. + +`progress_percent`::: +(integer) The progress that the {dfanalytics-job} has made expressed in +percentage. +===== + +`state`::: +(string) The status of the {dfanalytics-job}, which can be one of the following +values: `analyzing`, `failed`, `reindexing`, `started`, `starting`, `stopped`, +`stopping`. 
+==== +//End of data_frame_analytics [[ml-get-dfanalytics-stats-response-codes]] ==== {api-response-codes-title} @@ -79,11 +509,14 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=data-frame-analytics-stats] [[ml-get-dfanalytics-stats-example]] ==== {api-examples-title} +The following API retrieves usage information for the +{ml-docs}/ecommerce-outliers.html[{oldetection} {dfanalytics-job} example]: + [source,console] -------------------------------------------------- -GET _ml/data_frame/analytics/loganalytics/_stats +GET _ml/data_frame/analytics/ecommerce/_stats -------------------------------------------------- -// TEST[skip:TBD] +// TEST[skip:Kibana sample data] The API returns the following results: @@ -91,30 +524,55 @@ The API returns the following results: [source,console-result] ---- { - "count": 1, - "data_frame_analytics": [ + "count" : 1, + "data_frame_analytics" : [ + { + "id" : "ecommerce", + "state" : "stopped", + "progress" : [ { - "id": "loganalytics", - "state": "stopped", - "progress": [ - { - "phase": "reindexing", - "progress_percent": 0 - }, - { - "phase": "loading_data", - "progress_percent": 0 - }, - { - "phase": "analyzing", - "progress_percent": 0 - }, - { - "phase": "writing_results", - "progress_percent": 0 - } - ] + "phase" : "reindexing", + "progress_percent" : 100 + }, + { + "phase" : "loading_data", + "progress_percent" : 100 + }, + { + "phase" : "analyzing", + "progress_percent" : 100 + }, + { + "phase" : "writing_results", + "progress_percent" : 100 } - ] + ], + "data_counts" : { + "training_docs_count" : 3321, + "test_docs_count" : 0, + "skipped_docs_count" : 0 + }, + "memory_usage" : { + "timestamp" : 1586905058000, + "peak_usage_bytes" : 279484 + }, + "analysis_stats" : { + "outlier_detection_stats" : { + "timestamp" : 1586905058000, + "parameters" : { + "n_neighbors" : 0, + "method" : "ensemble", + "compute_feature_influence" : true, + "feature_influence_threshold" : 0.1, + "outlier_fraction" : 0.05, + "standardization_enabled" : 
true + }, + "timing_stats" : { + "elapsed_time" : 245 + } + } + } + } + ] } ---- diff --git a/docs/reference/ml/ml-shared.asciidoc b/docs/reference/ml/ml-shared.asciidoc index f24e988cfcc..560a90ed517 100644 --- a/docs/reference/ml/ml-shared.asciidoc +++ b/docs/reference/ml/ml-shared.asciidoc @@ -371,429 +371,6 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=time-format] ==== end::data-description[] -tag::data-frame-analytics-stats[] -An array of statistics objects for {dfanalytics-jobs}, which are -sorted by the `id` value in ascending order. - -//Begin analysis_stats -`analysis_stats`:: -(object) -An object containing statistical data about the analysis. -+ -.Properties of `analysis_stats` -[%collapsible%open] -==== -//Begin classification_stats -`classification_stats`::: -(object) -An object containing statistical data about the {classanalysis}. -+ -.Properties of `classification_stats` -[%collapsible%open] -===== -//Begin class_hyperparameters -`hyperparameters`:::: -(object) -An object containing the parameters of the {classanalysis}. -+ -.Properties of `hyperparameters` -[%collapsible%open] -====== -tag::dfas-alpha[] -`alpha`:::: -(double) -Regularization factor to penalize deeper trees when training decision trees. -end::dfas-alpha[] - -`class_assignment_objective`:::: -(string) -Defines whether class assignment maximizes the accuracy or the minimum recall -metric. Possible values are `maximize_accuracy` and `maximize_minimum_recall`. - -tag::dfas-downsample-factor[] -`downsample_factor`:::: -(double) -The value of the downsample factor. -end::dfas-downsample-factor[] - -tag::dfas-eta[] -`eta`:::: -(double) -The value of the eta hyperparameter. -end::dfas-eta[] - -tag::dfas-eta-growth[] -`eta_growth_rate_per_tree`:::: -(double) -Specifies the rate at which the `eta` increases for each new tree that is added to the -forest. For example, a rate of `1.05` increases `eta` by 5%. 
-end::dfas-eta-growth[] - -tag::dfas-feature-bag-fraction[] -`feature_bag_fraction`:::: -(double) -The fraction of features that is used when selecting a random bag for each -candidate split. -end::dfas-feature-bag-fraction[] - -tag::dfas-gamma[] -`gamma`:::: -(double) -Regularization factor to penalize trees with large numbers of nodes. -end::dfas-gamma[] - -tag::dfas-lambda[] -`lambda`:::: -(double) -Regularization factor to penalize large leaf weights. -end::dfas-lambda[] - -tag::dfas-max-attempts[] -`max_attempts_to_add_tree`:::: -(integer) -If the algorithm fails to determine a non-trivial tree (more than a single -leaf), this parameter determines how many of such consecutive failures are -tolerated. Once the number of attempts exceeds the threshold, the forest -training stops. -end::dfas-max-attempts[] - -tag::dfas-max-optimization-rounds[] -`max_optimization_rounds_per_hyperparameter`:::: -(integer) -A multiplier responsible for determining the maximum number of -hyperparameter optimization steps in the Bayesian optimization procedure. -The maximum number of steps is determined based on the number of undefined hyperparameters -times the maximum optimization rounds per hyperparameter. -end::dfas-max-optimization-rounds[] - -tag::dfas-max-trees[] -`max_trees`:::: -(integer) -The maximum number of trees in the forest. -end::dfas-max-trees[] - -tag::dfas-num-folds[] -`num_folds`:::: -(integer) -The maximum number of folds for the cross-validation procedure. -end::dfas-num-folds[] - -tag::dfas-num-splits[] -`num_splits_per_feature`:::: -(integer) -Determines the maximum number of splits for every feature that can occur in a -decision tree when the tree is trained. -end::dfas-num-splits[] - -tag::dfas-soft-limit[] -`soft_tree_depth_limit`:::: -(double) -Tree depth limit is used for calculating the tree depth penalty. This is a soft -limit, it can be exceeded. 
-end::dfas-soft-limit[] - -tag::dfas-soft-tolerance[] -`soft_tree_depth_tolerance`:::: -(double) -Tree depth tolerance is used for calculating the tree depth penalty. This is a -soft limit, it can be exceeded. -end::dfas-soft-tolerance[] -====== -//End class_hyperparameters - -tag::dfas-iteration[] -`iteration`:::: -(integer) -The number of iterations on the analysis. -end::dfas-iteration[] - -tag::dfas-timestamp[] -`timestamp`:::: -(date) -The timestamp when the statistics were reported in milliseconds since the epoch. -end::dfas-timestamp[] - -//Begin class_timing_stats -tag::dfas-timing-stats[] -`timing_stats`:::: -(object) -An object containing time statistics about the {dfanalytics-job}. -end::dfas-timing-stats[] -+ -.Properties of `timing_stats` -[%collapsible%open] -====== -tag::dfas-timing-stats-elapsed[] -`elapsed_time`:::: -(integer) -Runtime of the analysis in milliseconds. -end::dfas-timing-stats-elapsed[] - -tag::dfas-timing-stats-iteration[] -`iteration_time`:::: -(integer) -Runtime of the latest iteration of the analysis in milliseconds. -end::dfas-timing-stats-iteration[] -====== -//End class_timing_stats - -//Begin class_validation_loss -tag::dfas-validation-loss[] -`validation_loss`:::: -(object) -An object containing information about validation loss. -end::dfas-validation-loss[] -+ -.Properties of `validation_loss` -[%collapsible%open] -====== -tag::dfas-validation-loss-type[] -`loss_type`:::: -(string) -The type of the loss metric. For example, `binomial_logistic`. -end::dfas-validation-loss-type[] - -tag::dfas-validation-loss-fold[] -`fold_values`:::: -(array of strings) -Validation loss values for every added decision tree during the forest growing -procedure. -end::dfas-validation-loss-fold[] -====== -//End class_validation_loss -===== -//End classification_stats - -//Begin outlier_detection_stats -`outlier_detection_stats`::: -(object) -An object containing statistical data about the {oldetection} job. 
-+ -.Properties of `outlier_detection_stats` -[%collapsible%open] -===== -//Begin parameters -`parameters`:::: -(object) -The list of job parameters specified by the user or determined by algorithmic -heuristics. -+ -.Properties of `parameters` -[%collapsible%open] -====== -`compute_feature_influence`:::: -(boolean) -If true, feature influence calculation is enabled. - -`feature_influence_threshold`:::: -(double) -The minimum {olscore} that a document needs to have to calculate its feature -influence score. - -`method`:::: -(string) -The method that {oldetection} uses. Possible values are `lof`, `ldof`, -`distance_kth_nn`, `distance_knn`, and `ensemble`. - -`n_neighbors`:::: -(integer) -The value for how many nearest neighbors each method of {oldetection} uses to -calculate its outlier score. - -`outlier_fraction`:::: -(double) -The proportion of the data set that is assumed to be outlying prior to -{oldetection}. - -`standardization_enabled`:::: -(boolean) -If true, then the following operation is performed on the columns before -computing {olscores}: (x_i - mean(x_i)) / sd(x_i). -====== -//End parameters - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timestamp] - -//Begin od_timing_stats -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats] -+ -.Property of `timing_stats` -[%collapsible%open] -====== -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-elapsed] -====== -//End od_timing_stats -===== -//End outlier_detection_stats - -//Begin regression_stats -`regression_stats`::: -(object) -An object containing statistical data about the {reganalysis}. -+ -.Properties of `regression_stats` -[%collapsible%open] -===== -//Begin reg_hyperparameters -`hyperparameters`:::: -(object) -An object containing the parameters of the {reganalysis}. 
-+ -.Properties of `hyperparameters` -[%collapsible%open] -====== -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-alpha] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-downsample-factor] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta-growth] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-feature-bag-fraction] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-gamma] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-lambda] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-attempts] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-optimization-rounds] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-trees] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-folds] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-splits] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-limit] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-tolerance] -====== -//End reg_hyperparameters - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-iteration] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timestamp] - -//Begin reg_timing_stats -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats] -+ -.Propertis of `timing_stats` -[%collapsible%open] -====== -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-elapsed] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-iteration] -====== -//End reg_timing_stats - -//Begin reg_validation_loss -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss] -+ -.Properties of `validation_loss` -[%collapsible%open] -====== -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-type] - -include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-fold] -====== -//End reg_validation_loss -===== -//End regression_stats -==== -//End analysis_stats - -`assignment_explanation`::: -(string) -For running jobs only, contains messages relating to the selection of a node to -run 
the job. - -//Begin data_counts -`data_counts`::: -(object) -An object containing statistical data about the documents in the analysis. -+ -.Properties of `data_counts` -[%collapsible%open] -==== -`skipped_docs_count`::: -(integer) -The number of documents that are skipped during the analysis because they -contained values that are not supported by the analysis. For example, -{oldetection} does not support missing fields so it skips documents with missing -fields. Likewise, all types of analysis skip documents that contain arrays with -more than one element. - -`test_docs_count`::: -(integer) -The number of documents that are not used for training the model and can be used -for testing. - -`training_docs_count`::: -(integer) -The number of documents that are used for training the model. -==== -//End data_counts - -`id`::: -(string) -The unique identifier of the {dfanalytics-job}. - -`memory_usage`::: -(Optional, object) -An object describing memory usage of the analytics. It is present only after the -job is started and memory usage is reported. - -`memory_usage`.`peak_usage_bytes`::: -(long) -The number of bytes used at the highest peak of memory usage. - -`memory_usage`.`timestamp`::: -(date) -The timestamp when memory usage was calculated. - -`node`::: -(object) -Contains properties for the node that runs the job. This information is -available only for running jobs. - -`node`.`attributes`::: -(object) -Lists node attributes such as `ml.machine_memory`, `ml.max_open_jobs`, and -`xpack.installed`. - -`node`.`ephemeral_id`::: -(string) -The ephemeral id of the node. - -`node`.`id`::: -(string) -The unique identifier of the node. - -`node`.`name`::: -(string) -The node name. - -`node`.`transport_address`::: -(string) -The host and port where transport HTTP connections are accepted. - -`progress`::: -(array) The progress report of the {dfanalytics-job} by phase. - -`progress`.`phase`::: -(string) Defines the phase of the {dfanalytics-job}. 
Possible phases: -`reindexing`, `loading_data`, `analyzing`, and `writing_results`. - -`progress`.`progress_percent`::: -(integer) The progress that the {dfanalytics-job} has made expressed in -percentage. - -`state`::: -(string) Current state of the {dfanalytics-job}. -end::data-frame-analytics-stats[] - tag::datafeed-id[] A numerical character string that uniquely identifies the {dfeed}. This identifier can contain lowercase alphanumeric characters (a-z @@ -894,6 +471,106 @@ A unique identifier for the detector. This identifier is based on the order of the detectors in the `analysis_config`, starting at zero. end::detector-index[] +tag::dfas-alpha[] +Regularization factor to penalize deeper trees when training decision trees. +end::dfas-alpha[] + +tag::dfas-downsample-factor[] +The value of the downsample factor. +end::dfas-downsample-factor[] + +tag::dfas-eta[] +The value of the eta hyperparameter. +end::dfas-eta[] + +tag::dfas-eta-growth[] +Specifies the rate at which the `eta` increases for each new tree that is added to the +forest. For example, a rate of `1.05` increases `eta` by 5%. +end::dfas-eta-growth[] + +tag::dfas-feature-bag-fraction[] +The fraction of features that is used when selecting a random bag for each +candidate split. +end::dfas-feature-bag-fraction[] + +tag::dfas-gamma[] +Regularization factor to penalize trees with large numbers of nodes. +end::dfas-gamma[] + +tag::dfas-iteration[] +The number of iterations on the analysis. +end::dfas-iteration[] + +tag::dfas-lambda[] +Regularization factor to penalize large leaf weights. +end::dfas-lambda[] + +tag::dfas-max-attempts[] +If the algorithm fails to determine a non-trivial tree (more than a single +leaf), this parameter determines how many of such consecutive failures are +tolerated. Once the number of attempts exceeds the threshold, the forest +training stops. 
+end::dfas-max-attempts[] + +tag::dfas-max-optimization-rounds[] +A multiplier responsible for determining the maximum number of +hyperparameter optimization steps in the Bayesian optimization procedure. +The maximum number of steps is determined based on the number of undefined hyperparameters +times the maximum optimization rounds per hyperparameter. +end::dfas-max-optimization-rounds[] + +tag::dfas-max-trees[] +The maximum number of trees in the forest. +end::dfas-max-trees[] + +tag::dfas-num-folds[] +The maximum number of folds for the cross-validation procedure. +end::dfas-num-folds[] + +tag::dfas-num-splits[] +Determines the maximum number of splits for every feature that can occur in a +decision tree when the tree is trained. +end::dfas-num-splits[] + +tag::dfas-soft-limit[] +Tree depth limit is used for calculating the tree depth penalty. This is a soft +limit, it can be exceeded. +end::dfas-soft-limit[] + +tag::dfas-soft-tolerance[] +Tree depth tolerance is used for calculating the tree depth penalty. This is a +soft limit, it can be exceeded. +end::dfas-soft-tolerance[] + +tag::dfas-timestamp[] +The timestamp when the statistics were reported in milliseconds since the epoch. +end::dfas-timestamp[] + +tag::dfas-timing-stats[] +An object containing time statistics about the {dfanalytics-job}. +end::dfas-timing-stats[] + +tag::dfas-timing-stats-elapsed[] +Runtime of the analysis in milliseconds. +end::dfas-timing-stats-elapsed[] + +tag::dfas-timing-stats-iteration[] +Runtime of the latest iteration of the analysis in milliseconds. +end::dfas-timing-stats-iteration[] + +tag::dfas-validation-loss[] +An object containing information about validation loss. +end::dfas-validation-loss[] + +tag::dfas-validation-loss-fold[] +Validation loss values for every added decision tree during the forest growing +procedure. +end::dfas-validation-loss-fold[] + +tag::dfas-validation-loss-type[] +The type of the loss metric. For example, `binomial_logistic`. 
+end::dfas-validation-loss-type[] + tag::earliest-record-timestamp[] The timestamp of the earliest chronologically input document. end::earliest-record-timestamp[] @@ -1334,6 +1011,10 @@ tag::node-address[] The network address of the node. end::node-address[] +tag::node-attributes[] +Lists node attributes such as `ml.machine_memory` or `ml.max_open_jobs` settings. +end::node-attributes[] + tag::node-datafeeds[] For started {dfeeds} only, this information pertains to the node upon which the {dfeed} is started. @@ -1352,6 +1033,10 @@ Contains properties for the node that runs the job. This information is available only for open jobs. end::node-jobs[] +tag::node-transport-address[] +The host and port where transport HTTP connections are accepted. +end::node-transport-address[] + tag::open-time[] For open jobs only, the elapsed time for which the job has been open. end::open-time[]