mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-27 02:18:42 +00:00
[7.x][DOCS] Move datafeed resource definitions into APIs (#50516)
parent
8869f2b9b2
commit
4b829db593
@ -1,161 +0,0 @@
|
||||
[role="xpack"]
|
||||
[testenv="platinum"]
|
||||
[[ml-datafeed-resource]]
|
||||
=== {dfeed-cap} resources
|
||||
|
||||
A {dfeed} resource has the following properties:
|
||||
|
||||
`aggregations`::
|
||||
(object) If set, the {dfeed} performs aggregation searches.
|
||||
Support for aggregations is limited and should only be used with
|
||||
low cardinality data. For more information, see
|
||||
{ml-docs}/ml-configuring-aggregation.html[Aggregating data for faster performance].
|
||||
|
||||
`chunking_config`::
|
||||
(object) Specifies how data searches are split into time chunks.
|
||||
See <<ml-datafeed-chunking-config>>.
|
||||
For example: `{"mode": "manual", "time_span": "3h"}`
|
||||
|
||||
`datafeed_id`::
|
||||
(string) A numerical character string that uniquely identifies the {dfeed}.
|
||||
This property is informational; you cannot change the identifier for existing
|
||||
{dfeeds}.
|
||||
|
||||
`frequency`::
|
||||
(time units) The interval at which scheduled queries are made while the
|
||||
{dfeed} runs in real time. The default value is either the bucket span for short
|
||||
bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
|
||||
span. For example: `150s`.
|
||||
|
||||
`indices`::
|
||||
(array) An array of index names. For example: `["it_ops_metrics"]`
|
||||
|
||||
`job_id`::
|
||||
(string) The unique identifier for the job to which the {dfeed} sends data.
|
||||
|
||||
`query`::
|
||||
(object) The {es} query domain-specific language (DSL). This value
|
||||
corresponds to the query object in an {es} search POST body. All the
|
||||
options that are supported by {es} can be used, as this object is
|
||||
passed verbatim to {es}. By default, this property has the following
|
||||
value: `{"match_all": {"boost": 1}}`.
|
||||
|
||||
`query_delay`::
|
||||
(time units) The number of seconds behind real time that data is queried. For
|
||||
example, if data from 10:04 a.m. might not be searchable in {es} until
|
||||
10:06 a.m., set this property to 120 seconds. The default value is randomly
|
||||
selected between `60s` and `120s`. This randomness improves the query
|
||||
performance when there are multiple jobs running on the same node.
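The 10:04 a.m. / 10:06 a.m. example above can be sketched in Python. This is only an illustration of the arithmetic, not how {es} implements it; the function name `latest_query_time` is hypothetical.

```python
from datetime import datetime, timedelta


def latest_query_time(now: datetime, query_delay: timedelta) -> datetime:
    """Hypothetical helper: the most recent timestamp a datafeed would query.

    The datafeed stays `query_delay` behind real time, so data newer than
    `now - query_delay` is left for a later search.
    """
    return now - query_delay


# With a 120 second delay, a search issued at 10:06 only covers data up to 10:04.
now = datetime(2020, 1, 1, 10, 6)
print(latest_query_time(now, timedelta(seconds=120)))  # 2020-01-01 10:04:00
```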
|
||||
|
||||
`script_fields`::
|
||||
(object) Specifies scripts that evaluate custom expressions and return
|
||||
script fields to the {dfeed}.
|
||||
The detector configuration objects in a job can contain
|
||||
functions that use these script fields.
|
||||
For more information, see
|
||||
{ml-docs}/ml-configuring-transform.html[Transforming data with script fields].
|
||||
|
||||
`scroll_size`::
|
||||
(unsigned integer) The `size` parameter that is used in {es} searches.
|
||||
The default value is `1000`.
|
||||
|
||||
`delayed_data_check_config`::
|
||||
(object) Specifies whether the data feed checks for missing data and
|
||||
the size of the window. For example:
|
||||
`{"enabled": true, "check_window": "1h"}` See
|
||||
<<ml-datafeed-delayed-data-check-config>>.
|
||||
|
||||
`max_empty_searches`::
|
||||
(integer) If a real-time {dfeed} has never seen any data (including during
|
||||
any initial training period) then it will automatically stop itself and
|
||||
close its associated job after this many real-time searches that return no
|
||||
documents. In other words, it will stop after `frequency` times
|
||||
`max_empty_searches` of real-time operation. If not set
|
||||
then a {dfeed} with no end time that sees no data will remain started until
|
||||
it is explicitly stopped. By default this setting is not set.
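As a worked example of the "`frequency` times `max_empty_searches`" statement above, the following Python sketch computes how long a real-time {dfeed} that never sees data would run before stopping. The function name `time_until_auto_stop` is hypothetical, and the values are chosen only for illustration.

```python
from datetime import timedelta


def time_until_auto_stop(frequency: timedelta, max_empty_searches: int) -> timedelta:
    """Hypothetical helper: real-time duration before an idle datafeed stops."""
    if max_empty_searches < 1:
        raise ValueError("max_empty_searches must be a positive integer")
    # One empty search happens per `frequency` interval.
    return frequency * max_empty_searches


# A frequency of 150s and max_empty_searches of 10 gives 1500 seconds.
print(time_until_auto_stop(timedelta(seconds=150), 10))  # 0:25:00
```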
|
||||
|
||||
[[ml-datafeed-chunking-config]]
|
||||
==== Chunking configuration objects
|
||||
|
||||
{dfeeds-cap} might be required to search over long time periods, for several months
|
||||
or years. This search is split into time chunks in order to ensure the load
|
||||
on {es} is managed. Chunking configuration controls how the size of these time
|
||||
chunks is calculated and is an advanced configuration option.
|
||||
|
||||
A chunking configuration object has the following properties:
|
||||
|
||||
`mode`::
|
||||
There are three available modes: +
|
||||
`auto`::: The chunk size will be dynamically calculated. This is the default
|
||||
and recommended value.
|
||||
`manual`::: Chunking will be applied according to the specified `time_span`.
|
||||
`off`::: No chunking will be applied.
|
||||
|
||||
`time_span`::
|
||||
(time units) The time span that each search will be querying.
|
||||
This setting is only applicable when the mode is set to `manual`.
|
||||
For example: `3h`.
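To make the `manual` mode concrete, the following Python sketch splits a search period into consecutive chunks of a fixed `time_span`. It is a simplified illustration under the assumption that chunks start at the beginning of the period and the last chunk is truncated; the source does not describe how {es} aligns chunk boundaries, and the function name `manual_chunks` is hypothetical.

```python
from datetime import datetime, timedelta


def manual_chunks(start: datetime, end: datetime, time_span: timedelta):
    """Hypothetical helper: split [start, end) into chunks of at most `time_span`."""
    if time_span <= timedelta(0):
        raise ValueError("time_span must be positive")
    chunks = []
    cursor = start
    while cursor < end:
        # Truncate the final chunk so it does not run past the end of the period.
        chunk_end = min(cursor + time_span, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks


# Ten hours of data with a 3h time_span yields four searches.
for chunk_start, chunk_end in manual_chunks(
    datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 10), timedelta(hours=3)
):
    print(chunk_start, "->", chunk_end)
```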
|
||||
|
||||
[[ml-datafeed-delayed-data-check-config]]
|
||||
==== Delayed data check configuration objects
|
||||
|
||||
The {dfeed} can optionally search over indices that have already been read in
|
||||
an effort to determine whether any data has subsequently been added to the index.
|
||||
If missing data is found, it is a good indication that the `query_delay` option
|
||||
is set too low and the data is being indexed after the {dfeed} has passed that
|
||||
moment in time. See
|
||||
{ml-docs}/ml-delayed-data-detection.html[Working with delayed data].
|
||||
|
||||
This check runs only on real-time {dfeeds}.
|
||||
|
||||
The configuration object has the following properties:
|
||||
|
||||
`enabled`::
|
||||
(boolean) Specifies whether the {dfeed} periodically checks for delayed data.
|
||||
Defaults to `true`.
|
||||
|
||||
`check_window`::
|
||||
(time units) The window of time that is searched for late data. This window of
|
||||
time ends with the latest finalized bucket. It defaults to `null`, which
|
||||
causes an appropriate `check_window` to be calculated when the real-time
|
||||
{dfeed} runs. In particular, the default `check_window` span calculation is
|
||||
based on the maximum of `2h` or `8 * bucket_span`.
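The default `check_window` rule stated above (the maximum of `2h` or `8 * bucket_span`) can be written out as a short Python sketch. The function name `default_check_window` is hypothetical; only the formula comes from the text.

```python
from datetime import timedelta


def default_check_window(bucket_span: timedelta) -> timedelta:
    """Hypothetical helper: larger of 2 hours and 8 bucket spans."""
    return max(timedelta(hours=2), 8 * bucket_span)


# Short bucket spans fall back to the 2h floor; longer ones scale with the span.
print(default_check_window(timedelta(minutes=5)))  # 2:00:00
print(default_check_window(timedelta(hours=1)))    # 8:00:00
```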
|
||||
|
||||
[float]
|
||||
[[ml-datafeed-counts]]
|
||||
==== {dfeed-cap} counts
|
||||
|
||||
The get {dfeed} statistics API provides information about the operational
|
||||
progress of a {dfeed}. All of these properties are informational; you cannot
|
||||
update their values:
|
||||
|
||||
`assignment_explanation`::
|
||||
(string) For started {dfeeds} only, contains messages relating to the
|
||||
selection of a node.
|
||||
|
||||
`datafeed_id`::
|
||||
(string) A numerical character string that uniquely identifies the {dfeed}.
|
||||
|
||||
`node`::
|
||||
(object) The node upon which the {dfeed} is started. The {dfeed} and job will
|
||||
be on the same node.
|
||||
`id`::: The unique identifier of the node. For example,
|
||||
"0-o0tOoRTwKFZifatTWKNw".
|
||||
`name`::: The node name. For example, `0-o0tOo`.
|
||||
`ephemeral_id`::: The node ephemeral ID.
|
||||
`transport_address`::: The host and port where transport HTTP connections are
|
||||
accepted. For example, `127.0.0.1:9300`.
|
||||
`attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.
|
||||
|
||||
`state`::
|
||||
(string) The status of the {dfeed}, which can be one of the following values: +
|
||||
`started`::: The {dfeed} is actively receiving data.
|
||||
`stopped`::: The {dfeed} is stopped and will not receive data until it is
|
||||
re-started.
|
||||
|
||||
`timing_stats`::
|
||||
(object) An object that provides statistical information about timing aspects of this datafeed. +
|
||||
`job_id`::: A numerical character string that uniquely identifies the job.
|
||||
`search_count`::: Number of searches performed by this datafeed.
|
||||
`total_search_time_ms`::: Total time the datafeed spent searching in milliseconds.
|
||||
|
@ -28,7 +28,8 @@ can delete it.
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<feed_id>`::
|
||||
(Required, string) Identifier for the {dfeed}.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
|
||||
|
||||
[[ml-delete-datafeed-query-parms]]
|
||||
==== {api-query-parms-title}
|
||||
|
@ -45,36 +45,66 @@ IMPORTANT: This API returns a maximum of 10,000 {dfeeds}.
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<feed_id>`::
|
||||
(Optional, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
|
||||
or a wildcard expression. If you do not specify one of these options, the API
|
||||
returns statistics for all {dfeeds}.
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
|
||||
+
|
||||
--
|
||||
If you do not specify one of these options, the API returns information about
|
||||
all {dfeeds}.
|
||||
--
|
||||
|
||||
[[ml-get-datafeed-stats-query-parms]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`allow_no_datafeeds`::
|
||||
(Optional, boolean) Specifies what to do when the request:
|
||||
+
|
||||
--
|
||||
* Contains wildcard expressions and there are no {datafeeds} that match.
|
||||
* Contains the `_all` string or no identifiers and there are no matches.
|
||||
* Contains wildcard expressions and there are only partial matches.
|
||||
|
||||
The default value is `true`, which returns an empty `datafeeds` array when
|
||||
there are no matches and the subset of results when there are partial matches.
|
||||
If this parameter is `false`, the request returns a `404` status code when there
|
||||
are no matches or only partial matches.
|
||||
--
|
||||
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
|
||||
|
||||
[[ml-get-datafeed-stats-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The API returns the following information:
|
||||
The API returns an array of {dfeed} count objects. All of these properties are
|
||||
informational; you cannot update their values.
|
||||
|
||||
`datafeeds`::
|
||||
(array) An array of {dfeed} count objects.
|
||||
For more information, see <<ml-datafeed-counts>>.
|
||||
`assignment_explanation`::
|
||||
(string) For started {dfeeds} only, contains messages relating to the selection
|
||||
of a node.
|
||||
|
||||
`datafeed_id`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
|
||||
|
||||
`node`::
|
||||
(object) The node upon which the {dfeed} is started. The {dfeed} and job will be
|
||||
on the same node.
|
||||
`node`.`id`::: The unique identifier of the node. For example,
|
||||
`0-o0tOoRTwKFZifatTWKNw`.
|
||||
`node`.`name`::: The node name. For example, `0-o0tOo`.
|
||||
`node`.`ephemeral_id`::: The node ephemeral ID.
|
||||
`node`.`transport_address`::: The host and port where transport HTTP connections
|
||||
are accepted. For example, `127.0.0.1:9300`.
|
||||
`node`.`attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.
|
||||
|
||||
`state`::
|
||||
(string) The status of the {dfeed}, which can be one of the following values:
|
||||
+
|
||||
--
|
||||
* `started`::: The {dfeed} is actively receiving data.
|
||||
* `stopped`::: The {dfeed} is stopped and will not receive data until it is
|
||||
re-started.
|
||||
--
|
||||
|
||||
`timing_stats`::
|
||||
(object) An object that provides statistical information about timing aspects of
|
||||
this {dfeed}.
|
||||
//average_search_time_per_bucket_ms
|
||||
//bucket_count
|
||||
//exponential_average_search_time_per_hour_ms
|
||||
`timing_stats`.`job_id`:::
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
`timing_stats`.`search_count`::: Number of searches performed by this {dfeed}.
|
||||
`timing_stats`.`total_search_time_ms`::: Total time the {dfeed} spent searching
|
||||
in milliseconds.
|
||||
|
||||
[[ml-get-datafeed-stats-response-codes]]
|
||||
==== {api-response-codes-title}
|
||||
@ -86,14 +116,11 @@ The API returns the following information:
|
||||
[[ml-get-datafeed-stats-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example gets usage information for the
|
||||
`datafeed-total-requests` {dfeed}:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/datafeeds/datafeed-total-requests/_stats
|
||||
GET _ml/datafeeds/datafeed-high_sum_total_sales/_stats
|
||||
--------------------------------------------------
|
||||
// TEST[skip:setup:server_metrics_startdf]
|
||||
// TEST[skip:Kibana sample data started datafeed]
|
||||
|
||||
The API returns the following results:
|
||||
|
||||
@ -103,7 +130,7 @@ The API returns the following results:
|
||||
"count": 1,
|
||||
"datafeeds": [
|
||||
{
|
||||
"datafeed_id": "datafeed-total-requests",
|
||||
"datafeed_id": "datafeed-high_sum_total_sales",
|
||||
"state": "started",
|
||||
"node": {
|
||||
"id": "2spCyo1pRi2Ajo-j-_dnPX",
|
||||
@ -117,9 +144,12 @@ The API returns the following results:
|
||||
},
|
||||
"assignment_explanation": "",
|
||||
"timing_stats": {
|
||||
"job_id": "job-total-requests",
|
||||
"search_count": 20,
|
||||
"total_search_time_ms": 120.5
|
||||
"job_id" : "high_sum_total_sales",
|
||||
"search_count" : 27,
|
||||
"bucket_count" : 619,
|
||||
"total_search_time_ms" : 296.0,
|
||||
"average_search_time_per_bucket_ms" : 0.4781906300484653,
|
||||
"exponential_average_search_time_per_hour_ms" : 33.28246548059884
|
||||
}
|
||||
}
|
||||
]
|
||||
|
@ -42,35 +42,26 @@ IMPORTANT: This API returns a maximum of 10,000 {dfeeds}.
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<feed_id>`::
|
||||
(Optional, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
|
||||
or a wildcard expression. If you do not specify one of these options, the API
|
||||
returns information about all {dfeeds}.
|
||||
(Optional, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
|
||||
+
|
||||
--
|
||||
If you do not specify one of these options, the API returns information about
|
||||
all {dfeeds}.
|
||||
--
|
||||
|
||||
[[ml-get-datafeed-query-parms]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`allow_no_datafeeds`::
|
||||
(Optional, boolean) Specifies what to do when the request:
|
||||
+
|
||||
--
|
||||
* Contains wildcard expressions and there are no {datafeeds} that match.
|
||||
* Contains the `_all` string or no identifiers and there are no matches.
|
||||
* Contains wildcard expressions and there are only partial matches.
|
||||
|
||||
The default value is `true`, which returns an empty `datafeeds` array when
|
||||
there are no matches and the subset of results when there are partial matches.
|
||||
If this parameter is `false`, the request returns a `404` status code when there
|
||||
are no matches or only partial matches.
|
||||
--
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
|
||||
|
||||
[[ml-get-datafeed-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The API returns the following information:
|
||||
|
||||
`datafeeds`::
|
||||
(array) An array of {dfeed} objects.
|
||||
For more information, see <<ml-datafeed-resource>>.
|
||||
The API returns an array of {dfeed} resources. For the full list of properties,
|
||||
see <<ml-put-datafeed-request-body,create {dfeeds} API>>.
|
||||
|
||||
[[ml-get-datafeed-response-codes]]
|
||||
==== {api-response-codes-title}
|
||||
@ -82,14 +73,11 @@ The API returns the following information:
|
||||
[[ml-get-datafeed-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example gets configuration information for the
|
||||
`datafeed-total-requests` {dfeed}:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/datafeeds/datafeed-total-requests
|
||||
GET _ml/datafeeds/datafeed-high_sum_total_sales
|
||||
--------------------------------------------------
|
||||
// TEST[skip:setup:server_metrics_datafeed]
|
||||
// TEST[skip:Kibana sample data]
|
||||
|
||||
The API returns the following results:
|
||||
|
||||
@ -99,23 +87,31 @@ The API returns the following results:
|
||||
"count": 1,
|
||||
"datafeeds": [
|
||||
{
|
||||
"datafeed_id": "datafeed-total-requests",
|
||||
"job_id": "total-requests",
|
||||
"query_delay": "83474ms",
|
||||
"datafeed_id": "datafeed-high_sum_total_sales",
|
||||
"job_id": "high_sum_total_sales",
|
||||
"query_delay": "93169ms",
|
||||
"indices": [
|
||||
"server-metrics"
|
||||
"kibana_sample_data_ecommerce"
|
||||
],
|
||||
"query": {
|
||||
"match_all": {
|
||||
"boost": 1.0
|
||||
"query" : {
|
||||
"bool" : {
|
||||
"filter" : [
|
||||
{
|
||||
"term" : {
|
||||
"_index" : "kibana_sample_data_ecommerce"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"scroll_size": 1000,
|
||||
"chunking_config": {
|
||||
"mode": "auto"
|
||||
},
|
||||
"delayed_data_check_config" : {
|
||||
"enabled" : true
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
----
|
||||
// TESTRESPONSE[s/"query.boost": "1.0"/"query.boost": $body.query.boost/]
|
||||
|
@ -41,18 +41,17 @@ it to ensure it is returning the expected data.
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<datafeed_id>`::
|
||||
(Required, string) Identifier for the {dfeed}.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
|
||||
|
||||
[[ml-preview-datafeed-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example obtains a preview of the `datafeed-farequote` {dfeed}:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/datafeeds/datafeed-farequote/_preview
|
||||
GET _ml/datafeeds/datafeed-high_sum_total_sales/_preview
|
||||
--------------------------------------------------
|
||||
// TEST[skip:setup:farequote_datafeed]
|
||||
// TEST[skip:Kibana sample data]
|
||||
|
||||
The data that is returned for this example is as follows:
|
||||
|
||||
@ -60,22 +59,29 @@ The data that is returned for this example is as follows:
|
||||
----
|
||||
[
|
||||
{
|
||||
"time": 1454803200000,
|
||||
"airline": "JZA",
|
||||
"doc_count": 5,
|
||||
"responsetime": 990.4628295898438
|
||||
"order_date" : 1575504259000,
|
||||
"category.keyword" : "Men's Clothing",
|
||||
"customer_full_name.keyword" : "Sultan Al Benson",
|
||||
"taxful_total_price" : 35.96875
|
||||
},
|
||||
{
|
||||
"time": 1454803200000,
|
||||
"airline": "JBU",
|
||||
"doc_count": 23,
|
||||
"responsetime": 877.5927124023438
|
||||
"order_date" : 1575504518000,
|
||||
"category.keyword" : [
|
||||
"Women's Accessories",
|
||||
"Women's Clothing"
|
||||
],
|
||||
"customer_full_name.keyword" : "Pia Webb",
|
||||
"taxful_total_price" : 83.0
|
||||
},
|
||||
{
|
||||
"time": 1454803200000,
|
||||
"airline": "KLM",
|
||||
"doc_count": 42,
|
||||
"responsetime": 1355.481201171875
|
||||
}
|
||||
"order_date" : 1575505382000,
|
||||
"category.keyword" : [
|
||||
"Women's Accessories",
|
||||
"Women's Shoes"
|
||||
],
|
||||
"customer_full_name.keyword" : "Brigitte Graham",
|
||||
"taxful_total_price" : 72.0
|
||||
},
|
||||
...
|
||||
]
|
||||
----
|
||||
|
@ -43,70 +43,55 @@ those same roles.
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<feed_id>`::
|
||||
(Required, string) A numerical character string that uniquely identifies the
|
||||
{dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
|
||||
and 0-9), hyphens, and underscores. It must start and end with alphanumeric
|
||||
characters.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
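The identifier rules quoted in the replaced description above (lowercase alphanumeric characters, hyphens, and underscores, starting and ending with an alphanumeric character) can be checked client-side with a regular expression. This Python sketch covers only those quoted rules; any additional constraints that {es} enforces are not covered, and the function name `is_valid_datafeed_id` is hypothetical.

```python
import re

# First and last characters must be a-z or 0-9; hyphens and underscores
# are allowed only in between.
_DATAFEED_ID = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_datafeed_id(datafeed_id: str) -> bool:
    """Hypothetical helper: check an ID against the rules quoted in the text."""
    return _DATAFEED_ID.fullmatch(datafeed_id) is not None


print(is_valid_datafeed_id("datafeed-high_sum_total_sales"))  # True
print(is_valid_datafeed_id("-starts-with-hyphen"))            # False
```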
|
||||
|
||||
[[ml-put-datafeed-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
||||
`aggregations`::
|
||||
(Optional, object) If set, the {dfeed} performs aggregation searches. For more
|
||||
information, see <<ml-datafeed-resource>>.
|
||||
(Optional, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=aggregations]
|
||||
|
||||
`chunking_config`::
|
||||
(Optional, object) Specifies how data searches are split into time chunks. See
|
||||
<<ml-datafeed-chunking-config>>.
|
||||
(Optional, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=chunking-config]
|
||||
|
||||
`delayed_data_check_config`::
|
||||
(Optional, object) Specifies whether the data feed checks for missing data and
|
||||
the size of the window. See <<ml-datafeed-delayed-data-check-config>>.
|
||||
(Optional, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=delayed-data-check-config]
|
||||
|
||||
`frequency`::
|
||||
(Optional, <<time-units, time units>>) The interval at which scheduled queries
|
||||
are made while the {dfeed} runs in real time. The default value is either the
|
||||
bucket span for short bucket spans, or, for longer bucket spans, a sensible
|
||||
fraction of the bucket span. For example: `150s`.
|
||||
(Optional, <<time-units, time units>>)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=frequency]
|
||||
|
||||
`indices`::
|
||||
(Required, array) An array of index names. Wildcards are supported. For
|
||||
example: `["it_ops_metrics", "server*"]`.
|
||||
+
|
||||
--
|
||||
NOTE: If any indices are in remote clusters then `cluster.remote.connect` must
|
||||
not be set to `false` on any ML node.
|
||||
--
|
||||
(Required, array)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=indices]
|
||||
|
||||
`job_id`::
|
||||
(Required, string) A numerical character string that uniquely identifies the
|
||||
{anomaly-job}.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`max_empty_searches`::
|
||||
(Optional, integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=max-empty-searches]
|
||||
|
||||
`query`::
|
||||
(Optional, object) The {es} query domain-specific language (DSL). This value
|
||||
corresponds to the query object in an {es} search POST body. All the options
|
||||
that are supported by {Es} can be used, as this object is passed verbatim to
|
||||
{es}. By default, this property has the following value:
|
||||
`{"match_all": {"boost": 1}}`.
|
||||
(Optional, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=query]
|
||||
|
||||
`query_delay`::
|
||||
(Optional, <<time-units, time units>>) The number of seconds behind real time
|
||||
that data is queried. For example, if data from 10:04 a.m. might not be
|
||||
searchable in {es} until 10:06 a.m., set this property to 120 seconds. The
|
||||
default value is `60s`.
|
||||
(Optional, <<time-units, time units>>)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=query-delay]
|
||||
|
||||
`script_fields`::
|
||||
(Optional, object) Specifies scripts that evaluate custom expressions and
|
||||
return script fields to the {dfeed}. The detector configuration objects in a
|
||||
job can contain functions that use these script fields. For more information,
|
||||
see <<request-body-search-script-fields,Script fields>>.
|
||||
(Optional, object)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=script-fields]
|
||||
|
||||
`scroll_size`::
|
||||
(Optional, unsigned integer) The `size` parameter that is used in {es}
|
||||
searches. The default value is `1000`.
|
||||
|
||||
For more information about these properties,
|
||||
see <<ml-datafeed-resource>>.
|
||||
(Optional, unsigned integer)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=scroll-size]
|
||||
|
||||
[[ml-put-datafeed-example]]
|
||||
==== {api-examples-title}
|
||||
|
@ -74,7 +74,8 @@ creation/update and runs the query using those same roles.
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<feed_id>`::
|
||||
(Required, string) Identifier for the {dfeed}.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
|
||||
|
||||
[[ml-start-datafeed-request-body]]
|
||||
==== {api-request-body-title}
|
||||
@ -94,7 +95,7 @@ creation/update and runs the query using those same roles.
|
||||
[[ml-start-datafeed-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example starts the `datafeed-it-ops-kpi` {dfeed}:
|
||||
The following example starts the `datafeed-total-requests` {dfeed}:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
|
@ -40,25 +40,15 @@ comma-separated list of {dfeeds} or a wildcard expression. You can close all
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<feed_id>`::
|
||||
(Required, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
|
||||
or a wildcard expression.
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
|
||||
|
||||
[[ml-stop-datafeed-query-parms]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`allow_no_datafeeds`::
|
||||
(Optional, boolean) Specifies what to do when the request:
|
||||
+
|
||||
--
|
||||
* Contains wildcard expressions and there are no {datafeeds} that match.
|
||||
* Contains the `_all` string or no identifiers and there are no matches.
|
||||
* Contains wildcard expressions and there are only partial matches.
|
||||
|
||||
The default value is `true`, which returns an empty `datafeeds` array when
|
||||
there are no matches and the subset of results when there are partial matches.
|
||||
If this parameter is `false`, the request returns a `404` status code when there
|
||||
are no matches or only partial matches.
|
||||
--
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
|
||||
|
||||
[[ml-stop-datafeed-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
@ -5,14 +5,15 @@
|
||||
Delayed data are documents that are indexed late. That is to say, it is data
|
||||
related to a time that the {dfeed} has already processed.
|
||||
|
||||
When you create a datafeed, you can specify a
|
||||
{ref}/ml-datafeed-resource.html[`query_delay`] setting. This setting enables the
|
||||
datafeed to wait for some time past real-time, which means any "late" data in
|
||||
this period is fully indexed before the datafeed tries to gather it. However, if
|
||||
the setting is set too low, the datafeed may query for data before it has been
|
||||
indexed and consequently miss that document. Conversely, if it is set too high,
|
||||
analysis drifts farther away from real-time. The balance that is struck depends
|
||||
upon each use case and the environmental factors of the cluster.
|
||||
When you create a {dfeed}, you can specify a
|
||||
{ref}/ml-put-datafeed.html#ml-put-datafeed-request-body[`query_delay`] setting.
|
||||
This setting enables the {dfeed} to wait for some time past real-time, which
|
||||
means any "late" data in this period is fully indexed before the {dfeed} tries
|
||||
to gather it. However, if the setting is set too low, the {dfeed} may query for
|
||||
data before it has been indexed and consequently miss that document. Conversely,
|
||||
if it is set too high, analysis drifts farther away from real-time. The balance
|
||||
that is struck depends upon each use case and the environmental factors of the
|
||||
cluster.
|
||||
|
||||
==== Why worry about delayed data?
|
||||
|
||||
@ -28,8 +29,7 @@ recorded so that you can determine a next course of action.
|
||||
|
||||
==== How do we detect delayed data?
|
||||
|
||||
In addition to the `query_delay` field, there is a
|
||||
{ref}/ml-datafeed-resource.html#ml-datafeed-delayed-data-check-config[delayed data check config],
|
||||
In addition to the `query_delay` field, there is a delayed data check config,
|
||||
which enables you to configure the datafeed to look in the past for delayed data.
|
||||
Every 15 minutes or every `check_window`, whichever is smaller, the datafeed
|
||||
triggers a document search over the configured indices. This search looks over a
|
||||
|
@ -465,3 +465,14 @@ This page was deleted.
|
||||
See the details in
|
||||
[[ml-apimodelplotconfig]]
|
||||
<<ml-put-job>>, <<ml-update-job>>, and <<ml-get-job>>.
|
||||
|
||||
[role="exclude",id="ml-datafeed-resource"]
|
||||
=== {dfeed-cap} resources
|
||||
|
||||
This page was deleted.
|
||||
[[ml-datafeed-chunking-config]]
|
||||
See the details in <<ml-put-datafeed>>, <<ml-update-datafeed>>,
|
||||
[[ml-datafeed-delayed-data-check-config]]
|
||||
<<ml-get-datafeed>>,
|
||||
[[ml-datafeed-counts]]
|
||||
<<ml-get-datafeed-stats>>.
|
@ -5,15 +5,12 @@
|
||||
These resource definitions are used in APIs related to {ml-features} and
|
||||
{security-features} and in {kib} advanced {ml} job configuration options.
|
||||
|
||||
* <<ml-datafeed-resource,{dfeeds-cap}>>
|
||||
* <<ml-datafeed-counts,{dfeed-cap} counts>>
|
||||
* <<ml-dfa-analysis-objects>>
|
||||
* <<ml-jobstats,{anomaly-jobs-cap} statistics>>
|
||||
* <<ml-snapshot-resource,{anomaly-detect-cap} model snapshots>>
|
||||
* <<ml-results-resource,{anomaly-detect-cap} results>>
|
||||
* <<role-mapping-resources,Role mappings>>
|
||||
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/datafeedresource.asciidoc[]
|
||||
include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/jobcounts.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/snapshotresource.asciidoc[]
|
||||
|