[role="xpack"]
[testenv="basic"]
[[put-data-frame-transform]]
=== Create {dataframe-transforms} API

[subs="attributes"]
++++
<titleabbrev>Create {dataframe-transforms}</titleabbrev>
++++

Instantiates a {dataframe-transform}.

beta[]

[[put-data-frame-transform-request]]
==== {api-request-title}

`PUT _data_frame/transforms/<data_frame_transform_id>`

[[put-data-frame-transform-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have
`manage_data_frame_transforms` cluster privileges to use this API. The built-in
`data_frame_transforms_admin` role has these privileges. You must also
have `read` and `view_index_metadata` privileges on the source index and `read`,
`create_index`, and `index` privileges on the destination index. For more
information, see {stack-ov}/security-privileges.html[Security privileges] and
{stack-ov}/built-in-roles.html[Built-in roles].
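
As a minimal sketch of the privileges listed above, the following request uses
the create or update roles API to define a role that carries them. The role
name is hypothetical, and the index names are borrowed from the example in this
topic; substitute your own source and destination indices.

[source,console]
----
POST _security/role/hypothetical_transform_user <1>
{
  "cluster": [ "manage_data_frame_transforms" ],
  "indices": [
    {
      "names": [ "kibana_sample_data_ecommerce" ], <2>
      "privileges": [ "read", "view_index_metadata" ]
    },
    {
      "names": [ "kibana_sample_data_ecommerce_transform" ], <3>
      "privileges": [ "read", "create_index", "index" ]
    }
  ]
}
----
// TEST[skip:illustrative sketch with a hypothetical role name]
<1> The role name is an assumption made for illustration.
<2> The source index.
<3> The destination index.
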

[[put-data-frame-transform-desc]]
==== {api-description-title}

This API defines a {dataframe-transform}, which copies data from source indices,
transforms it, and persists it into an entity-centric destination index. The
entities are defined by the set of `group_by` fields in the `pivot` object. You
can also think of the destination index as a two-dimensional tabular data
structure (known as a {dataframe}). The ID for each document in the
{dataframe} is generated from a hash of the entity, so there is a unique row
per entity. For more information, see
{stack-ov}/ml-dataframes.html[{dataframe-transforms-cap}].

When the {dataframe-transform} is created, a series of validations occurs to
ensure its success. For example, there is a check for the existence of the
source indices and a check that the destination index is not part of the source
index pattern. You can use the `defer_validation` parameter to skip these
checks.

Deferred validations are always run when the {dataframe-transform} is started,
with the exception of privilege checks. When {es} {security-features} are
enabled, the {dataframe-transform} remembers which roles the user who created
it had at the time of creation and uses those same roles. If those roles do not
have the required privileges on the source and destination indices, the
{dataframe-transform} fails when it attempts unauthorized operations.
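
Because deferred validations run at start time, starting the
{dataframe-transform} is the point at which a deferred problem surfaces. A
minimal sketch, assuming that the `ecommerce_transform` from the example in
this topic already exists and that the start API is available at this path in
your version:

[source,console]
----
POST _data_frame/transforms/ecommerce_transform/_start <1>
----
// TEST[skip:illustrative sketch]
<1> Assumes the {dataframe-transform} was created earlier with this ID.
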

IMPORTANT: You must use {kib} or this API to create a {dataframe-transform}.
Do not put a {dataframe-transform} directly into any
`.data-frame-internal*` indices using the Elasticsearch index API.
If {es} {security-features} are enabled, do not give users any
privileges on `.data-frame-internal*` indices.

[[put-data-frame-transform-path-parms]]
==== {api-path-parms-title}

`<data_frame_transform_id>`::
  (Required, string) Identifier for the {dataframe-transform}. This identifier
  can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
  underscores. It must start and end with alphanumeric characters.

[[put-data-frame-transform-query-parms]]
==== {api-query-parms-title}

`defer_validation`::
  (Optional, boolean) When `true`, deferrable validations are not run. This
  behavior may be desired if the source index does not exist until after the
  {dataframe-transform} is created.
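
The following sketch shows how `defer_validation` might be passed when the
source index has not been created yet. The transform ID and both index names
are hypothetical; the `pivot` definition mirrors the example in this topic.

[source,console]
----
PUT _data_frame/transforms/hypothetical_deferred_transform?defer_validation=true <1>
{
  "source": {
    "index": "hypothetical_future_index" <2>
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "dest": {
    "index": "hypothetical_future_index_transform" <3>
  }
}
----
// TEST[skip:illustrative sketch with hypothetical names]
<1> The transform ID is an assumption made for illustration.
<2> A hypothetical source index that does not exist at creation time.
<3> A hypothetical destination index.

The skipped checks still run when the {dataframe-transform} is started, so the
source index must exist by then.
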

[[put-data-frame-transform-request-body]]
==== {api-request-body-title}

`description`::
  (Optional, string) Free text description of the {dataframe-transform}.

`dest`::
  (Required, object) The destination configuration, which has the
  following properties:

  `index`:::
    (Required, string) The _destination index_ for the {dataframe-transform}.

  `pipeline`:::
    (Optional, string) The unique identifier for a <<pipeline,pipeline>>.

`frequency`::
  (Optional, <<time-units, time units>>) The interval between checks for changes
  in the source indices when the {dataframe-transform} is running continuously.
  Also determines the retry interval in the event of transient failures while
  the {dataframe-transform} is searching or indexing. The minimum value is `1s`
  and the maximum is `1h`. The default value is `1m`.

`pivot`::
  (Required, object) Defines the `group_by` fields of the pivot function and the
  aggregations that reduce the data. See <<data-frame-transform-pivot>>.

`source`::
  (Required, object) The source configuration, which has the following
  properties:

  `index`:::
    (Required, string or array) The _source indices_ for the
    {dataframe-transform}. It can be a single index, an index pattern (for
    example, `"myindex*"`), or an array of indices (for example,
    `["index1", "index2"]`).

  `query`:::
    (Optional, object) A query clause that retrieves a subset of data from the
    source index. See <<query-dsl>>.

`sync`::
  (Optional, object) Defines the properties required to run continuously.
  `time`:::
    (Required, object) Specifies that the {dataframe-transform} uses a time
    field to synchronize the source and destination indices.
    `field`::::
      (Required, string) The date field that is used to identify new documents
      in the source.
+
--
TIP: In general, it’s a good idea to use a field that contains the
<<accessing-ingest-metadata,ingest timestamp>>. If you use a different field,
you might need to set the `delay` such that it accounts for data transmission
delays.

--
    `delay`::::
      (Optional, <<time-units, time units>>) The time delay between the current
      time and the latest input data time. The default value is `60s`.
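
To illustrate how these properties combine, the following sketch reads from an
array of source indices, narrows the input with a query, and runs continuously
with a longer `delay`. The transform ID, the destination index, and every field
name are assumptions made for illustration; `index1` and `index2` come from the
`source.index` description above.

[source,console]
----
PUT _data_frame/transforms/hypothetical_orders_transform <1>
{
  "source": {
    "index": [ "index1", "index2" ],
    "query": {
      "range": {
        "order_date": { <2>
          "gte": "now-30d"
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "dest": {
    "index": "hypothetical_orders_by_customer"
  },
  "frequency": "1m",
  "sync": {
    "time": {
      "field": "ingest_timestamp", <3>
      "delay": "120s"
    }
  }
}
----
// TEST[skip:illustrative sketch with hypothetical names]
<1> The transform ID and the destination index are hypothetical.
<2> Assumes that the source documents have an `order_date` date field.
<3> A hypothetical date field that holds the ingest timestamp.
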

[[put-data-frame-transform-example]]
==== {api-examples-title}

[source,console]
--------------------------------------------------
PUT _data_frame/transforms/ecommerce_transform
{
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "term": {
        "geoip.continent_name": {
          "value": "Asia"
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "description": "Maximum priced ecommerce data by customer_id in Asia",
  "dest": {
    "index": "kibana_sample_data_ecommerce_transform",
    "pipeline": "add_timestamp_pipeline"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "order_date",
      "delay": "60s"
    }
  }
}
--------------------------------------------------
// TEST[setup:kibana_sample_data_ecommerce]

When the transform is created, you receive the following results:

[source,console-result]
----
{
  "acknowledged" : true
}
----
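
The example request refers to an ingest pipeline named
`add_timestamp_pipeline`, whose definition is not shown in this topic. The
following is one plausible sketch, assuming that the pipeline only needs to
stamp each destination document with the time it was ingested. The target
field name is an assumption.

[source,console]
----
PUT _ingest/pipeline/add_timestamp_pipeline
{
  "description": "Hypothetical pipeline that adds the ingest timestamp to each document",
  "processors": [
    {
      "set": {
        "field": "@timestamp", <1>
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
----
// TEST[skip:illustrative sketch]
<1> The `@timestamp` target field is an assumption; any date field name works.
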