[role="xpack"]
|
||
[testenv="basic"]
|
||
[[put-data-frame-transform]]
|
||
=== Create {dataframe-transforms} API
|
||
|
||
[subs="attributes"]
|
||
++++
|
||
<titleabbrev>Create {dataframe-transforms}</titleabbrev>
|
||
++++
|
||
|
||
Instantiates a {dataframe-transform}.
|
||
|
||
beta[]
|
||
|
||
[[put-data-frame-transform-request]]
|
||
==== {api-request-title}
|
||
|
||
`PUT _data_frame/transforms/<data_frame_transform_id>`
|
||
|
||
[[put-data-frame-transform-prereqs]]
|
||
==== {api-prereq-title}
|
||
|
||
* If the {es} {security-features} are enabled, you must have
|
||
`manage_data_frame_transforms` cluster privileges to use this API. The built-in
|
||
`data_frame_transforms_admin` role has these privileges. You must also
|
||
have `read` and `view_index_metadata` privileges on the source index and `read`,
|
||
`create_index`, and `index` privileges on the destination index. For more
|
||
information, see {stack-ov}/security-privileges.html[Security privileges] and
|
||
{stack-ov}/built-in-roles.html[Built-in roles].
|
||
|
||
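
If you prefer a custom role instead of the built-in `data_frame_transforms_admin`
role, a role definition along the following lines grants the privileges listed
above. This is a sketch: the role name and the index names are illustrative.

[source,console]
--------------------------------------------------
PUT _security/role/my_dataframe_transforms_role
{
  "cluster": [ "manage_data_frame_transforms" ], <1>
  "indices": [
    {
      "names": [ "kibana_sample_data_ecommerce" ], <2>
      "privileges": [ "read", "view_index_metadata" ]
    },
    {
      "names": [ "kibana_sample_data_ecommerce_transform" ], <3>
      "privileges": [ "read", "create_index", "index" ]
    }
  ]
}
--------------------------------------------------
<1> The cluster privilege required to use this API.
<2> Privileges on the source index.
<3> Privileges on the destination index.
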
[[put-data-frame-transform-desc]]
==== {api-description-title}

This API defines a {dataframe-transform}, which copies data from source indices,
transforms it, and persists it into an entity-centric destination index. The
entities are defined by the set of `group_by` fields in the `pivot` object. You
can also think of the destination index as a two-dimensional tabular data
structure (known as a {dataframe}). The ID for each document in the
{dataframe} is generated from a hash of the entity, so there is a unique row
per entity. For more information, see
{stack-ov}/ml-dataframes.html[{dataframe-transforms-cap}].

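
Before you create the {dataframe-transform}, you can check what the pivoted
documents will look like by sending the same `source` and `pivot` to the
preview {dataframe-transforms} API. The following sketch assumes that API is
available at `POST _data_frame/transforms/_preview`; it reuses the sample
eCommerce index from the example at the end of this page:

[source,console]
--------------------------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } } <1>
    },
    "aggregations": {
      "max_price": { "max": { "field": "taxful_total_price" } }
    }
  }
}
--------------------------------------------------
<1> Each distinct `customer_id` is one entity and therefore produces exactly one
document (one row of the {dataframe}) in the destination index.
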
When the {dataframe-transform} is created, a series of validations occur to
ensure its success. For example, there is a check for the existence of the
source indices and a check that the destination index is not part of the source
index pattern. You can use the `defer_validation` parameter to skip these
checks.

Deferred validations are always run when the {dataframe-transform} is started,
with the exception of privilege checks. When {es} {security-features} are
enabled, the {dataframe-transform} remembers which roles the user that created
it had at the time of creation and uses those same roles. If those roles do not
have the required privileges on the source and destination indices, the
{dataframe-transform} fails when it attempts unauthorized operations.

IMPORTANT: You must use {kib} or this API to create a {dataframe-transform}.
Do not put a {dataframe-transform} directly into any
`.data-frame-internal*` indices using the Elasticsearch index API.
If {es} {security-features} are enabled, do not give users any
privileges on `.data-frame-internal*` indices.

[[put-data-frame-transform-path-parms]]
==== {api-path-parms-title}

`<data_frame_transform_id>`::
  (Required, string) Identifier for the {dataframe-transform}. This identifier
  can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
  underscores. It must start and end with alphanumeric characters.

[[put-data-frame-transform-query-parms]]
==== {api-query-parms-title}

`defer_validation`::
  (Optional, boolean) When `true`, deferrable validations are not run. This
  behavior may be desired if the source index does not exist until after the
  {dataframe-transform} is created.

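
For example, if the source indices are created by a separate ingest process that
has not run yet, you can define the {dataframe-transform} ahead of time and skip
the deferrable checks. This is a sketch: the transform ID, index names, and
field names are illustrative.

[source,console]
--------------------------------------------------
PUT _data_frame/transforms/orders_transform?defer_validation=true
{
  "source": {
    "index": "orders-*" <1>
  },
  "dest": {
    "index": "orders_by_customer"
  },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "max_order_total": { "max": { "field": "order_total" } }
    }
  }
}
--------------------------------------------------
<1> This index pattern does not need to match any existing index yet; the
existence check is deferred and runs when the {dataframe-transform} is started.
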
[[put-data-frame-transform-request-body]]
==== {api-request-body-title}

`description`::
  (Optional, string) Free text description of the {dataframe-transform}.

`dest`::
  (Required, object) The destination configuration, which has the following
  properties:

  `index`:::
    (Required, string) The _destination index_ for the {dataframe-transform}.

  `pipeline`:::
    (Optional, string) The unique identifier for a <<pipeline,pipeline>>.

`frequency`::
  (Optional, <<time-units, time units>>) The interval between checks for changes in the source
  indices when the {dataframe-transform} is running continuously. Also determines
  the retry interval in the event of transient failures while the {dataframe-transform} is
  searching or indexing. The minimum value is `1s` and the maximum is `1h`. The
  default value is `1m`.

`pivot`::
  (Required, object) Defines the pivot function `group by` fields and the aggregation to
  reduce the data. See <<data-frame-transform-pivot>>.

`source`::
  (Required, object) The source configuration, which has the following
  properties:

  `index`:::
    (Required, string or array) The _source indices_ for the
    {dataframe-transform}. It can be a single index, an index pattern (for
    example, `"myindex*"`), or an array of indices (for example,
    `["index1", "index2"]`).

  `query`:::
    (Optional, object) A query clause that retrieves a subset of data from the
    source index. See <<query-dsl>>.

`sync`::
  (Optional, object) Defines the properties required to run continuously.
  `time`:::
    (Required, object) Specifies that the {dataframe-transform} uses a time
    field to synchronize the source and destination indices.
    `field`::::
      (Required, string) The date field that is used to identify new documents
      in the source.
+
--
TIP: In general, it’s a good idea to use a field that contains the
<<accessing-ingest-metadata,ingest timestamp>>. If you use a different field,
you might need to set the `delay` such that it accounts for data transmission
delays.

--
    `delay`::::
      (Optional, <<time-units, time units>>) The time delay between the current time and the
      latest input data time. The default value is `60s`.

[[put-data-frame-transform-example]]
==== {api-examples-title}

[source,console]
--------------------------------------------------
PUT _data_frame/transforms/ecommerce_transform
{
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "term": {
        "geoip.continent_name": {
          "value": "Asia"
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "description": "Maximum priced ecommerce data by customer_id in Asia",
  "dest": {
    "index": "kibana_sample_data_ecommerce_transform",
    "pipeline": "add_timestamp_pipeline"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "order_date",
      "delay": "60s"
    }
  }
}
--------------------------------------------------
// TEST[setup:kibana_sample_data_ecommerce]

When the transform is created, you receive the following results:

[source,console-result]
----
{
  "acknowledged" : true
}
----
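
The `dest` object in this example references an ingest pipeline named
`add_timestamp_pipeline`, which is assumed to already exist. In line with the
TIP above, such a pipeline can stamp each document with the ingest timestamp,
and that field can then be used as `sync.time.field`. A minimal sketch (the
pipeline and field names are illustrative) could look like this:

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/add_timestamp_pipeline
{
  "description": "Adds the ingest timestamp to each document",
  "processors": [
    {
      "set": {
        "field": "ingest_timestamp", <1>
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
--------------------------------------------------
<1> A field populated this way could be used as the `sync.time.field` instead of
`order_date`.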