docs/reference/transform/api-quickref.asciidoc (new file, 21 lines)

[role="xpack"]
[[df-api-quickref]]
== API quick reference

All {dataframe-transform} endpoints share the following base path:

[source,js]
----
/_data_frame/transforms/
----
// NOTCONSOLE
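
For example, you can list all configured transforms by issuing a request against
that base path. This is a minimal sketch for illustration:

[source,console]
----
GET _data_frame/transforms
----
// TEST[skip:requires an existing transform]
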
* {ref}/put-data-frame-transform.html[Create {dataframe-transforms}]
* {ref}/delete-data-frame-transform.html[Delete {dataframe-transforms}]
* {ref}/get-data-frame-transform.html[Get {dataframe-transforms}]
* {ref}/get-data-frame-transform-stats.html[Get {dataframe-transforms} statistics]
* {ref}/preview-data-frame-transform.html[Preview {dataframe-transforms}]
* {ref}/start-data-frame-transform.html[Start {dataframe-transforms}]
* {ref}/stop-data-frame-transform.html[Stop {dataframe-transforms}]

For the full list, see {ref}/data-frame-apis.html[{dataframe-transform-cap} APIs].
docs/reference/transform/checkpoints.asciidoc (new file, 88 lines)

[role="xpack"]
[[ml-transform-checkpoints]]
== How {dataframe-transform} checkpoints work
++++
<titleabbrev>How checkpoints work</titleabbrev>
++++

beta[]

Each time a {dataframe-transform} examines the source indices and creates or
updates the destination index, it generates a _checkpoint_.

If your {dataframe-transform} runs only once, there is logically only one
checkpoint. If your {dataframe-transform} runs continuously, however, it creates
checkpoints as it ingests and transforms new source data.

To create a checkpoint, the {cdataframe-transform}:

. Checks for changes to source indices.
+
Using a simple periodic timer, the {dataframe-transform} checks for changes to
the source indices. This check is done based on the interval defined in the
transform's `frequency` property.
+
If the source indices remain unchanged or if a checkpoint is already in progress,
the transform waits for the next timer.

. Identifies which entities have changed.
+
The {dataframe-transform} searches to see which entities have changed since the
last time it checked. The transform's `sync` configuration object identifies a
time field in the source indices. The transform uses the values in that field to
synchronize the source and destination indices (see the configuration sketch
after these steps).

. Updates the destination index (the {dataframe}) with the changed entities.
+
--
The {dataframe-transform} applies changes related to either new or changed
entities to the destination index. The set of changed entities is paginated. For
each page, the {dataframe-transform} performs a composite aggregation using a
`terms` query. After all the pages of changes have been applied, the checkpoint
is complete.
--
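
The `frequency` and `sync` properties referenced in these steps are part of the
transform configuration itself. The following is a minimal sketch rather than a
complete definition; the transform ID, index names, and the `order_date` time
field are placeholders for your own data:

[source,js]
----
PUT _data_frame/transforms/example-continuous-transform
{
  "source": { "index": "example-source-index" },
  "dest": { "index": "example-dest-index" },
  "frequency": "5m", <1>
  "sync": {
    "time": {
      "field": "order_date", <2>
      "delay": "60s" <3>
    }
  },
  "pivot": { ... }
}
----
// NOTCONSOLE

<1> The interval at which the transform checks the source indices for changes.
<2> The time field used to identify changed entities.
<3> A delay that allows for the lag between document ingest and searchability.
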
This checkpoint process involves both search and indexing activity on the
cluster. We have attempted to favor control over performance while developing
{dataframe-transforms}. We decided it was preferable for the
{dataframe-transform} to take longer to complete, rather than to finish quickly
and take precedence in resource consumption. That being said, the cluster still
requires enough resources to support both the composite aggregation search and
the indexing of its results.

TIP: If the cluster experiences unacceptable performance degradation due to the
{dataframe-transform}, stop the transform. Consider whether you can apply a
source query to the {dataframe-transform} to reduce the scope of data it
processes. Also consider whether the cluster has sufficient resources in place
to support both the composite aggregation search and the indexing of its
results.

[discrete]
[[ml-transform-checkpoint-errors]]
==== Error handling

Failures in {dataframe-transforms} tend to be related to searching or indexing.
To increase the resiliency of {dataframe-transforms}, the cursor positions of
the aggregated search and the changed entities search are tracked in memory and
persisted periodically.

Checkpoint failures can be categorized as follows:

* Temporary failures: The checkpoint is retried. If 10 consecutive failures
occur, the {dataframe-transform} has a failed status. For example, this
situation might occur when there are shard failures and queries return only
partial results.
* Irrecoverable failures: The {dataframe-transform} immediately fails. For
example, this situation occurs when the source index is not found.
* Adjustment failures: The {dataframe-transform} retries with adjusted settings.
For example, if a parent circuit breaker memory error occurs during the
composite aggregation, the transform receives partial results. The aggregated
search is retried with a smaller number of buckets. This retry is performed at
the interval defined in the transform's `frequency` property. If the search
is retried to the point where it reaches a minimal number of buckets, an
irrecoverable failure occurs.

If the node running the {dataframe-transforms} fails, the transform restarts
from the most recent persisted cursor position. This recovery process might
repeat some of the work the transform had already done, but it ensures data
consistency.
docs/reference/transform/dataframe-examples.asciidoc (new file, 335 lines)

[role="xpack"]
[testenv="basic"]
[[dataframe-examples]]
== {dataframe-transform-cap} examples
++++
<titleabbrev>Examples</titleabbrev>
++++

beta[]

These examples demonstrate how to use {dataframe-transforms} to derive useful
insights from your data. All the examples use one of the
{kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed,
step-by-step example, see
<<ecommerce-dataframes,Transforming your data with {dataframes}>>.

* <<ecommerce-dataframes>>
* <<example-best-customers>>
* <<example-airline>>
* <<example-clientips>>

include::ecommerce-example.asciidoc[]

[[example-best-customers]]
=== Finding your best customers

In this example, we use the eCommerce orders sample dataset to find the customers
who spent the most in our hypothetical webshop. Let's transform the data such
that the destination index contains, for each customer, the number of orders, the
total price of the orders, the average price per order, the average number of
unique products per order, and the total number of distinct products ordered.

[source,console]
----------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "dest" : { <1>
    "index" : "sample_ecommerce_orders_by_customer"
  },
  "pivot": {
    "group_by": { <2>
      "user": { "terms": { "field": "user" }},
      "customer_id": { "terms": { "field": "customer_id" }}
    },
    "aggregations": {
      "order_count": { "value_count": { "field": "order_id" }},
      "total_order_amt": { "sum": { "field": "taxful_total_price" }},
      "avg_amt_per_order": { "avg": { "field": "taxful_total_price" }},
      "avg_unique_products_per_order": { "avg": { "field": "total_unique_products" }},
      "total_unique_products": { "cardinality": { "field": "products.product_id" }}
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> This is the destination index for the {dataframe}. It is ignored by
`_preview`.
<2> Two `group_by` fields have been selected. This means the {dataframe} will
contain a unique row per `user` and `customer_id` combination. Within this
dataset both these fields are unique. Including both in the {dataframe} gives
more context to the final results.

NOTE: In the example above, condensed JSON formatting has been used for easier
readability of the pivot object.

The preview {dataframe-transforms} API enables you to see the layout of the
{dataframe} in advance, populated with some sample values. For example:

[source,js]
----------------------------------
{
  "preview" : [
    {
      "total_order_amt" : 3946.9765625,
      "order_count" : 59.0,
      "total_unique_products" : 116.0,
      "avg_unique_products_per_order" : 2.0,
      "customer_id" : "10",
      "user" : "recip",
      "avg_amt_per_order" : 66.89790783898304
    },
    ...
  ]
}
----------------------------------
// NOTCONSOLE

This {dataframe} makes it easier to answer questions such as:

* Which customers spend the most?

* Which customers spend the most per order?

* Which customers order most often?

* Which customers ordered the least number of different products?

It's possible to answer these questions using aggregations alone; however,
{dataframes} allow us to persist this data as a customer-centric index. This
enables us to analyze data at scale and gives more flexibility to explore and
navigate data from a customer-centric perspective. In some cases, it can even
make creating visualizations much simpler.
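
If you go beyond the preview and actually create and start this transform, you
can then query the destination index directly. The following is a minimal sketch
that lists the customers who spent the most, assuming the
`sample_ecommerce_orders_by_customer` index has been populated by the transform:

[source,console]
----
GET sample_ecommerce_orders_by_customer/_search
{
  "size": 5,
  "sort": [
    { "total_order_amt": "desc" }
  ]
}
----
// TEST[skip:requires the transform to have run]
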
[[example-airline]]
=== Finding air carriers with the most delays

In this example, we use the Flights sample dataset to find out which air carrier
had the most delays. First, we filter the source data such that it excludes all
the cancelled flights by using a query filter. Then we transform the data to
contain the number of flights, the sum of delayed minutes, and the sum of the
flight minutes by air carrier. Finally, we use a
{ref}/search-aggregations-pipeline-bucket-script-aggregation.html[`bucket_script`]
to determine what percentage of the flight time was actually taken up by delays.

[source,console]
----------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_flights",
    "query": { <1>
      "bool": {
        "filter": [
          { "term": { "Cancelled": false } }
        ]
      }
    }
  },
  "dest" : { <2>
    "index" : "sample_flight_delays_by_carrier"
  },
  "pivot": {
    "group_by": { <3>
      "carrier": { "terms": { "field": "Carrier" }}
    },
    "aggregations": {
      "flights_count": { "value_count": { "field": "FlightNum" }},
      "delay_mins_total": { "sum": { "field": "FlightDelayMin" }},
      "flight_mins_total": { "sum": { "field": "FlightTimeMin" }},
      "delay_time_percentage": { <4>
        "bucket_script": {
          "buckets_path": {
            "delay_time": "delay_mins_total.value",
            "flight_time": "flight_mins_total.value"
          },
          "script": "(params.delay_time / params.flight_time) * 100"
        }
      }
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> Filter the source data to select only flights that were not cancelled.
<2> This is the destination index for the {dataframe}. It is ignored by
`_preview`.
<3> The data is grouped by the `Carrier` field, which contains the airline name.
<4> This `bucket_script` performs calculations on the results that are returned
by the aggregation. In this particular example, it calculates what percentage of
travel time was taken up by delays.

The preview shows you that the new index would contain data like this for each
carrier:

[source,js]
----------------------------------
{
  "preview" : [
    {
      "carrier" : "ES-Air",
      "flights_count" : 2802.0,
      "flight_mins_total" : 1436927.5130677223,
      "delay_time_percentage" : 9.335543983955839,
      "delay_mins_total" : 134145.0
    },
    ...
  ]
}
----------------------------------
// NOTCONSOLE

This {dataframe} makes it easier to answer questions such as:

* Which air carrier has the most delays as a percentage of flight time?

NOTE: This data is fictional and does not reflect actual delays
or flight stats for any of the featured destination or origin airports.

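
As with the previous example, if you create and start this transform instead of
only previewing it, you can answer the question above directly from the
destination index. This sketch assumes the `sample_flight_delays_by_carrier`
index has been populated:

[source,console]
----
GET sample_flight_delays_by_carrier/_search
{
  "size": 1,
  "sort": [
    { "delay_time_percentage": "desc" }
  ]
}
----
// TEST[skip:requires the transform to have run]
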
[[example-clientips]]
=== Finding suspicious client IPs by using scripted metrics

With {dataframe-transforms}, you can use
{ref}/search-aggregations-metrics-scripted-metric-aggregation.html[scripted
metric aggregations] on your data. These aggregations are flexible and make
it possible to perform very complex processing. Let's use scripted metrics to
identify suspicious client IPs in the web log sample dataset.

We transform the data such that the new index contains the sum of bytes and the
number of distinct URLs, agents, incoming requests by location, and geographic
destinations for each client IP. We also use scripted metrics to count the
specific types of HTTP responses that each client IP receives. Ultimately, the
example below transforms web log data into an entity-centric index where the
entity is `clientip`.

[source,console]
----------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_logs",
    "query": { <1>
      "range" : {
        "timestamp" : {
          "gte" : "now-30d/d"
        }
      }
    }
  },
  "dest" : { <2>
    "index" : "sample_weblogs_by_clientip"
  },
  "pivot": {
    "group_by": { <3>
      "clientip": { "terms": { "field": "clientip" } }
    },
    "aggregations": {
      "url_dc": { "cardinality": { "field": "url.keyword" }},
      "bytes_sum": { "sum": { "field": "bytes" }},
      "geo.src_dc": { "cardinality": { "field": "geo.src" }},
      "agent_dc": { "cardinality": { "field": "agent.keyword" }},
      "geo.dest_dc": { "cardinality": { "field": "geo.dest" }},
      "responses.total": { "value_count": { "field": "timestamp" }},
      "responses.counts": { <4>
        "scripted_metric": {
          "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]",
          "map_script": """
            def code = doc['response.keyword'].value;
            if (code.startsWith('5') || code.startsWith('4')) {
              state.responses.error += 1;
            } else if(code.startsWith('2')) {
              state.responses.success += 1;
            } else {
              state.responses.other += 1;
            }
            """,
          "combine_script": "state.responses",
          "reduce_script": """
            def counts = ['error': 0L, 'success': 0L, 'other': 0L];
            for (responses in states) {
              counts.error += responses['error'];
              counts.success += responses['success'];
              counts.other += responses['other'];
            }
            return counts;
            """
        }
      },
      "timestamp.min": { "min": { "field": "timestamp" }},
      "timestamp.max": { "max": { "field": "timestamp" }},
      "timestamp.duration_ms": { <5>
        "bucket_script": {
          "buckets_path": {
            "min_time": "timestamp.min.value",
            "max_time": "timestamp.max.value"
          },
          "script": "(params.max_time - params.min_time)"
        }
      }
    }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

<1> This range query limits the transform to documents that are within the last
30 days at the point in time the {dataframe-transform} checkpoint is processed.
For batch {dataframes} this occurs once.
<2> This is the destination index for the {dataframe}. It is ignored by
`_preview`.
<3> The data is grouped by the `clientip` field.
<4> This `scripted_metric` performs a distributed operation on the web log data
to count specific types of HTTP responses (error, success, and other).
<5> This `bucket_script` calculates the duration of the `clientip` access based
on the results of the aggregation.

The preview shows you that the new index would contain data like this for each
client IP:

[source,js]
----------------------------------
{
  "preview" : [
    {
      "geo" : {
        "src_dc" : 12.0,
        "dest_dc" : 9.0
      },
      "clientip" : "0.72.176.46",
      "agent_dc" : 3.0,
      "responses" : {
        "total" : 14.0,
        "counts" : {
          "other" : 0,
          "success" : 14,
          "error" : 0
        }
      },
      "bytes_sum" : 74808.0,
      "timestamp" : {
        "duration_ms" : 4.919943239E9,
        "min" : "2019-06-17T07:51:57.333Z",
        "max" : "2019-08-13T06:31:00.572Z"
      },
      "url_dc" : 11.0
    },
    ...
  ]
}
----------------------------------
// NOTCONSOLE

This {dataframe} makes it easier to answer questions such as:

* Which client IPs are transferring the most data?

* Which client IPs are interacting with a high number of different URLs?

* Which client IPs have high error rates? (An example search for this question
follows the list.)

* Which client IPs are interacting with a high number of destination countries?
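
For instance, once the transform has been created and started (rather than only
previewed), a sketch like the following finds the client IPs with the most error
responses, assuming the `sample_weblogs_by_clientip` index has been populated:

[source,console]
----
GET sample_weblogs_by_clientip/_search
{
  "size": 10,
  "query": {
    "range": { "responses.counts.error": { "gt": 0 } }
  },
  "sort": [
    { "responses.counts.error": "desc" }
  ]
}
----
// TEST[skip:requires the transform to have run]
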
docs/reference/transform/ecommerce-example.asciidoc (new file, 262 lines)

[role="xpack"]
[testenv="basic"]
[[ecommerce-dataframes]]
=== Transforming the eCommerce sample data

beta[]

<<ml-dataframes,{dataframe-transforms-cap}>> enable you to retrieve information
from an {es} index, transform it, and store it in another index. Let's use the
{kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can
pivot and summarize your data with {dataframe-transforms}.


. If the {es} {security-features} are enabled, obtain a user ID with sufficient
privileges to complete these steps.
+
--
You need `manage_data_frame_transforms` cluster privileges to preview and create
{dataframe-transforms}. Members of the built-in `data_frame_transforms_admin`
role have these privileges.

You also need `read` and `view_index_metadata` index privileges on the source
index and `read`, `create_index`, and `index` privileges on the destination
index.

For more information, see <<security-privileges>> and <<built-in-roles>>.
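
If you prefer to grant exactly these privileges yourself rather than use the
built-in role, a custom role along the following lines would work. This is a
sketch only; the role name is an example, and the destination index name matches
the one created later in these steps:

[source,console]
----
PUT _security/role/dataframe_ecommerce_admin
{
  "cluster": [ "manage_data_frame_transforms" ],
  "indices": [
    {
      "names": [ "kibana_sample_data_ecommerce" ],
      "privileges": [ "read", "view_index_metadata" ]
    },
    {
      "names": [ "ecommerce-customers" ],
      "privileges": [ "read", "create_index", "index" ]
    }
  ]
}
----
// TEST[skip:security must be enabled]
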
--

. Choose your _source index_.
+
--
In this example, we'll use the eCommerce orders sample data. If you're not
already familiar with the `kibana_sample_data_ecommerce` index, use the
*Revenue* dashboard in {kib} to explore the data. Consider what insights you
might want to derive from this eCommerce data.
--

. Play with various options for grouping and aggregating the data.
+
--
For example, you might want to group the data by product ID and calculate the
total number of sales for each product and its average price. Alternatively, you
might want to look at the behavior of individual customers and calculate how
much each customer spent in total and how many different categories of products
they purchased. Or you might want to take the currencies or geographies into
consideration. What are the most interesting ways you can transform and
interpret this data?

_Pivoting_ your data involves using at least one field to group it and applying
at least one aggregation. You can preview what the transformed data will look
like, so go ahead and play with it!

For example, go to *Machine Learning* > *Data Frames* in {kib} and use the
wizard to create a {dataframe-transform}:

[role="screenshot"]
image::images/ecommerce-pivot1.jpg["Creating a simple {dataframe-transform} in {kib}"]

In this case, we grouped the data by customer ID and calculated the sum of
products each customer purchased.

Let's add some more aggregations to learn more about our customers' orders. For
example, let's calculate the total sum of their purchases, the maximum number of
products that they purchased in a single order, and their total number of orders.
We'll accomplish this by using the
{ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the
`taxless_total_price` field, the
{ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the
`total_quantity` field, and the
{ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation]
on the `order_id` field:

[role="screenshot"]
image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {dataframe-transform} in {kib}"]

TIP: If you're interested in a subset of the data, you can optionally include a
{ref}/search-request-body.html#request-body-search-query[query] element. In this
example, we've filtered the data so that we're only looking at orders with a
`currency` of `EUR`. Alternatively, we could group the data by that field too.
If you want to use more complex queries, you can create your {dataframe} from a
{kibana-ref}/save-open-search.html[saved search].

If you prefer, you can use the
{ref}/preview-data-frame-transform.html[preview {dataframe-transforms} API]:

[source,js]
--------------------------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "bool": {
        "filter": {
          "term": {"currency": "EUR"}
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "total_quantity.sum": {
        "sum": {
          "field": "total_quantity"
        }
      },
      "taxless_total_price.sum": {
        "sum": {
          "field": "taxless_total_price"
        }
      },
      "total_quantity.max": {
        "max": {
          "field": "total_quantity"
        }
      },
      "order_id.cardinality": {
        "cardinality": {
          "field": "order_id"
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:set up sample data]
--

. When you are satisfied with what you see in the preview, create the
{dataframe-transform}.
+
--
.. Supply a transform ID and the name of the target (or _destination_) index.

.. Decide whether you want the {dataframe-transform} to run once or continuously.
--
+
--
Since this sample data index is unchanging, let's use the default behavior and
just run the {dataframe-transform} once.

[role="screenshot"]
image::images/ecommerce-batch.jpg["Specifying the {dataframe-transform} options in {kib}"]

If you want to try it out, however, go ahead and click on *Continuous mode*.
You must choose a field that the {dataframe-transform} can use to check which
entities have changed. In general, it's a good idea to use the ingest timestamp
field. In this example, however, you can use the `order_date` field.

If you prefer, you can use the
{ref}/put-data-frame-transform.html[create {dataframe-transforms} API]. For
example:

[source,js]
--------------------------------------------------
PUT _data_frame/transforms/ecommerce-customer-transform
{
  "source": {
    "index": [
      "kibana_sample_data_ecommerce"
    ],
    "query": {
      "bool": {
        "filter": {
          "term": {
            "currency": "EUR"
          }
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "total_quantity.sum": {
        "sum": {
          "field": "total_quantity"
        }
      },
      "taxless_total_price.sum": {
        "sum": {
          "field": "taxless_total_price"
        }
      },
      "total_quantity.max": {
        "max": {
          "field": "total_quantity"
        }
      },
      "order_id.cardinality": {
        "cardinality": {
          "field": "order_id"
        }
      }
    }
  },
  "dest": {
    "index": "ecommerce-customers"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]
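
If you choose continuous mode via the API instead, you would additionally supply
a `sync` object that names the field used to identify changed entities. A
minimal sketch of the extra configuration, using the `order_date` field
mentioned above (the `delay` value here is just an example):

[source,js]
----
  "sync": {
    "time": {
      "field": "order_date",
      "delay": "60s"
    }
  }
----
// NOTCONSOLE
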
--

. Start the {dataframe-transform}.
+
--

TIP: Even though resource utilization is automatically adjusted based on the
cluster load, a {dataframe-transform} increases search and indexing load on your
cluster while it runs. If you're experiencing an excessive load, however, you
can stop it.

You can start, stop, and manage {dataframe-transforms} in {kib}:

[role="screenshot"]
image::images/dataframe-transforms.jpg["Managing {dataframe-transforms} in {kib}"]

Alternatively, you can use the
{ref}/start-data-frame-transform.html[start {dataframe-transforms}] and
{ref}/stop-data-frame-transform.html[stop {dataframe-transforms}] APIs. For
example:

[source,js]
--------------------------------------------------
POST _data_frame/transforms/ecommerce-customer-transform/_start
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]
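
If you need to stop the {dataframe-transform}, the corresponding call is shown
below (a sketch using the same transform ID):

[source,js]
--------------------------------------------------
POST _data_frame/transforms/ecommerce-customer-transform/_stop
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]
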
--

. Explore the data in your new index.
+
--
For example, use the *Discover* application in {kib}:

[role="screenshot"]
image::images/ecommerce-results.jpg["Exploring the new index in {kib}"]

--

TIP: If you do not want to keep the {dataframe-transform}, you can delete it in
{kib} or use the
{ref}/delete-data-frame-transform.html[delete {dataframe-transform} API]. When
you delete a {dataframe-transform}, its destination index and {kib} index
patterns remain.
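
For reference, deleting via the API is a single request. A sketch using the
transform ID from this example (stop the transform first if it is still
running):

[source,js]
--------------------------------------------------
DELETE _data_frame/transforms/ecommerce-customer-transform
--------------------------------------------------
// CONSOLE
// TEST[skip:setup kibana sample data]
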
BIN docs/reference/transform/images/dataframe-transforms.jpg (new file, 240 KiB)
BIN docs/reference/transform/images/ecommerce-batch.jpg (new file, 123 KiB)
BIN docs/reference/transform/images/ecommerce-continuous.jpg (new file, 194 KiB)
BIN docs/reference/transform/images/ecommerce-pivot1.jpg (new file, 489 KiB)
BIN docs/reference/transform/images/ecommerce-pivot2.jpg (new file, 558 KiB)
BIN docs/reference/transform/images/ecommerce-results.jpg (new file, 339 KiB)
BIN docs/reference/transform/images/ml-dataframepivot.jpg (new file, 92 KiB)
docs/reference/transform/index.asciidoc (new file, 82 lines)

[role="xpack"]
[[ml-dataframes]]
= {dataframe-transforms-cap}

[partintro]
--

beta[]

{es} aggregations are a powerful and flexible feature that enable you to
summarize and retrieve complex insights about your data. You can summarize
complex things like the number of web requests per day on a busy website, broken
down by geography and browser type. If you use the same data set to try to
calculate something as simple as a single number for the average duration of
visitor web sessions, however, you can quickly run out of memory.

Why does this occur? A web session duration is an example of a behavioral
attribute not held on any one log record; it has to be derived by finding the
first and last records for each session in our weblogs. This derivation requires
some complex query expressions and a lot of memory to connect all the data
points. If you have an ongoing background process that fuses related events from
one index into entity-centric summaries in another index, you get a more useful,
joined-up picture--this is essentially what _{dataframes}_ are.


[discrete]
[[ml-dataframes-usage]]
== When to use {dataframes}

You might want to consider using {dataframes} instead of aggregations when:

* You need a complete _feature index_ rather than a top-N set of items.
+
In {ml}, you often need a complete set of behavioral features rather than just
the top-N. For example, if you are predicting customer churn, you might look at
features such as the number of website visits in the last week, the total number
of sales, or the number of emails sent. The {stack} {ml-features} create models
based on this multi-dimensional feature space, so they benefit from full feature
indices ({dataframes}).
+
This scenario also applies when you are trying to search across the results of
an aggregation or multiple aggregations. Aggregation results can be ordered or
filtered, but there are
{ref}/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order[limitations to ordering],
and
{ref}/search-aggregations-pipeline-bucket-selector-aggregation.html[filtering by bucket selector]
is constrained by the maximum number of buckets returned. If you want to search
all aggregation results, you need to create the complete {dataframe}. If you
need to sort or filter the aggregation results by multiple fields, {dataframes}
are particularly useful.

* You need to sort aggregation results by a pipeline aggregation.
+
{ref}/search-aggregations-pipeline.html[Pipeline aggregations] cannot be used
for sorting. Technically, this is because pipeline aggregations are run during
the reduce phase after all other aggregations have already completed. If you
create a {dataframe}, you can effectively perform multiple passes over the data.

* You want to create summary tables to optimize queries.
+
For example, if you
have a high-level dashboard that is accessed by a large number of users and it
uses a complex aggregation over a large dataset, it may be more efficient to
create a {dataframe} to cache results. Thus, each user doesn't need to run the
aggregation query.

Though there are multiple ways to create {dataframes}, this content pertains
to one specific method: _{dataframe-transforms}_.

* <<ml-transform-overview>>
* <<df-api-quickref>>
* <<dataframe-examples>>
* <<dataframe-troubleshooting>>
* <<dataframe-limitations>>
--

include::overview.asciidoc[]
include::checkpoints.asciidoc[]
include::api-quickref.asciidoc[]
include::dataframe-examples.asciidoc[]
include::troubleshooting.asciidoc[]
include::limitations.asciidoc[]
docs/reference/transform/limitations.asciidoc (new file, 219 lines)

[role="xpack"]
[[dataframe-limitations]]
== {dataframe-transform-cap} limitations
[subs="attributes"]
++++
<titleabbrev>Limitations</titleabbrev>
++++

beta[]

The following limitations and known problems apply to the 7.4 release of
the Elastic {dataframe} feature:

[float]
[[df-compatibility-limitations]]
=== Beta {dataframe-transforms} do not have guaranteed backwards or forwards compatibility

Whilst {dataframe-transforms} are beta, it is not guaranteed that a
{dataframe-transform} created in a previous version of the {stack} will be able
to start and operate in a future version. Nor is it guaranteed that
{dataframe-transform} tasks can operate in a cluster with mixed node versions.

Please note that the output of a {dataframe-transform} is persisted to a
destination index. This is a normal {es} index and is not affected by the beta
status.

[float]
[[df-ui-limitation]]
=== {dataframe-cap} UI will not work during a rolling upgrade from 7.2

If your cluster contains mixed version nodes, for example during a rolling
upgrade from 7.2 to a newer version, and {dataframe-transforms} have been
created in 7.2, the {dataframe} UI will not work. Please wait until all nodes
have been upgraded to the newer version before using the {dataframe} UI.


[float]
[[df-datatype-limitations]]
=== {dataframe-cap} data type limitation

{dataframes-cap} do not (yet) support fields containing arrays, in either the UI
or the API. If you try to create one, the UI will fail to show the source index
table.

[float]
[[df-ccs-limitations]]
=== {ccs-cap} is not supported

{ccs-cap} is not supported for {dataframe-transforms}.

[float]
[[df-kibana-limitations]]
=== Up to 1,000 {dataframe-transforms} are supported

A single cluster will support up to 1,000 {dataframe-transforms}.
When using the
{ref}/get-data-frame-transform.html[GET {dataframe-transforms} API], a total
`count` of transforms is returned. Use the `size` and `from` parameters to
enumerate through the full list.

[float]
[[df-aggresponse-limitations]]
=== Aggregation responses may be incompatible with destination index mappings

When a {dataframe-transform} is first started, it will deduce the mappings
required for the destination index. This process is based on the field types of
the source index and the aggregations used. If the fields are derived from
{ref}/search-aggregations-metrics-scripted-metric-aggregation.html[`scripted_metrics`]
or {ref}/search-aggregations-pipeline-bucket-script-aggregation.html[`bucket_scripts`],
{ref}/dynamic-mapping.html[dynamic mappings] will be used. In some instances the
deduced mappings may be incompatible with the actual data. For example, numeric
overflows might occur or dynamically mapped fields might contain both numbers
and strings. Please check {es} logs if you think this may have occurred. As a
workaround, you may define custom mappings prior to starting the
{dataframe-transform}. For example,
{ref}/indices-create-index.html[create a custom destination index] or
{ref}/indices-templates.html[define an index template].
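
For instance, a minimal sketch of creating the destination index up front with
an explicit mapping. The index and field names are placeholders; use the fields
that your transform actually produces:

[source,js]
----
PUT my_dest_index
{
  "mappings": {
    "properties": {
      "response_code_percentage": { "type": "double" }
    }
  }
}
----
// NOTCONSOLE
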
[float]
[[df-batch-limitations]]
=== Batch {dataframe-transforms} may not account for changed documents

A batch {dataframe-transform} uses a
{ref}/search-aggregations-bucket-composite-aggregation.html[composite aggregation],
which allows efficient pagination through all buckets. Composite aggregations
do not yet support a search context, therefore if the source data is changed
(deleted, updated, added) while the batch {dataframe} is in progress, then the
results may not include these changes.

[float]
[[df-consistency-limitations]]
=== {cdataframe-cap} consistency does not account for deleted or updated documents

While the process for {cdataframe-transforms} allows the continual recalculation
of the {dataframe-transform} as new data is being ingested, it also has some
limitations.

Changed entities will only be identified if their time field
has also been updated and falls within the range of the action to check for
changes. This has been designed in principle for, and is suited to, the use case
where new data is given a timestamp for the time of ingest.

If the indices that fall within the scope of the source index pattern are
removed, for example when deleting historical time-based indices, then the
composite aggregation performed in consecutive checkpoint processing will search
over different source data, and entities that only existed in the deleted index
will not be removed from the {dataframe} destination index.

Depending on your use case, you may wish to recreate the {dataframe-transform}
entirely after deletions. Alternatively, if your use case is tolerant to
historical archiving, you may wish to include a max ingest timestamp in your
aggregation. This will allow you to exclude results that have not been recently
updated when viewing the {dataframe} destination index.


[float]
[[df-deletion-limitations]]
=== Deleting a {dataframe-transform} does not delete the {dataframe} destination index or {kib} index pattern

When you delete a {dataframe-transform} using `DELETE _data_frame/transforms/<transform_id>`,
neither the {dataframe} destination index nor the {kib} index pattern (should
one have been created) is deleted. These objects must be deleted separately.

[float]
[[df-aggregation-page-limitations]]
=== Handling dynamic adjustment of aggregation page size

During the development of {dataframe-transforms}, control was favored over
performance. In the design considerations, it is preferred for the
{dataframe-transform} to take longer to complete quietly in the background
rather than to finish quickly and take precedence in resource consumption.

Composite aggregations are well suited to high-cardinality data, enabling
pagination through results. If a {ref}/circuit-breaker.html[circuit breaker]
memory exception occurs when performing the composite aggregation search, then
the search is retried with a smaller number of buckets requested. This circuit
breaker is calculated based upon all activity within the cluster, not just
activity from {dataframe-transforms}, so it may only be a temporary resource
availability issue.

For a batch {dataframe-transform}, the number of buckets requested is only ever
adjusted downwards. Lowering the value may result in a longer duration for the
transform checkpoint to complete. For {cdataframes}, the number of
buckets requested is reset back to its default at the start of every checkpoint
and it is possible for circuit breaker exceptions to occur repeatedly in the
{es} logs.

The {dataframe-transform} retrieves data in batches, which means it calculates
several buckets at once. By default, this is 500 buckets per search/index
operation. The default can be changed using `max_page_search_size` and the
minimum value is 10. If failures still occur once the number of buckets
requested has been reduced to its minimum, then the {dataframe-transform} will
be set to a failed state.
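
In the {dataframe-transform} configuration used in this release,
`max_page_search_size` is specified inside the `pivot` object. The sketch below
shows the placement only; treat the exact location and value as an assumption to
verify against {ref}/data-frame-transform-resource.html[{dataframe-transform-cap} resources]:

[source,js]
----
  "pivot": {
    "max_page_search_size": 100,
    "group_by": { ... },
    "aggregations": { ... }
  }
----
// NOTCONSOLE
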
[float]
[[df-dynamic-adjustments-limitations]]
=== Handling dynamic adjustments for many terms

For each checkpoint, entities are identified that have changed since the last
time the check was performed. This list of changed entities is supplied as a
{ref}/query-dsl-terms-query.html[terms query] to the {dataframe-transform}
composite aggregation, one page at a time. Then updates are applied to the
destination index for each page of entities.

The page `size` is defined by `max_page_search_size`, which is also used to
define the number of buckets returned by the composite aggregation search. The
default value is 500 and the minimum is 10.

The index setting
{ref}/index-modules.html#dynamic-index-settings[`index.max_terms_count`] defines
the maximum number of terms that can be used in a terms query. The default value
is 65536. If `max_page_search_size` exceeds `index.max_terms_count`, the
transform will fail.

Using smaller values for `max_page_search_size` may result in a longer duration
for the transform checkpoint to complete.

[float]
[[df-scheduling-limitations]]
=== {cdataframe-cap} scheduling limitations

A {cdataframe} periodically checks for changes to source data. The functionality
of the scheduler is currently limited to a basic periodic timer whose `frequency`
can range from 1s to 1h. The default is 1m. This is designed to run little and
often. When choosing a `frequency` for this timer, consider your ingest rate
along with the impact that the {dataframe-transform} search and index operations
have on other users in your cluster. Also note that retries occur at the
`frequency` interval.

[float]
[[df-failed-limitations]]
=== Handling of failed {dataframe-transforms}

A failed {dataframe-transform} remains as a persistent task and should be handled
appropriately, either by deleting it or by resolving the root cause of the
failure and restarting it.

When using the API to delete a failed {dataframe-transform}, first stop it using
`_stop?force=true`, then delete it.

To start a failed {dataframe-transform} after the root cause has been resolved,
you must specify the `_start?force=true` parameter.
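
For example, the API sequence for cleaning up a failed transform looks like the
following sketch (the transform ID is a placeholder):

[source,js]
----
POST _data_frame/transforms/my-failed-transform/_stop?force=true
DELETE _data_frame/transforms/my-failed-transform
----
// NOTCONSOLE
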
[float]
[[df-availability-limitations]]
=== {cdataframes-cap} may give incorrect results if documents are not yet available to search

After a document is indexed, there is a very small delay until it is available
to search.

A {cdataframe-transform} periodically checks for changed entities between the
time it last checked and `now` minus `sync.time.delay`. This time window
moves without overlapping. If the timestamp of a recently indexed document falls
within this time window but this document is not yet available to search, then
this entity will not be updated.

If you use a `sync.time.field` that represents the data ingest time and a
zero-second or very small `sync.time.delay`, it is more likely that this
issue will occur.
docs/reference/transform/overview.asciidoc (new file, 71 lines)

[role="xpack"]
[[ml-transform-overview]]
== {dataframe-transform-cap} overview
++++
<titleabbrev>Overview</titleabbrev>
++++

beta[]

A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
the {stack}, it is a transformation of data that is indexed in {es}. For
example, you can use {dataframes} to _pivot_ your data into a new entity-centric
index. By transforming and summarizing your data, it becomes possible to
visualize and analyze it in alternative and interesting ways.

A lot of {es} indices are organized as a stream of events: each event is an
individual document, for example a single item purchase. {dataframes-cap} enable
you to summarize this data, bringing it into an organized, more
analysis-friendly format. For example, you can summarize all the purchases of a
single customer.

You can create {dataframes} by using {dataframe-transforms}.
{dataframe-transforms-cap} enable you to define a pivot, which is a set of
features that transform the index into a different, more digestible format.
Pivoting results in a summary of your data, which is the {dataframe}.

To define a pivot, first you select one or more fields that you will use to
group your data. You can select categorical fields (terms) and numerical fields
for grouping. If you use numerical fields, the field values are bucketed using
an interval that you specify.

The second step is deciding how you want to aggregate the grouped data. When
you use aggregations, you are effectively asking questions about the index. There
are different types of aggregations, each with its own purpose and output. To
learn more about the supported aggregations and group-by fields, see
{ref}/data-frame-transform-resource.html[{dataframe-transform-cap} resources].

As an optional step, you can also add a query to further limit the scope of the
aggregation.

The {dataframe-transform} performs a composite aggregation that
paginates through all the data defined by the source index query. The output of
the aggregation is stored in a destination index. Each time the
{dataframe-transform} queries the source index, it creates a _checkpoint_. You
can decide whether you want the {dataframe-transform} to run once (batch
{dataframe-transform}) or continuously ({cdataframe-transform}). A batch
{dataframe-transform} is a single operation that has a single checkpoint.
{cdataframe-transforms-cap} continually increment and process checkpoints as new
source data is ingested.

.Example
Imagine that you run a webshop that sells clothes. Every order creates a document
that contains a unique order ID, the name and the category of the ordered product,
its price, the ordered quantity, the exact date of the order, and some customer
information (name, gender, location, etc.). Your dataset contains all the
transactions from last year.

If you want to check the sales in the different categories in your last fiscal
year, define a {dataframe-transform} that groups the data by the product
categories (women's shoes, men's clothing, etc.) and the order date. Use the
last year as the interval for the order date. Then add a sum aggregation on the
ordered quantity. The result is a {dataframe} that shows the number of sold
items in every product category in the last year.

[role="screenshot"]
image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]
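
Expressed as a {dataframe-transform} pivot, the example above might look roughly
like the following sketch. The field names (`category`, `order_date`,
`quantity`) are hypothetical and would need to match your own mapping, and the
exact interval parameter should be verified against the resource documentation:

[source,js]
----
  "pivot": {
    "group_by": {
      "category": { "terms": { "field": "category" } },
      "order_date": {
        "date_histogram": { "field": "order_date", "calendar_interval": "1y" }
      }
    },
    "aggregations": {
      "quantity.sum": { "sum": { "field": "quantity" } }
    }
  }
----
// NOTCONSOLE
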
IMPORTANT: The {dataframe-transform} leaves your source index intact. It
creates a new index that is dedicated to the {dataframe}.
docs/reference/transform/troubleshooting.asciidoc (new file, 29 lines)

[[dataframe-troubleshooting]]
== Troubleshooting {dataframe-transforms}
[subs="attributes"]
++++
<titleabbrev>Troubleshooting</titleabbrev>
++++

Use the information in this section to troubleshoot common problems.

include::{stack-repo-dir}/help.asciidoc[tag=get-help]

If you encounter problems with your {dataframe-transforms}, you can gather more
information from the following sources:

* Lightweight audit messages are stored in `.data-frame-notifications-*`. Search
by your `transform_id`, as in the example after this list.
* The
{ref}/get-data-frame-transform-stats.html[get {dataframe-transform} statistics API]
provides information about the transform status and failures.
* If the {dataframe-transform} exists as a task, you can use the
{ref}/tasks.html[task management API] to gather task information. For example:
`GET _tasks?actions=data_frame/transforms*&detailed`. Typically, the task exists
when the transform is in a started or failed state.
* The {es} logs from the node that was running the {dataframe-transform} might
also contain useful information. You can identify the node from the notification
messages. Alternatively, if the task still exists, you can get that information
from the get {dataframe-transform} statistics API. For more information, see
{ref}/logging.html[Logging configuration].
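
For example, a sketch of searching the audit messages for one transform (the
transform ID is a placeholder):

[source,js]
----
GET .data-frame-notifications-*/_search
{
  "query": {
    "term": { "transform_id": "ecommerce-customer-transform" }
  }
}
----
// NOTCONSOLE
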