2019-09-11 11:44:03 -04:00
|
|
|
[role="xpack"]
|
|
|
|
[testenv="basic"]
|
|
|
|
[[ecommerce-dataframes]]
|
|
|
|
=== Transforming the eCommerce sample data
|
|
|
|
|
|
|
|
beta[]
|
|
|
|
|
2019-09-16 11:28:19 -04:00
|
|
|
<<ml-dataframes,{transforms-cap}>> enable you to retrieve information
|
2019-09-11 11:44:03 -04:00
|
|
|
from an {es} index, transform it, and store it in another index. Let's use the
|
|
|
|
{kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can
|
2019-09-16 11:28:19 -04:00
|
|
|
pivot and summarize your data with {transforms}.
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
|
|
|
|
. If the {es} {security-features} are enabled, obtain a user ID with sufficient
|
|
|
|
privileges to complete these steps.
|
|
|
|
+
|
|
|
|
--
|
|
|
|
You need `manage_data_frame_transforms` cluster privileges to preview and create
|
2019-09-16 11:28:19 -04:00
|
|
|
{transforms}. Members of the built-in `data_frame_transforms_admin`
|
2019-09-11 11:44:03 -04:00
|
|
|
role have these privileges.
|
|
|
|
|
|
|
|
You also need `read` and `view_index_metadata` index privileges on the source
|
|
|
|
index and `read`, `create_index`, and `index` privileges on the destination
|
|
|
|
index.
|
|
|
|
|
|
|
|
For more information, see <<security-privileges>> and <<built-in-roles>>.
|
|
|
|
--
|
|
|
|
|
|
|
|
. Choose your _source index_.
|
|
|
|
+
|
|
|
|
--
|
|
|
|
In this example, we'll use the eCommerce orders sample data. If you're not
|
|
|
|
already familiar with the `kibana_sample_data_ecommerce` index, use the
|
|
|
|
*Revenue* dashboard in {kib} to explore the data. Consider what insights you
|
|
|
|
might want to derive from this eCommerce data.
|
|
|
|
--
|
|
|
|
|
|
|
|
. Play with various options for grouping and aggregating the data.
|
|
|
|
+
|
|
|
|
--
|
2019-09-17 14:44:16 -04:00
|
|
|
_Pivoting_ your data involves using at least one field to group it and applying
|
|
|
|
at least one aggregation. You can preview what the transformed data will look
|
|
|
|
like, so go ahead and play with it!
|
|
|
|
|
2019-09-11 11:44:03 -04:00
|
|
|
For example, you might want to group the data by product ID and calculate the
|
|
|
|
total number of sales for each product and its average price. Alternatively, you
|
|
|
|
might want to look at the behavior of individual customers and calculate how
|
|
|
|
much each customer spent in total and how many different categories of products
|
|
|
|
they purchased. Or you might want to take the currencies or geographies into
|
|
|
|
consideration. What are the most interesting ways you can transform and
|
|
|
|
interpret this data?
|
|
|
|
|
2019-09-17 14:44:16 -04:00
|
|
|
Go to *Machine Learning* > *Data Frames* in {kib} and use the
|
2019-09-16 11:28:19 -04:00
|
|
|
wizard to create a {transform}:
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
[role="screenshot"]
|
2019-09-16 11:28:19 -04:00
|
|
|
image::images/ecommerce-pivot1.jpg["Creating a simple {transform} in {kib}"]
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
In this case, we grouped the data by customer ID and calculated the sum of
|
|
|
|
products each customer purchased.
|
|
|
|
|
|
|
|
Let's add some more aggregations to learn more about our customers' orders. For
|
|
|
|
example, let's calculate the total sum of their purchases, the maximum number of
|
|
|
|
products that they purchased in a single order, and their total number of orders.
|
|
|
|
We'll accomplish this by using the
|
|
|
|
{ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the
|
|
|
|
`taxless_total_price` field, the
|
|
|
|
{ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the
|
|
|
|
`total_quantity` field, and the
|
|
|
|
{ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation]
|
|
|
|
on the `order_id` field:
|
|
|
|
|
|
|
|
[role="screenshot"]
|
2019-09-16 11:28:19 -04:00
|
|
|
image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {transform} in {kib}"]
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
TIP: If you're interested in a subset of the data, you can optionally include a
|
|
|
|
{ref}/search-request-body.html#request-body-search-query[query] element. In this
|
|
|
|
example, we've filtered the data so that we're only looking at orders with a
|
|
|
|
`currency` of `EUR`. Alternatively, we could group the data by that field too.
|
|
|
|
If you want to use more complex queries, you can create your {dataframe} from a
|
|
|
|
{kibana-ref}/save-open-search.html[saved search].
|
|
|
|
|
|
|
|
If you prefer, you can use the
|
2019-09-16 11:28:19 -04:00
|
|
|
{ref}/preview-data-frame-transform.html[preview {transforms} API]:
|
2019-09-11 11:44:03 -04:00
|
|
|
|
2019-09-13 11:23:53 -04:00
|
|
|
[source,console]
|
2019-09-11 11:44:03 -04:00
|
|
|
--------------------------------------------------
|
|
|
|
POST _data_frame/transforms/_preview
|
|
|
|
{
|
|
|
|
"source": {
|
|
|
|
"index": "kibana_sample_data_ecommerce",
|
|
|
|
"query": {
|
|
|
|
"bool": {
|
|
|
|
"filter": {
|
|
|
|
"term": {"currency": "EUR"}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"pivot": {
|
|
|
|
"group_by": {
|
|
|
|
"customer_id": {
|
|
|
|
"terms": {
|
|
|
|
"field": "customer_id"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"aggregations": {
|
|
|
|
"total_quantity.sum": {
|
|
|
|
"sum": {
|
|
|
|
"field": "total_quantity"
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"taxless_total_price.sum": {
|
|
|
|
"sum": {
|
|
|
|
"field": "taxless_total_price"
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"total_quantity.max": {
|
|
|
|
"max": {
|
|
|
|
"field": "total_quantity"
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"order_id.cardinality": {
|
|
|
|
"cardinality": {
|
|
|
|
"field": "order_id"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// TEST[skip:set up sample data]
|
|
|
|
--
|
|
|
|
|
|
|
|
. When you are satisfied with what you see in the preview, create the
|
2019-09-16 11:28:19 -04:00
|
|
|
{transform}.
|
2019-09-11 11:44:03 -04:00
|
|
|
+
|
|
|
|
--
|
2019-09-17 14:44:16 -04:00
|
|
|
.. Supply a job ID and the name of the target (or _destination_) index. If the
|
|
|
|
target index does not exist, it will be created automatically.
|
2019-09-11 11:44:03 -04:00
|
|
|
|
2019-09-16 11:28:19 -04:00
|
|
|
.. Decide whether you want the {transform} to run once or continuously.
|
2019-09-11 11:44:03 -04:00
|
|
|
--
|
|
|
|
+
|
|
|
|
--
|
|
|
|
Since this sample data index is unchanging, let's use the default behavior and
|
2019-09-16 11:28:19 -04:00
|
|
|
just run the {transform} once.
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
[role="screenshot"]
|
2019-09-16 11:28:19 -04:00
|
|
|
image::images/ecommerce-batch.jpg["Specifying the {transform} options in {kib}"]
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
If you want to try it out, however, go ahead and click on *Continuous mode*.
|
2019-09-16 11:28:19 -04:00
|
|
|
You must choose a field that the {transform} can use to check which
|
2019-09-11 11:44:03 -04:00
|
|
|
entities have changed. In general, it's a good idea to use the ingest timestamp
|
|
|
|
field. In this example, however, you can use the `order_date` field.
|
|
|
|
|
|
|
|
If you prefer, you can use the
|
2019-09-16 11:28:19 -04:00
|
|
|
{ref}/put-data-frame-transform.html[create {transforms} API]. For
|
2019-09-11 11:44:03 -04:00
|
|
|
example:
|
|
|
|
|
2019-09-13 11:23:53 -04:00
|
|
|
[source,console]
|
2019-09-11 11:44:03 -04:00
|
|
|
--------------------------------------------------
|
|
|
|
PUT _data_frame/transforms/ecommerce-customer-transform
|
|
|
|
{
|
|
|
|
"source": {
|
|
|
|
"index": [
|
|
|
|
"kibana_sample_data_ecommerce"
|
|
|
|
],
|
|
|
|
"query": {
|
|
|
|
"bool": {
|
|
|
|
"filter": {
|
|
|
|
"term": {
|
|
|
|
"currency": "EUR"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"pivot": {
|
|
|
|
"group_by": {
|
|
|
|
"customer_id": {
|
|
|
|
"terms": {
|
|
|
|
"field": "customer_id"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"aggregations": {
|
|
|
|
"total_quantity.sum": {
|
|
|
|
"sum": {
|
|
|
|
"field": "total_quantity"
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"taxless_total_price.sum": {
|
|
|
|
"sum": {
|
|
|
|
"field": "taxless_total_price"
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"total_quantity.max": {
|
|
|
|
"max": {
|
|
|
|
"field": "total_quantity"
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"order_id.cardinality": {
|
|
|
|
"cardinality": {
|
|
|
|
"field": "order_id"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"dest": {
|
|
|
|
"index": "ecommerce-customers"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// TEST[skip:setup kibana sample data]
|
|
|
|
--
|
|
|
|
|
2019-09-16 11:28:19 -04:00
|
|
|
. Start the {transform}.
|
2019-09-11 11:44:03 -04:00
|
|
|
+
|
|
|
|
--
|
|
|
|
|
|
|
|
TIP: Even though resource utilization is automatically adjusted based on the
|
2019-09-16 11:28:19 -04:00
|
|
|
cluster load, a {transform} increases search and indexing load on your
|
2019-09-11 11:44:03 -04:00
|
|
|
cluster while it runs. If you're experiencing an excessive load, however, you
|
|
|
|
can stop it.
|
|
|
|
|
2019-09-16 11:28:19 -04:00
|
|
|
You can start, stop, and manage {transforms} in {kib}:
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
[role="screenshot"]
|
2019-09-16 11:28:19 -04:00
|
|
|
image::images/dataframe-transforms.jpg["Managing {transforms} in {kib}"]
|
2019-09-11 11:44:03 -04:00
|
|
|
|
|
|
|
Alternatively, you can use the
|
2019-09-16 11:28:19 -04:00
|
|
|
{ref}/start-data-frame-transform.html[start {transforms}] and
|
|
|
|
{ref}/stop-data-frame-transform.html[stop {transforms}] APIs. For
|
2019-09-11 11:44:03 -04:00
|
|
|
example:
|
|
|
|
|
2019-09-13 11:23:53 -04:00
|
|
|
[source,console]
|
2019-09-11 11:44:03 -04:00
|
|
|
--------------------------------------------------
|
|
|
|
POST _data_frame/transforms/ecommerce-customer-transform/_start
|
|
|
|
--------------------------------------------------
|
|
|
|
// TEST[skip:setup kibana sample data]
|
|
|
|
|
|
|
|
--
|
|
|
|
|
|
|
|
. Explore the data in your new index.
|
|
|
|
+
|
|
|
|
--
|
|
|
|
For example, use the *Discover* application in {kib}:
|
|
|
|
|
|
|
|
[role="screenshot"]
|
|
|
|
image::images/ecommerce-results.jpg["Exploring the new index in {kib}"]
|
|
|
|
|
|
|
|
--
|
|
|
|
|
2019-09-16 11:28:19 -04:00
|
|
|
TIP: If you do not want to keep the {transform}, you can delete it in
|
2019-09-11 11:44:03 -04:00
|
|
|
{kib} or use the
|
2019-09-16 11:28:19 -04:00
|
|
|
{ref}/delete-data-frame-transform.html[delete {transform} API]. When
|
|
|
|
you delete a {transform}, its destination index and {kib} index
|
2019-09-11 11:44:03 -04:00
|
|
|
patterns remain.
|