OpenSearch/docs/reference/transform/ecommerce-tutorial.asciidoc

[role="xpack"]
[testenv="basic"]
[[ecommerce-transforms]]
=== Tutorial: Transforming the eCommerce sample data

beta[]

<<transforms,{transforms-cap}>> enable you to retrieve information
from an {es} index, transform it, and store it in another index. Let's use the
{kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can
pivot and summarize your data with {transforms}.


. If the {es} {security-features} are enabled, obtain a user ID with sufficient
privileges to complete these steps. 
+
--
You need `manage_data_frame_transforms` cluster privileges to preview and create
{transforms}. Members of the built-in `data_frame_transforms_admin`
role have these privileges.

You also need `read` and `view_index_metadata` index privileges on the source
index and `read`, `create_index`, and `index` privileges on the destination
index. 

For more information, see
{stack-ov}/security-privileges.html[Security privileges] and
{stack-ov}/built-in-roles.html[Built-in roles].
--

. Choose your _source index_.
+
--
In this example, we'll use the eCommerce orders sample data. If you're not
already familiar with the `kibana_sample_data_ecommerce` index, use the
*Revenue* dashboard in {kib} to explore the data. Consider what insights you
might want to derive from this eCommerce data.
--

. Play with various options for grouping and aggregating the data. 
+
--
_Pivoting_ your data involves using at least one field to group it and applying
at least one aggregation. You can preview what the transformed data will look
like, so go ahead and play with it!

For example, you might want to group the data by product ID and calculate the
total number of sales for each product and its average price. Alternatively, you
might want to look at the behavior of individual customers and calculate how
much each customer spent in total and how many different categories of products
they purchased. Or you might want to take the currencies or geographies into
consideration. What are the most interesting ways you can transform and
interpret this data?

Go to *Machine Learning* > *Data Frames* in {kib} and use the
wizard to create a {transform}:

[role="screenshot"]
image::images/ecommerce-pivot1.jpg["Creating a simple {transform} in {kib}"]

In this case, we grouped the data by customer ID and calculated the sum of
products each customer purchased.

Let's add some more aggregations to learn more about our customers' orders. For
example, let's calculate the total sum of their purchases, the maximum number of
products that they purchased in a single order, and their total number of orders.
We'll accomplish this by using the
{ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the
`taxless_total_price` field, the
{ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the
`total_quantity` field, and the
{ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation]
on the `order_id` field:

[role="screenshot"]
image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {transform} in {kib}"]

TIP: If you're interested in a subset of the data, you can optionally include a
{ref}/search-request-body.html#request-body-search-query[query] element. In this
example, we've filtered the data so that we're only looking at orders with a
`currency` of `EUR`. Alternatively, we could group the data by that field too.
If you want to use more complex queries, you can create your {dataframe} from a
{kibana-ref}/save-open-search.html[saved search].

If you prefer, you can use the
{ref}/preview-transform.html[preview {transforms} API]:

[source,console]
--------------------------------------------------
POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "bool": {
        "filter": {
          "term": {"currency": "EUR"}
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "total_quantity.sum": {
        "sum": {
          "field": "total_quantity"
        }
      },
      "taxless_total_price.sum": {
        "sum": {
          "field": "taxless_total_price"
        }
      },
      "total_quantity.max": {
        "max": {
          "field": "total_quantity"
        }
      },
      "order_id.cardinality": {
        "cardinality": {
          "field": "order_id"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:set up sample data]
--

. When you are satisfied with what you see in the preview, create the
{transform}. 
+
--
.. Supply a job ID and the name of the target (or _destination_) index. If the
target index does not exist, it will be created automatically.

.. Decide whether you want the {transform} to run once or continuously.
--
+
--
Since this sample data index is unchanging, let's use the default behavior and
just run the {transform} once.

[role="screenshot"]
image::images/ecommerce-batch.jpg["Specifying the {transform} options in {kib}"]

If you want to try it out, however, go ahead and click on *Continuous mode*. 
You must choose a field that the {transform} can use to check which
entities have changed. In general, it's a good idea to use the ingest timestamp
field. In this example, however, you can use the `order_date` field.

If you prefer, you can use the
{ref}/put-transform.html[create {transforms} API]. For
example:

[source,console]
--------------------------------------------------
PUT _data_frame/transforms/ecommerce-customer-transform
{
  "source": {
    "index": [
      "kibana_sample_data_ecommerce"
    ],
    "query": {
      "bool": {
        "filter": {
          "term": {
            "currency": "EUR"
          }
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "total_quantity.sum": {
        "sum": {
          "field": "total_quantity"
        }
      },
      "taxless_total_price.sum": {
        "sum": {
          "field": "taxless_total_price"
        }
      },
      "total_quantity.max": {
        "max": {
          "field": "total_quantity"
        }
      },
      "order_id.cardinality": {
        "cardinality": {
          "field": "order_id"
        }
      }
    }
  },
  "dest": {
    "index": "ecommerce-customers"
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]
--

. Start the {transform}.
+
--

TIP: Even though resource utilization is automatically adjusted based on the
cluster load, a {transform} increases search and indexing load on your
cluster while it runs. If you're experiencing an excessive load, however, you
can stop it.

You can start, stop, and manage {transforms} in {kib}:

[role="screenshot"]
image::images/dataframe-transforms.jpg["Managing {transforms} in {kib}"]

Alternatively, you can use the
{ref}/start-transform.html[start {transforms}] and
{ref}/stop-transform.html[stop {transforms}] APIs. For
example:

[source,console]
--------------------------------------------------
POST _data_frame/transforms/ecommerce-customer-transform/_start
--------------------------------------------------
// TEST[skip:setup kibana sample data]

--

. Explore the data in your new index.
+
--
For example, use the *Discover* application in {kib}:

[role="screenshot"]
image::images/ecommerce-results.jpg["Exploring the new index in {kib}"]

--

TIP: If you do not want to keep the {transform}, you can delete it in
{kib} or use the
{ref}/delete-transform.html[delete {transform} API]. When
you delete a {transform}, its destination index and {kib} index
patterns remain.
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`[role="xpack"]`
			`[testenv="basic"]`
[DOCS] Adds transforms to Elasticsearch book (#46846) (#47055) 2019-09-25 08:11:37 -07:00			`[[ecommerce-transforms]]`
			`=== Tutorial: Transforming the eCommerce sample data`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`beta[]`

[DOCS] Adds transforms to Elasticsearch book (#46846) (#47055) 2019-09-25 08:11:37 -07:00			`<<transforms,{transforms-cap}>> enable you to retrieve information`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`from an {es} index, transform it, and store it in another index. Let's use the`
			`{kibana-ref}/add-sample-data.html[{kib} sample data] to demonstrate how you can`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`pivot and summarize your data with {transforms}.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00

			`. If the {es} {security-features} are enabled, obtain a user ID with sufficient`
			`privileges to complete these steps.`
			`+`
			`--`
			You need `manage_data_frame_transforms` cluster privileges to preview and create
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			{transforms}. Members of the built-in `data_frame_transforms_admin`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`role have these privileges.`

			You also need `read` and `view_index_metadata` index privileges on the source
			index and `read`, `create_index`, and `index` privileges on the destination
			`index.`

[DOCS] Adds transforms to Elasticsearch book (#46846) (#47055) 2019-09-25 08:11:37 -07:00			`For more information, see`
			`{stack-ov}/security-privileges.html[Security privileges] and`
			`{stack-ov}/built-in-roles.html[Built-in roles].`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`--`

			`. Choose your _source index_.`
			`+`
			`--`
			`In this example, we'll use the eCommerce orders sample data. If you're not`
			already familiar with the `kibana_sample_data_ecommerce` index, use the
			`Revenue dashboard in {kib} to explore the data. Consider what insights you`
			`might want to derive from this eCommerce data.`
			`--`

			`. Play with various options for grouping and aggregating the data.`
			`+`
			`--`
[DOCS] Augments ecommerce example (#46788) 2019-09-17 11:44:16 -07:00			`_Pivoting_ your data involves using at least one field to group it and applying`
			`at least one aggregation. You can preview what the transformed data will look`
			`like, so go ahead and play with it!`

[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`For example, you might want to group the data by product ID and calculate the`
			`total number of sales for each product and its average price. Alternatively, you`
			`might want to look at the behavior of individual customers and calculate how`
			`much each customer spent in total and how many different categories of products`
			`they purchased. Or you might want to take the currencies or geographies into`
			`consideration. What are the most interesting ways you can transform and`
			`interpret this data?`

[DOCS] Augments ecommerce example (#46788) 2019-09-17 11:44:16 -07:00			`Go to Machine Learning > Data Frames in {kib} and use the`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`wizard to create a {transform}:`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`[role="screenshot"]`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`image::images/ecommerce-pivot1.jpg["Creating a simple {transform} in {kib}"]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`In this case, we grouped the data by customer ID and calculated the sum of`
			`products each customer purchased.`

			`Let's add some more aggregations to learn more about our customers' orders. For`
			`example, let's calculate the total sum of their purchases, the maximum number of`
			`products that they purchased in a single order, and their total number of orders.`
			`We'll accomplish this by using the`
			{ref}/search-aggregations-metrics-sum-aggregation.html[`sum` aggregation] on the
			`taxless_total_price` field, the
			{ref}/search-aggregations-metrics-max-aggregation.html[`max` aggregation] on the
			`total_quantity` field, and the
			{ref}/search-aggregations-metrics-cardinality-aggregation.html[`cardinality` aggregation]
			on the `order_id` field:

			`[role="screenshot"]`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`image::images/ecommerce-pivot2.jpg["Adding multiple aggregations to a {transform} in {kib}"]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`TIP: If you're interested in a subset of the data, you can optionally include a`
			`{ref}/search-request-body.html#request-body-search-query[query] element. In this`
			`example, we've filtered the data so that we're only looking at orders with a`
			`currency` of `EUR`. Alternatively, we could group the data by that field too.
			`If you want to use more complex queries, you can create your {dataframe} from a`
			`{kibana-ref}/save-open-search.html[saved search].`

			`If you prefer, you can use the`
[DOCS] Update data frame transform URLs (#46940) (#46946) 2019-09-20 15:57:43 -07:00			`{ref}/preview-transform.html[preview {transforms} API]:`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
[DOCS] Replace "// CONSOLE" comments with [source,console] (#46679) 2019-09-13 11:23:53 -04:00			`[source,console]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`--------------------------------------------------`
			`POST _data_frame/transforms/_preview`
			`{`
			`"source": {`
			`"index": "kibana_sample_data_ecommerce",`
			`"query": {`
			`"bool": {`
			`"filter": {`
			`"term": {"currency": "EUR"}`
			`}`
			`}`
			`}`
			`},`
			`"pivot": {`
			`"group_by": {`
			`"customer_id": {`
			`"terms": {`
			`"field": "customer_id"`
			`}`
			`}`
			`},`
			`"aggregations": {`
			`"total_quantity.sum": {`
			`"sum": {`
			`"field": "total_quantity"`
			`}`
			`},`
			`"taxless_total_price.sum": {`
			`"sum": {`
			`"field": "taxless_total_price"`
			`}`
			`},`
			`"total_quantity.max": {`
			`"max": {`
			`"field": "total_quantity"`
			`}`
			`},`
			`"order_id.cardinality": {`
			`"cardinality": {`
			`"field": "order_id"`
			`}`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`
			`// TEST[skip:set up sample data]`
			`--`

			`. When you are satisfied with what you see in the preview, create the`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`{transform}.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`+`
			`--`
[DOCS] Augments ecommerce example (#46788) 2019-09-17 11:44:16 -07:00			`.. Supply a job ID and the name of the target (or _destination_) index. If the`
			`target index does not exist, it will be created automatically.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`.. Decide whether you want the {transform} to run once or continuously.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`--`
			`+`
			`--`
			`Since this sample data index is unchanging, let's use the default behavior and`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`just run the {transform} once.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`[role="screenshot"]`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`image::images/ecommerce-batch.jpg["Specifying the {transform} options in {kib}"]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`If you want to try it out, however, go ahead and click on Continuous mode.`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`You must choose a field that the {transform} can use to check which`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`entities have changed. In general, it's a good idea to use the ingest timestamp`
			field. In this example, however, you can use the `order_date` field.

			`If you prefer, you can use the`
[DOCS] Update data frame transform URLs (#46940) (#46946) 2019-09-20 15:57:43 -07:00			`{ref}/put-transform.html[create {transforms} API]. For`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`example:`

[DOCS] Replace "// CONSOLE" comments with [source,console] (#46679) 2019-09-13 11:23:53 -04:00			`[source,console]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`--------------------------------------------------`
			`PUT _data_frame/transforms/ecommerce-customer-transform`
			`{`
			`"source": {`
			`"index": [`
			`"kibana_sample_data_ecommerce"`
			`],`
			`"query": {`
			`"bool": {`
			`"filter": {`
			`"term": {`
			`"currency": "EUR"`
			`}`
			`}`
			`}`
			`}`
			`},`
			`"pivot": {`
			`"group_by": {`
			`"customer_id": {`
			`"terms": {`
			`"field": "customer_id"`
			`}`
			`}`
			`},`
			`"aggregations": {`
			`"total_quantity.sum": {`
			`"sum": {`
			`"field": "total_quantity"`
			`}`
			`},`
			`"taxless_total_price.sum": {`
			`"sum": {`
			`"field": "taxless_total_price"`
			`}`
			`},`
			`"total_quantity.max": {`
			`"max": {`
			`"field": "total_quantity"`
			`}`
			`},`
			`"order_id.cardinality": {`
			`"cardinality": {`
			`"field": "order_id"`
			`}`
			`}`
			`}`
			`},`
			`"dest": {`
			`"index": "ecommerce-customers"`
			`}`
			`}`
			`--------------------------------------------------`
			`// TEST[skip:setup kibana sample data]`
			`--`

[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`. Start the {transform}.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`+`
			`--`

			`TIP: Even though resource utilization is automatically adjusted based on the`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`cluster load, a {transform} increases search and indexing load on your`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`cluster while it runs. If you're experiencing an excessive load, however, you`
			`can stop it.`

[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`You can start, stop, and manage {transforms} in {kib}:`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`[role="screenshot"]`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`image::images/dataframe-transforms.jpg["Managing {transforms} in {kib}"]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00
			`Alternatively, you can use the`
[DOCS] Update data frame transform URLs (#46940) (#46946) 2019-09-20 15:57:43 -07:00			`{ref}/start-transform.html[start {transforms}] and`
			`{ref}/stop-transform.html[stop {transforms}] APIs. For`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`example:`

[DOCS] Replace "// CONSOLE" comments with [source,console] (#46679) 2019-09-13 11:23:53 -04:00			`[source,console]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`--------------------------------------------------`
			`POST _data_frame/transforms/ecommerce-customer-transform/_start`
			`--------------------------------------------------`
			`// TEST[skip:setup kibana sample data]`

			`--`

			`. Explore the data in your new index.`
			`+`
			`--`
			`For example, use the Discover application in {kib}:`

			`[role="screenshot"]`
			`image::images/ecommerce-results.jpg["Exploring the new index in {kib}"]`

			`--`

[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`TIP: If you do not want to keep the {transform}, you can delete it in`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`{kib} or use the`
[DOCS] Update data frame transform URLs (#46940) (#46946) 2019-09-20 15:57:43 -07:00			`{ref}/delete-transform.html[delete {transform} API]. When`
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00			`you delete a {transform}, its destination index and {kib} index`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 08:44:03 -07:00			`patterns remain.`