OpenSearch/docs/reference/transform/overview.asciidoc

[role="xpack"]
[[transform-overview]]
=== {transform-cap} overview
++++
<titleabbrev>Overview</titleabbrev>
++++

You can use {transforms} to _pivot_ your data into a new entity-centric index. 
By transforming and summarizing your data, it becomes possible to visualize and 
analyze it in alternative and interesting ways.

A lot of {es} indices are organized as a stream of events: each event is an 
individual document, for example a single item purchase. {transforms-cap} enable
you to summarize this data, bringing it into an organized, more
analysis-friendly format. For example, you can summarize all the purchases of a
single customer.

{transforms-cap} enable you to define a pivot, which is a set of
features that transform the index into a different, more digestible format.
Pivoting results in a summary of your data in a new index.

To define a pivot, first you select one or more fields that you will use to
group your data. You can select categorical fields (terms) and numerical fields
for grouping. If you use numerical fields, the field values are bucketed using
an interval that you specify.

The second step is deciding how you want to aggregate the grouped data. When 
using aggregations, you practically ask questions about the index. There are 
different types of aggregations, each with its own purpose and output. To learn 
more about the supported aggregations and group-by fields, see 
<<put-transform>>.

As an optional step, you can also add a query to further limit the scope of the
aggregation.

The {transform} performs a composite aggregation that paginates through all the 
data defined by the source index query. The output of the aggregation is stored 
in a _destination index_. Each time the {transform} queries the source index, it 
creates a _checkpoint_. You can decide whether you want the {transform} to run 
once or continuously. A _batch {transform}_ is a single operation that has a
single checkpoint. _{ctransforms-cap}_ continually increment and process
checkpoints as new source data is ingested.

Imagine that you run a webshop that sells clothes. Every order creates a 
document that contains a unique order ID, the name and the category of the 
ordered product, its price, the ordered quantity, the exact date of the order, 
and some customer information (name, gender, location, etc). Your dataset 
contains all the transactions from last year.

If you want to check the sales in the different categories in your last fiscal
year, define a {transform} that groups the data by the product categories 
(women's shoes, men's clothing, etc.) and the order date. Use the last year as 
the interval for the order date. Then add a sum aggregation on the ordered 
quantity. The result is an entity-centric index that shows the number of sold
items in every product category in the last year.

[role="screenshot"]
image::images/pivot-preview.jpg["Example of a {transform} pivot in {kib}"]

IMPORTANT: The {transform} leaves your source index intact. It
creates a new index that is dedicated to the transformed data.


[[transform-performance]]
==== Performance considerations

{transforms-cap} perform search aggregations on the source indices then index
the results into the destination index. Therefore, a {transform} never takes
less time or uses less resources than the aggregation and indexing processes. 

If your {transform} must process a lot of historic data, it has high resource
usage initially--particularly during the first checkpoint.

For better performance, make sure that your search aggregations and queries are 
optimized and that your {transform} is processing only necessary data. Consider 
whether you can apply a source query to the {transform} to reduce the scope of 
data it processes. Also consider whether the cluster has sufficient resources in 
place to support both the composite aggregation search and the indexing of its
results.

If you prefer to spread out the impact on your cluster (at the cost of a slower
{transform}), you can throttle the rate at which it performs search and index
requests. Set the `docs_per_second` limit when you <<put-transform,create>> or
<<update-transform,update>> your {transform}. If you want to calculate the
current rate, use the following information from the
{ref}/get-transform-stats.html[get {transform} stats API]:
```
documents_processed / search_time_in_ms * 1000
```
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00			`[role="xpack"]`
[DOCS] Adds transforms to Elasticsearch book (#46846) (#47055) 2019-09-25 11:11:37 -04:00			`[[transform-overview]]`
			`=== {transform-cap} overview`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00			`++++`
			`<titleabbrev>Overview</titleabbrev>`
			`++++`

[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`You can use {transforms} to _pivot_ your data into a new entity-centric index.`
			`By transforming and summarizing your data, it becomes possible to visualize and`
			`analyze it in alternative and interesting ways.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00
			`A lot of {es} indices are organized as a stream of events: each event is an`
[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`individual document, for example a single item purchase. {transforms-cap} enable`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00			`you to summarize this data, bringing it into an organized, more`
			`analysis-friendly format. For example, you can summarize all the purchases of a`
			`single customer.`

[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 11:28:19 -04:00			`{transforms-cap} enable you to define a pivot, which is a set of`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00			`features that transform the index into a different, more digestible format.`
[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`Pivoting results in a summary of your data in a new index.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00
			`To define a pivot, first you select one or more fields that you will use to`
			`group your data. You can select categorical fields (terms) and numerical fields`
			`for grouping. If you use numerical fields, the field values are bucketed using`
			`an interval that you specify.`

			`The second step is deciding how you want to aggregate the grouped data. When`
			`using aggregations, you practically ask questions about the index. There are`
			`different types of aggregations, each with its own purpose and output. To learn`
			`more about the supported aggregations and group-by fields, see`
[DOCS] Fixes formatting in transform overview (#53900) 2020-03-23 13:20:41 -04:00			`<<put-transform>>.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00
			`As an optional step, you can also add a query to further limit the scope of the`
			`aggregation.`

[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`The {transform} performs a composite aggregation that paginates through all the`
			`data defined by the source index query. The output of the aggregation is stored`
[DOCS] Fixes formatting in transform overview (#53900) 2020-03-23 13:20:41 -04:00			`in a _destination index_. Each time the {transform} queries the source index, it`
[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`creates a _checkpoint_. You can decide whether you want the {transform} to run`
[DOCS] Fixes formatting in transform overview (#53900) 2020-03-23 13:20:41 -04:00			`once or continuously. A _batch {transform}_ is a single operation that has a`
			`single checkpoint. _{ctransforms-cap}_ continually increment and process`
			`checkpoints as new source data is ingested.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00
[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`Imagine that you run a webshop that sells clothes. Every order creates a`
			`document that contains a unique order ID, the name and the category of the`
			`ordered product, its price, the ordered quantity, the exact date of the order,`
			`and some customer information (name, gender, location, etc). Your dataset`
			`contains all the transactions from last year.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00
			`If you want to check the sales in the different categories in your last fiscal`
[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`year, define a {transform} that groups the data by the product categories`
			`(women's shoes, men's clothing, etc.) and the order date. Use the last year as`
			`the interval for the order date. Then add a sum aggregation on the ordered`
			`quantity. The result is an entity-centric index that shows the number of sold`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00			`items in every product category in the last year.`

			`[role="screenshot"]`
[DOCS] Updates transform screenshots and text (#50059) 2019-12-12 11:20:39 -05:00			`image::images/pivot-preview.jpg["Example of a {transform} pivot in {kib}"]`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 11:28:19 -04:00			`IMPORTANT: The {transform} leaves your source index intact. It`
[DOCS] Removes data frame leftovers from transforms overview (#49434) 2019-11-22 03:13:55 -05:00			`creates a new index that is dedicated to the transformed data.`
[DOCS] Adds transform content (#46575) (#46578) 2019-09-11 11:44:03 -04:00
[DOCS] Adds performance considerations section to transforms overview (#53791) Co-Authored-By: Lisa Cawley <lcawley@elastic.co> 2020-03-19 12:49:19 -04:00
			`[[transform-performance]]`
			`==== Performance considerations`

[DOCS] Removes transform performance note (#55177) 2020-04-15 13:40:04 -04:00			`{transforms-cap} perform search aggregations on the source indices then index`
			`the results into the destination index. Therefore, a {transform} never takes`
			`less time or uses less resources than the aggregation and indexing processes.`

			`If your {transform} must process a lot of historic data, it has high resource`
			`usage initially--particularly during the first checkpoint.`
[DOCS] Adds performance considerations section to transforms overview (#53791) Co-Authored-By: Lisa Cawley <lcawley@elastic.co> 2020-03-19 12:49:19 -04:00
			`For better performance, make sure that your search aggregations and queries are`
[DOCS] Add throttling based on configuration parameter (#56653) 2020-05-14 11:38:57 -04:00			`optimized and that your {transform} is processing only necessary data. Consider`
			`whether you can apply a source query to the {transform} to reduce the scope of`
			`data it processes. Also consider whether the cluster has sufficient resources in`
			`place to support both the composite aggregation search and the indexing of its`
			`results.`

			`If you prefer to spread out the impact on your cluster (at the cost of a slower`
			`{transform}), you can throttle the rate at which it performs search and index`
			requests. Set the `docs_per_second` limit when you <<put-transform,create>> or
			`<<update-transform,update>> your {transform}. If you want to calculate the`
			`current rate, use the following information from the`
			`{ref}/get-transform-stats.html[get {transform} stats API]:`
			```
			`documents_processed / search_time_in_ms * 1000`
			```