[role="xpack"]
[[transform-overview]]
=== {transform-cap} overview
++++
<titleabbrev>Overview</titleabbrev>
++++

You can use {transforms} to _pivot_ your data into a new entity-centric index.
By transforming and summarizing your data, you can visualize and analyze it in
alternative and interesting ways.

Many {es} indices are organized as a stream of events: each event is an
individual document, for example, a single item purchase. {transforms-cap} enable
you to summarize this data, bringing it into an organized, more
analysis-friendly format. For example, you can summarize all the purchases of a
single customer.

{transforms-cap} enable you to define a pivot, which is a set of
features that transform the index into a different, more digestible format.
Pivoting results in a summary of your data in a new index.

To define a pivot, first you select one or more fields that you will use to
group your data. You can select categorical fields (terms) and numerical fields
for grouping. If you use numerical fields, the field values are bucketed using
an interval that you specify.
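As a rough illustration of this first step, the following Python sketch builds the `group_by` part of a pivot as a plain dictionary and shows how a numerical value maps to a bucket. The field names (`customer_id`, `price`) and the interval of 10 are assumptions made up for the example. The bucket formula covers only the simple case without an offset.

```python
import json
import math


def bucket_key(value, interval):
    """Return the lower bound of the fixed-width bucket that contains value."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return math.floor(value / interval) * interval


def build_group_by(term_field, numeric_field, interval):
    """Return a pivot group_by section with one terms and one histogram grouping."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return {
        # Categorical field: one bucket per distinct value.
        term_field: {"terms": {"field": term_field}},
        # Numerical field: values are bucketed into intervals of the given width.
        numeric_field: {"histogram": {"field": numeric_field, "interval": interval}},
    }


# Hypothetical field names, used only for illustration.
group_by = build_group_by("customer_id", "price", 10)
print(json.dumps(group_by, indent=2))
print(bucket_key(37.5, 10))
```

With an interval of 10, a price of 37.5 falls into the bucket that starts at 30, so all prices from 30 up to (but not including) 40 are grouped together.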

The second step is deciding how you want to aggregate the grouped data. When
using aggregations, you are effectively asking questions about the index. There
are different types of aggregations, each with its own purpose and output. To
learn more about the supported aggregations and group-by fields, see
<<put-transform>>.
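To make the idea of "asking questions" concrete, the sketch below emulates a pivot in plain Python on a handful of in-memory documents. It does not call {es}. The sample purchases and the field names `customer` and `price` are invented for the example. Each aggregation answers one question per group: how many purchases, how much in total, and what the largest purchase was.

```python
from collections import defaultdict

# Hypothetical event-style documents: one document per purchase.
purchases = [
    {"customer": "alice", "price": 20},
    {"customer": "bob", "price": 15},
    {"customer": "alice", "price": 30},
]


def pivot(docs, group_field, value_field):
    """Group docs by group_field and summarize value_field for each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    return {
        key: {"count": len(values), "sum": sum(values), "max": max(values)}
        for key, values in groups.items()
    }


summary = pivot(purchases, "customer", "price")
for customer, stats in sorted(summary.items()):
    print(customer, stats)
```

The result contains one entry per customer instead of one entry per purchase, which is the entity-centric shape that a pivot produces.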

As an optional step, you can also add a query to further limit the scope of the
aggregation.
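A minimal sketch of this optional step, assuming the query is supplied in the `source` section of the {transform} request body described in <<put-transform>>. The index name `webshop-orders` and the `order_date` field are hypothetical.

```python
import json


def limit_source(index, query):
    """Return a transform source section that is restricted by a query."""
    return {"index": index, "query": query}


# Only documents from the last year are passed on to the aggregation.
source = limit_source(
    "webshop-orders",
    {"range": {"order_date": {"gte": "now-1y"}}},
)
print(json.dumps(source, indent=2))
```

Documents that do not match the query are never grouped or aggregated, so a narrower query also reduces the amount of data the {transform} has to process.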

The {transform} performs a composite aggregation that paginates through all the
data defined by the source index query. The output of the aggregation is stored
in a _destination index_. Each time the {transform} queries the source index, it
creates a _checkpoint_. You can decide whether you want the {transform} to run
once or continuously. A _batch {transform}_ is a single operation that has a
single checkpoint. _{ctransforms-cap}_ continually increment and process
checkpoints as new source data is ingested.
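The pagination can be pictured with a simplified, in-memory emulation. This is only a sketch of the paging idea, not how {es} implements composite aggregations: groups are returned in key order, a fixed number per page, and each request continues after the last key of the previous page. All function and field names below are invented for the example.

```python
from collections import defaultdict


def composite_page(docs, key_field, sum_field, size, after=None):
    """Return one page of (buckets, after_key), ordered by group key."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[key_field]] += doc[sum_field]
    # Only keys that sort after the previous page's last key are eligible.
    keys = sorted(k for k in totals if after is None or k > after)
    page = keys[:size]
    buckets = [{"key": k, "sum": totals[k]} for k in page]
    after_key = page[-1] if page else None
    return buckets, after_key


def run_batch(docs, key_field, sum_field, size):
    """Page through all groups once and collect them in a destination dict."""
    destination = {}
    pages = 0
    after = None
    while True:
        buckets, after = composite_page(docs, key_field, sum_field, size, after)
        if not buckets:
            break  # an empty page means every group has been processed
        pages += 1
        for bucket in buckets:
            destination[bucket["key"]] = bucket["sum"]
    return destination, pages


docs = [
    {"customer": "a", "quantity": 2},
    {"customer": "b", "quantity": 1},
    {"customer": "a", "quantity": 3},
    {"customer": "c", "quantity": 4},
]
destination, pages = run_batch(docs, "customer", "quantity", size=2)
print(destination, pages)
```

With three groups and a page size of two, the loop needs two non-empty pages before it reaches the empty page that ends the run.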

Imagine that you run a webshop that sells clothes. Every order creates a
document that contains a unique order ID, the name and the category of the
ordered product, its price, the ordered quantity, the exact date of the order,
and some customer information (name, gender, location, etc.). Your dataset
contains all the transactions from last year.

If you want to check the sales in the different categories in your last fiscal
year, define a {transform} that groups the data by the product categories
(women's shoes, men's clothing, etc.) and the order date. Use the last year as
the interval for the order date. Then add a sum aggregation on the ordered
quantity. The result is an entity-centric index that shows the number of items
sold in every product category in the last year.
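The webshop scenario could translate into a request body along the following lines. This is a sketch under stated assumptions: the index names (`webshop-orders`, `sales-by-category`) and field names (`category`, `order_date`, `quantity`) are hypothetical, and the one-year interval is expressed with `calendar_interval`, which assumes a version of {es} that accepts that parameter in a date histogram group-by. See <<put-transform>> for the options your version supports.

```python
import json


def build_sales_transform(source_index, dest_index):
    """Build a transform body that sums the ordered quantity per category and year."""
    return {
        "source": {"index": source_index},
        "pivot": {
            "group_by": {
                # One bucket per product category.
                "category": {"terms": {"field": "category"}},
                # One bucket per calendar year of the order date.
                "order_year": {
                    "date_histogram": {
                        "field": "order_date",
                        "calendar_interval": "1y",
                    }
                },
            },
            # Total number of items sold within each category and year.
            "aggregations": {"items_sold": {"sum": {"field": "quantity"}}},
        },
        "dest": {"index": dest_index},
    }


body = build_sales_transform("webshop-orders", "sales-by-category")
print(json.dumps(body, indent=2))
```

Each document in the destination index then represents one combination of category and year, together with the summed quantity for that combination.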

[role="screenshot"]
image::images/pivot-preview.jpg["Example of a {transform} pivot in {kib}"]

IMPORTANT: The {transform} leaves your source index intact. It
creates a new index that is dedicated to the transformed data.

[[transform-performance]]
==== Performance considerations

{transforms-cap} perform search aggregations on the source
indices and then index the results into the destination index. Therefore, a
{transform} never takes less time than the combined duration of the
aggregations that it performs and the indexing process.

For better performance, make sure that your search aggregations and queries are
optimized and that your {transform} is processing only necessary data.

NOTE: When you use <<search-aggregations-bucket-datehistogram-aggregation>>, the
queries are not considered optimal because they run through a significant amount
of data. For this reason, {transforms} performing date histogram aggregations
take longer to run.