diff --git a/docs/reference/transform/overview.asciidoc b/docs/reference/transform/overview.asciidoc
index e3c852d8be9..50930b00c5c 100644
--- a/docs/reference/transform/overview.asciidoc
+++ b/docs/reference/transform/overview.asciidoc
@@ -7,22 +7,19 @@

 beta[]

-A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
-the {stack}, it is a transformation of data that is indexed in {es}. For
-example, you can use {dataframes} to _pivot_ your data into a new entity-centric
-index. By transforming and summarizing your data, it becomes possible to
-visualize and analyze it in alternative and interesting ways.
+You can use {transforms} to _pivot_ your data into a new entity-centric index.
+By transforming and summarizing your data, it becomes possible to visualize and
+analyze it in alternative and interesting ways.

 A lot of {es} indices are organized as a stream of events: each event is an
-individual document, for example a single item purchase. {dataframes-cap} enable
+individual document, for example a single item purchase. {transforms-cap} enable
 you to summarize this data, bringing it into an organized, more
 analysis-friendly format. For example, you can summarize all the purchases of a
 single customer.

-You can create {dataframes} by using {transforms}.
 {transforms-cap} enable you to define a pivot, which is a set of features that
 transform the index into a different, more digestible format.
-Pivoting results in a summary of your data, which is the {dataframe}.
+Pivoting results in a summary of your data in a new index.

 To define a pivot, first you select one or more fields that you will use to
 group your data. You can select categorical fields (terms) and numerical fields
@@ -38,34 +35,32 @@ more about the supported aggregations and group-by fields, see
 As an optional step, you can also add a query to further limit the scope of the
 aggregation.

-The {transform} performs a composite aggregation that
-paginates through all the data defined by the source index query. The output of
-the aggregation is stored in a destination index. Each time the
-{transform} queries the source index, it creates a _checkpoint_. You
-can decide whether you want the {transform} to run once (batch
-{transform}) or continuously ({transform}). A batch
-{transform} is a single operation that has a single checkpoint.
-{ctransforms-cap} continually increment and process checkpoints as new
-source data is ingested.
+The {transform} performs a composite aggregation that paginates through all the
+data defined by the source index query. The output of the aggregation is stored
+in a destination index. Each time the {transform} queries the source index, it
+creates a _checkpoint_. You can decide whether you want the {transform} to run
+once (batch {transform}) or continuously ({ctransform}). A batch {transform} is a
+single operation that has a single checkpoint. {ctransforms-cap} continually
+increment and process checkpoints as new source data is ingested.

 .Example

-Imagine that you run a webshop that sells clothes. Every order creates a document
-that contains a unique order ID, the name and the category of the ordered product,
-its price, the ordered quantity, the exact date of the order, and some customer
-information (name, gender, location, etc). Your dataset contains all the transactions
-from last year.
+Imagine that you run a webshop that sells clothes. Every order creates a
+document that contains a unique order ID, the name and the category of the
+ordered product, its price, the ordered quantity, the exact date of the order,
+and some customer information (name, gender, location, etc.). Your dataset
+contains all the transactions from last year.

 If you want to check the sales in the different categories in your last fiscal
-year, define a {transform} that groups the data by the product
-categories (women's shoes, men's clothing, etc.) and the order date. Use the
-last year as the interval for the order date. Then add a sum aggregation on the
-ordered quantity. The result is a {dataframe} that shows the number of sold
+year, define a {transform} that groups the data by the product categories
+(women's shoes, men's clothing, etc.) and the order date. Use the last year as
+the interval for the order date. Then add a sum aggregation on the ordered
+quantity. The result is an entity-centric index that shows the number of sold
 items in every product category in the last year.

 [role="screenshot"]
 image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]

 IMPORTANT: The {transform} leaves your source index intact. It
-creates a new index that is dedicated to the {dataframe}.
+creates a new index that is dedicated to the transformed data.