[DOCS] Removes data frame leftovers from transforms overview (#49434)

István Zoltán Szabó 2019-11-22 09:13:55 +01:00
parent d42eac9cf3
commit c13fce60a8
1 changed file with 22 additions and 27 deletions

@@ -7,22 +7,19 @@
 beta[]

-A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
-the {stack}, it is a transformation of data that is indexed in {es}. For
-example, you can use {dataframes} to _pivot_ your data into a new entity-centric
-index. By transforming and summarizing your data, it becomes possible to
-visualize and analyze it in alternative and interesting ways.
+You can use {transforms} to _pivot_ your data into a new entity-centric index.
+By transforming and summarizing your data, it becomes possible to visualize and
+analyze it in alternative and interesting ways.

 A lot of {es} indices are organized as a stream of events: each event is an
-individual document, for example a single item purchase. {dataframes-cap} enable
+individual document, for example a single item purchase. {transforms-cap} enable
 you to summarize this data, bringing it into an organized, more
 analysis-friendly format. For example, you can summarize all the purchases of a
 single customer.

-You can create {dataframes} by using {transforms}.
 {transforms-cap} enable you to define a pivot, which is a set of
 features that transform the index into a different, more digestible format.
-Pivoting results in a summary of your data, which is the {dataframe}.
+Pivoting results in a summary of your data in a new index.

 To define a pivot, first you select one or more fields that you will use to
 group your data. You can select categorical fields (terms) and numerical fields
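The pivot described in this hunk (group the documents by one or more fields, then aggregate) can be illustrated outside Elasticsearch. The following is a minimal plain-Python sketch, not the {transform} implementation or its API: the documents, the `customer` and `price` fields, the `REDUCERS` table, and the `pivot` helper are all hypothetical, chosen only to show how event-style documents collapse into one summary row per entity.

```python
from collections import defaultdict

# Hypothetical event-style documents: one document per purchase.
EVENTS = [
    {"customer": "alice", "price": 20.0},
    {"customer": "bob", "price": 5.0},
    {"customer": "alice", "price": 12.5},
]

# Reducers supported by this sketch; the names loosely mirror common aggregations.
REDUCERS = {
    "sum": sum,
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
    "value_count": len,
}


def pivot(docs, group_by, aggs):
    """Group `docs` by the `group_by` fields, then apply each aggregation.

    `aggs` maps an output field name to a (reducer_name, source_field) pair.
    Returns one summary row per distinct group key, sorted by that key.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[tuple(doc[field] for field in group_by)].append(doc)

    rows = []
    for key, members in sorted(groups.items(), key=lambda item: item[0]):
        # Start the row with the group-by values, then add one field per aggregation.
        row = dict(zip(group_by, key))
        for out_name, (reducer, field) in aggs.items():
            row[out_name] = REDUCERS[reducer]([m[field] for m in members])
        rows.append(row)
    return rows


summary = pivot(
    EVENTS,
    ["customer"],
    {"total_spent": ("sum", "price"), "purchases": ("value_count", "price")},
)
print(summary)
```

Each output row is entity-centric (one row per customer rather than one per purchase), which is roughly the shape the summary takes after pivoting.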
@@ -38,34 +35,32 @@ more about the supported aggregations and group-by fields, see
 As an optional step, you can also add a query to further limit the scope of the
 aggregation.

-The {transform} performs a composite aggregation that
-paginates through all the data defined by the source index query. The output of
-the aggregation is stored in a destination index. Each time the
-{transform} queries the source index, it creates a _checkpoint_. You
-can decide whether you want the {transform} to run once (batch
-{transform}) or continuously ({transform}). A batch
-{transform} is a single operation that has a single checkpoint.
-{ctransforms-cap} continually increment and process checkpoints as new
-source data is ingested.
+The {transform} performs a composite aggregation that paginates through all the
+data defined by the source index query. The output of the aggregation is stored
+in a destination index. Each time the {transform} queries the source index, it
+creates a _checkpoint_. You can decide whether you want the {transform} to run
+once (batch {transform}) or continuously ({transform}). A batch {transform} is a
+single operation that has a single checkpoint. {ctransforms-cap} continually
+increment and process checkpoints as new source data is ingested.

 .Example
-Imagine that you run a webshop that sells clothes. Every order creates a document
-that contains a unique order ID, the name and the category of the ordered product,
-its price, the ordered quantity, the exact date of the order, and some customer
-information (name, gender, location, etc). Your dataset contains all the transactions
-from last year.
+Imagine that you run a webshop that sells clothes. Every order creates a
+document that contains a unique order ID, the name and the category of the
+ordered product, its price, the ordered quantity, the exact date of the order,
+and some customer information (name, gender, location, etc). Your dataset
+contains all the transactions from last year.

 If you want to check the sales in the different categories in your last fiscal
-year, define a {transform} that groups the data by the product
-categories (women's shoes, men's clothing, etc.) and the order date. Use the
-last year as the interval for the order date. Then add a sum aggregation on the
-ordered quantity. The result is a {dataframe} that shows the number of sold
+year, define a {transform} that groups the data by the product categories
+(women's shoes, men's clothing, etc.) and the order date. Use the last year as
+the interval for the order date. Then add a sum aggregation on the ordered
+quantity. The result is an entity-centric index that shows the number of sold
 items in every product category in the last year.

 [role="screenshot"]
 image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]

 IMPORTANT: The {transform} leaves your source index intact. It
-creates a new index that is dedicated to the {dataframe}.
+creates a new index that is dedicated to the transformed data.
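The webshop example and the paragraph on composite aggregations in this hunk can be simulated in a few lines of Python. This is a hedged sketch under stated assumptions, not how Elasticsearch implements {transforms}: the order documents and their field names are hypothetical, a plain dict stands in for the destination index, and the hypothetical `composite_page` helper only imitates the `after_key` style of paging used by composite aggregations by returning buckets in ascending key order. The grouping is by product category and order year, with a sum over the ordered quantity, mirroring the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order documents (one event per order); field names are illustrative.
ORDERS = [
    {"order_id": 1, "category": "women's shoes", "order_date": date(2019, 3, 2), "quantity": 2},
    {"order_id": 2, "category": "men's clothing", "order_date": date(2019, 5, 9), "quantity": 1},
    {"order_id": 3, "category": "women's shoes", "order_date": date(2019, 8, 21), "quantity": 1},
    {"order_id": 4, "category": "men's clothing", "order_date": date(2018, 12, 30), "quantity": 4},
]


def composite_page(docs, after_key=None, size=2):
    """Return one page of (category, year) buckets with their summed quantity,
    in ascending key order and strictly after `after_key`, plus the key to
    resume from (None when the page is empty)."""
    sums = defaultdict(int)
    for doc in docs:
        sums[(doc["category"], doc["order_date"].year)] += doc["quantity"]
    keys = sorted(k for k in sums if after_key is None or k > after_key)
    page = [(k, sums[k]) for k in keys[:size]]
    next_key = page[-1][0] if page else None
    return page, next_key


def run_batch_transform(docs, size=2):
    """Page through all buckets once, like a batch run, and write one
    entity-centric row per (category, year) into a dict that stands in for
    the destination index. `docs` is only read, never modified."""
    dest_index = {}
    after_key = None
    while True:
        page, after_key = composite_page(docs, after_key, size)
        if not page:
            break  # no more buckets: the run is complete
        for (category, year), sold in page:
            dest_index[f"{category}|{year}"] = {
                "category": category,
                "year": year,
                "sold_items": sold,
            }
    return dest_index


for doc_id, row in run_batch_transform(ORDERS).items():
    print(doc_id, row["sold_items"])
```

Because the helper reads `ORDERS` without mutating it, the sketch also reflects the note that the source stays intact while the results go to a separate destination.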