[DOCS] Removes data frame leftovers from transforms overview (#49434)

István Zoltán Szabó 2019-11-22 09:13:55 +01:00
parent d42eac9cf3
commit c13fce60a8
1 changed file with 22 additions and 27 deletions


@@ -7,22 +7,19 @@
 beta[]
-A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
-the {stack}, it is a transformation of data that is indexed in {es}. For
-example, you can use {dataframes} to _pivot_ your data into a new entity-centric
-index. By transforming and summarizing your data, it becomes possible to
-visualize and analyze it in alternative and interesting ways.
+You can use {transforms} to _pivot_ your data into a new entity-centric index.
+By transforming and summarizing your data, it becomes possible to visualize and
+analyze it in alternative and interesting ways.
 A lot of {es} indices are organized as a stream of events: each event is an
-individual document, for example a single item purchase. {dataframes-cap} enable
+individual document, for example a single item purchase. {transforms-cap} enable
 you to summarize this data, bringing it into an organized, more
 analysis-friendly format. For example, you can summarize all the purchases of a
 single customer.
-You can create {dataframes} by using {transforms}.
 {transforms-cap} enable you to define a pivot, which is a set of
 features that transform the index into a different, more digestible format.
-Pivoting results in a summary of your data, which is the {dataframe}.
+Pivoting results in a summary of your data in a new index.
 To define a pivot, first you select one or more fields that you will use to
 group your data. You can select categorical fields (terms) and numerical fields
@@ -38,34 +35,32 @@ more about the supported aggregations and group-by fields, see
 As an optional step, you can also add a query to further limit the scope of the
 aggregation.
-The {transform} performs a composite aggregation that
-paginates through all the data defined by the source index query. The output of
-the aggregation is stored in a destination index. Each time the
-{transform} queries the source index, it creates a _checkpoint_. You
-can decide whether you want the {transform} to run once (batch
-{transform}) or continuously ({transform}). A batch
-{transform} is a single operation that has a single checkpoint.
-{ctransforms-cap} continually increment and process checkpoints as new
-source data is ingested.
+The {transform} performs a composite aggregation that paginates through all the
+data defined by the source index query. The output of the aggregation is stored
+in a destination index. Each time the {transform} queries the source index, it
+creates a _checkpoint_. You can decide whether you want the {transform} to run
+once (batch {transform}) or continuously ({transform}). A batch {transform} is a
+single operation that has a single checkpoint. {ctransforms-cap} continually
+increment and process checkpoints as new source data is ingested.
 .Example
-Imagine that you run a webshop that sells clothes. Every order creates a document
-that contains a unique order ID, the name and the category of the ordered product,
-its price, the ordered quantity, the exact date of the order, and some customer
-information (name, gender, location, etc). Your dataset contains all the transactions
-from last year.
+Imagine that you run a webshop that sells clothes. Every order creates a
+document that contains a unique order ID, the name and the category of the
+ordered product, its price, the ordered quantity, the exact date of the order,
+and some customer information (name, gender, location, etc). Your dataset
+contains all the transactions from last year.
 If you want to check the sales in the different categories in your last fiscal
-year, define a {transform} that groups the data by the product
-categories (women's shoes, men's clothing, etc.) and the order date. Use the
-last year as the interval for the order date. Then add a sum aggregation on the
-ordered quantity. The result is a {dataframe} that shows the number of sold
+year, define a {transform} that groups the data by the product categories
+(women's shoes, men's clothing, etc.) and the order date. Use the last year as
+the interval for the order date. Then add a sum aggregation on the ordered
+quantity. The result is an entity-centric index that shows the number of sold
 items in every product category in the last year.
 [role="screenshot"]
 image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]
 IMPORTANT: The {transform} leaves your source index intact. It
-creates a new index that is dedicated to the {dataframe}.
+creates a new index that is dedicated to the transformed data.
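As a rough illustration of what the pivot in the webshop example computes, the following plain-Python sketch groups event-style order documents by product category and order year, then sums the ordered quantity. It does not use the Elasticsearch transform API. The sample orders, the field names (`category`, `quantity`, `order_date`), and the `pivot` helper are hypothetical and chosen only for illustration. Like a transform, the sketch writes its summary to a separate collection and leaves the source documents unchanged.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event-style order documents (one document per order), loosely
# modelled on the webshop example. Field names are illustrative assumptions.
orders = [
    {"order_id": "o1", "category": "Women's Shoes", "quantity": 2, "order_date": date(2018, 3, 14)},
    {"order_id": "o2", "category": "Men's Clothing", "quantity": 1, "order_date": date(2018, 7, 2)},
    {"order_id": "o3", "category": "Women's Shoes", "quantity": 3, "order_date": date(2018, 11, 30)},
    {"order_id": "o4", "category": "Men's Clothing", "quantity": 4, "order_date": date(2018, 12, 24)},
]


def pivot(docs):
    """Hypothetical helper: group by (category, order year) and sum quantity.

    Returns a new list of summary rows; the input documents are not modified,
    mirroring how a transform writes to a separate destination index.
    """
    totals = defaultdict(int)
    for doc in docs:
        # Group-by step: a categorical (terms-like) field plus a yearly date bucket.
        key = (doc["category"], doc["order_date"].year)
        # Aggregation step: sum of the ordered quantity per group.
        totals[key] += doc["quantity"]
    # One entity-centric row per group, sorted by group key for stable output.
    return [
        {"category": category, "year": year, "total_quantity": quantity}
        for (category, year), quantity in sorted(totals.items())
    ]


destination = pivot(orders)
for row in destination:
    print(row)
```

A real transform performs this grouping with a composite aggregation that pages through the source index. The sketch computes everything in memory in one pass, which is closer to a batch transform with a single checkpoint.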