[DOCS] Changes wording to move away from data frame terminology in the ES repo (#47093)

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

parent 45605cfd7a
commit 170b102ab5
@@ -21,7 +21,7 @@ list of IDs or a single ID. Wildcards, `*` and `_all` are also accepted.
 ---------------------------------------------------
 include-tagged::{doc-tests-file}[{api}-request]
 ---------------------------------------------------
-<1> Constructing a new stop request referencing an existing {transform}
+<1> Constructing a new stop request referencing an existing {transform}.
 
 ==== Optional arguments
 
@@ -31,7 +31,7 @@ The following arguments are optional.
 --------------------------------------------------
 include-tagged::{doc-tests-file}[{api}-request-options]
 --------------------------------------------------
-<1> If true wait for the transform task to stop before responding
+<1> If true wait for the {transform} task to stop before responding.
 <2> Controls the amount of time to wait until the {transform} stops.
 <3> Whether to ignore if a wildcard expression matches no {transforms}.
 
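The three optional arguments annotated above correspond to query-string parameters on the stop endpoint. As a rough Python sketch (not the official client; the endpoint and parameter names assume the 7.x `_data_frame/transforms` API), the request could be assembled like this:

```python
def build_stop_transform_request(transform_id,
                                 wait_for_completion=None,
                                 timeout=None,
                                 allow_no_match=None):
    """Return (path, params) for POST _data_frame/transforms/<id>/_stop."""
    path = f"_data_frame/transforms/{transform_id}/_stop"
    params = {}
    if wait_for_completion is not None:   # <1> wait for the task to stop
        params["wait_for_completion"] = str(wait_for_completion).lower()
    if timeout is not None:               # <2> how long to wait for the stop
        params["timeout"] = timeout
    if allow_no_match is not None:        # <3> tolerate wildcards matching nothing
        params["allow_no_match"] = str(allow_no_match).lower()
    return path, params

path, params = build_stop_transform_request("ecommerce_transform",
                                            wait_for_completion=True,
                                            timeout="30s")
```

Omitted arguments are simply not sent, so the server-side defaults apply.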
@@ -180,7 +180,7 @@ hyperparameter optimization to give minimum validation errors.
 ===== Standard parameters
 
 `dependent_variable`::
-(Required, string) Defines which field of the {dataframe} is to be predicted.
+(Required, string) Defines which field of the document is to be predicted.
 This parameter is supplied by field name and must match one of the fields in
 the index being used to train. If this field is missing from a document, then
 that document will not be used for training, but a prediction with the trained
@@ -30,9 +30,9 @@ Available evaluation types:
 ==== Binary soft classification configuration objects
 
 Binary soft classification evaluates the results of an analysis which outputs
-the probability that each {dataframe} row belongs to a certain class. For
+the probability that each document belongs to a certain class. For
 example, in the context of outlier detection, the analysis outputs the
-probability whether each row is an outlier.
+probability whether each document is an outlier.
 
 [discrete]
 [[binary-sc-resources-properties]]
@@ -66,14 +66,13 @@ affected when you update this setting. For more information about the
 
 `xpack.ml.max_open_jobs` (<<cluster-update-settings,Dynamic>>)::
 The maximum number of jobs that can run simultaneously on a node. Defaults to
-`20`. In this context, jobs include both anomaly detector jobs and data frame
-analytics jobs. The maximum number of jobs is also constrained by memory usage.
-Thus if the estimated memory usage of the jobs would be higher than allowed,
-fewer jobs will run on a node. Prior to version 7.1, this setting was a per-node
-non-dynamic setting. It became a cluster-wide dynamic
-setting in version 7.1. As a result, changes to its value after node startup
-are used only after every node in the cluster is running version 7.1 or higher.
-The maximum permitted value is `512`.
+`20`. In this context, jobs include both {anomaly-jobs} and {dfanalytics-jobs}.
+The maximum number of jobs is also constrained by memory usage. Thus if the
+estimated memory usage of the jobs would be higher than allowed, fewer jobs will
+run on a node. Prior to version 7.1, this setting was a per-node non-dynamic
+setting. It became a cluster-wide dynamic setting in version 7.1. As a result,
+changes to its value after node startup are used only after every node in the
+cluster is running version 7.1 or higher. The maximum permitted value is `512`.
 
 `xpack.ml.node_concurrent_job_allocations` (<<cluster-update-settings,Dynamic>>)::
 The maximum number of jobs that can concurrently be in the `opening` state on
@@ -21,11 +21,11 @@ step-by-step example, see
 [[example-best-customers]]
 ==== Finding your best customers
 
-In this example, we use the eCommerce orders sample dataset to find the customers
-who spent the most in our hypothetical webshop. Let's transform the data such
-that the destination index contains the number of orders, the total price of
-the orders, the amount of unique products and the average price per order,
-and the total amount of ordered products for each customer.
+In this example, we use the eCommerce orders sample dataset to find the
+customers who spent the most in our hypothetical webshop. Let's transform the
+data such that the destination index contains the number of orders, the total
+price of the orders, the amount of unique products and the average price per
+order, and the total amount of ordered products for each customer.
 
 [source,console]
 ----------------------------------
@@ -97,7 +97,7 @@ This {dataframe} makes it easier to answer questions such as:
 * Which customers ordered the least number of different products?
 
 It's possible to answer these questions using aggregations alone, however
-{dataframes} allow us to persist this data as a customer centric index. This
+{transforms} allow us to persist this data as a customer centric index. This
 enables us to analyze data at scale and gives more flexibility to explore and
 navigate data from a customer centric perspective. In some cases, it can even
 make creating visualizations much simpler.
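The customer centric index described in these hunks is essentially a group-by over orders. A toy in-memory Python sketch of the same pivot (the field names here are illustrative assumptions, not the sample dataset's exact mapping; the real work is done by the transform's composite aggregations):

```python
from collections import defaultdict

orders = [
    {"customer_id": "c1", "total_price": 20.0, "products": ["p1", "p2"]},
    {"customer_id": "c1", "total_price": 30.0, "products": ["p2"]},
    {"customer_id": "c2", "total_price": 10.0, "products": ["p3"]},
]

def pivot_by_customer(docs):
    """Group orders per customer, mimicking the transform's pivot output."""
    groups = defaultdict(lambda: {"order_count": 0, "total": 0.0, "products": set()})
    for doc in docs:
        g = groups[doc["customer_id"]]
        g["order_count"] += 1
        g["total"] += doc["total_price"]
        g["products"].update(doc["products"])
    return {
        cid: {
            "order_count": g["order_count"],          # number of orders
            "total_price": g["total"],                # total price of the orders
            "unique_products": len(g["products"]),    # amount of unique products
            "avg_order_price": g["total"] / g["order_count"],
        }
        for cid, g in groups.items()
    }

summary = pivot_by_customer(orders)
```

Unlike aggregations run at query time, the transform persists this per-customer summary as its own index.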
@@ -275,9 +275,9 @@ POST _data_frame/transforms/_preview
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> This range query limits the {transform} to documents that are within the last
-30 days at the point in time the {transform} checkpoint is processed.
-For batch {dataframes} this occurs once.
+<1> This range query limits the {transform} to documents that are within the
+last 30 days at the point in time the {transform} checkpoint is processed. For
+batch {transforms} this occurs once.
 <2> This is the destination index for the {dataframe}. It is ignored by
 `_preview`.
 <3> The data is grouped by the `clientip` field.
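Callout <1> above describes a range query evaluated relative to checkpoint time. A minimal sketch of such a query body, using {es} date-math strings (the `timestamp` field name is an assumption for illustration):

```python
def last_n_days_range_query(field="timestamp", days=30):
    """Build a range query limiting source docs to the last `days` days,
    evaluated at the point in time the checkpoint is processed."""
    return {"range": {field: {"gte": f"now-{days}d/d"}}}

q = last_n_days_range_query()
```

Because `now` is resolved when the query runs, a continuous transform re-evaluates the window at every checkpoint, while a batch transform evaluates it exactly once.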
@@ -8,30 +8,30 @@
 
 beta[]
 
-The following limitations and known problems apply to the 7.4 release of
-the Elastic {dataframe} feature:
+The following limitations and known problems apply to the {version} release of
+the Elastic {transform} feature:
 
 
 [float]
 [[transform-compatibility-limitations]]
 ==== Beta {transforms} do not have guaranteed backwards or forwards compatibility
 
-Whilst {transforms} are beta, it is not guaranteed that a
-{transform} created in a previous version of the {stack} will be able
-to start and operate in a future version. Neither can support be provided for
-{transform} tasks to be able to operate in a cluster with mixed node
-versions.
-Please note that the output of a {transform} is persisted to a
-destination index. This is a normal {es} index and is not affected by the beta
-status.
+Whilst {transforms} are beta, it is not guaranteed that a {transform} created in
+a previous version of the {stack} will be able to start and operate in a future
+version. Neither can support be provided for {transform} tasks to be able to
+operate in a cluster with mixed node versions. Please note that the output of a
+{transform} is persisted to a destination index. This is a normal {es} index and
+is not affected by the beta status.
+
 
 [float]
 [[transform-ui-limitation]]
-==== {dataframe-cap} UI will not work during a rolling upgrade from 7.2
+==== {transforms-cap} UI will not work during a rolling upgrade from 7.2
 
 If your cluster contains mixed version nodes, for example during a rolling
-upgrade from 7.2 to a newer version, and {transforms} have been
-created in 7.2, the {dataframe} UI will not work. Please wait until all nodes
-have been upgraded to the newer version before using the {dataframe} UI.
+upgrade from 7.2 to a newer version, and {transforms} have been created in 7.2,
+the {transforms} UI (earlier {dataframe} UI) will not work. Please wait until
+all nodes have been upgraded to the newer version before using the {transforms} UI.
+
 
 [float]
@@ -42,21 +42,23 @@ have been upgraded to the newer version before using the {dataframe} UI.
 the API. If you try to create one, the UI will fail to show the source index
 table.
 
+
 [float]
 [[transform-ccs-limitations]]
 ==== {ccs-cap} is not supported
 
 {ccs-cap} is not supported for {transforms}.
 
+
 [float]
 [[transform-kibana-limitations]]
 ==== Up to 1,000 {transforms} are supported
 
-A single cluster will support up to 1,000 {transforms}.
-When using the
-{ref}/get-transform.html[GET {transforms} API] a total
-`count` of {transforms} is returned. Use the `size` and `from` parameters to
-enumerate through the full list.
+A single cluster will support up to 1,000 {transforms}. When using the
+{ref}/get-transform.html[GET {transforms} API] a total `count` of {transforms}
+is returned. Use the `size` and `from` parameters to enumerate through the full
+list.
 
+
 [float]
 [[transform-aggresponse-limitations]]
@@ -76,6 +78,7 @@ workaround, you may define custom mappings prior to starting the
 {ref}/indices-create-index.html[create a custom destination index] or
 {ref}/indices-templates.html[define an index template].
 
+
 [float]
 [[transform-batch-limitations]]
 ==== Batch {transforms} may not account for changed documents
@@ -87,18 +90,18 @@ do not yet support a search context, therefore if the source data is changed
 (deleted, updated, added) while the batch {dataframe} is in progress, then the
 results may not include these changes.
 
 
 [float]
 [[transform-consistency-limitations]]
-==== {cdataframe-cap} consistency does not account for deleted or updated documents
+==== {ctransform-cap} consistency does not account for deleted or updated documents
 
-While the process for {transforms} allows the continual recalculation
-of the {transform} as new data is being ingested, it does also have
-some limitations.
+While the process for {transforms} allows the continual recalculation of the
+{transform} as new data is being ingested, it does also have some limitations.
 
-Changed entities will only be identified if their time field
-has also been updated and falls within the range of the action to check for
-changes. This has been designed in principle for, and is suited to, the use case
-where new data is given a timestamp for the time of ingest.
+Changed entities will only be identified if their time field has also been
+updated and falls within the range of the action to check for changes. This has
+been designed in principle for, and is suited to, the use case where new data is
+given a timestamp for the time of ingest.
 
 If the indices that fall within the scope of the source index pattern are
 removed, for example when deleting historical time-based indices, then the
@@ -106,29 +109,30 @@ composite aggregation performed in consecutive checkpoint processing will search
 over different source data, and entities that only existed in the deleted index
 will not be removed from the {dataframe} destination index.
 
-Depending on your use case, you may wish to recreate the {transform}
-entirely after deletions. Alternatively, if your use case is tolerant to
-historical archiving, you may wish to include a max ingest timestamp in your
-aggregation. This will allow you to exclude results that have not been recently
-updated when viewing the {dataframe} destination index.
+Depending on your use case, you may wish to recreate the {transform} entirely
+after deletions. Alternatively, if your use case is tolerant to historical
+archiving, you may wish to include a max ingest timestamp in your aggregation.
+This will allow you to exclude results that have not been recently updated when
+viewing the destination index.
 
+
 [float]
 [[transform-deletion-limitations]]
-==== Deleting a {transform} does not delete the {dataframe} destination index or {kib} index pattern
+==== Deleting a {transform} does not delete the destination index or {kib} index pattern
 
 When deleting a {transform} using `DELETE _data_frame/transforms/index`
-neither the {dataframe} destination index nor the {kib} index pattern, should
-one have been created, are deleted. These objects must be deleted separately.
+neither the destination index nor the {kib} index pattern, should one have been
+created, are deleted. These objects must be deleted separately.
 
 
 [float]
 [[transform-aggregation-page-limitations]]
 ==== Handling dynamic adjustment of aggregation page size
 
-During the development of {transforms}, control was favoured over
-performance. In the design considerations, it is preferred for the
-{transform} to take longer to complete quietly in the background
-rather than to finish quickly and take precedence in resource consumption.
+During the development of {transforms}, control was favoured over performance.
+In the design considerations, it is preferred for the {transform} to take longer
+to complete quietly in the background rather than to finish quickly and take
+precedence in resource consumption.
 
 Composite aggregations are well suited for high cardinality data enabling
 pagination through results. If a {ref}/circuit-breaker.html[circuit breaker]
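The deletion limitation above means a full teardown is three separate deletions, not one. A sketch enumerating the calls involved (7.x endpoint names; the {kib} saved-object ID for the index pattern is an assumption, and the last call goes to {kib}, not {es}):

```python
def cleanup_plan(transform_id, dest_index, kibana_pattern_id=None):
    """List the delete operations needed to fully remove a transform,
    its destination index, and (optionally) its Kibana index pattern."""
    steps = [
        ("DELETE", f"_data_frame/transforms/{transform_id}"),  # the transform itself
        ("DELETE", dest_index),                                # dest index survives step 1
    ]
    if kibana_pattern_id is not None:                          # Kibana saved object, if created
        steps.append(("DELETE",
                      f"api/saved_objects/index-pattern/{kibana_pattern_id}"))
    return steps
```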
@@ -138,19 +142,18 @@ calculated based upon all activity within the cluster, not just activity from
 {transforms}, so it therefore may only be a temporary resource
 availability issue.
 
-For a batch {transform}, the number of buckets requested is only ever
-adjusted downwards. The lowering of value may result in a longer duration for the
-{transform} checkpoint to complete. For {cdataframes}, the number of
-buckets requested is reset back to its default at the start of every checkpoint
-and it is possible for circuit breaker exceptions to occur repeatedly in the
-{es} logs.
+For a batch {transform}, the number of buckets requested is only ever adjusted
+downwards. The lowering of value may result in a longer duration for the
+{transform} checkpoint to complete. For {ctransforms}, the number of buckets
+requested is reset back to its default at the start of every checkpoint and it
+is possible for circuit breaker exceptions to occur repeatedly in the {es} logs.
 
-The {transform} retrieves data in batches which means it calculates
-several buckets at once. Per default this is 500 buckets per search/index
-operation. The default can be changed using `max_page_search_size` and the
-minimum value is 10. If failures still occur once the number of buckets
-requested has been reduced to its minimum, then the {transform} will
-be set to a failed state.
+The {transform} retrieves data in batches which means it calculates several
+buckets at once. Per default this is 500 buckets per search/index operation. The
+default can be changed using `max_page_search_size` and the minimum value is 10.
+If failures still occur once the number of buckets requested has been reduced to
+its minimum, then the {transform} will be set to a failed state.
 
 [float]
 [[transform-dynamic-adjustments-limitations]]
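The back-off described above has three documented anchors: a default of 500 buckets, a floor of 10, and downwards-only adjustment ending in a failed state. A sketch of that behavior (the halving step is an assumption; the docs state only the bounds and the direction):

```python
DEFAULT_PAGE_SIZE = 500   # default buckets per search/index operation
MIN_PAGE_SIZE = 10        # minimum value of max_page_search_size

def reduce_page_size(current):
    """Return the next (smaller) bucket count after a circuit breaker
    exception, or None once the minimum is reached (transform fails)."""
    if current <= MIN_PAGE_SIZE:
        return None                      # already at the floor: failed state
    return max(current // 2, MIN_PAGE_SIZE)
```

For a batch transform the value only ever shrinks; a continuous transform resets to `DEFAULT_PAGE_SIZE` at each checkpoint, which is why repeated circuit breaker exceptions can appear in the {es} logs.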
@@ -158,9 +161,9 @@ be set to a failed state.
 
 For each checkpoint, entities are identified that have changed since the last
 time the check was performed. This list of changed entities is supplied as a
-{ref}/query-dsl-terms-query.html[terms query] to the {transform}
-composite aggregation, one page at a time. Then updates are applied to the
-destination index for each page of entities.
+{ref}/query-dsl-terms-query.html[terms query] to the {transform} composite
+aggregation, one page at a time. Then updates are applied to the destination
+index for each page of entities.
 
 The page `size` is defined by `max_page_search_size` which is also used to
 define the number of buckets returned by the composite aggregation search. The
@@ -175,6 +178,7 @@ is 65536. If `max_page_search_size` exceeds `index.max_terms_count` the
 Using smaller values for `max_page_search_size` may result in a longer duration
 for the {transform} checkpoint to complete.
 
+
 [float]
 [[transform-scheduling-limitations]]
 ==== {cdataframe-cap} scheduling limitations
@@ -187,6 +191,7 @@ your ingest rate along with the impact that the {transform}
 search/index operations has other users in your cluster. Also note that retries
 occur at `frequency` interval.
 
+
 [float]
 [[transform-failed-limitations]]
 ==== Handling of failed {transforms}
@@ -198,6 +203,7 @@ failure and re-starting.
 When using the API to delete a failed {transform}, first stop it using
 `_stop?force=true`, then delete it.
 
+
 [float]
 [[transform-availability-limitations]]
 ==== {cdataframes-cap} may give incorrect results if documents are not yet available to search
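The hunk above documents a two-step deletion for a failed {transform}: force-stop, then delete. A sketch of that call sequence (endpoint names per the 7.x `_data_frame/transforms` API):

```python
def delete_failed_transform_calls(transform_id):
    """Order of REST calls to remove a failed transform:
    force-stop first, then delete."""
    base = f"_data_frame/transforms/{transform_id}"
    return [
        ("POST", f"{base}/_stop?force=true"),  # a failed transform must be force-stopped
        ("DELETE", base),                      # only then can it be deleted
    ]
```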
@@ -205,9 +211,9 @@ When using the API to delete a failed {transform}, first stop it using
 After a document is indexed, there is a very small delay until it is available
 to search.
 
-A {ctransform} periodically checks for changed entities between the
-time since it last checked and `now` minus `sync.time.delay`. This time window
-moves without overlapping. If the timestamp of a recently indexed document falls
+A {ctransform} periodically checks for changed entities between the time since
+it last checked and `now` minus `sync.time.delay`. This time window moves
+without overlapping. If the timestamp of a recently indexed document falls
 within this time window but this document is not yet available to search then
 this entity will not be updated.
 
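The non-overlapping check windows described above can be sketched as successive half-open intervals `(last_checked, now - sync.time.delay]`; a document whose timestamp lands in a window before it becomes searchable is missed. A toy model with integer timestamps (an illustration of the windowing only, not the actual checkpoint code):

```python
def check_windows(start, now_values, delay):
    """Yield successive non-overlapping windows (lo, hi] as `now` advances.
    `delay` models sync.time.delay; `hi` becomes the next window's `lo`."""
    lo = start
    for now in now_values:
        hi = now - delay          # upper bound: now minus sync.time.delay
        if hi > lo:               # only check once the window is non-empty
            yield (lo, hi)
            lo = hi               # windows move without overlapping

windows = list(check_windows(0, [100, 200, 300], delay=60))
```

Because the windows never overlap, a timestamp in `(0, 40]` that only becomes searchable after the first check is never revisited, which is exactly the missed-update case the limitation describes.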