Mirror of https://github.com/honeymoose/OpenSearch.git, synced 2025-03-24 17:09:48 +00:00
[DOCS] Update "Enrich your data" tutorials (#46417)
* Move enrich docs to separate file
* Rewrite enrich processor tutorial
parent d74d995382
commit a27d075db4
@ -73,8 +73,7 @@ include::put-enrich-policy.asciidoc[tag=enrich-policy-api-prereqs]
|
||||
Use the execute enrich policy API
|
||||
to create the enrich index for an existing enrich policy.
|
||||
|
||||
// tag::execute-enrich-policy-desc[]
|
||||
|
||||
// tag::execute-enrich-policy-def[]
|
||||
The *enrich index* contains documents from the policy's source indices.
|
||||
Enrich indices always begin with `.enrich-*`,
|
||||
are read-only,
|
||||
@ -85,20 +84,20 @@ and are <<indices-forcemerge,force merged>>.
|
||||
Enrich indices should be used by the <<enrich-processor,enrich processor>> only.
|
||||
Avoid using enrich indices for other purposes.
|
||||
====
|
||||
// end::execute-enrich-policy-def[]
|
||||
|
||||
// tag::update-enrich-index[]
|
||||
Once created, you cannot update
|
||||
or index documents to an enrich index.
|
||||
Instead, update your source indices
|
||||
and execute the enrich policy again.
|
||||
This creates a new enrich index from your updated source indices
|
||||
and deletes the previous enrich index.
|
||||
// end::update-enrich-index[]
|
||||
|
||||
Because this API request performs several operations,
|
||||
it may take a while to return a response.
|
||||
|
||||
// end::execute-enrich-policy-desc[]
|
||||
|
||||
|
||||
[[sample-api-path-params]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
|
@ -63,7 +63,7 @@ If you use {es} {security-features}, you must have:
|
||||
Use the put enrich policy API
|
||||
to create a new enrich policy.
|
||||
|
||||
// tag::enrich-policy-def
|
||||
// tag::enrich-policy-def[]
|
||||
An *enrich policy* is a set of rules the enrich processor uses
|
||||
to append the appropriate data to incoming documents.
|
||||
An enrich policy contains:
|
||||
@ -71,15 +71,15 @@ An enrich policy contains:
|
||||
* The *policy type*,
|
||||
which determines how the processor enriches incoming documents
|
||||
* A list of source indices
|
||||
* The *match field*, a field used to match incoming documents
|
||||
* *Enrich fields*, fields appended to incoming documents
|
||||
* The *match field* used to match incoming documents
|
||||
* *Enrich fields* appended to incoming documents
|
||||
from matching documents
|
||||
// end::enrich-policy-def
|
||||
// end::enrich-policy-def[]
|
||||
|
||||
|
||||
===== Update an enrich policy
|
||||
|
||||
// tag::update-enrich-policy
|
||||
// tag::update-enrich-policy[]
|
||||
You cannot update an existing enrich policy.
|
||||
Instead, you can:
|
||||
|
||||
@ -91,7 +91,7 @@ Instead, you can:
|
||||
|
||||
. Use the <<delete-enrich-policy-api, delete enrich policy API>>
|
||||
to delete the previous enrich policy.
|
||||
// end::update-enrich-policy
|
||||
// end::update-enrich-policy[]
|
||||
|
||||
|
||||
[[put-enrich-policy-api-path-params]]
|
||||
|
293
docs/reference/ingest/enrich.asciidoc
Normal file
@ -0,0 +1,293 @@
|
||||
[role="xpack"]
|
||||
[testenv="basic"]
|
||||
[[ingest-enriching-data]]
|
||||
== Enrich your data
|
||||
|
||||
You can use the <<enrich-processor,enrich processor>>
|
||||
to append data from existing indices
|
||||
to incoming documents during ingest.
|
||||
|
||||
For example, you can use the enrich processor to:
|
||||
|
||||
* Identify web services or vendors based on known IP addresses
|
||||
* Add product information to retail orders based on product IDs
|
||||
* Supplement contact information based on an email address
|
||||
|
||||
|
||||
[float]
|
||||
[[enrich-setup]]
|
||||
=== Set up an enrich processor
|
||||
|
||||
To set up an enrich processor and learn how it works,
|
||||
follow these steps:
|
||||
|
||||
. Check the <<enrich-prereqs, prerequisites>>.
|
||||
. <<create-enrich-source-index>>.
|
||||
. <<create-enrich-policy>>.
|
||||
. <<execute-enrich-policy>>.
|
||||
. <<add-enrich-processor>>.
|
||||
. <<ingest-enrich-docs>>.
|
||||
|
||||
Once you have an enrich processor set up,
|
||||
you can <<update-enrich-data,update your enrich data>>
|
||||
and <<update-enrich-policies, update your enrich policies>>
|
||||
using the <<enrich-apis,enrich APIs>>.
|
||||
|
||||
[IMPORTANT]
|
||||
====
|
||||
The enrich processor performs several operations
|
||||
and may impact the speed of your <<pipeline,ingest pipeline>>.
|
||||
|
||||
We strongly recommend testing and benchmarking your enrich processors
|
||||
before deploying them in production.
|
||||
|
||||
We do not recommend using the enrich processor to append real-time data.
|
||||
The enrich processor works best with reference data
|
||||
that doesn't change frequently.
|
||||
====
|
||||
|
||||
[float]
|
||||
[[enrich-prereqs]]
|
||||
==== Prerequisites
|
||||
|
||||
include::{docdir}/ingest/apis/enrich/put-enrich-policy.asciidoc[tag=enrich-policy-api-prereqs]
|
||||
|
||||
[float]
|
||||
[[create-enrich-source-index]]
|
||||
==== Create a source index
|
||||
|
||||
To begin,
|
||||
create one or more source indices.
|
||||
|
||||
A *source index* contains data you want to append to incoming documents.
|
||||
You can index and manage documents in a source index
|
||||
like a regular index.
|
||||
|
||||
The following <<docs-index_,index API>> request creates the `users` source index
|
||||
containing user data.
|
||||
This request also indexes a new document to the `users` source index.
|
||||
|
||||
[source,js]
|
||||
----
|
||||
PUT /users/_doc/1?refresh
|
||||
{
|
||||
"email": "mardy.brown@asciidocsmith.com",
|
||||
"first_name": "Mardy",
|
||||
"last_name": "Brown",
|
||||
"city": "New Orleans",
|
||||
"county": "Orleans",
|
||||
"state": "LA",
|
||||
"zip": 70116,
|
||||
"web": "mardy.asciidocsmith.com"
|
||||
}
|
||||
----
|
||||
// CONSOLE
|
||||
|
||||
You can also set up {beats-ref}/getting-started.html[{beats}],
|
||||
such as {filebeat-ref}/filebeat-getting-started.html[{filebeat}],
|
||||
to automatically send and index documents
|
||||
to your source indices.
|
||||
See {beats-ref}/getting-started.html[Getting started with {beats}].
|
||||
|
||||
|
||||
[float]
|
||||
[[create-enrich-policy]]
|
||||
==== Create an enrich policy
|
||||
|
||||
Use the <<put-enrich-policy-api, put enrich policy>> API
|
||||
to create an enrich policy.
|
||||
|
||||
include::{docdir}/ingest/apis/enrich/put-enrich-policy.asciidoc[tag=enrich-policy-def]
|
||||
|
||||
[source,js]
|
||||
----
|
||||
PUT /_enrich/policy/users-policy
|
||||
{
|
||||
"match": {
|
||||
"indices": "users",
|
||||
"match_field": "email",
|
||||
"enrich_fields": ["first_name", "last_name", "city", "zip", "state"]
|
||||
}
|
||||
}
|
||||
----
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
|
||||
[float]
|
||||
[[execute-enrich-policy]]
|
||||
==== Execute an enrich policy
|
||||
|
||||
Use the <<execute-enrich-policy-api, execute enrich policy>> API
|
||||
to create an enrich index for the policy.
|
||||
|
||||
include::apis/enrich/execute-enrich-policy.asciidoc[tag=execute-enrich-policy-def]
|
||||
|
||||
The following request executes the `users-policy` enrich policy.
|
||||
Because this API request performs several operations,
|
||||
it may take a while to return a response.
|
||||
|
||||
[source,js]
|
||||
----
|
||||
POST /_enrich/policy/users-policy/_execute
|
||||
----
|
||||
// CONSOLE
|
||||
// TEST[continued]
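
As a rough mental model only, executing a `match` policy can be pictured as
copying the match field and the enrich fields of each source document
into a separate, read-only lookup structure.
The Python sketch below uses in-memory dictionaries in place of indices.
`build_enrich_index` is a hypothetical helper written for illustration;
it is not part of {es} or any client library,
and it does not describe how {es} implements policy execution.

```python
from types import MappingProxyType


def build_enrich_index(source_docs, match_field, enrich_fields):
    """Hypothetical model of policy execution: keep only the match field
    and the enrich fields of each source document, grouped by match value."""
    index = {}
    for doc in source_docs:
        if match_field not in doc:
            continue  # a document without the match field can never match
        kept = {match_field: doc[match_field]}
        kept.update({f: doc[f] for f in enrich_fields if f in doc})
        index.setdefault(doc[match_field], []).append(kept)
    # Return a read-only view, mirroring the read-only nature of enrich indices.
    return MappingProxyType(index)


# Same sample document as the `users` source index in this tutorial.
users = [{
    "email": "mardy.brown@asciidocsmith.com",
    "first_name": "Mardy",
    "last_name": "Brown",
    "city": "New Orleans",
    "county": "Orleans",
    "state": "LA",
    "zip": 70116,
    "web": "mardy.asciidocsmith.com",
}]

enrich_index = build_enrich_index(
    users, "email", ["first_name", "last_name", "city", "zip", "state"]
)
```

In this model, fields that the policy does not list, such as `county` and `web`,
are absent from the lookup structure.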
|
||||
|
||||
|
||||
[float]
|
||||
[[add-enrich-processor]]
|
||||
==== Add the enrich processor to an ingest pipeline
|
||||
|
||||
Use the <<put-pipeline-api,put pipeline>> API
|
||||
to create an ingest pipeline.
|
||||
Include an <<enrich-processor,enrich processor>>
|
||||
that uses your enrich policy.
|
||||
|
||||
When defining an enrich processor,
|
||||
you must include the following:
|
||||
|
||||
* The *field* used to match incoming documents
|
||||
to documents in the enrich index.
|
||||
+
|
||||
This field should be included in incoming documents.
|
||||
To match, this field must contain the exact
|
||||
value of the match field of a document in the enrich index.
|
||||
|
||||
* The *target field* added to incoming documents.
|
||||
This field contains all appended enrich data.
|
||||
|
||||
The following request adds a new pipeline, `user_lookup`.
|
||||
This pipeline includes an enrich processor
|
||||
that uses the `users-policy` enrich policy.
|
||||
|
||||
[source,js]
|
||||
----
|
||||
PUT /_ingest/pipeline/user_lookup
|
||||
{
|
||||
"description" : "Enriching user details to messages",
|
||||
"processors" : [
|
||||
{
|
||||
"enrich" : {
|
||||
"policy_name": "users-policy",
|
||||
"field" : "email",
|
||||
"target_field": "user"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
----
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
You can also add other <<ingest-processors,processors>>
|
||||
to your ingest pipeline.
|
||||
You can use these processors to change or drop incoming documents
|
||||
based on your criteria.
|
||||
|
||||
See <<ingest-processors>> for a list of built-in processors.
|
||||
|
||||
[float]
|
||||
[[ingest-enrich-docs]]
|
||||
==== Ingest and enrich documents
|
||||
|
||||
Index incoming documents using your ingest pipeline.
|
||||
|
||||
Because the enrich policy type is `match`,
|
||||
the enrich processor matches incoming documents
|
||||
to documents in the enrich index
|
||||
based on match field values.
|
||||
The processor then appends the enrich field data
|
||||
from any matching document in the enrich index
|
||||
to the target field of the incoming document.
|
||||
|
||||
The enrich processor appends all data to the target field as an array.
|
||||
If the incoming document matches more than one document in the enrich index,
|
||||
the processor appends data from those documents to the array.
|
||||
|
||||
If the incoming document matches no documents in the enrich index,
|
||||
the processor appends no data.
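
The matching rules above can be sketched in a few lines of Python.
This is an illustrative model under simplifying assumptions:
the enrich index is a plain dictionary keyed by match field value,
and `enrich` is a hypothetical function, not an {es} API.

```python
def enrich(incoming, enrich_index, field, target_field):
    """Hypothetical model of a `match` enrich processor."""
    result = dict(incoming)
    # Exact-value lookup: the incoming field value must equal the match field value.
    matches = enrich_index.get(incoming.get(field), [])
    if matches:
        # Data from every matching document goes into the target field as an array.
        result[target_field] = [dict(m) for m in matches]
    # With no match, nothing is appended and the document is left as it was.
    return result


enrich_index = {
    "mardy.brown@asciidocsmith.com": [
        {
            "email": "mardy.brown@asciidocsmith.com",
            "first_name": "Mardy",
            "last_name": "Brown",
        }
    ]
}

matched = enrich(
    {"email": "mardy.brown@asciidocsmith.com"}, enrich_index, "email", "user"
)
unmatched = enrich({"email": "nobody@example.com"}, enrich_index, "email", "user")
```

Here `matched` gains a `user` array with one entry,
while `unmatched` has no `user` field at all.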
|
||||
|
||||
The following <<docs-index_,index API>> request uses the ingest pipeline
|
||||
to index a document
|
||||
containing the `email` field,
|
||||
the `match_field` specified in the `users-policy` enrich policy.
|
||||
|
||||
[source,js]
|
||||
----
|
||||
PUT /my_index/_doc/my_id?pipeline=user_lookup
|
||||
{
|
||||
"email": "mardy.brown@asciidocsmith.com"
|
||||
}
|
||||
----
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
To verify the enrich processor matched
|
||||
and appended the appropriate field data,
|
||||
use the <<docs-get,get>> API to view the indexed document.
|
||||
|
||||
[source,js]
|
||||
----
|
||||
GET /my_index/_doc/my_id
|
||||
----
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
The API returns the following response:
|
||||
|
||||
[source,js]
|
||||
----
|
||||
{
|
||||
"found": true,
|
||||
"_index": "my_index",
|
||||
"_type": "_doc",
|
||||
"_id": "my_id",
|
||||
"_version": 1,
|
||||
"_seq_no": 55,
|
||||
"_primary_term": 1,
|
||||
"_source": {
|
||||
"user": [
|
||||
{
|
||||
"email": "mardy.brown@asciidocsmith.com",
|
||||
"first_name": "Mardy",
|
||||
"last_name": "Brown",
|
||||
"zip": 70116,
|
||||
"city": "New Orleans",
|
||||
"state": "LA"
|
||||
}
|
||||
],
|
||||
"email": "mardy.brown@asciidocsmith.com"
|
||||
}
|
||||
}
|
||||
----
|
||||
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term": 1/"_primary_term" : $body._primary_term/]
|
||||
|
||||
|
||||
[float]
|
||||
[[update-enrich-data]]
|
||||
=== Update your enrich index
|
||||
|
||||
include::{docdir}/ingest/apis/enrich/execute-enrich-policy.asciidoc[tag=update-enrich-index]
|
||||
|
||||
If needed, you can <<docs-reindex,reindex>>
|
||||
or <<docs-update-by-query,update>> any already ingested documents
|
||||
using your ingest pipeline.
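
The update behavior can be pictured as a snapshot that is replaced on every execution.
The Python sketch below is a simplified model, not the {es} implementation;
`EnrichPolicyModel` and its attributes are hypothetical names used only for illustration.

```python
class EnrichPolicyModel:
    """Hypothetical model: each execution builds a fresh snapshot
    of the source data and discards the previous one."""

    def __init__(self, match_field, enrich_fields):
        self.match_field = match_field
        self.enrich_fields = enrich_fields
        self.enrich_index = None  # no enrich index exists until the policy is executed

    def execute(self, source_docs):
        snapshot = {}
        for doc in source_docs:
            fields = [self.match_field, *self.enrich_fields]
            kept = {f: doc[f] for f in fields if f in doc}
            snapshot.setdefault(doc.get(self.match_field), []).append(kept)
        # Rebinding the attribute drops the previous snapshot.
        self.enrich_index = snapshot


source = [{"email": "mardy.brown@asciidocsmith.com", "city": "New Orleans"}]
policy = EnrichPolicyModel("email", ["city"])
policy.execute(source)

# Updating the source does not change the existing snapshot...
source[0]["city"] = "Baton Rouge"
before = policy.enrich_index["mardy.brown@asciidocsmith.com"][0]["city"]

# ...until the policy is executed again.
policy.execute(source)
after = policy.enrich_index["mardy.brown@asciidocsmith.com"][0]["city"]
```

In this model, `before` still holds the old city and `after` holds the updated one,
which is why documents ingested before the second execution keep the old enrich data
unless you reindex or update them.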
|
||||
|
||||
|
||||
[float]
|
||||
[[update-enrich-policies]]
|
||||
=== Update an enrich policy
|
||||
|
||||
include::apis/enrich/put-enrich-policy.asciidoc[tag=update-enrich-policy]
|
||||
|
||||
////
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
DELETE /_ingest/pipeline/user_lookup
|
||||
|
||||
DELETE /_enrich/policy/users-policy
|
||||
--------------------------------------------------
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
////
|
@ -752,204 +752,8 @@ metadata field to provide the error message.
|
||||
--------------------------------------------------
|
||||
// NOTCONSOLE
|
||||
|
||||
[role="xpack"]
|
||||
[testenv="basic"]
|
||||
[[ingest-enriching-data]]
|
||||
== Enrich your data using the ingest node
|
||||
|
||||
|
||||
|
||||
The <<enrich-processor,enrich processor>> allows documents to be enriched with data from
|
||||
an enrich index that is managed by an enrich policy prior to indexing.
|
||||
|
||||
The data that is used by the enrich index is managed by the user in regular indices.
|
||||
An enrich policy is configuration that indicates how an enrich index is created from
|
||||
the data in the user's maintained indices. When an enrich policy is executed
|
||||
a new enrich index is created for that policy, which the enrich process can then use.
|
||||
|
||||
An enrich policy also controls what kind of enrichment the `enrich` processor is able to do.
|
||||
|
||||
[[enrich-policy-definition]]
|
||||
=== Enrich Policy Definition
|
||||
|
||||
The <<enrich-processor,enrich processor>> requires more than just the configuration in a pipeline.
|
||||
The main piece to configure is the enrich policy:
|
||||
|
||||
[[enrich-policy-options]]
|
||||
.Enrich policy options
|
||||
[options="header"]
|
||||
|======
|
||||
| Name | Required | Default | Description
|
||||
| `type` | yes | - | The policy type.
|
||||
| `indices` | yes | - | The indices to fetch the data from.
|
||||
| `query` | no | `match_all` query | The query to be used to select which documents are included.
|
||||
| `match_field` | yes | - | The field that will be used to match against an input document.
|
||||
| `enrich_fields` | yes | - | The fields that will be available to enrich the input document.
|
||||
|======
|
||||
|
||||
[[enrich-policy-types]]
|
||||
==== Policy types
|
||||
|
||||
An enrich processor is associated with a policy via the `policy_name` option.
|
||||
The policy type of the policy determines what kind of enrichment an `enrich` processor is able to do.
|
||||
|
||||
The following policy types are currently supported:
|
||||
|
||||
* `match` - Can lookup documents by running a term query and use the retrieved content to enrich the document being ingested.
|
||||
|
||||
[[enrich-processor-getting-started]]
|
||||
=== Getting started
|
||||
|
||||
Create a regular index that contains data you like to enrich your incoming documents with:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
PUT /users/_doc/1?refresh
|
||||
{
|
||||
"email": "mardy.brown@email.me",
|
||||
"first_name": "Mardy",
|
||||
"last_name": "Brown",
|
||||
"address": "6649 N Blue Gum St",
|
||||
"city": "New Orleans",
|
||||
"county": "Orleans",
|
||||
"state": "LA",
|
||||
"zip": 70116,
|
||||
"phone1":"504-621-8927",
|
||||
"phone2": "504-845-1427",
|
||||
"web": "mardy-brown.me"
|
||||
}
|
||||
--------------------------------------------------
|
||||
// CONSOLE
|
||||
|
||||
Create an enrich policy:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
PUT /_enrich/policy/users-policy
|
||||
{
|
||||
"match": {
|
||||
"indices": "users",
|
||||
"match_field": "email",
|
||||
"enrich_fields": ["first_name", "last_name", "address", "city", "zip", "state"]
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
Which returns:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"acknowledged": true
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE
|
||||
|
||||
[[execute-enrich-policy]]
|
||||
Execute that enrich policy:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
POST /_enrich/policy/users-policy/_execute
|
||||
--------------------------------------------------
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
Which returns:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"acknowledged": true
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE
|
||||
|
||||
Create the pipeline and enrich a document:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
PUT _ingest/pipeline/user_lookup
|
||||
{
|
||||
"description" : "Enriching user details to messages",
|
||||
"processors" : [
|
||||
{
|
||||
"enrich" : {
|
||||
"policy_name": "users-policy",
|
||||
"field" : "email",
|
||||
"target_field": "user"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
PUT my_index/_doc/my_id?pipeline=user_lookup
|
||||
{
|
||||
"email": "mardy.brown@email.me"
|
||||
}
|
||||
|
||||
GET my_index/_doc/my_id
|
||||
--------------------------------------------------
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
Which returns:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"found": true,
|
||||
"_index": "my_index",
|
||||
"_type": "_doc",
|
||||
"_id": "my_id",
|
||||
"_version": 1,
|
||||
"_seq_no": 55,
|
||||
"_primary_term": 1,
|
||||
"_source": {
|
||||
"user": [
|
||||
{
|
||||
"email": "mardy.brown@email.me",
|
||||
"first_name": "Mardy",
|
||||
"last_name": "Brown",
|
||||
"zip": 70116,
|
||||
"address": "6649 N Blue Gum St",
|
||||
"city": "New Orleans",
|
||||
"state": "LA"
|
||||
}
|
||||
],
|
||||
"email": "mardy.brown@email.me"
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]
|
||||
|
||||
//////////////////////////
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
DELETE /_ingest/pipeline/user_lookup
|
||||
DELETE /_enrich/policy/users-policy
|
||||
--------------------------------------------------
|
||||
// CONSOLE
|
||||
// TEST[continued]
|
||||
|
||||
//////////////////////////
|
||||
|
||||
[[enrich-policy-apis]]
|
||||
=== Enrich Policy APIs
|
||||
|
||||
Also there are several APIs in order to manage and execute enrich policies:
|
||||
|
||||
* <<put-enrich-policy-api,Put policy api>>.
|
||||
* <<get-enrich-policy-api,Get enrich policy api>>.
|
||||
* <<delete-enrich-policy-api,Delete policy api>>.
|
||||
* <<execute-enrich-policy-api,Execute policy api>>.
|
||||
|
||||
If security is enabled then the user managing enrich policies will need to have
|
||||
the `enrich_user` builtin role. Also the user will need to have read privileges
|
||||
for the indices the enrich policy is referring to.
|
||||
include::enrich.asciidoc[]
|
||||
|
||||
|
||||
[[ingest-processors]]
|
||||
|
@ -5,7 +5,7 @@
|
||||
|
||||
The `enrich` processor can enrich documents with data from another index.
|
||||
See the <<ingest-enriching-data,enrich data>> section for more information on how to set this up and
|
||||
check out the <<enrich-processor-getting-started,getting started>> to get familiar with enrich policies and related APIs.
|
||||
check out the <<ingest-enriching-data,tutorial>> to get familiar with enrich policies and related APIs.
|
||||
|
||||
[[enrich-options]]
|
||||
.Enrich Options
|
||||
|