Commit Graph

127 Commits

Author SHA1 Message Date
James Baiera c0d5ec87e1 Set enrich indices to be read only before swapping their aliases (#42874) 2019-06-24 15:14:11 -04:00
Michael Basnight 6945e5d5e6 Add role for enrich processor (#42677)
This commit adds the manage_enrich privilege, which grants access to all
of the enrich processor lifecycle actions. In addition this commit also
creates a role which grants access to the generated indices.

Relates #41939

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2019-06-24 10:47:01 -05:00
Martijn van Groningen c8e6474eef
Changes required for merging in 7.x branch. 2019-06-13 16:58:27 +02:00
James Baiera 9d56a0365f Limit a enrich policy execution to only one at a time (#42535)
Add a keyed lock mechanism to the policy executor to ensure that an enrich policy
can only have one execution happening at a time.
2019-06-12 10:45:10 -04:00
James Baiera 415f1a484f Enrich validate nested mappings (#42452)
Ensures that fields retained in an enrich index from a source are not contained
under a nested field. It additionally makes sure that key fields exist, and that
value fields are checked if they are present. The policy runner test has also
been expanded with some faulty mapping test cases.
2019-06-03 16:02:31 -04:00
Michael Basnight 77eed9e6a0 Add enrich policy GET API (#41384)
This commit wires up the Rest calls and Transport calls for GET enrich
policy, as well as tests and rest spec additions.
2019-05-28 23:19:23 -05:00
Martijn van Groningen 484f5cee39
Stricter update dependency between pipelines and components used by pipelines (#42038)
Add support for components used by processor factories to get updated
before processor factories create new processor instances.

Components can register via `IngestService#addIngestClusterStateListener(...)`
then if the internal  representation of ingest pipelines get updated,
these components get  updated with the current cluster state before
pipelines are updated.

Registered EnrichProcessorFactory as ingest cluster state listener, so
that it has always an up to date view of the active enrich policies.
2019-05-28 09:04:46 +02:00
Martijn van Groningen 5901285773
Complete EnrichIT by using the execute enrich policy API (#42433) 2019-05-27 10:15:29 +02:00
Michael Basnight 2325ffb757 Add enrich policy execute API (#41762)
This commit wires up the Rest calls and Transport calls for execute
enrich policy, as well as tests and rest spec additions.
2019-05-24 09:39:29 -05:00
James Baiera 824ccfabd9 Backport 7.x - Add step to forcemerge enrich index after reindex (#41969)
Adds a step in the policy execution that forcemerge's a new enrich index after reindex completes.
2019-05-22 16:51:44 -04:00
Martijn van Groningen 9e514cb161
Remove schedule field from EnrichPolicy (#42143) 2019-05-22 17:13:54 +02:00
Martijn van Groningen 57a4614a7b
Keep track of the enrich key field in the enrich index. (#42022)
The enrich key field is being kept track in _meta field by the policy runner.
The ingest processor uses the field name defined in enrich index _meta field and
not in the policy. This will avoid problems if policy is changed without
a new enrich index being created.

This also complete decouples EnrichPolicy from ExactMatchProcessor.

The following scenario results in failure without this change:
1) Create policy
2) Execute policy
3) Create pipeline with enrich processor
4) Use pipeline
5) Update enrich key in policy
6) Use pipeline, which then fails.
2019-05-09 21:28:48 +02:00
Martijn van Groningen 299ff70bfe
Enrich store should only update the policies via an update task. (#41944) 2019-05-09 18:21:40 +02:00
Martijn van Groningen 1b00e7f834
Change the reindex fetch in policy runner from 1000 to 10000 and (#41838)
Reindex uses scroll searches to read the source data. It is more efficient
to read more data in one search scroll round then several. I think 10000
is a good sweet spot.

Relates to #32789
2019-05-07 13:00:40 +02:00
Martijn van Groningen d709b8bb97
Rename enrich policy index_pattern field to indices. (#41836)
Relates to #32789
2019-05-07 09:08:28 +02:00
Martijn van Groningen f366f56f00
Change policy runner to use helper method on EnrichPolicy instead of (#41839)
its own helper method to determine alias / policy base name.

This way both the enrich processor and policy runner use the same logic
to determine the alias to use.

Relates to #32789
2019-05-07 08:55:39 +02:00
James Baiera c25736c410
Backport 7.x - Add enrich policy runner (#41088) (#41759)
Backports #41088

Adds the foundation of the execution logic to execute an enrich policy. Validates
the source index existence as well as mappings, creates a new enrich index for
the policy, reindexes the source index into the new enrich index, and swaps the 
enrich alias for the policy to the new index.
2019-05-06 10:32:46 -04:00
Michael Basnight 5d53706310 Add enrich policy DELETE API (#41495)
This commit wires up the Rest calls and Transport calls for DELETE enrich
policy, as well as tests and rest spec additions.
2019-05-02 11:02:49 -05:00
Michael Basnight 2978ac3061 Add enrich policy list API (#41553)
This commit wires up the Rest calls and Transport calls for listing all
enrich policies, as well  as tests and rest spec additions.
2019-05-02 11:01:26 -05:00
Martijn van Groningen 8838bcc776
Add enrich processor (#41532)
The enrich processor performs a lookup in a locally allocated
enrich index shard using a field value from the document being enriched.
If there is a match then the _source of the enrich document is fetched.
The document being enriched then gets the decorate values from the
enrich document based on the configured decorate fields in the pipeline.

Note that the usage of the _source field is temporary until the enrich
source field that is part of #41521 is merged into the enrich branch.
Using the _source field involves significant decompression which not
desired for enrich use cases.

The policy contains the information what field in the enrich index
to query and what fields are available to decorate a document being
enriched with.

The enrich processor has the following configuration options:
* `policy_name` - the name of the policy this processor should use
* `enrich_key` - the field in the document being enriched that holds to lookup value
* `ignore_missing` - Whether to allow the key field to be missing
* `enrich_values` - a list of fields to decorate the document being enriched with.
                    Each entry holds a source field and a target field.
                    The source field indicates what decorate field to use that is available in the policy.
                    The target field controls the field name to use in the document being enriched.
                    The source and target fields can be the same.

Example pipeline config:

```
{
   "processors": [
      {
         "policy_name": "my_policy",
         "enrich_key": "host_name",
         "enrich_values": [
            {
              "source": "globalRank",
              "target": "global_rank"
            }
         ]
      }
   ]
}
```

In the above example documents are being enriched with a global rank value.
For each document that has match in the enrich index based on its host_name field,
the document gets an global rank field value, which is fetched from the `globalRank`
field in the enrich index and saved as `global_rank` in the document being enriched.

This is PR is part one of #41521
2019-04-30 20:51:13 +02:00
Martijn van Groningen 6af17e4bdf
Add enrich qa module for rest tests and (#41568)
move put policy api yaml test to this rest module.

The main benefit is that all tests will then be run when running:
`./gradlew -p x-pack/plugin/enrich check`

The rest qa module starts a node with default distribution and basic
license.

This qa module will also be used for adding different rest tests (not yaml),
for example rest tests needed for #41532

Also when we are going to work on security integration then we can
add a security qa module under the qa folder. Also at some point
we should add a multi node qa module.
2019-04-26 20:20:02 +02:00
Michael Basnight fad45ea6bd Add enrich policy PUT API (#41383)
This commit wires up the Rest calls and Transport calls for PUT enrich
policy, as well as tests and rest spec additions.
2019-04-25 15:15:25 -05:00
Michael Basnight 85c4cc7f4b Refactor the enrich store to remove it from guice (#41421)
There is no need to create a enrich store component for the transport
layer since the inner components of the store are either present in the
master node calls or via an already injected ClusterService. This commit
cleans up the class, adds the forthcoming delete call and tests the new
code.
2019-04-23 13:28:48 -05:00
Martijn van Groningen d99249c624
Move the policy class to xpack core module. (#41311)
This allows the transport client use this class in enrich APIs.

Relates to #40997
2019-04-18 09:02:43 +02:00
Martijn van Groningen d01c1f3ba0
Added enrich policy definition. (#41003)
Relates to #32789
2019-04-12 11:15:02 +02:00
Martijn van Groningen 5aecd9d0e2
Fixed build issue after merge 2019-04-09 10:07:30 +02:00
Martijn van Groningen e6bdfea474
first commit 2019-04-05 09:32:40 +02:00