114 Commits

Author SHA1 Message Date
Michael Basnight
9304f5c889 Ensure enrich executes on master node only (#46448)
The previous transport action was a read action, which under the right
set of circumstances can execute on a coordinating node. This commit
ensures that cannot happen.
2019-09-10 09:59:36 -05:00
Martijn van Groningen
ded98e50b7
Change exact match processor to match processor. (#46041)
Besides a rename, this changes allows to processor to attach multiple
enrich docs to the document being ingested.

Also in order to control the maximum number of enrich docs to be
included in the document being ingested, the `max_matches` setting
is added to the enrich processor.

Relates #32789
2019-09-04 18:05:12 +02:00
Martijn van Groningen
6bec63fdfa
removed redundant cast 2019-09-04 11:18:31 +02:00
Michael Basnight
51a703da29
Add enrich transport client support (#46002)
This commit adds an enrich client, as well as a smoke test to validate
the client works.
2019-08-29 09:10:07 -05:00
Michael Basnight
a82d24b3ce Remove enrich indices on delete policy (#45870)
When a policy is deleted, the enrich indices that are backing the policy
alias should also be deleted. This commit does that work and cleans up
the transport action a bit so that the lock release is easier to see, as
well as to ensure that any action carried out, regardless of exception,
unlocks the policy.
2019-08-23 15:26:43 -05:00
Martijn van Groningen
a38e6850a5
fixed errors after cherry-picking 2 commits 2019-08-23 13:51:00 +02:00
Martijn van Groningen
6067065ed6
Decouple enrich processor factory from enrich policy (#45826)
This commit changes the enrich processor factory to read the required
configuration from the current enrich index (from meta mapping field)
in order to create the processor.

Before this change the required config was read from the enrich policy
in the cluster state. Enrich policies are going to be stored in an
index (instead of the cluster state). In a processor factory there isn't
a way to load something from an index, so with this change we read
the required config / info from the enrich index (which is derived
from the enrich policy), which then allows us to move enrich policies
to an index.

With this change it is required to execute a policy before creating a
pipeline. Otherwise there is no enrich index and then there is no way
to validate that a policy exist or retrieve its type and match field.

Relates to #32789
2019-08-23 13:46:39 +02:00
Martijn van Groningen
cb42e19a32
Change how type is stored in an enrich policy. (#45789)
A policy type controls how the enrich index is created and
the query executed against the match field. Currently there
is a single policy type (`exact_match`). In the near future
more policy types will be added and different policy may have
different configuration options.

For this reason type should be a json object instead of a string field:

```
{
   "exact_match": {
      ...
   }
}
```

instead of:

```
{
  "type": "exact_match",
  ...
}
```

This will make streaming parsing of enrich policies easier as in the
new format, the parsing code can know ahead what configuration fields
to expect. In the latter format that is not possible if the type field
appears not as the first field.

Relates to #32789
2019-08-23 13:43:38 +02:00
Martijn van Groningen
33972423e9
Enrich processor configuration changes (#45466)
Enrich processor configuration changes:
* Renamed `enrich_key` option to `field` option.
* Replaced `set_from` and `targets` options with `target_field`.

The `target_field` option behaves different to how `set_from` and
`targets` worked. The `target_field` is the field that will contain
the looked up document.

Relates to #32789
2019-08-22 09:49:22 +02:00
Martijn van Groningen
5864f30771
ensure that the items in the bulk response are the same as is in the bulk request 2019-08-21 10:07:02 +02:00
Martijn van Groningen
ac7173c0d4
Renamed CoordinatorProxyAction to EnrichCoordinatorProxyAction and (#45663)
fail if query shard context needs current time (certain queries / scripts
use this, but in the enrich context this is not used).
2019-08-20 18:51:47 +02:00
Michael Basnight
e3373d349b Consolidate enrich list all and get by name APIs (#45705)
The get and list APIs are a single API in this commit. Whether
requesting one named policy or all policies, a list of policies is
returened. The list API code has all been removed and the GET api is
what remains, which contains much of the list response code.
2019-08-20 10:29:59 -05:00
Michael Basnight
db57d2206a Prevent delete policy for active executing policy (#45472)
This commit adds a lock to the delete policy, in the same way that the
locking is done for policy execution. It also creates a test to exercise
the delete transport action, and modifies an existing test to provide a
common set of functions for saving and deleting policies.
2019-08-15 10:08:11 -05:00
Michael Basnight
03f45dad57
Fix policy removal bug in delete policy (#45573)
The delete policy had a subtle bug in that it would still delete the
policy if pipelines were accessing it, after giving the client back an
error. This commit fixes that and ensures it does not happen by adding
verification in the test.
2019-08-15 13:20:59 +02:00
Michael Basnight
fd57d3cb29 Fix test broken by policy rename 2019-08-14 13:57:47 -05:00
Michael Basnight
52a094b177 Fail delete policy if pipeline exists (#44438)
If a pipeline that refrences the policy exists, we should not allow the
policy to be deleted. The user will need to remove the processor from
the pipeline before deleting the policy. This commit adds a check to
ensure that the policy cannot be deleted if it is referenced by any
pipeline in the system.
2019-08-14 13:51:10 -05:00
Martijn van Groningen
43b8ab607d
Improve naming of enrich policy fields. (#45494)
Renamed `enrich_key` to `match_field` and
renamed `enrich_values` to `enrich_fields`.

Relates #32789
2019-08-14 11:45:22 +02:00
Martijn van Groningen
452557cf2e
Validate policy name like an index name. (#45452)
The policy name is used to generate the enrich index name.
For this reason, a policy name should be validated in the same way
as index names.

Relates to #32789
2019-08-13 20:25:17 +02:00
Martijn van Groningen
0353eb9291
required changes after merging in upstream branch 2019-08-13 09:17:57 +02:00
Martijn van Groningen
4ac25b23f6
Add support for a more compact enrich values format (#45033)
In the case that source and target are the same in `enrich_values` then
a string array can be specified.

For example instead of this:

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "enrich_values": [
                    {
                        "source": "first_name",
                        "target": "first_name"
                    },
                    {
                        "source": "last_name",
                        "target": "last_name"
                    },
                    {
                        "source": "address",
                        "target": "address"
                    },
                    {
                        "source": "city",
                        "target": "city"
                    },
                    {
                        "source": "state",
                        "target": "state"
                    },
                    {
                        "source": "zip",
                        "target": "zip"
                    }
                ]
            }
        }
    ]
}
```
This more compact format can be specified:

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "targets": [
                   "first_name",
                   "last_name",
                   "address",
                   "city",
                   "state",
                   "zip"
                ]
            }
        }
    ]
}
```

And the `enrich_values` key has been renamed to `set_from`.

Relates to #32789
2019-08-09 12:40:58 +02:00
Martijn van Groningen
f1ee29f22e
Added a custom api to perform the msearch more efficiently for enrich processor (#43965)
Currently the msearch api is used to execute buffered search requests;
however the msearch api doesn't deal with search requests in an intelligent way.
It basically executes each search separately in a concurrent manner.

This api reuses the msearch request and response classes and executes
the searches as one request in the node holding the enrich index shard.
Things like engine.searcher and query shard context are only created once.
Also there are less layers than executing a regular msearch request. This
results in an interesting speedup.

Without this change, in a single node cluster, enriching documents
with a bulk size of 5000 items, the ingest time in each bulk response
varied from 174ms to 822ms. With this change the ingest time in each
bulk response varied from 54ms to 109ms.

I think we should add a change like this based on this improvement in ingest time.

However I do wonder if instead of doing this change, we should improve
the msearch api to execute more efficiently. That would be more complicated
then this change, because in this change the custom api can only search
enrich index shards and these are special because they always have a single
primary shard. If msearch api is to be improved then that should work for
any search request to any indices. Making the same optimization for
indices with more than 1 primary shard requires much more work.

The current change is isolated in the enrich plugin and LOC / complexity
is small. So this good enough for now.
2019-08-09 09:11:04 +02:00
Martijn van Groningen
e3fd1e6c7d
Add support for overwrite parameter in the enrich processor. (#45029)
Similar to how it is supported in the set processor:
https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html

Relates to #32789
2019-08-08 10:33:19 +02:00
James Baiera
480af1ccf2
Fix build errors (#44933)
Add EnrichPlugin to test cases that update cluster state
2019-07-29 14:17:44 +07:00
James Baiera
fda4db4fab fixup! Merge branch '7.x' into enrich-7.x 2019-07-25 15:28:40 -04:00
James Baiera
c357f81aa7 Add soft limit for max concurrent policy executions (#43117)
Adds a global soft limit on the number of concurrently executing enrich policies.
Since an enrich policy is run on the generic thread pool, this is meant to limit
policy runs separately from the generic thread pool capacity.
2019-07-23 16:03:14 -04:00
James Baiera
fc20264b99 Add Enrich index background task to cleanup old indices (#43746)
This PR adds a background maintenance task that is scheduled on the master node only.
The deletion of an index is based on if it is not linked to a policy or if the enrich alias is not
currently pointing at it. Synchronization has been added to make sure that no policy
executions are running at the time of cleanup, and if any executions do occur, the marking
process delays cleanup until next run.
2019-07-22 14:41:22 -04:00
James Baiera
7ad9beb087 Set auto expand replicas on enrich index after force merge is done. (#43600) 2019-07-12 11:56:56 -04:00
Michael Basnight
b4b2ad3593 Ensure enrich policy is immutable (#43604)
This commit ensures the policy cannot be overwritten. An error is thrown
if the policy exists. All tests have been updated accordingly.
2019-07-11 13:23:12 -05:00
Michael Basnight
d2c3f4bae9 Validate read priv of enrich source indices (#43595)
This commit adds permissions validation on the indices provided in the
enrich policy. These indices should be validated at store time so as not
to have cryptic error messages in the event the user does not have
permissions to access said indices.
2019-07-10 13:09:10 -05:00
Martijn van Groningen
9528c59fb3
added a basic test that enriching data works 2019-07-04 17:42:45 +02:00
Martijn van Groningen
7ba6e1752a
required changes after merge 2019-07-04 13:17:22 +02:00
Martijn van Groningen
397150fa1e
Add enrich coordinator proxy action (#43801)
Introduced proxy api the handle the search request load that originates
from enrich processor. The enrich processor can execute many search
requests that execute asynchronously in parallel and that can easily overwhelm
the search thread pool on nodes. In order to protect this the Coordinator
queues the search requests and only executes a fixed number of search requests
in parallel.

Besides this; the Coordinator tries to include as much as possible search requests
(up to a defined maximum) inside a multi search request in order to reduce the number
of remote api calls to be made from the node that performs ingestion.
2019-07-03 15:50:40 +02:00
Martijn van Groningen
785aedebad
Add restart node enrich tests. (#43579)
This test verifies that enrich policies still exist after a full
cluster restart. If EnrichPolicy is not registered as named xcontent
in EnrichPlugin class then this test fails.
2019-07-01 17:36:01 +02:00
Martijn van Groningen
237f2bd60a
Make ingest executing non blocking (#43361)
Added an additional method to the Processor interface to allow a
processor implementation to make a non blocking call.

Also added semaphore in order to avoid search thread pools from rejecting
search requests originating from the match processor. This is a temporary workaround.
2019-07-01 08:01:46 +02:00
Martijn van Groningen
d6a7fd9f30
unmuted test 2019-06-25 19:54:00 +02:00
James Baiera
1b902aa746 Make enrich processor use search action through a client (#43311)
Add client to processor parameters in the ingest service.
Remove the search provider function from the processor parameters.
ExactMatchProcessor and Factory converted to use client.
Remove test cases that are no longer applicable from processor.
2019-06-25 13:09:08 -04:00
Martijn van Groningen
36f0e8a8bb
Added multi node enrich tests and fixed serialization issues. (#43386)
The test for now tests the enrich APIs in a multi node environment.
Picked EsIntegTestCase test over a real qa module in order to avoid
adding another module that starts a test cluster.
2019-06-25 14:03:10 +02:00
James Baiera
c0d5ec87e1 Set enrich indices to be read only before swapping their aliases (#42874) 2019-06-24 15:14:11 -04:00
Michael Basnight
6945e5d5e6 Add role for enrich processor (#42677)
This commit adds the manage_enrich privilege, which grants access to all
of the enrich processor lifecycle actions. In addition this commit also
creates a role which grants access to the generated indices.

Relates #41939

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2019-06-24 10:47:01 -05:00
Martijn van Groningen
c8e6474eef
Changes required for merging in 7.x branch. 2019-06-13 16:58:27 +02:00
James Baiera
9d56a0365f Limit a enrich policy execution to only one at a time (#42535)
Add a keyed lock mechanism to the policy executor to ensure that an enrich policy
can only have one execution happening at a time.
2019-06-12 10:45:10 -04:00
James Baiera
415f1a484f Enrich validate nested mappings (#42452)
Ensures that fields retained in an enrich index from a source are not contained
under a nested field. It additionally makes sure that key fields exist, and that
value fields are checked if they are present. The policy runner test has also
been expanded with some faulty mapping test cases.
2019-06-03 16:02:31 -04:00
Michael Basnight
77eed9e6a0 Add enrich policy GET API (#41384)
This commit wires up the Rest calls and Transport calls for GET enrich
policy, as well as tests and rest spec additions.
2019-05-28 23:19:23 -05:00
Martijn van Groningen
484f5cee39
Stricter update dependency between pipelines and components used by pipelines (#42038)
Add support for components used by processor factories to get updated
before processor factories create new processor instances.

Components can register via `IngestService#addIngestClusterStateListener(...)`
then if the internal  representation of ingest pipelines get updated,
these components get  updated with the current cluster state before
pipelines are updated.

Registered EnrichProcessorFactory as ingest cluster state listener, so
that it has always an up to date view of the active enrich policies.
2019-05-28 09:04:46 +02:00
Martijn van Groningen
5901285773
Complete EnrichIT by using the execute enrich policy API (#42433) 2019-05-27 10:15:29 +02:00
Michael Basnight
2325ffb757 Add enrich policy execute API (#41762)
This commit wires up the Rest calls and Transport calls for execute
enrich policy, as well as tests and rest spec additions.
2019-05-24 09:39:29 -05:00
James Baiera
824ccfabd9 Backport 7.x - Add step to forcemerge enrich index after reindex (#41969)
Adds a step in the policy execution that forcemerge's a new enrich index after reindex completes.
2019-05-22 16:51:44 -04:00
Martijn van Groningen
9e514cb161
Remove schedule field from EnrichPolicy (#42143) 2019-05-22 17:13:54 +02:00
Martijn van Groningen
57a4614a7b
Keep track of the enrich key field in the enrich index. (#42022)
The enrich key field is being kept track in _meta field by the policy runner.
The ingest processor uses the field name defined in enrich index _meta field and
not in the policy. This will avoid problems if policy is changed without
a new enrich index being created.

This also complete decouples EnrichPolicy from ExactMatchProcessor.

The following scenario results in failure without this change:
1) Create policy
2) Execute policy
3) Create pipeline with enrich processor
4) Use pipeline
5) Update enrich key in policy
6) Use pipeline, which then fails.
2019-05-09 21:28:48 +02:00
Martijn van Groningen
299ff70bfe
Enrich store should only update the policies via an update task. (#41944) 2019-05-09 18:21:40 +02:00