37 Commits

Author SHA1 Message Date
Rory Hunter 80d925e225
Auto-format buildSrc (#51043)
Backport / reimplementation of #50786 on 7.x.

Opt `buildSrc` in to automatic formatting. This required a config tweak
in order to pick up all the Java sources, and as a result more files that
were previously missed are now found in the Enrich plugin.

I also moved the 2 Java files in `buildSrc/src/main/groovy` into the Java
directory, which required some follow-up changes.
2020-01-16 10:26:27 +00:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Martijn van Groningen a1dd830cb5
Re-enabled test with longer timeout waiting for monitoring.
See #48258
2019-11-11 16:07:50 +01:00
Martijn van Groningen c358ecb5fb
Don't preserve indices between enrich qa tests.
This was added because it was suspected to cause the monitoring
enrich verification to fail, but that is not the case.

See #48258
2019-10-31 14:23:56 +01:00
Martijn van Groningen 05324b7f03
Muted verifying monitoring integration in enrich integration test.
Relates to #48258
2019-10-24 08:39:53 +02:00
Martijn van Groningen c09b62d5bf
Backport: also validate source index at put enrich policy time (#48311)
Backport of: #48254

This changes tests to create a valid
source index prior to creating the enrich policy.
2019-10-22 07:38:16 +02:00
Martijn van Groningen 7fc9198d46
Change how `max_matches` affects `target_field` option. (#47982)
Prior to this change the `target_field` would always be a json array
field in the document being ingested. This was to take into account that
multiple enrich documents could be inserted into the `target_field`.

However, the default `max_matches` is `1`, meaning that by default
only a single enrich document would be added to the `target_field` json
array field.

This commit changes this: if `max_matches` is set to `1` then the single
document is added as a json object to the `target_field`, and
if it is configured to a higher value then the enrich documents are
added as a json array (even if only a single enrich document happens to
match).
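
The rule can be sketched roughly as follows. This is a minimal illustration in Python, not the processor's actual code: the function name `set_target_field`, the plain-dict document model, and the choice to leave the document untouched when nothing matched are all assumptions made for the example.

```python
def set_target_field(document, target_field, matches, max_matches=1):
    """Attach enrich documents to `document` under `target_field`.

    Hypothetical helper: with max_matches == 1 the single match is stored
    as an object; with a higher value a list is stored, even if only one
    enrich document matched.
    """
    if max_matches < 1:
        raise ValueError("max_matches must be at least 1")
    if not matches:
        # Assumption for this sketch: no match leaves the document as-is.
        return document
    selected = matches[:max_matches]
    if max_matches == 1:
        document[target_field] = selected[0]  # json object
    else:
        document[target_field] = selected     # json array
    return document
```

With one match, `max_matches=1` yields `{"user": {...}}` while `max_matches=3` yields `{"user": [{...}]}`.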
2019-10-14 21:09:48 +02:00
James Baiera 73263c654a Add basic task support for executing enrich policies (#47523)
Changes the execution logic to create a new task using the execute request,
and attaches the new task to the policy runner to be updated. Also, a new
response is now returned from the execute api, which contains either the task
id of the execution, or the completed status of the run. The fields are mutually
exclusive to make it easier to discern what type of response it is.
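
One way to picture the mutually exclusive fields is the sketch below. The class name `ExecutePolicyResponse`, the field names `task_id` and `status`, and the example values are hypothetical and only illustrate the "exactly one of the two" shape described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutePolicyResponse:
    """Hypothetical model of the execute response: task id XOR status."""

    task_id: Optional[str] = None  # set when the caller should follow a task
    status: Optional[str] = None   # set when the run has already completed

    def __post_init__(self):
        # Exactly one of the two fields may be present.
        if (self.task_id is None) == (self.status is None):
            raise ValueError("exactly one of task_id or status must be set")

    @property
    def is_task(self) -> bool:
        """True when the response carries a task id rather than a status."""
        return self.task_id is not None
```

A caller can then branch on `is_task` instead of probing both fields.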
2019-10-11 13:32:06 -04:00
Martijn van Groningen 8b7100eb1f
Don't remove indices, to prevent monitoring from intermittently failing
to index monitoring docs.
2019-10-08 17:10:42 +02:00
Martijn van Groningen fe937ea4b8
Add config namespace in get policy api response (#47162)
Currently the policy config is placed directly in the json object
of the toplevel `policies` array field. For example:

```
{
    "policies": [
        {
            "match": {
                "name" : "my-policy",
                "indices" : ["users"],
                "match_field" : "email",
                "enrich_fields" : [
                    "first_name",
                    "last_name",
                    "city",
                    "zip",
                    "state"
                ]
            }
        }
    ]
}
```

This change adds a `config` field in each policy json object:

```
{
    "policies": [
        {
            "config": {
                "match": {
                    "name" : "my-policy",
                    "indices" : ["users"],
                    "match_field" : "email",
                    "enrich_fields" : [
                        "first_name",
                        "last_name",
                        "city",
                        "zip",
                        "state"
                    ]
                }
            }
        }
    ]
}
```

This allows us in the future to add other information about policies
in the get policy api response.

The UI will consume this API to build an overview of all policies.
The UI may in the future include additional information about a policy
and the plan is to include that in the get policy api, so that this
information can be gathered in a single api call.

An example of the information that is likely to be added is:
* Last policy execution time
* The status of a policy (executing, executed, unexecuted)
* Information about the last failure if exists
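
A client reading the new response shape might look roughly like this. It is only a sketch: the helper name `policy_configs` is hypothetical, and it assumes each `config` object holds exactly one key, the policy type, as in the example above.

```python
def policy_configs(response):
    """Return (policy_type, settings) pairs from a get policy response.

    Assumes the new shape: each entry in `policies` has a `config`
    object whose single key is the policy type.
    """
    pairs = []
    for entry in response.get("policies", []):
        config = entry["config"]
        if len(config) != 1:
            raise ValueError("expected exactly one policy type under 'config'")
        policy_type, settings = next(iter(config.items()))
        pairs.append((policy_type, settings))
    return pairs
```

Because the policy settings live under `config`, any sibling keys added to an entry later are simply ignored by this reader.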
2019-09-30 14:37:23 +02:00
Martijn van Groningen bb3e9cb908
fixed checkstyle violation 2019-09-30 08:42:51 +02:00
Martijn van Groningen 1c3d5b77b5
give monitoring more time 2019-09-30 08:04:29 +02:00
Martijn van Groningen 8a4eefdd83
Expose enrich stats api to monitoring. (#46708)
This change also slightly modifies the stats response,
so that it can be consumed more easily by monitoring and other
users. (Coordinator stats are now in a list instead of
a map and have an additional field for the node id.)

Relates to #32789
2019-09-26 11:04:33 +02:00
Martijn van Groningen a4b0f66919
Add enrich stats api (#46462)
The enrich stats api returns enrich coordinator stats and
information about currently executing enrich policies.

The coordinator stats include per ingest node:
* The current number of search requests in the queue.
* The total number of outstanding remote requests that
  have been executed since node startup. Each remote
  request is likely to include multiple search requests.
  This depends on how many search requests are in the
  queue at the time when the remote request is performed.
* The number of current outstanding remote requests.
* The total number of search requests that `enrich`
  processors have executed since node startup.

The current execution policies stats include:
* The name of the policy that is executing.
* A full-blown task info object for the task that is executing the policy.

Relates to #32789
2019-09-11 13:40:24 +02:00
Martijn van Groningen c79a8e448d
Convert enrich qa modules to use testclusters. 2019-09-11 11:40:18 +02:00
Martijn van Groningen ded98e50b7
Change exact match processor to match processor. (#46041)
Besides the rename, this change allows the processor to attach multiple
enrich docs to the document being ingested.

Also in order to control the maximum number of enrich docs to be
included in the document being ingested, the `max_matches` setting
is added to the enrich processor.

Relates #32789
2019-09-04 18:05:12 +02:00
Martijn van Groningen cb42e19a32
Change how type is stored in an enrich policy. (#45789)
A policy type controls how the enrich index is created and
the query executed against the match field. Currently there
is a single policy type (`exact_match`). In the near future
more policy types will be added and different policies may have
different configuration options.

For this reason type should be a json object instead of a string field:

```
{
   "exact_match": {
      ...
   }
}
```

instead of:

```
{
  "type": "exact_match",
  ...
}
```

This will make streaming parsing of enrich policies easier, as in the
new format the parsing code can know ahead of time which configuration
fields to expect. In the latter format that is not possible if the type
field does not appear as the first field.
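
The benefit can be sketched as follows: with the type as the outer key, a parser knows the type before it reads any setting and can check fields against that type right away. The names `KNOWN_FIELDS` and `parse_policy`, and the field set listed for `exact_match`, are assumptions for illustration only.

```python
# Hypothetical per-type field sets; the real options are defined by the plugin.
KNOWN_FIELDS = {
    "exact_match": {"indices", "match_field", "enrich_fields"},
}


def parse_policy(policy):
    """Parse a policy whose single outer key is the policy type."""
    if len(policy) != 1:
        raise ValueError("a policy must have exactly one type key")
    policy_type, settings = next(iter(policy.items()))
    if policy_type not in KNOWN_FIELDS:
        raise ValueError(f"unknown policy type: {policy_type}")
    # The type is already known here, so every field can be validated
    # against it without buffering fields that came before a "type" entry.
    unexpected = set(settings) - KNOWN_FIELDS[policy_type]
    if unexpected:
        raise ValueError(
            f"unexpected fields for {policy_type}: {sorted(unexpected)}"
        )
    return policy_type, settings
```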

Relates to #32789
2019-08-23 13:43:38 +02:00
Martijn van Groningen 33972423e9
Enrich processor configuration changes (#45466)
Enrich processor configuration changes:
* Renamed `enrich_key` option to `field` option.
* Replaced `set_from` and `targets` options with `target_field`.

The `target_field` option behaves differently from how `set_from` and
`targets` worked. The `target_field` is the field that will contain
the looked up document.

Relates to #32789
2019-08-22 09:49:22 +02:00
Michael Basnight e3373d349b Consolidate enrich list all and get by name APIs (#45705)
The get and list APIs are a single API in this commit. Whether
requesting one named policy or all policies, a list of policies is
returned. The list API code has all been removed and the GET api is
what remains, which contains much of the list response code.
2019-08-20 10:29:59 -05:00
Michael Basnight 03f45dad57
Fix policy removal bug in delete policy (#45573)
The delete policy had a subtle bug in that it would still delete the
policy if pipelines were accessing it, after giving the client back an
error. This commit fixes that and ensures it does not happen by adding
verification in the test.
2019-08-15 13:20:59 +02:00
Michael Basnight fd57d3cb29 Fix test broken by policy rename 2019-08-14 13:57:47 -05:00
Michael Basnight 52a094b177 Fail delete policy if pipeline exists (#44438)
If a pipeline that references the policy exists, we should not allow the
policy to be deleted. The user will need to remove the processor from
the pipeline before deleting the policy. This commit adds a check to
ensure that the policy cannot be deleted if it is referenced by any
pipeline in the system.
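
The check can be sketched like this. It is a simplified model using plain dicts, not the transport action itself; `PolicyInUseError`, `pipelines_referencing` and `delete_policy` are hypothetical names, and pipelines are assumed to reference a policy through an `enrich` processor's `policy_name` option.

```python
class PolicyInUseError(Exception):
    """Raised when a policy is still referenced by at least one pipeline."""


def pipelines_referencing(pipelines, policy_name):
    """Return the names of pipelines with an enrich processor using the policy."""
    names = []
    for name, pipeline in pipelines.items():
        for processor in pipeline.get("processors", []):
            enrich = processor.get("enrich")
            if enrich is not None and enrich.get("policy_name") == policy_name:
                names.append(name)
                break
    return names


def delete_policy(policies, pipelines, policy_name):
    """Delete the policy only if no pipeline references it."""
    in_use = pipelines_referencing(pipelines, policy_name)
    if in_use:
        # Fail before touching the policy store, so an error never
        # coincides with a deletion.
        raise PolicyInUseError(f"policy {policy_name} is used by {in_use}")
    del policies[policy_name]
```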
2019-08-14 13:51:10 -05:00
Martijn van Groningen 43b8ab607d
Improve naming of enrich policy fields. (#45494)
Renamed `enrich_key` to `match_field` and
renamed `enrich_values` to `enrich_fields`.

Relates #32789
2019-08-14 11:45:22 +02:00
Martijn van Groningen 4ac25b23f6
Add support for a more compact enrich values format (#45033)
In the case that source and target are the same in `enrich_values` then
a string array can be specified.

For example instead of this:

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "enrich_values": [
                    {
                        "source": "first_name",
                        "target": "first_name"
                    },
                    {
                        "source": "last_name",
                        "target": "last_name"
                    },
                    {
                        "source": "address",
                        "target": "address"
                    },
                    {
                        "source": "city",
                        "target": "city"
                    },
                    {
                        "source": "state",
                        "target": "state"
                    },
                    {
                        "source": "zip",
                        "target": "zip"
                    }
                ]
            }
        }
    ]
}
```

This more compact format can be specified:

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "targets": [
                   "first_name",
                   "last_name",
                   "address",
                   "city",
                   "state",
                   "zip"
                ]
            }
        }
    ]
}
```

And the `enrich_values` key has been renamed to `set_from`.
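
The relationship between the two formats can be sketched as a small normalization step. The helper name `expand_enrich_fields` is hypothetical, and accepting a mix of strings and source/target objects in one list is an assumption of this sketch rather than something stated here.

```python
def expand_enrich_fields(values):
    """Expand compact string entries into explicit source/target objects."""
    expanded = []
    for value in values:
        if isinstance(value, str):
            # Compact form: source and target are the same field name.
            expanded.append({"source": value, "target": value})
        else:
            expanded.append(
                {"source": value["source"], "target": value["target"]}
            )
    return expanded
```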

Relates to #32789
2019-08-09 12:40:58 +02:00
Michael Basnight b4b2ad3593 Ensure enrich policy is immutable (#43604)
This commit ensures the policy cannot be overwritten. An error is thrown
if the policy exists. All tests have been updated accordingly.
2019-07-11 13:23:12 -05:00
Michael Basnight d2c3f4bae9 Validate read priv of enrich source indices (#43595)
This commit adds permissions validation on the indices provided in the
enrich policy. These indices should be validated at store time so as not
to have cryptic error messages in the event the user does not have
permissions to access said indices.
2019-07-10 13:09:10 -05:00
Michael Basnight 6945e5d5e6 Add role for enrich processor (#42677)
This commit adds the manage_enrich privilege, which grants access to all
of the enrich processor lifecycle actions. In addition this commit also
creates a role which grants access to the generated indices.

Relates #41939

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2019-06-24 10:47:01 -05:00
Michael Basnight 77eed9e6a0 Add enrich policy GET API (#41384)
This commit wires up the Rest calls and Transport calls for GET enrich
policy, as well as tests and rest spec additions.
2019-05-28 23:19:23 -05:00
Martijn van Groningen 5901285773
Complete EnrichIT by using the execute enrich policy API (#42433) 2019-05-27 10:15:29 +02:00
Michael Basnight 2325ffb757 Add enrich policy execute API (#41762)
This commit wires up the Rest calls and Transport calls for execute
enrich policy, as well as tests and rest spec additions.
2019-05-24 09:39:29 -05:00
Martijn van Groningen 9e514cb161
Remove schedule field from EnrichPolicy (#42143) 2019-05-22 17:13:54 +02:00
Martijn van Groningen 57a4614a7b
Keep track of the enrich key field in the enrich index. (#42022)
The policy runner keeps track of the enrich key field in the `_meta` field
of the enrich index. The ingest processor uses the field name defined in the
enrich index `_meta` field and not the one in the policy. This avoids problems
if the policy is changed without a new enrich index being created.

This also completely decouples EnrichPolicy from ExactMatchProcessor.

The following scenario results in failure without this change:
1) Create policy
2) Execute policy
3) Create pipeline with enrich processor
4) Use pipeline
5) Update enrich key in policy
6) Use pipeline, which then fails.
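
The idea can be sketched with plain dicts. This is not the policy runner's code: `run_policy`, `lookup`, the `enrich_key_field` entry under `_meta`, and the in-memory `docs` map are hypothetical stand-ins for the real index and its metadata.

```python
def run_policy(policy, source_docs):
    """Build a simulated enrich index and record the key field in _meta."""
    key_field = policy["enrich_key"]
    return {
        "_meta": {"enrich_key_field": key_field},
        "docs": {
            doc[key_field]: doc for doc in source_docs if key_field in doc
        },
    }


def lookup(enrich_index, value):
    """Look up a value using the key field stored with the index itself."""
    key_field = enrich_index["_meta"]["enrich_key_field"]
    return key_field, enrich_index["docs"].get(value)
```

Since `lookup` never consults the policy, changing the policy's key after the index was built (step 5) does not affect lookups against the existing index.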
2019-05-09 21:28:48 +02:00
Martijn van Groningen d709b8bb97
Rename enrich policy index_pattern field to indices. (#41836)
Relates to #32789
2019-05-07 09:08:28 +02:00
Michael Basnight 5d53706310 Add enrich policy DELETE API (#41495)
This commit wires up the Rest calls and Transport calls for DELETE enrich
policy, as well as tests and rest spec additions.
2019-05-02 11:02:49 -05:00
Michael Basnight 2978ac3061 Add enrich policy list API (#41553)
This commit wires up the Rest calls and Transport calls for listing all
enrich policies, as well as tests and rest spec additions.
2019-05-02 11:01:26 -05:00
Martijn van Groningen 8838bcc776
Add enrich processor (#41532)
The enrich processor performs a lookup in a locally allocated
enrich index shard using a field value from the document being enriched.
If there is a match then the _source of the enrich document is fetched.
The document being enriched then gets the decorate values from the
enrich document based on the configured decorate fields in the pipeline.

Note that the usage of the _source field is temporary until the enrich
source field that is part of #41521 is merged into the enrich branch.
Using the _source field involves significant decompression, which is not
desired for enrich use cases.

The policy specifies which field in the enrich index
to query and which fields are available for decorating the document
being enriched.

The enrich processor has the following configuration options:
* `policy_name` - the name of the policy this processor should use
* `enrich_key` - the field in the document being enriched that holds the lookup value
* `ignore_missing` - Whether to allow the key field to be missing
* `enrich_values` - a list of fields to decorate the document being enriched with.
                    Each entry holds a source field and a target field.
                    The source field indicates what decorate field to use that is available in the policy.
                    The target field controls the field name to use in the document being enriched.
                    The source and target fields can be the same.

Example pipeline config:

```
{
   "processors": [
      {
         "policy_name": "my_policy",
         "enrich_key": "host_name",
         "enrich_values": [
            {
              "source": "globalRank",
              "target": "global_rank"
            }
         ]
      }
   ]
}
```

In the above example documents are being enriched with a global rank value.
For each document that has a match in the enrich index based on its host_name field,
the document gets a global rank field value, which is fetched from the `globalRank`
field in the enrich index and saved as `global_rank` in the document being enriched.
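
The behaviour described above can be sketched as follows. This is a simplified model, not the processor implementation: the function name `enrich` is hypothetical, the enrich index is assumed to be a dict keyed by lookup value, and a document without a match is assumed to pass through unchanged.

```python
def enrich(document, enrich_index, enrich_key, enrich_values,
           ignore_missing=False):
    """Copy configured fields from the matching enrich document."""
    if enrich_key not in document:
        if ignore_missing:
            return document
        raise KeyError(f"field [{enrich_key}] is missing")
    match = enrich_index.get(document[enrich_key])
    if match is None:
        # Assumption for this sketch: no match means no change.
        return document
    for mapping in enrich_values:
        source, target = mapping["source"], mapping["target"]
        if source in match:
            document[target] = match[source]
    return document
```

With `enrich_key` set to `host_name` and a single `globalRank` to `global_rank` mapping, a document whose `host_name` is found in the index gains a `global_rank` field.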

This PR is part one of #41521
2019-04-30 20:51:13 +02:00
Martijn van Groningen 6af17e4bdf
Add enrich qa module for rest tests and (#41568)
move put policy api yaml test to this rest module.

The main benefit is that all tests will then be run when running:
`./gradlew -p x-pack/plugin/enrich check`

The rest qa module starts a node with default distribution and basic
license.

This qa module will also be used for adding different rest tests (not yaml),
for example rest tests needed for #41532

Also, when we work on security integration we can
add a security qa module under the qa folder. At some point
we should also add a multi node qa module.
2019-04-26 20:20:02 +02:00