OpenSearch

Commit Graph

Author	SHA1	Message	Date
Martijn van Groningen	ded98e50b7	Change exact match processor to match processor. (#46041 ) Besides a rename, this change allows the processor to attach multiple enrich docs to the document being ingested. Also, in order to control the maximum number of enrich docs to be included in the document being ingested, the `max_matches` setting is added to the enrich processor. Relates #32789	2019-09-04 18:05:12 +02:00
Martijn van Groningen	6bec63fdfa	removed redundant cast	2019-09-04 11:18:31 +02:00
Michael Basnight	51a703da29	Add enrich transport client support (#46002 ) This commit adds an enrich client, as well as a smoke test to validate the client works.	2019-08-29 09:10:07 -05:00
Michael Basnight	a82d24b3ce	Remove enrich indices on delete policy (#45870 ) When a policy is deleted, the enrich indices that are backing the policy alias should also be deleted. This commit does that work and cleans up the transport action a bit so that the lock release is easier to see, as well as to ensure that any action carried out, regardless of exception, unlocks the policy.	2019-08-23 15:26:43 -05:00
Martijn van Groningen	a38e6850a5	fixed errors after cherry-picking 2 commits	2019-08-23 13:51:00 +02:00
Martijn van Groningen	6067065ed6	Decouple enrich processor factory from enrich policy (#45826 ) This commit changes the enrich processor factory to read the required configuration from the current enrich index (from meta mapping field) in order to create the processor. Before this change the required config was read from the enrich policy in the cluster state. Enrich policies are going to be stored in an index (instead of the cluster state). In a processor factory there isn't a way to load something from an index, so with this change we read the required config / info from the enrich index (which is derived from the enrich policy), which then allows us to move enrich policies to an index. With this change it is required to execute a policy before creating a pipeline. Otherwise there is no enrich index and then there is no way to validate that a policy exists or retrieve its type and match field. Relates to #32789	2019-08-23 13:46:39 +02:00
Martijn van Groningen	33972423e9	Enrich processor configuration changes (#45466 ) Enrich processor configuration changes: * Renamed `enrich_key` option to `field` option. * Replaced `set_from` and `targets` options with `target_field`. The `target_field` option behaves differently from how `set_from` and `targets` worked. The `target_field` is the field that will contain the looked up document. Relates to #32789	2019-08-22 09:49:22 +02:00
Martijn van Groningen	5864f30771	ensure that the items in the bulk response are the same as is in the bulk request	2019-08-21 10:07:02 +02:00
Martijn van Groningen	ac7173c0d4	Renamed CoordinatorProxyAction to EnrichCoordinatorProxyAction and (#45663 ) fail if query shard context needs current time (certain queries / scripts use this, but in the enrich context this is not used).	2019-08-20 18:51:47 +02:00
Michael Basnight	e3373d349b	Consolidate enrich list all and get by name APIs (#45705 ) The get and list APIs are a single API in this commit. Whether requesting one named policy or all policies, a list of policies is returned. The list API code has all been removed and the GET api is what remains, which contains much of the list response code.	2019-08-20 10:29:59 -05:00
Michael Basnight	db57d2206a	Prevent delete policy for active executing policy (#45472 ) This commit adds a lock to the delete policy, in the same way that the locking is done for policy execution. It also creates a test to exercise the delete transport action, and modifies an existing test to provide a common set of functions for saving and deleting policies.	2019-08-15 10:08:11 -05:00
Michael Basnight	03f45dad57	Fix policy removal bug in delete policy (#45573 ) The delete policy had a subtle bug in that it would still delete the policy if pipelines were accessing it, after giving the client back an error. This commit fixes that and ensures it does not happen by adding verification in the test.	2019-08-15 13:20:59 +02:00
Michael Basnight	52a094b177	Fail delete policy if pipeline exists (#44438 ) If a pipeline that references the policy exists, we should not allow the policy to be deleted. The user will need to remove the processor from the pipeline before deleting the policy. This commit adds a check to ensure that the policy cannot be deleted if it is referenced by any pipeline in the system.	2019-08-14 13:51:10 -05:00
Martijn van Groningen	43b8ab607d	Improve naming of enrich policy fields. (#45494 ) Renamed `enrich_key` to `match_field` and renamed `enrich_values` to `enrich_fields`. Relates #32789	2019-08-14 11:45:22 +02:00
Martijn van Groningen	452557cf2e	Validate policy name like an index name. (#45452 ) The policy name is used to generate the enrich index name. For this reason, a policy name should be validated in the same way as index names. Relates to #32789	2019-08-13 20:25:17 +02:00
Martijn van Groningen	0353eb9291	required changes after merging in upstream branch	2019-08-13 09:17:57 +02:00
Martijn van Groningen	4ac25b23f6	Add support for a more compact enrich values format (#45033 ) In the case that source and target are the same in `enrich_values` then a string array can be specified. For example instead of this: ``` PUT /_ingest/pipeline/my-pipeline { "processors": [ { "enrich" : { "policy_name": "my-policy", "enrich_values": [ { "source": "first_name", "target": "first_name" }, { "source": "last_name", "target": "last_name" }, { "source": "address", "target": "address" }, { "source": "city", "target": "city" }, { "source": "state", "target": "state" }, { "source": "zip", "target": "zip" } ] } } ] } ``` This more compact format can be specified: ``` PUT /_ingest/pipeline/my-pipeline { "processors": [ { "enrich" : { "policy_name": "my-policy", "targets": [ "first_name", "last_name", "address", "city", "state", "zip" ] } } ] } ``` And the `enrich_values` key has been renamed to `set_from`. Relates to #32789	2019-08-09 12:40:58 +02:00
Martijn van Groningen	f1ee29f22e	Added a custom api to perform the msearch more efficiently for enrich processor (#43965 ) Currently the msearch api is used to execute buffered search requests; however the msearch api doesn't deal with search requests in an intelligent way. It basically executes each search separately in a concurrent manner. This api reuses the msearch request and response classes and executes the searches as one request in the node holding the enrich index shard. Things like engine.searcher and query shard context are only created once. Also there are fewer layers than executing a regular msearch request. This results in an interesting speedup. Without this change, in a single node cluster, enriching documents with a bulk size of 5000 items, the ingest time in each bulk response varied from 174ms to 822ms. With this change the ingest time in each bulk response varied from 54ms to 109ms. I think we should add a change like this based on this improvement in ingest time. However I do wonder if instead of doing this change, we should improve the msearch api to execute more efficiently. That would be more complicated than this change, because in this change the custom api can only search enrich index shards and these are special because they always have a single primary shard. If msearch api is to be improved then that should work for any search request to any indices. Making the same optimization for indices with more than 1 primary shard requires much more work. The current change is isolated in the enrich plugin and LOC / complexity is small. So this is good enough for now.	2019-08-09 09:11:04 +02:00
Martijn van Groningen	e3fd1e6c7d	Add support for overwrite parameter in the enrich processor. (#45029 ) Similar to how it is supported in the set processor: https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html Relates to #32789	2019-08-08 10:33:19 +02:00
James Baiera	480af1ccf2	Fix build errors (#44933 ) Add EnrichPlugin to test cases that update cluster state	2019-07-29 14:17:44 +07:00
James Baiera	fda4db4fab	fixup! Merge branch '7.x' into enrich-7.x	2019-07-25 15:28:40 -04:00
James Baiera	c357f81aa7	Add soft limit for max concurrent policy executions (#43117 ) Adds a global soft limit on the number of concurrently executing enrich policies. Since an enrich policy is run on the generic thread pool, this is meant to limit policy runs separately from the generic thread pool capacity.	2019-07-23 16:03:14 -04:00
James Baiera	fc20264b99	Add Enrich index background task to cleanup old indices (#43746 ) This PR adds a background maintenance task that is scheduled on the master node only. The deletion of an index is based on if it is not linked to a policy or if the enrich alias is not currently pointing at it. Synchronization has been added to make sure that no policy executions are running at the time of cleanup, and if any executions do occur, the marking process delays cleanup until next run.	2019-07-22 14:41:22 -04:00
James Baiera	7ad9beb087	Set auto expand replicas on enrich index after force merge is done. (#43600 )	2019-07-12 11:56:56 -04:00
Michael Basnight	b4b2ad3593	Ensure enrich policy is immutable (#43604 ) This commit ensures the policy cannot be overwritten. An error is thrown if the policy exists. All tests have been updated accordingly.	2019-07-11 13:23:12 -05:00
Michael Basnight	d2c3f4bae9	Validate read priv of enrich source indices (#43595 ) This commit adds permissions validation on the indices provided in the enrich policy. These indices should be validated at store time so as not to have cryptic error messages in the event the user does not have permissions to access said indices.	2019-07-10 13:09:10 -05:00
Martijn van Groningen	9528c59fb3	added a basic test that enriching data works	2019-07-04 17:42:45 +02:00
Martijn van Groningen	7ba6e1752a	required changes after merge	2019-07-04 13:17:22 +02:00
Martijn van Groningen	397150fa1e	Add enrich coordinator proxy action (#43801 ) Introduced proxy api to handle the search request load that originates from enrich processor. The enrich processor can execute many search requests that execute asynchronously in parallel and that can easily overwhelm the search thread pool on nodes. To protect against this, the Coordinator queues the search requests and only executes a fixed number of search requests in parallel. Besides this, the Coordinator tries to include as many search requests as possible (up to a defined maximum) inside a multi search request in order to reduce the number of remote api calls to be made from the node that performs ingestion.	2019-07-03 15:50:40 +02:00
Martijn van Groningen	785aedebad	Add restart node enrich tests. (#43579 ) This test verifies that enrich policies still exist after a full cluster restart. If EnrichPolicy is not registered as named xcontent in EnrichPlugin class then this test fails.	2019-07-01 17:36:01 +02:00
Martijn van Groningen	237f2bd60a	Make ingest executing non blocking (#43361 ) Added an additional method to the Processor interface to allow a processor implementation to make a non blocking call. Also added semaphore in order to avoid search thread pools from rejecting search requests originating from the match processor. This is a temporary workaround.	2019-07-01 08:01:46 +02:00
Martijn van Groningen	d6a7fd9f30	unmuted test	2019-06-25 19:54:00 +02:00
James Baiera	1b902aa746	Make enrich processor use search action through a client (#43311 ) Add client to processor parameters in the ingest service. Remove the search provider function from the processor parameters. ExactMatchProcessor and Factory converted to use client. Remove test cases that are no longer applicable from processor.	2019-06-25 13:09:08 -04:00
Martijn van Groningen	36f0e8a8bb	Added multi node enrich tests and fixed serialization issues. (#43386 ) The test for now tests the enrich APIs in a multi node environment. Picked EsIntegTestCase test over a real qa module in order to avoid adding another module that starts a test cluster.	2019-06-25 14:03:10 +02:00
James Baiera	c0d5ec87e1	Set enrich indices to be read only before swapping their aliases (#42874 )	2019-06-24 15:14:11 -04:00
James Baiera	9d56a0365f	Limit an enrich policy execution to only one at a time (#42535 ) Add a keyed lock mechanism to the policy executor to ensure that an enrich policy can only have one execution happening at a time.	2019-06-12 10:45:10 -04:00
James Baiera	415f1a484f	Enrich validate nested mappings (#42452 ) Ensures that fields retained in an enrich index from a source are not contained under a nested field. It additionally makes sure that key fields exist, and that value fields are checked if they are present. The policy runner test has also been expanded with some faulty mapping test cases.	2019-06-03 16:02:31 -04:00
Michael Basnight	77eed9e6a0	Add enrich policy GET API (#41384 ) This commit wires up the Rest calls and Transport calls for GET enrich policy, as well as tests and rest spec additions.	2019-05-28 23:19:23 -05:00
Martijn van Groningen	484f5cee39	Stricter update dependency between pipelines and components used by pipelines (#42038 ) Add support for components used by processor factories to get updated before processor factories create new processor instances. Components can register via `IngestService#addIngestClusterStateListener(...)` then if the internal representation of ingest pipelines get updated, these components get updated with the current cluster state before pipelines are updated. Registered EnrichProcessorFactory as ingest cluster state listener, so that it has always an up to date view of the active enrich policies.	2019-05-28 09:04:46 +02:00
Michael Basnight	2325ffb757	Add enrich policy execute API (#41762 ) This commit wires up the Rest calls and Transport calls for execute enrich policy, as well as tests and rest spec additions.	2019-05-24 09:39:29 -05:00
James Baiera	824ccfabd9	Backport 7.x - Add step to forcemerge enrich index after reindex (#41969 ) Adds a step in the policy execution that forcemerge's a new enrich index after reindex completes.	2019-05-22 16:51:44 -04:00
Martijn van Groningen	9e514cb161	Remove schedule field from EnrichPolicy (#42143 )	2019-05-22 17:13:54 +02:00
Martijn van Groningen	57a4614a7b	Keep track of the enrich key field in the enrich index. (#42022 ) The enrich key field is tracked in the _meta field by the policy runner. The ingest processor uses the field name defined in enrich index _meta field and not in the policy. This will avoid problems if policy is changed without a new enrich index being created. This also completely decouples EnrichPolicy from ExactMatchProcessor. The following scenario results in failure without this change: 1) Create policy 2) Execute policy 3) Create pipeline with enrich processor 4) Use pipeline 5) Update enrich key in policy 6) Use pipeline, which then fails.	2019-05-09 21:28:48 +02:00
Martijn van Groningen	299ff70bfe	Enrich store should only update the policies via an update task. (#41944 )	2019-05-09 18:21:40 +02:00
Martijn van Groningen	1b00e7f834	Change the reindex fetch in policy runner from 1000 to 10000 and (#41838 ) Reindex uses scroll searches to read the source data. It is more efficient to read more data in one search scroll round than several. I think 10000 is a good sweet spot. Relates to #32789	2019-05-07 13:00:40 +02:00
Martijn van Groningen	d709b8bb97	Rename enrich policy index_pattern field to indices. (#41836 ) Relates to #32789	2019-05-07 09:08:28 +02:00
Martijn van Groningen	f366f56f00	Change policy runner to use helper method on EnrichPolicy instead of (#41839 ) its own helper method to determine alias / policy base name. This way both the enrich processor and policy runner use the same logic to determine the alias to use. Relates to #32789	2019-05-07 08:55:39 +02:00
James Baiera	c25736c410	Backport 7.x - Add enrich policy runner (#41088 ) (#41759 ) Backports #41088 Adds the foundation of the execution logic to execute an enrich policy. Validates the source index existence as well as mappings, creates a new enrich index for the policy, reindexes the source index into the new enrich index, and swaps the enrich alias for the policy to the new index.	2019-05-06 10:32:46 -04:00
Michael Basnight	5d53706310	Add enrich policy DELETE API (#41495 ) This commit wires up the Rest calls and Transport calls for DELETE enrich policy, as well as tests and rest spec additions.	2019-05-02 11:02:49 -05:00
Michael Basnight	2978ac3061	Add enrich policy list API (#41553 ) This commit wires up the Rest calls and Transport calls for listing all enrich policies, as well as tests and rest spec additions.	2019-05-02 11:01:26 -05:00
Martijn van Groningen	8838bcc776	Add enrich processor (#41532 ) The enrich processor performs a lookup in a locally allocated enrich index shard using a field value from the document being enriched. If there is a match then the _source of the enrich document is fetched. The document being enriched then gets the decorate values from the enrich document based on the configured decorate fields in the pipeline. Note that the usage of the _source field is temporary until the enrich source field that is part of #41521 is merged into the enrich branch. Using the _source field involves significant decompression which is not desired for enrich use cases. The policy contains the information what field in the enrich index to query and what fields are available to decorate a document being enriched with. The enrich processor has the following configuration options: * `policy_name` - the name of the policy this processor should use * `enrich_key` - the field in the document being enriched that holds the lookup value * `ignore_missing` - Whether to allow the key field to be missing * `enrich_values` - a list of fields to decorate the document being enriched with. Each entry holds a source field and a target field. The source field indicates what decorate field to use that is available in the policy. The target field controls the field name to use in the document being enriched. The source and target fields can be the same. Example pipeline config: ``` { "processors": [ { "policy_name": "my_policy", "enrich_key": "host_name", "enrich_values": [ { "source": "globalRank", "target": "global_rank" } ] } ] } ``` In the above example documents are being enriched with a global rank value. For each document that has a match in the enrich index based on its host_name field, the document gets a global rank field value, which is fetched from the `globalRank` field in the enrich index and saved as `global_rank` in the document being enriched. This PR is part one of #41521	2019-04-30 20:51:13 +02:00
Michael Basnight	fad45ea6bd	Add enrich policy PUT API (#41383 ) This commit wires up the Rest calls and Transport calls for PUT enrich policy, as well as tests and rest spec additions.	2019-04-25 15:15:25 -05:00
Michael Basnight	85c4cc7f4b	Refactor the enrich store to remove it from guice (#41421 ) There is no need to create an enrich store component for the transport layer since the inner components of the store are either present in the master node calls or via an already injected ClusterService. This commit cleans up the class, adds the forthcoming delete call and tests the new code.	2019-04-23 13:28:48 -05:00
Martijn van Groningen	d99249c624	Move the policy class to xpack core module. (#41311 ) This allows the transport client to use this class in enrich APIs. Relates to #40997	2019-04-18 09:02:43 +02:00
Martijn van Groningen	d01c1f3ba0	Added enrich policy definition. (#41003 ) Relates to #32789	2019-04-12 11:15:02 +02:00
Martijn van Groningen	e6bdfea474	first commit	2019-04-05 09:32:40 +02:00

106 Commits