OpenSearch

Commit Graph

Author	SHA1	Message	Date
István Zoltán Szabó	487b273286	[DOCS] Adds feature importance mapping subsection to inference processor docs (#54190 )	2020-03-26 09:26:50 +01:00
Dan Hermann	94ac979c66	Support array for all string ingest processors (#53694 )	2020-03-18 07:07:49 -05:00
Benjamin Trent	4e43ede735	[ML] renaming inference processor field field_mappings to new name field_map (#53433 ) (#53502 ) This renames the `inference` processor configuration field `field_mappings` to `field_map`. `field_mappings` is now deprecated.	2020-03-13 15:40:57 -04:00
Benjamin Trent	89668c5ea0	[ML][Inference] adds new default_field_map field to trained models (#53294 ) (#53419 ) Adds a new `default_field_map` field to trained model config objects. This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data. The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.	2020-03-11 13:49:39 -04:00
David Pilato	6c6ab8fa47	[DOCS] Fix typo in CSV processor docs (#52649 ) Corrects an example array in a snippet of the CSV processor docs.	2020-02-25 08:48:50 -05:00
Benjamin Trent	afd90647c9	[ML] Adds feature importance to option to inference processor (#52218 ) (#52666 ) This adds machine learning model feature importance calculations to the inference processor. The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values` Example: ``` "inference": { "field_mappings": {}, "model_id": "my_model", "inference_config": { "regression": { "num_top_feature_importance_values": 3 } } } ``` This will write to the document as follows: ``` "inference" : { "feature_importance" : { "FlightTimeMin" : -76.90955548511226, "FlightDelayType" : 114.13514762158526, "DistanceMiles" : 13.731580450792187 }, "predicted_value" : 108.33165831875137, "model_id" : "my_model" } ``` This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded. NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc usability blocked by: https://github.com/elastic/ml-cpp/pull/991	2020-02-21 18:42:31 -05:00
Yang Wang	16ba59e9d1	Expose more authentication info to ingest pipeline (#51305 ) (#52119 ) The changes add more granularity for identifying the data ingestion user. The ingest pipeline can now be configured to record authentication realm and type. It can also record API key name and ID when one is in use. This improves traceability when data are being ingested from multiple agents and will become more relevant with the incoming support of required pipelines (#46847) Resolves: #49106	2020-02-11 23:05:01 +11:00
Przemko Robakowski	6332de40b4	Add empty_value parameter to CSV processor (#51567 ) (#51966 ) * Add empty_value parameter to CSV processor This change adds the `empty_value` parameter to the CSV processor. This value is used to fill empty fields. Fields will be skipped if this parameter is omitted. This behavior is the same for both quoted and unquoted fields. * docs updated * Fix compilation problem Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-05 23:35:52 +01:00
David Kyle	289d4f4f4d	[ML] Remove stray field from inference docs (#51870 ) model_info_field is not a valid option	2020-02-05 10:50:51 +00:00
Florian Kelbert	43a7aadd46	[DOCS] Remove unneeded comma from CSV processor example (#51859 )	2020-02-04 09:26:20 -05:00
István Zoltán Szabó	30d1587ad5	[DOCS] Fixes indentation in inference processor code snippet (#51252 )	2020-01-21 16:22:16 +01:00
Martijn van Groningen	02dfd71efa	Backport: Add pipeline name to ingest metadata (#51050 ) Backport: #50467 This commit adds the name of the current pipeline to ingest metadata. This pipeline name is accessible under the following key: '_ingest.pipeline'. Example usage in pipeline: PUT /_ingest/pipeline/2 { "processors": [ { "set": { "field": "pipeline_name", "value": "{{_ingest.pipeline}}" } } ] } Closes #42106	2020-01-16 10:50:47 +01:00
Igor Motov	339d10c16f	Geo: Switch generated GeoJson type names to camel case (#50400 ) Switches generated GeoJson type names to camel case to conform to the standard. Closes #49568	2019-12-20 15:37:22 -05:00
István Zoltán Szabó	501ab83471	[DOCS] Adds inference processor documentation (#50204 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2019-12-19 12:21:04 +01:00
Igor Motov	c77ca98928	Geo: Switch generated WKT to upper case (#50285 ) Switches generated WKT to upper case to conform to the standard recommendation. Relates #49568	2019-12-18 17:29:08 -05:00
Przemko Robakowski	4619834b97	[7.x] CSV ingest processor (#49509 ) (#50083 ) * CSV ingest processor (#49509) This change adds new ingest processor that breaks line from CSV file into separate fields. By default it conforms to RFC 4180 but can be tweaked. Closes #49113	2019-12-11 23:06:05 +01:00
Przemko Robakowski	d7083a84f4	Allow list of IPs in geoip ingest processor (#49573 ) (#49947 ) * Allow list of IPs in geoip ingest processor This change lets you use array of IPs in addition to string in geoip processor source field. It will set array containing geoip data for each element in source, unless first_only parameter option is enabled, then only first found will be returned. Closes #46193	2019-12-07 00:19:09 +01:00
Alexander Reelsen	6e751f5536	Docs: Fix & test more grok processor documentation (#49447 ) The documentation contained a small error, as bytes and duration were not properly converted to a number and thus remained a string. The documentation is now also properly tested by providing a full-blown simulate pipeline example.	2019-12-03 11:55:49 +01:00
James Rodewig	3d44c1163a	[DOCS] Explicitly document enrich `target_field` includes `match_field` (#49407 ) When the enrich processor appends enrich data to an incoming document, it adds a `target_field` to contain the enrich data. This `target_field` contains both the `match_field` AND `enrich_fields` specified in the enrich policy. Previously, this was reflected in the documented example but not explicitly stated. This adds several explicit statements to the docs.	2019-12-02 09:13:24 -05:00
Martijn van Groningen	0a42395dfa	Backport: add templating support to pipeline processor (#49643 ) Backport of #49030 This commit adds templating support to the pipeline processor's `name` option. Closes #39955	2019-11-27 15:53:40 +01:00
Martijn van Groningen	09c4269097	Add templating support to enrich processor (#49093 ) Adds support for templating to `field` and `target_field` options.	2019-11-27 08:53:11 +01:00
James Rodewig	0b062bbc82	[DOCS] Correct required file ext for user agent ingest processor (#48688 ) For the user agent ingest processor, custom regex files must end with the `.yml` file extension. This corrects the docs which said the `.yaml` extension was required.	2019-10-30 11:11:29 -04:00
Dan Hermann	dbc05cd808	Add option to split processor for preserving trailing empty fields (#48685 )	2019-10-30 08:25:03 -05:00
James Rodewig	19afe3f84c	[DOCS] Remove duplicate links for ingest processor overview (#48394 )	2019-10-23 10:55:49 -05:00
Alexander Reelsen	66581d8158	update ingest-user-agent regexes.yml (#47807 ) These new regexes are from: `154eba17f5/regexes.yaml`	2019-10-18 16:26:48 +02:00
Martijn van Groningen	7fc9198d46	Change how `max_matches` affects `target_field` option. (#47982 ) Prior to this change the `target_field` would always be a json array field in the document being ingested. This was to take into account that multiple enrich documents could be inserted into the `target_field`. However, the default `max_matches` is `1`, meaning that by default only a single enrich document would be added to the `target_field` json array field. This commit changes this; if `max_matches` is set to `1` then the single document would be added as a json object to the `target_field` and if it is configured to a higher value then the enrich documents will be added as a json array (even if only a single enrich document happens to match).	2019-10-14 21:09:48 +02:00
James Rodewig	65f8294378	[DOCS] Add docs for `geo_match` enrich policy type (#47745 )	2019-10-09 09:02:52 -04:00
Martijn van Groningen	0cfddca61d	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-23 09:46:05 +02:00
Alexander Reelsen	011496ed5f	Expose cache setting in UserAgentPlugin (#46533 ) The setting was not registered. Also documentation has been added.	2019-09-16 11:30:38 +02:00
James Rodewig	a27d075db4	[DOCS] Update "Enrich your data" tutorials (#46417 ) * Move enrich docs to separate file * Rewrite enrich processor tutorial	2019-09-11 13:08:48 +02:00
Martijn van Groningen	c057fce978	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-09 08:40:54 +02:00
James Rodewig	f04573f8e8	[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449 ) (#46459 )	2019-09-06 16:09:09 -04:00
James Rodewig	c46c57d439	[DOCS] Change // CONSOLE comments to [source,console] (#46441 ) (#46451 )	2019-09-06 11:31:13 -04:00
Martijn van Groningen	ded98e50b7	Change exact match processor to match processor. (#46041 ) Besides a rename, this change allows the processor to attach multiple enrich docs to the document being ingested. Also, in order to control the maximum number of enrich docs to be included in the document being ingested, the `max_matches` setting is added to the enrich processor. Relates #32789	2019-09-04 18:05:12 +02:00
Martijn van Groningen	555b630160	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-02 09:16:55 +02:00
Tal Levy	a356bcff41	Add Circle Processor (#43851 ) (#46097 ) add circle-processor that translates circles to polygons	2019-08-28 14:44:08 -07:00
Martijn van Groningen	33972423e9	Enrich processor configuration changes (#45466 ) Enrich processor configuration changes: * Renamed `enrich_key` option to `field` option. * Replaced `set_from` and `targets` options with `target_field`. The `target_field` option behaves different to how `set_from` and `targets` worked. The `target_field` is the field that will contain the looked up document. Relates to #32789	2019-08-22 09:49:22 +02:00
Michael Basnight	52a094b177	Fail delete policy if pipeline exists (#44438 ) If a pipeline that references the policy exists, we should not allow the policy to be deleted. The user will need to remove the processor from the pipeline before deleting the policy. This commit adds a check to ensure that the policy cannot be deleted if it is referenced by any pipeline in the system.	2019-08-14 13:51:10 -05:00
Martijn van Groningen	43b8ab607d	Improve naming of enrich policy fields. (#45494 ) Renamed `enrich_key` to `match_field` and renamed `enrich_values` to `enrich_fields`. Relates #32789	2019-08-14 11:45:22 +02:00
Martijn van Groningen	04626de6ae	Add initial version of enrich processor docs. (#45084 ) Relates to #32789	2019-08-12 20:36:54 +02:00
Jason Tedor	bf74d38782	Fix GeoIP custom database directory in docs (#43383 ) These docs were misleading for package installations of Elasticsearch. Instead, we should refer to $ES_CONFIG/ingest-geoip as the path to place the custom database files. For non-package installations, this is the same as $ES_HOME/config, but for package installations this is not the case as the config directory for package installations is /etc/elasticsearch, and is not relative to $ES_HOME. This commit corrects the docs.	2019-06-19 13:26:07 -04:00
Marios Trivyzas	3b42dde64f	[Docs] Add note for date patterns used for index search. (#42810 ) Add an explanatory NOTE section to draw attention to the difference between small and capital letters used for the index date patterns. e.g.: HH vs hh, MM vs mm. Closes: #22322 (cherry picked from commit c8125417dc33215651f9bb76c9b1ffaf25f41caf)	2019-06-03 22:27:19 +02:00
Alexander Reelsen	8e33a5292a	Add HTML strip processor (#41888 ) This processor uses the lucene HTMLStripCharFilter class to remove HTML entities from a field. This adds to the char filter, so that there is a possibility to store the stripped version as well. Note that the character filter replaces tags with a newline, so that the produced HTML will look slightly different than the incoming HTML with regards to newlines.	2019-05-09 13:01:07 +02:00
Flavio Pompermaier	83fef23fd1	Fix wrong property name (#40636 )	2019-05-09 08:53:05 +02:00
James Rodewig	b65ceb36bc	[DOCS] Escape quotes to avoid smart quotes in Asciidoctor (#41603 )	2019-04-30 16:31:20 -04:00
James Rodewig	53702efddd	[DOCS] Add anchors for Asciidoctor migration (#41648 )	2019-04-30 10:20:17 -04:00
Jason Tedor	ac58b9bded	Fix date index name processor default date_formats (#40915 ) This commit is a correction of a doc bug in the docs for the ingest date-index-name processor. The correct pattern is yyyy-MM-dd'T'HH:mm:ss.SSSXX. This is due to the transition from Joda time to Java time where Z does not mean the same thing between the two.	2019-04-05 17:45:57 -04:00
Tal Levy	9ab2410436	Adding an example in the Set processor documentation to address #30604 (#39941 ) (#39969 ) * Added an example of using set to copy values from one field to another. * Modified the document type to match the test.	2019-03-12 11:14:41 -07:00
Alexander Reelsen	8e5e48319e	Add documentation about breaking java time changes (#38886 ) In addition remove joda time mentions across the docs, make sure links are updated to java time javadocs. Forward port of #38720	2019-02-14 10:18:12 +01:00
Jake Landis	46bb663a09	Make 7.x like 6.7 user agent ecs, but default to true (#38828 ) Forward port of https://github.com/elastic/elasticsearch/pull/38757 This change reverts the initial 7.0 commits and replaces them with the 6.7 variant that still allows for the ecs flag. This commit differs from the 6.7 variants in that ecs flag will now default to true. 6.7: `ecs` : default `false` 7.x: `ecs` : default `true` 8.0: no option, but behaves as `true` * Revert "Ingest node - user agent, move device to an object (#38115)" This reverts commit `5b008a34aa`. * Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)" This reverts commit `cac6b8e06f`. * cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57 Add ECS schema for user-agent ingest processor (#37727) * cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099 Ingest node - user agent, move device to an object (#38115) (#38121) * cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes) Dep. check for ECS changes to User Agent processor (#38362) * make true the default for the ecs option, and update 7.0 references and tests	2019-02-13 10:28:01 -06:00

65 Commits