Commit Graph

249 Commits

Author SHA1 Message Date
James Rodewig ccbe2938c8
[DOCS] Fix Gsub processor snippet (#61720) (#61723) 2020-08-31 10:43:26 -04:00
Jake Landis d2e5f2f532
[7.x] Enhance the ingest node simulate verbose output (#60433) (#60678)
This commit enhances the verbose output for the
`_ingest/pipeline/_simulate?verbose` api. Specifically
this adds the following:
* the pipeline processor is now included in the output
* the conditional (if) and result is now included in the output iff it was defined
* a status field is always displayed. the possible values of status are
  * `success` - if the processor ran with out errors
  * `error` - if the processor ran but threw an error that was not ingored
  * `error_ignored` - if the processor ran but threw an error that was ingored
  * `skipped` - if the process did not run (currently only possible if the if condition evaluates to false)
  * `dropped` - if the the `drop` processor ran and dropped the document
* a `processor_type` field for the type of processor (e.g. set, rename, etc.)
* throw a better error if trying to simulate with a pipeline that does not exist

closes #56004
2020-08-27 16:53:09 -05:00
James Rodewig 1b3a002588
[DOCS] Fix ingest processor TOC sort (#61412) (#61416) 2020-08-21 09:21:41 -04:00
James Rodewig bba4220982
[DOCS] Fix `field` def for join processor (#61395) (#61413) 2020-08-21 08:53:38 -04:00
James Rodewig 60876a0e32
[DOCS] Replace Wikipedia links with attribute (#61171) (#61209) 2020-08-17 11:27:04 -04:00
James Rodewig 8263ce79e9
[DOCS] Update ingest processor snippet for ECS (#61128) (#61164)
Co-authored-by: Nicole Albee <2642763+a03nikki@users.noreply.github.com>
2020-08-14 14:21:47 -04:00
James Rodewig cfa67e933f
[DOCS] Fix chunking in query docs (#61053) (#61054)
Changes:
* Moves "Notes" sections for the joining queries and percolate query
  pages to the parent page
* Adds related redirects for the moved "Notes" pages
* Assigns explicit anchor IDs to other "Notes" headings. This was required for
  the redirects to work.
2020-08-12 14:01:10 -04:00
James Rodewig 14e1618fd9
[DOCS] Fix case of ingest processor titles (#61024) (#61039)
Converts page headings to sentence case.
Adds a title abbreviation.
2020-08-12 11:49:54 -04:00
James Rodewig 029869eb35
[DOCS] Fix metadata field refs (#60764) (#60769) 2020-08-05 14:04:55 -04:00
James Rodewig 5a2c6f0d4f
[DOCS] http -> https, remove outdated plugin docs (#60380) (#60545)
Plugin discovery documentation contained information about installing
Elasticsearch 2.0 and installing an oracle JDK, both of which is no
longer valid.

While noticing that the instructions used cleartext HTTP to install
packages, this commit replaces HTTPs links instead of HTTP where possible.

In addition a few community links have been removed, as they do not seem
to exist anymore.

Co-authored-by: Alexander Reelsen <alexander@reelsen.net>
2020-07-31 16:16:31 -04:00
James Rodewig aba785cb6e
[DOCS] Update my-index examples (#60132) (#60248)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 15:58:26 -04:00
James Rodewig 1178f5c6db
[DOCS] Fix ingest processor docs for autogen doc IDs (#60147) (#60242)
If you autogen doc IDs, you cannot use the `{{_id}}` value in an ingest
processor.

This adds a related admonition to the ingest processor docs.
2020-07-27 13:55:21 -04:00
James Rodewig 988e8c8fc6
[DOCS] Swap `[float]` for `[discrete]` (#60134)
Changes instances of `[float]` in our docs for `[discrete]`.

Asciidoctor prefers the `[discrete]` tag for floating headings:
https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks
2020-07-23 12:42:33 -04:00
James Rodewig b302b09b85
[DOCS] Reformat snippets to use two-space indents (#59973) (#59994) 2020-07-21 15:49:58 -04:00
Dan Hermann 48df9b1a0e
Update regex file for es user agent node processor (#59697) (#59794) 2020-07-17 11:04:01 -05:00
James Rodewig 6ed356ffc3
[DOCS] Replace `datatype` with `data type` (#58972) (#59184) 2020-07-07 14:59:35 -04:00
David Kyle c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must has a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
DeDe Morton 2c43421208 [DOCS] Change Beats links to refactored getting started docs (#58790) 2020-07-02 17:11:25 -07:00
Nik Everett 326cce624b Document using stored scripts for ingest (#58783)
This documents using stored scripts for complex conditionals in indest.
2020-07-01 13:36:00 -04:00
István Zoltán Szabó 13aa8b8d9a [DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 13:15:15 +02:00
Jake Landis dc7ffb154a
Update hh to HH in date processor example (#58089) (#58144)
Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
2020-06-15 17:04:14 -05:00
Dan Hermann 8a910443c4
Add ignore_empty_value parameter in set ingest processor (#57030) (#58108) 2020-06-15 08:35:08 -05:00
Jake Landis a370d5eead
[7.x] Ensure Joni warning are logged at debug (#57302) (#57897)
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN 
level. This WARN level log should only occur on pipeline creation which 
is a much lower frequency then every document. 

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 17:06:29 -05:00
Lisa Cawley db5bf92acf
[7.x][DOCS] Replace docdir attribute with es-repo-dir (#57489) (#57494) 2020-06-01 16:42:53 -07:00
Adam Locke cbd35e9a2b
[DOCS] Add links to `flattened` datatype (#56794) (#56963)
* Changes for #52239.

* Incorporating review feedback from Julie T. Also single-sourcing nexted options in the Mapping page and referencing them in the Nested page.

* Moving tip after the introduction and clarifying limits.

* Update docs/reference/mapping.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/mapping/types/nested.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-05-19 16:37:30 -04:00
James Rodewig c859fafcbd [DOCS] Correct `query` datatype in enrich policy definition (#56224)
Corrects the datatype for the `query` property of an enrich policy
object. The `query` property is a query object, not a string.
2020-05-13 08:35:17 -04:00
Thiago Souza 1feb0a95b5 [DOCS] Correct get enrich policy API request example (#56207) 2020-05-05 12:37:54 -04:00
István Zoltán Szabó a5cf4712e5 [DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:03:43 +02:00
Benjamin Trent 8ff2cbf1a3
[7.x] [ML] adding prediction_field_type to inference config (#55128) (#55230)
* [ML] adding prediction_field_type to inference config (#55128)

Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

Here is added a new field `prediction_field_type` which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise). 

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 09:45:22 -04:00
István Zoltán Szabó d025b90cd1 [DOCS] Makes PUT inference API docs collapsible (#54653)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-03 09:48:53 +02:00
Benjamin Trent 4a1610265f
[7.x] [ML] add new inference_config field to trained model config (#54421) (#54647)
* [ML] add new inference_config field to trained model config (#54421)

A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.

The inference processor can still override whatever is set as the default in the trained model config.

* fixing for backport
2020-04-02 12:25:10 -04:00
lcawl 949636944c [DOCS] Fixes shared attribute for feature importance 2020-04-01 14:52:08 -07:00
AndyHunt66 2dd8946539 [DOCS] Remove redundant sentence in ingest processor docs (#54329) 2020-03-27 08:25:09 -04:00
István Zoltán Szabó 487b273286 [DOCS] Adds feature importance mapping subsection to inference processor docs (#54190) 2020-03-26 09:26:50 +01:00
Dan Hermann 94ac979c66
Support array for all string ingest processors (#53694) 2020-03-18 07:07:49 -05:00
Benjamin Trent 4e43ede735
[ML] renaming inference processor field field_mappings to new name field_map (#53433) (#53502)
This renames the `inference` processor configuration field `field_mappings` to `field_map`.

`field_mappings` is now deprecated.
2020-03-13 15:40:57 -04:00
James Rodewig af987fb2d4
[DOCS] Reduce content reuse in enrich docs (#53460)
Restructures the 'Update an enrich policy' section to:

* Migrate the content to the section. It was previously stored in the
  Put Enrich Policy API docs.
* Remove the warning tag admonition from the section content.
* Replace a reused section earlier in the "Set up an enrich processor"
  page with a link.

No substantive changes were made to the content.
2020-03-12 05:57:23 -04:00
Benjamin Trent 89668c5ea0
[ML][Inference] adds new default_field_map field to trained models (#53294) (#53419)
Adds a new `default_field_map` field to trained model config objects.

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 13:49:39 -04:00
Orhan Toy ad2f630795 [DOCS] Fix formatting of simulate ingest pipeline API docs (#52754) 2020-03-02 11:46:27 -05:00
David Pilato 6c6ab8fa47 [DOS] Fix typo in CSV processor docs (#52649)
Corrects an example array in a snippet of the CSV processor docs.
2020-02-25 08:48:50 -05:00
bellengao 49f37989c4 [DOCS] Fix typo in ingest node docs (#52671) 2020-02-25 07:57:52 -05:00
Benjamin Trent afd90647c9
[ML] Adds feature importance to option to inference processor (#52218) (#52666)
This adds machine learning model feature importance calculations to the inference processor.

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : {
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```

This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888).

It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7.

Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded.

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 18:42:31 -05:00
Russ Cam 62da077beb Specify name on enrich.get_policy as list type (#50217)
This commit updates the enrich.get_policy API to specify name
as a list, in line with other URL parts that accept a comma-separated
list of values.

In addition, update the get enrich policy API docs
to align the URL part name in the documentation with
the name used in the REST API specs.

(cherry picked from commit 94f6f946ef283dc93040e052b4676c5bc37f4bde)
2020-02-20 11:39:28 +10:00
Yang Wang 16ba59e9d1
Expose more authentication info to ingest pipeline (#51305) (#52119)
The changes add more granularity for identiying the data ingestion user.
The ingest pipeline can now be configure to record authentication realm and
type. It can also record API key name and ID when one is in use. 
This improves traceability when data are being ingested from multiple agents
and will become more relevant with the incoming support of required
pipelines (#46847)

Resolves: #49106
2020-02-11 23:05:01 +11:00
Przemko Robakowski 6332de40b4
Add empty_value parameter to CSV processor (#51567) (#51966)
* Add empty_value parameter to CSV processor

This change adds `empty_value` parameter to the CSV processor.
This value is used to fill empty fields. Fields will be skipped
if this parameter is ommited. This behavior is the same for both
quoted and unquoted fields.

* docs updated

* Fix compilation problem

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-05 23:35:52 +01:00
David Kyle 289d4f4f4d [ML] Remove stray field from inference docs (#51870)
model_info_field is not a valid option
2020-02-05 10:50:51 +00:00
Florian Kelbert 43a7aadd46 [DOCS] Remove unneeded comma from CSV processor example (#51859) 2020-02-04 09:26:20 -05:00
István Zoltán Szabó 30d1587ad5 [DOCS] Fixes indentation in inference processor code snippet (#51252) 2020-01-21 16:22:16 +01:00
Martijn van Groningen 02dfd71efa
Backport: Add pipeline name to ingest metadata (#51050)
Backport: #50467

This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.

Example usage in pipeline:
PUT /_ingest/pipeline/2
{
    "processors": [
        {
            "set": {
                "field": "pipeline_name",
                "value": "{{_ingest.pipeline}}"
            }
        }
    ]
}

Closes #42106
2020-01-16 10:50:47 +01:00
Igor Motov 339d10c16f Geo: Switch generated GeoJson type names to camel case (#50400)
Switches generated GeoJson type names to camel case
to conform to the standard.

Closes #49568
2019-12-20 15:37:22 -05:00