OpenSearch/x-pack/plugin
Benjamin Trent 181ee3ae0b
[ML] specifying missing_field_value value and using it instead of empty_string (#53108) (#53165)
For analytics, we need a consistent way to indicate that a value is missing. Following the convention inherited from anomaly detection, analytics sent `""` when a field was missing. This works fine for numeric fields, but for categorical fields the underlying analytics process treats `""` as a category in its own right.

Consequently, you end up with this situation in the resulting model:
```
{
  "frequency_encoding" : {
    "field" : "RainToday",
    "feature_name" : "RainToday_frequency",
    "frequency_map" : {
      "" : 0.009844409027270245,
      "No" : 0.6472019970785184,
      "Yes" : 0.6472019970785184
    }
  }
}
```
For inference this is a problem, because inference treats missing values as `null` and therefore does not include them in the infer call against the model.

This PR takes advantage of our new `missing_field_value` option and supplies `\0` as the value.
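
To make the difference concrete, here is a minimal Python sketch of the idea. It is not the actual analytics process or the code in this PR. The helpers `frequency_map` and `extract` are hypothetical, and counting missing rows toward the total is an assumption made only for illustration. It shows how `""` ends up as a category in a frequency map, while a dedicated sentinel can be recognised as missing and left out.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical sentinel standing in for the `\0` value mentioned above.
MISSING_FIELD_VALUE = "\0"


def frequency_map(values: List[str], missing_marker: Optional[str] = None) -> Dict[str, float]:
    """Return the relative frequency of each category.

    Rows equal to `missing_marker` are treated as missing: they get no
    entry in the map, though (by assumption) they still count toward the
    row total.
    """
    total = len(values)
    counts = Counter(v for v in values if v != missing_marker)
    return {category: count / total for category, count in counts.items()}


def extract(doc: dict, field: str, missing: str) -> str:
    """Read a field from a document, substituting `missing` when it is absent."""
    value = doc.get(field)
    return missing if value is None else value


docs = [{"RainToday": "No"}, {"RainToday": "Yes"}, {}, {"RainToday": "No"}]

# Old behaviour: "" is sent for the missing field and becomes a category.
old = frequency_map([extract(d, "RainToday", "") for d in docs])

# New behaviour: a sentinel is sent and recognised as missing.
new = frequency_map(
    [extract(d, "RainToday", MISSING_FIELD_VALUE) for d in docs],
    missing_marker=MISSING_FIELD_VALUE,
)

print(old)
print(new)
```

In the first call the map contains an entry for `""`, mirroring the model snippet above. In the second call only the real categories remain, so a document with a `null` field at inference time no longer corresponds to a category the model was trained on.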
2020-03-05 09:50:52 -05:00
..
analytics Support multiple metrics in `top_metrics` agg (backport of #52965) (#53163) 2020-03-05 08:12:01 -05:00
autoscaling [7.x] Smarter copying of the rest specs and tests (#52114) (#52798) 2020-02-26 08:13:41 -06:00
ccr Allow dynamic updates for index.hidden setting (#52837) 2020-02-26 11:46:29 -07:00
core Formalize usage stats for analytics (backport of #52966) (#53077) 2020-03-04 10:29:11 -05:00
deprecation Remove DeprecationLogger from route objects (#52285) 2020-02-12 15:05:41 -07:00
enrich Introduce system index APIs for Kibana (#53035) 2020-03-03 14:11:36 -07:00
eql EQL: Add HLRC for EQL stats (#53043) (#53148) 2020-03-05 09:20:38 -05:00
frozen-indices Upgrade Lucene to 8.5.0-snapshot-b01d7cb (#52584) 2020-02-21 10:25:03 +00:00
graph [7.x] Smarter copying of the rest specs and tests (#52114) (#52798) 2020-02-26 08:13:41 -06:00
ilm [7.x] Smarter copying of the rest specs and tests (#52114) (#52798) 2020-02-26 08:13:41 -06:00
logstash Introduce system index APIs for Kibana (#53035) 2020-03-03 14:11:36 -07:00
mapper-constant-keyword Introduce a `constant_keyword` field. (#49713) (#53024) 2020-03-03 16:01:47 +01:00
mapper-flattened Add size support to `top_metrics` (backport of #52662) (#52914) 2020-02-27 16:12:52 -05:00
ml [ML] specifying missing_field_value value and using it instead of empty_string (#53108) (#53165) 2020-03-05 09:50:52 -05:00
monitoring Single instance of the IndexNameExpressionResolver (#52604) 2020-02-21 07:50:02 -07:00
ql SQL: Fix column size for IP data type (#53056) 2020-03-04 10:36:44 +01:00
rollup Single instance of the IndexNameExpressionResolver (#52604) 2020-02-21 07:50:02 -07:00
search-business-rules Generalize how queries on `_index` are handled at rewrite time (#52815) 2020-02-26 15:37:43 +01:00
security Introduce system index APIs for Kibana (#53035) 2020-03-03 14:11:36 -07:00
spatial Add support for multipoint shape queries (#52564) (#52705) 2020-02-24 13:46:51 +01:00
sql upgrade to lucene-snapshot-fa75139efea (#53150) (#53151) 2020-03-05 10:04:05 +01:00
src/test Fix test failures with the new `constant_keyword` field. (#53153) 2020-03-05 14:29:13 +01:00
transform Introduce system index APIs for Kibana (#53035) 2020-03-03 14:11:36 -07:00
vectors Add size support to `top_metrics` (backport of #52662) (#52914) 2020-02-27 16:12:52 -05:00
voting-only-node Single instance of the IndexNameExpressionResolver (#52604) 2020-02-21 07:50:02 -07:00
watcher Use correct issue number: #52453 2020-03-04 16:17:55 +01:00
build.gradle Disable ILM history in x-pack rest tests (#52868) 2020-02-27 17:20:33 +01:00