Benjamin Trent 5ab9e75e28
[7.x] [ML][Inference] lang_ident model (#50292) (#50675)
* [ML][Inference] lang_ident model (#50292)

This PR contains a java port of Google's CLD3 compact NN model https://github.com/google/cld3

The ported model is formatted to fit within our inference model formatting and stored as a resource in the `:xpack:ml:` plugin and is under basic license.

The model is broken up into two major parts:
- Preprocessing through the custom embedding (based on CLD3's embedding layer)
- Pushing the embedded text through the two layers of fully connected shallow NN. 

Main differences between this port and CLD3:
- We take advantage of Java's internal Unicode handling where possible (i.e. codepoints, characters, decoders, etc.)
- We do not trim down input text by removing duplicated tokens
- We do not encode doubles/floats as longs/integers.
2020-01-06 16:24:03 -05:00
..
2019-10-11 16:34:11 +03:00

Elastic License Functionality

This directory tree contains files subject to the Elastic License. The files subject to the Elastic License are grouped in this directory to clearly separate them from files licensed under the Apache License 2.0.