Commit Graph

4 Commits

Author SHA1 Message Date
David Roberts 7345878d33
[ML] Refactor delimited file structure detection (#33233)
1. Use the term "delimited" rather than "separated values"
2. Use a single factory class with arguments to specify the
   delimiter and identification constraints

This change makes it easier to add support for other
delimiter characters.
2018-08-31 08:48:45 +01:00
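
The single-factory idea in #33233 might look roughly like the sketch below. This is an illustration only, not the code from the commit: the class name `DelimitedFormatFactorySketch`, the `minFieldsPerRow` constraint, and the `matches` method are all hypothetical, and the naive split ignores quoting and escaping.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: one factory class, configured by arguments, instead of
// one class per delimiter. Names and constraints are assumptions.
final class DelimitedFormatFactorySketch {

    private final char delimiter;
    private final int minFieldsPerRow;

    DelimitedFormatFactorySketch(char delimiter, int minFieldsPerRow) {
        this.delimiter = delimiter;
        this.minFieldsPerRow = minFieldsPerRow;
    }

    // Returns true if every sample line splits into the same number of fields
    // and that number meets the minimum. Quoted fields are NOT handled here.
    boolean matches(List<String> sampleLines) {
        if (sampleLines.isEmpty()) {
            return false;
        }
        String regex = Pattern.quote(String.valueOf(delimiter));
        int expectedFields = -1;
        for (String line : sampleLines) {
            // Limit -1 keeps trailing empty fields so counts stay comparable.
            int fields = line.split(regex, -1).length;
            if (fields < minFieldsPerRow) {
                return false;
            }
            if (expectedFields == -1) {
                expectedFields = fields;
            } else if (fields != expectedFields) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, supporting another delimiter is a matter of constructing another instance, for example `new DelimitedFormatFactorySketch('|', 3)`, rather than writing a new class.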
David Roberts 22415fa2de
[ML] Fix character set finder bug with unencodable charsets (#33234)
Some character sets cannot be encoded and this was tripping
up the binary data check in the ML log structure character
set finder.

The fix is to assume that the data is binary rather than text
if ICU4J identifies the bytes as belonging to a character set
that cannot be encoded and those bytes contain zeroes.

Fixes #33227
2018-08-29 14:56:02 +01:00
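
The rule described in #33234 can be sketched as follows. This is a minimal illustration, not the code from the commit: the class and method names are hypothetical, and the detected charset is assumed to have already been produced by a detector such as ICU4J. It covers only the unencodable-charset case named in the message. It relies on the standard `java.nio.charset.Charset.canEncode()` method.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the decision rule; names are assumptions.
final class BinaryCheckSketch {

    // True if any byte in the sample is zero.
    static boolean containsZeroBytes(byte[] sample) {
        for (byte b : sample) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    // If the detected charset cannot be encoded and the sample contains
    // zero bytes, treat the data as binary rather than text.
    // Encodable charsets are outside the scope of this sketch.
    static boolean looksBinary(Charset detected, byte[] sample) {
        return !detected.canEncode() && containsZeroBytes(sample);
    }
}
```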
Alpar Torok 82d10b484a
Run forbidden api checks with runtimeJavaVersion (#32947)
Run forbidden APIs checks with the runtime Java version
2018-08-22 09:05:22 +03:00
David Roberts 5ba04e23fc
[ML] Add log structure finder functionality (#32788)
This change adds a library to ML that can be used to deduce a log
file's structure given only a sample of the log file.

Eventually this will be used to add an endpoint to ML to make the
functionality available to end users, but this will follow in a
separate change.

The functionality is split into a library so that it can also be
used by a command line tool without requiring the command line
tool to include all server code.
2018-08-15 18:04:21 +01:00