1. Use the term "delimited" rather than "separated values"
2. Use a single factory class with arguments to specify the
delimiter and identification constraints
This change makes it easier to add support for other
delimiter characters.
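
As a rough illustration of the single-factory idea, the sketch below
parameterises one class by delimiter and by a minimum field count. All
names here (DelimitedFinderFactorySketch, minFieldsPerRow, canHandle)
are hypothetical and are not taken from the actual change. The minimum
field count stands in for the "identification constraints", and quoting
is deliberately ignored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one factory class configured by arguments instead of
// one class per delimiter. Not the real ML implementation.
final class DelimitedFinderFactorySketch {

    private final char delimiter;
    // Assumed constraint: how many fields a row needs before the sample is
    // accepted as delimited by this character.
    private final int minFieldsPerRow;

    DelimitedFinderFactorySketch(char delimiter, int minFieldsPerRow) {
        this.delimiter = delimiter;
        this.minFieldsPerRow = minFieldsPerRow;
    }

    static DelimitedFinderFactorySketch comma() {
        return new DelimitedFinderFactorySketch(',', 2);
    }

    static DelimitedFinderFactorySketch tab() {
        return new DelimitedFinderFactorySketch('\t', 2);
    }

    // Naive field count: delimiter occurrences plus one (no quote handling).
    private int countFields(String line) {
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields++;
            }
        }
        return fields;
    }

    // True if every sample line has the same number of fields and that
    // number meets the minimum.
    boolean canHandle(List<String> sampleLines) {
        if (sampleLines.isEmpty()) {
            return false;
        }
        int expected = countFields(sampleLines.get(0));
        if (expected < minFieldsPerRow) {
            return false;
        }
        for (String line : sampleLines) {
            if (countFields(line) != expected) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, supporting another delimiter is a matter of passing
different arguments, for example
new DelimitedFinderFactorySketch('|', 3).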
Some character sets cannot be encoded and this was tripping
up the binary data check in the ML log structure character
set finder.
The fix is to assume that the data is binary rather than text if
ICU4J identifies some bytes as belonging to a character set that
cannot be encoded and those bytes contain zeroes.
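
The rule described above can be sketched roughly as follows. This is a
minimal illustration and not the code from the change: the class and
method names are hypothetical, and the charset is assumed to have
already been chosen by a detector such as ICU4J. The only real API used
is java.nio.charset.Charset.canEncode().

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the "unencodable charset plus zero bytes" rule.
final class BinaryCheckSketch {

    // True if the sample contains at least one zero byte.
    static boolean containsZeroByte(byte[] sample) {
        for (byte b : sample) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    // Core rule: a charset that cannot be encoded, combined with zero
    // bytes in the sample, is treated as a sign of binary data.
    static boolean looksBinary(boolean charsetCanEncode, byte[] sample) {
        return charsetCanEncode == false && containsZeroByte(sample);
    }

    // Convenience overload for a charset already picked by a detector.
    static boolean looksBinary(Charset detected, byte[] sample) {
        return looksBinary(detected.canEncode(), sample);
    }
}
```

Encodable character sets are unaffected by this rule, so whatever
binary check already applies to them stays in place.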
Fixes #33227
This change adds a library to ML that can be used to deduce a log
file's structure given only a sample of the log file.
Eventually this will be used to add an endpoint to ML to make the
functionality available to end users, but this will follow in a
separate change.
The functionality is split into a library so that it can also be
used by a command line tool without requiring the command line
tool to include all server code.