OpenSearch

David Roberts f472186b9f [ML] Improve file structure finder timestamp format determination (#41948)

This change contains a major refactoring of the timestamp format determination code used by the ML find file structure endpoint.

Previously timestamp format determination was done separately for each piece of text supplied to the timestamp format finder. This had the drawback that it was not possible to distinguish dd/MM and MM/dd in the case where both numbers were 12 or less. In order to do this sensibly it is best to look across all the available timestamps and see if one of the numbers is greater than 12 in any of them. This necessitates making the timestamp format finder an instantiable class that can accumulate evidence over time.

Another problem with the previous approach was that it was only possible to override the timestamp format to one of a limited set of timestamp formats. There was no way out if a file to be analysed had a timestamp that was sane yet not in the supported set. This is now changed to allow any timestamp format that can be parsed by a combination of these Java date/time formats:

yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss, a, XX, XXX, zzz

Additionally S letter groups (fractional seconds) are supported providing they occur after ss and separated from the ss by a dot, comma or colon. Spacing and punctuation is also permitted with the exception of the question mark, newline and carriage return characters, together with literal text enclosed in single quotes.

The full list of changes/improvements in this refactor is:

- Make TimestampFormatFinder an instantiable class
- Overrides must be specified in Java date/time format - Joda format is no longer accepted
- Joda timestamp formats in outputs are now derived from the determined or overridden Java timestamp formats, not stored separately
- Functionality for determining the "best" timestamp format in a set of lines has been moved from TextLogFileStructureFinder to TimestampFormatFinder, taking advantage of the fact that TimestampFormatFinder is now an instantiable class with state
- The functionality to quickly rule out some possible Grok patterns when looking for timestamp formats has been changed from using simple regular expressions to the much faster approach of using the Shift-And method of sub-string search, but using an "alphabet" consisting of just 1 (representing any digit) and 0 (representing non-digits)
- Timestamp format overrides are now much more flexible
- Timestamp format overrides that do not correspond to a built-in Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern whose definition is included within the date processor in the ingest pipeline
- Grok patterns that correspond to multiple Java date/time patterns are now handled better - the Grok pattern is accepted as matching broadly, and the required set of Java date/time patterns is built up considering all observed samples
- As a result of the more flexible acceptance of Grok patterns, when looking for the "best" timestamp in a set of lines timestamps are considered different if they are preceded by a different sequence of punctuation characters (to prevent timestamps far into some lines being considered similar to timestamps near the beginning of other lines)
- Out-of-the-box Grok patterns that are considered now include %{DATE} and %{DATESTAMP}, which have indeterminate day/month ordering
- The order of day/month in formats with indeterminate day/month order is determined by considering all observed samples (plus the server locale if the observed samples still do not suggest an ordering)

Relates #38086
Closes #35137
Closes #35132

2019-05-24 09:10:08 +01:00
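The day/month disambiguation described in the commit message can be illustrated with a small accumulator. This is only a sketch of the idea, not the actual TimestampFormatFinder code: the class name `DayMonthOrderGuesser`, its methods and the regular expression are all hypothetical, and the server-locale fallback mentioned in the commit is left out (the sketch simply reports `INDETERMINATE`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; not the real TimestampFormatFinder. */
final class DayMonthOrderGuesser {

    enum Order { DAY_FIRST, MONTH_FIRST, INDETERMINATE }

    // Assumed sample shape: two 1-2 digit fields, then a 2-4 digit year,
    // separated by '/', '.' or '-' (for example 05/06/2019).
    private static final Pattern LEADING_FIELDS =
        Pattern.compile("\\b(\\d{1,2})[/.-](\\d{1,2})[/.-]\\d{2,4}\\b");

    // Evidence accumulated over every sample seen so far.
    private boolean firstExceeds12;
    private boolean secondExceeds12;

    /** Records whether either leading field of the first date in the text is above 12. */
    void addSample(String text) {
        Matcher m = LEADING_FIELDS.matcher(text);
        if (m.find()) {
            firstExceeds12 |= Integer.parseInt(m.group(1)) > 12;
            secondExceeds12 |= Integer.parseInt(m.group(2)) > 12;
        }
    }

    /** A field that ever exceeded 12 cannot be the month, so it must be the day. */
    Order guess() {
        if (firstExceeds12 && !secondExceeds12) {
            return Order.DAY_FIRST;
        }
        if (secondExceeds12 && !firstExceeds12) {
            return Order.MONTH_FIRST;
        }
        // No evidence, or contradictory evidence: the commit falls back to the
        // server locale at this point, which this sketch does not model.
        return Order.INDETERMINATE;
    }

    public static void main(String[] args) {
        DayMonthOrderGuesser guesser = new DayMonthOrderGuesser();
        guesser.addSample("05/06/2019 10:00:00 service started");
        guesser.addSample("25/06/2019 10:00:01 service stopped");
        System.out.println(guesser.guess()); // prints DAY_FIRST
    }
}
```

A single sample such as `05/06/2019` leaves the order open; only once some sample carries a value above 12 in one position does the guess settle, which is why the finder has to keep state across samples.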
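The Shift-And quick rule-out mentioned in the commit message can be sketched as follows. The two-symbol alphabet (1 for any digit, 0 for any non-digit) comes from the commit text; everything else is an assumption made for illustration, including the class name `DigitShiftAnd`, the string encoding of the pattern and the 64-position limit imposed by using a single `long` as the state. It is not the code from the change itself.

```java
/** Hypothetical illustration of Shift-And over a digit/non-digit alphabet. */
final class DigitShiftAnd {

    private final long digitMask;    // bit i set when pattern position i expects a digit
    private final long nonDigitMask; // bit i set when pattern position i expects a non-digit
    private final long acceptBit;    // bit that is set once the whole pattern has matched
    private final int length;

    /** @param pattern string of '1' (digit) and '0' (non-digit), e.g. "1101101111" */
    DigitShiftAnd(String pattern) {
        if (pattern.isEmpty() || pattern.length() > 64) {
            throw new IllegalArgumentException("pattern length must be between 1 and 64");
        }
        long digits = 0L;
        long nonDigits = 0L;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '1') {
                digits |= 1L << i;
            } else if (c == '0') {
                nonDigits |= 1L << i;
            } else {
                throw new IllegalArgumentException("pattern may only contain '0' and '1'");
            }
        }
        this.digitMask = digits;
        this.nonDigitMask = nonDigits;
        this.length = pattern.length();
        this.acceptBit = 1L << (length - 1);
    }

    /** Returns the start index of the first match in the text, or -1 if there is none. */
    int indexIn(String text) {
        long state = 0L; // bit i set means pattern[0..i] matches the text ending here
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            long mask = (c >= '0' && c <= '9') ? digitMask : nonDigitMask;
            // Extend every partial match by one character and start a new one.
            state = ((state << 1) | 1L) & mask;
            if ((state & acceptBit) != 0) {
                return i - length + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two digits, a non-digit, two digits, a non-digit, four digits.
        DigitShiftAnd shape = new DigitShiftAnd("1101101111");
        System.out.println(shape.indexIn("INFO 24/05/2019 09:10:08 started")); // prints 5
        System.out.println(shape.indexIn("no timestamp in this line"));        // prints -1
    }
}
```

Each character costs one shift, one OR and one AND, so a line can be screened in a single pass. A hit only says the digit layout is plausible; a full Grok or date/time parse would still have to confirm it, which matches the "quickly rule out" role described in the commit.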
..
calendarresource.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
close-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
datafeedresource.asciidoc	[ML] Use scaling thread pool and xpack.ml.max_open_jobs cluster-wide dynamic (#39736)	2019-03-06 12:29:34 +00:00
delete-calendar-event.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
delete-calendar-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
delete-calendar.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
delete-datafeed.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
delete-expired-data.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
delete-filter.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
delete-forecast.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
delete-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
delete-snapshot.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
eventresource.asciidoc	[ML] Correct small inconsistencies in ml APIs spec and docs (#39907)	2019-03-11 14:02:50 +00:00
filterresource.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
find-file-structure.asciidoc	[ML] Improve file structure finder timestamp format determination (#41948)	2019-05-24 09:10:08 +01:00
flush-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
forecast.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-bucket.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-calendar-event.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-calendar.asciidoc	[ML] Correct small inconsistencies in ml APIs spec and docs (#39907)	2019-03-11 14:02:50 +00:00
get-category.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-datafeed-stats.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
get-datafeed.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
get-filter.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-influencer.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-job-stats.asciidoc	[DOCS] Adds size limitation to the get datafeeds APIs (#37578)	2019-01-17 10:47:15 -08:00
get-job.asciidoc	[DOCS] Adds limitation to the get jobs API (#37549)	2019-01-17 08:21:37 -08:00
get-ml-info.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
get-overall-buckets.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-record.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
get-snapshot.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
jobcounts.asciidoc	[ML] Use scaling thread pool and xpack.ml.max_open_jobs cluster-wide dynamic (#39736)	2019-03-06 12:29:34 +00:00
jobresource.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
ml-api.asciidoc	ML: Add upgrade mode docs, hlrc, and fix bug (#37942)	2019-01-30 06:51:11 -06:00
open-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
post-calendar-event.asciidoc	[ML] Correct small inconsistencies in ml APIs spec and docs (#39907)	2019-03-11 14:02:50 +00:00
post-data.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
preview-datafeed.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
put-calendar-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
put-calendar.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
put-datafeed.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
put-filter.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
put-job.asciidoc	[DOCS] Add warning about bypassing ML PUT APIs (#38605)	2019-02-08 11:35:37 +00:00
resultsresource.asciidoc	[DOCS] Cleans up xpackml attributes	2019-01-07 14:33:10 -08:00
revert-snapshot.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
set-upgrade-mode.asciidoc	ML: Add upgrade mode docs, hlrc, and fix bug (#37942)	2019-01-30 06:51:11 -06:00
snapshotresource.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
start-datafeed.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
stop-datafeed.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
update-datafeed.asciidoc	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574)	2019-04-30 13:46:45 -04:00
update-filter.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
update-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
update-snapshot.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
validate-detector.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
validate-job.asciidoc	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00