mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-06 10:59:12 +00:00
This change contains a major refactoring of the timestamp format determination code used by the ML find file structure endpoint. Previously timestamp format determination was done separately for each piece of text supplied to the timestamp format finder. This had the drawback that it was not possible to distinguish dd/MM and MM/dd in the case where both numbers were 12 or less. In order to do this sensibly it is best to look across all the available timestamps and see if one of the numbers is greater than 12 in any of them. This necessitates making the timestamp format finder an instantiable class that can accumulate evidence over time. Another problem with the previous approach was that it was only possible to override the timestamp format to one of a limited set of timestamp formats. There was no way out if a file to be analysed had a timestamp that was sane yet not in the supported set. This is now changed to allow any timestamp format that can be parsed by a combination of these Java date/time formats: yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss, a, XX, XXX, zzz Additionally S letter groups (fractional seconds) are supported providing they occur after ss and separated from the ss by a dot, comma or colon. Spacing and punctuation is also permitted with the exception of the question mark, newline and carriage return characters, together with literal text enclosed in single quotes. The full list of changes/improvements in this refactor is: - Make TimestampFormatFinder an instantiable class - Overrides must be specified in Java date/time format - Joda format is no longer accepted - Joda timestamp formats in outputs are now derived from the determined or overridden Java timestamp formats, not stored separately - Functionality for determining the "best" timestamp format in a set of lines has been moved from TextLogFileStructureFinder to TimestampFormatFinder, taking advantage of the fact that TimestampFormatFinder is now an instantiable class with state - The functionality to quickly rule out some possible Grok patterns when looking for timestamp formats has been changed from using simple regular expressions to the much faster approach of using the Shift-And method of sub-string search, but using an "alphabet" consisting of just 1 (representing any digit) and 0 (representing non-digits) - Timestamp format overrides are now much more flexible - Timestamp format overrides that do not correspond to a built-in Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern whose definition is included within the date processor in the ingest pipeline - Grok patterns that correspond to multiple Java date/time patterns are now handled better - the Grok pattern is accepted as matching broadly, and the required set of Java date/time patterns is built up considering all observed samples - As a result of the more flexible acceptance of Grok patterns, when looking for the "best" timestamp in a set of lines timestamps are considered different if they are preceded by a different sequence of punctuation characters (to prevent timestamps far into some lines being considered similar to timestamps near the beginning of other lines) - Out-of-the-box Grok patterns that are considered now include %{DATE} and %{DATESTAMP}, which have indeterminate day/month ordering - The order of day/month in formats with indeterminate day/month order is determined by considering all observed samples (plus the server locale if the observed samples still do not suggest an ordering) Relates #38086 Closes #35137 Closes #35132
The Elasticsearch docs are in AsciiDoc format and can be built using the Elasticsearch documentation build process. See: https://github.com/elastic/docs Snippets marked with `// CONSOLE` are automatically annotated with "VIEW IN CONSOLE" and "COPY AS CURL" in the documentation and are automatically tested by the command `gradle :docs:check`. To test just the docs from a single page, use e.g. `./gradlew :docs:integTestRunner --tests "*rollover*"`. NOTE: If you have an elasticsearch-extra folder alongside your elasticsearch folder, you must temporarily rename it when you are testing 6.3 or later branches. By default each `// CONSOLE` snippet runs as its own isolated test. You can manipulate the test execution in the following ways: * `// TEST`: Explicitly marks a snippet as a test. Snippets marked this way are tests even if they don't have `// CONSOLE` but usually `// TEST` is used for its modifiers: * `// TEST[s/foo/bar/]`: Replace `foo` with `bar` in the generated test. This should be used sparingly because it makes the snippet "lie". Sometimes, though, you can use it to make the snippet more clear. Keep in mind that if there are multiple substitutions then they are applied in the order that they are defined. * `// TEST[catch:foo]`: Used to expect errors in the requests. Replace `foo` with `request` to expect a 400 error, for example. If the snippet contains multiple requests then only the last request will expect the error. * `// TEST[continued]`: Continue the test started in the last snippet. Between tests the nodes are cleaned: indexes are removed, etc. This prevents that from happening between snippets because the two snippets are a single test. This is most useful when you have text and snippets that work together to tell the story of some use case because it merges the snippets (and thus the use case) into one big test. * `// TEST[skip:reason]`: Skip this test. Replace `reason` with the actual reason to skip the test. Snippets without `// TEST` or `// CONSOLE` aren't considered tests anyway but this is useful for explicitly documenting the reason why the test shouldn't be run. * `// TEST[setup:name]`: Run some setup code before running the snippet. This is useful for creating and populating indexes used in the snippet. The setup code is defined in `docs/build.gradle`. See `// TESTSETUP` below for a similar feature. * `// TEST[warning:some warning]`: Expect the response to include a `Warning` header. If the response doesn't include a `Warning` header with the exact text then the test fails. If the response includes `Warning` headers that aren't expected then the test fails. * `// TESTRESPONSE`: Matches this snippet against the body of the response of the last test. If the response is JSON then order is ignored. If you add `// TEST[continued]` to the snippet after `// TESTRESPONSE` it will continue in the same test, allowing you to interleave requests with responses to check. * `// TESTRESPONSE[s/foo/bar/]`: Substitutions. See `// TEST[s/foo/bar]` for how it works. These are much more common than `// TEST[s/foo/bar]` because they are useful for eliding portions of the response that are not pertinent to the documentation. * One interesting difference here is that you often want to match against the response from Elasticsearch. To do that you can reference the "body" of the response like this: `// TESTRESPONSE[s/"took": 25/"took": $body.took/]`. Note the `$body` string. This says "I don't expect that 25 number in the response, just match against what is in the response." Instead of writing the path into the response after `$body` you can write `$_path` which "figures out" the path. This is especially useful for making sweeping assertions like "I made up all the numbers in this example, don't compare them" which looks like `// TESTRESPONSE[s/\d+/$body.$_path/]`. * You can't use `// TESTRESPONSE` immediately after `// TESTSETUP`. Instead, consider using `// TEST[continued]` or rearrange your snippets. * `// TESTRESPONSE[_cat]`: Add substitutions for testing `_cat` responses. Use this after all other substitutions so it doesn't make other substitutions difficult. * `// TESTRESPONSE[skip:reason]`: Skip the assertions specified by this response. * `// TESTSETUP`: Marks this snippet as the "setup" for all other snippets in this file. This is a somewhat natural way of structuring documentation. You say "this is the data we use to explain this feature" then you add the snippet that you mark `// TESTSETUP` and then every snippet will turn into a test that runs the setup snippet first. See the "painless" docs for a file that puts this to good use. This is fairly similar to `// TEST[setup:name]` but rather than the setup defined in `docs/build.gradle` the setup is defined right in the documentation file. In general, we should prefer `// TESTSETUP` over `// TEST[setup:name]` because it makes it more clear what steps have to be taken before the examples will work. * `// NOTCONSOLE`: Marks this snippet as neither `// CONSOLE` nor `// TESTRESPONSE`, excluding it from the list of unconverted snippets. We should only use this for snippets that *are* JSON but are *not* responses or requests. In addition to the standard CONSOLE syntax these snippets can contain blocks of yaml surrounded by markers like this: ``` startyaml - compare_analyzers: {index: thai_example, first: thai, second: rebuilt_thai} endyaml ``` This allows slightly more expressive testing of the snippets. Since that syntax is not supported by CONSOLE the usual way to incorporate it is with a `// TEST[s//]` marker like this: ``` // TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: thai_example, first: thai, second: rebuilt_thai}\nendyaml\n/] ``` Any place you can use json you can use elements like `$body.path.to.thing` which is replaced on the fly with the contents of the thing at `path.to.thing` in the last response.