OpenSearch

Author	SHA1	Message	Date
Andriy Redko	385b268bc0	Update Mockito to 4.2.x (#1830) Signed-off-by: Andriy Redko <andriy.redko@aiven.io>	2022-01-03 12:00:45 -05:00
Owais Kazi	8394f541bc	Run spotless and exclude checkstyle on libs module (#1428) Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>	2021-10-26 09:45:26 -05:00
kartg	af6fbc77eb	Improving the Grok circular reference check to prevent stack overflow (#1079) This change refactors the circular reference check in the Grok processor class to use a formal depth-first traversal. It also includes a logic update to prevent a stack overflow in one scenario and a check for malformed patterns. This bugfix addresses CVE-2021-22144. Signed-off-by: Kartik Ganesh <85275476+kartg@users.noreply.github.com>	2021-08-12 12:52:02 -04:00
Nick Knize	9168f1fb43	[License] Add SPDX and OpenSearch Modification license header (#509) This commit adds the SPDX Apache-2.0 license header along with an additional copyright header for all modifications. Signed-off-by: Nicholas Walter Knize <nknize@apache.org>	2021-04-09 14:28:18 -05:00
Rabi Panda	1bdfbb4ef1	[Rename] Fix imports in the libs module. (#385) Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Nick Knize	5b46a05702	[Rename] remaining packages and resources in test/fixture (#364) This commit refactors the remaining o.e.index and o.e.test packages in the test/fixtures module. References throughout the codebase are also refactored. Signed-off-by: Nicholas Walter Knize <nknize@apache.org>	2021-03-21 20:56:34 -05:00
Rabi Panda	eae9b0531b	[Rename] refactor libs/grok. (#262) Refactor the `libs/grok` module to rename the package name from `org.elasticsearch.grok` to `org.opensearch.grok` as part of the rename to OpenSearch work. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Nik Everett	719a76e4bd	Grok: "native" results (backport of #62843) (#62886) This adds the ability to fetch Java primitives like `long` and `float` from grok matches rather than their boxed versions. It also allows customizing which fields are extracted and how they are extracted. By default we continue to fetch a `Map<String, Object>`, but runtime fields will be able to fetch just the fields they are interested in, and the values will be primitives.	2020-09-24 11:47:13 -04:00
Nik Everett	f8bc5a3e6b	Grok: Handle utf-8 natively (backport of #62794) (#62826) This adds a method to `Grok` that matches against sections offset from utf-8 byte arrays: `Map<String, Object> captures(byte[] utf8Bytes, int offset, int length)`. This'll be useful for the grok-flavored runtime fields because they want to match against utf-8 encoded strings stored in a big array, and joni already supports this.	2020-09-23 09:33:03 -04:00
Nik Everett	7ffea4621d	Extract capture config from grok patterns up front (backport of #62706) (#62785) This extracts the configuration for extracting values from a grokked string when building the grok expression, to do two things: 1. Create a method exposing that configuration on `Grok` itself, which will be used by `grok` flavored runtime fields. 2. Marginally speed up extracting grok values by skipping a little string manipulation.	2020-09-22 17:44:42 -04:00
Nik Everett	39a617773d	Rename grok's built-in patterns (backport of #62735) (#62765) This reworks the code around grok's built-in patterns to name things more like the rest of the code. It's not a big deal, but I'm just more used to having `public static final` constants in SHOUTING_SNAKE_CASE.	2020-09-22 13:06:43 -04:00
Dan Hermann	0b1e2172e1	[7.x] Preserve grok pattern ordering and add sort option (#61671) (#62162)	2020-09-09 08:53:11 -05:00
Jake Landis	a370d5eead	[7.x] Ensure Joni warnings are logged at debug (#57302) (#57897) When Joni, the regex engine that powers grok, emits a warning, it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution against that pattern. For ingest node that means every document that is run through Grok. Fortunately, Joni provides a callback hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warnings indicate a possible issue with the regular expression, and upon creation the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation, which happens at a much lower frequency than every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.	2020-06-09 17:06:29 -05:00
Jake Landis	f3721fa88c	[7.x] Prevent stack overflow for numerous grok patterns. (#55899) (#56065) This was noticed for a pipeline that was defining hundreds of grok patterns inline with a single grok processor. The recursive call used to translate a Grok pattern to a regular expression can overflow the stack. This commit converts that method to an iterative method. Co-authored-by: Przemko Robakowski <probakowski@users.noreply.github.com>	2020-05-05 16:52:56 -05:00
Dan Hermann	28643f8df1	Missing suffix for German Month "Juli" in Grok Pattern MONTH (#51579) (#51591) (#51863)	2020-02-04 08:25:24 -06:00
Ryan Ernst	21224caeaf	Remove comparison to true for booleans (#51723) While we use `== false` as a more visible form of boolean negation (instead of `!`), the true case is implied and the true value does not need to be explicitly checked. This commit converts cases that have slipped into the code checking for `== true`.	2020-01-31 16:35:43 -08:00
Alexander Reelsen	71054d269b	Sync grok patterns with logstash patterns (#50381) In order to ensure that logstash and Elasticsearch are able to understand the same patterns, this commit adapts to changes in logstash, adds a few patterns and changes a few.	2020-01-08 14:59:34 +01:00
Martijn van Groningen	0476f014bc	Unmuted and fixed test. Multiple invocations are expected. See #48519	2019-10-30 16:53:56 +01:00
Martijn van Groningen	7c2f5c51b5	Muted test. See #48519	2019-10-30 15:54:25 +01:00
Martijn van Groningen	b034153df7	Change grok watch dog to be Matcher based instead of thread based. (#48346) There is a watchdog in order to avoid long running (and expensive) grok expressions. Currently the watchdog is thread based: threads that run grok expressions are registered and unregister after completion. If these threads stay registered for too long, the watchdog interrupts them. Joni (the library that powers grok expressions) has a mechanism that checks whether the current thread is interrupted and, if so, aborts the pattern matching. Newer versions have an additional method to abort long running pattern matching inside joni. Instead of checking the thread's interrupted flag, joni now also checks a volatile field that can be set via a `Matcher` instance. This is a more efficient method for aborting long running matches (joni checks whether the interrupted flag is set once every 30k iterations, vs. just checking a volatile field). Recently we upgraded to a recent joni version (#47374), and this PR is a followup of that PR. This change should also fix #43673, since it appears that when unit tests are run, a test runner thread's interrupted flag may already have been set, due to some thread reuse.	2019-10-24 15:34:01 +02:00
Martijn van Groningen	f48981f43c	Remove redundant nested operator in builtin grok expression. (#47870) This prevents the following warning from being printed to console: `regular expression has redundant nested repeat operator + /%\{(?<name>(?<pattern>[A-z0-9]+)(?::(?<subname>[[:alnum:]@\[\]_:.-]+))?)(?:=(?<definition>(?:(?:[^{}]+\|\.+)+)+))?\}/` The current grok expression is not failing; only this warning is being printed. The warning started being printed after upgrading joni (#47374). Closes #47861	2019-10-14 14:34:48 +02:00
Martijn van Groningen	63b169b600	Upgrade joni from 2.1.6 to 2.1.29 (#47570) Backport of #47374. Changed the Grok class to use searchInterruptible(...) instead of search(...), otherwise we can't interrupt long running matching via the thread watchdog. Joni now also provides another way to interrupt long running matches: invoking the interrupt() method on the Matcher. We then need to refactor the thread watchdog to keep track of Matchers instead of Threads, but it is a better way of doing this, since interrupting would be more direct (not every 30k iterations) and efficient (checking a volatile field). This work needs to be done in a follow up.	2019-10-04 12:54:49 -05:00
Alpar Torok	0a14bb174f	Remove eclipse conditionals (#44075) * Remove eclipse conditionals. We used to have some meta projects with a `-test` prefix because historically eclipse could not distinguish between test and main source-sets and could only use a single classpath. This has not been the case for the past few Eclipse versions. This PR adds the necessary configuration to correctly categorize source folders and libraries. With this change eclipse can import projects, and the visibility rules are correct, e.g. auto complete doesn't offer classes from test code or `testCompile` dependencies when editing classes in `main`. Unfortunately the cyclic dependency detection in Eclipse doesn't seem to take the difference between test and non-test source sets into account, but since we are checking this in Gradle anyhow, it's safe to set it to `warning` in the settings. Unfortunately there is no setting to ignore it. This might cause problems when building, since Eclipse will probably not know the right order to build things in, so more work might be necessary.	2019-10-03 11:55:00 +03:00
Albert Zaharovits	72eb9c2d44	Eclipse libs projects setup fix (#42852) Fallout from #42773 for eclipse users. (cherry picked from commit 998419c49fe51eb8343664a80f07d8d8d39abc6a)	2019-06-04 13:52:41 -07:00
David Roberts	14f29de2a8	Avoid HashMap construction on Grok non-match (#42444) This change moves the construction of the result HashMap in Grok.captures() into the branch that actually needs it. This probably will not make a measurable difference for ingest pipelines, but it is beneficial to the ML find_file_structure endpoint, as it tries out many Grok patterns that will fail to match.	2019-05-23 21:09:33 +01:00
austintp	8ebff0512b	Updates the grok patterns to be consistent with logstash (#27181)	2019-02-05 12:37:02 -06:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014) Scheduler.schedule(...) would previously assume that the caller handles exceptions by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
John	0baffda390	ingest: grok remove duplicated patterns (#35886) This commit removes the redundant (and incorrect) JAVACLASS and JAVAFILE grok patterns. This helps to keep parity with Logstash's patterns. See also: https://github.com/logstash-plugins/logstash-patterns-core/pull/237 Closes #35699	2018-11-26 11:13:46 -06:00
Christoph Büscher	ba3ceeaccf	Clean up "unused variable" warnings (#31876) This change cleans up "unused variable" warnings. There are several cases where we most likely want to suppress the warnings (especially in the client documentation test where the snippets contain many unused variables). In a lot of cases the unused variables can just be deleted though.	2018-09-26 14:09:32 +02:00
Armin Braun	4dda5a990b	INGEST: Fix ThreadWatchDog Throwing on Shutdown (#32578) * INGEST: Fix ThreadWatchDog Throwing on Shutdown * #32539 is caused by the fact that ThreadWatchDog.Default could throw on shutdown if the ThreadPool is interrupted while `interruptLongRunningExecutions` is in progress. This is a result of the watchdog not having a lifecycle of its own (normally it terminates when the threadpool terminates). * We can't easily use `org.elasticsearch.common.util.concurrent.EsRejectedExecutionException#isExecutorShutdown` to catch this state the same way other components do, since that would require adding the core lib to Grok as a dependency. * Since we have no knowledge of the lifecycle in this component (we're only passed the scheduler `BiFunction`), I fixed this by only scheduling the watchdog when there are actually registered threads in it. * I think using the pattern of locking via two `Atomic` values should not be much of a performance concern here under load, since either the integer will likely be > 0 in this case (because we have multiple Grok in parallel) or the running state will be true because there likely was at least one thread registered when the watchdog ran, and so the enqueuing of the watchdog task during `register` will happen very rarely here (in the worst case scenario of only a single Grok thread it will happen less frequently than once every `ingest.grok.watchdog.interval`). The atomic update on the count should not be relevant relative to the cost of adding a new node to the CHM either. Fixes #32539 * Also fixes the watchdog so that, in general, it no longer runs when it doesn't have to.	2018-08-06 22:46:26 +02:00
Armin Braun	b7b413e55e	Extend allowed characters for grok field names (#21745) (#31653)	2018-06-29 09:12:47 +02:00
Martijn van Groningen	6030d4be1e	[INGEST] Interrupt the current thread if evaluating grok expressions takes too long (#31024) This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search(). This method can hang forever if the regex expression is too complex. The thread interrupter in the background checks every 3 seconds whether there are threads executing the org.joni.Matcher#search() method for longer than 5 seconds and, if so, interrupts these threads. Every 30k iterations, Joni checks whether the current thread is interrupted and, if so, returns org.joni.Matcher#INTERRUPTED. Closes #28731	2018-06-12 07:49:03 +02:00
Tanguy Leroux	bf58660482	Remove all unused imports and fix CRLF (#31207) The X-Pack opening and the recent other refactorings left a lot of unused imports in the codebase. This commit removes them all.	2018-06-11 15:12:12 +02:00
Martijn van Groningen	9da95efa41	ingest: Don't allow circular referencing of named patterns in the grok processor. Otherwise the grok code throws a StackOverflowError. Closes #29257	2018-04-05 09:35:50 +02:00
Martijn van Groningen	e55ce1474d	Applied @colings86 changes to the build in order to make new module work in Eclipse too.	2018-02-20 13:49:57 +01:00
Martijn van Groningen	72de14115b	fixed codestyle violation	2018-02-20 08:46:57 +01:00
Martijn van Groningen	9c405e8595	made load method private and added another static getter that users of Grok can use to get the builtin patterns.	2018-02-20 08:09:24 +01:00
Martijn van Groningen	3fad16e76c	renamed module	2018-02-20 08:02:02 +01:00

38 Commits