Commit Graph

2777 Commits

Author SHA1 Message Date
David Roberts b61202b0a8 [ML] Add a limit on line merging in find_file_structure (#42501)
When analysing a semi-structured text file, the
find_file_structure endpoint merges lines to form
multi-line messages, using the assumption that the
first line in each message contains the timestamp.
However, if the timestamp is misdetected, this
can lead to an excessive number of lines being merged
into massive messages.

This commit adds a line_merge_size_limit setting
(default 10000 characters) that halts the analysis
if a message bigger than this is created.  This
prevents significant CPU time from subsequently being
spent trying to determine the internal structure of
huge bogus messages.
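
A minimal sketch of the kind of guard this adds, with hypothetical method and message wording (the real check lives in the file structure finder):

```
import java.util.List;

final class LineMerger {

    // Sketch: merge candidate lines into one message, but bail out as soon as the
    // merged message would exceed the configured line_merge_size_limit, instead of
    // building an arbitrarily large message and analysing it afterwards.
    static String mergeLines(List<String> lines, int lineMergeSizeLimit) {
        StringBuilder message = new StringBuilder();
        for (String line : lines) {
            int lengthAfterAppend = message.length() + (message.length() > 0 ? 1 : 0) + line.length();
            if (lengthAfterAppend > lineMergeSizeLimit) {
                throw new IllegalArgumentException("Merging lines resulted in a message of ["
                    + lengthAfterAppend + "] characters, which exceeds the limit of ["
                    + lineMergeSizeLimit + "]");
            }
            if (message.length() > 0) {
                message.append('\n');
            }
            message.append(line);
        }
        return message.toString();
    }
}
```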
2019-06-03 13:45:51 +01:00
Benjamin Trent 0253927ec4
[ML Data Frame] Refactor stop logic (#42644) (#42763)
* Revert "invalid test"

This reverts commit 9dd8b52c13c716918ff97e6527aaf43aefc4695d.

* Testing

* mend

* Revert "[ML Data Frame] Mute Data Frame tests"

This reverts commit 5d837fa312b0e41a77a65462667a2d92d1114567.

* Call onStop and onAbort outside atomic update

* Don’t update CS

* Tidying up

* Remove invalid test that asserted logic that has been removed

* Add stopped event

* Revert "Add stopped event"

This reverts commit 02ba992f4818bebd838e1c7678bd2e1cc090bfab.

* Adding check for STOPPED in saveState
2019-06-03 06:53:44 -05:00
David Roberts 10aca87389 [ML] Better detection of binary input in find_file_structure (#42707)
This change helps to prevent the situation where a binary
file uploaded to the find_file_structure endpoint is
detected as being text in the UTF-16 character set, and
then causes a large amount of CPU to be spent analysing
the bogus text structure.

The approach is to check the distribution of zero bytes
between odd and even file positions, on the grounds that
UTF-16BE or UTF-16LE text would have a very skewed distribution.
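
A minimal sketch of the idea, not the actual implementation (the 95% threshold below is an assumption):

```
final class Utf16Heuristic {

    // UTF-16LE text puts its zero bytes almost exclusively at odd positions and
    // UTF-16BE at even positions, whereas genuine binary data scatters them fairly
    // evenly, so a heavily skewed distribution suggests real UTF-16 text.
    static boolean zeroByteDistributionLooksLikeUtf16(byte[] bytes) {
        int evenZeroes = 0;
        int oddZeroes = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0) {
                if (i % 2 == 0) {
                    evenZeroes++;
                } else {
                    oddZeroes++;
                }
            }
        }
        int totalZeroes = evenZeroes + oddZeroes;
        // Hypothetical threshold: at least 95% of the zero bytes on one side.
        return totalZeroes > 0 && Math.max(evenZeroes, oddZeroes) >= totalZeroes * 0.95;
    }
}
```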
2019-06-03 12:47:22 +01:00
Alan Woodward 2129d06643 Create client-only AnalyzeRequest/AnalyzeResponse classes (#42197)
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.

This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
2019-06-03 09:46:36 +01:00
James Rodewig f51f8ed04c [DOCS] Remove unneeded options from `[source,sql]` code blocks (#42759)
In AsciiDoc, `subs="attributes,callouts,macros"` options were required
to render `include-tagged::` in a code block.

With elastic/docs#827, Elasticsearch Reference documentation migrated
from AsciiDoc to Asciidoctor.

In Asciidoctor, the `subs="attributes,callouts,macros"` options are no
longer needed to render `include-tagged::` in a code block. This commit
removes those unneeded options.

Resolves #41589
2019-05-31 13:05:13 -04:00
Benjamin Trent f22dcfb9da
[ML] [Data Frame] nesting group_by fields like other aggs (#42718) (#42760) 2019-05-31 10:55:35 -05:00
David Roberts 87ca762573 [ML] Add Kibana application privilege to data frame admin/user roles (#42757)
Data frame transforms are restricted by different roles to ML, but
share the ML UI.  To prevent the ML UI being hidden for users who
only have the data frame admin or user role, it is necessary to add
the ML Kibana application privilege to the backend data frame roles.
2019-05-31 15:41:59 +01:00
Przemysław Witek f6779de2b7
Increase maximum forecast interval to 10 years. (#41082) (#42710)
Increase the maximum duration to ~10 years (3650 days).
2019-05-31 06:19:47 +02:00
Jason Tedor 371cb9a8ce
Remove Log4j 1.2 API as a dependency (#42702)
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath; hence the advantage of removing it as a
dependency now.
2019-05-30 16:08:07 -04:00
Mark Vieira c1816354ed
[Backport] Improve build configuration time (#42674) 2019-05-30 10:29:42 -07:00
Benjamin Trent b5527b3278
[ML] [Data Frame] add support for weighted_avg agg (#42646) (#42714) 2019-05-30 12:05:35 -05:00
Jay Modi 711de2f59a
Make hashed token ids url safe (#42651)
This commit changes the way token ids are hashed so that the output is
URL safe without requiring further encoding. This follows the pattern that we
use for autogenerated document ids; see UUIDs and the
associated classes for additional details.
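
A minimal sketch of the general pattern (the digest algorithm and method name are assumptions, not the exact production code):

```
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class TokenIdHasher {

    // Hash the token id and encode the digest with the URL-safe Base64 alphabet and
    // no padding, so the result can be embedded in URLs without further escaping.
    static String hashTokenId(String tokenId) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
            .digest(tokenId.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }
}
```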
2019-05-30 10:44:41 -06:00
Ioannis Kakavas 7cabe8acc9 Fix refresh remote JWKS logic (#42662)
This change ensures that:

- We only attempt to refresh the remote JWKS when there is a
  signature-related error (BadJWSException instead of the generic
  BadJOSEException).
- We do call OpenIDConnectAuthenticator#getUserClaims upon
  successful refresh.
- We test this in OpenIdConnectAuthenticatorTests.

Without this fix, when using the OpenID Connect realm with a remote
JWKSet configured in `op.jwks_path`, the refresh would be triggered
for most configuration errors (e.g. a wrong value for `op.issuer`),
and Kibana would get no response and would time out, since
`getUserClaims` wouldn't be called because
`ReloadableJWKSource#reloadAsync` wouldn't call `onResponse` on the
future.
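
A minimal sketch of the distinction being made, assuming the Nimbus exception hierarchy and with hypothetical callback names:

```
import com.nimbusds.jose.proc.BadJOSEException;
import com.nimbusds.jose.proc.BadJWSException;

final class JwksRefreshDecision {

    // Only a signature-related failure suggests the remote JWKS may have rotated its
    // keys; any other validation failure will not be fixed by reloading the JWKS.
    static void onValidationFailure(BadJOSEException e,
                                    Runnable reloadJwksAndRetry,
                                    Runnable failAuthentication) {
        if (e instanceof BadJWSException) {
            reloadJwksAndRetry.run();
        } else {
            failAuthentication.run();
        }
    }
}
```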
2019-05-30 18:08:30 +03:00
Ioannis Kakavas 24a794fd6b Fix testTokenExpiry flaky test (#42585)
The test was using ClockMock#rewind, passing the number of nanoseconds,
in order to "strip" nanos from the time value. This was intentional,
as the expiration time of the UserToken doesn't have nanosecond
precision.
However, ClockMock#rewind doesn't support nanos either, so when it's
called with a TimeValue, it rewinds the clock by the TimeValue's
millis instead. This usually moved the clock far enough before the
token expiration time and the test passed. Once every
few hundred runs though, the TimeValue by which we attempted to
rewind the clock only had nanos and no millis, so rewind moved the
clock back just a few millis, leaving it still after the expiration time.

This change moves the clock explicitly to the same instant as the expiration,
using clock.setTime and disregarding nanos.
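
A minimal sketch in plain java.time of the fix's intent (ClockMock itself is an Elasticsearch test utility; the helper below is hypothetical):

```
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

final class TokenExpiryTestClocks {

    // Instead of rewinding by a TimeValue (which only moves the clock by whole
    // milliseconds), pin the clock to the expiration instant with nanoseconds
    // discarded, so the "expired exactly now" comparison is deterministic.
    static Clock clockAtExpiration(Instant tokenExpiration) {
        return Clock.fixed(tokenExpiration.truncatedTo(ChronoUnit.MILLIS), ZoneOffset.UTC);
    }
}
```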
2019-05-30 07:53:56 +03:00
Igor Motov d2f9ccbe18 Geo: Refactor libs/geo parsers (#42549)
Refactors the WKT and GeoJSON parsers from utility classes into
instantiable objects. This is a preliminary step in
preparation for moving coordinate validators out of Geometry
constructors. This should allow us to make validators pluggable.
2019-05-29 20:07:27 -04:00
David Kyle c5a410f68b [ML Data Frame] Set DF task state when stopping (#42516)
Set the state to stopped prior to persisting
2019-05-29 16:39:44 +01:00
Hendrik Muhs 345ff21ae5 [ML-DataFrame] rewrite start and stop to answer with acknowledged (#42589)
rewrite start and stop to answer with acknowledged

fixes #42450
2019-05-29 11:14:32 +02:00
Hendrik Muhs ace96a2b6e check position before and after latch (#42623)
check position before and after latch

fixes #42084
2019-05-28 21:23:29 +02:00
David Kyle aea600fe7d [Ml Data Frame] Return bad_request on preview when config is invalid (#42447) 2019-05-28 15:36:50 +01:00
David Turner 746a2f41fd
Remove PRE_60_NODE_CHECKPOINT (#42531)
This commit removes the obsolete `PRE_60_NODE_CHECKPOINT` constant for dealing
with 5.x nodes' lack of sequence number support.

Backport of #42527
2019-05-28 12:25:53 +01:00
Daniel Mitterdorfer 635ce0ca6d
Mute AsyncTwoPhaseIndexerTests#testStateMachine() (#42610)
Relates #42084
Relates #42609
2019-05-28 10:20:22 +02:00
Nhat Nguyen 2077f9ffbc Reset mock transport service in CcrRetentionLeaseIT (#42600)
testRetentionLeaseIsAddedIfItDisappearsWhileFollowing does not reset the
mock transport service after the test. Surviving transport interceptors from
that test can sneakily remove retention leases and make other tests fail.

Closes #39331
Closes #39509
Closes #41428
Closes #41679
Closes #41737
Closes #41756
2019-05-27 21:51:25 -04:00
Armin Braun a96606d962
Safer Wait for Snapshot Success in ClusterPrivilegeTests (#40943) (#42575)
* Safer Wait for Snapshot Success in ClusterPrivilegeTests

* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
  * We should fix this race in the transport API, but that's not trivial and will be part of the upcoming big round of refactoring the repository interaction; this added check fixes the test for now (see the sketch below)
* closes #38030
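
A sketch of the kind of extra wait the test gains, as a fragment from inside an ESIntegTestCase-style test; the exact assertion in the production test may differ:

```
// Wait until the snapshot has not only reported SUCCESS but has also been removed
// from the cluster state, so the following privilege checks see a quiescent cluster.
assertBusy(() -> {
    SnapshotsInProgress snapshotsInProgress =
        client().admin().cluster().prepareState().get().getState().custom(SnapshotsInProgress.TYPE);
    assertTrue(snapshotsInProgress == null || snapshotsInProgress.entries().isEmpty());
});
```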
2019-05-27 12:08:20 +02:00
Armin Braun a5ca20a250
Some Cleanup in o.e.i.engine (#42278) (#42566)
* Some Cleanup in o.e.i.engine

* Remove dead code and parameters
* Reduce visibility in some obvious spots
* Add missing `assert`s (not that important here since the methods
themselves will probably be dead-code eliminated) but still
2019-05-27 11:04:54 +02:00
Nhat Nguyen 85e60850af Add debug log for retention leases (#42557)
We need more information to understand why CcrRetentionLeaseIT is
failing. This commit adds some debug logging to retention leases and enables
it in CcrRetentionLeaseIT.
2019-05-26 16:04:47 -04:00
Nhat Nguyen d6e2f4a43e Enable recoveries trace log in CcrRetentionLeaseIT
Tracked #41679
2019-05-24 22:16:14 -04:00
Tanguy Leroux 6bec876682 Improve Close Index Response (#39687)
This changes the `CloseIndexResponse` so that it reports the closing result
for each index. Shard failures or exceptions are also reported per index,
and the global acknowledgment flag is computed from the index results
only.

The response looks like:
```
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : {
    "docs" : {
      "closed" : true
    }
  }
}
```

The response reports shard failures like:
```
{
  "acknowledged" : false,
  "shards_acknowledged" : false,
  "indices" : {
    "docs-1" : {
      "closed" : true
    },
    "docs-2" : {
      "closed" : false,
      "shards" : {
        "1" : {
          "failures" : [
            {
              "shard" : 1,
              "index" : "docs-2",
              "status" : "BAD_REQUEST",
              "reason" : {
                "type" : "index_closed_exception",
                "reason" : "closed",
                "index_uuid" : "JFmQwr_aSPiZbkAH_KEF7A",
                "index" : "docs-2"
              }
            }
          ]
        }
      }
    },
    "docs-3" : {
      "closed" : true
    }
  }
}
```

Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
2019-05-24 21:57:55 -04:00
Julie Tibshirani 3a6c2525ca
Deprecate support for chained multi-fields. (#42330)
This PR contains a straight backport of #41926, and also updates the
migration documentation and deprecation info API for 7.x.
2019-05-24 15:55:06 -07:00
David Roberts 48dc0dca57 [ML] Use map and filter instead of flatMap in find_file_structure (#42534)
Using map and filter avoids the garbage from all the
Stream.of calls that flatMap necessitated. Performance
is better when there are masses of fields.
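
A sketch of the pattern change (the field-name logic here is hypothetical):

```
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FieldNameExtraction {

    // Before: flatMap allocates a short-lived Stream.of(...) or Stream.empty() for
    // every single field.
    static List<String> viaFlatMap(List<String> fields) {
        return fields.stream()
            .flatMap(f -> f.isEmpty() ? Stream.<String>empty() : Stream.of(f.trim()))
            .collect(Collectors.toList());
    }

    // After: filter + map produce the same result with no per-field stream garbage.
    static List<String> viaMapAndFilter(List<String> fields) {
        return fields.stream()
            .filter(f -> f.isEmpty() == false)
            .map(String::trim)
            .collect(Collectors.toList());
    }
}
```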
2019-05-24 20:12:06 +01:00
David Roberts 34de68b007 [ML] Fix possible race condition when closing an opening job (#42506)
This change fixes a race condition that would result in an
in-memory data structure becoming out-of-sync with persistent
tasks in cluster state.

If repeated often enough this could result in it being
impossible to open any ML jobs on the affected node, as the
master node would think the node had capacity to open another
job but the chosen node would error during the open sequence
due to its in-memory data structure being full.

The race could be triggered by opening a job and then closing
it a tiny fraction of a second later.  It is unlikely a user
of the UI could open and close the job that fast, but a script
or program calling the REST API could.

The nasty thing is, from the externally observable states and
stats everything would appear to be fine - the fast open then
close sequence would appear to leave the job in the closed
state.  It's only later that the leftovers in the in-memory
data structure might build up and cause a problem.
2019-05-24 20:11:58 +01:00
Hendrik Muhs 6d47ee9268 [ML-DataFrame] add support for fixed_interval, calendar_interval, remove interval (#42427)
* add support for fixed_interval, calendar_interval, remove interval

* adapt HLRC

* checkstyle

* add a hlrc to server test

* adapt yml test

* improve naming and doc

* improve interface and add test code for hlrc to server

* address review comments

* repair merge conflict

* fix date patterns

* address review comments

* remove assert for warning

* improve exception message

* use constants
2019-05-24 20:30:17 +02:00
Igor Motov e28a9e99c4 SQL: Moves the JTS-based tests suppression to Before (#42526)
Moves the test suppression from `ClassRule` to `Before`, where it is
properly handled in the CI build.

Fixes #42221
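
A minimal sketch of the shape of that change (the suppression condition and system property name are hypothetical):

```
import org.junit.Before;

import static org.junit.Assume.assumeFalse;

public class JtsBasedGeoTestCase {

    // Checked per test method: a failed assumption here is reported as a skipped
    // test by the runner, which the CI build handles cleanly, unlike the previous
    // ClassRule-based suppression.
    @Before
    public void skipIfJtsSuppressed() {
        assumeFalse("JTS-based tests are suppressed in this environment",
            Boolean.getBoolean("tests.jts.suppress"));
    }
}
```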
2019-05-24 13:58:53 -04:00
Hendrik Muhs 7cee294acf
[ML-DataFrame] backport dataframe changes from 42202, using client instead of transport (#42468)
backport dataframe changes from #42202, using client instead of transport
2019-05-24 11:05:30 +02:00
David Roberts f472186b9f [ML] Improve file structure finder timestamp format determination (#41948)
This change contains a major refactoring of the timestamp
format determination code used by the ML find file structure
endpoint.

Previously timestamp format determination was done separately
for each piece of text supplied to the timestamp format finder.
This had the drawback that it was not possible to distinguish
dd/MM and MM/dd in the case where both numbers were 12 or less.
In order to do this sensibly it is best to look across all the
available timestamps and see if one of the numbers is greater
than 12 in any of them.  This necessitates making the timestamp
format finder an instantiable class that can accumulate evidence
over time.

Another problem with the previous approach was that it was only
possible to override the timestamp format to one of a limited
set of timestamp formats.  There was no way out if a file to be
analysed had a timestamp that was sane yet not in the supported
set.  This is now changed to allow any timestamp format that can
be parsed by a combination of these Java date/time formats:
yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss,
a, XX, XXX, zzz
Additionally, S letter groups (fractional seconds) are supported,
provided they occur after ss and are separated from it by a dot,
comma or colon.  Spacing and punctuation are also permitted, with
the exception of the question mark, newline and carriage return
characters, as is literal text enclosed in single quotes.

The full list of changes/improvements in this refactor is:

- Make TimestampFormatFinder an instantiable class
- Overrides must be specified in Java date/time format - Joda
  format is no longer accepted
- Joda timestamp formats in outputs are now derived from the
  determined or overridden Java timestamp formats, not stored
  separately
- Functionality for determining the "best" timestamp format in
  a set of lines has been moved from TextLogFileStructureFinder
  to TimestampFormatFinder, taking advantage of the fact that
  TimestampFormatFinder is now an instantiable class with state
- The functionality to quickly rule out some possible Grok
  patterns when looking for timestamp formats has been changed
  from using simple regular expressions to the much faster
  approach of using the Shift-And method of substring search,
  using an "alphabet" consisting of just 1 (representing any
  digit) and 0 (representing non-digits); a minimal sketch of
  this follows the list
- Timestamp format overrides are now much more flexible
- Timestamp format overrides that do not correspond to a built-in
  Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern
  whose definition is included within the date processor in the
  ingest pipeline
- Grok patterns that correspond to multiple Java date/time
  patterns are now handled better - the Grok pattern is accepted
  as matching broadly, and the required set of Java date/time
  patterns is built up considering all observed samples
- As a result of the more flexible acceptance of Grok patterns,
  when looking for the "best" timestamp in a set of lines
  timestamps are considered different if they are preceded by
  a different sequence of punctuation characters (to prevent
  timestamps far into some lines being considered similar to
  timestamps near the beginning of other lines)
- Out-of-the-box Grok patterns that are considered now include
  %{DATE} and %{DATESTAMP}, which have indeterminate day/month
  ordering
- The order of day/month in formats with indeterminate day/month
  order is determined by considering all observed samples (plus
  the server locale if the observed samples still do not suggest
  an ordering)
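
A minimal sketch of the Shift-And idea over that two-symbol alphabet, as referenced in the list above (an illustration, not the production matcher; patterns longer than 64 symbols would need a wider state):

```
final class DigitPatternMatcher {

    // The pattern is a string of '1' (any digit) and '0' (any non-digit) characters.
    // Precompute one bitmask per symbol, then scan the text with two bitwise ops and
    // one lookup per character; bit (m-1) of the state signals a match.
    static boolean contains(String text, String digitPattern) {
        int m = digitPattern.length();
        assert 0 < m && m <= 64 : "pattern must fit in a long";
        long digitMask = 0L;
        long nonDigitMask = 0L;
        for (int i = 0; i < m; i++) {
            if (digitPattern.charAt(i) == '1') {
                digitMask |= 1L << i;
            } else {
                nonDigitMask |= 1L << i;
            }
        }
        long acceptBit = 1L << (m - 1);
        long state = 0L;
        for (int i = 0; i < text.length(); i++) {
            long symbolMask = Character.isDigit(text.charAt(i)) ? digitMask : nonDigitMask;
            state = ((state << 1) | 1L) & symbolMask;
            if ((state & acceptBit) != 0L) {
                return true;
            }
        }
        return false;
    }
}
```

For example, contains("2019-05-24 09:10:08", "1111011011") is true, because the digit/non-digit shape of an ISO-style date occurs in the line.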

Relates #38086
Closes #35137
Closes #35132
2019-05-24 09:10:08 +01:00
Tim Vernum 567c0d331f
Fix settings prefix for realm truststore password (#42413)
As part of #30241 realm settings were changed to be true affix
settings. In the process of this change, the "ssl." prefix was lost
from the realm truststore password. It should be:

    xpack.security.authc.realms.<type>.<name>.ssl.truststore.password

Due to a mismatch between the way we define SSL settings and load SSL
contexts, there was no way to define this legacy password setting in a
realm config.

The settings validation would reject "ssl.truststore.password", but the
SSL service would ignore "truststore.password".

Backport of: #42336
2019-05-24 13:16:26 +10:00
Ryan Ernst a49bafc194
Split document and metadata fields in GetResult (#38373) (#42456)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
2019-05-23 14:01:07 -07:00
Costin Leau a48125a9f7 Fix FROZEN indices backport 2019-05-23 21:30:41 +03:00
Costin Leau 9fdf4215dd Docs: Documentation for the upcoming SQL support of frozen indices (#41863)
(cherry picked from commit a3cc03eb1503df24c1706a721fcc9af38c3b2873)
(cherry picked from commit f42dcf2ffd7bd25f3f91aa6127515f393cd1860f)
2019-05-23 21:16:16 +03:00
Costin Leau d5f04d29c9 SQL: Add support for FROZEN indices (#41558)
Allow querying of FROZEN indices both through a dedicated SQL grammar
extension:
> SELECT field FROM FROZEN index
and also through a driver configuration parameter, namely:
> index.include.frozen: true/false

Fix #39390
Fix #39377

(cherry picked from commit 2445a933915f420c7f51e8505afa0a7978ce6b0f)
2019-05-23 21:16:16 +03:00
David Kyle a23257ce06 [ML Data Frame] Account for completed data frames in test (#42351)
When asserting on the checkpoint value, if the DF has completed the checkpoint will be 1, otherwise 0.
Similarly, the state may be started or indexing. Closes #42309
2019-05-23 14:05:09 +01:00
Jim Ferenczi b88e80ab89 Upgrade to Lucene 8.1.0 (#42214)
This commit upgrades to the GA release of Lucene 8.1.0
2019-05-23 11:46:45 +02:00
Jim Ferenczi 4ca5649a0d Upgrade to lucene 8.1.0-snapshot-e460356abe (#40952) 2019-05-23 11:45:33 +02:00
Mengwei Ding fa98cbe320
Add .code_internal-* index pattern to kibana user (#42247) (#42387) 2019-05-22 20:25:45 -07:00
Luca Cavanna c2af62455f Cut over SearchResponse and SearchTemplateResponse to Writeable (#41855)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna 29c9bb9181 Clean up ShardId usage of Streamable (#41843)
ShardId already implements Writeable so there is no need for it to implement Streamable too. Also the readShardId static method can be
easily replaced with direct usages of the constructor that takes a
StreamInput as argument.
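
A sketch of what that replacement looks like at a call site (the surrounding method is illustrative; the StreamInput constructor on ShardId is the one the commit refers to):

```
import java.io.IOException;

import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.index.shard.ShardId;

final class ShardIdReads {

    static ShardId readShardId(StreamInput in) throws IOException {
        // Before (Streamable style): ShardId shardId = ShardId.readShardId(in);
        // After (Writeable style): read via the constructor taking a StreamInput.
        return new ShardId(in);
    }
}
```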
2019-05-22 18:47:54 +02:00
Yannick Welsch 5d8605c790 Fix testAutoFollowManyIndices
On a slow CI worker, the test was failing an assertion.

Closes #41234
2019-05-22 17:33:34 +02:00
David Kyle 075cc7c5cf [ML Data Frame] Persist data frame after state changes (#42347) 2019-05-22 15:40:40 +01:00
David Kyle f696769a39 Mute Data Frame integration tests
Relates to https://github.com/elastic/elasticsearch/issues/42344
2019-05-22 15:03:13 +01:00
Simon Willnauer a79cd77e5c Remove IndexShard dependency from Repository (#42213)
* Remove IndexShard dependency from Repository

In order to simplify repository testing, especially for BlobStoreRepository,
it's important to remove the dependency on IndexShard and reduce it to
Store and MapperService (in the snapshot case). This significantly reduces
the dependency footprint for Repository and allows unit testing without starting
nodes or instantiating entire shard instances. This change deprecates the old
method signatures and adds a unit test for FileRepository to show the advantage
of this change.
In addition, the unit testing surfaced a bug where the internal file names that
are private to the repository were used in the recovery stats instead of the
target file names, which makes it impossible to relate to the actual Lucene files
in the recovery stats.

* don't delegate deprecated methods

* apply comments

* test
2019-05-22 14:27:11 +02:00
Ioannis Kakavas aab97f1311 Fail early when rp.client_secret is missing in OIDC realm (#42256)
rp.client_secret is a required secure setting. Make sure we fail with
a SettingsException and a clear, actionable message when building
the realm, if the setting is missing.
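
A minimal sketch of the fail-early check (the method, message wording, and setting-prefix handling are assumptions; SettingsException is the exception type named above):

```
import org.elasticsearch.common.settings.SecureString;
import org.elasticsearch.common.settings.SettingsException;

final class OidcRealmValidation {

    // Called while the realm is being built, so a missing secure setting produces a
    // clear, actionable error instead of an obscure failure during authentication.
    static void requireClientSecret(String realmSettingPrefix, SecureString clientSecret) {
        if (clientSecret == null || clientSecret.length() == 0) {
            throw new SettingsException(
                "The configuration setting [" + realmSettingPrefix + "rp.client_secret] is required");
        }
    }
}
```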
2019-05-22 13:20:41 +03:00