OpenSearch

Author	SHA1	Message	Date
David Roberts	d5baedb789	[ML] Change dots in CSV column names to underscores (#42839 ) Dots in the column names cause an error in the ingest pipeline, as dots are special characters in ingest pipeline. This PR changes dots into underscores in CSV field names suggested by the ML find_file_structure endpoint _unless_ the field names are specifically overridden. The reason for allowing them in overrides is that fields that are not mentioned in the ingest pipeline can contain dots. But it's more consistent that the default behaviour is to replace them all. Fixes elastic/kibana#26800	2019-06-05 11:28:33 +01:00
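The renaming rule in the commit above can be sketched as follows. This is a minimal illustration, not the endpoint's actual code; the function name `suggest_column_names` and its `overrides` parameter are hypothetical.

```python
def suggest_column_names(header, overrides=None):
    """Return field names for CSV columns.

    Names taken from the file header have dots replaced by underscores,
    because dots are special characters in ingest pipelines. Names the
    caller overrides explicitly are passed through unchanged.
    """
    if overrides is not None:
        # Hypothetical override path: the caller's names are trusted as-is.
        return list(overrides)
    return [name.replace(".", "_") for name in header]
```

For example, a header of `host.name,bytes` would yield `host_name` and `bytes`, while an override of `host.name` would be kept.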
Mark Vieira	e44b8b1e2e	[Backport] Remove dependency substitutions 7.x (#42866 ) * Remove unnecessary usage of Gradle dependency substitution rules (#42773) (cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)	2019-06-04 13:50:23 -07:00
David Roberts	b61202b0a8	[ML] Add a limit on line merging in find_file_structure (#42501 ) When analysing a semi-structured text file the find_file_structure endpoint merges lines to form multi-line messages using the assumption that the first line in each message contains the timestamp. However, if the timestamp is misdetected then this can lead to excessive numbers of lines being merged to form massive messages. This commit adds a line_merge_size_limit setting (default 10000 characters) that halts the analysis if a message bigger than this is created. This prevents significant CPU time being spent subsequently trying to determine the internal structure of the huge bogus messages.	2019-06-03 13:45:51 +01:00
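A rough sketch of the merge limit described in the commit above, assuming that a line starting with a timestamp opens a new message. The names `merge_lines`, `LineMergeLimitExceeded` and the example `TIMESTAMP` pattern are hypothetical; only the `line_merge_size_limit` setting and its default of 10000 characters come from the commit message.

```python
import re

# Hypothetical timestamp pattern, used only for illustration.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")


class LineMergeLimitExceeded(Exception):
    """Raised when a merged message grows beyond the configured limit."""


def merge_lines(lines, timestamp_pattern=TIMESTAMP, line_merge_size_limit=10000):
    """Group lines into messages; a line with a timestamp starts a new message."""
    messages = []
    current = None
    for line in lines:
        if current is None or timestamp_pattern.search(line):
            if current is not None:
                messages.append(current)
            current = line
        else:
            current = current + "\n" + line
            if len(current) > line_merge_size_limit:
                # Halt instead of analysing a huge, probably bogus message.
                raise LineMergeLimitExceeded(
                    "merged message exceeds %d characters" % line_merge_size_limit
                )
    if current is not None:
        messages.append(current)
    return messages
```

If the timestamp is misdetected, almost no line opens a new message, so the first message grows until the limit halts the analysis.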
David Roberts	10aca87389	[ML] Better detection of binary input in find_file_structure (#42707 ) This change helps to prevent the situation where a binary file uploaded to the find_file_structure endpoint is detected as being text in the UTF-16 character set, and then causes a large amount of CPU to be spent analysing the bogus text structure. The approach is to check the distribution of zero bytes between odd and even file positions, on the grounds that UTF-16BE or UTF-16LE would have a very skewed distribution.	2019-06-03 12:47:22 +01:00
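The zero-byte check described in the commit above could look roughly like this. It is a sketch under stated assumptions: the function name `zero_bytes_are_skewed` and the 0.9 threshold are hypothetical, and the real implementation may weigh the evidence differently.

```python
def zero_bytes_are_skewed(data: bytes, skew_threshold: float = 0.9) -> bool:
    """Return True if zero bytes sit mostly at even or mostly at odd offsets.

    UTF-16 text that is mainly ASCII has a zero byte in the same half of
    almost every 2-byte unit, so the split is heavily skewed. Binary data
    tends to spread its zero bytes across both kinds of offset.
    """
    even_zeros = sum(1 for i in range(0, len(data), 2) if data[i] == 0)
    odd_zeros = sum(1 for i in range(1, len(data), 2) if data[i] == 0)
    total = even_zeros + odd_zeros
    if total == 0:
        return False  # no zero bytes at all, so no evidence of UTF-16
    return max(even_zeros, odd_zeros) / total >= skew_threshold
```

Input that contains zero bytes without this skew is then a candidate for rejection as binary rather than being analysed as UTF-16 text.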
David Roberts	48dc0dca57	[ML] Use map and filter instead of flatMap in find_file_structure (#42534 ) Using map and filter avoids the garbage from all the Stream.of calls that flatMap necessitated. Performance is better when there are masses of fields.	2019-05-24 20:12:06 +01:00
David Roberts	34de68b007	[ML] Fix possible race condition when closing an opening job (#42506 ) This change fixes a race condition that would result in an in-memory data structure becoming out-of-sync with persistent tasks in cluster state. If repeated often enough this could result in it being impossible to open any ML jobs on the affected node, as the master node would think the node had capacity to open another job but the chosen node would error during the open sequence due to its in-memory data structure being full. The race could be triggered by opening a job and then closing it a tiny fraction of a second later. It is unlikely a user of the UI could open and close the job that fast, but a script or program calling the REST API could. The nasty thing is, from the externally observable states and stats everything would appear to be fine - the fast open then close sequence would appear to leave the job in the closed state. It's only later that the leftovers in the in-memory data structure might build up and cause a problem.	2019-05-24 20:11:58 +01:00
David Roberts	f472186b9f	[ML] Improve file structure finder timestamp format determination (#41948 ) This change contains a major refactoring of the timestamp format determination code used by the ML find file structure endpoint. Previously timestamp format determination was done separately for each piece of text supplied to the timestamp format finder. This had the drawback that it was not possible to distinguish dd/MM and MM/dd in the case where both numbers were 12 or less. In order to do this sensibly it is best to look across all the available timestamps and see if one of the numbers is greater than 12 in any of them. This necessitates making the timestamp format finder an instantiable class that can accumulate evidence over time. Another problem with the previous approach was that it was only possible to override the timestamp format to one of a limited set of timestamp formats. There was no way out if a file to be analysed had a timestamp that was sane yet not in the supported set. This is now changed to allow any timestamp format that can be parsed by a combination of these Java date/time formats: yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss, a, XX, XXX, zzz Additionally S letter groups (fractional seconds) are supported providing they occur after ss and separated from the ss by a dot, comma or colon. Spacing and punctuation is also permitted with the exception of the question mark, newline and carriage return characters, together with literal text enclosed in single quotes. 
The full list of changes/improvements in this refactor is: - Make TimestampFormatFinder an instantiable class - Overrides must be specified in Java date/time format - Joda format is no longer accepted - Joda timestamp formats in outputs are now derived from the determined or overridden Java timestamp formats, not stored separately - Functionality for determining the "best" timestamp format in a set of lines has been moved from TextLogFileStructureFinder to TimestampFormatFinder, taking advantage of the fact that TimestampFormatFinder is now an instantiable class with state - The functionality to quickly rule out some possible Grok patterns when looking for timestamp formats has been changed from using simple regular expressions to the much faster approach of using the Shift-And method of sub-string search, but using an "alphabet" consisting of just 1 (representing any digit) and 0 (representing non-digits) - Timestamp format overrides are now much more flexible - Timestamp format overrides that do not correspond to a built-in Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern whose definition is included within the date processor in the ingest pipeline - Grok patterns that correspond to multiple Java date/time patterns are now handled better - the Grok pattern is accepted as matching broadly, and the required set of Java date/time patterns is built up considering all observed samples - As a result of the more flexible acceptance of Grok patterns, when looking for the "best" timestamp in a set of lines timestamps are considered different if they are preceded by a different sequence of punctuation characters (to prevent timestamps far into some lines being considered similar to timestamps near the beginning of other lines) - Out-of-the-box Grok patterns that are considered now include %{DATE} and %{DATESTAMP}, which have indeterminate day/month ordering - The order of day/month in formats with indeterminate day/month order is determined by considering all observed samples (plus the server locale if the observed samples still do not suggest an ordering) Relates #38086 Closes #35137 Closes #35132	2019-05-24 09:10:08 +01:00
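The idea of accumulating evidence across samples to separate dd/MM from MM/dd can be sketched as below. This is not the TimestampFormatFinder code; the class `DayMonthOrderGuesser` and its methods are hypothetical, and the locale fallback is reduced to a single boolean.

```python
class DayMonthOrderGuesser:
    """Accumulates evidence about whether dates are day-first or month-first."""

    def __init__(self):
        self.first_gt_12 = False
        self.second_gt_12 = False

    def add_sample(self, first: int, second: int) -> None:
        # A number above 12 cannot be a month, so it must be the day.
        self.first_gt_12 = self.first_gt_12 or first > 12
        self.second_gt_12 = self.second_gt_12 or second > 12

    def guess(self, locale_prefers_day_first: bool) -> str:
        if self.first_gt_12 and not self.second_gt_12:
            return "dd/MM"
        if self.second_gt_12 and not self.first_gt_12:
            return "MM/dd"
        # No evidence, or contradictory evidence: fall back to the locale.
        return "dd/MM" if locale_prefers_day_first else "MM/dd"
```

A single sample such as 03/04 is ambiguous on its own, but one later sample such as 11/25 settles the ordering for the whole file.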
Dimitris Athanasiou	a6eb20ad35	[ML] Include node name when native controller cannot start process (#42225 ) (#42338 ) This adds the node name where we fail to start a process via the native controller to facilitate debugging as otherwise it might not be known to which node the job was allocated.	2019-05-22 12:42:04 +03:00
Yannick Welsch	770d8e9e39	Remove usage of max_local_storage_nodes in test infrastructure (#41652 ) Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a follow-up PR to deprecate this setting in 7.x and to remove it in 8.0. This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly done by passing the data path from the stopped node to the new node that is started.	2019-05-22 11:04:55 +02:00
Ed Savage	d97f4d5e28	[ML][TEST] Fix limits in AutodetectMemoryLimitIT (#42279 ) Re-enable muted tests and accommodate recent backend changes that result in higher memory usage being reported for a job at the start of its life-cycle	2019-05-21 18:44:47 +01:00
Dimitris Athanasiou	a4e6fb4dd2	[ML] Fix logger declaration in ML plugins (#42222 ) (#42238 ) This corrects what appears to have been a copy-paste error where the logger for `MachineLearning` and `DataFrame` was wrongly set to be that of `XPackPlugin`.	2019-05-21 18:03:24 +03:00
Zachary Tong	6ae6f57d39	[7.x Backport] Force selection of calendar or fixed intervals (#41906 ) The date_histogram accepts an interval which can be either a calendar interval (DST-aware, leap seconds, arbitrary length of months, etc) or fixed interval (strict multiples of SI units). Unfortunately this is inferred by first trying to parse as a calendar interval, then falling back to fixed if that fails. This leads to a confusing arrangement where `1d` == calendar, but `2d` == fixed. And if you want a day of fixed time, you have to specify `24h` (i.e. the next smallest unit). This arrangement is very error-prone for users. This PR adds `calendar_interval` and `fixed_interval` parameters to any code that uses intervals (date_histogram, rollup, composite, datafeed, etc). Calendar only accepts calendar intervals, fixed accepts any combination of units (meaning `1d` can be used to specify `24h` in fixed time), and both are mutually exclusive. The old interval behavior is deprecated and will throw a deprecation warning. It is also mutually exclusive with the two new parameters. In the future the old dual-purpose interval will be removed. The change applies to both REST and Java clients.	2019-05-20 12:07:29 -04:00
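The mutual exclusivity described in the commit above can be illustrated with a small validation sketch. The helper `select_interval` is hypothetical and models only the rule that exactly one of `interval`, `calendar_interval` and `fixed_interval` may be present; it does not parse the interval values themselves.

```python
def select_interval(params):
    """Return (kind, value) for the single interval parameter that is set."""
    names = ("interval", "calendar_interval", "fixed_interval")
    present = [name for name in names if name in params]
    if len(present) != 1:
        raise ValueError(
            "exactly one of %s must be set, got %s" % (list(names), present)
        )
    name = present[0]
    # The old dual-purpose parameter is still accepted but deprecated.
    kind = "deprecated" if name == "interval" else name.split("_")[0]
    return kind, params[name]
```

Under this sketch, `{"calendar_interval": "1d"}` selects a calendar day and `{"fixed_interval": "24h"}` selects exactly 24 hours, while supplying both is rejected.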
Ed Savage	840af87a74	[ML] Temporarily muting failing tests Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to settle before easing in required backend changes. relates elastic/ml-cpp#486 relates #42086	2019-05-19 08:29:50 -04:00
Ed Savage	a68b04e47b	[ML] Improve hard_limit audit message (#42086 ) Improve the hard_limit memory audit message by reporting how many bytes over the configured memory limit the job was at the point of the last allocation failure. Previously the model memory usage was reported, however this was inaccurate and hence of limited use - primarily because the total memory used by the model can decrease significantly after the model's status is changed to hard_limit but before the model size stats are reported from autodetect to ES. While this PR contains the changes to the format of the hard_limit audit message it is dependent on modifications to the ml-cpp backend to send additional data fields in the model size stats message. These changes will follow in a subsequent PR. It is worth noting that this PR must be merged prior to the ml-cpp one, to keep CI tests happy.	2019-05-17 17:40:08 -04:00
David Roberts	226df35d96	[ML] Improve message misformation error in file structure finder (#42175 ) This change replaces the extremely unfriendly message "Number of messages analyzed must be positive" in the case where the sample lines were incorrectly grouped into just one message with an error that more helpfully explains the likely root cause of the problem.	2019-05-16 18:29:38 +01:00
Benjamin Trent	bf5a40c754	[ML] relax set upgrade mode test to match what is guaranteed (#41958 ) (#41979 ) * [ML] relax set upgrade mode test to match what is guaranteed * removing unused import	2019-05-09 14:28:50 -05:00
Jason Tedor	d7fd51a84e	Provide names for all artifact repositories (#41857 ) This commit adds a name for each Maven and Ivy repository used in the build.	2019-05-07 06:35:28 -04:00
Ryan Ernst	6fd8924c5a	Switch run task to use real distro (#41590 ) The run task is supposed to run elasticsearch with the given plugin or module. However, for modules, this is most realistic if using the full distribution. This commit changes the run setup to use the default or oss as appropriate.	2019-05-06 12:34:07 -07:00
Jason Tedor	f4da98ca3d	Use a proper repository for ml-cpp artifacts (#41817 ) This switches the strategy used to download machine learning artifacts from a manual download through S3 to using an Ivy repository on top of S3. This gives us all the benefits of Gradle dependency resolution including local caching.	2019-05-04 12:44:19 -04:00
Jason Tedor	241c4ef97a	Use https for artifact locations This commit switches to using https for some artifact locations.	2019-05-03 16:15:48 -04:00
Benjamin Trent	a92c06ae09	[ML] Refactor NativeStorageProvider to enable reuse (#41414 ) (#41746 ) * [ML] Refactor NativeStorageProvider to enable reuse Moves `NativeStorageProvider` as a machine learning component so that it can be reused for other job types. Also, we now pass the persistent task description as unique identifier which avoids conflicts between jobs of different type but with same ids. * Adding nativeStorageProvider as component Since `TransportForecastJobAction` is expected to get injected a `NativeStorageProvider` class, we need to make sure that it is a constructed component, as it does not have a zero parametered, public ctor.	2019-05-02 09:46:22 -05:00
Jason Tedor	7f3ab4524f	Bump 7.x branch to version 7.2.0 This commit adds the 7.2.0 version constant to the 7.x branch, and bumps BWC logic accordingly.	2019-05-01 13:38:57 -04:00
Tom Veasey	b3f4533e1c	[ML] Update for model selection change and disable temporarily (#41482 ) (#41682 )	2019-04-30 15:47:54 -05:00
Yogesh Gaikwad	c0d40ae4ca	Remove deprecated stashWithOrigin calls and use the alternative (#40847 ) (#41562 ) This commit removes the deprecated `stashWithOrigin` and modifies its usage to use the alternative.	2019-04-28 21:25:42 +10:00
Christoph Büscher	52495843cc	[Docs] Fix common word repetitions (#39703 )	2019-04-25 20:47:47 +02:00
Benjamin Trent	07d36fdb23	[ML] refactoring the ML plugin to use the common auditor code (#41419 ) (#41485 )	2019-04-24 09:56:59 -05:00
Dimitris Athanasiou	eb2295ac81	[7.1][ML] Refactor autodetect service into its own class (#41378 ) (#41409 ) This also aims to improve the corresponding unit tests with regard to readability and maintainability.	2019-04-22 17:42:13 +03:00
Zachary Tong	7e62ff2823	[Rollup] Validate timezones based on rules not string comparision (#36237 ) The date_histogram internally converts obsolete timezones (such as "Canada/Mountain") into their modern equivalent ("America/Edmonton"). But rollup just stored the TZ as provided by the user. When checking the TZ for query validation we used a string comparison, which would fail due to the date_histo's upgrading behavior. Instead, we should convert both to a TimeZone object and check if their rules are compatible.	2019-04-17 13:46:44 -04:00
Iana Bondarska	e090176f17	[ML] Exclude analysis fields with core field names from anomaly results (#41093 ) Added "_index", "_type", "_id" to list of reserved fields. Closes #39406	2019-04-17 16:08:03 +01:00
David Kyle	116167df55	[ML] Write header to autodetect before it is visible to other calls (#41085 )	2019-04-16 13:51:29 +01:00
David Roberts	3f00c29adb	[ML] Allow xpack.ml.max_machine_memory_percent higher than 100% (#41193 ) Values higher than 100% are now allowed to accommodate use cases where swapping has been determined to be acceptable. Anomaly detector jobs only use their full model memory during background persistence, and this is deliberately staggered, so with large numbers of jobs few will generally be persisting state at the same time. Settings higher than available memory are only recommended for OEM type situations where a wrapper tightly controls the types of jobs that can be created, and each job alone is considerably smaller than what each node can handle.	2019-04-15 14:37:46 +01:00
Benjamin Trent	05cf53934a	[ML] checking if p-tasks metadata is null before updating state (#41091 ) (#41123 ) * [ML] checking if p-tasks metadata is null before updating state * Adding test that validates fix * removing debug println	2019-04-11 13:54:41 -05:00
Przemysław Witek	f5014ace64	[ML] Add validation that rejects duplicate detectors in PutJobAction (#40967 ) (#41072 ) * [ML] Add validation that rejects duplicate detectors in PutJobAction Closes #39704 * Add YML integration test for duplicate detectors fix. * Use "== false" comparison rather than "!" operator. * Refine error message to sound more natural. * Put job description in square brackets in the error message. * Use the new validation in ValidateJobConfigAction. * Exclude YML tests for new validation from permission tests.	2019-04-10 15:43:35 +02:00
Mark Vieira	1287c7d91f	[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978 ) (#40993 ) * Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions. (cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6) * Fix forking JVM runner * Don't bump shadow plugin version	2019-04-09 11:52:50 -07:00
Jason Tedor	ebba9393c1	Fix unsafe publication of invalid license enforcer (#40985 ) The invalid license enforcer is exposed to the cluster state update thread (via the license state listener) before the constructor has finished. This violates the JLS for safe publication of an object, and means there is a concurrency bug lurking here. This commit addresses this by avoiding publication of the invalid license enforcer before the constructor has returned.	2019-04-09 13:51:37 -04:00
David Roberts	d16f86f7ab	[ML] Add created_by info to usage stats (#40518 ) This change adds information about which UI path (if any) created ML anomaly detector jobs to the stats returned by the _xpack/usage endpoint. Counts for the following possibilities are expected: * ml_module_apache_access * ml_module_apm_transaction * ml_module_auditbeat_process_docker * ml_module_auditbeat_process_hosts * ml_module_nginx_access * ml_module_sample * multi_metric_wizard * population_wizard * single_metric_wizard * unknown The "unknown" count is for jobs that do not have a created_by setting in their custom_settings. Closes #38403	2019-04-04 10:55:20 +01:00
Dimitris Athanasiou	65cca2ee6f	[7.x][ML] Scrolling datafeed should clear scroll contexts on error (#40773 ) (#40794 ) Closes #40772	2019-04-04 12:28:06 +03:00
David Kyle	1354696db9	[ML] Data Frame HLRC Get Stats API (#40443 )	2019-03-26 11:17:13 +00:00
Ed Savage	c20ea9a2dd	[ML][TEST] Fix failing test testPersistJobOnGracefulShutdown_givenTimeAdvancedAfterNoNewData (#40363 ) Ensure that there is at least a 1s delay between the time that state is persisted by each of the two jobs in the test. Model snapshot IDs use the current time in epoch seconds to distinguish themselves, hence snapshots will be overwritten by another if it occurs in the same 1s window. Closes #40347	2019-03-25 17:55:10 +00:00
David Turner	1265a15b75	Mute testPersistJobOnGracefulShutdown_givenTimeAdvancedAfterNoNewData	2019-03-22 08:46:51 +00:00
Ed Savage	23d5f7babf	[ML] Add integration tests to check persistence (#40272 ) (#40315 ) Additional checks to exercise the behaviour of persistence on graceful close of an anomaly job. Related to elastic/ml-cpp#393 Backports #40272	2019-03-21 17:01:10 +00:00
David Roberts	64028f3d8f	Mute JobResultsProviderIT.testMultipleSimultaneousJobCreations Due to https://github.com/elastic/elasticsearch/issues/40134	2019-03-17 07:50:08 +00:00
David Roberts	8d01b11918	[ML] Fix race condition when creating multiple jobs (#40049 ) If multiple jobs are created together and the anomaly results index does not exist then some of the jobs could fail to update the mappings of the results index. This led them to fail to write their results correctly later. Although this scenario sounds rare, it is exactly what happens if the user creates their first jobs using the Nginx module in the ML UI. This change fixes the problem by updating the mappings of the results index if it is found to exist during a creation attempt. Fixes #38785	2019-03-15 10:18:03 +00:00
David Kyle	78a9754318	Mute test NetworkDisruptionIT.testJobRelocation Relates to #39858	2019-03-15 10:06:31 +00:00
Benjamin Trent	2016e23285	[ML] Refactor common utils out of ML plugin to XPack.Core (#39976 ) (#40009 ) * [ML] Refactor common utils out of ML plugin to XPack.Core * implementing GET filters with abstract transport * removing added rest param * adjusting how defaults can be supplied	2019-03-13 17:08:43 -05:00
Dimitris Athanasiou	79e414df86	[ML] Fix datafeed skipping first bucket after lookback when aggs are … (#39859 ) (#39958 ) The problem here was that `DatafeedJob` was updating the last end time searched based on `now` even though when there are aggregations, the extractor will only search up to the floor of `now` against the histogram interval. This commit fixes the issue by using the end time as calculated by the extractor. It also adds an integration test that uses aggregations. This test would fail before this fix. Unfortunately the test is slow as we need to wait for the datafeed to work in real time. Closes #39842	2019-03-13 09:09:07 +02:00
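The arithmetic behind the fix above can be shown with a short sketch. The function name `extractor_end_time` is hypothetical and times are plain epoch milliseconds; it illustrates only the flooring of `now` against the histogram interval, not the real `DatafeedJob` code.

```python
def extractor_end_time(now_ms: int, histogram_interval_ms: int) -> int:
    """Floor `now` to the start of the current histogram bucket.

    With aggregations, the search only covers complete buckets, so it
    stops at this value rather than at `now` itself.
    """
    return (now_ms // histogram_interval_ms) * histogram_interval_ms
```

With a one-second interval and `now` at 10500 ms, the search ends at 10000 ms. Recording 10500 as the last end time would make the next search start inside the 10000 to 11000 bucket and skip it, which is why the floored value should be recorded instead.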
David Kyle	48788269b0	[ML] Correct small inconsistencies in ml APIs spec and docs (#39907 )	2019-03-11 14:02:50 +00:00
Benjamin Trent	4da04616c9	[ML] refactoring lazy query and agg parsing (#39776 ) (#39881 ) * [ML] refactoring lazy query and agg parsing * Clean up and addressing PR comments * removing unnecessary try/catch block * removing bad call to logger * removing unused import * fixing bwc test failure due to serialization and config migrator test * fixing style issues * Adjusting DatafeedUpdate class serialization * Adding todo for refactor in v8 * Making query non-optional so it does not write a boolean byte	2019-03-10 14:54:02 -05:00
David Roberts	5f8f91c03b	[ML] Use scaling thread pool and xpack.ml.max_open_jobs cluster-wide dynamic (#39736 ) This change does the following: 1. Makes the per-node setting xpack.ml.max_open_jobs into a cluster-wide dynamic setting 2. Changes the job node selection to continue to use the per-node attributes storing the maximum number of open jobs if any node in the cluster is older than 7.1, and use the dynamic cluster-wide setting if all nodes are on 7.1 or later 3. Changes the docs to reflect this 4. Changes the thread pools for native process communication from fixed size to scaling, to support the dynamic nature of xpack.ml.max_open_jobs 5. Renames the autodetect thread pool to the job comms thread pool to make clear that it will be used for other types of ML jobs (data frame analytics in particular) Backport of #39320	2019-03-06 12:29:34 +00:00
Dimitris Athanasiou	5c023770d2	[ML] Disable security audit trail in native integ tests suite (#39683 ) Investigating how to make DeleteExpiredDataIT faster, it was revealed that the security audit trail threads were quite hot. Disabling that seems to be helping quite a bit with making this test faster. This commit also unmutes the test to see how it goes with the audit trail disabled. Relates #39658 Closes #39575	2019-03-05 12:43:15 +02:00
David Kyle	a58145f9e6	[ML] Transition to typeless (mapping) APIs (#39573 ) ML has historically used doc as the single mapping type but reindex in 7.x will change the mapping to _doc. Switching to the typeless APIs handles case where the mapping type is either doc or _doc. This change removes deprecated typed usages.	2019-03-04 13:52:05 +00:00
David Roberts	085ff38122	Mute DeleteExpiredDataIT.testDeleteExpiredData Due to https://github.com/elastic/elasticsearch/issues/39575	2019-03-03 18:34:30 +00:00
Dimitris Athanasiou	8843832039	[ML] Shave off DeleteExpiredDataIT runtime (#39557 ) This commit parallelizes some parts of the test and removes an unnecessary refresh call. On my local machine it shaves off about 15 seconds for a test execution time of ~64s (down from ~80s). This test is still slow but progress over perfection. Relates #37339	2019-03-01 19:10:00 +02:00
Dimitris Athanasiou	8122650a55	[ML] Add integration test for interim results after advancing bucket (#39447 ) This is an integration test that captures the issue described in elastic/ml-cpp#324	2019-02-28 11:12:08 +02:00
Mehran Koushkebaghi	1d0097b5e8	[ML] Refactoring scheduled event to store instant instead of zoned time zone (#39380 ) The ScheduledEvent class has never preserved the time zone so it makes more sense for it to store the start and end time using Instant rather than ZonedDateTime. Closes #38620	2019-02-27 09:27:04 +00:00
David Roberts	4f2bd238d2	[ML] Increase datafeed integration test timeout for slow machines (#39311 ) The assertBusy() that waits the default 10 seconds for a datafeed to complete very occasionally times out on slow machines. This commit increases the timeout to 60 seconds. It will almost never actually take this long, but it's better to have a timeout that will prevent time being wasted looking at spurious test failures.	2019-02-22 15:35:32 +00:00
Dimitris Athanasiou	1c6818fe74	[ML] Improve DeleteExpiredDataIT failure message (#39298 ) (#39310 ) This test failed once in a very long time with the assertion that there is no document for the `non_existing_job` in the state index. I could not see how that is possible and I cannot reproduce it. With this commit the failure message will reveal some examples of the left-behind docs, which might shed light on what could go wrong.	2019-02-22 16:15:11 +02:00
Benjamin Trent	109b6451fd	ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs (#38822 ) (#39119 ) * ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs * Addressing pr feedback	2019-02-19 12:45:56 -06:00
David Roberts	35e30b34f9	[ML] Stop the ML memory tracker before closing node (#39111 ) The ML memory tracker does searches against ML results and config indices. These searches can be asynchronous, and if they are running while the node is closing then they can cause problems for other components. This change adds a stop() method to the MlMemoryTracker that waits for in-flight searches to complete. Once stop() has returned the MlMemoryTracker will not kick off any new searches. The MlLifeCycleService now calls MlMemoryTracker.stop() before stopping the node. Fixes #37117	2019-02-19 15:12:40 +00:00
David Roberts	bbcdea43c5	[ML] Allow stop unassigned datafeed and relax unset upgrade mode wait (#39034 ) These two changes are interlinked. Before this change unsetting ML upgrade mode would wait for all datafeeds to be assigned and not waiting for their corresponding jobs to initialise. However, this could be inappropriate, if there was a reason other than upgrade mode why one job was unable to be assigned or slow to start up. Unsetting of upgrade mode would hang in this case. This change relaxes the condition for considering upgrade mode to be unset to simply that an assignment attempt has been made for each ML persistent task that did not fail because upgrade mode was enabled. Thus after unsetting upgrade mode there is no guarantee that every ML persistent task is assigned, just that each is not unassigned due to upgrade mode. In order to make setting upgrade mode work immediately after unsetting upgrade mode it was then also necessary to make it possible to stop a datafeed that was not assigned. There was no particularly good reason why this was not allowed in the past. It is trivial to stop an unassigned datafeed because it just involves removing the persistent task.	2019-02-19 14:07:10 +00:00
David Roberts	b660d2cac6	[ML] More advanced post-test cleanup of ML indices (#39049 ) The .ml-annotations index is created asynchronously when some other ML index exists. This can interfere with the post-test index deletion, as the .ml-annotations index can be created after all other indices have been deleted. This change adds an ML specific post-test cleanup step that runs before the main cleanup and: 1. Checks if any ML indices exist 2. If so, waits for the .ml-annotations index to exist 3. Deletes the other ML indices found in step 1. 4. Calls the super class cleanup This means that by the time the main post-test index cleanup code runs: 1. The only ML index it has to delete will be the .ml-annotations index 2. No other ML indices will exist that could trigger recreation of the .ml-annotations index Fixes #38952	2019-02-18 14:16:03 +00:00
Martijn Laarman	9b4d96534b	Fix #38623 remove xpack namespace REST API (#38625 ) (#39036 ) * Fix #38623 remove xpack namespace REST API Except for xpack.usage and xpack.info APIs, this moves the last remaining APIs out of the xpack namespace * rename xpack APIs inside the files as well * updated yaml tests references to xpack namespaced APIs * update callsApi calls in the IT subclasses * make sure docs testing does not use xpack namespaced APIs * fix leftover xpack namespaced method names in docs/build.gradle * found another leftover reference (cherry picked from commit ccb5d934363c37506b76119ac050a254fa80b5e7)	2019-02-18 12:40:07 +01:00
Dimitris Athanasiou	21f76aba28	[ML] Extract base class for integ tests with native processes (#38850 ) (#38860 )	2019-02-14 12:15:00 +02:00
Benjamin Trent	d2ac05e249	ML allow aliased .ml-anomalies* index on PUT Job (#38821 ) (#38847 )	2019-02-13 10:58:55 -06:00
Benjamin Trent	24a8ea06f5	ML: update set_upgrade_mode, add logging (#38372 ) (#38538 ) * ML: update set_upgrade_mode, add logging * Attempt to fix datafeed isolation Also renamed a few methods/variables for clarity and added some comments	2019-02-08 12:56:04 -06:00
Boaz Leskes	033ba725af	Remove support for internal versioning for concurrency control (#38254 ) Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a. optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometimes do the wrong thing. Here's the relevant excerpt from the resiliency status page: > When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document. We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach. Relates to #1078	2019-02-05 20:53:35 +01:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
David Roberts	92bc681705	[ML] Report index unavailable instead of waiting for lazy node (#38423 ) If a job cannot be assigned to a node because an index it requires is unavailable and there are lazy ML nodes then index unavailable should be reported as the assignment explanation rather than waiting for a lazy ML node.	2019-02-05 16:10:00 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports a built-in authentication service `token-service` that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on the OAuth2 spec. The access token is short-lived (defaults to 20m) and the refresh token has a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline thereby failing refresh of tokens. This commit introduces a built-in authentication service `api-key-service` that adds support for long-lived tokens aka API keys to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys:- - by default do not have an expiration but expiration can be configured where the API keys need to be expired after a certain amount of time. - when generated will keep authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them - can be invalidated via invalidation API - information can be retrieved via a get API - that have been expired or invalidated will be retained for 1 week before being deleted. The expired API keys remover task handles this. Following are the API key management APIs:- 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) - `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using the `Authorization` header, where the auth scheme is `ApiKey` and the credentials are the base64 encoding of API key Id and API key separated by a colon. 
Example:- ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
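The credential in the curl example above can be reproduced with a few lines of Python. This is a minimal sketch: `api_key_header` is a hypothetical helper name, and the id and key values are the placeholder strings that the example token decodes to, not real credentials.

```python
import base64


def api_key_header(key_id: str, api_key: str) -> dict:
    """Build the Authorization header: scheme `ApiKey`, then base64 of "id:key"."""
    raw = f"{key_id}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


# The placeholder pair below encodes to the token used in the curl example.
headers = api_key_header("api-key-id", "api-key")
```

The resulting dictionary can be passed as request headers by whichever HTTP client is in use.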
David Roberts	fb6a176caf	[ML] Add explanation so far to file structure finder exceptions (#38191 ) The explanation so far can be invaluable for troubleshooting as incorrect decisions made early on in the structure analysis can result in seemingly crazy decisions or timeouts later on. Relates elastic/kibana#29821	2019-02-04 14:32:35 +00:00
Boaz Leskes	ff13a43144	Move ML Optimistic Concurrency Control to Seq No (#38278 ) This commit moves the usage of internal versioning for CAS operations to use sequence numbers and primary terms Relates to #36148 Relates to #10708	2019-02-04 10:41:08 +01:00
David Turner	1d82a6d9f9	Deprecate unused Zen1 settings (#38289 ) Today the following settings in the `discovery.zen` namespace are still used: - `discovery.zen.no_master_block` - `discovery.zen.hosts_provider` - `discovery.zen.ping.unicast.concurrent_connects` - `discovery.zen.ping.unicast.hosts.resolve_timeout` - `discovery.zen.ping.unicast.hosts` This commit deprecates all other settings in this namespace so that they can be removed in the next major version.	2019-02-04 08:52:08 +00:00
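To see which keys in a node's configuration fall under this deprecation, the list in the commit above can be turned into a small check. This is a rough sketch: `deprecated_zen_settings` is a hypothetical helper, not part of Elasticsearch, and it assumes the settings have already been flattened into a dictionary keyed by full setting name.

```python
# The five settings the commit message lists as still in use.
STILL_USED = frozenset({
    "discovery.zen.no_master_block",
    "discovery.zen.hosts_provider",
    "discovery.zen.ping.unicast.concurrent_connects",
    "discovery.zen.ping.unicast.hosts.resolve_timeout",
    "discovery.zen.ping.unicast.hosts",
})


def deprecated_zen_settings(settings: dict) -> list:
    """Return the `discovery.zen.*` keys that are not in the still-used list."""
    return sorted(
        key for key in settings
        if key.startswith("discovery.zen.") and key not in STILL_USED
    )
```

Keys outside the `discovery.zen` namespace are ignored, so the function can be run over a complete settings map.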
Benjamin Trent	5db305023d	ML: Fix error race condition on stop _all datafeeds and close _all jobs (#38113 ) * ML: Ignore when task is not found for _all * Addressing PR comments * Update TransportStopDatafeedAction.java	2019-02-01 11:16:35 -06:00
David Roberts	1fa413a16d	[ML] Remove "8" prefixes from file structure finder timestamp formats (#38016 ) In 7.x Java timestamp formats are the default timestamp format and there is no need to prefix them with "8". (The "8" prefix was used in 6.7 to distinguish Java timestamp formats from Joda timestamp formats.) This change removes the "8" prefixes from timestamp formats in the output of the ML file structure finder.	2019-02-01 15:36:04 +00:00
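The effect of the change above on a single format string can be sketched as follows. `strip_java_time_prefix` is a hypothetical helper written for illustration; it is not the code used by the file structure finder.

```python
def strip_java_time_prefix(timestamp_format: str) -> str:
    """Drop a leading "8", the 6.7-era marker for Java time formats."""
    if timestamp_format.startswith("8"):
        return timestamp_format[1:]
    return timestamp_format
```

Only a leading "8" is removed, so a named format that merely contains the digit is returned unchanged.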
Benjamin Trent	be381b4525	ML: better handle task state race condition (#38040 )	2019-01-31 11:07:54 -06:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) previously assumed that the caller handles exceptions by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
Benjamin Trent	9782aaa1b8	ML: Add reason field in JobTaskState (#38029 ) * ML: adding reason to job failure status * marking reason as nullable * Update AutodetectProcessManager.java	2019-01-30 11:56:24 -06:00
Benjamin Trent	8280a20664	ML: Add upgrade mode docs, hlrc, and fix bug (#37942 ) * ML: Add upgrade mode docs, hlrc, and fix bug * [DOCS] Fixes build error and edits text * adjusting docs * Update docs/reference/ml/apis/set-upgrade-mode.asciidoc Co-Authored-By: benwtrent <ben.w.trent@gmail.com> * Update set-upgrade-mode.asciidoc * Update set-upgrade-mode.asciidoc	2019-01-30 06:51:11 -06:00
Adrien Grand	c8af0f4bfa	Use mappings to format doc-value fields by default. (#30831 ) Doc-value fields now return a value that is based on the mappings rather than the script implementation by default. This deprecates the special `use_field_mapping` docvalue format which was added in #29639 only to ease the transition to 7.x and it is not necessary anymore in 7.0.	2019-01-30 10:31:51 +01:00
Benjamin Trent	34d61d3231	ML: ignore unknown fields for JobTaskState (#37982 )	2019-01-29 12:51:34 -06:00
David Kyle	6d1693ff49	[ML] Prevent submit after autodetect worker is stopped (#37700 ) Runnables can be submitted to AutodetectProcessManager.AutodetectWorkerExecutorService without error after it has been shut down, which can lead to requests timing out as their handlers are never called by the terminated executor. This change throws an EsRejectedExecutionException if a runnable is submitted after the shutdown and calls AbstractRunnable.onRejection on any tasks not run. Closes #37108	2019-01-29 15:09:40 +00:00
Henrique Gonçalves	eceb3185c7	[ML] Make GetJobStats work with arbitrary wildcards and groups (#36683 ) The /_ml/anomaly_detectors/{job}/_stats endpoint now works correctly when {job} is a wildcard or job group. Closes #34745	2019-01-29 09:06:50 +00:00
Dimitris Athanasiou	ebe9c95230	[ML] Audit all errors during job deletion (#37933 ) This commit moves the auditing of job deletion related errors to the final listener in the job delete action. This ensures any error that occurs during job deletion is audited.	2019-01-29 10:23:50 +02:00
Benjamin Trent	7e4c0e6991	ML: Adds set_upgrade_mode API endpoint (#37837 ) * ML: Add MlMetadata.upgrade_mode and API * Adding tests * Adding wait conditionals for the upgrade_mode call to return * Adding tests * adjusting format and tests * Adjusting wait conditions for api return and msgs * adjusting doc tests * adding upgrade mode tests to black list	2019-01-28 09:07:30 -06:00
David Kyle	c0409fb9f0	[ML] Marginal gains in slow multi node QA tests (#37825 ) Move 2 tests that are simple REST tests out of the QA suite and cut the number of post data calls in ForecastIT	2019-01-28 10:00:59 +00:00
David Roberts	57d321ed5f	[ML] Tighten up use of aliases rather than concrete indices (#37874 ) We have read and write aliases for the ML results indices. However, the job still had methods that purported to reliably return the name of the concrete results index being used by the job. After reindexing prior to upgrade to 7.x this will be wrong, so the method has been renamed and the comments made more explicit to say the returned index name may not be the actual concrete index name for the lifetime of the job. Additionally, the selection of indices when deleting the job has been changed so that it works regardless of concrete index names. All these changes are nice-to-have for 6.7 and 7.0, but will become critical if we add rolling results indices in the 7.x release stream as 6.7 and 7.0 nodes may have to operate in a mixed version cluster that includes a version that can roll results indices.	2019-01-28 09:38:46 +00:00
David Roberts	f2c0c26d15	[ML] Adjust structure finder for Joda to Java time migration (#37306 ) The ML file structure finder has always reported both Joda and Java time format strings. This change makes the Java time format strings the ones that are incorporated into mappings and ingest pipeline definitions. The BWC syntax of prepending "8" to these formats is used. This will need to be removed once Java time format strings become the default in Elasticsearch. This commit also removes direct imports of Joda classes in the structure finder unit tests. Instead the core Joda BWC class is used.	2019-01-26 20:19:57 +00:00
Benjamin Trent	9e932f4869	ML: removing unnecessary upgrade code (#37879 )	2019-01-25 13:57:41 -06:00
Christoph Büscher	b4b4cd6ebd	Clean codebase from empty statements (#37822 ) * Remove empty statements There are a couple of instances of undocumented empty statements all across the code base. While they are mostly harmless, they make the code hard to read and are potentially error-prone. Removing most of these instances and marking blocks that look empty by intention as such. * Change test, slightly more verbose but less confusing	2019-01-25 14:23:02 +01:00
David Roberts	deafce1acd	[ML] No need to add state doc mapping on job open in 7.x (#37759 ) When upgrading from 5.4 to any version from 5.5 to 6.7 (inclusive) it was necessary to ensure there was a mapping for type "doc" on the ML state index before opening a job. This was because 5.4 created a multi-type ML state index. In version 7.x we can be sure that any such 5.4 index is no longer in use. It would have had to be reindexed into the 6.x index format prior to the upgrade to version 7.x.	2019-01-25 13:15:35 +00:00
Jim Ferenczi	787acb14b9	Track total hits up to 10,000 by default (#37466 ) This commit changes the default for the `track_total_hits` option of the search request to `10,000`. This means that by default search requests will accurately track the total hit count up to `10,000` documents; requests that match more than this value will set the `"total.relation"` to `"gte"` (i.e. greater than or equal to) and the `"total.value"` to `10,000` in the search response. Scroll queries are not impacted; they will continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I chose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes #33028	2019-01-25 13:45:39 +01:00
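A client reading search responses after the change above has to look at the relation as well as the value. The following is a minimal sketch: `describe_total` is a hypothetical helper, and the dictionaries stand in for parsed response bodies with the total under `hits.total`.

```python
def describe_total(response: dict) -> str:
    """Summarise the hit count of a parsed search response."""
    total = response["hits"]["total"]
    if isinstance(total, int):
        # Plain integer form, as returned when rest_total_hits_as_int is set.
        return f"exactly {total} hits"
    if total["relation"] == "eq":
        return f"exactly {total['value']} hits"
    # "gte": the count stopped at the tracking threshold (10,000 by default).
    return f"at least {total['value']} hits"
```

A response with `{"value": 10000, "relation": "gte"}` therefore reads as a lower bound rather than an exact count.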
David Kyle	e1226f69b7	[ML] Increase close job timeout and lower the max number (#37770 )	2019-01-24 09:18:48 +00:00
Lee Hinman	427bc7f940	Use ILM for Watcher history deletion (#37443 ) * Use ILM for Watcher history deletion This commit adds an index lifecycle policy for the `.watch-history-*` indices. This policy is automatically used for all new watch history indices. This does not yet remove the automatic cleanup that the monitoring plugin does for the .watch-history indices, and it does not touch the `xpack.watcher.history.cleaner_service.enabled` setting. Relates to #32041	2019-01-23 10:18:08 -07:00
Alexander Reelsen	daa2ec8a60	Switch mapping/aggregations over to java time (#36363 ) This commit moves the aggregation and mapping code from joda time to java time. This includes field mappers, root object mappers, aggregations with date histograms, query builders and a lot of changes within tests. The cut-over to java time is a requirement so that we can support nanoseconds properly in a future field mapper. Relates #27330	2019-01-23 10:40:05 +01:00
David Roberts	7b3dd3022d	[ML] Update ML results mappings on process start (#37706 ) This change moves the update to the results index mappings from the open job action to the code that starts the autodetect process. When a rolling upgrade is performed we need to update the mappings for already-open jobs that are reassigned from an old version node to a new version node, but the open job action is not called in this case. Closes #37607	2019-01-23 09:37:37 +00:00
Ryan Ernst	fc99eb3e65	Add cache cleaning task for ML snapshot (#37505 ) The ML subproject of xpack has a cache for the cpp artifact snapshots which is checked on each build. The cache is outside of the build dir so that it is not wiped on a typical clean, as the artifacts can be large and do not change often. This commit adds a cleanCache task which will wipe the cache dir, as over time the size of the directory can become bloated.	2019-01-19 16:16:58 -08:00
Benjamin Trent	12cdf1cba4	ML: Add support for single bucket aggs in Datafeeds (#37544 ) Single bucket aggs are now supported in datafeed aggregation configurations.	2019-01-18 15:08:53 -06:00
Benjamin Trent	5384162a42	ML: creating ML State write alias and pointing writes there (#37483 ) * ML: creating ML State write alias and pointing writes there * Moving alias check to openJob method * adjusting concrete index lookup for ml-state	2019-01-18 14:32:34 -06:00
Yannick Welsch	6d64a2a901	Propagate Errors in executors to uncaught exception handler (#36137 ) This is a continuation of #28667 and has as goal to convert all executors to propagate errors to the uncaught exception handler. Notable missing ones were the direct executor and the scheduler. This commit also makes it the property of the executor, not the runnable, to ensure this property. A big part of this commit also consists of vastly improving the test coverage in this area.	2019-01-17 17:46:35 +01:00
David Kyle	75410dc632	[Ml] Prevent config snapshot failure blocking migration (#37493 )	2019-01-16 11:51:15 +00:00
Hendrik Muhs	15d1b904a1	[ML] log minimum diskspace setting if forecast fails due to insufficient d… (#37486 ) log minimum disk space setting if forecast fails due to insufficient disk space	2019-01-16 08:10:13 +01:00
David Kyle	bea46f7b52	[ML] Migrate unallocated jobs and datafeeds (#37430 ) Migrate ml job and datafeed config of open jobs and update the parameters of the persistent tasks as they become unallocated during a rolling upgrade. Block allocation of ml persistent tasks until the configs are migrated.	2019-01-15 18:21:39 +00:00
David Kyle	7c11b05c28	[ML] Remove unused code from the JIndex project (#37477 )	2019-01-15 17:19:58 +00:00
David Roberts	7cdf7f882b	[ML] Fix ML datafeed CCS with wildcarded cluster name (#37470 ) The test that remote clusters used by ML datafeeds have a license that allows ML was not accounting for the possibility that the remote cluster name could be wildcarded. This change fixes that omission. Fixes #36228	2019-01-15 14:19:05 +00:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
David Kyle	2ee55a50bf	[ML] Use String rep of Version in map for serialisation (#37416 )	2019-01-14 16:39:47 +00:00
Benjamin Trent	5101e51891	ML: Fix testMigrateConfigs (#37373 ) * ML: :s/execute/get * Fixing other broken tests * unmuting test	2019-01-11 13:29:30 -06:00
Gordon Brown	827ece73c8	Mute MlConfigMigratorIT.testMigrateConfigs (#37374 )	2019-01-11 11:11:58 -07:00
David Roberts	953fb9352f	[ML] Update error message for process update (#37363 ) When this message was first added the model debug config was the only thing that could be updated, but now more aspects of the config can be updated so the message needs to be more general.	2019-01-11 16:31:55 +00:00
Benjamin Trent	19a7e0f4eb	ML: update .ml-state actions to support > 1 index (#37307 ) * ML: Updating .ml-state calls to be able to support > 1 index * Matching bulk delete behavior with dbq * Adjusting state name * refreshing indices before search * fixing line length * adjusting index expansion options	2019-01-11 08:03:41 -06:00
David Roberts	1da59db3fb	[ML] Wait for autodetect to be ready in the datafeed (#37349 ) This is a reinforcement of #37227. It turns out that persistent tasks are not made stale if the node they were running on is restarted and the master node does not notice this. The main scenario where this happens is when minimum master nodes is the same as the number of nodes in the cluster, so the cluster cannot elect a master node when any node is restarted. When an ML node restarts we need the datafeeds for any jobs that were running on that node to not just wait until the jobs are allocated, but to wait for the autodetect process of the job to start up. In the case of reassignment of the job persistent task this was dealt with by the stale status test. But in the case where a node restarts but its persistent tasks are not reassigned we need a deeper test. Fixes #36810	2019-01-11 13:22:35 +00:00
markharwood	434430506b	Type removal - added deprecation warnings to _bulk apis (#36549 ) Added warnings checks to existing tests Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice. Related to #35190	2019-01-10 21:35:19 +00:00
David Roberts	b65006e8cd	[ML] Fix ML memory tracker for old jobs (#37311 ) Jobs created in version 6.1 or earlier can have a null model_memory_limit. If these are parsed from cluster state following a full cluster restart then we replace the null with 4096mb to make the meaning explicit. But if such jobs are streamed from an old node in a mixed version cluster this does not happen. Therefore we need to account for the possibility of a null model_memory_limit in the ML memory tracker.	2019-01-10 17:28:00 +00:00
Benjamin Trent	df3b58cb04	ML: add migrate anomalies assistant (#36643 ) * ML: add migrate anomalies assistant * adjusting failure handling for reindex * Fixing request and tests * Adding tests to blacklist * adjusting test * test fix: posting data directly to the job instead of relying on datafeed * adjusting API usage * adding Todos and adjusting endpoint * Adding types to reindexRequest * removing unreliable "live" data test * adding index refresh to test * adding index refresh to test * adding index refresh to yaml test * fixing bad exists call * removing todo * Addressing remove comments * Adjusting rest endpoint name * making service have its own logger * adjusting validity check for newindex names * fixing typos * fixing renaming	2019-01-09 14:25:35 -06:00
David Roberts	e0ce73713f	[ML] Stop datafeeds running when their jobs are stale (#37227 ) We already had logic to stop datafeeds running against jobs that were OPENING, but a job that relocates from one node to another while OPENED stays OPENED, and this could cause the datafeed to fail when it sent data to the OPENED job on its new node before it had a corresponding autodetect process. This change extends the check to stop datafeeds running when their job is OPENING _or_ stale (i.e. has not had its status reset since relocating to a different node). Relates #36810	2019-01-09 10:42:47 +00:00
David Roberts	f14cff2102	[TEST] Ensure interrupted flag reset after test that sets it (#37230 ) Test fix to stop a problem in one test leaking into a different test and causing that other test to spuriously fail.	2019-01-09 08:51:00 +00:00
Benjamin Trent	6b376a1ff4	ML: fix delayed data annotations on secured cluster (#37193 ) * changing executing context for writing annotation * adjusting user * removing unused import	2019-01-07 15:18:38 -06:00
Benjamin Trent	1780ced82d	ML: changing JobResultsProvider.getForecastRequestStats to support > 1 index (#37157 ) * ML: changing JobResultsProvider.getForecastRequestStats to support more than one index * moving to use idsQuery()	2019-01-07 10:58:55 -06:00
Armin Braun	31c33fdb9b	MINOR: Remove some Deadcode in Gradle (#37160 )	2019-01-07 09:21:25 +01:00
David Roberts	ff7df40b20	[ML] Uplift model memory limit on job migration (#37126 ) When a 6.1-6.3 job is opened in a later version we increase the model memory limit by 30% if it's below 0.5GB. The migration of jobs from cluster state to the config index changes the job version, so we need to also do this uplift as part of that config migration. Relates #36961	2019-01-04 12:21:28 +00:00
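The arithmetic of the uplift described above can be sketched in a few lines. Several points here are assumptions made for illustration: the limit is expressed in whole megabytes, 0.5GB is taken as 512MB, the result is rounded down, and the job version check (6.1-6.3) is left out. `uplift_model_memory_limit_mb` is a hypothetical name.

```python
UPLIFT_THRESHOLD_MB = 512  # assumption: 0.5GB expressed in binary megabytes


def uplift_model_memory_limit_mb(limit_mb: int) -> int:
    """Increase limits below the threshold by 30%; leave others unchanged."""
    if limit_mb < UPLIFT_THRESHOLD_MB:
        # Integer arithmetic avoids float rounding; the result is rounded down.
        return limit_mb * 13 // 10
    return limit_mb
```

Under these assumptions a 100MB limit becomes 130MB, while a limit of 512MB or more is returned as is.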
Dimitris Athanasiou	0fd27d4d6f	[ML] Unused state remover should also account for jobs in index (#37119 ) The unused state remover was never adjusted to account for jobs stored in the config index. The result was that when triggered it removed state for all jobs stored in the config index. This commit fixes the issue. Closes #37109	2019-01-04 12:43:44 +02:00
Dimitris Athanasiou	586453fef1	[ML] Remove types from datafeed (#36538 ) Closes #34265	2019-01-04 09:43:44 +02:00
David Roberts	13649aa70a	[TEST] Revert "Mute ForecastIT.testSingleSeries" (#37110 ) The problem that caused the test to be muted was fixed in https://github.com/elastic/ml-cpp/pull/332 Closes #36258	2019-01-03 16:23:18 +00:00
Benjamin Trent	cfc310748d	addressing (#36891 )(#36888 )(#36889 ) (#37080 )	2019-01-03 07:25:57 -06:00
David Kyle	42bb2bae21	[ML] Order GET job stats response by job id (#36841 )	2019-01-02 16:52:20 +00:00
Hendrik Muhs	632c7fbed2	[ML] fix x-pack usage regression caused by index migration (#36936 ) Changes the feature usage retrieval to use the job manager rather than directly talking to the cluster state, because jobs can now be either in cluster state or stored in an index This is a follow-up of #36702 / #36698	2018-12-31 08:30:08 +01:00
Dimitris Athanasiou	08bcd83757	[ML] Reduce persistent tasks periodic reassignment interval in ... (#36845 ) ... MlDistributedFailureIT.testLoseDedicatedMasterNode. An intermittent failure has been observed in `MlDistributedFailureIT.testLoseDedicatedMasterNode`. The test launches a cluster comprising a dedicated master node and a data and ML node. It creates a job and datafeed and starts them. It then shuts down and restarts the master node. Finally, the test asserts that the two tasks have been reassigned within 10s. The intermittent failure occurs because the assertion that the tasks have been reassigned fails. Investigating the failure revealed that the `assertBusy` that performs that assertion times out. Furthermore, it appears that the job task is not reassigned because the memory tracking info is stale. Memory tracking info is refreshed asynchronously when reassignment of a job is attempted. Reassignment of tasks is attempted either due to a relevant cluster state change or periodically. The periodic interval is controlled by a cluster setting called `cluster.persistent_tasks.allocation.recheck_interval` and defaults to 30s. What seems to be happening in this test is that if all cluster state changes after the master node is restarted come through before the async memory info refresh completes, then it might take up to 30s until reassignment of the job is attempted. Thus the `assertBusy` times out. This commit changes the test to reduce the interval of the periodic check that reassigns persistent tasks to `200ms`. If the above theory is correct, this should eradicate those failures. Closes #36760	2018-12-20 14:53:36 +02:00
David Roberts	0f2f00a20a	[ML] Resolve 7.0.0 TODOs in ML code (#36842 ) This change cleans up a number of ugly BWC workarounds in the ML code. 7.0 cannot run in a mixed version cluster with versions prior to 6.7, so code that deals with these old versions is no longer required. Closes #29963	2018-12-20 12:49:57 +00:00
David Kyle	d43cbdab97	[ML] ensure the ml-config index (#36792 ) (#36832 )	2018-12-19 13:43:43 +00:00
David Roberts	ad20d6bb83	[ML] Followup to annotations index creation (#36824 ) Fixes two minor problems reported after merge of #36731: 1. Name the creation method to make clear it only creates if necessary 2. Avoid multiple simultaneous in-flight creation requests	2018-12-19 13:06:24 +00:00
Alpar Torok	e9ef5bdce8	Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311 ) - Create a separate unitTest task instead of Gradle's built in - convert all configuration to use the new task - the built in task is now disabled	2018-12-19 08:25:20 +02:00
Benjamin Trent	1d429cf1c9	ML having delayed data detection create annotations (#36796 ) * ML having delayed data detection create annotations * adding upsertAsDoc, audit, and changing user * changing update to just index the doc with the id set	2018-12-18 18:40:38 -06:00
David Kyle	e294056bbf	[ML] Merge the Jindex master feature branch (#36702 ) * [ML] Job and datafeed mappings with index template (#32719) Index mappings for the configuration documents * [ML] Job config document CRUD operations (#32738) * [ML] Datafeed config CRUD operations (#32854) * [ML] Change JobManager to work with Job config in index (#33064) * [ML] Change Datafeed actions to read config from the config index (#33273) * [ML] Allocate jobs based on JobParams rather than cluster state config (#33994) * [ML] Return missing job error when .ml-config is does not exist (#34177) * [ML] Close job in index (#34217) * [ML] Adjust finalize job action to work with documents (#34226) * [ML] Job in index: Datafeed node selector (#34218) * [ML] Job in Index: Stop and preview datafeed (#34605) * [ML] Delete job document (#34595) * [ML] Convert job data remover to work with index configs (#34532) * [ML] Job in index: Get datafeed and job stats from index (#34645) * [ML] Job in Index: Convert get calendar events to index docs (#34710) * [ML] Job in index: delete filter action (#34642) This changes the delete filter action to search for jobs using the filter to be deleted in the index rather than the cluster state. * [ML] Job in Index: Enable integ tests (#34851) Enables the ml integration tests excluding the rolling upgrade tests and a lot of fixes to make the tests pass again. * [ML] Reimplement established model memory (#35500) This is the 7.0 implementation of a master node service to keep track of the native process memory requirement of each ML job with an associated native process. The new ML memory tracker service works when the whole cluster is upgraded to at least version 6.6. For mixed version clusters the old mechanism of established model memory stored on the job in cluster state was used. This means that the old (and complex) code to keep established model memory up to date on the job object has been removed in 7.0. 
Forward port of #35263 * [ML] Need to wait for shards to replicate in distributed test (#35541) Because the cluster was expanded from 1 node to 3 indices would initially start off with 0 replicas. If the original node was killed before auto-expansion to 1 replica was complete then the test would fail because the indices would be unavailable. * [ML] DelayedDataCheckConfig index mappings (#35646) * [ML] JIndex: Restore finalize job action (#35939) * [ML] Replace Version.CURRENT in streaming functions (#36118) * [ML] Use 'anomaly-detector' in job config doc name (#36254) * [ML] Job In Index: Migrate config from the clusterstate (#35834) Migrate ML configuration from clusterstate to index for closed jobs only once all nodes are v6.6.0 or higher * [ML] Check groups against job Ids on update (#36317) * [ML] Adapt to periodic persistent task refresh (#36633) * [ML] Adapt to periodic persistent task refresh If https://github.com/elastic/elasticsearch/pull/36069/files is merged then the approach for reallocating ML persistent tasks after refreshing job memory requirements can be simplified. This change begins the simplification process. * Remove AwaitsFix and implement TODO * [ML] Default search size for configs * Fix TooManyJobsIT.testMultipleNodes Two problems: 1. Stack overflow during async iteration when lots of jobs on same machine 2. Not effectively setting search size in all cases * Use execute() instead of submit() in MlMemoryTracker We don't need a Future to wait for completion * [ML][TEST] Fix NPE in JobManagerTests * [ML] JIindex: Limit the size of bulk migrations (#36481) * [ML] Prevent updates and upgrade tests (#36649) * [FEATURE][ML] Add cluster setting that enables/disables config migration (#36700) This commit adds a cluster settings called `xpack.ml.enable_config_migration`. The setting is `true` by default. When set to `false`, no config migration will be attempted and non-migrated resources (e.g. jobs, datafeeds) will be able to be updated normally. 
Relates #32905 * [ML] Snapshot ml configs before migrating (#36645) * [FEATURE][ML] Split in batches and migrate all jobs and datafeeds (#36716) Relates #32905 * SQL: Fix translation of LIKE/RLIKE keywords (#36672) * SQL: Fix translation of LIKE/RLIKE keywords Refactor Like/RLike functions to simplify internals and improve query translation when chained or within a script context. Fix #36039 Fix #36584 * Fixing line length for EnvironmentTests and RecoveryTests (#36657) Relates #34884 * Add back one line removed by mistake regarding java version check and COMPAT jvm parameter existence * Do not resolve addresses in remote connection info (#36671) The remote connection info API leads to resolving addresses of seed nodes when invoked. This is problematic because if a hostname fails to resolve, we would not display any remote connection info. Yet, a hostname not resolving can happen across remote clusters, especially in the modern world of cloud services with dynamically changing IPs. Instead, the remote connection info API should be providing the configured seed nodes. This commit changes the remote connection info to display the configured seed nodes, avoiding a hostname resolution. Note that care was taken to preserve backwards compatibility with previous versions that expect the remote connection info to serialize a transport address instead of a string representing the hostname. * [Painless] Add boxed type to boxed type casts for method/return (#36571) This adds implicit boxed type to boxed types casts for non-def types to create asymmetric casting relative to the def type when calling methods or returning values. This means that a user calling a method taking an Integer can call it with a Byte, Short, etc. legally which matches the way def works. This creates consistency in the casting model that did not previously exist. 
* SNAPSHOTS: Adjust BwC Versions in Restore Logic (#36718) * Re-enables bwc tests with adjusted version conditions now that #36397 enables concurrent snapshots in 6.6+ * ingest: fix on_failure with Drop processor (#36686) This commit allows a document to be dropped when a Drop processor is used in the on_failure fork of the processor chain. Fixes #36151 * Initialize startup `CcrRepositories` (#36730) Currently, the CcrRepositoryManger only listens for settings updates and installs new repositories. It does not install the repositories that are in the initial settings. This commit, modifies the manager to install the initial repositories. Additionally, it modifies the ccr integration test to configure the remote leader node at startup, instead of using a settings update. * [TEST] fix float comparison in RandomObjects#getExpectedParsedValue This commit fixes a test bug introduced with #36597. This caused some test failure as stored field values comparisons would not work when CBOR xcontent type was used. Closes #29080 * [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320) This commit exposes lucene's LatLonShape field as the default type in GeoShapeFieldMapper. To use the new indexing approach, simply set "type" : "geo_shape" in the mappings without setting any of the strategy, precision, tree_levels, or distance_error_pct parameters. Note the following when using the new indexing approach: * geo_shape query does not support querying by MULTIPOINT. * LINESTRING and MULTILINESTRING queries do not yet support WITHIN relation. * CONTAINS relation is not yet supported. The tree, precision, tree_levels, distance_error_pct, and points_only parameters are deprecated. * TESTS:Debug Log. IndexStatsIT#testFilterCacheStats * ingest: support default pipelines + bulk upserts (#36618) This commit adds support to enable bulk upserts to use an index's default pipeline. 
Bulk upsert, doc_as_upsert, and script_as_upsert are all supported. However, bulk script_as_upsert has slightly surprising behavior since the pipeline is executed _before_ the script is evaluated. This means that the pipeline only has access to the data found in the upsert field of the script_as_upsert. The non-bulk script_as_upsert (existing behavior) runs the pipeline _after_ the script is executed. This commit does _not_ attempt to consolidate the bulk and non-bulk behavior for script_as_upsert. This commit also adds additional testing for the non-bulk behavior, which remains unchanged with this commit. fixes #36219 * Fix duplicate phrase in shrink/split error message (#36734) This commit removes a duplicate "must be a" from the shrink/split error messages. * Deprecate types in get_source and exist_source (#36426) This change adds a new untyped endpoint `{index}/_source/{id}` for both the GET and the HEAD methods to get the source of a document or check for its existence. It also adds deprecation warnings to RestGetSourceAction that emit a warning when the old deprecated "type" parameter is still used. Also updating documentation and tests where appropriate. Relates to #35190 * Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)" This reverts commit `5bc7822562`. * Enhance Invalidate Token API (#35388) This change: - Adds functionality to invalidate all (refresh+access) tokens for all users of a realm - Adds functionality to invalidate all (refresh+access) tokens for a user in all realms - Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm - Changes the response format for the invalidate token API to contain information about the number of the invalidated tokens and possible errors that were encountered. 
- Updates the API Documentation After back-porting to 6.x, the `created` field will be removed from master as a field in the response Resolves: #35115 Relates: #34556 * Add raw sort values to SearchSortValues transport serialization (#36617) In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat`. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one. This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST. * Watcher: Ensure all internal search requests count hits (#36697) In previous commits only the stored toXContent version of a search request was using the old format. However an executed search request was already disabling hit counts. In 7.0 hit counts will stay enabled by default to allow for proper migration. Closes #36177 * [TEST] Ensure shard follow tasks have really stopped. Relates to #36696 * Ensure MapperService#getAllMetaFields elements order is deterministic (#36739) MapperService#getAllMetaFields returns an array, which is created out of an `ObjectHashSet`. Such set does not guarantee deterministic hash ordering. The array returned by its toArray may be sorted differently at each run. This caused some repeatability issues in our tests (see #29080) as we pick random fields from the array of possible metadata fields, but that won't be repeatable if the input array is sorted differently at every run. 
Once the test seed is set, hppc picks that up and the sorting is deterministic, but failures don't repeat with the seed that gets printed out originally (as a seed was not originally set). See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173. With this commit, we simply create a static sorted array that is used for `getAllMetaFields`. The change is in production code but really affects only testing as the only production usage of this method was to iterate through all values when parsing fields in the high-level REST client code. Anyway, this seems like a good change as returning an array would imply that it's deterministically sorted. * Expose Sequence Number based Optimistic Concurrency Control in the rest layer (#36721) Relates #36148 Relates #10708 * [ML] Mute MlDistributedFailureIT	2018-12-18 17:45:31 +00:00
Mayya Sharipova	f884b2b1cd	Deprecate types in index API (#36575 ) * Deprecate types in index API - deprecate type-based constructors of IndexRequest - update tests to use typeless IndexRequest constructors - no yaml tests as they have been already added in #35790 Relates to #35190	2018-12-18 08:53:49 -05:00
David Roberts	624307410e	[ML] Create the ML annotations index (#36731 ) The ML UI now provides the ability for users to annotate time periods with arbitrary text to add insight to what happened. This change makes the backend create the index for these annotations, together with read and write aliases to make future upgrades possible without adding complexity to the UI. It also adds read and write permission to the index for all ML users (not just admins). The spec for the index is in https://github.com/elastic/kibana/pull/26034/files#diff-c5c6ac3dbb0e7c91b6d127aa06121b2cR7 Relates #33376 Relates elastic/kibana#26034	2018-12-18 12:18:29 +00:00
David Roberts	2dd56cf945	[TEST] Make filestructurefinder.TimeoutCheckerTests more robust	2018-12-14 22:28:12 +00:00
David Roberts	690b10a4a1	[ML] Interrupt Grok in file structure finder timeout (#36588 ) The file structure finder has timeout functionality, but prior to this change it would not interrupt a single long-running Grok match attempt. This commit hooks into the ThreadWatchdog facility provided by the Grok library to interrupt individual Grok matches that may be running at the time the file structure finder timeout expires.	2018-12-14 07:18:09 +00:00
Nik Everett	03daad9812	Re-deprecate xpack rollup endpoints (#36451 ) Redeprecates the `/_xpack/rollup` endpoints in favor of `/_rollup`. When we clean up the rollup in a cluster containing 6.x nodes we need to use `/_xpack/rollup` instead of `/_rollup` because the 6.x nodes don't know about `/_rollup`. In those cases we must ignore the deprecation warnings that the 7.0 node will return for the endpoint. Closes #36044	2018-12-11 19:43:17 -05:00
Ioannis Kakavas	d7c5d8049a	Deprecate /_xpack/security/* in favor of /_security/* (#36293 ) * This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs. - REST API docs - HLRC docs and doc tests - Handle REST actions with deprecation warnings - Changed endpoints in rest-api-spec and relevant file names	2018-12-11 11:13:10 +02:00
Ryan Ernst	a27f2efca5	Core: Converge FormatDateTimeFormatter and DateFormatter apis (#36390 ) This commit makes FormatDateTimeFormatter and DateFormatter apis close to each other, so that the former can be removed in favor of the latter. This PR does not change the uses of FormatDateTimeFormatter yet, so that the future change can be purely mechanical.	2018-12-07 17:23:41 -08:00
David Roberts	9e8cfbb40d	[ML] Deprecate X-Pack centric ML endpoints (#36315 ) This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs. Relates #35958	2018-12-07 20:34:11 +00:00
Dimitris Athanasiou	b8dba16376	[ML] Ensure total hits are tracked (#36374 ) This is in preparation of the anticipated change that will disable accurate total hits tracking in searches.	2018-12-07 18:01:37 +00:00
Dimitris Athanasiou	0dd73ef7da	[ML] Move consuming and closing results stream to result processor (#36314 ) The results iterator is consuming and closing the results stream once it is done. It seems this should not be the responsibility of the results iterator. It stops the iterator from being reusable for different processes where closing the stream is not desirable. This commit is moving the consuming and closing of the results stream into the autodetect result processor.	2018-12-07 09:33:51 +00:00
Ryan Ernst	37b3fc383f	Build: Use explicit deps on test tasks for check (#36325 ) This commit moves back to use explicit dependsOn for test tasks on check. Not all tasks extending RandomizedTestingTask should be run by check directly.	2018-12-06 14:13:49 -08:00
Benjamin Trent	3e04a90e99	[ML] Adding audits when deprecation warnings occur with datafeed start (#36233 ) * [ML] Adding audits when deprecation warnings occur with datafeed start * adjusting parameters for log format call	2018-12-06 15:58:37 -06:00
Jim Ferenczi	18866c4c0b	Make hits.total an object in the search response (#35849 ) This commit changes the format of the `hits.total` in the search response to be an object with a `value` and a `relation`. The `value` indicates the number of hits that match the query and the `relation` indicates whether the number is accurate (in which case the relation is equal to `eq`) or a lower bound of the total (in which case it is equal to `gte`). This change also adds a parameter called `rest_total_hits_as_int` that can be used in the search APIs to opt out from this change (retrieve the total hits as a number in the rest response). Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain `hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a follow up (to allow numbers to be passed to `track_total_hits`). Relates #33028	2018-12-05 19:49:06 +01:00
David Roberts	a3c1c6938a	Mute ForecastIT.testSingleSeries Due to https://github.com/elastic/elasticsearch/issues/36258	2018-12-05 14:41:07 +00:00
Alpar Torok	60e45cd81d	Testing conventions task part 2 (#36107 ) Closes #35435 - make it easier to add additional testing tasks with the proper configuration and add some where they were missing. - mute or fix failing tests - add a check as part of testing conventions to find classes not included in any testing task.	2018-12-05 14:20:01 +02:00
Martijn van Groningen	11935cd480	Replace Streamable w/ Writeable in BaseTasksResponse and subclasses (#36176 ) This commit replaces usages of Streamable with Writeable for the BaseTasksResponse / TransportTasksAction classes and subclasses of these classes. Note that where possible response fields were made final. Relates to #34389	2018-12-05 13:14:10 +01:00
Benjamin Trent	166d9a94d4	[ML] Add lazy parsing for DatafeedConfig:Aggs,Query (#36117 ) * Lazily parsing aggs and query in DatafeedConfigs * Adding parser tests * Fixing exception types && unnecessary checked ex * Adding semi aggregation parser * Adding tests, fixing up semi-parser * Reverting semi-parsing * Moving agg validations * Making bad configs throw badRequestException	2018-12-04 09:41:47 -06:00
Martijn van Groningen	43773a32a4	Replace Streamable w/ Writeable in BaseTasksRequest and subclasses (#35854 ) * Replace Streamable w/ Writeable in BaseTasksRequest and subclasses This commit replaces usages of Streamable with Writeable for the BaseTasksRequest / TransportTasksAction classes and subclasses of these classes. Relates to #34389	2018-12-03 08:04:29 +01:00
Dimitris Athanasiou	54cf1f9d74	[ML] Refactor control message writer to allow reuse for other processes (#36070 )	2018-11-30 09:25:35 +00:00
Zachary Tong	61c2db5ebb	Revert "Deprecate X-Pack centric rollup endpoints (#35962 )" This reverts commit `b84f1f6a3a`.	2018-11-29 12:58:23 -05:00
Jason Tedor	b84f1f6a3a	Deprecate X-Pack centric rollup endpoints (#35962 ) This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.	2018-11-27 20:34:17 -05:00
Jay Modi	2061eeb122	Remove use of AbstractComponent in xpack (#35394 ) This commit removes the use of AbstractComponent in xpack where it was still being extended. It has been replaced with explicit logger declarations. See #34488	2018-11-27 11:28:26 -07:00
David Roberts	110c4fdd65	[ML] Adjust file structure finder parser config (#35935 )	2018-11-27 12:52:52 +00:00
Benjamin Trent	6d4a3f8fce	Removes two unused AnalysisConfig options (#35645 ) * ML: Removing result_finalization_window && overlapping_buckets * Reverting bad method deletions * Setting to current before backport to try and get a green build * fixing testBuildAutodetectCommand test * disabling bwc tests for backport	2018-11-19 08:29:53 -06:00
Benjamin Trent	bc7dea4480	ML: changing automatic check_window calculation (#35643 ) * ML: changing automatic check_window calculation * adding docs on how we calculate the default	2018-11-19 08:03:34 -06:00
Arthur Gavlyukovskiy	022726011c	Remove use of AbstractComponent in server (#35444 ) Removed extending of AbstractComponent and changed logger usage to explicit declaration. Abstract classes still have logger declaration using this.getClass() in order to show implementation class name in its logs. See #34488	2018-11-16 16:10:32 -05:00
Benjamin Trent	f7ada9b29b	Add delayed datacheck to the datafeed job runner (#35387 ) * ML: Adding missing datacheck to datafeedjob * Adding client side and docs * Making adjustments to validations * Making values default to on, having more sensible limits * Intermittent commit, still need to figure out interval * Adjusting delayed data check interval * updating docs * Making parameter Boolean, so it is nullable * bumping bwc to 7 before backport * changing to version current * moving delayed data check config its own object * Separation of duties for delayed data detection * fixing checkstyles * fixing checkstyles * Adjusting default behavior so that null windows are allowed * Mentioning the default value * Fixing comments, syncing up validations	2018-11-15 13:32:45 -06:00
Tanguy Leroux	c9b4ef0dfd	Use RunOnce when appropriate (#35553 ) This pull request replaces some blocks of code that must be run once and that are currently based on AtomicBoolean by the convenient RunOnce class added in #35489.	2018-11-15 09:24:40 +01:00
David Roberts	09965cb370	[ML] Fix find_file_structure NPE with should_trim_fields (#35465 ) The NPE would occur if should_trim_field was overridden to true and any field value was completely blank. This change defends against this situation. Fixes #35462	2018-11-13 08:49:24 +00:00
David Kyle	9494e046e7	[ML] Prevent notifications on deletion of a non existent job (#35337 )	2018-11-08 09:57:07 +00:00
Jason Tedor	4f4fc3b8f8	Replicate index settings to followers (#35089 ) This commit uses the index settings version so that a follower can replicate index settings changes as needed from the leader. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2018-11-07 21:20:51 -05:00
Benjamin Trent	2117f4f358	[ML] Add Missing data checking class (#35310 ) * ML: Adding missing data check class * reverting bad change * Adding bucket + missing data object for returns * reverting unnecessary change * adding license header * Make client calls synchronous, akin to DatafeedJob * Fixing line length * Renaming things, addressing PR comments	2018-11-07 12:48:15 -06:00
Nik Everett	f72ef9b5fd	Build: Pull "skip assemble on qa" to common build (#35214 ) Pull all of the logic that we use to skip the `assemble` and `dependenciesInfo` tasks on `qa` projects into one spot in our root build file.	2018-11-05 16:16:00 -05:00
Alexander Reelsen	409050e8de	Refactor: Remove settings from transport action CTOR (#35208 ) As settings are not used in the transport action constructor, this removes the passing of the settings in all the transport actions.	2018-11-05 13:08:18 +01:00
David Kyle	85f8458f06	[ML] Add comment describing test behaviour	2018-11-05 11:21:59 +00:00
Tal Levy	c3cf7dd305	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-11-01 10:13:02 -07:00
Benjamin Trent	2fadec5c3d	ML: Add support for rollup Indexes in Datafeeds (#34654 ) * Adding rollup support for datafeeds * Fixing tests and adjusting formatting * minor formatting change * fixing some syntax and removing redundancies * Refactoring and fixing failing test * Refactoring, adding paranoid null check * Moving rollup into the aggregation package * making AggregationToJsonProcessor package private again * Addressing test failure * Fixing validations, chunking * Addressing failing test * rolling back RollupJobCaps changes * Adding comment and cleaning up test * Addressing review comments and test failures * Moving builder logic into separate methods * Addressing PR comments, adding test for rollup permissions * Fixing test failure * Adding rollup priv check on datafeed put * Handling missing index when getting caps * Fixing unused import	2018-11-01 10:02:24 -05:00
Nik Everett	e28509fbfe	Core: Less settings to AbstractComponent (#35140 ) Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to stop passing around `Settings` in a ton of places. While this change touches many files, it touches them all in fairly small, mechanical ways, doing a few things per file: 1. Drop the `super(settings);` line on everything that extends `AbstractComponent`. 2. Drop the `settings` argument to the ctor if it is no longer used. 3. If the file doesn't use `logger` then drop `extends AbstractComponent` from it. 4. Clean up all compilation failures caused by the `settings` removal and drop any now unused `settings` instances and method arguments. I've intentionally not removed the `settings` argument from a few files: 1. TransportAction 2. AbstractLifecycleComponent 3. BaseRestHandler These files don't need `settings` either, but this change is large enough as is. Relates to #34488	2018-10-31 21:23:20 -04:00
David Turner	0072c90e2a	Pre-populate unicast hosts files (#35136 ) Today when ESIntegTestCase starts some nodes it writes out the unicast hosts files each time a node starts its transport service. This does mean that a number of nodes can start and perform their first pinging round without any unicast hosts which, if the timing is unlucky and a lot of nodes are all started at the same time, can lead to a split brain as in #35052. Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider would always have yielded the existing hosts, so the timing would have to have been implausibly unlucky. Since #33554, however, it's more likely because the race occurs between the start of the first round of pinging and the writing of the unicast hosts file. It is realistic that new nodes will be configured with the existing nodes from startup, so this change reinstates that behaviour. Closes #35052.	2018-10-31 19:21:24 +00:00
Tal Levy	d5d28420b6	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-31 10:47:07 -07:00
Dimitris Athanasiou	00dc2ba36f	[ML] Enable reusing field extraction logic when no time field is required (#35100 )	2018-10-31 10:55:11 +00:00
Nik Everett	086ada4c08	Core: Drop settings member from AbstractComponent (#35083 ) Drops the `Settings` member from `AbstractComponent`, moving it from the base class on to the classes that use it. For the most part this is a mechanical change that doesn't drop `Settings` accesses. The one exception to this is naming threads where it switches from an invocation that passes `Settings` and extracts the node name to one that explicitly passes the node name. This change doesn't drop the `Settings` argument from `AbstractComponent`'s ctor because this change is big enough as is. We'll do that in a follow up change.	2018-10-30 16:10:38 -04:00
Tal Levy	18c72e86c5	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-30 08:09:57 -07:00
David Turner	f7760ddacb	Increase discovery log level in NetworkDisruptionIT Sometimes the cluster forming here will split-brain when it grows up to 5 nodes. This could be a timing issue or could be something going wrong in discovery, so this asks for more logs. Relates #35052	2018-10-30 08:45:23 +00:00
Dimitris Athanasiou	d85a654ebb	[ML] Refactor doc value format into ExtractedField (#35053 ) This commit moves the knowledge of which doc value format to be used down to the `ExtractedField` instead of being in the data extractor.	2018-10-29 22:56:53 +00:00
Tal Levy	c9e4d26a53	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-29 14:03:55 -07:00
Pratik Sanglikar	f1135ef0ce	Core: Replace deprecated Loggers calls with LogManager. (#34691 ) Replace deprecated Loggers calls with LogManager. Relates to #32174	2018-10-29 15:52:30 -04:00
David Roberts	c455be7bc2	[ML] Rename the json file structure to ndjson (#34901 ) The file structure finder endpoint can find the NDJSON (newline-delimited JSON) file format, but called it `json`. This change renames the `format` for this file structure to `ndjson`, which is more precise and will hopefully avoid confusion.	2018-10-29 10:06:12 +01:00
Tal Levy	d8322ca069	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-26 12:46:21 -07:00
Nik Everett	10295b306d	Core: Drop nodeName from AbstractComponent (#34487 ) `AbstractComponent` is trouble because its name implies that everything should extend from it. It is useful, but maybe too broadly useful. The things it offers access to, the `Settings` instance for the entire server and a logger, are nice to have around, but not really needed everywhere. The `Settings` instance especially adds a fair bit of ceremony to testing without any value. This removes the `nodeName` method from `AbstractComponent` so it is more clear where we actually need the node name.	2018-10-26 15:26:14 -04:00
Dimitris Athanasiou	a39a67cd38	[ML] Extract common native process base class (#34856 ) We currently have two different native processes: autodetect & normalizer. There are plans for introducing a new process. All these share many things in common. This commit refactors the processes to extend an `AbstractNativeProcess` class that encapsulates those commonalities with the purpose of reusing the code for new processes in the future.	2018-10-26 15:34:48 +01:00
David Roberts	734088673e	[ML] Include message in field_stats for text log files (#34861 ) This change ensures the `message` field is always included in the `field_stats` for the semi-structured text log file structure. Previously it was not, as it will almost certainly contain all distinct values. However, for consistency in the UI it's useful to include it.	2018-10-26 07:45:02 +02:00
Tal Levy	e1fdd00420	Lowercase static final DeprecationLogger instance names (#34887 ) After discussing on the team's FixItFriday, we concluded that static final instance variables that are mutable should be lowercased. Historically, DeprecationLogger was uppercased more frequently than lowercased.	2018-10-25 21:12:19 -07:00
Lee Hinman	3e7042832a	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-10-25 11:00:36 -06:00
Alpar Torok	795d57b4f9	Auto configure all test tasks (#34666 ) With this change, we apply the common test config automatically to all newly created tasks instead of opting in specifically. For plugin authors using the plugin externally this means that the configuration will be applied to their RandomizedTestingTasks as well. The purpose of the task is to simplify setup and make it easier to change projects that use the `test` task but actually run integration tests to use a task called `integTest` for clarity, but also because we may want to configure and run them differently, e.g. using different levels of concurrency.	2018-10-24 16:05:50 +03:00
lipsill	d5ad3de42e	[test] Introduce strict deprecation mode for REST tests (#34338 ) #33708 introduced a strict deprecation mode that makes a REST request fail if there is a warning header in the response returned by Elasticsearch (usually a deprecation message signaling that a feature or a field has been deprecated). This change adds the strict deprecation mode into the REST integration tests, and makes the tests fail if a deprecated feature is used. Also any test using a deprecated feature has been modified to pass the build. The YAML integration tests already analyzed HTTP warnings so they do not use this mode, keeping their "expected vs actual" behavior.	2018-10-24 08:21:24 -04:00
Ryan Ernst	596b5cf108	Test: Fix last reference to SearchScript (#34731 ) This was accidentally left over when converting to FieldScript. closes #34683	2018-10-23 17:26:17 -07:00
Tal Levy	67bfdb16ad	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-22 13:09:37 -07:00
jaymode	ae1e46b852	Test: add empty test to PainlessDomainSplitIT All of the tests in PainlessDomainSplitIT have an awaitsfix, which causes the build to fail since no tests are run. This adds an empty test to get the build going again. Relates #34683 Relates #32966	2018-10-22 14:01:44 -06:00
Jason Tedor	7af19b8f81	Migrate wait for pending tasks helper to server (#34675 ) In some of our X-Pack REST tests we have to wait for pending tasks to complete. We are now needing this functionality in ESRestTestCase for the docs tests where we run against X-Pack features. This commit moves the helper method that we have in X-Pack to ESRestTestCase, and removes duplicate logic from waiting for rollup tasks to complete.	2018-10-22 11:14:02 -04:00
Jason Tedor	e562afad69	Awaits fix PainlessDomainSplitIT#testIsolated This test fails reliably with a compilation error. This commit awaits fix the test.	2018-10-21 19:13:53 -04:00
Colin Goodheart-Smithe	84ef91529c	Merge branch 'master' into index-lifecycle	2018-10-19 13:24:04 +01:00
David Roberts	0f8d05f2d0	[TEST] Reduce forecast disk space requirement for tests (#34552 ) The setting that reduces the disk space requirement for the forecasting integration tests was accidentally removed in #31757 when files were moved around. This change simply adds back the setting that existed before that.	2018-10-18 12:43:41 +01:00
Ryan Ernst	d445785f1a	Scripting: Convert domainSplit function for ML to whitelist (#34426 ) This commit moves the definition of domainSplit into java and exposes it as a painless whitelist extension. The method also no longer needs params, and a version which ignores params is added and deprecated.	2018-10-17 15:54:21 -07:00
Colin Goodheart-Smithe	90f7cec7a5	Merge branch 'master' into index-lifecycle	2018-10-17 18:22:23 +01:00
Benjamin Trent	fb579d2d9a	ML: Adding support for lazy nodes (#29991 ) (#34538 )	2018-10-17 08:30:15 -05:00
Colin Goodheart-Smithe	0b42eda0e3	Merge branch 'master' into index-lifecycle	2018-10-15 16:03:37 +01:00
David Roberts	21c759af0e	[ML] Add an ingest pipeline definition to structure finder (#34350 ) The ingest pipeline that is produced is very simple. It contains a grok processor if the format is semi-structured text, a date processor if the format contains a timestamp, and a remove processor if required to remove the interim timestamp field parsed out of semi-structured text. Eventually the UI should offer the option to customize the pipeline with additional processors to perform other data preparation steps before ingesting data to an index.	2018-10-12 07:56:35 +01:00
David Turner	7352f0da60	Handle pre-6.x time fields (#34373 ) In `ccb9ab5717` we changed how we deal with time fields to support the `DateTime`-format fields added in 6.0, but dropped support for pre-6.x `Long`-format fields. This change reinstates this support for cases where pre-6.x data is made available to ML (e.g. in a mixed-version CCS setup or after an upgrade).	2018-10-11 15:33:09 +01:00
Dimitris Athanasiou	4dacfa95d2	[ML] Allow asynchronous job deletion (#34058 ) This changes the delete job API by adding the choice to delete a job asynchronously. The commit adds a `wait_for_completion` parameter to the delete job request. When set to `false`, the action returns immediately and the response contains the task id. This also changes the handling of subsequent delete requests for a job that is already being deleted. It now uses the task framework to check if the job is being deleted instead of the cluster state. This is beneficial because it will also work once the job configs are moved out of the cluster state and into an index. Also, force delete requests that are waiting for the job to be deleted will not proceed with the deletion if the first task fails. This will prevent overloading the cluster. Instead, the failure is communicated better via notifications so that the user may retry. Finally, this makes the `deleting` property of the job visible (also it was renamed from `deleted`). This allows a client to render a deleting job differently. Closes #32836	2018-10-05 02:41:28 +03:00
David Kyle	ef5007b6d8	[ML] Remove unused last_data_time member from Job (#34262 )	2018-10-04 13:16:14 +01:00
Kazuhiro Sera	d45fe43a68	Fix a variety of typos and misspelled words (#32792 )	2018-10-03 18:11:38 +01:00
Lee Hinman	2d9cb21490	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-10-01 14:10:09 -06:00
Benjamin Trent	96be057195	Removing unused ML parameters (#34159 )	2018-10-01 08:09:46 -07:00
David Roberts	a1d2ded98d	[ML] Fix unit test deadlock problem (#34174 ) This change fixes a potential deadlock problem in the unit test introduced in #34117. It also removes a piece of debug code and corrects a docs formatting problem that were both added in that same PR.	2018-10-01 15:35:37 +01:00
Lee Hinman	6ea396a476	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-28 15:40:12 -06:00
David Roberts	f709c2f694	[ML] Add a timeout option to file structure finder (#34117 ) This can be used to restrict the amount of CPU a single structure finder request can use. The timeout is not implemented precisely, so requests may run for slightly longer than the timeout before aborting. The default is 25 seconds, which is a little below Kibana's default timeout of 30 seconds for calls to Elasticsearch APIs.	2018-09-28 17:32:35 +01:00
Lee Hinman	a26cc1a242	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-27 11:00:37 -06:00
Christoph Büscher	ba3ceeaccf	Clean up "unused variable" warnings (#31876 ) This change cleans up "unused variable" warnings. There are several cases where we most likely want to suppress the warnings (especially in the client documentation test where the snippets contain many unused variables). In a lot of cases the unused variables can just be deleted though.	2018-09-26 14:09:32 +02:00
Ed Savage	cc70352b3f	[ML] Modify thresholds for normalization triggers (#33663 ) [ML] Modify thresholds for normalization triggers The (arbitrary) threshold factors used to judge if scores have changed significantly enough to trigger a look-back renormalization have been changed to values that reduce the frequency of such renormalizations. Added a clause to treat changes in scores as a 'big change' if it would result in a change of severity reported in the UI. Also altered the clause affecting small scores so that a change should be considered big if scores have changed by at least 1.5. Relates https://github.com/elastic/machine-learning-qa/issues/263	2018-09-25 15:30:10 +01:00
David Roberts	dfe5af0411	[ML] Return both Joda and Java formats from structure finder (#33900 ) Previously the timestamp_formats field in the response from the find_file_structure endpoint contained Joda timestamp formats. This change makes that clear by renaming the field to joda_timestamp_formats, and also adds a java_timestamp_formats field containing the equivalent Java time format strings.	2018-09-25 12:52:51 +01:00
Benjamin Trent	74d7be805a	Make certain ML node settings dynamic (#33565 ) (#33961 ) * Make certain ML node settings dynamic (#33565) * Changing to pull in updating settings and pass to constructor * adding note about only newly opened jobs getting updated value	2018-09-24 12:54:32 -07:00
Lee Hinman	243e863f6e	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-24 10:33:51 -06:00
David Roberts	b89551c452	[ML] Display integers without .0 in file structure field stats (#33947 ) Previously numeric values in the field_stats created by the find_file_structure endpoint were always output with a decimal point. This looked unfriendly and unnatural for fields that clearly store integer values. This change converts integer values to type Integer before output in the file structure field stats.	2018-09-22 15:48:59 +01:00
Benjamin Trent	e17bd8e913	Removing poor randomization for node name (#33918 )	2018-09-21 04:49:20 -07:00
Christoph Büscher	b654d986d7	Add OneStatementPerLineCheck to Checkstyle rules (#33682 ) This change adds the OneStatementPerLineCheck to our checkstyle precommit checks. This rule restricts the number of statements per line to one. The reasoning behind this is that it is very difficult to read multiple statements on one line. People seem to mostly use it in short lambdas and switch statements in our code base, but just going through the changes already uncovered some actual problems in randomization in test code, so I think it's worth it.	2018-09-21 11:52:31 +02:00
Dimitris Athanasiou	8e3a0fad9d	[ML] Refactor job deletion logic into the transport action (#33891 ) The job deletion logic was scattered around a few places: the transport action, the job manager and the deletion task. Overloading the task with deletion logic also meant extra dependencies in the core package which should be unnecessary. This commit consolidates all this logic into the transport action and replaces the deletion task with a plain one that needs not be aware of deletion logic.	2018-09-20 15:48:42 +01:00
Benjamin Trent	4767a016a5	Adding node_count to ML Usage (#33850 ) (#33863 )	2018-09-19 13:35:09 -07:00
Lee Hinman	81e9150c7a	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-19 09:43:26 -06:00
Alan Woodward	5107949402	Allow TokenFilterFactories to rewrite themselves against their preceding chain (#33702 ) We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they appear before them in a chain, because they produce multiple tokens at the same position. This commit adds two methods to the TokenFilterFactory interface. * `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and `CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`. * `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym list `Analyzer`. By default it returns `true`. Fixes #33609	2018-09-19 15:52:14 +01:00
Benjamin Trent	4190a9f1e9	Delete custom index if the only contained job is deleted (#33788 ) * Delete custom index if the only contained job is deleted	2018-09-19 07:42:26 -07:00
Lee Hinman	e6cbaa5a78	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-14 16:27:37 -06:00
David Roberts	568ac10ca6	[ML] Allow overrides for some file structure detection decisions (#33630 ) This change modifies the file structure detection functionality such that some of the decisions can be overridden with user supplied values. The fields that can be overridden are: - charset - format - has_header_row - column_names - delimiter - quote - should_trim_fields - grok_pattern - timestamp_field - timestamp_format If an override makes finding the file structure impossible then the endpoint will return an exception.	2018-09-14 09:29:11 +01:00
Benjamin Trent	7e51b960fb	Adding index refresh (#33647 )	2018-09-13 10:44:33 -07:00
Colin Goodheart-Smithe	8e59de3eb2	Merge branch 'master' into index-lifecycle	2018-09-13 09:46:14 +01:00
Jay Modi	20c6c9c542	Address license state update/read thread safety (#33396 ) This change addresses some issues regarding thread safety around updates and method calls on the XPackLicenseState object. There exists a possibility that there could be a concurrent update to the XPackLicenseState when there is a scheduled check to see if the license is expired and a cluster state update. In order to address this, the update method now has a synchronized block where member variables are updated. Each method that reads these variables is now also synchronized. Along with the above change, there was a consistency issue around security calls to the license state. The majority of security checks make two calls to the license state, which could result in incorrect behavior due to the checks being made against different license states. The majority of this behavior was introduced for 6.3 with the inclusion of x-pack in the default distribution. In order to resolve the majority of these cases, the `isSecurityEnabled` method is no longer public and the logic is also included in individual methods about security such as `isAuthAllowed`. There were a few cases where this did not remove multiple calls on the license state, so a new method has been added which creates a copy of the current license state that will not change. Callers can use this copy of the license state to make decisions based on a consistent view of the license state.	2018-09-12 13:08:09 -06:00
David Roberts	8e05ce567f	[ML] Rename input_fields to column_names in file structure (#33568 ) This change tightens up the meaning of the "input_fields" field in the file structure finder output. Previously it was permitted but not calculated for JSON and XML files. Following this change the field is called "column_names" and is only permitted for delimited files. Additionally the way the column names are set for headerless delimited files is refactored to encapsulate the way they're named to one line of the code rather than having the same logic in two places.	2018-09-11 08:46:26 +01:00
Colin Goodheart-Smithe	cdc4f57a77	Merge branch 'master' into index-lifecycle	2018-09-10 21:30:44 +01:00
Dimitris Athanasiou	fcb15b0ce3	[ML] Get job stats request should filter non-ML job tasks (#33516 ) When requesting job stats for `_all`, all ES tasks are accepted, resulting in loads of cluster traffic and a memory overhead. This commit correctly filters out non-ML job tasks. Closes #33515	2018-09-09 22:53:03 +01:00
Nhat Nguyen	94e4cb64c2	Bootstrap a new history_uuid when force allocating a stale primary (#33432 ) This commit ensures that we bootstrap a new history_uuid when force allocating a stale primary. A stale primary should never be the source of an operation-based recovery to another shard which exists before the forced-allocation. Closes #26712	2018-09-08 19:29:31 -04:00
David Roberts	e42cc5cd8c	[ML] Add a file structure determination endpoint (#33471 ) This endpoint accepts an arbitrary file in the request body and attempts to determine the structure. If successful it also proposes mappings that could be used when indexing the file's contents, and calculates simple statistics for each of the fields that are useful in the data preparation step prior to configuring machine learning jobs.	2018-09-07 17:41:57 +01:00
Colin Goodheart-Smithe	017ffe5d12	Merge branch 'master' into index-lifecycle	2018-09-07 10:59:10 +01:00
Jim Ferenczi	79cd6385fe	Collapse package structure for metrics aggs (#33463 ) This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`. It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package. Relates #22868	2018-09-07 10:58:06 +02:00
David Roberts	0849b98f60	[ML] Rename log structure to file structure (#33421 ) Many files supplied to the upcoming ML data preparation functionality will not be "log" files. For example, CSV files are generally not "log" files. Therefore it makes sense to rename the library that determines the structure of these files. Although "file structure" could be considered too broad, as the library currently only works with a few text formats, in the future it may be extended to work with more formats.	2018-09-06 09:13:08 +01:00
Tal Levy	b5f7fb6882	Merge branch 'master' into index-lifecycle	2018-09-05 12:56:58 -07:00
David Roberts	a296829205	[ML] Add field stats to log structure finder (#33351 ) The log structure endpoint will return these in addition to pure structure information so that it can be used to drive pre-import data visualizer functionality. The statistics for every field are count, cardinality (distinct count) and top hits (most common values). Extra statistics are calculated if the field is numeric: min, max, mean and median.	2018-09-05 12:57:20 +01:00
Colin Goodheart-Smithe	f00a28a909	Merge branch 'master' into index-lifecycle	2018-09-05 09:48:48 +01:00
Nik Everett	ebd5eb6dc2	ML: Fix build after HLRC change I recently merged a HLRC change that passed the PR builds but didn't compile after merging. Sad time. This fixes the compilation.	2018-09-04 11:10:44 -04:00
Sohaib Iftikhar	761e8c461f	HLRC: Add delete by query API (#32782 ) Adds the delete-by-query API to the High Level REST Client.	2018-09-04 08:56:26 -04:00
Dimitris Athanasiou	1457b07a06	[ML] The sort field on get records should default to the record_score (#33358 ) This is not changing the behaviour as when the sort field was set to `influencer_score` the secondary sort would be used and that was using the `record_score` at the highest priority.	2018-09-04 11:38:24 +01:00
David Roberts	84eaac79d7	[ML] Minor improvements to categorization Grok pattern creation (#33353 ) 1. The TOMCAT_DATESTAMP format needs to be checked before TIMESTAMP_ISO8601, otherwise TIMESTAMP_ISO8601 will match the start of the Tomcat datestamp. 2. Exclude more characters before and after numbers. For example, in 1.2.3 we don't want to match 1.2 as a float.	2018-09-04 09:43:49 +01:00
Alpar Torok	7f7e8fd733	Disable assemble task instead of removing it (#33348 )	2018-09-04 07:32:14 +03:00
Benjamin Trent	767d8e0801	[ML] Delete forecast API (#31134 ) (#33218 ) * Delete forecast API (#31134)	2018-09-03 19:06:18 -05:00
Colin Goodheart-Smithe	e2c1beb1be	Merge branch 'master' into index-lifecycle	2018-09-03 10:01:16 +01:00
Nhat Nguyen	b93507608a	Merge branch 'master' into ccr * master: Mute test watcher usage stats output [Rollup] Fix FullClusterRestart test Adjust soft-deletes version after backport into 6.5 completely drop `index.shard.check_on_startup: fix` for 7.0 (#33194) Fix AwaitsFix issue number Mute SmokeTestWatcherWithSecurityIT testsi drop `index.shard.check_on_startup: fix` (#32279) tracked at [DOCS] Moves ml folder from x-pack/docs to docs (#33248) [DOCS] Move rollup APIs to docs (#31450) [DOCS] Rename X-Pack Commands section (#33005) TEST: Disable soft-deletes in ParentChildTestCase Fixes SecurityIntegTestCase so it always adds at least one alias (#33296) Fix pom for build-tools (#33300) Lazy evaluate java9home (#33301) SQL: test coverage for JdbcResultSet (#32813) Work around to be able to generate eclipse projects (#33295) Highlight that index_phrases only works if no slop is used (#33303) Different handling for security specific errors in the CLI. Fix for https://github.com/elastic/elasticsearch/issues/33230 (#33255) [ML] Refactor delimited file structure detection (#33233) SQL: Support multi-index format as table identifier (#33278) MINOR: Remove Dead Code from PathTrie (#33280) Enable forbiddenapis server java9 (#33245)	2018-08-31 19:03:04 -04:00
Colin Goodheart-Smithe	3eef74d5d5	Merge branch 'master' into index-lifecycle	2018-08-31 14:45:22 +01:00
David Roberts	7345878d33	[ML] Refactor delimited file structure detection (#33233 ) 1. Use the term "delimited" rather than "separated values" 2. Use a single factory class with arguments to specify the delimiter and identification constraints This change makes it easier to add support for other delimiter characters.	2018-08-31 08:48:45 +01:00
Nhat Nguyen	5632e31c74	Merge branch 'master' into ccr * master: Painless: Add Bindings (#33042) Update version after client credentials backport Fix forbidden apis on FIPS (#33202) Remote 6.x transport BWC Layer for `_shrink` (#33236) Test fix - Graph HLRC tests needed another field adding to randomisation exception list HLRC: Add ML Get Records API (#33085) [ML] Fix character set finder bug with unencodable charsets (#33234) TESTS: Fix overly long lines (#33240) Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic Remove unsupported group_shard_failures parameter (#33208) Update BucketUtils#suggestShardSideQueueSize signature (#33210) Parse PEM Key files leniantly (#33173) INGEST: Add Pipeline Processor (#32473) Core: Add java time xcontent serializers (#33120) Consider multi release jars when running third party audit (#33206) Update MSI documentation (#31950) HLRC: create base timed request class (#33216) [DOCS] Fixes command page titles HLRC: Move ML protocol classes into client ml package (#33203) Scroll queries asking for rescore are considered invalid (#32918) Painless: Fix Semicolon Regression (#33212) ingest: minor - update test to include dissect (#33211) Switch remaining LLREST usage to new style Requests (#33171) HLREST: add reindex API (#32679)	2018-08-29 12:30:24 -04:00
Gordon Brown	454ce99b01	Merge branch 'master' into index-lifecycle	2018-08-29 08:28:23 -06:00
David Roberts	22415fa2de	[ML] Fix character set finder bug with unencodable charsets (#33234 ) Some character sets cannot be encoded and this was tripping up the binary data check in the ML log structure character set finder. The fix is to assume that if ICU4J identifies that some bytes correspond to a character set that cannot be encoded and those bytes contain zeroes then the data is binary rather than text. Fixes #33227	2018-08-29 14:56:02 +01:00
Tal Levy	5783545222	Merge branch 'master' into index-lifecycle	2018-08-27 08:19:05 -07:00
Nhat Nguyen	75304f405b	Merge branch 'master' into ccr * master: Add proxy support to RemoteClusterConnection (#33062) TEST: Skip assertSeqNos for closed shards (#33130) TEST: resync operation on replica should acquire shard permit (#33103) Switch remaining x-pack tests to new style Requests (#33108) Switch remaining tests to new style Requests (#33109) Switch remaining ml tests to new style Requests (#33107) Build: Line up IDE detection logic Security index expands to a single replica (#33131) HLRC: request/response homogeneity and JavaDoc improvements (#33133) Checkstyle! [Test] Fix sporadic failure in MembershipActionTests Revert "Do NOT allow termvectors on nested fields (#32728)" [Rollup] Move toAggCap() methods out of rollup config objects (#32583) Fix race condition in scheduler engine test	2018-08-25 21:41:53 -04:00
Nik Everett	8bee6b3a92	Switch remaining ml tests to new style Requests (#33107 ) In #29623 we added `Request` object flavored requests to the low level REST client and in #30315 we deprecated the old `performRequest`s. This changes all calls in the `x-pack/plugin/ml/qa/native-multi-node-tests`, `x-pack/plugin/ml/qa/single-node-tests` projects to use the new versions.	2018-08-24 16:36:40 -04:00
Tal Levy	74312be0ea	Merge branch 'master' into index-lifecycle	2018-08-24 12:41:12 -07:00
Jason Tedor	91a052b617	Merge branch 'master' into ccr * master: Add hook to skip asserting x-content equivalence (#33114) Muted testListenersThrowingExceptionsDoNotCauseOtherListenersToBeSkipped [Rollup] Move getMetadata() methods out of rollup config objects (#32579) Muted testEmptyAuthorizedIndicesSearchForAllDisallowNoIndices Update Google Cloud Storage Library for Java (#32940) Remove unsupported Version.V_5_* (#32937)	2018-08-24 06:55:10 -04:00
Jim Ferenczi	f4e9729d64	Remove unsupported Version.V_5_* (#32937 ) This change removes the es 5x version constants and their usages.	2018-08-24 09:51:21 +02:00
Martijn van Groningen	82592dda5a	Merge remote-tracking branch 'es/master' into ccr * es/master: (62 commits) [DOCS] Add docs for Application Privileges (#32635) Add versions 5.6.12 and 6.4.1 Do NOT allow termvectors on nested fields (#32728) [Rollup] Return empty response when aggs are missing (#32796) [TEST] Add some ACL yaml tests for Rollup (#33035) Move non duplicated actions back into xpack core (#32952) Test fix - GraphExploreResponseTests should not randomise array elements Closes #33086 Use `addIfAbsent` instead of checking if an element is contained TESTS: Fix Random Fail in MockTcpTransportTests (#33061) HLRC: Fix Compile Error From Missing Throws (#33083) [DOCS] Remove reload password from docs cf. #32889 HLRC: Add ML Get Buckets API (#33056) Watcher: Improve error messages for CronEvalTool (#32800) Search: Support of wildcard on docvalue_fields (#32980) Change query field expansion (#33020) INGEST: Cleanup Redundant Put Method (#33034) SQL: skip uppercasing/lowercasing function tests for AZ locales as well (#32910) Fix the default pom file name (#33063) Switch ml basic tests to new style Requests (#32483) Switch some watcher tests to new style Requests (#33044) ...	2018-08-24 12:22:11 +07:00
Gordon Brown	1f13c77b49	Merge branch 'master' into index-lifecycle	2018-08-23 11:52:59 -06:00
Nik Everett	0cc99d270c	Switch ml basic tests to new style Requests (#32483 ) In #29623 we added `Request` object flavored requests to the low level REST client and in #30315 we deprecated the old `performRequest`s. This changes all calls in the `x-pack/qa/ml-basic-multi-node` project to use the new versions.	2018-08-22 14:23:43 -04:00
Alpar Torok	82d10b484a	Run forbidden api checks with runtimeJavaVersion (#32947 ) Run forbidden APIs checks with runtime Java version	2018-08-22 09:05:22 +03:00
Nik Everett	2c81d7f77e	Build: Rework shadow plugin configuration (#32409 ) This reworks how we configure the `shadow` plugin in the build. The major change is that we no longer bundle dependencies in the `compile` configuration, instead we bundle dependencies in the new `bundle` configuration. This feels more right because it is a little more "opt in" rather than "opt out" and the name of the `bundle` configuration is a little more obvious. As a neat side effect of this, the `runtimeElements` configuration used when one project depends on another now contains exactly the dependencies needed to run the project so you no longer need to reference projects that use the shadow plugin like this: ``` testCompile project(path: ':client:rest-high-level', configuration: 'shadow') ``` You can instead use the much more normal: ``` testCompile "org.elasticsearch.client:elasticsearch-rest-high-level-client:${version}" ```	2018-08-21 20:03:28 -04:00
Nik Everett	fcf8cadd9a	Switch some x-pack tests to new style Requests (#32500 ) In #29623 we added `Request` object flavored requests to the low level REST client and in #30315 we deprecated the old `performRequest`s. This changes all calls in the `x-pack/qa/audit-tests`, `x-pack/qa/ml-disabled`, and `x-pack/qa/multi-node` projects to use the new versions.	2018-08-21 14:48:53 -04:00
Jason Tedor	28d12b05b7	Move ML tests to be sub-projects of ML (#33026 ) This commit moves the ML QA tests to be a sub-project of ML. The purpose of this refactoring is to enable ML developers to run :x-pack:plugin:ml:check and run the vast majority of a ML tests with a single command (this still does not contain the ML REST tests, nor the upgrade tests). This simplifies local development for faster iteration.	2018-08-21 12:23:21 -04:00
Benjamin Trent	3f91bbfa6b	[ML] Allowing _close to accept body payloads for options (#32989 ) (#33000 )	2018-08-21 08:08:26 -05:00
Jason Tedor	b08d02e3b7	Implement CCR licensing (#33002 ) This commit implements licensing for CCR. CCR will require a platinum license, and administrative endpoints will be disabled when a license is non-compliant.	2018-08-20 23:33:18 -04:00
Jason Tedor	9050c7e846	Generalize remote license checker (#32971 ) Machine learning has baked a remote license checker for use in checking license compatibility of a remote license. This remote license checker has general usage for any feature that relies on a remote cluster. For example, cross-cluster replication will pull changes from a remote cluster and require that the local and remote clusters have platinum licenses. This commit generalizes the remote cluster license check for use in cross-cluster replication.	2018-08-20 15:33:29 -04:00
Alpar Torok	4b34b3f4aa	Set forbidden APIs target compatibility to compiler java version (#32935 ) Set forbidden apis target compatibility to compiler version Fix outstanding deprecation	2018-08-20 09:27:02 +03:00
Benjamin Trent	9cec4aa14b	[ML] fix updating opened jobs scheduled events (#31651 ) (#32881 ) * ML: fix updating opened jobs scheduled events (#31651) * Adding UpdateParamsTests license header * Adding integration test and addressing PR comments * addressing test and job names	2018-08-17 07:21:17 -05:00
David Roberts	5ba04e23fc	[ML] Add log structure finder functionality (#32788 ) This change adds a library to ML that can be used to deduce a log file's structure given only a sample of the log file. Eventually this will be used to add an endpoint to ML to make the functionality available to end users, but this will follow in a separate change. The functionality is split into a library so that it can also be used by a command line tool without requiring the command line tool to include all server code.	2018-08-15 18:04:21 +01:00
Lee Hinman	48281ac5bc	Use generic AcknowledgedResponse instead of extended classes (#32859 ) This removes custom Response classes that extend `AcknowledgedResponse` and do nothing, these classes are not needed and we can directly use the non-abstract super-class instead. While this appears to be a large PR, no code has actually changed, only class names have been changed and entire classes removed.	2018-08-15 08:06:14 -06:00
Ed Savage	8ce1ab3ed9	[ML] Removing old per-partition normalization code (#32816 ) [ML] Removing old per-partition normalization code Per-partition normalization is an old, undocumented feature that was never used by clients. It has been superseded by per-partition maximum scoring. To maintain communication compatibility with nodes prior to 6.5 it is necessary to maintain/cope with the old wire format	2018-08-15 13:13:32 +01:00
Ed Savage	d147cd72cc	[ML] Partition-wise maximum scores (#32748 ) Added infrastructure to push through the 'person name field value' to the normalizer process. This is required by the normalizer to retrieve the maximum scores for individual partitions.	2018-08-13 10:31:17 +01:00
Benjamin Trent	b08416b899	Clear Job#finished_time when it is opened (#32605 ) (#32755 ) * Clear Job#finished_time when it is opened (#32605) * not returning failure when Job#finished_time is not reset * Changing error log string and source string	2018-08-10 13:52:00 -05:00
Dimitris Athanasiou	c7b1ba33aa	[ML] Refactor ProcessCtrl into Autodetect and Normalizer builders (#32720 ) This moves the helper functionality for creating the autodetect and normalizer processes into corresponding builders.	2018-08-10 17:28:20 +01:00
David Roberts	ae0c303dad	Move icu4j and super-csv version numbers to versions file (#32769 ) The upcoming ML log structure finder functionality will use these libraries, and it makes sense to use the same versions that are being used elsewhere in Elasticsearch. This is especially true with icu4j, which is pretty big.	2018-08-10 12:19:06 +01:00
Nicholas Knize	e162127ff3	Upgrade to Lucene-7.5.0-snapshot-13b9e28f9d The main feature is the inclusion of bkd backed geo_shape with INTERSECT, DISJOINT, WITHIN bounding box and polygon query support.	2018-08-09 11:15:02 -05:00
Dimitris Athanasiou	f30bb0ebf8	[ML] Remove multiple_bucket_spans (#32496 ) This commit removes the never released multiple_bucket_spans configuration parameter. This is now replaced with the new multibucket feature that requires no configuration.	2018-08-02 11:25:56 +01:00
David Kyle	15679315e3	[ML] Rename JobProvider to JobResultsProvider (#32551 )	2018-08-02 09:53:47 +01:00
Benjamin Trent	9fb790dcc3	[ML] Fix thread leak when waiting for job flush (#32196 ) (#32541 )	2018-08-01 10:38:04 -05:00
Armin Braun	4b199dde8d	NETWORKING: Fix Netty Leaks by upgrading to 4.1.28 (#32511 ) * Upgrade to `4.1.28` since the problem reported in #32487 is a bug in Netty itself (see https://github.com/netty/netty/issues/7337) * Fixed other leaks in test code that now showed up due to improvements in leak reporting in the newer version * Needed to extend permissions for netty common package because it now sets a classloader at runtime after changes in `63bae0956a` * Adjusted forbidden APIs check accordingly * Closes #32487	2018-08-01 02:34:58 +02:00
David Roberts	0afa265ac9	[ML] Consistent pattern for strict/lenient parser names (#32399 ) Previously we had two patterns for naming of strict and lenient parsers. Some classes had CONFIG_PARSER and METADATA_PARSER, and used an enum to pass the parser type to nested parsers. Other classes had STRICT_PARSER and LENIENT_PARSER and used ternary operators to pass the parser type to nested parsers. This change makes all ML classes use the second of the patterns described above.	2018-07-26 16:55:40 +01:00
Christoph Büscher	35ae87125d	Remove some dead code (#31993 ) Removing some dead code or suppressing warnings where appropriate. Most of the time the variable tested for null is dereferenced earlier or never used before.	2018-07-26 17:12:51 +02:00
Tim Vernum	387c3c7f1d	Introduce Application Privileges with support for Kibana RBAC (#32309 ) This commit introduces "Application Privileges" to the X-Pack security model. Application Privileges are managed within Elasticsearch, and can be tested with the _has_privileges API, but do not grant access to any actions or resources within Elasticsearch. Their purpose is to allow applications outside of Elasticsearch to represent and store their own privileges model within Elasticsearch roles. Access to manage application privileges is handled in a new way that grants permission to specific application names only. This lays the foundation for more OLS on cluster privileges, which is implemented by allowing a cluster permission to inspect not just the action being executed, but also the request to which the action is applied. To support this, a "conditional cluster privilege" is introduced, which is like the existing cluster privilege, except that it has a Predicate over the request as well as over the action name. Specifically, this adds - GET/PUT/DELETE actions for defining application level privileges - application privileges in role definitions - application privileges in the has_privileges API - changes to the cluster permission class to support checking of request objects - a new "global" element on role definition to provide cluster object level security (only for manage application privileges) - changes to `kibana_user`, `kibana_dashboard_only_user` and `kibana_system` roles to use and manage application privileges Closes #29820 Closes #31559	2018-07-24 10:34:46 -06:00
Nik Everett	e6b9f59e4e	Build: Shadow x-pack:protocol into x-pack:plugin:core (#32240 ) This bundles the x-pack:protocol project into the x-pack:plugin:core project because we'd like folks to consider it an implementation detail of our build rather than a separate artifact to be managed and depended on. It is now bundled into both x-pack:plugin:core and client:rest-high-level. To make this work I had to fix a few things. Firstly, I had to make PluginBuildPlugin work with the shadow plugin. In that case we have to bundle only the `shadow` dependencies and the shadow jar. Secondly, every reference to x-pack:plugin:core has to use the `shadow` configuration. Without that the reference is missing all of the un-shadowed dependencies. I tried to make it so that applying the shadow plugin automatically redefines the `default` configuration to mirror the `shadow` configuration which would allow us to use bare project references to the x-pack:plugin:core project but I couldn't make it work. It'd look like it works but then fail for transitive dependencies anyway. I think it is still a good thing to do but I don't have the willpower to do it now. Finally, I had to fix an issue where Eclipse and IntelliJ didn't properly reference shadowed transitive dependencies. Neither IDE supports shadowing natively so they have to reference the shadowed projects. We fix this by detecting `shadow` dependencies when in "Intellij mode" or "Eclipse mode" and adding `runtime` dependencies to the same target. This convinces IntelliJ and Eclipse to play nice.	2018-07-24 11:53:04 -04:00
David Kyle	99426eb4f8	[ML] Extract persistent task methods from MlMetadata (#32319 ) Move ML persistent task helper functions to the new class MlTasks and remove MLMetadataField after moving the string constant to MlMetadata.	2018-07-24 15:22:57 +01:00
Christoph Büscher	ff87b7aba4	Remove unnecessary warning suppressions (#32250 )	2018-07-23 11:31:04 +02:00
David Kyle	ac960bfa6b	[ML] Use default request durability for .ml-state index (#32233 ) The initial decision to use async durability was made a long time ago for performance reasons. That argument no longer applies and we prefer the safety of request durability.	2018-07-20 15:49:37 +01:00
Tim Vernum	c32981db6b	Detect old trial licenses and mimic behaviour (#32209 ) Prior to 6.3 a trial license defaulted to security enabled. Since 6.3 they default to security disabled. If a cluster is upgraded from <6.3 to >6.3, then we detect this and mimic the old behaviour with respect to security.	2018-07-20 10:09:28 +10:00
David Roberts	99c2a82c04	[ML] Move analyzer dependencies out of categorization config (#32123 ) The ML config classes will shortly be moved to the X-Pack protocol library to allow the ML APIs to be moved to the high level REST client. Dependencies on server functionality should be removed from the config classes before this is done. This change is entirely about moving code between packages. It does not add or remove any functionality or tests.	2018-07-17 15:01:12 +01:00
Armin Braun	ed3b44fb4c	Handle TokenizerFactory TODOs (#32063 ) * Don't replace TokenizerFactory with Supplier, this approach was rejected in #32063 * Remove unused parameter from constructor	2018-07-17 14:14:02 +02:00
David Roberts	d2461643cd	[ML] Move open job failure explanation out of root cause (#31925 ) When an ML job cannot be allocated to a node the exception contained an explanation of why the job couldn't be allocated to each node in the cluster. For large clusters this was not particularly easy to read and made the error displayed in the UI look very scary. This commit changes the structure of the error to an outer ElasticsearchException with a high level message and an inner IllegalStateException containing the detailed explanation. Because the definition of root cause is the innermost ElasticsearchException the detailed explanation will not be the root cause (which is what Kibana displays). Fixes #29950	2018-07-13 08:57:33 +01:00
Nik Everett	dcbb1154bf	HLRest: Move xPackInfo() to xPack().info() (#31905 ) Originally I put the X-Pack info object into the top level rest client object. I did that because we thought we'd like to squash `xpack` from the name of the X-Pack APIs now that it is part of the default distribution. We still kind of want to do that, but at least for now we feel like it is better to keep the high level rest client aligned with the other language clients like C# and Python. This shifts the X-Pack info API to align with its json spec file. Relates to #31870	2018-07-10 13:01:28 -04:00
Nik Everett	fb27f3e7f0	HLREST: Add x-pack-info API (#31870 ) This is the first x-pack API we're adding to the high level REST client so there is a lot to talk about here! = Open source The client for these APIs is open source. We're taking the previously Elastic licensed files used for the `Request` and `Response` objects and relicensing them under the Apache 2 license. The implementation of these features is staying under the Elastic license. This lines up with how the rest of the Elasticsearch language clients work. = Location of the new files We're moving all of the `Request` and `Response` objects that we're relicensing to the `x-pack/protocol` directory. We're adding a copy of the Apache 2 license to the root of the `x-pack/protocol` directory to line up with the language in the root `LICENSE.txt` file. All files in this directory will have the Apache 2 license header as well. We don't want there to be any confusion. Even though the files are under the `x-pack` directory, they are Apache 2 licensed. We chose this particular directory layout because it keeps the X-Pack stuff together and easier to think about. = Location of the API in the REST client We've been following the layout of the rest-api-spec files for other APIs and we plan to do this for the X-Pack APIs with one exception: we're dropping the `xpack` from the name of most of the APIs. So `xpack.graph.explore` will become `graph().explore()` and `xpack.license.get` will become `license().get()`. `xpack.info` and `xpack.usage` are special here though because they don't belong to any proper category. For now I'm just calling `xpack.info` `xPackInfo()` and intend to call usage `xPackUsage` though I'm not convinced that this is the final name for them. But it does get us started. = Jars, jars everywhere! This change makes the `xpack:protocol` project a `compile` scoped dependency of the `x-pack:plugin:core` and `client:rest-high-level` projects. I intend to keep it a compile scoped dependency of `x-pack:plugin:core` but I intend to bundle the contents of the protocol jar into the `client:rest-high-level` jar in a follow up. This change has grown large enough at this point. In that followup I'll address javadoc issues as well. = Breaking-Java This breaks the transport client by moving a few classes around. We've traditionally been ok with doing this to the transport client.	2018-07-08 11:03:56 -04:00
Dimitris Athanasiou	49ba271bd8	[ML] Fix master node deadlock during ML daily maintenance (#31836 ) This is the implementation for master and 6.x of #31691. Native tests are changed to use multi-node clusters in #31757. Relates #31683	2018-07-07 09:43:28 +01:00
Christoph Büscher	bd1c513422	Reduce more raw types warnings (#31780 ) Similar to #31523.	2018-07-05 15:38:06 +02:00
David Roberts	92de94c237	[ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800 ) Job persistent tasks with stale allocation IDs used to always be considered as OPENING jobs in the ML job node allocation decision. However, FAILED jobs are not relocated to other nodes, which leads to them blocking up the nodes they failed on after node restarts. FAILED jobs should not restrict how many other jobs can open on a node, regardless of whether they are stale or not. Closes #31794	2018-07-05 13:26:17 +01:00
Dimitris Athanasiou	9c11bf1e12	[ML] Fix calendar and filter updates from non-master nodes (#31804 ) Job updates or changes to calendars or filters may result in updating the job process if it has been running. To preserve the order of updates, process updates are queued through the UpdateJobProcessNotifier which is only running on the master node. All actions performing such updates must run on the master node. However, the CRUD actions for calendars and filters are not master node actions. They have been submitting the updates to the UpdateJobProcessNotifier even though it might not have been running (given the action was run on a non-master node). When that happens, the update never reaches the process. This commit fixes this problem by ensuring the notifier runs on all nodes and by ensuring the process update action gets the resources again before updating the process (instead of having those resources passed in the request). This ensures that even if the order of the updates gets messed up, the latest update will read the latest state of those resources and the process will get back in sync. This leaves us with 2 types of updates: 1. updates to the job config should happen on the master node. This is because we cannot refetch the entire job and update it. We need to know the parts that have been changed. 2. updates to resources the job uses. Those can be handled on non-master nodes but they should be re-fetched by the update process action. Closes #31803	2018-07-05 13:14:12 +01:00
David Roberts	308e37f80e	[ML] Rate limit established model memory updates (#31768 ) There is at most one model size stats document per bucket, but during lookback a job can churn through many buckets very quickly. This can lead to many cluster state updates if established model memory needs to be updated for a given model size stats document. This change rate limits established model memory updates to one per job per 5 seconds. This is done by scheduling the updates 5 seconds in the future, but replacing the value to be written if another model size stats document is received during the waiting period. Updating the values in arrears like this means that the last value received will be the one associated with the job in the long term, whereas alternative approaches such as not updating the value if a new value was close to the old value would not.	2018-07-04 13:56:32 +01:00
Hendrik Muhs	e9f8442bee	[ML] Return statistics about forecasts as part of the jobsstats and usage API (#31647 ) This change adds stats about forecasts, to the jobstats api as well as xpack/_usage. The following information is collected: _xpack/ml/anomaly_detectors/{jobid|_all}/_stats: - total number of forecasts - memory statistics (mean/min/max) - runtime statistics - record statistics - counts by status _xpack/usage - collected by job status as well as overall (_all): - total number of forecasts - number of jobs that have at least 1 forecast - memory, runtime, record statistics - counts by status Fixes #31395	2018-07-04 08:15:45 +02:00
Alpar Torok	0afec8f31c	Remove deprecation warnings to prepare for Gradle 5 (sourceSets.main.output.classesDirs) (#30389 ) * Remove deprecation warnings to prepare for Gradle 5 Gradle replaced `project.sourceSets.main.output.classesDir` of type `File` with `project.sourceSets.main.output.classesDirs` of type `FileCollection` (see [SourceSetOutput](https://github.com/gradle/gradle/blob/master/subprojects/plugins/src/main/java/org/gradle/api/tasks/SourceSetOutput.java)) Build output is now stored in a per-language folder. There are a few places where we use that; here they are and how each is fixed: - Randomized Test execution - look in all test folders ( pass the multi dir configuration to the ant runner ) - DRY the task configuration by introducing `basedOn` for `RandomizedTestingTask` DSL - Extend the naming convention test to support passing in multiple directories - Fix the standalone test plugin, the dirs were not passed through, checked with a debugger and the statement had no effect due to a missing `=`. Closes #30354 * Only check Java tests, PR feedback - Name checker was run for Groovy tests that don't adhere to the same conventions causing the check to fail - implement PR feedback * Replace `add` with `addAll` This worked because the list is passed to `project.files` that does the right thing. * Revert "Only check Java tests, PR feedback" This reverts commit 9bd9389875d8b88aadb50df57a45cd0d2b073241. * Remove `basedOn` helper * Bring some changes back Previous revert accidentally reverted too much * Fix negation * add back public * revert name check changes * Revert "revert name check changes" This reverts commit a2800c0b363168339ea65e2a79ec8256e5883e6d. * Pass all dirs to name check Only run on Java for build-tools, this is safe because it's a self test. It needs more work before we could pass in the Groovy classes as well as these inherit from `GroovyTestCase` * remove self tests from name check The self test complicates the task setup and disables real checks on build-tools. With this change there are no more self tests, and the build-tools tests adhere to the conventions. The self test will be replaced by gradle test kit, thus the addition of the Gradle plugin builder plugin. * First test to run a Gradle build * Add tests that replace the name check self test * Clean up integ test base class * Always run tests * Align with test naming conventions * Make integ. test case inherit from unit test case The check requires this * Remove `import static org.junit.Assert.*`	2018-06-28 15:14:34 +03:00
Christoph Büscher	86ab3a2d1a	Reduce number of raw types warnings (#31523 ) A first attempt to reduce the number of raw type warnings, most of the time by using the unbounded wildcard.	2018-06-25 15:59:03 +02:00
Ryan Ernst	7a150ec06d	Core: Combine doExecute methods in TransportAction (#31517 ) TransportAction currently contains 2 doExecute methods, one which takes the task, and one that does not. The latter is what some subclasses implement, while the first one just calls the latter, dropping the given task. This commit combines these methods, in favor of just always assuming a task is present.	2018-06-22 15:03:01 -07:00
Dimitris Athanasiou	c6cbc99f9c	[ML] Add ML filter update API (#31437 ) This adds an api to allow updating a filter: POST _xpack/ml/filters/{filter_id}/_update The request body may have: - description: setting a new description - add_items: a list of the items to add - remove_items: a list of the items to remove This commit also changes the PUT filter api to error when the filter_id is already used. As now there is an api for updating filters, the put api should only be used to create new ones. Also, updating a filter results in a notification message auditing the change for every job that is using that filter.	2018-06-22 15:13:31 +01:00
Adrien Grand	8ae2049889	Avoid deprecation warning when running the ML datafeed extractor. (#31463 ) In #29639 we added a `format` option to doc-value fields and deprecated usage of doc-value fields without a format so that we could migrate doc-value fields to use the format that comes with the mappings by default. However, I missed fixing the machine-learning datafeed extractor.	2018-06-22 13:46:48 +02:00
Ryan Ernst	4f9332ee16	Core: Remove ThreadPool from base TransportAction (#31492 ) Most transport actions don't need the node ThreadPool. This commit removes the ThreadPool as a super constructor parameter for TransportAction. The actions that do need the thread pool then have a member added to keep it from their own constructor.	2018-06-21 11:25:26 -07:00
Ryan Ernst	401800d958	Core: Remove index name resolver from base TransportAction (#31002 ) Most transport actions don't need to resolve index names. This commit removes the index name resolver as a super constructor parameter for TransportAction. The actions that do need the resolver then have a member added to keep the resolver from their own constructor.	2018-06-19 17:06:09 -07:00
Yannick Welsch	02a4ef38a7	Use system context for cluster state update tasks (#31241 ) This commit makes it so that cluster state update tasks always run under the system context, only restoring the original context when the listener that was provided with the task is called. A notable exception is the clusterStatePublished(...) callback which will still run under system context, because it's defined on the executor-level, and not the task level, and only called once for the combined batch of tasks and can therefore not be uniquely identified with a task / thread context. Relates #30603	2018-06-18 16:46:04 +02:00
Dimitris Athanasiou	c6a5a6d924	[ML] Put ML filter API response should contain the filter (#31362 )	2018-06-15 21:15:35 +01:00
Tanguy Leroux	992c7889ee	Uncouple persistent task state and status (#31031 ) This pull request removes the relationship between the state of persistent task (as stored in the cluster state) and the status of the task (as reported by the Task APIs and used in various places) that has been confusing for some time (#29608). In order to do that, a new PersistentTaskState interface is added. This interface represents the persisted state of a persistent task. The methods used to update the state of persistent tasks are renamed: updatePersistentStatus() becomes updatePersistentTaskState() and now takes a PersistentTaskState as a parameter. The Task.Status type has been changed to PersistentTaskState in all places where it makes sense (in persistent task customs in cluster state and all other methods that deal with the state of an allocated persistent task).	2018-06-15 09:26:47 +02:00
Dimitris Athanasiou	9b293275af	[ML] Add description to ML filters (#31330 ) This adds a `description` to ML filters in order to allow users to describe their filters in a human readable form which is also editable (filter updates to be added shortly).	2018-06-14 16:52:32 +01:00
Tanguy Leroux	2d4c9ce08c	Remove remaining unused imports before merging #31270	2018-06-14 09:52:03 +02:00
David Kyle	88f44a9f66	[ML] Check licence when datafeeds use cross cluster search (#31247 ) This change prevents a datafeed using cross cluster search from starting if the remote cluster does not have x-pack installed and a sufficient license. The check is made only when starting a datafeed.	2018-06-13 15:42:18 +01:00
Dimitris Athanasiou	5c77ebe89d	[ML] Implement new rules design (#31110 ) Rules allow users to supply a detector with domain knowledge that can improve the quality of the results. The model detects statistically anomalous results but it has no knowledge of the meaning of the values being modelled. For example, a detector that performs a population analysis over IP addresses could benefit from a list of IP addresses that the user knows to be safe. Then anomalous results for those IP addresses will not be created and will not affect the quantiles either. Another example would be a detector looking for anomalies in the median value of CPU utilization. A user might want to inform the detector that any results where the actual value is less than 5 are not interesting. This commit introduces a `custom_rules` field to the `Detector`. A detector may have multiple rules which are combined with `or`. A rule has 3 fields: `actions`, `scope` and `conditions`. Actions is a list of what should happen when the rule applies. The current options include `skip_result` and `skip_model_update`. The default value for `actions` is the `skip_result` action. Scope is optional and allows for applying filters on any of the partition/over/by fields. When not defined the rule applies to all series. The `filter_id` needs to be specified to match the id of the filter to be used. Optionally, the `filter_type` can be specified as either `include` (default) or `exclude`. When set to `include` the rule applies to entities that are in the filter. When set to `exclude` the rule only applies to entities not in the filter. There may be zero or more conditions. A condition requires `applies_to`, `operator` and `value` to be specified. The `applies_to` value can be either `actual`, `typical` or `diff_from_typical` and it specifies the numerical value to which the condition applies. The `operator` (`lt`, `lte`, `gt`, `gte`) and `value` complete the definition. Conditions are combined with `and` and allow specifying numerical conditions for when a rule applies. A rule must either have a scope or one or more conditions. Finally, a rule with scope and conditions applies when all of them apply.	2018-06-13 11:20:38 +01:00
Jason Tedor	0bfd18cc8b	Revert upgrade to Netty 4.1.25.Final (#31282 ) This reverts upgrading to Netty 4.1.25.Final until we have a cleaner solution to dealing with the object cleaner thread.	2018-06-12 19:26:18 -04:00
Jason Tedor	563141c6c9	Upgrade to Netty 4.1.25.Final (#31232 ) This commit upgrades us to Netty 4.1.25. This upgrade is more challenging than past upgrades, all because of a new object cleaner thread that they have added. This thread requires an additional security permission (set context class loader, needed to avoid leaks in certain scenarios). Additionally, there is not a clean way to shutdown this thread which means that the thread can fail thread leak control during tests. As such, we have to filter this thread from thread leak control.	2018-06-11 16:55:07 -04:00
Tanguy Leroux	bf58660482	Remove all unused imports and fix CRLF (#31207 ) The X-Pack opening and the recent other refactorings left a lot of unused imports in the codebase. This commit removes them all.	2018-06-11 15:12:12 +02:00
Christoph Büscher	3f87c79500	Change ObjectParser exception (#31030 ) ObjectParser should throw XContentParseExceptions, not IAE. A dedicated parsing exception can include the place where the error occurred. Closes #30605	2018-06-04 20:20:37 +02:00
David Kyle	16d1f05045	[ML] Add secondary sort to ML events (#31063 )	2018-06-04 16:31:35 +01:00
Ryan Ernst	46e8d97813	Core: Remove RequestBuilder from Action (#30966 ) This commit removes the RequestBuilder generic type from Action. It was needed to be used by the newRequest method, which in turn was used by client.prepareExecute. Both of these methods are now removed, along with the existing users of prepareExecute constructing the appropriate builder directly.	2018-05-31 16:15:00 +02:00
Tanguy Leroux	a0af0e7f1e	Rename methods in PersistentTasksService (#30837 ) This commit renames methods in the PersistentTasksService, to make obvious that the methods send requests in order to change the state of persistent tasks. Relates to #29608.	2018-05-30 09:20:14 +02:00
Jason Tedor	bcfdccaf3f	Use dedicated ML APIs in tests (#30941 ) ML has dedicated APIs for datafeeds and jobs yet base test classes and some tests were relying on the cluster state for this state. This commit removes this usage in favor of using the dedicated endpoints.	2018-05-29 21:17:47 -04:00
Adrien Grand	a19df4ab3b	Add a `format` option to `docvalue_fields`. (#29639 ) This commit adds the ability to configure how a docvalue field should be formatted, so that it would be possible eg. to return a date field formatted as the number of milliseconds since Epoch. Closes #27740	2018-05-23 14:39:04 +02:00
Yannick Welsch	03607f646b	Revert "Mutes MachineLearningTests.testNoAttributes_givenSameAndMlEnabled" This reverts commit `ca999ad569`.	2018-05-23 11:49:52 +02:00
Yannick Welsch	8145a820c2	Only allow x-pack metadata if all nodes are ready (#30743 ) Enables a rolling restart from the OSS distribution to the x-pack based distribution by preventing x-pack code from installing custom metadata into the cluster state until all nodes are capable of deserializing this metadata.	2018-05-23 11:41:23 +02:00
Colin Goodheart-Smithe	ca999ad569	Mutes MachineLearningTests.testNoAttributes_givenSameAndMlEnabled This is awaiting fix on https://github.com/elastic/elasticsearch/issues/30804	2018-05-23 10:39:00 +01:00
Yannick Welsch	30b004f582	Use original settings on full-cluster restart (#30780 ) When doing a node restart using the test framework, the restarted node does not only use the settings provided to the original node, but also additional settings provided by plugin extensions, which does not correspond to the settings that a node would have on a true restart.	2018-05-23 09:02:01 +02:00
David Kyle	f76f95b813	[ML] Filter undefined job groups from update calendar actions (#30757 ) The UI creates job groups in calendars ad hoc to ease calendar creation; these must be filtered from the jobs list before applying updates.	2018-05-22 09:25:14 +01:00
David Roberts	eaf672f612	[ML] Don't install empty ML metadata on startup (#30751 ) This change is to support rolling upgrade from a pre-6.3 default distribution (i.e. without X-Pack) to a 6.3+ default distribution (i.e. with X-Pack). The ML metadata is no longer eagerly added to the cluster state as soon as the master node has X-Pack available. Instead, it is added when the first ML job is created. As a result all methods that get the ML metadata need to be able to handle the situation where there is no ML metadata in the current cluster state. They do this by behaving as though an empty ML metadata was present. This logic is encapsulated by always asking for the current ML metadata using a static method on the MlMetadata class. Relates #30731	2018-05-21 14:29:45 +01:00
Hendrik Muhs	6c313a9871	This implementation lazily (on 1st forecast request) checks for available diskspace and creates a subfolder for storing data outside of Lucene indexes, but as part of the ES data paths. Details: - tmp storage is managed and does not allow allocation if disk space is below a threshold (5GB at the moment) - tmp storage is supposed to be managed by the native component but in case this fails cleanup is provided: - on job close - on process crash - after node crash, on restart - available space is re-checked for every forecast call (the native component has to check again before writing) Note: The 1st path that has enough space is chosen on job open (job close/reopen triggers a new search)	2018-05-18 14:04:09 +02:00
Hendrik Muhs	d893041634	[ML] add version information in case of crash of native ML process (#30674 ) This change adds version information in case a native ML process crashes, the version is important for choosing the right symbol files when analyzing the crash. Adding the version combines all necessary information on one line. relates elastic/ml-cpp#94	2018-05-18 07:46:52 +02:00
Dimitris Athanasiou	75665a2d3e	[ML] Clean left behind model state docs (#30659 ) It is possible for state documents to be left behind in the state index. This may be because of bugs or uncontrollable scenarios. In any case, those documents may take up quite some disk space when they add up. This commit adds a step in the expired data deletion that is part of the daily maintenance service. The new step searches for state documents that do not belong to any of the current jobs and deletes them. Closes #30551	2018-05-17 17:51:26 +03:00
Dimitris Athanasiou	01bdfcde6f	[ML] DeleteExpiredDataAction should use client with origin (#30646 ) This is an admin action that should be allowed to operate on ML indices with full permissions.	2018-05-16 23:35:23 +03:00
Colin Goodheart-Smithe	a75b8adce5	Refactors ClientHelper to combine header logic (#30620 ) * Refactors ClientHelper to combine header logic This change removes all the `ClientHelper` classes which were repeating logic between plugins and instead adds `ClientHelper.executeWithHeaders()` and `ClientHelper.executeWithHeadersAsync()` methods to centralise the logic for executing requests with stored security headers. Removes Watcher headers constant	2018-05-16 11:38:24 +01:00
David Roberts	50c34b2a9b	[ML] Reverse engineer Grok patterns from categorization results (#30125 ) This change adds a grok_pattern field to the GET categories API output in ML. It's calculated using the regex and examples in the categorization result, and applying a list of candidate Grok patterns to the bits in between the tokens that are considered to define the category. This can currently be considered a prototype, as the Grok patterns it produces are not optimal. However, enough people have said it would be useful for it to be worthwhile exposing it as experimental functionality for interested parties to try out.	2018-05-15 09:02:38 +01:00
David Kyle	9dd629648d	[ML] Improve state persistence log message	2018-05-12 09:20:08 +01:00
Dimitris Athanasiou	3b260dcfc1	[ML] Account for gaps in data counts after job is reopened (#30294 ) This commit fixes an issue with the data diagnostics where empty buckets are not reported even though they should be. Once a job is reopened, the diagnostics do not get initialized from the current data counts (especially the latest record timestamp). The result is that if the data that is sent has a time gap compared to the previous data, that gap is not accounted for in the empty bucket count. This commit fixes that by initializing the diagnostics with the current data counts. Closes #30080	2018-05-03 15:08:24 +01:00
Ryan Ernst	fb0aa562a5	Network: Remove http.enabled setting (#29601 ) This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase. closes #12792	2018-05-02 11:42:05 -07:00
Dimitris Athanasiou	057cdffed5	[ML] Refactor DataStreamDiagnostics to use array (#30129 ) This commit refactors the DataStreamDiagnostics class achieving the following advantages: - simpler code; by encapsulating the moving bucket histogram into its own class - better performance; by using an array to store the buckets instead of a map - explicit handling of gap buckets; in preparation of fixing #30080	2018-05-01 09:50:32 +01:00
David Roberts	225f7093a9	[ML] Include 3rd party C++ component notices (#30132 ) The overall NOTICE file for the ML X-Pack module should include the notices from the 3rd party C++ components as well as the 3rd party Java components.	2018-04-30 20:05:27 +01:00
David Kyle	cfc66a1fd5	[ML] Wait for updates to established memory usage Tests need to wait for changes to the job's established memory usage to propagate, and an over-enthusiastic optimisation meant jobs were updated from stale state, causing recent changes to be lost.	2018-04-24 13:46:58 -04:00
Ryan Ernst	2efd22454a	Migrate x-pack-elasticsearch source to elasticsearch	2018-04-20 15:29:54 -07:00

744 Commits