OpenSearch

Commit Graph

Author	SHA1	Message	Date
Benjamin Trent	79052050bf	[ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds (#42969 ) (#43069 ) * [ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds * only supporting doc_values for geo_point fields * moving validation into GeoPointField ctor	2019-06-10 21:52:53 -05:00
David Roberts	b202a59f88	[ML] Add earliest and latest timestamps to field stats (#42890 ) This change adds the earliest and latest timestamps into the field stats for fields of type "date" in the output of the ML find_file_structure endpoint. This will enable the cards for date fields in the file data visualizer in the UI to be made to look more similar to the cards for date fields in the index data visualizer in the UI.	2019-06-06 08:58:35 +01:00
David Roberts	b61202b0a8	[ML] Add a limit on line merging in find_file_structure (#42501 ) When analysing a semi-structured text file the find_file_structure endpoint merges lines to form multi-line messages using the assumption that the first line in each message contains the timestamp. However, if the timestamp is misdetected then this can lead to excessive numbers of lines being merged to form massive messages. This commit adds a line_merge_size_limit setting (default 10000 characters) that halts the analysis if a message bigger than this is created. This prevents significant CPU time being spent subsequently trying to determine the internal structure of the huge bogus messages.	2019-06-03 13:45:51 +01:00
Benjamin Trent	d06618a70d	[ML] adding delayed_data_check_config to datafeed update docs (#42095 ) (#42626 ) * [ML] adding delayed_data_check_config to datafeed update docs * [DOCS] Edits delayed data configuration details	2019-05-28 11:36:30 -04:00
David Roberts	f472186b9f	[ML] Improve file structure finder timestamp format determination (#41948 ) This change contains a major refactoring of the timestamp format determination code used by the ML find file structure endpoint. Previously timestamp format determination was done separately for each piece of text supplied to the timestamp format finder. This had the drawback that it was not possible to distinguish dd/MM and MM/dd in the case where both numbers were 12 or less. In order to do this sensibly it is best to look across all the available timestamps and see if one of the numbers is greater than 12 in any of them. This necessitates making the timestamp format finder an instantiable class that can accumulate evidence over time. Another problem with the previous approach was that it was only possible to override the timestamp format to one of a limited set of timestamp formats. There was no way out if a file to be analysed had a timestamp that was sane yet not in the supported set. This is now changed to allow any timestamp format that can be parsed by a combination of these Java date/time formats: yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss, a, XX, XXX, zzz Additionally S letter groups (fractional seconds) are supported providing they occur after ss and separated from the ss by a dot, comma or colon. Spacing and punctuation is also permitted with the exception of the question mark, newline and carriage return characters, together with literal text enclosed in single quotes. The full list of changes/improvements in this refactor is: - Make TimestampFormatFinder an instantiable class - Overrides must be specified in Java date/time format - Joda format is no longer accepted - Joda timestamp formats in outputs are now derived from the determined or overridden Java timestamp formats, not stored separately - Functionality for determining the "best" timestamp format in a set of lines has been moved from TextLogFileStructureFinder to TimestampFormatFinder, taking advantage of the fact that TimestampFormatFinder is now an instantiable class with state - The functionality to quickly rule out some possible Grok patterns when looking for timestamp formats has been changed from using simple regular expressions to the much faster approach of using the Shift-And method of sub-string search, but using an "alphabet" consisting of just 1 (representing any digit) and 0 (representing non-digits) - Timestamp format overrides are now much more flexible - Timestamp format overrides that do not correspond to a built-in Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern whose definition is included within the date processor in the ingest pipeline - Grok patterns that correspond to multiple Java date/time patterns are now handled better - the Grok pattern is accepted as matching broadly, and the required set of Java date/time patterns is built up considering all observed samples - As a result of the more flexible acceptance of Grok patterns, when looking for the "best" timestamp in a set of lines timestamps are considered different if they are preceded by a different sequence of punctuation characters (to prevent timestamps far into some lines being considered similar to timestamps near the beginning of other lines) - Out-of-the-box Grok patterns that are considered now include %{DATE} and %{DATESTAMP}, which have indeterminate day/month ordering - The order of day/month in formats with indeterminate day/month order is determined by considering all observed samples (plus the server locale if the observed samples still do not suggest an ordering) Relates #38086 Closes #35137 Closes #35132	2019-05-24 09:10:08 +01:00
Zachary Tong	6ae6f57d39	[7.x Backport] Force selection of calendar or fixed intervals (#41906 ) The date_histogram accepts an interval which can be either a calendar interval (DST-aware, leap seconds, arbitrary length of months, etc) or fixed interval (strict multiples of SI units). Unfortunately this is inferred by first trying to parse as a calendar interval, then falling back to fixed if that fails. This leads to confusing arrangement where `1d` == calendar, but `2d` == fixed. And if you want a day of fixed time, you have to specify `24h` (e.g. the next smallest unit). This arrangement is very error-prone for users. This PR adds `calendar_interval` and `fixed_interval` parameters to any code that uses intervals (date_histogram, rollup, composite, datafeed, etc). Calendar only accepts calendar intervals, fixed accepts any combination of units (meaning `1d` can be used to specify `24h` in fixed time), and both are mutually exclusive. The old interval behavior is deprecated and will throw a deprecation warning. It is also mutually exclusive with the two new parameters. In the future the old dual-purpose interval will be removed. The change applies to both REST and java clients.	2019-05-20 12:07:29 -04:00
James Rodewig	005296dac6	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574 ) * [DOCS] Replace attributes in titleabbrevs for Asciidoctor migration * [DOCS] Add [subs="attributes"] so attributes render in Asciidoctor * Revert "[DOCS] Replace attributes in titleabbrevs for Asciidoctor migration" This reverts commit 98f130257a7c71e9f6cddf5157af7886418338d8. * [DOCS] Fix merge conflict	2019-04-30 13:46:45 -04:00
David Roberts	cbe7d335ff	[DOCS] Use "source" instead of "inline" in ML docs (#40635 ) Specifying an inline script in an "inline" field was deprecated in 5.x. The new field name is "source". (Since 6.x still accepts "inline" I will only backport this docs change as far as 7.0.)	2019-03-29 17:30:28 +00:00
David Kyle	48788269b0	[ML] Correct small inconsistencies in ml APIs spec and docs (#39907 )	2019-03-11 14:02:50 +00:00
David Roberts	5f8f91c03b	[ML] Use scaling thread pool and xpack.ml.max_open_jobs cluster-wide dynamic (#39736 ) This change does the following: 1. Makes the per-node setting xpack.ml.max_open_jobs into a cluster-wide dynamic setting 2. Changes the job node selection to continue to use the per-node attributes storing the maximum number of open jobs if any node in the cluster is older than 7.1, and use the dynamic cluster-wide setting if all nodes are on 7.1 or later 3. Changes the docs to reflect this 4. Changes the thread pools for native process communication from fixed size to scaling, to support the dynamic nature of xpack.ml.max_open_jobs 5. Renames the autodetect thread pool to the job comms thread pool to make clear that it will be used for other types of ML jobs (data frame analytics in particular) Backport of #39320	2019-03-06 12:29:34 +00:00
Tal Levy	92756288b4	relax ML Info Docs expected response (#38993 ) the get-ml-info API documentation tested that the response show that ML's `upgrade_mode` was false. For reasons that may be true due to other tests running in parallel or not cleaning themselves up, this may not be guaranteed. Since the actual value here is not of importance, this commit relaxes the requirement that upgrade_mode be static.	2019-02-15 16:31:01 -08:00
Alexander Reelsen	8e5e48319e	Add documentation about breaking java time changes (#38886 ) In addition remove joda time mentions across the docs, make sure links are updated to java time javadocs. Forward port of #38720	2019-02-14 10:18:12 +01:00
David Roberts	02f57b1e29	[DOCS] Add warning about bypassing ML PUT APIs (#38605 ) Now that ML configurations are stored in the .ml-config index rather than in cluster state there is a possibility that some users may try to add configurations directly to the index. Allowing this creates a variety of problems including possible data exflitration attacks (depending on how security is set up), so this commit adds warnings against allowing writes to the .ml-config index other than via the ML APIs. Backport of #38509	2019-02-08 11:35:37 +00:00
David Roberts	1fa413a16d	[ML] Remove "8" prefixes from file structure finder timestamp formats (#38016 ) In 7.x Java timestamp formats are the default timestamp format and there is no need to prefix them with "8". (The "8" prefix was used in 6.7 to distinguish Java timestamp formats from Joda timestamp formats.) This change removes the "8" prefixes from timestamp formats in the output of the ML file structure finder.	2019-02-01 15:36:04 +00:00
Benjamin Trent	8280a20664	ML: Add upgrade mode docs, hlrc, and fix bug (#37942 ) * ML: Add upgrade mode docs, hlrc, and fix bug * [DOCS] Fixes build error and edits text * adjusting docs * Update docs/reference/ml/apis/set-upgrade-mode.asciidoc Co-Authored-By: benwtrent <ben.w.trent@gmail.com> * Update set-upgrade-mode.asciidoc * Update set-upgrade-mode.asciidoc	2019-01-30 06:51:11 -06:00
Lisa Cawley	19529da2db	[DOCS] Delayed data annotations (#37939 )	2019-01-28 13:04:38 -08:00
Benjamin Trent	7e4c0e6991	ML: Adds set_upgrade_mode API endpoint (#37837 ) * ML: Add MlMetadata.upgrade_mode and API * Adding tests * Adding wait conditionals for the upgrade_mode call to return * Adding tests * adjusting format and tests * Adjusting wait conditions for api return and msgs * adjusting doc tests * adding upgrade mode tests to black list	2019-01-28 09:07:30 -06:00
David Roberts	f2c0c26d15	[ML] Adjust structure finder for Joda to Java time migration (#37306 ) The ML file structure finder has always reported both Joda and Java time format strings. This change makes the Java time format strings the ones that are incorporated into mappings and ingest pipeline definitions. The BWC syntax of prepending "8" to these formats is used. This will need to be removed once Java time format strings become the default in Elasticsearch. This commit also removes direct imports of Joda classes in the structure finder unit tests. Instead the core Joda BWC class is used.	2019-01-26 20:19:57 +00:00
Christoph Büscher	34f2d2ec91	Remove remaining occurances of "include_type_name=true" in docs (#37646 )	2019-01-22 15:13:52 +01:00
David Kyle	0ae7f8630c	Document ml datafeed Id limitations (#37653 )	2019-01-21 14:12:20 +00:00
Benjamin Trent	12cdf1cba4	ML: Add support for single bucket aggs in Datafeeds (#37544 ) Single bucket aggs are now supported in datafeed aggregation configurations.	2019-01-18 15:08:53 -06:00
Lisa Cawley	6dcb3af4c8	[DOCS] Adds size limitation to the get datafeeds APIs (#37578 )	2019-01-17 10:47:15 -08:00
Lisa Cawley	a2d9c464b2	[DOCS] Adds limitation to the get jobs API (#37549 )	2019-01-17 08:21:37 -08:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same appraoch as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
lcawl	2d5a8ec59d	[DOCS] Remove unused screenshots	2019-01-10 11:10:25 -08:00
lcawl	382e4d39ef	[DOCS] Cleans up xpackml attributes	2019-01-07 14:33:10 -08:00
Dimitris Athanasiou	586453fef1	[ML] Remove types from datafeed (#36538 ) Closes #34265	2019-01-04 09:43:44 +02:00
lcawl	32bed098bb	[DOCS] Synchs titles of X-Pack APIs	2018-12-20 10:27:24 -08:00
Lisa Cawley	4140b9eede	[DOCS] Update X-Pack terminology in security docs (#36564 )	2018-12-19 14:53:37 -08:00
Benjamin Trent	75f1c79d9f	Adding more docs for delayed data detection (#36738 ) * Adding more docs for delayed data detection	2018-12-18 19:14:18 -06:00
David Kyle	e294056bbf	[ML] Merge the Jindex master feature branch (#36702 ) * [ML] Job and datafeed mappings with index template (#32719) Index mappings for the configuration documents * [ML] Job config document CRUD operations (#32738) * [ML] Datafeed config CRUD operations (#32854) * [ML] Change JobManager to work with Job config in index (#33064) * [ML] Change Datafeed actions to read config from the config index (#33273) * [ML] Allocate jobs based on JobParams rather than cluster state config (#33994) * [ML] Return missing job error when .ml-config is does not exist (#34177) * [ML] Close job in index (#34217) * [ML] Adjust finalize job action to work with documents (#34226) * [ML] Job in index: Datafeed node selector (#34218) * [ML] Job in Index: Stop and preview datafeed (#34605) * [ML] Delete job document (#34595) * [ML] Convert job data remover to work with index configs (#34532) * [ML] Job in index: Get datafeed and job stats from index (#34645) * [ML] Job in Index: Convert get calendar events to index docs (#34710) * [ML] Job in index: delete filter action (#34642) This changes the delete filter action to search for jobs using the filter to be deleted in the index rather than the cluster state. * [ML] Job in Index: Enable integ tests (#34851) Enables the ml integration tests excluding the rolling upgrade tests and a lot of fixes to make the tests pass again. * [ML] Reimplement established model memory (#35500) This is the 7.0 implementation of a master node service to keep track of the native process memory requirement of each ML job with an associated native process. The new ML memory tracker service works when the whole cluster is upgraded to at least version 6.6. For mixed version clusters the old mechanism of established model memory stored on the job in cluster state was used. This means that the old (and complex) code to keep established model memory up to date on the job object has been removed in 7.0. Forward port of #35263 * [ML] Need to wait for shards to replicate in distributed test (#35541) Because the cluster was expanded from 1 node to 3 indices would initially start off with 0 replicas. If the original node was killed before auto-expansion to 1 replica was complete then the test would fail because the indices would be unavailable. * [ML] DelayedDataCheckConfig index mappings (#35646) * [ML] JIndex: Restore finalize job action (#35939) * [ML] Replace Version.CURRENT in streaming functions (#36118) * [ML] Use 'anomaly-detector' in job config doc name (#36254) * [ML] Job In Index: Migrate config from the clusterstate (#35834) Migrate ML configuration from clusterstate to index for closed jobs only once all nodes are v6.6.0 or higher * [ML] Check groups against job Ids on update (#36317) * [ML] Adapt to periodic persistent task refresh (#36633) * [ML] Adapt to periodic persistent task refresh If https://github.com/elastic/elasticsearch/pull/36069/files is merged then the approach for reallocating ML persistent tasks after refreshing job memory requirements can be simplified. This change begins the simplification process. * Remove AwaitsFix and implement TODO * [ML] Default search size for configs * Fix TooManyJobsIT.testMultipleNodes Two problems: 1. Stack overflow during async iteration when lots of jobs on same machine 2. Not effectively setting search size in all cases * Use execute() instead of submit() in MlMemoryTracker We don't need a Future to wait for completion * [ML][TEST] Fix NPE in JobManagerTests * [ML] JIindex: Limit the size of bulk migrations (#36481) * [ML] Prevent updates and upgrade tests (#36649) * [FEATURE][ML] Add cluster setting that enables/disables config migration (#36700) This commit adds a cluster settings called `xpack.ml.enable_config_migration`. The setting is `true` by default. When set to `false`, no config migration will be attempted and non-migrated resources (e.g. jobs, datafeeds) will be able to be updated normally. Relates #32905 * [ML] Snapshot ml configs before migrating (#36645) * [FEATURE][ML] Split in batches and migrate all jobs and datafeeds (#36716) Relates #32905 * SQL: Fix translation of LIKE/RLIKE keywords (#36672) * SQL: Fix translation of LIKE/RLIKE keywords Refactor Like/RLike functions to simplify internals and improve query translation when chained or within a script context. Fix #36039 Fix #36584 * Fixing line length for EnvironmentTests and RecoveryTests (#36657) Relates #34884 * Add back one line removed by mistake regarding java version check and COMPAT jvm parameter existence * Do not resolve addresses in remote connection info (#36671) The remote connection info API leads to resolving addresses of seed nodes when invoked. This is problematic because if a hostname fails to resolve, we would not display any remote connection info. Yet, a hostname not resolving can happen across remote clusters, especially in the modern world of cloud services with dynamically chaning IPs. Instead, the remote connection info API should be providing the configured seed nodes. This commit changes the remote connection info to display the configured seed nodes, avoiding a hostname resolution. Note that care was taken to preserve backwards compatibility with previous versions that expect the remote connection info to serialize a transport address instead of a string representing the hostname. * [Painless] Add boxed type to boxed type casts for method/return (#36571) This adds implicit boxed type to boxed types casts for non-def types to create asymmetric casting relative to the def type when calling methods or returning values. This means that a user calling a method taking an Integer can call it with a Byte, Short, etc. legally which matches the way def works. This creates consistency in the casting model that did not previously exist. * SNAPSHOTS: Adjust BwC Versions in Restore Logic (#36718) * Re-enables bwc tests with adjusted version conditions now that #36397 enables concurrent snapshots in 6.6+ * ingest: fix on_failure with Drop processor (#36686) This commit allows a document to be dropped when a Drop processor is used in the on_failure fork of the processor chain. Fixes #36151 * Initialize startup `CcrRepositories` (#36730) Currently, the CcrRepositoryManger only listens for settings updates and installs new repositories. It does not install the repositories that are in the initial settings. This commit, modifies the manager to install the initial repositories. Additionally, it modifies the ccr integration test to configure the remote leader node at startup, instead of using a settings update. * [TEST] fix float comparison in RandomObjects#getExpectedParsedValue This commit fixes a test bug introduced with #36597. This caused some test failure as stored field values comparisons would not work when CBOR xcontent type was used. Closes #29080 * [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320) This commit exposes lucene's LatLonShape field as the default type in GeoShapeFieldMapper. To use the new indexing approach, simply set "type" : "geo_shape" in the mappings without setting any of the strategy, precision, tree_levels, or distance_error_pct parameters. Note the following when using the new indexing approach: * geo_shape query does not support querying by MULTIPOINT. * LINESTRING and MULTILINESTRING queries do not yet support WITHIN relation. * CONTAINS relation is not yet supported. The tree, precision, tree_levels, distance_error_pct, and points_only parameters are deprecated. * TESTS:Debug Log. IndexStatsIT#testFilterCacheStats * ingest: support default pipelines + bulk upserts (#36618) This commit adds support to enable bulk upserts to use an index's default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert are all supported. However, bulk script_as_upsert has slightly surprising behavior since the pipeline is executed _before_ the script is evaluated. This means that the pipeline only has access the data found in the upsert field of the script_as_upsert. The non-bulk script_as_upsert (existing behavior) runs the pipeline _after_ the script is executed. This commit does _not_ attempt to consolidate the bulk and non-bulk behavior for script_as_upsert. This commit also adds additional testing for the non-bulk behavior, which remains unchanged with this commit. fixes #36219 * Fix duplicate phrase in shrink/split error message (#36734) This commit removes a duplicate "must be a" from the shrink/split error messages. * Deprecate types in get_source and exist_source (#36426) This change adds a new untyped endpoint `{index}/_source/{id}` for both the GET and the HEAD methods to get the source of a document or check for its existance. It also adds deprecation warnings to RestGetSourceAction that emit a warning when the old deprecated "type" parameter is still used. Also updating documentation and tests where appropriate. Relates to #35190 * Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)" This reverts commit `5bc7822562`. * Enhance Invalidate Token API (#35388) This change: - Adds functionality to invalidate all (refresh+access) tokens for all users of a realm - Adds functionality to invalidate all (refresh+access)tokens for a user in all realms - Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm - Changes the response format for the invalidate token API to contain information about the number of the invalidated tokens and possible errors that were encountered. - Updates the API Documentation After back-porting to 6.x, the `created` field will be removed from master as a field in the response Resolves: #35115 Relates: #34556 * Add raw sort values to SearchSortValues transport serialization (#36617) In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat` provided. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one. This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST. * Watcher: Ensure all internal search requests count hits (#36697) In previous commits only the stored toXContent version of a search request was using the old format. However an executed search request was already disabling hit counts. In 7.0 hit counts will stay enabled by default to allow for proper migration. Closes #36177 * [TEST] Ensure shard follow tasks have really stopped. Relates to #36696 * Ensure MapperService#getAllMetaFields elements order is deterministic (#36739) MapperService#getAllMetaFields returns an array, which is created out of an `ObjectHashSet`. Such set does not guarantee deterministic hash ordering. The array returned by its toArray may be sorted differently at each run. This caused some repeatability issues in our tests (see #29080) as we pick random fields from the array of possible metadata fields, but that won't be repeatable if the input array is sorted differently at every run. Once setting the tests seed, hppc picks that up and the sorting is deterministic, but failures don't repeat with the seed that gets printed out originally (as a seed was not originally set). See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173. With this commit, we simply create a static sorted array that is used for `getAllMetaFields`. The change is in production code but really affects only testing as the only production usage of this method was to iterate through all values when parsing fields in the high-level REST client code. Anyways, this seems like a good change as returning an array would imply that it's deterministically sorted. * Expose Sequence Number based Optimistic Concurrency Control in the rest layer (#36721) Relates #36148 Relates #10708 * [ML] Mute MlDistributedFailureIT	2018-12-18 17:45:31 +00:00
David Roberts	9e8cfbb40d	[ML] Deprecate X-Pack centric ML endpoints (#36315 ) This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs. Relates #35958	2018-12-07 20:34:11 +00:00
Lisa Cawley	c24be278e4	[DOCS] Refreshes population job examples (#36101 )	2018-11-30 08:55:29 -08:00
Ed Savage	13e11966ca	[HLRC][ML] Add delete expired data API (#35906 ) Relates to #29827	2018-11-26 16:15:54 +00:00
David Roberts	6f46584380	[ML] Add docs for ML info endpoint (#35783 ) This endpoint was not previously documented as it was not particularly useful to end users. However, since the HLRC will support the endpoint we need some documentation to link to. The purpose of the endpoint is to provide defaults and limits used by ML. These are needed to fully understand configurations that have missing values because the missing value means the default should be used. Relates #35777	2018-11-22 16:23:31 +00:00
Benjamin Trent	bc7dea4480	ML: changing automatic check_window calculation (#35643 ) * ML: changing automatic check_window calculation * adding docs on how we calculate the default	2018-11-19 08:03:34 -06:00
Benjamin Trent	f7ada9b29b	Add delayed datacheck to the datafeed job runner (#35387 ) * ML: Adding missing datacheck to datafeedjob * Adding client side and docs * Making adjustments to validations * Making values default to on, having more sensible limits * Intermittent commit, still need to figure out interval * Adjusting delayed data check interval * updating docs * Making parameter Boolean, so it is nullable * bumping bwc to 7 before backport * changing to version current * moving delayed data check config its own object * Separation of duties for delayed data detection * fixing checkstyles * fixing checkstyles * Adjusting default behavior so that null windows are allowed * Mentioning the default value * Fixing comments, syncing up validations	2018-11-15 13:32:45 -06:00
Lisa Cawley	9aeaceac4b	[DOCS] Clarify results_index_name description (#35463 )	2018-11-12 13:08:57 -08:00
David Roberts	c455be7bc2	[ML] Rename the json file structure to ndjson (#34901 ) The file structure finder endpoint can find the NDJSON (newline-delimited JSON) file format, but called it `json`. This change renames the `format` for this file structure to `ndjson`, which is more precise and will hopefully avoid confusion.	2018-10-29 10:06:12 +01:00
Julie Tibshirani	f854330e06	Make sure to use the type _doc in the REST documentation. (#34662 ) * Replace custom type names with _doc in REST examples. * Avoid using two mapping types in the percolator docs. * Rename doc -> _doc in the main repository README. * Also replace some custom type names in the HLRC docs.	2018-10-22 11:54:04 -07:00
Ryan Ernst	d445785f1a	Scripting: Convert domainSplit function for ML to whitelist (#34426 ) This commit moves the definition of domainSplit into java and exposes it as a painless whitelist extension. The method also no longer needs params, and version which ignores params is added and deprecated.	2018-10-17 15:54:21 -07:00
David Roberts	21c759af0e	[ML] Add an ingest pipeline definition to structure finder (#34350 ) The ingest pipeline that is produced is very simple. It contains a grok processor if the format is semi-structured text, a date processor if the format contains a timestamp, and a remove processor if required to remove the interim timestamp field parsed out of semi-structured text. Eventually the UI should offer the option to customize the pipeline with additional processors to perform other data preparation steps before ingesting data to an index.	2018-10-12 07:56:35 +01:00
Dimitris Athanasiou	4dacfa95d2	[ML] Allow asynchronous job deletion (#34058 ) This changes the delete job API by adding the choice to delete a job asynchronously. The commit adds a `wait_for_completion` parameter to the delete job request. When set to `false`, the action returns immediately and the response contains the task id. This also changes the handling of subsequent delete requests for a job that is already being deleted. It now uses the task framework to check if the job is being deleted instead of the cluster state. This is a beneficial for it is going to also be working once the job configs are moved out of the cluster state and into an index. Also, force delete requests that are waiting for the job to be deleted will not proceed with the deletion if the first task fails. This will prevent overloading the cluster. Instead, the failure is communicated better via notifications so that the user may retry. Finally, this makes the `deleting` property of the job visible (also it was renamed from `deleted`). This allows a client to render a deleting job differently. Closes #32836	2018-10-05 02:41:28 +03:00
Ed Savage	577261ee57	[ML] Label anomalies with multi_bucket_impact (#34233 ) * [ML] Label anomalies with multi_bucket_impact Add the multi_bucket_impact field to record results.	2018-10-04 09:08:21 +01:00
Kazuhiro Sera	d45fe43a68	Fix a variety of typos and misspelled words (#32792 )	2018-10-03 18:11:38 +01:00
David Roberts	a1d2ded98d	[ML] Fix unit test deadlock problem (#34174 ) This change fixes a potential deadlock problem in the unit test introduced in #34117. It also removes a piece of debug code and corrects a docs formatting problem that were both added in that same PR.	2018-10-01 15:35:37 +01:00
lcawl	57052f617a	[DOCS] Fixes callout in ML API	2018-09-28 10:06:02 -07:00
David Roberts	f709c2f694	[ML] Add a timeout option to file structure finder (#34117 ) This can be used to restrict the amount of CPU a single structure finder request can use. The timeout is not implemented precisely, so requests may run for slightly longer than the timeout before aborting. The default is 25 seconds, which is a little below Kibana's default timeout of 30 seconds for calls to Elasticsearch APIs.	2018-09-28 17:32:35 +01:00
David Roberts	dfe5af0411	[ML] Return both Joda and Java formats from structure finder (#33900 ) Previously the timestamp_formats field in the response from the find_file_structure endpoint contained Joda timestamp formats. This change makes that clear by renaming the field to joda_timestamp_formats, and also adds a java_timestamp_formats field containing the equivalent Java time format strings.	2018-09-25 12:52:51 +01:00
David Roberts	b89551c452	[ML] Display integers without .0 in file structure field stats (#33947 ) Previously numeric values in the field_stats created by the find_file_structure endpoint were always output with a decimal point. This looked unfriendly and unnatural for fields that clearly store integer values. This change converts integer values to type Integer before output in the file structure field stats.	2018-09-22 15:48:59 +01:00

1 2

54 Commits