OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-06 04:58:50 +00:00

Author	SHA1	Message	Date
Colin Goodheart-Smithe	0b42eda0e3	Merge branch 'master' into index-lifecycle	2018-10-15 16:03:37 +01:00
David Roberts	21c759af0e	[ML] Add an ingest pipeline definition to structure finder (#34350 ) The ingest pipeline that is produced is very simple. It contains a grok processor if the format is semi-structured text, a date processor if the format contains a timestamp, and a remove processor if required to remove the interim timestamp field parsed out of semi-structured text. Eventually the UI should offer the option to customize the pipeline with additional processors to perform other data preparation steps before ingesting data to an index.	2018-10-12 07:56:35 +01:00
David Turner	7352f0da60	Handle pre-6.x time fields (#34373 ) In ccb9ab5717d85c6786b081f197d67ac7dde4d317 we changed how we deal with time fields to support the `DateTime`-format fields added in 6.0, but dropped support for pre-6.x `Long`-format fields. This change reinstates this support for cases where pre-6.x data is made available to ML (e.g. in a mixed-version CCS setup or after an upgrade).	2018-10-11 15:33:09 +01:00
Dimitris Athanasiou	4dacfa95d2	[ML] Allow asynchronous job deletion (#34058 ) This changes the delete job API by adding the choice to delete a job asynchronously. The commit adds a `wait_for_completion` parameter to the delete job request. When set to `false`, the action returns immediately and the response contains the task id. This also changes the handling of subsequent delete requests for a job that is already being deleted. It now uses the task framework to check if the job is being deleted instead of the cluster state. This is beneficial because it will also work once the job configs are moved out of the cluster state and into an index. Also, force delete requests that are waiting for the job to be deleted will not proceed with the deletion if the first task fails. This will prevent overloading the cluster. Instead, the failure is communicated better via notifications so that the user may retry. Finally, this makes the `deleting` property of the job visible (also it was renamed from `deleted`). This allows a client to render a deleting job differently. Closes #32836	2018-10-05 02:41:28 +03:00
David Kyle	ef5007b6d8	[ML] Remove unused last_data_time member from Job (#34262 )	2018-10-04 13:16:14 +01:00
Kazuhiro Sera	d45fe43a68	Fix a variety of typos and misspelled words (#32792 )	2018-10-03 18:11:38 +01:00
Lee Hinman	2d9cb21490	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-10-01 14:10:09 -06:00
Benjamin Trent	96be057195	Removing unused ML parameters (#34159 )	2018-10-01 08:09:46 -07:00
David Roberts	a1d2ded98d	[ML] Fix unit test deadlock problem (#34174 ) This change fixes a potential deadlock problem in the unit test introduced in #34117. It also removes a piece of debug code and corrects a docs formatting problem that were both added in that same PR.	2018-10-01 15:35:37 +01:00
Lee Hinman	6ea396a476	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-28 15:40:12 -06:00
David Roberts	f709c2f694	[ML] Add a timeout option to file structure finder (#34117 ) This can be used to restrict the amount of CPU a single structure finder request can use. The timeout is not implemented precisely, so requests may run for slightly longer than the timeout before aborting. The default is 25 seconds, which is a little below Kibana's default timeout of 30 seconds for calls to Elasticsearch APIs.	2018-09-28 17:32:35 +01:00
Lee Hinman	a26cc1a242	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-27 11:00:37 -06:00
Christoph Büscher	ba3ceeaccf	Clean up "unused variable" warnings (#31876 ) This change cleans up "unused variable" warnings. There are several cases where we most likely want to suppress the warnings (especially in the client documentation test where the snippets contain many unused variables). In a lot of cases the unused variables can just be deleted though.	2018-09-26 14:09:32 +02:00
Ed Savage	cc70352b3f	[ML] Modify thresholds for normalization triggers (#33663 ) The (arbitrary) threshold factors used to judge if scores have changed significantly enough to trigger a look-back renormalization have been changed to values that reduce the frequency of such renormalizations. Added a clause to treat changes in scores as a 'big change' if it would result in a change of severity reported in the UI. Also altered the clause affecting small scores so that a change should be considered big if scores have changed by at least 1.5. Relates https://github.com/elastic/machine-learning-qa/issues/263	2018-09-25 15:30:10 +01:00
David Roberts	dfe5af0411	[ML] Return both Joda and Java formats from structure finder (#33900 ) Previously the timestamp_formats field in the response from the find_file_structure endpoint contained Joda timestamp formats. This change makes that clear by renaming the field to joda_timestamp_formats, and also adds a java_timestamp_formats field containing the equivalent Java time format strings.	2018-09-25 12:52:51 +01:00
Benjamin Trent	74d7be805a	Make certain ML node settings dynamic (#33565 ) (#33961 ) * Make certain ML node settings dynamic (#33565) * Changing to pull in updating settings and pass to constructor * adding note about only newly opened jobs getting updated value	2018-09-24 12:54:32 -07:00
Lee Hinman	243e863f6e	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-24 10:33:51 -06:00
David Roberts	b89551c452	[ML] Display integers without .0 in file structure field stats (#33947 ) Previously numeric values in the field_stats created by the find_file_structure endpoint were always output with a decimal point. This looked unfriendly and unnatural for fields that clearly store integer values. This change converts integer values to type Integer before output in the file structure field stats.	2018-09-22 15:48:59 +01:00
Benjamin Trent	e17bd8e913	Removing poor randomization for node name (#33918 )	2018-09-21 04:49:20 -07:00
Christoph Büscher	b654d986d7	Add OneStatementPerLineCheck to Checkstyle rules (#33682 ) This change adds the OneStatementPerLineCheck to our checkstyle precommit checks. This rule restricts the number of statements per line to one. The reasoning behind this is that it is very difficult to read multiple statements on one line. People seem to mostly use it in short lambdas and switch statements in our code base, but just going through the changes already uncovered some actual problems in randomization in test code, so I think it's worth it.	2018-09-21 11:52:31 +02:00
Dimitris Athanasiou	8e3a0fad9d	[ML] Refactor job deletion logic into the transport action (#33891 ) The job deletion logic was scattered around a few places: the transport action, the job manager and the deletion task. Overloading the task with deletion logic also meant extra dependencies in the core package which should be unnecessary. This commit consolidates all this logic into the transport action and replaces the deletion task with a plain one that need not be aware of deletion logic.	2018-09-20 15:48:42 +01:00
Benjamin Trent	4767a016a5	Adding node_count to ML Usage (#33850 ) (#33863 )	2018-09-19 13:35:09 -07:00
Lee Hinman	81e9150c7a	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-19 09:43:26 -06:00
Alan Woodward	5107949402	Allow TokenFilterFactories to rewrite themselves against their preceding chain (#33702 ) We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they appear before them in a chain, because they produce multiple tokens at the same position. This commit adds two methods to the TokenFilterFactory interface. * `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and `CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`. * `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym list `Analyzer`. By default it returns `true`. Fixes #33609	2018-09-19 15:52:14 +01:00
Benjamin Trent	4190a9f1e9	Delete custom index if the only contained job is deleted (#33788 ) * Delete custom index if the only contained job is deleted	2018-09-19 07:42:26 -07:00
Lee Hinman	e6cbaa5a78	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-14 16:27:37 -06:00
David Roberts	568ac10ca6	[ML] Allow overrides for some file structure detection decisions (#33630 ) This change modifies the file structure detection functionality such that some of the decisions can be overridden with user supplied values. The fields that can be overridden are: - charset - format - has_header_row - column_names - delimiter - quote - should_trim_fields - grok_pattern - timestamp_field - timestamp_format If an override makes finding the file structure impossible then the endpoint will return an exception.	2018-09-14 09:29:11 +01:00
Benjamin Trent	7e51b960fb	Adding index refresh (#33647 )	2018-09-13 10:44:33 -07:00
Colin Goodheart-Smithe	8e59de3eb2	Merge branch 'master' into index-lifecycle	2018-09-13 09:46:14 +01:00
Jay Modi	20c6c9c542	Address license state update/read thread safety (#33396 ) This change addresses some issues regarding thread safety around updates and method calls on the XPackLicenseState object. There exists a possibility that there could be a concurrent update to the XPackLicenseState when there is a scheduled check to see if the license is expired and a cluster state update. In order to address this, the update method now has a synchronized block where member variables are updated. Each method that reads these variables is now also synchronized. Along with the above change, there was a consistency issue around security calls to the license state. The majority of security checks make two calls to the license state, which could result in incorrect behavior due to the checks being made against different license states. The majority of this behavior was introduced for 6.3 with the inclusion of x-pack in the default distribution. In order to resolve the majority of these cases, the `isSecurityEnabled` method is no longer public and the logic is also included in individual methods about security such as `isAuthAllowed`. There were a few cases where this did not remove multiple calls on the license state, so a new method has been added which creates a copy of the current license state that will not change. Callers can use this copy of the license state to make decisions based on a consistent view of the license state.	2018-09-12 13:08:09 -06:00
David Roberts	8e05ce567f	[ML] Rename input_fields to column_names in file structure (#33568 ) This change tightens up the meaning of the "input_fields" field in the file structure finder output. Previously it was permitted but not calculated for JSON and XML files. Following this change the field is called "column_names" and is only permitted for delimited files. Additionally the way the column names are set for headerless delimited files is refactored to encapsulate the way they're named to one line of the code rather than having the same logic in two places.	2018-09-11 08:46:26 +01:00
Colin Goodheart-Smithe	cdc4f57a77	Merge branch 'master' into index-lifecycle	2018-09-10 21:30:44 +01:00
Dimitris Athanasiou	fcb15b0ce3	[ML] Get job stats request should filter non-ML job tasks (#33516 ) When requesting job stats for `_all`, all ES tasks are accepted, resulting in loads of cluster traffic and a memory overhead. This commit correctly filters out non-ML job tasks. Closes #33515	2018-09-09 22:53:03 +01:00
Nhat Nguyen	94e4cb64c2	Bootstrap a new history_uuid when force allocating a stale primary (#33432 ) This commit ensures that we bootstrap a new history_uuid when force allocating a stale primary. A stale primary should never be the source of an operation-based recovery to another shard which exists before the forced-allocation. Closes #26712	2018-09-08 19:29:31 -04:00
David Roberts	e42cc5cd8c	[ML] Add a file structure determination endpoint (#33471 ) This endpoint accepts an arbitrary file in the request body and attempts to determine the structure. If successful it also proposes mappings that could be used when indexing the file's contents, and calculates simple statistics for each of the fields that are useful in the data preparation step prior to configuring machine learning jobs.	2018-09-07 17:41:57 +01:00
Colin Goodheart-Smithe	017ffe5d12	Merge branch 'master' into index-lifecycle	2018-09-07 10:59:10 +01:00
Jim Ferenczi	79cd6385fe	Collapse package structure for metrics aggs (#33463 ) This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`. It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package. Relates #22868	2018-09-07 10:58:06 +02:00
David Roberts	0849b98f60	[ML] Rename log structure to file structure (#33421 ) Many files supplied to the upcoming ML data preparation functionality will not be "log" files. For example, CSV files are generally not "log" files. Therefore it makes sense to rename the library that determines the structure of these files. Although "file structure" could be considered too broad, as the library currently only works with a few text formats, in the future it may be extended to work with more formats.	2018-09-06 09:13:08 +01:00
Tal Levy	b5f7fb6882	Merge branch 'master' into index-lifecycle	2018-09-05 12:56:58 -07:00
David Roberts	a296829205	[ML] Add field stats to log structure finder (#33351 ) The log structure endpoint will return these in addition to pure structure information so that it can be used to drive pre-import data visualizer functionality. The statistics for every field are count, cardinality (distinct count) and top hits (most common values). Extra statistics are calculated if the field is numeric: min, max, mean and median.	2018-09-05 12:57:20 +01:00
Colin Goodheart-Smithe	f00a28a909	Merge branch 'master' into index-lifecycle	2018-09-05 09:48:48 +01:00
Nik Everett	ebd5eb6dc2	ML: Fix build after HLRC change I recently merged a HLRC change that passed the PR builds but didn't compile after merging. Sad time. This fixes the compilation.	2018-09-04 11:10:44 -04:00
Sohaib Iftikhar	761e8c461f	HLRC: Add delete by query API (#32782 ) Adds the delete-by-query API to the High Level REST Client.	2018-09-04 08:56:26 -04:00
Dimitris Athanasiou	1457b07a06	[ML] The sort field on get records should default to the record_score (#33358 ) This is not changing the behaviour as when the sort field was set to `influencer_score` the secondary sort would be used and that was using the `record_score` at the highest priority.	2018-09-04 11:38:24 +01:00
David Roberts	84eaac79d7	[ML] Minor improvements to categorization Grok pattern creation (#33353 ) 1. The TOMCAT_DATESTAMP format needs to be checked before TIMESTAMP_ISO8601, otherwise TIMESTAMP_ISO8601 will match the start of the Tomcat datestamp. 2. Exclude more characters before and after numbers. For example, in 1.2.3 we don't want to match 1.2 as a float.	2018-09-04 09:43:49 +01:00
Alpar Torok	7f7e8fd733	Disable assemble task instead of removing it (#33348 )	2018-09-04 07:32:14 +03:00
Benjamin Trent	767d8e0801	[ML] Delete forecast API (#31134 ) (#33218 ) * Delete forecast API (#31134)	2018-09-03 19:06:18 -05:00
Colin Goodheart-Smithe	e2c1beb1be	Merge branch 'master' into index-lifecycle	2018-09-03 10:01:16 +01:00
Nhat Nguyen	b93507608a	Merge branch 'master' into ccr * master: Mute test watcher usage stats output [Rollup] Fix FullClusterRestart test Adjust soft-deletes version after backport into 6.5 completely drop `index.shard.check_on_startup: fix` for 7.0 (#33194) Fix AwaitsFix issue number Mute SmokeTestWatcherWithSecurityIT testsi drop `index.shard.check_on_startup: fix` (#32279) tracked at [DOCS] Moves ml folder from x-pack/docs to docs (#33248) [DOCS] Move rollup APIs to docs (#31450) [DOCS] Rename X-Pack Commands section (#33005) TEST: Disable soft-deletes in ParentChildTestCase Fixes SecurityIntegTestCase so it always adds at least one alias (#33296) Fix pom for build-tools (#33300) Lazy evaluate java9home (#33301) SQL: test coverage for JdbcResultSet (#32813) Work around to be able to generate eclipse projects (#33295) Highlight that index_phrases only works if no slop is used (#33303) Different handling for security specific errors in the CLI. Fix for https://github.com/elastic/elasticsearch/issues/33230 (#33255) [ML] Refactor delimited file structure detection (#33233) SQL: Support multi-index format as table identifier (#33278) MINOR: Remove Dead Code from PathTrie (#33280) Enable forbiddenapis server java9 (#33245)	2018-08-31 19:03:04 -04:00
Colin Goodheart-Smithe	3eef74d5d5	Merge branch 'master' into index-lifecycle	2018-08-31 14:45:22 +01:00

145 Commits