OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-19 19:35:02 +00:00

Author	SHA1	Message	Date
Julie Tibshirani	c7bfb5de41	Add search `fields` parameter to support high-level field retrieval. (#60258 ) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.	2020-07-28 10:58:20 -07:00
Nhat Nguyen	416e51980c	Relax ShardFollowTasksExecutor validation (#60054 ) If a primary shard of a follower index is being relocated, then we will fail to create a follow-task. This validation is too restricted. We should ensure that all primaries of the follower index are active instead. Closes #59625	2020-07-28 13:46:49 -04:00
Nhat Nguyen	6ece629ec3	Set timeout of master requests on follower to unbounded (#60070 ) Today, a follow task will fail if the master node of the follower cluster is temporarily overloaded and unable to process master node requests (such as update mapping, setting, or alias) from a follow-task within the default timeout. This error is transient, and follow-tasks should not abort. We can avoid this problem by setting the timeout of master node requests on the follower cluster to unbounded. Closes #56891	2020-07-28 13:46:49 -04:00
Zachary Tong	9f8ec3e3fb	Mute SSLDriverTests#testCloseDuringHandshakePreJDK11 Tracking issue: https://github.com/elastic/elasticsearch/issues/59992	2020-07-28 13:20:53 -04:00
markharwood	e0286e9bd3	Search - remove allow-expensive-query checks from wildcard field. (#60273 ) (#60308 ) Removing allow-expensive-query checks because we think this field type is fast enough. Closes #60139	2020-07-28 17:12:33 +01:00
Dimitris Athanasiou	ed7dcff7c4	[7.x][ML] Audit updates on data frame analytics jobs (#60126 ) (#60287 ) Closes #59652 Backport of #60126	2020-07-28 16:33:35 +03:00
Dimitris Athanasiou	16ffcfb9f6	[7.x][ML] Ensure bulk requests are not over memory limit (#60219 ) (#60283 ) Data frame analytics jobs that work with very large datasets may produce bulk requests that are over the memory limit for indexing. This commit adds a helper class that bundles index requests in bulk requests that steer away from the memory limit. We then use this class both from the results joiner and the inference runner ensuring data frame analytics jobs do not generate bulk requests that are too large. Note the limit was implemented in #58885. Backport of #60219	2020-07-28 16:04:03 +03:00
Dimitris Athanasiou	981e436d6c	[7.x][ML] Improve assertion on regression alias field test (#60221 ) (#60264 ) Previously the test was asserting the prediction on each document was within 10.0 of the expected value. It turned out that was not enough, as we occasionally saw the test fail by a small margin. Instead of relaxing that assertion, this commit changes it to assert the mean prediction error is less than 10.0. This should significantly reduce the chances of the test failing. Fixes #60212 Backport of #60221	2020-07-28 11:48:00 +03:00
Dan Hermann	b98caf58ee	Mark data stream APIs as stable (#59860 ) (#60206 )	2020-07-27 10:37:52 -05:00
Benjamin Trent	ea3c49979e	Test mute for issue 60212 (#60214 )	2020-07-27 10:10:40 -04:00
Hendrik Muhs	95c99ca887	[Transform] Fix Regression: continuous transform can fail for (date) histogram group_by(#60196 ) do not create change collector if group_by configuration does not support change detection fixes #60125	2020-07-27 14:50:03 +02:00
Dimitris Athanasiou	439b7f7e59	[7.x][ML] DFA result processor should only skip rows and model chunks on cancel (#60113 ) (#60193 ) When the job is force-closed or shutting down due to a fatal error we clean up all cancellable job operations. This includes cancelling the results processor. However, this means that we might not persist objects that are written from the process like stats, memory usage, etc. In hindsight, we do not gain from cancelling the results processor in its entirety. It makes more sense to skip row results and model chunks but keep stats and instrumentation about the job as the latter may contain useful information to understand what happened to the job. Backport of #60113	2020-07-27 13:42:46 +03:00
David Roberts	89466eefa5	Don't require separate privilege for internal detail of put pipeline (#60190 ) Putting an ingest pipeline used to require that the user calling it had permission to get nodes info as well as permission to manage ingest. This was due to an internal implementation detail that was not visible to the end user. This change alters the behaviour so that a user with the manage_pipeline cluster privilege can put an ingest pipeline regardless of whether they have the separate privilege to get nodes info. The internal implementation detail now runs as the internal _xpack user when security is enabled. Backport of #60106	2020-07-27 10:44:48 +01:00
Nhat Nguyen	bc65b3a590	Increase timeout in AutoFollowIT (#60004 ) It can take more than 10 seconds to auto-follow and create a follow-task on a slow CI. This commit increases timeout in AutoFollowIT by replacing assertBusy with assertLongBusy. Closes #59952	2020-07-23 16:36:53 -04:00
Nhat Nguyen	0fe4d5df67	Increase timeout testFollowIndexWithConcurrentMappingChanges Fixes #59273	2020-07-23 16:22:58 -04:00
Dimitris Athanasiou	6b9a362ec2	[7.x][ML] Skip test inference if DFA task has been stopped (#60116 ) (#60127 ) If the job is stopped before starting inference on test data, we should skip inference entirely. Backport of #60116	2020-07-23 18:34:09 +03:00
Dan Hermann	ca25f6ae6f	Include the resolve index action in the view_index_metadata privilege (#59785 ) (#60112 )	2020-07-23 08:13:56 -05:00
Dan Hermann	fe12217c7f	[7.x] Move REST specs for data streams (#60111 )	2020-07-23 08:10:54 -05:00
Armin Braun	ebb6677815	Formalize and Streamline Buffer Sizes used by Repositories (#59771 ) (#60051 ) Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient. By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well. We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`. This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs. This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.	2020-07-22 21:06:31 +02:00
Larry Gregory	a686ccc9b2	[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854 ) (#60047 )	2020-07-22 11:06:10 -04:00
Jay Modi	c8ef2e18f7	Thread safe clean up of LocalNodeModeListeners (#60007 ) This commit continues on the work in #59801 and makes other implementors of the LocalNodeMasterListener interface thread safe in that they will no longer allow the callbacks to run on different threads and possibly race each other. This also helps address other issues where these events could be queued to wait for execution while the service keeps moving forward thinking it is the master even when that is not the case. In order to accomplish this, the LocalNodeMasterListener no longer has the executorName() method to prevent future uses that could encounter this surprising behavior. Each use was inspected and if the class was also a ClusterStateListener, the implementation of LocalNodeMasterListener was removed in favor of a single listener that combined the logic. A single listener is used and there is currently no guarantee on execution order between ClusterStateListeners and LocalNodeMasterListeners, so a future change there could cause undesired consequences. For other classes, the implementations of the callbacks were inspected and if the operations were lightweight, the overridden executorName method was removed to use the default, which runs on the same thread. Backport of #59932	2020-07-22 08:02:18 -06:00
Dimitris Athanasiou	7e652ca873	[7.x][ML] Include same fields during test inference as in training (#… (#60034 ) In #58877, when we switched test inference on java, we just use the doc's `_source` as features. However, this could be missing out on features that were used during training, e.g. alias fields, etc. This commit addresses this by extracting fields to use as features during inference the same way they are extracted in `DataFrameDataExtractor` when they are used for training. Backport of #59963	2020-07-22 12:54:13 +03:00
David Roberts	7358f9fb05	[ML] Mute ForecastIT.testOverflowToDisk in EAR builds (#60040 ) Due to https://github.com/elastic/elasticsearch/issues/58806	2020-07-22 10:17:37 +01:00
James Baiera	1c1a4297e0	Track backing indices in data streams stats from cluster state (#59817 ) (#60015 ) If shard level results are incomplete in the data streams stats call, it is possible to get inaccurate counts of the number of backing indices, despite this data being accurate and available in the cluster state.	2020-07-21 23:21:33 -04:00
James Baiera	b3363cf8f9	[7.x] Remove unneeded rest params from Data Stream Stats (#59575 ) (#59661 ) This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data Streams Stats REST endpoint. These options are required for broadcast requests, but are not needed for anything in terms of resolving data streams. Instead, we just set a default set of IndicesOptions on the transport request.	2020-07-21 15:59:16 -04:00
Armin Braun	5613e4b00b	Increase Timeout in testSLMRetentionAfterRestore (#59979 ) (#59991 ) This test failed by hitting the 10s default busy assert timeout. Given how involved the retention run is (multiple disk reads, CS updates etc.) we should have a higher timeout here. Also, removed the pointless delete call for the snapshot that we just asserted is gone, at the end of the test. Closes #59956	2020-07-21 18:19:18 +02:00
Nik Everett	6f6076e208	Drop some params from IndexFieldData.Builder (backport of #59934 ) (#59972 ) We never used the `IndexSettings` parameter and we only used the `MappedFieldType` parameter to get the name of the field which we already know everywhere where we build the `IFD.Builder`. This allows us to drop a fair bit of ceremony from a couple of tests.	2020-07-21 10:28:59 -04:00
Przemysław Witek	283a1f605c	Rename binary_soft_classification evaluation to outlier_detection (#59951 ) (#59970 )	2020-07-21 15:15:04 +02:00
Yannick Welsch	07784a0b16	CCR recoveries using wrong setting for chunk sizes (#59597 ) The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.	2020-07-21 13:56:06 +02:00
Tal Levy	c9ac4bf7c8	Reduce memory usage of GeoGridTiler tests (#59921 ) This PR further reduces the memory footprint of the testGeoHashGridCircuitBreaker test such that only 0.26% of the randomized runs result in memory usage of between 500kb and 1mb, and most of the runs in that range produce ~650kb of usage. Before, 3% of the runs would use > 50mb of memory, resulting in OOMs in CI. Closes #59853.	2020-07-20 15:45:39 -07:00
Jay Modi	515b53d297	Fix race in SLM master/cluster state listeners (#59896 ) This change fixes two possible race conditions in SLM related to how local master changes and cluster state events are observed. When implementing the `LocalNodeMasterListener` interface, it is only recommended to execute on a separate threadpool if the operations are heavy and would block the cluster state thread. SLM specified that the listeners should run in the Snapshot thread pool, but the operations in the listener were lightweight. This had the side effect of causing master changes to be delayed if the Snapshot threads were all busy and could also potentially cause the `onMaster` and `offMaster` calls to race if both were queued and then executed concurrently. Additionally, the `SnapshotLifecycleService` is also a `ClusterStateListener` and there is currently no order of operations guarantee between `LocalNodeMasterListeners` and `ClusterStateListeners` so this could lead to incorrect behavior. The resolution for these two issues is that the SnapshotRetentionService now specifies the `SAME` executor for its implementation of the `LocalNodeMasterListener` interface. The `SnapshotLifecycleService` is no longer a `LocalNodeMasterListener` and instead tracks local master changes in its `ClusterStateListener`. Backport of #59801	2020-07-20 09:59:46 -06:00
Nik Everett	fcd8b5fe6e	Fix top_metrics when metric is missing (backport of #59471 ) (#59881 ) This fixes a null pointer exception when the metric is missing for the latest document returned by `top_metrics`. Closes #58926	2020-07-20 10:42:58 -04:00
Albert Zaharovits	3ffb20bdfc	Fix DLS/FLS permission for the submit async search action (#59693 ) The submit async search action should not populate the thread context DLS/FLS permission set, because it is not currently authorised as an "indices request" and hence the permission set that it builds is incomplete and it overrides the DLS/FLS permission set of the actual spawned search request (which is built correctly).	2020-07-20 09:37:26 +03:00
Costin Leau	9cc80621c3	EQL: Fix matching of tail/desc queries (#59827 ) When dealing with tail queries, data is returned descending for the base criterion yet the rest of the queries are ascending. This caused a problem during insertion since, while the data within a page is ASC, the blocks of data between pages are DESC. This caused incorrect sorting inside a SequenceGroup, which led to incorrect results. Furthermore, in case of limit, since the data in a page is ASC, neither early return nor desc matching is possible. Thus the page needs to be consumed first before finding the final results. A future improvement could be to keep only the top N results, dropping the rest during insertion time. (cherry picked from commit 77c88da054a1ce662a264f72cde5986d4ce37e3a)	2020-07-19 00:49:16 +03:00
Lee Hinman	8c7d414a3b	[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806 ) (#59810 ) Backports the following commits to 7.x: Fix retrieving data stream stats for a DS with multiple backing indices (#59806)	2020-07-17 16:56:07 -06:00
Nik Everett	95e6e4a452	Small cleanup for IndexFieldData (#59724 ) (#59800 ) This drops `IndexComponent` from `IndexFieldData` because it wasn't doing anything other than forcing us to perform a bunch of ceremony to build them.	2020-07-17 13:38:15 -04:00
Tal Levy	c9ab7bb651	Fix bug in circuit-breaker check for geoshape grid aggregations (#57962 ) (#59741 ) There was a bug in the geoshape circuit-breaker check where the hash values array was being allocated before its new size was accounted for by the circuit breaker. Fixes #57847.	2020-07-17 09:26:00 -07:00
Benjamin Trent	b7f30fc929	[7.x] Adding new `require_alias` option to indexing requests (#58917 ) (#59769 ) * Adding new `require_alias` option to indexing requests (#58917) This commit adds the `require_alias` flag to requests that create new documents. This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias. When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings. This is useful when an alias is required instead of a concrete index. closes https://github.com/elastic/elasticsearch/issues/55267	2020-07-17 10:24:58 -04:00
Alan Woodward	b29d368b52	Convert DateFieldMapper to parametrized format (#59429 ) (#59759 ) This commit makes DateFieldMapper extend ParametrizedFieldMapper, declaring its parameters explicitly. As well as changes to DateFieldMapper itself, there are some changes to dynamic mapping code to ensure that dynamically detected date formats are passed through to new date mapper builders.	2020-07-17 12:46:18 +01:00
Andrei Dan	301d61a98e	Tests: fix TimeSeriesDataStreamsIT.testShrinkActionInPolicyWithoutHotPhase (#59603 ) (#59689 ) The ILM policy for the source and shrunk indices run separately (ie. they are two separate managed indices). This fixes the test which exhibited some flakiness by allowing some time for the ILM policy for the source index to finish executing. (cherry picked from commit c78d5e8499fc5ca2ca1314f97bcc6b55ba06e2e7) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-17 11:26:06 +01:00
Andrei Stefan	d513e1090f	Do not create the index, if it's already there (#59745 ) (#59747 ) (cherry picked from commit d097447d257efdf0a36b1157e1f177aed86ecca1)	2020-07-17 11:38:30 +03:00
Tanguy Leroux	4827fec1cf	Revert "Mute AzureSearchableSnapshotsIT (#58775 )" (#59749 ) This reverts commit 74a78b3a7b87448c5add84466b6a9cf61e0fe29a.	2020-07-17 10:02:46 +02:00
Martijn van Groningen	0096238df1	Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727 ) Backport #59503 to 7.x and adjusted exception messages. Relates to #59076	2020-07-16 22:29:40 +02:00
Igor Motov	2408803fad	Adds hard_bounds to histogram aggregations (#59175 ) (#59656 ) Adds a hard_bounds parameter to explicitly limit the buckets that a histogram can generate. This is especially useful in case of open ended ranges that can produce a very large number of buckets.	2020-07-16 15:31:53 -04:00
Marios Trivyzas	c7efbc1b83	SQL: Implement DATE_PARSE function for parsing strings into DATE values (#57391 ) (#59699 ) Implement DATE_PARSE(<date_str>, <pattern_str>) function which parses a date string according to the specified pattern into a date object. The patterns allowed are those of java.time.format.DateTimeFormatter. Closes #54962 Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com> Co-authored-by: Patrick Jiang(白泽) <dreamlike.sky@foxmail.com> (cherry picked from commit 647a413d9b21bd3938f1716bb19f8407e1334125)	2020-07-16 17:24:30 +02:00
Benjamin Trent	a28547c4b4	[7.x] [ML] add new `custom` field to trained model processors (#59542 ) (#59700 ) * [ML] add new `custom` field to trained model processors (#59542) This commit adds the new configurable field `custom`. `custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job. Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature). This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors in the analytics job configuration, we need to know the input and output field names.	2020-07-16 10:57:38 -04:00
Nik Everett	343053c0a7	Fix compilation in Eclipse (backport #59675 ) Eclipse was confused by #59583. It can't see the public inner interface within the superclass. This time. Usually that is fine, but the Eclipse gods don't like this particular code, I guess.	2020-07-16 08:25:12 -04:00
David Kyle	c349fdcb89	Mute RegressionIT testWithDataStream (#59687 ) For #59664	2020-07-16 09:45:29 +01:00
Przemysław Witek	df4fea79cb	Add a "verbose" option to the data frame analytics stats endpoint (#59589 ) (#59621 )	2020-07-16 09:51:31 +02:00
Nhat Nguyen	b599f7a9c0	Fix estimate size of translog operations (#59206 ) Make sure that the estimateSize method includes all fields of translog operations.	2020-07-16 00:19:30 -04:00
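
Note on commit 16ffcfb9f6 (bulk requests kept under the memory limit): the message describes a helper that bundles index requests into bulk requests that steer away from the indexing memory limit. The general idea can be sketched roughly as below. This is not the helper class from that commit: the function name `bundle_by_size`, the use of raw `bytes` as stand-ins for index requests, and the rule for oversized single requests are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def bundle_by_size(requests: Iterable[bytes], limit_bytes: int) -> Iterator[List[bytes]]:
    """Group serialized requests into bulks whose total size stays at or below limit_bytes.

    Hypothetical illustration only; the real helper operates on index requests,
    not on raw byte strings.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")

    bulk: List[bytes] = []
    bulk_size = 0
    for request in requests:
        size = len(request)
        # Flush the current bulk before adding a request that would push it over the limit.
        if bulk and bulk_size + size > limit_bytes:
            yield bulk
            bulk, bulk_size = [], 0
        # Assumption: a single request larger than the limit is still sent, alone in its own bulk.
        bulk.append(request)
        bulk_size += size

    # Emit whatever is left once the input is exhausted.
    if bulk:
        yield bulk


if __name__ == "__main__":
    docs = [b"a" * 40, b"b" * 40, b"c" * 40, b"d" * 200, b"e" * 10]
    for bulk in bundle_by_size(docs, limit_bytes=100):
        print(len(bulk), sum(len(r) for r in bulk))
```

Flushing before appending, rather than after, means a bulk never grows past the limit unless it consists of one oversized request, which is the only case this sketch cannot keep under the limit.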


5555 Commits