OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-19 19:35:02 +00:00

Author	SHA1	Message	Date
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
Russ Cam	7d34fa9b67	Update link to .NET BulkAllObservable	2020-07-01 19:54:30 +10:00
David Turner	3a234d2669	Account for remaining recovery in disk allocator (#58800 ) Today the disk-based shard allocator accounts for incoming shards by subtracting the estimated size of the incoming shard from the free space on the node. This is an overly conservative estimate if the incoming shard has almost finished its recovery since in that case it is already consuming most of the disk space it needs. This change adds to the shard stats a measure of how much larger each store is expected to grow, computed from the ongoing recovery, and uses this to account for the disk usage of incoming shards more accurately. Backport of #58029 to 7.x * Picky picky * Missing type	2020-07-01 10:12:44 +01:00
Nik Everett	40850a780d	Fail variable_width_histogram that collects from many (#58619 ) (#58780 ) Adds an explicit check to `variable_width_histogram` to stop it from trying to collect from many buckets because it can't. I tried to make it do so but that is more than an afternoon's project, sadly. So for now we just disallow it. Relates to #42035	2020-06-30 18:26:45 -04:00
James Rodewig	3aa08fbcde	[DOCS] Add data streams to API conventions (#58695 ) (#58785 ) Updates the existing API conventions docs to make them aware of data streams. Co-authored-by: debadair <debadair@elastic.co>	2020-06-30 17:33:35 -04:00
James Rodewig	046d9eeb41	[DOCS] Make `<target>` defs consistent	2020-06-30 15:55:37 -04:00
James Rodewig	416652cdd8	[DOCS] Clarify request formats for index API (#58768 ) (#58774 )	2020-06-30 15:53:03 -04:00
James Rodewig	e90c465640	[DOCS] Add data streams to cat APIs (#58699 ) (#58773 )	2020-06-30 15:52:48 -04:00
James Rodewig	3778ca8c25	[DOCS] Add data streams to count API (#58771 ) (#58772 )	2020-06-30 15:52:34 -04:00
James Rodewig	35830a7c12	[DOCS] Add data streams to get field mapping API docs (#58689 ) (#58759 ) Updates the existing get field mapping API docs to make them aware of data streams. Relates to #58488.	2020-06-30 13:24:41 -04:00
James Rodewig	874ab36b14	[DOCS] Fix error in stop SLM API docs (#58747 ) (#58750 )	2020-06-30 10:16:43 -04:00
James Rodewig	19190c529c	[DOCS] Reword admon for index API and data streams	2020-06-30 09:54:24 -04:00
James Rodewig	770f9f11af	[DOCS] Fix xref format in async EQL search docs	2020-06-30 09:37:47 -04:00
James Rodewig	e5d5b9f5e8	[DOCS] Suppress searchable snapshots in releases (#58740 ) (#58742 ) Fixes a searchable snapshot reference overlooked in #58652	2020-06-30 09:22:32 -04:00
James Rodewig	d8731853a3	[DOCS] EQL: Document `head` and `tail` pipes (#58673 ) (#58739 )	2020-06-30 09:12:54 -04:00
James Rodewig	d33764583c	[7.x] [DOCS] Document delete/update by query for data streams (#58679 ) (#58706 )	2020-06-30 08:35:13 -04:00
David Turner	ceff00997d	Suppress searchable snapshots docs in releases (#58652 ) This commit adds conditional logic to the docs to avoid including any docs on searchable snapshots in released versions. Rework of #58556 which was reverted.	2020-06-30 13:13:09 +01:00
Przemysław Witek	9ea9b7bd3b	[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684 ) (#58731 )	2020-06-30 14:09:11 +02:00
Yannick Welsch	b885cbff1a	Add index block api (#58716 ) Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have been completed after adding the write block. This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its documents.	2020-06-30 14:06:52 +02:00
James Rodewig	8341ebc061	[7.x] [DOCS] Reformats the update by query API. (#46199 ) (#58700 ) Co-authored-by: debadair <debadair@elastic.co>	2020-06-29 17:50:32 -04:00
Dan Hermann	84513c7539	Document the prohibition on freezing data stream write indices (#58058 ) (#58705 )	2020-06-29 16:34:36 -05:00
Adam Locke	719f2fb135	[DOCS] [7.x] Adding create index snapshot API docs (#58519 ) (#58692 ) * Adding create index snapshot API page. * Condense API description. * Remove parameter from query. * Add POST method and remove `-name` from the snapshot variable. * Expand description of `<snapshot>`. * Add data streams to introduction and expand the overall description. * Add support for data streams. * Add support for data streams. * Add data stream and reference for "point-in-time view". * Add data streams. * Change `my_backup` to `my_repository`. * Add description of boolean options for `wait_for_completion` parameter. * Change command --> response * Clarify `indices` parameter description * Update `ignore-unavailable` parameter description * Reword example description * Remove "index" from API name * Incorporating review comments from James R. * Adding a much better request + response * Clarify `include_global_state` description * Incorporating additional edits. * Changing my_backup to my_repository in example. * Update snippet test to avoid failures * Update TESTRESPONSE snippets * Remove errant space * Removing the parameter per reviewer comments	2020-06-29 16:13:53 -04:00
James Rodewig	735a3f344d	[DOCS] EQL: Remove fields from EQL search response (#58667 ) (#58669 )	2020-06-29 09:34:20 -04:00
István Zoltán Szabó	13aa8b8d9a	[DOCS] Updates results_field description in the inference processor docs (#58554 )	2020-06-29 13:15:15 +02:00
Przemysław Witek	3f7c45472e	[7.x] Introduce DataFrameAnalyticsConfig update API (#58302 ) (#58648 )	2020-06-29 10:56:11 +02:00
David Turner	8f82ec0b19	Revert "Suppress searchable snapshots docs in releases (#58556 )" This reverts commit f0c0ee691a0b0da458b99f7b33b7e6a099141556.	2020-06-29 09:21:58 +01:00
David Turner	f0c0ee691a	Suppress searchable snapshots docs in releases (#58556 ) This commit adds conditional logic to the docs to avoid including any docs on searchable snapshots in released versions.	2020-06-29 08:34:11 +01:00
Dimitris Athanasiou	1817b896c9	[7.x][ML] Add status and increased estimate to memory usage (#58588 ) (#58606 ) Adds parsing of `status` and `memory_reestimate_bytes` to data frame analytics `memory_usage`. When the training surpasses the model memory limit, the status will be set to `hard_limit` and `memory_reestimate_bytes` can be used to update the job's limit in order to restart the job. Backport of #58588	2020-06-28 16:27:26 +03:00
Costin Leau	3c81b91474	EQL: Add Head/Tail pipe support (#58536 ) Introduce pipe support, in particular head and tail (which can also be chained). (cherry picked from commit 4521ca3367147d4d6531cf0ab975d8d705f400ea) (cherry picked from commit d6731d659d012c96b19879d13cfc9e1eaf4745a4)	2020-06-27 09:49:14 +03:00
James Rodewig	69d8285a28	[DOCS] Add data streams to multi search API docs (#58610 ) (#58622 ) Makes the existing multi search API docs aware of data streams.	2020-06-26 17:32:56 -04:00
James Rodewig	c06c89d3db	[DOCS] Remove `composable index template` refs (#58567 ) (#58612 ) Replaces `composable index template` and `composable template` with `index template` throughout data stream-related docs. `Composable index template` is only used to contrast with legacy index templates.	2020-06-26 11:52:58 -04:00
James Rodewig	b37b318d0d	[DOCS] EQL: Remove references to partial async EQL results (#58548 ) (#58609 ) Removes references to partial results from the async EQL search docs. If an EQL search does not complete during the `wait_for_completion_timeout` timeout period, it returns no results.	2020-06-26 11:11:55 -04:00
James Rodewig	28717d1e02	[DOCS] Fix analyzer page titles (#58362 ) (#58603 ) Changes the titles for analyzer pages to sentence case. Also changes the 'Pattern character filter' page title to sentence case.	2020-06-26 10:17:01 -04:00
James Rodewig	c613e0915a	[DOCS] EQL: Document search API's `tiebreaker_field` param (#57935 ) (#58540 )	2020-06-26 09:25:24 -04:00
James Rodewig	ab29162ab3	[DOCS] Fix tokenizer page titles (#58361 ) (#58598 ) Changes the titles for tokenizer pages to sentence case. Also moves the 'Path hierarchy tokenizer examples' page within the 'Path hierarchy tokenizer' page and adds a related redirect.	2020-06-26 09:24:41 -04:00
Przemyslaw Gomulka	5149554709	Update format.asciidoc to describe strict_date_optional_time_nanos (#57527 ) (#58581 ) closes #57019	2020-06-26 09:02:08 +02:00
Nik Everett	d22a242613	Docs: Mark variable_width_histogram experimental (#58574 ) We're tracking this aggregation's experimental-progress in #58573. We'd like a little time to be able to make backwards incompatible changes to the aggregation because we're not 100% sure about the request and response format yet.	2020-06-25 16:54:57 -04:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512 ) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false if they want to be explicit. This is also challenging in cases where a user wants to configure a coordinating-only node, which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Igor Motov	20af856abd	[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192 ) Adds async support to EQL searches Closes #49638 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-06-25 14:11:57 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
James Rodewig	c3f4034199	[DOCS] Note that DS timestamp field mapping changes require reindex (#58444 ) (#58517 ) With #58096, data streams now track the timestamp field mapping outside of the template associated with the stream. This means you can no longer update the timestamp field mapping using template changes. This updates the associated data stream docs.	2020-06-24 17:21:26 -04:00
markharwood	837f2643eb	Docs - Added field capabilities breaking change (#58509 )	2020-06-24 18:39:01 +01:00
Russ Cam	441bc14d21	[DOCS] Update aliases to indicate array (#58469 ) Updates the aliases documentation to correct the parameter to an array.	2020-06-24 09:41:23 -04:00
markharwood	d5ac3bb87f	Field capabilities - make `keyword` a family of field types (#58315 ) (#58483 ) Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type. Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities. Relates to #53175	2020-06-24 12:32:14 +01:00
James Rodewig	afbf3bd33b	[DOCS] Add data streams to bulk, delete, and index API docs (#58340 ) (#58434 ) Updates existing docs for the bulk, delete and index APIs to make them aware of data streams.	2020-06-23 09:40:25 -04:00
James Rodewig	9d03204308	[DOCS] Prohibit deletion of composable template in use by data stream (#58347 ) (#58430 ) Notes that you cannot delete a composable template currently in use by a data stream. Relates to #57957.	2020-06-23 09:01:17 -04:00
James Rodewig	b213f0222c	[DOCS] Reword tip in data streams overview	2020-06-23 08:57:59 -04:00
István Zoltán Szabó	3169e4c70e	[DOCS] Updates screenshots in ML population analysis (#58318 )	2020-06-23 09:05:08 +02:00
Dan Hermann	c5f5cc4cf8	[DOCS] Prohibit cloning, splitting, and shrinking a data stream's write index (#58105 ) (#58401 )	2020-06-22 07:29:26 -05:00
Benjamin Trent	bf8641aa15	[7.x] [ML] calculate cache misses for inference and return in stats (#58252 ) (#58363 ) When a local model is constructed, the cache hit miss count is incremented. When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.	2020-06-19 09:46:51 -04:00
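
Note on commit 03e6d1b535 (`variable_width_histogram`): the key-adjustment heuristic from that message, `min[large] = (min[large] + max[small]) / 2`, can be illustrated with a small sketch. This is not the aggregation's actual (Java) implementation. The `Bucket` class and the `histogram_keys` function are hypothetical names introduced here, and the sketch assumes the overlap check is applied only between neighbouring buckets once they are sorted by centroid.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    """Hypothetical stand-in for a reduced histogram bucket."""
    min: float
    max: float
    centroid: float


def histogram_keys(buckets: List[Bucket]) -> List[float]:
    """Return one key per bucket, ordered by centroid.

    A bucket's key is its min. If the bucket overlaps the previous
    (smaller-centroid) bucket, the key becomes the midpoint between
    its own min and the previous bucket's max.
    """
    ordered = sorted(buckets, key=lambda b: b.centroid)
    keys: List[float] = []
    prev = None
    for bucket in ordered:
        key = bucket.min
        if prev is not None and prev.max > bucket.min:
            # Overlap: min[large] = (min[large] + max[small]) / 2
            key = (bucket.min + prev.max) / 2
        keys.append(key)
        prev = bucket
    return keys


sample = [
    Bucket(min=0, max=10, centroid=4),
    Bucket(min=8, max=20, centroid=15),   # overlaps the first bucket
    Bucket(min=25, max=30, centroid=27),  # no overlap
]
print(histogram_keys(sample))
```

In the sample, only the second bucket overlaps its neighbour, so its key moves from 8 to the midpoint of 8 and 10, while the other two keys stay at their bucket minimums.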

7511 Commits