OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-20 03:45:02 +00:00

Author	SHA1	Message	Date
Julie Tibshirani	9e52513c7b	Add support for missing value fetchers. (#63585 ) This PR implements value fetching for the following field types: * `text` phrase and prefix subfields * `search_as_you_type`, plus its subfields * `token_count`, which is implemented by fetching doc values Supporting these types helps ensure that retrieving all fields through `"fields": ["*"]` doesn't fail because of unsupported value fetchers.	2020-10-12 17:34:21 -07:00
Alan Woodward	88b45dfa61	Convert TextFieldMapper to parametrized form (#63269 ) (#63392 ) As a result of this, we can remove a chunk of code from TypeParsers as well. Tests for search/index mode analyzers have moved into their own file. This commit also rationalises the serialization checks for parameters into a single SerializerCheck interface that takes the values includeDefaults, isConfigured and the value itself. Relates to #62988	2020-10-07 13:26:25 +01:00
Igor Motov	6a9cde2918	Add support for x_opaque_id to _cat/tasks (#63036 ) (#63135 ) Adds an optional column with support for x_opaque_id to _cat/tasks API. Closes #61118	2020-10-01 13:17:46 -04:00
Alan Woodward	63afc61b08	Introduce FetchContext (#62357 ) We currently pass a SearchContext around to share configuration among FetchSubPhases. With the introduction of runtime fields, it would be useful to start storing some state on this context to be shared between different subphases (for example, stored fields or search lookups can be loaded lazily but referred to by many different subphases). However, SearchContext is a very large and unwieldy class, and adding more methods or state here feels like a bridge too far. This commit introduces a new FetchContext class that exposes only those methods on SearchContext that are required for fetch phases. This reduces the API surface area for fetch phases considerably, and should give us some leeway to add further state.	2020-09-17 09:57:43 +01:00
Nik Everett	24a24d050a	Implement fields fetch for runtime fields (backport of #61995 ) (#62416 ) This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to #59332	2020-09-15 20:24:10 -04:00
Nik Everett	771a8893a6	Add more debugging information for cardinality agg (#62317 ) (#62397 ) This adds two extra bits of info to the profiler: 1. Count of the number of different types of collectors. This lets us figure out if we're using the optimization for segment ordinals. It adds a few more similar counters just for good measure. 2. Profiles the `getLeafCollector` and `postCollection` methods. These are non-trivial for some aggregations, like cardinality.	2020-09-15 13:21:11 -04:00
Julie Tibshirani	4a19bdb2ea	Support the 'fields' option in inner_hits and top_hits. (#62337 ) This PR adds support for the 'fields' option in the following places: * Anytime `inner_hits` is used, for both fetching nested/child docs and field collapsing * The `top_hits` aggregation Addresses #61949.	2020-09-14 11:51:45 -07:00
Nhat Nguyen	3d69b5c41e	Introduce point in time APIs in x-pack basic (#61062 ) This commit introduces a new API that manages point-in-times in x-pack basic. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. A search request by default executes against the most recent point in time. In some cases, it is preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time. A point in time must be opened before being used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep a point in time around. ``` POST /my_index/_pit?keep_alive=1m ``` The response from the above request includes an `id`, which should be passed to the `id` of the `pit` parameter of search requests. ``` POST /_search { "query": { "match" : { "title" : "elasticsearch" } }, "pit": { "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", "keep_alive": "1m" } } ``` Point-in-times are automatically closed when the `keep_alive` has elapsed. However, keeping point-in-times has a cost; hence, point-in-times should be closed as soon as they are no longer used in search requests. ``` DELETE /_pit { "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA=" } ``` #### Notable works in this change: - Move the search state to the coordinating node: #52741 - Allow searches with a specific reader context: #53989 - Add the ability to acquire readers in IndexShard: #54966 Relates #46523 Relates #26472 Co-authored-by: Jim Ferenczi <jimczi@apache.org>	2020-09-10 19:25:47 -04:00
Jake Landis	e80f68ed77	[7.x] Add test for item-level error when no write index defined for an alias in bulk API (#55503 ) (#61999 ) Add test for item-level error when no write index defined for an alia… Co-authored-by: Jake Landis <jake.landis@elastic.co> Co-authored-by: bellengao <gbl_long@163.com>	2020-09-09 14:27:25 -05:00
Nik Everett	1104d65465	Fix bug with terms' min_doc_count (#62130 ) (#62177 ) The `global_ordinals` implementation of `terms` had a bug when `min_doc_count: 0` that'd cause sub-aggregations to have array index out of bounds exceptions. Ooops. My fault. This fixes the bug by assigning ordinals to those buckets. Closes #62084	2020-09-09 13:04:51 -04:00
Luca Cavanna	ab8f65a099	[TEST] Don't specify a type unless needed (#62011 ) We have a couple of yaml tests that index documents under a 'test' type, while they could omit it. We do want to still test that specifying the type is still allowed in 7.x but we already have specific tests for that, and other tests should use the endpoints that don't require specifying a type.	2020-09-05 09:27:00 +02:00
Dan Hermann	d52ee17054	Adjust BWC after backport of #60818	2020-09-01 08:30:32 -05:00
Dan Hermann	88a448f1cd	Fix wrong result when executing bulk requests with and without pipeline (#60818 ) (#61777 )	2020-09-01 07:05:25 -05:00
Julie Tibshirani	85ad328df7	Ensure fetch fields aren't dropped when rewriting search. (#61390 ) Previously we didn't retain the requested fields when performing a shallow copy of the search source. This meant that when a search was rewritten, we could drop the requested fields and fail to return them in the response.	2020-08-20 14:58:58 -07:00
Nik Everett	1e6400285c	Some progress on failing runtime fields tests (bring #61098 to 7.x) (#61115 ) * Some progress on failing runtime fields tests (bring #61098 to 7.x) This breaks apart a test for the `terms` aggregation into one that works for runtime fields and one that doesn't.	2020-08-17 09:56:55 -04:00
Nik Everett	639782da12	Break up a test for with runtime fields (brings #60931 to master) (#61103 ) (#61114 ) Breaks up an integration test into one that runtime fields can run and one that runtime fields have to skip. This is because runtime fields don't have global ords and we assert things about global ords in the test we have to skip.	2020-08-17 09:56:33 -04:00
Ryan Ernst	bce93b93b2	Increase docs and client rest test timeouts for Darwin (#61075 ) The Darwin CI hosts continue to struggle with timeouts. This commit increases the timeouts for docs and client rest tests. relates #58286	2020-08-13 21:22:06 -07:00
Nhat Nguyen	843122ccce	Prevent shard relocation while closing index (#61072 ) We might fail to close an index if some shards are being relocated. Close #60913	2020-08-13 10:17:48 -04:00
Russ Cam	d593c963f3	Use deprecated object to deprecate synced flush API (#57096 ) Relates: #50835 This commit updates the synced flush REST API spec to deprecate the whole API.	2020-08-06 15:17:26 +10:00
István Zoltán Szabó	7c995aa3ff	[DOCS] Fixes broken links in rest API spec. (#60582 )	2020-08-03 15:31:25 +02:00
Nik Everett	2cde43b799	Allows nanosecond resolution in search_after (backport of #60328 ) (#60426 ) Allows nanosecond resolution in search_after (#60328) This fixes `search_after` to properly parse string formatted dates that have nanosecond resolution. Closes #52424	2020-08-03 08:17:48 -04:00
James Rodewig	771e9f142a	[DOCS] Move search pagination content to one page (#60515 ) (#60525 )	2020-07-31 12:40:40 -04:00
Julie Tibshirani	fe1d877b69	Avoid using string 'y' in fields REST tests. (#60471 ) Some yaml parsers interpret 'y' and 'yes' as the boolean 'true'.	2020-07-31 09:12:15 -07:00
Dan Hermann	bd6b20a4ec	Fix failing test for resolve index API (#60306 ) (#60508 )	2020-07-31 08:43:24 -05:00
Tim Brooks	85fdf959ad	Add configured indexing memory limit to node stats (#60414 ) This commit adds the configured memory limit to the node stats API.	2020-07-29 12:28:21 -06:00
Igor Motov	0dd53b76bd	Add aggregation list to node info (#60074 ) (#60256 ) Adds a full list of supported aggregations to the node info API. This list will be used in transform tests and telemetry mapping tests that will be added as follow-up PRs. Fixes #59774	2020-07-28 14:06:12 -04:00
Julie Tibshirani	c7bfb5de41	Add search `fields` parameter to support high-level field retrieval. (#60258 ) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.	2020-07-28 10:58:20 -07:00
David Turner	9450ea08b4	Log and track open/close of transport connections (#60297 ) Transport connections between nodes remain in place until one or other node shuts down or the connection is disrupted by a flaky network. Today it is very difficult to demonstrate that transient failures and cluster instability are caused by the network even though this is often the case. In particular, transport connections open and close without logging anything, even at `DEBUG` level, making it very hard to quantify the scale of the problem or to correlate the networking problems with external events. This commit adds the missing `DEBUG`-level logging when transport connections open and close, and also tracks the total number of transport connections a node has opened as a measure of the stability of the underlying network.	2020-07-28 17:08:04 +01:00
Dan Hermann	fe12217c7f	[7.x] Move REST specs for data streams (#60111 )	2020-07-23 08:10:54 -05:00
Tim Brooks	ba01540d7e	Implement human readable indexing pressure stats (#60058 ) The indexing pressure stats do not currently have human readable variants. This commit adds human readable variants and updates the documentation.	2020-07-22 12:07:59 -06:00
Nik Everett	49f365ddfd	Fix bug in deep pipeline agg serialization (#59984 ) In #54716 I removed pipeline aggregators from the aggregation result tree and caused us to read them from the request. This saves a bunch of round trip bytes, which is neat. But there was a bug in the backwards compatibility logic. You see, we still have to give the pipeline aggregations to nodes older than 7.8 over the wire because that is how they know what pipelines to run. They have the pipelines in the request but they don't read them. They use the ones in the response tree. Anyway, we had a bug where we were never sending pipelines defined two levels down. So while you are upgrading the pipeline wouldn't run. Sometimes. If the data node of the "first" result was post-7.8 and the coordinating node was pre-7.8. This fixes the bug.	2020-07-21 16:03:15 -04:00
James Baiera	b3363cf8f9	[7.x] Remove unneeded rest params from Data Stream Stats (#59575 ) (#59661 ) This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data Streams Stats REST endpoint. These options are required for broadcast requests, but are not needed for anything in terms of resolving data streams. Instead, we just set a default set of IndicesOptions on the transport request.	2020-07-21 15:59:16 -04:00
Lee Hinman	8c7d414a3b	[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806 ) (#59810 ) Backports the following commits to 7.x: Fix retrieving data stream stats for a DS with multiple backing indices (#59806)	2020-07-17 16:56:07 -06:00
Lee Hinman	f6b08a3115	[7.x] Allow simulating existing composable index template (#59733 ) (#59798 ) Backports the following commits to 7.x: Allow simulating existing composable index template (#59733)	2020-07-17 13:10:07 -06:00
Benjamin Trent	b7f30fc929	[7.x] Adding new `require_alias` option to indexing requests (#58917 ) (#59769 ) * Adding new `require_alias` option to indexing requests (#58917) This commit adds the `require_alias` flag to requests that create new documents. This flag, when `true`, prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias. When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings. This is useful when an alias is required instead of a concrete index. closes https://github.com/elastic/elasticsearch/issues/55267	2020-07-17 10:24:58 -04:00
Igor Motov	2408803fad	Adds hard_bounds to histogram aggregations (#59175 ) (#59656 ) Adds a hard_bounds parameter to explicitly limit the buckets that a histogram can generate. This is especially useful in case of open ended ranges that can produce a very large number of buckets.	2020-07-16 15:31:53 -04:00
James Baiera	5f7e7e9410	[7.x] Data Stream Stats API (#58707 ) (#59566 ) This API reports on statistics important for data streams, including the number of data streams, the number of backing indices for those streams, the disk usage for each data stream, and the maximum timestamp for each data stream	2020-07-14 16:57:46 -04:00
Tim Brooks	408a07f96a	Separate coordinating and primary bytes in stats (#59487 ) Currently we combine coordinating and primary bytes into a single bucket for indexing pressure stats. This makes sense for rejection logic. However, for metrics it would be useful to separate them.	2020-07-14 12:37:06 -06:00
Dan Hermann	e54b4a729f	[7.x] Adds write_index_only option to put mapping API (#59539 )	2020-07-14 10:34:08 -05:00
Andrei Dan	7dcdaeae49	Default to @timestamp in composable template datastream definition (#59317 ) (#59516 ) This makes the data_stream timestamp field specification optional when defining a composable template. When there isn't one specified it will default to `@timestamp`. (cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 12:36:54 +01:00
Andrei Dan	4180333bbc	[7.x] Composable templates: add a default mapping for @timestamp (#59244 ) (#59510 ) This adds a low precedence mapping for the `@timestamp` field with type `date`. This will aid with the bootstrapping of data streams as a timestamp mapping can be omitted when nanos precision is not needed. (cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 11:29:33 +01:00
Tim Brooks	623df95a32	Adding indexing pressure stats to node stats API (#59467 ) We have recently added internal metrics to monitor the amount of indexing occurring on a node. These metrics introduce back pressure to indexing when memory utilization is too high. This commit exposes these stats through the node stats API.	2020-07-13 17:23:42 -06:00
Martijn van Groningen	b1b7bf3912	Make data streams a basic licensed feature. (#59392 ) Backport of #59293 to 7.x branch. * Create new data-stream xpack module. * Move TimestampFieldMapper to the new module, this results in storing a composable index template with data stream definition only to work with default distribution. This way data streams can only be used with default distribution, since a data stream can currently only be created if a matching composable index template exists with a data stream definition. * Renamed `_timestamp` meta field mapper to `_data_stream_timestamp` meta field mapper. * Add logic to put composable index template api to fail if `_data_stream_timestamp` meta field mapper isn't registered. So that a more understandable error is returned when attempting to store a template with data stream definition via the oss distribution. In a follow up the data stream transport and rest actions can be moved to the xpack data-stream module.	2020-07-13 17:26:46 +02:00
Dan Hermann	e01d73c737	[7.x] Data stream admin actions are now index-level actions	2020-07-10 14:36:18 -05:00
Dan Hermann	c26d2b5fa5	Data stream support for indices shard stores API	2020-07-09 13:11:45 -05:00
Martijn van Groningen	17bd559253	Fix the timestamp field of a data stream to @timestamp (#59210 ) Backport of #59076 to 7.x branch. The commit makes the following changes: * The timestamp field of a data stream definition in a composable index template can only be set to '@timestamp'. * Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and instead only check that the _timestamp field mapping has been defined on a backing index of a data stream. * Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method to `MetadataIndexTemplateService#collectMappings(...)` method. * Fixed a bug (#58956) that causes timestamp field validation to be performed for each template instead of the final mappings that are created. * Only apply _timestamp meta field if index is created as part of a data stream or data stream rollover, this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition. Relates to #58642 Relates to #53100 Closes #58956 Closes #58583	2020-07-08 17:30:46 +02:00
Andrei Dan	24c6a30e2b	[7.9] GET data stream API returns additional information (#59128 ) (#59177 ) * GET data stream API returns additional information (#59128) This adds the data stream's index template, the configured ILM policy (if any) and the health status of the data stream to the GET _data_stream response. Restoring a data stream from a snapshot could install a data stream that doesn't match any composable templates. This also makes the `template` field in the `GET _data_stream` response optional. (cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-07 20:30:09 +01:00
James Rodewig	6ed356ffc3	[DOCS] Replace `datatype` with `data type` (#58972 ) (#59184 )	2020-07-07 14:59:35 -04:00
Nik Everett	b99b2f1a08	Fix test for adjacency_matrix It needs to request the value count in a backwards compatible way.	2020-07-07 11:20:43 -04:00
Nik Everett	eb169ae226	Fix lookup support in adjacency matrix (backport of #59099 ) (#59108 ) This request: ``` POST /_search { "aggs": { "a": { "adjacency_matrix": { "filters": { "1": { "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } } } } } } } } ``` Would fail with a 500 error and a message like: ``` { "error": { "root_cause": [ { "type": "illegal_state_exception", "reason":"async actions are left after rewrite" } ] } } ``` This fixes that by moving the query rewrite phase from a synchronous call on the data nodes into the standard aggregation rewrite phase which can properly handle the asynchronous actions.	2020-07-07 10:28:20 -04:00

2091 Commits