OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nhat Nguyen	62763b177d	Implement toString for BulkByScrollTask (#59042 ) We should implement "toString" of BulkByScrollTask.StatusOrException to have a meaningful log message when a reindex task completes.	2020-07-05 22:06:56 -04:00
Armin Braun	071d8b2c1c	Deduplicate Empty InternalAggregations (#58386 ) (#59032 ) Working through a heap dump for an unrelated issue I found that we can easily rack up tens of MBs of duplicate empty instances in some cases. I moved to a static constructor to guard against that in all cases.	2020-07-04 14:02:16 +02:00
Dan Hermann	7c43cbca82	[7.x] Ignore matching data streams if include_data_streams is false (#59028 )	2020-07-03 14:51:32 -05:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008 )	2020-07-03 12:58:39 -05:00
Dan Hermann	5e7746d3bd	[7.x] Mirror privileges over data streams to their backing indices (#58991 )	2020-07-03 06:33:38 -05:00
Armin Braun	d22dd437f1	Fix Two Common Zero Len Array Instantiations (#58944) (#58993) Two spots I found in which we commonly instantiate a non-trivial number of zero length arrays.	2020-07-03 09:18:14 +02:00
Nhat Nguyen	65645217bc	Handle IOException while checking translog corruption We can hit an IOException while reading a translog header after corrupting it. Relates #58866	2020-07-02 22:38:05 -04:00
Tim Brooks	dc9e364ff2	Count coordinating and primary bytes as write bytes (#58984 ) This is a follow-up to #57573. This commit combines coordinating and primary bytes under the same "write" bucket. Double accounting is prevented by only accounting the bytes at either the reroute phase or the primary phase. TransportBulkAction calls execute directly, so the operations handler is skipped and the bytes are not double accounted.	2020-07-02 19:48:19 -06:00
Mark Vieira	8fca312a3a	Mute WriteMemoryLimitsIT.testWriteBytesAreIncremented	2020-07-02 16:58:23 -07:00
Tim Brooks	9d1bf383d0	Add test assertions to ensure write bytes released (#58970 ) This is a follow-up to #57573. This commit ensures that the bytes marked in WriteMemoryLimits are released by any test using an internal test cluster.	2020-07-02 17:38:23 -06:00
Tim Brooks	1ef2cd7f1a	Add memory tracking to queued write operations (#58957) Currently we do not track the memory consumed by in-process write operations. This commit adds a mechanism to track write operation memory usage.	2020-07-02 14:14:57 -06:00
Jim Ferenczi	a4e08acdd1	Fix exists query on unmapped field in query_string (#58804) Since #55785, exists queries rewrite to MatchNoneQueryBuilder when the field is unmapped. That change also introduced a bug in the `query_string` query: a clause like `_exists_:foo` throws an exception if the field is unmapped. This commit avoids the exception if the query is built outside of an `ExistsQueryBuilder`. Closes #58737	2020-07-02 21:52:03 +02:00
Nhat Nguyen	be804b765d	Avoid flipping translog header version (#58866 ) An old translog header does not have a checksum. If we flip the header version of an empty translog to the older version, then we won't detect that corruption, and translog will be considered clean as before. Closes #58671	2020-07-02 14:34:19 -04:00
Tal Levy	d516959774	Re-enable support for array-valued geo_shape fields. (#58786 ) (#58943 ) A regression in the mapping code led to geo_shape no longer supporting array-valued fields. This commit fixes this support and adds an integration test to make sure this problem does not return!	2020-07-02 11:21:55 -07:00
Ryan Ernst	d825d4352c	Eagerly compile condition script at processor creation (#58882 ) Ingest script processors were changed to eagerly compile their scripts when the ingest pipeline is saved, but conditional scripts were missed. This commit adds eager compilation to ingest conditional scripts, which will help surface errors before runtime, as well as adds tests for each case we might encounter between inline and stored script compilation failures. closes #58864	2020-07-02 11:10:20 -07:00
Lee Hinman	e32623ef52	[7.x] Add test for component templates updated after cluster restart (#58883 ) (#58914 ) This commit adds an integration test that component templates used to form a composite template can still be updated after a cluster restart. In #58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in template mappings. This was also related to the mappings being merged manually rather than using the `MapperService` to do the merging. #58643 was fixed in 7.9 and master with the #58521 change, since mappings now are read and digested by the actual mapper service. This test passes for 7.x and master, and I intend to open a separate PR including this test for 7.8.1 along with a bug fix for #58643. This test is to ensure we don't have any regression in the future.	2020-07-02 08:23:34 -06:00
Armin Braun	62152852dc	Cleanup Duplication in Snapshot ITs (#58818 ) (#58915 ) Just a few obvious static cleanups of duplication to push back against the ever increasing complexity of these tests.	2020-07-02 16:00:01 +02:00
Alan Woodward	0cd1dc3143	Percolator keyword fields should not store norms (#58899 ) The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields, leading to an increase in memory usage. This commit disables norms on these fields again.	2020-07-02 13:59:28 +01:00
Nik Everett	5e49ee800e	Drop rewriting in date_histogram (backport of #57836) (#58875) The `date_histogram` aggregation had an optimization where it'd rewrite `time_zones` whose offset from UTC is fixed across the entire index. This rewrite is no longer needed after #56371 because we can tell that a time zone is fixed lower down in the aggregation. So this removes it.	2020-07-01 17:19:12 -04:00
Dan Hermann	98a62a6b2d	Make DataStream instances explicitly immutable (#58688 ) (#58839 )	2020-07-01 11:14:01 -05:00
Lee Hinman	d3d03fc1c6	[7.x] Add default composable templates for new indexing strategy (#57629 ) (#58757 ) Backports the following commits to 7.x: Add default composable templates for new indexing strategy (#57629)	2020-07-01 09:32:32 -06:00
Andrei Dan	f7dc09340b	Prohibit custom _routing for index requests targeting a data stream (#58749) (#58831) This prohibits the use of a custom _routing when the index/bulk requests are targeting a data stream. Using a custom _routing when targeting a backing index is still permitted. Relates to #53100 (cherry picked from commit ece6b7a318a8bd3a010499189f31fc5e3a012d4f) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-01 14:54:18 +01:00
Alan Woodward	3ba16e0f39	Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830 ) Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on the generic MappedFieldType. Backport of #58639	2020-07-01 14:52:14 +01:00
Przemyslaw Gomulka	2c275913b9	[7.x] Week based parsing for ingest date processor (#58597) (#58802) The date processor was incorrectly parsing week-based dates: when a week-based year was provided, the ingest module assumed the date had no year and tried to apply the logic for dd/MM type dates. The date processor also allows users to specify a locale parameter. It should be taken into account when parsing dates; currently it is only used for formatting. If someone specifies the 'en-us' locale, then the calendar data rules for that locale should be used. The exception is the iso8601 format. If someone is using that format, then the locale should not override calendar data rules. closes #58479	2020-07-01 15:15:56 +02:00
David Turner	822b7421ce	Forbid read-only-allow-delete block in blocks API (#58727 ) The read-only-allow-delete block is not really under the user's control since Elasticsearch adds/removes it automatically. This commit removes support for it from the new API for adding blocks to indices that was introduced in #58094.	2020-07-01 13:18:26 +01:00
Martijn van Groningen	a0df96befb	Add data stream support to put mapping and update index settings APIs. (#58758 ) Backport of #58231 to 7.x branch. Change update index setting and put mapping api to execute on all backing indices if data stream is targeted. Relates #53100	2020-07-01 13:32:21 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658) (#58811) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb` (which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
David Turner	3a234d2669	Account for remaining recovery in disk allocator (#58800 ) Today the disk-based shard allocator accounts for incoming shards by subtracting the estimated size of the incoming shard from the free space on the node. This is an overly conservative estimate if the incoming shard has almost finished its recovery since in that case it is already consuming most of the disk space it needs. This change adds to the shard stats a measure of how much larger each store is expected to grow, computed from the ongoing recovery, and uses this to account for the disk usage of incoming shards more accurately. Backport of #58029 to 7.x * Picky picky * Missing type	2020-07-01 10:12:44 +01:00
Dan Hermann	1c2a726731	Data stream support for search shards API (#58486 ) (#58765 )	2020-06-30 17:59:51 -05:00
Nik Everett	40850a780d	Fail variable_width_histogram that collects from many (#58619 ) (#58780 ) Adds an explicit check to `variable_width_histogram` to stop it from trying to collect from many buckets because it can't. I tried to make it do so but that is more than an afternoon's project, sadly. So for now we just disallow it. Relates to #42035	2020-06-30 18:26:45 -04:00
Dan Hermann	cae49b0fd7	[7.x] Add data stream support to open index API (#58767 )	2020-06-30 14:30:32 -05:00
Dan Hermann	a84ff81743	Data stream support for get field mappings API (#58488 ) (#58766 )	2020-06-30 13:45:04 -05:00
Martijn van Groningen	adcef93a6c	Introduce new put mapping action for dynamic mapping updates. (#58746 ) Backport of #58419 Mapping updates that originate from indexing a document with unmapped fields will use this new action instead of the current put mapping action. This way on the security side, authorization logic can easily determine whether a mapping update is automatically generated or a mapping update originates from the put mapping api. The new auto put mapping action is only used if all nodes are on the version that supports it.	2020-06-30 18:02:31 +02:00
Boice Huang	8c93f4e154	Sort document by internal doc id in FetchPhase to better use LRU cache (#57273 ) This change sorts the docIdsToLoad once instead of in each sub-phase.	2020-06-30 17:06:09 +02:00
Julie Tibshirani	ab65a57d70	Merge mappings for composable index templates (#58709 ) This PR implements recursive mapping merging for composable index templates. When creating an index, we perform the following: * Add each component template mapping in order, merging each one in after the last. * Merge in the index template mappings (if present). * Merge in the mappings on the index request itself (if present). Some principles: * All 'structural' changes are disallowed (but everything else is fine). An object mapper can never be changed between `type: object` and `type: nested`. A field mapper can never be changed to an object mapper, and vice versa. * Generally, each section is merged recursively. This includes `object` mappings, as well as root options like `dynamic_templates` and `meta`. Once we reach 'leaf components' like field definitions, they always overwrite an existing one instead of being merged. Relates to #53101.	2020-06-30 08:01:37 -07:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721 ) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have build jar file - jar file is now explicit input of the task and gradle will ensure its properly build * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
Armin Braun	b52a764143	Fix NPE in SnapshotService CS Application (#58680) (#58735) In the unlikely corner case of deleting a relocating (hence `WAITING`) primary shard's index during a partial snapshot, we would throw an NPE when checking if there are any external changes to process.	2020-06-30 15:20:49 +02:00
Yannick Welsch	b885cbff1a	Add index block api (#58716 ) Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have been completed after adding the write block. This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its documents.	2020-06-30 14:06:52 +02:00
Patrick Jiang(白泽)	be20aacec3	Add `matchBoolPrefix` method to QueryBuilders (#58637 )	2020-06-29 16:30:40 +02:00
Armin Braun	95d85f29f8	Fix Snapshots Capturing Incomplete Datastreams (#58630 ) (#58656 ) Only snapshot datastreams that are recorded in `SnapshotInfo` and clean those that aren't from the snapshotted metadata. Do not restore all datastreams by default when restoring global metadata, use the same mechanics used for indices here. Closes #58544	2020-06-29 12:51:40 +02:00
Armin Braun	4f2f257b12	Fix DataStream Handling on Restore of Global Metadata (#58631 ) (#58649 ) When restoring a global metadata snapshot we were overwriting the correctly adjusted data streams in the metadata when looping over all custom values. Closes #58496	2020-06-29 10:58:41 +02:00
Yang Wang	61fa7f4d22	Change privilege of enrich stats API to monitor (#52027 ) (#52196 ) The remote_monitoring_user user needs to access the enrich stats API. But the request is denied because the API is categorized under admin. The correct privilege should be monitor.	2020-06-29 10:25:33 +10:00
Ryan Ernst	08e75abd4e	Always add Java-9 style file permissions (#46050) (#58628) Java 9 removed pathname canonicalization, which means that we need to add permissions for the path and also the real path when adding file permissions. Since master requires a minimum runtime of JDK 11, we no longer need conditional logic here to apply this pathname canonicalization with our bare hands. This commit removes that conditional pathname canonicalization. Co-authored-by: Jason Tedor <jason@tedor.me>	2020-06-26 18:19:07 -07:00
Nik Everett	67e9d39932	Remove useless aggregation helper (#58571 ) (#58578 ) `descendsFromBucketAggregator` was important before we removed `asMultiBucketAggregator` but now that it is gone `collectsFromSingleBucket` is good enough. Relates to #56487 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-26 15:58:44 -04:00
Tanguy Leroux	775fb5d4cf	Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477) (#58584) Today SparseFileTracker allows waiting for a range to become available before executing a given listener. In the case of searchable snapshot, we'd like to be able to wait for a large range to be filled (ie, downloaded and written to disk) while being able to execute the listener as soon as a smaller range is available. This pull request is an extract from #58164 which introduces a ProgressListenableActionFuture that is used internally by SparseFileTracker. The progressive listenable future allows registering listeners attached to SparseFileTracker.Gap so that they are executed once the Gap is completed (with success or failure) or as soon as the Gap progress reaches a given progress value. This progress value is defined when the tracker.waitForRange() method is called; this method has been modified to accept a range and another listener's range to operate on. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-26 18:26:20 +02:00
Armin Braun	090211f768	Fix Incorrect Snapshot Shard Status for DONE Shards in Running Snapshots (#58390) (#58593) Minor bugs/inconsistencies: If a shard hasn't changed at all we were reporting `0` for total size and total file count while it was ongoing. If a data node restarts/drops out during snapshot creation the fallback logic did not load the correct statistic from the repository but just created a status with `0` counts from the snapshot state in the CS. Added a fallback to reading from the repository in this case.	2020-06-26 16:11:30 +02:00
Howard	eaa60b7c54	[Docs] Fix return tuple element order (#58463 )	2020-06-26 12:24:54 +02:00
Nik Everett	5f52bc4c9f	Fix two scripted_metric bugs (backport of #58547 ) (#58565 ) Fixes two bugs introduced by #57627: 1. We were not properly letting go of memory from the request breaker when the aggregation finished. 2. We no longer supported totally arbitrary stuff produced by the init script because we assumed that it'd be ok to run the script once and clone its results. Sadly, cloning can't clone anything that the init script can make, like `String` arrays. This runs the init script once for every new bucket so we don't need to clone.	2020-06-25 16:16:10 -04:00
Armin Braun	468e559ff7	Fix Memory Leak From Master Failover During Snapshot (#58511 ) (#58560 ) If we failed over while the data nodes were doing their work we would never resolve the listener and leak it. This change fails all listeners if master fails over.	2020-06-25 20:43:08 +02:00
Henning Andersen	38be2812b1	Enhance extensible plugin (#58542 ) Rather than let ExtensiblePlugins know extending plugins' classloaders, we now pass along an explicit ExtensionLoader that loads the extensions asked for. Extensions constructed that way can optionally receive their own Plugin instance in the constructor.	2020-06-25 20:37:56 +02:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false if they want to be explicit. This is also challenging in cases where a user wants to configure a coordinating-only node which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Igor Motov	20af856abd	[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192) Adds async support to EQL searches Closes #49638 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-06-25 14:11:57 -04:00
Jim Ferenczi	6451187e84	Filter empty fields in SearchHit#toXContent (#58418 ) This commit restores the filtering of empty fields during the xcontent serialization of SearchHit. The filtering was removed unintentionally in #41656.	2020-06-25 17:49:03 +02:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035) (#58440) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Nik Everett	c7726cc93e	Fix janky test Fixes a test that incorrectly assumed that a list of random values less than or equal to `n` always contained `n`. Oops. Closes #58353	2020-06-25 11:13:29 -04:00
Nik Everett	71adade73a	Return clear error message if aggregation type is invalid (#58255 ) (#58365 ) The main changes are: 1. Catch the `NamedObjectNotFoundException` when parsing aggregation type, and then throw a `ParsingException` with clear error message with hint. 2. Add a unit test method: AggregatorFactoriesTests#testInvalidType(). Closes #58146. Co-authored-by: bellengao <gbl_long@163.com>	2020-06-25 11:08:25 -04:00
David Roberts	1742b1c39e	Cancel persistent task recheck when no longer master (#58539 ) If a persistent task cannot be assigned on the first attempt then the master node will schedule periodic rechecks to see if the assignment requirements have been met. These periodic rechecks should be cancelled if the node ceases to be master. Previously they weren't, leading to exceptions being logged repeatedly. This PR cancels the rechecks on learning that the node is no longer the master. Fixes #58531	2020-06-25 15:51:57 +01:00
Nik Everett	335505c4e1	Drop deprecated aggregator wrapper (backport of #58367 ) (#58448 ) This drops the deprecated and now unused `asMultiBucketAggregator`. It was too easy to use it to make inefficient `Aggregators`. Relates to #56487	2020-06-25 09:31:19 -04:00
Julie Tibshirani	1f2e05c947	Simplify mapping validation for resizing indices. (#58514 ) When creating a target index from a source index, we don't allow for target mappings to be specified. This PR simplifies the check that the target mappings are empty. This refactor will help when implementing composable template merging, since we no longer need to resolve + check the target mappings when creating an index from a template.	2020-06-24 14:07:19 -07:00
Armin Braun	9e4c5d1dde	Cleaner Handling of Snapshot Related null Custom Values in CS (#58382 ) (#58501 ) Add the ability to get a custom value while specifying a default and use it throughout the codebase to get rid of the `null` edge case and shorten the code a little.	2020-06-24 17:24:44 +02:00
Benjamin Trent	fa88e71532	[ML] unify usages of _all and wildcard <*> (#58460 ) (#58494 )	2020-06-24 09:47:57 -04:00
markharwood	d5ac3bb87f	Field capabilities - make `keyword` a family of field types (#58315 ) (#58483 ) Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type. Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities. Relates to #53175	2020-06-24 12:32:14 +01:00
Jim Ferenczi	ec8d5ec79c	Fix handling of terminate_after when size is 0 (#58212) `terminate_after` is ignored on search requests that don't return top hits (`size` set to 0) and do not track the number of hits accurately (`track_total_hits`). We use early termination when the number of hits to track is reached during collection but this breaks the hard termination of `terminate_after` if it happens before we reached the `terminate_after` value. This change ensures that we continue to check `terminate_after` even if the tracking of total hits has reached the provided value. Closes #57624	2020-06-24 13:16:11 +02:00
David Turner	796cb9e9ca	Reword INDEX_READ_ONLY_ALLOW_DELETE_BLOCK message (#58410 ) Users are perennially confused by the message they get when writing to an index is blocked due to excessive disk usage: TOO_MANY_REQUESTS/12/index read-only / allow delete (api) Of course this is technically accurate but it is hard to join the dots from this message to "your disk was too full" without some searching of forums and documentation. Additionally in #50166 we changed the status code to today's `429` from the previous `403` which changed the message from the one that's widely documented elsewhere: FORBIDDEN/12/index read-only / allow delete (api) Since #42559 we've considered this block to be under the sole control of the disk-based shard allocator, and we have seen no evidence to suggest that anyone is applying this block manually. Therefore this commit adjusts this block's message to indicate that it's caused by a lack of disk space.	2020-06-24 10:22:11 +01:00
Alan Woodward	d251a482e9	Move MappedFieldType.similarity() to TextSearchInfo (#58439 ) Similarities only apply to a few text-based field types, but are currently set directly on the base MappedFieldType class. This commit moves similarity information into TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper. It was previously possible to include a similarity parameter on a number of field types that would then ignore this information. To make it obvious that this has no effect, setting this parameter on non-text field types now issues a deprecation warning.	2020-06-24 10:00:32 +01:00
Ryan Ernst	89c03e593c	Create utility for custom config setup in packaging tests (#58352 ) This commit creates a shared withCustomConfig method that may be used by any packaging test. The method will copy the config directory and override the conf path appropriately depending on the distribution type.	2020-06-23 15:12:22 -07:00
Dan Hermann	b40c27698f	Fix incorrect stats warning when swap is disabled	2020-06-23 14:34:27 -05:00
James Rodewig	affc3954e6	[DOCS] Fix typo in RoutingNode comment (#58079 ) (#58454 ) Co-authored-by: Howard <danielhuang@tencent.com>	2020-06-23 13:07:08 -04:00
Christoph Büscher	642b05a511	Fix test failure in RangeQueryBuilderTests.testToQuery (#58449) Very rarely this test can fail if we draw a random TimeZone id that we cannot parse with the legacy joda DateMathParser and get an IllegalArgumentException. In addition to a "SystemV/*" time zone we also need an index "versionCreated" before V_7_0_0 and no "format" setting in the query builder. Given how unlikely this combination is, we should simply disallow those time zone ids when generating the random query builder for RangeQueryBuilderTests. Closes #58431	2020-06-23 17:44:18 +02:00
Mark Tozzi	52806a8f89	Small VS config cleanup (#58294 ) (#58442 )	2020-06-23 10:53:06 -04:00
Alan Woodward	8ebd341710	Add text search information to MappedFieldType (#58230 ) (#58432 ) Now that MappedFieldType no longer extends lucene's FieldType, we need to have a way of getting the index information about a field necessary for building text queries, building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo abstraction that holds this information, and a getTextSearchInfo() method to MappedFieldType to make it available. Field types that do not support text search can just return null here. This allows us to remove the MapperService.getLuceneFieldType() shim method.	2020-06-23 14:37:26 +01:00
Nik Everett	519f41950a	Save memory when significant_text is not on top (#58145 ) (#58364 ) This merges the aggregator for `significant_text` into `significant_terms`, applying the optimization built in #55873 to save memory when the aggregation is not on top. The `significant_text` aggregation is pretty memory intensive all on its own and this doesn't particularly help with that, but it'll help with the memory usage of any sub-aggregations.	2020-06-23 09:19:05 -04:00
Dan Hermann	41e8f584c1	[7.x] Minimum node version check before creating data stream (#58424 )	2020-06-23 07:45:27 -05:00
Armin Braun	943efb78fd	Save Shard ID Serializations in Bulk Requests (#56209 ) (#58414 ) Just like #56094 but for the request side. Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id. Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately: * 8 bytes for the `ShardId` object (itself + one field) * + another 4 bytes for the `int` in the `ShardId` * 16 bytes (two fields + the instance itself + the padding) for the `Index` object * + 30 bytes for the `Index` uuid string * + all the bytes in the index name string => 60+ bytes per bulk request item saved on heap and over the wire	2020-06-23 12:35:52 +02:00
David Turner	256b660f0a	Remove anonymous PublicationContext implementation (#58412 ) Today the `PublicationContext` interface has a single anonymous implementation, and `PublicationTransportHandler` has various methods that take the variables that this anonymous class captures. This commit refactors this into a proper class with proper fields and moves the relevant methods onto this class. Backport of #58405 to 7.x.	2020-06-23 11:13:23 +01:00
Alan Woodward	519d1278e2	Make FieldTypeLookup immutable (#58162 ) (#58411 ) FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to the presence of multiple mapping types within a single index, this had to be updated in-place because a mapping update might only affect one type. However, now that we only have a single type per index, we can completely rebuild the FieldTypeLookup on each update, removing lots of concurrency worries.	2020-06-23 10:51:32 +01:00
Martijn van Groningen	7dda9934f9	Keep track of timestamp_field mapping as part of a data stream (#58400 ) Backporting #58096 to 7.x branch. Relates to #53100 * use mapping source directly instead of using mapper service to extract the relevant mapping details * moved assertion to TimestampField class and added helper method for tests * Improved logic that inserts timestamp field mapping into a mapping. If the timestamp field path consisted of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.	2020-06-22 17:46:38 +02:00
Przemko Robakowski	a44dad9fbb	[7.x] Add support for snapshot and restore to data streams (#57675 ) (#58371 ) * Add support for snapshot and restore to data streams (#57675) This change adds support for including data streams in snapshots. Names are provided in indices field (the same way as in other APIs), wildcards are supported. If rename pattern is specified it renames both data streams and backing indices. It also adds test to make sure SLM works correctly. Closes #57127 Relates to #53100 * version fix * compilation fix * compilation fix * remove unused changes * compilation fix * test fix	2020-06-19 22:41:51 +02:00
William Brafford	b3c99f06d6	Mute flaky test (#58356 )	2020-06-18 15:30:11 -04:00
Andrei Dan	30e777856f	[7.x] Validate alias operations don't target data streams (#58327 ) (#58337 ) This adds validation to make sure alias operations (add, remove, remove index) don't target data streams or the backing indices. (cherry picked from commit 816448990e464a02f3960f12f6f6644a8cce36a4) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-18 20:23:07 +01:00
Stuart Tettemer	20abba8433	Scripting: Deprecate general cache settings (#55753 ) (#58283 ) Backport: ef543b0	2020-06-18 11:54:23 -06:00
Alan Woodward	4b8cf2af6a	Add serialization test for FieldMappers when include_defaults=true (#58235 ) (#58328 ) Fixes a bug in TextFieldMapper serialization when index is false, and adds a base-class test to ensure that all field mappers are tested against all variations with defaults both included and excluded. Fixes #58188	2020-06-18 15:46:04 +01:00
Alan Woodward	ca2d12d039	Remove Settings parameter from FieldMapper base class (#58237 ) This is currently used to set the indexVersionCreated parameter on FieldMapper. However, this parameter is only actually used by two implementations, and clutters the API considerably. We should just remove it, and use it directly in the implementations that require it.	2020-06-18 12:53:54 +01:00
Rory Hunter	4da767bb3e	Fix version	2020-06-18 12:29:47 +01:00
Rory Hunter	a71f0cabdc	Version bump for 7.8.0 release	2020-06-18 11:04:56 +01:00
Christoph Büscher	ba0b046909	Fix test compilation issue	2020-06-18 11:36:11 +02:00
Christoph Büscher	31d8e03954	Prevent BigInteger serialization errors in term queries (#57987 ) When a numeric value in e.g. a `term` query doesn't fit into a long, it currently gets parsed to a BigInteger object that the various term query builders store untouched. This leads to serialization errors when these queries are sent across the wire. Instead we can convert to a string representation early on, since that is what we store e.g. when indexing big integers into `keyword` fields anyway. Closes #57917	2020-06-18 11:17:12 +02:00
Jim Ferenczi	82db0b575c	Allow index filtering in field capabilities API (#57276 ) (#58299 ) This change allows using an `index_filter` in the field capabilities API. Indices are filtered from the response if the provided query rewrites to `match_none` on every shard: ```` GET metrics-* { "index_filter": { "bool": { "must": [ { "range": { "@timestamp": { "gt": "2019" } } } ] } } } ```` The filtering is done on a best-effort basis: it uses the can match phase to rewrite queries to `match_none` instead of fully executing the request. The first shard that can match the filter is used to create the field capabilities response for the entire index. Closes #56195	2020-06-18 10:23:26 +02:00
Yannick Welsch	ffeff4090e	Add new flag to check whether alias exists on remove (#58100 ) This allows doing true CAS operations on aliases, making sure that an alias is actually properly moved from a given source index onto a given target index. This is useful to ensure that an alias is actually moved from a given index to another one, and not just added to another index.	2020-06-18 10:15:26 +02:00
Rene Groeschke	abc72c1a27	Unify dependency licenses task configuration (#58116 ) (#58274 ) - Remove duplicate dependency configuration - Use task avoidance api across the build - Remove redundant licensesCheck config	2020-06-18 08:15:50 +02:00
Mark Vieira	ef8899b130	Mute SpanMultiTermQueryBuilderTests.testToQueryInnerTermQuery	2020-06-17 16:27:18 -07:00
Julie Tibshirani	b1161cba35	Rename SearchContext#smartNameFieldType. (#58203 ) The concept of a 'smart name' doesn't make sense now that there are no mapping types.	2020-06-17 10:38:32 -07:00
Tim Brooks	2074412d79	Retry failed replication due to transient errors (#56230 ) Currently a failed replication action will fail an entire replica. This includes when replication fails due to potentially short lived transient issues such as network disruptions or circuit breaking errors. This commit implements retries using the retryable action.	2020-06-17 10:17:30 -06:00
Luca Cavanna	5ddea03de7	Remove needless termsQuery implementation from StringFieldType (#57609 ) The base class `TermBasedFieldType` already implements exactly the same `termsQuery` method, hence there is no need to override it.	2020-06-17 18:04:49 +02:00
GeChenxin	a96f526de1	Add index name to refresh mapping task (#57598 )	2020-06-17 10:49:36 -04:00
Armin Braun	41af7f5455	Fix Typo in Snapshot Abort Test (#58238 ) (#58247 ) Forgot the brackets here in #58214 so in the rare case where the first update seen by the listener doesn't match it will still remove itself and never be invoked again -> timeout.	2020-06-17 14:53:39 +02:00
Nik Everett	ab2c6d9696	Save memory when auto_date_histogram is not on top (backport of #57304 ) (#58190 ) This builds an `auto_date_histogram` aggregator that natively aggregates from many buckets and uses it when the `auto_date_histogram` used to use `asMultiBucketAggregator` which should save a significant amount of memory in those cases. In particular, this happens when `auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator like `terms` or `histogram` or `filters`. For the most part we preserve the original implementation when `auto_date_histogram` only collects from a single bucket. It isn't possible to "just port the aggregator" without taking a pretty significant performance hit because we used to rewrite all of the buckets every time we switched to a coarser and coarser rounding configuration. Without some major surgery to how to delay sub-aggs we'd end up rewriting the delay list zillions of times if there are many buckets. The multi-bucket version of the aggregator has a "budget" of "wasted" buckets and only rewrites all of the buckets when we exceed that budget. Now that we don't rebucket every time we increase the rounding we can no longer get an accurate count of the number of buckets! So instead the aggregator uses an estimate of the number of buckets to trigger switching to a coarser rounding. This estimate is likely to be terrible when buckets are far apart compared to the rounding. So it also uses the difference between the first and last bucket to trigger switching to a coarser rounding. Which covers for the shortcomings of the bucket estimation technique pretty well. It also causes the aggregator to emit fewer buckets in cases where they'd be reduced together on the coordinating node. This is wonderful! But probably fairly rare. 
All of that does buy us some speed improvements when the aggregator is a child of a multi-bucket aggregator: Without metrics or time zone: 25% faster With metrics: 15% faster With time zone: 22% faster Relates to #56487	2020-06-17 08:48:41 -04:00
Jason Tedor	b78b3edeea	Upgrade to JNA 5.5.0 (#58183 ) This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are now on the latest maintained line, and pick up a large collection of bug fixes that have accumulated.	2020-06-17 07:35:08 -04:00
Ignacio Vera	b6585f2b51	Add new extensions for Lucene86 points codec to FsDirectoryFactory (#58226 ) (#58233 )	2020-06-17 12:55:33 +02:00
Armin Braun	85be78b624	Fix Snapshot Abort Not Waiting for Data Nodes (#58214 ) (#58228 ) This was a really subtle bug that we introduced a long time ago. If a shard snapshot is in aborted state but hasn't started snapshotting on a node we can only send the failed notification for it if the shard was actually supposed to execute on the local node. Without this fix, if shard snapshots were spread out across at least two data nodes (so that each data node does not have all the primaries) the abort would actually never wait on the data nodes. This isn't a big deal with uuid shard generations but could lead to potential corruption on S3 when using numeric shard generations (albeit very unlikely now that we have the 3 minute wait there). Another negative side-effect of this bug was that master would receive a lot more shard status update messages for aborted shards since each data node not assigned a primary would send one message for that primary.	2020-06-17 11:39:50 +02:00

5019 Commits