OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	2dd086445c	Enable Fully Concurrent Snapshot Operations (#56911 ) (#59578 ) Enables fully concurrent snapshot operations: * Snapshot create- and delete operations can be started in any order * Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one)) * Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations See updated JavaDoc and added test cases for more details and illustration on the functionality. Some notes: The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up. This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots. Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first	2020-07-15 03:42:31 +02:00
Armin Braun	06d94cbb2a	Fix TODO about Spurious FAILED Snapshots (#58994 ) (#59576 ) There is no point in writing out snapshots that contain no data that can be restored whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot step that wrote data to the repository that would've other become unreferenced, but in the current day state machine without the `INIT` step there is no point in doing so.	2020-07-15 00:54:30 +02:00
Armin Braun	e1014038e9	Simplify Repository.finalizeSnapshot Signature (#58834 ) (#59574 ) Many of the parameters we pass into this method were only used to build the `SnapshotInfo` instance to write. This change simplifies the signature. Also, it seems less error prone to build `SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository implementation will build the correct `SnapshotInfo`.	2020-07-15 00:14:28 +02:00
Martijn van Groningen	35ae3d19db	Remove data stream feature flag (#59572 ) so that it can used in the next minor release (7.9.0). Backport of #59504 to 7.x branch. Closes #53100	2020-07-14 23:50:41 +02:00
Armin Braun	d456f7870a	Deduplicate Index Metadata in BlobStore (#50278 ) (#59514 ) This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot. This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time. The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`). Relates to #45736 as it improves the efficiency of snapshotting unchanged indices Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete	2020-07-14 22:18:42 +02:00
Tim Brooks	408a07f96a	Separate coordinating and primary bytes in stats (#59487 ) Currently we combine coordinating and primary bytes into a single bucket for indexing pressure stats. This makes sense for rejection logic. However, for metrics it would be useful to separate them.	2020-07-14 12:37:06 -06:00
Tim Brooks	623df95a32	Adding indexing pressure stats to node stats API (#59467 ) We have recently added internal metrics to monitor the amount of indexing occurring on a node. These metrics introduce back pressure to indexing when memory utilization is too high. This commit exposes these stats through the node stats API.	2020-07-13 17:23:42 -06:00
Mark Vieira	dc7d4c615c	Ensure fixture runtime dependencies are built before starting containers (#59474 )	2020-07-13 15:58:01 -07:00
Armin Braun	64c5f70a2d	Remove Needless Context Switches on Loading RepositoryData (#56935 ) (#59452 ) We don't need to switch to the generic or snapshot pool for loading cached repository data (i.e. most of the time in normal operation). This makes `executeConsistentStateUpdate` less heavy if it has to retry and lowers the chance of having to retry in the first place. Also, this change allowed simplifying a few other spots in the codebase where we would fork off to another pool just to load repository data.	2020-07-13 21:38:29 +02:00
Martijn van Groningen	b1b7bf3912	Make data streams a basic licensed feature. (#59392 ) Backport of #59293 to 7.x branch. * Create new data-stream xpack module. * Move TimestampFieldMapper to the new module, this results in storing a composable index template with data stream definition only to work with default distribution. This way data streams can only be used with default distribution, since a data stream can currently only be created if a matching composable index template exists with a data stream definition. * Renamed `_timestamp` meta field mapper to `_data_stream_timestamp` meta field mapper. * Add logic to put composable index template api to fail if `_data_stream_timestamp` meta field mapper isn't registered. So that a more understandable error is returned when attempting to store a template with data stream definition via the oss distribution. In a follow up the data stream transport and rest actions can be moved to the xpack data-stream module.	2020-07-13 17:26:46 +02:00
Alan Woodward	f4caadd239	MappedFieldType no longer requires equals/hashCode/clone (#59212 ) With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer have any cause to compare MappedFieldType instances. This means that we can remove all equals and hashCode implementations, and in addition we no longer need the clone implementations which were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes, which will be particularly useful for the runtime fields project.	2020-07-09 21:05:10 +01:00
Przemko Robakowski	c870d6e570	[7.x] Restart tests with data streams (#58330 ) (#59303 ) * Restart tests with data streams (#58330)	2020-07-09 17:52:20 +02:00
Lee Hinman	bb1c53a0f5	Allow warnings about 'global' template in upgrade tests (#59242 ) These tests sometimes install a template so they can be compatible with older versions, but they run amok of the occasionally installed "global" template which changes the default number of shards. This commit adds `allowedWarnings` and allows these warnings to be present, but doesn't fail if they are not (since the global template is only randomly installed). Resolves #58807 Resolves #58258	2020-07-08 13:40:55 -06:00
Nhat Nguyen	e50a0330ec	Remove random of recovery chunk size setting The recovery chunk size setting was injected in #58018, but too aggressively and broke several tests. This change removes that random injection. Relates #58018	2020-07-08 15:29:37 -04:00
Martijn van Groningen	17bd559253	Fix the timestamp field of a data stream to @timestamp (#59210 ) Backport of #59076 to 7.x branch. The commit makes the following changes: * The timestamp field of a data stream definition in a composable index template can only be set to '@timestamp'. * Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and instead only check that the _timestamp field mapping has been defined on a backing index of a data stream. * Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method to `MetadataIndexTemplateService#collectMappings(...)` method. * Fixed a bug (#58956) that cases timestamp field validation to be performed for each template and instead of the final mappings that is created. * only apply _timestamp meta field if index is created as part of a data stream or data stream rollover, this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition. Relates to #58642 Relates to #53100 Closes #58956 Closes #58583	2020-07-08 17:30:46 +02:00
Armin Braun	c66b80b9fa	Disable WindowsFS in MockAPITests (#59163 ) (#59214 ) Turns out these tests sometimes run very slow on `WindowsFS` as well so disabling it here. Closes #59133	2020-07-08 14:47:40 +02:00
Nik Everett	a29d3515a2	Improve cardinality measure used to build aggs (#56533 ) (#59107 ) This makes a `parentCardinality` available to every `Aggregator`'s ctor so it can make intelligent choices about how it collects bucket values. This replaces `collectsFromSingleBucket` and is similar to it but: 1. It supports `NONE`, `ONE`, and `MANY` values and is generally extensible if we decide we can use more precise counts. 2. It is more accurate. `collectsFromSingleBucket` assumed that all sub-aggregations live under multi-bucket aggregations. This is normally true but `parentCardinality` is properly carried forward for single bucket aggregations like `filter` and for multi-bucket aggregations configured in single-bucket for like `range` with a single range. While I was touching every aggregation I renamed `doCreateInternal` to `createMapped` because that seemed like a much better name and it was right there, next to the change I was already making. Relates to #56487 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-08 08:42:23 -04:00
Armin Braun	9268b25789	Add Check for Metadata Existence in BlobStoreRepository (#59141 ) (#59216 ) In order to ensure that we do not write a broken piece of `RepositoryData` because the phyiscal repository generation was moved ahead more than one step by erroneous concurrent writing to a repository we must check whether or not the current assumed repository generation exists in the repository physically. Without this check we run the risk of writing on top of stale cached repository data. Relates #56911	2020-07-08 14:25:01 +02:00
Nhat Nguyen	ef5c397c0f	Sending operations concurrently in peer recovery (#58018 ) Today, we send operations in phase2 of peer recoveries batch by batch sequentially. Normally that's okay as we should have a fairly small of operations in phase 2 due to the file-based threshold. However, if phase1 takes a lot of time and we are actively indexing, then phase2 can have a lot of operations to replay. With this change, we will send multiple batches concurrently (defaults to 1) to reduce the recovery time. Backport of #58018	2020-07-07 22:03:31 -04:00
David Turner	46c8d00852	Remove nodes with read-only filesystems (#52680 ) (#59138 ) Today we do not allow a node to start if its filesystem is readonly, but it is possible for a filesystem to become readonly while the node is running. We don't currently have any infrastructure in place to make sure that Elasticsearch behaves well if this happens. A node that cannot write to disk may be poisonous to the rest of the cluster. With this commit we periodically verify that nodes' filesystems are writable. If a node fails these writability checks then it is removed from the cluster and prevented from re-joining until the checks start passing again. Closes #45286 Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>	2020-07-07 14:00:02 +01:00
Armin Braun	d6d6df16bb	Share IT Infrastructure between Core Snapshot and SLM ITs (#59082 ) (#59119 ) For #58994 it would be useful to be able to share test infrastructure. This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests accordingly and adds a shared and efficient (compared to the previous implementations) way of waiting for no running snapshot operations to the test infrastructure to dry things up further.	2020-07-07 12:04:41 +02:00
Dan Hermann	550dcb0ca6	[7.x] Delete data stream API accepts multiple names (#59064 )	2020-07-06 08:06:10 -05:00
Armin Braun	49857cc35d	Dry up Master Disconnect Disruption Tests (#58953 ) (#59050 ) Dry up tests that use a disruption that isolates the master from all other nodes. Also, turn disruption types that have neither parameters nor state into constants to make things a little clearer.	2020-07-06 11:04:24 +02:00
Armin Braun	071d8b2c1c	Deduplicate Empty InternalAggregations (#58386 ) (#59032 ) Working through a heap dump for an unrelated issue I found that we can easily rack up tens of MBs of duplicate empty instances in some cases. I moved to a static constructor to guard against that in all cases.	2020-07-04 14:02:16 +02:00
David Kyle	f6a0c2c59d	[7.x] Pipeline Inference Aggregation (#58965 ) Adds a pipeline aggregation that loads a model and performs inference on the input aggregation results.	2020-07-03 09:29:04 +01:00
Tim Brooks	dc9e364ff2	Count coordinating and primary bytes as write bytes (#58984 ) This is a follow-up to #57573. This commit combines coordinating and primary bytes under the same "write" bucket. Double accounting is prevented by only accounting the bytes at either the reroute phase or the primary phase. TransportBulkAction calls execute directly, so the operations handler is skipped and the bytes are not double accounted.	2020-07-02 19:48:19 -06:00
Tim Brooks	9d1bf383d0	Add test assertions to ensure write bytes released (#58970 ) This is a follow-up to #57573. This commit ensures that the bytes marked in WriteMemoryLimits are released by any test using an internal test cluster.	2020-07-02 17:38:23 -06:00
Tim Brooks	1ef2cd7f1a	Add memory tracking to queued write operations (#58957 ) Currently we do not track the memory consuming by in-process write operations. This commit adds a mechanism to track write operation memory usage.	2020-07-02 14:14:57 -06:00
Ryan Ernst	d825d4352c	Eagerly compile condition script at processor creation (#58882 ) Ingest script processors were changed to eagerly compile their scripts when the ingest pipeline is saved, but conditional scripts were missed. This commit adds eager compilation to ingest conditional scripts, which will help surface errors before runtime, as well as adds tests for each case we might encounter between inline and stored script compilation failures. closes #58864	2020-07-02 11:10:20 -07:00
Lee Hinman	d3d03fc1c6	[7.x] Add default composable templates for new indexing strategy (#57629 ) (#58757 ) Backports the following commits to 7.x: Add default composable templates for new indexing strategy (#57629)	2020-07-01 09:32:32 -06:00
Alan Woodward	3ba16e0f39	Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830 ) Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on the generic MappedFieldType. Backport of #58639	2020-07-01 14:52:14 +01:00
David Turner	822b7421ce	Forbid read-only-allow-delete block in blocks API (#58727 ) The read-only-allow-delete block is not really under the user's control since Elasticsearch adds/removes it automatically. This commit removes support for it from the new API for adding blocks to indices that was introduced in #58094.	2020-07-01 13:18:26 +01:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
David Turner	3a234d2669	Account for remaining recovery in disk allocator (#58800 ) Today the disk-based shard allocator accounts for incoming shards by subtracting the estimated size of the incoming shard from the free space on the node. This is an overly conservative estimate if the incoming shard has almost finished its recovery since in that case it is already consuming most of the disk space it needs. This change adds to the shard stats a measure of how much larger each store is expected to grow, computed from the ongoing recovery, and uses this to account for the disk usage of incoming shards more accurately. Backport of #58029 to 7.x * Picky picky * Missing type	2020-07-01 10:12:44 +01:00
Julie Tibshirani	ab65a57d70	Merge mappings for composable index templates (#58709 ) This PR implements recursive mapping merging for composable index templates. When creating an index, we perform the following: * Add each component template mapping in order, merging each one in after the last. * Merge in the index template mappings (if present). * Merge in the mappings on the index request itself (if present). Some principles: * All 'structural' changes are disallowed (but everything else is fine). An object mapper can never be changed between `type: object` and `type: nested`. A field mapper can never be changed to an object mapper, and vice versa. * Generally, each section is merged recursively. This includes `object` mappings, as well as root options like `dynamic_templates` and `meta`. Once we reach 'leaf components' like field definitions, they always overwrite an existing one instead of being merged. Relates to #53101.	2020-06-30 08:01:37 -07:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721 ) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have build jar file - jar file is now explicit input of the task and gradle will ensure its properly build * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
Yannick Welsch	b885cbff1a	Add index block api (#58716 ) Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have been completed after adding the write block. This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its documents.	2020-06-30 14:06:52 +02:00
Przemyslaw Gomulka	3923a10165	Exclude SystemV timezones from randomZone method (#58549 ) (#58655 ) RandomZone test method returns a ZoneId from the set of ids supported by java. The only difference between joda and java supported timezones are SystemV* timezones. These should be excluded from randomZone method as they would break testing. They also do not bring much confidence when used in testing as I suspect they are rarely used. That exclude should be removed for simplification once joda support is removed.	2020-06-30 12:45:53 +02:00
Armin Braun	090211f768	Fix Incorrect Snapshot Shar Status for DONE Shards in Running Snapshots (#58390 ) (#58593 ) Minor bugs/inconsistencies: If a shard hasn't changed at all we were reporting `0` for total size and total file count while it was ongoing. If a data node restarts/drops out during snapshot creation the fallback logic did not load the correct statistic from the repository but just created a status with `0` counts from the snapshot state in the CS. Added a fallback to reading from the repository in this case.	2020-06-26 16:11:30 +02:00
Nik Everett	5f52bc4c9f	Fix two scripted_metric bugs (backport of #58547 ) (#58565 ) Fixes two bugs introduced by #57627: 1. We were not properly letting go of memory from the request breaker when the aggregation finished. 2. We no longer supported totally arbitrary stuff produced by the init script because we assumed that it'd be ok to run the script once and clone its results. Sadly, cloning can't clone anything that the init script can make, like `String` arrays. This runs the init script once for every new bucket so we don't need to clone.	2020-06-25 16:16:10 -04:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512 ) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false If they want to be explicit. This is also challenging in cases where a user wants to have configure a coordinating-only node which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more details in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increases the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Tim Brooks	5efec3a517	Add error logging when http test fails (#58505 ) Netty4HttpServerTransportTests has started to fail intermittently. It seems like unexpected successful responses are being received when the test is simulating errors. This commit adds logging to the test to provide additional information when there is an unexpected success. It also adds the logging to the nio http test.	2020-06-24 11:02:20 -06:00
Alan Woodward	d251a482e9	Move MappedFieldType.similarity() to TextSearchInfo (#58439 ) Similarities only apply to a few text-based field types, but are currently set directly on the base MappedFieldType class. This commit moves similarity information into TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper. It was previously possible to include a similarity parameter on a number of field types that would then ignore this information. To make it obvious that this has no effect, setting this parameter on non-text field types now issues a deprecation warning.	2020-06-24 10:00:32 +01:00
Ryan Ernst	88f1dab8b5	Fix long/int precision for test baseport calculation	2020-06-23 16:02:13 -07:00
Ryan Ernst	6285b87b97	Adjust gradle base port by one (#58368 ) When assigning ports for internal cluster tests, we use the gradle worker id as an adjustment on the base port of 10300. In order to not go outside the max port range, we modulo the worker id by 223. Since gradle worker ids start at 1, we expect to never actually get the base port of 10300. However, as the gradle daemon lasts for longer, the module can result in a value of 0, which cases the test to fail. This commit adjusts the modulo to ensure the value is never 0. closes #58279	2020-06-23 15:42:26 -07:00
David Roberts	0d6bfd0ac3	[7.x][ML] Fix wire serialization for flush acknowledgements (#58443 ) There was a discrepancy in the implementation of flush acknowledgements: most of the class was designed on the basis that the "last finalized bucket time" could be null but the wire serialization assumed that it was never null. This works because, the C++ sends zero "last finalized bucket time" when it is not known or not relevant. But then the Java code will print that to XContent as it is assuming null represents not known or not relevant. This change corrects the discrepancies. Internally within the class null represents not known or not relevant, but this is translated from/to 0 for communications from the C++ and old nodes that have the bug. Additionally I switched from Date to Instant for this class and made the member variables final to modernise it a bit. Backport of #58413	2020-06-23 16:42:06 +01:00
Alan Woodward	8ebd341710	Add text search information to MappedFieldType (#58230 ) (#58432 ) Now that MappedFieldType no longer extends lucene's FieldType, we need to have a way of getting the index information about a field necessary for building text queries, building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo abstraction that holds this information, and a getTextSearchInfo() method to MappedFieldType to make it available. Field types that do not support text search can just return null here. This allows us to remove the MapperService.getLuceneFieldType() shim method.	2020-06-23 14:37:26 +01:00
Martijn van Groningen	7dda9934f9	Keep track of timestamp_field mapping as part of a data stream (#58400 ) Backporting #58096 to 7.x branch. Relates to #53100 * use mapping source direcly instead of using mapper service to extract the relevant mapping details * moved assertion to TimestampField class and added helper method for tests * Improved logic that inserts timestamp field mapping into an mapping. If the timestamp field path consisted out of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.	2020-06-22 17:46:38 +02:00
Nik Everett	49684463dd	Mute ESTestCaseTests#testBasePortGradle Tracked by #58279. Failed a few times a day since June 13th.	2020-06-18 15:41:25 -04:00
Stuart Tettemer	20abba8433	Scripting: Deprecate general cache settings (#55753 ) (#58283 ) Backport: ef543b0	2020-06-18 11:54:23 -06:00
Alan Woodward	4b8cf2af6a	Add serialization test for FieldMappers when include_defaults=true (#58235 ) (#58328 ) Fixes a bug in TextFieldMapper serialization when index is false, and adds a base-class test to ensure that all field mappers are tested against all variations with defaults both included and excluded. Fixes #58188	2020-06-18 15:46:04 +01:00
Alan Woodward	ca2d12d039	Remove Settings parameter from FieldMapper base class (#58237 ) This is currently used to set the indexVersionCreated parameter on FieldMapper. However, this parameter is only actually used by two implementations, and clutters the API considerably. We should just remove it, and use it directly in the implementations that require it.	2020-06-18 12:53:54 +01:00
Rene Groeschke	abc72c1a27	Unify dependency licenses task configuration (#58116 ) (#58274 ) - Remove duplicate dependency configuration - Use task avoidance api accross the build - Remove redundant licensesCheck config	2020-06-18 08:15:50 +02:00
Ryan Ernst	c57a3f3e60	Cleanup jdk repro parameters (#57838 ) The version of java printed when a test fails currently is passed in from gradle. However, we already know this from java itself, so it is not necessary. This commit changes how the runtime.java repro parameter is found, as well as removes the compiler.java parameter which is no longer relevant. closes #57756	2020-06-17 20:53:00 -07:00
Julie Tibshirani	b1161cba35	Rename SearchContext#smartNameFieldType. (#58203 ) The concept of a 'smart name' doesn't make sense now that there are no mapping types.	2020-06-17 10:38:32 -07:00
Tim Brooks	2074412d79	Retry failed replication due to transient errors (#56230 ) Currently a failed replication action will fail an entire replica. This includes when replication fails due to potentially short lived transient issues such as network distruptions or circuit breaking errors. This commit implements retries using the retryable action.	2020-06-17 10:17:30 -06:00
Stuart Tettemer	01795d1925	Revert "Scripting: Deprecate general cache settings (#55753 )" (#58201 ) This reverts commit `88e8b34fc2`.	2020-06-16 14:58:18 -06:00
Rory Hunter	03369e0980	Implement dangling indices API (#58176 ) Backport of #50920. Part of #48366. Implement an API for listing, importing and deleting dangling indices. Co-authored-by: David Turner <david.turner@elastic.co>	2020-06-16 21:50:38 +01:00
Stuart Tettemer	88e8b34fc2	Scripting: Deprecate general cache settings (#55753 ) Backport: ef543b0	2020-06-16 13:06:59 -06:00
Alan Woodward	12a3f6dfca	MappedFieldType should not extend FieldType (#58160 ) MappedFieldType is a combination of two concerns: * an extension of lucene's FieldType, defining how a field should be indexed * a set of query factory methods, defining how a field should be searched We want to break these two concerns apart. This commit is a first step to doing this, breaking the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead has a series of boolean flags defining whether or not the field is searchable or aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining how indexing should be done. Relates to #56814	2020-06-16 16:56:43 +01:00
Tal Levy	69d5e044af	Add optional description parameter to ingest processors. (#57906 ) (#58152 ) This commit adds an optional field, `description`, to all ingest processors so that users can explain the purpose of the specific processor instance. Closes #56000.	2020-06-15 19:27:57 -07:00
Ignacio Vera	1cc3c1a354	GeometryTestUtils should always generate valid polygons (#58034 ) (#58091 ) Make sure we always generate legal polygons, e.g they have area	2020-06-15 09:36:44 +02:00
Rene Groeschke	01e9126588	Remove deprecated usage of testCompile configuration (#57921 ) (#58083 ) * Remove usage of deprecated testCompile configuration * Replace testCompile usage by testImplementation * Make testImplementation non transitive by default (as we did for testCompile) * Update CONTRIBUTING about using testImplementation for test dependencies * Fail on testCompile configuration usage	2020-06-14 22:30:44 +02:00
Yannick Welsch	85b0b540f0	Fix refresh behavior in MockDiskUsagesIT (#57926 ) Ensures that InternalClusterInfoService's internally cached stats are refreshed whenever the shard size or disk usage function (to mock out disk usage) are overridden. Closes #57888	2020-06-11 17:38:12 +02:00
Dan Hermann	b501b282f8	Change default backing index naming scheme	2020-06-09 09:31:34 -05:00
Armin Braun	0987c0a5f3	Fix Broken Numeric Shard Generations in RepositoryData (#57813 ) (#57821 ) Fix broken numeric shard generations when reading them from the wire or physically from the physical repository. This should be the cheapest way to clean up broken shard generations in a BwC and safe-to-backport manner for now. We can potentially further optimize this by also not doing the checks on the generations based on the versions we see in the `RepositoryData` but I don't think it matters much since we will read `RepositoryData` from cache in almost all cases. Closes #57798	2020-06-08 18:36:56 +02:00
Mayya Sharipova	70e63a365a	Refactor how to determine if a field is metafield (#57378 ) (#57771 ) Before to determine if a field is meta-field, a static method of MapperService isMetadataField was used. This method was using an outdated static list of meta-fields. This PR instead changes this method to the instance method that is also aware of meta-fields in all registered plugins. Related #38373, #41656 Closes #24422	2020-06-08 09:16:18 -04:00
Armin Braun	619e4f8c02	Make BackgroundIndexer more Efficient (#57781 ) (#57789 ) Improve efficiency of background indexer by allowing to add an assertion for failures while they are produced to prevent queuing them up. Also, add non-blocking stop to the background indexer so that when stopping multiple indexers we don't needlessly continue indexing on some indexers while stopping another one. Closes #57766	2020-06-08 10:18:47 +02:00
Howard	76ee1aad4b	Remove unused routing for ClusterState creation utils (#57679 ) Remove some unused routing definitions from cluster state creation utils.	2020-06-04 13:59:18 -04:00
Igor Motov	8d7f389f3a	Increase search.max_buckets to 65,535 (#57042 ) Increases the default search.max_buckets limit to 65,535, and only counts buckets during reduce phase. Closes #51731	2020-06-03 15:35:41 -04:00
Nhat Nguyen	5097071230	Increase timeout for GlobalCheckpointSyncIT (#57567 ) The test failed when it was running with 4 replicas and 3 indexing threads. The recovering replicas can prevent the global checkpoint from advancing. This commit increases the timeout to 60 seconds for this suite and the check for no inflight requests. Closes #57204	2020-06-03 08:50:02 -04:00
Nik Everett	2a27c411fb	Same memory when geo aggregations are not on top (#57483 ) (#57551 ) Saves memory when the `geotile_grid` and `geohash_grid` are not on the top level by using the `LongKeyedBucketOrds` we built in #55873.	2020-06-02 16:21:50 -04:00
Mark Tozzi	e50f514092	IndexFieldData should hold the ValuesSourceType (#57373 ) (#57532 )	2020-06-02 12:16:53 -04:00
Armin Braun	ba2d70d8eb	Serialize Outbound Messages on IO Threads (#56961 ) (#57080 ) Almost every outbound message is serialized to buffers of 16k pagesize. We were serializing these messages off the IO loop (and retaining the concrete message instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the channel is ready. 1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average. 2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically. With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable. Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC. This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain). I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once. Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.	2020-06-02 16:15:18 +02:00
Andrei Dan	bd188f4a21	[7.x] ILM: add support for rolling over data streams (#57295 ) (#57515 ) As the datastream information is stored in the `ClusterState.Metadata` we exposed the `Metadata` to the `AsyncWaitStep#evaluateCondition` method in order for the steps to be able to identify when a managed index is part of a DataStream. If a managed index is part of a DataStream the rollover target is the DataStream name and the highest generation index is the write index (ie. the rolled index). (cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-02 11:55:23 +01:00
Zachary Tong	daaf5a3dcc	Fix assertion catching in aggregation supported type test (#56466 ) (#57382 ) At some point, we changed the supported-type test to also catch assertion errors. This has the side effect of also catching the `fail()` call inside the try-catch, which silently smothered some failures. This modifies the test to throw at the end of the try-catch block to prevent from accidentally catching itself. Catching the AssertionError is convenient because there are other locations that do throw an assertion in tests (due to hitting an assertion before the exception is thrown) so I think we should keep it around. Also includes a variety of fixes to other tests which were failing but being silently smothered.	2020-06-01 12:10:05 -04:00
Nik Everett	4263c25b2f	Save memory when histogram agg is not on top (backport of #57277 ) (#57377 ) This saves some memory when the `histogram` aggregation is not a top level aggregation by dropping `asMultiBucketAggregator` in favor of natively implementing multi-bucket storage in the aggregator. For the most part this just uses the `LongKeyedBucketOrds` that we built the first time we did this.	2020-05-29 15:07:37 -04:00
Benjamin Trent	15aba60c02	[7.x] Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695 ) (#57359 ) * Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695) This commit lays the ground work for plugins supplying their own circuit breakers. It adds a new interface: `CircuitBreakerPlugin`. This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers. With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.	2020-05-29 12:13:46 -04:00
Mayya Sharipova	aebb78bf5c	Run sort optimization when from+size>0 (#57250 )	2020-05-29 11:30:35 -04:00
Armin Braun	be6fa72432	Fix GCS Mock Behavior for Missing Bucket (#57283 ) (#57310 ) * Fix GCS Mock Behavior for Missing Bucket We were throwing a 500 instead of a 404 for a missing bucket. This would make yaml tests needlessly wait for multiple seconds, retrying the 500 response with backoff, in the test checking behavior for missing buckets.	2020-05-29 10:01:20 +02:00
Nhat Nguyen	5b08eaf90c	Fix trimUnsafeCommits for indices created before 6.2 (#57187 ) If an upgraded node is restarted multiple times without flushing a new index commit, then we will wrongly exclude all commits from the starting commits. This bug is reproducible with these minimal steps: (1) create an empty index on 6.1.4 with translog retention disabled, (2) upgrade the cluster to 7.7.0, (3) restart the upgraded the cluster. The problem is that with the new translog policy can trim translog without having a new index commit, while the existing commit still refers to the previous translog generation. Closes #57091	2020-05-27 15:08:49 -04:00
Alan Woodward	d6b79bcd95	Remove Mapper.updateFieldType() (#57151 ) When we had multiple mapping types, an update to a field in one type had to be propagated to the same field in all other types. This was done using the Mapper.updateFieldType() method, called at the end of a merge. However, now that we only have a single type per index, this method is unnecessary and can be removed. Relates to #41059 Backport of #56986	2020-05-27 09:21:24 +01:00
Nik Everett	0fce2b7713	Fix DateHistogramAggregatorTests.testAsSubAgg Closes #57168 by using `AggregatorTestCase#newIndexSearcher` in the `AggregatorTestCase#testCase`. Without that global ordinals will sometimes fail to work.	2020-05-26 15:05:31 -04:00
Armin Braun	5569137ae3	Flatten ReleaseableBytesReference Object Trees (#57092 ) (#57109 ) When slicing a releasable bytes reference we would create a new counter every time and pass the original reference chain to the new slice on every slice invocation. This would lead to extremely deep reference chains and needlessly uses a dedicated counter for every slice when all the slices eventually just refer to the same underlying bytes and `Releasable`. This commit tracks the ref count wrapper with its releasable in a separate object that can be passed around on every slicing, making the slices' tree as flat as the original releasable bytes reference. Also, we were needlessly creating a redundant releasable bytes reference from a releasable bytes-stream-output that we never actually used for releasing (all code that uses it just releases the stream itself instead).	2020-05-25 13:00:37 +02:00
Armin Braun	a4eb3edf46	Fix GCS Repository YAML Test Build (#57073 ) (#57101 ) A few relatively obvious issues here: * We cannot run the different IT runs (large blob setting one and normal integ run) concurrently * We need to set the dependency tasks up correctly for the large blob run so that it works in isolation * We can't use the `localAddress` for the location header of the resumable upload (this breaks in YAML tests because GCS is using a loopback port forward for the initial request and the local address will be chosen as the actual Docker container host) Closes #57026	2020-05-25 11:10:39 +02:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Tim Brooks	57c3a61535	Create HttpRequest earlier in pipeline (#56393 ) Elasticsearch requires that a HttpRequest abstraction be implemented by http modules before server processing. This abstraction controls when underlying resources are released. This commit moves this abstraction to be created immediately after content aggregation. This change will enable follow-up work including moving Cors logic into the server package and tracking bytes as they are aggregated from the network level.	2020-05-18 14:54:01 -06:00
Francisco Fernández Castaño	60c7832141	Track upload requests on S3 repositories (#56904 ) Add tracking for regular and multipart uploads. Regular uploads are categorized as PUT. Multi part uploads are categorized as POST. The number of documents created for the test #testRequestStats have been increased so all upload methods are exercised. Backport of #56826	2020-05-18 19:05:17 +02:00
Francisco Fernández Castaño	8ab9fc10c1	Track multipart/resumable uploads GCS API calls (#56892 ) Add tracking for multipart and resumable uploads for GoogleCloudStorage. For resumable uploads only the last request is taken into account for billing, so that's the only request that's tracked. Backport of #56821	2020-05-18 13:39:26 +02:00
Ryan Ernst	9fb80d3827	Move publishing configuration to a separate plugin (#56727 ) This is another part of the breakup of the massive BuildPlugin. This PR moves the code for configuring publications to a separate plugin. Most of the time these publications are jar files, but this also supports the zip publication we have for integ tests.	2020-05-14 20:23:07 -07:00
Francisco Fernández Castaño	97bf47f5b9	Track GET/LIST GoogleCloudStorage API calls (#56758 ) Backporting #56585 to 7.x branch. Adds tracking for the API calls performed by the GoogleCloudStorage underlying SDK. It hooks an HttpResponseInterceptor to the SDK transport layer and does http request filtering based on the URI paths that we are interested to track. Unfortunately we cannot hook a wrapper into the ServiceRPC interface since we're using different levels of abstraction to implement retries during reads (GoogleCloudStorageRetryingInputStream).	2020-05-14 14:03:21 +02:00
Nik Everett	b98b260048	Merge significant_terms into the terms package (backport of #56699 ) (#56715 ) This merges the code for the `significant_terms` agg into the package for the code for the `terms` agg. They are super entangled already, this mostly just admits that to ourselves. Precondition for the terms work in #56487	2020-05-13 17:36:21 -04:00
Ignacio Vera	b4521d5183	upgrade to Lucene 8.6.0 snapshot (#56661 )	2020-05-13 14:25:16 +02:00
Henning Andersen	48a8c7eb88	Ensure search contexts are removed on index delete (#56335 ) (#56617 ) In a race condition, a search context could remain enlisted in SearchService when an index is deleted, potentially causing the index folder to not be cleaned up (for either lengthy searches or scrolls with timeouts > 30 minutes or if the scroll is kept active).	2020-05-13 09:41:02 +02:00
Armin Braun	0a879b95d1	Save Bounds Checks in BytesReference (#56577 ) (#56621 ) Two spots that allow for some optimization: * We are often creating a composite reference of just a single item in the transport layer => special cased via static constructor to make sure we never do that * Also removed the pointless case of an empty composite bytes ref * `ByteBufferReference` is practically always created from a heap buffer these days so there is no point of dealing with all the bounds checks and extra references to sliced buffers from that and we can just use the underlying array directly	2020-05-12 20:33:45 +02:00
Nik Everett	c85a363b60	Fix nested agg test I accidentally allowed the test framework to double-wrap a reader that we rely on being only singly wrapped. Lame. closes #56529	2020-05-11 16:15:25 -04:00
Martijn van Groningen	32471abc0e	Check whether data stream feature flag is enabled before deleting all data streams, (#56517 ) (#56520 ) this will fix the release build.	2020-05-11 18:34:50 +02:00
Rene Groeschke	c29bc87040	Move bwcVersions extension property to BuildParams (back port) (#56381 ) * Move bwcVersions extension property to BuildParams (#56206) * Fix :qa Task Using Broken BwC Versions Resolution (#56332) Co-authored-by: Armin Braun <me@obrown.io>	2020-05-11 09:39:13 +02:00
Nik Everett	2f38aeb5e2	Save memory when numeric terms agg is not top (#55873 ) (#56454 ) Right now all implementations of the `terms` agg allocate a new `Aggregator` per bucket. This uses a bunch of memory. Exactly how much isn't clear but each `Aggregator` ends up making its own objects to read doc values which have non-trivial buffers. And it forces all of it sub-aggregations to do the same. We allocate a new `Aggregator` per bucket for two reasons: 1. We didn't have an appropriate data structure to track the sub-ordinals of each parent bucket. 2. You can only make a single call to `runDeferredCollections(long...)` per `Aggregator` which was the only way to delay collection of sub-aggregations. This change switches the method that builds aggregation results from building them one at a time to building all of the results for the entire aggregator at the same time. It also adds a fairly simplistic data structure to track the sub-ordinals for `long`-keyed buckets. It uses both of those to power numeric `terms` aggregations and removes the per-bucket allocation of their `Aggregator`. This fairly substantially reduces memory consumption of numeric `terms` aggregations that are not the "top level", especially when those aggregations contain many sub-aggregations. It also is a pretty big speed up, especially when the aggregation is under a non-selective aggregation like the `date_histogram`. I picked numeric `terms` aggregations because those have the simplest implementation. At least, I could kind of fit it in my head. And I haven't fully understood the "bytes"-based terms aggregations, but I imagine I'll be able to make similar optimizations to them in follow up changes.	2020-05-08 20:38:53 -04:00
Martijn van Groningen	83739b5806	Backport: allow cluster health api to resolve data streams (#56425 ) Backport of: #56413 Allow cluster health api to resolve data streams and automatically remove data streams after each test in test cases extending from `ESIntegTestCase` Relates to #53100	2020-05-08 17:16:25 +02:00
Julie Tibshirani	e852bb29b7	Simplify signature of FieldMapper#parseCreateField. (#56144 ) `FieldMapper#parseCreateField` accepts the parse context, plus a list of fields as an output parameter. These fields are immediately added to the document through `ParseContext#doc()`. This commit simplifies the signature by removing the list of fields, and having the mappers add the fields directly to `ParseContext#doc()`. I think this is nicer for implementors, because previously fields could be added either through the list, or the context (through `add`, `addWithKey`, etc.)	2020-05-06 11:12:09 -07:00
Tanguy Leroux	131a3911eb	Replace BlobContainerWrapper by FilterBlobContainer (#56200 ) A FilterBlobContainer class was introduced in #55952 and it delegates its behavior to a given BlobContainer while allowing to override only necessary methods. This commit replaces the existing BlobContainerWrapper class from the test framework with the new FilterBlobContainer from core.	2020-05-06 10:05:43 +02:00
Tanguy Leroux	35622747fd	Add Minio tests for searchable snapshots (#56112 ) (#56179 ) This commit adds QA tests for searchable snapshot on MinIO, similarly to what already exist for S3, GCS and Azure.	2020-05-05 11:40:06 +02:00
Martijn van Groningen	2ac32db607	Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151 ) Backport of #56034. Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context as a dedicated field that callers to IndexNameExpressionResolver can set. Also alter indices stats api to support data streams. The rollover api uses this api and otherwise rolling over data stream does no longer work. Relates to #53100	2020-05-04 22:38:33 +02:00
Armin Braun	e01b999ef0	Add Functionality to Consistently Read RepositoryData For CS Updates (#55773 ) (#56091 ) Using optimistic locking, add the ability to run a repository state update task with a consistent view of the current repository data. Allows for a follow-up to remove the snapshot INIT state.	2020-05-04 08:13:14 +02:00
Armin Braun	3a64ecb6bf	Allow Deleting Multiple Snapshots at Once (#55474 ) (#56083 ) * Allow Deleting Multiple Snapshots at Once (#55474) Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise. This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.	2020-05-03 20:30:58 +02:00
Tim Brooks	54dbea6c65	Improve RemoteConnectionManager consistency (#55759 ) In order to iterate through remote connections, the remote connection manager maintains a local cache of connected nodes. Unfortunately this is difficult in relationship with testing as it is inherently racy in comparison to the parent connection manager map of connections. This commit improves the relationship by only returning a cached connection if it is still registered with the parent. If the connection is not open, we will go to the slow path of allocating a iterator directly from the parent.	2020-04-30 12:13:06 -06:00
David Turner	445cf32591	Stop exposing ExecutorService from DeterministicTaskQueue (#56001 ) There are no real users of `DeterministicTaskQueue#getExecutorService()` so we can remove those public methods and expose the `ExecutorService` only through the corresponding `ThreadPool`.	2020-04-30 11:34:52 +01:00
Armin Braun	31a84b17ad	Make SAME Pool on DeterministicTaskQueue more Realistic (#55931 ) (#55999 ) By forking off the `SAME` pool tasks and executing them in random order, we are actually creating unrealisticc scenarios and missing the actual order of operations (whatever task that puts the task on the `SAME` queue will always run before the `SAME` queued task will be executed currently). Also, added caching for the executors. It doesn't matter much, but saves some objects and makes debugging a little easier because executor object ids make more sense.	2020-04-30 10:41:33 +02:00
Christos Soulios	43dab77186	[7.x] Modified searchAndReduce() to return empty agg when no docs exist (#55967 ) Backports #55826 to 7.x Modified AggregatorTestCase.searchAndReduce() method so that it returns an empty aggregation result when no documents have been inserted. Also refactored several aggregation tests so they do not re-implement method AggregatorTestCase.testCase() Fixes #55824	2020-04-30 00:28:32 +03:00
Tim Brooks	cd228095df	Retry failed peer recovery due to transient errors (#55883 ) Currently a failed peer recovery action will fail an recovery. This includes when the recovery fails due to potentially short lived transient issues such as rejected exceptions or circuit breaking errors. This commit adds the concept of a retryable action. A retryable action will be retryed in face of certain errors. The action will be retried after an exponentially increasing backoff period. After defined time, the action will timeout. This commit only implements retries for responses that indicate the target node has NOT executed the action.	2020-04-28 13:52:49 -06:00
Lee Hinman	777caf0725	[7.x] Add support for V2 index templates to /_cat/templates (#55829 ) (#55866 ) Backports the following commits to 7.x: - Add support for V2 index templates to /_cat/templates (#55829)	2020-04-28 10:14:19 -06:00
Tim Brooks	80662f31a1	Introduce mechanism to stub request handling (#55832 ) Currently there is a clear mechanism to stub sending a request through the transport. However, this is limited to testing exceptions on the sender side. This commit reworks our transport related testing infrastructure to allow stubbing request handling on the receiving side.	2020-04-27 16:57:15 -06:00
Mark Tozzi	22a98ec279	Aggregation support for Value Scripts that change types (#54830 ) (#55752 )	2020-04-27 09:57:05 -04:00
Mark Tozzi	87b4979c24	[7.x] Make ValuesSourceRegistry immutable after initilization #55493 (#55697 )	2020-04-24 13:33:38 -04:00
Zachary Tong	715c90bf7d	Aggs must specify a `field` or `script` (or both) (#52226 ) This adds a validation to VSParserHelper to ensure that a field or script or both are specified by the user. This is technically required today already, but throws an exception much deeper in the agg framework and has a very unintuitive error for the user (as well as eating more resources instead of failing early)	2020-04-23 19:23:41 -04:00
Zachary Tong	4f483ac370	Fix half-float range in SupportedTypeTests (#55409 ) Also adds a comment to the half-float number field type tests indicating why 70000 is used instead of 65504	2020-04-23 11:36:37 -04:00
Tal Levy	0844455505	Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037 ) (#55500 ) After #53562, the `geo_shape` field mapper is registered within a module. This opens the door for introducing a new `geo_shape` field mapper into the Spatial Plugin that has doc-values support. This is very much an extension of server's GeoShapeFieldMapper, but with the addition of the doc values implementation.	2020-04-22 08:12:54 -07:00
Armin Braun	db7eb8e8ff	Remove Redundant CS Update on Snapshot Finalization (#55276 ) (#55528 ) This change folds the removal of the in-progress snapshot entry into setting the safe repository generation. Outside of removing an unnecessary cluster state update, this also has the advantage of removing a somewhat inconsistent cluster state where the safe repository generation points at `RepositoryData` that contains a finished snapshot while it is still in-progress in the cluster state, making it easier to reason about the state machine of upcoming concurrent snapshot operations.	2020-04-21 15:33:17 +02:00
Dan Hermann	402b6b1715	Identify backing indices for data streams	2020-04-21 07:43:10 -05:00
Yannick Welsch	ba39c261e8	Use streaming reads for GCS (#55506 ) To read from GCS repositories we're currently using Google SDK's official BlobReadChannel, which issues a new request every 2MB (default chunk size for BlobReadChannel) using range requests, and fully downloads the chunk before exposing it to the returned InputStream. This means that the SDK issues an awfully high number of requests to download large blobs. Increasing the chunk size is not an option, as that will mean that an awfully high amount of heap memory will be consumed by the download process. The Google SDK does not provide the right abstractions for a streaming download. This PR uses the lower-level primitives of the SDK to implement a streaming download, similar to what S3's SDK does. Also closes #55505	2020-04-21 13:22:26 +02:00
Nhat Nguyen	3cc4e0dd09	Retry follow task when remote connection queue full (#55314 ) If more than 100 shard-follow tasks are trying to connect to the remote cluster, then some of them will abort with "connect listener queue is full". This is because we retry on ESRejectedExecutionException, but not on RejectedExecutionException.	2020-04-20 22:43:05 -04:00
Stuart Tettemer	93a2e9b0f9	Test: MockScoreScript can be cacheable. (#55499 ) Backport: 0ed1eb5	2020-04-20 17:09:58 -06:00
Armin Braun	a0763d958d	Make RepositoryData Less Memory Heavy (#55293 ) (#55468 ) We don't really need `LinkedHashSet` here. We can assume that all the entries are unique and just use a list and use the list utilities to create the cheapest possible version of the list. Also, this fixes a bug in `addSnapshot` which would mutate the existing linked hash set on the current instance (fortunately this never caused a real world bug) and brings the collection in line with the java docs on its getter that claim immutability.	2020-04-20 18:28:06 +02:00
Dan Hermann	dc703d75f5	Add explicit generation attribute to data streams	2020-04-20 07:40:33 -05:00
Yannick Welsch	b9da307cd1	Add GCS support for searchable snapshots (#55403 ) Adds ranged read support for GCS repositories in order to enable searchable snapshot support for GCS. As part of this PR, I've extracted some of the test infrastructure to make sure that GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests are covering similar test (as I saw those diverging in what they cover)	2020-04-20 13:02:59 +02:00
David Turner	0458770556	Rebooted master-ineligibles should not bootstrap (#55302 ) In #55298 we saw a failure of `CoordinationStateTests#testSafety` in which a single master-eligible node is bootstrapped, then rebooted as a master-ineligible node (losing its persistent state) and then rebooted as a master-eligible node and bootstrapped again. This happens because this test loses too much of the persistent state; in fact once bootstrapped the node would not allow itself to be bootstrapped again. This commit adjusts the test logic to reflect this. Closes #55298	2020-04-20 09:10:35 +01:00
Zachary Tong	f46b567563	Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250 ) Some aggregations, such as the Terms* family, will use an alternate class to represent unmapped shard results (while the rest of the aggs use the same object but with some form of "empty" or "nullish" values to represent unmapped). This was problematic with AbstractWireSerializingTestCase because it expects the instanceReader to always match the original class. Instead, we need to use the NamedWriteable version so that the registry can be consulted for the proper deserialization reader.	2020-04-17 16:39:38 -04:00
Rory Hunter	a5b545b2a0	Use LTS version of Ubuntu in Dockerfiles (#55370 ) We have some Dockerfiles that reference Ubuntu 19.04, which is not an LTS version and has now appears to have been retired from the Ubuntu repositories. Switch to 18.04, which is the current long-term support version. This also requires a switch from OpenJDK 12 to 11. Also change a usage of 16.04 to 18.04, for consistency.	2020-04-17 16:14:14 -04:00
Tanguy Leroux	71855fbfe0	Mute testSupportedFieldTypes in HDRPreAggregatedPercentile tests (#55369 ) Relates #55360	2020-04-17 10:49:43 +02:00
Martijn van Groningen	417d5f2009	Make data streams in APIs resolvable. (#55337 ) Backport from: #54726 The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used. In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs. Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex(). If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api). Relates to #53100	2020-04-17 08:33:37 +02:00
Mark Tozzi	22c55180c1	[7.x] Backport ValuesSourceRegistry and related work (#54922 ) * Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. * More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport it's fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>	2020-04-16 16:54:46 -04:00
Rory Hunter	49f8f66a41	Revert "Use LTS version of Ubuntu in Dockerfiles (#55327 )" This reverts commit `dd76fbac60`.	2020-04-16 20:05:22 +01:00
Rory Hunter	dd76fbac60	Use LTS version of Ubuntu in Dockerfiles (#55327 ) We have some Dockerfiles that reference Ubuntu 19.04, which is not an LTS version and has now appears to have been retired from the Ubuntu repositories. Switch to 18.04, which is the current long-term support version. Also change a usage of 16.04 to 18.04, for consistency.	2020-04-16 19:47:18 +01:00
Christos Soulios	b810f0024a	[7.x] Backport AggregatorTestCase.writeTestDoc() (#55318 )	2020-04-16 21:10:18 +03:00
David Turner	8a565c4fa6	Voting config exclusions should work with absent nodes (#55291 ) Today the voting config exclusions API accepts node filters and resolves them to a collection of node IDs against the current cluster membership. This is problematic since we may want to exclude nodes that are not currently members of the cluster. For instance: - if attempting to remove a flaky node from the cluster you cannot reliably exclude it from the voting configuration since it may not reliably be a member of the cluster - if `cluster.auto_shrink_voting_configuration: false` then naively shrinking the cluster will remove some nodes but will leaving their node IDs in the voting configuration. The only way to clean up the voting configuration is to grow the cluster back to its original size (potentially replacing some of the voting configuration) and then use the exclusions API. This commit adds an alternative API that accepts node names and node IDs but not node filters in general, and deprecates the current node-filters-based API. Relates #47990. Backport of #50836 to 7.x. Co-authored-by: zacharymorn <zacharymorn@gmail.com>	2020-04-16 12:28:50 +01:00
William Brafford	2ba3be9db6	Remove deprecated third-party methods from tests (#55255 ) (#55269 ) I've noticed that a lot of our tests are using deprecated static methods from the Hamcrest matchers. While this is not a big deal in any objective sense, it seems like a small good thing to reduce compilation warnings and be ready for a new release of the matcher library if we need to upgrade. I've also switched a few other methods in tests that have drop-in replacements.	2020-04-15 17:54:47 -04:00
Ryan Ernst	29b70733ae	Use task avoidance with forbidden apis (#55034 ) Currently forbidden apis accounts for 800+ tasks in the build. These tasks are aggressively created by the plugin. In forbidden apis 3.0, we will get task avoidance (https://github.com/policeman-tools/forbidden-apis/pull/162), but we need to ourselves use the same task avoidance mechanisms to not trigger these task creations. This commit does that for our foribdden apis usages, in preparation for upgrading to 3.0 when it is released.	2020-04-15 13:27:53 -07:00
Armin Braun	2f91e2aab7	Fix Race in Snapshot Abort (#54873 ) (#55233 ) We can be a little more efficient when aborting a snapshot. Since we know the new repository data after finalizing the aborted snapshot when can pass it down to the snapshot completion listeners. This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.	2020-04-15 15:42:15 +02:00
Dan Hermann	30638a0b41	[7.x] Wipe data streams in each REST test (#55009 )	2020-04-15 07:27:39 -05:00
Mark Vieira	ce85063653	[7.x] Re-add origin url information to publish POM files (#55173 )	2020-04-14 13:24:15 -07:00
Yannick Welsch	a610513ec7	Provide repository-level stats for searchable snapshots (#55051 ) Provides basic repository-level stats that will allow us to get some insight into how many requests are actually being made by the underlying SDK. Currently only tracks GET and LIST calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint that will help us better understand some of the low-level dynamics of searchable snapshots.	2020-04-14 14:34:08 +02:00
William Brafford	52bebec51f	NodeInfo response should use a collection rather than fields (#54460 ) (#55132 ) This is a first cut at giving NodeInfo the ability to carry a flexible list of heterogeneous info responses. The trick is to be able to serialize and deserialize an arbitrary list of blocks of information. It is convenient to be able to deserialize into usable Java objects so that we can aggregate nodes stats for the cluster stats endpoint. In order to provide a little bit of clarity about which objects can and can't be used as info blocks, I've introduced a new interface called "ReportingService." I have removed the hard-coded getters (e.g., getOs()) in favor of a flexible method that can return heterogeneous kinds of info blocks (e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the need to cast in the client code.	2020-04-13 17:18:39 -04:00
Nik Everett	c00811f3a3	Make some agg tests easier to read (#54954 ) (#55079 ) We added a fancy method to provide random realistic test data to the reduction tests in #54910. This uses that to remove some of the more esoteric machinations in the agg tests. This will marginally increase the coverage of the serialiation tests and, more importantly, remove some mysterious value generation code that only really made sense for random reduction tests but was used all over the place. It doesn't, on the other hand, make the tests shorter. Just hopefully more clear. I only cleaned up a few tests this way. If we like this it'd probably be worth grabbing others.	2020-04-10 14:15:30 -04:00
Martijn van Groningen	7f38b146b3	Temporarily preserve data streams after each yaml rest test has executed. (#54959 ) (#55007 ) Instead delete the data streams manually, until client yaml test runners have been updated to also delete all data streams after each yaml test. Relates to #53100	2020-04-09 14:44:57 +02:00
Tal Levy	254d1e3543	[7.x] Create new `geo` module and migrate geo_shape registration (#53562 ) (#54924 ) This commit introduces a new `geo` module that is intended to be contain all the geo-spatial-specific features in server. As a first step, the responsibility of registering the geo_shape field mapper is moved to this module. Co-authored-by: Nicholas Knize <nknize@gmail.com>	2020-04-07 16:30:58 -07:00
Tim Brooks	619028c33e	Implement transport circuit breaking in aggregator (#54927 ) This commit moves the action name validation and circuit breaking into the InboundAggregator. This work is valuable because it lays the groundwork for incrementally circuit breaking as data is received. This PR includes the follow behavioral change: Handshakes contribute to circuit breaking, but cannot be broken. They currently do not contribute nor are they broken.	2020-04-07 17:10:31 -06:00
Tim Brooks	c7053ef824	Use TransportChannel in TransportHandshaker (#54921 ) Currently the TransportHandshaker has a specialized codepath for sending a response. In other work, we are going to start having handshakes contribute to circuit breaking (while not being breakable). This commit moves in that direction by allowing the handshaker to responding using a standard TcpTransportChannel similar to other requests.	2020-04-07 15:37:15 -06:00
Nik Everett	ce7ae4a7d1	Remove pipline aggs from agg result tree (backport of #54716 ) (#54920 ) This removes pipeline aggregators from the aggregation result tree except for a single field used for backwards compatibility with pre-7.8 versions of Elasticsearch. That field isn't populated unless we are serializing to pre-7.8 Elasticsearch. So, good news! We no longer build pipeline aggregators on the data node. Most of the time.	2020-04-07 17:22:23 -04:00
Nik Everett	100f7258c7	Improve agg reduce tests (#54910 ) (#54914 ) This allows subclasses of `InternalAggregationTestCase` to make a `List` of values to reduce so that it can make values that are realistic together. The first use of this is with `InternalTTest` which uses it to make results that don't cause their `sum` field to wrap. It'd likely be useful for a ton of other aggs but just one for now.	2020-04-07 17:22:04 -04:00
Tim Brooks	9cf2406cf1	Move network stats marking into InboundPipeline (#54908 ) This is a follow-up to #48263. It moves the inbound stats tracking inside of the InboundPipeline.	2020-04-07 13:34:05 -06:00
Nik Everett	3c56e0de42	Fix scripted metric in ccs (backport of #54776 ) (#54888 ) `scripted_metric` did not work with cross cluster search because it assumed that you'd never perform a partial reduction, serialize the results, and then perform a final reduction. That serialized-after-partial-reduction step was broken. This is also required to support #54758.	2020-04-07 10:43:00 -04:00
Tanguy Leroux	4d36917e52	Merge feature/searchable-snapshots branch into 7.x (#54803 ) (#54825 ) This is a backport of #54803 for 7.x. This pull request cherry picks the squashed commit from #54803 with the additional commits: 6f50c92 which adjusts master code to 7.x a114549 to mute a failing ILM test (#54818) 48cbca1 and 50186b2 that cleans up and fixes the previous test aae12bb that adds a missing feature flag (#54861) 6f330e3 that adds missing serialization bits (#54864) bf72c02 that adjust the version in YAML tests a51955f that adds some plumbing for the transport client used in integration tests Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Yannick Welsch <yannick@welsch.lu> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: Andrei Dan <andrei.dan@elastic.co>	2020-04-07 13:28:53 +02:00
Costin Leau	a7c31a7632	Test: Include ML ILM policy in EsRestTest (#54773 ) This should avoid REST failures caused by the inability to delete said policy Fix #54759 (cherry picked from commit 3ba5e02b713c03b1bdf14e0367a2bce68c35dd30)	2020-04-05 20:10:17 +03:00
Jason Tedor	05c5529b2d	Clean up a few instances of "MetaData" We recently cleaned up the use of the word "metadata" across the codebase. A few additional uses have trickled in, likely from in-progress work. This commit cleans up these last few instances. Relates #54519	2020-04-04 10:55:09 -04:00
Christoph Büscher	8c9ac14a98	Rename field name constants in AbstractBuilderTestCase (#53234 ) Some field name constants were not updaten when we moved from "string" to "text" and "keyword" fields. Renaming them makes it easier and faster to know which field type is used in test subclassing this base test case.	2020-04-03 17:28:22 +02:00
Nik Everett	54ea4f4f50	Begin to drop pipeline aggs from the result tree (backport of #54311 ) (#54659 ) Removes pipeline aggregations from the aggregation result tree as they are no longer used. This stops us from building the pipeline aggregators at all on data nodes except for backwards compatibility serialization. This will save a tiny bit of space in the aggregation tree which is lovely, but the biggest benefit is that it is a step towards simplifying pipeline aggregators. This only does about half of the work to remove the pipeline aggs from the tree. Removing all of it would, well, double the size of the change and make it harder to review.	2020-04-02 16:45:12 -04:00
Nik Everett	a5adac0d1e	Fix pipeline agg serialization for ccs (backport of #54282 ) (#54468 ) This fixes pipeline aggregations used in cross cluster search from an older version of Elasticsearch to a newer version of Elasticsearch. I broke this in #53730 when I was too aggressive in shutting off serialization of pipeline aggs. In particular, this comes up when the coordinating node is pre-7.8.0 and the gateway node is on or after 7.8.0. The fix is another step down the line to remove pipeline aggregators from the aggregation tree. Sort of. It create a new `List<PipelineAggregator>` member in `InternalAggregation` but it is only used for bwc serialization and it is fed by the mechanism established in #53730 to read the pipelines from the	2020-04-02 10:35:40 -04:00
Andy Bristol	eb14635f1f	add tests to StatsAggregatorTests (#53768 ) Adds tests for supported ValuesSourceTypes, unmapped fields, scripting, and the missing param. The tests for unmapped fields and scripting are migrated from the StatsIT integration test	2020-04-01 17:07:51 -07:00
William Brafford	958e9d1b78	Refactor nodes stats request builders to match requests (#54363 ) (#54604 ) * Refactor nodes stats request builders to match requests (#54363) * Remove hard-coded setters from NodesInfoRequestBuilder * Remove hard-coded setters from NodesStatsRequest * Use static imports to reduce clutter * Remove uses of old info APIs	2020-04-01 17:03:04 -04:00
Ioannis Kakavas	c9ffa379ba	[7.x] Add end to end QA authentication test (#54215 ) (#54567 ) Use the same ES cluster as both an SP and an IDP and perform IDP initiated and SP initiated SSO. The REST client plays the role of both the Cloud UI and Kibana in these flows Backport of #54215 * fix compilation issues	2020-04-01 18:35:21 +03:00
David Turner	6d976e1468	Resolve some coordination-layer TODOs (#54511 ) This commit removes a handful of TODO comments in the cluster coordination layer that no longer apply. Relates #32006	2020-04-01 12:36:18 +01:00
Lee Hinman	987b6ecffa	[7.x] Be lenient when deleting index and component templates a… (#54538 ) We previously checked for a 405 response, but on 7.x BWC tests may be hitting versions that don't support the 405 response (returning 400 or 500), so be more lenient in those cases. Relates to #54513	2020-03-31 16:24:05 -06:00
Jason Tedor	63e5f2b765	Rename META_DATA to METADATA This is a follow up to a previous commit that renamed MetaData to Metadata in all of the places. In that commit in master, we renamed META_DATA to METADATA, but lost this on the backport. This commit addresses that.	2020-03-31 17:30:51 -04:00
Jason Tedor	5fcda57b37	Rename MetaData to Metadata in all of the places (#54519 ) This is a simple naming change PR, to fix the fact that "metadata" is a single English word, and for too long we have not followed general naming conventions for it. We are also not consistent about it, for example, METADATA instead of META_DATA if we were trying to be consistent with MetaData (although METADATA is correct when considered in the context of "metadata"). This was a simple find and replace across the code base, only taking a few minutes to fix this naming issue forever.	2020-03-31 17:24:38 -04:00
Zachary Tong	c9db2de41d	[7.x] Comprehensively test supported/unsupported field type:agg combinations (#54451 ) * Comprehensively test supported/unsupported field type:agg combinations (#52493) This adds a test to AggregatorTestCase that allows us to programmatically verify that an aggregator supports or does not support a particular field type. It fetches the list of registered field type parsers, creates a MappedFieldType from the parser and then attempts to run a basic agg against the field. A supplied list of supported VSTypes are then compared against the output (success or exception) and suceeds or fails the test accordingly. Co-Authored-By: Mark Tozzi <mark.tozzi@gmail.com> * Skip fields that are not aggregatable * Use newIndexSearcher() to avoid incompatible readers (#52723) Lucene's `newSearcher()` can generate readers like ParallelCompositeReader which we can't use. We need to instead use our helper `newIndexSearcher`	2020-03-31 14:35:03 -04:00
Lee Hinman	a3d1945254	[7.x] Add warnings/errors when V2 templates would match same i… (#54449 ) * Add warnings/errors when V2 templates would match same indices… (#54367) * Add warnings/errors when V2 templates would match same indices as V1 With the introduction of V2 index templates, we want to warn users that templates they put in place might not take precedence (because v2 templates are going to "win"). This adds this validation at `PUT` time for both V1 and V2 templates with the following rules: ** When creating or updating a V2 template - If the v2 template would match indices for an existing v1 template or templates, provide a warning (through the deprecation logging so it shows up to the client) as well as logging the warning The v2 warning looks like: ``` index template [my-v2-template] has index patterns [foo-] matching patterns from existing older templates [old-v1-template,match-all-template] with patterns (old-v1-template => [foo],match-all-template => []); this template [my-v2-template] will take precedence during new index creation ``` * When creating a V1 template - If the v1 template is for index patterns of `""` and a v2 template exists, warn that the v2 template may take precedence - If the v1 template is for index patterns other than all indices, and a v2 template exists that would match, throw an error preventing creation of the v1 template * When updating a V1 template (without changing its existing `index_patterns`!) - If the v1 template is for index patterns that would match an existing v2 template, warn that the v2 template may take precedence. The v1 warning looks like: ``` template [my-v1-template] has index patterns [] matching patterns from existing index templates [existing-v2-template] with patterns (existing-v2-template => [foo]); this template [my-v1-template] may be ignored in favor of an index template at index creation time ``` And the v1 error looks like: ``` template [my-v1-template] has index patterns [foo] matching patterns from existing index templates [existing-v2-template] with patterns (existing-v2-template => [f]), use index templates (/_index_template) instead ``` Relates to #53101 * Remove v2 index and component templates when cleaning up tests * Finish half-finished comment sentence * Guard template removal and ignore for earlier versions of ES Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> * Also ignore 500 errors when clearing index template v2 templates Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-03-30 13:25:50 -06:00
Tim Brooks	2ccddbfa88	Move transport decoding and aggregation to server (#54360 ) Currently all of our transport protocol decoding and aggregation occurs in the individual transport modules. This means that each implementation (test, netty, nio) must implement this logic. Additionally, it means that the entire message has been read from the network before the server package receives it. This commit creates a pipeline in server which can be passed arbitrary bytes to handle. Internally, the pipeline will decode, decompress, and aggregate the messages. Additionally, this allows us to run many megabytes of bytes through the pipeline in tests to ensure that the logic works. This work will enable future work: Circuit breaking or backoff logic based on message type and byte in the content aggregator. Sharing bytes with the application layer using the ref counted releasable network bytes. Improved network monitoring based specifically on channels. Finally, this fixes the bug where we do not circuit break on the correct message size when compression is enabled.	2020-03-27 14:13:10 -06:00
Stuart Tettemer	1630de4a42	Scripting: stats per context in nodes stats (#54008 ) (#54357 ) Adds script cache stats to `_node/stats`. If using the general cache: ``` "script_cache": { "sum": { "compilations": 12, "cache_evictions": 9, "compilation_limit_triggered": 5 } } ``` If using context caches: ``` "script_cache": { "sum": { "compilations": 13, "cache_evictions": 9, "compilation_limit_triggered": 5 }, "contexts": [ { "context": "aggregation_selector", "compilations": 8, "cache_evictions": 6, "compilation_limit_triggered": 3 }, { "context": "aggs", "compilations": 5, "cache_evictions": 3, "compilation_limit_triggered": 2 }, ``` Backport of: 32f46f2 Refs: #50152	2020-03-27 12:26:00 -06:00
Yannick Welsch	8176f1c7f8	Delegate toXContent logic from ClusterState to its member classes (#54192 ) Backport of #52743 Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2020-03-26 15:08:30 +01:00
Yannick Welsch	1ba6783780	Schedule commands in current thread context (#54187 ) Changes ThreadPool's schedule method to run the schedule task in the context of the thread that scheduled the task. This is the more sensible default for this method, and eliminates a range of bugs where the current thread context is mistakenly dropped. Closes #17143	2020-03-26 10:07:59 +01:00
Nik Everett	8f40f1435a	Save a little space in agg tree (backport of #53730 ) (#54213 ) This drop the "top level" pipeline aggregators from the aggregation result tree which should save a little memory and a few serialization bytes. Perhaps more imporantly, this provides a mechanism by which we can remove all pipelines from the aggregation result tree. This will save quite a bit of space when pipelines are deep in the tree. Sadly, doing this isn't simple because of backwards compatibility. Nodes before 7.7.0 need those pipelines. We provide them by setting passing a `Supplier<PipelineTree>` into the root of the aggregation tree that we only call if we need to serialize to a version before 7.7.0. This solution works for cross cluster search because we always reduce the aggregations in each remote cluster and then forward them back to the coordinating node. Its quite possible that the coordinating node needs the pipeline (say it is version 7.1.0) and the gateway node in the remote cluster doesn't (version 7.7.0). In that case the data nodes won't send the pipeline aggregations back to the gateway node. Critically, the gateway node will send the pipeline aggregations back to the coordinating node. This is all managed with that `Supplier<PipelineTree>`, but how it is managed is a bit tricky.	2020-03-25 15:51:16 -04:00
James Baiera	b84c74cf70	Update the HDFS version used by HDFS Repo (#53693 ) (#54125 )	2020-03-25 14:01:29 -04:00
Armin Braun	4271963462	Revert "Use Azure Bulk Deletes in Azure Repository (#53919 )" (#54089 ) (#54111 ) This reverts commit 23cccf088810b8416ed278571352393cc2de9523. Unfortunately SAS token auth still doesn't work with bulk deletes so we can't use them yet. Closes #54080	2020-03-25 12:13:25 +01:00
Yannick Welsch	e006d1f6cf	Use special XContent registry for node tool (#54050 ) Fixes an issue where the elasticsearch-node command-line tools would not work correctly because PersistentTasksCustomMetaData contains named XContent from plugins. This PR makes it so that the parsing for all custom metadata is skipped, even if the core system would know how to handle it. Closes #53549	2020-03-24 17:40:51 +01:00
Tanguy Leroux	dea8a31480	Wait for Active license before running CCR API tests (#53966 ) DocsClientYamlTestSuiteIT sometimes fails for CCR related tests because tests are started before the license is fully applied and active within the cluster. The first tests to be executed then fails with the error noticed in #53430. This can be easily reproduced locally by only running CCR docs tests. This commit adds some @Before logic in DocsClientYamlTestSuiteIT so that it waits for the license to be active before running CCR tests. Closes #53430	2020-03-24 14:29:45 +01:00
Nik Everett	b9bfba2c8b	Move pipeline agg validation to coordinating node (backport of #53669 ) (#54019 ) This moves the pipeline aggregation validation from the data node to the coordinating node so that we, eventually, can stop sending pipeline aggregations to the data nodes entirely. In fact, it moves it into the "request validation" stage so multiple errors can be accumulated and sent back to the requester for the entire request. We can't always take advantage of that, but it'll be nice for folks not to have to play whack-a-mole with validation. This is implemented by replacing `PipelineAggretionBuilder#validate` with: ``` protected abstract void validate(ValidationContext context); ``` The `ValidationContext` handles the accumulation of validation failures, provides access to the aggregation's siblings, and implements a few validation utility methods.	2020-03-23 17:22:56 -04:00
Marios Trivyzas	3a3e964956	Reduce performance impact of ExitableDirectoryReader (#53978 ) (#54014 ) Benchmarking showed that the effect of the ExitableDirectoryReader is reduced considerably when checking every 8191 docs. Moreover, set the cancellable task before calling QueryPhase#preProcess() and make sure we don't wrap with an ExitableDirectoryReader at all when lowLevelCancellation is set to false to avoid completely any performance impact. Follows: #52822 Follows: #53166 Follows: #53496 (cherry picked from commit cdc377e8e74d3ca6c231c36dc5e80621aab47c69)	2020-03-23 21:30:34 +01:00
Ryan Ernst	960d1fb578	Revert "Introduce system index APIs for Kibana (#53035 )" (#53992 ) This reverts commit `c610e0893d`. backport of #53912	2020-03-23 10:29:35 -07:00
Armin Braun	5b9864db2c	Better Incrementality for Snapshots of Unchanged Shards (#52182 ) (#53984 ) Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.	2020-03-23 16:43:41 +01:00
Armin Braun	b51ea25a00	Use Azure Bulk Deletes in Azure Repository (#53919 ) (#53967 ) Now that we upgraded the Azure SDK to 8.6.2 in #53865 we can make use of bulk deletes.	2020-03-23 13:35:05 +01:00
Tim Vernum	cde8725e3c	Create API Key on behalf of other user (#53943 ) This change adds a "grant API key action" POST /_security/api_key/grant that creates a new API key using the privileges of one user ("the system user") to execute the action, but creates the API key with the roles of the second user ("the end user"). This allows a system (such as Kibana) to create API keys representing the identity and access of an authenticated user without requiring that user to have permission to create API keys on their own. This also creates a new QA project for security on trial licenses and runs the API key tests there Backport of: #52886	2020-03-23 18:50:07 +11:00
David Turner	adfeb50a53	Use consistent threadpools in CoordinatorTests (#53868 ) Today in the `CoordinatorTests` each node uses multiple threadpools. This is mostly fine as they are almost completely stateless, except for the `ThreadContext`: by using multiple threadpools we cannot make assertions that the thread context is/isn't preserved as we expect. This commit consolidates the threadpool instances in use so that each node uses just one.	2020-03-20 16:22:42 +01:00
Armin Braun	7d66c7e25c	Rethrow Failed Assertion in BlobStoreTestUtils (#53859 ) (#53863 ) This fixes two issues: 1. Currently, the future here is never resolved on assertion error so a failing test would take a full minute to complete until the future times out. 2. S3 tests overide this method to busy assert on this method. This only works if an assertion error makes it to the calling thread. Closes #53508	2020-03-20 15:31:48 +01:00
Alan Woodward	10a2a95b25	Fix backport merge error in ESTestCase We were ignoring the `stripXContentPosition` boolean when checking for warnings.	2020-03-20 12:12:47 +00:00
Alan Woodward	d23112f441	Report parser name and location in XContent deprecation warnings (#53805 ) It's simple to deprecate a field used in an ObjectParser just by adding deprecation markers to the relevant ParseField objects. The warnings themselves don't currently have any context - they simply say that a deprecated field has been used, but not where in the input xcontent it appears. This commit adds the parent object parser name and XContentLocation to these deprecation messages. Note that the context is automatically stripped from warning messages when they are asserted on by integration tests and REST tests, because randomization of xcontent type during these tests means that the XContentLocation is not constant	2020-03-20 11:52:55 +00:00
David Turner	7d3ac4f57d	Revert "Apply cluster states in system context (#53785 )" This reverts commit `4178c57410`.	2020-03-19 15:20:36 +00:00
David Turner	4178c57410	Apply cluster states in system context (#53785 ) Today cluster states are sometimes (rarely) applied in the default context rather than system context, which means that any appliers which capture their contexts cannot do things like remote transport actions when security is enabled. There are at least two ways that we end up applying the cluster state in the default context: 1. locally applying a cluster state that indicates that the master has failed 2. the elected master times out while waiting for a response from another node This commit ensures that cluster states are always applied in the system context. Mitigates #53751	2020-03-19 14:48:55 +00:00
Stuart Tettemer	cdbee32f55	Scripting: Per-context script cache, default off (#52855 ) (#53756 ) * Adds per context settings: `script.context.${CONTEXT}.cache_max_size` ~ `script.cache.max_size` `script.context.${CONTEXT}.cache_expire` ~ `script.cache.expire` `script.context.${CONTEXT}.max_compilations_rate` ~ `script.max_compilations_rate` * Context cache is used if: `script.max_compilations_rate=use-context`. This value is dynamically updatable, so users can switch back to the general cache if desired. * Settings for context caches take the first value that applies: 1) Context specific settings if set, eg `script.context.ingest.cache_max_size` 2) Correlated general setting is set to the non-default value, eg `script.cache.max_size` 3) Context default The reason for 2's inclusion is to allow an easy transition for users who've customized their general cache settings. Using the general cache settings for the context caches results in higher effective settings, since they are multiplied across the number of contexts. So a general cache max size of 200 will become 200 * # of contexts. However, this behavior it will avoid users snapping to a value that is too low for them. Backport of: #52855 Refs: #50152	2020-03-18 14:44:04 -06:00
Ryan Ernst	5c472fcb47	Upgrade jackson to 2.10.3 and GeoIP to 2.13.1 (#53642 ) Re-applies the change from #53523 along with test fixes. closes #53626 closes #53624 closes #53622 closes #53625 Co-authored-by: Nik Everett <nik9000@gmail.com> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: Jake Landis <jake.landis@elastic.co>	2020-03-17 10:28:51 -07:00
Ryan Ernst	6a7baddc2b	Update to ASM 7.1 in logger usage check (#52742 ) The logger usage check uses its own version of ASM to inspect class files for logging usages. Master was updated to support java 11 compilation in #40754. However, 7.x still used ASM 5, which could not read newer java bytecode versions. This commit bumps ASM in 7.x used in the logger usage check. closes #52408	2020-03-16 18:50:50 -07:00
Jason Tedor	8c96112613	Fix compilation in simple transport test case base This commit fixes compilation after a backport gone bad due to a difference in the minimum Java version between 7.x and master.	2020-03-16 21:44:58 -04:00
Jason Tedor	01d2339883	Invoke response handler on failure to send (#53631 ) Today it can happen that a transport message fails to send (for example, because a transport interceptor rejects the request). In this case, the response handler is never invoked, which can lead to necessary cleanups not being performed. There are two ways to handle this. One is to expect every callsite that sends a message to try/catch these exceptions and handle them appropriately. The other is merely to invoke the response handler to handle the exception, which is already equipped to handle transport exceptions.	2020-03-16 21:28:24 -04:00
Nik Everett	f0beab4041	Stop using round-tripped PipelineAggregators (backport of #53423 ) (#53629 ) This begins to clean up how `PipelineAggregator`s and executed. Previously, we would create the `PipelineAggregator`s on the data nodes and embed them in the aggregation tree. When it came time to execute the pipeline aggregation we'd use the `PipelineAggregator`s that were on the first shard's results. This is inefficient because: 1. The data node needs to make the `PipelineAggregator` only to serialize it and then throw it away. 2. The coordinating node needs to deserialize all of the `PipelineAggregator`s even though it only needs one of them. 3. You end up with many `PipelineAggregator` instances when you only really need one per pipeline. 4. `PipelineAggregator` needs to implement serialization. This begins to undo these by building the `PipelineAggregator`s directly on the coordinating node and using those instead of the `PipelineAggregator`s in the aggregtion tree. In a follow up change we'll stop serializing the `PipelineAggregator`s to node versions that support this behavior. And, one day, we'll be able to remove `PipelineAggregator` from the aggregation result tree entirely. Importantly, this doesn't change how pipeline aggregations are declared or parsed or requested. They are still part of the `AggregationBuilder` tree because that makes sense.	2020-03-16 16:15:23 -04:00
Jim Ferenczi	e6680be0b1	Add new x-pack endpoints to track the progress of a search asynchronously (#49931 ) (#53591 ) This change introduces a new API in x-pack basic that allows to track the progress of a search. Users can submit an asynchronous search through a new endpoint called `_async_search` that works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time. ```` GET my_index_pattern/_async_search?wait_for_completion=100ms { "aggs": { "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" } } } ```` If after 100ms the final response is not available, a `partial_response` is included in the body: ```` { "id": "9N3J1m4BgyzUDzqgC15b", "version": 1, "is_running": true, "is_partial": true, "response": { "_shards": { "total": 100, "successful": 5, "failed": 0 }, "total_hits": { "value": 1653433, "relation": "eq" }, "aggs": { ... } } } ```` The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed. It also contains the total hits as well as partial aggregations computed from the successful shards. To continue to monitor the progress of the search users can call the get `_async_search` API like the following: ```` GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms ```` That returns a new response that can contain the same partial response than the previous call if the search didn't progress, in such case the returned `version` should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress. Finally if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response: ```` { "id": "9N3J1m4BgyzUDzqgC15b", "version": 10, "is_running": false, "response": { "is_partial": false, ... } } ```` Asynchronous search are stored in a restricted index called `.async-search` if they survive (still running) after the initial submit. Each request has a keep alive that defaults to 5 days but this value can be changed/updated any time: ````` GET my_index_pattern/_async_search?wait_for_completion=100ms&keep_alive=10d ````` The default can be changed when submitting the search, the example above raises the default value for the search to `10d`. ````` GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d ````` The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days. A background service that runs only on the node that holds the first primary shard of the `async-search` index is responsible for deleting the expired results. It runs every hour but the expiration is also checked by running queries (if they take longer than the keep_alive) and when getting a result. Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed. Asynchronous search are not persistent, if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know if the search is still running or not the response contains a field called `is_running` that indicates if the task is up or not. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. However final responses and failures are persisted in a system index that allows to retrieve a response even if the task finishes. ```` DELETE _async_search/9N3J1m4BgyzUDzqgC15b ```` The response is also not stored if the initial submit action returns a final response. This allows to not add any overhead to queries that completes within the initial `wait_for_completion`. The `.async-search` index is a restricted index (should be migrated to a system index in +8.0) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other tasks. Relates #49091 Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>	2020-03-16 15:31:27 +01:00
Nhat Nguyen	6665ebe7ab	Harden search context id (#53143 ) Using a Long alone is not strong enough for the id of search contexts because we reset the id generator whenever a data node is restarted. This can lead to two issues: 1. Fetch phase can fetch documents from another index 2. A scroll search can return documents from another index This commit avoids these issues by adding a UUID to SearchContexId.	2020-03-11 11:48:11 -04:00
Gordon Brown	ff9b8bda63	Implement hidden aliases (#52547 ) This commit introduces hidden aliases. These are similar to hidden indices, in that they are not visible by default, unless explicitly specified by name or by indicating that hidden indices/aliases are desired. The new alias property, `is_hidden` is implemented similarly to `is_write_index`, except that it must be consistent across all indices with a given alias - that is, all indices with a given alias must specify the alias as either hidden, or all specify it as non-hidden, either explicitly or by omitting the `is_hidden` property.	2020-03-06 16:02:38 -07:00
Jay Modi	a81460dbf5	Make watch history indices hidden (#52974 ) This commit updates the template used for watch history indices with the hidden index setting so that new indices will be created as hidden. Relates #50251 Backport of #52962	2020-03-06 09:47:03 -07:00
Nik Everett	f32e4583d1	Add `allowed_warnings` to yaml tests (backport of #53139 ) (#53173 ) When we test backwards compatibility we often end up in a situation where we sometimes get a warning, and sometimes don't. Like, we won't get the warning if we're testing against an older version, but we will in a newer one. Or we won't get the warning if the request randomly lands on a node with an old version of the code. But we wouldn't if it randomed into a node with newer code. This adds `allowed_warnings` to our yaml test runner for those cases: warnings declared this way are "allowed" but not "required". Blocks #52959 Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-03-05 17:11:54 -05:00
Marios Trivyzas	487d442760	Implement Exitable DirectoryReader (#52822 ) (#53162 ) Implement an Exitable DirectoryReader that wraps the original DirectoryReader so that when a search task is cancelled the DirectoryReaders also stop their work fast. This is usuful for expensive operations like wilcard/prefix queries where the DirectoryReaders can spend lots of time and consume resources, as previously their work wouldn't stop even though the original search task was cancelled (e.g. because of timeout or dropped client connection). (cherry picked from commit 67acaf61f33bc5f54e26541514d07e375c202e03)	2020-03-05 14:17:31 +01:00
Nik Everett	28df7ae5ed	Support multiple metrics in `top_metrics` agg (backport of #52965 ) (#53163 ) This adds support for returning multiple metrics to the `top_metrics` agg. It looks like: ``` POST /test/_search?filter_path=aggregations { "aggs": { "tm": { "top_metrics": { "metrics": [ {"field": "v"}, {"field": "m"} ], "sort": {"s": "desc"} } } } } ```	2020-03-05 08:12:01 -05:00
Armin Braun	204c366a4e	Upgrade GCS SDK to 1.104.0 (#52839 ) (#53152 ) Upgrading the GCS SDK to the most recent version. Adjusting (i.e. improving) the REST mock accordingly. This should significantly boost performance by pulling in https://github.com/googleapis/java-core/issues/86 in some cases.	2020-03-05 11:18:18 +01:00
Tanguy Leroux	52d4807f8d	Mute GoogleCloudStorageBlobStoreRepositoryTests on jdk8 (#53119 ) Tests in GoogleCloudStorageBlobStoreRepositoryTests are known to be flaky on JDK 8 (#51446, #52430 ) and we suspect a JDK bug (https://bugs.openjdk.java.net/browse/JDK-8180754) that triggers some assertion on the server side logic that emulates the Google Cloud Storage service. Sadly we were not able to reproduce the failures, even when using the same OS (Debian 9, Ubuntu 16.04) and JDK (Oracle Corporation 1.8.0_241 [Java HotSpot(TM) 64-Bit Server VM 25.241-b07]) of almost all the test failures on CI. While we spent some time fixing code (#51933, #52431) to circumvent the JDK bug they are still flaky on JDK-8. This commit mute these tests for JDK-8 only. Close ##52906	2020-03-05 09:18:05 +01:00
Jay Modi	c610e0893d	Introduce system index APIs for Kibana (#53035 ) This commit introduces a module for Kibana that exposes REST APIs that will be used by Kibana for access to its system indices. These APIs are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Additionally, the ThreadContext has been extended to indicate that the use of system indices may be allowed in a request. This will be built upon in the future for the protection of system indices. Backport of #52385	2020-03-03 14:11:36 -07:00
Nhat Nguyen	e6755afeeb	Upgrade to Lucene 8.5.0-snapshot-c4475920b08 (#52950 ) (#52977 ) To give LUCENE-9228 more CI cycles	2020-02-29 09:29:16 -05:00
Nhat Nguyen	fa701e4c1f	Fix testRetentionLeasesEstablishedWhenRelocatingPrimary (#52445 ) Replace the current assertion with a more robust assertion. Closes #52364	2020-02-26 22:06:01 -05:00
Armin Braun	f7d71b5930	Fix GCS Mock Range Downloads (#52804 ) (#52830 ) We were not correctly respecting the download range which lead to the GCS SDK client closing the connection at times. Also, fixes another instance of failing to drain the request fully before sending the response headers. Closes #51446	2020-02-26 18:38:02 +01:00
Adrien Grand	1807f86751	Generalize how queries on `_index` are handled at rewrite time (#52815 ) Generalize how queries on `_index` are handled at rewrite time (#52486) Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes #49254	2020-02-26 15:37:43 +01:00
Adrien Grand	f993ef80f8	Move the terms index of `_id` off-heap. (#52518 ) In #42838 we moved the terms index of all fields off-heap except the `_id` field because we were worried it might make indexing slower. In general, the indexing rate is only affected if explicit IDs are used, as otherwise Elasticsearch almost never performs lookups in the terms dictionary for the purpose of indexing. So it's quite wasteful to require the terms index of `_id` to be loaded on-heap for users who have append-only workloads. Furthermore I've been conducting benchmarks when indexing with explicit ids on the http_logs dataset that suggest that the slowdown is low enough that it's probably not worth forcing the terms index to be kept on-heap. Here are some numbers for the median indexing rate in docs/s: \| Run \| Master \| Patch \| \| --- \| ------- \| ------- \| \| 1 \| 45851.2 \| 46401.4 \| \| 2 \| 45192.6 \| 44561.0 \| \| 3 \| 45635.2 \| 44137.0 \| \| 4 \| 46435.0 \| 44692.8 \| \| 5 \| 45829.0 \| 44949.0 \| And now heap usage in MB for segments: \| Run \| Master \| Patch \| \| --- \| ------- \| -------- \| \| 1 \| 41.1720 \| 0.352083 \| \| 2 \| 45.1545 \| 0.382534 \| \| 3 \| 41.7746 \| 0.381285 \| \| 4 \| 45.3673 \| 0.412737 \| \| 5 \| 45.4616 \| 0.375063 \| Indexing rate decreased by 1.8% on average, while memory usage decreased by more than 100x. The `http_logs` dataset contains small documents and has a simple indexing chain. More complex indexing chains, e.g. with more fields, ingest pipelines, etc. would see an even lower decrease of indexing rate.	2020-02-24 18:14:12 +01:00
bellengao	02cb5b6c0e	Return 429 status code on read_only_allow_delete index block (#50166 ) We consider index level read_only_allow_delete blocks temporary since the DiskThresholdMonitor can automatically release those when an index is no longer allocated on nodes above high threshold. The rest status has therefore been changed to 429 when encountering this index block to signal retryability to clients. Related to #49393	2020-02-22 16:24:25 +01:00
Jay Modi	8abfda0b59	Rename assertThrows to prevent naming clash (#52651 ) This commit renames ElasticsearchAssertions#assertThrows to assertRequestBuilderThrows and assertFutureThrows to avoid a naming clash with JUnit 4.13+ and static imports of these methods. Additionally, these methods have been updated to make use of expectThrows internally to avoid duplicating the logic there. Relates #51787 Backport of #52582	2020-02-21 13:30:11 -07:00
Stuart Tettemer	376932a47d	Scripting: split out compile limits and caching (#52498 ) (#52652 ) Phase 1 of adding compilation limits per context. * Refactor rate limiting and caching into separate class, `ScriptCache`, which will be used per context. * Disable compilation limit for certain tests. Backport of 0866031 Refs: #50152	2020-02-21 12:10:51 -07:00
Armin Braun	0a09e15959	Add Caching for RepositoryData in BlobStoreRepository (#52341 ) (#52566 ) Cache latest `RepositoryData` on heap when it's absolutely safe to do so (i.e. when the repository is in strictly consistent mode). `RepositoryData` can safely be assumed to not grow to a size that would cause trouble because we often have at least two copies of it loaded at the same time when doing repository operations. Also, concurrent snapshot API status requests currently load it independently of each other and so on, making it safe to cache on heap and assume as "small" IMO. The benefits of this move are: * Much faster repository status API calls * listing all snapshot names becomes instant * Other operations are sped up massively too because they mostly operate in two steps: load repository data then load multiple other blobs to get the additional data * Additional cloud cost savings * Better resiliency, saving another spot where an IO issue could break the snapshot * We can simplify a number of spots in the current code that currently pass around the repository data in tricky ways to avoid loading it multiple times in follow ups.	2020-02-21 10:20:07 +01:00
Armin Braun	4bb780bc37	Refactor Inflexible Snapshot Repository BwC (#52365 ) (#52557 ) * Refactor Inflexible Snapshot Repository BwC (#52365) Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere. Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.	2020-02-21 09:14:34 +01:00
Przemysław Witek	7cd997df84	[ML] Make ml internal indices hidden (#52423 ) (#52509 )	2020-02-19 14:02:32 +01:00
Tim Brooks	a742c58d45	Extract a ConnectionManager interface (#51722 ) Currently we have three different implementations representing a `ConnectionManager`. There is the basic `ConnectionManager` which holds all connections for a cluster. And a remote connection manager which support proxy behavior. And a stubbable connection manager for tests. The remote and stubbable instances use the delegate pattern, so this commit extracts an interface for them all to implement.	2020-02-18 09:19:24 -07:00
David Turner	3d57a78deb	Add extra logging for investigation into #52000 (#52472 ) It looks like #52000 is caused by a slowdown in cluster state application (maybe due to #50907) but I would like to understand the details to ensure that there's nothing else going on here too before simply increasing the timeout. This commit enables some relevant `DEBUG` loggers and also captures stack traces from all threads rather than just the three hottest ones.	2020-02-18 13:02:33 +00:00
Rory Hunter	4fb399396c	Fix mixed threadlocal and supplied random use (#52459 ) Closes #52450. `setRandomIndexSettings(...)` in `ESIntegTestCase` was using both a thread-local and a supplied source of randomness. Fix this method to only use the supplied source.	2020-02-18 10:45:23 +00:00
Nhat Nguyen	bdb2e72ea4	Fix timeout in testDowngradeRemoteClusterToBasic (#52322 ) - ESCCRRestTestCase#ensureYellow does not work well with assertBusy - Increases timeout to 60s Closes #52036	2020-02-17 15:05:42 -05:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Julie Tibshirani	0d7165a40b	Standardize naming of fetch subphases. (#52171 ) This commit makes the names of fetch subphases more consistent: * Now the names end in just 'Phase', whereas before some ended in 'FetchSubPhase'. This matches the query subphases like AggregationPhase. * Some names include 'fetch' like FetchScorePhase to avoid ambiguity about what they do.	2020-02-13 13:00:46 -08:00
Nik Everett	2dac36de4d	HLRC support for string_stats (#52163 ) (#52297 ) This adds a builder and parsed results for the `string_stats` aggregation directly to the high level rest client. Without this the HLRC can't access the `string_stats` API without the elastic licensed `analytics` module. While I'm in there this adds a few of our usual unit tests and modernizes the parsing.	2020-02-12 19:25:05 -05:00
Marios Trivyzas	dac720d7a1	Add a cluster setting to disallow expensive queries (#51385 ) (#52279 ) Add a new cluster setting `search.allow_expensive_queries` which by default is `true`. If set to `false`, certain queries that have usually slow performance cannot be executed and an error message is returned. - Queries that need to do linear scans to identify matches: - Script queries - Queries that have a high up-front cost: - Fuzzy queries - Regexp queries - Prefix queries (without index_prefixes enabled - Wildcard queries - Range queries on text and keyword fields - Joining queries - HasParent queries - HasChild queries - ParentId queries - Nested queries - Queries on deprecated 6.x geo shapes (using PrefixTree implementation) - Queries that may have a high per-document cost: - Script score queries - Percolate queries Closes: #29050 (cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)	2020-02-12 22:56:14 +01:00
Gordon Brown	d48ce12920	Convert ILM and SLM histories into hidden indices (#51456 ) Modifies SLM's and ILM's history indices to be hidden indices for added protection against accidental querying and deletion, and improves IndexTemplateRegistry to handle upgrading index templates. Also modifies the REST test cleanup to delete hidden indices.	2020-02-11 14:18:55 -07:00
David Turner	00b9098250	Ignore timeouts with single-node discovery (#52159 ) Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely if joining a faulty master that is too slow to respond, and `cluster.publish.timeout` to allow a faulty master to detect that it is unable to publish its cluster state updates in a timely fashion. If these timeouts occur then the node restarts the discovery process in an attempt to find a healthier master. In the special case of `discovery.type: single-node` there is no point in looking for another healthier master since the single node in the cluster is all we've got. This commit suppresses these timeouts and instead lets the node wait for joins and publications to succeed no matter how long this might take.	2020-02-11 14:15:01 +00:00
Armin Braun	90eb6a020d	Remove Redundant Loading of RepositoryData during Restore (#51977 ) (#52108 ) We can just put the `IndexId` instead of just the index name into the recovery soruce and save one load of `RepositoryData` on each shard restore that way.	2020-02-09 21:44:18 +01:00
Julie Tibshirani	337d73a7c6	Rename MapperService#fullName to fieldType. The new name more accurately describes what the method returns.	2020-02-07 10:35:53 -08:00
Martijn Laarman	f626700757	Introduces validation to restrict API's to single namespace (#51982 ) To be merged after #51981 lands (cherry picked from commit 196eaf8ac1ccc777f5bd81469d1148bc7ec299dd)	2020-02-06 17:17:03 +01:00
Armin Braun	3be70f64d8	Fix GCS Mock Http Handler JDK Bug (#51933 ) (#51941 ) There is an open JDK bug that is causing an assertion in the JDK's http server to trip if we don't drain the request body before sending response headers. See https://bugs.openjdk.java.net/browse/JDK-8180754 Working around this issue here by always draining the request at the beginning of the handler. Fixes #51446	2020-02-05 15:37:06 +01:00
Maria Ralli	8d3e73b3a0	Add host address to BindTransportException message (#51269 ) When bind fails, show the host address in addition to the port. This helps debugging cases with wrong "network.host" values. Closes #48001	2020-02-04 17:13:19 +00:00
James Rodewig	4ea7297e1e	[DOCS] Change http://elastic.co -> https (#48479 ) (#51812 ) Co-authored-by: Jonathan Budzenski <jon@budzenski.me>	2020-02-03 09:50:11 -05:00
Ryan Ernst	21224caeaf	Remove comparison to true for booleans (#51723 ) While we use `== false` as a more visible form of boolean negation (instead of `!`), the true case is implied and the true value does not need to explicitly checked. This commit converts cases that have slipped into the code checking for `== true`.	2020-01-31 16:35:43 -08:00
Adrien Grand	915a931e93	Bucket aggregation circuit breaker optimization. (#46751 ) (#51730 ) Co-authored-by: Howard <danielhuang@tencent.com>	2020-01-31 11:30:51 +01:00
Ioannis Kakavas	72c5bc9630	Disable diagnose trust mananer only when needed (#51683 ) Do not try to set xpack.ssl.security.diagnose.trust when it is not registered	2020-01-30 21:35:03 +02:00
Armin Braun	7914c1a734	Optimize GCS Mock (#51593 ) (#51594 ) This test was still very GC heavy in Java 8 runs in particular which seems to slow down request processing to the point of timeouts in some runs. This PR completely removes the large number of O(MB) `byte[]` allocations that were happening in the mock http handler which cuts the allocation rate by about a factor of 5 in my local testing for the GC heavy `testSnapshotWithLargeSegmentFiles` run. Closes #51446 Closes #50754	2020-01-29 11:06:05 +01:00
Jim Ferenczi	77f4aafaa2	Expose the logic to cancel task when the rest channel is closed (#51423 ) This commit moves the logic that cancels search requests when the rest channel is closed to a generic client that can be used by other APIs. This will be useful for any rest action that wants to cancel the execution of a task if the underlying rest channel is closed by the client before completion. Relates #49931 Relates #50990 Relates #50990	2020-01-28 22:55:42 +01:00
Armin Braun	aae93a7578	Allow Repository Plugins to Filter Metadata on Create (#51472 ) (#51542 ) * Allow Repository Plugins to Filter Metadata on Create Add a hook that allows repository plugins to filter the repository metadata before it gets written to the cluster state.	2020-01-28 18:33:26 +01:00
Yannick Welsch	fa212fe60b	Stricter checks of setup and teardown in docs tests (#51430 ) Adds extra checks due to 7.x backport	2020-01-28 16:52:23 +01:00
Jake Landis	1197ebb9e9	Additional assertions for SLM restart tests (#48428 ) Adds the ability to conditionally wipe SLM policies and ensures that you can read what you wrote across restarts.	2020-01-28 16:52:23 +01:00
Yannick Welsch	f6686345c9	Avoid unnecessary setup and teardown in docs tests (#51430 ) The docs tests have recently been running much slower than before (see #49753). The gist here is that with ILM/SLM we do a lot of unnecessary setup / teardown work on each test. Compounded with the slightly slower cluster state storage mechanism, this causes the tests to run much slower. In particular, on RAMDisk, docs:check is taking ES 7.4: 6:55 minutes ES master: 16:09 minutes ES with this commit: 6:52 minutes on SSD, docs:check is taking ES 7.4: ??? minutes ES master: 32:20 minutes ES with this commit: 11:21 minutes	2020-01-28 16:52:23 +01:00
William Brafford	9efa5be60e	Password-protected Keystore Feature Branch PR (#51123 ) (#51510 ) * Reload secure settings with password (#43197) If a password is not set, we assume an empty string to be compatible with previous behavior. Only allow the reload to be broadcast to other nodes if TLS is enabled for the transport layer. * Add passphrase support to elasticsearch-keystore (#38498) This change adds support for keystore passphrases to all subcommands of the elasticsearch-keystore cli tool and adds a subcommand for changing the passphrase of an existing keystore. The work to read the passphrase in Elasticsearch when loading, which will be addressed in a different PR. Subcommands of elasticsearch-keystore can handle (open and create) passphrase protected keystores When reading a keystore, a user is only prompted for a passphrase only if the keystore is passphrase protected. When creating a keystore, a user is allowed (default behavior) to create one with an empty passphrase Passphrase can be set to be empty when changing/setting it for an existing keystore Relates to: #32691 Supersedes: #37472 * Restore behavior for force parameter (#44847) Turns out that the behavior of `-f` for the add and add-file sub commands where it would also forcibly create the keystore if it didn't exist, was by design - although undocumented. This change restores that behavior auto-creating a keystore that is not password protected if the force flag is used. The force OptionSpec is moved to the BaseKeyStoreCommand as we will presumably want to maintain the same behavior in any other command that takes a force option. * Handle pwd protected keystores in all CLI tools (#45289) This change ensures that `elasticsearch-setup-passwords` and `elasticsearch-saml-metadata` can handle a password protected elasticsearch.keystore. For setup passwords the user would be prompted to add the elasticsearch keystore password upon running the tool. There is no option to pass the password as a parameter as we assume the user is present in order to enter the desired passwords for the built-in users. For saml-metadata, we prompt for the keystore password at all times even though we'd only need to read something from the keystore when there is a signing or encryption configuration. * Modify docs for setup passwords and saml metadata cli (#45797) Adds a sentence in the documentation of `elasticsearch-setup-passwords` and `elasticsearch-saml-metadata` to describe that users would be prompted for the keystore's password when running these CLI tools, when the keystore is password protected. Co-Authored-By: Lisa Cawley <lcawley@elastic.co> * Elasticsearch keystore passphrase for startup scripts (#44775) This commit allows a user to provide a keystore password on Elasticsearch startup, but only prompts when the keystore exists and is encrypted. The entrypoint in Java code is standard input. When the Bootstrap class is checking for secure keystore settings, it checks whether or not the keystore is encrypted. If so, we read one line from standard input and use this as the password. For simplicity's sake, we allow a maximum passphrase length of 128 characters. (This is an arbitrary limit and could be increased or eliminated. It is also enforced in the keystore tools, so that a user can't create a password that's too long to enter at startup.) In order to provide a password on standard input, we have to account for four different ways of starting Elasticsearch: the bash startup script, the Windows batch startup script, systemd startup, and docker startup. We use wrapper scripts to reduce systemd and docker to the bash case: in both cases, a wrapper script can read a passphrase from the filesystem and pass it to the bash script. In order to simplify testing the need for a passphrase, I have added a has-passwd command to the keystore tool. This command can run silently, and exit with status 0 when the keystore has a password. It exits with status 1 if the keystore doesn't exist or exists and is unencrypted. A good deal of the code-change in this commit has to do with refactoring packaging tests to cleanly use the same tests for both the "archive" and the "package" cases. This required not only moving tests around, but also adding some convenience methods for an abstraction layer over distribution-specific commands. * Adjust docs for password protected keystore (#45054) This commit adds relevant parts in the elasticsearch-keystore sub-commands reference docs and in the reload secure settings API doc. * Fix failing Keystore Passphrase test for feature branch (#50154) One problem with the passphrase-from-file tests, as written, is that they would leave a SystemD environment variable set when they failed, and this setting would cause elasticsearch startup to fail for other tests as well. By using a try-finally, I hope that these tests will fail more gracefully. It appears that our Fedora and Ubuntu environments may be configured to store journald information under /var rather than under /run, so that it will persist between boots. Our destructive tests that read from the journal need to account for this in order to avoid trying to limit the output we check in tests. * Run keystore management tests on docker distros (#50610) * Add Docker handling to PackagingTestCase Keystore tests need to be able to run in the Docker case. We can do this by using a DockerShell instead of a plain Shell when Docker is running. * Improve ES startup check for docker Previously we were checking truncated output for the packaged JDK as an indication that Elasticsearch had started. With new preliminary password checks, we might get a false positive from ES keystore commands, so we have to check specifically that the Elasticsearch class from the Bootstrap package is what's running. * Test password-protected keystore with Docker (#50803) This commit adds two tests for the case where we mount a password-protected keystore into a Docker container and provide a password via a Docker environment variable. We also fix a logging bug where we were logging the identifier for an array of strings rather than the contents of that array. * Add documentation for keystore startup prompting (#50821) When a keystore is password-protected, Elasticsearch will prompt at startup. This commit adds documentation for this prompt for the archive, systemd, and Docker cases. Co-authored-by: Lisa Cawley <lcawley@elastic.co> * Warn when unable to upgrade keystore on debian (#51011) For Red Hat RPM upgrades, we warn if we can't upgrade the keystore. This commit brings the same logic to the code for Debian packages. See the posttrans file for gets executed for RPMs. * Restore handling of string input Adds tests that were mistakenly removed. One of these tests proved we were not handling the the stdin (-x) option correctly when no input was added. This commit restores the original approach of reading stdin one char at a time until there is no more (-1, \r, \n) instead of using readline() that might return null * Apply spotless reformatting * Use '--since' flag to get recent journal messages When we get Elasticsearch logs from journald, we want to fetch only log messages from the last run. There are two reasons for this. First, if there are many logs, we might get a string that's too large for our utility methods. Second, when we're looking for a specific message or error, we almost certainly want to look only at messages from the last execution. Previously, we've been trying to do this by clearing out the physical files under the journald process. But there seems to be some contention over these directories: if journald writes a log file in between when our deletion command deletes the file and when it deletes the log directory, the deletion will fail. It seems to me that we might be able to use journald's "--since" flag to retrieve only log messages from the last run, and that this might be less likely to fail due to race conditions in file deletion. Unfortunately, it looks as if the "--since" flag has a granularity of one-second. I've added a two-second sleep to make sure that there's a sufficient gap between the test that will read from journald and the test before it. * Use new journald wrapper pattern * Update version added in secure settings request Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>	2020-01-28 05:32:32 -05:00
Ioannis Kakavas	4f3548fbd7	Disable diagnostic trust manager in tests (#51501 ) This commit sets `xpack.security.ssl.diagnose.trust` to false in all of our tests when running in FIPS 140 mode and when settings objects are used to create an instance of the SSLService. This is needed in 7.x because setting xpack.security.ssl.diagnose.trust to true wraps SunJSSE TrustManager with our own DiagnosticTrustManager and this is not allowed when SunJSSE is in FIPS mode. An alternative would be to set xpack.security.fips.enabled to true which would also implicitly disable xpack.security.ssl.diagnose.trust but would have additional effects (would require that we set PBKDF2 for password hashing algorithm in all test clusters, would prohibit using JKS keystores in nodes even if relevant tests have been muted in FIPS mode etc.) Relates: #49900 Resolves: #51268	2020-01-28 10:17:35 +02:00
Ioannis Kakavas	ee202a642f	Enable tests in FIPS 140 in JDK 11 (#49485 ) This change changes the way to run our test suites in JVMs configured in FIPS 140 approved mode. It does so by: - Configuring any given runtime Java in FIPS mode with the bundled policy and security properties files, setting the system properties java.security.properties and java.security.policy with the == operator that overrides the default JVM properties and policy. - When runtime java is 11 and higher, using BouncyCastle FIPS Cryptographic provider and BCJSSE in FIPS mode. These are used as testRuntime dependencies for unit tests and internal clusters, and copied (relevant jars) explicitly to the lib directory for testclusters used in REST tests - When runtime java is 8, using BouncyCastle FIPS Cryptographic provider and SunJSSE in FIPS mode. Running the tests in FIPS 140 approved mode doesn't require an additional configuration either in CI workers or locally and is controlled by specifying -Dtests.fips.enabled=true	2020-01-27 11:14:52 +02:00
Przemysław Witek	8703b885c2	Move TupleMatchers class to org.elasticsearch.test.hamcrest package (#51359 ) (#51395 )	2020-01-24 11:10:54 +01:00
Nhat Nguyen	1ca5dd13de	Flush instead of synced-flush inactive shards (#51365 ) If all nodes are on 7.6, we prefer to perform a normal flush instead of synced flush when a shard becomes inactive. Backport of #49126	2020-01-23 17:20:57 -05:00
Henning Andersen	8c8d0dbacc	Revert "Workaround for JDK 14 EA FileChannel.map issue (#50523 )" (#51323 ) This reverts commit c7fd24ca1569a809b499caf34077599e463bb8d6. Now that JDK-8236582 is fixed in JDK 14 EA, we can revert the workaround. Relates #50523 and #50512	2020-01-23 11:27:02 +01:00
Nhat Nguyen	157b352b47	Exclude nested documents in LuceneChangesSnapshot (#51279 ) LuceneChangesSnapshot can be slow if nested documents are heavily used. Also, it estimates the number of operations to be recovered in peer recoveries inaccurately. With this change, we prefer excluding the nested non-root documents in a Lucene query instead.	2020-01-22 17:33:28 -05:00
Armin Braun	d9ad66965c	Fix Rest Tests Failing to Cleanup Rollup Jobs (#51246 ) (#51294 ) * Fix Rest Tests Failing to Cleanup Rollup Jobs If the rollup jobs index doesn't exist for some reason (like running against a 6.x cluster) we should just assume the jobs have been cleaned up and move on. Closes #50819	2020-01-22 11:38:20 +01:00
Przemyslaw Gomulka	0513d8dca3	Add SPI jvm option to SystemJvmOptions (#50916 ) Adding back accidentally removed jvm option that is required to enforce start of the week = Monday in IsoCalendarDataProvider. Adding a `feature` to yml test in order to skip running it in JDK8 commit that removed it `398c802` commit that backports SystemJvmOptions `c4fbda3` relates 7.x backport of code that enforces CalendarDataProvider use #48349	2020-01-21 09:02:21 +01:00
Jay Modi	107989df3e	Introduce hidden indices (#51164 ) This change introduces a new feature for indices so that they can be hidden from wildcard expansion. The feature is referred to as hidden indices. An index can be marked hidden through the use of an index setting, `index.hidden`, at creation time. One primary use case for this feature is to have a construct that fits indices that are created by the stack that contain data used for display to the user and/or intended for querying by the user. The desire to keep them hidden is to avoid confusing users when searching all of the data they have indexed and getting results returned from indices created by the system. Hidden indices have the following properties: * API calls for all indices (empty indices array, _all, or ) will not return hidden indices by default. Wildcard expansion will not return hidden indices by default unless the wildcard pattern begins with a `.`. This behavior is similar to shell expansion of wildcards. * REST API calls can enable the expansion of wildcards to hidden indices with the `expand_wildcards` parameter. To expand wildcards to hidden indices, use the value `hidden` in conjunction with `open` and/or `closed`. * Creation of a hidden index will ignore global index templates. A global index template is one with a match-all pattern. * Index templates can make an index hidden, with the exception of a global index template. * Accessing a hidden index directly requires no additional parameters. Backport of #50452	2020-01-17 10:09:01 -07:00
Yannick Welsch	dc47b380c8	Block too many concurrent mapping updates (#51038 ) Ensures that there are not too many concurrent dynamic mapping updates going out from the data nodes to the master. Closes #50670	2020-01-15 16:02:11 +01:00
Nik Everett	fc5fde7950	Add "did you mean" to ObjectParser (#50938 ) (#50985 ) Check it out: ``` $ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{ "dac": {} }' { "error" : { "root_cause" : [ { "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" } ], "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" }, "status" : 400 } ``` The tricky thing about implementing this is that x-content doesn't depend on Lucene. So this works by creating an extension point for the error message using SPI. Elasticsearch's server module provides the "spell checking" implementation. s	2020-01-14 17:53:41 -05:00
Armin Braun	16c07472e5	Track Snapshot Version in RepositoryData (#50930 ) (#50989 ) * Track Snapshot Version in RepositoryData (#50930) Add tracking of snapshot versions to RepositoryData to make BwC logic more efficient. Follow up to #50853	2020-01-14 18:15:07 +01:00
Armin Braun	6e8ea7aaa2	Work around JVM Bug in LongGCDisruptionTests (#50731 ) (#50974 ) There is a JVM bug causing `Thread#suspend` calls to randomly take multiple seconds breaking these tests that call the method numerous times in a loop. Increasing the timeout would will not work since we may call `suspend` tens if not hundreds of times and even a small number of them experiencing the blocking will lead to multiple minutes of waiting. This PR detects the specific issue by timing the `Thread#suspend` calls and skips the remainder of the test if it timed out because of the JVM bug. Closes #50047	2020-01-14 17:13:21 +01:00
Tim Brooks	d8510be3d9	Revert "Send cluster name and discovery node in handshake (#48916 )" (#50944 ) This reverts commit `0645ee88e2`.	2020-01-14 09:53:13 -06:00
Yannick Welsch	91d7b446a0	Warn on slow metadata performance (#50956 ) Has the new cluster state storage layer emit warnings in case metadata performance is very slow. Relates #48701	2020-01-14 15:04:28 +01:00
Yannick Welsch	22ba759e1f	Move metadata storage to Lucene (#50928 ) * Move metadata storage to Lucene (#50907) Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding. This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch: * Introduce Lucene-based metadata persistence (#48733) This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates #48701 * Remove per-index metadata without assigned shards (#49234) Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. However there is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices. Relates #48701 * Use Lucene exclusively for metadata storage (#50144) This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds an interoperability layer for upgrades from prior versions. This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups. Relates #48701 * Add command-line tool support for Lucene-based metadata storage (#50179) Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage. Relates #48701 * Use single directory for metadata (#50639) Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed though, and introduces an additional unnecessary cognitive burden to the users. Co-Authored-By: David Turner <david.turner@elastic.co> * Add async dangling indices support (#50642) Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all. Relates #48701 * Fold node metadata into new node storage (#50741) Moves node metadata to uses the new storage mechanism (see #48701) as the authoritative source. * Write CS asynchronously on data-only nodes (#50782) Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a bit of bootstrap validation and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non existing configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth. Relates #48701 * Remove persistent cluster settings tool (#50694) Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on disk cluster state in case where it contains incompatible settings that prevent the cluster from forming. Relates #48701 * Make cluster state writer resilient to disk issues (#50805) Adds handling to make the cluster state writer resilient to disk issues. Relates to #48701 * Omit writing global metadata if no change (#50901) Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%. Relates #48701 * DanglingIndicesIT should ensure node removed first (#50896) These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster. Co-authored-by: David Turner <david.turner@elastic.co> * fix tests Co-authored-by: David Turner <david.turner@elastic.co>	2020-01-14 09:35:43 +01:00
Nhat Nguyen	fb32a55dd5	Deprecate synced flush (#50835 ) A normal flush has the same effect as a synced flush on Elasticsearch 7.6 or later. It's deprecated in 7.6 and will be removed in 8.0. Relates #50776	2020-01-13 19:54:38 -05:00
Nhat Nguyen	05f97d5e1b	Revert "Deprecate synced flush (#50835 )" This reverts commit `1a32d7142a`.	2020-01-13 11:41:03 -05:00
Nhat Nguyen	1a32d7142a	Deprecate synced flush (#50835 ) A normal flush has the same effect as a synced flush on Elasticsearch 7.6 or later. It's deprecated in 7.6 and will be removed in 8.0. Relates #50776	2020-01-13 10:58:29 -05:00
Armin Braun	f70e8f6ab5	Fix Snapshot Repository Corruption in Downgrade Scenarios (#50692 ) (#50797 ) * Fix Snapshot Repository Corruption in Downgrade Scenarios (#50692) This PR introduces test infrastructure for downgrading a cluster while interacting with a given repository. It fixes the fact that repository metadata in the new format could be written while there's still older snapshots in the repository that require the old-format metadata to be restorable.	2020-01-09 21:21:13 +01:00
Armin Braun	4a7e09f624	Enforce Logging of Errors in GCS Rest RetriesTests (#50761 ) (#50783 ) It's impossible to tell why #50754 fails without this change. We're failing to close the `exchange` somewhere and there is no write timeout in the GCS SDK (something to look into separately) only a read timeout on the socket so if we're failing on an assertion without reading the full request body (at least into the read-buffer) we're locking up waiting forever on `write0`. This change ensure the `exchange` is closed in the tests where we could lock up on a write and logs the failure so we can find out what broke #50754.	2020-01-09 10:46:07 +01:00
Armin Braun	a725896c92	Fix and Reenable SnapshotTool Minio Tests (#50736 ) (#50745 ) This solves half of the problem in #46813 by moving the S3 tests to using the shared minio fixture so we at least have some non-3rd-party, constantly running coverage on these tests.	2020-01-08 16:33:36 +01:00
Adrien Grand	31158ab3d5	Add per-field metadata. (#50333 ) This PR adds per-field metadata that can be set in the mappings and is later returned by the field capabilities API. This metadata is completely opaque to Elasticsearch but may be used by tools that index data in Elasticsearch to communicate metadata about fields with tools that then search this data. A typical example that has been requested in the past is the ability to attach a unit to a numeric field. In order to not bloat the cluster state, Elasticsearch requires that this metadata be small: - keys can't be longer than 20 chars, - values can only be numbers or strings of no more than 50 chars - no inner arrays or objects, - the metadata can't have more than 5 keys in total. Given that metadata is opaque to Elasticsearch, field capabilities don't try to do anything smart when merging metadata about multiple indices, the union of all field metadatas is returned. Here is how the meta might look like in mappings: ```json { "properties": { "latency": { "type": "long", "meta": { "unit": "ms" } } } } ``` And then in the field capabilities response: ```json { "latency": { "long": { "searchable": true, "aggreggatable": true, "meta": { "unit": [ "ms" ] } } } } ``` When there are no conflicts, values are arrays of size 1, but when there are conflicts, Elasticsearch includes all unique values in this array, without giving ways to know which index has which metadata value: ```json { "latency": { "long": { "searchable": true, "aggreggatable": true, "meta": { "unit": [ "ms", "ns" ] } } } } ``` Closes #33267	2020-01-08 16:21:18 +01:00
Rory Hunter	b1ff74f652	New setting to prevent automatically importing dangling indices (#49174 ) Introduce a new static setting, `gateway.auto_import_dangling_indices`, which prevents dangling indices from being automatically imported. Part of #48366.	2020-01-08 13:39:20 +01:00
Armin Braun	d0d48311f4	Faster and Simpler GCS REST Mock (#50706 ) (#50707 ) * Faster and Simpler GCS REST Mock I reworked the GCS mock a little to use less copying+allocation, log the full request body on failure to read a multi-part request and generally be a little simpler and easy to follow to track down the remaining issues that are causing almost daily failures from this class's multi-part request parsing that can't be reproduced locally.	2020-01-07 20:17:46 +01:00
David Turner	2039cc813b	Fix testDelayVariabilityAppliesToFutureTasks (#50667 ) This test seems to be bogus as it was confusing a nominal execution time with a delay (i.e. an elapsed time). This commit reworks the test to address this. Fixes #50650	2020-01-07 10:01:03 +00:00
Armin Braun	72a405fafb	Fix GCS Mock Broken Handling of some Blobs (#50666 ) (#50671 ) * Fix GCS Mock Broken Handling of some Blobs We were incorrectly handling blobs starting in `\r\n` which broke tests randomly when blobs started on these. Relates #49429	2020-01-06 19:27:57 +01:00
Nhat Nguyen	b71490b06b	Deprecate indices without soft-deletes (#50502 ) (#50634 ) Soft-deletes will be enabled for all indices in 8.0. Hence, we should deprecate new indices without soft-deletes in 7.x. Backport of #50502	2020-01-06 08:44:30 -05:00
Henning Andersen	312bf44601	Workaround for JDK 14 EA FileChannel.map issue (#50523 ) FileChannel.map provokes static initialization of ExtendedMapMode in JDK14 EA, which needs elevated privileges. Relates #50512	2020-01-06 12:18:49 +01:00
Nik Everett	2362c430cd	Clean up wire test case a bit (#50627 ) (#50632 ) * Adds JavaDoc to `AbstractWireTestCase` and `AbstractWireSerializingTestCase` so it is more obvious you should prefer the latter if you have a choice * Moves the `instanceReader` method out of `AbstractWireTestCase` becaue it is no longer used. * Marks a bunch of methods final so it is more obvious which classes are for what. * Cleans up the side effects of the above.	2020-01-05 16:20:38 -05:00
Martijn van Groningen	10ed1ae1d2	Add remote info to the HLRC (#50483 ) The additional change to the original PR (#49657), is that `org.elasticsearch.client.cluster.RemoteConnectionInfo` now parses the initial_connect_timeout field as a string instead of a TimeValue instance. The reason that this is needed is because that the initial_connect_timeout field in the remote connection api is serialized for human consumption, but not for parsing purposes. Therefore the HLRC can't parse it correctly (which caused test failures in CI, but not in the PR CI :( ). The way this field is serialized needs to be changed in the remote connection api, but that is a breaking change. We should wait making this change until rest api versioning is introduced. Co-Authored-By: j-bean <anton.shuvaev91@gmail.com> Co-authored-by: j-bean <anton.shuvaev91@gmail.com>	2019-12-24 15:11:58 +01:00
Nhat Nguyen	33204c2055	Use peer recovery retention leases for indices without soft-deletes (#50351 ) Today, the replica allocator uses peer recovery retention leases to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy). The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries. We can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire. Relates #45136 Relates #46959	2019-12-23 22:04:07 -05:00
Nhat Nguyen	1dc98ad617	Ensure global checkpoint was advanced and synced We need to make sure that the global checkpoints and peer recovery retention leases were advanced to the max_seq_no and synced; otherwise, we can risk expiring some peer recovery retention leases because of the file-based recovery threshold. Relates #49448	2019-12-23 21:10:30 -05:00
Lee Hinman	c3c9ccf61f	[7.x] Add ILM histore store index (#50287 ) (#50345 ) * Add ILM histore store index (#50287) * Add ILM histore store index This commit adds an ILM history store that tracks the lifecycle execution state as an index progresses through its ILM policy. ILM history documents store output similar to what the ILM explain API returns. An example document with ALL fields (not all documents will have all fields) would look like: ```json { "@timestamp": 1203012389, "policy": "my-ilm-policy", "index": "index-2019.1.1-000023", "index_age":123120, "success": true, "state": { "phase": "warm", "action": "allocate", "step": "ERROR", "failed_step": "update-settings", "is_auto-retryable_error": true, "creation_date": 12389012039, "phase_time": 12908389120, "action_time": 1283901209, "step_time": 123904107140, "phase_definition": "{\"policy\":\"ilm-history-ilm-policy\",\"phase_definition\":{\"min_age\":\"0ms\",\"actions\":{\"rollover\":{\"max_size\":\"50gb\",\"max_age\":\"30d\"}}},\"version\":1,\"modified_date_in_millis\":1576517253463}", "step_info": "{... etc step info here as json ...}" }, "error_details": "java.lang.RuntimeException: etc\n\tcaused by:etc etc etc full stacktrace" } ``` These documents go into the `ilm-history-1-00000N` index to provide an audit trail of the operations ILM has performed. This history storage is enabled by default but can be disabled by setting `index.lifecycle.history_index_enabled` to `false.` Resolves #49180 * Make ILMHistoryStore.putAsync truly async (#50403) This moves the `putAsync` method in `ILMHistoryStore` never to block. Previously due to the way that the `BulkProcessor` works, it was possible for `BulkProcessor#add` to block executing a bulk request. This was bad as we may be adding things to the history store in cluster state update threads. This also moves the index creation to be done prior to the bulk request execution, rather than being checked every time an operation was added to the queue. This lessens the chance of the index being created, then deleted (by some external force), and then recreated via a bulk indexing request. Resolves #50353	2019-12-20 12:33:36 -07:00
Jim Ferenczi	2acafd4b15	Optimize composite aggregation based on index sorting (#48399 ) (#50272 ) Co-authored-by: Daniel Huang <danielhuang@tencent.com> This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification. In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue. The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases: * Multi-valued source, in such case early termination is not possible. * missing_bucket is set to true	2019-12-20 12:32:37 +01:00
Stuart Tettemer	689df1f28f	Scripting: ScriptFactory not required by compile (#50344 ) (#50392 ) Avoid backwards incompatible changes for 8.x and 7.6 by removing type restriction on compile and Factory. Factories may optionally implement ScriptFactory. If so, then they can indicate determinism and thus cacheability. Backport Relates: #49466	2019-12-19 12:50:25 -07:00
Stuart Tettemer	06a24f09cf	Scripting: Cache script results if deterministic (#50106 ) (#50329 ) Cache results from queries that use scripts if they use only deterministic API calls. Nondeterministic API calls are marked in the whitelist with the `@nondeterministic` annotation. Examples are `Math.random()` and `new Date()`. Refs: #49466	2019-12-18 13:00:42 -07:00
Armin Braun	55cc5432d6	Fix S3 Repo Tests Incomplete Reads (#50268 ) (#50275 ) We need to read in a loop here. A single read to a huge byte array will only read 16k max with the S3 SDK so if the blob we're trying to fully read is larger we close early and fail the size comparison. Also, drain streams fully when checking existence to avoid S3 SDK warnings.	2019-12-17 15:33:09 +01:00
Yannick Welsch	1f981580aa	Simplify InternalTestCluster.fullRestart (#50218 ) With node ordinals gone, there's no longer a need for such a complicated full cluster restart procedure (as we can now uniquely associate nodes to data folders). Follow-up to #41652	2019-12-17 14:33:01 +01:00
Armin Braun	2e7b1ab375	Use ClusterState as Consistency Source for Snapshot Repositories (#49060 ) (#50267 ) Follow up to #49729 This change removes falling back to listing out the repository contents to find the latest `index-N` in write-mounted blob store repositories. This saves 2-3 list operations on each snapshot create and delete operation. Also it makes all the snapshot status APIs cheaper (and faster) by saving one list operation there as well in many cases. This removes the resiliency to concurrent modifications of the repository as a result and puts a repository in a `corrupted` state in case loading `RepositoryData` failed from the assumed generation.	2019-12-17 10:55:13 +01:00
Armin Braun	761d6e8e4b	Remove BlobContainer Tests against Mocks (#50194 ) (#50220 ) * Remove BlobContainer Tests against Mocks Removing all these weird mocks as asked for by #30424. All these tests are now part of real repository ITs and otherwise left unchanged if they had independent tests that didn't call the `createBlobStore` method previously. The HDFS tests also get added coverage as a side-effect because they did not have an implementation of the abstract repository ITs. Closes #30424	2019-12-16 11:37:09 +01:00
Nhat Nguyen	df46848fb0	Migrate peer recovery from translog to retention lease (#49448 ) Since 7.4, we switch from translog to Lucene as the source of history for peer recoveries. However, we reduce the likelihood of operation-based recoveries when performing a full cluster restart from pre-7.4 because existing copies do not have PPRL. To remedy this issue, we fallback using translog in peer recoveries if the recovering replica does not have a peer recovery retention lease, and the replication group hasn't fully migrated to PRRL. Relates #45136	2019-12-15 10:24:39 -05:00
Nhat Nguyen	c151a75dfe	Use retention lease in peer recovery of closed indices (#48430 ) Today we do not use retention leases in peer recovery for closed indices because we can't sync retention leases on closed indices. This change allows that ability and adjusts peer recovery to use retention leases for all indices with soft-deletes enabled. Relates #45136 Co-authored-by: David Turner <david.turner@elastic.co>	2019-12-15 10:24:34 -05:00
Armin Braun	c73930988b	Remove Unused Delete Endpoint from GCS Mock (#50128 ) (#50134 ) Follow up to #50024: we're not using the single-delete any more so no need to have a mock endpoint for it	2019-12-12 14:18:06 +01:00
Armin Braun	2a186c8148	Disable LongGCDisruptionTests on JDK11+12 (#50097 ) (#50126 ) See discussion in #50047 (comment). There are reproducible issues with Thread#suspend in Jdk11 and Jdk12 for me locally and we have one failure for each on CI. Jdk8 and Jdk13 are stable though on CI and in my testing so I'd selectively disable this test here to keep the coverage. We aren't using suspend in production code so the JDK bug behind this does not affect us. Closes #50047	2019-12-12 11:40:44 +01:00
Armin Braun	6eee41e253	Remove Unused Single Delete in BlobStoreRepository (#50024 ) (#50123 ) * Remove Unused Single Delete in BlobStoreRepository There are no more production uses of the non-bulk delete or the delete that throws on missing so this commit removes both these methods. Only the bulk delete logic remains. Where the bulk delete was derived from single deletes, the single delete code was inlined into the bulk delete method. Where single delete was used in tests it was replaced by bulk deleting.	2019-12-12 11:17:46 +01:00
Armin Braun	0fae4065ef	Better Logging GCS Blobstore Mock (#50102 ) (#50124 ) * Better Logging GCS Blobstore Mock Two things: 1. We should just throw a descriptive assertion error and figure out why we're not reading a multi-part instead of returning a `400` and failing the tests that way here since we can't reproduce these 400s locally. 2. We were missing logging the exception on a cleanup delete failure that coincides with the `400` issue in tests. Relates #49429	2019-12-12 11:17:22 +01:00
Armin Braun	d19c8db4e4	Fix GCS Mock Batch Delete Behavior (#50034 ) (#50084 ) Batch deletes get a response for every delete request, not just those that actually hit an existing blob. The fact that we only responded for existing blobs leads to a degenerate response that throws a parse exception if a batch delete only contains non-existant blobs.	2019-12-11 17:40:25 +01:00
Przemyslaw Gomulka	81ff2d0f0d	Allow skipping ranges of versions backport(#50014 ) (#50028 ) Multiple version ranges are allowed to be used in section skip in yml tests. This is useful when a bugfix was backported to latest versions and all previous releases contain a wire breaking bug. examples: 6.1.0 - 6.3.0, 6.6.0 - 6.7.9, 7.0 - - 7.2, 8.0.0 - backport #50014	2019-12-10 16:43:41 +01:00
Armin Braun	ac2774c9fa	Use Cluster State to Track Repository Generation (#49729 ) (#49976 ) Step on the road to #49060. This commit adds the logic to keep track of a repository's generation across repository operations. See changes to package level Javadoc for the concrete changes in the distributed state machine. It updates the write side of new repository generations to be fully consistent via the cluster state. With this change, no `index-N` will be overwritten for the same repository ever. So eventual consistency issues around conflicting updates to the same `index-N` are not a possibility any longer. With this change the read side will still use listing of repository contents instead of relying solely on the cluster state contents. The logic for that will be introduced in #49060. This retains the ability to externally delete the contents of a repository and continue using it afterwards for the time being. In #49060 the use of listing to determine the repository generation will be removed in all cases (except for full-cluster restart) as the last step in this effort.	2019-12-09 09:02:57 +01:00
Stuart Tettemer	17cda5b2c0	Scripting: Groundwork for caching script results (#49895 ) (#49944 ) In order to cache script results in the query shard cache, we need to check if scripts are deterministic. This change adds a default method to the script factories, `isResultDeterministic() -> false` which is used by the `QueryShardContext`. Script results were never cached and that does not change here. Future changes will implement this method based on whether the results of the scripts are deterministic or not and therefore cacheable. Refs: #49466 Backport	2019-12-06 15:08:05 -07:00
Zachary Tong	fec882a457	Decouple pipeline reductions from final agg reduction (#45796 ) Historically only two things happened in the final reduction: empty buckets were filled, and pipeline aggs were reduced (since it was the final reduction, this was safe). Usage of the final reduction is growing however. Auto-date-histo might need to perform many reductions on final-reduce to merge down buckets, CCS may need to side-step the final reduction if sending to a different cluster, etc Having pipelines generate their output in the final reduce was convenient, but is becoming increasingly difficult to manage as the rest of the agg framework advances. This commit decouples pipeline aggs from the final reduction by introducing a new "top level" reduce, which should be called at the beginning of the reduce cycle (e.g. from the SearchPhaseController). This will only reduce pipeline aggs on the final reduce after the non-pipeline agg tree has been fully reduced. By separating pipeline reduction into their own set of methods, aggregations are free to use the final reduction for whatever purpose without worrying about generating pipeline results which are non-reducible	2019-12-05 16:11:54 -05:00
Stuart Tettemer	426c7a5e8f	Scripting: add available languages & contexts API (#49652 ) (#49815 ) Adds `GET /_script_language` to support Kibana dynamic scripting language selection. Response contains whether `inline` and/or `stored` scripts are enabled as determined by the `script.allowed_types` settings. For each scripting language registered, such as `painless`, `expression`, `mustache` or custom, available contexts for the language are included as determined by the `script.allowed_contexts` setting. Response format: ``` { "types_allowed": [ "inline", "stored" ], "language_contexts": [ { "language": "expression", "contexts": [ "aggregation_selector", "aggs" ... ] }, { "language": "painless", "contexts": [ "aggregation_selector", "aggs", "aggs_combine", ... ] } ... ] } ``` Fixes: #49463 Backport	2019-12-04 16:18:22 -07:00
Armin Braun	996cddd98b	Stop Copying Every Http Request in Message Handler (#44564 ) (#49809 ) * Copying the request is not necessary here. We can simply release it once the response has been generated and a lot of `Unpooled` allocations that way * Relates #32228 * I think the issue that preventet that PR that PR from being merged was solved by #39634 that moved the bulk index marker search to ByteBuf bulk access so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer) * I couldn't neccessarily reproduce much of a speedup from this change, but I could reproduce a very measureable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)	2019-12-04 08:41:42 +01:00
Yannick Welsch	fbb92f527a	Replicate write actions before fsyncing them (#49746 ) This commit fixes a number of issues with data replication: - Local and global checkpoints are not updated after the new operations have been fsynced, but might capture a state before the fsync. The reason why this probably went undetected for so long is that AsyncIOProcessor is synchronous if you index one item at a time, and hence working as intended unless you have a high enough level of concurrent indexing. As we rely in other places on the assumption that we have an up-to-date local checkpoint in case of synchronous translog durability, there's a risk for the local and global checkpoints not to be up-to-date after replication completes, and that this won't be corrected by the periodic global checkpoint sync. - AsyncIOProcessor also has another "bad" side effect here: if you index one bulk at a time, the bulk is always first fsynced on the primary before being sent to the replica. Further, if one thread is tasked by AsyncIOProcessor to drain the processing queue and fsync, other threads can easily pile more bulk requests on top of that thread. Things are not very fair here, and the thread might continue doing a lot more fsyncs before returning (as the other threads pile more and more on top), which blocks it from returning as a replication request (e.g. if this thread is on the primary, it blocks the replication requests to the replicas from going out, and delaying checkpoint advancement). This commit fixes all these issues, and also simplifies the code that coordinates all the after write actions.	2019-12-03 12:22:46 +01:00
Mayya Sharipova	7cf170830c	Optimize sort on numeric long and date fields. (#49732 ) This rewrites long sort as a `DistanceFeatureQuery`, which can efficiently skip non-competitive blocks and segments of documents. Depending on the dataset, the speedups can be 2 - 10 times. The optimization can be disabled with setting the system property `es.search.rewrite_sort` to `false`. Optimization is skipped when an index has 50% or more data with the same value. Optimization is done through: 1. Rewriting sort as `DistanceFeatureQuery` which can efficiently skip non-competitive blocks and segments of documents. 2. Sorting segments according to the primary numeric sort field(#44021) This allows to skip non-competitive segments. 3. Using collector manager. When we optimize sort, we sort segments by their min/max value. As a collector expects to have segments in order, we can not use a single collector for sorted segments. We use collectorManager, where for every segment a dedicated collector will be created. 4. Using Lucene's shared TopFieldCollector manager This collector manager is able to exchange minimum competitive score between collectors, which allows us to efficiently skip the whole segments that don't contain competitive scores. 5. When index is force merged to a single segment, #48533 interleaving old and new segments allows for this optimization as well, as blocks with non-competitive docs can be skipped. Backport for #48804 Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>	2019-11-29 15:37:40 -05:00
Armin Braun	813b49adb4	Make BlobStoreRepository Aware of ClusterState (#49639 ) (#49711 ) * Make BlobStoreRepository Aware of ClusterState (#49639) This is a preliminary to #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in #49060	2019-11-29 14:57:47 +01:00
Armin Braun	90e9d61f2b	Optimize GoogleCloudStorageHttpHandler (#49677 ) (#49707 ) Removing a lot of needless buffering and array creation to reduce the significant memory usage of tests using this. The incoming stream from the `exchange` is already buffered so there is no point in adding a ton of additional buffers everywhere.	2019-11-29 11:17:47 +01:00

... 4 5 6 7 8 ...

2814 Commits