OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nik Everett	5f52bc4c9f	Fix two scripted_metric bugs (backport of #58547 ) (#58565 ) Fixes two bugs introduced by #57627: 1. We were not properly letting go of memory from the request breaker when the aggregation finished. 2. We no longer supported totally arbitrary stuff produced by the init script because we assumed that it'd be ok to run the script once and clone its results. Sadly, cloning can't clone anything that the init script can make, like `String` arrays. This runs the init script once for every new bucket so we don't need to clone.	2020-06-25 16:16:10 -04:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512 ) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false If they want to be explicit. This is also challenging in cases where a user wants to have configure a coordinating-only node which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more details in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increases the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Tim Brooks	5efec3a517	Add error logging when http test fails (#58505 ) Netty4HttpServerTransportTests has started to fail intermittently. It seems like unexpected successful responses are being received when the test is simulating errors. This commit adds logging to the test to provide additional information when there is an unexpected success. It also adds the logging to the nio http test.	2020-06-24 11:02:20 -06:00
Alan Woodward	d251a482e9	Move MappedFieldType.similarity() to TextSearchInfo (#58439 ) Similarities only apply to a few text-based field types, but are currently set directly on the base MappedFieldType class. This commit moves similarity information into TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper. It was previously possible to include a similarity parameter on a number of field types that would then ignore this information. To make it obvious that this has no effect, setting this parameter on non-text field types now issues a deprecation warning.	2020-06-24 10:00:32 +01:00
Ryan Ernst	88f1dab8b5	Fix long/int precision for test baseport calculation	2020-06-23 16:02:13 -07:00
Ryan Ernst	6285b87b97	Adjust gradle base port by one (#58368 ) When assigning ports for internal cluster tests, we use the gradle worker id as an adjustment on the base port of 10300. In order to not go outside the max port range, we modulo the worker id by 223. Since gradle worker ids start at 1, we expect to never actually get the base port of 10300. However, as the gradle daemon lasts for longer, the module can result in a value of 0, which cases the test to fail. This commit adjusts the modulo to ensure the value is never 0. closes #58279	2020-06-23 15:42:26 -07:00
David Roberts	0d6bfd0ac3	[7.x][ML] Fix wire serialization for flush acknowledgements (#58443 ) There was a discrepancy in the implementation of flush acknowledgements: most of the class was designed on the basis that the "last finalized bucket time" could be null but the wire serialization assumed that it was never null. This works because, the C++ sends zero "last finalized bucket time" when it is not known or not relevant. But then the Java code will print that to XContent as it is assuming null represents not known or not relevant. This change corrects the discrepancies. Internally within the class null represents not known or not relevant, but this is translated from/to 0 for communications from the C++ and old nodes that have the bug. Additionally I switched from Date to Instant for this class and made the member variables final to modernise it a bit. Backport of #58413	2020-06-23 16:42:06 +01:00
Alan Woodward	8ebd341710	Add text search information to MappedFieldType (#58230 ) (#58432 ) Now that MappedFieldType no longer extends lucene's FieldType, we need to have a way of getting the index information about a field necessary for building text queries, building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo abstraction that holds this information, and a getTextSearchInfo() method to MappedFieldType to make it available. Field types that do not support text search can just return null here. This allows us to remove the MapperService.getLuceneFieldType() shim method.	2020-06-23 14:37:26 +01:00
Martijn van Groningen	7dda9934f9	Keep track of timestamp_field mapping as part of a data stream (#58400 ) Backporting #58096 to 7.x branch. Relates to #53100 * use mapping source direcly instead of using mapper service to extract the relevant mapping details * moved assertion to TimestampField class and added helper method for tests * Improved logic that inserts timestamp field mapping into an mapping. If the timestamp field path consisted out of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.	2020-06-22 17:46:38 +02:00
Nik Everett	49684463dd	Mute ESTestCaseTests#testBasePortGradle Tracked by #58279. Failed a few times a day since June 13th.	2020-06-18 15:41:25 -04:00
Stuart Tettemer	20abba8433	Scripting: Deprecate general cache settings (#55753 ) (#58283 ) Backport: ef543b0	2020-06-18 11:54:23 -06:00
Alan Woodward	4b8cf2af6a	Add serialization test for FieldMappers when include_defaults=true (#58235 ) (#58328 ) Fixes a bug in TextFieldMapper serialization when index is false, and adds a base-class test to ensure that all field mappers are tested against all variations with defaults both included and excluded. Fixes #58188	2020-06-18 15:46:04 +01:00
Alan Woodward	ca2d12d039	Remove Settings parameter from FieldMapper base class (#58237 ) This is currently used to set the indexVersionCreated parameter on FieldMapper. However, this parameter is only actually used by two implementations, and clutters the API considerably. We should just remove it, and use it directly in the implementations that require it.	2020-06-18 12:53:54 +01:00
Rene Groeschke	abc72c1a27	Unify dependency licenses task configuration (#58116 ) (#58274 ) - Remove duplicate dependency configuration - Use task avoidance api accross the build - Remove redundant licensesCheck config	2020-06-18 08:15:50 +02:00
Ryan Ernst	c57a3f3e60	Cleanup jdk repro parameters (#57838 ) The version of java printed when a test fails currently is passed in from gradle. However, we already know this from java itself, so it is not necessary. This commit changes how the runtime.java repro parameter is found, as well as removes the compiler.java parameter which is no longer relevant. closes #57756	2020-06-17 20:53:00 -07:00
Julie Tibshirani	b1161cba35	Rename SearchContext#smartNameFieldType. (#58203 ) The concept of a 'smart name' doesn't make sense now that there are no mapping types.	2020-06-17 10:38:32 -07:00
Tim Brooks	2074412d79	Retry failed replication due to transient errors (#56230 ) Currently a failed replication action will fail an entire replica. This includes when replication fails due to potentially short lived transient issues such as network distruptions or circuit breaking errors. This commit implements retries using the retryable action.	2020-06-17 10:17:30 -06:00
Stuart Tettemer	01795d1925	Revert "Scripting: Deprecate general cache settings (#55753 )" (#58201 ) This reverts commit `88e8b34fc2`.	2020-06-16 14:58:18 -06:00
Rory Hunter	03369e0980	Implement dangling indices API (#58176 ) Backport of #50920. Part of #48366. Implement an API for listing, importing and deleting dangling indices. Co-authored-by: David Turner <david.turner@elastic.co>	2020-06-16 21:50:38 +01:00
Stuart Tettemer	88e8b34fc2	Scripting: Deprecate general cache settings (#55753 ) Backport: ef543b0	2020-06-16 13:06:59 -06:00
Alan Woodward	12a3f6dfca	MappedFieldType should not extend FieldType (#58160 ) MappedFieldType is a combination of two concerns: * an extension of lucene's FieldType, defining how a field should be indexed * a set of query factory methods, defining how a field should be searched We want to break these two concerns apart. This commit is a first step to doing this, breaking the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead has a series of boolean flags defining whether or not the field is searchable or aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining how indexing should be done. Relates to #56814	2020-06-16 16:56:43 +01:00
Tal Levy	69d5e044af	Add optional description parameter to ingest processors. (#57906 ) (#58152 ) This commit adds an optional field, `description`, to all ingest processors so that users can explain the purpose of the specific processor instance. Closes #56000.	2020-06-15 19:27:57 -07:00
Ignacio Vera	1cc3c1a354	GeometryTestUtils should always generate valid polygons (#58034 ) (#58091 ) Make sure we always generate legal polygons, e.g they have area	2020-06-15 09:36:44 +02:00
Rene Groeschke	01e9126588	Remove deprecated usage of testCompile configuration (#57921 ) (#58083 ) * Remove usage of deprecated testCompile configuration * Replace testCompile usage by testImplementation * Make testImplementation non transitive by default (as we did for testCompile) * Update CONTRIBUTING about using testImplementation for test dependencies * Fail on testCompile configuration usage	2020-06-14 22:30:44 +02:00
Yannick Welsch	85b0b540f0	Fix refresh behavior in MockDiskUsagesIT (#57926 ) Ensures that InternalClusterInfoService's internally cached stats are refreshed whenever the shard size or disk usage function (to mock out disk usage) are overridden. Closes #57888	2020-06-11 17:38:12 +02:00
Dan Hermann	b501b282f8	Change default backing index naming scheme	2020-06-09 09:31:34 -05:00
Armin Braun	0987c0a5f3	Fix Broken Numeric Shard Generations in RepositoryData (#57813 ) (#57821 ) Fix broken numeric shard generations when reading them from the wire or physically from the physical repository. This should be the cheapest way to clean up broken shard generations in a BwC and safe-to-backport manner for now. We can potentially further optimize this by also not doing the checks on the generations based on the versions we see in the `RepositoryData` but I don't think it matters much since we will read `RepositoryData` from cache in almost all cases. Closes #57798	2020-06-08 18:36:56 +02:00
Mayya Sharipova	70e63a365a	Refactor how to determine if a field is metafield (#57378 ) (#57771 ) Before to determine if a field is meta-field, a static method of MapperService isMetadataField was used. This method was using an outdated static list of meta-fields. This PR instead changes this method to the instance method that is also aware of meta-fields in all registered plugins. Related #38373, #41656 Closes #24422	2020-06-08 09:16:18 -04:00
Armin Braun	619e4f8c02	Make BackgroundIndexer more Efficient (#57781 ) (#57789 ) Improve efficiency of background indexer by allowing to add an assertion for failures while they are produced to prevent queuing them up. Also, add non-blocking stop to the background indexer so that when stopping multiple indexers we don't needlessly continue indexing on some indexers while stopping another one. Closes #57766	2020-06-08 10:18:47 +02:00
Howard	76ee1aad4b	Remove unused routing for ClusterState creation utils (#57679 ) Remove some unused routing definitions from cluster state creation utils.	2020-06-04 13:59:18 -04:00
Igor Motov	8d7f389f3a	Increase search.max_buckets to 65,535 (#57042 ) Increases the default search.max_buckets limit to 65,535, and only counts buckets during reduce phase. Closes #51731	2020-06-03 15:35:41 -04:00
Nhat Nguyen	5097071230	Increase timeout for GlobalCheckpointSyncIT (#57567 ) The test failed when it was running with 4 replicas and 3 indexing threads. The recovering replicas can prevent the global checkpoint from advancing. This commit increases the timeout to 60 seconds for this suite and the check for no inflight requests. Closes #57204	2020-06-03 08:50:02 -04:00
Nik Everett	2a27c411fb	Same memory when geo aggregations are not on top (#57483 ) (#57551 ) Saves memory when the `geotile_grid` and `geohash_grid` are not on the top level by using the `LongKeyedBucketOrds` we built in #55873.	2020-06-02 16:21:50 -04:00
Mark Tozzi	e50f514092	IndexFieldData should hold the ValuesSourceType (#57373 ) (#57532 )	2020-06-02 12:16:53 -04:00
Armin Braun	ba2d70d8eb	Serialize Outbound Messages on IO Threads (#56961 ) (#57080 ) Almost every outbound message is serialized to buffers of 16k pagesize. We were serializing these messages off the IO loop (and retaining the concrete message instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the channel is ready. 1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average. 2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically. With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable. Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC. This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain). I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once. Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.	2020-06-02 16:15:18 +02:00
Andrei Dan	bd188f4a21	[7.x] ILM: add support for rolling over data streams (#57295 ) (#57515 ) As the datastream information is stored in the `ClusterState.Metadata` we exposed the `Metadata` to the `AsyncWaitStep#evaluateCondition` method in order for the steps to be able to identify when a managed index is part of a DataStream. If a managed index is part of a DataStream the rollover target is the DataStream name and the highest generation index is the write index (ie. the rolled index). (cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-02 11:55:23 +01:00
Zachary Tong	daaf5a3dcc	Fix assertion catching in aggregation supported type test (#56466 ) (#57382 ) At some point, we changed the supported-type test to also catch assertion errors. This has the side effect of also catching the `fail()` call inside the try-catch, which silently smothered some failures. This modifies the test to throw at the end of the try-catch block to prevent from accidentally catching itself. Catching the AssertionError is convenient because there are other locations that do throw an assertion in tests (due to hitting an assertion before the exception is thrown) so I think we should keep it around. Also includes a variety of fixes to other tests which were failing but being silently smothered.	2020-06-01 12:10:05 -04:00
Nik Everett	4263c25b2f	Save memory when histogram agg is not on top (backport of #57277 ) (#57377 ) This saves some memory when the `histogram` aggregation is not a top level aggregation by dropping `asMultiBucketAggregator` in favor of natively implementing multi-bucket storage in the aggregator. For the most part this just uses the `LongKeyedBucketOrds` that we built the first time we did this.	2020-05-29 15:07:37 -04:00
Benjamin Trent	15aba60c02	[7.x] Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695 ) (#57359 ) * Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695) This commit lays the ground work for plugins supplying their own circuit breakers. It adds a new interface: `CircuitBreakerPlugin`. This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers. With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.	2020-05-29 12:13:46 -04:00
Mayya Sharipova	aebb78bf5c	Run sort optimization when from+size>0 (#57250 )	2020-05-29 11:30:35 -04:00
Armin Braun	be6fa72432	Fix GCS Mock Behavior for Missing Bucket (#57283 ) (#57310 ) * Fix GCS Mock Behavior for Missing Bucket We were throwing a 500 instead of a 404 for a missing bucket. This would make yaml tests needlessly wait for multiple seconds, retrying the 500 response with backoff, in the test checking behavior for missing buckets.	2020-05-29 10:01:20 +02:00
Nhat Nguyen	5b08eaf90c	Fix trimUnsafeCommits for indices created before 6.2 (#57187 ) If an upgraded node is restarted multiple times without flushing a new index commit, then we will wrongly exclude all commits from the starting commits. This bug is reproducible with these minimal steps: (1) create an empty index on 6.1.4 with translog retention disabled, (2) upgrade the cluster to 7.7.0, (3) restart the upgraded the cluster. The problem is that with the new translog policy can trim translog without having a new index commit, while the existing commit still refers to the previous translog generation. Closes #57091	2020-05-27 15:08:49 -04:00
Alan Woodward	d6b79bcd95	Remove Mapper.updateFieldType() (#57151 ) When we had multiple mapping types, an update to a field in one type had to be propagated to the same field in all other types. This was done using the Mapper.updateFieldType() method, called at the end of a merge. However, now that we only have a single type per index, this method is unnecessary and can be removed. Relates to #41059 Backport of #56986	2020-05-27 09:21:24 +01:00
Nik Everett	0fce2b7713	Fix DateHistogramAggregatorTests.testAsSubAgg Closes #57168 by using `AggregatorTestCase#newIndexSearcher` in the `AggregatorTestCase#testCase`. Without that global ordinals will sometimes fail to work.	2020-05-26 15:05:31 -04:00
Armin Braun	5569137ae3	Flatten ReleaseableBytesReference Object Trees (#57092 ) (#57109 ) When slicing a releasable bytes reference we would create a new counter every time and pass the original reference chain to the new slice on every slice invocation. This would lead to extremely deep reference chains and needlessly uses a dedicated counter for every slice when all the slices eventually just refer to the same underlying bytes and `Releasable`. This commit tracks the ref count wrapper with its releasable in a separate object that can be passed around on every slicing, making the slices' tree as flat as the original releasable bytes reference. Also, we were needlessly creating a redundant releasable bytes reference from a releasable bytes-stream-output that we never actually used for releasing (all code that uses it just releases the stream itself instead).	2020-05-25 13:00:37 +02:00
Armin Braun	a4eb3edf46	Fix GCS Repository YAML Test Build (#57073 ) (#57101 ) A few relatively obvious issues here: * We cannot run the different IT runs (large blob setting one and normal integ run) concurrently * We need to set the dependency tasks up correctly for the large blob run so that it works in isolation * We can't use the `localAddress` for the location header of the resumable upload (this breaks in YAML tests because GCS is using a loopback port forward for the initial request and the local address will be chosen as the actual Docker container host) Closes #57026	2020-05-25 11:10:39 +02:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Tim Brooks	57c3a61535	Create HttpRequest earlier in pipeline (#56393 ) Elasticsearch requires that a HttpRequest abstraction be implemented by http modules before server processing. This abstraction controls when underlying resources are released. This commit moves this abstraction to be created immediately after content aggregation. This change will enable follow-up work including moving Cors logic into the server package and tracking bytes as they are aggregated from the network level.	2020-05-18 14:54:01 -06:00
Francisco Fernández Castaño	60c7832141	Track upload requests on S3 repositories (#56904 ) Add tracking for regular and multipart uploads. Regular uploads are categorized as PUT. Multi part uploads are categorized as POST. The number of documents created for the test #testRequestStats have been increased so all upload methods are exercised. Backport of #56826	2020-05-18 19:05:17 +02:00
Francisco Fernández Castaño	8ab9fc10c1	Track multipart/resumable uploads GCS API calls (#56892 ) Add tracking for multipart and resumable uploads for GoogleCloudStorage. For resumable uploads only the last request is taken into account for billing, so that's the only request that's tracked. Backport of #56821	2020-05-18 13:39:26 +02:00
Ryan Ernst	9fb80d3827	Move publishing configuration to a separate plugin (#56727 ) This is another part of the breakup of the massive BuildPlugin. This PR moves the code for configuring publications to a separate plugin. Most of the time these publications are jar files, but this also supports the zip publication we have for integ tests.	2020-05-14 20:23:07 -07:00
Francisco Fernández Castaño	97bf47f5b9	Track GET/LIST GoogleCloudStorage API calls (#56758 ) Backporting #56585 to 7.x branch. Adds tracking for the API calls performed by the GoogleCloudStorage underlying SDK. It hooks an HttpResponseInterceptor to the SDK transport layer and does http request filtering based on the URI paths that we are interested to track. Unfortunately we cannot hook a wrapper into the ServiceRPC interface since we're using different levels of abstraction to implement retries during reads (GoogleCloudStorageRetryingInputStream).	2020-05-14 14:03:21 +02:00
Nik Everett	b98b260048	Merge significant_terms into the terms package (backport of #56699 ) (#56715 ) This merges the code for the `significant_terms` agg into the package for the code for the `terms` agg. They are super entangled already, this mostly just admits that to ourselves. Precondition for the terms work in #56487	2020-05-13 17:36:21 -04:00
Ignacio Vera	b4521d5183	upgrade to Lucene 8.6.0 snapshot (#56661 )	2020-05-13 14:25:16 +02:00
Henning Andersen	48a8c7eb88	Ensure search contexts are removed on index delete (#56335 ) (#56617 ) In a race condition, a search context could remain enlisted in SearchService when an index is deleted, potentially causing the index folder to not be cleaned up (for either lengthy searches or scrolls with timeouts > 30 minutes or if the scroll is kept active).	2020-05-13 09:41:02 +02:00
Armin Braun	0a879b95d1	Save Bounds Checks in BytesReference (#56577 ) (#56621 ) Two spots that allow for some optimization: * We are often creating a composite reference of just a single item in the transport layer => special cased via static constructor to make sure we never do that * Also removed the pointless case of an empty composite bytes ref * `ByteBufferReference` is practically always created from a heap buffer these days so there is no point of dealing with all the bounds checks and extra references to sliced buffers from that and we can just use the underlying array directly	2020-05-12 20:33:45 +02:00
Nik Everett	c85a363b60	Fix nested agg test I accidentally allowed the test framework to double-wrap a reader that we rely on being only singly wrapped. Lame. closes #56529	2020-05-11 16:15:25 -04:00
Martijn van Groningen	32471abc0e	Check whether data stream feature flag is enabled before deleting all data streams, (#56517 ) (#56520 ) this will fix the release build.	2020-05-11 18:34:50 +02:00
Rene Groeschke	c29bc87040	Move bwcVersions extension property to BuildParams (back port) (#56381 ) * Move bwcVersions extension property to BuildParams (#56206) * Fix :qa Task Using Broken BwC Versions Resolution (#56332) Co-authored-by: Armin Braun <me@obrown.io>	2020-05-11 09:39:13 +02:00
Nik Everett	2f38aeb5e2	Save memory when numeric terms agg is not top (#55873 ) (#56454 ) Right now all implementations of the `terms` agg allocate a new `Aggregator` per bucket. This uses a bunch of memory. Exactly how much isn't clear but each `Aggregator` ends up making its own objects to read doc values which have non-trivial buffers. And it forces all of it sub-aggregations to do the same. We allocate a new `Aggregator` per bucket for two reasons: 1. We didn't have an appropriate data structure to track the sub-ordinals of each parent bucket. 2. You can only make a single call to `runDeferredCollections(long...)` per `Aggregator` which was the only way to delay collection of sub-aggregations. This change switches the method that builds aggregation results from building them one at a time to building all of the results for the entire aggregator at the same time. It also adds a fairly simplistic data structure to track the sub-ordinals for `long`-keyed buckets. It uses both of those to power numeric `terms` aggregations and removes the per-bucket allocation of their `Aggregator`. This fairly substantially reduces memory consumption of numeric `terms` aggregations that are not the "top level", especially when those aggregations contain many sub-aggregations. It also is a pretty big speed up, especially when the aggregation is under a non-selective aggregation like the `date_histogram`. I picked numeric `terms` aggregations because those have the simplest implementation. At least, I could kind of fit it in my head. And I haven't fully understood the "bytes"-based terms aggregations, but I imagine I'll be able to make similar optimizations to them in follow up changes.	2020-05-08 20:38:53 -04:00
Martijn van Groningen	83739b5806	Backport: allow cluster health api to resolve data streams (#56425 ) Backport of: #56413 Allow cluster health api to resolve data streams and automatically remove data streams after each test in test cases extending from `ESIntegTestCase` Relates to #53100	2020-05-08 17:16:25 +02:00
Julie Tibshirani	e852bb29b7	Simplify signature of FieldMapper#parseCreateField. (#56144 ) `FieldMapper#parseCreateField` accepts the parse context, plus a list of fields as an output parameter. These fields are immediately added to the document through `ParseContext#doc()`. This commit simplifies the signature by removing the list of fields, and having the mappers add the fields directly to `ParseContext#doc()`. I think this is nicer for implementors, because previously fields could be added either through the list, or the context (through `add`, `addWithKey`, etc.)	2020-05-06 11:12:09 -07:00
Tanguy Leroux	131a3911eb	Replace BlobContainerWrapper by FilterBlobContainer (#56200 ) A FilterBlobContainer class was introduced in #55952 and it delegates its behavior to a given BlobContainer while allowing to override only necessary methods. This commit replaces the existing BlobContainerWrapper class from the test framework with the new FilterBlobContainer from core.	2020-05-06 10:05:43 +02:00
Tanguy Leroux	35622747fd	Add Minio tests for searchable snapshots (#56112 ) (#56179 ) This commit adds QA tests for searchable snapshot on MinIO, similarly to what already exist for S3, GCS and Azure.	2020-05-05 11:40:06 +02:00
Martijn van Groningen	2ac32db607	Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151 ) Backport of #56034. Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context as a dedicated field that callers to IndexNameExpressionResolver can set. Also alter indices stats api to support data streams. The rollover api uses this api and otherwise rolling over data stream does no longer work. Relates to #53100	2020-05-04 22:38:33 +02:00
Armin Braun	e01b999ef0	Add Functionality to Consistently Read RepositoryData For CS Updates (#55773 ) (#56091 ) Using optimistic locking, add the ability to run a repository state update task with a consistent view of the current repository data. Allows for a follow-up to remove the snapshot INIT state.	2020-05-04 08:13:14 +02:00
Armin Braun	3a64ecb6bf	Allow Deleting Multiple Snapshots at Once (#55474 ) (#56083 ) * Allow Deleting Multiple Snapshots at Once (#55474) Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise. This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.	2020-05-03 20:30:58 +02:00
Tim Brooks	54dbea6c65	Improve RemoteConnectionManager consistency (#55759 ) In order to iterate through remote connections, the remote connection manager maintains a local cache of connected nodes. Unfortunately this is difficult in relationship with testing as it is inherently racy in comparison to the parent connection manager map of connections. This commit improves the relationship by only returning a cached connection if it is still registered with the parent. If the connection is not open, we will go to the slow path of allocating a iterator directly from the parent.	2020-04-30 12:13:06 -06:00
David Turner	445cf32591	Stop exposing ExecutorService from DeterministicTaskQueue (#56001 ) There are no real users of `DeterministicTaskQueue#getExecutorService()` so we can remove those public methods and expose the `ExecutorService` only through the corresponding `ThreadPool`.	2020-04-30 11:34:52 +01:00
Armin Braun	31a84b17ad	Make SAME Pool on DeterministicTaskQueue more Realistic (#55931 ) (#55999 ) By forking off the `SAME` pool tasks and executing them in random order, we are actually creating unrealisticc scenarios and missing the actual order of operations (whatever task that puts the task on the `SAME` queue will always run before the `SAME` queued task will be executed currently). Also, added caching for the executors. It doesn't matter much, but saves some objects and makes debugging a little easier because executor object ids make more sense.	2020-04-30 10:41:33 +02:00
Christos Soulios	43dab77186	[7.x] Modified searchAndReduce() to return empty agg when no docs exist (#55967 ) Backports #55826 to 7.x Modified AggregatorTestCase.searchAndReduce() method so that it returns an empty aggregation result when no documents have been inserted. Also refactored several aggregation tests so they do not re-implement method AggregatorTestCase.testCase() Fixes #55824	2020-04-30 00:28:32 +03:00
Tim Brooks	cd228095df	Retry failed peer recovery due to transient errors (#55883 ) Currently a failed peer recovery action will fail an recovery. This includes when the recovery fails due to potentially short lived transient issues such as rejected exceptions or circuit breaking errors. This commit adds the concept of a retryable action. A retryable action will be retryed in face of certain errors. The action will be retried after an exponentially increasing backoff period. After defined time, the action will timeout. This commit only implements retries for responses that indicate the target node has NOT executed the action.	2020-04-28 13:52:49 -06:00
Lee Hinman	777caf0725	[7.x] Add support for V2 index templates to /_cat/templates (#55829 ) (#55866 ) Backports the following commits to 7.x: - Add support for V2 index templates to /_cat/templates (#55829)	2020-04-28 10:14:19 -06:00
Tim Brooks	80662f31a1	Introduce mechanism to stub request handling (#55832 ) Currently there is a clear mechanism to stub sending a request through the transport. However, this is limited to testing exceptions on the sender side. This commit reworks our transport related testing infrastructure to allow stubbing request handling on the receiving side.	2020-04-27 16:57:15 -06:00
Mark Tozzi	22a98ec279	Aggregation support for Value Scripts that change types (#54830 ) (#55752 )	2020-04-27 09:57:05 -04:00
Mark Tozzi	87b4979c24	[7.x] Make ValuesSourceRegistry immutable after initilization #55493 (#55697 )	2020-04-24 13:33:38 -04:00
Zachary Tong	715c90bf7d	Aggs must specify a `field` or `script` (or both) (#52226 ) This adds a validation to VSParserHelper to ensure that a field or script or both are specified by the user. This is technically required today already, but throws an exception much deeper in the agg framework and has a very unintuitive error for the user (as well as eating more resources instead of failing early)	2020-04-23 19:23:41 -04:00
Zachary Tong	4f483ac370	Fix half-float range in SupportedTypeTests (#55409 ) Also adds a comment to the half-float number field type tests indicating why 70000 is used instead of 65504	2020-04-23 11:36:37 -04:00
Tal Levy	0844455505	Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037 ) (#55500 ) After #53562, the `geo_shape` field mapper is registered within a module. This opens the door for introducing a new `geo_shape` field mapper into the Spatial Plugin that has doc-values support. This is very much an extension of server's GeoShapeFieldMapper, but with the addition of the doc values implementation.	2020-04-22 08:12:54 -07:00
Armin Braun	db7eb8e8ff	Remove Redundant CS Update on Snapshot Finalization (#55276 ) (#55528 ) This change folds the removal of the in-progress snapshot entry into setting the safe repository generation. Outside of removing an unnecessary cluster state update, this also has the advantage of removing a somewhat inconsistent cluster state where the safe repository generation points at `RepositoryData` that contains a finished snapshot while it is still in-progress in the cluster state, making it easier to reason about the state machine of upcoming concurrent snapshot operations.	2020-04-21 15:33:17 +02:00
Dan Hermann	402b6b1715	Identify backing indices for data streams	2020-04-21 07:43:10 -05:00
Yannick Welsch	ba39c261e8	Use streaming reads for GCS (#55506 ) To read from GCS repositories we're currently using Google SDK's official BlobReadChannel, which issues a new request every 2MB (default chunk size for BlobReadChannel) using range requests, and fully downloads the chunk before exposing it to the returned InputStream. This means that the SDK issues an awfully high number of requests to download large blobs. Increasing the chunk size is not an option, as that will mean that an awfully high amount of heap memory will be consumed by the download process. The Google SDK does not provide the right abstractions for a streaming download. This PR uses the lower-level primitives of the SDK to implement a streaming download, similar to what S3's SDK does. Also closes #55505	2020-04-21 13:22:26 +02:00
Nhat Nguyen	3cc4e0dd09	Retry follow task when remote connection queue full (#55314 ) If more than 100 shard-follow tasks are trying to connect to the remote cluster, then some of them will abort with "connect listener queue is full". This is because we retry on ESRejectedExecutionException, but not on RejectedExecutionException.	2020-04-20 22:43:05 -04:00
Stuart Tettemer	93a2e9b0f9	Test: MockScoreScript can be cacheable. (#55499 ) Backport: 0ed1eb5	2020-04-20 17:09:58 -06:00
Armin Braun	a0763d958d	Make RepositoryData Less Memory Heavy (#55293 ) (#55468 ) We don't really need `LinkedHashSet` here. We can assume that all the entries are unique and just use a list and use the list utilities to create the cheapest possible version of the list. Also, this fixes a bug in `addSnapshot` which would mutate the existing linked hash set on the current instance (fortunately this never caused a real world bug) and brings the collection in line with the java docs on its getter that claim immutability.	2020-04-20 18:28:06 +02:00
Dan Hermann	dc703d75f5	Add explicit generation attribute to data streams	2020-04-20 07:40:33 -05:00
Yannick Welsch	b9da307cd1	Add GCS support for searchable snapshots (#55403 ) Adds ranged read support for GCS repositories in order to enable searchable snapshot support for GCS. As part of this PR, I've extracted some of the test infrastructure to make sure that GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests are covering similar test (as I saw those diverging in what they cover)	2020-04-20 13:02:59 +02:00
David Turner	0458770556	Rebooted master-ineligibles should not bootstrap (#55302 ) In #55298 we saw a failure of `CoordinationStateTests#testSafety` in which a single master-eligible node is bootstrapped, then rebooted as a master-ineligible node (losing its persistent state) and then rebooted as a master-eligible node and bootstrapped again. This happens because this test loses too much of the persistent state; in fact once bootstrapped the node would not allow itself to be bootstrapped again. This commit adjusts the test logic to reflect this. Closes #55298	2020-04-20 09:10:35 +01:00
Zachary Tong	f46b567563	Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250 ) Some aggregations, such as the Terms* family, will use an alternate class to represent unmapped shard results (while the rest of the aggs use the same object but with some form of "empty" or "nullish" values to represent unmapped). This was problematic with AbstractWireSerializingTestCase because it expects the instanceReader to always match the original class. Instead, we need to use the NamedWriteable version so that the registry can be consulted for the proper deserialization reader.	2020-04-17 16:39:38 -04:00
Rory Hunter	a5b545b2a0	Use LTS version of Ubuntu in Dockerfiles (#55370 ) We have some Dockerfiles that reference Ubuntu 19.04, which is not an LTS version and has now appears to have been retired from the Ubuntu repositories. Switch to 18.04, which is the current long-term support version. This also requires a switch from OpenJDK 12 to 11. Also change a usage of 16.04 to 18.04, for consistency.	2020-04-17 16:14:14 -04:00
Tanguy Leroux	71855fbfe0	Mute testSupportedFieldTypes in HDRPreAggregatedPercentile tests (#55369 ) Relates #55360	2020-04-17 10:49:43 +02:00
Martijn van Groningen	417d5f2009	Make data streams in APIs resolvable. (#55337 ) Backport from: #54726 The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used. In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs. Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex(). If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api). Relates to #53100	2020-04-17 08:33:37 +02:00
Mark Tozzi	22c55180c1	[7.x] Backport ValuesSourceRegistry and related work (#54922 ) * Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. * More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport it's fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>	2020-04-16 16:54:46 -04:00
Rory Hunter	49f8f66a41	Revert "Use LTS version of Ubuntu in Dockerfiles (#55327 )" This reverts commit `dd76fbac60`.	2020-04-16 20:05:22 +01:00
Rory Hunter	dd76fbac60	Use LTS version of Ubuntu in Dockerfiles (#55327 ) We have some Dockerfiles that reference Ubuntu 19.04, which is not an LTS version and has now appears to have been retired from the Ubuntu repositories. Switch to 18.04, which is the current long-term support version. Also change a usage of 16.04 to 18.04, for consistency.	2020-04-16 19:47:18 +01:00
Christos Soulios	b810f0024a	[7.x] Backport AggregatorTestCase.writeTestDoc() (#55318 )	2020-04-16 21:10:18 +03:00
David Turner	8a565c4fa6	Voting config exclusions should work with absent nodes (#55291 ) Today the voting config exclusions API accepts node filters and resolves them to a collection of node IDs against the current cluster membership. This is problematic since we may want to exclude nodes that are not currently members of the cluster. For instance: - if attempting to remove a flaky node from the cluster you cannot reliably exclude it from the voting configuration since it may not reliably be a member of the cluster - if `cluster.auto_shrink_voting_configuration: false` then naively shrinking the cluster will remove some nodes but will leaving their node IDs in the voting configuration. The only way to clean up the voting configuration is to grow the cluster back to its original size (potentially replacing some of the voting configuration) and then use the exclusions API. This commit adds an alternative API that accepts node names and node IDs but not node filters in general, and deprecates the current node-filters-based API. Relates #47990. Backport of #50836 to 7.x. Co-authored-by: zacharymorn <zacharymorn@gmail.com>	2020-04-16 12:28:50 +01:00
William Brafford	2ba3be9db6	Remove deprecated third-party methods from tests (#55255 ) (#55269 ) I've noticed that a lot of our tests are using deprecated static methods from the Hamcrest matchers. While this is not a big deal in any objective sense, it seems like a small good thing to reduce compilation warnings and be ready for a new release of the matcher library if we need to upgrade. I've also switched a few other methods in tests that have drop-in replacements.	2020-04-15 17:54:47 -04:00
Ryan Ernst	29b70733ae	Use task avoidance with forbidden apis (#55034 ) Currently forbidden apis accounts for 800+ tasks in the build. These tasks are aggressively created by the plugin. In forbidden apis 3.0, we will get task avoidance (https://github.com/policeman-tools/forbidden-apis/pull/162), but we need to ourselves use the same task avoidance mechanisms to not trigger these task creations. This commit does that for our foribdden apis usages, in preparation for upgrading to 3.0 when it is released.	2020-04-15 13:27:53 -07:00
Armin Braun	2f91e2aab7	Fix Race in Snapshot Abort (#54873 ) (#55233 ) We can be a little more efficient when aborting a snapshot. Since we know the new repository data after finalizing the aborted snapshot when can pass it down to the snapshot completion listeners. This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.	2020-04-15 15:42:15 +02:00
Dan Hermann	30638a0b41	[7.x] Wipe data streams in each REST test (#55009 )	2020-04-15 07:27:39 -05:00
Mark Vieira	ce85063653	[7.x] Re-add origin url information to publish POM files (#55173 )	2020-04-14 13:24:15 -07:00
Yannick Welsch	a610513ec7	Provide repository-level stats for searchable snapshots (#55051 ) Provides basic repository-level stats that will allow us to get some insight into how many requests are actually being made by the underlying SDK. Currently only tracks GET and LIST calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint that will help us better understand some of the low-level dynamics of searchable snapshots.	2020-04-14 14:34:08 +02:00
William Brafford	52bebec51f	NodeInfo response should use a collection rather than fields (#54460 ) (#55132 ) This is a first cut at giving NodeInfo the ability to carry a flexible list of heterogeneous info responses. The trick is to be able to serialize and deserialize an arbitrary list of blocks of information. It is convenient to be able to deserialize into usable Java objects so that we can aggregate nodes stats for the cluster stats endpoint. In order to provide a little bit of clarity about which objects can and can't be used as info blocks, I've introduced a new interface called "ReportingService." I have removed the hard-coded getters (e.g., getOs()) in favor of a flexible method that can return heterogeneous kinds of info blocks (e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the need to cast in the client code.	2020-04-13 17:18:39 -04:00
Nik Everett	c00811f3a3	Make some agg tests easier to read (#54954 ) (#55079 ) We added a fancy method to provide random realistic test data to the reduction tests in #54910. This uses that to remove some of the more esoteric machinations in the agg tests. This will marginally increase the coverage of the serialiation tests and, more importantly, remove some mysterious value generation code that only really made sense for random reduction tests but was used all over the place. It doesn't, on the other hand, make the tests shorter. Just hopefully more clear. I only cleaned up a few tests this way. If we like this it'd probably be worth grabbing others.	2020-04-10 14:15:30 -04:00
Martijn van Groningen	7f38b146b3	Temporarily preserve data streams after each yaml rest test has executed. (#54959 ) (#55007 ) Instead delete the data streams manually, until client yaml test runners have been updated to also delete all data streams after each yaml test. Relates to #53100	2020-04-09 14:44:57 +02:00
Tal Levy	254d1e3543	[7.x] Create new `geo` module and migrate geo_shape registration (#53562 ) (#54924 ) This commit introduces a new `geo` module that is intended to be contain all the geo-spatial-specific features in server. As a first step, the responsibility of registering the geo_shape field mapper is moved to this module. Co-authored-by: Nicholas Knize <nknize@gmail.com>	2020-04-07 16:30:58 -07:00
Tim Brooks	619028c33e	Implement transport circuit breaking in aggregator (#54927 ) This commit moves the action name validation and circuit breaking into the InboundAggregator. This work is valuable because it lays the groundwork for incrementally circuit breaking as data is received. This PR includes the follow behavioral change: Handshakes contribute to circuit breaking, but cannot be broken. They currently do not contribute nor are they broken.	2020-04-07 17:10:31 -06:00
Tim Brooks	c7053ef824	Use TransportChannel in TransportHandshaker (#54921 ) Currently the TransportHandshaker has a specialized codepath for sending a response. In other work, we are going to start having handshakes contribute to circuit breaking (while not being breakable). This commit moves in that direction by allowing the handshaker to responding using a standard TcpTransportChannel similar to other requests.	2020-04-07 15:37:15 -06:00
Nik Everett	ce7ae4a7d1	Remove pipline aggs from agg result tree (backport of #54716 ) (#54920 ) This removes pipeline aggregators from the aggregation result tree except for a single field used for backwards compatibility with pre-7.8 versions of Elasticsearch. That field isn't populated unless we are serializing to pre-7.8 Elasticsearch. So, good news! We no longer build pipeline aggregators on the data node. Most of the time.	2020-04-07 17:22:23 -04:00
Nik Everett	100f7258c7	Improve agg reduce tests (#54910 ) (#54914 ) This allows subclasses of `InternalAggregationTestCase` to make a `List` of values to reduce so that it can make values that are realistic together. The first use of this is with `InternalTTest` which uses it to make results that don't cause their `sum` field to wrap. It'd likely be useful for a ton of other aggs but just one for now.	2020-04-07 17:22:04 -04:00
Tim Brooks	9cf2406cf1	Move network stats marking into InboundPipeline (#54908 ) This is a follow-up to #48263. It moves the inbound stats tracking inside of the InboundPipeline.	2020-04-07 13:34:05 -06:00
Nik Everett	3c56e0de42	Fix scripted metric in ccs (backport of #54776 ) (#54888 ) `scripted_metric` did not work with cross cluster search because it assumed that you'd never perform a partial reduction, serialize the results, and then perform a final reduction. That serialized-after-partial-reduction step was broken. This is also required to support #54758.	2020-04-07 10:43:00 -04:00
Tanguy Leroux	4d36917e52	Merge feature/searchable-snapshots branch into 7.x (#54803 ) (#54825 ) This is a backport of #54803 for 7.x. This pull request cherry picks the squashed commit from #54803 with the additional commits: 6f50c92 which adjusts master code to 7.x a114549 to mute a failing ILM test (#54818) 48cbca1 and 50186b2 that cleans up and fixes the previous test aae12bb that adds a missing feature flag (#54861) 6f330e3 that adds missing serialization bits (#54864) bf72c02 that adjust the version in YAML tests a51955f that adds some plumbing for the transport client used in integration tests Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Yannick Welsch <yannick@welsch.lu> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: Andrei Dan <andrei.dan@elastic.co>	2020-04-07 13:28:53 +02:00
Costin Leau	a7c31a7632	Test: Include ML ILM policy in EsRestTest (#54773 ) This should avoid REST failures caused by the inability to delete said policy Fix #54759 (cherry picked from commit 3ba5e02b713c03b1bdf14e0367a2bce68c35dd30)	2020-04-05 20:10:17 +03:00
Jason Tedor	05c5529b2d	Clean up a few instances of "MetaData" We recently cleaned up the use of the word "metadata" across the codebase. A few additional uses have trickled in, likely from in-progress work. This commit cleans up these last few instances. Relates #54519	2020-04-04 10:55:09 -04:00
Christoph Büscher	8c9ac14a98	Rename field name constants in AbstractBuilderTestCase (#53234 ) Some field name constants were not updaten when we moved from "string" to "text" and "keyword" fields. Renaming them makes it easier and faster to know which field type is used in test subclassing this base test case.	2020-04-03 17:28:22 +02:00
Nik Everett	54ea4f4f50	Begin to drop pipeline aggs from the result tree (backport of #54311 ) (#54659 ) Removes pipeline aggregations from the aggregation result tree as they are no longer used. This stops us from building the pipeline aggregators at all on data nodes except for backwards compatibility serialization. This will save a tiny bit of space in the aggregation tree which is lovely, but the biggest benefit is that it is a step towards simplifying pipeline aggregators. This only does about half of the work to remove the pipeline aggs from the tree. Removing all of it would, well, double the size of the change and make it harder to review.	2020-04-02 16:45:12 -04:00
Nik Everett	a5adac0d1e	Fix pipeline agg serialization for ccs (backport of #54282 ) (#54468 ) This fixes pipeline aggregations used in cross cluster search from an older version of Elasticsearch to a newer version of Elasticsearch. I broke this in #53730 when I was too aggressive in shutting off serialization of pipeline aggs. In particular, this comes up when the coordinating node is pre-7.8.0 and the gateway node is on or after 7.8.0. The fix is another step down the line to remove pipeline aggregators from the aggregation tree. Sort of. It create a new `List<PipelineAggregator>` member in `InternalAggregation` but it is only used for bwc serialization and it is fed by the mechanism established in #53730 to read the pipelines from the	2020-04-02 10:35:40 -04:00
Andy Bristol	eb14635f1f	add tests to StatsAggregatorTests (#53768 ) Adds tests for supported ValuesSourceTypes, unmapped fields, scripting, and the missing param. The tests for unmapped fields and scripting are migrated from the StatsIT integration test	2020-04-01 17:07:51 -07:00
William Brafford	958e9d1b78	Refactor nodes stats request builders to match requests (#54363 ) (#54604 ) * Refactor nodes stats request builders to match requests (#54363) * Remove hard-coded setters from NodesInfoRequestBuilder * Remove hard-coded setters from NodesStatsRequest * Use static imports to reduce clutter * Remove uses of old info APIs	2020-04-01 17:03:04 -04:00
Ioannis Kakavas	c9ffa379ba	[7.x] Add end to end QA authentication test (#54215 ) (#54567 ) Use the same ES cluster as both an SP and an IDP and perform IDP initiated and SP initiated SSO. The REST client plays the role of both the Cloud UI and Kibana in these flows Backport of #54215 * fix compilation issues	2020-04-01 18:35:21 +03:00
David Turner	6d976e1468	Resolve some coordination-layer TODOs (#54511 ) This commit removes a handful of TODO comments in the cluster coordination layer that no longer apply. Relates #32006	2020-04-01 12:36:18 +01:00
Lee Hinman	987b6ecffa	[7.x] Be lenient when deleting index and component templates a… (#54538 ) We previously checked for a 405 response, but on 7.x BWC tests may be hitting versions that don't support the 405 response (returning 400 or 500), so be more lenient in those cases. Relates to #54513	2020-03-31 16:24:05 -06:00
Jason Tedor	63e5f2b765	Rename META_DATA to METADATA This is a follow up to a previous commit that renamed MetaData to Metadata in all of the places. In that commit in master, we renamed META_DATA to METADATA, but lost this on the backport. This commit addresses that.	2020-03-31 17:30:51 -04:00
Jason Tedor	5fcda57b37	Rename MetaData to Metadata in all of the places (#54519 ) This is a simple naming change PR, to fix the fact that "metadata" is a single English word, and for too long we have not followed general naming conventions for it. We are also not consistent about it, for example, METADATA instead of META_DATA if we were trying to be consistent with MetaData (although METADATA is correct when considered in the context of "metadata"). This was a simple find and replace across the code base, only taking a few minutes to fix this naming issue forever.	2020-03-31 17:24:38 -04:00
Zachary Tong	c9db2de41d	[7.x] Comprehensively test supported/unsupported field type:agg combinations (#54451 ) * Comprehensively test supported/unsupported field type:agg combinations (#52493) This adds a test to AggregatorTestCase that allows us to programmatically verify that an aggregator supports or does not support a particular field type. It fetches the list of registered field type parsers, creates a MappedFieldType from the parser and then attempts to run a basic agg against the field. A supplied list of supported VSTypes are then compared against the output (success or exception) and suceeds or fails the test accordingly. Co-Authored-By: Mark Tozzi <mark.tozzi@gmail.com> * Skip fields that are not aggregatable * Use newIndexSearcher() to avoid incompatible readers (#52723) Lucene's `newSearcher()` can generate readers like ParallelCompositeReader which we can't use. We need to instead use our helper `newIndexSearcher`	2020-03-31 14:35:03 -04:00
Lee Hinman	a3d1945254	[7.x] Add warnings/errors when V2 templates would match same i… (#54449 ) * Add warnings/errors when V2 templates would match same indices… (#54367) * Add warnings/errors when V2 templates would match same indices as V1 With the introduction of V2 index templates, we want to warn users that templates they put in place might not take precedence (because v2 templates are going to "win"). This adds this validation at `PUT` time for both V1 and V2 templates with the following rules: ** When creating or updating a V2 template - If the v2 template would match indices for an existing v1 template or templates, provide a warning (through the deprecation logging so it shows up to the client) as well as logging the warning The v2 warning looks like: ``` index template [my-v2-template] has index patterns [foo-] matching patterns from existing older templates [old-v1-template,match-all-template] with patterns (old-v1-template => [foo],match-all-template => []); this template [my-v2-template] will take precedence during new index creation ``` * When creating a V1 template - If the v1 template is for index patterns of `""` and a v2 template exists, warn that the v2 template may take precedence - If the v1 template is for index patterns other than all indices, and a v2 template exists that would match, throw an error preventing creation of the v1 template * When updating a V1 template (without changing its existing `index_patterns`!) - If the v1 template is for index patterns that would match an existing v2 template, warn that the v2 template may take precedence. The v1 warning looks like: ``` template [my-v1-template] has index patterns [] matching patterns from existing index templates [existing-v2-template] with patterns (existing-v2-template => [foo]); this template [my-v1-template] may be ignored in favor of an index template at index creation time ``` And the v1 error looks like: ``` template [my-v1-template] has index patterns [foo] matching patterns from existing index templates [existing-v2-template] with patterns (existing-v2-template => [f]), use index templates (/_index_template) instead ``` Relates to #53101 * Remove v2 index and component templates when cleaning up tests * Finish half-finished comment sentence * Guard template removal and ignore for earlier versions of ES Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> * Also ignore 500 errors when clearing index template v2 templates Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-03-30 13:25:50 -06:00
Tim Brooks	2ccddbfa88	Move transport decoding and aggregation to server (#54360 ) Currently all of our transport protocol decoding and aggregation occurs in the individual transport modules. This means that each implementation (test, netty, nio) must implement this logic. Additionally, it means that the entire message has been read from the network before the server package receives it. This commit creates a pipeline in server which can be passed arbitrary bytes to handle. Internally, the pipeline will decode, decompress, and aggregate the messages. Additionally, this allows us to run many megabytes of bytes through the pipeline in tests to ensure that the logic works. This work will enable future work: Circuit breaking or backoff logic based on message type and byte in the content aggregator. Sharing bytes with the application layer using the ref counted releasable network bytes. Improved network monitoring based specifically on channels. Finally, this fixes the bug where we do not circuit break on the correct message size when compression is enabled.	2020-03-27 14:13:10 -06:00
Stuart Tettemer	1630de4a42	Scripting: stats per context in nodes stats (#54008 ) (#54357 ) Adds script cache stats to `_node/stats`. If using the general cache: ``` "script_cache": { "sum": { "compilations": 12, "cache_evictions": 9, "compilation_limit_triggered": 5 } } ``` If using context caches: ``` "script_cache": { "sum": { "compilations": 13, "cache_evictions": 9, "compilation_limit_triggered": 5 }, "contexts": [ { "context": "aggregation_selector", "compilations": 8, "cache_evictions": 6, "compilation_limit_triggered": 3 }, { "context": "aggs", "compilations": 5, "cache_evictions": 3, "compilation_limit_triggered": 2 }, ``` Backport of: 32f46f2 Refs: #50152	2020-03-27 12:26:00 -06:00
Yannick Welsch	8176f1c7f8	Delegate toXContent logic from ClusterState to its member classes (#54192 ) Backport of #52743 Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2020-03-26 15:08:30 +01:00
Yannick Welsch	1ba6783780	Schedule commands in current thread context (#54187 ) Changes ThreadPool's schedule method to run the schedule task in the context of the thread that scheduled the task. This is the more sensible default for this method, and eliminates a range of bugs where the current thread context is mistakenly dropped. Closes #17143	2020-03-26 10:07:59 +01:00
Nik Everett	8f40f1435a	Save a little space in agg tree (backport of #53730 ) (#54213 ) This drop the "top level" pipeline aggregators from the aggregation result tree which should save a little memory and a few serialization bytes. Perhaps more imporantly, this provides a mechanism by which we can remove all pipelines from the aggregation result tree. This will save quite a bit of space when pipelines are deep in the tree. Sadly, doing this isn't simple because of backwards compatibility. Nodes before 7.7.0 need those pipelines. We provide them by setting passing a `Supplier<PipelineTree>` into the root of the aggregation tree that we only call if we need to serialize to a version before 7.7.0. This solution works for cross cluster search because we always reduce the aggregations in each remote cluster and then forward them back to the coordinating node. Its quite possible that the coordinating node needs the pipeline (say it is version 7.1.0) and the gateway node in the remote cluster doesn't (version 7.7.0). In that case the data nodes won't send the pipeline aggregations back to the gateway node. Critically, the gateway node will send the pipeline aggregations back to the coordinating node. This is all managed with that `Supplier<PipelineTree>`, but how it is managed is a bit tricky.	2020-03-25 15:51:16 -04:00
James Baiera	b84c74cf70	Update the HDFS version used by HDFS Repo (#53693 ) (#54125 )	2020-03-25 14:01:29 -04:00
Armin Braun	4271963462	Revert "Use Azure Bulk Deletes in Azure Repository (#53919 )" (#54089 ) (#54111 ) This reverts commit 23cccf088810b8416ed278571352393cc2de9523. Unfortunately SAS token auth still doesn't work with bulk deletes so we can't use them yet. Closes #54080	2020-03-25 12:13:25 +01:00
Yannick Welsch	e006d1f6cf	Use special XContent registry for node tool (#54050 ) Fixes an issue where the elasticsearch-node command-line tools would not work correctly because PersistentTasksCustomMetaData contains named XContent from plugins. This PR makes it so that the parsing for all custom metadata is skipped, even if the core system would know how to handle it. Closes #53549	2020-03-24 17:40:51 +01:00
Tanguy Leroux	dea8a31480	Wait for Active license before running CCR API tests (#53966 ) DocsClientYamlTestSuiteIT sometimes fails for CCR related tests because tests are started before the license is fully applied and active within the cluster. The first tests to be executed then fails with the error noticed in #53430. This can be easily reproduced locally by only running CCR docs tests. This commit adds some @Before logic in DocsClientYamlTestSuiteIT so that it waits for the license to be active before running CCR tests. Closes #53430	2020-03-24 14:29:45 +01:00
Nik Everett	b9bfba2c8b	Move pipeline agg validation to coordinating node (backport of #53669 ) (#54019 ) This moves the pipeline aggregation validation from the data node to the coordinating node so that we, eventually, can stop sending pipeline aggregations to the data nodes entirely. In fact, it moves it into the "request validation" stage so multiple errors can be accumulated and sent back to the requester for the entire request. We can't always take advantage of that, but it'll be nice for folks not to have to play whack-a-mole with validation. This is implemented by replacing `PipelineAggretionBuilder#validate` with: ``` protected abstract void validate(ValidationContext context); ``` The `ValidationContext` handles the accumulation of validation failures, provides access to the aggregation's siblings, and implements a few validation utility methods.	2020-03-23 17:22:56 -04:00
Marios Trivyzas	3a3e964956	Reduce performance impact of ExitableDirectoryReader (#53978 ) (#54014 ) Benchmarking showed that the effect of the ExitableDirectoryReader is reduced considerably when checking every 8191 docs. Moreover, set the cancellable task before calling QueryPhase#preProcess() and make sure we don't wrap with an ExitableDirectoryReader at all when lowLevelCancellation is set to false to avoid completely any performance impact. Follows: #52822 Follows: #53166 Follows: #53496 (cherry picked from commit cdc377e8e74d3ca6c231c36dc5e80621aab47c69)	2020-03-23 21:30:34 +01:00
Ryan Ernst	960d1fb578	Revert "Introduce system index APIs for Kibana (#53035 )" (#53992 ) This reverts commit `c610e0893d`. backport of #53912	2020-03-23 10:29:35 -07:00
Armin Braun	5b9864db2c	Better Incrementality for Snapshots of Unchanged Shards (#52182 ) (#53984 ) Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.	2020-03-23 16:43:41 +01:00
Armin Braun	b51ea25a00	Use Azure Bulk Deletes in Azure Repository (#53919 ) (#53967 ) Now that we upgraded the Azure SDK to 8.6.2 in #53865 we can make use of bulk deletes.	2020-03-23 13:35:05 +01:00
Tim Vernum	cde8725e3c	Create API Key on behalf of other user (#53943 ) This change adds a "grant API key action" POST /_security/api_key/grant that creates a new API key using the privileges of one user ("the system user") to execute the action, but creates the API key with the roles of the second user ("the end user"). This allows a system (such as Kibana) to create API keys representing the identity and access of an authenticated user without requiring that user to have permission to create API keys on their own. This also creates a new QA project for security on trial licenses and runs the API key tests there Backport of: #52886	2020-03-23 18:50:07 +11:00
David Turner	adfeb50a53	Use consistent threadpools in CoordinatorTests (#53868 ) Today in the `CoordinatorTests` each node uses multiple threadpools. This is mostly fine as they are almost completely stateless, except for the `ThreadContext`: by using multiple threadpools we cannot make assertions that the thread context is/isn't preserved as we expect. This commit consolidates the threadpool instances in use so that each node uses just one.	2020-03-20 16:22:42 +01:00
Armin Braun	7d66c7e25c	Rethrow Failed Assertion in BlobStoreTestUtils (#53859 ) (#53863 ) This fixes two issues: 1. Currently, the future here is never resolved on assertion error so a failing test would take a full minute to complete until the future times out. 2. S3 tests overide this method to busy assert on this method. This only works if an assertion error makes it to the calling thread. Closes #53508	2020-03-20 15:31:48 +01:00
Alan Woodward	10a2a95b25	Fix backport merge error in ESTestCase We were ignoring the `stripXContentPosition` boolean when checking for warnings.	2020-03-20 12:12:47 +00:00
Alan Woodward	d23112f441	Report parser name and location in XContent deprecation warnings (#53805 ) It's simple to deprecate a field used in an ObjectParser just by adding deprecation markers to the relevant ParseField objects. The warnings themselves don't currently have any context - they simply say that a deprecated field has been used, but not where in the input xcontent it appears. This commit adds the parent object parser name and XContentLocation to these deprecation messages. Note that the context is automatically stripped from warning messages when they are asserted on by integration tests and REST tests, because randomization of xcontent type during these tests means that the XContentLocation is not constant	2020-03-20 11:52:55 +00:00
David Turner	7d3ac4f57d	Revert "Apply cluster states in system context (#53785 )" This reverts commit `4178c57410`.	2020-03-19 15:20:36 +00:00
David Turner	4178c57410	Apply cluster states in system context (#53785 ) Today cluster states are sometimes (rarely) applied in the default context rather than system context, which means that any appliers which capture their contexts cannot do things like remote transport actions when security is enabled. There are at least two ways that we end up applying the cluster state in the default context: 1. locally applying a cluster state that indicates that the master has failed 2. the elected master times out while waiting for a response from another node This commit ensures that cluster states are always applied in the system context. Mitigates #53751	2020-03-19 14:48:55 +00:00
Stuart Tettemer	cdbee32f55	Scripting: Per-context script cache, default off (#52855 ) (#53756 ) * Adds per context settings: `script.context.${CONTEXT}.cache_max_size` ~ `script.cache.max_size` `script.context.${CONTEXT}.cache_expire` ~ `script.cache.expire` `script.context.${CONTEXT}.max_compilations_rate` ~ `script.max_compilations_rate` * Context cache is used if: `script.max_compilations_rate=use-context`. This value is dynamically updatable, so users can switch back to the general cache if desired. * Settings for context caches take the first value that applies: 1) Context specific settings if set, eg `script.context.ingest.cache_max_size` 2) Correlated general setting is set to the non-default value, eg `script.cache.max_size` 3) Context default The reason for 2's inclusion is to allow an easy transition for users who've customized their general cache settings. Using the general cache settings for the context caches results in higher effective settings, since they are multiplied across the number of contexts. So a general cache max size of 200 will become 200 * # of contexts. However, this behavior it will avoid users snapping to a value that is too low for them. Backport of: #52855 Refs: #50152	2020-03-18 14:44:04 -06:00
Ryan Ernst	5c472fcb47	Upgrade jackson to 2.10.3 and GeoIP to 2.13.1 (#53642 ) Re-applies the change from #53523 along with test fixes. closes #53626 closes #53624 closes #53622 closes #53625 Co-authored-by: Nik Everett <nik9000@gmail.com> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: Jake Landis <jake.landis@elastic.co>	2020-03-17 10:28:51 -07:00
Ryan Ernst	6a7baddc2b	Update to ASM 7.1 in logger usage check (#52742 ) The logger usage check uses its own version of ASM to inspect class files for logging usages. Master was updated to support java 11 compilation in #40754. However, 7.x still used ASM 5, which could not read newer java bytecode versions. This commit bumps ASM in 7.x used in the logger usage check. closes #52408	2020-03-16 18:50:50 -07:00
Jason Tedor	8c96112613	Fix compilation in simple transport test case base This commit fixes compilation after a backport gone bad due to a difference in the minimum Java version between 7.x and master.	2020-03-16 21:44:58 -04:00
Jason Tedor	01d2339883	Invoke response handler on failure to send (#53631 ) Today it can happen that a transport message fails to send (for example, because a transport interceptor rejects the request). In this case, the response handler is never invoked, which can lead to necessary cleanups not being performed. There are two ways to handle this. One is to expect every callsite that sends a message to try/catch these exceptions and handle them appropriately. The other is merely to invoke the response handler to handle the exception, which is already equipped to handle transport exceptions.	2020-03-16 21:28:24 -04:00
Nik Everett	f0beab4041	Stop using round-tripped PipelineAggregators (backport of #53423 ) (#53629 ) This begins to clean up how `PipelineAggregator`s and executed. Previously, we would create the `PipelineAggregator`s on the data nodes and embed them in the aggregation tree. When it came time to execute the pipeline aggregation we'd use the `PipelineAggregator`s that were on the first shard's results. This is inefficient because: 1. The data node needs to make the `PipelineAggregator` only to serialize it and then throw it away. 2. The coordinating node needs to deserialize all of the `PipelineAggregator`s even though it only needs one of them. 3. You end up with many `PipelineAggregator` instances when you only really need one per pipeline. 4. `PipelineAggregator` needs to implement serialization. This begins to undo these by building the `PipelineAggregator`s directly on the coordinating node and using those instead of the `PipelineAggregator`s in the aggregtion tree. In a follow up change we'll stop serializing the `PipelineAggregator`s to node versions that support this behavior. And, one day, we'll be able to remove `PipelineAggregator` from the aggregation result tree entirely. Importantly, this doesn't change how pipeline aggregations are declared or parsed or requested. They are still part of the `AggregationBuilder` tree because that makes sense.	2020-03-16 16:15:23 -04:00
Jim Ferenczi	e6680be0b1	Add new x-pack endpoints to track the progress of a search asynchronously (#49931 ) (#53591 ) This change introduces a new API in x-pack basic that allows to track the progress of a search. Users can submit an asynchronous search through a new endpoint called `_async_search` that works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time. ```` GET my_index_pattern/_async_search?wait_for_completion=100ms { "aggs": { "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" } } } ```` If after 100ms the final response is not available, a `partial_response` is included in the body: ```` { "id": "9N3J1m4BgyzUDzqgC15b", "version": 1, "is_running": true, "is_partial": true, "response": { "_shards": { "total": 100, "successful": 5, "failed": 0 }, "total_hits": { "value": 1653433, "relation": "eq" }, "aggs": { ... } } } ```` The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed. It also contains the total hits as well as partial aggregations computed from the successful shards. To continue to monitor the progress of the search users can call the get `_async_search` API like the following: ```` GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms ```` That returns a new response that can contain the same partial response than the previous call if the search didn't progress, in such case the returned `version` should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress. Finally if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response: ```` { "id": "9N3J1m4BgyzUDzqgC15b", "version": 10, "is_running": false, "response": { "is_partial": false, ... } } ```` Asynchronous search are stored in a restricted index called `.async-search` if they survive (still running) after the initial submit. Each request has a keep alive that defaults to 5 days but this value can be changed/updated any time: ````` GET my_index_pattern/_async_search?wait_for_completion=100ms&keep_alive=10d ````` The default can be changed when submitting the search, the example above raises the default value for the search to `10d`. ````` GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d ````` The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days. A background service that runs only on the node that holds the first primary shard of the `async-search` index is responsible for deleting the expired results. It runs every hour but the expiration is also checked by running queries (if they take longer than the keep_alive) and when getting a result. Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed. Asynchronous search are not persistent, if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know if the search is still running or not the response contains a field called `is_running` that indicates if the task is up or not. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. However final responses and failures are persisted in a system index that allows to retrieve a response even if the task finishes. ```` DELETE _async_search/9N3J1m4BgyzUDzqgC15b ```` The response is also not stored if the initial submit action returns a final response. This allows to not add any overhead to queries that completes within the initial `wait_for_completion`. The `.async-search` index is a restricted index (should be migrated to a system index in +8.0) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other tasks. Relates #49091 Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>	2020-03-16 15:31:27 +01:00
Nhat Nguyen	6665ebe7ab	Harden search context id (#53143 ) Using a Long alone is not strong enough for the id of search contexts because we reset the id generator whenever a data node is restarted. This can lead to two issues: 1. Fetch phase can fetch documents from another index 2. A scroll search can return documents from another index This commit avoids these issues by adding a UUID to SearchContexId.	2020-03-11 11:48:11 -04:00
Gordon Brown	ff9b8bda63	Implement hidden aliases (#52547 ) This commit introduces hidden aliases. These are similar to hidden indices, in that they are not visible by default, unless explicitly specified by name or by indicating that hidden indices/aliases are desired. The new alias property, `is_hidden` is implemented similarly to `is_write_index`, except that it must be consistent across all indices with a given alias - that is, all indices with a given alias must specify the alias as either hidden, or all specify it as non-hidden, either explicitly or by omitting the `is_hidden` property.	2020-03-06 16:02:38 -07:00
Jay Modi	a81460dbf5	Make watch history indices hidden (#52974 ) This commit updates the template used for watch history indices with the hidden index setting so that new indices will be created as hidden. Relates #50251 Backport of #52962	2020-03-06 09:47:03 -07:00
Nik Everett	f32e4583d1	Add `allowed_warnings` to yaml tests (backport of #53139 ) (#53173 ) When we test backwards compatibility we often end up in a situation where we sometimes get a warning, and sometimes don't. Like, we won't get the warning if we're testing against an older version, but we will in a newer one. Or we won't get the warning if the request randomly lands on a node with an old version of the code. But we wouldn't if it randomed into a node with newer code. This adds `allowed_warnings` to our yaml test runner for those cases: warnings declared this way are "allowed" but not "required". Blocks #52959 Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-03-05 17:11:54 -05:00
Marios Trivyzas	487d442760	Implement Exitable DirectoryReader (#52822 ) (#53162 ) Implement an Exitable DirectoryReader that wraps the original DirectoryReader so that when a search task is cancelled the DirectoryReaders also stop their work fast. This is usuful for expensive operations like wilcard/prefix queries where the DirectoryReaders can spend lots of time and consume resources, as previously their work wouldn't stop even though the original search task was cancelled (e.g. because of timeout or dropped client connection). (cherry picked from commit 67acaf61f33bc5f54e26541514d07e375c202e03)	2020-03-05 14:17:31 +01:00
Nik Everett	28df7ae5ed	Support multiple metrics in `top_metrics` agg (backport of #52965 ) (#53163 ) This adds support for returning multiple metrics to the `top_metrics` agg. It looks like: ``` POST /test/_search?filter_path=aggregations { "aggs": { "tm": { "top_metrics": { "metrics": [ {"field": "v"}, {"field": "m"} ], "sort": {"s": "desc"} } } } } ```	2020-03-05 08:12:01 -05:00
Armin Braun	204c366a4e	Upgrade GCS SDK to 1.104.0 (#52839 ) (#53152 ) Upgrading the GCS SDK to the most recent version. Adjusting (i.e. improving) the REST mock accordingly. This should significantly boost performance by pulling in https://github.com/googleapis/java-core/issues/86 in some cases.	2020-03-05 11:18:18 +01:00
Tanguy Leroux	52d4807f8d	Mute GoogleCloudStorageBlobStoreRepositoryTests on jdk8 (#53119 ) Tests in GoogleCloudStorageBlobStoreRepositoryTests are known to be flaky on JDK 8 (#51446, #52430 ) and we suspect a JDK bug (https://bugs.openjdk.java.net/browse/JDK-8180754) that triggers some assertion on the server side logic that emulates the Google Cloud Storage service. Sadly we were not able to reproduce the failures, even when using the same OS (Debian 9, Ubuntu 16.04) and JDK (Oracle Corporation 1.8.0_241 [Java HotSpot(TM) 64-Bit Server VM 25.241-b07]) of almost all the test failures on CI. While we spent some time fixing code (#51933, #52431) to circumvent the JDK bug they are still flaky on JDK-8. This commit mute these tests for JDK-8 only. Close ##52906	2020-03-05 09:18:05 +01:00
Jay Modi	c610e0893d	Introduce system index APIs for Kibana (#53035 ) This commit introduces a module for Kibana that exposes REST APIs that will be used by Kibana for access to its system indices. These APIs are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Additionally, the ThreadContext has been extended to indicate that the use of system indices may be allowed in a request. This will be built upon in the future for the protection of system indices. Backport of #52385	2020-03-03 14:11:36 -07:00
Nhat Nguyen	e6755afeeb	Upgrade to Lucene 8.5.0-snapshot-c4475920b08 (#52950 ) (#52977 ) To give LUCENE-9228 more CI cycles	2020-02-29 09:29:16 -05:00
Nhat Nguyen	fa701e4c1f	Fix testRetentionLeasesEstablishedWhenRelocatingPrimary (#52445 ) Replace the current assertion with a more robust assertion. Closes #52364	2020-02-26 22:06:01 -05:00
Armin Braun	f7d71b5930	Fix GCS Mock Range Downloads (#52804 ) (#52830 ) We were not correctly respecting the download range which lead to the GCS SDK client closing the connection at times. Also, fixes another instance of failing to drain the request fully before sending the response headers. Closes #51446	2020-02-26 18:38:02 +01:00
Adrien Grand	1807f86751	Generalize how queries on `_index` are handled at rewrite time (#52815 ) Generalize how queries on `_index` are handled at rewrite time (#52486) Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes #49254	2020-02-26 15:37:43 +01:00
Adrien Grand	f993ef80f8	Move the terms index of `_id` off-heap. (#52518 ) In #42838 we moved the terms index of all fields off-heap except the `_id` field because we were worried it might make indexing slower. In general, the indexing rate is only affected if explicit IDs are used, as otherwise Elasticsearch almost never performs lookups in the terms dictionary for the purpose of indexing. So it's quite wasteful to require the terms index of `_id` to be loaded on-heap for users who have append-only workloads. Furthermore I've been conducting benchmarks when indexing with explicit ids on the http_logs dataset that suggest that the slowdown is low enough that it's probably not worth forcing the terms index to be kept on-heap. Here are some numbers for the median indexing rate in docs/s: \| Run \| Master \| Patch \| \| --- \| ------- \| ------- \| \| 1 \| 45851.2 \| 46401.4 \| \| 2 \| 45192.6 \| 44561.0 \| \| 3 \| 45635.2 \| 44137.0 \| \| 4 \| 46435.0 \| 44692.8 \| \| 5 \| 45829.0 \| 44949.0 \| And now heap usage in MB for segments: \| Run \| Master \| Patch \| \| --- \| ------- \| -------- \| \| 1 \| 41.1720 \| 0.352083 \| \| 2 \| 45.1545 \| 0.382534 \| \| 3 \| 41.7746 \| 0.381285 \| \| 4 \| 45.3673 \| 0.412737 \| \| 5 \| 45.4616 \| 0.375063 \| Indexing rate decreased by 1.8% on average, while memory usage decreased by more than 100x. The `http_logs` dataset contains small documents and has a simple indexing chain. More complex indexing chains, e.g. with more fields, ingest pipelines, etc. would see an even lower decrease of indexing rate.	2020-02-24 18:14:12 +01:00
bellengao	02cb5b6c0e	Return 429 status code on read_only_allow_delete index block (#50166 ) We consider index level read_only_allow_delete blocks temporary since the DiskThresholdMonitor can automatically release those when an index is no longer allocated on nodes above high threshold. The rest status has therefore been changed to 429 when encountering this index block to signal retryability to clients. Related to #49393	2020-02-22 16:24:25 +01:00
Jay Modi	8abfda0b59	Rename assertThrows to prevent naming clash (#52651 ) This commit renames ElasticsearchAssertions#assertThrows to assertRequestBuilderThrows and assertFutureThrows to avoid a naming clash with JUnit 4.13+ and static imports of these methods. Additionally, these methods have been updated to make use of expectThrows internally to avoid duplicating the logic there. Relates #51787 Backport of #52582	2020-02-21 13:30:11 -07:00
Stuart Tettemer	376932a47d	Scripting: split out compile limits and caching (#52498 ) (#52652 ) Phase 1 of adding compilation limits per context. * Refactor rate limiting and caching into separate class, `ScriptCache`, which will be used per context. * Disable compilation limit for certain tests. Backport of 0866031 Refs: #50152	2020-02-21 12:10:51 -07:00
Armin Braun	0a09e15959	Add Caching for RepositoryData in BlobStoreRepository (#52341 ) (#52566 ) Cache latest `RepositoryData` on heap when it's absolutely safe to do so (i.e. when the repository is in strictly consistent mode). `RepositoryData` can safely be assumed to not grow to a size that would cause trouble because we often have at least two copies of it loaded at the same time when doing repository operations. Also, concurrent snapshot API status requests currently load it independently of each other and so on, making it safe to cache on heap and assume as "small" IMO. The benefits of this move are: * Much faster repository status API calls * listing all snapshot names becomes instant * Other operations are sped up massively too because they mostly operate in two steps: load repository data then load multiple other blobs to get the additional data * Additional cloud cost savings * Better resiliency, saving another spot where an IO issue could break the snapshot * We can simplify a number of spots in the current code that currently pass around the repository data in tricky ways to avoid loading it multiple times in follow ups.	2020-02-21 10:20:07 +01:00
Armin Braun	4bb780bc37	Refactor Inflexible Snapshot Repository BwC (#52365 ) (#52557 ) * Refactor Inflexible Snapshot Repository BwC (#52365) Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere. Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.	2020-02-21 09:14:34 +01:00
Przemysław Witek	7cd997df84	[ML] Make ml internal indices hidden (#52423 ) (#52509 )	2020-02-19 14:02:32 +01:00
Tim Brooks	a742c58d45	Extract a ConnectionManager interface (#51722 ) Currently we have three different implementations representing a `ConnectionManager`. There is the basic `ConnectionManager` which holds all connections for a cluster. And a remote connection manager which support proxy behavior. And a stubbable connection manager for tests. The remote and stubbable instances use the delegate pattern, so this commit extracts an interface for them all to implement.	2020-02-18 09:19:24 -07:00
David Turner	3d57a78deb	Add extra logging for investigation into #52000 (#52472 ) It looks like #52000 is caused by a slowdown in cluster state application (maybe due to #50907) but I would like to understand the details to ensure that there's nothing else going on here too before simply increasing the timeout. This commit enables some relevant `DEBUG` loggers and also captures stack traces from all threads rather than just the three hottest ones.	2020-02-18 13:02:33 +00:00
Rory Hunter	4fb399396c	Fix mixed threadlocal and supplied random use (#52459 ) Closes #52450. `setRandomIndexSettings(...)` in `ESIntegTestCase` was using both a thread-local and a supplied source of randomness. Fix this method to only use the supplied source.	2020-02-18 10:45:23 +00:00
Nhat Nguyen	bdb2e72ea4	Fix timeout in testDowngradeRemoteClusterToBasic (#52322 ) - ESCCRRestTestCase#ensureYellow does not work well with assertBusy - Increases timeout to 60s Closes #52036	2020-02-17 15:05:42 -05:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Julie Tibshirani	0d7165a40b	Standardize naming of fetch subphases. (#52171 ) This commit makes the names of fetch subphases more consistent: * Now the names end in just 'Phase', whereas before some ended in 'FetchSubPhase'. This matches the query subphases like AggregationPhase. * Some names include 'fetch' like FetchScorePhase to avoid ambiguity about what they do.	2020-02-13 13:00:46 -08:00
Nik Everett	2dac36de4d	HLRC support for string_stats (#52163 ) (#52297 ) This adds a builder and parsed results for the `string_stats` aggregation directly to the high level rest client. Without this the HLRC can't access the `string_stats` API without the elastic licensed `analytics` module. While I'm in there this adds a few of our usual unit tests and modernizes the parsing.	2020-02-12 19:25:05 -05:00
Marios Trivyzas	dac720d7a1	Add a cluster setting to disallow expensive queries (#51385 ) (#52279 ) Add a new cluster setting `search.allow_expensive_queries` which by default is `true`. If set to `false`, certain queries that have usually slow performance cannot be executed and an error message is returned. - Queries that need to do linear scans to identify matches: - Script queries - Queries that have a high up-front cost: - Fuzzy queries - Regexp queries - Prefix queries (without index_prefixes enabled - Wildcard queries - Range queries on text and keyword fields - Joining queries - HasParent queries - HasChild queries - ParentId queries - Nested queries - Queries on deprecated 6.x geo shapes (using PrefixTree implementation) - Queries that may have a high per-document cost: - Script score queries - Percolate queries Closes: #29050 (cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)	2020-02-12 22:56:14 +01:00
Gordon Brown	d48ce12920	Convert ILM and SLM histories into hidden indices (#51456 ) Modifies SLM's and ILM's history indices to be hidden indices for added protection against accidental querying and deletion, and improves IndexTemplateRegistry to handle upgrading index templates. Also modifies the REST test cleanup to delete hidden indices.	2020-02-11 14:18:55 -07:00
David Turner	00b9098250	Ignore timeouts with single-node discovery (#52159 ) Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely if joining a faulty master that is too slow to respond, and `cluster.publish.timeout` to allow a faulty master to detect that it is unable to publish its cluster state updates in a timely fashion. If these timeouts occur then the node restarts the discovery process in an attempt to find a healthier master. In the special case of `discovery.type: single-node` there is no point in looking for another healthier master since the single node in the cluster is all we've got. This commit suppresses these timeouts and instead lets the node wait for joins and publications to succeed no matter how long this might take.	2020-02-11 14:15:01 +00:00
Armin Braun	90eb6a020d	Remove Redundant Loading of RepositoryData during Restore (#51977 ) (#52108 ) We can just put the `IndexId` instead of just the index name into the recovery soruce and save one load of `RepositoryData` on each shard restore that way.	2020-02-09 21:44:18 +01:00
Julie Tibshirani	337d73a7c6	Rename MapperService#fullName to fieldType. The new name more accurately describes what the method returns.	2020-02-07 10:35:53 -08:00
Martijn Laarman	f626700757	Introduces validation to restrict API's to single namespace (#51982 ) To be merged after #51981 lands (cherry picked from commit 196eaf8ac1ccc777f5bd81469d1148bc7ec299dd)	2020-02-06 17:17:03 +01:00
Armin Braun	3be70f64d8	Fix GCS Mock Http Handler JDK Bug (#51933 ) (#51941 ) There is an open JDK bug that is causing an assertion in the JDK's http server to trip if we don't drain the request body before sending response headers. See https://bugs.openjdk.java.net/browse/JDK-8180754 Working around this issue here by always draining the request at the beginning of the handler. Fixes #51446	2020-02-05 15:37:06 +01:00
Maria Ralli	8d3e73b3a0	Add host address to BindTransportException message (#51269 ) When bind fails, show the host address in addition to the port. This helps debugging cases with wrong "network.host" values. Closes #48001	2020-02-04 17:13:19 +00:00
James Rodewig	4ea7297e1e	[DOCS] Change http://elastic.co -> https (#48479 ) (#51812 ) Co-authored-by: Jonathan Budzenski <jon@budzenski.me>	2020-02-03 09:50:11 -05:00
Ryan Ernst	21224caeaf	Remove comparison to true for booleans (#51723 ) While we use `== false` as a more visible form of boolean negation (instead of `!`), the true case is implied and the true value does not need to explicitly checked. This commit converts cases that have slipped into the code checking for `== true`.	2020-01-31 16:35:43 -08:00
Adrien Grand	915a931e93	Bucket aggregation circuit breaker optimization. (#46751 ) (#51730 ) Co-authored-by: Howard <danielhuang@tencent.com>	2020-01-31 11:30:51 +01:00
Ioannis Kakavas	72c5bc9630	Disable diagnose trust mananer only when needed (#51683 ) Do not try to set xpack.ssl.security.diagnose.trust when it is not registered	2020-01-30 21:35:03 +02:00
Armin Braun	7914c1a734	Optimize GCS Mock (#51593 ) (#51594 ) This test was still very GC heavy in Java 8 runs in particular which seems to slow down request processing to the point of timeouts in some runs. This PR completely removes the large number of O(MB) `byte[]` allocations that were happening in the mock http handler which cuts the allocation rate by about a factor of 5 in my local testing for the GC heavy `testSnapshotWithLargeSegmentFiles` run. Closes #51446 Closes #50754	2020-01-29 11:06:05 +01:00
Jim Ferenczi	77f4aafaa2	Expose the logic to cancel task when the rest channel is closed (#51423 ) This commit moves the logic that cancels search requests when the rest channel is closed to a generic client that can be used by other APIs. This will be useful for any rest action that wants to cancel the execution of a task if the underlying rest channel is closed by the client before completion. Relates #49931 Relates #50990 Relates #50990	2020-01-28 22:55:42 +01:00
Armin Braun	aae93a7578	Allow Repository Plugins to Filter Metadata on Create (#51472 ) (#51542 ) * Allow Repository Plugins to Filter Metadata on Create Add a hook that allows repository plugins to filter the repository metadata before it gets written to the cluster state.	2020-01-28 18:33:26 +01:00
Yannick Welsch	fa212fe60b	Stricter checks of setup and teardown in docs tests (#51430 ) Adds extra checks due to 7.x backport	2020-01-28 16:52:23 +01:00

... 2 3 4 5 6 ...

2675 Commits