OpenSearch

Commit Graph

Author	SHA1	Message	Date
Dan Hermann	cd584d49dc	Bump version after 7.9.2 release	2020-09-24 10:48:57 -05:00
Martijn van Groningen	8ca33feffd	Fail with correct error if first backing index exists when auto creating data stream (#62862 ) Backport #62825 to 7.x branch. Today if a data stream is auto created, but an index with the same name as the first backing index already exists, then internally that error is ignored, with the result that later in the execution of a bulk request, the bulk item fails because the data stream hasn't been auto created. This situation can only occur if an index with the same name as the future backing index of a data stream is created prior to the creation of the data stream. Co-authored-by: Dan Hermann <danhermann@users.noreply.github.com>	2020-09-24 17:16:34 +02:00
Nik Everett	ce24115ba3	Speed up date_histogram by precomputing ranges (backport of #61467 ) (#62880 ) A few of us were talking about ways to speed up the `date_histogram` using the index for the timestamp rather than the doc values. To do that we'd have to pre-compute all of the "round down" points in the index. It turns out that just precomputing those values speeds up rounding fairly significantly: ``` Benchmark (count) (interval) (range) (zone) Mode Cnt Score Error Units before 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 96461080.982 ± 616373.011 ns/op before 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 130598950.850 ± 1249189.867 ns/op after 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 52311775.080 ± 107171.092 ns/op after 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 54800134.968 ± 373844.796 ns/op ``` That's a 46% speed up when there isn't a time zone and a 58% speed up when there is. This doesn't work for every time zone, specifically those that have two midnights in a single day due to daylight savings time will produce wonky results. So they don't get the optimization. Second, this requires a few expensive computations up front to make the transition array. And if the transition array is too large then we give up and use the original mechanism, throwing away all of the work we did to build the array. This seems appropriate for most usages of `round`, but this change uses it for all usages of `round`. That seems ok for now, but it might be worth investigating in a follow up. I ran a macrobenchmark as well which showed an 11% performance improvement. BUT the benchmark wasn't tuned for my desktop so it overwhelmed it and might have produced "funny" results. 
I think it is pretty clear that this is an improvement, but know the measurement is weird: ``` Benchmark (count) (interval) (range) (zone) Mode Cnt Score Error Units before 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 96461080.982 ± 616373.011 ns/op before 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 130598950.850 ± 1249189.867 ns/op after 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 52311775.080 ± 107171.092 ns/op after 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 54800134.968 ± 373844.796 ns/op Before: \| Min Throughput \| hourly_agg \| 0.11 \| ops/s \| \| Median Throughput \| hourly_agg \| 0.11 \| ops/s \| \| Max Throughput \| hourly_agg \| 0.11 \| ops/s \| \| 50th percentile latency \| hourly_agg \| 650623 \| ms \| \| 90th percentile latency \| hourly_agg \| 821478 \| ms \| \| 99th percentile latency \| hourly_agg \| 859780 \| ms \| \| 100th percentile latency \| hourly_agg \| 864030 \| ms \| \| 50th percentile service time \| hourly_agg \| 9268.71 \| ms \| \| 90th percentile service time \| hourly_agg \| 9380 \| ms \| \| 99th percentile service time \| hourly_agg \| 9626.88 \| ms \| \|100th percentile service time \| hourly_agg \| 9884.27 \| ms \| \| error rate \| hourly_agg \| 0 \| % \| After: \| Min Throughput \| hourly_agg \| 0.12 \| ops/s \| \| Median Throughput \| hourly_agg \| 0.12 \| ops/s \| \| Max Throughput \| hourly_agg \| 0.12 \| ops/s \| \| 50th percentile latency \| hourly_agg \| 519254 \| ms \| \| 90th percentile latency \| hourly_agg \| 653099 \| ms \| \| 99th percentile latency \| hourly_agg \| 683276 \| ms \| \| 100th percentile latency \| hourly_agg \| 686611 \| ms \| \| 50th percentile service time \| hourly_agg \| 8371.41 \| ms \| \| 90th percentile service time \| hourly_agg \| 8407.02 \| ms \| \| 99th percentile service time \| hourly_agg \| 8536.64 \| ms \| \|100th percentile service time \| hourly_agg \| 8538.54 \| ms \| \| error rate \| hourly_agg \| 0 \| % 
\| ```	2020-09-24 11:03:47 -04:00
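The idea in the commit above, precomputing every "round down" point for the queried range and then looking timestamps up instead of redoing calendar arithmetic, can be sketched roughly as follows. This is a minimal illustration in Python, not the Elasticsearch implementation: the function names `month_starts` and `round_down` are hypothetical, only UTC calendar months are handled, and the real code additionally gives up when the transition array would grow too large.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def month_starts(min_ts, max_ts):
    """Precompute the UTC month boundaries covering [min_ts, max_ts]."""
    year, month = min_ts.year, min_ts.month
    starts = []
    while True:
        start = datetime(year, month, 1, tzinfo=timezone.utc)
        if start > max_ts:
            break
        starts.append(start)
        # Step to the first day of the next month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return starts


def round_down(starts, ts):
    """Return the latest precomputed boundary <= ts.

    Assumes min_ts <= ts <= max_ts for the range used to build `starts`.
    """
    return starts[bisect_right(starts, ts) - 1]
```

Each lookup is a binary search over a short sorted list, which is where the saving over repeated calendar arithmetic would come from in this sketch.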
Daniel Mitterdorfer	00ce1d7e4b	Mute failing test in IndexRecoveryIT (#62865 ) (#62868 ) Relates #62863	2020-09-24 15:16:40 +02:00
Daniel Mitterdorfer	aec7c65af4	Mute DiskThresholdDeciderIT (#62858 ) (#62859 ) Relates #62326	2020-09-24 13:24:11 +02:00
Julie Tibshirani	f971146de4	Rename FieldValueRetriever -> FieldFetcher. (#62795 ) (#62836 ) The name `FieldFetcher` fits better with the 'fetch' terminology we use elsewhere, for example `FetchFieldsPhase` and `ValueFetcher`. This PR also moves the construction of the fetcher off the context and onto `FetchFieldsPhase`, which feels like a more natural place for it, and fixes a TODO in javadocs.	2020-09-23 10:12:23 -07:00
Nhat Nguyen	38c8a55df8	Better UUID for reader context (#62799 ) We can use a single and stronger UUID for all reader contexts created by the same SearchService. Backport of #62715	2020-09-23 12:50:18 -04:00
Julie Tibshirani	7ba0c95191	Mute ClusterHealthIT.testHealthOnMasterFailover while we await a fix.	2020-09-23 09:17:45 -07:00
Alan Woodward	7984e4e89f	Fix test bug in SpanMultiTermQueryBuilderTests (#62833 ) This test checks to see if the index has been created before version 6.4, in which case index prefixes are unavailable and so it expects to see a span multi-term wrapper. However, the production code doesn't bother with checking for versions, because if the field in question is configured with index_prefixes then it knows that it must have been created post 6.4 (you can't merge in a new index_prefixes configuration). This commit alters the test to remove the random version checks, as we know we will always have a prefix field available in this scenario. Fixes #58199	2020-09-23 17:02:12 +01:00
Martijn van Groningen	0baefc8ddc	Always validate that only a create op is allowed in bulk api for data streams (#62820 ) Backport #62766 to 7.x branch. The bulk api caches the resolved concrete indices when resolving the user provided index name into the actual index name. The validation that prevents write ops other than create from being executed in a data stream was only performed if the result wasn't cached. In case of cached resolutions, the validation never occurs. The validation would be skipped for all bulk items for a data stream after a create operation for that same data stream. This commit ensures that the validation is always performed for all bulk items (whether the concrete index resolution has been cached or not cached). Closes #62762	2020-09-23 16:27:54 +02:00
Armin Braun	a754fd8020	Fix CoordinatorTests.testLogsMessagesIfPublicationDelayed (#62815 ) (#62822 ) We need to account for an additional `DEFAULT_DELAY_VARIABILITY` timeout for the lag detector task to be executed after it's scheduled. Closes #62383	2020-09-23 14:23:28 +02:00
Christoph Büscher	29074e7055	Add case insensitive prefix and wildcard to 'version' field (#62754 ) (#62782 ) This change adds support for the recently introduced case insensitivity flag for wildcard and prefix queries. Since version field values are encoded differently we need to adapt our own AutomatonQuery variation to add both cases if case insensitivity is turned on.	2020-09-23 11:48:34 +02:00
Ignacio Vera	81645ec2cc	nextSetBit should check if the underlying array contains the current word (#62805 ) (#62812 ) This is a recent addition and it is missing a check as the underlying array can be smaller than the numBits capacity.	2020-09-23 11:17:26 +02:00
Luca Cavanna	862fab06d3	Share same existsQuery impl throughout mappers (#57607 ) Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers. There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available. This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method. At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.	2020-09-23 11:00:53 +02:00
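The fallback order described in the commit above (doc_values first, then norms, then a term query on the `_field_names` meta field) can be sketched as a single decision function. This is an illustrative Python sketch only: `exists_query` and the tuples it returns are hypothetical stand-ins for the Lucene queries the real `MappedFieldType` builds.

```python
def exists_query(field, has_doc_values, has_norms):
    """Pick the cheapest available data structure to answer 'does this field exist?'."""
    if has_doc_values:
        # Doc values record which documents have a value for the field.
        return ("doc_values_exists", field)
    if has_norms:
        # Norms are only written for documents that contain the field.
        return ("norms_exists", field)
    # Last resort: look the field name up in the _field_names meta field.
    return ("term", "_field_names", field)
```

With one shared implementation like this, only field types that need different behaviour would have to override it.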
Luca Cavanna	5ca86d541c	Move stored flag from TextSearchInfo to MappedFieldType (#62717 ) (#62770 )	2020-09-23 09:40:34 +02:00
Nhat Nguyen	663b85b98f	Make keep alive optional in PointInTimeBuilder (#62720 ) Remove the keepAlive parameter from the constructor of PointInTimeBuilder as it's optional.	2020-09-22 18:52:54 -04:00
Jay Modi	cb1dc5260f	Dedicated threadpool for system index writes (#62792 ) This commit adds a dedicated threadpool for system index write operations. The dedicated resources for system index writes serves as a means to ensure that user activity does not block important system operations from occurring such as the management of users and roles. Backport of #61655	2020-09-22 15:31:38 -06:00
Benjamin Trent	77bfb32635	[7.x] [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694 ) (#62784 ) * [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694) * [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls global parameters, outside of the global index, are ignored for internal callers in certain cases. If the internal caller is adding requests via the following methods: ``` - BulkRequest#add(IndexRequest) - BulkRequest#add(UpdateRequest) - BulkRequest#add(DocWriteRequest) - BulkRequest#add(DocWriteRequest[]) ``` It is better to specifically set the desired parameters on the requests before they are added to the bulk request object. This commit addresses this issue for the ML plugin * unmuting test	2020-09-22 15:07:08 -04:00
Rory Hunter	3f856d1c81	Prioritise recovery of system index shards (#62640 ) Closes #61660. When ordering shards for recovery, ensure system index shards are ordered first so that their recovery will be started first. Note that I rewrote PriorityComparatorTests to use IndexMetadata instead of its local IndexMeta POJO.	2020-09-22 15:48:27 +01:00
markharwood	a0df0fb074	Search - add case insensitive flag for "term" family of queries #61596 (#62661 ) Backport of fe9145f Closes #61546	2020-09-22 13:56:51 +01:00
Armin Braun	0d5250c99b	Add Trace Logging to File Restore (#62755 ) (#62761 ) Requested by the performance team and generally potentially useful to log each file at `TRACE` like we do for snapshot create.	2020-09-22 14:44:40 +02:00
Amogh Mishra	bc6bea5924	Remove node from cluster when node locks broken (#61400 ) In #52680 we introduced a mechanism that will allow nodes to remove themselves from the cluster if they locally determine themselves to be unhealthy. The only check today is that their data paths are all empirically writeable. This commit extends this check to consider a failure of `NodeEnvironment#assertEnvIsLocked()` to be an indication of unhealthiness. Closes #58373	2020-09-22 10:08:41 +01:00
Armin Braun	aa0dc56412	Ensure MockRepository is Unblocked on Node Close (#62711 ) (#62748 ) `RepositoriesService#doClose` was never called which led to mock repositories not unblocking until the `ThreadPool` interrupts all threads. Thus stopping a node that is blocked on a mock repository operation wastes `10s` in each test that does it (which is quite a few as it turns out).	2020-09-22 11:00:18 +02:00
Armin Braun	4bdbc39e9f	Fix testQueuedSnapshotOperationsAndBrokenRepoOnMasterFailOverMultiple (#62713 ) (#62747 ) There's possible retries here that work out if both the snapshot and the delete operation are retried when master shuts down and hits the unlikely case of the retried delete executing before the retried snapshot, making both operations pass. Closes #62686	2020-09-22 10:42:11 +02:00
Luca Cavanna	9ae29713fd	Dense vector field type minor fixes (#62631 ) The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x). This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.	2020-09-22 10:40:51 +02:00
Ignacio Vera	265387f348	override needsScore() on ValueCountAggregator (#62683 ) (#62745 )	2020-09-22 08:47:16 +02:00
Yang Wang	897d2e8a02	Fix ccs permission for search with a scroll id (#62053 ) (#62695 ) CCS with remote indices only does not require any privileges on the local cluster. This PR ensures that search with scroll follows the permission model.	2020-09-22 11:49:40 +10:00
Jim Ferenczi	1fc78d430b	Fix terms aggregation ordering after the final reduce (#62732 ) This commit ensures that the final order of the terms aggregations is registered correctly after the final reduce. This bug was introduced in #62028 which is not released yet so this PR is marked as a non-issue. This issue was discovered when running a terms aggregation under an auto-date histogram. In such a case, the auto-date histogram may run multiple final reduces to merge buckets together. This change makes sure that running multiple final reduces doesn't create duplicates but it doesn't fix the fact that the final reduce may prune the list of terms prematurely. This other bug is tracked separately in #62731.	2020-09-22 00:03:04 +02:00
Nhat Nguyen	f9f4d87437	Remove invalid assertion in SearchService (#62675 ) This assertion does not always hold because there can be a race between `putReaderContext` and `afterIndexRemoved` when an index is deleted. Closes #62624	2020-09-21 16:29:00 -04:00
Ignacio Vera	cadd5dc53f	Fix bug when initializing HyperLogLogPlusPlusSparse (#62602 ) (#62702 ) This is a follow up of #62480 where we are oversizing one array when initialising. In addition it prevents a possible CircuitBreaker leak during initialisation.	2020-09-21 17:30:40 +02:00
Armin Braun	13e28b85ff	Speed up RepositoryData Serialization (#62684 ) (#62703 ) Make serializing `RepositoryData` a little faster and split up/document the code for it a little as well given how massive this method has gotten at this point.	2020-09-21 17:29:56 +02:00
Dan Hermann	a06339ffae	Fix NPE when deleting multiple backing indices on a data stream (#62274 ) (#62708 )	2020-09-21 10:26:47 -05:00
Alan Woodward	1dde4983f6	Convert ConstantKeywordFieldMapper to parametrized form (#62688 ) As part of the conversion, adds the ability to customize merge validation - in this case, we allow an update to the constant value if it is currently set to null, but refuse further updates once it has been set once. This commit also converts ParametrizedMapperTests to use MapperServiceTestCase.	2020-09-21 15:22:56 +01:00
Henning Andersen	0c4cfe4c44	Cardinality request breaker leak (#62685 ) If HyperLogLogPlusPlus failed during construction, it would not release already allocated resources, causing the request circuit breaker to not be adjusted down. Closes #62439	2020-09-21 15:54:04 +02:00
Christoph Büscher	803f78ef05	Add field type for version strings (#59773 ) (#62692 ) This PR adds a new 'version' field type that allows indexing string values representing software versions similar to the ones defined in the Semantic Versioning definition (semver.org). The field behaves very similarly to a 'keyword' field but allows efficient sorting and range queries that take into account the special ordering needed for version strings. For example, the main version parts are sorted numerically (ie 2.0.0 < 11.0.0) whereas this wouldn't be possible with 'keyword' fields today. Valid version values are similar to the Semantic Versioning definition, with the notable exception that in addition to the "main" version consisting of major.minor.patch, we allow fewer or more than three numeric identifiers, i.e. "1.2" or "1.4.6.123.12" are treated as valid too. Relates to #48878	2020-09-21 14:25:42 +02:00
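The ordering difference mentioned in the commit above (2.0.0 < 11.0.0 numerically, but not when compared as keyword strings) is easy to reproduce. The following Python sketch is only an illustration of the ordering problem, not of how the 'version' field encodes values: `version_key` is a hypothetical helper and it handles purely numeric dot-separated identifiers, ignoring pre-release or build suffixes.

```python
def version_key(version):
    """Split a dotted version into integers so parts compare numerically."""
    return tuple(int(part) for part in version.split("."))


versions = ["11.0.0", "2.0.0", "1.4.6.123.12", "1.2"]

# Keyword-style ordering compares character by character, so "11" sorts before "2".
lexical = sorted(versions)

# Version-aware ordering compares each numeric identifier as a number.
numeric = sorted(versions, key=version_key)
```

Here `lexical` places "11.0.0" ahead of "2.0.0", while `numeric` places "2.0.0" first; values with fewer or more than three identifiers sort without special handling because tuples of different lengths compare element by element.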
Alan Woodward	178b25fc4b	Fix standard filter BWC check to allow for caching bug (#62649 ) The `standard` tokenfilter was removed by #33310, and should have been unusable in any indexes created since 7.0. However, a caching bug fixed by #51092 meant that it was still possible in certain circumstances to create indexes referencing the standard filter in versions up to 7.5.2. Our checks in AnalysisModule still refer to 7.0.0, however, meaning that a cluster that contains one of these rogue indexes cannot be upgraded. This commit adjusts the AnalysisModule checks so that we only refuse to build a mapping referring to standard filter if the index created version is 7.6 or later. Fixes #62644	2020-09-21 10:12:55 +01:00
Henning Andersen	9a77f41e55	Fix cluster health when closing (#61709 ) When master shuts down its cluster service, a waiting health request would fail rather than fail over to a new master.	2020-09-19 10:02:36 +02:00
Luca Cavanna	00272ea877	Remove cache key renderer argument from IndicesRequestCache (#62534 ) In the context of a recurring test failure tracked by #32827, we added trace logging and an extra cache key renderer argument to IndicesRequestCache#getOrCompute (see #39475 and #34180). We addressed the issue with #54071, but the extra argument was left behind, with a NORELEASE comment saying it should be removed. With this commit, we remove the extra cache key renderer argument and the corresponding log lines which are not so useful without it. Closes #55837	2020-09-19 00:24:02 +02:00
Lee Hinman	4a08928c47	[7.x] Add index.routing.allocation.include._tier_preference setting (#62589 ) (#62667 ) This commit adds the `index.routing.allocation.include._tier_preference` setting to the `DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a preference-based list of tiers for an index to be assigned to. For example, if the setting were set to: ``` "index.routing.allocation.include._tier_preference": "data_hot,data_warm,data_content" ``` If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index to be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and `data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes. This allows us to specify an index's preference for tier(s) without causing the index to be unassigned if no nodes of a preferred tier are available. Subsequent work will change the ILM migration to make additional use of this setting. Relates to #60848	2020-09-18 15:41:36 -06:00
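The preference resolution described in the commit above amounts to "take the first tier in the list that has at least one node". A minimal Python sketch, assuming the cluster is summarised as a set of data roles present on its nodes; `preferred_tier` is a hypothetical name, and returning `None` when no listed tier is available is an assumption made for this illustration rather than the decider's documented behaviour:

```python
def preferred_tier(preference, roles_in_cluster):
    """Return the first tier from the comma-separated preference that has nodes."""
    for tier in preference.split(","):
        if tier in roles_in_cluster:
            return tier
    # Assumption for this sketch: no listed tier is present in the cluster.
    return None
```

For the example value `data_hot,data_warm,data_content`, a cluster with only `data_warm` and `data_content` nodes resolves to `data_warm`, matching the behaviour the commit message describes.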
Christos Soulios	6a298970fd	[7.x] Allow metadata fields in the _source (#62616 ) Backports #61590 to 7.x So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set This PR adds a method to the metadata field parsers that exposes if the field can be included in the document source or not. This way each metadata field can configure if it can be included in the document _source	2020-09-18 19:56:41 +03:00
Alan Woodward	17aabaed15	Fix warning on boost docs and warning message on non-implementing fieldmappers	2020-09-18 16:45:08 +01:00
Alan Woodward	43ace5f80d	Emit deprecation warnings when boosts are defined in mappings (#62623 ) We removed index-time boosting back in 5.x, and we no longer document the 'boost' parameter on any of our mapping types. However, it is still possible to define an index-time boost on a field mapper for a surprisingly large number of field types, and they even have an effect (sometimes, on some queries). As a first step in finally removing all traces of index time boosting, this commit emits a deprecation warning whenever a boost parameter is found on a mapping definition.	2020-09-18 15:40:53 +01:00
Igor Motov	260c11d89e	Add an additional cancellation check to the fetch phase (#62577 ) (#62587 ) In #62357 we introduced an additional optimization that allows us to skip most of the fetch phase early if no results are found. This change caused some cancellation test failures that were relying on definitive cancellation during the fetch phase. This commit adds an additional quick cancellation check at the very beginning of the fetch phase to make the cancellation process more deterministic. Fixes #62530	2020-09-18 10:00:36 -04:00
Ignacio Vera	18a52f7477	Use BitArray instead of FixedBitSet for collecting ordinals in Cardinality Aggregator (#62600 ) (#62619 ) Changes the way we collect ordinals in the Cardinality aggregation from Lucene FixedBitSet to BitArray. The benefit is that BitArray is tracked by our circuit breakers so it is safer.	2020-09-18 14:16:31 +02:00
Tanguy Leroux	9f5e95505b	Also abort ongoing file restores when snapshot restore is aborted (#62441 ) (#62607 ) Today when a snapshot restore is aborted (for example when the index is explicitly deleted) while the restoration of the files from the repository has already started the file restores are not interrupted. It means that Elasticsearch will continue to read the files from the repository and will continue to write them to disk until all files are restored; the store will then be closed and files will be deleted from disk at some point but this can take a while. This will also take some slots in the SNAPSHOT thread pool. The Recovery API won't show any files actively being recovered, the only notable indicator would be the active threads in the SNAPSHOT thread pool. This commit adds a check before reading a file to restore and before writing bytes on disk so that a closing store can be detected more quickly and the file recovery process aborted. This way the file restores just stop and for most of the repository implementations it means that no more bytes are read (see #62370 for S3), finishing threads in the SNAPSHOT thread pool more quickly too.	2020-09-18 14:04:58 +02:00
Armin Braun	73d19271a9	Fix Races in testQueuedSnapshotOperationsAndBrokenRepoOnMasterFailOverMultipleRepos (#62431 ) (#62614 ) This test (in part) verifies that snapshot creation is not retried on master fail-over once a snapshot has been started already. Unless we wait for the snapshot creation to show up in the cluster state before failing the master node though, we could run into a race where the snapshot wasn't yet in the cluster state and a retry goes through successfully.	2020-09-18 12:20:23 +02:00
Przemyslaw Gomulka	d87268a264	Round up parsers should be based on a list of parsers backport(#62290 ) (#62604 ) A DateFormatter can be created with a list of parsers, which are iterated during parsing; the first one that succeeds returns the parsed date. DateMathParser should do the same: when created from a list of non-rounding parsers it should also iterate over all of them. At the moment it only takes the first element. Closes #62207	2020-09-18 12:03:20 +02:00
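The "try every parser in order, first success wins" behaviour that the commit above asks of DateMathParser can be sketched in a few lines. This Python illustration uses `datetime.strptime` format strings as stand-ins for the parsers; `parse_with_any` is a hypothetical name and none of this mirrors the actual Java classes.

```python
from datetime import datetime


def parse_with_any(text, formats):
    """Try each format in order and return the first successful parse."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format does not match; fall through to the next one.
            continue
    raise ValueError(f"no format in {formats} matches {text!r}")
```

The bug described above corresponds to only ever trying `formats[0]`: an input that matches the second format would then be rejected even though a suitable parser was configured.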
Adrien Grand	4de8579455	Upgrade to lucene-8.7.0-snapshot-830bd186a8d. (#62596 )	2020-09-18 09:51:34 +02:00
David Turner	06d5d360f9	Tidy up fillInStackTrace implementations (#62555 ) Removes the unnecessary `synchronized` introduced in #62433 and adjusts the others to return `this` not `null` as required by the parent method's Javadocs.	2020-09-18 08:29:48 +01:00
Ignacio Vera	6a3d731be1	Only call reduce on a single InternalAggregation when needed (#62525 ) (#62594 ) Adds a new abstract method in InternalAggregation that flags the framework if it needs to reduce on a single InternalAggregation.	2020-09-18 08:43:58 +02:00
Nhat Nguyen	0127b71901	Adjust keep alive assertion in ShardSearchRequest (#62582 ) Relates #62184	2020-09-17 16:09:54 -04:00
Lee Hinman	9bb7ce0b22	[7.x] Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338 ) (#62557 ) Backports the following commits to 7.x: Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338)	2020-09-17 13:29:23 -06:00
Martijn van Groningen	5f643433c6	Prohibit the usage of create index api in namespaces managed by data stream templates (#62574 ) Backport of #62527 to 7.x branch. This commit adds validation that prohibits the creation of regular indices in the namespace of templates with data streams enabled. It shouldn't be possible to create ordinary indices when the name of the index matches with a composable index template that enables data streams. Auto creation has logic that creates data streams instead of regular indices. However validation logic for the create index api was missing.	2020-09-17 20:10:42 +02:00
Jim Ferenczi	df93b31b15	Faster sequential access for stored fields (#62509 ) (#62573 ) Faster sequential access for stored fields Spinoff of #61806 Today retrieving stored fields at search time is optimized for random access. So we make no effort to keep state in order to not decompress the same data multiple times because two documents might be in the same compressed block. This strategy is acceptable when retrieving a top N sorted by score since there is no guarantee that documents will be on the same block. However, we have some use cases where the document to retrieve might be completely sequential: Scrolls or normal search sorted by document id. Queries on Runtime fields that extract from _source. This commit exposes a sequential stored fields reader in the custom leaf reader that we use at search time. That allows to leverage the merge instances of stored fields readers that are optimized for sequential access. This change focuses on the fetch phase for now and leverages the merge instances for stored fields only if all documents to retrieve are adjacent. Applying the same logic in the source lookup of runtime fields should be trivial but will be done in a follow up. The speedup on queries sorted by doc id is significant. 
I played with the scroll task of the http_logs rally track on my laptop and had the following result: \| Metric \| Task \| Baseline \| Contender \| Diff \| Unit \| \|--------------------------------------------------------------:\|-------:\|------------:\|------------:\|---------:\|--------:\| \| Total Young Gen GC \| \| 0.199 \| 0.231 \| 0.032 \| s \| \| Total Old Gen GC \| \| 0 \| 0 \| 0 \| s \| \| Store size \| \| 17.9704 \| 17.9704 \| 0 \| GB \| \| Translog size \| \| 2.04891e-06 \| 2.04891e-06 \| 0 \| GB \| \| Heap used for segments \| \| 0.820332 \| 0.820332 \| 0 \| MB \| \| Heap used for doc values \| \| 0.113979 \| 0.113979 \| 0 \| MB \| \| Heap used for terms \| \| 0.37973 \| 0.37973 \| 0 \| MB \| \| Heap used for norms \| \| 0.03302 \| 0.03302 \| 0 \| MB \| \| Heap used for points \| \| 0 \| 0 \| 0 \| MB \| \| Heap used for stored fields \| \| 0.293602 \| 0.293602 \| 0 \| MB \| \| Segment count \| \| 541 \| 541 \| 0 \| \| \| Min Throughput \| scroll \| 12.7872 \| 12.8747 \| 0.08758 \| pages/s \| \| Median Throughput \| scroll \| 12.9679 \| 13.0556 \| 0.08776 \| pages/s \| \| Max Throughput \| scroll \| 13.4001 \| 13.5705 \| 0.17046 \| pages/s \| \| 50th percentile latency \| scroll \| 524.966 \| 251.396 \| -273.57 \| ms \| \| 90th percentile latency \| scroll \| 577.593 \| 271.066 \| -306.527 \| ms \| \| 100th percentile latency \| scroll \| 664.73 \| 272.734 \| -391.997 \| ms \| \| 50th percentile service time \| scroll \| 522.387 \| 248.776 \| -273.612 \| ms \| \| 90th percentile service time \| scroll \| 573.118 \| 267.79 \| -305.328 \| ms \| \| 100th percentile service time \| scroll \| 660.642 \| 268.963 \| -391.678 \| ms \| \| error rate \| scroll \| 0 \| 0 \| 0 \| % \| Closes #62024	2020-09-17 19:58:18 +02:00
Alan Woodward	5421a743a7	Move SearchLookup into FetchContext (#62549 ) FetchSubPhase#getProcessor currently takes a SearchLookup parameter. This however is only needed by a couple of subphases, and will almost certainly change in future as we want to simplify how fetch phases retrieve values for individual hits. To future-proof against further signature changes, this commit moves the SearchLookup reference into FetchContext instead.	2020-09-17 17:39:02 +01:00
Alan Woodward	e3e3aef3d8	Load version metadata even when stored fields are disabled (#62533 ) Currently we throw an error if stored fields are disabled, but hit version metadata is requested on a search. This doesn't make much sense, as the version information is stored in docvalues and so has no connection with stored fields. This commit removes the link between the two, allowing version metadata to be loaded even when stored fields are disabled in a request. Fixes #62456	2020-09-17 17:39:02 +01:00
Alan Woodward	91e2330529	Warn on badly-formed null values for date and IP field mappers (#62487 ) In #57666 we changed when null_value was parsed for ip and date fields. Previously, the null value was stored as a string, and parsed into a date or InetAddress whenever a document containing a null value was encountered. Now, the values are parsed when the mappings are built, which means that bad values are detected up front; if you try and add a mapping with a badly-parsed ip or date for a null_value, the mapping will be rejected. This causes problems for upgrades in the case when you have a badly-formed null_value in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only logging a warning on the badly formed value, replicating the earlier behaviour. Fixes #62363	2020-09-17 16:38:08 +01:00
Ignacio Vera	901000891a	Fix test error in InternalCardinalityTests#testEqualsAndHashcode (#62542 ) (#62554 ) Make sure the new HLL++ is different to the original one	2020-09-17 17:09:13 +02:00
Alan Woodward	63afc61b08	Introduce FetchContext (#62357 ) We currently pass a SearchContext around to share configuration among FetchSubPhases. With the introduction of runtime fields, it would be useful to start storing some state on this context to be shared between different subphases (for example, stored fields or search lookups can be loaded lazily but referred to by many different subphases). However, SearchContext is a very large and unwieldy class, and adding more methods or state here feels like a bridge too far. This commit introduces a new FetchContext class that exposes only those methods on SearchContext that are required for fetch phases. This reduces the API surface area for fetch phases considerably, and should give us some leeway to add further state.	2020-09-17 09:57:43 +01:00
Adrien Grand	e0a4a94985	Speed up merging when source is disabled. (#62443 ) (#62474 ) The CodecReader wrapper we use to remove the `_recovery_source` field doesn't override `StoredFieldsReader#getMergeInstance`, which has the undesired side-effect of preventing the wrapped stored fields reader from optimizing merging.	2020-09-17 10:53:31 +02:00
David Turner	62dcc5b1ae	Suppress stack in VersionConflictEngineException (#62433 ) `VersionConflictEngineException` is thrown on the hot path for updates, but stack traces are expensive to compute and transport and rarely useful for this kind of exception. This commit avoids computing the stack trace for these exceptions.	2020-09-17 09:40:07 +01:00
Adrien Grand	9a8225bbc1	Upgrade to lucene-8.7.0-snapshot-9cd3af50f80. (#62450 ) (#62476 ) This new snapshot contains the following JIRAs that we're interested in: - [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525) Better handling of small documents. This should improve retrieval times when documents are less than ~1kB. - [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510) Faster flushes when index sorting is enabled by not compressing the temporary files that store stored fields and term vectors.	2020-09-17 10:28:20 +02:00
Armin Braun	5112c17319	Add WARN Logging on Slow Transport Message Handling (#62444 ) (#62521 ) Add simple WARN logging on slow inbound TCP messages.	2020-09-17 10:12:20 +02:00
David Turner	14aec44cd8	Log if recovery affected by disconnect (#62437 ) Today we only emit `DEBUG` logs if the source disconnects from the target during a recovery. This deserves to be noisier by default since it should be rare and may help users identify other problems with their network or with their shard movements. This commit promotes this message to `INFO`. There's no need for `WARN` since these days we will normally resume the recovery where it left off.	2020-09-17 08:22:40 +01:00
Ignacio Vera	2d3ca9c155	Introduce a sparse HyperLogLogPlusPlus class for cloning and serializing low cardinality buckets (#62480 ) (#62520 ) Reduces the memory footprint of an HLL++ structure that uses Linear counting when cloning or deserialising the data structure.	2020-09-17 08:54:50 +02:00
Julie Tibshirani	e1da558206	Remove unused test search context for significant_terms.	2020-09-16 14:27:11 -07:00
Jay Modi	5da922064f	LocalNodeMasterListener is a regular listener (#62485 ) This commit makes the LocalNodeMasterListener interface extend the ClusterStateListener interface and use a default implementation for detecting whether the local node master status changed. Backport of #62422	2020-09-16 11:42:53 -06:00
Tanguy Leroux	8a2e9e66d4	Wait for relocations and disk threshold monitor in DiskThresholdDeciderIT (#62358 ) (#62467 ) Closes #62326	2020-09-16 17:40:20 +02:00
Armin Braun	f6a8599cf8	Don't Start Redundant ConsistentSettingsService (#62283 ) (#62428 ) The consistent settings service is only used in tests so far. No need to start it unless it's actually used.	2020-09-16 09:43:04 +02:00
Ignacio Vera	f3ed641fc7	Adds bucketOrd back to cardinality algorithms (#62389 ) (#62427 )	2020-09-16 08:41:57 +02:00
Nik Everett	24a24d050a	Implement fields fetch for runtime fields (backport of #61995 ) (#62416 ) This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to #59332	2020-09-15 20:24:10 -04:00
Nik Everett	0a7f335215	Speed up writeVInt (backport of #62345 ) (#62419 ) This speeds up `StreamOutput#writeVInt` quite a bit which is nice because it is very commonly called when serializing aggregations. Well, when serializing anything. All "collections" serialize their size as a vint. Anyway, I was examining the serialization speeds of `StringTerms` and this saves about 30% of the write time for that. I expect it'll be useful in other places.	2020-09-15 17:14:08 -04:00
Nik Everett	771a8893a6	Add more debugging information for cardinality agg (#62317 ) (#62397 ) This adds two extra bits of info to the profiler: 1. Count of the number of different types of collectors. This lets us figure out if we're using the optimization for segment ordinals. It adds a few more similar counters just for good measure. 2. Profiles the `getLeafCollector` and `postCollection` methods. These are non-trivial for some aggregations, like cardinality.	2020-09-15 13:21:11 -04:00
Armin Braun	ffbc64bd10	Log WARN on Response Deserialization Failure (#62368 ) (#62388 ) We never see this exception in the logs even though it's pretty severe. All we might see is an exception about a transport message not having been read fully from the logic that follows this code. Technically we should probably bubble up the exception but that's a bigger change and needs some careful reasoning, this change for the time being at least simplifies tracking down deserialization issues in responses.	2020-09-15 18:27:39 +02:00
Adrien Grand	6db8afefc2	Upgrade to lucene-8.7.0-snapshot-cdfdc1e0851. (#62376 ) Upgrade to a new Lucene snapshot that (at least partially) addresses the indexing rate regression when index sorting is enabled. Backport of #62334.	2020-09-15 17:48:07 +02:00
Alan Woodward	f89fa421e2	Remove unnecessary IndexSearcher field on HitContext (#62378 ) FastVectorHighlighter uses the top-level reader to rewrite queries against, which it gets via an IndexSearcher field on HitContext. However, we can already access this top-level reader via HitContext's existing LeafReaderContext field. This commit removes the unnecessary field and constructor parameter, and changes the implementation of topLevelReader to go via ReaderUtils and the leaf reader context.	2020-09-15 15:46:14 +01:00
Christoph Büscher	0ca9829867	Muting CoordinatorTests#testLogsMessagesIfPublicationDelayed	2020-09-15 15:40:51 +02:00
Albert Zaharovits	aeed1c05b0	Ensure authz operation overrides transient authz headers (#61621 ) AuthorizationService#authorize uses the thread context to carry the result of the authorisation as transient headers. The listener argument to the `authorize` method must necessarily observe the header values. This PR makes it so that the authorisation transient headers (`_indices_permissions` and `_authz_info`, but NOT `_originating_action_name`) of the child action override the ones of the parent action. Co-authored-by: Tim Vernum tim@adjective.org	2020-09-15 16:37:38 +03:00
Armin Braun	eae6a3b18e	Fix testMappingVersionAfterDynamicMappingUpdate (#62352 ) (#62360 ) There is a race in this test where the index request will return once the dynamic mapping update has been observed by the cluster state observer internally used by the indexing but not hit all state appliers and thus isn't showing up as the applied state returned by `clusterService.state()` yet.	2020-09-15 11:59:22 +02:00
Alan Woodward	a68f7077c7	Rationalise fetch phase exceptions (#62230 ) We have a special FetchPhaseExecutionException which contains some useful information about which shard and doc a fetch phase has failed in. However, this is not used in many places - currently only the ExplainPhase and the highlighters throw one, and the FetchPhase itself catches IOExceptions and just passes them to the ExceptionsHelper with no extra context. This commit changes FetchPhase to throw FetchPhaseExecutionException if it encounters problems in any of its subphases, and removes the special handling from the explain and highlight phases. It also removes the need to pass shard ids around when building HitContext objects.	2020-09-15 09:28:19 +01:00
Alan Woodward	8089210815	Some small cleanups in TermVectorsService (#62292 ) We removed the use of aggregated stats from term vectors back in #16452, but there is a bunch of dead code left here which can be stripped out.	2020-09-15 09:01:49 +01:00
Ignacio Vera	3536f7f7c2	Initialize BitArray storage as number of bits (#62327 ) (#62354 )	2020-09-15 08:34:22 +02:00
Armin Braun	c81a076f5a	Improve Efficiency of ClusterApplierService Iteration (#62282 ) (#62350 ) The complexity of removing a timeout listener was `O(n)` which means that in case of many queued up CS update tasks (such as in the case of an avalanche of dynamic mapping updates) we're dealing with quadratic complexity for timing out N tasks which was observed to be an issue in practice. This PR makes the complexity of timing out a task `O(1)` and generally simplifies the iteration logic of listeners and appliers to be a little more efficient and inline better.	2020-09-15 05:59:48 +02:00
Julie Tibshirani	f56ce4f39b	Fix failure in InnerHitBuilderTests around 'fields' option. (#62344 ) The case InnerHitBuilderTests#testEqualsAndHashcode creates a copy of the object by serializing + deserializing it, then applies a modification. If the 'fields' list is empty, then deserializing it results in Collections.emptyList. Because this is immutable, then modifying it can throw an UnsupportedOperationException. This PR takes the same approach as for docvalue_fields, where we create a new list instead of trying to add to an empty one.	2020-09-14 15:39:03 -07:00
Julie Tibshirani	4a19bdb2ea	Support the 'fields' option in inner_hits and top_hits. (#62337 ) This PR adds support for the 'fields' option in the following places: * Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing * The `top_hits` aggregation Addresses #61949.	2020-09-14 11:51:45 -07:00
David Turner	9acd2fd1fd	Minor cleanups to BytesReferenceStreamInput (#62302 ) Followup to #61681: - reuse the current iterator in `reset()` if possible - simplify some integer-overflow-avoidance in `skip()` - clarify some comments - address some IntelliJ warnings	2020-09-14 17:02:27 +01:00
Christoph Büscher	e2eada2498	Fix disabling `allow_leading_wildcard` (#62300 ) (#62318 ) Disabling the `query_string` query's `allow_leading_wildcard` parameter didn't work after a change probably introduced in #60959 because the various field types' `wildcardQuery` methods don't check the leading characters like QueryParserBase#getWildcardQuery does. This PR adds the missing check before calling the field type's wildcard generating method. Closes #62267	2020-09-14 17:13:17 +02:00
Alan Woodward	5358cee29c	Cut over more mapping tests to MapperServiceTestCase (#62312 ) Shaves a few more seconds off the build.	2020-09-14 16:00:37 +01:00
Armin Braun	95766da345	Save Some Allocations when Working with ClusterState (#62060 ) (#62303 ) Just a number of obvious spots where we were allocating duplicate empty structures or otherwise inefficient that I found while investigating snapshot cluster state update performance.	2020-09-14 15:09:54 +02:00
Armin Braun	875af1c976	Remove Dead Variable in BlobStoreIndexShardSnapshots. (#62285 ) (#62295 ) This was never used. Co-authored-by: Howard <danielhuang@tencent.com>	2020-09-14 13:40:39 +02:00
Luca Cavanna	53bf057a53	[TEST] avoid double null check in TransportSearchActionTests	2020-09-11 10:10:09 +02:00
Nhat Nguyen	aafb2cb812	Support point in time cross cluster search (#61827 ) This commit integrates point in time into cross cluster search. Relates #61062 Closes #61790	2020-09-10 19:25:48 -04:00
Nhat Nguyen	808c8689ac	Always include the matching node when resolving point in time (#61658 ) If shards are relocated to new nodes, then searches with a point in time will fail, although a pit keeps search contexts open. This commit solves this problem by reducing info used by SearchShardIterator and always including the matching nodes when resolving a point in time. Closes #61627	2020-09-10 19:25:48 -04:00
Nhat Nguyen	035f0638f4	Support point in time in async_search (#61560 ) This commit integrates point in time into async search and ensures that it works correctly with security enabled. Relates #61062	2020-09-10 19:25:48 -04:00
Nhat Nguyen	063a6d047c	Release search context when scroll keep_alive is too large (#62179 ) Previously, we closed related search contexts if the keep_alive of a scroll was too large. But we accidentally changed this behavior in #62061.	2020-09-10 19:25:48 -04:00
Nhat Nguyen	2eb1e8bc84	Make keep alive of point in time optional in search (#62184 ) A search request should not be required to extend the keep_alive of a point in time. This change makes that parameter optional.	2020-09-10 19:25:48 -04:00
Jim Ferenczi	3fc35aa76e	Shard Search Scroll failures consistency (#62061 ) Today some uncaught shard failures such as RejectedExecutionException skip the release of shard context and let subsequent scroll requests access the same shard context again. Depending on how the other shards advanced, this behavior can lead to missing data since scrolls always move forward. In order to avoid hidden data loss, this commit ensures that we always release the context of shard search scroll requests whenever a failure occurs locally. The shard search context will no longer exist in subsequent scroll requests which will lead to consistent shard failures in the responses. This change also modifies the retry tests of the reindex feature. Reindex retries scroll search requests that contain a shard failure and moves on whenever the failure disappears. That is not compatible with how scrolls work and can lead to missing data as explained above. That means that reindex will now report scroll failures when search rejections happen during the operation instead of skipping documents silently. Finally this change removes an old TODO that was fulfilled with #61062.	2020-09-10 19:25:48 -04:00
Jim Ferenczi	4d528e91a1	Ensure validation of the reader context is executed first (#61831 ) This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext`) before any other operation and that it is correctly recycled or removed at the end of the operation. This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once. Relates #61446 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>	2020-09-10 19:25:48 -04:00
Luca Cavanna	44bd4a6004	Fix point in time toXContent impl (#62080 ) PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object. This commit fixes this and adds tests for it.	2020-09-10 19:25:47 -04:00
Nhat Nguyen	3d69b5c41e	Introduce point in time APIs in x-pack basic (#61062 ) This commit introduces a new API that manages point-in-times in x-pack basic. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. A search request by default executes against the most recent point in time. In some cases, it is preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time. A point in time must be opened before being used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep a point in time around. ``` POST /my_index/_pit?keep_alive=1m ``` The response from the above request includes an `id`, which should be passed to the `id` of the `pit` parameter of search requests. ``` POST /_search { "query": { "match" : { "title" : "elasticsearch" } }, "pit": { "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", "keep_alive": "1m" } } ``` Point-in-times are automatically closed when the `keep_alive` has elapsed. However, keeping point-in-times has a cost; hence, point-in-times should be closed as soon as they are no longer used in search requests. ``` DELETE /_pit { "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA=" } ``` #### Notable works in this change: - Move the search state to the coordinating node: #52741 - Allow searches with a specific reader context: #53989 - Add the ability to acquire readers in IndexShard: #54966 Relates #46523 Relates #26472 Co-authored-by: Jim Ferenczi <jimczi@apache.org>	2020-09-10 19:25:47 -04:00
Armin Braun	e0a81f7d14	Speed up Version Checks (#62216 ) (#62253 ) The `fromId` method would show up in profiling and JIT analysis as not-inlinable because it's too large in the contexts it's used in, in many cases, and was consuming a surprising amount of cycles for computing the min compat versions. -> extract cold path from `fromId` to make JIT happy and cache minimum compatible versions to fields.	2020-09-10 22:57:06 +02:00
Armin Braun	25db5acb0d	Simplify TimeValue Serialization (#62023 ) (#62248 ) This can be done without map lookups => less code and much smaller methods => better inlining potentially.	2020-09-10 20:16:21 +02:00
Armin Braun	7b941a18e9	Optimize Snapshot Shard Status Update Handling (#62070 ) (#62219 ) Avoiding a number of noop updates that were observed to cause trouble (as in needless noop CS publishing) which can become an issue when working with a large number of concurrent snapshot operations. Also this sets up some simplifications made in the clone snapshot branch.	2020-09-10 16:29:16 +02:00
Ignacio Vera	c8981ea93d	upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213 ) (#62222 )	2020-09-10 16:23:18 +02:00
Igor Motov	b6bff56a56	Fix hard_bounds interval handling (#62129 ) (#62188 ) The hard bounds were incorrectly scaled for intervals, which was causing incorrect buckets to show up or no buckets at all for intervals other than 1. Closes #62126	2020-09-09 15:42:12 -04:00
Nik Everett	1104d65465	Fix bug with terms' min_doc_count (#62130 ) (#62177 ) The `global_ordinals` implementation of `terms` had a bug when `min_doc_count: 0` that'd cause sub-aggregations to have array index out of bounds exceptions. Ooops. My fault. This fixes the bug by assigning ordinals to those buckets. Closes #62084	2020-09-09 13:04:51 -04:00
Armin Braun	6710104673	Fix Creating NOOP Tasks on SNAPSHOT Pool (#62152 ) (#62157 ) Fixing a few spots where NOOP tasks on the snapshot pool were created needlessly. Especially when it comes to mixed master+data nodes and concurrent snapshots these hurt delete operation performance needlessly.	2020-09-09 14:05:17 +02:00
Luca Cavanna	fbf0967e20	QueryPhaseResultConsumer to call notifyPartialReduce (#62083 ) As part of #60275 QueryPhaseResultConsumer ended up calling SearchProgressListener#onPartialReduce directly instead of notifyPartialReduce. That means we don't catch exceptions that may occur while executing the progress listener callback. This commit fixes the call and adds a test for this scenario.	2020-09-09 13:44:07 +02:00
Luca Cavanna	ad83261348	Print out search request as part of async search task description (#62057 ) Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should. With this commit we address that while also making sure that the description highlights that the task is originated from an async search. Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.	2020-09-09 13:44:07 +02:00
Rory Hunter	b7fd7cf154	Write deprecation logs to a data stream (#61966 ) Backport of #58924. Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream as well as to disk.	2020-09-09 12:16:28 +01:00
Armin Braun	ed4984a32e	Remove Redundant Stream Wrapping from Compression (#62017 ) (#62132 ) In many cases we don't need a `StreamInput` or `StreamOutput` wrapper around these streams so this commit adjusts the API to just normal streams and adds the wrapping where necessary.	2020-09-09 03:27:38 +02:00
Nik Everett	b8e9a7125f	Speed up empty highlighting many fields (backport of #61860 ) (#62122 ) Kibana often highlights everything like this: ``` POST /_search { "query": ..., "size": 500, "highlight": { "fields": { "": { ... } } } } ``` This can get slow when there are hundreds of mapped fields. I tested this locally and unscientifically and it took a request from 20ms to 150ms when there are 100 fields. I've seen clusters with 2000 fields where simple searches go from 500ms to 1500ms just by turning on this sort of highlighting. Even when the query is just a `range` and the fields are all numbers and stuff so it won't highlight anything. This speeds up the `unified` highlighter in this case in a few ways: 1. Build the highlighting infrastructure once per field rather than once per document per field. This cuts out a ton of work analyzing the query over and over and over again. 2. Bail out of the highlighter before loading values if we can't produce any results. Combined these take that local 150ms case down to 65ms. This is unlikely to be really useful when there are only a few fetched docs and only a few fields, but we often end up having many fields with many fetched docs.	2020-09-08 15:49:50 -04:00
Alan Woodward	28fd4a2ae8	Convert RangeFieldMapper to parametrized form (#62058 ) This also adds the ability to define a serialization check on Parameters, used in this case to only serialize format and locale parameters if the mapper is a date range.	2020-09-08 18:44:13 +01:00
Alan Woodward	5f05eef7e3	Convert some more mapping tests to MapperServiceTestCase (#62089 ) We don't need to extend ESSingleNodeTestCase for all these tests.	2020-09-08 17:51:40 +01:00
Tim Brooks	075271758e	Keep checkpoint file channel open across fsyncs (#61744 ) Currently we open and close the checkpoint file channel for every fsync. This file channel can be kept open for the lifecycle of a translog writer. This avoids the overhead of opening the file, checking file permissions, and closing the file on every fsync.	2020-09-08 08:54:53 -06:00
Francisco Fernández Castaño	2bb5716b3d	Add repositories metering API (#62088 ) This pull request adds a new set of APIs that allows tracking the number of requests performed by the different registered repositories. In order to avoid losing data, the repository statistics are archived after the repository is closed for a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the statistics for the active repositories as well as the modified/closed repositories. Backport of #60371	2020-09-08 14:01:04 +02:00
Armin Braun	ebd1569028	Fix testMasterFailOverWithQueuedDeletes (#62062 ) (#62078 ) Fixing very rare corner case where the delete retry is slow. Closes #62031	2020-09-08 10:35:06 +02:00
Nhat Nguyen	bb0a583990	Allow enabling soft-deletes on restore from snapshot (#62018 ) Closes #61969	2020-09-07 09:45:36 -04:00
Alan Woodward	cbc9578cbd	Remove SearchPhase interface (#62050 ) The interface is never used as an abstraction - implementations are called directly, and most of them don't need to implement the preProcess method.	2020-09-07 13:45:43 +01:00
David Turner	3389d5ccb2	Introduce integ tests for high disk watermark (#60460 ) An important goal of the disk threshold decider is to ensure that nodes use less disk space than the high watermark, and to take action if a node ever exceeds this watermark. Today we do not have any integration-style tests of this high-level behaviour. This commit introduces a small test harness that can adjust the apparent size of the disk and verify that the disk threshold decider moves shards around in response. Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2020-09-07 14:39:39 +02:00
Armin Braun	395538f508	Improve Snapshot State Machine Performance (#62000 ) (#62049 ) Just a few random things to optimize motivated by somewhat sub-standard performance for large snapshot cluster states with many concurrent snapshots observed in production.	2020-09-07 13:25:40 +02:00
Jim Ferenczi	fa8e76abb1	Improve reduction of terms aggregations (#61779 ) (#62028 ) Today, the terms aggregation reduces multiple aggregations at once using a map to group same buckets together. This operation can be costly since it requires looking up every bucket in a global map with no particular order. This commit changes how term buckets are sorted by shards and partial reduces in order to be able to reduce results using a merge-sort strategy. For bwc, results are merged with the legacy code if any of the aggregations use a different sort (if it was returned by a node in prior versions). Relates #51857	2020-09-07 13:13:20 +02:00
Alan Woodward	a295b0aa86	Fix null_value parsing for date_nanos field mapper (#61994 ) The null_value parameter for date fields is always parsed using DateFormatter.parseMillis, which is incorrect for nanosecond resolution fields. This commit changes the parsing logic to always use DateFieldType.parse() to parse the null value.	2020-09-07 10:58:54 +01:00
Alan Woodward	1799c0c583	Convert completion, binary, boolean tests to MapperTestCase (#62004 ) Also fixes a metadata serialization bug in CompletionFieldMapper.	2020-09-07 10:48:20 +01:00
Luca Cavanna	0c8b438577	Add support for runtime fields (#61776 ) This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch. We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows defining runtime fields based on a script. The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated with runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields. Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-09-07 09:14:53 +02:00
Howard	b26584dff8	Remove unused deciders in BalancedShardsAllocator (#62026 )	2020-09-07 00:04:16 -04:00
Armin Braun	1e3edbbe74	Simplify BytesReference StreamInput (#61681 ) (#62014 ) Flattening both streams into a single stream here saves a few objects and some indirection. Also, removed the redundant `offset` field which added nothing but complexity by forcing the incrementation of two counters on every read.	2020-09-05 10:45:52 +02:00
Ryan Ernst	6d3b691048	Add snapshot only test modules (#61954 ) This commit adds external test modules. These are modules meant for external systems to test edge cases in elasticsearch, but only within snapshots. They are not meant to be used in production, so protections are also added from their accidental inclusion in release builds. Note that this commit does not actually add any new modules, it only adds the infrastructure for the new modules, under `test/external-modules`.	2020-09-04 16:35:18 -07:00
Yannick Welsch	6d08b55d4e	Simplify searchable snapshot shard allocation (#61911 ) Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those snapshot-backed shards (instead of "recover from local or from empty store"). Also lets the balancer pick a node to allocate the snapshot-backed shard to (which takes number of shards on each node into account unlike the current implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).	2020-09-04 15:45:00 +02:00
Alan Woodward	66bb1eea98	Improve error messages on bad [format] and [null_value] params for date mapper (#61932 ) Currently, if an incorrectly formatted date is passed as a null_value for a date field mapper configuration, you get a vague error: Failed to parse mapping [_doc]: cannot parse empty date Similarly, if you pass an incorrect format, you get the error: Failed to parse mapping [_doc]: Invalid format [...] This commit improves both these errors by including the mapper name and parameter that are misconfigured. Fixes #61712	2020-09-04 14:13:28 +01:00
Ignacio Vera	31c026f25c	upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957 ) (#61974 )	2020-09-04 13:46:20 +02:00
Nik Everett	3d23dcd742	Use standard bit set impl in cardinality (#61816 ) (#61930 ) This replaces a specialized bit set implementation used in cardinality with our standard `BitArray` which works exactly the same way. It's also tracked by `BigArrays` which is great!	2020-09-03 12:37:30 -04:00
Nik Everett	3934e14bc0	Fixup vwhisto test (#60936 ) (#61928 ) This test assumed some random bounds that turned out not to hold in some cases. Closes #60673	2020-09-03 12:37:17 -04:00
Alan Woodward	48870c60c7	Don't spin up a whole node to unit test some data structures (#61923 ) BytesRefHashTests and LongObjectHashMapTests currently extend ESSingleNodeTestCase, which builds an entire node just to run some unit tests over entirely in-memory data structures. This commit converts them both to extend ESTestCase.	2020-09-03 17:19:42 +01:00
Alan Woodward	3a1e0edf0a	Convert DateFieldMapperTests to MapperTestCase (#61920 )	2020-09-03 16:04:02 +01:00
Martijn Laarman	cfa54c08bd	[7.x] Version bump 7.9.1 release	2020-09-03 16:41:58 +02:00
Alan Woodward	e2f006eeb4	Merge FetchSubPhase hitsExecute and hitExecute methods (#60907 ) (#61893 ) FetchSubPhase has two 'execute' methods, one which takes all hits to be examined, and one which takes a single HitContext. It's not obvious which one should be implemented by a given sub-phase, or if implementing both is a possibility; nor is it obvious that we first run the hitExecute methods of all subphases, and then subsequently call all the hitsExecute methods. This commit reworks FetchSubPhase to replace these two variants with a processor class, `FetchSubPhaseProcessor`, that is returned from a single `getProcessor` method. This processor class has two methods, `setNextReader()` and `process`. FetchPhase collects processors from all its subphases (if a subphase does not need to execute on the current search context, it can return `null` from `getProcessor`). It then sorts its hits by docid, and groups them by lucene leaf reader. For each reader group, it calls `setNextReader()` on all non-null processors, and then passes each doc id to `process()`. Implementations of fetch sub phases can divide their concerns into per-request, per-reader and per-document sections, and no longer need to worry about sorting docs or dealing with reader slices. FetchSubPhase now provides a FetchSubPhaseExecutor that exposes two methods, setNextReader(LeafReaderContext) and execute(HitContext). The parent FetchPhase collects all these executors together (if a phase should not be executed, then it returns null here); then it sorts hits, and groups them by reader; for each reader it calls setNextReader, and then execute for each hit in turn. Individual sub phases no longer need to concern themselves with sorting docs or keeping track of readers; global structures can be built in getExecutor(SearchContext), per-reader structures in setNextReader and per-doc in execute.	2020-09-03 12:20:55 +01:00
Alan Woodward	af01ccee93	Add specific test for serializing all mapping parameter values (#61844 ) (#61877 ) This commit adds a test to MapperTestCase that explicitly checks that a mapper can serialize all its default values, and that this serialization can then be re-parsed. Note that the test is disabled for non-parametrized mappers as their serialization may in some cases output parameters that are not accepted. Gradually moving all mappers to parametrized form will address this. The commit also contains a fix to keyword mappers, which were not correctly serializing the similarity parameter; this partially addresses #61563. It also enables `null` as a value for `null_value` on `scaled_float`, as a follow-up to #61798	2020-09-03 09:20:26 +01:00
Nik Everett	c19f67ce30	Support longs in BitArray (backport of #61867 ) (#61871 ) We frequently use `long`s with `BitArray` in aggs and right now we have to assert that the `long` fits in an `int`. This adds support for `long` to `BitArray` so we don't need those assertions.	2020-09-02 17:24:31 -04:00
Henning Andersen	867d5f1c68	Search memory leak (#61788 ) (#61862 ) Search could leak memory if global ordinals were calculated as part of a search with low level cancellation enabled. QueryPhase registers a cancellation on the reader that is never removed, which ends up being referenced from the global ordinals cache entry. This keeps an indirect reference to the search context. A significant leak can occur when a heavy aggregation (cardinality for instance) is used and a failure occurs during search, in particular if the pages backing the HyperLogLog++ structure are not recycled when it is closed. This commit also fixes an issue with an unclosed resource and request breaker adjustment in the cardinality aggregation.	2020-09-02 18:51:14 +02:00
Jim Ferenczi	a0e4331c49	Cleanup usages of QueryPhaseResultConsumer (#61713 ) This commit generalizes how QueryPhaseResultConsumer is initialized. The query phase always uses this consumer so it doesn't need to be hidden behind an abstract class.	2020-09-02 14:41:02 +02:00
Alan Woodward	d59343b4ba	Allow [null] values in [null_value] (#61798 ) (#61807 ) Several field mappers have a null_value parameter, that allows you to specify a placeholder value to insert into a document if the incoming value for that field is null. The default value for this is always null, meaning "add no placeholder". However, we explicitly bar users from setting this parameter directly to null (done in #7978, in order to fix an NPE). This exclusion means that if a mapper is serialized with include_defaults, then we either need to special-case null_value to ensure that it is not output when it holds the default value, or we find that the resulting serialized form cannot be used to create a mapping. This stops us doing some useful generic testing of mappers. This commit permits null as a parameter value for null_value, and changes the tests to check that it is a) permissible and b) applied without throwing errors. As part of the testing changes, a new base class MapperServiceTestCase is refactored from MapperTestCase, holding the various helper methods related to building mappings but not the single-mapper specific abstract methods. Closes #58823	2020-09-02 10:42:19 +01:00
Igor Motov	48e53cca94	Fix wrong NaN comparison (#61795 ) (#61811 ) Fixes wrong NaN comparison in error message generator in GeoPolygonDecomposer and PolygonBuilder. Supersedes #48207 Co-authored-by: Pedro Luiz Cabral Salomon Prado <pedroprado010@users.noreply.github.com>	2020-09-01 15:50:38 -04:00
Tim Brooks	e573fa9abc	Add data.path fast path for FilePermission (#61302 ) The recursive data.path FilePermission check is an extremely hot codepath in Elasticsearch. Unfortunately the FilePermission check in Java is extremely allocation heavy. As it iterates through different file permissions, it allocates byte arrays for each Path component that must be compared. This PR improves the situation by adding the recursive data.path FilePermission to its own PermissionsCollection object which is checked first.	2020-09-01 12:03:22 -06:00
Armin Braun	28710c985d	Dry up Settings from Map Construction (#61778 ) (#61803 ) We used the same hack all over the place. At least drying it up to a single place. Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>	2020-09-01 19:46:10 +02:00
Tanguy Leroux	6e944d9e21	Throws IndexNotFoundException in TransportGetAction for unknown System indices (#61785 ) (#61791 ) The change #57936 introduced a dedicated thread pool for reads in system indices. It also introduced a potential NPE in the case the index to read is not yet present in the cluster state. This commit fixes that bug by using getIndexSafe() instead of just the index() method when retrieving the index's metadata so that an INFE is thrown if the index does not exist.	2020-09-01 17:41:57 +02:00
Dan Hermann	88a448f1cd	Fix wrong result when executing bulk requests with and without pipeline (#60818 ) (#61777 )	2020-09-01 07:05:25 -05:00
Armin Braun	3fd25bfa87	Fix Concurrent Snapshot Create+Delete + Delete Index (#61770 ) (#61773 ) We had a bug here where we put a `null` value into the shard assignment mapping when reassigning work after a snapshot delete had gone through. This only affects partial snapshots but essentially dead-locks the snapshot process. Closes #61762	2020-09-01 13:20:25 +02:00
Tanguy Leroux	787dfda4c1	Prevent snapshots to be mounted as system indices (#61517 ) (#61727 ) System indices can be snapshotted and are therefore potential candidates to be mounted as searchable snapshot indices. As of today nothing prevents a snapshot from being mounted under an index name starting with `.`, and this can lead to conflicting situations: searchable snapshot indices are read-only while Elasticsearch expects some system indices to be writable; and searchable snapshot indices will soon use an internal system index (#60522) to speed up recoveries, so we should prevent that system index from itself being a searchable snapshot index (which would lead to a deadlock during recovery). This commit introduces a change to prevent snapshots from being mounted as a system index.	2020-09-01 11:13:28 +02:00
Boice Huang	8fdd3d158b	Remove redundant symbol in msearch tests (#61353 )	2020-09-01 10:58:22 +02:00
Nik Everett	fb84c1f73e	Calculate precise cardinality upper bounds (#61529 ) (#61754 ) This reworks `CardinalityUpperBound` to support precise estimates while maintaining most of the public API. This will allow us to make more informed choices about the data structures that we use in aggregations. None of those interesting choices come as part of this change, but they are more possible with it.	2020-08-31 15:10:02 -04:00
Dan Hermann	2858e1efc4	Document new stats in _cat/nodes (#60445 ) (#61742 )	2020-08-31 12:40:21 -05:00
Adam Locke	5723b928d7	Remove Outdated Snapshot Docs (#61684 ) (#61728 ) Removing some now outdated statements that refer to a time when snapshot operations could not run concurrently. Closes #61680	2020-08-31 12:04:27 -04:00
Jason Tedor	43cb7c48bd	Adjust Lucene versions for 7.9.1 This commit adjusts the Lucene versions for 7.9.1 after the backporting of upgrading the 7.9 branch to Lucene 8.6.2.	2020-08-31 10:30:39 -04:00
Jason Tedor	64cd229b35	Upgrade to Lucene 8.6.2 (#61688 ) This commit upgrades the Lucene dependencies to 8.6.2.	2020-08-31 09:54:07 -04:00
Rory Hunter	ff6c071275	Implement deprecation logging using log4j (#61629 ) Backport of #61474. Part of #46106. Simplify the implementation of deprecation logging by relying on log4j more completely, and implementing additional behaviour through custom appenders and filters.	2020-08-31 12:42:04 +01:00
Armin Braun	5c86b216e8	Fix Race in testGetSnapshotsRequest (#61694 ) (#61700 ) The fact that the data node is already blocked on writing data files did not guarantee that the cluster state that made the data node start snapshotting is already applied on master. This could lead to races where the get snapshots action still runs based on a state without the snapshot in it, tripping the assertion. Much safer to handle this by waiting on the non-blocking snapshot create to return, which guarantees that the CS has been applied on master. Closes #61541	2020-08-31 11:06:51 +02:00
Armin Braun	22e4d759c3	Speed up Reading Enum Set from Stream (#61678 ) (#61687 ) No need in adding enum values to a normal set and then copying, the `EnumSet` is directly mutable just fine.	2020-08-30 20:49:51 +02:00
Jake Landis	d2e5f2f532	[7.x] Enhance the ingest node simulate verbose output (#60433 ) (#60678 ) This commit enhances the verbose output for the `_ingest/pipeline/_simulate?verbose` api. Specifically this adds the following: * the pipeline processor is now included in the output * the conditional (if) and result is now included in the output iff it was defined * a status field is always displayed. the possible values of status are * `success` - if the processor ran without errors * `error` - if the processor ran but threw an error that was not ignored * `error_ignored` - if the processor ran but threw an error that was ignored * `skipped` - if the processor did not run (currently only possible if the if condition evaluates to false) * `dropped` - if the `drop` processor ran and dropped the document * a `processor_type` field for the type of processor (e.g. set, rename, etc.) * throw a better error if trying to simulate with a pipeline that does not exist closes #56004	2020-08-27 16:53:09 -05:00
Lee Hinman	1bfebd54ea	[7.x] Allocate newly created indices on data_hot tier nodes (#61342 ) (#61650 ) This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created. This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default. This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example: ``` // Create an index PUT /eggplant // Get an index GET /eggplant?flat_settings ``` Returns the default settings now of: ```json { "eggplant" : { "aliases" : { }, "mappings" : { }, "settings" : { "index.creation_date" : "1597855465598", "index.number_of_replicas" : "1", "index.number_of_shards" : "1", "index.provided_name" : "eggplant", "index.routing.allocation.include._tier" : "data_hot", "index.uuid" : "6ySG78s9RWGystRipoBFCA", "index.version.created" : "8000099" } } } ``` After the initial setting of this setting, it can be treated like any other index level setting. 
This new setting is not set on a new index if any of the following is true: - The index is created with an `index.routing.allocation.include.<anything>` setting - The index is created with an `index.routing.allocation.exclude.<anything>` setting - The index is created with an `index.routing.allocation.require.<anything>` setting - The index is created with a null `index.routing.allocation.include._tier` value - The index was created from an existing source metadata (shrink, clone, split, etc) Relates to #60848	2020-08-27 13:41:12 -06:00
Luca Cavanna	f769821bc8	Pass SearchLookup supplier through to fielddataBuilder (#61430 ) (#61638 ) Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to #59332 Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-08-27 18:09:56 +02:00
Alan Woodward	b6cb590685	Log more information when mappings fail on index creation (#61577 ) Errors from bad mappings at index creation are currently logged at DEBUG level, which can make it difficult to work out what's going on if the index is being auto-created. This commit ups the log level to INFO for auto-created indices, and includes some more information in the log message.	2020-08-27 15:08:51 +01:00
David Turner	411965d392	Allow background cluster state update in tests (#61455 ) Today the `CoordinatorTests` run the publication process as a single atomic action; however in production it appears possible that another master may be elected, publish its state, then fail, then we win another election, all in between the time we sampled our previous cluster state and started to publish the one we first thought of. This violates the `assertClusterStateConsistency()` assertion that verifies the cluster state update event matches the states we actually published and applied. This commit adjusts the tests to run the publication process more asynchronously so as to allow time for this behaviour to occur. This should eventually result in a reproduction of the failure in #61437 that will let us analyse what's really going on there and help us fix it.	2020-08-27 11:22:58 +01:00
David Turner	b866aaf81c	Use int for number of parts in blob store (#61618 ) Today we use `long` to represent the number of parts of a blob. There's no need for this extra range, it forces us to do some casting elsewhere, and indeed when snapshotting we iterate over the parts using an `int` which would be an infinite loop in case of overflow anyway: for (int i = 0; i < fileInfo.numberOfParts(); i++) { This commit changes the representation of the number of parts of a blob to an `int`.	2020-08-27 10:54:03 +01:00
David Turner	5df74cc888	Replace Math.toIntExact with toIntBytes (#61604 ) We convert longs to ints using `Math.toIntExact` in places where we're sure there will be no overflow, but this doesn't explain the intent of these conversions very well. This commit introduces a dedicated method for these conversions, and adds an assertion that we never overflow.	2020-08-27 08:28:54 +01:00
Jay Modi	34c4fc3b91	Remove tasks module to define tasks system index (#61588 ) This commit removes the tasks module that only existed to define the tasks result index, `.tasks`, as a system index. The definition for the tasks results system index descriptor is moved to the `SystemIndices` class with a check that no other plugin or module attempts to define an entry with the same source. Additionally, this change also makes the pattern for the tasks result index a wildcard pattern since we will need this when the index is upgraded (reindex to new name and then alias that to .tasks). Backport of #61540	2020-08-26 09:48:23 -06:00
David Turner	f2dc664228	Remove dead code in EsExecutors (#61574 ) Removes a couple of unused methods.	2020-08-26 16:08:36 +01:00
Przemyslaw Gomulka	9f566644af	Do not create two loggers for DeprecationLogger backport(#58435 ) (#61530 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance. depends on #61515 backports #58435	2020-08-26 16:04:02 +02:00
Igor Motov	f70a59971a	[7.x] Add rate aggregation (#61369 ) (#61554 ) Adds a new rate aggregation that can calculate a document rate for buckets of a date_histogram. Closes #60674	2020-08-25 17:39:00 -04:00
Nik Everett	87cf81e179	Migrate some more mapper test cases (#61507 ) (#61552 ) Migrate some more mapper test cases from `ESSingleNodeTestCase` to `MapperTestCase`.	2020-08-25 15:27:26 -04:00
markharwood	8b56441d2b	Search - add case insensitive support for regex queries. (#59441 ) (#61532 ) Backport to add case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes #59235	2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka	f3f7d25316	Header warning logging refactoring backport(#55941 ) (#61515 ) Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed. relates #55699 relates #52369 backports #55941	2020-08-25 16:35:54 +02:00
Armin Braun	f22ddf822e	Some Optimizations around BytesArray (#61183 ) (#61511 ) * Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache * Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference * Lighter `writeTo` implementation * Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory	2020-08-25 07:13:39 +02:00
Armin Braun	806dfcfcf7	Speed up Compression Logic by Pooling Resources (#61358 ) (#61495 ) This is mostly motivated by the performance issues we are seeing around the GET mappings REST API which (in case of a large number of indices) will create decompressing streams in a hot loop which takes a significant amount of time for the system calls involved in instantiating deflaters and inflaters. Also, this fixes a leaked deflater when deserializing cached repository data.	2020-08-25 04:01:55 +02:00
Armin Braun	16b932c1dc	Remove Potentially Expensive Use of BytesReference.toBytesRef (#61415 ) (#61503 ) This method might have to materialize all the bytes in a reference into a fresh `byte[]`. Using the stream is much safer and only trivially more expensive + in most cases we now run the fast path via `BytesArray` anyway.	2020-08-24 23:58:21 +02:00
Nhat Nguyen	d47bbbafe0	Cancel multisearch when http connection closed (#61399 ) Relates #61337	2020-08-24 15:12:54 -04:00
Nhat Nguyen	23a0f8b617	Detect and optimize noop of update index settings (#61348 ) This optimization is more relevant in the context of CCR. When a node in the follower cluster leaves, we reallocate the shard-follow tasks on that node to other nodes. The new tasks will overwhelm the follower cluster with many put-mapping, update-settings requests, although most of them are noop. This change detects and optimizes the noop update-settings requests.	2020-08-24 15:08:53 -04:00
Nik Everett	f3b6d49ae1	Migrate server mapper tests to new MapperTestCase (#61378 ) (#61490 ) This continues #61301, migrating all of the mappers in `server` to the new `MapperTestCase` which is nicer than `FieldMapperTestCase` because it doesn't depend on all of Elasticsearch.	2020-08-24 13:33:35 -04:00
Armin Braun	bb4d97073c	Remove Favicon Special Path in RestController (#61460 ) (#61487 ) It's unnecessary (and adds one string comparison to every request) to special case the favicon so I added it as a normal REST handler to simplify the code.	2020-08-24 18:36:23 +02:00
Armin Braun	af2e2782eb	Stop Needlessly Copying Bytes in XContent Parsing (#61447 ) (#61469 ) Wrapping a `BytesArray` in a `StreamInput` for deserialization is inefficient. This forces Jackson to internally buffer (i.e. copy) all bytes from the `BytesArray` before deserializing, adding overhead for copying the bytes and managing the buffers. This commit fixes a number of spots where `BytesArray` is the most common type of `BytesReference` to special case this type and parse it more efficiently. Also improves parsing `String`s to use the more efficient direct `String` parsing APIs.	2020-08-24 15:49:15 +02:00
Dan Hermann	c53731a0cd	[7.x] Fix wrong pipeline name in debug log (#58817 ) (#61233 )	2020-08-21 11:14:01 -05:00
David Turner	078e8717ee	Stop opening PING conns to remote clusters (#61408 ) Today a remote cluster connection comprises a `PING` and a `REG` channel. The `PING` channel is only used for health checks between the elected master and the members of its own cluster, so is unused in a remote cluster connection. This commit removes this unused connection.	2020-08-21 12:21:57 +01:00
Armin Braun	e09058df1a	Serialize Get Mappings Response on Generic ThreadPool (#57937 ) (#61401 ) For large responses to the get mappings request, the serialization to XContent can be extremely slow (serializing mappings is expensive since we have to decompress and deserialize the mapping source). To not introduce instability on the IO thread handling the get mappings response we should move the serialization to the management pool. The trade-off of introducing one or two new context switches for responses that are small enough to not cause trouble on the transport thread to prevent instability in case of a large number of mappings in the cluster seems worth it.	2020-08-21 08:06:30 +02:00
Armin Braun	22509c95f8	Fix Blackholed Connection Behavior in DisruptableMockTransport (#61310 ) (#61381 ) It is not realistic to drop messages without eventually failing. To retain the coverage of long pauses this PR adjusts the blackholed behavior to fail a send after 24h (which is assumed to be longer than any timeout in the system) instead of never. Closes #61034	2020-08-21 07:54:56 +02:00
Julie Tibshirani	997c73ec17	Correct how field retrieval handles multifields and copy_to. (#61391 ) Before when a value was copied to a field through a parent field or `copy_to`, we parsed it using the `FieldMapper` from the source field. Instead we should parse it using the target `FieldMapper`. This ensures that we apply the appropriate mapping type and options to the copied value. To implement the fix cleanly, this PR refactors the value parsing strategy. Now instead of looking up values directly, field mappers produce a helper object `ValueFetcher`. The value fetchers are responsible for almost all aspects of fetching, including looking up the right paths in the _source. The PR is fairly big but each commit can be reviewed individually. Fixes #61033.	2020-08-20 15:53:35 -07:00
Julie Tibshirani	85ad328df7	Ensure fetch fields aren't dropped when rewriting search. (#61390 ) Previously we didn't retain the requested fields when performing a shallow copy of the search source. This meant that when a search was rewritten, we could drop the requested fields and fail to return them in the response.	2020-08-20 14:58:58 -07:00
Armin Braun	08dbd6d989	Optimize a few Spots on IO Loop (#60865 ) (#61380 ) Saving some cycles here and there on the IO loop: * Don't instantiate new `Runnable` to execute on `SAME` in a few spots * Don't instantiate complicated wrapped stream for empty messages * Stop instantiating almost never used `ClusterStateObserver` in two spots * Some minor cleanup and preventing pointless `Predicate<>` instantiation in transport master node action	2020-08-20 20:22:49 +02:00
Alan Woodward	a3a0c63ccf	Convert NumberFieldMapper to parametrized form (#61092 ) (#61376 ) In addition, this commit converts ScaledFloatFieldMapper as it was relying on a number of static values taken from NumberFieldMapper that had changed or been removed.	2020-08-20 16:43:26 +01:00
Nhat Nguyen	a3906dcef3	Enable cancellation for msearch requests (#61337 ) Today multi-search requests are not cancellable because we create regular tasks instead of cancellable ones for them.	2020-08-19 16:59:17 -04:00
Nik Everett	9789e6d154	Migrate some field mapper tests to ESTestCase (#61301 ) (#61346 ) This switches a few tests for field mappers from `ESSingleNodeTestCase` to `ESTestCase` because, in general, we prefer to avoid `ESSingleNodeTestCase` when we can because it is slow and "big". "Big" here means that it pulls in an entire node, making it difficult to reason about what you are testing.	2020-08-19 15:43:49 -04:00
Armin Braun	4a53ae203e	Fix SharedClusterSnapshotRestoreIT.testThrottling (#61323 ) (#61328 ) We have to set the recovery setting to `0` if we don't want throttling from recoveries. Otherwise the randomized value used for this setting in tests can lead to throttling unexpectedly. Closes #61311	2020-08-19 15:26:32 +02:00
Nik Everett	70128e022b	fix stats aggregator tests With #60683 we stopped forcing aggregating all docs using a single Aggregator which made some of our accuracy assumptions about the stats aggregator incorrect. This adds a test that does the forcing and asserts the old accuracy and adds a test without the forcing with much looser accuracy guarantees. Closes #61132	2020-08-19 08:55:43 -04:00
Alan Woodward	b1aa0d8731	Fix fieldnames field type for pre-6.1 indexes (#61322 ) The FieldNamesFieldMapper field has different behaviour for indexes created in clusters earlier than v6.1, and the code to deal with this was still using the vestigial FieldType field of FieldMapper in its indexing path. This meant that documents added after an upgrade were not correctly indexing their field names field. This commit corrects the parseCreateField method to use the default field type. Fixes #61305	2020-08-19 12:59:09 +01:00
David Turner	389f7779e7	Report more details of unobtainable ShardLock (#61255 ) Today a common reason for a `ShardLockObtainFailedException` is when a shard is removed from a node and then assigned straight back to it again before the node has had a chance to shut the previous shard instance down. For instance, this can happen if a node briefly leaves the cluster holding a primary with no in-sync replicas. The message in this case is typically as follows: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation] This is pretty hard to interpret, and doesn't raise the important question: "why didn't the shard shut down sooner?" With this change we reword the message a bit, report the age of the shard lock, and adjust the details to report that the lock is held by a closing shard: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [12345ms] Relates #38807	2020-08-19 06:36:28 +01:00
Nhat Nguyen	08b0e78ef4	Log more info when search ops higher than expected (#61108 ) We have seen a situation where the total search operations are higher than expected. Unfortunately, we did not have enough info to figure it out. This commit adds the failures to the error to provide more context and adjusts the log level in case of failure to debug.	2020-08-18 15:20:41 -04:00
Rory Hunter	bd7236cd65	Version bump for 7.9.0 release	2020-08-18 16:07:43 +01:00
Dimitrios Liappis	c870640cbd	[7.x] Introduce 6.8.13 as a version (#61198 ) Introduce version 6.8.13 to branch 7.x	2020-08-18 17:07:16 +03:00
Armin Braun	58d07b2ffc	Remove Unused ByteBufferReference (#61116 ) (#61250 ) We only work with heap byte buffers at this point and those we can and do unwrap the `byte[]` ourselves and use `BytesArray` instead of a needless level of indirection via `ByteBuffer`.	2020-08-18 10:53:40 +02:00
Armin Braun	6ffa7f0737	Fix testConcurrentSnapshotDeleteAndDeleteIndex (#61228 ) (#61249 ) There is a corner case here in which during partial snapshot the index is deleted right between starting the snapshot in the CS and the data node getting to work on it, causing the data node to fail that shard snapshot and making the snapshot `PARTIAL`. Closes #61208	2020-08-18 10:45:30 +02:00
Mark Tozzi	db1df6cc30	[7.x] Remove a bunch of type boilerplate from Aggs (#60852 ) (#61031 )	2020-08-17 12:13:05 -04:00
Nik Everett	1b7bbafd81	Add method to make random DateFormatter pattern (backport of #60613 ) (#61213 ) Adds a method to make a random date `DateFormatter` pattern. We expect this'll be useful for runtime fields to compare their formatting with the standard date field.	2020-08-17 10:57:52 -04:00
Christoph Büscher	6866396e1d	Improve 'ignore_malformed' handling for dates (#60211 ) Currently we occasionally can get ArithmeticException from parsing bad input values on 'date' fields that are passed on even if 'ignore_malformed' is set. This change adds this exception to the ones we already catch for malformed values. Closes #52634	2020-08-17 16:18:08 +02:00
David Turner	b21cb7f466	Reduce allocations when persisting cluster state (#61159 ) Today we allocate a new `byte[]` for each document written to the cluster state. Some of these documents may be quite large. We need a buffer that's at least as large as the largest document, but there's no need to use a fresh buffer for each document. With this commit we re-use the same `byte[]` much more, only allocating it afresh if we need a larger one, and using the buffer needed for one round of persistence as a hint for the size needed for the next one.	2020-08-17 13:45:31 +01:00
David Turner	f3e0c60896	Restrict testing of legacy discovery to tests (#61178 ) The 7.x branch preserves the legacy discovery mechanism from 6.x purely for running internal cluster tests; this mechanism is otherwise completely untested and unsupported. However it is still technically possible to use it outside of the test suite if you dig through the source code to work out what settings need to be set. With this change we make it impossible to use this mechanism in production. Closes #61177	2020-08-17 11:05:27 +01:00
Ryan Ernst	9cb45dafab	Unwrap transport exception when using transport client (#60801 ) The ReloadSecureSettingsIT makes requests to the reload settings apis. In 7.x, the client used from the integ test infrastructure may be a transport client. In that case, the expected exception type is wrapped in a remote exception, which causes the test to fail (though it will hang indefinitely due to not counting down the latch, see https://github.com/elastic/elasticsearch/pull/60800). This commit adds unwrapping of the remote exception to get the underlying expected exception. closes #51546	2020-08-13 10:24:04 -07:00
Ryan Ernst	c73ab0b16f	Ensure hotthreads do not produce node failures (#61073 ) This commit adds an assertion that no sub-nodes requests within hot threads failed. relates #58842	2020-08-13 10:22:19 -07:00
Armin Braun	3143b5ea47	Stabilize testSnapshotDeleteRelocatingPrimaryIndex (#61088 ) (#61096 ) Use transport blocking to make relocation take forever instead of relying on the relocation to take long enough to clash with the snapshot. Closes #61069	2020-08-13 16:26:56 +02:00
Yannick Welsch	8e775394ac	Fix testNoMasterActionsMetadataWriteMasterBlock (#60605 ) We can't assert on the specific exception, unfortunately.	2020-08-13 10:48:16 +02:00
David Turner	c6276ae177	Fail invalid incremental cluster state writes (#61030 ) It is disastrous if we commit an incremental cluster state update without having written the full state first. We assert that this doesn't happen, but it is hard to fully test the myriad ways that things might fail in a messy production environment. Given the disastrous consequences it is worth erring on the side of caution in this area. This commit fails invalid writes even if assertions are disabled.	2020-08-12 19:46:19 +01:00
Lee Hinman	e3df64a429	[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994 ) (#61045 ) This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the x-pack plugin. These roles are intended to be the base for the formalization of data tiers in Elasticsearch. These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing `data` role act as though they have all of the roles configured (it is a hot, warm, cold, and frozen node). This also includes a custom `AllocationDecider` that allows the user to configure the following settings on a cluster level: - `cluster.routing.allocation.require._tier` - `cluster.routing.allocation.include._tier` - `cluster.routing.allocation.exclude._tier` And in index settings: - `index.routing.allocation.require._tier` - `index.routing.allocation.include._tier` - `index.routing.allocation.exclude._tier` Relates to #60848	2020-08-12 11:06:23 -06:00
Alan Woodward	5b3c10c379	Fix serialization of AllFieldMapper (#61044 ) Converting AllFieldMapper to parametrized form ended up not being run through BWC testing, resulting in an incorrect implementation being committed. This commit fixes the serialization, and adds unit tests as well as unmuting the BWC test that uncovered the bug. Fixes #60986	2020-08-12 17:32:55 +01:00
Yannick Welsch	8c488de576	Gracefully handle null in checkSettingsForTerminalDeprecation Fixes a test failure after backport to 7.x	2020-08-12 18:03:52 +02:00
Yannick Welsch	25404cbe3d	Provide option to allow writes when master is down (#60605 ) Elasticsearch currently blocks writes by default when a master is unavailable. The cluster.no_master_block setting allows a user to change this behavior to also block reads when a master is unavailable. This PR introduces a way to now also still allow writes when a master is offline. Writes will continue to work as long as routing table changes are not needed (as those require the master for consistency), or if dynamic mapping updates are not required (as again, these require the master for consistency). Eventually we should switch the default of cluster.no_master_block to this new mode.	2020-08-12 16:56:45 +02:00
Yannick Welsch	6644f2283d	Do not access snapshot repo on dedicated voting-only master node (#61016 ) Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and will never be the elected master so we should not require it to have write access to the repository. Closes #59649	2020-08-12 16:56:45 +02:00
Yannick Welsch	af519be9cb	Ensure repo not in use for wildcard repo deletes (#60947 ) Repositories can't be unregistered when they are actively being used for snapshots or restores. Wildcard repository deletes could silently bypass the "repo in use" checks however, which is now fixed.	2020-08-12 16:38:06 +02:00
Dan Hermann	538c93c923	Adding Hit counts and Miss counts for QueryCache exposed through REST api. (#60114 ) (#60993 )	2020-08-12 08:21:09 -05:00
Alan Woodward	c81dc2b8b7	Convert KeywordFieldMapper to parametrized form (#60645 ) This makes KeywordFieldMapper extend ParametrizedFieldMapper, with explicitly defined parameters. In addition, we add a new option to Parameter, restrictedStringParam, which accepts a restricted set of string options.	2020-08-12 11:41:11 +01:00
markharwood	66098e0bf4	Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959 ) (#61010 ) The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type. The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields. Closes #60957	2020-08-12 10:44:52 +01:00
Armin Braun	32423a486d	Simplify and Speed up some Compression Usage (#60953 ) (#61008 ) Use thread-local buffers and deflater and inflater instances to speed up compressing and decompressing from in-memory bytes. Not manually invoking `end()` on these should be safe since their off-heap memory will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals that are not instantiated at a high frequency. This significantly reduces the amount of byte copying and object creation relative to the previous approach which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc. Relates #57284 which should be helped by this change to some degree. Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these code paths.	2020-08-12 11:06:23 +02:00
Nik Everett	ce9c5f0e46	Fix diversified sample tests The test assumed that the aggregator only ran once but we turned that off. This turns it back on.	2020-08-11 17:49:43 -04:00
Jay Modi	2fa6448a15	System index reads in separate threadpool (#60927 ) This commit introduces a new thread pool, `system_read`, which is intended for use by system indices for all read operations (get and search). The `system_read` pool is a fixed thread pool with a maximum number of threads equal to the lesser of half of the available processors or 5. Given the combination of both get and read operations in this thread pool, the queue size has been set to 2000. The motivation for this change is to allow system read operations to be serviced in spite of the number of user searches. In order to avoid a significant performance hit due to pattern matching on all search requests, a new metadata flag is added to mark indices as system or non-system. Previously created system indices will have the flag added to their metadata upon upgrade to a version with this capability. Additionally, this change also introduces a new class, `SystemIndices`, which encapsulates logic around system indices. Currently, the class provides a method to check if an index is a system index and a method to find a matching index descriptor given the name of an index. Relates #50251 Relates #37867 Backport of #57936	2020-08-11 12:16:34 -06:00
Julie Tibshirani	a93be8d577	Handle nested arrays in field retrieval. (#60981 ) We accept _source values with multiple levels of arrays, such as `"field": [[[1, 2]]]`. This PR ensures that field retrieval can handle nested arrays by unwrapping the arrays before parsing the values.	2020-08-11 10:22:16 -07:00
Mark Tozzi	ab8518fb5b	[7.x] Extensibility for Composite Agg #59648 (#60842 )	2020-08-11 09:14:33 -04:00
Alan Woodward	54279212cf	Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847 ) (#60924 ) This commit cuts over all metadata field mappers to parametrized format.	2020-08-11 09:02:28 +01:00
Armin Braun	3e2dfc6eac	Remove GCS Bucket Exists Check (#60899 ) (#60914 ) Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS. We don't need to do the bucket exists check before using the repo, that just needlessly increases the necessary permissions for using the GCS repository.	2020-08-11 09:54:27 +02:00
Julie Tibshirani	d51eae6e9f	Prevent loading 'fields' with stored fields disabled. (#60938 ) Because the 'fields' option loads from _source (which is a stored field), it is not possible to retrieve 'fields' when stored_fields are disabled. This also fixes #60912, where setting stored_fields: _none_ prevented the _ignored fields from being loaded and caused a parsing exception.	2020-08-10 15:40:27 -07:00
Nik Everett	0286d0a769	Move distance_feature query building into MFT (#60614 ) (#60846 ) This moves the `distance_feature` query building out of `DistanceFeatureQueryBuilder` and into subclasses of `MappedFieldType`. Without this we don't have a chance of supporting this for runtime fields. In general I'm not sad to see the `instanceof`s go. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-08-10 16:05:17 -04:00
Julie Tibshirani	b216340f50	Make `FetchPhase` logic more readable. (#60779 ) * Factor out FieldsVisitor#postProcess call. * Swap logical order for normal and nested documents. * Extract the method createStoredFieldsVisitor.	2020-08-10 11:04:54 -07:00
Nik Everett	dfd502f9ca	Rework checking if a year is a leap year (#60585 ) (#60790 ) This way is faster, saving about 8% on the microbenchmark that rounds to the nearest month. That is in the hot path for `date_histogram` which is a very popular aggregation so it seems worth it to at least try and speed it up a little. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-08-10 12:45:34 -04:00
Jim Ferenczi	f30f1f04e2	Replace AggregatorTestCase#search with AggregatorTestCase#searchAndReduce (#60816 ) This commit removes the ability to test the top level result of an aggregator before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test the final output (the one sent to the end user) rather than an intermediary result that could be different. This change also removes spurious commits triggered on top of a random index writer. These commits slow down the tests and are redundant with the commits that the random index writer performs.	2020-08-10 17:23:00 +02:00
David Turner	f44c28b595	Deprecate and ignore join timeout (#60872 ) There is no point in timing out a join attempt any more once a cluster is entirely in 7.x. Timing out and retrying with the same master is pointless, and an in-flight join attempt to one master no longer blocks attempts to join other masters. This commit deprecates this unnecessary setting and removes its effect from the joining process. Relates #60873 which removes this setting in master.	2020-08-10 13:57:41 +01:00
Martijn van Groningen	64bb082f9b	Improve error message for non append-only writes that target data stream (#60874 ) Backport of #60809 to 7.x branch. Closes #60581	2020-08-10 13:18:59 +02:00
Alan Woodward	e8d9185045	Cut over IPFieldMapper to parametrized form (#60602 ) This commit makes IpFieldMapper extend ParametrizedFieldMapper. It also updates the IpFieldMapper docs to add the ignore_malformed parameter, which was not previously documented.	2020-08-10 11:01:10 +01:00
David Turner	1f49e0b9d0	Fix testRerouteOccursOnDiskPassingHighWatermark (#60869 ) Sometimes this test would refresh the disk stats so quickly that it hit the refresh rate limiter even though it was almost completely disabled. This commit allows the rate limiter to be completely disabled. Closes #60587	2020-08-10 09:39:44 +01:00
Ryan Ernst	ddcfbec569	Add assert message for multiple lines in osprobe (#60796 ) Several /proc files are expected to contain a single line. We assert on this in tests, but the contents of the file are lost and the assertion therefore lacks important information to debug why the file appeared to have multiple lines. This commit dumps the contents of the file on assertion failure. relates #59284	2020-08-06 15:53:30 -07:00
Ryan Ernst	fc38af363e	Ensure latch is counted down when assertion trips (#60800 ) The ReloadSecureSettingsIT uses latches to ensure coordination across requests to the underlying in memory cluster. However, in the case of an expected failure, if the assertion fails, the latch will never be counted down, and will cause the test to hang indefinitely. This commit ensures the latch is always counted down with a try/finally. relates #51546	2020-08-06 15:33:46 -07:00
Jim Ferenczi	98119578a1	Disable sort optimization on search collapsing (#60838 ) Collapse search queries that sort by a field can throw an ArrayStoreException due to a bug in the [sort optimization](https://github.com/elastic/elasticsearch/pull/51852) introduced in 7.7.0. Collapsed searches were not supposed to be eligible for this sort optimization so this change explicitly filters them from this new feature.	2020-08-06 21:37:12 +02:00
Jim Ferenczi	14980ff97e	Fix AOOBE when setting min_doc_count to 0 in significant_terms (#60823 ) This commit fixes the computation of the subset size on empty buckets (doc count of 0). The aggregator test refactoring in #60683 revealed this bug.	2020-08-06 18:57:09 +02:00
David Turner	721198c29e	Increase logging in testRerouteOccursOnDiskPassingHighWatermark (#60817 ) Relates #60587	2020-08-06 14:08:09 +01:00
Armin Braun	a2c7991e96	Fix CompressibleBytesOutputStreamTests (#60815 ) (#60822 ) Since #60730 the `bytes` field can be `null`. This adds the missing `null` check to the test override. Closes #60814	2020-08-06 15:07:48 +02:00
David Turner	273a6f916d	AwaitsFix for #60814	2020-08-06 12:56:28 +01:00
Tim Brooks	2f76c48ea7	Propagate forceExecution when acquiring permit (#60634 ) Currently the transport replication action does not propagate the force execution parameter when acquiring the indexing permit. The logic to acquire the index permit supports force execution, so this parameter should be propagated. Fixes #60359.	2020-08-05 15:57:40 -06:00
Francisco Fernández Castaño	b4044004aa	Add recovery state tracking for Searchable Snapshots (#60751 ) This pull request adds recovery state tracking for Searchable Snapshots. In order to track recoveries for searchable snapshot backed indices, this pull request adds a new type of RecoveryState. This new RecoveryState instance is able to deal with the small differences that arise during Searchable snapshots recoveries. Those differences can be summarized as follows: - The Directory implementation that's provided by SearchableSnapshots marks the snapshot files as reused during recovery. In order to keep track of the recovery process as the cache is pre-warmed, those files shouldn't be marked as reused. - Once the shard is created, the cache starts its pre-warming phase, meaning that we should keep track of those downloads during that process and tie the recovery to this pre-warming phase. The shard is considered recovered once this pre-warming phase has finished. Backport of #60505	2020-08-05 17:41:49 +02:00
Jake Landis	f3752ba1d5	7.x support new path for re-index java-api doc (#60319 ) This commit uses the new location for the reindex java-api documentation. Temporary files have been left behind to pacify the docs build. related #60339	2020-08-05 09:05:07 -05:00
Armin Braun	ebfb93ff26	Improve some BytesStreamOutput Usage (#60730 ) (#60736 ) * Stop redundantly creating a `0` length `ByteArray` that is never used * Add efficient way to get a minimal size copy of the bytes in a `BytesStreamOutput` * Avoid multiple redundant `byte[]` copies in search cache key creation	2020-08-05 15:51:06 +02:00
Yannick Welsch	9f6f66f156	Fail searchable snapshot shards on invalid license (#60722 ) Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After a valid license is put in place again, shards are allocated again.	2020-08-05 13:14:15 +02:00
Igor Motov	959690a64a	Refactor extendedBounds to use DoubleBounds (#60556 ) (#60681 ) Refactors extendedBounds to use DoubleBounds instead of 2 variables. This is a follow up for #59175	2020-08-04 16:45:47 -04:00
Alan Woodward	b3ae5d26bd	Move mapper validation to the mappers themselves (#60072 ) (#60649 ) Currently, validation of mappers (checking that cross-references are correct, limits on field name lengths and object depths, multiple definitions, etc) is performed by the MapperService. This means that any mapper-specific validation, for example that done on the CompletionFieldMapper, needs to be called specifically from core server code, and so we can't add validation to mappers that live in plugins. This commit reworks the validation framework so that mapper-specific validation is done on the Mapper itself. Mapper gets a new `validate(MappingLookup)` method (already present on `MetadataFieldMapper` and now pulled up to the parent interface), which is called from a new `DocumentMapper.validate()` method. All the validation code currently living on `MapperService` moves either to individual mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into `MappingLookup`, an altered `DocumentFieldMappers` which now knows about object fields and can check for duplicate definitions, or into DocumentMapper which handles soft limit checks.	2020-08-04 14:39:20 +01:00
Armin Braun	212ce22d15	Optimize CS Persistence Stream Use (#60643 ) (#60647 ) In the metadata persistence logic we failed to override the bulk write method on the FilterOutputStream resulting in all the writes to it running byte-by-byte in a loop adding a large number of bounds checks needlessly.	2020-08-04 15:06:57 +02:00
Armin Braun	859ad761bb	Fix Broken Stream Close in writeRawValue (#60625 ) (#60644 ) Small oversight in #56078 that only showed up during backporting where a stream copy was turned from a non-closing to a closing one. Enhanced part of a test in this PR to make it show up in master also even though we practically never use this method with stream targets that actually close.	2020-08-04 13:39:52 +02:00
Armin Braun	7ae9dc2092	Unify Stream Copy Buffer Usage (#56078 ) (#60608 ) We have various ways of copying between two streams and handling thread-local buffers throughout the codebase. This commit unifies a number of them and removes buffer allocations in many spots.	2020-08-04 09:54:52 +02:00
Julie Tibshirani	f99584c6f3	Avoid reloading _source for every inner hit. (#60632 ) Previously if an inner_hits block required _source, we would reload and parse the root document's source for every hit. This PR adds a shared SourceLookup to the inner hits context that allows inner hits to reuse parsed source if it's already available. This matches our approach for sharing the root document ID. Relates to #32818.	2020-08-03 17:12:27 -07:00
Julie Tibshirani	fc63f8224f	Simplify class hierarchy for ordinals field data. (#60606 ) This PR simplifies the hierarchy for ordinals field data classes: * Remove `AbstractIndexFieldData`, since only `AbstractIndexOrdinalsFieldData` inherits directly from it. * Make `SortedSetOrdinalsIndexFieldData` extend `AbstractIndexOrdinalsFieldData`. This lets us remove some redundant code.	2020-08-03 09:58:29 -07:00
Yannick Welsch	3409e019d2	Ignore shutdown when retrying recoveries (#60586 ) Avoids failures when shutting down a node.	2020-08-03 15:14:38 +02:00
Nik Everett	2cde43b799	Allows nanosecond resolution in search_after (backport of #60328 ) (#60426 ) Allows nanosecond resolution in search_after (#60328) This fixes `search_after` to properly parse string formatted dates that have nanosecond resolution. Closes #52424	2020-08-03 08:17:48 -04:00
David Turner	d2ddf8cd6a	Improve deserialization failure logging (#60577 ) Today when a node fails to properly deserialize a transport message with a parent task we log the following relatively uninformative message: java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting In particular, the wrapping of the listener in the `TransportService` obscures all clues as to the source of the problem, e.g. the action name or the identity of the underlying listener. This commit exposes the inner listener to the logs. Also if the listener is wrapped with `ContextPreservingActionListener` then its identity is similarly hidden. This commit also exposes the wrapped listener in this case. Relates #38939	2020-08-03 11:51:01 +01:00
Armin Braun	3270cb3088	More Efficient Writes for Snapshot Shard Generations (#60458 ) (#60575 ) Same as #59905 but for shard level metadata. Since we want to retain the ability to do safe+atomic writes for non-uuid shard generations this PR has to create two separate write paths for both kinds of shard generations.	2020-08-03 11:11:36 +02:00
Armin Braun	204efe9387	Add Repository Setting to Disable Writing index.latest (#60448 ) (#60576 ) Writing the `index.latest` blob is unnecessary unless the contents of the repository are to be used as a URL-repository. Also, in some edge cases, the fact that `index.latest` is the only blob in the repository that regularly gets overwritten was causing compatibility issues with some backing blobstores (Azure no-overwrite policy, Hitachi S3 equivalent). => this commit changes behavior to make snapshots not fail if writing `index.latest` fails and adds a setting to disable writing `index.latest`.	2020-08-03 11:11:24 +02:00
Andrei Dan	ac258f10d6	Data streams: throw ResourceAlreadyExists exception (#60518 ) (#60536 ) For consistency reasons (and reducing the overload of IllegalArgumentException) this changes the exception thrown when trying to create a data stream that already exists. (cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-01 16:31:09 +01:00
Julie Tibshirani	f1d4fd8c3e	Correct name of IndexFieldData#loadGlobalDirect. (#60492 ) It seems 'localGlobalDirect' was just a typo.	2020-07-31 10:53:21 -07:00
Jim Ferenczi	8db896d290	Fix race condition in SearchPhaseControllerTests#testPartialMergeFailure (#60488 ) This change ensures that we call the listener for partial merge failure before calling the completion listener in order to avoid race condition in tests. Closes #60446	2020-07-31 16:29:20 +02:00
Rene Groeschke	ed4b70190b	Replace immediate task creations by using task avoidance api (#60071 ) (#60504 ) - Replace immediate task creations by using task avoidance api - One step closer to #56610 - Still many tasks are created during configuration phase. Tackled in separate steps	2020-07-31 13:09:04 +02:00
Julie Tibshirani	8ac81a3447	Remove IndexFieldData#clear since it is unused. (#60475 ) This method was never called. It also seemed tricky that calling a method on `IndexFieldData` could clear the contents of a shared cache.	2020-07-30 14:07:55 -07:00
Julie Tibshirani	dfd7f226f0	Clarify SourceLookup sharing across fetch subphases. (#60484 ) The `SourceLookup` class provides access to the _source for a particular document, specified through `SourceLookup#setSegmentAndDocument`. Previously the search context contained a single `SourceLookup` that was shared between different fetch subphases. It was hard to reason about its state: is `SourceLookup` set to the expected document? Is the _source already loaded and available? Instead of using a global source lookup, the fetch hit context now provides access to a lookup that is set to load from the hit document. This refactor closes #31000, since the same `SourceLookup` is no longer shared between the 'fetch _source phase' and script execution.	2020-07-30 13:22:31 -07:00
Dan Hermann	5e5503ac28	Change severity of negative stats messages from WARN to DEBUG (#60375 ) (#60444 )	2020-07-30 06:06:13 -05:00
Armin Braun	3bf4c01d8e	Don't Allocate Redundant Pages in BigArrays (#60201 ) (#60441 ) The oversize algorithm was allocating more pages than necessary to accommodate `minTargetSize`. An example would be that a 16k page size and 15k `minTargetSize` would result in a new size of 32k (2 pages). The difference between the minimum number of necessary pages and the estimated size then keeps growing as sizes increase. I don't think there is much value in preemptively allocating pages by over-sizing aggressively since the behavior of the system is quite different from that of a single array where over-sizing avoids copying once the minimum target size is more than a single page. Relates #60173 which led me to this when `BytesStreamOutput` would allocate a large number of never used pages during serialization of repository metadata.	2020-07-30 11:09:58 +02:00
Armin Braun	a2c49a4f02	Reduce Heap Use during Shard Snapshot (#60370 ) (#60440 ) Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger number of shards the previous execution order would lead to memory use proportional to the number of shards for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot pool (except for in the BwC format path). This PR makes it so that they are written to the repository at the earliest possible point in time so that they can be garbage collected. If shard generations are used, we can safely write these right at the beginning of the shard snapshot. If shard generations are not used we can only write them at the end of the shard snapshot after all other blobs have been written. Closes #60173	2020-07-30 10:45:00 +02:00
Igor Motov	00a1949852	Streamline GeoJSON to map serialization (#60413 ) (#60429 ) Optimizes GeoJSON to map serialization when retrieving spatial data through fields. Closes #60259	2020-07-29 17:56:56 -04:00
Julie Tibshirani	5359417ec3	Minor clean-up around search highlight context. (#60422 ) * Rename SearchContextHighlight -> SearchHighlightContext. * Rename HighlighterContext to FieldHighlightContext. * Make the search highlight context immutable. * Avoid storing SearchHighlightContext on HighlighterContext.	2020-07-29 11:39:17 -07:00
Tim Brooks	85fdf959ad	Add configured indexing memory limit to node stats (#60414 ) This commit adds the configured memory limit to the node stats API.	2020-07-29 12:28:21 -06:00
Nhat Nguyen	9d4a64e749	Allow CCR on nodes with legacy roles only (#60093 ) CCR will stop functioning if the master node is on 7.8, but data nodes are before that version because the master node considers that all data nodes do not have the remote cluster client role. This commit allows CCR to work on data nodes with legacy roles only. Relates #54146 Relates #59375	2020-07-29 10:57:31 -04:00
Armin Braun	8429b4ace8	Fix Queued Snapshot Deletes After Finalization Failure (#60285 ) (#60379 ) This fixes the behavior of the snapshot state machine in the following edge case: 1. Snapshot is running 2. Delete/abort for the snapshot is started 3. Snapshot fails to finalize We were not removing the failed snapshot id from the list of snapshots to delete in the delete. This led to an error in the repository, which throws if we try to delete a non-existing snapshot. This commit updates the deletions in progress by removing the failed snapshot id. The fact that this could lead to snapshot delete entries without any snapshot ids is not optimized on purpose because it allows for another attempt at writing clean `RepositoryData` and will run basic cleanup on the repository (root level blobs and stale indices) and thus bring the repository back into a clean state after a failed finalization. Closes #60274	2020-07-29 15:54:18 +02:00
Armin Braun	381cec2ba9	Fix ConcurrentSnapshotsIT.testMasterFailOverWithQueuedDeletes (#60307 ) (#60376 ) The test assumed that the master fail-over would always work out as a single step. This is not guaranteed however and we can randomly see master failing over twice, in which case the transport listener will be failed on the node that stops being leader and we have to catch an exception for the deletes as well just like we do for the snapshot. Closes #60262	2020-07-29 15:54:00 +02:00
Armin Braun	0778274b72	Fix IPV6 Scope Id in InetAddressesTests (#60368 ) (#60369 ) Follow up to #60360, turns out at times the name of an interface that isn't loopback is not a valid scope id.	2020-07-29 13:16:12 +02:00
Armin Braun	1f6a3765e4	Fix NPE in SnapshotsInProgress Constructor (#60355 ) Merge oversight between cleanups that removed `null` for `shards` and this corner case spot of no indices in a snapshot. Closes #60330	2020-07-29 10:47:28 +02:00
Armin Braun	4307a45153	Fix IPV6 Scope ID Test (#60360 ) (#60363 ) Use real scope id from first available interface instead of `lo` which might not exist on non-Linux platforms. Closes #60332	2020-07-29 09:55:37 +02:00
Armin Braun	753fd4f6bc	Cleanup and optimize More Serialization Spots (#59959 ) (#60331 ) Same as #59626 for a few more spots.	2020-07-29 07:20:44 +02:00
Zachary Tong	e3d85feecd	Mute testForStringIPv6WithScopeIdInput test Tracking issue: https://github.com/elastic/elasticsearch/issues/60332	2020-07-28 15:05:19 -04:00
Igor Motov	0dd53b76bd	Add aggregation list to node info (#60074 ) (#60256 ) Adds a full list of supported aggregations to the node info API. This list will be used in transform tests and telemetry mapping tests that will be added as follow-up PRs. Fixes #59774	2020-07-28 14:06:12 -04:00
Julie Tibshirani	c7bfb5de41	Add search `fields` parameter to support high-level field retrieval. (#60258 ) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.	2020-07-28 10:58:20 -07:00
James Rodewig	025e7bee80	[DOCS] Fix allowed values for numeric sort types (#60176 ) (#60299 ) Co-authored-by: Philippus Baalman <philippus@gmail.com>	2020-07-28 13:51:59 -04:00
Howard	11b86b3f88	Remove unused clusterService instance in ActionModule. (#59826 )	2020-07-28 10:36:04 -07:00
jimczi	4e4ed6ee48	fix race condition in SearchPhaseControllerTests#consumerTestCase	2020-07-28 18:27:39 +02:00
David Turner	9450ea08b4	Log and track open/close of transport connections (#60297 ) Transport connections between nodes remain in place until one or other node shuts down or the connection is disrupted by a flaky network. Today it is very difficult to demonstrate that transient failures and cluster instability are caused by the network even though this is often the case. In particular, transport connections open and close without logging anything, even at `DEBUG` level, making it very hard to quantify the scale of the problem or to correlate the networking problems with external events. This commit adds the missing `DEBUG`-level logging when transport connections open and close, and also tracks the total number of transport connections a node has opened as a measure of the stability of the underlying network.	2020-07-28 17:08:04 +01:00
Armin Braun	9222070f22	Fix Test Failure in testCorrectCountsForDoneShards (#60254 ) (#60286 ) * Fix Test Failure in testCorrectCountsForDoneShards Fixing the freak edge case where the node shard status request returns before the node was able to send the state update request to master and update the cluster state. Without this change, the snapshot shard status would report as `DONE` once the data node has finished updating the shard in the cluster state. If the data node then drops out of the cluster before the state has been updated, then the status will jump to "FAILURE" because the master updates the state once the data node leaves the cluster. Closes #60247	2020-07-28 15:46:18 +02:00
David Turner	b78caa5c00	Add more useful toString on cluster state observers (#60277 ) Today if a cluster state observer's listener takes a long time to process a notification then we log the following rather useless warning message: [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@12345678]] took [34567ms] This commit adds a handful of simple `toString()` implementations in order to identify the owner of the listener in question.	2020-07-28 12:56:58 +01:00
Jim Ferenczi	1144534093	Executes incremental reduce in the search thread pool (#58461 ) (#60275 ) This change forks the execution of partial reduces in the coordinating node to the search thread pool. It also ensures that partial reduces are executed sequentially and asynchronously in order to limit the memory and cpu that a single search request can use but also to avoid blocking a network thread. If a partial reduce fails with an exception, the search request is cancelled and the reporting of the error is delayed to the start of the fetch phase (when the final reduce is performed). This ensures that we cleanup the in-flight search requests before returning an error to the user. Closes #53411 Relates #51857	2020-07-28 13:40:47 +02:00
Armin Braun	d39622e17e	Stop Serializing RepositoryData Twice when Writing (#60107 ) (#60269 ) We can save one round of serializing `RepositoryData` on the write path. This also leads to somewhat better compression because we compress larger chunks in one go potentially when compared to serializing and compressing in one go. Also, fixed the double wrapping of collections when copying the repository data instance via the `withGenId`.	2020-07-28 11:42:14 +02:00
Yannick Welsch	a55c869aab	Properly document keepalive and other tcp options (#60216 ) Keepalive options are not well-documented (only in transport section, although also available at http and network level). Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2020-07-28 11:10:04 +02:00
Yannick Welsch	ffe114b890	Set specific keepalive options by default on supported platforms (#59278 ) keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of distributed systems such as Elasticsearch. This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5 minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings unless they are deemed to be set too high by providing better out-of-the-box defaults.	2020-07-28 11:10:04 +02:00
Armin Braun	fac5953d13	Let `isInetAddress` utility understand the scope ID on ipv6 (#60172 ) (#60263 ) Make `isInetAddress` utility method understand the scope ID on ipv6. Fixes #60115 Co-authored-by: Yang Cheng <chengyang2048@163.com>	2020-07-28 09:37:39 +02:00
James Rodewig	cb4c21fa7b	[DOCS] Fix typo in adapt auto expand replica comments (#60187 ) (#60239 ) Co-authored-by: Howard <danielhuang@tencent.com>	2020-07-27 14:18:53 -04:00
weizijun	5df043d0e0	Fix wait_for_no_initializing_shards params (#58379 )	2020-07-27 14:03:26 -04:00
Adrien Grand	f1f275c91b	Add 6.8.12 and 7.8.2 version constants.	2020-07-27 19:26:22 +02:00
Tim Brooks	df0f68da23	Identify the operation type in rejected exception (#60138 ) Currently, we do not categorize the operation type in the rejection exception message when we reject an indexing operation for indexing memory limits. This commit fixes this to ensure that it is identified as coordinating, primary, or replica.	2020-07-27 10:09:46 -06:00
Tim Brooks	47922c9e4a	Fix indexing pressure replica rejections logic (#60150 ) Currently the logic to reject replica operations is evaluated before adding the additional bytes of the current operation. This means that the first replica operation which should be rejected will be allowed to proceed. This commit fixes this logic and adds a unit-level test to ensure indexing pressure behavior is correct.	2020-07-27 10:00:01 -06:00
Nik Everett	a451dd87aa	Reduce merge map memory overhead in the Variable Width Histogram Aggregation (#59366 ) (#60171 ) When a document which is distant from existing buckets gets collected, the `variable_width_histogram` will create a new bucket and then insert it into the ordered list of buckets. Currently, a new merge map array is created to move this bucket. This is very expensive as there might be thousands of buckets. This PR creates `mergeBuckets(UnaryOperator<Long> mergeMap)` methods in `BucketsAggregator` and `MergingBucketsDefferingCollector`, and updates the `variable_width_histogram` to use them. This eliminates the need to create an entire merge map array for each new bucket and reduces the memory overhead of the algorithm. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-07-27 09:23:06 -04:00
Armin Braun	196ed6b90e	Remove Mostly Redundant Deleting in FsBlobContainer (#60117 ) (#60195 ) In almost all cases we write uuid named files via this method. Preemptively deleting just wastes IO ops, we can delete after a write failed and retry the write to cover the few cases where we actually do an overwrite.	2020-07-27 14:05:41 +02:00
David Roberts	89466eefa5	Don't require separate privilege for internal detail of put pipeline (#60190 ) Putting an ingest pipeline used to require that the user calling it had permission to get nodes info as well as permission to manage ingest. This was due to an internal implementation detail that was not visible to the end user. This change alters the behaviour so that a user with the manage_pipeline cluster privilege can put an ingest pipeline regardless of whether they have the separate privilege to get nodes info. The internal implementation detail now runs as the internal _xpack user when security is enabled. Backport of #60106	2020-07-27 10:44:48 +01:00
Armin Braun	25a75d05c0	Fix Test Failure in testConcurrentlyChangeRepositoryContentsInBwCMode (#60095 ) There is a very unlikely but possible test failure in this test. The `SnapshotsService` continues iterating over queued operations after resolving the transport listener. This can lead to a situation where the moved repository data is not picked up when running the delete (even though we have the concurrent modifications BwC mode activated) concurrently. I fixed this in the test so that the test still verifies that this setting works. Technically speaking, one could add logic to the way we queue and execute repo operations to address this special case. Since this case only comes about with the concurrent modifications setting enabled (and the setting is gone in master already) I don't really see a reason to improve the logic here since we should always fail queued up repo operations on concurrent modification for safety reasons.	2020-07-27 09:33:38 +02:00
Nhat Nguyen	0031dea9cc	Fix race in testSendSnapshotSendsOps (#59831 ) There is a race between increasing and getting the global checkpoint in the test, as indexTranslogOperations can be executed concurrently. Closes #59492	2020-07-23 16:22:40 -04:00
Ignacio Vera	db183c89ed	Refactor HyperLogLogPlusPlus to separate algorithms and internal data representation (#60104 ) (#60109 )	2020-07-23 15:07:05 +02:00
David Turner	bf7e53a91e	Remove node-level canAllocate override (#59389 ) Today there is a node-level `canAllocate` override which the balancer uses to ignore certain nodes to which it is certain no more shards can be allocated. In fact this override only ignores nodes which have hit the rarely-used `cluster.routing.allocation.total_shards_per_node` limit, so this optimization doesn't have a meaningful impact on real clusters. This commit removes this unnecessary fast path from the balancer, and also removes all the machinery needed to support it.	2020-07-23 08:48:59 +01:00
Armin Braun	43a6ff5eb1	Optimize some Spots around Closing Resources (#60049 ) (#60096 ) The single element `close` calls go through a very inefficient path that includes creating a one element list. `releaseOnce` is only with a single non-null input in production in two spots so no need for varargs and any complexity here. `ReleasableBytesStreamOutput` does not require any `releaseOnce` wrapping because we already have that kind of logic implemented in `org.elasticsearch.common.util.AbstractArray` (which we were wrapping here) already.	2020-07-23 08:49:06 +02:00
Julie Tibshirani	aa57bbd422	Consolidate validation for 'docvalue_fields'. (#60065 ) This improves modularity and also fixes some issues when `docvalue_fields` is used within `inner_hits` or the `top_hits` agg: * We previously didn't resolve wildcards in field names. * We also forgot to enforce the limit `index.max_docvalue_fields_search`.	2020-07-22 17:26:58 -07:00
Armin Braun	ebb6677815	Formalize and Streamline Buffer Sizes used by Repositories (#59771 ) (#60051 ) Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient. By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well. We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`. This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs. This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.	2020-07-22 21:06:31 +02:00
Tim Brooks	ba01540d7e	Implement human readable indexing pressure stats (#60058 ) The indexing pressure stats do not currently have human readable variants. This commit adds human readable variants and updates the documentation.	2020-07-22 12:07:59 -06:00
Jay Modi	c8ef2e18f7	Thread safe clean up of LocalNodeModeListeners (#60007 ) This commit continues on the work in #59801 and makes other implementors of the LocalNodeMasterListener interface thread safe in that they will no longer allow the callbacks to run on different threads and possibly race each other. This also helps address other issues where these events could be queued to wait for execution while the service keeps moving forward thinking it is the master even when that is not the case. In order to accomplish this, the LocalNodeMasterListener no longer has the executorName() method to prevent future uses that could encounter this surprising behavior. Each use was inspected and if the class was also a ClusterStateListener, the implementation of LocalNodeMasterListener was removed in favor of a single listener that combined the logic. A single listener is used and there is currently no guarantee on execution order between ClusterStateListeners and LocalNodeMasterListeners, so a future change there could cause undesired consequences. For other classes, the implementations of the callbacks were inspected and if the operations were lightweight, the overridden executorName method was removed to use the default, which runs on the same thread. Backport of #59932	2020-07-22 08:02:18 -06:00
Luca Cavanna	702c997819	ParametrizedFieldMapper to run validators against default value (#60042 ) Sometimes there is the need to make a field required in the mappings, and validate that a value has been provided for it. This can be done through a validator when using ParametrizedFieldMapper, but validators need to run also when a value for a field has not been specified. Relates to #59332	2020-07-22 14:12:38 +02:00
Armin Braun	c06c9fb966	Fix BwC Snapshot INIT Path (#60006 ) There were two subtle bugs here from backporting #56911 to 7.x. 1. We passed `null` for the `shards` map which isn't nullable any longer when creating `SnapshotsInProgress.Entry`, fixed by just passing an empty map like the `null` handling did in the past. 2. The removal of a failed `INIT` state snapshot from the cluster state tried removing it from the finalization loop (the set of repository names that are currently finalizing). This will trip an assertion since the snapshot failed before its repository was put into the set. I made the logic ignore the set in case we remove a failed `INIT` state snapshot to restore the old logic to exactly as it was before the concurrent snapshots backport to be on the safe side here. Also, added tests that explicitly call the old code paths because as can be seen from initially missing this, the BwC tests will only run in the configuration new version master, old version nodes every so often and having a deterministic test for the old state machine seems the safest bet here. Closes #59986	2020-07-22 10:09:55 +02:00
Jake Landis	55216dabb4	[7.x] Per processor description for verbose simulate (#58207 ) (#60008 ) For ingest node processors a per processor description was recently added. This commit displays that description in the verbose output of the pipeline simulation. related #57906	2020-07-21 17:32:45 -05:00
Nik Everett	49f365ddfd	Fix bug in deep pipeline agg serialization (#59984 ) In #54716 I removed pipeline aggregators from the aggregation result tree and caused us to read them from the request. This saves a bunch of round trip bytes, which is neat. But there was a bug in the backwards compatibility logic. You see, we still have to give the pipeline aggregations to nodes older than 7.8 over the wire because that is how they know what pipelines to run. They have the pipelines in the request but they don't read them. They use the ones in the response tree. Anyway, we had a bug where we were never sending pipelines defined two levels down. So while you are upgrading the pipeline wouldn't run. Sometimes. If the data node of the "first" result was post-7.8 and the coordinating node was pre-7.8. This fixes the bug.	2020-07-21 16:03:15 -04:00
David Turner	dde568caf7	Fix scheduling of ClusterInfoService#refresh (#59880 ) Today the `InternalClusterInfoService` uses the `LocalNodeMasterListener` interface to start/stop its operations. Since the `onMaster` and `offMaster` methods are called on the `MANAGEMENT` threadpool, there's no guarantee that they run in the correct sequence, which could result in an elected master failing to regularly update the cluster info. Since this service is also a `ClusterStateListener` we may as well drop the usage of the `LocalNodeMasterListener` interface and simply update the status of the local node on the applier thread in `clusterChanged` to ensure consistency. Additionally, today the `InternalClusterInfoService` uses a simple flag to track whether the local node is the elected master or not. If the node stops being the master and then starts again within a few seconds then the scheduled updates from the old mastership might carry on running in addition to the ones for the new mastership. This commit addresses that by tracking the identity of the scheduled update job and creating a new job for each mastership.	2020-07-21 17:14:49 +01:00
Alan Woodward	a0ad1a196b	Wrap up building parametrized TypeParsers (#59977 ) The TypeParser implementations of all ParametrizedFieldMapper descendant classes are essentially the same - stateless, requiring the construction of a Builder object, and calling parse on it before returning it. We can make this easier (and less error-prone) to implement by wrapping the logic up into a final class, which takes a function to produce the Builder from a name and parser context.	2020-07-21 16:00:11 +01:00
Nik Everett	6f6076e208	Drop some params from IndexFieldData.Builder (backport of #59934 ) (#59972 ) We never used the `IndexSettings` parameter and we only used the `MappedFieldType` parameter to get the name of the field which we already know everywhere where we build the `IFD.Builder`. This allows us to drop a fair bit of ceremony from a couple of tests.	2020-07-21 10:28:59 -04:00
Luca Cavanna	5e17f00ecf	Tweak toXContent implementation of ParametrizedFieldMapper (#59968 ) ParametrizedFieldMapper overrides `toXContent` from `FieldMapper`, yet it could override `doXContentBody` and rely on the `toXContent` from the base class. Additionally, this allows to make `doXContentBody` final. Also, toXContent is still overridden only to make it final.	2020-07-21 16:01:51 +02:00
Przemyslaw Gomulka	19fe3e511f	Deprecate camel case date format backport(#59555 ) (#59948 ) Camel case date formats are deprecated and snake case should be used instead. backports #59555	2020-07-21 15:56:44 +02:00
Armin Braun	e37bfe8a5f	Stop Checking if Segment Data Blob Exists before Write (#59905 ) (#59971 ) With uuid named segment data blobs there is no reason to ensure no overwrites are happening for these blobs when writing. On the contrary, at least on Azure this check can conflict with the SDK's retrying and cause upload failures randomly.	2020-07-21 15:23:42 +02:00
Yannick Welsch	07784a0b16	CCR recoveries using wrong setting for chunk sizes (#59597 ) The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.	2020-07-21 13:56:06 +02:00
Armin Braun	cefaa17c52	Simplify CheckSumBlobStoreFormat and make it more Reusable (#59888 ) (#59950 ) Refactored `CheckSumBlobStoreFormat` so it can more easily be reused in other functionality (i.e. upcoming repair logic). Simplified away constant `failIfAlreadyExists` parameter and removed the atomic write method and its tests. The atomic write method was only used in a single spot and that spot has now been adjusted to work the same way writing root level metadata works.	2020-07-21 11:20:56 +02:00
Armin Braun	5b92596fad	Cleanup and Optimize Multiple Serialization Spots (#59626 ) (#59936 ) Follow up to #59606 using some of the new infrastructure and making similar cleanups (and due to at times better handling of size hints and empty collections also optimizations in the stream utility methods this also means speedups) in various spots in the core codebase.	2020-07-21 10:06:56 +02:00
Julie Tibshirani	8647872a1e	Simplify structure for parsing points. (#59938 ) Previously we constructed a GeometryFormat object and delegated point parsing to it. This wasn't a good fit conceptually because each GeometryFormat instance didn't represent a distinct point format.	2020-07-20 17:11:43 -07:00
Nik Everett	b2ca19484a	Allocate slightly less per bucket (#59740 ) (#59873 ) This replaces the data structure that we use to resolve bucket ids in bucketing aggs that are inside other bucketing aggs. This replaces the "legoed together" data structure with a purpose built `LongLongHash` with semantics similar to `LongHash`, except that it has two `long`s as keys instead of one. The microbenchmarks show a fairly substantial performance gain on the hot path, around 30%. Rally's higher level benchmarks show anywhere from 0 to 7% speed improvements. Not as much as I'd hoped, but nothing to sneeze at. And, after all, we are allocating slightly less data per owningBucketOrd, which is always nice.	2020-07-20 10:43:11 -04:00
Stéphane Campinas	bcebdfe5b1	fix handling of alias filter in SearchService#canMatch (#59368 ) The check against the alias filter should be done after the request is rewritten. Close #59367	2020-07-20 16:25:15 +02:00
David Turner	b75207a09f	Remove sporadic min/max usage estimates from stats (#59755 ) Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate` fields for some nodes. These fields have rather strange semantics. They are only reported on the elected master and on nodes that have been the elected master since they were last restarted; when a node stops being the elected master these stats remain in place but we stop updating them so they may become arbitrarily stale. This means that these statistics are pretty meaningless and impossible to use correctly. Even if they were kept up to date they're never reported for data-only nodes anyway, despite the fact that data nodes are the ones where we care most about disk usage. The information needed to compute the path with the least/most available space is already provided in the rest the stats output, so we can treat the inclusion of these stats as a bug and fix it by simply removing them in this commit. Since these stats were always optional and mostly omitted (for opaque reasons) this is not considered a breaking change.	2020-07-20 15:22:04 +01:00
Lee Hinman	8c7d414a3b	[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806 ) (#59810 ) Backports the following commits to 7.x: Fix retrieving data stream stats for a DS with multiple backing indices (#59806)	2020-07-17 16:56:07 -06:00
Nik Everett	514b2f3414	Clean up a few of vwh's rough edges (#59341 ) (#59807 ) This cleans up a few rough edges in the `variable_width_histogram`, mostly found by @wwang500: 1. Setting its tuning parameters in an unexpected order could cause the request to fail. 2. We checked that the maximum number of buckets was both less than 50000 and MAX_BUCKETS. This drops the 50000. 3. Fixes a divide by 0 that can occur if the `shard_size` is 1. 4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows a signed int. 5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less than `buckets` we will very consistently return fewer buckets than requested. For the most part we expect folks to leave it at the default. If they change it, we expect it to be much bigger than `buckets`. 6. Allocate a smaller `mergeMap` when initially bucketing requests that don't use the entire `shard_size * 3 / 4`. It's just a waste. 7. Default `shard_size` to `10 * buckets` rather than `100`. It looks like that was our intention the whole time. And it feels like it'd keep the algorithm humming along more smoothly. 8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like we've documented it rather than `5000`. Like the point above, this feels like the right thing to do to keep the algorithm happy. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-17 15:16:09 -04:00
Lee Hinman	f6b08a3115	[7.x] Allow simulating existing composable index template (#59733 ) (#59798 ) Backports the following commits to 7.x: Allow simulating existing composable index template (#59733)	2020-07-17 13:10:07 -06:00
Nik Everett	95e6e4a452	Small cleanup for IndexFieldData (#59724 ) (#59800 ) This drops `IndexComponent` from `IndexFieldData` because it wasn't doing anything other than forcing us to perform a bunch of ceremony to build them.	2020-07-17 13:38:15 -04:00
Tal Levy	c9ab7bb651	Fix bug in circuit-breaker check for geoshape grid aggregations (#57962 ) (#59741 ) There was a bug in the geoshape circuit-breaker check where the hash values array was being allocated before its new size was accounted for by the circuit breaker. Fixes #57847.	2020-07-17 09:26:00 -07:00
Christoph Büscher	f4ff5fe93b	Add `zero_terms_query` support to `match_phrase_prefix` (#58822 ) (#59784 ) Currently `match_phrase_prefix` doesn't support `zero_terms_query` like the other match-type queries. This change adds this support. Closes #58468	2020-07-17 17:23:23 +02:00
Benjamin Trent	b7f30fc929	[7.x] Adding new `require_alias` option to indexing requests (#58917 ) (#59769 ) * Adding new `require_alias` option to indexing requests (#58917) This commit adds the `require_alias` flag to requests that create new documents. This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias. When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings. This is useful when an alias is required instead of a concrete index. closes https://github.com/elastic/elasticsearch/issues/55267	2020-07-17 10:24:58 -04:00
Alan Woodward	65f6fb8e94	Shortcut mapping update if the incoming mapping version is the same as the current mapping version (#59517 ) (#59772 ) Currently, when we apply a cluster state change to a shard on a non-master node, we check to see if the mappings need to be updated by comparing the decompressed serialized mappings from the update against the serialized version of the shard's existing mappings. However, we already have a much simpler way of checking this, by comparing mapping versions on the index metadata of the old and new states. This commit adds a shortcut to MapperService.updateMappings() that compares these mapping versions, and ignores the merge if they are equal.	2020-07-17 14:53:09 +01:00
Alan Woodward	b29d368b52	Convert DateFieldMapper to parametrized format (#59429 ) (#59759 ) This commit makes DateFieldMapper extend ParametrizedFieldMapper, declaring its parameters explicitly. As well as changes to DateFieldMapper itself, there are some changes to dynamic mapping code to ensure that dynamically detected date formats are passed through to new date mapper builders.	2020-07-17 12:46:18 +01:00
Przemko Robakowski	790fbbcd87	[7.x] Fix handling of final pipelines when destination is changed (#59522 ) (#59746 ) * Fix handling of final pipelines when destination is changed (#59522) This change fixes final pipelines if destination index is changed during pipeline run: -final pipelines can't change destination anymore, exception is thrown if they try to -if request/default pipeline changes destination final pipeline from old index won't be executed -if request/default pipeline changes destination and new index has final pipeline it will be executed -default pipeline from new index won't be executed Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly. Closes #57968	2020-07-17 11:13:48 +02:00
Tim Brooks	b6e6a8c090	Fix replication operation transient retry test (#58205 ) After the work to retry transient replication failures, the local and global checkpoint test metadata can be incremented on a different thread than the test thread. This appears to introduce an extremely rare scenario where this data is not visible for later test assertions. This commit fixes the issue by using synchronized maps.	2020-07-16 16:01:47 -06:00
Martijn van Groningen	0096238df1	Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727 ) Backport #59503 to 7.x and adjusted exception messages. Relates to #59076	2020-07-16 22:29:40 +02:00
Igor Motov	2408803fad	Adds hard_bounds to histogram aggregations (#59175 ) (#59656 ) Adds a hard_bounds parameter to explicitly limit the buckets that a histogram can generate. This is especially useful in case of open ended ranges that can produce a very large number of buckets.	2020-07-16 15:31:53 -04:00
Alan Woodward	10be10c99b	Migrate CompletionFieldMapper to parametrized format (#59691 ) This adds a number of new optional parameters to Parameter, including: * custom serialization (to handle analyzers) * deprecated parameter names * parameter validation * allowing default values to be based on the values of other parameters We preserve the previous serialization format of CompletionFieldMapper, always emitting most fields, in order to meet mapping checks in mixed version clusters, where the mapper service will check that mappings have been correctly parsed and updated by checking their serialized outputs.	2020-07-16 19:15:00 +01:00
Howard	c0d429863c	remove unused cluster name in environment. (backport of #59605 ) (#59681 ) removes an unused variable	2020-07-16 09:25:55 -04:00
Nik Everett	343053c0a7	Fix compilation in Eclipse (backport #59675 ) Eclipse was confused by #59583. It can't see the public inner interface within the superclass. This time. Usually that is fine, but the Eclipse gods don't like this particular code, I guess.	2020-07-16 08:25:12 -04:00
Alan Woodward	27067de699	Make MappedFieldType#meta final (#59383 ) The MappedFieldType#updateMeta method was used for testing equality checks, but we no longer need these after #59212 , so we can remove this method and make meta final.	2020-07-16 09:45:55 +01:00
Przemysław Witek	df4fea79cb	Add a "verbose" option to the data frame analytics stats endpoint (#59589 ) (#59621 )	2020-07-16 09:51:31 +02:00
Armin Braun	6db481f49e	Fix ConcurrentSnapshotsIT.testEquivalentDeletesAreDeduplicated (#59611 ) (#59653 ) Trying to queue up snapshot deletes by blocking the delete of the latest index-N doesn't work here. The first delete will block on the delete operation but only do so after having already written the updated repository data. Since that repository data will contain no snapshots, the subsequent deletes for `*` will just fall through and complete instead of queue up. => Fixed by simply waiting on all files on master so that we block before updating the repository data and get to test the queueing of equivalent operations closes #59608	2020-07-16 09:28:36 +02:00
Nhat Nguyen	b599f7a9c0	Fix estimate size of translog operations (#59206 ) Make sure that the estimateSize method includes all fields of translog operations.	2020-07-16 00:19:30 -04:00
Julie Tibshirani	2b70758a05	Correct type parametrization in geo mappers. (#59583 ) Previously the concrete type parameters for the MappedFieldType didn't always match those for the FieldMapper. This PR updates the mappers so that the type parameters always match, which makes the design easier to follow.	2020-07-15 14:10:47 -07:00
Boice Huang	ef26c1739b	fix typo in Exception Response in GeoJson (#59270 )	2020-07-15 20:15:18 +01:00
Boice Huang	07a58d915d	Fix typo in AggregationProfiler (#59269 )	2020-07-15 20:14:19 +01:00
Armin Braun	cc7093645c	Cleanup some Serialization Code around Snapshots (#59532 ) (#59606 ) A number of obvious possible simplifications that also improve efficiency in some cases (better empty collection handling and size hint use). Also, added a shortcut for writing and reading immutable open maps that can be used to dry up additional spots.	2020-07-15 20:40:43 +02:00
David Turner	67e7c3f60e	Fix failing test introduced in #59601	2020-07-15 17:44:27 +01:00
Rory Hunter	b8d73a1e7e	Default gateway.auto_import_dangling_indices to false (#59302 ) Backport of #58898. Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import behaviour can default to off. Also add a note to the breaking changes for 7.9.0.	2020-07-15 17:10:42 +01:00
David Turner	691759fb1f	Validate snapshot UUID during restore (#59601 ) Today when mounting a searchable snapshot we obtain the snapshot/index UUIDs and then assume that these are the UUIDs used during the subsequent restore. If you concurrently delete the snapshot and replace it with one with the same name then this assumption is violated, with chaotic consequences. This commit introduces a check that ensures that the snapshot UUID does not change during the mount process. If the snapshot remains in place then the index UUID necessarily does not change either. Relates #50999	2020-07-15 16:23:20 +01:00
Martijn van Groningen	2a89e13e43	Move data stream transport and rest action to xpack (#59593 ) Backport of #59525 to 7.x branch. * Actions are moved to xpack core. * Transport and rest actions are moved to the data-streams module. * Removed data streams methods from Client interface. * Adjusted tests to use client.execute(...) instead of data stream specific methods. * only attempt to delete all data streams if xpack is installed in rest tests * Now that ds apis are in xpack and ESIntegTestCase no longer deletes all ds, do that in the MlNativeIntegTestCase class for ml tests.	2020-07-15 16:50:44 +02:00
Rory Hunter	2e05ce5f88	Bump version to 7.10.0	2020-07-15 11:56:45 +01:00
Ignacio Vera	f8037abf47	upgrade to lucene-8.6.0 release (#59596 ) (#59599 )	2020-07-15 12:40:57 +02:00
David Turner	0c2510dc68	Don't request cluster metadata in _cat/shards impl (#59548 ) Today `GET _cat/shards` requests the nodes, routing table, and metadata from the cluster state, but it does not use any information from the metadata portion of the response. Metadata includes things like mappings and templates that may be substantial in size. This commit drops the unnecessary metadata portion of this cluster state request.	2020-07-15 10:14:48 +01:00
Francisco Fernández Castaño	66ef1cdad7	Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124 ) Add a custom factory for recovery state into IndexStorePlugin that allows different implementors to provide their own RecoveryState implementation. Backport of #59038	2020-07-15 11:11:07 +02:00
Armin Braun	96f52a028f	Fix Snapshot not Starting in Partial Snapshot Corner Case (#59428 ) (#59584 ) We were not handling the case where during a partial snapshot all shards would enter a failed state right off the bat. Closes #59384	2020-07-15 07:59:22 +02:00
Armin Braun	2dd086445c	Enable Fully Concurrent Snapshot Operations (#56911 ) (#59578 ) Enables fully concurrent snapshot operations: * Snapshot create- and delete operations can be started in any order * Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long-running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one)) * Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations See updated JavaDoc and added test cases for more details and illustration on the functionality. Some notes: The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. 
I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up. This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots. Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first	2020-07-15 03:42:31 +02:00
Armin Braun	06d94cbb2a	Fix TODO about Spurious FAILED Snapshots (#58994 ) (#59576 ) There is no point in writing out snapshots that contain no data that can be restored whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot step that wrote data to the repository that would've otherwise become unreferenced, but in the current day state machine without the `INIT` step there is no point in doing so.	2020-07-15 00:54:30 +02:00
Armin Braun	e1014038e9	Simplify Repository.finalizeSnapshot Signature (#58834 ) (#59574 ) Many of the parameters we pass into this method were only used to build the `SnapshotInfo` instance to write. This change simplifies the signature. Also, it seems less error prone to build `SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository implementation will build the correct `SnapshotInfo`.	2020-07-15 00:14:28 +02:00
Armin Braun	16a47e0d08	Simplify SnapshotsInProgress Construction (#58893 ) (#59573 ) With parallel snapshots incoming (but also in isolation) it makes sense to clean up `SnapshotsInProgress` construction. We don't need to pre-compute the waiting shards for every entry. We rarely use this information (only on routing changes) and in the one spot we did we now simply spend the extra cycles for looping over all shards instead of just the waiting ones once per routing change tops instead of on every change to `SnapshotsInProgress` (moreover, we would burn the cycles for looping on all nodes even though only the current master cares about the information). In addition to that change I removed some dead code constructors and slightly optimized deserialization.	2020-07-15 00:00:53 +02:00
Martijn van Groningen	35ae3d19db	Remove data stream feature flag (#59572 ) so that it can be used in the next minor release (7.9.0). Backport of #59504 to 7.x branch. Closes #53100	2020-07-14 23:50:41 +02:00
Armin Braun	68a199f75f	Minor Cleanup Dead Code Snapshotting (#57716 ) (#59569 ) * Use consistent cluster state instead in state update * Remove dead loop in tests * Remove some dead exception ctors Just three trivial/random things I found.	2020-07-14 23:13:14 +02:00
James Baiera	5f7e7e9410	[7.x] Data Stream Stats API (#58707 ) (#59566 ) This API reports on statistics important for data streams, including the number of data streams, the number of backing indices for those streams, the disk usage for each data stream, and the maximum timestamp for each data stream	2020-07-14 16:57:46 -04:00
Mark Tozzi	ed2c29f102	If no perBucketSample has been allocated for the parent bucket return a doc count of 0 (#59360 ) (#59567 ) Co-authored-by: Fabio Corneti <info@corneti.com>	2020-07-14 16:56:29 -04:00
Armin Braun	d456f7870a	Deduplicate Index Metadata in BlobStore (#50278 ) (#59514 ) This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot. This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time. The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`). Relates to #45736 as it improves the efficiency of snapshotting unchanged indices Relates to #49800 as it has the potential of making the concurrent loading of index metadata for multiple snapshots of the same index much more efficient, speeding up future concurrent snapshot delete	2020-07-14 22:18:42 +02:00
Tim Brooks	408a07f96a	Separate coordinating and primary bytes in stats (#59487 ) Currently we combine coordinating and primary bytes into a single bucket for indexing pressure stats. This makes sense for rejection logic. However, for metrics it would be useful to separate them.	2020-07-14 12:37:06 -06:00
Tim Brooks	a46e5e0f04	Increase default write queue size (#59464 ) This commit increases the default write queue size to 10000. This is to allow a greater number of pending indexing requests. This work is safe as we have added additional memory limits. Relates to #59263.	2020-07-14 10:35:25 -06:00
Tim Brooks	1a24916fef	Enable replication retries on 7.9+ (#59546 ) Currently the work to support replication retries is present on 7.9. This commit enables these retries by setting the replication timeout to 60s.	2020-07-14 10:35:05 -06:00
Dan Hermann	e54b4a729f	[7.x] Adds write_index_only option to put mapping API (#59539 )	2020-07-14 10:34:08 -05:00
Luca Cavanna	af2f85be15	Consolidate script parsing from object (7.x) (#59509 ) The update by query action parses a script from an object (map or string). We will need to do the same for runtime fields as they are parsed as part of mappings (#59391). This commit moves the existing parsing of a script from an object from RestUpdateByQueryAction to the Script class. It also adds tests and adjusts some error messages that are incorrect. Also, options were not parsed before and they are now. And unsupported fields now trigger a deprecation warning.	2020-07-14 17:08:29 +02:00
Mark Tozzi	b357c1b77a	[7.x] Fix NPE when building exception messages for aggregations (#59156 ) (#59334 )	2020-07-14 09:37:44 -04:00
Andrei Dan	7dcdaeae49	Default to @timestamp in composable template datastream definition (#59317 ) (#59516 ) This makes the data_stream timestamp field specification optional when defining a composable template. When there isn't one specified it will default to `@timestamp`. (cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 12:36:54 +01:00
Andrei Dan	4180333bbc	[7.x] Composable templates: add a default mapping for @timestamp (#59244 ) (#59510 ) This adds a low precedence mapping for the `@timestamp` field with type `date`. This will aid with the bootstrapping of data streams as a timestamp mapping can be omitted when nanos precision is not needed. (cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 11:29:33 +01:00
Armin Braun	0e3d87ab54	Add Assertions on CS Application in Snapshot Logic (#58681 ) (#59511 ) Relates to #58680. Bugs like that should not only show up in logs but ideally also get caught in tests. We expect to never see exceptions in these two spots.	2020-07-14 12:16:42 +02:00
Armin Braun	81e96954d0	Improve Efficiency of SnapshotsService CS Apply (#56874 ) (#59508 ) This change removes the redundant submitting of two separate cluster state updates for the node configuration changes and routing changes that affect snapshots. Since we submitted the task to deal with node configuration changes every time on master fail-over we could also move the BwC cleanup loop that removes `INIT` state snapshots as well as snapshots that have all their shards completed into this cluster state update task. Aside from improving efficiency overall this change has the fortunate side effect of moving all snapshot finalization to the CS update thread. This is helpful for concurrent snapshots since it makes it very natural and straightforward to order snapshot finalizations by exploiting that they are all initiated on the same thread.	2020-07-14 11:49:09 +02:00
Tim Brooks	623df95a32	Adding indexing pressure stats to node stats API (#59467 ) We have recently added internal metrics to monitor the amount of indexing occurring on a node. These metrics introduce back pressure to indexing when memory utilization is too high. This commit exposes these stats through the node stats API.	2020-07-13 17:23:42 -06:00
Tim Brooks	68d56fa7db	Implement rejections in `WriteMemoryLimits` (#59451 ) This commit adds rejections when the indexing memory limits are exceeded for primary or coordinating operations. The amount of bytes allowed for indexing is controlled by a new setting `indexing_limits.memory.limit`.	2020-07-13 14:34:50 -06:00
Mark Tozzi	eb0b28dd1d	Move getPointReaderOrNull into AggregatorBase (#58769 ) (#59455 )	2020-07-13 16:31:33 -04:00
Armin Braun	64c5f70a2d	Remove Needless Context Switches on Loading RepositoryData (#56935 ) (#59452 ) We don't need to switch to the generic or snapshot pool for loading cached repository data (i.e. most of the time in normal operation). This makes `executeConsistentStateUpdate` less heavy if it has to retry and lowers the chance of having to retry in the first place. Also, this change allowed simplifying a few other spots in the codebase where we would fork off to another pool just to load repository data.	2020-07-13 21:38:29 +02:00
Armin Braun	bde92fc5fc	Remove Needless Context Switch From Snapshot Finalization (#56871 ) (#59443 ) No need to do any switch to the `SNAPSHOT` pool here, the blob store repo handles all its writes async on the `SNAPSHOT` pool so we're just needlessly context-switching to enqueue those tasks there. Also cleaned up the source only repository (the only override to `finalizeSnapshot`) to make it clear that no IO is happening there and we don't need to run it on the `SNAPSHOT` pool either.	2020-07-13 20:11:07 +02:00
Armin Braun	31be3a3645	More Efficient Snapshot State Handling (#56669 ) (#59430 ) Follow up to #56365. Instead of redundantly checking snapshots for completion over and over, just track the completed snapshots in the CS updates that complete them instead of looping over the same snapshot entries over and over. Also, in the batched snapshot shard status updates, only check for completion of a snapshot entry if it isn't already finalizing.	2020-07-13 18:58:04 +02:00
Christos Soulios	3868bcc7b8	[7.x] Histogram integration on Histogram field type (#59431 ) Backports #58930 to 7.x Implements histogram aggregation over histogram fields as requested in #53285.	2020-07-13 19:36:33 +03:00
Henning Andersen	adf6083dd0	Enhance real memory circuit breaker with G1 GC (#58674 ) (#59394 ) Using G1 GC, Elasticsearch can rarely trigger that heap usage goes above the real memory circuit breaker limit and stays there for an extended period. This situation will persist until the next young GC. The circuit breaking itself hinders that from occurring in a timely manner since it breaks all requests before real work is done. This commit gently nudges G1 to do a young GC and then double checks that heap usage is still above the real memory circuit breaker limit before throwing the circuit breaker exception. Related to #57202	2020-07-13 17:41:09 +02:00
Martijn van Groningen	b1b7bf3912	Make data streams a basic licensed feature. (#59392 ) Backport of #59293 to 7.x branch. * Create new data-stream xpack module. * Move TimestampFieldMapper to the new module, this results in storing a composable index template with data stream definition only to work with default distribution. This way data streams can only be used with default distribution, since a data stream can currently only be created if a matching composable index template exists with a data stream definition. * Renamed `_timestamp` meta field mapper to `_data_stream_timestamp` meta field mapper. * Add logic to put composable index template api to fail if `_data_stream_timestamp` meta field mapper isn't registered. So that a more understandable error is returned when attempting to store a template with data stream definition via the oss distribution. In a follow up the data stream transport and rest actions can be moved to the xpack data-stream module.	2020-07-13 17:26:46 +02:00
Alan Woodward	bd01fd107c	Revert "Migrate CompletionFieldMapper to parametrized format (#59291 )" This reverts commit `19ba6c39d2`.	2020-07-13 14:16:09 +01:00
Armin Braun	4e574a7136	Remove Dead Code from Closed Index Snapshot Logic (#56764 ) (#59398 ) The code path for closed indices is dead code here ever since #39644 because `shards(currentState, indexIds, ...)` does not set `MISSING` on a closed index's shard that is assigned any longer. Before that change it would always set `MISSING` for a closed index's shard even if it was assigned. => simplified the code accordingly.	2020-07-13 14:49:16 +02:00
David Turner	3fb9dccc22	Fix FSHealthServiceTests on Windows (#59387 ) In #52680 we introduced a new health check mechanism. This commit fixes up some related test failures on Windows caused by erroneously assuming that all paths begin with `/`. Closes #59380	2020-07-13 12:43:45 +01:00
Alan Woodward	19ba6c39d2	Migrate CompletionFieldMapper to parametrized format (#59291 ) This adds some optional extra configuration to Parameter: * custom serialization (to handle analyzers) * deprecated parameter names * parameter validation	2020-07-13 12:43:15 +01:00
Armin Braun	08b54feaaf	Remove Snapshot INIT Step (#55918 ) (#59374 ) With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine. This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations since all of their entries now contain a repository generation. With this change simple queuing up of snapshot operations can and will be added in a follow-up.	2020-07-13 13:41:09 +02:00
Alan Woodward	c810a4a12e	Continue to accept unused 'universal' params in <8.0 indexes (#59381 ) We have a number of parameters which are universally parsed by almost all mappers, whether or not they make sense. Migrating the binary and boolean mappers to the new style of declaring their parameters explicitly has meant that these universal parameters stopped being accepted, which would break existing mappings. This commit adds some extra logic to ParametrizedFieldMapper that checks for the existence of these universal parameters, and issues a warning on 7.x indexes if it finds them. Indexes created in 8.0 and beyond will throw an error. Fixes #59359	2020-07-13 11:15:56 +01:00
David Kyle	7dcd943e1d	Mute FsHealthServiceTests testFailsHealthOnIOException (#59382 ) For #59380	2020-07-13 09:48:07 +01:00
Armin Braun	483386136d	Move all Snapshot Master Node Steps to SnapshotsService (#56365 ) (#59373 ) This refactoring has three motivations: 1. Separate all master node steps during snapshot operations from all data node steps in code. 2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service). * This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons) 3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots	2020-07-12 22:19:07 +02:00
Dan Hermann	e01d73c737	[7.x] Data stream admin actions are now index-level actions	2020-07-10 14:36:18 -05:00
Stuart Tettemer	4c04fd1e05	Scripting: Unlimited compilation rate for ingest (#59268 ) * `ingest` and `processor_conditional` default to unlimited compilation rate Refs: #50152	2020-07-09 16:34:47 -05:00
Stuart Tettemer	94e213dd5f	Scripting: Per context stats in `script` in _nodes/stats (#59266 ) Updated `_nodes/stats`: * Update `script` in `_nodes/stats` to include stats per context: ``` "script": { "compilations": 1, "cache_evictions": 0, "compilation_limit_triggered": 0, "contexts":[ { "context": "aggregation_selector", "compilations": 0, "cache_evictions": 0, "compilation_limit_triggered": 0 }, ``` Refs: #50152 Backport: #59625	2020-07-09 15:30:50 -05:00
Alan Woodward	f4caadd239	MappedFieldType no longer requires equals/hashCode/clone (#59212 ) With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer have any cause to compare MappedFieldType instances. This means that we can remove all equals and hashCode implementations, and in addition we no longer need the clone implementations which were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes, which will be particularly useful for the runtime fields project.	2020-07-09 21:05:10 +01:00
Dan Hermann	c26d2b5fa5	Data stream support for indices shard stores API	2020-07-09 13:11:45 -05:00
Nik Everett	28ef997953	Improve vwh's distant bucket handling (#59094 ) (#59248 ) This modifies the `variable_width_histogram`'s distant bucket handling to: 1. Properly handle integer overflows 2. Recalculate the average distance when new buckets are added on the ends. This should slow down the rate at which we build extra buckets as we build more of them. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-09 12:14:46 -04:00


5755 Commits