4431 Commits

Author SHA1 Message Date
Jason Tedor bc5dae2713
Fix compilation in RoutingNode
This commit fixes compilation in RoutingNode.java after a backport
brought back usage of an API not available in JDK 8.
2020-03-18 22:21:54 -04:00
Jason Tedor 90ab949415
Improve performance of shards limits decider (#53577)
On clusters with a large number of shards, the shards limits allocation
decider can exhibit poor performance leading to timeouts applying
cluster state updates. This occurs because for every shard, we do a loop
to count the number of shards on the node, and the number of shards for
the index of the shard. This is roughly quadratic in the number of
shards. This loop is not necessary, since we already have an O(1) method
to count the number of non-relocating shards on a node, and with this
commit we add some infrastructure to RoutingNode to make counting the
number of shards per index O(1).
2020-03-18 20:58:22 -04:00
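Below is a minimal sketch of O(1) per-index counting for a simplified node that tracks only index names. `ShardCountingNode` and its methods are hypothetical stand-ins, not the actual `RoutingNode` API. The idea is to update a per-index counter on every add and remove, so the decider never has to loop over the node's shards.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified node that keeps running shard counts. */
final class ShardCountingNode {
    private final Map<String, Integer> shardsPerIndex = new HashMap<>();
    private int totalShards;

    void addShard(String index) {
        shardsPerIndex.merge(index, 1, Integer::sum);
        totalShards++;
    }

    void removeShard(String index) {
        Integer current = shardsPerIndex.get(index);
        if (current == null) {
            throw new IllegalStateException("no shards of index [" + index + "] on this node");
        }
        if (current == 1) {
            shardsPerIndex.remove(index);
        } else {
            shardsPerIndex.put(index, current - 1);
        }
        totalShards--;
    }

    /** O(1) lookup instead of iterating over every shard on the node. */
    int numberOfShardsOfIndex(String index) {
        return shardsPerIndex.getOrDefault(index, 0);
    }

    int numberOfShards() {
        return totalShards;
    }
}
```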
Stuart Tettemer cdbee32f55
Scripting: Per-context script cache, default off (#52855) (#53756)
* Adds per context settings:
  `script.context.${CONTEXT}.cache_max_size` ~
  `script.cache.max_size`

  `script.context.${CONTEXT}.cache_expire` ~
  `script.cache.expire`

  `script.context.${CONTEXT}.max_compilations_rate` ~
  `script.max_compilations_rate`

* Context cache is used if:
  `script.max_compilations_rate=use-context`.  This
  value is dynamically updatable, so users can
  switch back to the general cache if desired.

* Settings for context caches take the first value
  that applies:
  1) Context specific settings if set, eg
     `script.context.ingest.cache_max_size`
  2) Correlated general setting, if it is set to a non-default
     value, eg `script.cache.max_size`
  3) Context default

The reason for 2's inclusion is to allow an easy
transition for users who've customized their general
cache settings.

Using the general cache settings for the context caches
results in higher effective settings, since they are
multiplied across the number of contexts.  So a general
cache max size of 200 will become 200 * # of contexts.
However, this behavior avoids users snapping to a
value that is too low for them.

Backport of: #52855
Refs: #50152
2020-03-18 14:44:04 -06:00
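The three-step precedence for context cache settings can be written as a single lookup. This is an illustration only. Settings are modelled as a flat `Map`, `ContextCacheSettings.resolve` is a hypothetical helper, and the default values passed in are placeholders, not the real defaults.

```java
import java.util.Map;

final class ContextCacheSettings {
    /** Returns the first applicable value, following the order listed above (hypothetical helper). */
    static String resolve(Map<String, String> settings, String context, String suffix,
                          String generalKey, String generalDefault, String contextDefault) {
        // 1) context-specific setting, e.g. script.context.ingest.cache_max_size
        String contextKey = "script.context." + context + "." + suffix;
        if (settings.containsKey(contextKey)) {
            return settings.get(contextKey);
        }
        // 2) correlated general setting, only if set to a non-default value
        String general = settings.get(generalKey);
        if (general != null && !general.equals(generalDefault)) {
            return general;
        }
        // 3) the context's own default
        return contextDefault;
    }
}
```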
Jim Ferenczi 8e17322b3a
Shortcut query phase using the results of other shards (#51852) (#53659)
This commit, built on top of #51708, allows shard search requests to be modified based on information collected on other shards. It is intended to speed up sorted queries on time-based indices when the queries are only interested in the top documents.

This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard.
For queries that mix top documents and aggregations this change will reset the size of the top documents to 0 instead of rewriting to match none.
This means that we don't need to keep a search context open for this shard since we know in advance that it doesn't contain any competitive hit.
2020-03-18 17:20:35 +01:00
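The skip condition can be sketched for a numeric sort field. `BottomSortShortcut` is hypothetical and assumes the shard's min and max values for the sort field are known up front. To stay conservative, it treats ties with the bottom value as competitive.

```java
final class BottomSortShortcut {
    /**
     * True if no document in the shard can beat the current bottom (worst)
     * value of the global top documents, so the shard query can be rewritten
     * to match none (or, with aggregations, to size 0).
     */
    static boolean canSkipShard(long shardMin, long shardMax, long bottomValue, boolean ascending) {
        if (ascending) {
            // smaller is better: skip if even the shard's smallest value is worse
            return shardMin > bottomValue;
        }
        // larger is better: skip if even the shard's largest value is worse
        return shardMax < bottomValue;
    }
}
```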
Nhat Nguyen 1615c4b379
Fix testKeepTranslogAfterGlobalCheckpoint (#53704)
Read the global checkpoint after flushing, as we might advance it while flushing.

Closes #53505
2020-03-18 11:24:19 -04:00
Alan Woodward 580bc40c0c
Make it possible to deprecate all variants of a ParseField with no replacement (#53722)
Sometimes we want to deprecate and remove a ParseField entirely, without replacement;
for example, the various places where we specify a _type field in 7x. Currently we can
tell users only that a particular field name should not be used, and that another name should
be used in its place. This commit adds the ability to say that a field should not be used at
all.
2020-03-18 14:16:19 +00:00
Marios Trivyzas d56dee599a
Increase step between checks for cancellation (#53712)
The introduction of the ExitableDirectoryReader showed an increase in
latencies for range queries using point values.

Check for cancellation every 1024 docs instead of every 15 to lower
the impact of the check in query's performance.

Follows: #52822
Fixes: #53496
(cherry picked from commit 6b5fc35e4458e60a7ca5822584ec6a60562f2c01)
2020-03-18 14:52:40 +01:00
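A common way to run a check every N documents instead of on every one is a call counter with a power-of-two mask. The sketch below is hypothetical: `CancellableDocIterator` is not the `ExitableDirectoryReader` code, and it only shows the interval arithmetic for N = 1024.

```java
import java.util.function.BooleanSupplier;

final class CancellableDocIterator {
    // A power of two, so the bitmask below is equivalent to "calls % 1024 == 0".
    static final int CHECK_INTERVAL = 1024;

    private final BooleanSupplier isCancelled;
    private int calls;
    int checks; // how many times the cancellation flag was actually consulted

    CancellableDocIterator(BooleanSupplier isCancelled) {
        this.isCancelled = isCancelled;
    }

    /** Called once per document; consults the flag only every CHECK_INTERVAL docs. */
    void onDoc() {
        if ((calls++ & (CHECK_INTERVAL - 1)) == 0) {
            checks++;
            if (isCancelled.getAsBoolean()) {
                throw new IllegalStateException("task cancelled");
            }
        }
    }
}
```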
Tanguy Leroux 6cc564d677
Restore off-heap loading for term dictionary in ReadOnlyEngine (#53713)
This is a partial restore of #43158, following decision taken in #51247

Closes #51247
2020-03-18 13:24:34 +01:00
Tianlun Li e7ae9ae596
Deprecate delaying state recovery for master nodes (#53646)
It is useful to be able to delay state recovery until enough data nodes have
joined the cluster, since this gives the shard allocator a decent opportunity
to re-use as much existing data as possible. However we also have the option to
delay state recovery until a certain number of master-eligible nodes have
joined, and this is unnecessary: we require a majority of master-eligible nodes
for state recovery, and there is no advantage in waiting for more.

This commit deprecates the unnecessary settings in preparation for their
removal.

Relates #51806
2020-03-18 10:04:22 +00:00
Lee Hinman 9c0e846db3
[7.x] Add REST API for ComponentTemplate CRUD (#53558) (#53681)
* Add REST API for ComponentTemplate CRUD

This adds the Put/Get/DeleteComponentTemplate APIs that allow inserting, retrieving, and removing
ComponentTemplateMetadata into the cluster state metadata.

These APIs are currently only available behind a feature flag system property -
`es.itv2_feature_flag_registered`.

Relates to #53101

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-03-17 13:23:28 -06:00
Ryan Ernst 5c472fcb47
Upgrade jackson to 2.10.3 and GeoIP to 2.13.1 (#53642)
Re-applies the change from #53523 along with test fixes.

closes #53626
closes #53624
closes #53622
closes #53625

Co-authored-by: Nik Everett <nik9000@gmail.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Jake Landis <jake.landis@elastic.co>
2020-03-17 10:28:51 -07:00
Alan Woodward 71b703edd1
Rename AtomicFieldData to LeafFieldData (#53554)
This conforms with Lucene's LeafReader naming convention, and
matches other per-segment structures in elasticsearch.
2020-03-17 12:30:12 +00:00
Jason Tedor 01d2339883
Invoke response handler on failure to send (#53631)
Today it can happen that a transport message fails to send (for example,
because a transport interceptor rejects the request). In this case, the
response handler is never invoked, which can lead to necessary cleanups
not being performed. There are two ways to handle this. One is to expect
every callsite that sends a message to try/catch these exceptions and
handle them appropriately. The other is merely to invoke the response
handler to handle the exception, which is already equipped to handle
transport exceptions.
2020-03-16 21:28:24 -04:00
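The second option, routing send failures to the response handler, can be sketched as follows. `ResponseHandler`, `Transport` and `SafeSender` are simplified hypothetical types, not the real transport classes.

```java
interface ResponseHandler<T> {
    void handleResponse(T response);

    void handleException(Exception e);
}

interface Transport {
    /** May throw synchronously, e.g. when an interceptor rejects the request. */
    void send(String action, String request, ResponseHandler<String> handler) throws Exception;
}

final class SafeSender {
    /** If sending fails, hand the failure to the handler so its cleanup logic still runs. */
    static void sendRequest(Transport transport, String action, String request,
                            ResponseHandler<String> handler) {
        try {
            transport.send(action, request, handler);
        } catch (Exception e) {
            handler.handleException(e);
        }
    }
}
```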
Jason Tedor 881d0bfa8a
Add server name to remote info API (#53634)
This commit adds the configured server_name to the proxy mode info so
that it can be exposed in the remote info API.
2020-03-16 21:20:42 -04:00
Luca Cavanna c3d2417448
Cumulative backport of async search changes (#53635)
* Submit async search to work only with POST (#53368)

Currently the submit async search API can be called using both GET and POST at REST, but given that it submits a call and creates internal state, POST should be the only allowed method.

* Refine SearchProgressListener internal API (#53373)

The following cumulative improvements have been made:
- rename `onReduce` and `notifyReduce` to `onFinalReduce` and `notifyFinalReduce`
- add unit test for `SearchShard`
- on* methods in `SearchProgressListener` shouldn't need to be public as they should never be called directly, they only need to be overridden hence they can be made protected. They are actually called directly from a test which required some adapting, like making `AsyncSearchTask.Listener` class package private instead of private
- Instead of overriding `getProgressListener` in `AsyncSearchTask`, as it feels weird to override a getter method, added a specific method that retrieves the Listener directly without needing to cast it. Made the getter and setter for the listener final in the base class.
- rename `SearchProgressListener#searchShards` methods to `buildSearchShards` and make it static given that it accesses no instance members
- make `SearchShard` and `SearchShardTask` classes final

* Move async search yaml tests to x-pack yaml test folder (#53537)

The yaml tests for async search currently sit in its qa folder. There is no reason though for them to live in a separate folder as they don't require particular setup. This commit moves them to the main folder together with the other x-pack yaml tests so that they will be run by the client test runners too.

* [DOCS] Add temporary redirect for async-search (#53454)

The following API spec files contain a link to a not-yet-created
async search docs page:

* [async_search.delete.json][0]
* [async_search.get.json][1]
* [async_search.submit.json][2]

The Elasticsearch-js client uses these spec files to create its docs.
This created a broken link in the Elasticsearch-js docs, which has broken
the docs build.

This PR adds a temporary redirect for the docs page. This redirect
should be removed when the actual API docs are added.

[0]: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/src/test/resources/rest-api-spec/api/async_search.delete.json
[1]: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/src/test/resources/rest-api-spec/api/async_search.get.json
[2]: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/src/test/resources/rest-api-spec/api/async_search.submit.json

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-03-17 00:08:17 +01:00
Nik Everett 9845dbb7d6
Fix sorting agg buckets by doc_count (backport of #53617) (#53627)
I broke sorting aggregations by `doc_count` in #51271 by mixing up true
and false. This flips that comparison and adds a few tests to double
check that we don't do this again.
2020-03-16 17:35:43 -04:00
Nik Everett f0beab4041
Stop using round-tripped PipelineAggregators (backport of #53423) (#53629)
This begins to clean up how `PipelineAggregator`s are executed.
Previously, we would create the `PipelineAggregator`s on the data nodes
and embed them in the aggregation tree. When it came time to execute the
pipeline aggregation we'd use the `PipelineAggregator`s that were on the
first shard's results. This is inefficient because:
1. The data node needs to make the `PipelineAggregator` only to
   serialize it and then throw it away.
2. The coordinating node needs to deserialize all of the
   `PipelineAggregator`s even though it only needs one of them.
3. You end up with many `PipelineAggregator` instances when you only
   really *need* one per pipeline.
4. `PipelineAggregator` needs to implement serialization.

This begins to undo these by building the `PipelineAggregator`s directly
on the coordinating node and using those instead of the
`PipelineAggregator`s in the aggregation tree. In a follow up change
we'll stop serializing the `PipelineAggregator`s to node versions that
support this behavior. And, one day, we'll be able to remove
`PipelineAggregator` from the aggregation result tree entirely.

Importantly, this doesn't change how pipeline aggregations are declared
or parsed or requested. They are still part of the `AggregationBuilder`
tree because *that* makes sense.
2020-03-16 16:15:23 -04:00
Gordon Brown 031932b32f
Allow _cat indices & aliases to use indices options (#53248)
This commit adjusts the _cat/indices and _cat/aliases APIs to allow
specifying indices options, so that these APIs can handle hidden
indices/aliases in the same way as other APIs.

Also adds the hidden option to the expand_wildcards parameter
in the YAML spec for every API that accepts it.
2020-03-16 11:25:05 -06:00
markharwood 2c74f3e22c
Backport of new wildcard field type (#53590)
* New wildcard field optimised for wildcard queries (#49993)

Indexes values using size 3 ngrams and also stores the full original as a binary doc value.
Wildcard queries operate by using a cheap approximation query on the ngram field followed up by a more expensive verification query using an automaton on the binary doc values.  Also supports aggregations and sorting.
2020-03-16 15:07:13 +00:00
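The approximate-then-verify strategy can be sketched in plain Java under simplifying assumptions. Trigrams are held in an in-memory `Set` rather than an index. Only the literal runs between `*` and `?` contribute required trigrams. Verification uses `java.util.regex` instead of a Lucene automaton over binary doc values. `NgramWildcardSketch` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class NgramWildcardSketch {
    /** All 3-character substrings of a value. */
    static Set<String> trigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    /** Trigrams any match must contain, taken from the literal runs between wildcards. */
    static Set<String> requiredTrigrams(String wildcard) {
        Set<String> required = new HashSet<>();
        for (String literal : wildcard.split("[*?]")) {
            required.addAll(trigrams(literal));
        }
        return required;
    }

    /** Cheap approximation: may return false positives, never false negatives. */
    static boolean approximateMatch(Set<String> valueTrigrams, String wildcard) {
        return valueTrigrams.containsAll(requiredTrigrams(wildcard));
    }

    /** Expensive verification against the full original value. */
    static boolean verify(String value, String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), value);
    }

    static boolean matches(String value, String wildcard) {
        return approximateMatch(trigrams(value), wildcard) && verify(value, wildcard);
    }
}
```

The approximation alone would accept `rchsea` for `sea*rch`, because both required trigrams are present. The verification step rejects it.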
Mayya Sharipova a906f8a0e4
Highlighters skip ignored keyword values (#53408) (#53604)
Keyword field values longer than ignore_above are not
indexed. But highlighters were still retrieving these values
from _source and trying to highlight them. This sometimes led to
errors if a field length exceeded max_analyzed_offset. More generally,
it is wrong to attempt to highlight something that was
ignored during indexing.

This PR checks if a keyword value was ignored because of its length,
and if yes, skips highlighting it.

Backport: #53408
Closes #43800
2020-03-16 11:06:25 -04:00
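The skip amounts to a length filter applied before highlighting. `KeywordHighlightFilter` is hypothetical. Like `ignore_above`, it assumes length is counted in characters.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordHighlightFilter {
    /** Keeps only the values that were actually indexed (length <= ignore_above). */
    static List<String> valuesToHighlight(List<String> sourceValues, int ignoreAbove) {
        List<String> kept = new ArrayList<>();
        for (String value : sourceValues) {
            if (value.length() <= ignoreAbove) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```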
Jim Ferenczi e6680be0b1
Add new x-pack endpoints to track the progress of a search asynchronously (#49931) (#53591)
This change introduces a new API in x-pack basic that allows users to track the progress of a search.
Users can submit an asynchronous search through a new endpoint called `_async_search` that
works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time.

````
GET my_index_pattern*/_async_search?wait_for_completion=100ms
{
  "aggs": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h"
    }
  }
}
````

If after 100ms the final response is not available, a `partial_response` is included in the body:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 1,
  "is_running": true,
  "is_partial": true,
  "response": {
   "_shards": {
       "total": 100,
       "successful": 5,
       "failed": 0
    },
    "total_hits": {
      "value": 1653433,
      "relation": "eq"
    },
    "aggs": {
      ...
    }
  }
}
````

The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed.
It also contains the total hits as well as partial aggregations computed from the successful shards.
To continue to monitor the progress of the search users can call the get `_async_search` API like the following:

````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms
````

That returns a new response that can contain the same partial response as the previous call if the search didn't progress; in that case the returned `version`
should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress.
Finally if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 10,
  "is_running": false,
  "response": {
     "is_partial": false,
     ...
  }
}
````

Asynchronous searches are stored in a restricted index called `.async-search` if they survive (are still running) after the initial submit. Each request has a keep alive that defaults to 5 days, but this value can be changed or updated at any time:
````
GET my_index_pattern*/_async_search?wait_for_completion=100ms&keep_alive=10d
````
The keep alive can be set when submitting the search; the example above raises it to `10d` for this search.
````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d
````
The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days.
A background service that runs only on the node that holds the first primary shard of the `.async-search` index is responsible for deleting expired results. It runs every hour, but expiration is also checked for running queries (if they take longer than the keep_alive) and when getting a result.

Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed.

Asynchronous searches are not persistent: if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know whether the search is still running, the response contains a field called `is_running` that indicates whether the task is up. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. However, final responses and failures are persisted in a system index that allows
a response to be retrieved even after the task finishes.

````
DELETE _async_search/9N3J1m4BgyzUDzqgC15b
````

The response is also not stored if the initial submit action returns a final response. This avoids adding any overhead to queries that complete within the initial `wait_for_completion`.

The `.async-search` index is a restricted index (it should be migrated to a system index in 8.0+) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other task.

Relates #49091

Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
2020-03-16 15:31:27 +01:00
David Turner 7e82a4f78c
Do not log no-op reconnections at DEBUG (#53469)
Today the NodeConnectionsService emits a DEBUG-level log message each time it
calls TransportService#connectToNode, which happens for every node in the
cluster every ten seconds, and also at every cluster state update. That's a lot
of log messages. Most of these calls are no-ops and can be ignored, but if the
call was not a no-op then it may be worth investigating further. Since the logs
do not distinguish the interesting and uninteresting cases, they are not
useful.

This commit distinguishes the two cases and pushes the noisy logging for the
common no-op case down to TRACE level, leaving only useful and actionable
information in the DEBUG-level logs.
2020-03-16 08:56:20 +00:00
Mark Vieira 2f0aca992b
Revert "Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576)"
This reverts commit b7dbadeea0.
2020-03-15 18:10:40 -07:00
Jason Tedor 66374b61ca
Remove extra code in allocation commands parsing (#53579)
This commit removes some code that is duplicated in the parsing of
allocation commands in the cluster reroute API.
2020-03-14 18:14:13 -04:00
Jason Tedor b7dbadeea0
Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576)
This commit upgrades our Jackson dependency to 2.10.3 and our GeoIP2
dependency to 2.13.1.

Relates #53523
2020-03-14 13:28:06 -04:00
Marios Trivyzas b6c94fd73e
Fix Term Vectors with artificial docs and keyword fields (#53504) (#53550)
Previously, Term Vectors API was returning empty results for
artificial documents with keyword fields. Checking only for `string()`
on `IndexableField` is not enough, since for `KeywordFieldType`
`binaryValue()` must be used instead.

Fixes #53494

(cherry picked from commit 1fc3fe3d32f41eab2101c0536751b7c47e63cc48)
2020-03-13 19:26:14 +01:00
Dan Hermann fb29c2dccf
Fix ingest pipeline _simulate api with empty docs never returns a res… (#52937) (#53547)
2020-03-13 09:41:14 -05:00
William Brafford 5b718d2565
Use snake case for nodes stats/info metric names (#53446) (#53535)
* Use snake case for nodes stats/info metric names (#53446)

The REST API uses "thread_pool" as the name of the thread pool metric.
If we use this name internally when we serialize nodes stats and info
requests, we won't need to do any fancy logic to check for and switch
out "threadPool", which was the previous internal name.
2020-03-13 07:49:14 -04:00
Jim Ferenczi 9dfcc07401
Fix pre-sorting of shards in the can_match phase (#53397)
This commit fixes a bug on sorted queries with a primary sort field
that uses different types in the requested indices. In this scenario
the returned min/max values to sort the shards are not comparable so
we should avoid the sorting rather than throwing an obscure exception.
2020-03-13 01:28:11 +01:00
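The guard can be sketched as follows: sort the shards only when all their min values share one type, and otherwise keep the original order. `ShardPreSort` is hypothetical and covers only the ascending case, ordering by each shard's min value.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardPreSort {
    /** Shard indices ordered by min value, or in original order if the values are not mutually comparable. */
    @SuppressWarnings("unchecked")
    static List<Integer> order(List<Comparable<?>> shardMins) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < shardMins.size(); i++) {
            order.add(i);
        }
        for (Comparable<?> value : shardMins) {
            if (value.getClass() != shardMins.get(0).getClass()) {
                return order; // mixed types (e.g. Long vs String): skip sorting instead of failing
            }
        }
        order.sort((a, b) -> ((Comparable<Object>) shardMins.get(a)).compareTo(shardMins.get(b)));
        return order;
    }
}
```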
Nhat Nguyen fe2f6b359e
Fix concurrent requests race over scroll context limit (#53449)
Concurrent search scroll requests can lead to more scroll contexts than the limit.
2020-03-12 17:56:51 -04:00
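One way to stop concurrent requests from overshooting a limit is to reserve the slot atomically, instead of reading the count, comparing it, and then incrementing. The sketch below shows that pattern with an `AtomicInteger`. `ScrollContextLimiter` is hypothetical, and the commit does not necessarily implement the fix this way.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ScrollContextLimiter {
    private final int maxOpenScrollContexts;
    private final AtomicInteger openScrollContexts = new AtomicInteger();

    ScrollContextLimiter(int maxOpenScrollContexts) {
        this.maxOpenScrollContexts = maxOpenScrollContexts;
    }

    /** Increment first, then roll back on overflow, so two racing callers cannot both pass the check. */
    boolean tryAcquire() {
        if (openScrollContexts.incrementAndGet() > maxOpenScrollContexts) {
            openScrollContexts.decrementAndGet();
            return false;
        }
        return true;
    }

    void release() {
        openScrollContexts.decrementAndGet();
    }

    int open() {
        return openScrollContexts.get();
    }
}
```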
Lee Hinman 2789fe4179
[7.x] Add ComponentTemplate to MetaData (#53290) (#53489)
* Add ComponentTemplate to MetaData (#53290)

* Add ComponentTemplate to MetaData

This adds a `ComponentTemplate` data structure that will be used as part of #53101 (Index Templates
v2) to the `MetaData` class. Currently there are no APIs for interacting with this class, so it will
always be an empty map (other than in tests). This infrastructure will be built upon to add APIs in
a subsequent commit.

A `ComponentTemplate` is made up of a `Template`, a version, and a MetaData.Custom class. The
`Template` contains similar information to an `IndexTemplateMetaData` object: settings, mappings,
and alias configuration.

* Update minimal supported version constant

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-03-12 15:33:32 -06:00
Nik Everett 9dcd64c110
Preserve metric types in top_metrics (backport of #53288) (#53440)
This changes the `top_metrics` aggregation to return metrics in their
original type. Since it only supports numerics, that means that dates,
longs, and doubles will come back as stored, with their appropriate
formatter applied.
2020-03-12 17:17:09 -04:00
Jay Modi 0353b804bf
Mute testKeepTranslogAfterGlobalCheckpoint (#53510)
This change mutes a test that fails reproducibly in
InternalEngineTests.

Relates #53505
2020-03-12 13:08:13 -06:00
Lee Hinman 67fffe676e
[7.x] Add read/writeOptionalVLong to StreamInput/Output (#5314… (#53491)
The spirit of StreamInput/StreamOutput is that common I/O patterns should
be handled by these classes so that the persistence methods in
application classes can be kept short, which facilitates easy visual
comparison between read and write methods, and reduces risks of having
serialization issues due to mismatched implementations.

To this end, this change adds readOptionalVLong and writeOptionalVLong
methods to these classes as we have started to build up cases where
that conditional/null logic has been implemented directly in the read &
write methods.

Co-authored-by: Tim Vernum <tim.vernum@elastic.co>
2020-03-12 10:59:31 -06:00
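Here is a sketch of the optional-vlong pattern built on `java.io.DataOutput` and `DataInput`. It writes a presence flag, then a variable-length long that uses 7 bits per byte. `OptionalVLongIO` is a hypothetical class, and this wire layout is an assumption, not a description of the actual `StreamOutput` encoding.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

final class OptionalVLongIO {
    /** 7 bits per byte; the high bit is set when more bytes follow. */
    static void writeVLong(DataOutput out, long value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("cannot write negative vlong [" + value + "]");
        }
        while ((value & ~0x7FL) != 0L) {
            out.writeByte((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }

    static long readVLong(DataInput in) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = in.readByte();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("malformed vlong");
    }

    /** Presence flag first, then the value only when it is non-null. */
    static void writeOptionalVLong(DataOutput out, Long value) throws IOException {
        if (value == null) {
            out.writeBoolean(false);
        } else {
            out.writeBoolean(true);
            writeVLong(out, value);
        }
    }

    static Long readOptionalVLong(DataInput in) throws IOException {
        if (in.readBoolean()) {
            return readVLong(in);
        }
        return null;
    }
}
```

Keeping the null check inside the helper means the read and write methods of an application class each shrink to one call, which is the point the commit message makes.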
Przemyslaw Gomulka 2438b899eb
Support joda style date patterns in 7.x (#52555)
If an index was created in version 6 and contains a date field with a joda-style pattern, it should still be possible to search and insert documents into it.
Indices created in 6 whose date pattern starts with 8 should be considered java style.
2020-03-12 08:57:03 +01:00
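The rule in this message can be written as a small decision function. `DatePatternStyle` is hypothetical. It assumes the index's created major version is known and, as the message describes, treats a leading `8` as the java-time marker.

```java
final class DatePatternStyle {
    enum Style { JODA, JAVA }

    /** Indices created in 6.x keep joda patterns unless the pattern starts with "8". */
    static Style styleFor(int indexCreatedMajorVersion, String pattern) {
        if (indexCreatedMajorVersion < 7 && !pattern.startsWith("8")) {
            return Style.JODA;
        }
        return Style.JAVA;
    }

    /** Strips the "8" marker so the rest can be handed to a java-time formatter. */
    static String javaPattern(String pattern) {
        return pattern.startsWith("8") ? pattern.substring(1) : pattern;
    }
}
```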
Nik Everett 9ada508347
Fix date_nanos in composite aggs (backport of #53315) (#53347)
It looks like `date_nanos` fields weren't likely to work properly in
composite aggs because composites iterate field values using points and
we weren't converting the points into milliseconds. Because the doc
values were coming back in milliseconds we ended up getting very confused
and just never collecting sub-aggregations.

This fixes that by adding a method to `DateFieldMapper.Resolution`,
`parsePointAsMillis`, which is similar in name and function to
`NumberFieldMapper.NumberType`'s `parsePoint` except that it normalizes
to milliseconds, which is what aggs need at the moment.

Closes #53168
2020-03-11 13:00:07 -04:00
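The normalization can be sketched as a per-resolution conversion. `DateResolutionSketch` is hypothetical. It assumes the point has already been decoded to a `long` holding epoch millis or epoch nanos, whereas real Lucene points are encoded bytes.

```java
enum DateResolutionSketch {
    MILLISECONDS {
        @Override
        long pointAsMillis(long pointValue) {
            return pointValue; // already milliseconds since the epoch
        }
    },
    NANOSECONDS {
        @Override
        long pointAsMillis(long pointValue) {
            // date_nanos values are nanoseconds since the epoch; normalize to millis
            return Math.floorDiv(pointValue, 1_000_000L);
        }
    };

    abstract long pointAsMillis(long pointValue);
}
```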
Nhat Nguyen 1fd56698fa
Adjust wire version for search context id
Relates #53143
2020-03-11 11:48:11 -04:00
Nhat Nguyen 6665ebe7ab
Harden search context id (#53143)
Using a Long alone is not strong enough for the id of search contexts
because we reset the id generator whenever a data node is restarted.
This can lead to two issues:

1. Fetch phase can fetch documents from another index
2. A scroll search can return documents from another index

This commit avoids these issues by adding a UUID to SearchContextId.
2020-03-11 11:48:11 -04:00
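Pairing the counter with a random id generated at each node start is enough to make ids issued before and after a restart distinct, even when the counters collide. `ContextId` and `ContextIdGenerator` are hypothetical simplifications, not the real `SearchContextId` class.

```java
import java.util.Objects;
import java.util.UUID;

final class ContextId {
    final String sessionId; // random per node start
    final long id;          // counter, reset to 0 on restart

    ContextId(String sessionId, long id) {
        this.sessionId = sessionId;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ContextId)) {
            return false;
        }
        ContextId other = (ContextId) o;
        return id == other.id && sessionId.equals(other.sessionId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sessionId, id);
    }
}

final class ContextIdGenerator {
    private final String sessionId = UUID.randomUUID().toString();
    private long next;

    synchronized ContextId nextId() {
        return new ContextId(sessionId, next++);
    }
}
```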
David Turner ac721938c2
Allow joining node to trigger term bump (#53338)
In rare circumstances it is possible for an isolated node to have a greater
term than the currently-elected leader. Today such a node will attempt to join
the cluster but will not offer a vote to the leader and will reject its cluster
state publications due to their stale term. This situation persists since there
is no mechanism for the joining node to inform the leader that its term is
stale and a new election is required.

This commit adds the current term of the joining node to the join request. Once
the join has been validated, the leader will perform another election to
increase its term far enough to allow the isolated node to join properly.

Fixes #53271
2020-03-11 09:19:44 +00:00
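The leader-side rule can be sketched as below. `JoinTermSketch` is hypothetical. It reduces the extra election to a move into a term above the joiner's, which only stands in for the real election.

```java
final class JoinTermSketch {
    private long currentTerm;

    JoinTermSketch(long currentTerm) {
        this.currentTerm = currentTerm;
    }

    /**
     * Called once a join has been validated. If the joining node reports a
     * higher term, move beyond it (stand-in for a new election) so the joiner
     * stops rejecting publications as stale. Returns the resulting term.
     */
    long onValidatedJoin(long joiningNodeTerm) {
        if (joiningNodeTerm > currentTerm) {
            currentTerm = joiningNodeTerm + 1;
        }
        return currentTerm;
    }
}
```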
Armin Braun 7189c57b6c
Record Force Merges in Live Commit Data (#52694) (#53372)
* Record Force Merges in live commit data

Prerequisite of #52182. Record force merges in the live commit data
so two shard states with the same sequence number that differ only in whether
or not they have been force merged can be distinguished when creating snapshots.
2020-03-11 06:30:36 +01:00
Nhat Nguyen 24f114766f
Fix doc_stats and segment_stats of ReadOnlyEngine (#53345)
We can't always have the same segment stats and doc stats between
InternalEngine and ReadOnlyEngine if there are some fully deleted
segments. ReadOnlyEngine always filters them out. InternalEngine,
however, will keep them if peer recovery retention leases exist or the
number of retained operations is non-zero.

This change reverts the fix in #51331 and uses the wrapped reader to
calculate the segment stats and doc stats. For the test, we need to
disable the extra retaining soft-deletes operations.

Closes #51303
2020-03-10 21:51:33 -04:00
Gordon Brown 20bbe5bae4
Fix Rollover handling of hidden aliases (#53146)
Prior to this commit, rollover did not propagate the `is_hidden` alias
property when rolling over an index. This commit ensures that an alias
that is rolled over remains hidden.
2020-03-10 10:56:12 -06:00
Nik Everett 5ce6de2c1a
Simplify SiblingPipelineAggregator (#53144) (#53341)
This removes the `instanceof`s from `SiblingPipelineAggregator` by
adding a `rewriteBuckets` method to `InternalAggregation` that can be
called to, well, rewrite the buckets. The default implementation of
`rewriteBuckets` throws the same exception that was thrown when you
attempted to run a `SiblingPipelineAggregator` on an aggregation without
buckets. It is overridden by `InternalSingleBucketAggregation` and
`InternalMultiBucketAggregation` to correctly rewrite their buckets.
2020-03-10 11:39:10 -04:00
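The refactoring replaces type checks with a polymorphic method whose default implementation throws. Below is a minimal sketch using hypothetical classes (`InternalAggSketch`, `MultiBucketSketch`, `MetricSketch`, `SiblingPipelineSketch`). A bucket is reduced to the list of its sub-aggregation names, and the exception message is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

abstract class InternalAggSketch {
    /** Default for aggregations without buckets: a sibling pipeline cannot run on them. */
    InternalAggSketch rewriteBuckets(UnaryOperator<List<String>> bucketRewriter) {
        throw new IllegalArgumentException("aggregation has no buckets to rewrite");
    }
}

/** A metric result has no buckets, so it keeps the throwing default. */
final class MetricSketch extends InternalAggSketch {
}

/** A multi-bucket result; each bucket is modelled as the names of its sub-aggregations. */
final class MultiBucketSketch extends InternalAggSketch {
    final List<List<String>> buckets;

    MultiBucketSketch(List<List<String>> buckets) {
        this.buckets = buckets;
    }

    @Override
    InternalAggSketch rewriteBuckets(UnaryOperator<List<String>> bucketRewriter) {
        List<List<String>> rewritten = new ArrayList<>();
        for (List<String> bucket : buckets) {
            rewritten.add(bucketRewriter.apply(bucket));
        }
        return new MultiBucketSketch(rewritten);
    }
}

/** The caller needs no instanceof checks: it just asks for a rewrite. */
final class SiblingPipelineSketch {
    static InternalAggSketch addToEachBucket(InternalAggSketch agg, String pipelineName) {
        return agg.rewriteBuckets(bucket -> {
            List<String> copy = new ArrayList<>(bucket);
            copy.add(pipelineName);
            return copy;
        });
    }
}
```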
Nik Everett 89c0e1f566
Fix composite agg sort bug (backport of #53296) (#53337)
When a composite aggregation is run against an index with a sort that
*starts* with the "source" fields from the composite but has additional
fields, it would blow up while trying to decide if it could use the sort.
This changes it to decide that it *can* use the sort.

Closes #52480
2020-03-10 11:32:46 -04:00
Jim Ferenczi ae6c25b749
Speed up partial reduce of terms aggregations (#53216)
This change optimizes the merge of terms aggregations by removing
the priority queue that was used to collect all the buckets during
a non-final reduction. We don't need to keep the result sorted since
the merge of buckets in a subsequent reduce can modify the order.
I wrote a small micro-benchmark to test the change and the speed ups
are significant for small merge buffer sizes:

````
########## Master:
Benchmark                           (bufferSize)  (cardinality)  (numShards)  (topNSize)  Mode  Cnt     Score     Error  Units
TermsReduceBenchmark.reduceTopHits             5          10000         1000        1000  avgt   10  2459,690 ± 198,682  ms/op
TermsReduceBenchmark.reduceTopHits            16          10000         1000        1000  avgt   10  1030,620 ±  91,544  ms/op
TermsReduceBenchmark.reduceTopHits            32          10000         1000        1000  avgt   10   558,608 ±  44,915  ms/op
TermsReduceBenchmark.reduceTopHits           128          10000         1000        1000  avgt   10   287,333 ±   8,342  ms/op
TermsReduceBenchmark.reduceTopHits           512          10000         1000        1000  avgt   10   257,325 ±  54,515  ms/op

########## Patch:
Benchmark                           (bufferSize)  (cardinality)  (numShards)  (topNSize)  Mode  Cnt    Score    Error  Units
TermsReduceBenchmark.reduceTopHits             5          10000         1000        1000  avgt   10  805,611 ± 14,630  ms/op
TermsReduceBenchmark.reduceTopHits            16          10000         1000        1000  avgt   10  378,851 ± 17,929  ms/op
TermsReduceBenchmark.reduceTopHits            32          10000         1000        1000  avgt   10  261,094 ± 10,176  ms/op
TermsReduceBenchmark.reduceTopHits           128          10000         1000        1000  avgt   10  241,051 ± 19,558  ms/op
TermsReduceBenchmark.reduceTopHits           512          10000         1000        1000  avgt   10  231,643 ±  6,170  ms/op
````

The code for the benchmark can be found [here](). It seems to be up to 3x faster for terms aggregations
that return 10,000 unique terms (1000 terms per shard). For a cardinality of 100,000 terms, this patch is up to 5x faster:

````
########## Patch:
Benchmark                           (bufferSize)  (cardinality)  (numShards)  (topNSize)  Mode  Cnt      Score     Error  Units
TermsReduceBenchmark.reduceTopHits             5         100000         1000        1000  avgt   10  12791,083 ± 397,128  ms/op
TermsReduceBenchmark.reduceTopHits            16         100000         1000        1000  avgt   10   3974,939 ± 324,617  ms/op
TermsReduceBenchmark.reduceTopHits            32         100000         1000        1000  avgt   10   2186,285 ± 267,124  ms/op
TermsReduceBenchmark.reduceTopHits           128         100000         1000        1000  avgt   10    914,657 ± 160,784  ms/op
TermsReduceBenchmark.reduceTopHits           512         100000         1000        1000  avgt   10    604,198 ± 145,457  ms/op

########## Master:
Benchmark                           (bufferSize)  (cardinality)  (numShards)  (topNSize)  Mode  Cnt      Score     Error  Units
TermsReduceBenchmark.reduceTopHits             5         100000         1000        1000  avgt   10  60696,107 ± 929,944  ms/op
TermsReduceBenchmark.reduceTopHits            16         100000         1000        1000  avgt   10  16292,894 ± 783,398  ms/op
TermsReduceBenchmark.reduceTopHits            32         100000         1000        1000  avgt   10   7705,444 ±  77,588  ms/op
TermsReduceBenchmark.reduceTopHits           128         100000         1000        1000  avgt   10   2156,685 ±  88,795  ms/op
TermsReduceBenchmark.reduceTopHits           512         100000         1000        1000  avgt   10    760,273 ±  53,738  ms/op
````

The merge of buckets can also be optimized. Currently we use a hash map to merge buckets coming from different shards, so this can be costly if the number of unique terms is high. Instead, we could always sort the shard terms result by key and perform a merge sort to reduce the results. This would save memory and make the merge more linear in terms
of complexity in the coordinating node at the expense of an additional sort in the shards. I plan to test this possible optimization in a follow up.

Relates #51857
2020-03-10 14:26:59 +01:00
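The difference between partial and final reduction can be sketched with plain maps. `TermsReduceSketch` is hypothetical. It keeps partial results unsorted in a `HashMap` and sorts only in the final reduce. For brevity, the final step sorts every term, where a bounded priority queue of size N would be cheaper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TermsReduceSketch {
    /** Non-final reduce: merge per-term counts into a HashMap and leave them unsorted. */
    static Map<String, Long> partialReduce(List<Map<String, Long>> results) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> result : results) {
            result.forEach((term, count) -> merged.merge(term, count, Long::sum));
        }
        return merged;
    }

    /** Final reduce: merge once more, then sort by count (desc, ties by term) and keep the top N. */
    static List<Map.Entry<String, Long>> finalReduce(List<Map<String, Long>> partials, int topN) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(partialReduce(partials).entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(topN, entries.size())));
    }
}
```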
Nik Everett e23c3f915f
Save a little space on empty BitArrays (#53243) (#53316)
It doesn't make a whole lot of sense for `BitArray#clear` to grow the
underlying storage array just to clear the bit. We *already* treat
indices outside of the storage array as unset. This turns such
operations into a noop.
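A simplified sketch of the behavior described above, assuming a `long[]`-backed store. `LazyBitArray` is a hypothetical stand-in, not the real `BitArray` (which uses `BigArrays`). `get` reports indices beyond the storage as unset, and `clear` on such an index returns without growing:

```java
import java.util.Arrays;

/** Hypothetical simplified bit array: grows lazily on set; indices beyond storage are unset. */
final class LazyBitArray {
    private long[] bits = new long[0];

    void set(int index) {
        int word = index >> 6; // 64 bits per long
        if (word >= bits.length) {
            // Grow only when a bit actually needs to be stored.
            bits = Arrays.copyOf(bits, Math.max(word + 1, bits.length * 2));
        }
        bits[word] |= 1L << index; // long shifts use the low 6 bits of index
    }

    boolean get(int index) {
        int word = index >> 6;
        // Outside the storage array means it was never set.
        return word < bits.length && (bits[word] & (1L << index)) != 0;
    }

    void clear(int index) {
        int word = index >> 6;
        if (word >= bits.length) {
            return; // already unset: no need to grow just to clear
        }
        bits[word] &= ~(1L << index);
    }

    int storageWords() {
        return bits.length;
    }
}
```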
2020-03-10 09:22:19 -04:00
Alan Woodward 5c861cfe6e Upgrade to final lucene 8.5.0 snapshot (#53293)
Lucene 8.5.0 release candidates are imminent. This commit upgrades master to use
the latest snapshot to check that there are no last-minute bugs or regressions.
2020-03-10 09:32:59 +00:00
Gordon Brown 1cb0a4399d
Fix Get Alias API handling of hidden indices with visible aliases (#53147)
This commit changes the Get Aliases API to include hidden indices by
default - this is slightly different from other APIs, but is necessary
to make this API work intuitively.
2020-03-09 16:16:29 -06:00
William Brafford 2bb4b96a7f
Serialize NodesStatsRequest as set of strings (#53235) (#53313)
* Add unit tests before refactoring

* Convert boolean fields to set of strings

In order to make nodes stats plugins pluggable, we need to make the
NodesStatsRequest class capable of carrying a flexible list of metrics
rather than a fixed list of boolean flags. This commit changes the
internal storage of the class without changing its serialization.

* Change serialization of NodesStatsRequest

* Set up BWC before merging

* Singularize enum name
2020-03-09 18:13:29 -04:00
Jason Tedor 1860c57147
Deprecate the listener thread pool (#53266)
The listener thread pool is being removed from use in the server
codebase. This commit deprecates configuring the listener thread pool.
2020-03-09 16:56:01 -04:00
David Turner b20f86e450 Clarify JavaDoc for DiscoveryNodes#resolveNodes (#53277)
Closes #52887
2020-03-09 14:44:29 +00:00
David Turner 52ff341814 Deprecate passing settings in restore requests (#53268)
Today we accept a `settings` field in snapshot restore requests, but this field
is not used. This commit deprecates it.
2020-03-09 12:01:07 +00:00
Christoph Büscher 2fd954a3b7 Fix potential NPE in FuzzyTermsEnum (#53231)
Under certain circumstances SpanMultiTermQueryWrapper uses
SpanBooleanQueryRewriteWithMaxClause as its rewrite method, which in turn tries
to get a TermsEnum from the wrapped MultiTermQuery, currently using a `null`
AttributeSource. While queries such as TermsQuery or subclasses of AutomatonQuery ignore
this argument, FuzzyQuery uses it to create a FuzzyTermsEnum which triggers an
NPE when the AttributeSource is not provided. This PR fixes this by supplying an
empty AttributeSource instead of a `null` value.

Closes #52894
2020-03-09 12:59:08 +01:00
Jason Tedor 5e96d3e59a
Use given executor for global checkpoint listener (#53260)
Today when notifying a global checkpoint listener, we use the listener
thread pool. This commit turns this inside out so that the global
checkpoint listener must provide an executor on which to notify the
listener.
2020-03-08 13:51:05 -04:00
Jason Tedor 79b67eb3ba
Drop action future that forks on listener executor (#53261)
This commit drops the dispatching listenable action future that forks to
the listener thread pool. This was previously used in the transport
client but is no longer used.
2020-03-08 12:36:09 -04:00
Jason Tedor a0b235888f
Avoid self-suppression on grouped action listener (#53262)
It can be that the same failure is reported repeatedly to a grouped action
listener, for example when the same exception, such as a connect transport
exception, is the cause of repeated failures. Previously we unconditionally
added each subsequent exception as a suppressed exception of the first one,
but an exception cannot suppress itself. Thus, we would throw an exception
and the grouped action listener would never complete. This commit addresses
this by guarding against self-suppression.
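The guard can be sketched as follows. `FailureCollector` is a hypothetical stand-in for the grouped listener's failure handling; the sketch relies only on the JDK behavior that `Throwable#addSuppressed` throws an `IllegalArgumentException` on self-suppression:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: keeps the first failure and suppresses later, distinct ones into it. */
final class FailureCollector {
    private final AtomicReference<Exception> first = new AtomicReference<>();

    void onFailure(Exception e) {
        if (first.compareAndSet(null, e) == false) {
            Exception existing = first.get();
            // addSuppressed(existing) on itself would throw IllegalArgumentException,
            // so skip a repeated instance of the same exception.
            if (existing != e) {
                existing.addSuppressed(e);
            }
        }
    }

    Exception failure() {
        return first.get();
    }
}
```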
2020-03-08 08:59:57 -04:00
Jason Tedor c5738ae312
Notify refresh listeners on the calling thread (#53259)
Today we notify refresh listeners by forking to the listener thread pool
and then serially notifying listeners on a thread there. Refreshes are
expensive though, so the expectation is that we are executing refreshes
on threads that can afford an expensive operation (e.g., not a network
thread) and as such, executing listeners that we expect to be cheap on
the calling thread is okay. This commit removes the forking, so refresh
listeners are now notified directly on the thread that executed the
refresh.
2020-03-07 13:12:40 -05:00
Gordon Brown ff9b8bda63
Implement hidden aliases (#52547)
This commit introduces hidden aliases. These are similar to hidden
indices, in that they are not visible by default, unless explicitly
specified by name or by indicating that hidden indices/aliases are
desired.

The new alias property, `is_hidden`, is implemented similarly to
`is_write_index`, except that it must be consistent across all indices
with a given alias - that is, all indices with a given alias must
specify the alias as either hidden, or all specify it as non-hidden,
either explicitly or by omitting the `is_hidden` property.
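The consistency rule above can be sketched as a check over the per-index values of the property, assuming an omitted `is_hidden` is represented as `null`. `HiddenAliasValidator` is hypothetical, not Elasticsearch's actual validation code:

```java
import java.util.Map;

/** Hypothetical check: an alias must be hidden on all of its indices or on none. */
final class HiddenAliasValidator {
    /**
     * @param isHiddenByIndex index name to the alias's is_hidden value on that index,
     *                        or null when the property was omitted
     */
    static boolean isConsistent(Map<String, Boolean> isHiddenByIndex) {
        return isHiddenByIndex.values().stream()
            .map(v -> Boolean.TRUE.equals(v)) // null (omitted) and false both mean "not hidden"
            .distinct()
            .count() <= 1;
    }
}
```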
2020-03-06 16:02:38 -07:00
Nik Everett 7c9641ef9d
Simplify BucketedSort (#53199) (#53240)
Our lovely `BitArray` compactly stores "flags", lazily growing its
underlying storage. It is super useful when you need to store one bit of
data for a zillion buckets or documents or something. Usefully, it
defaults to `false`. But there is a wrinkle! If you ask it whether or
not a bit is set but it hasn't grown its underlying storage array
"around" that index then it'll throw an `ArrayIndexOutOfBoundsException`.
The per-document use cases tend to show up in order and don't tend to
mind this too much. But the use case in aggregations, the per-bucket use
case, does, because buckets are collected out of order all the time.

This changes `BitArray` so it'll return `false` if the index is too big
for the underlying storage. After all, that index *can't* have been set
or else we would have grown the underlying array. Logically, I believe
this makes sense. And it makes my life easy. At the cost of three lines.

*But* this adds an extra test to every call to `get`. I think this is
likely ok because it is "very close" to an array index lookup that
already runs the same test. So I *think* it'll end up merged with the
array bounds check.
2020-03-06 15:27:51 -05:00
Jay Modi a81460dbf5
Make watch history indices hidden (#52974)
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.

Relates #50251
Backport of #52962
2020-03-06 09:47:03 -07:00
Christoph Büscher 9e561c2921 Fix AbstractBulkByScrollRequest slices parameter via Rest (#53068)
Currently the AbstractBulkByScrollRequest accepts slice values of 0 via its
`setSlices` method, denoting the "auto" slicing behaviour that is usable by
setting the "slices=auto" parameter on rest requests. When using the High Level
Rest Client, however, we send the 0 value as an integer, which is then rejected
as invalid by `AbstractBulkByScrollRequest#parseSlices`. Instead of making
parsing of the rest request more lenient, this PR opts for changing the
RequestConverter logic in the client to translate 0 values to "auto" on the rest
requests.
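The client-side translation amounts to a small mapping. `SlicesParam` and `AUTO_SLICES` are hypothetical names for illustration, not the real `RequestConverters` code:

```java
/** Hypothetical helper sketching the client-side translation of the slices value. */
final class SlicesParam {
    // Assumption: 0 is how the request object encodes "auto" slicing, as described above.
    static final int AUTO_SLICES = 0;

    static String toRestParam(int slices) {
        // The REST layer rejects "0" but accepts "auto", so translate before sending.
        return slices == AUTO_SLICES ? "auto" : Integer.toString(slices);
    }
}
```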

Closes #53044
2020-03-06 15:38:04 +01:00
William Brafford d145b5536f
Serialize NodesInfoRequest as a set of strings (#53140) (#53202)
For Node Info to be pluggable, NodesInfoRequest must be able to carry
arbitrary strings. This commit reworks the internals of that class to
use a set rather than hard-coded boolean fields.

NodesInfoRequest defaults to specifying all values. We test for
this behavior as we refactor and use random testing for the
various combinations of metrics.

Add backwards compatibility for transport requests.
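A minimal sketch of the storage change, assuming metrics are identified by string names. `MetricsRequest` and the metric names listed are illustrative only, not the real `NodesInfoRequest` API. A set of strings can later carry metrics contributed by plugins, which a fixed list of boolean fields cannot:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical request storing requested metrics as strings instead of one boolean each. */
final class MetricsRequest {
    // Illustrative metric names; the real request defines its own set.
    static final Set<String> ALL_METRICS = Collections.unmodifiableSet(
        new TreeSet<>(Arrays.asList("http", "jvm", "os", "plugins", "process", "thread_pool")));

    // Defaults to all metrics, matching the behavior described above.
    private final Set<String> requested = new TreeSet<>(ALL_METRICS);

    MetricsRequest clear() {
        requested.clear();
        return this;
    }

    MetricsRequest addMetric(String metric) {
        if (ALL_METRICS.contains(metric) == false) {
            throw new IllegalArgumentException("unknown metric [" + metric + "]");
        }
        requested.add(metric);
        return this;
    }

    Set<String> requestedMetrics() {
        return Collections.unmodifiableSet(requested);
    }
}
```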
2020-03-06 09:07:49 -05:00
Marios Trivyzas 7ddbda4c20
Check for query cancellation during rewrite (#53166) (#53203)
With ExitableDirectoryReader in place, check for query cancellation
during QueryPhase#preProcess where the query rewriting takes place.

Follows: #52822

(cherry picked from commit 0d38626d8e6e9e2620a7a446b617a2ac42852461)
2020-03-06 11:04:01 +01:00
Alan Woodward c204137451 Deprecate BoolQueryBuilder's mustNot field (#53125)
The bool query builder in elasticsearch accepts both must_not and mustNot
fields. Given that leniency is abhorrent and must be eschewed, we should deprecate
the latter as it doesn't fit with the style of parameters elsewhere in the DSL.
2020-03-06 09:11:34 +00:00
Henning Andersen 2e924e4a83 Fix ClusterDisruptionIT.testAckedIndexing (#53169)
Use assertBusy when doing reroute after bridged disruption,
since it can return non-acked if a node is marked faulty
by follower check after disruption ended.

Closes #53064
2020-03-06 08:56:55 +01:00
Nhat Nguyen 5476a49833 Revert "upgrade to lucene-snapshot-fa75139efea (#53150) (#53151)"
This reverts commit 058113aa42.
2020-03-05 17:33:00 -05:00
Nhat Nguyen d456e8ffca Revert "Mute InternalEngineTests.testVersionOnPrimaryWithConcurrentRefresh"
This reverts commit 66788afa67.
2020-03-05 17:32:18 -05:00
Nhat Nguyen e9e209ae58 Revert "Mute InternalEngineTests.testRandomOperations"
This reverts commit d1cc2e68d5.
2020-03-05 17:32:11 -05:00
Nhat Nguyen dc78cc6131 Revert "Mute InternalEngineTests.testForceMergeWithSoftDeletesRetentionAndRecoverySource"
This reverts commit da8aac9e66.
2020-03-05 17:31:56 -05:00
Nhat Nguyen f11ae5fd14 Revert "Mute GatewayMetaStatePersistedStateTests.testDataOnlyNodePersistence"
This reverts commit 4452addf10.
2020-03-05 17:31:38 -05:00
James Baiera 4452addf10 Mute GatewayMetaStatePersistedStateTests.testDataOnlyNodePersistence 2020-03-05 16:44:03 -05:00
James Baiera da8aac9e66 Mute InternalEngineTests.testForceMergeWithSoftDeletesRetentionAndRecoverySource 2020-03-05 15:55:50 -05:00
James Baiera d1cc2e68d5 Mute InternalEngineTests.testRandomOperations 2020-03-05 15:09:47 -05:00
James Baiera 66788afa67 Mute InternalEngineTests.testVersionOnPrimaryWithConcurrentRefresh 2020-03-05 15:09:47 -05:00
Mayya Sharipova 7e2a9f58ee
script_score query errors on negative scores (#53133)
7.5 and 7.6 had a regression that allowed for
script_score queries to have negative scores.
We have corrected this regression in #52478.
This is an addition to #52478 that adds
a test and release notes.
2020-03-05 14:23:39 -05:00
Marios Trivyzas 487d442760
Implement Exitable DirectoryReader (#52822) (#53162)
Implement an Exitable DirectoryReader that wraps the original
DirectoryReader so that when a search task is cancelled the
DirectoryReaders also stop their work fast. This is useful for
expensive operations like wildcard/prefix queries where the
DirectoryReaders can spend lots of time and consume resources,
as previously their work wouldn't stop even though the original
search task was cancelled (e.g. because of timeout or dropped client
connection).

(cherry picked from commit 67acaf61f33bc5f54e26541514d07e375c202e03)
2020-03-05 14:17:31 +01:00
Nik Everett 28df7ae5ed
Support multiple metrics in `top_metrics` agg (backport of #52965) (#53163)
This adds support for returning multiple metrics to the `top_metrics`
agg. It looks like:
```
POST /test/_search?filter_path=aggregations
{
  "aggs": {
    "tm": {
      "top_metrics": {
        "metrics": [
          {"field": "v"},
          {"field": "m"}
        ],
        "sort": {"s": "desc"}
      }
    }
  }
}
```
2020-03-05 08:12:01 -05:00
Alan Woodward 3cd4b97618 Remove UnknownNamedObjectException (#53105)
This was originally thrown from NamedXContentRegistry#parseNamedObject() but
that method now throws a NamedObjectNotFoundException, so this is unused.
2020-03-05 10:06:59 +00:00
Ignacio Vera 058113aa42
upgrade to lucene-snapshot-fa75139efea (#53150) (#53151) 2020-03-05 10:04:05 +01:00
Nik Everett 302980e0c4
Remove some ceremony in agg parsing (#53078) (#53117)
With #50871 aggregations should now be parsed directly by an
`ObjectParser` or `ConstructingObjectParser` without the need for the
ceremonial `parse` method. This removes 9 of those `parse` methods and
parses the aggregation directly from their `ObjectParser`.
2020-03-04 13:06:41 -05:00
Tim Brooks f68917160e
Fix RemoteConnectionManager size() method (#52823)
Currently the remote connection manager will delegate the size() call to
the underlying cluster connection manager. This introduces the
possibility that the call will return 1 before the nodeConnection method has
been triggered to add the connection to the remote connection list. This
can cause issues, as the ensureConnected method checks the connection
manager's size and executes synchronously if the size is > 0. This leads
to a potential cluster not connected exception while we are still
waiting for the connection opened callback to be triggered.

This commit fixes this issue by using the remote connection manager's
size to report the connection manager's size.

Fixes #52029.
2020-03-04 09:53:22 -07:00
Yannick Welsch 8ab74fea58
[7.x] Add 7.6.2 as version (#53114) 2020-03-04 10:39:09 -06:00
Jake Landis f08ed1f69a
[7.x] add 6.8.8 as version (#53021) 2020-03-04 10:38:07 -06:00
Alan Woodward dfebbbf862 BoolQueryBuilder uses ObjectParser (#52880)
This commit removes the hand-rolled x-content parsing logic from BoolQueryBuilder
and instead uses an ObjectParser to handle parsing. It also removes the long-deprecated
(since version 6) disable_coord parameter.
2020-03-04 15:48:38 +00:00
Zachary Tong 3fcf598b92 Reduce deprecation log noise from DateIntervalWrapper (#52655)
Converts the deprecations to `deprecatedAndMaybeLog` to reduce the
number of times we log deprecations, since some of these could be called
at a high frequency (due to unconverted queries, aggs, etc)
2020-03-03 17:08:10 -05:00
Jay Modi c610e0893d
Introduce system index APIs for Kibana (#53035)
This commit introduces a module for Kibana that exposes REST APIs that
will be used by Kibana for access to its system indices. These APIs are wrapped
versions of the existing REST endpoints. A new setting is also introduced since
the Kibana system indices' names are allowed to be changed by a user in case
multiple instances of Kibana use the same instance of Elasticsearch.

Additionally, the ThreadContext has been extended to indicate that the use of
system indices may be allowed in a request. This will be built upon in the future
for the protection of system indices.

Backport of #52385
2020-03-03 14:11:36 -07:00
Nik Everett 7339427af5
Remove some deprecation warnings parsing aggs (backport of #53026) (#53072)
With #50871 aggregations should now be parsed directly by an
`ObjectParser` or `ConstructingObjectParser` without the need for the
ceremonial `parse` method. This removes 10 of those `parse` methods and
parses the aggregation directly from their `ObjectParser`.
2020-03-03 15:27:49 -05:00
Luca Cavanna 8a05b670ca
Address MinAndMax generics warnings (#52642)
`MinAndMax` encapsulates min and max values for a shard. It uses generics to make sure that the values are of the same type and are also comparable. However, every current use of this class produces generics warnings, which this commit addresses.

Relates to #49092
2020-03-03 16:08:10 +01:00
Adrien Grand cb868d2f5e
Introduce a `constant_keyword` field. (#49713) (#53024)
This field is a specialization of the `keyword` field for the case when all
documents have the same value. It typically performs more efficiently than
keywords at query time by figuring out whether all or none of the documents
match at rewrite time, like `term` queries on `_index`.

The name is up for discussion. I liked including `keyword` in it, so that we
still have room for a `singleton_numeric` in the future. However I'm unsure
whether to call it `singleton`, `constant` or something else, any opinions?

For this field there is a choice between
 1. accepting values in `_source` when they are equal to the value configured
    in mappings, but rejecting mapping updates
 2. rejecting values in `_source` but then allowing updates to the value that
    is configured in the mapping
This commit implements option 1, so that it is possible to reindex from/to an
index that has the field mapped as a keyword with no changes to the source.
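The rewrite-time shortcut and option 1 can be sketched as follows. `ConstantKeywordField` and its methods are hypothetical simplifications, not the real field mapper. Because every document holds the same value, a term query resolves to match-all or match-none without visiting any document:

```java
import java.util.Objects;

/** Hypothetical sketch of a field whose value is identical for every document in the index. */
final class ConstantKeywordField {
    enum Rewritten { MATCH_ALL, MATCH_NONE }

    private final String value;

    ConstantKeywordField(String value) {
        this.value = Objects.requireNonNull(value);
    }

    /** A term query is answered at rewrite time, like a term query on `_index`. */
    Rewritten rewriteTermQuery(String term) {
        return value.equals(term) ? Rewritten.MATCH_ALL : Rewritten.MATCH_NONE;
    }

    /** Option 1: accept a value in `_source` only if it equals the mapped value. */
    void validateDocumentValue(String docValue) {
        if (value.equals(docValue) == false) {
            throw new IllegalArgumentException(
                "value [" + docValue + "] does not match the configured constant [" + value + "]");
        }
    }
}
```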

Backport of #49713
2020-03-03 16:01:47 +01:00
Jason Tedor a154f9c657
Early return if no global checkpoint listeners (#53036)
When notifying global checkpoint listeners, we have an opportunity to
early return if there are not any registered listeners. This is
important since it saves some allocations, and also saves forking some
empty work to another thread. This commit adds an early return from
notifying listeners if there are not any registered.
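The early return can be sketched with a hypothetical listener registry; `CheckpointListeners` is not the real `GlobalCheckpointListeners` class. When nothing is registered, no snapshot list is allocated and no task is handed to the executor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.LongConsumer;

/** Hypothetical registry illustrating the early return before forking. */
final class CheckpointListeners {
    private final List<LongConsumer> listeners = new ArrayList<>();
    private final Executor executor;
    private int forkedTasks = 0; // how often work was handed to the executor

    CheckpointListeners(Executor executor) {
        this.executor = executor;
    }

    synchronized void add(LongConsumer listener) {
        listeners.add(listener);
    }

    synchronized int forkedTasks() {
        return forkedTasks;
    }

    void notifyListeners(long globalCheckpoint) {
        final List<LongConsumer> snapshot;
        synchronized (this) {
            if (listeners.isEmpty()) {
                return; // early return: no copy allocated, no empty task forked
            }
            snapshot = new ArrayList<>(listeners);
            forkedTasks++;
        }
        executor.execute(() -> snapshot.forEach(listener -> listener.accept(globalCheckpoint)));
    }
}
```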
2020-03-02 23:28:22 -05:00
Stuart Tettemer 210aab0935
Settings: AffixSettings as validator dependencies (#52973) (#52982)
Allow AffixSetting as validator dependencies.  If a validator
specifies AffixSettings as a dependency, then `validate(T, Map)`
will have the concrete setting in a map.

Backport of: #52973, 1e0ba70
Fixes: #52933
2020-02-29 09:38:46 -07:00
Nhat Nguyen e6755afeeb
Upgrade to Lucene 8.5.0-snapshot-c4475920b08 (#52950) (#52977)
To give LUCENE-9228 more CI cycles
2020-02-29 09:29:16 -05:00
Jay Modi 1cd0eee723
Remove TODO in IndexNameExpressionResolver (#52969)
This commit removes a TODO in the IndexNameExpressionResolver that
indicated the API should use a Set instead of a List. However, this
TODO was not completely correct since the ordering of arguments matters
due to negations when evaluating wildcards and since we also allow
a list of patterns like `*,-foo,*`, which would have a different
meaning even when using a Set with insertion ordering.
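Why order and duplicates matter can be shown with a deliberately simplified resolver. `OrderedPatternResolver` is hypothetical and supports only `*`, exact names, and `-` exclusions applied left to right; it is not `IndexNameExpressionResolver`:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical resolver: "*", exact names, and "-" exclusions, evaluated in order. */
final class OrderedPatternResolver {
    static List<String> resolve(List<String> patterns, List<String> indices) {
        Set<String> result = new LinkedHashSet<>();
        for (String pattern : patterns) {
            boolean exclude = pattern.startsWith("-");
            String expr = exclude ? pattern.substring(1) : pattern;
            for (String index : indices) {
                if (expr.equals("*") || expr.equals(index)) {
                    if (exclude) {
                        result.remove(index); // only removes what earlier patterns added
                    } else {
                        result.add(index);
                    }
                }
            }
        }
        return new ArrayList<>(result);
    }
}
```

With indices `bar` and `foo`, the list `*,-foo,*` resolves to both indices because the trailing `*` re-adds `foo`, while collapsing the patterns into a set first leaves only `*,-foo` and resolves to `bar` alone.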

Relates #52788
Backport of #52963
2020-02-28 13:56:28 -07:00
Adrien Grand 331d4bb0af
HybridDirectory should mmap postings. (#52641) (#52873)
Since version 8.4, `MMapDirectory` has an optimization to read long[]
arrays directly in little endian order, which postings leverage. So it'd
be more efficient to open postings with `MMapDirectory`.

I refactored the existing logic a bit to better explain why every listed
file extension is opened with `mmap`.
2020-02-28 18:45:46 +01:00
Martijn van Groningen 6aa9aaa2c6
Add validation for dynamic templates (#52890)
Backport of #51233 to the seven dot x branch.

Tries to load a `Mapper` instance for the mapping snippet of a dynamic template.
This should catch things like using an analyzer that is undefined or mapping attributes that are unused.

This is best effort:
* If `{{name}}` placeholder is used in the mapping snippet then validation is skipped.
* If `match_mapping_type` is not specified then validation is performed for all mapping types.
  If parsing succeeds with a single mapping type then the dynamic mapping is considered valid.

If it is detected that a dynamic template mapping snippet is invalid at mapping update time, then the mapping update fails for indices created on 8.0.0-alpha1 and later. For indices created on prior versions a deprecation warning is emitted instead. In 7.x clusters the mapping update will never fail in case of an invalid dynamic template mapping snippet, and a deprecation warning will always be emitted.

Closes #17411
Closes #24419

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2020-02-28 10:35:04 +01:00
Nik Everett 407101c39b
Clean and document sorting with partially built buckets (backport of #52769) (#52925)
The `terms` aggregation can be sorted by the results of its
sub-aggregations. Because it uses that sorting for filtering to the
top-n it tries not to construct all of the buckets for the child
aggregations. This has its own interesting problems around reduction, but
they aren't super relevant to this change. This change moves that
optimization from the `TermsAggregator` and into the aggregators being
sorted on. This should make it more clear what is going on and it
unifies this optimization with validating the sort.

Finally, this should enable some minor optimizations to save a few
comparisons when sorting multi-valued buckets. I'll get those in a
follow up because they are now *fairly* obvious. They probably won't be
a huge performance improvement, but it'll be nice anyway.
2020-02-27 17:50:55 -05:00
Nik Everett 1d1956ee93
Add size support to `top_metrics` (backport of #52662) (#52914)
This adds support for returning the top "n" metrics instead of just the
very top.

Relates to #51813
2020-02-27 16:12:52 -05:00
Lee Hinman e139d70abe Remove TODO in MaxSizeCondition (#52854)
Similar to what we did in #52794, this removes the TODO.

Relates again to #52505
2020-02-27 09:29:12 -07:00
Dan Hermann 3c8b46a8c1
[7.x] Handle errors when evaluating if conditions in processors (#52892) 2020-02-27 09:00:51 -06:00
hezhen Zhang 280d59c724 Append index name for the source of the cluster put-mapping task (#52690)
Add index name(s) into the source for the cluster state update done when putting mapping.
This ensures that the pending tasks API includes information on source indices.
2020-02-27 12:16:24 +01:00
David Turner 52fa465300
Cache completion stats between refreshes (#52872)
Computing the stats for completion fields may involve a significant amount of
work since it walks every field of every segment looking for completion fields.
Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for
every shard in the cluster. This repeated work is unnecessary since these stats
do not change between refreshes; in many indices they remain constant for a
long time.

This commit introduces a cache for these stats which is invalidated on a
refresh, allowing most stats calls to bypass the work needed to compute them on
most shards.
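The invalidate-on-refresh idea can be sketched with a small synchronized cache. `RefreshScopedCache` is hypothetical and simpler than a production version, which would also avoid duplicate concurrent computations:

```java
import java.util.function.Supplier;

/** Hypothetical cache: computes the value at most once between refreshes. */
final class RefreshScopedCache<T> {
    private final Supplier<T> compute;
    private T cached; // null means "invalidated"

    RefreshScopedCache(Supplier<T> compute) {
        this.compute = compute;
    }

    synchronized T get() {
        if (cached == null) {
            cached = compute.get(); // the expensive walk over segments happens here
        }
        return cached;
    }

    /** Called after a refresh, since the stats can only change when the searcher changes. */
    synchronized void onRefresh() {
        cached = null;
    }
}
```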

Closes #51915
Backport of #51991
2020-02-27 10:01:24 +00:00
Nhat Nguyen 814c275f35 Add more assertions to testMaybeFlush (#52792)
We aren't able to reproduce this failure or figure out its cause.
This commit adds more assertions so we can narrow the scope.

Relates #52223
2020-02-26 17:08:18 -05:00
Nhat Nguyen 0a15a6bfad Fix testSeqNoCollision (#52588)
Adjusts the assertion as we trim translog more eagerly since #52556.

Relates #52556
Closes #52148
2020-02-26 17:08:18 -05:00
Nhat Nguyen 87e765609e Fix testResyncAfterPrimaryPromotion (#52615)
Adjusts the assertion as we might eagerly clean up translog during resync since #52556

Relates #52556
Closes #52598
2020-02-26 17:08:18 -05:00
Nhat Nguyen 5aa612c275 Fix testRestoreLocalHistoryFromTranslog (#52441)
Asserts that no new operations are added to the translog since we re-opened the engine.

Relates #51905
Closes #52410
2020-02-26 17:08:18 -05:00
Nhat Nguyen a92bf5ec61 Fix IndexShardIT#testMaybeFlush (#52247)
Since #51905, we use the local checkpoint of the safe commit to
calculate the number of uncommitted operations in translog stats. If a
periodic flush triggered by afterWriteOperation completes before we sync
translog, then the last commit is not safe. We also need to sync
translog from Engine instead of the translog so that we can advance the
safe commit.

Relates #51905
Closes #52223
2020-02-26 17:08:18 -05:00
Nhat Nguyen d7fe135d90 Fix testPrepareIndexForPeerRecovery (#52245)
Since #51905, we skip translog recovery if the local checkpoint of the
safe commit equals the global checkpoint. This change adjusts the
test not to create a new snapshot in that case.

Closes #52221
Relates #51905
2020-02-26 17:08:18 -05:00
Yannick Welsch 82ab1bc1ff Separate translog from index deletion conditions (#52556)
Separates the translog from the index deletion conditions (allowing the translog to be cleaned
up more eagerly), and avoids taking the write lock on the translog if no clean-up is actually
necessary.
2020-02-26 17:08:18 -05:00
Nhat Nguyen db6b9c21c7 Use local checkpoint to calculate min translog gen for recovery (#51905)
Today we use the translog_generation of the safe commit as the minimum
required translog generation for recovery. This approach has a
limitation, where we won't be able to clean up translog unless we flush.
Reopening an already recovered engine will create a new empty translog,
and we leave it there until we force flush.

This commit removes the translog_generation commit tag and uses the
local checkpoint of the safe commit to calculate the minimum required
translog generation for recovery instead.
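A sketch of the calculation, assuming the highest sequence number stored in each translog generation is known. `GenerationInfo` and `TranslogRetention` are hypothetical names. Recovery from the safe commit must replay every operation above the commit's local checkpoint, so the oldest generation that may hold such an operation is the minimum one to keep; if none qualifies, nothing older than the current generation is needed, which is the case of a freshly reopened engine:

```java
import java.util.List;

/** Hypothetical per-generation metadata: the highest sequence number in that generation. */
final class GenerationInfo {
    final long generation;
    final long maxSeqNo;

    GenerationInfo(long generation, long maxSeqNo) {
        this.generation = generation;
        this.maxSeqNo = maxSeqNo;
    }
}

final class TranslogRetention {
    /** Oldest generation that may contain an operation above the safe commit's local checkpoint. */
    static long minGenerationForRecovery(List<GenerationInfo> generations,
                                         long localCheckpointOfSafeCommit,
                                         long currentGeneration) {
        long min = currentGeneration;
        for (GenerationInfo info : generations) {
            if (info.maxSeqNo > localCheckpointOfSafeCommit) {
                min = Math.min(min, info.generation);
            }
        }
        return min;
    }
}
```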

Closes #49970
2020-02-26 17:08:18 -05:00
Dan Hermann 3ffd34617f
Switch to AtomicLong for ingestCurrent metric to prevent negative values (#52581) (#52834) 2020-02-26 13:26:26 -06:00
Jay Modi 07ef8ccff4
Allow dynamic updates for index.hidden setting (#52837)
This commit changes the `index.hidden` setting from being final to a
dynamic setting. While the setting being final allows for easier
reasoning about an index, making this setting update-able has more
benefits in that we can upgrade existing indices to be hidden and it
will enable future features that would dynamically make indices hidden.

Backport of #52772
2020-02-26 11:46:29 -07:00
Nik Everett bfaa487757
Switch pipeline agg parsing to ContextParser (#52776) (#52832)
We've pretty well settled on `ContextParser` for a generic interface to
`ObjectParser`-like-things. This switches the interface used for
building parsing pipeline aggregations to `ContextParser` which saves a
couple of little wrappers around `ObjectParser`.
2020-02-26 12:57:20 -05:00
Tim Brooks be8d704e2b
Remove seeds dependency for remote cluster settings (#52829)
Currently 3 remote cluster settings (ping interval, skip unavailable,
and compression) have a dependency on the seeds setting being
configured. With proxy mode, it is now possible that these settings are
configured while the seeds setting is not. This commit removes this
dependency and adds new validation for these settings.
2020-02-26 10:17:25 -07:00
Adrien Grand 1807f86751
Generalize how queries on `_index` are handled at rewrite time (#52815)
Generalize how queries on `_index` are handled at rewrite time (#52486)

Since this change refactors rewrites, I also took it as an opportunity to address #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.

This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.

Closes #49254
2020-02-26 15:37:43 +01:00
Luca Cavanna 9e38125464
Clarify when shard iterators get sorted (#52810)
Currently we have two ways to create a GroupShardsIterator: one that will resort the iterators based on their natural ordering, and another one that will leave them in their original order. This is currently done through two constructors, one that accepts a single argument which does the sorting, and another which accepts a second boolean argument to control whether sorting should happen or not. This second constructor is only called externally to disable the sorting.

By introducing a specific method to create a sorted shard iterator we clarify and make it easier to track when we do sort and when we do not as the iterators are externally sorted.
2020-02-26 13:58:20 +01:00
Jim Ferenczi a73ad248e8
Fix backport of #46731 (#52744)
This change fixes the incomplete backport of #46731 in 7.x (as of 7.5).
We now check if `max_children` is set on the top level nested sort and fail with an
exception if it's not the case.

Relates #46731
Closes #52202
2020-02-26 10:46:51 +01:00
Sachin Frayne d3c0a2f013 Improve the error message when loading text fielddata. (#52753)
Emphasize keyword over fielddata as the preferred way to use String fields for aggregations or sorting.
2020-02-25 15:45:44 -08:00
Lee Hinman 662f21fcea Remove TODO in MaxAgeCondition serialization (#52794)
* Remove TODO in MaxAgeCondition serialization

This removes the TODO with a message for any future readers regarding the code in question.

Resolves #52505
2020-02-25 15:47:36 -07:00
Tim Brooks c8ef9649e2
Force execution of finish shard bulk request (#51957) (#52484)
Currently the shard bulk request can be rejected by the write threadpool
after a mapping update. This introduces a scenario where the mapping
listener thread will attempt to finish the request and fsync. This
thread can potentially be a transport thread. This commit fixes this
issue by forcing the finish action to happen on the write threadpool.

Fixes #51904.
2020-02-25 14:37:11 -07:00
Nhat Nguyen 848d3bc153 Revert "Fix testKeepTranslogAfterGlobalCheckpoint"
This reverts commit a88d54eb2d.
2020-02-25 14:12:35 -05:00
Nhat Nguyen a88d54eb2d Fix testKeepTranslogAfterGlobalCheckpoint
Read the last synced global checkpoint after flushing as
we might advance it during committing.

CI: https://gradle-enterprise.elastic.co/s/7o6qengg4gva2
2020-02-25 11:49:24 -05:00
Alan Woodward 638f3e4183 Use ByteBuffersDirectory rather than RAMDirectory (#52768)
Lucene's RAMDirectory has been deprecated. This commit replaces all uses of
RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses
are in tests, but the percolator and painless executor may get some small speedups.
2020-02-25 15:46:35 +00:00
Alan Woodward 18663b0a85 Don't index ranges including NOW in percolator (#52748)
Currently, date range queries using NOW-based date math are rewritten to
MatchAllDocs queries when being preprocessed for the percolator. However,
since we added the verification step, this can result in incorrect matches when
percolator queries are run without scores. This commit changes things to instead
wrap date queries that use NOW with a new DateRangeIncludingNowQuery.
This is a simple wrapper query that returns its delegate at rewrite time, but it can
be detected by the percolator QueryAnalyzer and be dealt with accordingly.

This also allows us to remove a method on QueryRewriteContext, and push all
logic relating to NOW-based ranges into the DateFieldMapper.

Fixes #52617
2020-02-25 12:18:16 +00:00
Ryan Ernst 5fba8cbc7b Rename local Environment var in Node to avoid confusion (#52602)
When the Node class is being constructed, an initial environment is
passed in with the initial settings for the node. Once the plugin
service is initialized, the final Environment+Settings are created, at
which point the initial environment should no longer be used. This
commit renames the constructor arg to avoid naming clashes with the
final environment variable.
2020-02-24 11:14:46 -08:00
Lee Hinman 7d9de8412a
[7.x] fix npe in RestPluginsAction (#52620) (de56de9a) (#52721)
Relates #45321

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Kaihong.Wang <kyra.wkh@alibaba-inc.com>
2020-02-24 11:57:01 -07:00
Mayya Sharipova 034b1c0ba3
Correct boost calculation in script_score query (#52478) (#52724)
Previously, the boost in a script_score query was wrongly applied only to the subquery.
This commit makes sure that the boost is applied to the whole score
that comes out of the script.
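The difference matters whenever the script is not linear in `_score`. A small illustration, using a hypothetical script `log(1 + _score)` represented as a Java function; `BoostPlacement` is not part of Elasticsearch:

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrates why the boost position matters for a non-linear script. */
final class BoostPlacement {
    // Hypothetical example script: log(1 + _score).
    static final DoubleUnaryOperator SCRIPT = s -> Math.log1p(s);

    /** Old behavior: the boost is applied to the subquery score before the script runs. */
    static double boostedSubquery(double subScore, double boost) {
        return SCRIPT.applyAsDouble(subScore * boost);
    }

    /** Corrected behavior: the boost multiplies the score produced by the script. */
    static double boostedResult(double subScore, double boost) {
        return boost * SCRIPT.applyAsDouble(subScore);
    }
}
```

For a subquery score of `e - 1` and a boost of 2, the corrected placement yields exactly 2, while the old placement yields `ln(2e - 1)`, roughly 1.49.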

Closes #48465
2020-02-24 13:48:21 -05:00
Adrien Grand f993ef80f8
Move the terms index of `_id` off-heap. (#52518)
In #42838 we moved the terms index of all fields off-heap except the
`_id` field because we were worried it might make indexing slower. In
general, the indexing rate is only affected if explicit IDs are used, as
otherwise Elasticsearch almost never performs lookups in the terms
dictionary for the purpose of indexing. So it's quite wasteful to
require the terms index of `_id` to be loaded on-heap for users who have
append-only workloads. Furthermore I've been conducting benchmarks when
indexing with explicit ids on the http_logs dataset that suggest that
the slowdown is low enough that it's probably not worth forcing the terms
index to be kept on-heap. Here are some numbers for the median indexing
rate in docs/s:

| Run | Master  | Patch   |
| --- | ------- | ------- |
| 1   | 45851.2 | 46401.4 |
| 2   | 45192.6 | 44561.0 |
| 3   | 45635.2 | 44137.0 |
| 4   | 46435.0 | 44692.8 |
| 5   | 45829.0 | 44949.0 |

And now heap usage in MB for segments:

| Run | Master  | Patch    |
| --- | ------- | -------- |
| 1   | 41.1720 | 0.352083 |
| 2   | 45.1545 | 0.382534 |
| 3   | 41.7746 | 0.381285 |
| 4   | 45.3673 | 0.412737 |
| 5   | 45.4616 | 0.375063 |

Indexing rate decreased by 1.8% on average, while memory usage decreased
by more than 100x.
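
Both summary figures can be rechecked from the tables. The following Java snippet is only a sanity check of the arithmetic, and its class and method names are made up for illustration. It averages the five runs in each column:

```java
import java.util.Arrays;

// Recomputes the summary figures from the two benchmark tables.
// Class and method names are illustrative only.
final class BenchmarkSummary {
    // Median indexing rate per run, in docs/s.
    static final double[] MASTER_RATE = {45851.2, 45192.6, 45635.2, 46435.0, 45829.0};
    static final double[] PATCH_RATE = {46401.4, 44561.0, 44137.0, 44692.8, 44949.0};
    // Segment heap usage per run, in MB.
    static final double[] MASTER_HEAP_MB = {41.1720, 45.1545, 41.7746, 45.3673, 45.4616};
    static final double[] PATCH_HEAP_MB = {0.352083, 0.382534, 0.381285, 0.412737, 0.375063};

    static double mean(double[] values) {
        return Arrays.stream(values).average().getAsDouble();
    }

    // Relative drop of the mean indexing rate, in percent.
    static double rateDecreasePercent() {
        double master = mean(MASTER_RATE);
        return 100.0 * (master - mean(PATCH_RATE)) / master;
    }

    // Ratio of mean segment heap usage, master divided by patch.
    static double heapReductionFactor() {
        return mean(MASTER_HEAP_MB) / mean(PATCH_HEAP_MB);
    }
}
```

With the table values, the mean rate is about 45,789 docs/s on master and about 44,948 docs/s with the patch, a drop of roughly 1.8%. Mean segment heap usage shrinks by a factor of roughly 115.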

The `http_logs` dataset contains small documents and has a simple
indexing chain. More complex indexing chains, e.g. with more fields or
ingest pipelines, would see an even smaller relative decrease in indexing rate.
2020-02-24 18:14:12 +01:00
Alan Woodward 7dc41a3b83 Use BoostQuery rather than FunctionScoreQuery for query-time indices_boost (#52272)
This is a trivial change, but it should result in a slightly more efficient query boost.
2020-02-24 14:41:46 +00:00
Nik Everett d26d7721ea
Continue realizing sorting by aggregations (backport of #52298) (#52667)
This drops more of the `instanceof`s from `AggregationPath`. There are
still a couple in `AggregationPath`. And I ended up moving two into
`BucketsAggregator`, but I think this is still an improvement!
2020-02-23 17:13:55 -05:00
bellengao 02cb5b6c0e Return 429 status code on read_only_allow_delete index block (#50166)
We consider index-level read_only_allow_delete blocks temporary since
the DiskThresholdMonitor can automatically release them when an index
is no longer allocated on nodes above the high disk watermark.

The REST status has therefore been changed to 429 when encountering this
index block, to signal retryability to clients.
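
To show what "retryable" means for a client, here is a hypothetical sketch. It is not an Elasticsearch client API. It retries while the response status is 429 and doubles the delay between attempts. The `IntSupplier` stands in for sending a request and returning its HTTP status, and the backoff policy is an assumption:

```java
import java.util.function.IntSupplier;

// Hypothetical client-side helper, not part of any Elasticsearch client.
final class RetryOn429 {
    static final int TOO_MANY_REQUESTS = 429;

    // Sends the request up to maxAttempts times while the status is 429,
    // sleeping between attempts and doubling the delay each time.
    static int execute(IntSupplier request, int maxAttempts, long initialBackoffMillis) {
        long backoffMillis = initialBackoffMillis;
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == TOO_MANY_REQUESTS; attempt++) {
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return status; // stop retrying and report the last status seen
            }
            backoffMillis *= 2;
            status = request.getAsInt();
        }
        return status; // any non-429 status is treated as final
    }
}
```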

Related to #49393
2020-02-22 16:24:25 +01:00
Jay Modi 8abfda0b59
Rename assertThrows to prevent naming clash (#52651)
This commit renames ElasticsearchAssertions#assertThrows to
assertRequestBuilderThrows and assertFutureThrows to avoid a
naming clash with JUnit 4.13+ and static imports of these methods.
Additionally, these methods have been updated to make use of
expectThrows internally to avoid duplicating the logic there.

Relates #51787
Backport of #52582
2020-02-21 13:30:11 -07:00
Stuart Tettemer 376932a47d
Scripting: split out compile limits and caching (#52498) (#52652)
Phase 1 of adding compilation limits per context.
* Refactor rate limiting and caching into a separate class,
  `ScriptCache`, which will be used per context.
* Disable compilation limit for certain tests.

Backport of 0866031
Refs: #50152
2020-02-21 12:10:51 -07:00
Jay Modi f3f6ff97ee
Single instance of the IndexNameExpressionResolver (#52604)
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.

In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.

Backport of #52596
2020-02-21 07:50:02 -07:00
markharwood 96d603979b
Upgrade Lucene to 8.5.0-snapshot-b01d7cb (#52584)
Upgrade 7.x to the same Lucene 8.5 version used in master.
2020-02-21 10:25:03 +00:00
Armin Braun 0a09e15959
Add Caching for RepositoryData in BlobStoreRepository (#52341) (#52566)
Cache the latest `RepositoryData` on heap when it's absolutely safe to do so (i.e. when the repository is in strictly consistent mode).

`RepositoryData` can safely be assumed not to grow to a size that would cause trouble, because we often have at least two copies of it loaded at the same time when doing repository operations. Also, concurrent snapshot status API requests currently load it independently of each other, and so on. This makes it safe, IMO, to cache it on heap and treat it as "small".

The benefits of this move are:
* Much faster repository status API calls
   * listing all snapshot names becomes instant
   * Other operations are sped up massively too because they mostly operate in two steps: load repository data then load multiple other blobs to get the additional data
* Additional cloud cost savings
* Better resiliency, saving another spot where an IO issue could break the snapshot
* In follow-ups, we can simplify a number of spots in the current code that pass the repository data around in tricky ways to avoid loading it multiple times.
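
Below is a minimal sketch of the caching idea, under stated assumptions. The class, its generic data type, and the `loader` callback are hypothetical and do not reflect the actual `BlobStoreRepository` code. The cache keeps only the newest generation it has loaded, and it is bypassed entirely unless the repository is strictly consistent:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongFunction;

// Hypothetical sketch, not the BlobStoreRepository implementation.
final class LatestRepositoryDataCache<T> {
    // Immutable pair of a repository generation and the data loaded for it.
    private static final class Entry<D> {
        final long generation;
        final D data;

        Entry(long generation, D data) {
            this.generation = generation;
            this.data = data;
        }
    }

    private final boolean strictlyConsistent;
    private final LongFunction<T> loader; // reads the data for a generation from the blob store
    private final AtomicReference<Entry<T>> latest = new AtomicReference<>();

    LatestRepositoryDataCache(boolean strictlyConsistent, LongFunction<T> loader) {
        this.strictlyConsistent = strictlyConsistent;
        this.loader = loader;
    }

    T get(long generation) {
        Entry<T> cached = latest.get();
        if (strictlyConsistent && cached != null && cached.generation == generation) {
            return cached.data; // served from heap, no blob store read
        }
        T loaded = loader.apply(generation);
        if (strictlyConsistent) {
            Entry<T> candidate = new Entry<>(generation, loaded);
            // Only ever replace the cached entry with a newer generation.
            latest.accumulateAndGet(candidate,
                (current, next) -> current == null || next.generation > current.generation ? next : current);
        }
        return loaded;
    }
}
```

In this sketch, the generation check is what keeps caching safe. A request for any other generation goes to the blob store, and an older generation never replaces a newer cached one.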
2020-02-21 10:20:07 +01:00
Armin Braun 4bb780bc37
Refactor Inflexible Snapshot Repository BwC (#52365) (#52557)
* Refactor Inflexible Snapshot Repository BwC (#52365)

Transport the version to use for a snapshot, instead of a flag for whether to use shard generations, in the snapshots-in-progress entry. This allows making upcoming repository metadata changes flexibly, analogous to how we handle serialization BwC elsewhere.
Also, exposing the version at the repository API level will make it easier to make BwC-relevant changes in derived repositories such as source-only or encrypted ones.
2020-02-21 09:14:34 +01:00
Ignacio Vera 107f00a4ec
Add support for multipoint geoshape queries (#52133) (#52553)
Currently, multi-point queries are not supported when data is indexed using the BKD-backed geoshape strategy. This commit removes this limitation.
2020-02-21 07:45:53 +01:00
Yannick Welsch d76358c875
Deprecate fixed_auto_queue_size thread pool type (#52399)
Relates #52280
2020-02-20 11:11:06 +01:00
Yannick Welsch 3afb5ca133 Fix synchronization in ByteSizeCachingDirectory (#52512)
One code path was synchronizing on the wrong object.
2020-02-19 16:10:39 +01:00
Przemysław Witek 7cd997df84
[ML] Make ml internal indices hidden (#52423) (#52509)
2020-02-19 14:02:32 +01:00
Ignacio Vera 8d2261fe47
Refactor GeoShapeIndexer by extracting polygon / line decomposers (#52422) (#52506)
Refactor GeoShapeIndexer by extracting Polygon and Line decomposers, which are in charge of splitting a shape around the dateline if needed.
2020-02-19 12:04:29 +01:00
Henning Andersen 9d40277d4c Deciders should not by default collect yes'es (#52438)
AllocationDeciders collected Yes decisions even when debug info was not
requested. This commit changes them to include Yes decisions only when
debug is requested (explain).
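
The filtering rule can be sketched as follows. The `Decision` and `Type` classes here are hypothetical stand-ins, not the real allocation `Decision` API. The sketch only shows which decisions are retained, not how the overall verdict is computed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for allocation decisions; not the Elasticsearch API.
final class DecisionCollector {
    enum Type { YES, THROTTLE, NO }

    static final class Decision {
        final String decider;
        final Type type;

        Decision(String decider, Type type) {
            this.decider = decider;
            this.type = type;
        }
    }

    // Non-YES decisions are always kept; YES decisions are kept only when
    // debug (explain) output was requested.
    static List<Decision> collect(List<Decision> decisions, boolean debug) {
        List<Decision> kept = new ArrayList<>();
        for (Decision decision : decisions) {
            if (debug || decision.type != Type.YES) {
                kept.add(decision);
            }
        }
        return kept;
    }
}
```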
2020-02-19 11:18:03 +01:00
Henning Andersen d4bc3b75dc Reindex: allow comma separated source indices (#52044)
Added the ability to specify a comma-separated list of source indices
without using an array. Also fixed an empty string so that it results in
a validation error rather than an "index does not exist" error.
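
Here is a hypothetical sketch of the parsing rules described above. It is not the actual reindex request parser, and the method name, the error messages, and the treatment of empty entries such as `a,` are assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parsing rules for a comma-separated source index value.
final class SourceIndices {
    static List<String> parse(String value) {
        // Reject an empty value up front with a validation error instead of
        // letting it fail later as a missing index.
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("source index must not be empty");
        }
        List<String> indices = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings, so "a," is rejected too.
        for (String index : value.split(",", -1)) {
            if (index.isEmpty()) {
                throw new IllegalArgumentException("empty index name in [" + value + "]");
            }
            indices.add(index);
        }
        return indices;
    }
}
```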

Closes #51949
2020-02-19 09:23:15 +01:00
David Turner baf184c93f Avoid using WindowsFS in ClusterRerouteIT (#52488)
Issue #52000 looks like a case of cluster state updates being slower than
expected, but it seems that these slowdowns are relatively rare: most
invocations of `testDelayWithALargeAmountOfShards` take well under a minute in
CI, but there are occasional failures that take 6+ minutes instead.  When it
fails like this, cluster state persistence seems generally slow: most writes
are slower than expected, with some small updates taking over 2 seconds to
complete.

The failures all have in common that they use `WindowsFS` to emulate Windows'
behaviour of refusing to delete files that are still open, by tracking all
files (really, inodes) and validating that deleted files are really closed
first. There is a suggestion that this is a little slow in the Lucene test
framework [1]. To see if we can attribute the slowdown to that common factor,
this commit suppresses the use of `WindowsFS` for this test suite.

[1] 4a513fa99f/lucene/test-framework/src/java/org/apache/lucene/util/TestRuleTemporaryFilesCleanup.java (L166)
2020-02-19 07:52:49 +00:00
Tim Brooks 8038f9bba6
Do not lock when generating time based uuid (#52436)
Currently we lock when generating time-based UUIDs. The lock exists to
prevent concurrent writes to the last timestamp. UUID generation is an
area of contention when indexing. This commit modifies the code to use
atomic compare-and-set operations to update the last timestamp.
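
Here is a simplified sketch of the lock-free approach, assuming the clock is supplied as a `LongSupplier`. The class and method names are hypothetical. The real generator also tracks a sequence number, which this sketch omits. Each caller reads the last timestamp and computes a strictly larger candidate. If another thread updated the value in between, the caller retries:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical, simplified sketch; not the Elasticsearch UUID generator.
final class MonotonicTimestamps {
    private final AtomicLong lastTimestamp = new AtomicLong();
    private final LongSupplier clock;

    MonotonicTimestamps(LongSupplier clock) {
        this.clock = clock;
    }

    long next() {
        while (true) {
            long last = lastTimestamp.get();
            // If the clock has not moved forward (or went backwards), bump by one.
            long candidate = Math.max(clock.getAsLong(), last + 1);
            if (lastTimestamp.compareAndSet(last, candidate)) {
                return candidate;
            }
            // Another thread updated the timestamp first; retry with its value.
        }
    }
}
```

Every successful `compareAndSet` strictly increases the stored value. As a result, no two callers can receive the same timestamp, even if the clock stalls or moves backwards.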
2020-02-18 09:55:51 -07:00
Tim Brooks 7fcd997b39
Do not lock on settings keyset if keys initialized (#52435)
Every time a setting#exist call is made, we lock on the keyset to ensure
that it has been initialized. This is a heavyweight operation that should
only be done once. This commit moves to a volatile read instead to
prevent unnecessary locking.
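
The pattern described here is essentially double-checked locking on a `volatile` field. The following is a minimal sketch with hypothetical names; it is not the actual `Settings` code:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of lazy, once-only key set initialization.
final class LazyKeySet {
    private final Supplier<Set<String>> computeKeys;
    private volatile Set<String> keys; // null until the first call to keySet()

    LazyKeySet(Supplier<Set<String>> computeKeys) {
        this.computeKeys = computeKeys;
    }

    Set<String> keySet() {
        Set<String> result = keys; // fast path: one volatile read, no lock
        if (result == null) {
            synchronized (this) {
                result = keys; // re-check: another thread may have initialized it
                if (result == null) {
                    result = Collections.unmodifiableSet(new HashSet<>(computeKeys.get()));
                    keys = result; // publish via the volatile write
                }
            }
        }
        return result;
    }

    boolean exists(String key) {
        return keySet().contains(key);
    }
}
```

After the first call, `exists` only performs a volatile read of `keys`. The lock is taken only while the set is still uninitialized.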
2020-02-18 09:36:07 -07:00
Tim Brooks a742c58d45
Extract a ConnectionManager interface (#51722)
Currently we have three different implementations representing a
`ConnectionManager`: the basic `ConnectionManager`, which holds all
connections for a cluster; a remote connection manager, which supports
proxy behavior; and a stubbable connection manager for tests. The remote
and stubbable instances use the delegate pattern, so this commit extracts
an interface for them all to implement.
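
The following is a heavily simplified sketch of the resulting shape. The interface below is much smaller than the real `ConnectionManager` and uses `String` as a stand-in for connections. All implementations are hypothetical and not thread-safe:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified interface: String stands in for a real connection type.
interface ConnectionManager {
    void connect(String nodeId);

    String getConnection(String nodeId);

    int size();
}

// Basic implementation that owns the connections.
final class BasicConnectionManager implements ConnectionManager {
    private final Map<String, String> connections = new HashMap<>();

    @Override
    public void connect(String nodeId) {
        connections.putIfAbsent(nodeId, "conn-" + nodeId);
    }

    @Override
    public String getConnection(String nodeId) {
        String connection = connections.get(nodeId);
        if (connection == null) {
            throw new IllegalStateException("not connected to " + nodeId);
        }
        return connection;
    }

    @Override
    public int size() {
        return connections.size();
    }
}

// Test-oriented wrapper: delegates every call unless a stub was registered.
final class StubbableConnectionManager implements ConnectionManager {
    private final ConnectionManager delegate;
    private final Map<String, String> stubs = new HashMap<>();

    StubbableConnectionManager(ConnectionManager delegate) {
        this.delegate = delegate;
    }

    void stub(String nodeId, String connection) {
        stubs.put(nodeId, connection);
    }

    @Override
    public void connect(String nodeId) {
        delegate.connect(nodeId);
    }

    @Override
    public String getConnection(String nodeId) {
        String stubbed = stubs.get(nodeId);
        return stubbed != null ? stubbed : delegate.getConnection(nodeId);
    }

    @Override
    public int size() {
        return delegate.size();
    }
}
```

The wrapper depends only on the interface, so tests can wrap any implementation.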
2020-02-18 09:19:24 -07:00
Benedict Jin 0c4f7dc193
Minor code improvements (#51921)
Fix some whitespace, comments, and usages of `this.`.

(cherry picked from commit 9f59900bf6389172811eb2279c17a2dc7cd9dfdf)
2020-02-18 16:00:05 +01:00
David Turner 3d57a78deb Add extra logging for investigation into #52000 (#52472)
It looks like #52000 is caused by a slowdown in cluster state application
(maybe due to #50907) but I would like to understand the details to ensure that
there's nothing else going on here too before simply increasing the timeout.
This commit enables some relevant `DEBUG` loggers and also captures stack
traces from all threads rather than just the three hottest ones.
2020-02-18 13:02:33 +00:00
Armin Braun 57d6dd7e31
Fix Non-Verbose Snapshot List Missing Empty Snapshots (#52433) (#52456)
We were not including snapshots without indices in the non-verbose
listing because we used the snapshot -> indices mapping to get the
snapshots.
2020-02-18 11:37:53 +01:00