Commit Graph

5449 Commits

Author SHA1 Message Date
Armin Braun 6710104673
Fix Creating NOOP Tasks on SNAPSHOT Pool (#62152) (#62157)
Fixing a few spots where NOOP tasks on the snapshot pool were created needlessly.
Especially when it comes to mixed master+data nodes and concurrent snapshots these
hurt delete operation performance needlessly.
2020-09-09 14:05:17 +02:00
Luca Cavanna fbf0967e20 QueryPhaseResultConsumer to call notifyPartialReduce (#62083)
As part of #60275 QueryPhaseResultConsumer ended up calling SearchProgressListener#onPartialReduce directly instead of notifyPartialReduce. That means we don't catch exceptions that may occur while executing the progress listener callback.

This commit fixes the call and adds a test for this scenario.
2020-09-09 13:44:07 +02:00
Luca Cavanna ad83261348 Print out search request as part of async search task description (#62057)
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should.

With this commit we address that while also making sure that the description highlights that the task is originated from an async search.

Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.
2020-09-09 13:44:07 +02:00
Rory Hunter b7fd7cf154
Write deprecation logs to a data stream (#61966)
Backport of #58924.

Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream
as well as to disk.
2020-09-09 12:16:28 +01:00
Armin Braun ed4984a32e
Remove Redundant Stream Wrapping from Compression (#62017) (#62132)
In many cases we don't need a `StreamInput` or `StreamOutput`
wrapper around these streams so I this commit adjusts the API
to just normal streams and adds the wrapping where necessary.
2020-09-09 03:27:38 +02:00
Nik Everett b8e9a7125f
Speed up empty highlighting many fields (backport of #61860) (#62122)
Kibana often highlights *everything* like this:
```
POST /_search
{
  "query": ...,
  "size": 500,
  "highlight": {
    "fields": {
      "*": { ... }
    }
  }
}
```

This can get slow when there are hundreds of mapped fields. I tested
this locally and unscientifically and it took a request from 20ms to
150ms when there are 100 fields. I've seen clusters with 2000 fields
where simple search go from 500ms to 1500ms just by turning on this sort
of highlighting. Even when the query is just a `range` that and the
fields are all numbers and stuff so it won't highlight anything.

This speeds up the `unified` highlighter in this case in a few ways:
1. Build the highlighting infrastructure once field rather than once pre
   document per field. This cuts out a *ton* of work analyzing the query
   over and over and over again.
2. Bail out of the highlighter before loading values if we can't produce
   any results.

Combined these take that local 150ms case down to 65ms. This is unlikely
to be really useful when there are only a few fetched docs and only a
few fields, but we often end up having many fields with many fetched
docs.
2020-09-08 15:49:50 -04:00
Alan Woodward 28fd4a2ae8 Convert RangeFieldMapper to parametrized form (#62058)
This also adds the ability to define a serialization check on Parameters, used
in this case to only serialize format and locale parameters if the mapper is a
date range.
2020-09-08 18:44:13 +01:00
Alan Woodward 5f05eef7e3 Convert some more mapping tests to MapperServiceTestCase (#62089)
We don't need to extend ESSingleNodeTestCase for all these tests.
2020-09-08 17:51:40 +01:00
Tim Brooks 075271758e
Keep checkpoint file channel open across fsyncs (#61744)
Currently we open and close the checkpoint file channel for every fsync.
This file channel can be kept open for the lifecycle of a translog
writer. This avoids the overhead of opening the file, checking file
permissions, and closing the file on every fsync.
2020-09-08 08:54:53 -06:00
Francisco Fernández Castaño 2bb5716b3d
Add repositories metering API (#62088)
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.

In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.

Backport of #60371
2020-09-08 14:01:04 +02:00
Armin Braun ebd1569028
Fix testMasterFailOverWithQueuedDeletes (#62062) (#62078)
Fixing very rare corner case where the delete retry is slow.

Closes #62031
2020-09-08 10:35:06 +02:00
Nhat Nguyen bb0a583990
Allow enabling soft-deletes on restore from snapshot (#62018)
Closes #61969
2020-09-07 09:45:36 -04:00
Alan Woodward cbc9578cbd Remove SearchPhase interface (#62050)
The interface is never used as an abstraction - implementations are are called directly,
and most of them don't need to implement the preProcess method.
2020-09-07 13:45:43 +01:00
David Turner 3389d5ccb2 Introduce integ tests for high disk watermark (#60460)
An important goal of the disk threshold decider is to ensure that nodes
use less disk space than the high watermark, and to take action if a
node ever exceeds this watermark. Today we do not have any
integration-style tests of this high-level behaviour. This commit
introduces a small test harness that can adjust the apparent size of the
disk and verify that the disk threshold decider moves shards around in
response.

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2020-09-07 14:39:39 +02:00
Armin Braun 395538f508
Improve Snapshot State Machine Performance (#62000) (#62049)
Just a few random things to optimize motivated by somewhat sub-standard performance
for large snapshot cluster states with many concurrent snapshots observed in production.
2020-09-07 13:25:40 +02:00
Jim Ferenczi fa8e76abb1
Improve reduction of terms aggregations (#61779) (#62028)
Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
to lookup every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).

Relates #51857
2020-09-07 13:13:20 +02:00
Alan Woodward a295b0aa86 Fix null_value parsing for data_nanos field mapper (#61994)
The null_value parameter for date fields is always parsed using DateFormatter.parseMillis,
which is incorrect for nanosecond resolution fields. This commit changes the parsing logic
to always use DateFieldType.parse() to parse the null value.
2020-09-07 10:58:54 +01:00
Alan Woodward 1799c0c583 Convert completion, binary, boolean tests to MapperTestCase (#62004)
Also fixes a metadata serialization bug in CompletionFieldMapper.
2020-09-07 10:48:20 +01:00
Luca Cavanna 0c8b438577
Add support for runtime fields (#61776)
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.

We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-09-07 09:14:53 +02:00
Howard b26584dff8 Remove unused deciders in BalancedShardsAllocator (#62026) 2020-09-07 00:04:16 -04:00
Armin Braun 1e3edbbe74
Simplify BytesReference StreamInput (#61681) (#62014)
Flattening both streams into a single stream here saves a few objects and some indirection.
Also, removed the redundant `offset` field which added nothing but complexity by forcing the
incrementation of two counters on every read.
2020-09-05 10:45:52 +02:00
Ryan Ernst 6d3b691048
Add snapshot only test modules (#61954)
This commit adds external test modules. These are modules meant for
external systems to test edge cases in elasticsearch, but only within
snapshots. They are not meant to be used in production, so protections
are also added from their accidental inclusion in release builds.

Note that this commit does not actually add any new modules, it only
adds the infrastructure for the new modules, under
`test/external-modules`.
2020-09-04 16:35:18 -07:00
Yannick Welsch 6d08b55d4e Simplify searchable snapshot shard allocation (#61911)
Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those
snapshot-backed shards (instead of "recover from local or from empty store"). Also let's the balancer pick a node which
to allocate the snapshot-backed shard to (which takes number of shards on each node into account unlike the current
implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).
2020-09-04 15:45:00 +02:00
Alan Woodward 66bb1eea98 Improve error messages on bad [format] and [null_value] params for date mapper (#61932)
Currently, if an incorrectly formatted date is passed as a null_value for a date field mapper
configuration, you get a vague error:

Failed to parse mapping [_doc]: cannot parse empty date
Similarly, if you pass an incorrect format, you get the error:

Failed to parse mapping [_doc]: Invalid format [...]
This commit improves both these errors by including the mapper name and parameter that
are misconfigured.

Fixes #61712
2020-09-04 14:13:28 +01:00
Ignacio Vera 31c026f25c
upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957) (#61974) 2020-09-04 13:46:20 +02:00
Nik Everett 3d23dcd742
Use standard bit set impl in cardinality (#61816) (#61930)
This replaces a specialized bit set implementation used in cardinality
with our standard `BitArray` which works exactly the same way. Its also
tracked by `BigArrays` which is great!
2020-09-03 12:37:30 -04:00
Nik Everett 3934e14bc0
Fixup vwhisto test (#60936) (#61928)
This test assumed some random bounds that turned out not to hold in some
cases.

Closes #60673
2020-09-03 12:37:17 -04:00
Alan Woodward 48870c60c7 Don't spin up a whole node to unit test some data structures (#61923)
BytesRefHashTests and LongObjectHashMapTests currently extend ESSingleNodeTestCase,
which builds an entire node just to run some unit tests over entirely in-memory data
structures. This commit converts them both to extend ESTestCase.
2020-09-03 17:19:42 +01:00
Alan Woodward 3a1e0edf0a Convert DateFieldMapperTests to MapperTestCase (#61920) 2020-09-03 16:04:02 +01:00
Martijn Laarman cfa54c08bd [7.x] Version bump 7.9.1 release 2020-09-03 16:41:58 +02:00
Alan Woodward e2f006eeb4
Merge FetchSubPhase hitsExecute and hitExecute methods (#60907) (#61893)
FetchSubPhase has two 'execute' methods, one which takes all hits to be examined,
and one which takes a single HitContext. It's not obvious which one should be implemented
by a given sub-phase, or if implementing both is a possibility; nor is it obvious that we first
run the hitExecute methods of all subphases, and then subsequently call all the
hitsExecute methods.

This commit reworks FetchSubPhase to replace these two variants with a processor class,
`FetchSubPhaseProcessor`, that is returned from a single `getProcessor` method.  This
processor class has two methods, `setNextReader()` and `process`.  FetchPhase collects
processors from all its subphases (if a subphase does not need to execute on the current
search context, it can return `null` from `getProcessor`).  It then sorts its hits by docid, and
groups them by lucene leaf reader.  For each reader group, it calls `setNextReader()` on
all non-null processors, and then passes each doc id to `process()`.

Implementations of fetch sub phases can divide their concerns into per-request, per-reader
and per-document sections, and no longer need to worry about sorting docs or dealing with
reader slices.

FetchSubPhase now provides a FetchSubPhaseExecutor that exposes two methods,
setNextReader(LeafReaderContext) and execute(HitContext). The parent FetchPhase collects all
these executors together (if a phase should not be executed, then it returns null here); then
it sorts hits, and groups them by reader; for each reader it calls setNextReader, and then
execute for each hit in turn. Individual sub phases no longer need to concern themselves with
sorting docs or keeping track of readers; global structures can be built in
getExecutor(SearchContext), per-reader structures in setNextReader and per-doc in execute.
2020-09-03 12:20:55 +01:00
Alan Woodward af01ccee93
Add specific test for serializing all mapping parameter values (#61844) (#61877)
This commit adds a test to MapperTestCase that explicitly checks that a mapper can
serialize all its default values, and that this serialization can then be re-parsed. Note that
the test is disabled for non-parametrized mappers as their serialization may in some cases
output parameters that are not accepted. Gradually moving all mappers to parametrized
form will address this.

The commit also contains a fix to keyword mappers, which were not correctly serializing
the similarity parameter; this partially addresses #61563. It also enables `null` as a
value for `null_value` on `scaled_float`, as a follow-up to #61798
2020-09-03 09:20:26 +01:00
Nik Everett c19f67ce30
Support longs in BitArray (backport of #61867) (#61871)
We frequently use `long`s with `BitArray` in aggs and right now we have
to assert that the `long` fits in an `int`. This adds support for `long`
to `BitArray` so we don't need those assertions.
2020-09-02 17:24:31 -04:00
Henning Andersen 867d5f1c68
Search memory leak (#61788) (#61862)
Search could leak memory if global ordinals were calculated as part of
a search with low level cancellation enabled. QueryPhase registers a
cancellation on the reader that is never removed, which ends up being
referenced from the global ordinals cache entry. This keeps an indirect
reference to the search context. A significant leak can occur when a
heavy aggregation (cardinality for instance) is used and a failure occurs
during search, in particular if the pages backing the hyperlog++ structure
are not recycled when it is closed.

This commit also fixes an issue with an unclosed resource and request
breaker adjustment in the cardinality aggregation.
2020-09-02 18:51:14 +02:00
Jim Ferenczi a0e4331c49 Cleanup usages of QueryPhaseResultConsumer (#61713)
This commit generalizes how QueryPhaseResultConsumer is initialized.
The query phase always uses this consumer so it doesn't need to be hidden behind
an abstract class.
2020-09-02 14:41:02 +02:00
Alan Woodward d59343b4ba
Allow [null] values in [null_value] (#61798) (#61807)
Several field mappers have a null_value parameter, that allows you to specify a placeholder
value to insert into a document if the incoming value for that field is null. The default value
for this is always null, meaning "add no placeholder". However, we explicitly bar users from
setting this parameter directly to null (done in #7978, in order to fix an NPE).

This exclusion means that if a mapper is serialized with include_defaults, then we either need
to special-case null_value to ensure that it is not output when it holds the default value, or
we find that the resulting serialized form cannot be used to create a mapping. This stops us
doing some useful generic testing of mappers.

This commit permits null as a parameter value for null_value, and changes the tests to check
that it is a) permissible and b) applied without throwing errors. As part of the testing changes,
a new base class MapperServiceTestCase is refactored from MapperTestCase, holding
the various helper methods related to building mappings but not the single-mapper specific
abstract methods.

Closes #58823
2020-09-02 10:42:19 +01:00
Igor Motov 48e53cca94
Fix wrong NaN comparison (#61795) (#61811)
Fixes wrong NaN comparison in error message generator in GeoPolygonDecomposer and PolygonBuilder.

Supersedes #48207

Co-authored-by: Pedro Luiz Cabral Salomon Prado <pedroprado010@users.noreply.github.com>
2020-09-01 15:50:38 -04:00
Tim Brooks e573fa9abc
Add data.path fast path for FilePermission (#61302)
The recursive data.path FilePermission check is an extremely hot
codepath in Elasticsearch. Unfortunately the FilePermission check in
Java is extremely allocation heavy. As it iterates through different
file permissions, it allocates byte arrays for each Path component that
must be compared. This PR improves the situation by adding the recursive
data.path FilePermission it its own PermissionsCollection object which
is checked first.
2020-09-01 12:03:22 -06:00
Armin Braun 28710c985d
Dry up Settings from Map Construction (#61778) (#61803)
We used the same hack all over the place. At least drying it up to a single place.

Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
2020-09-01 19:46:10 +02:00
Tanguy Leroux 6e944d9e21
Throws IndexNotFoundException in TransportGetAction for unknown System indices (#61785) (#61791)
The change #57936 introduced a dedicated thread pool for reads in system indices. 
It also introduced a potential NPE in the case the index to read in not yet present in 
the cluster state. This commit fixes that bug by using the getIndexSafe() instead of 
just index() method when retrieving the index's metadata so that an INFE is thrown 
if the index does not exist.
2020-09-01 17:41:57 +02:00
Dan Hermann 88a448f1cd
Fix wrong result when executing bulk requests with and without pipeline (#60818) (#61777) 2020-09-01 07:05:25 -05:00
Armin Braun 3fd25bfa87
Fix Concurrent Snapshot Create+Delete + Delete Index (#61770) (#61773)
We had a bug here were we put a `null` value into the shard
assignment mapping when reassigning work after a snapshot delete
had gone through. This only affects partial snaphots but essentially
dead-locks the snapshot process.

Closes #61762
2020-09-01 13:20:25 +02:00
Tanguy Leroux 787dfda4c1
Prevent snapshots to be mounted as system indices (#61517) (#61727)
System indices can be snapshotted and are therefore potential candidates 
to be mounted as searchable snapshot indices. As of today nothing 
prevents a snapshot to be mounted under an index name starting with . 
and this can lead to conflicting situations because searchable snapshot 
indices are read-only and Elasticsearch expects some system indices 
to be writable; because searchable snapshot indices will soon use an 
internal system index (#60522) to speed up recoveries and we should
prevent the system index to be itself a searchable snapshot index 
(leading to some deadlock situation for recovery).

This commit introduces a changes to prevent snapshots to be mounted 
as a system index.
2020-09-01 11:13:28 +02:00
Boice Huang 8fdd3d158b Remove redundant symbol in msearch tests (#61353) 2020-09-01 10:58:22 +02:00
Nik Everett fb84c1f73e
Calculate precise cardinality upper bounds (#61529) (#61754)
This reworks `CardinalityUpperBound` to support precise estimates while
maintaining most of the public API. This will allow us to make more
informed choices about the data structures that we use in aggregations.
None of those interesting choices come as part of this change, but they
are more possible with it.
2020-08-31 15:10:02 -04:00
Dan Hermann 2858e1efc4
Document new stats in _cat/nodes (#60445) (#61742) 2020-08-31 12:40:21 -05:00
Adam Locke 5723b928d7
Remove Outdated Snapshot Docs (#61684) (#61728)
Removing some now outdated statements that refer to a time
when snapshot operations could not run concurrently.

Closes #61680
2020-08-31 12:04:27 -04:00
Jason Tedor 43cb7c48bd
Adjust Lucene versions for 7.9.1
This commit adjusts the Lucene versions for 7.9.1 after the backporting
of upgrading the 7.9 branch to Lucene 8.6.2.
2020-08-31 10:30:39 -04:00
Jason Tedor 64cd229b35
Upgrade to Lucene 8.6.2 (#61688)
This commit upgrades the Lucene dependencies to 8.6.2.
2020-08-31 09:54:07 -04:00
Rory Hunter ff6c071275
Implement deprecation logging using log4j (#61629)
Backport of #61474.

Part of #46106. Simplify the implementation of deprecation logging by
relying of log4j more completely, and implementing additional behaviour
through custom appenders and filters.
2020-08-31 12:42:04 +01:00
Armin Braun 5c86b216e8
Fix Race in testGetSnapshotsRequest (#61694) (#61700)
The fact that the data node is already blocked on writing
data files did not guarantee that the cluster state that made
the data node start snapshotting is already applied on master.
This could lead to races where the get snapshots action still
runs based on a state without the snapshot in it, tripping the assertion.
Much safer to handle this by waiting on the non-blocking snapshot create
to return, which guarantees that the CS has been applied on master.

Closes #61541
2020-08-31 11:06:51 +02:00
Armin Braun 22e4d759c3
Speed up Reading Enum Set from Stream (#61678) (#61687)
No need in adding enum values to a normal set and then copying, the `EnumSet` is directly mutable just fine.
2020-08-30 20:49:51 +02:00
Jake Landis d2e5f2f532
[7.x] Enhance the ingest node simulate verbose output (#60433) (#60678)
This commit enhances the verbose output for the
`_ingest/pipeline/_simulate?verbose` api. Specifically
this adds the following:
* the pipeline processor is now included in the output
* the conditional (if) and result is now included in the output iff it was defined
* a status field is always displayed. the possible values of status are
  * `success` - if the processor ran with out errors
  * `error` - if the processor ran but threw an error that was not ingored
  * `error_ignored` - if the processor ran but threw an error that was ingored
  * `skipped` - if the process did not run (currently only possible if the if condition evaluates to false)
  * `dropped` - if the the `drop` processor ran and dropped the document
* a `processor_type` field for the type of processor (e.g. set, rename, etc.)
* throw a better error if trying to simulate with a pipeline that does not exist

closes #56004
2020-08-27 16:53:09 -05:00
Lee Hinman 1bfebd54ea
[7.x] Allocate newly created indices on data_hot tier nodes (#61342) (#61650)
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to #60848
2020-08-27 13:41:12 -06:00
Luca Cavanna f769821bc8
Pass SearchLookup supplier through to fielddataBuilder (#61430) (#61638)
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.

To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.

As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.

With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.

Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.

This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.

Relates to #59332

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-08-27 18:09:56 +02:00
Alan Woodward b6cb590685 Log more information when mappings fail on index creation (#61577)
Errors from bad mappings at index creation are currently logged at DEBUG level, which
can make it difficult to work out what's going on if the index is being auto-created. This
commit ups the log level to INFO for auto-created indices, and includes some more
information in the log message.
2020-08-27 15:08:51 +01:00
David Turner 411965d392 Allow background cluster state update in tests (#61455)
Today the `CoordinatorTests` run the publication process as a single
atomic action; however in production it appears possible that another
master may be elected, publish its state, then fail, then we win another
election, all in between the time we sampled our previous cluster state
and started to publish the one we first thought of.

This violates the `assertClusterStateConsistency()` assertion that
verifies the cluster state update event matches the states we actually
published and applied.

This commit adjusts the tests to run the publication process more
asynchronously so as to allow time for this behaviour to occur. This
should eventually result in a reproduction of the failure in #61437 that
will let us analyse what's really going on there and help us fix it.
2020-08-27 11:22:58 +01:00
David Turner b866aaf81c Use int for number of parts in blob store (#61618)
Today we use `long` to represent the number of parts of a blob. There's
no need for this extra range, it forces us to do some casting elsewhere,
and indeed when snapshotting we iterate over the parts using an `int`
which would be an infinite loop in case of overflow anyway:

    for (int i = 0; i < fileInfo.numberOfParts(); i++) {

This commit changes the representation of the number of parts of a blob
to an `int`.
2020-08-27 10:54:03 +01:00
David Turner 5df74cc888 Replace Math.toIntExact with toIntBytes (#61604)
We convert longs to ints using `Math.toIntExact` in places where we're
sure there will be no overflow, but this doesn't explain the intent of
these conversions very well. This commit introduces a dedicated method
for these conversions, and adds an assertion that we never overflow.
2020-08-27 08:28:54 +01:00
Jay Modi 34c4fc3b91
Remove tasks module to define tasks system index (#61588)
This commit removes the tasks module that only existed to define the
tasks result index, `.tasks`,  as a system index. The definition for
the tasks results system index descriptor is moved to the
`SystemIndices` class with a check that no other plugin or module
attempts to define an entry with the same source.

Additionally, this change also makes the pattern for the tasks result
index a wildcard pattern since we will need this when the index is
upgraded (reindex to new name and then alias that to .tasks).

Backport of #61540
2020-08-26 09:48:23 -06:00
David Turner f2dc664228 Remove dead code in EsExecutors (#61574)
Removes a couple of unused methods.
2020-08-26 16:08:36 +01:00
Przemyslaw Gomulka 9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Igor Motov f70a59971a
[7.x] Add rate aggregation (#61369) (#61554)
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
2020-08-25 17:39:00 -04:00
Nik Everett 87cf81e179
Migrate some more mapper test cases (#61507) (#61552)
Migrate some more mapper test cases from `ESSingleNodeTestCase` to
`MapperTestCase`.
2020-08-25 15:27:26 -04:00
markharwood 8b56441d2b
Search - add case insensitive support for regex queries. (#59441) (#61532)
Backport to add case insensitive support for regex queries. 
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.

Closes #59235
2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka f3f7d25316
Header warning logging refactoring backport(#55941) (#61515)
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.

relates #55699
relates #52369
backports #55941
2020-08-25 16:35:54 +02:00
Armin Braun f22ddf822e
Some Optimizations around BytesArray (#61183) (#61511)
* Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache
* Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference
* Lighter `writeTo` implementation
* Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory
2020-08-25 07:13:39 +02:00
Armin Braun 806dfcfcf7
Speed up Compression Logic by Pooling Resources (#61358) (#61495)
This is mostly motivated by the performance issues we are seeing around the GET mappings
REST API which (in case of a large number of indices) will create decompressing streams in a hot loop
which takes a significant amount of time for the system calls involved in instantiating deflaters
and inflaters.
Also, this fixes a leaked deflater when deserializing cached repository data.
2020-08-25 04:01:55 +02:00
Armin Braun 16b932c1dc
Remove Potentially Expensive Use of BytesReference.toBytesRef (#61415) (#61503)
This method might have materialize all the bytes in a reference into a fresh `byte[]`.
Using the stream is much safer and only trivially more expensive + in most cases we now run the fast path via `BytesArray` anyway.
2020-08-24 23:58:21 +02:00
Nhat Nguyen d47bbbafe0 Cancel multisearch when http connection closed (#61399)
Relates #61337
2020-08-24 15:12:54 -04:00
Nhat Nguyen 23a0f8b617 Detect and optimize noop of update index settings (#61348)
This optimization is more relevant in the context of CCR. When a node in
the follower cluster leaves, we reallocate the shard-follow tasks on 
that node to other nodes. The new tasks will overwhelm the follower
cluster with many put-mapping, update-settings requests, although most
of them are noop. This change detects and optimizes the noop
update-settings requests.
2020-08-24 15:08:53 -04:00
Nik Everett f3b6d49ae1
Migrate server mapper tests to new MapperTestCase (#61378) (#61490)
This continues #61301, migrating all of the mappers in `server` to the
new `MapperTestCase` which is nicer than `FieldMapperTestCase` because
it doesn't depend on all of Elasticsearch.
2020-08-24 13:33:35 -04:00
Armin Braun bb4d97073c
Remove Favicon Special Path in RestController (#61460) (#61487)
It's unnecessary (and adds one string comparison to every request) to special
case the favicon so I added it as a normal REST handler to simplify the code.
2020-08-24 18:36:23 +02:00
Armin Braun af2e2782eb
Stop Needlessly Copying Bytes in XContent Parsing (#61447) (#61469)
Wrapping a `BytesArray` in a `StreamInput` for deserialization is inefficient.
This forces Jackson to internally buffer (i.e. copy) all bytes from the `BytesArray`
before deserializing, adding overhead for copying the bytes and managing the buffers.

This commit fixes a number of spots where `BytesArray` is the most common type of
`BytesReference` to special case this type and parse it more efficiently.
Also improves parsing `String`s to use the more efficient direct `String` parsing APIs.
2020-08-24 15:49:15 +02:00
Dan Hermann c53731a0cd
[7.x] Fix wrong pipeline name in debug log (#58817) (#61233) 2020-08-21 11:14:01 -05:00
David Turner 078e8717ee Stop opening PING conns to remote clusters (#61408)
Today a remote cluster connection comprises a `PING` and a `REG`
channel. The `PING` channel is only used for health checks between the
elected master and the members of its own cluster, so is unused in a
remote cluster connection. This commit removes this unused connection.
2020-08-21 12:21:57 +01:00
Armin Braun e09058df1a
Serialize Get Mappings Response on Generic ThreadPool (#57937) (#61401)
For large responses to the get mappings request, the serialization
to XContent can be extremely slow (serializing mappings is expensive since
we have to decompress and deserialize the mapping source).
To not introduce instability on the IO thread handling the get mappings response
we should move the serialization to the management pool.
The trade-off of introducing one or two new context switches for responses that are
small enough to not cause trouble on the transport thread to prevent instability
in case of a large number of mappings in the cluster seems worth it.
2020-08-21 08:06:30 +02:00
Armin Braun 22509c95f8
Fix Blackholed Connection Behavior in DisruptableMockTransport (#61310) (#61381)
It is not realistic to drop messages without eventually failing.
To retain the coverage of long pauses this PR adjusts the blackholed
behavior to fail a send after 24h (which is assumed to be longer than any
timeout in the system) instead of never.

Closes #61034
2020-08-21 07:54:56 +02:00
Julie Tibshirani 997c73ec17
Correct how field retrieval handles multifields and copy_to. (#61391)
Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.

To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.

The PR is fairly big but each commit can be reviewed individually.

Fixes #61033.
2020-08-20 15:53:35 -07:00
Julie Tibshirani 85ad328df7
Ensure fetch fields aren't dropped when rewriting search. (#61390)
Previously we didn't retain the requested fields when performing a shallow copy
of the search source. This meant that when a search was rewritten, we could drop
the requested fields and fail to return them in the response.
2020-08-20 14:58:58 -07:00
Armin Braun 08dbd6d989
Optimize a few Spots on IO Loop (#60865) (#61380)
Saving some cycles here and there on the IO loop:

* Don't instantiate new `Runnable` to execute on `SAME` in a few spots
* Don't instantiate complicated wrapped stream for empty messages
* Stop instantiating almost never used `ClusterStateObserver` in two spots
* Some minor cleanup and preventing pointless `Predicate<>` instantiation in transport master node action
2020-08-20 20:22:49 +02:00
Alan Woodward a3a0c63ccf
Convert NumberFieldMapper to parametrized form (#61092) (#61376)
In addition, this commit converts ScaledFloatFieldMapper as it was relying
on a number of static values taken from NumberFieldMapper that had changed
or been removed.
2020-08-20 16:43:26 +01:00
Nhat Nguyen a3906dcef3 Enable cancellation for msearch requests (#61337)
Today multi-search requests are not cancellable because we create
regular tasks instead of cancellable ones for them.
2020-08-19 16:59:17 -04:00
Nik Everett 9789e6d154
Migrate some field mapper tests to ESTestCase (#61301) (#61346)
This switches a few tests for field mappers from `ESSingleNodeTestCase`
to `ESTestCase` because, in general, we prefer to avoid
`ESSingleNodeTestCase` when we can because it is slow and "big". "Big"
here means that it pulls in an entire node, making it difficult to
reason about what you are testing.
2020-08-19 15:43:49 -04:00
Armin Braun 4a53ae203e
Fix SharedClusterSnapshotRestoreIT.testThrottling (#61323) (#61328)
We have to set the recovery setting to `0` if we don't want throttling
from recoveries. Otherwise the randomized value used for this setting in
tests can lead to throttling unexpectedly.

Closes #61311
2020-08-19 15:26:32 +02:00
Nik Everett 70128e022b fix stats aggregator tests
With #60683 we stopped forcing aggregating all docs using a single
Aggregator which made some of our accuracy assumptions about the stats
aggregator incorrect. This adds a test that does the forcing and asserts
the old accuracy and adds a test without the forcing with much looser
accuracy guarantees.

Closes #61132
2020-08-19 08:55:43 -04:00
Alan Woodward b1aa0d8731
Fix fieldnames field type for pre-6.1 indexes (#61322)
The FieldNamesFieldMapper field has different behaviour for indexes created in
clusters earlier than v6.1, and the code to deal with this was still using the vestigial
FieldType field of FieldMapper in its indexing path. This meant that documents
added after an upgrade were not correctly indexing their field names field. This
commit corrects the parseCreateField method to use the default field type.

Fixes #61305
2020-08-19 12:59:09 +01:00
David Turner 389f7779e7 Report more details of unobtainable ShardLock (#61255)
Today a common reason for a `ShardLockObtainFailedException` is when a
shard is removed from a node and then assigned straight back to it again
before the node has had a chance to shut the previous shard instance
down. For instance, this can happen if a node briefly leaves the cluster
holding a primary with no in-sync replicas.

The message in this case is typically as follows:

    obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]

This is pretty hard to interpret, and doesn't raise the important
question: "why didn't the shard shut down sooner?"

With this change we reword the message a bit, report the age of the
shard lock, and adjust the details to report that the lock is held by a
closing shard:

    obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [12345ms]

Relates #38807
2020-08-19 06:36:28 +01:00
Nhat Nguyen 08b0e78ef4 Log more info when search ops higher than expected (#61108)
We have seen a situation where the total search operations are higher 
than expected. Unfortunately, we did not have enough info to figure it
out. This commit adds the failures to the error to provide more context
and adjusts the log level in case of failure to debug.
2020-08-18 15:20:41 -04:00
Rory Hunter bd7236cd65 Version bump for 7.9.0 release 2020-08-18 16:07:43 +01:00
Dimitrios Liappis c870640cbd
[7.x] Introduce 6.8.13 as a version (#61198)
Introduce version 6.8.13 to branch 7.x
2020-08-18 17:07:16 +03:00
Armin Braun 58d07b2ffc
Remove Unused ByteBufferReference (#61116) (#61250)
We only work with heap byte buffers at this point and those we can and do unwrap the
`byte[]` ourselves and use `BytesArray` instead of a needless level of indirection via `ByteBuffer`.
2020-08-18 10:53:40 +02:00
Armin Braun 6ffa7f0737
Fix testConcurrentSnapshotDeleteAndDeleteIndex (#61228) (#61249)
There is a corner case here in which during partial snapshot the index is
deleted right between starting the snapshot in the CS and the data node getting to work
on it, causing the data node the fail that shard snapshot and making the snapshot `PARTIAL`.

Closes #61208
2020-08-18 10:45:30 +02:00
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
Nik Everett 1b7bbafd81
Add method to make random DateFormatter pattern (backport of #60613) (#61213)
Adds a method to make a random date `DateFormatter` pattern. We expect
this'll be useful for runtime fields to compate their formatting with
the standard date field.
2020-08-17 10:57:52 -04:00
Christoph Büscher 6866396e1d Improve 'ignore_malformed' handling for dates (#60211)
Currently we occasionally can get ArithmeticException from parsing bad input
values on 'date' fields that are passed on even if 'ignore_malformed' is set.
This change adds this exception to the ones we already catch for malformed
values.

Closes #52634
2020-08-17 16:18:08 +02:00
David Turner b21cb7f466 Reduce allocations when persisting cluster state (#61159)
Today we allocate a new `byte[]` for each document written to the
cluster state. Some of these documents may be quite large. We need a
buffer that's at least as large as the largest document, but there's no
need to use a fresh buffer for each document.

With this commit we re-use the same `byte[]` much more, only allocating
it afresh if we need a larger one, and using the buffer needed for one
round of persistence as a hint for the size needed for the next one.
2020-08-17 13:45:31 +01:00
David Turner f3e0c60896
Restrict testing of legacy discovery to tests (#61178)
The 7.x branch preserves the legacy discovery mechanism from 6.x purely
for running internal cluster tests; this mechanism is otherwise
completely untested and unsupported. However it is still technically
possible to use it outside of the test suite if you dig through the
source code to work out what settings need to be set. With this change
we make it impossible to use this mechanism in production.

Closes #61177
2020-08-17 11:05:27 +01:00
Ryan Ernst 9cb45dafab
Unwrap transport exception when using transport client (#60801)
The ReloadSecureSettingsIT makes requests to the reload settings apis.
In 7.x, the client used from the integ test infrastructure may be a
transport client. In that case, the expected exception type, and causes
the test to fail (though it will hang indefinitely due to not counting
down the latch, see
https://github.com/elastic/elasticsearch/pull/60800). This commit adds
unwrapping of the remote exception to get the underlying expected
exception.

closes #51546
2020-08-13 10:24:04 -07:00
Ryan Ernst c73ab0b16f
Ensure hotthreads do not produce node failures (#61073)
This commit adds an assertion that no sub-nodes requests within
hot threads failed.

relates #58842
2020-08-13 10:22:19 -07:00
Armin Braun 3143b5ea47
Stabilize testSnapshotDeleteRelocatingPrimaryIndex (#61088) (#61096)
Use transport blocking to make relocation take forever instead of relying on the relocation to take long enough to clash with the snapshot.

Closes #61069
2020-08-13 16:26:56 +02:00
Yannick Welsch 8e775394ac Fix testNoMasterActionsMetadataWriteMasterBlock (#60605)
We can't assert on the specific exception, unfortunately.
2020-08-13 10:48:16 +02:00
David Turner c6276ae177 Fail invalid incremental cluster state writes (#61030)
It is disastrous if we commit an incremental cluster state update
without having written the full state first. We assert that this doesn't
happen, but it is hard to fully test the myriad ways that things might
fail in a messy production environment. Given the disastrous
consequences it is worth erring on the side of caution in this area.
This commit fails invalid writes even if assertions are disabled.
2020-08-12 19:46:19 +01:00
Lee Hinman e3df64a429
[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994) (#61045)
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role acts as though they have all of the roles configured (it is a hot, warm, cold, and
frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to #60848
2020-08-12 11:06:23 -06:00
Alan Woodward 5b3c10c379
Fix serialization of AllFieldMapper (#61044)
Converting AllFieldMapper to parametrized form ended up not being run through BWC
testing, resulting in an incorrect implementation being committed. This commit fixes
the serialization, and adds unit tests as well as unmuting the BWC test that uncovered
the bug.

Fixes #60986
2020-08-12 17:32:55 +01:00
Yannick Welsch 8c488de576 Gracefully handle null in checkSettingsForTerminalDeprecation
Fixes a test failure after backport to 7.x
2020-08-12 18:03:52 +02:00
Yannick Welsch 25404cbe3d Provide option to allow writes when master is down (#60605)
Elasticsearch currently blocks writes by default when a master is unavailable. The cluster.no_master_block setting allows
a user to change this behavior to also block reads when a master is unavailable. This PR introduces a way to now also still
allow writes when a master is offline. Writes will continue to work as long as routing table changes are not needed (as
those require the master for consistency), or if dynamic mapping updates are not required (as again, these require the
master for consistency).

Eventually we should switch the default of cluster.no_master_block to this new mode.
2020-08-12 16:56:45 +02:00
Yannick Welsch 6644f2283d Do not access snapshot repo on dedicated voting-only master node (#61016)
Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the
snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently
elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and
will never be the elected master so we should not require it to have write access to the repository.

Closes #59649
2020-08-12 16:56:45 +02:00
Yannick Welsch af519be9cb Ensure repo not in use for wildcard repo deletes (#60947)
Repositories can't be unregistered when they are actively being used for snapshots or restores. Wildcard repository
deletes could silently bypass the "repo in use" checks however, which is now fixed.
2020-08-12 16:38:06 +02:00
Dan Hermann 538c93c923
Adding Hit counts and Miss counts for QueryCache exposed through REST api. (#60114) (#60993) 2020-08-12 08:21:09 -05:00
Alan Woodward c81dc2b8b7 Convert KeywordFieldMapper to parametrized form (#60645)
This makes KeywordFieldMapper extend ParametrizedFieldMapper, with explicitly
defined parameters.

In addition, we add a new option to Parameter, restrictedStringParam, which
accepts a restricted set of string options.
2020-08-12 11:41:11 +01:00
markharwood 66098e0bf4
Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959) (#61010)
The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type.
The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields.

Closes #60957
2020-08-12 10:44:52 +01:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Nik Everett ce9c5f0e46 Fix diversified sample tests
The test assumed that the aggregator only ran once but we turned that
off. This turns it back on.
2020-08-11 17:49:43 -04:00
Jay Modi 2fa6448a15
System index reads in separate threadpool (#60927)
This commit introduces a new thread pool, `system_read`, which is
intended for use by system indices for all read operations (get and
search). The `system_read` pool is a fixed thread pool with a maximum
number of threads equal to lesser of half of the available processors
or 5. Given the combination of both get and read operations in this
thread pool, the queue size has been set to 2000. The motivation for
this change is to allow system read operations to be serviced in spite
of the number of user searches.

In order to avoid a significant performance hit due to pattern matching
on all search requests, a new metadata flag is added to mark indices
as system or non-system. Previously created system indices will have
flag added to their metadata upon upgrade to a version with this
capability.

Additionally, this change also introduces a new class, `SystemIndices`,
which encapsulates logic around system indices. Currently, the class
provides a method to check if an index is a system index and a method
to find a matching index descriptor given the name of an index.

Relates #50251
Relates #37867
Backport of #57936
2020-08-11 12:16:34 -06:00
Julie Tibshirani a93be8d577
Handle nested arrays in field retrieval. (#60981)
We accept _source values with multiple levels of arrays, such as
`"field": [[[1, 2]]]`. This PR ensures that field retrieval can handle nested
arrays by unwrapping the arrays before parsing the values.
2020-08-11 10:22:16 -07:00
Mark Tozzi ab8518fb5b
[7.x] Extensibility for Composite Agg #59648 (#60842) 2020-08-11 09:14:33 -04:00
Alan Woodward 54279212cf
Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847) (#60924)
This commit cuts over all metadata field mappers to parametrized format.
2020-08-11 09:02:28 +01:00
Armin Braun 3e2dfc6eac
Remove GCS Bucket Exists Check (#60899) (#60914)
Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS.
We don't need to do the bucket exists check before using the repo, that just needlessly
increases the necessary permissions for using the GCS repository.
2020-08-11 09:54:27 +02:00
Julie Tibshirani d51eae6e9f
Prevent loading 'fields' with stored fields disabled. (#60938)
Because the 'fields' option loads from _source (which is a stored field), it is
not possible to retrieve 'fields' when stored_fields are disabled.

This also fixes #60912, where setting stored_fields: _none_ prevented the
_ignored fields from being loaded and caused a parsing exception.
2020-08-10 15:40:27 -07:00
Nik Everett 0286d0a769
Move distance_feature query building into MFT (#60614) (#60846)
This moves the `distance_feature` query building out of
`DistanceFeatureQueryBuilder` and into subclasses of `MappedFieldType`.
Without this we don't have a chance of supporting this for runtime
fields. In general I'm not sad to see the `instanceof`s go.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 16:05:17 -04:00
Julie Tibshirani b216340f50
Make `FetchPhase` logic more readable. (#60779)
* Factor out FieldsVisitor#postProcess call.
* Swap logical order for normal and nested documents.
* Extract the method createStoredFieldsVisitor.
2020-08-10 11:04:54 -07:00
Nik Everett dfd502f9ca
Rework checking if a year is a leap year (#60585) (#60790)
This way is faster, saving about 8% on the microbenchmark that rounds to
the nearest month. That is in the hot path for `date_histogram` which is
a very popular aggregation so it seems worth it to at least try and
speed it up a little.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 12:45:34 -04:00
Jim Ferenczi f30f1f04e2
Replace AggregatorTestCase#search with AggregatorTestCase#searchAndReduce (#60816)
This commit removes the ability to test the top level result of an aggregator
before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search
are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test
the final output (the one sent to the end user) rather than an intermediary result
that could be different.
This change also removes spurious commits triggered on top of a random index writer.
These commits slow down the tests and are redundant with the commits that the
random index writer performs.
2020-08-10 17:23:00 +02:00
David Turner f44c28b595
Deprecate and ignore join timeout (#60872)
There is no point in timing out a join attempt any more once a cluster
is entirely in 7.x. Timing out and retrying with the same master is
pointless, and an in-flight join attempt to one master no longer blocks
attempts to join other masters. This commit deprecates this unnecessary
setting and removes its effect from the joining process.

Relates #60873 which removes this setting in master.
2020-08-10 13:57:41 +01:00
Martijn van Groningen 64bb082f9b
Improve error message for non append-only writes that target data stream (#60874)
Backport of #60809 to 7.x branch.

Closes #60581
2020-08-10 13:18:59 +02:00
Alan Woodward e8d9185045 Cut over IPFieldMapper to parametrized form (#60602)
This commit makes IpFieldMapper extend ParametrizedFieldMapper. It also
updates the IpFieldMapper docs to add the ignore_malformed parameter,
which was not previously documented.
2020-08-10 11:01:10 +01:00
David Turner 1f49e0b9d0 Fix testRerouteOccursOnDiskPassingHighWatermark (#60869)
Sometimes this test would refresh the disk stats so quickly that it hit
the refresh rate limiter even though it was almost completely disabled.
This commit allows the rate limiter to be completely disabled.

Closes #60587
2020-08-10 09:39:44 +01:00
Ryan Ernst ddcfbec569
Add assert message for multiple lines in osprobe (#60796)
Several /proc files are expected to contain a single line. We assert on
this in tests, but the contents of the file are lost and the assertion
therefore lacks important information to debug why the file appeared to
have multiple lines. This commit dumps the contents of the file on
assertion failure.

relates #59284
2020-08-06 15:53:30 -07:00
Ryan Ernst fc38af363e
Ensure latch is counted down when assertion trips (#60800)
The ReloadSecureSettingsIT uses latches to ensure coordination across
requests to the underlying in memory cluster. However, in the case of an
expected failure, if the assertion fails, the latch will never be
counted down, and will cause the test to hang indefinitely. This commit
ensures the latch is always counted down with a try/finally.

relates #51546
2020-08-06 15:33:46 -07:00
Jim Ferenczi 98119578a1 Disable sort optimization on search collapsing (#60838)
Collapse search queries that sort by a field can throw
an ArrayStoreException due to a bug in the [sort optimization](https://github.com/elastic/elasticsearch/pull/51852)
introduced in 7.7.0. Search collapsing were not supposed to
be eligible for this sort optimization so this change explicitly
filters them from this new feature.
2020-08-06 21:37:12 +02:00
Jim Ferenczi 14980ff97e Fix AOOBE when setting min_doc_count to 0 in significant_terms (#60823)
This commit fixes the computation of the subset size on empty buckets (doc count of 0).
The aggregator test refactoring in #60683 revealed this bug.
2020-08-06 18:57:09 +02:00
David Turner 721198c29e Increase logging in testRerouteOccursOnDiskPassingHighWatermark (#60817)
Relates #60587
2020-08-06 14:08:09 +01:00
Armin Braun a2c7991e96
Fix CompressibleBytesOutputStreamTests (#60815) (#60822)
Since #60730 the `bytes` field can be `null`. This adds the missing `null` check to the test
override.

Closes #60814
2020-08-06 15:07:48 +02:00
David Turner 273a6f916d AwaitsFix for #60814 2020-08-06 12:56:28 +01:00
Tim Brooks 2f76c48ea7
Propagate forceExecution when acquiring permit (#60634)
Currently the transport replication action does not propagate the force
execution parameter when acquiring the indexing permit. The logic to
acquire the index permit supports force execution, so this parameter
should be propagate. Fixes #60359.
2020-08-05 15:57:40 -06:00
Francisco Fernández Castaño b4044004aa
Add recovery state tracking for Searchable Snapshots (#60751)
This pull request adds recovery state tracking for Searchable Snapshots.

In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This newRecoveryState instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.

Those differences can be summarized as follows:

-  The Directory implementation that's provided by SearchableSnapshots mark the
    snapshot files as reused during recovery. In order to keep track of the
    recovery process as the cache is pre-warmed, those files shouldn't be marked
    as reused.
 - Once the shard is created, the cache starts its pre-warming phase, meaning that
    we should keep track of those downloads during that process and tie the recovery
    to this pre-warming phase. The shard is considered recovered once this pre-warming
    phase has finished.

Backport of #60505
2020-08-05 17:41:49 +02:00
Jake Landis f3752ba1d5
7.x suport new path for re-index java-api doc (#60319)
This commit uses the new location for the reindex java-api documentation.
Temporary files have been left behind to pacify the docs build.

related #60339
2020-08-05 09:05:07 -05:00
Armin Braun ebfb93ff26
Improve some BytesStreamOutput Usage (#60730) (#60736)
* Stop redundantly creating a `0` length `ByteArray` that is never used
* Add efficient way to get a minimal size copy of the bytes in a `BytesStreamOutput`
* Avoid multiple redundant `byte[]` copies in search cache key creation
2020-08-05 15:51:06 +02:00
Yannick Welsch 9f6f66f156 Fail searchable snapshot shards on invalid license (#60722)
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.
2020-08-05 13:14:15 +02:00
Igor Motov 959690a64a
Refactor extendedBounds to use DoubleBounds (#60556) (#60681)
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
2020-08-04 16:45:47 -04:00
Alan Woodward b3ae5d26bd
Move mapper validation to the mappers themselves (#60072) (#60649)
Currently, validation of mappers (checking that cross-references are correct, limits on
field name lengths and object depths, multiple definitions, etc) is performed by the
MapperService. This means that any mapper-specific validation, for example that done
on the CompletionFieldMapper, needs to be called specifically from core server code,
and so we can't add validation to mappers that live in plugins.

This commit reworks the validation framework so that mapper-specific validation is
done on the Mapper itself. Mapper gets a new `validate(MappingLookup)`
method (already present on `MetadataFieldMapper` and now pulled up to the parent
interface), which is called from a new `DocumentMapper.validate()` method. All
the validation code currently living on `MapperService` moves either to individual
mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into
`MappingLookup`, an altered `DocumentFieldMappers` which now knows about
object fields and can check for duplicate definitions, or into DocumentMapper
which handles soft limit checks.
2020-08-04 14:39:20 +01:00
Armin Braun 212ce22d15
Optimize CS Persistence Stream Use (#60643) (#60647)
In the metadata persistence logic we failed to override the bulk write
method on the FilterOutputStream resulting in all the writes to it
running byte-by-byte in a loop adding a large number of bounds checks
needlessly.
2020-08-04 15:06:57 +02:00
Armin Braun 859ad761bb
Fix Broken Stream Close in writeRawValue (#60625) (#60644)
Small oversight in #56078 that only showed up during backporting where a stream copy was turned from a non-closing to a closing one. Enhanced part of a test in this PR to make it show up in master also even though we practically never use this method with stream targets that actually close.
2020-08-04 13:39:52 +02:00
Armin Braun 7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Julie Tibshirani f99584c6f3
Avoid reloading _source for every inner hit. (#60632)
Previously if an inner_hits block required _ source, we would reload and parse
the root document's source for every hit. This PR adds a shared SourceLookup to
the inner hits context that allows inner hits to reuse parsed source if it's
already available. This matches our approach for sharing the root document ID.

Relates to #32818.
2020-08-03 17:12:27 -07:00
Julie Tibshirani fc63f8224f
Simplify class hierarchy for ordinals field data. (#60606)
This PR simplifies the hierarchy for ordinals field data classes:
* Remove `AbstractIndexFieldData`, since only `AbstractIndexOrdinalsFieldData`
inherits directly from it.
* Make `SortedSetOrdinalsIndexFieldData` extend
`AbstractIndexOrdinalsFieldData`. This lets us remove some redundant code.
2020-08-03 09:58:29 -07:00
Yannick Welsch 3409e019d2 Ignore shutdown when retrying recoveries (#60586)
Avoids failures when shutting down a node.
2020-08-03 15:14:38 +02:00
Nik Everett 2cde43b799
Allows nanosecond resolution in search_after (backport of #60328) (#60426)
Allows nanosecond resolution in search_after (#60328)

This fixes `search_after` to properly parse string formatted dates that
have nanosecond resolution.

Closes #52424
2020-08-03 08:17:48 -04:00
David Turner d2ddf8cd6a Improve deserialization failure logging (#60577)
Today when a node fails to properly deserialize a transport message with
a parent task we log the following relatively uninformative message:

    java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting

In particular, the wrapping of the listener in the `TransportService`
obscures all clues as to the source of the problem, e.g. the action name
or the identity of the underlying listener. This commit exposes the
inner listener to the logs.

Also if the listener is wrapped with `ContextPreservingActionListener`
then its identity is similarly hidden. This commit also exposes the
wrapped listener in this case.

Relates #38939
2020-08-03 11:51:01 +01:00
Armin Braun 3270cb3088
More Efficient Writes for Snapshot Shard Generations (#60458) (#60575)
Same as #59905 but for shard level metadata. Since we wnat to retain
the ability to do safe+atomic writes for non-uuid shard generations
this PR has to create two separate write paths for both kinds of
shard generations.
2020-08-03 11:11:36 +02:00
Armin Braun 204efe9387
Add Repository Setting to Disable Writing index.latest (#60448) (#60576)
Writing the `index.latest` blob is unnecessary unless the contents of the repository
are to be used as a URL-repository. Also, in some edge cases, the fact that `index.latest` is the only
blob in the repository that regularly gets overwritten was causing compatibility issues with
some backing blobstores (Azure no-overwrite policy, Hitachy S3 equivalent).
=> this commit changes behavior to make snapshots not fail if writing `index.latest` fails
and adds a setting to disable writing `index.latest`.
2020-08-03 11:11:24 +02:00
Andrei Dan ac258f10d6
Data streams: throw ResourceAlreadyExists exception (#60518) (#60536)
For consistency reasons (and reducing the overload of IllegalArgumentException)
this changes the exception thrown when trying to create a data stream
that already exists.

(cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-01 16:31:09 +01:00
Julie Tibshirani f1d4fd8c3e Correct name of IndexFieldData#loadGlobalDirect. (#60492)
It seems 'localGlobalDirect' was just a typo.
2020-07-31 10:53:21 -07:00
Jim Ferenczi 8db896d290 Fix race condition in SearchPhaseControllerTests#testPartialMergeFailure (#60488)
This change ensures that we call the listener for partial merge failure **before**
calling the completion listener in order to avoid race condition in tests.

Closes #60446
2020-07-31 16:29:20 +02:00
Rene Groeschke ed4b70190b
Replace immediate task creations by using task avoidance api (#60071) (#60504)
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
2020-07-31 13:09:04 +02:00
Julie Tibshirani 8ac81a3447 Remove IndexFieldData#clear since it is unused. (#60475)
This method was never called. It also seemed tricky that calling a method on
`IndexFieldData` could clear the contents of a shared cache.
2020-07-30 14:07:55 -07:00
Julie Tibshirani dfd7f226f0
Clarify SourceLookup sharing across fetch subphases. (#60484)
The `SourceLookup` class provides access to the _source for a particular
document, specified through `SourceLookup#setSegmentAndDocument`. Previously
the search context contained a single `SourceLookup` that was shared between
different fetch subphases. It was hard to reason about its state: is
`SourceLookup` set to the expected document? Is the _source already loaded and
available?

Instead of using a global source lookup, the fetch hit context now provides
access to a lookup that is set to load from the hit document.

This refactor closes #31000, since the same `SourceLookup` is no longer shared
between the 'fetch _source phase' and script execution.
2020-07-30 13:22:31 -07:00
Dan Hermann 5e5503ac28
Change severity of negative stats messages from WARN to DEBUG (#60375) (#60444) 2020-07-30 06:06:13 -05:00
Armin Braun 3bf4c01d8e
Don't Allocate Redundant Pages in BigArrays (#60201) (#60441)
The oversize algorithm was allocating more pages than necessary to accommodate `minTargetSize`.
An example would be that a 16k page size and 15k `minTargetSize` would result in a new size of 32k (2 pages).
The difference between the minimum number of necessary pages and the estimated size then keeps growing as sizes increase.

I don't think there is much value in preemptively allocating pages by over-sizing aggressively since the behavior of
the system is quite different from that of a single array where over-sizing avoids copying
once the minimum target size is more than a single page.

Relates #60173 which lead me to this when `BytesStreamOutput` would allocate a large number of never used
pages during serialization of repository metadata.
2020-07-30 11:09:58 +02:00
Armin Braun a2c49a4f02
Reduce Heap Use during Shard Snapshot (#60370) (#60440)
Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger
number of shards the previous execution order would lead to memory use proportional to the number of shards
for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot
pool (except for in the BwC format path).
This PR makes it so that they are written to the repository at the earliest possible point in time
so that they can be garbage collected.
If shard generations are used, we can safely write these right at the beginning of the shard snapshot.
If shard generations are not used we can only write them at the end of the shard snapshot after all
other blobs have been written.

Closes #60173
2020-07-30 10:45:00 +02:00
Igor Motov 00a1949852
Streamline GeoJSON to map serialization (#60413) (#60429)
Optimizes GeoJSON to map serialization when retrieving spatial data through
fields.

Closes #60259
2020-07-29 17:56:56 -04:00
Julie Tibshirani 5359417ec3
Minor clean-up around search highlight context. (#60422)
* Rename SearchContextHighlight -> SearchHighlightContext.
* Rename HighlighterContext to FieldHighlightContext.
* Make the search highlight context immutable.
* Avoid storing SearchHighlightContext on HighlighterContext.
2020-07-29 11:39:17 -07:00
Tim Brooks 85fdf959ad
Add configured indexing memory limit to node stats (#60414)
This commit adds the configured memory limit to the node stats API.
2020-07-29 12:28:21 -06:00
Nhat Nguyen 9d4a64e749
Allow CCR on nodes with legacy roles only (#60093)
CCR will stop functioning if the master node is on 7.8, but data nodes 
are before that version because the master node considers that all data
nodes do not have the remote cluster client role. This commit allows CCR
work on data nodes with legacy roles only.

Relates #54146
Relates #59375
2020-07-29 10:57:31 -04:00
Armin Braun 8429b4ace8
Fix Queued Snapshot Deletes After Finalization Failure (#60285) (#60379)
This fixes the behavior of the snapshot state machine in the following edge case:

1. Snapshot is running
2. Delete/abort for the snapshot is started
3. Snapshot fails to finalize

We were not removing the failed snapshot id from the list of snapshots to delete in the delete.
This lead to an error in the repository, which throws if we try to delete a non-existing snapshot.
This commmit updates the deletions in progress by removing the failed snapshot id.
The fact that this could lead to snapshot delete entries without any snapshot ids is not optimized
on purpose because it allows for another attempt at writing clean `RepositoryData` and will run basic
cleanup on the repository (root level blobs and stale indices) and thus bring the repository back into
a clean state after a failed finalization.

Closes #60274
2020-07-29 15:54:18 +02:00
Armin Braun 381cec2ba9
Fix ConcurrentSnapshotsIT.testMasterFailOverWithQueuedDeletes (#60307) (#60376)
The test assumed that the master fail-over would always work out as a single step.
This is not guaranteed however and we can randomly see master failing over twice,
in which case the transport listener will be failed on the node that stops being
leader and we have to catch an exception for the deletes as well just like we do
for the snapshot.

Closes #60262
2020-07-29 15:54:00 +02:00
Armin Braun 0778274b72
Fix IPV6 Scope Id in InetAddressesTests (#60368) (#60369)
Follow up to #60360, turns out at times the name of an interface that isn't loopback is not a valid scope id.
2020-07-29 13:16:12 +02:00
Armin Braun 1f6a3765e4
Fix NPE in SnapshotsInProgress Constructor (#60355)
Merge oversight between cleanups that removed `null` for `shards` and this corner case
spot of no indices in a snapshot.

Closes #60330
2020-07-29 10:47:28 +02:00
Armin Braun 4307a45153
Fix IPV6 Scope ID Test (#60360) (#60363)
Use real scope id from first available interface instead of `lo` which might
not exist on non-Linux platforms.

Closes #60332
2020-07-29 09:55:37 +02:00
Armin Braun 753fd4f6bc
Cleanup and optimize More Serialization Spots (#59959) (#60331)
Same as #59626 for a few more spots.
2020-07-29 07:20:44 +02:00
Zachary Tong e3d85feecd Mute testForStringIPv6WithScopeIdInput test
Tracking issue: https://github.com/elastic/elasticsearch/issues/60332
2020-07-28 15:05:19 -04:00
Igor Motov 0dd53b76bd
Add aggregation list to node info (#60074) (#60256)
Adds a full list of supported aggregations to the node info API. This list
will be used in transform tests and telemetry mapping tests that will be added
as follow-up PRs.

Fixes #59774
2020-07-28 14:06:12 -04:00
Julie Tibshirani c7bfb5de41
Add search `fields` parameter to support high-level field retrieval. (#60258)
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.

Addresses #49028 and #55363.
2020-07-28 10:58:20 -07:00
James Rodewig 025e7bee80
[DOCS] Fix allowed values for numeric sort types (#60176) (#60299)
Co-authored-by: Philippus Baalman <philippus@gmail.com>
2020-07-28 13:51:59 -04:00
Howard 11b86b3f88
Remove unused clusterService instance in ActionModule. (#59826) 2020-07-28 10:36:04 -07:00
jimczi 4e4ed6ee48 fix race condition in SearchPhaseControllerTests#consumerTestCase 2020-07-28 18:27:39 +02:00
David Turner 9450ea08b4 Log and track open/close of transport connections (#60297)
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.

This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
2020-07-28 17:08:04 +01:00
Armin Braun 9222070f22
Fix Test Failure in testCorrectCountsForDoneShards (#60254) (#60286)
* Fix Test Failure in testCorrectCountsForDoneShards

Fixing the freak edge case where the node shard status request returns before
the node was able to send the state update request to master and update the cluster state.
Without this change, the snapshot shard status would report as `DONE` once the data node
has finished updating the shard in the cluster state.
If the data node then drops out of the cluster before the state has been updated, then
the status will jump to "FAILURE" because the master updates the state once the data node
leaves the cluster.

Closes #60247
2020-07-28 15:46:18 +02:00
David Turner b78caa5c00 Add more useful toString on cluster state observers (#60277)
Today if a cluster state observer's listener takes a long time to
process a notification then we log the following rather useless warning
message:

    [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@12345678]] took [34567ms]

This commit adds a handful of simple `toString()` implementations in
order to identify the owner of the listener in question.
2020-07-28 12:56:58 +01:00
Jim Ferenczi 1144534093
Executes incremental reduce in the search thread pool (#58461) (#60275)
This change forks the execution of partial
reduces in the coordinating node to the search thread pool.
It also ensures that partial reduces are executed sequentially
and asynchronously in order to limit the memory and cpu that a
single search request can use but also to avoid blocking a
network thread.
If a partial reduce fails with an exception, the search
request is cancelled and the reporting of the error is
delayed to the start of the fetch phase (when the final
reduce is performed). This ensures that we cleanup the
in-flight search requests before returning an error to
the user.

Closes #53411
Relates #51857
2020-07-28 13:40:47 +02:00
Armin Braun d39622e17e
Stop Serializing RepositoryData Twice when Writing (#60107) (#60269)
We can save one round of serializing `RepositoryData` on the write path.
This also leads to somewhat better compression because we compress larger chunks
in one go potentially when compared to serializing and compressing in one go.
Also, fixed the double wrapping of collections when copying the repository
data instance via the `withGenId`.
2020-07-28 11:42:14 +02:00
Yannick Welsch a55c869aab Properly document keepalive and other tcp options (#60216)
Keepalive options are not well-documented (only in transport section, although also available at http and network level).

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2020-07-28 11:10:04 +02:00
Yannick Welsch ffe114b890 Set specific keepalive options by default on supported platforms (#59278)
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.

This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
2020-07-28 11:10:04 +02:00
Armin Braun fac5953d13
Let `isInetAddress` utility understand the scope ID on ipv6 (#60172) (#60263)
Make `isInetAddress` utility method understand the scope ID on ipv6.

Fixes #60115

Co-authored-by: Yang Cheng <chengyang2048@163.com>
2020-07-28 09:37:39 +02:00
James Rodewig cb4c21fa7b
[DOCS] Fix typo in adapt auto expand replica comments (#60187) (#60239)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-07-27 14:18:53 -04:00
weizijun 5df043d0e0 Fix wait_for_no_initializing_shards params (#58379) 2020-07-27 14:03:26 -04:00
Adrien Grand f1f275c91b Add 6.8.12 and 7.8.2 version constants. 2020-07-27 19:26:22 +02:00
Tim Brooks df0f68da23
Identify the operation type in rejected exception (#60138)
Currently, we do not categorize the operation type in the rejection
exception messsage when we reject an indexing operation for indexing
memory limits. This commit fixes this to ensure that it is identified as
coordinating, primary, or replica.
2020-07-27 10:09:46 -06:00
Tim Brooks 47922c9e4a
Fix indexing pressure replica rejections logic (#60150)
Currently the logic to rejection replica rejections is evaluate before
adding the additional bytes of the current operation. This means that
the first replica operation which should be rejected will be allowed to
proceed. This commit fixes this logic and adds unit level test to ensure
indexing pressure behavior is correct.
2020-07-27 10:00:01 -06:00
Nik Everett a451dd87aa
Reduce merge map memory overhead in the Variable Width Histogram Aggregation (#59366) (#60171)
When a document which is distant from existing buckets gets collected, the
`variable_width_histogram` will create a new bucket and then insert it
into the ordered list of buckets.

Currently, a new merge map array is created to move this bucket. This is
very expensive as there might be thousands of buckets.

This PR creates `mergeBuckets(UnaryOperator<Long> mergeMap)` methods in
`BucketsAggregator` and `MergingBucketsDefferingCollector`, and updates
the `variable_width_histogram`  to use them. This eliminates the need to
create an entire merge map array for each new bucket and reduces the
memory overhead of the algorithm.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-07-27 09:23:06 -04:00
Armin Braun 196ed6b90e
Remove Mostly Redundant Deleting in FsBlobContainer (#60117) (#60195)
In almost all cases we write uuid named files via this method.
Preemptively deleting just wastes IO ops, we can delete after a write failed
and retry the write to cover the few cases where we actually do an overwrite.
2020-07-27 14:05:41 +02:00
David Roberts 89466eefa5
Don't require separate privilege for internal detail of put pipeline (#60190)
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest.  This was due to an internal implementaton detail
that was not visible to the end user.

This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info.  The internal implementation detail now runs as
the internal _xpack user when security is enabled.

Backport of #60106
2020-07-27 10:44:48 +01:00
Armin Braun 25a75d05c0
Fix Test Failure in testConcurrentlyChangeRepositoryContentsInBwCMode (#60095)
There is a very unlikely but possible test failure in this test.
The `SnapshotsService` continues iterating over queued operations after
resolving the transport listener. This can lead to a situation where the moved
repository data is not picked up when running the delete (even though we
have the concurrent modifications BwC mode activated) concurrently.

I fixed this in the test so that the test still verifies that this setting works.
Technically speaking, one could add logic to the way we queue and execute repo operations
to address this special case. Since this case only comes about with the concurrent modifications
setting enabled (and the setting is gone in master already) I don't really see a reason to improve
the logic here since we should always fail queued up repo operations on concurrent modification for
safety reasons.
2020-07-27 09:33:38 +02:00
Nhat Nguyen 0031dea9cc Fix race in testSendSnapshotSendsOps (#59831)
There is a race between increase and get the global checkpoint in the 
test as indexTranslogOperations can be executed concurrently.

Closes #59492
2020-07-23 16:22:40 -04:00
Ignacio Vera db183c89ed
Refactor HyperLogLogPlusPlus to separate algorithms and internal data representation (#60104) (#60109) 2020-07-23 15:07:05 +02:00
David Turner bf7e53a91e Remove node-level canAllocate override (#59389)
Today there is a node-level `canAllocate` override which the balancer
uses to ignore certain nodes to which it is certain no more shards can
be allocated. In fact this override only ignores nodes which have hit
the rarely-used `cluster.routing.allocation.total_shards_per_node`
limit, so this optimization doesn't have a meaningful impact on real
clusters.

This commit removes this unnecessary fast path from the balancer, and
also removes all the machinery needed to support it.
2020-07-23 08:48:59 +01:00
Armin Braun 43a6ff5eb1
Optimize some Spots around Closing Resources (#60049) (#60096)
The single element `close` calls go through a very inefficient path that includes creating
a one element list.
`releaseOnce` is only with a single non-null input in production in two spots so no need for
varargs and any complexity here.
`ReleasableBytesStreamOutput` does not require any `releaseOnce` wrapping because we already have
that kind of logic implemented in `org.elasticsearch.common.util.AbstractArray` (which we were
wrapping here) already.
2020-07-23 08:49:06 +02:00
Julie Tibshirani aa57bbd422
Consolidate validation for 'docvalue_fields'. (#60065)
This improves modularity and also fixes some issues when `docvalues_fields` is
used within `inner_hits` or the `top_hits` agg:
* We previously didn't resolve wildcards in field names.
* We also forgot to enforce the limit `index.max_docvalue_fields_search`.
2020-07-22 17:26:58 -07:00
Armin Braun ebb6677815
Formalize and Streamline Buffer Sizes used by Repositories (#59771) (#60051)
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size  for blob writes was inefficient as well.

We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.

This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.

This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HFDS repositories.
2020-07-22 21:06:31 +02:00