Commit Graph

5161 Commits

Author SHA1 Message Date
Armin Braun af2e2782eb
Stop Needlessly Copying Bytes in XContent Parsing (#61447) (#61469)
Wrapping a `BytesArray` in a `StreamInput` for deserialization is inefficient.
This forces Jackson to internally buffer (i.e. copy) all bytes from the `BytesArray`
before deserializing, adding overhead for copying the bytes and managing the buffers.

This commit fixes a number of spots where `BytesArray` is the most common type of
`BytesReference` to special case this type and parse it more efficiently.
Also improves parsing `String`s to use the more efficient direct `String` parsing APIs.
2020-08-24 15:49:15 +02:00
Dan Hermann c53731a0cd
[7.x] Fix wrong pipeline name in debug log (#58817) (#61233) 2020-08-21 11:14:01 -05:00
David Turner 078e8717ee Stop opening PING conns to remote clusters (#61408)
Today a remote cluster connection comprises a `PING` and a `REG`
channel. The `PING` channel is only used for health checks between the
elected master and the members of its own cluster, so is unused in a
remote cluster connection. This commit removes this unused connection.
2020-08-21 12:21:57 +01:00
Armin Braun e09058df1a
Serialize Get Mappings Response on Generic ThreadPool (#57937) (#61401)
For large responses to the get mappings request, the serialization
to XContent can be extremely slow (serializing mappings is expensive since
we have to decompress and deserialize the mapping source).
To not introduce instability on the IO thread handling the get mappings response
we should move the serialization to the management pool.
The trade-off of introducing one or two new context switches for responses that are
small enough to not cause trouble on the transport thread to prevent instability
in case of a large number of mappings in the cluster seems worth it.
2020-08-21 08:06:30 +02:00
Armin Braun 22509c95f8
Fix Blackholed Connection Behavior in DisruptableMockTransport (#61310) (#61381)
It is not realistic to drop messages without eventually failing.
To retain the coverage of long pauses this PR adjusts the blackholed
behavior to fail a send after 24h (which is assumed to be longer than any
timeout in the system) instead of never.

Closes #61034
2020-08-21 07:54:56 +02:00
Julie Tibshirani 997c73ec17
Correct how field retrieval handles multifields and copy_to. (#61391)
Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.

To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.

The PR is fairly big but each commit can be reviewed individually.

Fixes #61033.
2020-08-20 15:53:35 -07:00
Julie Tibshirani 85ad328df7
Ensure fetch fields aren't dropped when rewriting search. (#61390)
Previously we didn't retain the requested fields when performing a shallow copy
of the search source. This meant that when a search was rewritten, we could drop
the requested fields and fail to return them in the response.
2020-08-20 14:58:58 -07:00
Armin Braun 08dbd6d989
Optimize a few Spots on IO Loop (#60865) (#61380)
Saving some cycles here and there on the IO loop:

* Don't instantiate new `Runnable` to execute on `SAME` in a few spots
* Don't instantiate complicated wrapped stream for empty messages
* Stop instantiating almost never used `ClusterStateObserver` in two spots
* Some minor cleanup and preventing pointless `Predicate<>` instantiation in transport master node action
2020-08-20 20:22:49 +02:00
Alan Woodward a3a0c63ccf
Convert NumberFieldMapper to parametrized form (#61092) (#61376)
In addition, this commit converts ScaledFloatFieldMapper as it was relying
on a number of static values taken from NumberFieldMapper that had changed
or been removed.
2020-08-20 16:43:26 +01:00
Nhat Nguyen a3906dcef3 Enable cancellation for msearch requests (#61337)
Today multi-search requests are not cancellable because we create
regular tasks instead of cancellable ones for them.
2020-08-19 16:59:17 -04:00
Nik Everett 9789e6d154
Migrate some field mapper tests to ESTestCase (#61301) (#61346)
This switches a few tests for field mappers from `ESSingleNodeTestCase`
to `ESTestCase` because, in general, we prefer to avoid
`ESSingleNodeTestCase` when we can because it is slow and "big". "Big"
here means that it pulls in an entire node, making it difficult to
reason about what you are testing.
2020-08-19 15:43:49 -04:00
Armin Braun 4a53ae203e
Fix SharedClusterSnapshotRestoreIT.testThrottling (#61323) (#61328)
We have to set the recovery setting to `0` if we don't want throttling
from recoveries. Otherwise the randomized value used for this setting in
tests can lead to throttling unexpectedly.

Closes #61311
2020-08-19 15:26:32 +02:00
Nik Everett 70128e022b fix stats aggregator tests
With #60683 we stopped forcing aggregating all docs using a single
Aggregator which made some of our accuracy assumptions about the stats
aggregator incorrect. This adds a test that does the forcing and asserts
the old accuracy and adds a test without the forcing with much looser
accuracy guarantees.

Closes #61132
2020-08-19 08:55:43 -04:00
Alan Woodward b1aa0d8731
Fix fieldnames field type for pre-6.1 indexes (#61322)
The FieldNamesFieldMapper field has different behaviour for indexes created in
clusters earlier than v6.1, and the code to deal with this was still using the vestigial
FieldType field of FieldMapper in its indexing path. This meant that documents
added after an upgrade were not correctly indexing their field names field. This
commit corrects the parseCreateField method to use the default field type.

Fixes #61305
2020-08-19 12:59:09 +01:00
David Turner 389f7779e7 Report more details of unobtainable ShardLock (#61255)
Today a common reason for a `ShardLockObtainFailedException` is when a
shard is removed from a node and then assigned straight back to it again
before the node has had a chance to shut the previous shard instance
down. For instance, this can happen if a node briefly leaves the cluster
holding a primary with no in-sync replicas.

The message in this case is typically as follows:

    obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]

This is pretty hard to interpret, and doesn't raise the important
question: "why didn't the shard shut down sooner?"

With this change we reword the message a bit, report the age of the
shard lock, and adjust the details to report that the lock is held by a
closing shard:

    obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [12345ms]

Relates #38807
2020-08-19 06:36:28 +01:00
Nhat Nguyen 08b0e78ef4 Log more info when search ops higher than expected (#61108)
We have seen a situation where the total search operations are higher 
than expected. Unfortunately, we did not have enough info to figure it
out. This commit adds the failures to the error to provide more context
and adjusts the log level in case of failure to debug.
2020-08-18 15:20:41 -04:00
Rory Hunter bd7236cd65 Version bump for 7.9.0 release 2020-08-18 16:07:43 +01:00
Dimitrios Liappis c870640cbd
[7.x] Introduce 6.8.13 as a version (#61198)
Introduce version 6.8.13 to branch 7.x
2020-08-18 17:07:16 +03:00
Armin Braun 58d07b2ffc
Remove Unused ByteBufferReference (#61116) (#61250)
We only work with heap byte buffers at this point and those we can and do unwrap the
`byte[]` ourselves and use `BytesArray` instead of a needless level of indirection via `ByteBuffer`.
2020-08-18 10:53:40 +02:00
Armin Braun 6ffa7f0737
Fix testConcurrentSnapshotDeleteAndDeleteIndex (#61228) (#61249)
There is a corner case here in which during partial snapshot the index is
deleted right between starting the snapshot in the CS and the data node getting to work
on it, causing the data node the fail that shard snapshot and making the snapshot `PARTIAL`.

Closes #61208
2020-08-18 10:45:30 +02:00
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
Nik Everett 1b7bbafd81
Add method to make random DateFormatter pattern (backport of #60613) (#61213)
Adds a method to make a random date `DateFormatter` pattern. We expect
this'll be useful for runtime fields to compate their formatting with
the standard date field.
2020-08-17 10:57:52 -04:00
Christoph Büscher 6866396e1d Improve 'ignore_malformed' handling for dates (#60211)
Currently we occasionally can get ArithmeticException from parsing bad input
values on 'date' fields that are passed on even if 'ignore_malformed' is set.
This change adds this exception to the ones we already catch for malformed
values.

Closes #52634
2020-08-17 16:18:08 +02:00
David Turner b21cb7f466 Reduce allocations when persisting cluster state (#61159)
Today we allocate a new `byte[]` for each document written to the
cluster state. Some of these documents may be quite large. We need a
buffer that's at least as large as the largest document, but there's no
need to use a fresh buffer for each document.

With this commit we re-use the same `byte[]` much more, only allocating
it afresh if we need a larger one, and using the buffer needed for one
round of persistence as a hint for the size needed for the next one.
2020-08-17 13:45:31 +01:00
David Turner f3e0c60896
Restrict testing of legacy discovery to tests (#61178)
The 7.x branch preserves the legacy discovery mechanism from 6.x purely
for running internal cluster tests; this mechanism is otherwise
completely untested and unsupported. However it is still technically
possible to use it outside of the test suite if you dig through the
source code to work out what settings need to be set. With this change
we make it impossible to use this mechanism in production.

Closes #61177
2020-08-17 11:05:27 +01:00
Ryan Ernst 9cb45dafab
Unwrap transport exception when using transport client (#60801)
The ReloadSecureSettingsIT makes requests to the reload settings apis.
In 7.x, the client used from the integ test infrastructure may be a
transport client. In that case, the expected exception type, and causes
the test to fail (though it will hang indefinitely due to not counting
down the latch, see
https://github.com/elastic/elasticsearch/pull/60800). This commit adds
unwrapping of the remote exception to get the underlying expected
exception.

closes #51546
2020-08-13 10:24:04 -07:00
Ryan Ernst c73ab0b16f
Ensure hotthreads do not produce node failures (#61073)
This commit adds an assertion that no sub-nodes requests within
hot threads failed.

relates #58842
2020-08-13 10:22:19 -07:00
Armin Braun 3143b5ea47
Stabilize testSnapshotDeleteRelocatingPrimaryIndex (#61088) (#61096)
Use transport blocking to make relocation take forever instead of relying on the relocation to take long enough to clash with the snapshot.

Closes #61069
2020-08-13 16:26:56 +02:00
Yannick Welsch 8e775394ac Fix testNoMasterActionsMetadataWriteMasterBlock (#60605)
We can't assert on the specific exception, unfortunately.
2020-08-13 10:48:16 +02:00
David Turner c6276ae177 Fail invalid incremental cluster state writes (#61030)
It is disastrous if we commit an incremental cluster state update
without having written the full state first. We assert that this doesn't
happen, but it is hard to fully test the myriad ways that things might
fail in a messy production environment. Given the disastrous
consequences it is worth erring on the side of caution in this area.
This commit fails invalid writes even if assertions are disabled.
2020-08-12 19:46:19 +01:00
Lee Hinman e3df64a429
[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994) (#61045)
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role acts as though they have all of the roles configured (it is a hot, warm, cold, and
frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to #60848
2020-08-12 11:06:23 -06:00
Alan Woodward 5b3c10c379
Fix serialization of AllFieldMapper (#61044)
Converting AllFieldMapper to parametrized form ended up not being run through BWC
testing, resulting in an incorrect implementation being committed. This commit fixes
the serialization, and adds unit tests as well as unmuting the BWC test that uncovered
the bug.

Fixes #60986
2020-08-12 17:32:55 +01:00
Yannick Welsch 8c488de576 Gracefully handle null in checkSettingsForTerminalDeprecation
Fixes a test failure after backport to 7.x
2020-08-12 18:03:52 +02:00
Yannick Welsch 25404cbe3d Provide option to allow writes when master is down (#60605)
Elasticsearch currently blocks writes by default when a master is unavailable. The cluster.no_master_block setting allows
a user to change this behavior to also block reads when a master is unavailable. This PR introduces a way to now also still
allow writes when a master is offline. Writes will continue to work as long as routing table changes are not needed (as
those require the master for consistency), or if dynamic mapping updates are not required (as again, these require the
master for consistency).

Eventually we should switch the default of cluster.no_master_block to this new mode.
2020-08-12 16:56:45 +02:00
Yannick Welsch 6644f2283d Do not access snapshot repo on dedicated voting-only master node (#61016)
Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the
snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently
elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and
will never be the elected master so we should not require it to have write access to the repository.

Closes #59649
2020-08-12 16:56:45 +02:00
Yannick Welsch af519be9cb Ensure repo not in use for wildcard repo deletes (#60947)
Repositories can't be unregistered when they are actively being used for snapshots or restores. Wildcard repository
deletes could silently bypass the "repo in use" checks however, which is now fixed.
2020-08-12 16:38:06 +02:00
Dan Hermann 538c93c923
Adding Hit counts and Miss counts for QueryCache exposed through REST api. (#60114) (#60993) 2020-08-12 08:21:09 -05:00
Alan Woodward c81dc2b8b7 Convert KeywordFieldMapper to parametrized form (#60645)
This makes KeywordFieldMapper extend ParametrizedFieldMapper, with explicitly
defined parameters.

In addition, we add a new option to Parameter, restrictedStringParam, which
accepts a restricted set of string options.
2020-08-12 11:41:11 +01:00
markharwood 66098e0bf4
Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959) (#61010)
The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type.
The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields.

Closes #60957
2020-08-12 10:44:52 +01:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Nik Everett ce9c5f0e46 Fix diversified sample tests
The test assumed that the aggregator only ran once but we turned that
off. This turns it back on.
2020-08-11 17:49:43 -04:00
Jay Modi 2fa6448a15
System index reads in separate threadpool (#60927)
This commit introduces a new thread pool, `system_read`, which is
intended for use by system indices for all read operations (get and
search). The `system_read` pool is a fixed thread pool with a maximum
number of threads equal to lesser of half of the available processors
or 5. Given the combination of both get and read operations in this
thread pool, the queue size has been set to 2000. The motivation for
this change is to allow system read operations to be serviced in spite
of the number of user searches.

In order to avoid a significant performance hit due to pattern matching
on all search requests, a new metadata flag is added to mark indices
as system or non-system. Previously created system indices will have
flag added to their metadata upon upgrade to a version with this
capability.

Additionally, this change also introduces a new class, `SystemIndices`,
which encapsulates logic around system indices. Currently, the class
provides a method to check if an index is a system index and a method
to find a matching index descriptor given the name of an index.

Relates #50251
Relates #37867
Backport of #57936
2020-08-11 12:16:34 -06:00
Julie Tibshirani a93be8d577
Handle nested arrays in field retrieval. (#60981)
We accept _source values with multiple levels of arrays, such as
`"field": [[[1, 2]]]`. This PR ensures that field retrieval can handle nested
arrays by unwrapping the arrays before parsing the values.
2020-08-11 10:22:16 -07:00
Mark Tozzi ab8518fb5b
[7.x] Extensibility for Composite Agg #59648 (#60842) 2020-08-11 09:14:33 -04:00
Alan Woodward 54279212cf
Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847) (#60924)
This commit cuts over all metadata field mappers to parametrized format.
2020-08-11 09:02:28 +01:00
Armin Braun 3e2dfc6eac
Remove GCS Bucket Exists Check (#60899) (#60914)
Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS.
We don't need to do the bucket exists check before using the repo, that just needlessly
increases the necessary permissions for using the GCS repository.
2020-08-11 09:54:27 +02:00
Julie Tibshirani d51eae6e9f
Prevent loading 'fields' with stored fields disabled. (#60938)
Because the 'fields' option loads from _source (which is a stored field), it is
not possible to retrieve 'fields' when stored_fields are disabled.

This also fixes #60912, where setting stored_fields: _none_ prevented the
_ignored fields from being loaded and caused a parsing exception.
2020-08-10 15:40:27 -07:00
Nik Everett 0286d0a769
Move distance_feature query building into MFT (#60614) (#60846)
This moves the `distance_feature` query building out of
`DistanceFeatureQueryBuilder` and into subclasses of `MappedFieldType`.
Without this we don't have a chance of supporting this for runtime
fields. In general I'm not sad to see the `instanceof`s go.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 16:05:17 -04:00
Julie Tibshirani b216340f50
Make `FetchPhase` logic more readable. (#60779)
* Factor out FieldsVisitor#postProcess call.
* Swap logical order for normal and nested documents.
* Extract the method createStoredFieldsVisitor.
2020-08-10 11:04:54 -07:00
Nik Everett dfd502f9ca
Rework checking if a year is a leap year (#60585) (#60790)
This way is faster, saving about 8% on the microbenchmark that rounds to
the nearest month. That is in the hot path for `date_histogram` which is
a very popular aggregation so it seems worth it to at least try and
speed it up a little.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 12:45:34 -04:00