Commit Graph

3336 Commits

Author SHA1 Message Date
Armin Braun 9b4f50b40a
Remove Redundant GetAllSnapshots Method from RepositoryData (#44259) (#44271)
* With the removal of the incompatible snapshots list in RepositoryData
the get snapshots and get all snapshots methods are equivalent so I
removed one of them
2019-07-12 15:03:09 +02:00
Yannick Welsch 068286ca4b Remove RemoteClusterConnection.ConnectedNodes (#44235)
This instead exposes the set of connected nodes on ConnectionManager.
2019-07-12 14:54:21 +02:00
Armin Braun 6c02cf0241
Fix InternalTestCluster StopRandomNode Assertion (#44258) (#44265)
* The assertion added in #44214 is tripped by tests running dedicated
test clusters per test needlessly.This breaks existing tests like the one in #44245.
* Closes #44245
2019-07-12 13:18:55 +02:00
Armin Braun ad6dce16f4
Safer Shard Snapshot Delete (#44165) (#44244)
* Safer Shard Snapshot Delete

* We shouldn't delete the snapshot meta file before we update the index
in the shard folder. If we fail to update the index-N after deleting the
existing index-N is broken because the snap- blob it references is gone.
2019-07-12 12:45:06 +02:00
David Turner 735c897ec6
Avoid counting votes from master-ineligible nodes (#43688)
Today if a master-eligible node is converted to a master-ineligible node it may
remain in the voting configuration, meaning that the master node may count its
publish responses as an indication that it has properly persisted the cluster
state. However master-ineligible nodes do not properly persist the cluster
state, so it is not safe to count these votes.

This change adjusts `CoordinationState` to take account of this from a safety
point of view, and also adjusts the `Coordinator` to prevent such nodes from
joining the cluster. Instead, it triggers a reconfiguration to remove from the
voting configuration a node that now appears to be master-ineligible before
processing its join.

Backport of #43688, see #44260.
2019-07-12 11:30:52 +01:00
Armin Braun 9e920f9612
Make Timestamps Returned by Snapshot APIs Consistent (#43148) (#44261)
* We don't have to calculate the start and end times form the shards for the status API, we have the start time available from the CS or the `SnapshotInfo` in the repo and can either take the end time form the `SnapshotInfo` or
take the most recent time from the shard stats for in progress snapshots
* Closes #43074
2019-07-12 12:05:35 +02:00
Mark Vieira 3cd9606566
Mute failing test 2019-07-11 13:32:49 -07:00
Armin Braun 0dd06cf7a5
Remove Dead Code Around Snapshots (#44109) (#44236)
* Just some random spots that have become unused with recent cleanups
2019-07-11 21:56:36 +02:00
Christoph Büscher 31725ef390 [Tests] Increase SimpleQueryStringIT allowed maxClauseCount (#44215)
For this test, we randomize the CLUSTER_MAX_CLAUSE_COUNT on test setup
(@BeforeClass) between 50 and 100. Some queries in the test generate 56 clauses
which hasn't been an issue before LUCENE-8811, but we slightly need to increase
the minimal possible clause count now.

Closes #44192
2019-07-11 20:16:20 +02:00
Yannick Welsch ae8f625d73 Report usages old child breakers when breaking on real memory (#44221)
This will help in investigations where the real memory circuit breaker is tripped to better understand
on what the actual memory is used, i.e. whether it's a temporary thing (e.g. requests) in contrast to
more permanently allocated memory (e.g. accounting).
2019-07-11 19:52:12 +02:00
Armin Braun 2768662822
Cleanup Stale Root Level Blobs in Sn. Repository (#43542) (#44226)
* Cleans up all root level temp., snap-%s.dat, meta-%s.dat blobs that aren't referenced by any snapshot to deal with dangling blobs left behind by delete and snapshot finalization failures
   * The scenario that get's us here is a snapshot failing before it was finalized or a delete failing right after it wrote the updated index-(N+1) that doesn't reference a snapshot anymore but then fails to remove that snapshot
   * Not deleting other dangling blobs since that don't follow the snap-, meta- or tempfile naming schemes to not accidentally delete blobs not created by the snapshot logic
* Follow up to #42189
  * Same safety logic, get list of all blobs before writing index-N blobs, delete things after index-N blobs was written
2019-07-11 19:35:15 +02:00
Andrei Stefan e9f9f00940
SQL: add pretty printing to JSON format (#43756) (#44220)
(cherry picked from commit cbd9d4c259bf5a541bc49f65f7973174a36df449)
2019-07-11 20:02:24 +03:00
Christos Soulios c091b6c004
Migrating tests from AvgIT integration test to AvgAggregatorTests (#44076) (#44225)
This PR migrates most tests from AvgIT integration test to AvgAggregatorTests, as described in #42893
2019-07-11 19:20:13 +03:00
Armin Braun 5f22370b6b
Fix ShrinkIndexIT (#44214) (#44223)
* Fix ShrinkIndexIT

* Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived.
* Closes #44164
2019-07-11 17:58:00 +02:00
Igor Motov 1636701d69 CI: Disable SimpleQueryStringIT.testDocWithAllTypes
Tracked by #44192
2019-07-11 09:22:18 -05:00
Nick Knize 374030a53f
Upgrade to lucene-8.2.0-snapshot-860e0be5378 (#44171) (#44184)
Upgrades lucene library to lucene-8.2.0-snapshot-860e0be5378
2019-07-11 09:17:22 -05:00
Igor Motov 66a9b721f5 Add Map to XContentParser Wrapper (#44036)
In some cases we need to parse some XContent that is already parsed into
a map. This is currently happening in handling source in SQL and ingest
processors as well as parsing null_value values in geo mappings. To avoid
re-serializing and parsing the value again or writing another map-based
parser this commit adds an iterator that iterates over a map as if it was
XContent. This makes reusing existing XContent parser on maps possible.

Relates to #43554
2019-07-11 09:38:31 -04:00
Yannick Welsch ea5513f2cf Make NodeConnectionsService non-blocking (#44211)
With connection management now being non-blocking, we can make NodeConnectionsService
avoid the use of MANAGEMENT threads that are blocked during the connection attempts.

I had to fiddle a bit with the tests as testPeriodicReconnection was using both the mock Threadpool
from the DeterministicTaskQueue as well as the real ThreadPool initialized at the test class level,
which resulted in races.
2019-07-11 14:08:07 +02:00
Armin Braun 51f0e941d3
Reduce Number of List Calls During Snapshot Create and Delete (#44088) (#44209)
* Reduce Number of List Calls During Snapshot Create and Delete

Some obvious cleanups I found when investigation the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
   * Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
2019-07-11 13:52:36 +02:00
Armin Braun 8ce8c627dd
Some Cleanup in o.e.i.shard (#44097) (#44208)
* Some Cleanup in o.e.i.shard

* Extract one duplicated method
* Cleanup obviously unused code
2019-07-11 13:52:06 +02:00
Yannick Welsch 2ee07f1ff4 Simplify port usage in transport tests (#44157)
Simplifies AbstractSimpleTransportTestCase to use JVM-local ports  and also adds an assertion so
that cases like #44134 can be more easily debugged. The likely reason for that one is that a test,
which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle
daemon) kept increasing Gradle worker IDs, causing an overflow at some point.
2019-07-11 13:35:37 +02:00
Armin Braun c0ed64bb92
Improve Repository Consistency Check in Tests (#44204)
* Improve Repository Consistency Check in Tests (#44099)

* Check that index metadata as well as snapshot metadata always exists
when referenced by other metadata

* Fix SnapshotResiliencyTests on ExtraFS (#44113)

* As a result of #44099 we're now checking more directories and have to
ignore the `extraN` folders for those like we do for indices already
* Closes #44112
2019-07-11 11:14:37 +02:00
Armin Braun 8a554f9737
Remove IncompatibleSnapshots Logic from Codebase (#44096) (#44183)
* The incompatible snapshots logic was created to track 1.x snapshots that
became incompatible with 2.x
   * It serves no purpose at this point
   * It adds an additional GET request to every loading of
RepositoryData (from loading the incompatible snapshots blob)
2019-07-11 07:15:51 +02:00
Igor Motov df2e1fb43e Geo: add validator that only checks altitude (#43893)
By default, we don't check ranges while indexing geo_shapes. As a
result, it is possible to index geoshapes that contain contain
coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes
will currently break SQL and ML retrieval mechanism. This commit removes
these restriction from the validator is used in SQL and ML retrieval.
2019-07-10 16:55:03 -04:00
Ryan Ernst 8fda49a834 Remove unused import in TransportShardBulkAction
Accidentally left from backporting #44092
2019-07-10 13:43:47 -07:00
Christoph Büscher cbb19032df [Test] Additional logging for RemoteClusterClientTests (#44124) 2019-07-10 22:41:54 +02:00
Ryan Ernst c6efb9be2a Convert ReplicationResponse to Writeable (#43953)
This commit convers ReplicationResponse and all its subclasses to
support Writeable.Reader as a constructor.

relates #34389
2019-07-10 12:45:10 -07:00
Ryan Ernst fb77d8f461 Removed writeTo from TransportResponse and ActionResponse (#44092)
The base classes for transport requests and responses currently
implement Streamable and Writeable. The writeTo method on these base
classes is implemented with an empty implementation. Not only does this
complicate subclasses to think they need to call super.writeTo, but it
also can lead to not implementing writeTo when it should have been
implemented, or extendiong one of these classes when not necessary,
since there is nothing to actually implement.

This commit removes the empty writeTo from these base classes, and fixes
subclasses to not call super and in some cases implement an empty
writeTo themselves.

relates #34389
2019-07-10 12:42:04 -07:00
Zachary Tong 92ad588275
Remove generic on AggregatorFactory (#43664) (#44079)
AggregatorFactory was generic over itself, but it doesn't appear we
use this functionality anywhere (e.g. to allow the super class
to declare arguments/return types generically for subclasses to
override).  Most places use a wildcard constraint, and even when a
concrete type is specified it wasn't used.

But since AggFactories are widely used, this led to
the generic touching many pieces of code and making type signatures
fairly complex
2019-07-10 13:20:28 -04:00
Nhat Nguyen b158919542 Do not use mock engine in PrimaryAllocationIT (#44083)
PrimaryAllocationIT#testForceStaleReplicaToBePromotedToPrimary 
relies on the flushing when a shard is no long assigned. This behavior,
however, can be randomly disabled in MockInternalEngine.

Closes #44049
2019-07-10 12:26:34 -04:00
David Turner d0f1a756d9 Comment on the extra reroute after failing shards (#44152)
The `ShardFailedClusterStateTaskExecutor` fails some shards, which performs a
reroute, but then sometimes schedules a followup reroute. It's not clear from
the code why this followup is necessary, so this commit adds a short comment
describing why it's necessary.
2019-07-10 13:24:21 +01:00
David Roberts cad804df92 [TEST] Mute ShrinkIndexIT
Due to https://github.com/elastic/elasticsearch/issues/44164
2019-07-10 13:22:25 +01:00
Martijn van Groningen 913b6a64e8
Replace Streamable w/ Writable for MultiSearchRequest (#44057)
This commit replaces usages of Streamable with Writeable for the
MultiSearchRequest class.

I ran into this when developing a custom action that reuses
MultiSearchRequest in the enrich branch.

Relates to #34389
2019-07-10 11:13:28 +02:00
Armin Braun a23d1ed00d
Mute SearchWithRandomExceptionsIT (#44147) (#44149)
* This is failing quiete often and we can reproduce it now so we don't
need additional test logging on CI
* Relates #40435
2019-07-10 08:12:26 +02:00
David Turner aec44fecbc Decouple DiskThresholdMonitor & ClusterInfoService (#44105)
Today the `ClusterInfoService` requires the `DiskThresholdMonitor` at
construction time so that it can notify it when nodes report changes in their
disk usage, but this is awkward to construct: the `DiskThresholdMonitor`
requires a `RerouteService` which requires an `AllocationService` which comees
from the `ClusterModule` which requires the `ClusterInfoService`.

Today we break the cycle with a `LazilyInitializedRerouteService` which is
itself a little ugly. This commit replaces this with a more traditional
subject/observer relationship between the `ClusterInfoService` and the
`DiskThresholdMonitor`.
2019-07-09 18:43:32 +01:00
David Turner e70cad4c52 Remove node conn block after connection barrier (#44114)
Today `testOnlyBlocksOnConnectionsToNewNodes` fails (extremely rarely) if the
last attempt to connect to `node0` is delayed for so long that the test runs
`nodeConnectionsBlocks.clear()` before the connection attempt obtains the
expected connection block. We can turn this into a reliable failure with this
delay:

```diff
diff --git a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
index f48413824d3..9a1d0336bcd 100644
--- a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
+++ b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
@@ -300,6 +300,13 @@ public class NodeConnectionsService extends AbstractLifecycleComponent {
         private final Runnable connectActivity = () -> threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new AbstractRunnable() {
             @Override
             protected void doRun() {
+
+                try {
+                    Thread.sleep(500);
+                } catch (InterruptedException e) {
+                    throw new AssertionError("unexpected", e);
+                }
+
                 assert Thread.holdsLock(mutex) == false : "mutex unexpectedly held";
                 transportService.connectToNode(discoveryNode);
                 consecutiveFailureCount.set(0);
```

This commit reverts the extra logging introduced in #43979 and fixes this
failure by waiting for the connection attempt to hit the barrier before
removing it.

Fixes #40170
2019-07-09 17:03:26 +01:00
David Turner 268971db03 Wait for blackholed connection before discovery (#44077)
Since #42636 we no longer treat connections specially when simulating a
blackholed connection. This means that at the end of the safety phase we may
have just started a connection attempt which will time out, but the default
timeout is 30 seconds, much longer than the 2 seconds we normally allow for
post-safety-phase discovery. This commit adds time for such a connection
attempt to time out.

It also fixes some spurious logging of `this` that now refers to an object with
an unhelpful `toString()` implementation introduced in #42636.

Fixes #44073
2019-07-09 10:59:53 +01:00
Henning Andersen 748a10866d Reindex ScrollableHitSource pump data out (#43864)
Refactor ScrollableHitSource to pump data out and have a simplified
interface (callers should no longer call startNextScroll, instead they
simply mark that they are done with the previous result, triggering a
new batch of data). This eases making reindex resilient, since we will
sometimes need to rerun search during retries.

Relates #43187 and #42612
2019-07-09 11:50:09 +02:00
David Turner fd9eebae81 Only apply initial recovery filter to shrunk shard (#44054)
Today the `index.routing.allocation.initial_recovery._id` setting can only be
set on indices that are the result of a shrink, but the filtered allocation
decider also applies this filter to shards with a recovery source of
`EMPTY_STORE`. The only way to have this setting set while the recovery source
is `EMPTY_STORE` is to force-allocate an empty primary, but such a forced
allocation ignores this allocation decider.

This commit simplifies the allocation decider so that the `initial_recovery`
setting only applies to shards with a recovery source of `LOCAL_SHARDS`.
2019-07-09 08:42:18 +01:00
Armin Braun 9eac5ceb1b
Dry up inputstream to bytesreference (#43675) (#44094)
* Dry up Reading InputStream to BytesReference
* Dry up spots where we use the same pattern to get from an InputStream to a BytesReferences
2019-07-09 09:18:25 +02:00
Armin Braun dc8f8e40eb
Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode (#43537) (#44082)
* Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode

* See comment in the test: The problem is that when the snapshot delete works out partially on master failover and the retry fails on `SnapshotMissingException` no repository cleanup is run => we still failed even with repo cleanup logic in the delete path now
   * Fixed the test by rerunning a create snapshot and delete loop to clean up the repo before verifying file counts
* Closes #39852
2019-07-09 06:32:08 +02:00
Armin Braun 03332b5aeb
Don't Consistency Check Broken Repository in Test (#43499) (#44071)
* Missed this one in #42189 and it randomly runs into a situation where the broken mock repo is broken such that we can't get to a consistent end state via a delete
* Closes #43498
2019-07-08 17:21:40 +02:00
Tanguy Leroux 251287f89d Check again on-going snapshots/restores of indices before closing (#43873)
Today we prevent any index that is actively snapshotted or restored to be closed. 
This verification is done during the execution of the first phase of index closing 
(ie before blocking the indices).

We should also do this verification again in the last phase of index closing 
(ie after the shard sanity checks and right before actually changing the index 
state and the routing table) because a snapshot/restore could sneak in while
 the shards are verified-before-close.
2019-07-08 17:07:04 +02:00
Mark Tozzi 299a52c17d
Enable validating user-supplied missing values on unmapped fields (#43718) (#43940)
Provides a hook for aggregations to introspect the `ValuesSourceType` for a user supplied Missing value on an unmapped field, when the type would otherwise be `ANY`.  Mapped field behavior is unchanged, and still applies the `ValuesSourceType` of the field.  This PR just provides the hook for doing this, no existing aggregations have their behavior changed.
2019-07-08 10:46:23 -04:00
Armin Braun 2918363e90
Simplify BlobStoreRepository (Flatten Nested Classes) (#42833) (#44060)
* In the current codebase it is hardly obvious what code operates on a shard and is run by a datanode what code operates on the global metadata and is run on master
   * Fixed by adjusting the method names accordingly
* The nested context classes don't add much if any value, they simply spread out the parameters that go into a shard snapshot create or delete all over the place since their
constructors can be inlined in all spots
   * Fixed by flattening the nested classes into BlobStoreRepository
* Also:
  * Inlined the other single use inner classes
2019-07-08 14:57:27 +02:00
Armin Braun afe81fd625
Some Cleanup in Test Framework (#44039) (#44059)
* Remove some obvious dead code
* Move assert methods that were only used in a single test class to the child they belong to
* Inline some redundant methods
2019-07-08 14:15:31 +02:00
David Turner 3f3bcb23c2 AwaitsFix testForceStaleReplicaToBePromotedToPrimary
Relates #44049
2019-07-08 11:26:57 +01:00
David Turner 3129f5b42e Do not copy initial recovery filter during split (#44053)
If an index is the result of a shrink then it will have a value set for
`index.routing.allocation.initial_recovery._id`. If this index is subsequently
split then this value will be copied over, forcing the initial allocation of
the split shards to occur on the node on which the shrink took place. Moreover
if this node no longer exists then the split will fail.  This commit suppresses
the copying of this setting when splitting an index.

Fixes #43955
2019-07-08 10:32:05 +01:00
Armin Braun af9b98e81c
Recursively Delete Unreferenced Index Directories (#42189) (#44051)
* Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up instead of using the diff between two `RepositoryData` instances to cover aborted deletes
* Runs after ever delete operation
* Relates  #13159 (fixing most of this issues caused by unreferenced indices, leaving some meta files to be cleaned up only)
2019-07-08 10:55:39 +02:00
Przemyslaw Gomulka 247f2dabad
Fix decimal point parsing for date_optional_time backport(#43859) #44050
Joda allowed for date_optional_time and strict_date_optional_time a decimal point to be . dot or , comma
For our java.time implementation we should also extend this for strict_date_optional_time-nanos
the approach to fix this is the same as in iso8601 parser
closes #43730
2019-07-08 09:56:01 +02:00
Armin Braun f6efc55556
Fix SnapshotResiliencyTest (#44015) (#44041)
* Closes #43989
2019-07-07 19:59:16 +02:00
Armin Braun 990ac4ca83
Some Cleanup in BlobStoreRepository (#43323) (#44043)
* Some Cleanup in BlobStoreRepository

* Extracted from #42833:
  * Dry up index and shard path handling
  * Shorten XContent handling
2019-07-07 19:50:46 +02:00
Nhat Nguyen 9089820d8f Enable indexing optimization using sequence numbers on replicas (#43616)
This PR enables the indexing optimization using sequence numbers on
replicas. With this optimization, indexing on replicas should be faster
and use less memory as it can forgo the version lookup when possible.
This change also deactivates the append-only optimization on replicas.

Relates #34099
2019-07-05 22:12:08 -04:00
Yannick Welsch 504a43d43a Move ConnectionManager to async APIs (#42636)
This commit converts the ConnectionManager's openConnection and connectToNode methods to
async-style. This will allow us to not block threads anymore when opening connections. This PR also
adapts the cluster coordination subsystem to make use of the new async APIs, allowing to remove
some hacks in the test infrastructure that had to account for the previous synchronous nature of the
connection APIs.
2019-07-05 20:40:22 +02:00
Yannick Welsch 88783927d1 Weaken assertion in PublicationTransportHandler (#44014)
These assertions do not hold true when a master fails during publication and quickly becomes
master again, publishing a new cluster state in a higher term which races against the previous
cluster state publication to self (which does not matter anyway).

Relates #43994

Closes #44012
2019-07-05 18:27:42 +02:00
Yannick Welsch 1220ff5b6d Publish to self through transport (#43994)
This commit ensures that cluster state publications to self also go through the transport layer. This
allows voting-only nodes to intercept the publication to self.

Fixes an issue discovered by a test failure where a voting-only node, which was the only
bootstrapped node, would not step down as master after state transfer because publishing to self
would succeed.

Closes #43631
2019-07-05 13:00:52 +02:00
Yannick Welsch 5cdf3ff3fa Revert "[TEST] Mute RemoteClusterServiceTests.testCollectNodes"
This reverts commit d8a2970fa4.
2019-07-05 11:02:42 +02:00
David Turner 06df0c0a4c Improve RetentionLease(Bgrd)SyncAction#toString() (#43987)
Today `RetentionLeaseSyncAction.Request` and
`RetentionLeaseBackgroundSyncAction.Request` both describe themselves as
`Request{...}` in the value returned from their respective `toString()`
methods. This commit adds the name of the owning class to both so we have
something a bit easier to search for and so we can distinguish foreground from
background syncs in logs and test failures and so on.
2019-07-05 09:58:35 +01:00
David Turner 435a83f3fd Add more logging to testOnlyBlocksOnConnectionsToNewNodes (#43979)
Some more output from this occasionally-failing test tracked in #40170.
2019-07-05 09:54:48 +01:00
Jim Ferenczi cdf55cb5c5 Refactor index engines to manage readers instead of searchers (#43860)
This commit changes the way we manage refreshes in the index engines.
Instead of relying on a SearcherManager, this change uses a ReaderManager that
creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand
(when acquireSearcher is called) from the current ElasticsearchDirectoryReader.
It also slightly changes the Engine.Searcher to extend IndexSearcher in order
to simplify the usage in the consumer.
2019-07-04 22:49:43 +02:00
Christoph Büscher aeb3c1fd1b Prevent types deprecation warning for indices.exists requests (#43963)
Currently we log a deprecation warning to the types removal in
RestGetIndicesAction even if the REST method is HEAD, which is used by the
indices.exists API. Since the body is empty in this case we should not need to
show the deprecation warning.

Closes #43905
2019-07-04 17:20:43 +02:00
Tanguy Leroux b037aeaa6e
Fix IndexShardIT.testIndexCanChangeCustomDataPath() (#43978)
The test IndexShardIT.testIndexCanChangeCustomDataPath() fails
 on 7.x and 7.3 because the translog cannot be recovered.

While I can't reproduce the issue, I think it has been introduced in #43752 
which changed ReadOnlyEngine so that it opens the translog in its 
constructor in order to load the translog stats. This opening writes a 
new checkpoint file, but because 7.x/7.3 does not wait for shards to be 
started after being closed, the test immediately starts to copy shard files
 to a new directory and possibly does not copy all the required translog files.

By waiting for the shards to be started after being closed, we ensure 
that the shards (and engines) have been correctly initialized and that
 the translog checkpoint file is not currently being written.

closes #43964
2019-07-04 17:06:37 +02:00
Alan Woodward 4b99255fed Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
2019-07-04 11:28:55 +01:00
Jim Ferenczi 2cc0a56fe6 Fix wrong logic in `match_phrase` query with multi-word synonyms (#43941)
Disjunction over two individual terms in a phrase query with multi-word synonyms
wrongly applies a prefix query to each of these terms. This change fixes this bug
by inversing the logic to use prefixes on `phrase_prefix` queries only.

Closes #43308
2019-07-04 09:39:39 +02:00
Henning Andersen cacc3f7ff8 Async IO Processor release before notify (#43682)
This commit changes async IO processor to release the promiseSemaphore
before notifying consumers. This ensures that a bad consumer that
sometimes does blocking (or otherwise slow) operations does not halt the
processor. This should slightly increase the concurrency for shard
fsync, but primarily improves safety so that one bad piece of code has
less effect on overall system performance.
2019-07-04 06:33:38 +02:00
Igor Motov c593085104 Geo: Refactors libs/geo parser to provide serialization logic as well (#43717)
Enables libs/geo parser to return a geometry format object that can
perform both serialization and deserialization functions. This can
be useful for ingest nodes that are trying to modify an existing
geometry in the source.

Relates to #43554
2019-07-03 19:31:44 -04:00
Adrien Grand 680edbe3f1
Bump current version to 7.4. (#43927) 2019-07-03 20:32:04 +02:00
Armin Braun be20fb80e4
Recursive Delete on BlobContainer (#43281) (#43920)
This is a prerequisite of #42189:

* Add directory delete method to blob container specific to each implementation:
  * Some notes on the implementations:
       * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
       * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
       * HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
2019-07-03 17:14:57 +02:00
Alan Woodward 49d69bf987 Actually close IndexAnalyzers contents (#43914)
IndexAnalyzers has a close() method that should iterate through all its wrapped
analyzers and close each one in turn. However, instead of delegating to the
analyzers' close() methods, it instead wraps them in a Closeable interface,
which just returns a list of the analyzers. In addition, whitespace normalizers are
ignored entirely.
2019-07-03 16:06:58 +01:00
David Turner 9cecc31cdc Shortcut simple patterns ending in `*` (#43904)
When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.

Microbenchmark results before this change:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      1113.839 ±(99.9%) 6.338 ns/op [Average]
      (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
      CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      433.190 ±(99.9%) 0.644 ns/op [Average]
      (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
      CI (99.9%): [432.546, 433.833] (assumes normal distribution)

The microbenchmark in question was:

    @Fork(3)
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    @SuppressWarnings("unused") //invoked by benchmarking framework
    public class RegexStartsWithBenchmark {

        private static final String testString = "abcdefghijklmnopqrstuvwxyz";
        private static final String[] patterns;

        static {
            patterns = new String[testString.length() + 1];
            for (int i = 0; i <= testString.length(); i++) {
                patterns[i] = testString.substring(0, i) + "*";
            }
        }

        @Benchmark
        public void performSimpleMatch() {
            for (int i = 0; i < patterns.length; i++) {
                Regex.simpleMatch(patterns[i], testString);
            }
        }
    }
2019-07-03 14:15:27 +01:00
paulward24 cff027499a Ensure to access RecoveryState#fileDetails under lock
Closes #43840
2019-07-03 07:39:58 -04:00
Armin Braun 7059224668
Optimize Snapshot Finalization (#42723) (#43908)
* Optimize Snapshot Finalization

* Delete index-N blobs and segement blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs
2019-07-03 13:26:35 +02:00
Armin Braun 455b12a4fb
Add Ability to List Child Containers to BlobContainer (#42653) (#43903)
* Add Ability to List Child Containers to BlobContainer (#42653)

* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
2019-07-03 11:30:49 +02:00
Henning Andersen cd2972239c AsyncIOProcessor preserve thread context (#43729)
AsyncIOProcessor now preserves thread context, ensuring that deprecation
warnings are not duplicated to other concurrent operations on the same
shard.
2019-07-03 10:22:20 +02:00
Jim Ferenczi 05c0cff1b6 Fix index_prefix sub field name on nested text fields (#43862)
This change fixes the name of the index_prefix sub field when the `index_prefix`
option is set on a text field that is nested under an object or a multi-field.
We don't use the full path of the parent field to set the index_prefix field name
so the field is registered under the wrong name. This doesn't break queries since
we always retrieve the prefix field through its parent field but this breaks other
APIs like _field_caps which tries to find the parent of the `index_prefix` field
in the mapping but fails.

Closes #43741
2019-07-03 09:50:52 +02:00
Armin Braun 826f38cd70
Enable Parallel Deletes in Azure Repository (#42783) (#43886)
* Parallel deletes via private thread pool
2019-07-03 09:28:39 +02:00
Tanguy Leroux 365dfe88ca Refresh translog stats after translog trimming in NoOpEngine (#43825)
This commit changes NoOpEngine so that it refreshes its translog 
stats once translog is trimmed.

Relates #43156
2019-07-03 08:49:14 +02:00
Jake Landis 2dc056b0a0
Read the default pipeline for bulk upsert through an alias (#41963) (#42802)
This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.

Bulk upserts are modeled differently from normal index requests such that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name is not part of the (inner)
index request. This commit adds a secondary check against the update request
(outer) if the index request (inner) does not find an alias.
2019-07-02 20:44:33 -05:00
Christoph Büscher 31cf96e7bf Return reloaded analyzers in _reload_search_ananlyzer response (#43813)
Currently the repsonse of the "_reload_search_analyzer" endpoint contains the
index names and nodeIds of indices were analyzers reloading was triggered. This
change add the names of the search-time analyzers that were reloaded.

Closes #43804
2019-07-02 18:51:15 +02:00
Nhat Nguyen 697cd494bf Remove sort by primary term when reading soft-deletes (#43845)
With Lucene rollback (#33473), we should never have more than one
primary term for each sequence number. Therefore we don't have to sort
by the primary term when reading soft-deletes.
2019-07-02 10:54:32 -04:00
Tanguy Leroux b977f019b8
Expose translog stats in ReadOnlyEngine (#43752) (#43823)
Backport of #43752 for 7.x.
2019-07-02 13:39:00 +02:00
David Turner 1e8e85797d Rename and refactor RoutingService (#43827)
The `RoutingService` has a confusing name, since it doesn't really have
anything to do with routing. Its responsibility is submitting reroute commands
to the master.

This commit renames this class to `BatchedRerouteService`, and extracts the
`RerouteService` interface to avoid passing `BiConsumer`s everywhere. It also
removes that `BatchedRerouteService extends AbstractLifecycleComponent` since
this service has no meaningful lifecycle. Finally, it introduces a small
wrapper class to allow for lazy initialization to deal with the dependency loop
when constructing a `Node`.
2019-07-02 07:04:18 +01:00
Christoph Büscher fe3f9f0c6b Yet another `the the` cleanup (#43815) 2019-07-01 20:22:19 +02:00
Yogesh Gaikwad 031d5e96ac
HLRC changes for kerberos grant type (#43642) (#43822)
The TODO from last PR for kerbero grant type was missed.
This commit adds the changes for kerberos grant type in HLRC.
2019-07-02 00:55:02 +10:00
Zachary Tong 1e47ea5f18 Update rare_term version skips, fix SetBackedScalingCuckooFilter javadoc 2019-07-01 10:52:06 -04:00
Zachary Tong ea1794832f Add RareTerms aggregation (#35718)
This adds a `rare_terms` aggregation.  It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.

This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).

This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again.  If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter.  If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".

The map keys are the "rare" terms after collection is done.
2019-07-01 10:30:02 -04:00
Nhat Nguyen 598e00a689 Make peer recovery send file info step async (#43792)
Relates #36195
2019-07-01 08:40:45 -04:00
Julie Tibshirani ffa5919d7c
Add support for 'flattened object' fields. (#43762)
This commit merges the `object-fields` feature branch. The new 'flattened
object' field type allows an entire JSON object to be indexed into a field, and
provides limited search functionality over the field's contents.
2019-07-01 12:08:50 +03:00
Ryan Ernst 3a2c698ce0
Rename Action to ActionType (#43778)
Action is a class that encapsulates meta information about an action
that allows it to be called remotely, specifically the action name and
response type. With recent refactoring, the action class can now be
constructed as a static constant, instead of needing to create a
subclass. This makes the old pattern of creating a singleton INSTANCE
both misnamed and lacking a common placement.

This commit renames Action to ActionType, thus allowing the old INSTANCE
naming pattern to be TYPE on the transport action itself. ActionType
also conveys that this class is also not the action itself, although
this change does not rename any concrete classes as those will be
removed organically as they are converted to TYPE constants.

relates #34389
2019-06-30 22:00:17 -07:00
David Turner fca7a19713 Avoid parallel reroutes in DiskThresholdMonitor (#43381)
Today the `DiskThresholdMonitor` limits the frequency with which it submits
reroute tasks, but it might still submit these tasks faster than the master can
process them if, for instance, each reroute takes over 60 seconds. This causes
a problem since the reroute task runs with priority `IMMEDIATE` and is always
scheduled when there is a node over the high watermark, so this can starve any
other pending tasks on the master.

This change avoids further updates from the monitor while its last task(s) are
still in progress, and it measures the time of each update from the completion
time of the reroute task rather than its start time, to allow a larger window
for other tasks to run.

It also now makes use of the `RoutingService` to submit the reroute task, in
order to batch this task with any other pending reroutes. It enhances the
`RoutingService` to notify its listeners on completion.

Fixes #40174
Relates #42559
2019-06-30 16:54:16 +01:00
Nhat Nguyen 55b3ec8d7b Make peer recovery clean files step async (#43787)
Relates #36195
2019-06-29 18:30:51 -04:00
Albert Zaharovits 5e17bc5dcc
Consistent Secure Settings #40416
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus -
for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
2019-06-29 23:26:17 +03:00
Ryan Ernst 28ab77a023
Add StreamableResponseAction to aid in deprecation of Streamable (#43770)
The Action base class currently works for both Streamable and Writeable
response types. This commit intorduces StreamableResponseAction, for
which only the legacy Action implementions which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.

relates #34389
2019-06-28 21:40:00 -07:00
Tanguy Leroux f02cbe9e40 Trim translog for closed indices (#43156)
Today when an index is closed all its shards are forced flushed
but the translog files are left around. As explained in #42445
we'd like to trim the translog for closed indices in order to
consume less disk space. This commit reuses the existing
AsyncTrimTranslogTask task and reenables it for closed indices.

At the time the task is executed, we should have the guarantee
that nothing holds the translog files that are going to be removed.
It also leaves a short period of time (10 min) during which translog
files of a recently closed index are still present on disk. This could
 also help in some cases where the closed index is reopened
shortly after being closed (in order to update an index setting
for example).

Relates to #42445
2019-06-28 16:58:39 +02:00
Jim Ferenczi 7ca69db83f Refactor IndexSearcherWrapper to disallow the wrapping of IndexSearcher (#43645)
This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection
betweeen the query and the live docs instead of checking the live docs on every document that match the query.
2019-06-28 16:28:02 +02:00
weizijun 377c4cfdc0 Fix threshold spelling errors (#43326)
Substitutes treshold by threshold
2019-06-28 15:47:57 +02:00
Alan Woodward 81dbcfb268 Wildcard intervals (#43691)
This commit adds a wildcard intervals source, similar to the prefix. It
also changes the term parameter in prefix to read prefix, to bring it
in to line with the pattern parameter in wildcard.

Closes #43198
2019-06-28 14:04:03 +01:00
Christoph Büscher 2cc7f5a744
Allow reloading of search time analyzers (#43313)
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-28 09:55:40 +02:00
Alan Woodward 51b230f6ab
Fix PreConfiguredTokenFilters getSynonymFilter() implementations (#38839) (#43678)
When we added support for TokenFilterFactories to specialise how they were used when parsing
synonym files, PreConfiguredTokenFilters were set up to either apply themselves, or be ignored.
This behaviour is a leftover from an earlier iteration, and also has an incorrect default.

This commit makes preconfigured token filters usable in synonym file parsing by default, and brings
those filters that should not be used into line with index-specific filter factories; in indexes created
before version 7 we emit a deprecation warning, and we throw an error in indexes created after.

Fixes #38793
2019-06-28 08:19:00 +01:00
Ryan Ernst 5b4089e57e
Remove nodeId from BaseNodeRequest (#43658)
TransportNodesAction provides a mechanism to easily broadcast a request
to many nodes, and collect the respones into a high level response. Each
node has its own request type, with a base class of BaseNodeRequest.
This base request requires passing the nodeId to which the request will
be sent. However, that nodeId is not used anywhere. It is private to the
base class, yet serialized to each node, where the node could just as
easily find the nodeId of the node it is on locally.

This commit removes passing the nodeId through to the node request
creation, and guards its serialization so that we can remove the base
request class altogether in the future.
2019-06-27 18:45:14 -07:00