Commit Graph

2637 Commits

Author SHA1 Message Date
Mirko Jotic a6ae146ccc Converting Derivative Pipeline Agg integration test into AggregatorTestsCase. (#38679)
Replicates the majority of existing Derivative pipeline integration tests into
an AggregatorTestCase, with the goal of removing the integration
tests in the near future.
2019-02-20 16:35:32 -05:00
Igor Motov 3d93011e32 Fix median calculation in MedianAbsoluteDeviationAggregatorTests (#38979)
Fixes an error in median calculation in
MedianAbsoluteDeviationAggregatorTests for odd number of sample points,
which causes some rare test failures.

Fixes #38937
2019-02-20 13:24:30 -05:00
Ioannis Kakavas c783069804
Fix NPE on Stale Index in IndicesService(#39173)
This is a backport of  #38891 which closes #38845
2019-02-20 15:35:35 +02:00
David Turner efffb3d5b7 Simplify calculation in AwarenessAllocationDecider (#38091)
Today's calculation of the maximum number of shards per attribute is rather
convoluted. This commit clarifies that it returns
ceil(shardCount/numberOfAttributes).
2019-02-20 08:54:57 +00:00
Henning Andersen 00a26b9dd2 Blob store compression fix (#39073)
Blob store compression was not enabled for some of the files in
snapshots due to constructor accessing sub-class fields. Fixed to
instead accept compress field as constructor param. Also fixed chunk
size validation to work.

Deprecated repositories.fs.compress setting as well to be able to unify
in a future commit.
2019-02-20 09:24:41 +01:00
Hendrik Muhs 50b3858f7c add version 6.6.2 2019-02-19 20:28:06 +01:00
David Turner 0a9574c9d4 Add some missing toString() implementations (#39124)
Sometimes we turn objects into strings for logging or debugging using
`toString()`, but the default implementation is often unhelpful. This change
improves on this in two places I ran into recently.
2019-02-19 17:52:41 +00:00
Jason Tedor fef9bdb23f
Allow retention lease operations under blocks (#39089)
This commit allows manipulating retention leases under blocks.
2019-02-19 10:26:49 -05:00
Jason Tedor 12f6963456
Fix retention leases sync on recovery test
This test had a bug. We attempt to allow only the primary to be
allocated, to force all replicas to recovery from the primary after we
had set the state of the retention leases on the primary. However, in
building the index settings, we were overwriting the settings that
exclude the replicas from being allocated. This means that some of the
replicas would end up assigned and rather than receive retention leases
during recovery, they would be part of the replication group receiving
retention leases as they are manipulated. Since retention lease renewals
are only synced periodically, this means that the replica could be
lagging a little behind in some cases leading to an assertion tripping
in the test. This commit addresses this by ensuring that the replicas
are indeed not allocated until after the retention leases are done being
manipulated on the replica. We did this by not overwriting the exclude
settings.

Closes #39105
2019-02-19 09:07:33 -05:00
Alexander Reelsen 7f8a640363 Fix DateFormatters.parseMillis when no timezone is given (#39100)
The parseMillis method was able to work on formats without timezones by
falling back to UTC. The Date Formatter interface did not support this, as
the calling code was using the `Instant.from` java time API.

This switches over to an internal method which adds UTC as a timezone.

Closes #39067
2019-02-19 14:12:22 +01:00
Jim Ferenczi 199155f5fb
Enforce Completion Context Limit (#38675) (#39075)
This change adds a limit to the number of completion contexts that a completion field can define.

Closes #32741
2019-02-19 08:52:24 +01:00
Albert Zaharovits 6bc88b00ec Mute GatewayMetaStateTests.testAtomicityWithFailures (#39079)
Mute test GatewayMetaStateTests.testAtomicityWithFailures
2019-02-19 00:25:45 +02:00
Jason Tedor 2d8f6b6501
Introduce retention lease state file (#39004)
This commit moves retention leases from being persisted in the Lucene
commit point to being persisted in a dedicated state file.
2019-02-18 16:53:46 -05:00
Jason Tedor d43ac8fe11
Include in log retention leases that failed to sync
When retention leases fail to sync after an expiration check, we emit a
log message about this. This commit adds the retention leases that
failed to sync.
2019-02-18 15:08:08 -05:00
Jason Tedor bbb61002ba
Add some logging related to retention lease syncing (#39066)
When the background retention lease sync fires, we check an see if any
retention leases are expired. If any did expire, we execute a full
retention lease sync (write action). Since this is happening on a
background thread, we do not block that thread waiting for success (it
will simply try again when the timer elapses). However, we were
swallowing exceptions that indicate failure. This commit addresses that
by logging the failures. Additionally, we add some trace logging to the
execution of syncing retention leases.
2019-02-18 15:02:31 -05:00
Henning Andersen 99b2bc3461 Fix potential race during TcpTransport close (#39031)
Fixed two potential causes for leaked threads during tests:
1. When adding a channel to serverChannels, we add it under a monitor
that we do not use when reading from it. This is potentially unsafe if
there is no other happens-before relationship ensuring the safety of
this.
2. Long-shot but if the thread pool was shutdown before entering this
code, we would silently forget about closing server channels so added
assert.

Strengthened the locking to ensure that once we stop the transport, no
new server channels can be made.

Relates to CI failure issue: #37543
2019-02-18 19:13:23 +01:00
Alan Woodward ab4d5f404f Add overlapping, before, after filters to intervals query (#38999)
Lucene recently added `overlapping`, `before` and `after` filters to the intervals package. This
commit exposes them in elasticsearch.
2019-02-18 15:06:24 +00:00
Adrien Grand 45b17e8645
Don't close caches while there might still be in-flight requests. (#38958)
Many of our index components use ref-counting so that in the event that a shard
is closed while there are still ongoing requests, then the index reader and the
store only effectively get closed when ongoing requests have finished. However
we don't apply the same principle to the request and query caches, which might
get closed while there are still in-flight requests.

This commit adds ref-counting to `IndicesService` so that the caches and other
components it maintains only get closed when all shards are effectively closed.

Closes #37117
2019-02-18 13:59:58 +01:00
Martijn van Groningen ed08bc3537
Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (#38709)
* During fetching remote mapping if remote client is missing then
`NoSuchRemoteClusterException` was not handled.
* When adding remote connection, check that it is really connected
before continue-ing to run the tests.

Relates to #38695
2019-02-18 09:41:44 +01:00
Jason Tedor a5ce1e0bec
Integrate retention leases to recovery from remote (#38829)
This commit is the first step in integrating shard history retention
leases with CCR. In this commit we integrate shard history retention
leases with recovery from remote. Before we start transferring files, we
take out a retention lease on the primary. Then during the file copy
phase, we repeatedly renew the retention lease. Finally, when recovery
from remote is complete, we disable the background renewing of the
retention lease.
2019-02-16 15:37:52 -05:00
Tim Brooks b1c1daa63f
Add get file chunk timeouts with listener timeouts (#38758)
This commit adds a `ListenerTimeouts` class that will wrap a
`ActionListener` in a listener with a timeout scheduled on the generic
thread pool. If the timeout expires before the listener is completed,
`onFailure` will be called with an `ElasticsearchTimeoutException`.

Timeouts for the get ccr file chunk action are implemented using this
functionality. Additionally, this commit attempts to fix #38027 by also
blocking proxied get ccr file chunk actions. This test being un-muted is
useful to verify the timeout functionality.
2019-02-16 10:56:03 -07:00
Luca Cavanna a1a49f201d Tie break search shard iterator comparisons on cluster alias (#38853)
`SearchShardIterator` inherits its `compareTo` implementation from `PlainShardIterator`. That is good in most of the cases, as such comparisons are based on the shard id which is unique, even when searching against indices with same names across multiple clusters (thanks to the index uuid being different). In case though the same cluster is registered multiple times with different aliases, the shard id is exactly the same, hence remote results will be returned before local ones with same shard id objects. That is because remote iterators are added before local ones, and we use a stable sorting method in `GroupShardIterators` constructor.

This PR enhances `compareTo` for `SearchShardIterator` to tie break on cluster alias and introduces consistent `equals` and `hashcode` methods. This allows to remove a TODO in `SearchResponseMerger` which otherwise has to handle this special case specifically. Also, while at it I added missing tests around equals/hashcode and compareTo and expanded existing ones.
2019-02-16 09:41:03 +01:00
Nhat Nguyen 7e20a92888 Advance max_seq_no before add operation to Lucene (#38879)
Today when processing an operation on a replica engine (or the 
following engine), we first add it to Lucene, then add it to translog, 
then finally marks its seq_no as completed. If a flush occurs after step1,
but before step-3, the max_seq_no in the commit's user_data will be
smaller than the seq_no of some documents in the Lucene commit.
2019-02-15 21:04:28 -05:00
Nhat Nguyen 20755e666c Reduce global checkpoint sync interval in disruption tests (#38931)
We verify seq_no_stats is aligned between copies at the end of some
disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due
to a lagged global checkpoint on replicas. The global checkpoint on
replicas is lagged because we sync the global checkpoint 30 seconds (by
default) after the last replication operation. This change reduces the
global checkpoint sync-internal to 1s in the disruption tests.

Closes #38318
Closes #36789
2019-02-15 21:04:20 -05:00
Nhat Nguyen a67b9f6d1f Relax testStressMaybeFlushOrRollTranslogGeneration (#38918)
The predicate shouldPeriodicallyFlush is determined by the uncommitted
translog size and the local checkpoint. The uncommitted translog size
depends on the local checkpoint. The condition shouldPeriodicallyFlush
can be true twice in in the test in the following scenario:

1. Index doc-0 and advances the local checkpoint to 0, the condition
shouldPeriodicallyFlush remains false.

2. Index doc-1 and add it to translog, but the local checkpoint is not
advanced yet (still 0). The condition shouldPeriodicallyFlush becomes
true because the uncommitted translog size is 216bytes (2ops + gen-1 +
gen-2) > 180bytes and the translog generation of the new index commit
would advance from 1 to 2.

> [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=0,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=2,
> max_seq_no=1}]

1. The shouldPeriodicallyFlush becomes true again after the local
checkpoint is advanced to 1 because the uncommitted translog size is
216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation
of the new index commit would advance from 2 to 4.

> [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=1,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=4,
> max_seq_no=1}]

We need to relax the assertion in this test to cover this situation.

Closes #31629
2019-02-15 21:04:12 -05:00
Armin Braun 238425e5e7 Fix Issue with Concurrent Snapshot Init + Delete (#38518)
* Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread

* Closes #38489
2019-02-15 16:50:47 -08:00
Alan Woodward 176013e23c Avoid double term construction in DfsPhase (#38716)
DfsPhase captures terms used for scoring a query in order to build global term statistics across
multiple shards for more accurate scoring. It currently does this by building the query's `Weight`
and calling `extractTerms` on it to collect terms, and then calling `IndexSearcher.termStatistics()`
for each collected term. This duplicates work, however, as the various `Weight` implementations 
will already have collected these statistics at construction time.

This commit replaces this round-about way of collecting stats, instead using a delegating
IndexSearcher that collects the term contexts and statistics when `IndexSearcher.termStatistics()`
is called from the Weight.

It also fixes a bug when using rescorers, where a `QueryRescorer` would calculate distributed term
statistics, but ignore field statistics.  `Rescorer.extractTerms` has been removed, and replaced with
a new method on `RescoreContext` that returns any queries used by the rescore implementation.
The delegating IndexSearcher then collects term contexts and statistics in the same way described
above for each Query.
2019-02-15 16:00:38 +00:00
Daniel Mitterdorfer fcc7f553f5
Also mmap cfs files for hybridfs (#38940) (#38947)
With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.

Relates #36668
2019-02-15 15:34:40 +01:00
David Turner 578514e892 Recover peers from translog, ignoring soft deletes (#38904)
Today if soft deletes are enabled then we read the operations needed for peer
recovery from Lucene. However we do not currently make any attempt to retain
history in Lucene specifically for peer recoveries so we may discard it and
fall back to a more expensive file-based recovery. Yet we still retain
sufficient history in the translog to perform an operations-based peer
recovery.

In the long run we would like to fix this by retaining more history in Lucene,
possibly using shard history retention leases (#37165). For now, however, this
commit reverts to performing peer recoveries using the history retained in the
translog regardless of whether soft deletes are enabled or not.
2019-02-15 10:45:15 +01:00
Henning Andersen a211e51343 ShardBulkAction ignore primary response on primary (#38901)
Previously, if a version conflict occurred and a previous primary
response was present, the original primary response would be used both
for sending to replica and back to client. This was made in the past as an
attempt to fix issues with conflicts after relocations where a bulk request
would experience a closed shard half way through and thus have to retry
on the new primary. It could then fail on its own update.

With sequence numbers, this leads to an issue, since if a primary is
demoted (network partitions), it will send along the original response
in the request. In case of a conflict on the new primary, the old
response is sent to the replica. That data could be stale, leading to
inconsistency between primary and replica.

Relocations now do an explicit hand-off from old to new primary and
ensures that no operations are active while doing this. Above is thus no
longer necessary. This change removes the special handling of conflicts
and ignores primary responses when executing shard bulk requests on the
primary.
2019-02-15 10:13:11 +01:00
Jason Tedor 00cb8d0be8
Mark coordinator test as awaits fix
This test is failing frequently so this commit mutes it.

Relates #38867
2019-02-14 12:43:31 -05:00
Lee Hinman 0c733c04be Remove immediate operation retry after mapping update (#38873)
Prior to this commit, when an indexing operation resulted in an
`Engine.Result.Type.MAPPING_UPDATE_REQUIRED`, TransportShardBulkAction
immediately retries the indexing operation to see if it succeeds. In the event
that it succeeds the context does not wait until the mapping update has
propagated through the cluster state before finishing the indexing.

In some of our tests we rely on mappings being available as soon as they've been
introduced in a document that indexed correctly. By removing the immediate retry
we always wait for this to be the case.

Resolves #38428
Supercedes #38579
Relates to #38711
2019-02-14 09:31:08 -07:00
Christoph Büscher 6c5cec4ff4 Enable silent FollowersCheckerTest (#38851)
One of the test methods wasn't run because it was private. Making this method
public and fixing some issues around mocking the threadpool that otherwise would
lead to an NPE.
2019-02-14 16:16:48 +01:00
Albert Zaharovits 6243a9797f _cat/indices with Security, hide names when wildcard (#38824)
This changes the output of the `_cat/indices` API with `Security` enabled.

It is possible to only display the index name (and possibly the index
health, depending on the request options) but not its stats (doc count, merges,
size, etc). This is the case for closed indices which have index metadata in the
cluster state but no associated shards, hence no shard stats.
However, when `Security` is enabled, and the request contains wildcards,
**open** indices without stats are a common occurrence. This is because the
index names in the response table are picked up directly from the cluster state
which is not filtered by `Security`'s _indexNameExpressionResolver_, unlike the
stats data which is populated by the indices stats API which does go through the
index name resolver.
This is a bug, because it is circumventing `Security`'s function to hide
unauthorized indices.

This has been fixed by displaying the index names as they are resolved by the indices
stats API. The outputs of these two APIs is now very similar: same index names,
similar data but different format.

Closes #37190
2019-02-14 15:09:17 +02:00
David Roberts 6ea483a663 Mute DedicatedClusterSnapshotRestoreIT testRestoreShrinkIndex
Due to https://github.com/elastic/elasticsearch/issues/38845
2019-02-14 11:46:22 +00:00
Luca Cavanna 7456117019 [TEST] address testCollectNodes rare failure (#38559)
#37767 changed the expected exception for "no such cluster" error from
`IllegalStateException` to a dedicated `NoSuchRemoteClusterException`.
An assertion in `testCollectNodes` needs to be updated accordingly.
2019-02-14 10:57:14 +01:00
Nhat Nguyen 5d22e45990 Copy retention leases when trim unsafe commits (#37995)
When a primary shard is recovered from its store, we trim the last
commit (when it's unsafe). If that primary crashes before the recovery
completes, we will lose the committed retention leases because they are
baked in the last commit. With this change, we copy the retention leases
from the last commit to the safe commit when trimming unsafe commits.

Relates #37165
2019-02-13 17:27:48 -05:00
Jason Tedor 062eea8fcc
Fix excessive increments in soft delete policy (#38813)
In this case, we were incrementing the policy too much. This means on
every iteration we actually keep increasing the minimum retained
sequence number, even with leases in place. It was a bug from when the
soft deletes policy had retention leases incorporated into it. This
commit fixes this bug by ensuring we only increment in the proper
places, and adds careful tests for the various situations.
2019-02-13 14:04:45 -05:00
Jake Landis 46bb663a09
Make 7.x like 6.7 user agent ecs, but default to true (#38828)
Forward port of https://github.com/elastic/elasticsearch/pull/38757

This change reverts the initial 7.0 commits and replaces them
with the 6.7 variant that still allows for the ecs flag. 
This commit differs from the 6.7 variants in that ecs flag will 
now default to true. 

6.7: `ecs` : default `false`
7.x: `ecs` : default `true`
8.0: no option, but behaves as `true`

* Revert "Ingest node - user agent, move device to an object (#38115)"
This reverts commit 5b008a34aa.

* Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)"
This reverts commit cac6b8e06f.

* cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57 
Add ECS schema for user-agent ingest processor (#37727)

* cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099 
Ingest node - user agent, move device to an object (#38115) (#38121)
  
* cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes)
Dep. check for ECS changes to User Agent processor (#38362)

* make true the default for the ecs option, and update 7.0 references and tests
2019-02-13 10:28:01 -06:00
Przemyslaw Gomulka 7404882105
Fix line separators in JSON logging tests backport#38771 #38834
The hardcoded '\n' in string will not work in Windows where there is a
different line separator. A System.lineSeparator should be used to make
it work on all platforms
closes #38705 
backport #38771
2019-02-13 13:34:33 +01:00
Zachary Tong 57f69082fd
Disable cache on QueryProfilerIT (#38748)
- Disables the request cache on the test, to prevent cached
values from potentially interfering with test results
- Changes the test to execute a single query, in hopes of making
failures more reproducible
Backport of #38583
2019-02-12 13:11:52 -05:00
Nhat Nguyen a3f39741be Adjust log and unmute testFailOverOnFollower (#38762)
There were two documents (seq=2 and seq=103) missing on the follower in
one of the failures of `testFailOverOnFollower`. I spent several hours
on that failure but could not figure out the reason. I adjust log and
unmute this test so we can collect more information.

Relates #38633
2019-02-12 11:42:25 -05:00
Nhat Nguyen 4a5070dcfb Use current term in initial leases in engine test (#38285)
We need to use the current primary term instead of 1L for the initial
retention leases; otherwise, the primary term of the committed
retention leases won't match the current primary term if the
retention leases never gets updated.
2019-02-12 11:40:04 -05:00
Nhat Nguyen eca5404572 Fix synchronization in LocalCheckpointTracker#contains (#38755)
We are accessing the `CountedBitSet` in `LocalCheckpointTracker#contains`
without proper synchronization.

Relates #33871
2019-02-12 11:39:50 -05:00
Nhat Nguyen 225ebb6935 Ensure no snapshotted commit when close engine (#38663)
With this change, we can automatically detect an implementation 
that acquires an index commit but fails to release.
2019-02-12 11:39:35 -05:00
Tanguy Leroux 51d6b9ab31 Fix CloseWhileRelocatingShardsIT (#38728) 2019-02-12 14:04:44 +01:00
Jason Tedor bbc9aa9979
Introduce retention lease actions (#38756)
This commit introduces actions for some common retention lease
operations that clients need to be able to perform remotely. These
actions include add/renew/remove.
2019-02-12 07:38:03 -05:00
Przemyslaw Gomulka 7e178aa4a7
Enable IndexActionTests and WatcherIndexingListenerTests Backport #38738
fix tests to use clock in milliseconds precision in watcher code
make sure the date comparison in string format is using same formatters
some of the code was modified in #38514 possibly because of merge conflicts

closes #38581
Backport #38738
2019-02-12 13:05:44 +01:00
Luca Cavanna 90fff54954 Tie break on cluster alias when merging shard search failures (#38715)
A recent test failure triggered an edge case scenario where failures may be coming back with the same shard id, yet from different clusters.
This commit adapts the failures comparator to take the cluster alias into account when merging failures as part of CCS requests execution.
Also the corresponding test has been split in two: with and without
search shard target set to the failure.

Closes #38672
2019-02-12 11:25:44 +01:00
Jason Tedor c7cdd6a46a
Add dedicated retention lease exceptions (#38754)
When a retention lease already exists on an add retention lease
invocation, or a retention lease is not found on a renew retention lease
invocation today we throw an illegal argument exception. This puts a
burden on the caller to catch that specific exception and parse the
message. This commit relieves the burden from the caller by adding
dedicated exception types for these situations.
2019-02-12 00:32:09 -05:00
Jason Tedor b97c74bbab
Enable removal of retention leases (#38751)
This commit introduces the ability to remove retention leases. Explicit
removal will be needed to manage retention leases used to increase the
likelihood of operation-based recoveries syncing, and for consumers such
as ILM.
2019-02-11 21:19:11 -05:00
Nick Knize e2f432a413 Fix the version check for LegacyGeoShapeFieldMapper (#38547)
Change version check from 7.0 to 6.6 in BaseGeoShapeFieldMapper to correctly use LegacyGeoShapeFieldMapper for indexes created prior to 6.6.
2019-02-11 16:27:47 -06:00
Nick Knize 078da6d9bd Fix GeoHash PrefixTree BWC (#38584)
geo_shape indexes created before 6.6 use geohash string encoding as default tree parameter and quadtree encoding for 6.6 and later. This commit fixes bwc to use geohash encoding in LegacyGeoshapeFieldMapper for indexes created before 6.6.
2019-02-11 11:59:51 -06:00
David Roberts d1848b96fc
Fix possible assertion failure in IndicesQueryCache.close (#38731)
The assertion that the stats2 map is empty in
IndicesQueryCache.close has been observed to
fail very occasionally in internal cluster tests.

The likely cause is a cross-thread visibility
problem for a count variable.  This change
makes that count volatile.

Relates #37117
Backport of #38714
2019-02-11 17:33:20 +00:00
Tanguy Leroux dc212de822
Specialize pre-closing checks for engine implementations (#38702) (#38722)
The Close Index API has been refactored in 6.7.0 and it now performs 
pre-closing sanity checks on shards before an index is closed: the maximum 
sequence number must be equals to the global checkpoint. While this is a 
strong requirement for regular shards, we identified the need to relax this 
check in the case of CCR following shards.

The following shards are not in charge of managing the max sequence 
number or global checkpoint, which are pulled from a leader shard. They 
also fetch and process batches of operations from the leader in an unordered 
way, potentially leaving gaps in the history of ops. If the following shard lags 
a lot it's possible that the global checkpoint and max seq number never get 
in sync, preventing the following shard to be closed and a new PUT Follow 
action to be issued on this shard (which is our recommended way to 
resume/restart a CCR following).

This commit allows each Engine implementation to define the specific 
verification it must perform before closing the index. In order to allow 
following/frozen/closed shards to be closed whatever the max seq number 
or global checkpoint are, the FollowingEngine and ReadOnlyEngine do 
not perform any check before the index is closed.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2019-02-11 17:34:17 +01:00
Luca Cavanna 6443b46184
Clean up ShardSearchLocalRequest (#38574)
Added a constructor accepting `StreamInput` as argument, which allowed to
make most of the instance members final as well as remove the default
constructor.
Removed a test only constructor in favour of invoking the existing
constructor that takes a `SearchRequest` as first argument.
Also removed profile members and related methods as they were all unused.
2019-02-11 15:55:46 +01:00
Alexander Reelsen 884b5063a4 Create ISO8601 joda compatible java time formatter (#38434)
The existing formatter being used was not on par with the joda formatter
as it was missing the ability to parse a comma as a separator between
seconds and milliseconds.

While a real iso8601 would be much more complex, this might be
sufficient for some more use-cases.

The ingest date formatter now also uses the iso8601 formatter by
default.

Closes #38345
2019-02-11 15:11:26 +01:00
Alexander Reelsen e7868e92bd
Restore date aggregation performance in UTC case (#38221) (#38700)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826
2019-02-11 16:30:48 +03:00
Luca Cavanna fe8bd757b2
Look up connection using the right cluster alias when releasing contexts (#38570)
Whenever phase failure is raised in AbstractSearchAsyncAction, we go and
release search contexts of shards that successfully returned their
results, prior to notifying the listener of the failure. In case we are
executing a CCS request, it's important to look-up the connection to
send the release context request to.

This commit makes sure that the lookup takes the cluster alias into
account. We used to use `null` at all times instead which is not correct
and was not caught as any exception is caught without re-throwing it.
2019-02-11 13:40:42 +01:00
Przemyslaw Gomulka ba9a4d13e1
mute Failing tests related to logging and joda-java migration backport(#38704)(#38710)
the tests awaits fix from #38693 and #38705 and #38581
2019-02-11 13:15:12 +01:00
Przemyslaw Gomulka ab9e2f2e69
Move testToUtc test to DateFormattersTests #38698 Backport #38610
The test was relying on toString in ZonedDateTime which is different to
what is formatted by strict_date_time when milliseconds are 0
The method is just delegating to dateFormatter, so that scenario should
be covered there.

closes #38359
Backport #38610
2019-02-11 11:34:25 +01:00
Like b8be6cb5c7
Reject index.optimize_auto_generated_id setting (#28895)
This commit rejects the index.optmize_auto_generated_id setting for
indices created on or after 7.0.0. This setting was deprecated in 6.7.0.
2019-02-10 13:46:09 -05:00
Tim Brooks 023e3c207a
Concurrent file chunk fetching for CCR restore (#38656)
Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR.

The implementation uses the parallel file writer functionality that is also used by peer recoveries.
2019-02-09 21:19:57 -07:00
Christoph Büscher e3c7b93917 Mute failure in InternalEngineTests (#38622) 2019-02-08 16:29:54 +01:00
Dimitris Athanasiou fe8182ece2
Mute RetentionLeastIT.testRetentionLeasesSyncOnRecovery on 7x (#38597) 2019-02-08 11:32:28 +02:00
Jason Tedor fdf6b3f23f
Add 7.1 version constant to 7.x branch (#38513)
This commit adds the 7.1 version constant to the 7.x branch.

Co-authored-by: Andy Bristol <andy.bristol@elastic.co>
Co-authored-by: Tim Brooks <tim@uncontended.net>
Co-authored-by: Christoph Büscher <cbuescher@posteo.de>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: markharwood <markharwood@gmail.com>
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Alpar Torok <torokalpar@gmail.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>
2019-02-07 16:32:27 -05:00
Jason Tedor f8ed6c15c4
Enable BWC after backport recovering leases (#38485)
This commit enables the BWC tests after backporting recovery of retention
leases during peer recovery.
2019-02-06 08:03:19 -05:00
Jason Tedor 4b42281a4e
Collapse retention lease integration tests (#38483)
This commit collapses the retention lease integration tests into a
single suite.
2019-02-06 07:55:41 -05:00
Tanguy Leroux 510829f9f7
TransportVerifyShardBeforeCloseAction should force a flush (#38401)
This commit changes the `TransportVerifyShardBeforeCloseAction` so that it 
always forces the flush of the shard. It seems that #37961 is not sufficient to 
ensure that the translog and the Lucene commit share the exact same max 
seq no and global checkpoint information in case of one or more noop 
operations have been made.

The `BulkWithUpdatesIT.testThatMissingIndexDoesNotAbortFullBulkRequest` 
and `FrozenIndexTests.testFreezeEmptyIndexWithTranslogOps` test this trivial 
situation and they both fail 1 on 10 executions.

Relates to #33888
2019-02-06 13:22:54 +01:00
David Turner 5a3c452480
Align docs etc with new discovery setting names (#38492)
In #38333 and #38350 we moved away from the `discovery.zen` settings namespace
since these settings have an effect even though Zen Discovery itself is being
phased out. This change aligns the documentation and the names of related
classes and methods with the newly-introduced naming conventions.
2019-02-06 11:34:38 +00:00
Ioannis Kakavas e1d464b22c
Mute testRetentionLeasesSyncOnRecovery (#38488)
Relates: #38487
2019-02-06 08:52:54 +02:00
Armin Braun 34f2cc78f6
Fix Master Failover and DataNode Leave Blocking Snapshot (#38460)
* Closes #38447
2019-02-05 23:56:59 +01:00
Jason Tedor 79a45b47da
Recover retention leases during peer recovery (#38435)
This commit integrates retention leases with recovery. With this change,
we copy the current retention leases on primary to the replica during
phase two of recovery. At this point, the replica will have been added
to the replication group and so is already receiving retention lease
sync requests from the primary. This means that if any retention lease
syncs are triggered on the primary after we sample the retention leases
here during phase two, that sync request will also arrive on the replica
ensuring that the replica is from this point on up to date with the
retention leases on the primary. We have to copy these during phase two
since we will be applying indexing operations, potentially triggering
merges, and therefore must ensure the correct retention leases are in
place beforehand.
2019-02-05 17:43:41 -05:00
Henning Andersen 20c66c5a05
Bubble-up exceptions from scheduler (#38317)
Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.

This is a continuation of #38014

Fixed NPE that caused CCR tests (IndexFollowingIT and likely others)
to fail.

schedule could bubble rejected exception to uncaught exception
handler when not using SAME executor if thread pool is terminated.
Now ignore rejected exception silently if executor is shutdown.
2019-02-05 21:48:24 +01:00
Boaz Leskes 033ba725af
Remove support for internal versioning for concurrency control (#38254)
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page:

> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document 

We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. 

This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.

Relates to #1078
2019-02-05 20:53:35 +01:00
Jason Tedor b03d138122
Lift retention lease expiration to index shard (#38380)
This commit lifts the control of when retention leases are expired to
index shard. In this case, we move expiration to an explicit action
rather than a side-effect of calling
ReplicationTracker#getRetentionLeases. This explicit action is invoked
on a timer. If any retention leases expire, then we hard sync the
retention leases to the replicas. Otherwise, we proceed with a
background sync.
2019-02-05 14:42:17 -05:00
Tim Brooks c2a8fe1f91
Prevent CCR recovery from missing documents (#38237)
Currently the snapshot/restore process manually sets the global
checkpoint to the max sequence number from the restored segements. This
does not work for Ccr as this will lead to documents that would be
recovered in the normal followering operation from being recovered.

This commit fixes this issue by setting the initial global checkpoint to
the existing local checkpoint.
2019-02-05 13:32:41 -06:00
Tal Levy aef5775561
re-enables awaitsfixed datemath tests (#38376)
Previously, date formats of `YYYY.MM.dd` would hit an issue
where the year would jump towards the end of the calendar year.
This was an issue that had since been resolved in tests by using
`yyyy` to be the more accurate representation of the year.

Closes #37037.
2019-02-05 11:20:40 -08:00
Julie Tibshirani 3ce7d2c9b6
Make sure to reject mappings with type _doc when include_type_name is false. (#38270)
`CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when
deserializing index creation requests, accidentally accepts mappings that are
nested twice under the type key (as described in the bug report #38266).

This in turn causes us to be too lenient in parsing typeless mappings. In
particular, we accept the following index creation request, even though it
should not contain the type key `_doc`:

```
PUT index?include_type_name=false
{
  "mappings": {
    "_doc": {
      "properties": { ... }
    }
  }
}
```

There is a similar issue for both 'put templates' and 'put mappings' requests
as well.

This PR makes the minimal changes to detect and reject these typed mappings in
requests. It does not address #38266 generally, or attempt a larger refactor
around types in these server-side requests, as I think this should be done at a
later time.
2019-02-05 10:52:32 -08:00
David Turner f2dd5dd6eb
Remove DiscoveryPlugin#getDiscoveryTypes (#38414)
With this change we no longer support pluggable discovery implementations. No
known implementations of `DiscoveryPlugin` actually override this method, so in
practice this should have no effect on the wider world. However, we were using
this rather extensively in tests to provide the `test-zen` discovery type. We
no longer need a separate discovery type for tests as we no longer need to
customise its behaviour.

Relates #38410
2019-02-05 17:42:24 +00:00
David Turner b7ab521eb1
Throw AssertionError when no master (#38432)
Today we throw a fatal `RuntimeException` if an exception occurs in
`getMasterName()`, and this includes the case where there is currently no
master. However, sometimes we call this method inside an `assertBusy()` in
order to allow for a cluster that is in the process of stabilising and electing
a master. The trouble is that `assertBusy()` only retries on an
`AssertionError` and not on a general `RuntimeException`, so the lack of a
master is immediately fatal.

This commit fixes the issue by asserting there is a master, triggering a retry
if there is not.

Fixes #38331
2019-02-05 17:11:20 +00:00
Armin Braun 2f6afd290e
Fix Concurrent Snapshot Ending And Stabilize Snapshot Finalization (#38368)
* The problem in #38226 is that in some corner cases multiple calls to `endSnapshot` were made concurrently, leading to non-deterministic behavior (`beginSnapshot` was triggering a repository finalization while one that was triggered by a `deleteSnapshot` was already in progress)
   * Fixed by:
      * Making all `endSnapshot` calls originate from the cluster state being in a "completed" state (apart from on short-circuit on initializing an empty snapshot). This forced putting the failure string into `SnapshotsInProgress.Entry`.
      * Adding deduplication logic to `endSnapshot`
* Also:
  * Streamlined the init behavior to work the same way (keep state on the `SnapshotsService` to decide which snapshot entries are stale)
* closes #38226
2019-02-05 16:44:18 +01:00
Lee Hinman d862453d68
Support unknown fields in ingest pipeline map configuration (#38352)
We already support unknown objects in the list of pipelines, this changes the
`PipelineConfiguration` to support fields other than just `id` and `config`.

Relates to #36938
2019-02-05 07:52:17 -07:00
David Turner 3b2a0d7959
Rename no-master-block setting (#38350)
Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any
value set for the old setting is now ignored.
2019-02-05 08:47:56 +00:00
David Turner 2d114a02ff
Rename static Zen1 settings (#38333)
Renames the following settings to remove the mention of `zen` in their names:

- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
2019-02-05 08:46:52 +00:00
Yogesh Gaikwad fe36861ada
Add support for API keys to access Elasticsearch (#38291)
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to 
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.

This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.

The API keys:-
- by default do not have an expiration but expiration can be
  configured where the API keys need to be expired after a
  certain amount of time.
- when generated will keep authentication information of the user that
   generated them.
- can be defined with a role describing the privileges for accessing
   Elasticsearch and will be limited by the role of the user that
   generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
  before being deleted. The expired API keys remover task handles this.

Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`

The API keys can be used to access Elasticsearch using `Authorization`
header, where the auth scheme is `ApiKey` and the credentials, is the 
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```

Closes #34383
2019-02-05 14:21:57 +11:00
Christoph Büscher d255303584
Add typless client side GetIndexRequest calls and response class (#37778)
The HLRC client currently uses `org.elasticsearch.action.admin.indices.get.GetIndexRequest`
and `org.elasticsearch.action.admin.indices.get.GetIndexResponse` in its get index calls. Both request and
response are designed for the typed APIs, including some return types e.g. for `getMappings()` which in
the maps it returns still use a level including the type name.
In order to change this without breaking existing users of the HLRC API, this PR introduces two new request
and response objects in the `org.elasticsearch.client.indices` client package. These are used by the
IndicesClient#get and IndicesClient#exists calls now by default and support the type-less API. The old request
and response objects are still kept for use in similarly named, but deprecated methods.

The newly introduced client side classes are simplified versions of the server side request/response classes since
they don't need to support wire serialization, and only the response needs fromXContent parsing (but no
xContent-serialization, since this is the responsibility of the server-side class).
Also changing the return type of `GetIndexResponse#getMapping` to
`Map<String, MappingMetaData> getMappings()`, while it previously was returning another map
keyed by the type-name. Similar getters return simple Maps instead of the ImmutableOpenMaps that the 
server side response objects return.
2019-02-05 03:41:05 +01:00
Gordon Brown 292e0f6fb7
Deprecate `_type` in simulate pipeline requests (#37949)
As mapping types are being removed throughout Elasticsearch, the use of
`_type` in pipeline simulation requests is deprecated. Additionally, the
default `_type` used if one is not supplied has been changed to `_doc` for
consistency with the rest of Elasticsearch.
2019-02-04 16:11:44 -07:00
Christoph Büscher 0ced775389
Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica (#38357) 2019-02-04 22:30:34 +01:00
Mayya Sharipova 641704464d
Deprecate types in rollover index API (#38039)
Relates to #35190
2019-02-04 16:07:45 -05:00
Zachary Tong ab1150378b
Add Composite to AggregationBuilders (#38207) 2019-02-04 13:47:04 -05:00
David Turner 2c1eab2b8a
Clarify slow cluster-state log messages (#38302)
The message `... took [31s] above the warn threshold of 30s` suggests
incorrectly that the task took 61 seconds. This commit adds the clarifying
words `which is`.
2019-02-04 17:44:00 +00:00
Andrey Ershov 7bc8bc9605
ensureGreen (#38324) 2019-02-04 16:36:04 +01:00
Jason Tedor 625d37a26a
Introduce retention lease background sync (#38262)
This commit introduces a background sync for retention leases. The idea
here is that we do a heavyweight sync when adding a new retention lease,
and then periodically we want to background sync any retention lease
renewals to the replicas. As long as the background sync interval is
significantly lower than the extended lifetime of a retention lease, it
is okay if from time to time a replica misses a sync (it will still have
an older version of the lease that is retaining more data as we assume
that renewals do not decrease the retaining sequence number). There are
two follow-ups that will come after this commit. The first is to address
the fact that we have not adapted the should periodically flush logic to
possibly flush the retention leases. We want to do something like flush
if we have not flushed in the last five minutes and there are renewed
retention leases since the last time that we flushed. An additional
follow-up will remove the syncing of retention leases when a retention
lease expires. Today this sync could be invoked in the background by a
merge operation. Rather, we will move the syncing of retention lease
expiration to be done under the background sync. The background sync
will use the heavyweight sync (write action) if a lease has expired, and
will use the lightweight background sync (replication action) otherwise.
2019-02-04 10:35:29 -05:00
Christoph Büscher 5ee7232379
Mute SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes (#38334) 2019-02-04 16:10:06 +01:00
Christoph Büscher 715e581378
Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38330) 2019-02-04 15:46:19 +01:00
Boaz Leskes e49b593c81
Move TokenService to seqno powered cas (#38311)
Relates #37872 
Relates #10708
2019-02-04 15:25:41 +01:00
Yannick Welsch ece8c659c5
Decrease leader and follower check timeout (#38298)
Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still
being a very long time for a node to be completely unresponsive.
2019-02-04 15:11:12 +01:00
Przemyslaw Gomulka 9b64558efb
Migrating from joda to java.time. Watcher plugin (#35809)
part of the migrating joda time work. Migrating watcher plugin to use JDK's java-time

refers #27330
2019-02-04 15:08:31 +01:00
Alexander Reelsen 87f3579125
Add nanosecond field mapper (#37755)
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1970 to 2262.

Note that aggregations will still be in milliseconds.
However docvalue fields will have full nanosecond resolution

Relates #27330
2019-02-04 11:31:16 +01:00