Commit Graph

97 Commits

Author SHA1 Message Date
Nhat Nguyen 3a1dad1050 Mute testFollowIndexAndCloseNode
Tracked at #33337
2018-09-02 19:17:51 -04:00
Nhat Nguyen c6b011f8ea
TEST: Increase timeout testFollowIndexAndCloseNode (#33333)
This test fails several times due to timeout when asserting the number
of docs on the following and leading indices. This change reduces
the number of docs to index and increases the timeout.
2018-09-02 09:28:47 -04:00
Martijn van Groningen 66b164c2a6
[CCR] Removed custom follow and unfollow api's reponse classes with AcknowledgedResponse (#33260)
These response classes did not add any value and in that case just AcknowledgedResponse should be used.

I also changed the formatting of methods to take one line per parameter in
FollowIndexAction.java and UnfollowIndexAction.java files to make
reviewing diffs in the future easier.
2018-08-31 21:16:06 +07:00
Nhat Nguyen d3f32273eb Merge branch 'master' into ccr 2018-08-30 23:22:58 -04:00
Martijn van Groningen 41c7fc8d37
[CCR] Introduce leader index name & last fetch time stats to stats api response (#33155) 2018-08-29 10:54:58 +07:00
Nhat Nguyen e2b931e80b
Use Lucene history in primary-replica resync (#33178)
This commit makes primary-replica resyncer use Lucene as the source of
history operation instead of translog if soft-deletes is enabled. With
this change, we no longer expose translog snapshot directly in IndexShard.

Relates #29530
2018-08-28 10:44:15 -04:00
Jason Tedor 5954354e62
Fix ShardFollowNodeTask.Status equals and hash code (#33189)
These were broken when fetch exceptions were introduced to the status
object but equals and hash code were not updated then. This commit
addresses that.
2018-08-28 08:53:45 -04:00
Jason Tedor cd91992c89
Only fetch mapping updates when necessary (#33182)
Today we fetch the mapping from the leader and apply it as a mapping
update whenever the index metadata version on the leader changes. Yet,
the index metadata can change for many reasons other than a mapping
update (e.g., settings updates, adding an alias, or a replica being
promoted to a primary among many other reasons). This commit builds on
the addition of a mapping version to the index metadata to only fetch
mapping updates when the mapping version increases. This reduces the
number of these fetches and application of mappings on the follower to
the bare minimum.
2018-08-28 06:06:22 -04:00
Jason Tedor 0e5d42ca38
Merge branch 'master' into ccr
* master:
  Adjust BWC version on mapping version
  Token API supports the client_credentials grant (#33106)
  Build: forked compiler max memory matches jvmArgs (#33138)
  Introduce mapping version to index metadata (#33147)
  SQL: Enable aggregations to create a separate bucket for missing values (#32832)
  Fix grammar in contributing docs
  SECURITY: Fix Compile Error in ReservedRealmTests (#33166)
  APM server monitoring (#32515)
  Support only string `format` in date, root object & date range (#28117)
  [Rollup] Move toBuilders() methods out of rollup config objects (#32585)
  Fix forbiddenapis on java 11  (#33116)
  Apply publishing to genreate pom (#33094)
  Have circuit breaker succeed on unknown mem usage
  Do not lose default mapper on metadata updates (#33153)
  Fix a mappings update test (#33146)
  Reload Secure Settings REST specs & docs (#32990)
  Refactor CachingUsernamePassword realm (#32646)
2018-08-27 13:49:59 -04:00
Martijn van Groningen 47e9e72df2
reduce maximum number of writes to speed up test 2018-08-27 12:14:46 +07:00
Jason Tedor ef9607ea0c
Track fetch exceptions for shard follow tasks (#33047)
This commit adds tracking and reporting for fetch exceptions. We track
fetch exceptions per fetch, keeping track of up to the maximum number of
concurrent fetches. With each failing fetch, we associate the from
sequence number with the exception that caused the fetch. We report
these in the CCR stats endpoint, and add some testing for this tracking.
2018-08-24 14:21:23 -04:00
Jason Tedor 7fa8a728c4
Make CCR QA tests build again (#33113)
Welp, I broke this. I merged a change to auto-discover the CCR QA tests
by making :x-pack:plugin:ccr:check auto-discover the check tasks in the
qa sub-project. Yet, the check tasks for these sub-projects did not
depend on the necessary test tasks (as we were previously doing this
directly from the ccr build file. This commit fixes this!
2018-08-24 09:48:54 -04:00
Martijn van Groningen b0f22d67c4
fixed not returning response instance 2018-08-24 16:56:29 +07:00
Martijn van Groningen 575f33941c
Required changes after merging in master branch. 2018-08-24 12:51:26 +07:00
Jason Tedor 9623cf6cde
Find CCR QA sub-projects automatically (#33027)
Today we are by-hand maintaining a list of CCR QA sub-projects that the
check task depends on. This commit simplifies this by finding these
sub-projects automatically and adding their check task as dependencies
of the CCR check task.
2018-08-21 12:51:55 -04:00
Jason Tedor b08d02e3b7
Implement CCR licensing (#33002)
This commit implements licensing for CCR. CCR will require a platinum
license, and administrative endpoints will be disabled when a license is
non-compliant.
2018-08-20 23:33:18 -04:00
Nhat Nguyen 919888eba7 TEST: Enable debug log testValidateFollowingIndexSettings 2018-08-06 14:55:56 -04:00
Nhat Nguyen c394eb9ae9 CCR: Expose the operation primary term
Relates #32442
2018-08-06 10:55:37 -04:00
Jason Tedor 3b739b9fd5
Avoid NPE on shard changes action (#32630)
If a leader index is deleted while there is an active follower, the
follower will send shard changes requests bound for the leader
index. Today this will result in a null pointer exception because there
will not be an index routing table for the index. A null pointer
exception looks like a bug to a user so this commit addresses this by
throwing an index not found exception instead.
2018-08-06 08:01:47 -04:00
Jason Tedor 32c2759bb9
Remove extra blank line in CcrStatsAction.java
This commit removes an extra blank line that was accidentally committed
to CcrStatsAction.java.
2018-08-03 09:55:04 -04:00
Jason Tedor d640c9ddf9
Introduce CCR stats endpoint (#32350)
This commit introduces the CCR stats endpoint which provides shard-level
stats on the status of CCR follower tasks.
2018-08-03 09:09:45 -04:00
Jason Tedor 2387616c80
Remove _xpack from CCR APIs (#32563)
For a new feature like CCR we will go without this extra layer of
indirection. This commit replaces all /_xpack/ccr/_(\S+) endpoints by
/_ccr/$1 endpoints.
2018-08-02 20:21:43 -04:00
Nhat Nguyen 8cfbb64d6e
ShardFollowNodeTask should fetch operation once (#32455)
Today ShardFollowNodeTask might fetch some operations more than once.
This happens because we ask the leading for up to max_batch_count
operations (instead of the left-over size) for the left-over request.
The leading then can freely respond up to the max_batch_count, and at
the same time, if one of the previous requests completed, we might issue
another read request whose range overlaps with the response of the
left-over request.

Closes #32453
2018-07-30 20:53:09 -04:00
Nhat Nguyen aa3b6e098c
Reject follow request if following setting not enabled on follower (#32448)
Today we do not check if the `following_index` setting of the follower
is enabled or not when processing a follow-request. If that setting is
disabled, the follower will use the default engine, not the following
engine. This change checks and rejects such invalid follow requests.

Relates #30086
2018-07-29 21:57:45 -04:00
Nhat Nguyen 8474f8a01c
Validate source of an index in LuceneChangesSnapshot (#32288)
Today it's possible to encounter an Index operation in Lucene whose
_source is disabled, and _recovery_source was pruned by the MergePolicy.
If it's the case, we create a Translog#Index without source and let the
caller validate it later. However, this approach is challenging for the
caller.

Deletes and No-Ops don't allow invoking "source()" method. The caller
has to make sure to call "source()" only on index operations. The
current implementation in CCR does not follow this and fail to replica
deletes or no-ops. Moreover, it's easier to reason if a Translog#Index
always has the source.
2018-07-27 08:16:52 -04:00
Nhat Nguyen cd8b80da58 Use shadow plugin in ccr/qa 2018-07-25 00:16:33 -04:00
Nhat Nguyen a5d8f0b55a CCR: use shadow plugin
Relates #32240
2018-07-24 22:48:11 -04:00
Nhat Nguyen 88190299df
CCR: Fix incorrect read request completion condition (#32266)
Today we consider a read request is exhausted if from_seqno is equal to
or greater than the max_required_seqno. However, if we stop when
from_seqno equals to the max_required_seqno, we will miss an operation
whose seqno is max_required_seqno because we have not seen that 
operation yet.
2018-07-22 22:14:27 -04:00
Martijn van Groningen b6b596e471
[CCR] Add random shard follow task test (#32188)
Added shard follow task unit tests that tests whether the shard follow task is able to process randomly generated shard changes api responses.
2018-07-21 12:38:05 +02:00
Nhat Nguyen 8e15504443 TEST: Fix range issue in ShardChangesActionTests
We modified the way we calculate to_seqno in #32121 but did not adjust
this test accordingly. If min_seqno equals to max_seqno, the size should be
one instead of zero.

Relates #32121
2018-07-20 17:20:41 -04:00
Nhat Nguyen fe574f89f8 CCR: Translog op on primary should have versionType
Normally translog operations will not be replayed on the primary.
Following engine is an exception where we replay translog on both
primary and replica as a non-primary strategy.  Even though we won't use
the version_type in the following engine, we still need to pass a valid
value for the primary operation in order not to trip assertions in an
engine.

This commit passes version_type EXTERNAL for translog operation if its
origin is primary.

Relates #31945
2018-07-20 08:39:38 -04:00
Martijn van Groningen a6b7497fdc
[CCR] Add more unit tests for shard follow task (#32121)
The added tests are based on specific scenarios as described in the test plan.
Before this change the ShardFollowNodeTaskTests contained more random like tests,
but these have been removed and in a followup pr better random tests will
be added in a new test class as is described in the test plan.
2018-07-20 14:12:05 +02:00
Nhat Nguyen d0f3ed5abd Merge branch 'master' into ccr
* master:
  Painless: Simplify Naming in Lookup Package (#32177)
  Handle missing values in painless (#32207)
  add support for write index resolution when creating/updating documents (#31520)
  ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
  Remove indication of future multi-homing support (#32187)
  Rest test - allow for snapshots to take 0 milliseconds
  Make x-pack-core generate a pom file
  Rest HL client: Add put watch action (#32026)
  Build: Remove pom generation for plugin zip files (#32180)
  Fix comments causing errors with Java 11
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Detect and prevent configuration that triggers a Gradle bug (#31912)
  [test] port linux package packaging tests (#31943)
  Revert "Introduce a Hashing Processor (#31087)" (#32178)
  Remove empty @return from JavaDoc
  Adjust SSLDriver behavior for JDK11 changes (#32145)
  [test] use randomized runner in packaging tests (#32109)
  Add support for field aliases. (#32172)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158)
  Improve docs for search preferences (#32159)
  use before instead of onOrBefore
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  A replica can be promoted and started in one cluster state update (#32042)
  Fix Java 11 javadoc compile problem
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Fix `range` queries on `_type` field for singe type indices (#31756)
  [DOCS] Update TLS on Docker for 6.3 (#32114)
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Remove versionType from translog (#31945)
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Painless: Add PainlessClassBuilder (#32141)
  Build: Make additional test deps of check (#32015)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  Build: Move shadow customizations into common code (#32014)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Remove empty @param from Javadoc
  Re-disable packaging tests on suse boxes
  Docs: Fix missing example script quote (#32010)
  [ML] Wait for aliases in multi-node tests (#32086)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Handle TokenizerFactory  TODOs (#32063)
  Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
  Updates the build to gradle 4.9 (#32087)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
  Check that client methods match API defined in the REST spec (#31825)
  Enable testing in FIPS140 JVM (#31666)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
  [Test] Modify assert statement for ssl handshake (#32072)
2018-07-19 23:03:01 -04:00
Martijn van Groningen d88c76e02b
[CCR] Initial replication group based tests (#32024)
Tests shard follow task in the context of a leader and follower ReplicationGroup,
in order to test how the shard follow logic reacts to certain shard related
failure scenarios.

More tests will need to be added, but this indicates what changes need to be made
to have these tests.

Relates to #30102
2018-07-17 17:39:49 +02:00
Martijn van Groningen 006c79a80d
[CCR] Improve retry mechanism when making remote calls from shard follow task (#31930)
Closes #31816
2018-07-17 10:25:51 +02:00
Martijn van Groningen 815faf34fc
[CCR] Move api parameters from url to request body. (#31949)
Relates to #30102
2018-07-11 10:16:43 +02:00
Martijn van Groningen 8e1ef0cff9
Rewrite shard follow node task logic (#31581)
The current shard follow mechanism is complex and does not give us easy ways the have visibility into the system (e.g. why we are falling behind).
The main reason why it is complex is because the current design is highly asynchronous. Also in the current model it is hard to apply backpressure
other than reducing the concurrent reads from the leader shard.

This PR has the following changes:
* Rewrote the shard follow task to coordinate the shard follow mechanism between a leader and follow shard in a single threaded manner.
  This allows for better unit testing and makes it easier to add stats.
* All write operations read from the shard changes api should be added to a buffer instead of directly sending it to the bulk shard operations api.
  This allows to apply backpressure. In this PR there is a limit that controls how many write ops are allowed in the buffer after which no new reads
  will be performed until the number of ops is below that limit.
* The shard changes api includes the current global checkpoint on the leader shard copy. This allows reading to be a more self sufficient process;
  instead of relying on a background thread to fetch the leader shard's global checkpoint.
* Reading write operations from the leader shard (via shard changes api) is a separate step then writing the write operations (via bulk shards operations api).
  Whereas before a read would immediately result into a write.
* The bulk shard operations api returns the local checkpoint on the follow primary shard, to keep the shard follow task up to date with what has been written.
* Moved the shard follow logic that was previously in ShardFollowTasksExecutor to ShardFollowNodeTask.
* Moved over the changes from #31242 to make shard follow mechanism resilient from node and shard failures.

Relates to #30086
2018-07-10 16:00:55 +02:00
Martijn van Groningen ac654cbc10
Follow engine should not fill gaps upon promotion and recovery (#31751)
Closes #31318
2018-07-03 13:15:06 +02:00
Martijn van Groningen 8ecfcc3b80
muted tests that will be replaced by the shard follow task refactoring:
https://github.com/elastic/elasticsearch/pull/31581
2018-06-29 11:47:46 +02:00
Nhat Nguyen 1185ddbcc6 Replaces testClassesDir with testClassesDirs in ccr build
Relates #30389
2018-06-28 11:24:41 -04:00
Nhat Nguyen 2c56df631d Adjusts transport actions in CCR
This commit adjusts the ccr’s actions accordingly to the recent changes
in the upstream.
2018-06-23 18:10:15 -04:00
Nhat Nguyen 34f127be3c CCR: Remove index name resolver from CCR actions
Relates #31002
2018-06-20 13:20:24 -04:00
Nhat Nguyen c74cd30ac6 Remove request type parameter from CCR actions
Relates #31405
2018-06-19 10:49:05 -04:00
Martijn van Groningen 50ce990305
added missing serialization tests 2018-06-19 10:22:58 +02:00
Martijn van Groningen 73c9dd976b
Remove action request builders. 2018-06-15 12:32:08 +02:00
Tanguy Leroux 18938aab39 Adapt ShardFollowTasksExecutor after #31031 2018-06-15 11:46:08 +02:00
Martijn van Groningen cc824ebb5e
[CCR] Added more validation to follow index api. (#31068) 2018-06-15 07:39:53 +02:00
Nhat Nguyen 1ccb34ac77 Remove unused imports 2018-06-14 11:44:20 -04:00
Jason Tedor 64b4cdeda6
Merge remote-tracking branch 'elastic/master' into ccr
* elastic/master: (53 commits)
  Painless: Restructure/Clean Up of Spec Documentation (#31013)
  Update ignore_unmapped serialization after backport
  Add back dropped substitution on merge
  high level REST api: cancel task (#30745)
  Enable engine factory to be pluggable (#31183)
  Remove vestiges of animal sniffer (#31178)
  Rename elasticsearch-nio to nio (#31186)
  Rename elasticsearch-core to core (#31185)
  Move cli sub-project out of server to libs (#31184)
  [DOCS] Fixes broken link in auditing settings
  QA: Better seed nodes for rolling restart
  [DOCS] Moves ML content to stack-docs
  [DOCS] Clarifies recommendation for audit index output type (#31146)
  Add nio-transport as option for http smoke tests (#31162)
  QA: Set better node names on rolling restart tests
  Add support for ignore_unmapped to geo sort (#31153)
  Share common parser in some AcknowledgedResponses (#31169)
  Fix random failure on SearchQueryIT#testTermExpansionExceptionOnSpanFailure
  Remove reference to multiple fields with one name (#31127)
  Remove BlobContainer.move() method (#31100)
  ...
2018-06-07 23:33:42 -04:00
Simon Willnauer 5c6711b8a4
Use a `_recovery_source` if source is omitted or modified (#31106)
Today if a user omits the `_source` entirely or modifies the source
on indexing we have no chance to re-create the document after it has
been added. This is an issue for CCR and recovery based on soft deletes
which we are going to make the default. This change adds an additional
recovery source if the source is disabled or modified that is only kept
around until the document leaves the retention policy window.

This change adds a merge policy that efficiently removes this extra source
on merge for all document that are live and not in the retention policy window
anymore.
2018-06-07 07:39:28 +02:00