Commit Graph

9262 Commits

Author SHA1 Message Date
Tim Brooks 38701fb6ee
Create nio-transport plugin for NioTransport ()
This is related to . This commit moves the NioTransport from
:test:framework to a new nio-transport plugin. Additionally, supporting
tcp decoding classes are moved to this plugin. Generic byte reading and
writing contexts are moved to the nio library.

Additionally, this commit adds a basic MockNioTransport to
:test:framework that is a TcpTransport implementation for testing that
is driven by nio.
2018-01-05 09:41:29 -07:00
Scott Somerville 7180e539de Add getWarmer and getTranslog method to NodeIndicesStats ()
When calling the node stats action via the Java API, it was not possible to get
information about warmers and the translog, even though that information is
available through the REST API. This change adds methods so that the two
responses are more consistent.
2018-01-05 14:19:09 +01:00
Jim Ferenczi cb783bcb57
Fix global aggregation that requires breadth first and scores ()
* Fix global aggregation that requires breadth first and scores

This change fixes the deferring collector when it is executed in a global context
with a sub collector that requires access to scores (e.g. the top_hits aggregation).
The deferring collector replays the best buckets for each document and re-executes the original query
if scores are needed. When executed in a global context, the query to replay is a simple match_all
query and not the original query.

Closes 
Closes 
2018-01-05 11:41:36 +01:00
Simon Willnauer b68f7ed8c3
Pass `java.locale.providers=COMPAT` to Java 9 onwards ()
Java 9 added some enhancements to the internationalization support that
impact our date parsing support. To ensure flawless BWC and consistent
behavior going forward, Java 9 runtimes require the system property
`java.locale.providers=COMPAT` to be set.

Closes 
2018-01-04 16:43:51 +01:00
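A minimal sketch, assuming a startup check is wanted that flags Java 9+ runtimes launched without the property (for example via `-Djava.locale.providers=COMPAT` on the command line). `LocaleProviderCheck` and its methods are hypothetical and are not the Elasticsearch bootstrap code; the check is a simple exact match on the property value.

```java
import java.util.Objects;

// Hypothetical helper: detects Java 9+ runtimes that were not started with COMPAT locale data.
final class LocaleProviderCheck {

    // "java.specification.version" is "1.8" on Java 8 and "9", "10", "11", ... afterwards.
    static int majorVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
    }

    // True when the JVM is 9 or newer and java.locale.providers is not exactly "COMPAT".
    static boolean missingCompatProviders(String specVersion, String localeProviders) {
        return majorVersion(specVersion) >= 9 && !Objects.equals("COMPAT", localeProviders);
    }

    // Applies the check to the JVM this code is running in.
    static boolean missingCompatProvidersInThisJvm() {
        return missingCompatProviders(
                System.getProperty("java.specification.version"),
                System.getProperty("java.locale.providers"));
    }
}
```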
Yannick Welsch ca325194d9
Allow shrinking of indices from a previous major ()
Lucene does not allow adding Lucene 6 files to a Lucene 7 index. This commit ensures that we carry over the Lucene version to the newly created Lucene index.

Closes 
2018-01-04 16:23:25 +01:00
David Turner 19c61e894e
Remove deprecated exceptions ()
DeleteFailedEngineException and IndexFailedEngineException were
deprecated in 6.0. This commit removes them entirely in 7.0.
2018-01-04 11:00:29 +00:00
Yannick Welsch 7cdbae2da8
Add Writeable.Reader support to TransportResponseHandler ()
Allows TransportResponse objects not to implement Streamable anymore. As an example, I've adapted the response handler for ShardActiveResponse, allowing the fields in that class to become final.
2018-01-04 10:27:08 +01:00
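A minimal sketch of why a reader function lets response fields become final, using plain `java.io` streams instead of Elasticsearch's own stream classes. A Streamable-style object is created empty and then filled by a `readFrom` call, so its fields must stay mutable; a reader is just a constructor reference. `Reader`, `ActiveResponse` and `RoundTrip` are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical functional interface: builds an object directly from a stream.
@FunctionalInterface
interface Reader<T> {
    T read(DataInput in) throws IOException;
}

// Hypothetical response whose fields can be final because it is constructed from the stream.
final class ActiveResponse {
    private final boolean shardActive;
    private final String nodeId;

    ActiveResponse(boolean shardActive, String nodeId) {
        this.shardActive = shardActive;
        this.nodeId = nodeId;
    }

    // Deserializing constructor, usable as a Reader<ActiveResponse> via ActiveResponse::new.
    ActiveResponse(DataInput in) throws IOException {
        this.shardActive = in.readBoolean();
        this.nodeId = in.readUTF();
    }

    void writeTo(DataOutput out) throws IOException {
        out.writeBoolean(shardActive);
        out.writeUTF(nodeId);
    }

    boolean shardActive() { return shardActive; }

    String nodeId() { return nodeId; }
}

final class RoundTrip {
    // Serializes the response, then reads it back through the supplied reader.
    static ActiveResponse roundTrip(ActiveResponse response, Reader<ActiveResponse> reader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                response.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return reader.read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage: `RoundTrip.roundTrip(new ActiveResponse(true, "node-1"), ActiveResponse::new)` returns an equal copy without ever exposing a mutable, half-initialized instance.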
Ryan Ernst d36ec18029
Plugins: Add plugin extension capabilities ()
This commit adds the infrastructure to plugin building and loading to
allow one plugin to extend another. That is, a "parent" plugin may allow
itself to be extended through Java SPI, and other plugins may then extend
it. When all plugins extending a plugin have finished loading, the
"parent" plugin receives a callback (through the ExtensiblePlugin
interface) allowing it to reload SPI.

This commit also adds an example plugin which uses the as-yet
unimplemented extensibility (adding to the painless whitelist).
2018-01-03 11:12:43 -08:00
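A minimal sketch of the SPI side of this mechanism using the standard `java.util.ServiceLoader`. `WhitelistExtension`, `ParentPlugin` and `reloadSpi` are hypothetical names, not the actual ExtensiblePlugin API; the assumption is that the class loader passed in can see the extending plugins and their `META-INF/services/<interface name>` provider files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical extension point that a "parent" plugin could publish for other plugins.
interface WhitelistExtension {
    List<String> allowedClassNames();
}

// Hypothetical parent plugin that collects contributions from extending plugins.
final class ParentPlugin {
    private final List<String> allowed = new ArrayList<>();

    // Called once all extending plugins have loaded; ServiceLoader discovers every provider
    // registered under META-INF/services that is visible to the given class loader.
    void reloadSpi(ClassLoader loader) {
        for (WhitelistExtension extension : ServiceLoader.load(WhitelistExtension.class, loader)) {
            allowed.addAll(extension.allowedClassNames());
        }
    }

    List<String> allowedClassNames() {
        return Collections.unmodifiableList(allowed);
    }
}
```

With no provider files on the class path the loader finds nothing and the list stays empty; each extending plugin would ship its own provider file naming its implementation class.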
kel bccf030841 Fix cluster.routing.allocation.enable and cluster.routing.rebalance.enable casing ()
Fixes the default value of cluster.routing.allocation.enable and cluster.routing.rebalance.enable to be lower-case.
2018-01-03 18:03:52 +01:00
Jason Tedor a91da9a9af
Only bind loopback addresses when binding to local
* Only bind loopback addresses when binding to local

Today when binding to local (the default) we bind to any address that is
a loopback address, or any address on an interface that declares itself
as a loopback interface. Yet, not all addresses on loopback interfaces
are loopback addresses. This arises on macOS where there is a link-local
address assigned to the loopback interface (fe80::1%lo0) and in Docker
services where virtual IPs of the service are assigned to the loopback
interface (docker/libnetwork#1877). These situations cause problems:
 - because we do not handle the scope ID of a link-local address, we end
   up bound to an address that cannot be reached through its published
   form (since we drop the scope)
 - the virtual IPs in the Docker situation are neither loopback nor
   link-local addresses, so we end up bound to interfaces that cause the
   bootstrap checks to be enforced even though the instance is only
   bound to local

We address this by only binding to actual loopback addresses, skipping
any address on a loopback interface that is not itself a loopback
address. This lets us simplify some code: in the bootstrap checks we
were skipping link-local addresses, and when writing the ports file we
had to skip link-local addresses because, again, their formatting does
not allow another node to connect to them (to be clear, they could be
connected to via the scope-qualified address, but that information is
not written out).

Relates 
2018-01-02 07:04:09 -05:00
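A minimal sketch of the filtering rule described in this commit, written with plain `java.net` APIs. `LocalBindAddresses` is a hypothetical helper, not Elasticsearch's networking code: it gathers the addresses on interfaces that declare themselves loopback and keeps only those for which `InetAddress.isLoopbackAddress()` is true, so a link-local `fe80::1` or a Docker VIP on `lo` is dropped.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating "bind only to actual loopback addresses".
final class LocalBindAddresses {

    // Keeps addresses that are themselves loopback (127.0.0.0/8, ::1) and skips anything
    // that merely sits on a loopback interface, such as fe80::1%lo0 on macOS.
    static List<InetAddress> loopbackOnly(List<InetAddress> candidates) {
        List<InetAddress> result = new ArrayList<>();
        for (InetAddress address : candidates) {
            if (address.isLoopbackAddress()) {
                result.add(address);
            }
        }
        return result;
    }

    // Collects every address on interfaces that are up and declare themselves loopback
    // (the broader, old candidate set), then applies the stricter filter.
    static List<InetAddress> localBindAddresses() throws SocketException {
        List<InetAddress> onLoopbackInterfaces = new ArrayList<>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (nic.isUp() && nic.isLoopback()) {
                onLoopbackInterfaces.addAll(Collections.list(nic.getInetAddresses()));
            }
        }
        return loopbackOnly(onLoopbackInterfaces);
    }

    // Parses an IP literal; getByName performs no DNS lookup for literal addresses.
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(ip, e);
        }
    }
}
```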
Yannick Welsch 2603391c00
Add node id to shard failure message ()
This will help in the allocation explain API to figure out which node a shard was last allocated to before it failed.

Closes 
2017-12-29 17:40:28 +01:00
Mayya Sharipova 100a7b1f01 Introduce limit to the number of terms in Terms Query ()
- Correct a bug in the referenced settings

Closes 
2017-12-29 10:31:18 -05:00
Mayya Sharipova dcde895f49
Introduce limit to the number of terms in Terms Query ()
- Introduce index level settings to control the maximum number of terms
    that can be used in a Terms Query
- Throw an error if a request exceeds this max number

Closes 
2017-12-28 17:36:29 -05:00
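A minimal sketch of the kind of check this commit describes. `TermsQueryLimit` is hypothetical, and `maxTermsCount` stands in for the value of the new index-level setting; the real setting name, default and error message are not reproduced here.

```java
import java.util.List;

// Hypothetical validation: reject a terms query that has more terms than the configured maximum.
final class TermsQueryLimit {

    static void checkTermsCount(List<?> terms, int maxTermsCount) {
        if (terms.size() > maxTermsCount) {
            throw new IllegalArgumentException("terms query has [" + terms.size()
                    + "] terms, more than the configured maximum of [" + maxTermsCount + "]");
        }
    }
}
```

Failing fast with an exception, rather than silently truncating the list, keeps the query's semantics unchanged for requests that stay under the limit.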
Doug Turnbull bd0d10d716 Carry forward weights, etc on rescore rewrite () 2017-12-26 16:36:57 +01:00
Jim Ferenczi 0b2c8c835e
Fix composite aggregation when after term is missing in the shard ()
This change fixes a bug when a keyword term in the `after` key is not present in the shard.
In this case the global ordinals of the document values are compared with the insertion point of the
`after` keyword, and values that are equal to the insertion point should be considered "after" the top value.
2017-12-26 09:58:49 +01:00
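A minimal sketch of the insertion-point rule, assuming a shard's sorted terms are held in an array whose index plays the role of the global ordinal. `AfterKeyComparison` is hypothetical; it relies only on the documented contract of `Arrays.binarySearch`, which returns `-(insertionPoint) - 1` for a missing key.

```java
import java.util.Arrays;

// Hypothetical illustration: sortedTerms[i] is the term with ordinal i.
final class AfterKeyComparison {

    // Returns true if the term with ordinal `ord` sorts strictly after `afterKey`.
    static boolean isAfter(String[] sortedTerms, String afterKey, int ord) {
        int pos = Arrays.binarySearch(sortedTerms, afterKey);
        if (pos >= 0) {
            // afterKey exists in this shard: only strictly larger ordinals come after it.
            return ord > pos;
        }
        // afterKey is missing: the term at the insertion point is the first one greater
        // than afterKey, so an ordinal equal to the insertion point is already "after".
        int insertionPoint = -pos - 1;
        return ord >= insertionPoint;
    }
}
```

For terms `apple, cherry, grape` and an `after` key of `banana`, the insertion point is 1, so `cherry` (ordinal 1) must be treated as after the key; comparing with `>` instead of `>=` would skip it.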
Tanguy Leroux f6b9d3fd8f [Test] Mute testAbortedSnapshotDuringInitDoesNotStart()
The failing test is tracked in 
2017-12-23 19:51:41 +01:00
Nhat Nguyen 436a243e3c Set global checkpoint before open engine from store ()
In PR , we set the global checkpoint from the translog in a store
recovery. However, we set it after the engine is opened. This causes the
global checkpoint assertion in TranslogWriter to be violated if we are
forced to close the engine before we set the global checkpoint. Closing
the engine closes the translog, which in turn reads the current global
checkpoint; however, it is still unassigned and smaller than the initial
global checkpoint from the translog.

Closes 
2017-12-23 10:13:27 +01:00
Nhat Nguyen 6629f4ab0d
Rollback primary before recovering from translog ()
Today we always recover a primary from the last commit point. However,
with a new deletion policy, we keep multiple commit points in the
existing store, so we have a chance to find a good starting commit
point. With a good starting commit point, we may be able to throw away
stale operations. This PR rolls back a primary to a starting commit and
then recovers from the translog.

Relates 
2017-12-22 18:25:36 -05:00
Boaz Leskes adb49efe17
Non-peer recovery should set the global checkpoint ()
Non-peer recoveries should restore the global checkpoint rather than waiting for the activation of the primary. This brings us a step closer to a universe where a recovered shard always has a valid global checkpoint. Concretely:

1) Recovery from store can read the checkpoint from the translog
2) Recovery from local shards and snapshots can set the global checkpoint to the local checkpoint as this is the only copy of the shard.
3) Recovery of an empty shard can set it to `NO_OPS_PERFORMED`

Peer recoveries will follow but require more work and thus will have their own PR.

I also used the moment to clean up `IndexShard`'s API around starting the engine and doing recovery from the translog. The current names are a relic of the past and don't align with the current naming schemes in the engine.
2017-12-22 21:39:12 +01:00
Nhat Nguyen 6435928c4f Remove no existing commits assertion in onInit()
This assertion does not hold if a shard is recovered from an empty store,
fails, and then retries. Moreover, if the openMode is CREATE_INDEX_*, we
pass CREATE mode to the IndexWriterConfig to create a new index and
overwrite the existing one.

Closes 
2017-12-22 15:04:16 -05:00
Ryan Ernst 0375d887f2
Plugins: Add validation to plugin descriptor parsing ()
This commit checks there are no leftover unparsed elements when parsing
a plugin descriptor.
2017-12-22 10:02:11 -08:00
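A minimal sketch of "no leftover unparsed elements" validation over a `java.util.Properties` descriptor. `DescriptorParser` and the set of known keys are hypothetical; the real descriptor parser and its property names are not reproduced here. Each known key is consumed, and anything still remaining afterwards is reported as an error.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical parser: consume the keys we understand, then reject any leftovers.
final class DescriptorParser {

    static Map<String, String> parse(Properties props, Set<String> knownKeys) {
        Map<String, String> remaining = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            remaining.put(key, props.getProperty(key));
        }
        Map<String, String> parsed = new HashMap<>();
        for (String key : knownKeys) {
            String value = remaining.remove(key);
            if (value != null) {
                parsed.put(key, value);
            }
        }
        if (!remaining.isEmpty()) {
            // Sorted so the error message is deterministic.
            throw new IllegalArgumentException(
                    "unknown properties in plugin descriptor: " + new TreeSet<>(remaining.keySet()));
        }
        return parsed;
    }
}
```

A typo such as `nmae` in a descriptor then fails loudly instead of being silently ignored.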
Tanguy Leroux bd9daf422e
Do not start snapshots that are deleted during initialization ()
When a new snapshot is created it is added to the cluster state as a
snapshot-in-progress in INIT state, and the initialization is kicked
off in a new runnable task by SnapshotService.beginSnapshot(). The
initialization writes multiple files before updating the cluster state
to change the snapshot-in-progress to STARTED state. This leaves a
short window in which the snapshot could be deleted (for example, because
the snapshot is stuck in INIT or because it takes too much time to
upload all the initialization files for all snapshotted indices). If
the INIT snapshot is deleted, the snapshot-in-progress becomes ABORTED,
but once the initialization in SnapshotService.beginSnapshot() finishes,
it is changed back to STARTED state again.

This commit prevents an ABORTED snapshot from being started if it has been
deleted during initialization. It also adds a test that would have failed
with the previous behavior, and changes a few method names here and there.
2017-12-22 12:59:36 +01:00
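A minimal, simplified model of the state transition this commit guards against. `SnapshotState` and `SnapshotInProgress` are hypothetical stand-ins, not the cluster-state classes: the key point is that finishing initialization only moves an INIT snapshot to STARTED and leaves one that was deleted (ABORTED) in the meantime untouched.

```java
// Hypothetical, simplified snapshot states.
enum SnapshotState { INIT, STARTED, ABORTED }

final class SnapshotInProgress {
    private SnapshotState state = SnapshotState.INIT;

    synchronized SnapshotState state() {
        return state;
    }

    // Deleting an in-progress snapshot marks it as aborted.
    synchronized void delete() {
        if (state == SnapshotState.INIT || state == SnapshotState.STARTED) {
            state = SnapshotState.ABORTED;
        }
    }

    // Called when initialization finishes. Only an INIT snapshot may move to STARTED;
    // returns false (and changes nothing) if the snapshot was aborted in the meantime.
    synchronized boolean finishInitialization() {
        if (state != SnapshotState.INIT) {
            return false;
        }
        state = SnapshotState.STARTED;
        return true;
    }
}
```

The buggy behavior corresponds to an unconditional `state = STARTED` at the end of initialization, which would overwrite ABORTED.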
Tanguy Leroux 098f82f086
[Test] Do not rely on MockZenPing for Azure tests ()
This commit changes some Azure tests so that they do not rely on
MockZenPing and TestZenDiscovery anymore, but instead use a mocked
AzureComputeService that exposes internal test cluster nodes as if
they were real Azure nodes.

Related to 

Closes , 
2017-12-22 09:58:02 +01:00
Ryan Ernst da703a7383
Tests: Update plugin info unit tests to use expectThrows () 2017-12-21 14:23:40 -08:00
Nhat Nguyen 7e3dc122fd Revert "Mute testRetentionPolicyChangeDuringRecovery"
This test is fixed by https://github.com/elastic/elasticsearch/pull/27947
This reverts commit cba80f3972d76f655a1a048aab1121b56f2b3a56.
2017-12-21 16:58:07 -05:00
Nhat Nguyen c831442352
Persist global checkpoint when finalizing peer recovery ()
Today we don't persist the global checkpoint when finishing a peer
recovery, even though we advance the in-memory value. This commit persists
the global checkpoint in RecoveryTarget#finalizeRecovery.

Closes 
2017-12-21 16:51:30 -05:00
Jim Ferenczi e5f0852d5f
Remove unused search plugin extension ()
Search response listeners should not be exposed in the search plugin.
Support was added but reverted right after (it is not present in any release).
However, SearchPlugin still contains a default definition for search response listeners
due to a broken revert. This change removes this extension point, which is effectively a no-op.
2017-12-21 22:15:55 +01:00
Martijn van Groningen a54798354b
simplify methods 2017-12-21 19:42:32 +01:00
Martijn van Groningen 791c5ddd7e
aggs: Add a method that is invoked before the `getLeafCollector(...)` of children aggregators is invoked.
In the case of the nested aggregator, this allows it to push buffered child docs down to the child aggregators.
Previously this was done as part of `NestedAggregator#getLeafCollector(...)`, but by then the child aggregators
have already moved on to the next segment, which causes incorrect results to be produced.

Closes 
2017-12-21 19:28:28 +01:00
Mayya Sharipova cbd271e497
Limit the analyzed text for highlighting ()
* Limit the analyzed text for highlighting

- Introduce index level settings to control the max number of characters
to be analyzed for highlighting
- Throw an error if analysis is required on a larger text

Closes 
2017-12-21 10:19:58 -05:00
Jim Ferenczi 5ac5fd95ae
Move early termination based on index sort to TopDocs collector ()
Lucene's TopDocs collectors are now able to terminate the collection early
based on the index sort. This change plugs this new functionality directly into the
query phase instead of relying on a dedicated early-terminating collector.
2017-12-21 08:57:06 +01:00
Tim Brooks 06b313025c
Add elasticsearch-nio jar for base nio classes ()
This is related to . This commit adds a jar called
elasticsearch-nio that contains the base nio classes that will be used
for the tcp nio transport and eventually the http nio transport.

The jar does not depend on elasticsearch:core, so all references to core
have been removed.
2017-12-20 16:29:16 -06:00
Maxime Gréau d9fff6d8f2 Add unreleased v6.1.2 version 2017-12-20 19:51:29 +01:00
Nhat Nguyen 54b6885844
Check index under the store metadata lock ()
Today when we get a metadata snapshot directly from a store directory,
we acquire a metadata lock, then acquire an IndexWriter lock. However,
we create a CheckIndex in IndexShard without acquiring the metadata lock
first. This causes recovery to fail because the IndexWriter lock can
still be held by method snapshotStoreMetadata. This commit makes sure to
create a CheckIndex under the metadata lock.

Closes 
Closes 
Relates 
2017-12-20 11:26:06 -05:00
Colin Goodheart-Smithe 4cbbe3ed93
Fixes DocStats to not report index size < -1 ()
Prior to this change, when DocStats were added together (for example when adding the index size of all primary shards for an index), we naively added the `totalSizeInBytes` values together. This worked most of the time but not when the index size on one or more shards was reported as `-1` (no value).

This change improves the logic by considering whether the current value or the value to be added is `-1`:
* If the current and new values are both `-1`, the value remains `-1`
* If the current value is `-1` and the new value is not `-1`, the current value is changed to be equal to the new value
* If the current value is not `-1` and the new value is `-1`, the new value is ignored and the current value is not changed
* If both the current and new values are not `-1`, the current value is changed to be equal to the sum of the current and new values.

The change also re-enables the rollover YAML test that was failing due to this bug.
2017-12-20 14:45:09 +00:00
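A minimal sketch of the four summing rules listed in this commit, with `-1` meaning "size unknown". `SizeStats.addSizes` is a hypothetical helper, not the DocStats code itself.

```java
// Hypothetical helper implementing the DocStats summing rules; -1 means "no value".
final class SizeStats {
    static final long UNKNOWN = -1;

    static long addSizes(long current, long added) {
        if (added == UNKNOWN) {
            // Covers "both are -1" and "only the new value is -1": keep the current value.
            return current;
        }
        if (current == UNKNOWN) {
            // The first known value replaces -1.
            return added;
        }
        // Both values are known: sum them.
        return current + added;
    }
}
```

With the old naive addition, summing a known size of 7 with an unknown `-1` would have produced 6, and summing two unknowns would have produced `-2`, below `-1`.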
Adrien Grand 77711508b0
Upgrade to Lucene 7.2.0. () 2017-12-20 14:17:40 +01:00
Simon Willnauer 5b229c31d6
Use `_refresh` to shrink the version map on inactivity ()
We used to shrink the version map under an external lock. This is
quite ambiguous; instead, we can simply issue an empty refresh to
shrink it.

Closes 
2017-12-20 13:53:41 +01:00
Simon Willnauer c4fae375b0
Make KeyedLock reentrant ()
Today we prevent the same thread from acquiring the same lock more than once.
This restriction is a relic from the early days of this concurrency construct
and can be removed.
2017-12-20 13:53:03 +01:00
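A minimal sketch of a reentrant keyed lock built on `java.util.concurrent.locks.ReentrantLock`. `ReentrantKeyedLock` is hypothetical and is not Elasticsearch's KeyedLock: each key maps to one lock plus a reference count, so the owning thread may acquire the same key again, and the entry is removed once no thread holds or waits for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reentrant keyed lock: one ReentrantLock per key, reference-counted.
final class ReentrantKeyedLock<K> {
    private static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int refCount; // holders plus waiters, guarded by the `locks` monitor
    }

    private final Map<K, Entry> locks = new HashMap<>();

    void acquire(K key) {
        Entry entry;
        synchronized (locks) {
            entry = locks.computeIfAbsent(key, k -> new Entry());
            entry.refCount++;
        }
        // Blocking happens outside the map monitor; the owning thread can re-enter freely.
        entry.lock.lock();
    }

    void release(K key) {
        synchronized (locks) {
            Entry entry = locks.get(key);
            if (entry == null || !entry.lock.isHeldByCurrentThread()) {
                throw new IllegalStateException("lock for [" + key + "] is not held by this thread");
            }
            entry.lock.unlock();
            if (--entry.refCount == 0) {
                locks.remove(key);
            }
        }
    }

    boolean isHeldByCurrentThread(K key) {
        synchronized (locks) {
            Entry entry = locks.get(key);
            return entry != null && entry.lock.isHeldByCurrentThread();
        }
    }
}
```

Each `acquire` must be paired with one `release`; a thread that acquires a key twice keeps it until its second release.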
Simon Willnauer 0779af6dd2
Move uid lock into LiveVersionMap ()
While the LiveVersionMap is an internal class that belongs to the engine, we do
rely on some external locking to enforce the desired semantics. Yet, in tests
we mimic the outer locking, but we don't have any way to enforce or assert
that the lock is actually held. This change moves the KeyedLock inside the
LiveVersionMap, which allows the engine to access it as before but enables
assertions in the LiveVersionMap to ensure the lock for the modifying or
reading key is actually held.
2017-12-20 08:34:58 +01:00
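A minimal sketch of why owning the lock lets the map assert on it. `LockingVersionMap` is hypothetical, not LiveVersionMap: callers obtain the per-uid lock from the map, and every read or write asserts (when the JVM runs with `-ea`) that the current thread holds that uid's lock. For brevity the per-uid locks are never removed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical version map that owns its per-uid locks so it can check callers hold them.
final class LockingVersionMap {
    private final Map<String, ReentrantLock> uidLocks = new ConcurrentHashMap<>();
    private final Map<String, Long> versions = new ConcurrentHashMap<>();

    // Callers lock a uid through the map itself, so the map knows which lock to check.
    ReentrantLock lockFor(String uid) {
        return uidLocks.computeIfAbsent(uid, k -> new ReentrantLock());
    }

    private boolean isHeldByCurrentThread(String uid) {
        ReentrantLock lock = uidLocks.get(uid);
        return lock != null && lock.isHeldByCurrentThread();
    }

    Long getVersion(String uid) {
        assert isHeldByCurrentThread(uid) : "lock for uid [" + uid + "] is not held";
        return versions.get(uid);
    }

    void putVersion(String uid, long version) {
        assert isHeldByCurrentThread(uid) : "lock for uid [" + uid + "] is not held";
        versions.put(uid, version);
    }
}
```

Because the assertions only cost anything when assertions are enabled, this style checks the locking contract in tests without slowing down production code.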
Christoph Büscher fb2fd4e8ee
Fix preserving FiltersAggregationBuilder#keyed field on rewrite ()
Currently FiltersAggregationBuilder#doRewrite creates a new FiltersAggregationBuilder which doesn't correctly copy the original "keyed" field if a non-keyed filter gets rewritten.
This can cause rendering bugs in the output aggregations, like the one reported in .

Closes 
2017-12-19 19:56:12 +01:00
Tim Brooks 41677b0b9e
Default to no http read timeout ()
Elasticsearch offers a number of HTTP requests that can take a while to
execute. In  we introduced an HTTP read timeout that defaulted to
30 seconds. This means that if no reads happened for 30 seconds (even
after a request is received), the connection would be closed due to
timeout.

This commit disables the read timeout by default to allow us to evaluate
the impact of read timeouts and to avoid introducing disruptive
behavior.
2017-12-19 11:44:48 -07:00
Nhat Nguyen 0c1ac2e700 Revert "testCorruptTranslogTruncation: add logging"
We can reduce logging for this test as it's fixed in https://github.com/elastic/elasticsearch/pull/27887
This reverts commit e0e698bc26.
2017-12-19 12:13:23 -05:00
Nhat Nguyen 6b0d90b9d4
TEST: Corrupt some translog files used in recovery ()
Currently, method corruptTranslogFiles corrupts some translog files
whose translog_gen is at least the min_required_translog_gen from the
translog checkpoint. However, this condition is not enough to guarantee
that recoverFromTranslog fails. If we corrupt translog operations only
in translog files whose translog_gen is smaller than the min_translog_gen
of the recovering index commit, recoverFromTranslog will succeed, as we
won't read translog operations from those files.

This commit makes sure that corruptTranslogFiles corrupts some translog
files that will be used in recoverFromTranslog.

Closes 
2017-12-19 12:01:49 -05:00
Nhat Nguyen 25b0a7b20f Revert "Add @AwaitsFix for #27890"
This test was fixed in e9a3932dbc.

This reverts commit 1383cab267.
2017-12-19 11:27:03 -05:00
Nhat Nguyen e9a3932dbc Fix incorrectly assign local checkpoint from max_seqno
Relates 
2017-12-19 11:18:36 -05:00
kel 192b263e31 Make AbstractQueryBuilder.declareStandardFields to be protected () 2017-12-19 16:34:08 +01:00
David Turner b26cc36928 Mute testRetentionPolicyChangeDuringRecovery
Relates .
2017-12-19 12:05:45 +00:00
Albert Zaharovits 01a47baa10
Retain originalIndex info when rewriting FieldCapabilities requests ()
A FieldCapabilities request can cover multiple indices (or aliases pointing to multiple indices).
When rewriting the request for each index, store the original requested indices.
2017-12-19 13:38:41 +02:00
David Turner 1383cab267 Add @AwaitsFix for 2017-12-19 08:41:42 +00:00
Boaz Leskes bea9471b2f
Use port 0 InternalTestCluster nodes ()
We currently have a complicated port assignment scheme to make sure that the nodes spawned by the internal test cluster will be assigned fixed port ranges that will also not collide between clusters. The port ranges need to be fixed in advance so that the nodes will be able to find each other via `UnicastZenPing`.

This approach worked well for the last few years, but we are now at a point where our testing has grown beyond it and we exceed the 5 reusable ranges per JVM. This means that nodes are not always assigned the first 5 ports in their range, which causes cluster formation issues. On top of that, most of the clusters that are spun up don't even rely on `UnicastZenPing` but rather on `MockZenPings`, which use in-memory maps for discovery (with the downside that they are not influenced by network disruption simulations).

This PR changes `InternalTestCluster` to use port 0 as a fixed assignment. This will allow the OS to manage ports and will ensure we don't have collisions. For tests that need to simulate network disruptions (and thus can't use `MockZenPings`), a new `UnicastHostProvider` is introduced that is based on the current state of the test cluster. Since that is only resolved at run time, it is aware of the port assignments of the OS.

Closes 
Closes 
2017-12-19 08:43:03 +01:00
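A minimal illustration with plain `java.net` (not `InternalTestCluster`) of what "port 0" means: binding to port 0 asks the OS for a free ephemeral port, and the actual port is only known after the bind, which is why discovery has to read the assignment back at run time. `EphemeralPort` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: lets the OS choose a free port and reports which one it chose.
final class EphemeralPort {

    static int bindAndReportPort() {
        try (ServerSocket socket = new ServerSocket()) {
            // Port 0 means "any free port"; the OS picks one, so parallel clusters cannot collide.
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // The real port is only known after binding; this is what other nodes must be told.
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real cluster the socket would stay open and its bound port would be published for discovery; closing it here just keeps the example self-contained.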