Commit Graph

28276 Commits

Author SHA1 Message Date
Boaz Leskes 7488877d1a Validate a joining node's version with version of existing cluster nodes (#25808)
When a node tries to join a cluster, it goes through a validation step to make sure the node is compatible with the cluster. Currently we validation that the node can read the cluster state and that it is compatible with the indexes of the cluster. This PR adds validation that the joining node's version is compatible with the versions of existing nodes. Concretely we check that:

1) The node's min compatible version is higher or equal to any node in the cluster (this prevents a too-new node from joining)
2) The node's version is higher or equal to the min compat version of all cluster nodes (this prevents a too old join where, for example, the master is on 5.6, there's another 6.0 node in the cluster and a 5.4 node tries to join).
3) The node's major version is at least as higher as the lowest node in the cluster. This is important as we use the minimum version in the cluster to stop executing bwc code for operations that require multiple nodes. If the nodes are already operating in "new cluster mode", we should prevent nodes from the previous major to join (even if they are wire level compatible). This does mean that if you have a very unlucky partition during the upgrade which partitions all old nodes which are also a minority / data nodes only, the may not be able to re-join the cluster. We feel this edge case risk is well worth the simplification it brings to BWC layers only going one way. This restriction only holds if the cluster state has been recovered (i.e., the cluster has properly formed).

 Also, the node join validation can now selectively fail specific nodes (previously the entire batch was failed). This is an important preparation for a follow up PR where we plan to have a rejected joining node die with dignity.
2017-07-20 20:11:29 +02:00
Boaz Leskes de6ad7a704 awaitFix testCorruptTranslogTruncationOfReplica
see https://github.com/elastic/elasticsearch/issues/25817
2017-07-20 20:04:42 +02:00
Clinton Gormley febb4bf7bc Update removal_of_types.asciidoc
Fixed `include_in_type` -> `include_type_name`
2017-07-20 19:18:51 +02:00
Jack Conradson 9f7463e796 remove lang url parameter from stored script requests (#25779)
Also has updates to ScriptMetaData for allowing the old namespace format to be loaded all the way back through 5.0; however, it will throw an exception if two scripts share the same id but different languages.
2017-07-20 08:51:08 -07:00
Jason Tedor 137ab70d58 Fix elasticsearch-keystore handling of path.conf
This commit fixes the elasticsearch-keystore script handling of
path.conf; the problem here is that the script is setting a system
property that is completely unobserved. Instead, we use the path.conf
command line flag.

Relates #25811
2017-07-20 23:01:57 +09:00
Jason Tedor 9d8f11dc27 Remove legacy checks for config file settings
This commit removes legacy checks for unsupported an environment
variable and unsupported system properties. This environment variable
and these system properties have not been supported since 1.x so it is
safe to stop checking for the existence of these settings.

Relates #25809
2017-07-20 22:42:39 +09:00
Simon Willnauer 5e629cfba0 Ensure query resources are fetched asynchronously during rewrite (#25791)
The `QueryRewriteContext` used to provide a client object that can
be used to fetch geo-shapes, terms or documents for percolation. Unfortunately
all client calls used to be blocking calls which can have significant impact on the
rewrite phase since it occupies an entire search thread until the resource is
received. In the case that the index the resource is fetched from isn't on the local
node this can have significant impact on query throughput.

Note: this doesn't fix MLT since it fetches stuff in doQuery which is a different beast. Yet, it is a huge step in the right direction
2017-07-20 15:37:50 +02:00
Jay Modi 3e4bc027eb RestClient uses system properties and system default SSLContext (#25757)
This commit calls the `useSystemProperties` method on the HttpAsyncClientBuilder so that the jvm
system properties are used. The primary reason for doing this is to ensure the builder uses the
system default SSLContext rather than the default instance created by the http client library.

Closes #23231
2017-07-20 07:36:56 -06:00
Jason Tedor 3042b5dc7d Stop exporting HOSTNAME from scripts
Today we explicitly export the HOSTNAME variable from scripts. This is
probably a relic from the days when the scripts were not run on bash but
instead assume a POSIX-compliant shell only where HOSTNAME is not
guaranteed to exist. Yet, bash guarantees that HOSTNAME is set so we do
not need to set it in scripts. This commit removes this legacy.

Relates #25807
2017-07-20 22:27:47 +09:00
Jason Tedor 67a4288c9a Remove support for ES_INCLUDE
Today we enable users to customize the environment through the use of
ES_INCLUDE. This made sense for legacy reasons when we did not have
nicities like jvm.options (so dumped JVM options in the default include
script) and somewhat duplicates some of the functionality that we will
need from a dedicated environment script. This commit removes support
for ES_INCLUDE as a first step towards a dedicated include script.

Relates #25804
2017-07-20 15:41:59 +09:00
Jason Tedor 9aa42a438b Unzip quietly while provisioning virtual machines
When provisioning the virtual machines used for packaging, we download
the Gradle zip archive and unzip. This unzip is noisy produing a lot of
unnecessary output. This commit silences this output.

Relates #25803
2017-07-20 12:45:56 +09:00
Boaz Leskes 9989ac69a4 Revert "Validate a joining node's version with version of existing cluster nodes (#25770)"
This reverts commit 1e1f8e6376.
2017-07-19 17:34:53 +02:00
Simon Willnauer 4d78935df7 Introduce a new Rewriteable interface to streamline rewriting (#25788)
Today we have duplicated code that is quite complicated to iterate
over rewriteable (`QueryBuilders` mainly) This change introduces a
`Rewriteable` interface that allow to share code to do the rewriting as
well as encapsulation and composition of queries.
2017-07-19 15:06:49 +02:00
Adrien Grand d607c3be92 Fix list of unconverted snippets. 2017-07-19 14:57:55 +02:00
Adrien Grand 7a0eeb3978 Fix compilation. 2017-07-19 14:46:30 +02:00
Adrien Grand 55ad318541 Reduce the overhead of timeouts and low-level search cancellation. (#25776)
Setting a timeout or enforcing low-level search cancellation used to make us
wrap the collector and check either the current time or whether the search
task was cancelled for every collected document. This can be significant
overhead on cheap queries that match many documents.

This commit changes the approach to wrap the bulk scorer rather than the
collector and exponentially increase the interval between two consecutive
checks in order to reduce the overhead of those checks.
2017-07-19 14:15:53 +02:00
Adrien Grand 94a98daa37 Fix parsing of ip range queries. (#25768)
Closes #25636
2017-07-19 14:12:54 +02:00
Adrien Grand 01f083ca83 Reduce profiling overhead. (#25772)
Calling `System.nanoTime()` for each method call may have a significant
performance impact.

Closes #24799
2017-07-19 14:12:14 +02:00
Adrien Grand f1ff7f2454 Require a field when a `seed` is provided to the `random_score` function. (#25594)
We currently use fielddata on the `_id` field which is trappy, especially as we
do it implicitly. This changes the `random_score` function to use doc ids when
no seed is provided and to suggest a field when a seed is provided.

For now the change only emits a deprecation warning when no field is supplied
but this should be replaced by a strict check on 7.0.

Closes #25240
2017-07-19 14:11:15 +02:00
Clinton Gormley f69decf509 NOCONSOLE -> NOTCONSOLE in removal-of-types 2017-07-19 14:06:04 +02:00
Boaz Leskes 1e1f8e6376 Validate a joining node's version with version of existing cluster nodes (#25770)
When a node tries to join a cluster, it goes through a validation step to make sure the node is compatible with the cluster. Currently we validation that the node can read the cluster state and that it is compatible with the indexes of the cluster. This PR adds validation that the joining node's version is compatible with the versions of existing nodes. Concretely we check that:

1) The node's min compatible version is higher or equal to any node in the cluster (this prevents a too-new node from joining)
2) The node's version is higher or equal to the min compat version of all cluster nodes (this prevents a too old join where, for example, the master is on 5.6, there's another 6.0 node in the cluster and a 5.4 node tries to join).
3) The node's major version is at least as higher as the lowest node in the cluster. This is important as we use the minimum version in the cluster to stop executing bwc code for operations that require multiple nodes. If the nodes are already operating in "new cluster mode", we should prevent nodes from the previous major to join (even if they are wire level compatible). This does mean that if you have a very unlucky partition during the upgrade which partitions all old nodes which are also a minority / data nodes only, the may not be able to re-join the cluster. We feel this edge case risk is well worth the simplification it brings to BWC layers only going one way.

 Also, the node join validation can now selectively fail specific nodes (previously the entire batch was failed). This is an important preparation for a follow up PR where we plan to have a rejected joining node die with dignity.
2017-07-19 12:57:29 +02:00
Simon Willnauer 9882d2b9d3 Reduce the scope of `QueryRewriteContext` (#25787)
Today we provide a lot of functionality on the `QueryRewriteContext` that
we potentially don't have ie. if we rewrite on a coordinating node or when
we percolating. This change moves most of the unnecessary shard level or
index level services and dependencies to `QueryShardContext` instead.
2017-07-19 12:30:38 +02:00
Jason Tedor 4b18800df9 Fix handling of invalid error trace parameter
If a request contains an invalid error trace parameter, we send a error
on the channel. This should immediately abort any additional processing
of the request but instead we march on, dispatch the request and
subsequently send another message on the channel. The problem here is
this means two writes on the channel which leads to the request being
released twice ultimately raising in illegal reference count
exception. This commit addresses this by performing an early return in
the case that the request contained an invalid error trace parameter.

Relates #25785
2017-07-19 18:07:11 +09:00
Jason Tedor 82f52b17e1 Remove timed latch await in listeners test
This commit removes a timed latch await in a transport client listeners
test. The problem with a timed wait here is that on an overloaded
machine, the test can fail because the waiting thread was not unlatched
quickly enough. This makes the test unnecessarily flaky. Instead, we
should wait indefinitely and simply let the test fail by the test
timeout if the latch is not counted down for some reason.

Closes #25760
2017-07-19 16:51:27 +09:00
Jason Tedor 3d3d99557d Expand migration note regarding default paths
This commit expands on the migration note regarding the removal of
default.path.data and default.path.logs to include a note that users
that were relying on the defaults (the common case for path.logs), and
they carry over their previous elasticsearch.yml configruation file,
then they must add explicit values for path.data and path.logs.
2017-07-19 13:40:42 +09:00
Deb Adair 23c810b334 [DOCS] Changes xrefs to cross doc links to enable building GS "mini-docs" 2017-07-18 13:52:38 -07:00
Deb Adair d9e55179f1 [DOCS] Adding index file for GS "mini book". 2017-07-18 13:44:08 -07:00
Jim Ferenczi 4cd9728f55 [Test] Make sure that QueryPhaseTests#testIndexSortScrollOptimization creates segments that can be early terminated 2017-07-18 19:30:15 +02:00
Christoph Büscher e24af64de2 Add strict parsing of aggregation ranges (#25769)
Currently we ignore unknown field names when parsing RangeAggregator.Range and
GeoDistanceAggregationBuilder.Range from `range`, `date_range` or `geo_distance`
aggregations. This can hide subtle errors in the query. This change makes parsing `ranges`
stricter.
2017-07-18 18:31:04 +02:00
Boaz Leskes c0e6dafcab CombinedDeletionPolicy can't assert it has no commits when creating an index
This is an appealing assertion, but there scenarios where it can happen under normal operations. For example, when an index is created it may run into an exception when the lucene files have already been created. The master will try to assign the shard to another node (it's empty, so no need to look for data) but if there is no other node, it will reassign it to the same node. At that point the deletion will get a list of existing commits (which it will typically delete).
2017-07-18 17:23:54 +02:00
Christoph Büscher 43bfe06759 [Docs] Add sorting and source filtering section to client docs (#25767) 2017-07-18 16:58:46 +02:00
Luca Cavanna 5c5d723b86 Improve error message when aliases are not supported (#25728)
With #23997 and #25268 we have changed put alias, delete alias, update aliases and delete index to not accept aliases. Instead concrete indices should be provided as their index parameter.

This commit improves the error message in case aliases are provided, from an IndexNotFoundException (404 status code) with "no such index" message, to an IllegalArgumentException (400 status code) with "The provided expression [alias] matches an alias, specify the corresponding concrete indices instead." message.

Note that there is no specific error message for the case where wildcard expressions match one or more aliases. In fact, aliases are simply ignored when expanding wildcards for such APIs. An error is thrown only when the expression ends up matching no indices at all, and allow_no_indices is set to false. In that case the error is still the generic "404 - no such index".
2017-07-18 15:40:17 +02:00
Clinton Gormley ff4a2519f2 Update experimental labels in the docs (#25727)
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00
Luca Cavanna 0d8b753325 IndexClosedException to return 400 rather than 403 (#25752)
403 can be confused with security. If an API doesn't support working against closed indices and closed indices are referred to in a request, that is a bad request, hence 400 is more appropriate.
2017-07-18 10:26:32 +02:00
Boaz Leskes 194f267110 TruncateTranslogIT.testCorruptTranslogTruncation should wait for replica to allocate
The test checks if a file based or ops based recovery happened, but if the replica shard never finished recovering expectations are not met.

Fixes #25761
2017-07-18 10:17:39 +02:00
Christoph Büscher a6e3d356ed Change parsing of numeric `to` and `from` parameters in `date_range` aggregation (#25376)
Currently the `to` and `from` parameter in the `date_range` aggregation is not
parsed with the correct date field format from the mappings or the aggregation
if the argument is numeric, but always treated as a long value specifying
`epoch_millis`. This leads to problems e.g. when the format is `epoch_second`,
but the `to` and `from` are currently treated as millis.

With this change, we interpret these parameters according to the `format` of the target field.
If the `format` in the mappings is not compatible with numeric input values,
a compatible `format` (e.g. `epoch_millis`, `epoch_second`) must be specified in
the `date_range` aggregation itself, otherwise an error is thrown.

#Closes #17920
2017-07-18 09:45:28 +02:00
Boaz Leskes f347bd4a4e await fix testCorruptTranslogTruncation 2017-07-18 09:15:17 +02:00
Jason Tedor c63b7f8b0b Stop disabling explicit GC
The problem here is simple: when using direct buffers as in NIO, the JDK
relies on explict GC invocataions to trigger cleaning up direct buffers;
if such GCs do not occur and the direct buffer limit is reached, the JVM
will throw an out of memory exception. With explicit GCs disabled, the
JVM is neutered from explicitly cleaning up direct buffers in the act of
reserving a new direct buffer and instead relies on a GC occurring for
another reason. If such a GC never occurs, the JVM will OOM. This commit
removes disabling of explicit GCs. Note that these explicit GCs only
occur as a last ditch effort before going OOM when the JVM is trying to
reserve more direct memory. This is a known issue, see for example:
JDK-8142537.

Relates #25759
2017-07-18 15:16:52 +09:00
Jim Ferenczi c6d9456693 #25747: Fix check of termVector with and without offsets 2017-07-17 19:46:42 +02:00
Christoph Büscher 56b1250a34 [Docs] Adding highlighting section to high level client docs (#25751)
Adding a section about how to use highlighting in the SearchSourceBuilder and
how to retrieve highlighted fragments from the SearchResponse.
2017-07-17 19:30:58 +02:00
Jim Ferenczi 41ea8fdcec Picks offset source for the unified highlighter directly from the es mapping (#25747)
This commit changes how the offset source is picked for each field using the es mapping rather than the underlying Lucene field infos.
It's mandatory for large mappings where field infos retrieval can be costly (the global field infos is merged for each highlighted field in every hit by the Lucene impl).

Fixes #25699
2017-07-17 19:10:46 +02:00
Lee Hinman 610ba7e427 Register data node stats from info carried back in search responses (#25430)
* Register data node stats from info carried back in search responses

This is part of #24915, where we now calculate the EWMA of service time for
tasks in the search threadpool, and send that as well as the current queue size
back to the coordinating node. The coordinating node now tracks this information
for each node in the cluster.

This information will be used in the future the determining the best replica a
search request should be routed to. This change has no user-visible difference.

* Move response time timing into ResponseListenerWrapper

* Move ResponseListenerWrapper to ActionListener instead of SearchActionListener

Also removes the logger

* Move `requestIndex` back to private

* De-guice-ify ResponseCollectorService \o/

* Undo all changes to SearchQueryThenFetchAsyncAction

* Remove unneeded response collector from TransportSearchAction

* Undo all changes to SearchDfsQueryThenFetchAsyncAction

* Completely rewrite the inside of ResponseCollectorService's record keeping

* Documentation and cleanups for ResponseCollectorService

* Add unit test for collection of queue size and service time

* Fix Guice construction error

* Add basic unit tests for ResponseCollectorService

* Fix version constant for the master merge

* Fix test compilation after master merge

* Add a test for node removal on cluster changed event

* Remove integration test as there are now unit tests

* Rename ResponseListenerWrapper -> SearchExecutionStatsCollector

* Fix line-length

* Make classes private and final where appropriate

* Pass nodeId into SearchExecutionStatsCollector and use only ActionListener

* Get nodeId from connection so searchShardTarget can be private

* Remove threadpool from SearchContext, get it from IndexShard instead

* Add missing import

* Use BiFunction for responseWrapper rather than passing in collector service
2017-07-17 11:04:51 -06:00
Simon Willnauer cb4eebcd6a Make `index` in TermsLookup mandatory (#25753)
This change removes the leniency of having a `null` index to fetch
terms from in 6.0 onwards. This feature will be deprecated in the 5.x series
and 6.0 nodes will require the index to be set.

Closes #25750
2017-07-17 18:50:30 +02:00
Simon Willnauer 9ff259c260 Use concrete version for BWC checks in SearchTransportService (#25748)
We used to compare agaisnt the min compatible version which is misleading since
it might move over time and since we backported the `can_match` API entirely
it's better to compare against a version constant.
2017-07-17 18:49:50 +02:00
Clinton Gormley 25a89e613a Broke recipes into separate pages 2017-07-17 18:21:39 +02:00
Boaz Leskes c0751c8650 deubg logging to TruncateTranslogIT
To see what data paths are used.
2017-07-17 17:18:05 +02:00
Adrien Grand 949db39fad Fix reproducibility of UUIDTests.
Closes #25714
2017-07-17 15:43:28 +02:00
Adrien Grand 78a6c3427b Optimize `terms` queries on `ip` addresses to use a `PointInSetQuery` whenever possible. (#25669)
We can't do it in the general case because of prefix queries, but I believe this
is mostly used in query strings and not in explicit `terms` queries.

Closes #25667
2017-07-17 15:39:01 +02:00
Adrien Grand 264088f1c4 Deprecate the `_default_` mapping. (#25652)
Now that indices cannot have types anymore, this feature does not buy anything
anymore.

Closes #25500
2017-07-17 15:37:59 +02:00
Jason Tedor e9aa60dc9d Skip shrink ignores template mapping in BWC tests
This commit reverts some changes to the shrink API ignore template
mapping REST test in favor of simply skipping the test for BWC
purposes. The complexity here is due to deprecations and lacking the
infrastructure to gracefully handle a situation like this.
2017-07-17 20:32:18 +09:00