Commit Graph

2818 Commits

Author SHA1 Message Date
David Turner 08ecdfe20e Short-circuit rebalancing when disabled (#40966)
Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled,
but we still execute `balanceByWeights()` and perform some rather expensive
calculations before discovering that we cannot rebalance any shards. In a large
cluster this can make cluster state updates occur rather slowly. With this
change we check earlier whether rebalancing is globally disabled and, if so,
avoid the rebalancing process entirely.

Relates #40942 which was reverted because of egregiously faulty tests.
2019-04-09 07:59:52 +01:00
Nhat Nguyen 69421612e5 Mute testRecoverMissingAnalyzer
Tracked at #40867
2019-04-08 22:47:20 -04:00
Nhat Nguyen 713e5c987b Adjust init map size of user data of index commit (#40965)
The number of user data attributes of an index commit has increased 
from 6 to 8, but we forgot to adjust. This change increases the initial 
size of that map to avoid resizing.
2019-04-08 22:47:20 -04:00
Christoph Büscher 335955b874 Some internal refactorings in AnalysisRegistry (#40609)
Reducing some methods scope and marking them as static where possible. Removing
"alias" support from AnalysisRegistry#produceAnalyze and changing that method to
return a NamedAnalyzer instead of having a side effect on the analyzer map passed in.
Also, CustomAnalyzerProvider doesn't seem to need the `environment` field.
2019-04-08 20:48:34 +02:00
David Turner 8eef92fafd Revert "Short-circuit rebalancing when disabled (#40942)"
This reverts commit f78e6ef73b.
2019-04-08 15:58:56 +01:00
David Turner f78e6ef73b Short-circuit rebalancing when disabled (#40942)
Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled,
but we still execute `balanceByWeights()` and perform some rather expensive
calculations before discovering that we cannot rebalance any shards. In a large
cluster this can make cluster state updates occur rather slowly. With this
change we check earlier whether rebalancing is globally disabled and, if so,
avoid the rebalancing process entirely.
2019-04-08 14:57:29 +01:00
Jim Ferenczi bc0fe7d64d Handle min_doc_freq in phrase suggester (#40840)
The phrase suggesters have an option to remove terms that have
a frequency lower than a provided min_doc_freq. However this value is
overwritten by the frequency of the original term in the popular mode.
This change ensures that we keep the maximum value between the provided
min_doc_value and the original term frequency as a threshold to select
candidates.

Fixes #16764
2019-04-08 12:23:54 +02:00
Jason Tedor 4163e59768
Mute failing IndexShard local history test
This test fails reliably with, so this commit mutes that test until a
fix is available.
2019-04-07 10:17:46 -04:00
Jason Tedor 6900399144
Be lenient when parsing build flavor and type on the wire (#40734)
Today we are strict when parsing build flavor and types off the
wire. This means that if a later version introduces a new build flavor
or type, an older version would not be able to parse what that new
version is sending. For a practical example of this, we recently added
the build type "docker", and this means that in a rolling upgrade
scenario older nodes would not be able to understand the build type that
the newer node is sending. This breaks clusters and is bad. We do not
normally think of adding a new enumeration value as being a
serialization breaking change, it is just not a lesson that we have
learned before. We should be lenient here though, so that we can add
future changes without running the risk of breaking ourselves
horribly. It is either that, or we have super-strict testing
infrastructure here yet still I fear the possibility of mistakes. This
commit changes the parsing of build flavor and build type so that we are
still strict at startup, yet we are lenient with values coming across
the wire. This will help avoid us breaking rolling upgrades, or clients
that are on an older version.
2019-04-06 17:24:16 -04:00
Jason Tedor e44e84ab42
Suppress lease background sync failures if stopping (#40902)
If the transport service is stopped, likely because we are shutting
down, and a retention lease background sync fires the logs will display
a warn message and stacktrace. Yet, this situaton is harmless and can
happen as a normal course of business when shutting down. This commit
suppresses the log messages in this case.
2019-04-06 10:18:52 -04:00
David Turner 2ff19bc1b7
Use Writeable for TransportReplAction derivatives (#40905)
Relates #34389, backport of #40894.
2019-04-05 19:10:10 +01:00
Colin Goodheart-Smithe 4452e8e10f Mutes GatewayIndexStateIT.testRecoverBrokenIndexMetadata 2019-04-05 10:53:52 -04:00
David Turner 922a70ce32 Remove unused import
Relates #40863
2019-04-05 09:21:34 +01:00
David Turner d8956d2601 Remove test-only customisation from TransReplAct (#40863)
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see #40706).

This commit removes these customisation points and injects the test-only
behaviour using other techniques.
2019-04-05 08:54:41 +01:00
Martijn van Groningen 809a5f13a4
Make -try xlint warning disabled by default. (#40833)
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.

This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).

Relates to #40366
2019-04-05 08:02:26 +02:00
Nhat Nguyen 5a2eb07c0e Primary replica resync should not send ops without seqno (#40433)
Primary-replica resync in a mixed-cluster between 6.x and 5.6 can send
operations without sequence number to a replica which already processed
operations with sequence number. This leads to the failure of that
replica for we trip the sequence number assertion when writing resync
operations without sequence number to translog.
2019-04-04 21:54:31 -04:00
Colin Goodheart-Smithe 402f312c5e
Adds version 6.7.2 2019-04-04 16:35:39 +01:00
Nhat Nguyen 2756a3936b Reject illegal flush parameters (#40213)
This change rejects an illegal combination of flush parameters where
force is true, but wait_if_ongoing is false. This combination is trappy
and should be forbidden.

Closes #36342
2019-04-04 09:02:31 -04:00
Nhat Nguyen c4960ad736 Ensure flush happen before closing an index (#40184)
If there's an ongoing flush triggered by the translog flush threshold,
we may fail to execute a flush because waitIfOngoing is false by
default.

Relates to #36342
2019-04-04 09:02:31 -04:00
Nhat Nguyen e716b9ceee Ensure no scheduled refresh in testPendingRefreshWithIntervalChange
If a refresh, which is scheduled by the setting change, executes after
the index-2 operation and win the refresh race (i.e., maybeRefresh) with
the scheduledRefresh that we are going to check, then the latter will
return false.

Closes #39565
Relates #39462

PR #40387
2019-04-04 09:02:31 -04:00
Adrien Grand 670e76669c
Fix alias resolution runtime complexity. (#40263) (#40788)
A user reported that the same query that takes ~900ms when querying an index
pattern only takes ~50ms when only querying indices that have matches. The
query is a date range query and we confirmed that the `can_match` phase works
as expected. I was able to reproduce this issue locally with a single node: with
900 1-shard indices, a query to an index pattern that matches all indices runs
in ~90ms while a query to the only index that has matches runs in 0-1ms.

This ended up not being related to the `can_match` phase but to the cost of
resolving aliases when querying an index pattern that matches lots of indices.
In that case, we first resolve the index pattern to a list of concrete indices
and then for each concrete index, we check whether it was matched through an
alias, meaning we might have to apply alias filters. Unfortunately this second
per-index operation runs in linear time with the number of matched concrete
indices, which means that alias resolution runs in O(num_indices^2) overall.
So queries get exponentially slower as an index pattern matches more indices.

I reorganized alias resolution into a one-step operation that runs in linear
time with the number of matches indices, and then a per-index operation that
runs in linear time with the number of aliases of this index. This makes alias
resolution run is O(num_indices * num_aliases_per_index) overall instead. When
testing the scenario described above, the `took` went down from ~90ms to ~10ms.
It is still more than the 0-1ms latency that one gets when only querying the
single index that has data, but still much better than what we had before.

Closes #40248
2019-04-04 11:40:42 +02:00
Adrien Grand f5f5c3e429
Add unit test for MetaDataMappingService with typeless put mapping. (#40578) (#40720)
This is currently only tested via REST tests.

Closes #37450
2019-04-04 10:07:55 +02:00
Ryan Ernst a28d5f35d9 Fix geo points missing test (#40704)
This commit initializes the geo points for the missing doc values test.

fixes #40684
2019-04-03 16:48:09 -07:00
Mayya Sharipova a94e9500ac Correct bug in ScriptDocValues (#40488)
If a field `field_name` was missing in a document,
doc['field_name'].get(0) incorrectly retrieved
a value of the previously accessed document.
This happened because `get(int index)` function
was just accessing `values[index]` without
checking the number of values - `count`.

This PR fixes this.
2019-04-03 16:47:59 -07:00
Yannick Welsch 6ae7d593ea Avoid background sync on relocated primary (#40800)
There were some test failures caused by the background retention lease sync running on a relocated
primary. This commit fixes the situation that triggered the assertion and reactivates the failing test.

Closes #40731
2019-04-03 20:28:48 +02:00
Christoph Büscher 89389197b3 Help Eclipse infering lambda parameter types (#40747)
The Eclipse compiler (4.10, Photon) cannot build this test because it cannot
correctly infer the type arguments of the functions. Explicitely adding them
helps in this case.
2019-04-03 17:51:22 +02:00
Christoph Büscher 09ba3ec677 Small refactorings to analysis components (#40745)
This change adds the following internal refactorings:

* wraps input analyzers into an unmodifiable map in IndexAnalyzers ctor
* removes duplicated indexSetting in IndexAnalyzers
* removes references to IndexAnalyzers from DocumentMapperParser and TypeParser.ParserContext.
  It can always be retrieve it from MapperService directly in those cases
2019-04-03 14:22:16 +02:00
David Turner 1d2bc85586 Inline TransportReplAction#registerRequestHandlers (#40762)
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.

Relates #40706
2019-04-03 12:12:26 +01:00
Jason Tedor df65e46d10
Deprecate versions of Java prior to Java 11 (#40756)
This commit deprecates versions of Java prior to Java 11. This commit
will cause a warning to be printed to standard error when any command
line tool is invoked, or when Elasticsearch is started. Additionally, we
log a deprecation message when Elasticsearch is started.
2019-04-03 06:39:40 -04:00
David Turner e64524c46f Remove some abstractions from `TransportReplicationAction` (#40706)
`TransportReplicationAction` is a rather complex beast, and some of its
concrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.

This commit is a step towards separating the `TransportReplicationAction` into
these two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.
2019-04-03 09:08:29 +01:00
Simon Willnauer dd624c31b0 Don't mark shard as refreshPending on stats fetching (#40458)
Completion and DocStats are pulled from internal readers
instead of external since #33835 and #33847 which doesn't require
us to refresh after a stats call since refreshes will happen internally
anyhow and that will cause updated stats on ongoing indexing.
2019-04-02 16:15:30 +02:00
David Turner 6f00952abd Use TAR instead of DOCKER build type before 6.7.0 (#40723)
In 6.7.0 (#39378) we added a build type of DOCKER for the docker images, but
unfortunately earlier versions do not understand this and will reject any
transport messages that mention this build type.

This commit fixes this by reporting TAR instead of DOCKER when talking to older
nodes.

Relates (but does not fix) #40511
Relates #39378
2019-04-02 13:17:50 +01:00
Alexander Reelsen c644fbfc6e Allow single digit milliseconds in strict date parsing (#40676)
In order to remain compatible with the existing joda based
implementation the parsing of milliseconds should support parsing single
digits instead of relying on three, even with strict formats.

This adds a few tests to duel against the existing joda based
implementation in order to ensure the parsing behaviour is the same.

Closes #40403
2019-04-02 10:27:50 +02:00
Andrey Ershov 287e334ef3 Do not perform cleanup if Manifest write fails with dirty exception (#40519)
Currently, if Manifest write is unsuccessful (i.e. WriteStateException
is thrown) we perform cleanup of newly created metadata files.
However, this is wrong.
Consider the following sequence (caught by CI here
https://github.com/elastic/elasticsearch/issues/39077):

- cluster global data is written **successful**
- the associated manifest write **fails** (during the fsync, ie files
have been written)
- deleting (revert) the manifest files, **fails**, metadata is
therefore persisted
- deleting (revert) the cluster global data is **successful**

In this case, when trying to load metadata (after node restart
because of dirty WriteStateException),  the following exception will
happen
```
java.io.IOException: failed to find global metadata [generation: 0]
```
because the manifest file is referencing missing global metadata file.

This commit checks if thrown WriteStateException is dirty and if its
we don't perform any cleanup, because new Manifest file might be
created, but its deletion has failed.
In the future, we might add more fine-grained check - perform the
clean up if WriteStateException is dirty, but Manifest deletion is
successful.

Closes https://github.com/elastic/elasticsearch/issues/39077

(cherry picked from commit 1fac56916bb3c4f3333c639e59188dbe743e385b)
2019-04-01 12:52:32 +03:00
Jim Ferenczi 7cc79123df Fix merging of text field mapper (#40627)
On mapping updates the `text` field mapper does not update
the field types for the underlying prefix and phrase fields.
In practice this shouldn't be considered as a bug but we have
an assert in the code that check that field types in the mapper service
are identical to the ones present in field mappers.
2019-04-01 08:41:42 +02:00
Jason Tedor cebe509460
Fix bug in detecting use of bundled JDK on macOS
This commit fixes a bug in detecting the use of the bundled JDK on
macOS. This bug arose because the path of Java home is different on
macOS.
2019-03-31 19:43:17 -04:00
Henning Andersen 92d07e9377 Geo Point parse error fix (#40447)
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to #17617
2019-03-29 17:39:12 +01:00
Luca Cavanna a0b02ce6ef Move top-level pipeline aggs out of QuerySearchResult (#40319)
As part of #40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. Such change is
made in a backwards compatible manner.
2019-03-29 17:01:14 +01:00
Oghenovo Usiwoma 444b4c4136 Improve error message for absence of indices (#39789)
"no indices exist" has been added to the error message for absence of indices
2019-03-29 17:01:14 +01:00
Luca Cavanna 48b0deef4f Remove throws IOException from PipelineAggregationBuilder#create (#40222)
IOException are never thrown in any of the existing pipeline aggregation
builders. Removing the throws IOException from the create method allows
to remove it also from a couple of other methods which ends up simplifying
 AggregationPhase (one less catch).
2019-03-29 17:01:14 +01:00
Jason Tedor 585f38787c
Add usage indicators for the bundled JDK (#40616)
This commit adds indications whether or not a distribution is from the
bundled JDK, and whether or not we are using the bundled JDK.
2019-03-29 08:25:32 -04:00
Martijn van Groningen be31800154
Update ingest jdocs that a null return value will drop the current document. (#40359) 2019-03-29 09:46:54 +01:00
Jason Tedor 7255562afd
Add start and stop time to cat recovery API (#40378)
The cat recovery API is incredibly useful. Yet it is missing the start
and stop time as an option from the output. This commit adds these as
options to the cat recovery API. We elect to make these not visible by
default to avoid breaking the output that users might rely on.
2019-03-28 16:23:37 -04:00
Mayya Sharipova 24755209b4 Add randomScore function in script_score query (#40186)
To make script_score query to have the same features
as function_score query, we need to add randomScore
function.

This function produces different
random scores on different index shards.
It is also able to produce random scores
based on the internal Lucene Document Ids.
2019-03-28 13:23:47 -04:00
David Turner 1a3916a8de Optimise rejection of out-of-range `long` values (#40325)
Today if you try and insert a very large number like `1e9999999` into a long
field we first construct this number as a `BigDecimal`, convert this to a
`BigInteger` and then reject it because it is out of range. Unfortunately
making such a large `BigInteger` is rather expensive.

We can avoid this expense by performing a (weaker) range check on the
`BigDecimal` representation of incoming `long`s too.

Relates #26137
Closes #40323
2019-03-28 12:27:34 +00:00
David Turner 073b13f5b0 Add docs for cluster.remote.*.proxy setting (#40281)
In #33062 we introduced the `cluster.remote.*.proxy` setting for proxied
connections to remote clusters, but left it deliberately undocumented since it
needed followup work so that it could work with SNI. However, since #32517 is
now closed we can add this documentation and remove the comment about its lack
of documentation.
2019-03-28 12:11:24 +00:00
jimczi 8775e37d03 Fix SearchResponseMerger#testMergeSearchHits
This commit fixes an edge case in tests where search hits are empty
after the merge but some shards returned hits. This can happen if
the total number of merged hits is less than the provided `from`.

Closes #40553
2019-03-28 09:57:21 +01:00
Adrien Grand 65a35c985c
Remove type from VersionConflictEngineException. (#37490) (#40514)
It initially mentioned the type in the exception because the type used to be
required to uniquely identify a document. This is not necessary anymore given
that indices have at most one type.
2019-03-28 09:32:09 +01:00
Adrien Grand 2326a3dccb
Remove String interning from `o.e.index.Index`. (#40350) (#40517)
`Index` interns its name and uuid. My guess is that the main goal is to avoid
having duplicate strings in the representation of the cluster state. However
I doubt it helps much given that we have many other objects in the cluster state
that we don't try to reuse, and interning has some cost. When looking into
 #40263 my profiler pointed to string interning because of the `Index` object
that is created in `QueryShardContext` as one of the bottlenecks of the
`can_match` phase.
2019-03-28 09:31:42 +01:00
Andy Bristol 23395a9b9f
search as you type fieldmapper (#35600)
Adds the search_as_you_type field type that acts like a text field optimized
for as-you-type search completion. It creates a couple subfields that analyze
the indexed terms as shingles, against which full terms are queried, and a
prefix subfield that analyze terms as the largest shingle size used and
edge-ngrams, against which partial terms are queried

Adds a match_bool_prefix query type that creates a boolean clause of a term
query for each term except the last, for which a boolean clause with a prefix
query is created.

The match_bool_prefix query is the recommended way of querying a search as you
type field, which will boil down to term queries for each shingle of the input
text on the appropriate shingle field, and the final (possibly partial) term
as a term query on the prefix field. This field type also supports phrase and
phrase prefix queries however
2019-03-27 13:29:13 -07:00