1379 Commits

Author SHA1 Message Date
Julie Tibshirani
8e8bd56cc7
In MatchQuery, remove a check for fragile search analyzers. (#33927)
As far as I can tell this guard against fragile analyzers is no longer relevant, since
we stopped setting special analyzers on numeric fields (3bf6f4). Instead of removing
the guard completely, I opted to keep a check for untokenized + unnormalized fields
to avoid going through the analysis process unnecessarily.

My motivation for simplifying this check is that I'd like to add support for
`split_queries_on_whitespace` to the new 'queryable object' fields. As it stands, I would
have to add a dedicated instanceof check for the new mapper, which is not optimal.
2018-09-24 08:56:13 -07:00
Tim Brooks
78e483e8d8
Introduce abstract security transport testcase (#33878)
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
2018-09-24 09:44:44 -06:00
Ignacio Vera
df333ca305
TESTS: Make score Float#NaN when there is no max score (#33997)
* TESTS: Make score Float#NaN when there is no max score

Fixes test failure due to maxScore set to Float#MinValue instead
on Float#NaN. In addition the initial value for maxScore is set to
Float#NEGATIVE_INFINITY so it is an illegal value.

Closes #33993
2018-09-24 17:36:48 +02:00
Luca Cavanna
e389d9e296
Clarify RemoteClusterService#groupIndices behaviour (#33899)
When executing a cross-cluster search, we need to search against all local indices (and no remote indices) in case no indices are specified. Also, if only remote indices are specified, no local indices will be queried. We previously added empty local indices whenever they were not present in the map of the grouped indices, then we would act differently later based on the extracted remote indices. Instead, we now add the empty array for local indices only in case we need to search all local indices; the entry for local indices is not added when local indices should not be searched. This way the grouped indices reflect reality and provide a better indication of what indices will be searched.
2018-09-24 11:45:33 +02:00
Christophe Bismuth
47ed6c79ee [TEST] Add validate query tests for empty and malformed queries (#33862)
Relates to #33095
2018-09-24 11:21:47 +02:00
Simon Willnauer
7d703c2f92
Fix AutoQueueAdjustingExecutorBuilder settings validation (#33922)
Settings validation in AutoQueueAdjustingExecutorBuilder always checked against
a default value which means that we never can change a max queue size that is lower
than the default. This change adds tests and fixes this validation.
2018-09-24 07:45:50 +02:00
Nhat Nguyen
432e61c971 Adjust bwc for resync request (#33964)
Relates #33964
2018-09-22 19:29:38 -04:00
Nhat Nguyen
f2f08dd6c5 Adjust bwc for recovery request (#33693)
Relates #33693
2018-09-22 19:28:20 -04:00
Nhat Nguyen
e7ae2f9d36
Propagate auto_id_timestamp in primary-replica resync (#33964)
A follow-up of #33693 to propagate max_seen_auto_id_timestamp in a
primary-replica resync.

Relates #33693
2018-09-22 11:40:10 -04:00
Nhat Nguyen
7944a0cb25
Track max seq_no of updates or deletes on primary (#33842)
This PR is the first step to use seq_no to optimize indexing operations.
The idea is to track the max seq_no of either update or delete ops on a
primary, and transfer this information to replicas, and replicas use it
to optimize indexing plan for index operations (with assigned seq_no).

The max_seq_no_of_updates on primary is initialized once when a primary
finishes its local recovery or peer recovery in relocation or being
promoted. After that, the max_seq_no_of_updates is only advanced internally
inside an engine when processing update or delete operations.

Relates #33656
2018-09-22 08:02:57 -04:00
Vladimir Dolzhenko
9c0316869b
Store: keep IndexFormatTooOldException and IndexFormatTooNewException in corruption marker (#33920)
Closes #33916
2018-09-21 14:00:02 +02:00
Nik Everett
cac93949fe
API: Drop deprecated methods from Retry (#33925)
We deprecated the `Retry.withBackoff` flavors with `Settings` in 6.5
because they were no longer needed. This drops them form 7.0.
2018-09-21 07:55:50 -04:00
Christoph Büscher
b654d986d7
Add OneStatementPerLineCheck to Checkstyle rules (#33682)
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
2018-09-21 11:52:31 +02:00
Nhat Nguyen
5f7f793f43
Propagate max_auto_id_timestamp in peer recovery (#33693)
Today we don't store the auto-generated timestamp of append-only
operations in Lucene; and assign -1 to every index operations
constructed from LuceneChangesSnapshot. This looks innocent but it
generates duplicate documents on a replica if a retry append-only
arrives first via peer-recovery; then an original append-only arrives
via replication. Since the retry append-only (delivered via recovery)
does not have timestamp, the replica will happily optimizes the original
request while it should not.

This change transmits the max auto-generated timestamp from the primary
to replicas before translog phase in peer recovery. This timestamp will
prevent replicas from optimizing append-only requests if retry
counterparts have been processed.

Relates #33656 
Relates #33222
2018-09-20 19:53:30 -04:00
Vladimir Dolzhenko
dbe6405354 mute RemoveCorruptedShardDataCommandTests.testCorruptedIndex 2018-09-20 21:30:40 +02:00
Nhat Nguyen
76a1a863e3
TEST: stop assertSeqNos if shards movement (#33875)
Currently, assertSeqNos assumes that the cluster is stable at the end of
the test (i.e., no more shard movement). However, this assumption does
not always hold. In these cases, we can stop the assertion instead of
failing a test.

Closes #33704
2018-09-20 13:44:26 -04:00
Christoph Büscher
28b1d41007 Fix unused import checktyle issue 2018-09-20 19:42:15 +02:00
Nhat Nguyen
002f763c48
Restore local history from translog on promotion (#33616)
If a shard was serving as a replica when another shard was promoted to
primary, then its Lucene index was reset to the global checkpoint.
However, if the new primary fails before the primary/replica resync
completes and we are now being promoted, we have to restore the reverted
operations by replaying the translog to avoid losing acknowledged writes.

Relates #33473
Relates #32867
2018-09-20 13:21:11 -04:00
Nhat Nguyen
b13a434f59 Remove wrong assert in LocalCheckpointTrackerTests
It's possible for the set "seqNos" to contain only the "unFinishedSeq"
in the testConcurrentReplica test. If this is the case, the call
`randomValueOtherThan` won't make any progress because the predicate
will never be false.

This commit removes this expectation because it's incorrect and it's no
longer needed as we have a dedicated test to verify the contains method.

Relates #33871
2018-09-20 13:12:19 -04:00
Alan Woodward
b33c18d316
Move SoraniNormalizationFilterFactory to the common analysis plugin (#33892)
Follow up to #25715
2018-09-20 17:31:41 +01:00
Yannick Welsch
db327818dd [TEST] Enable DEBUG logging on testCreateShrinkIndexToN 2018-09-20 18:16:20 +02:00
Nik Everett
f963c29876
Logging: Drop Settings from some logger lookups (#33859)
Drops `Settings` from some of the methods to lookup loggers and
deprecates another logger lookup that takes `Settings` because
`Settings` is no longer required to build a logger.
2018-09-20 10:42:48 -04:00
Jake Landis
e37e5dfc04
ingest: support simulate with verbose for pipeline processor (#33839)
* ingest: support simulate with verbose for pipeline processor

This change better supports the use of simulate?verbose with the
pipeline processor. Prior to this change any pipeline processors
executed with simulate?verbose would not show all intermediate 
processors for the inner pipelines.

This changes also moves the PipelineProcess and TrackingResultProcessor
classes to enable instance checks and to avoid overly public classes.
As well this updates the error message for when cycles are detected
in pipelines calling other pipelines.
2018-09-20 08:33:07 -05:00
Simon Willnauer
3522b9084b
Introduce a search_throttled threadpool (#33732)
Today all searches happen on the search threadpool which is the correct
behavior in almost any case. Yet, there are exceptions where for instance
searches searches should be passed through a single-thread
thread-pool to reduce impact on a node. This change adds a index-private setting that allows to mark an index as throttled for searches and forks off all non-stats searcher access to this thread-pool for indices that are marked as `index.search.throttled`
2018-09-20 13:43:11 +02:00
David Turner
c041e94349
Test that transient settings beat persistent ones (#33818)
Transient settings override persistent settings, but in fact all of the tests
that run as part of `:server:test` and `:server:integTest` will pass if the
precedence is changed to be the other way round. This change adds a test that
verifies the precedence is as documented.
2018-09-20 11:17:19 +01:00
Tim Vernum
8d50c10208 Mute ShrinkIndexIT.testCreateShrinkIndexToN on Windows
Relates: #33857
2018-09-20 18:21:15 +10:00
Daniel Mitterdorfer
b1cc58e425
Allow to clear the fielddata cache per field
With this commit we clear the fielddata cache per field as it is
supposed to be. Previously we retrieved the proper field from the cache
but then cleared the entire cache anyway.

Closes #33798
Relates #33807
2018-09-20 08:59:53 +02:00
Tim Vernum
1f1ebb4656 Add additional null check in _cat/shards
The target of the func lambda may be null (e.g. in a mixed cluster
where older nodes lack some of the values)

Relates: #33858 / 331caba
Closes #33877
2018-09-20 06:44:13 +02:00
Nhat Nguyen
05bf9dc2e8
Add contains method to LocalCheckpointTracker (#33871)
This change adds "contains" method to LocalCheckpointTracker.
One of the use cases is to check if a given operation has been processed
in an engine or not by looking up its seq_no in LocalCheckpointTracker.

Relates #33656
2018-09-19 20:29:36 -04:00
Nik Everett
26c4f1fb6c
Core: Default node.name to the hostname (#33677)
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.

Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.

1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.

As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.
2018-09-19 15:21:29 -04:00
Simon Willnauer
a92dda2e7e
Move CompletionStats into the Engine (#33847)
By moving CompletionStats into the engine we can easily cache the stats for
read-only engines if necessary. It also moves the responsibiltiy out of IndexShard
which has quiet some complexity already.

Relates to #33835
2018-09-19 20:35:57 +02:00
Simon Willnauer
0fa5758bc6
Fix potential NPE in _cat/shards/ with partial CommonStats (#33858)
Today if we fetch common stats from a shard we might get a partial response
if the shard is closed while we fetch the stats. This causes hard to track and
reproduce NPEs. This change streamlines null checking to ensure we only render
stats we actually received.
2018-09-19 20:34:54 +02:00
Nik Everett
3ede13a454
Test framework fall cleaning (#33423)
Wraps all lines in our test framework at 140 characters because that is
our standard line length and removes all of the checkstyle suppressions
for the test framework.

Drops most of `ModuleTestCase` because it isn't used and we're moving
away from using guice in the way that it wants to test anyway. Also
switches a few classes that extend it but don't use it to extend
`ESTestCase` instead.
2018-09-19 14:34:02 -04:00
Simon Willnauer
6ec12bef0d Add missing IndexShard#readAllowed()
This was lost in #33835
2018-09-19 17:07:13 +02:00
Alan Woodward
5107949402
Allow TokenFilterFactories to rewrite themselves against their preceding chain (#33702)
We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to 
know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This
special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional
filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they
appear before them in a chain, because they produce multiple tokens at the same position.

This commit adds two methods to the TokenFilterFactory interface.

* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding
  filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and
  `CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym
  list `Analyzer`. By default it returns `true`.

Fixes #33609
2018-09-19 15:52:14 +01:00
Christoph Büscher
546e7361ed
[Tests] Nudge wait time in RemoteClusterServiceTests (#33853)
This test occasionally fails in `testCollectSearchShards` waiting on what seems
to be a search request to a remote cluster for one second. Given that the test
fails here very rarely I suspect maybe one second is very rarely not enough so
we could fix it by increasing the max wait time slightly.

Closes #33852
2018-09-19 15:58:35 +02:00
Simon Willnauer
0c77f45dc6
Move DocsStats into Engine (#33835)
By moving DocStats into the engine we can easily cache the stats for
read-only engines if necessary. It also moves the responsibility out of IndexShard
which has quiet some complexity already.
2018-09-19 11:03:11 +02:00
Vladimir Dolzhenko
a3e8b831ee
add elasticsearch-shard tool (#32281)
Relates #31389
2018-09-19 10:28:22 +02:00
Simon Willnauer
251489d59a
Cut over to unwrap segment reader (#33843)
The fix in #33757 introduces some workaround since FilterCodecReader didn't
support unwrapping. This cuts over to a more elegant fix to access the readers
segment infos.
2018-09-19 10:18:03 +02:00
Jim Ferenczi
61e1df0274
Use the global doc id to generate a random score (#33599)
This commit changes the random_score function to use the global docID of the document
rather than the segment docID to generate random scores. As a result documents that have
the same segment docID within the shard will generate different scores.
2018-09-19 09:28:38 +02:00
Adrien Grand
c4261bab44
Add minimal sanity checks to custom/scripted similarities. (#33564)
Add minimal sanity checks to custom/scripted similarities.

Lucene 8 introduced more constraints on similarities, in particular:
 - scores must not be negative,
 - scores must not decrease when term freq increases,
 - scores must not increase when norm (interpreted as an unsigned long)
   increases.

We can't check every single case, but could at least run some sanity checks.

Relates #33309
2018-09-19 09:19:13 +02:00
Ignacio Vera
7f473b683d
Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. (#33196)
* Profiler: Don’t profile NEXTDOC for ConstantScoreQuery.

A ConstantScore query will return the iterator of its inner query.
However, when profiling, the constant score query is wrapped separately
from its inner query, which distorts the times emitted by the profiler.
Return the iterator directly in such a case.

Closes #23430
2018-09-18 23:32:16 -07:00
Zachary Tong
f4cbbcf98b
Add ES version 6.4.2 (#33831)
Version and properties files
2018-09-18 15:25:20 -04:00
Armin Braun
c6462057a1
MINOR: Remove Some Dead Code in Scripting (#33800)
* The is default check method is not used in ScriptType
* The removed vars on ExpressionSearchScript are unused
2018-09-18 20:43:31 +02:00
Simon Willnauer
9026c3ee92
Ensure realtime _get and _termvectors don't run on the network thread (#33814)
The change in #27500 introduces this regression that causes `_get` and `_term_vector`
actions to run on the network thread if the realtime flag is set.
This fixes the issue by delegating to the super method forking on the corresponding threadpool.
2018-09-18 19:53:42 +02:00
Simon Willnauer
98ccd94962
Factor out a ChannelActionListener (#33819)
We use similar / same concepts in SerachTransportService and HandledTransportAction but both
duplicate the efforts with slightly different implementation details. This streamlines
sending responses / exceptions back to a channel in an ActionListener with appropriate logging.
2018-09-18 19:53:26 +02:00
Jim Ferenczi
241c74efb2
upgrade to a new snapshot of Lucene 8 (7d0a7782fa) (#33812) 2018-09-18 18:16:40 +02:00
David Turner
421f58e172
Remove discovery-file plugin (#33257)
In #33241 we moved the file-based discovery functionality to core
Elasticsearch, but preserved the `discovery-file` plugin, and support for the
existing location of the `unicast_hosts.txt` file, for BWC reasons. This commit
completes the removal of this plugin.
2018-09-18 12:01:16 +01:00
markharwood
2fa09f062e
New plugin - Annotated_text field type (#30364)
New plugin for annotated_text field type.
Largely a copy of `text` field type but adds ability to include markdown-like syntax in the text.
The “AnnotatedText” class parses text+markup and converts into plain text and AnnotationTokens.
The annotation token values are injected unchanged alongside the regular text tokens to provide a
form of additional indexed overlay useful in positional searches and highlighting.
Annotated_text fields do not support fielddata as we want to phase this out.
Also includes a new "annotated" highlighter type that retains annotations and merges in search
hits as additional annotation markup.

Closes #29467
2018-09-18 10:25:27 +01:00
Armin Braun
87cedef3cf
NETWORKING:Def CName in Http Publish Addr to True (#33631)
* Follow up to #32806 setting the setting to true for 7.x
2018-09-18 10:29:02 +02:00