8984 Commits

Author SHA1 Message Date
Simon Willnauer
b294250aba
Remove unused searcher parameter in SearchService#createContext (#27227)
This parameter isn't used anywhere and just adds complexity.
2017-11-02 14:58:34 +01:00
Colin Goodheart-Smithe
c1b8140c83
Upgrade to Lucene 7.1 (#27225) 2017-11-02 13:25:33 +00:00
Simon Willnauer
f928d613ad
Move IndexShard#getWritingBytes() under InternalEngine (#27209)
We do some accounting in IndexShard that is not necessarily correct since
we maintain two different index readers. This change moves the accounting under
the engine which knows what reader we are refreshing.

Relates to #26972
2017-11-02 10:43:17 +01:00
olcbean
b9896465cd Introducing took time for _msearch
This commit adds the took time to the response for _msearch.

Relates #23767
2017-11-01 21:39:04 -04:00
Jason Tedor
59657ad1cb
Lazy initialize checkpoint tracker bit sets
This local checkpoint tracker uses collections of bit sets to track
which sequence numbers are complete, eventually removing these bit sets
when the local checkpoint advances. However, these bit sets were eagerly
allocated so that if a sequence number far ahead of the checkpoint was
marked as completed, all bit sets between the "last" bit set and the bit
set needed to track the marked sequence number were allocated. If this
sequence number was too far ahead, the memory requirements could be
excessive. This commit opts for a different strategy for holding on to
these bit sets and enables them to be lazily allocated.

Relates #27179
2017-11-01 21:26:52 -04:00
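
The lazy allocation described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the Elasticsearch tracker itself: the class name `LazyCheckpointTracker`, the `BITS_PER_SET` size, and the map-based storage are all assumptions made for the example.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual Elasticsearch implementation.
class LazyCheckpointTracker {
    private static final int BITS_PER_SET = 1024; // assumed bit set size

    // Bit sets keyed by (seqNo / BITS_PER_SET); allocated only when first touched.
    private final Map<Long, BitSet> bitSets = new HashMap<>();
    private long checkpoint = -1;

    void markCompleted(long seqNo) {
        if (seqNo <= checkpoint) {
            return; // already covered by the checkpoint
        }
        bitSets.computeIfAbsent(seqNo / BITS_PER_SET, k -> new BitSet(BITS_PER_SET))
               .set((int) (seqNo % BITS_PER_SET));
        // Advance the checkpoint over every contiguous completed sequence number.
        while (isCompleted(checkpoint + 1)) {
            checkpoint++;
            // Drop a bit set once the checkpoint has passed its last sequence number.
            if ((checkpoint + 1) % BITS_PER_SET == 0) {
                bitSets.remove(checkpoint / BITS_PER_SET);
            }
        }
    }

    private boolean isCompleted(long seqNo) {
        BitSet set = bitSets.get(seqNo / BITS_PER_SET);
        return set != null && set.get((int) (seqNo % BITS_PER_SET));
    }

    long getCheckpoint() {
        return checkpoint;
    }

    int allocatedBitSets() {
        return bitSets.size();
    }
}
```

In this sketch, marking sequence number 1,000,000 while the checkpoint is still at 0 allocates one extra bit set rather than every bit set in between.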
Jason Tedor
90d6317437
Remove checkpoint tracker bit sets setting
We added an index-level setting for controlling the size of the bit sets
used to back the local checkpoint tracker. This setting is really only
needed to control the memory footprint of the bit sets but we do not
think this setting is going to be needed. This commit removes this
setting before it is released to the wild after which we would have to
worry about BWC implications.

Relates #27191
2017-11-01 21:13:01 -04:00
Colin Goodheart-Smithe
99aca9cdfc
Enhances exists queries to reduce need for _field_names (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the names of all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
Martijn van Groningen
d805c41b28
Added new terms_set query
This query returns documents that match at least one or more
of the provided terms. The number of terms that must match varies
per document and is either controlled by a minimum should match
field or computed per document in a minimum should match script.

Closes #26915
2017-11-01 10:55:18 +01:00
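
The matching rule described for `terms_set` in the commit above can be modelled in a few lines. This is a plain-Java model of the semantics only, not the query implementation; `TermsSetSketch` and `matches` are hypothetical names, and the per-document minimum is passed in directly rather than read from a field or computed by a script.

```java
import java.util.Set;

// Hypothetical model of the matching rule; not the Elasticsearch query code.
class TermsSetSketch {
    /**
     * Returns true when the document contains at least minimumShouldMatch of
     * the query terms. The minimum is supplied per document by the caller.
     */
    static boolean matches(Set<String> docTerms, Set<String> queryTerms, int minimumShouldMatch) {
        long matched = queryTerms.stream().filter(docTerms::contains).count();
        // Assumption: at least one term must always match, following the commit's wording.
        return matched >= Math.max(1, minimumShouldMatch);
    }
}
```

For example, a document holding the terms `a` and `b` with a required minimum of 2 would match the query terms `a`, `b`, `c`, while the same document with a required minimum of 3 would not.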
Jack Conradson
fd73e5fa41 Add version 6.0.0 2017-10-31 17:49:52 -07:00
Tanguy Leroux
13cd08b1e6
Convert index blocks to cluster block exceptions (#27050) 2017-10-31 16:11:18 +01:00
Shai Erera
bd0261916c Fix Laplace scorer to multiply by alpha (and not add) (#27125) 2017-10-31 13:08:44 +01:00
javanna
34666844b3 [DOCS] Clarify migrate guide and search request validation
Relates to #26811
2017-10-31 12:36:00 +01:00
kel
c3e2bdf20c Raise IllegalArgumentException if query validation failed (#26811)
Closes #26799
2017-10-31 12:17:27 +01:00
Armin Braun
a4c159e91e prevent duplicate fields when mixing parent and root nested includes (#27072)
Closes #26990
2017-10-31 10:01:33 +01:00
Adrien Grand
3812d3cb43
TopHitsAggregator must propagate calls to setScorer. (#27138)
It is required in order to work correctly with bulk scorer implementations
that change the scorer during the collection process. Otherwise sub collectors
might call `Scorer.score()` on the wrong scorer.

Closes #27131
2017-10-31 09:59:06 +01:00
Jason Tedor
a566942219
Refactor internal engine
This commit is a minor refactoring of internal engine to move hooks for
generating sequence numbers into the engine itself. As such, we refactor
tests that relied on this hook to use the new hook, and remove the hook
from the sequence number service itself.

Relates #27082
2017-10-30 13:10:20 -04:00
Martijn van Groningen
c406a91158
Fix division by zero in phrase suggester that causes assertion to fail 2017-10-30 09:04:56 +01:00
Nhat
d01ad9367e Enable Docstats with totalSizeInBytes for 6.1.0
Relates https://github.com/elastic/elasticsearch/pull/27117
2017-10-28 14:54:53 -04:00
Nhat
07d270b45f
Adds average document size to DocsStats (#27117)
This change is required in order to support a size based check for the
index rollover.

The index size is estimated by sampling the existing segments only. We
prefer using segments to StoreStats because StoreStats is not reliable
if indexing or merging operations are in progress.

Relates #27004
2017-10-28 12:47:08 -04:00
Jim Ferenczi
6625ecfff4 Fix max score tracking with field collapsing (#27122)
This change makes sure that we track score when sort is set to relevancy only.
In this case we always track max score like normal search does.

Closes #23840
2017-10-27 09:18:34 +02:00
olcbean
35a2cc1003 fixed typo in ConstructingObjectParser (#27129) 2017-10-26 13:14:56 -06:00
Jim Ferenczi
d1acf449f5 Apply missing request options to the expand phase (#27118)
* Apply missing request options to the expand phase

This change adds some missing options to the expand query that builds the inner hits for field collapsing.
The following options are now applied to the inner_hits query:
 * post_filters
 * preferences
 * routing

Closes #27079
Closes #26649
2017-10-26 17:01:57 +02:00
Simon Willnauer
1460a3feac Only pull SegmentReader once in getSegmentInfo (#27121) 2017-10-26 14:56:14 +02:00
Jason Tedor
0174d13ca2 Fix BWC for discovery stats
The new discovery stats were pushed to the 6.x branch (currently
versioned at 6.1.0) but master was not updated to reflect this. This
impacts the mixed-cluster BWC tests because a 6.1.0 node will be trying
to send a 7.0.0 node the new discovery stats but the 7.0.0 did not yet
understand that it should be reading these when talking to a 6.1.0
node. This commit addresses this, and changes the skip version on the
discovery stats REST tests.
2017-10-26 07:53:18 -04:00
Catalin Ursachi
8bf33241ed Add Delete Index API support to high-level REST client (#27019)
Relates to #25847
2017-10-26 09:52:46 +02:00
Jason Tedor
77f87732ef Adjust .DS_Store test assertions on Windows
Windows handles trying to read a file that does not exist because a
component of the path is not a directory differently from how other
operating systems handle it. This commit adjusts these assertions for Windows.
2017-10-25 22:36:53 -04:00
Jason Tedor
17d6820a4b Emit settings deprecation logging on empty update
When executing a cluster settings update that leaves the cluster state
unchanged, we skip validation and this avoids deprecation logging for
deprecated settings in the cluster state. This commit addresses this by
running validation even if the settings are unchanged.

Relates #27017
2017-10-25 22:15:38 -04:00
Jason Tedor
9aae2f593a Avoid stack overflow on search phases
When a search is executing locally over many shards, we can stack
overflow during query phase execution. This happens due to callbacks
that occur after a phase completes for a shard and we move to the same
phase on another shard. If all the shards for the query are local to the
local node then we will never go async and these callbacks will end up
as recursive calls. With sufficiently many shards, this will end up as a
stack overflow. This commit addresses this by truncating the stack by
forking to another thread on the executor for the phase.

Relates #27069
2017-10-25 22:05:46 -04:00
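
The stack-truncation idea in the commit above can be illustrated with a small, self-contained sketch. It does not use any Elasticsearch classes: `ForkingPhaseRunner`, the single-thread executor, and the 30-second timeout are assumptions chosen for the example. The point is only that handing the next step to an executor, instead of calling it directly from the callback, keeps the stack depth constant however many shards there are.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the Elasticsearch search phase code.
class ForkingPhaseRunner {
    /** Runs a trivial per-shard step for every shard; returns true if all shards completed. */
    static boolean runAllShards(int shardCount) {
        if (shardCount <= 0) {
            return true;
        }
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            executor.execute(() -> runShard(0, shardCount, executor, done));
            return done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdown();
        }
    }

    private static void runShard(int shard, int shardCount, ExecutorService executor, CountDownLatch done) {
        // ... the phase would execute for this shard here ...
        if (shard + 1 == shardCount) {
            done.countDown();
        } else {
            // Calling runShard(shard + 1, ...) directly would add one stack frame per shard.
            // Forking to the executor starts the next shard on a fresh stack instead.
            executor.execute(() -> runShard(shard + 1, shardCount, executor, done));
        }
    }
}
```

With direct recursion, a sufficiently large `shardCount` would be expected to end in a `StackOverflowError`; with the fork, each step returns before the next one starts.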
Nhat
adc195e30c Fix error message for a put index template request without index_patterns (#27102)
Just correct the error message from "Validation Failed: 1: pattern is
missing;" to "Validation Failed: 1: index_patterns is missing;".

Closes #27100
2017-10-25 18:54:40 -04:00
Armin Braun
6533b165d6 #25601 Add pipeline support for REST API bulk upsert (#27075) 2017-10-25 19:03:25 +02:00
Jason Tedor
6722b9c4a2 Ignore .DS_Store files on macOS
Finder creates these files if you browse a directory there. These files
are really annoying, but it's an incredible pain for users that these
files are created unbeknownst to them, and then they get in the way of
Elasticsearch starting. This commit adds leniency on macOS only to skip
these files.

Relates #27108
2017-10-25 11:25:29 -04:00
Luca Cavanna
5818ff6b56 Make ShardSearchTarget optional when parsing ShardSearchFailure (#27078)
Turns out that `ShardSearchTarget` is nullable, hence its fields may not be printed out as part of `ShardSearchFailure#toXContent`, in which case `fromXContent` cannot parse it back. We would previously try to create the object with all of its fields set to null, but `Index` complains about it in the constructor. Also made sure that this code path is covered by our unit tests in `ShardSearchFailureTests`.

Closes #27055
2017-10-25 13:26:06 +02:00
Luca Cavanna
8caf7d4ff8 Decouple BulkProcessor from ThreadPool (#26727)
Introduce a minimal thread scheduler as a base class for `ThreadPool`. Such a class can be used by the `BulkProcessor` to schedule retries and the flush task. This allows removing the `ThreadPool` dependency from `BulkProcessor`, which required providing settings that contain `node.name` and also needed log4j for logging. Instead, `BulkProcessor` now needs a `Scheduler`, which is much lighter and is automatically created and shut down on close.

Closes #26028
2017-10-25 10:30:23 +02:00
David Turner
cc3364e4f8 Stats to record how often the ClusterState diff mechanism is used successfully (#26973)
It's believed that using diffs obsoletes the other mechanism for reusing the
bits of the ClusterState that didn't change between updates, but in fact we
don't know for sure how often the diff mechanism works successfully. The stats
collected here will tell us.
2017-10-25 07:35:25 +01:00
Lee Hinman
6bc7024f26 Tie-break shard path decision based on total number of shards on path (#27039)
Right now if the number of shards for a particular index is equal across the
data paths, we tie-break on space. This changes to tie-break first on the total
number of shards for each path, and then, if that is the same, on the usable
bytes.

Relates to #26654 (it's a follow-up)
2017-10-24 16:12:47 -06:00
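
The three-level ordering described in the commit above (shards of the index on the path, then total shards on the path, then usable bytes) maps naturally onto a chained comparator. The sketch below is an assumption-laden illustration: `PathRanking`, `DataPath`, and its fields are hypothetical names, not the types used in Elasticsearch.

```java
import java.util.Comparator;

// Hypothetical illustration; the names and fields are not Elasticsearch's.
class PathRanking {
    static final class DataPath {
        final String name;
        final int shardsOfIndex; // shards of the index being allocated already on this path
        final int totalShards;   // shards of any index on this path
        final long usableBytes;

        DataPath(String name, int shardsOfIndex, int totalShards, long usableBytes) {
            this.name = name;
            this.shardsOfIndex = shardsOfIndex;
            this.totalShards = totalShards;
            this.usableBytes = usableBytes;
        }
    }

    // Best path first: fewest shards of this index, then fewest shards overall,
    // then the most usable bytes.
    static final Comparator<DataPath> BEST_FIRST =
        Comparator.comparingInt((DataPath p) -> p.shardsOfIndex)
            .thenComparingInt(p -> p.totalShards)
            .thenComparing(Comparator.comparingLong((DataPath p) -> p.usableBytes).reversed());
}
```

Taking the minimum of a collection of paths under `BEST_FIRST` would give the path this ordering prefers.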
Jason Tedor
7a792d2c1f Timed runnable should delegate to abstract runnable
If timed runnable wraps an abstract runnable, then it should delegate to
the abstract runnable; otherwise force execution and rejection handling
are dropped on the floor. Thus, timed runnable should itself be an
abstract runnable that delegates all methods to the wrapped runnable
when the wrapped runnable is an abstract runnable. This commit makes this
the case.

Relates #27095
2017-10-24 11:36:50 -04:00
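
The delegation pattern described in the commit above might look roughly like this. To keep the example self-contained it uses a hypothetical stand-in interface, `RejectableRunnable`, instead of Elasticsearch's abstract runnable class, and `TimedRunnableSketch` is likewise an invented name.

```java
// Stand-in for illustration only; a hypothetical minimal interface.
interface RejectableRunnable extends Runnable {
    boolean isForceExecution();

    void onRejection(Exception e);
}

// A timing wrapper that forwards force-execution and rejection handling
// to the wrapped runnable when the wrapped runnable supports them.
class TimedRunnableSketch implements RejectableRunnable {
    private final Runnable original;
    private long startTimeNanos = -1;
    private long finishTimeNanos = -1;

    TimedRunnableSketch(Runnable original) {
        this.original = original;
    }

    @Override
    public void run() {
        startTimeNanos = System.nanoTime();
        try {
            original.run();
        } finally {
            finishTimeNanos = System.nanoTime();
        }
    }

    @Override
    public boolean isForceExecution() {
        // Without this delegation the wrapper would silently hide the flag.
        return original instanceof RejectableRunnable
            && ((RejectableRunnable) original).isForceExecution();
    }

    @Override
    public void onRejection(Exception e) {
        // Without this delegation the rejection would never reach the wrapped task.
        if (original instanceof RejectableRunnable) {
            ((RejectableRunnable) original).onRejection(e);
        }
    }

    /** Elapsed execution time in nanoseconds, or -1 if the task has not finished. */
    long executionNanos() {
        return finishTimeNanos < 0 ? -1 : finishTimeNanos - startTimeNanos;
    }
}
```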
Lee Hinman
fcfbdf1f37 Expose adaptive replica selection stats in /_nodes/stats API
This exposes the collected metrics we store for ARS in the nodes stats, as well
as the computed rank of nodes. Each node exposes its perspective about the
cluster.

Here's an example output (with `?human`):

```json
...
"adaptive_selection" : {
  "_k6v1-wERxyUd5ke6s-D0g" : {
    "outgoing_searches" : 0,
    "avg_queue_size" : 0,
    "avg_service_time" : "7.8ms",
    "avg_service_time_ns" : 7896963,
    "avg_response_time" : "9ms",
    "avg_response_time_ns" : 9095598,
    "rank" : "9.1"
  },
  "VJiCUFoiTpySGmO00eWmtQ" : {
    "outgoing_searches" : 0,
    "avg_queue_size" : 0,
    "avg_service_time" : "1.3ms",
    "avg_service_time_ns" : 1330240,
    "avg_response_time" : "4.5ms",
    "avg_response_time_ns" : 4524154,
    "rank" : "4.5"
  },
  "DHNGTdzyT9iiaCpEUsIAKA" : {
    "outgoing_searches" : 0,
    "avg_queue_size" : 0,
    "avg_service_time" : "2.1ms",
    "avg_service_time_ns" : 2113164,
    "avg_response_time" : "6.3ms",
    "avg_response_time_ns" : 6375810,
    "rank" : "6.4"
  }
}
...
```
2017-10-24 08:58:42 -06:00
David Turner
cf2d0834f5 Remove duplicated test (#27091) 2017-10-24 11:52:01 +01:00
Nhat
bf557fd886 test: avoid generating duplicate multiple fields (#27080)
Multifields parser does not allow duplicate values, however the
MultiFieldTests may produce duplicate field values.

See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/132/console.
2017-10-23 09:59:40 -04:00
Adrien Grand
d0104c22a5 Reduce the default number of cached queries. (#26949)
Memory usage of queries can't be properly accounted, which can be an issue when
large queries are cached since the actual memory usage will be much higher than
what the cache thinks. This problem is very hard if not impossible to fix so as
a workaround I would like to decrease the maximum number of cached queries so
that this problem is less likely to cause trouble in practice.

For the record, this problem is more likely to occur in environments that have
small shards or don't give much memory to the JVM.

Closes #26938
2017-10-23 14:11:35 +02:00
Jason Tedor
35984a616e Keep cumulative elapsed scroll time in microseconds
Today we internally accumulate elapsed scroll time in nanoseconds. The
problem here is that this can reasonably overflow. For example, on a
system with scrolls that are open for ten minutes on average, after
sixteen million scrolls the largest value that can be represented by a
long will be exceeded. To address this, we switch to internally
representing elapsed scroll time in microseconds. With the same number
of scrolls, this allows scrolls that are open for seven days on average;
with the same average elapsed time, it allows sixteen billion scrolls,
which will never happen (executing one scroll a second until sixteen
billion have executed would not occur until more than five-hundred years
had elapsed).

Relates #27068
2017-10-21 13:18:28 +02:00
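
The overflow arithmetic in the commit above can be checked with a short calculation. The class and method names below are hypothetical; the only inputs are `Long.MAX_VALUE` and the ten-minute average taken from the commit message.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for checking the arithmetic in the commit message.
class ScrollTimeOverflow {
    static final long TEN_MINUTES_NANOS = TimeUnit.MINUTES.toNanos(10);
    static final long TEN_MINUTES_MICROS = TimeUnit.MINUTES.toMicros(10);

    /** Number of scrolls whose summed elapsed time still fits in a long. */
    static long scrollsBeforeOverflow(long averageElapsedPerScroll) {
        return Long.MAX_VALUE / averageElapsedPerScroll;
    }
}
```

`scrollsBeforeOverflow(TEN_MINUTES_NANOS)` comes out at roughly 15.4 million scrolls, which the commit rounds to sixteen million, and `scrollsBeforeOverflow(TEN_MINUTES_MICROS)` is a thousand times larger, roughly 15.4 billion.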
Tanguy Leroux
463e7e6fa3 Revert "Upgrade to Jackson 2.9.2 (#27032)"
This reverts commit 0b9acc5acea90887cfab666a05cb6d3cd8aa1e02.
2017-10-20 08:25:41 +02:00
Tanguy Leroux
0b9acc5ace Upgrade to Jackson 2.9.2 (#27032)
Upgrade to Jackson 2.9.2 and also use a boolean `closed` flag to
indicate that a FastStringReader instance is closed, so that length
is still correctly reported after the reader is closed.
2017-10-19 15:15:02 +02:00
Martijn van Groningen
87c9b79b10
Return the _source of inner hit nested as is without wrapping it into its full path context
Due to a change made in #26102 to make the nested source consistent
with or without source filtering, the _source of a nested inner hit was
always wrapped in the parent path. This turned out to be not ideal for
users relying on the nested source, as it would require additional parsing
on the client side. This change fixes this: the _source of nested inner hits
is no longer wrapped by parent JSON objects, regardless of whether
the _source is included as is or source filtering is used.

Internally, source filtering and highlighting rely on the fact that the
_source of nested inner hits is accessible by its full field path, so
in order not to break this, the conversion of the _source into its binary
form is performed in FetchSourceSubPhase, after any potential source filtering
is performed, to make sure the structure of the _source of the nested inner hit
is consistent regardless of whether source filtering is performed.

PR for #26944

Closes #26944
2017-10-19 12:04:56 +02:00
Alexander Kazakov
9a3a1cd1b7 Handle leniency for cross_fields type in multi_match query (#27045) 2017-10-19 10:29:28 +02:00
Stephen Yeargin
8a05e5b92c Fix typo in thrown exception in IndicesAliasesRequest (#27025)
There is a typo in the exception thrown in `IndicesAliasesRequest`. This PR corrects the spelling and removes extraneous word.
2017-10-18 13:54:16 +00:00
Lee Hinman
78c54c4560 Balance shards for an index more evenly across multiple data paths (#26654)
* Balance shards for an index more evenly across multiple data paths

When a node has multiple data paths configured, and is assigned all of the
shards for a particular index, it's possible now that all shards will be
assigned to the same path (see #16763).

This change keeps the same behavior around determining the "best" path for a
shard based on space, however, it enforces limits for the number of shards on a
path for an index from the single-node perspective. For example:

Assume you had a node with 4 data paths, where `/path1` has a tremendously high
amount of disk space available compared to the other paths. If you create an
index with 5 primary shards, the previous behavior would be to assign all 5
shards to `/path1`.

This change would enforce a limit of 2 shards to each data path for that
particular node, so you would end up with the following distribution:

- `/path1` - 2 shards (because it has the most usable space)
- `/path2` - 1 shard
- `/path3` - 1 shard
- `/path4` - 1 shard

Note, however, that this limit is only enforced at the local node level for
simplicity in implementation, so if you had multiple nodes, the "limit" for the
node is still 2, so assuming you had enough nodes that there were only 2 shards
for this index assigned to this node, they would still both be assigned to
`/path1`.

* Switch from ObjectLongHashMap to regular HashMap

* Remove unneeded Files.isDirectory check

* Skip iterating directories when not necessary

* Add message to assert

* Implement different (better) ranking for node paths

This is the method we discussed

* Remove unused pathHasEnoughSpace method

* Use findFirst instead of .get(0);

* Update for master merge to fix compilation

Settings.putArray -> Settings.putList
2017-10-17 05:49:24 -06:00
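
The 5-shard, 4-path example in the commit above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the Elasticsearch allocation code: `DataPathBalancer` is a hypothetical name, the per-path limit is assumed to be the shard count divided by the path count, rounded up (which gives 2 for the example), paths are assumed to be ranked by fewest shards of the index first and most usable space second, and usable space is not reduced as shards are assigned.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of the per-path limit; not the Elasticsearch implementation.
class DataPathBalancer {
    /** Assigns shardCount shards of one index; usableBytes maps each path to its free space. */
    static Map<String, Integer> assign(Map<String, Long> usableBytes, int shardCount) {
        if (usableBytes.isEmpty()) {
            throw new IllegalArgumentException("at least one data path is required");
        }
        int pathCount = usableBytes.size();
        // Assumed limit: ceil(shardCount / pathCount), e.g. 5 shards over 4 paths gives 2.
        int limit = (shardCount + pathCount - 1) / pathCount;

        Map<String, Integer> assigned = new LinkedHashMap<>();
        usableBytes.keySet().forEach(path -> assigned.put(path, 0));

        for (int shard = 0; shard < shardCount; shard++) {
            // Among paths still under the limit, prefer the one holding the fewest
            // shards of this index, breaking ties by the most usable space.
            String best = assigned.keySet().stream()
                .filter(path -> assigned.get(path) < limit)
                .min(Comparator.comparingInt((String path) -> assigned.get(path))
                    .thenComparing(
                        Comparator.comparingLong((String path) -> usableBytes.get(path)).reversed()))
                .orElseThrow(IllegalStateException::new);
            assigned.merge(best, 1, Integer::sum);
        }
        return assigned;
    }
}
```

Called with four paths where `/path1` has by far the most space and a shard count of 5, this sketch places 2 shards on `/path1` and 1 on each of the other paths, matching the distribution listed in the commit message.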
Jason Tedor
62bf3c11a9 Stop invoking non-existent syscall
Today when getting ready to enter seccomp, we do some probes to ensure
that we are really talking to seccomp, etc. One of these probes is pure
paranoia. The paranoia was driven by a kernel bug
(https://lkml.org/lkml/2014/7/20/222) that only impacted 32-bit x86
kernels wherein invoking a non-existent syscall was not returning ENOSYS
(as it should). This probe causes problems though, for example in
containers with syscall filters, invoking a non-existent syscall will
lead to the process being sent SIGSYS and terminated. We do not need
this paranoia, we do not support 32-bit, and our other probes give us
enough of a defense to ensure that we are talking to seccomp (and we
hardcode the seccomp syscall number for platforms that we
support). Given that this probe offers us little value, but does cause
problems in valid use-cases, this commit removes this paranoia.

Relates #27016
2017-10-17 11:34:44 +02:00
Jason Tedor
3664ede9b5 Remove unnecessary exception for engine constructor
The internal engine constructor declares a checked engine exception yet
this constructor does not actually throw this exception. This commit
removes this declaration from the internal engine constructor.

Relates #27022
2017-10-16 10:17:37 -04:00
Simon Willnauer
8dda827ff4 Don't refresh on _flush _force_merge and _upgrade (#27000)
Today all these API calls have a side effect of making documents visible
to search requests. While this is sometimes desired, it's an unnecessary side effect,
and now that we have an internal (engine-private) index reader (#26972) we artificially
add a refresh call for bwc. This change removes this side effect in 7.0.
2017-10-16 10:16:35 +02:00