Commit Graph

5168 Commits

Author SHA1 Message Date
Simon Willnauer 35e705877b Limit retries of failed allocations per index (#18467)
Today if a shard fails during the initialization phase due to misconfiguration, broken disks,
missing analyzers, plugins that are not installed, etc., Elasticsearch keeps on trying to
initialize, or rather allocate, that shard. In the worst case this ends in an endless
allocation loop. To prevent this loop and all its side effects, like spamming log files over
and over again, this commit adds an allocation decider that stops allocating a shard that
failed more than N times in a row to allocate. The number of retries can be configured via
`index.allocation.max_retry` and its default is set to `5`. Once the setting is updated,
shards with fewer failures than the number set per index will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard
has been started.

Relates to #18417
2016-05-20 20:37:45 +02:00
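The retry limit described in 35e705877b can be illustrated with a minimal sketch. This is not the Elasticsearch allocation decider; `MaxRetrySketch` and all of its method names are hypothetical, and the comparison (`failures < maxRetries`) is an assumption based on the sentence "shards with fewer failures than the number set per index will be allowed to allocate again".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-shard retry limit; names are illustrative only.
final class MaxRetrySketch {
    // Consecutive failed allocation attempts, keyed by a shard identifier.
    private final Map<String, Integer> failedAllocations = new HashMap<>();
    private int maxRetries;

    MaxRetrySketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Record one more failed allocation attempt for the shard.
    void onAllocationFailed(String shardId) {
        failedAllocations.merge(shardId, 1, Integer::sum);
    }

    // A shard that started successfully gets its counter reset to 0.
    void onShardStarted(String shardId) {
        failedAllocations.remove(shardId);
    }

    // Updating the limit can make previously blocked shards eligible again.
    void setMaxRetries(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Allocation is allowed only while the failure count is below the limit.
    boolean canAllocate(String shardId) {
        return failedAllocations.getOrDefault(shardId, 0) < maxRetries;
    }
}
```

With a limit of 5, a shard that failed five times in a row is no longer allocated until either the limit is raised or the counter is reset.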
Jason Tedor 61f40156d3 Do not decode path when sending error
Today when sending a REST error to a client, we send the decoded
path. But decoding that path can already be the cause of the error in
which case decoding it again will just throw an exception leading to us
never sending an error back to the client. It would be better to send
the entire raw path to the client and that is what we do in this commit.

Relates #18477
2016-05-20 12:15:30 -04:00
Igor Motov ce88d7f9ab Fix race condition in snapshot initialization
When a snapshot initialization fails, the create snapshot method may return before the snapshot metadata in the cluster state is removed. This can cause follow-up snapshot API calls to fail because a snapshot is still running. This is causing CI failures when we try to delete indices that were participating in a failed snapshot to a read-only repository.

Closes #18121
2016-05-20 10:52:08 -04:00
Daniel Mitterdorfer 8b962bb234 Increase log level for NettyHttpRequestSizeLimitIT
This test fails spuriously in CI and is not reproducible locally.

With this commit we temporarily increase the log level in a few
packages that are suspected to reveal the cause.
2016-05-20 15:12:52 +02:00
Jason Tedor 140f9dfe5f Fix scaling thread pool test bug
This commit fixes a test bug in the scaling thread pool configuration
test. In particular, the test randomization could select min and max for
a thread pool configuration where both are equal to zero. This is a
violation of the requirements of the ThreadPoolExecutor. With this
commit, we now ensure that the max is bounded below by one.
2016-05-20 08:58:08 -04:00
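The constraint behind 140f9dfe5f can be reproduced with the JDK alone: `java.util.concurrent.ThreadPoolExecutor` throws `IllegalArgumentException` when `maximumPoolSize <= 0`. The sketch below is not the Elasticsearch test; `ScalingPoolSizes` and its helpers are hypothetical names that show one way to randomize sizes while keeping the max bounded below by one.

```java
import java.util.Random;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for picking valid random pool sizes in a test.
final class ScalingPoolSizes {
    // Random max in [1, upperBound]; upperBound must be at least 1.
    static int randomMax(Random random, int upperBound) {
        return 1 + random.nextInt(upperBound);
    }

    // Random min in [0, max], so min never exceeds max.
    static int randomMin(Random random, int max) {
        return random.nextInt(max + 1);
    }

    // Returns true if ThreadPoolExecutor accepts the given core/max sizes.
    static boolean isValid(int min, int max) {
        try {
            ThreadPoolExecutor executor = new ThreadPoolExecutor(
                min, max, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
            executor.shutdown();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Here `isValid(0, 0)` is rejected by the constructor, while any pair produced by `randomMax` followed by `randomMin` is accepted.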
Martijn van Groningen 80fee8666f percolator: Removed percolator cache
Before 5.0 it was required that the percolator queries were cached in JVM heap as Lucene queries, for two reasons:
1) Performance. The percolator evaluated all percolator queries all the time. There was no pre-selection of queries that are likely to match, like we have today.
2) Updates made to percolator queries were visible in realtime. Today these changes are visible in near realtime, so updating no longer requires the percolator to have the queries in JVM heap.

So having the percolator queries in JVM heap via the percolator cache is now less attractive, especially when there are many percolator queries, as these queries can consume many GBs of JVM heap.
Removing the percolator cache does make the percolate query slower compared to the execution time in 5.0.0-alpha1 and alpha2, but it is still faster compared to 2.x and before.
2016-05-20 14:52:16 +02:00
Christoph Büscher d3fe22c990 Improve adding clauses to `span_near` and `span_or` query
Currently the query builders expose the clauses of the span
query as a modifiable list. Instead we should make that
getter return an unmodifiable list. Also renaming the method
used to add a clause from `clause(spanQuery)` to
`addClause(spanQuery)`.
2016-05-20 13:36:55 +02:00
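The pattern described in d3fe22c990 (a read-only getter plus an explicit `addClause` method) might look roughly like the following. `SpanClausesSketch` is a hypothetical stand-in that stores clauses as plain strings; it is not the actual `span_near` or `span_or` builder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a span query builder; clauses are strings here.
final class SpanClausesSketch {
    private final List<String> clauses = new ArrayList<>();

    // The only supported way to add a clause; returns this for chaining.
    SpanClausesSketch addClause(String clause) {
        if (clause == null) {
            throw new IllegalArgumentException("clause must not be null");
        }
        clauses.add(clause);
        return this;
    }

    // Read-only view: callers can inspect the clauses but not mutate them.
    List<String> clauses() {
        return Collections.unmodifiableList(clauses);
    }
}
```

Calling `add` on the list returned by `clauses()` throws `UnsupportedOperationException`, so all modifications go through `addClause`.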
Boaz Leskes 34ef5306d2 Snapshotting and sync could cause a deadlock in TranslogWriter (#18481)
#18360 introduced an extra lock in order to allow writes while syncing the translog. This caused a potential deadlock with snapshotting code where we first acquire the instance lock, followed by a sync (which acquires the syncLock). However, the sync logic acquires the syncLock first, followed by the instance lock.

I considered solving this by not syncing the translog on snapshot - I think we can get away with just flushing it. That however will create subtleties around snapshotting and whether operations in them are persisted. I opted instead to have slightly uglier code with nested synchronized blocks, where the scope of the change is contained to the TranslogWriter class alone.
2016-05-20 12:56:24 +02:00
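The deadlock in 34ef5306d2 comes from two code paths taking the same two locks in opposite order. The general remedy, acquiring them in one fixed order with nested `synchronized` blocks, can be sketched as below. This is only an illustration of lock ordering under assumed names (`LockOrderSketch`, `written`, `synced`); it does not reproduce the real TranslogWriter code.

```java
// Hypothetical sketch of consistent lock ordering; not the TranslogWriter code.
final class LockOrderSketch {
    private final Object syncLock = new Object();
    private long written; // guarded by the instance lock
    private long synced;  // guarded by syncLock and the instance lock

    // Writers only need the instance lock.
    void write(long bytes) {
        synchronized (this) {
            written += bytes;
        }
    }

    // Sync path: syncLock first, then the instance lock.
    void sync() {
        synchronized (syncLock) {
            synchronized (this) {
                synced = written;
            }
        }
    }

    // Snapshot path uses the same order (syncLock, then instance lock),
    // rather than taking the instance lock first and syncing inside it.
    long snapshot() {
        synchronized (syncLock) {
            synchronized (this) {
                synced = written;
                return synced;
            }
        }
    }
}
```

Because both `sync()` and `snapshot()` take `syncLock` before the instance lock, neither can hold one lock while waiting for a thread that holds the other.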
Jason Tedor c257e2c51f Remove settings and system properties entanglement
Today when parsing settings during bootstrap, we add a system property
for every Elasticsearch setting. Additionally, settings can be set via
system properties. This commit simplifies this situation.
 - settings are no longer propagated to system properties
 - system properties cannot be used to set settings
 - the "es." prefix on settings is no longer required (nor permitted)
 - test logging has a dedicated system property (tests.logger.level)

Relates #18198
2016-05-19 14:08:08 -04:00
Clinton Gormley dc33a83231 Remove the preserve_original option from the FingerprintAnalyzer (#18471)
The preserve_original option to the ASCIIFoldingFilter doesn't
play well with the FingerprintFilter, as it ends up producing
fingerprints like:

    "and consistent godel gödel is said sentence this yes"

The goal of the OpenRefine algorithm is to produce a small normalized
ASCII fingerprint. There's no need to expose preserve_original.
2016-05-19 19:37:13 +02:00
Christoph Büscher d2515727d0 Improve random DateTimeZone creation in tests
We often require a random joda DateTimeZone in our tests. Currently
there are a few options for generating such a random DateTimeZone
from the set of available ids. Currently most random picks are not
really reproducible across different JVMs because they rely on order
in the ids set implementation. The helper in DateProcessorFactoryTests
thus performs a sort on the set of ids before random picking from
the result, so I moved this to ESTestCase to make it publicly
available and changed all other tests to use that method.
2016-05-19 18:12:48 +02:00
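The idea in d2515727d0, sorting the set of ids before picking one at random, can be sketched with the JDK's `java.time.ZoneId` in place of Joda's `DateTimeZone` (a substitution made here only to keep the example dependency-free). `RandomZoneSketch` is a hypothetical name, not the helper that was moved to ESTestCase.

```java
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: reproducible random time zone selection.
final class RandomZoneSketch {
    static ZoneId randomZone(Random random) {
        // The set's iteration order is not specified, so copy and sort it
        // to make the pick depend only on the seed and the available ids.
        List<String> ids = new ArrayList<>(ZoneId.getAvailableZoneIds());
        Collections.sort(ids);
        return ZoneId.of(ids.get(random.nextInt(ids.size())));
    }
}
```

Two calls with `Random` instances created from the same seed return the same zone, provided both JVMs ship the same set of zone ids.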
Boaz Leskes 4d6887075f Log IndexShard.refresh logs under trace (#18435)
We log them every second...
2016-05-19 17:12:37 +02:00
Christoph Büscher 757ccf00b2 Enforce MatchQueryBuilder#maxExpansions() to be strictly positive 2016-05-19 16:59:37 +02:00
Jeff Evans 3cf4214255 Add better error message when analyzer created without tokenizer or analyzer type (#18455)
Closes #15492
2016-05-19 15:47:07 +02:00
Ali Beyad fc6df23fea Rename AggregatorBuilder and all of its subclasses to
AggregationBuilder, in keeping consistent with the Java APIs.

Closes #18377
Closes #18367
2016-05-19 09:25:29 -04:00
Tanguy Leroux 35d3bdab84 Add Google Cloud Storage repository plugin
Closes #12880
2016-05-19 13:26:23 +02:00
Martijn van Groningen e2691d7e5c test: Don't generate a value of 0, because FuzzyQuery constructor doesn't allow that 2016-05-19 13:08:30 +02:00
Martijn van Groningen 050145f61b parent/child: Allow adding additional child types that point to an existing parent type
From 2.0 adding child types to existing types was forbidden because the `_parent` field stores the join between parent and child at index time.
This is to protect from the fact that types that weren't a parent before become a parent while previously indexed documents would not have a join field.
This would break the parent/child queries.

The restriction was a bit too strict in the sense that even if a type was already a parent type, the restriction would forbid adding child types that point to that parent type (so child types already point to it).
This change makes sure that the restriction only applies if that type isn't a parent type already.

Closes #17956
2016-05-19 11:05:17 +02:00
Simon Willnauer d77c299cb9 Register `indices.query.bool.max_clause_count` setting (#18341)
* Register `indices.query.bool.max_clause_count` setting

This commit registers `indices.query.bool.max_clause_count` as a node
level setting and removes support for its synonym setting
`index.query.bool.max_clause_count`.

Closes #18336
2016-05-19 10:42:35 +02:00
Simon Willnauer 2b972f1f75 FSync translog outside of the writers global lock (#18360)
FSync translog outside of the writers global lock

Today we acquire a global write lock that blocks all modifications to the
translog file while we fsync / checkpoint the file. Yet, we don't necessarily
need to block concurrent operations here. This can lead to a lot of blocked
threads if the machine has high concurrency (lots of CPUs) but uses slow disks
(spinning disks), which is absolutely unnecessary. We just need to protect from
fsyncing / checkpointing concurrently but we can fill buffers and write to the
underlying file in a concurrent fashion.

This change introduces an additional lock that we hold while fsyncing but moves
the checkpointing code outside of the writers global lock.
2016-05-19 09:40:10 +02:00
Simon Willnauer 9a9301f7d8 Remove dead BloomFilter code
We haven't used this class for quite a while. Let's trash it.
2016-05-18 23:00:57 +02:00
Clinton Gormley cec9a94b96 Added version 2.3.3 with bwc indices 2016-05-18 17:33:21 +02:00
Christoph Büscher 7c665a010b Fix TimeZoneRounding#nextRoundingValue for hour, minute and second units
Currently rounding intervals obtained by nextRoundingValue() for hour, minute and
second units can include an extra hour when happening at DST transitions that add
an extra hour (eg CEST -> CET). This changes the rounding logic for time units
smaller or equal to an hour to fix this.

Closes #18326
2016-05-18 17:29:02 +02:00
markharwood a846ff93e9 Aggregations fix: support include/exclude strings formatted for IP and date fields in terms and significant_terms aggregations.
Closes #17705
2016-05-18 16:21:55 +01:00
Tanguy Leroux 27b65e90ca Merge pull request #18443 from tlrx/fix-18433
Add missing builder.endObject() in FsInfo
2016-05-18 15:40:55 +02:00
Jason Tedor cad0608cdb Add GC overhead logging
This commit adds simple GC overhead logging. This logging captures
intervals where the JVM is spending a lot of time performing GC but it
is not necessarily the case that each GC is large. For a start, this
logging is simple and does not attempt to incorporate whether or not the
collections were efficient (that is, we are only capturing that a lot of
GC is happening, not that a lot of useless GC is happening).

Relates #18419
2016-05-18 09:31:28 -04:00
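The notion of GC overhead in cad0608cdb (a large share of an interval spent collecting, regardless of how large each collection is) can be sketched as a simple ratio. The thresholds and the `GcOverheadSketch` class below are hypothetical; the commit message does not state the actual values or log levels used.

```java
// Hypothetical sketch: classify GC overhead for one observation interval.
final class GcOverheadSketch {
    enum Level { NONE, DEBUG, INFO, WARN }

    // Assumed thresholds, as a percentage of the interval spent in GC.
    static final int WARN_PERCENT = 50;
    static final int INFO_PERCENT = 25;
    static final int DEBUG_PERCENT = 10;

    // gcMillis: total time spent in GC during the interval (many small
    // collections add up); elapsedMillis: length of the interval.
    static Level classify(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return Level.NONE;
        }
        long percent = 100 * gcMillis / elapsedMillis;
        if (percent >= WARN_PERCENT) {
            return Level.WARN;
        }
        if (percent >= INFO_PERCENT) {
            return Level.INFO;
        }
        if (percent >= DEBUG_PERCENT) {
            return Level.DEBUG;
        }
        return Level.NONE;
    }
}
```

The sketch only measures how much time went to GC; as the commit message notes, it says nothing about whether those collections reclaimed much memory.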
Daniel Mitterdorfer 3954306af2 Merge pull request #18432 from danielmitterdorfer/fix-circuit-breaker-it
Clear all caches after testing parent breaker
2016-05-18 15:23:15 +02:00
Tanguy Leroux d7a31c8cf7 Add missing builder.endObject() in FsInfo
closes #18433
2016-05-18 15:19:30 +02:00
Christoph Büscher 808ef6cec7 Fix parsing single `rescore` element in SearchSourceBuilder
We are currently only parsing the array-syntax for the rescore part
in SearchSourceBuilder ("rescore" : [ {...}, {...} ]) . We also need
to support "rescore" : {...}

Closes #18439
2016-05-18 15:08:28 +02:00
Clinton Gormley c03dd8a290 Make the index-too-old exception more explicit (#18438)
Closes #18418
2016-05-18 13:33:25 +02:00
Daniel Mitterdorfer de3e7d161f Add tests for null precondition check in BulkRequest
Relates #18347

Checked with @javanna
2016-05-18 12:10:13 +02:00
Yannick Welsch 6dacac49b3 Simplify recovery logic in IndicesClusterStateService (#18405)
- Moves recovery logic into IndexShard
- Simplifies logic to cancel peer recovery of shard where recovery source node changed
- Ensures routing entry is set on initialization of IndexShard
2016-05-18 10:51:57 +02:00
Daniel Mitterdorfer c13df3b6c5 Clear all caches after testing parent breaker
With this commit we clear all caches after testing the parent circuit breaker.
This is necessary as caches hold on to circuit breakers internally. Additionally,
due to usage of CircuitBreaker#addWithoutBreaking() in caches, it's even possible
to go above the limit. As a consequence, all subsequent requests fall victim to
the limit.

Hence, right after the parent circuit breaker tripped, we clear all caches to
reduce these circuit breakers to 0 again. We also exclude the clear caches
transport request from limit check in order to ensure it will succeed. As this is
typically a very small and low-volume request, it is deemed ok to exclude it.

Closes #18325
2016-05-18 09:31:35 +02:00
Jason Tedor ecce53f0df Add I/O statistics on Linux
This commit adds a variety of real disk metrics for the block devices
that back Elasticsearch data paths. A collection of statistics are read
from /proc/diskstats and are used to report the raw metrics for
operations and read/write bytes.

Relates #15915
2016-05-17 16:16:39 -04:00
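For readers unfamiliar with `/proc/diskstats`, the kind of parsing ecce53f0df refers to might look like the sketch below. It assumes the standard Linux field layout (major, minor, device name, reads completed, reads merged, sectors read, ms reading, writes completed, writes merged, sectors written, ...) and 512-byte sectors; `DiskStatsSketch` is a hypothetical class, not the Elasticsearch implementation.

```java
// Hypothetical sketch: parse one line of /proc/diskstats.
final class DiskStatsSketch {
    // /proc/diskstats reports sectors in 512-byte units.
    static final long SECTOR_BYTES = 512L;

    final String device;
    final long readOps;
    final long writeOps;
    final long readBytes;
    final long writeBytes;

    private DiskStatsSketch(String device, long readOps, long writeOps,
                            long readBytes, long writeBytes) {
        this.device = device;
        this.readOps = readOps;
        this.writeOps = writeOps;
        this.readBytes = readBytes;
        this.writeBytes = writeBytes;
    }

    static DiskStatsSketch parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 11) {
            throw new IllegalArgumentException("unexpected diskstats line: " + line);
        }
        long readsCompleted = Long.parseLong(fields[3]);
        long sectorsRead = Long.parseLong(fields[5]);
        long writesCompleted = Long.parseLong(fields[7]);
        long sectorsWritten = Long.parseLong(fields[9]);
        return new DiskStatsSketch(fields[2], readsCompleted, writesCompleted,
            sectorsRead * SECTOR_BYTES, sectorsWritten * SECTOR_BYTES);
    }
}
```

The counters are cumulative since boot, so rates would come from the difference between two successive readings.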
Jason Tedor 584be0b3f8 Refactor JvmGcMonitorService for testing
This commit refactors the JvmGcMonitorService so that it can be
tested. In particular, hooks are added to verify that the
JvmMonitorService correctly observes slow GC events, and that the
JvmGcMonitorService logs the correct messages.

Relates #18378
2016-05-17 13:05:36 -04:00
Yannick Welsch 9ba554dfd2 Expose previous cluster state only in RoutingAllocation (#18390)
Instead of re-exposing index metadata and blocks in RoutingNodes (which is part of the cluster state before rerouting), expose it as part of the RoutingAllocation which is known to be only temporarily used during reroute.
2016-05-17 19:02:28 +02:00
polyfractal c755a77022 [TEST] Use a reproducible source of randomness in shuffle 2016-05-17 12:55:07 -04:00
Zachary Tong 7c46b57ff2 Add a Sort ingest processor
Sorts an array of values in ascending or descending order. If all elements are numerics, they will be sorted numerically. If values are strings, or mixtures of strings/numbers, the elements will be sorted lexicographically.
2016-05-17 12:06:48 -04:00
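The ordering rule in 7c46b57ff2 (numeric sort when every element is a number, lexicographic sort otherwise) can be sketched as follows. `SortSketch` is a hypothetical helper, not the ingest processor itself; comparing numbers by `doubleValue()` and other values by `String.valueOf` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the sort rule described in the commit message.
final class SortSketch {
    static List<Object> sort(List<?> values, boolean ascending) {
        List<Object> sorted = new ArrayList<>(values);
        boolean allNumeric = true;
        for (Object value : sorted) {
            if (!(value instanceof Number)) {
                allNumeric = false;
                break;
            }
        }
        Comparator<Object> order;
        if (allNumeric) {
            // Every element is a number: compare numerically.
            order = Comparator.comparingDouble((Object v) -> ((Number) v).doubleValue());
        } else {
            // Strings or mixed values: compare their string forms.
            order = Comparator.comparing((Object v) -> String.valueOf(v));
        }
        sorted.sort(ascending ? order : order.reversed());
        return sorted;
    }
}
```

Under these assumptions, the numbers 10, 9 and 2 sort ascending as 2, 9, 10, while the mixed list "10", 9, 2 sorts as "10", 2, 9 because the comparison falls back to strings.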
Colin Goodheart-Smithe 8c9ca8b518 Moves query profiler classes into their own package
The change also renames fields and methods in the Profilers class.

Note that I had to make ProfileResult a public class (it was package private before) because now classes that call it are in a different package.
2016-05-17 14:20:05 +01:00
Ali Beyad 3764789d96 Removed unused AllocationService member in
TransportClusterAllocationExplainAction

Closes #18381
2016-05-16 18:41:36 -04:00
Robert Muir 8d4c1befe5 Merge pull request #18364 from rmuir/nukeRunAsFloat
Remove LeafSearchScript.runAsFloat(): Nothing calls it.
2016-05-16 17:08:25 -04:00
Adrien Grand 864ed04059 Lessen leniency of the query dsl. #18276
This change does the following:
 - Queries that are currently unsupported such as prefix queries on numeric
   fields or term queries on geo fields now throw an error rather than returning
   a query that does not match anything.
 - Fuzzy queries on numeric, date and ip fields are now unsupported: they used
   to create range queries, we now expect users to use range queries directly.
   Fuzzy, regexp and prefix queries are now only supported on text/keyword
   fields (including `_all`).
 - The `_uid` and `_id` fields do not support prefix or range queries anymore as
   it would prevent us from storing them more efficiently in the future, e.g. by
   using a binary encoding.

Note that it is still possible to ignore these errors by using the `lenient`
option of the `match` or `query_string` queries.
2016-05-16 17:37:00 +02:00
Colin Goodheart-Smithe e37e8af5e2 Refactor of query profile classes to make way for other profile implementations 2016-05-16 16:15:50 +01:00
Colin Goodheart-Smithe 6eda9f5df6 more tests following review 2016-05-16 09:07:22 +01:00
Colin Goodheart-Smithe 0c449fee4a small fix following rebase on master 2016-05-16 09:07:22 +01:00
Colin Goodheart-Smithe 66d0bdab0c review comments 2016-05-16 09:07:22 +01:00
Colin Goodheart-Smithe ab3121c871 Adds a method to find (and dynamically create) the mappers for the parents of a field with dots in the field name 2016-05-16 09:07:22 +01:00
Robert Muir 8edf213492 Remove LeafSearchScript.runAsFloat(): Nothing calls it. 2016-05-15 22:59:28 -04:00
Michael McCandless 0d570352dd Merge pull request #18355 from mikemccand/iterables_flatten
Iterables.flatten should not pre-cache the first iterator
2016-05-15 10:21:35 -04:00
Mike McCandless 8d7db7fd7a remove whitespace 2016-05-14 18:50:10 -04:00