Commit Graph

537 Commits

Author SHA1 Message Date
aditya-agrawal 27ddb4ffea Avoid NPE in `more_like_this` when field has zero tokens (#30365)
Fixes an edge case when using `more_like_this` where TermVectorsWriter
could throw an NPE when a field produced zero tokens after analysis. This
changes the implementation to use an empty list of tokens in this case.

Closes #30148
2018-05-08 15:13:07 +02:00
Jack Conradson 1b22477104 Silence SplitIndexIT.testSplitIndexPrimaryTerm test failure. (#30432) 2018-05-07 13:35:28 -07:00
Yannick Welsch 82b251adcf
Auto-expand replicas when adding or removing nodes (#30423)
Auto-expands replicas in the same cluster state update (instead of a follow-up reroute) where nodes are added or removed.

Closes #1873, fixing an issue where nodes drop their copy of auto-expanded data when coming up, only to sync it again later.
2018-05-07 22:26:31 +02:00
Jason Tedor ec939dc012 Fix line length violation in cache tests
This commit fixes a line-length violation in the cache tests that was
hidden by the IDE folding the generics.
2018-05-07 14:12:38 -04:00
Igor Motov 6fb189ce47
Add stricter geohash parsing (#30376)
Adds verification that geohashes are not empty and contain only
valid characters. It fixes the issue where an empty geohash is
treated as [-180, -90] and geohashes with non-geohash characters
are resolved into invalid coordinates.

Closes #23579
2018-05-07 13:56:39 -04:00
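The check described in the geohash commit above (6fb189ce47) can be illustrated with a minimal sketch. It assumes the standard geohash base-32 alphabet in lowercase; the class and method names are hypothetical and are not taken from the Elasticsearch code.

```java
final class GeohashCheck {
    // Standard geohash base-32 alphabet: digits plus lowercase letters without a, i, l, o.
    private static final String BASE_32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Returns true only for non-empty strings made entirely of geohash characters. */
    static boolean isValidGeohash(String hash) {
        if (hash == null || hash.isEmpty()) {
            return false; // an empty hash must not silently map to a coordinate
        }
        for (int i = 0; i < hash.length(); i++) {
            if (BASE_32.indexOf(hash.charAt(i)) < 0) {
                return false; // character outside the geohash alphabet
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidGeohash("u4pruydqqvj"));
        System.out.println(isValidGeohash(""));
        System.out.println(isValidGeohash("abc"));
    }
}
```

Whether uppercase input should be accepted is a design choice this sketch leaves out; it rejects anything outside the lowercase alphabet.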
Jason Tedor 68760ec5da Add failing test for core cache deadlock
The core cache implementation has a deadlock bug. This commit adds a
failing test case.
2018-05-07 13:01:37 -04:00
Stéphane Campinas 39623402fc Pass the task to broadcast actions (#29672)
Since the task is required as per line 292, give broadcast actions the opportunity to handle tasks.
2018-05-07 13:47:31 +02:00
Tanguy Leroux 1987d6261f
Do not fail snapshot when deleting a missing snapshotted file (#30332)
When deleting or creating a snapshot for a given shard, elasticsearch 
usually starts by listing all the existing snapshotted files in the repository. 
Then it computes a diff and deletes the snapshotted files that are not 
needed anymore. During this deletion, an exception is thrown if the file 
to be deleted does not exist anymore.

This behavior is challenging with cloud-based repository implementations
like S3 where a file that has been deleted can still appear in the bucket for
a few seconds or minutes (because the deletion can take some time to be fully
replicated on S3). If the deleted file appears in the listing of files, then the
following deletion will fail with a NoSuchFileException and the snapshot
will be partially created/deleted.

This pull request makes the deletion of these files a bit less strict, i.e., not
failing if the file we want to delete does not exist anymore. It introduces a
new BlobContainer.deleteIgnoringIfNotExists() method that can be used 
at some specific places where not failing when deleting a file is 
considered harmless.

Closes #28322
2018-05-07 09:35:55 +02:00
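The lenient deletion described in the snapshot commit above (1987d6261f) can be sketched against the local file system. This is only an illustration using `java.nio.file`; it is not the `BlobContainer` code, and the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class LenientDelete {
    /** Deletes the file; a file that is already gone is treated as success. */
    static boolean deleteIgnoringIfNotExists(Path file) {
        try {
            Files.delete(file);
            return true;  // the file existed and was removed
        } catch (NoSuchFileException e) {
            return false; // already deleted: considered harmless here
        } catch (IOException e) {
            throw new UncheckedIOException(e); // any other failure is still an error
        }
    }

    /** Hypothetical helper that creates a throwaway file to delete. */
    static Path newTempBlob() {
        try {
            return Files.createTempFile("blob-", ".dat");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path blob = newTempBlob();
        System.out.println(deleteIgnoringIfNotExists(blob)); // first delete removes the file
        System.out.println(deleteIgnoringIfNotExists(blob)); // second delete does not fail
    }
}
```

Only the "missing file" case is swallowed; other I/O errors still propagate, which matches the commit's intent of relaxing the check only where it is considered harmless.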
Nhat Nguyen 16d6a0bfb3 AwaitsFix testCreateShrinkIndexToN
Relates #30416
2018-05-06 22:07:42 -04:00
Nhat Nguyen eed8a3b585
Add put index template api to high level rest client (#30400)
Relates #27205
2018-05-06 09:47:36 -04:00
Boaz Leskes b46d01d409 Relax testAckedIndexing to allow document updating
The test indexes new documents and is thus correct in testing that the response result
is `CREATED`. Sadly we can't guarantee exactly once delivery just yet.

Relates #9967

Closes #21658
2018-05-06 13:06:16 +02:00
Jason Tedor beee5fe004
Respect accept header on no handler (#30383)
Today when processing a request for a URL path for which we can not find
a handler we send back a plain-text response. Yet, we have the accept
header in our hand and can respect the accepted media type of the
request. This commit addresses this.
2018-05-04 18:13:50 -04:00
Ioannis Kakavas 21bc87a65b
Use readFully() to read bytes from CipherInputStream (#28515)
Changes how data is read from CipherInputStream

Instead of using `read()` and checking that the bytes read are what we
expect, use `readFully()`, which keeps reading until exactly the requested
number of bytes has been read and throws an `EOFException` if the stream
ends before that.

This approach keeps the simplicity of using CipherInputStream while
working as expected with both JCE and BCFIPS Security Providers
2018-05-04 20:13:27 +03:00
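The difference between `read()` and `readFully()` mentioned in the commit above (21bc87a65b) can be shown with a minimal sketch. It uses `java.io.DataInputStream.readFully` and a hypothetical wrapper stream that returns one byte per call to imitate short reads; it does not use `CipherInputStream` itself.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadFullySketch {
    /** Hypothetical stream that hands out at most one byte per bulk read. */
    static InputStream oneByteAtATime(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    /** Reads exactly n bytes or throws EOFException if the stream ends first. */
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    /** Convenience wrapper: true if n bytes could be read from data. */
    static boolean canReadExactly(byte[] data, int n) {
        try {
            readExactly(oneByteAtATime(data), n);
            return true;
        } catch (EOFException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        // A single read() may return fewer bytes than the buffer holds.
        int shortRead = oneByteAtATime(data).read(new byte[4]);
        System.out.println("single read returned " + shortRead + " byte(s)");
        System.out.println("readFully got " + readExactly(oneByteAtATime(data), 4).length + " bytes");
    }
}
```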
tomcallahan 0a93956194
Add Get Settings API support to java high-level rest client (#29229)
This PR adds support for the Get Settings API to the java high-level rest client.
Furthermore, logic related to the retrieval of default settings has been moved from the rest layer into the transport layer and now default settings may be retrieved consistently via both the rest API and the transport API.
2018-05-04 11:14:28 -04:00
Jim Ferenczi 719ab30c32 Set the new lucene version for 6.4.0 2018-05-04 12:15:51 +02:00
Jim Ferenczi dbd857341f
Upgrade to 7.4.0-snapshot-1ed95c097b (#30357)
Upgrade to lucene-7.4.0-snapshot-1ed95c097b

This version contains:
* An Analyzer for Korean
* An IntervalQuery and IntervalsSource that retrieve minimum intervals of positional queries.
* A new API to retrieve matches (offsets and positions) of a query for a single document.
* Support for soft deletes in the index writer.
* A fixed shingle filter that handles index time synonyms.
* Support for emoji sequence in ICUTokenizer (with an upgrade to icu 61.1)
2018-05-04 11:44:22 +02:00
Michael Basnight 5f8101a44c
Make RepositoriesMetaData contents unmodifiable (#30361)
This commit makes the RepositoriesMetaData backing list no longer
modifiable.

Ref #30333
2018-05-03 13:14:54 -05:00
Boaz Leskes ccd791b3b4
InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (#30121)
We were recently looking at bugs that can only occur if two different documents were indexed concurrently. For example, what happens if the local checkpoint advances above the sequence number of a document that's being indexed. That can only happen if another concurrent operation caused the checkpoint to advance. It has to be another document to allow concurrency as we acquire a per uid lock. While our investigation proved that the suspected bug doesn't exist, we still discovered our unit testing coverage is not good enough to cover this case.

This PR extends the test for concurrent out-of-order replica processing to use two documents in its history.
2018-05-03 14:57:48 +02:00
Michael Basnight bdd43fa69f
Change signature of Get Repositories Response (#30333)
The Get Repositories response object held a list of RepositoryMetaData
entries. This object does not have the from/toXContent methods that are
needed to expose this to the high level REST client. The
RepositoriesMetaData, however, does, and it also contains a list of
RepositoryMetaData objects within it. So rather than duplicate this
logic or move it (RepositoriesMetaData is a fragment object used by
cluster state), the object holding state in the Response was changed to
use the RepositoriesMetaData instead. This also cleans up the read/write
methods in the response, as they can now use the same read/write in
RepositoriesMetaData, which also were not present in the singular class.
2018-05-03 07:22:59 -05:00
Zachary Tong 3c2d2a7d4a
Fix NPE when CumulativeSum agg encounters null/empty bucket (#29641)
Fix NPE when CumulativeSum agg encounters null/empty bucket

If the cusum agg encounters a null value, it's because the value is
missing (like the first value from a derivative agg), the path is
not valid, or the bucket in the path was empty.

Previously cusum would just explode on the null, but this changes it
so we only increment the sum if the value is non-null and finite.
This is safe because even if the cusum encounters all null or empty
buckets, the cumulative sum is still zero (like how the sum agg returns
zero even if all the docs were missing values)

I went ahead and tweaked AggregatorTestCase to allow testing pipelines,
so that I could delete the IT test and reimplement it as AggTests.

Closes #27544
2018-05-02 12:22:55 -07:00
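The null handling described in the cumulative sum commit above (3c2d2a7d4a) can be sketched as follows. This is a simplified stand-in operating on a plain list of bucket values, not the pipeline aggregator itself; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CumulativeSumSketch {
    /** Running sum that skips null and non-finite bucket values instead of failing. */
    static List<Double> cumulativeSum(List<Double> bucketValues) {
        List<Double> out = new ArrayList<>(bucketValues.size());
        double sum = 0.0;
        for (Double value : bucketValues) {
            if (value != null && Double.isFinite(value)) {
                sum += value; // only real values contribute
            }
            out.add(sum);     // empty buckets carry the previous sum forward
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cumulativeSum(Arrays.asList(null, 1.0, Double.NaN, 2.0)));
    }
}
```

If every bucket is null or empty, the running sum simply stays at zero, which is the behavior the commit message argues for.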
Ryan Ernst fb0aa562a5
Network: Remove http.enabled setting (#29601)
This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase.

closes #12792
2018-05-02 11:42:05 -07:00
James Baiera 6d6da7c661
Fix merging logic of Suggester Options (#29514)
Suggester Options have a collate match field that is returned when the prune 
option is set to true. These values should be merged together in the query 
reduce phase, otherwise good suggestions that result in rare hits in shards with 
results that do not arrive first may be incorrectly marked as not matching the 
collate query.
2018-05-02 14:40:57 -04:00
Boaz Leskes 13917162ad
ReplicationTracker.markAllocationIdAsInSync may hang if allocation is cancelled (#30316)
At the end of recovery, we mark the recovering shard as "in sync" on the primary. From this point on 
the primary will treat any replication failure on it as critical and will reach out to the master to fail the 
shard. To do so, we wait for the local checkpoint of the recovered shard to be above the global 
checkpoint (in order to maintain global checkpoint invariant).

If the master decides to cancel the allocation of the recovering shard while we wait, the method can 
currently hang and fail to return. It will also ignore the interrupts that are triggered by the cancelled 
recovery due to the primary closing. 

Note that this is crucial as this method is called while holding a primary permit. Since the method 
never comes back, the permit is never released. The unreleased permit will then block any primary 
relocation *and* while the primary is trying to relocate all indexing will be blocked for 30m as it 
waits to acquire the missing permit.
2018-05-02 19:40:29 +02:00
Boaz Leskes af45b4dee4
Cancelling a peer recovery on the source can leak a primary permit (#30318)
The code in `SourceRecoveryHandler` runs under a `CancellableThreads` instance in order to allow long running operations to be interrupted when the recovery is cancelled. Sadly if this happens at just the wrong moment while acquiring a permit from the primary, that permit can be leaked and never be freed.

Note that this is slightly better than it sounds - we only cancel recoveries on the source side if the primary shard itself is closed.

Relates to https://github.com/elastic/elasticsearch/pull/30316
2018-05-02 18:01:29 +02:00
Ryan Ernst 916bf9d26d
Convert server javadoc to html5 (#30279)
This commit converts the remaining javadocs in :server using html4 to html5.
This was mostly converting `tt` to `{@code}`.
2018-05-02 08:08:54 -07:00
Adrien Grand 368ddc408f
Remove MapperService#types(). (#29617)
This isn't necessary with a single type per index.
2018-05-02 11:35:12 +02:00
Adrien Grand 7358946bda
Add a new `_ignored` meta field. (#29658)
This adds a new `_ignored` meta field which indexes and stores fields that have
been ignored at index time because of the `ignore_malformed` option. It makes
malformed documents easier to identify by using `exists` or `term(s)` queries
on the `_ignored` field.

Closes #29494
2018-05-02 10:47:02 +02:00
Paul Sanwald 00b21f886a
Fix failure for validate API on a terms query (#29483)
* WIP commit to try calling rewrite on coordinating node during TransportSearchAction

* Use re-written query instead of using the original query

* fix incorrect/unused imports and wildcarding

* add error handling for cases where an exception is thrown

* correct exception handling such that integration tests pass successfully

* fix additional case covered by IndicesOptionsIntegrationIT.

* add integration test case that verifies queries are now valid

* add optional value for index

* address review comments: catch superclass of XContentParseException

fixes #29483
2018-05-01 13:38:22 -07:00
Michael Basnight 62a9b8909e
Remove RepositoriesMetaData variadic constructor (#29569)
The variadic constructor was only used in a few places and the
RepositoriesMetaData class is backed by a List anyway, so just using a
List will make it simpler to instantiate it.
2018-05-01 15:02:06 -05:00
Nhat Nguyen 038fe1151b TEST: Add debug log to FlushIT
We still don't have a strong reason for the failures of
testDoNotRenewSyncedFlushWhenAllSealed and
testSyncedFlushSkipOutOfSyncReplicas.

This commit adds debug logging for these two tests.
2018-05-01 10:15:03 -04:00
Diwas Joshi dd5fcb211d index name added to snapshot restore exception (#29604)
This PR adds the index name to the snapshot restore exception if the index is renamed during restore.
closes [#27601](https://github.com/elastic/elasticsearch/issues/27601)
2018-05-01 15:16:38 +02:00
Jason Tedor 5de6f4ff7b Adjust copy settings on resize BWC version
This commit adjusts the BWC version for copy settings on resize
operations after the behavior was backported to 6.x.
2018-05-01 08:49:16 -04:00
Jason Tedor 50535423ff
Allow copying source settings on resize operation (#30255)
Today when an index is created from shrinking or splitting an existing
index, the target index inherits almost none of the source index
settings. This is surprising and a hassle for operators managing such
indices. Given this is the default behavior, we can not simply change
it. Instead, we start by introducing the ability to copy settings. This
flag can be set on the REST API or on the transport layer and it has the
behavior that it copies all settings from the source except non-copyable
settings (a property of a setting introduced in this
change). Additionally, settings on the request will always override.

This change is the first step in our adventure:
 - this flag is added here in 7.0.0 and immediately deprecated
 - this flag will be backported to 6.4.0 and remain deprecated
 - then, we will remove the ability to set this flag to false in 7.0.0
 - finally, in 8.0.0 we will remove this flag and the only behavior will
   be for settings to be copied
2018-05-01 08:48:19 -04:00
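The copy-settings behavior described in the commit above (50535423ff) can be sketched with plain maps. The setting names and the choice of which setting is non-copyable are illustrative assumptions, and the class is hypothetical; the sketch only shows the stated precedence: copy everything copyable from the source, then let the request override.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class CopySettingsSketch {
    /** Copies all copyable source settings, then applies request settings on top. */
    static Map<String, String> resolveTargetSettings(Map<String, String> source,
                                                     Set<String> nonCopyable,
                                                     Map<String, String> request) {
        Map<String, String> target = new TreeMap<>();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            if (!nonCopyable.contains(entry.getKey())) {
                target.put(entry.getKey(), entry.getValue());
            }
        }
        target.putAll(request); // settings on the request always override
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("index.refresh_interval", "30s");
        source.put("index.number_of_replicas", "2");
        source.put("index.uuid", "abc"); // assumed non-copyable for this example

        Map<String, String> request = new HashMap<>();
        request.put("index.number_of_replicas", "0");

        System.out.println(resolveTargetSettings(source, Collections.singleton("index.uuid"), request));
    }
}
```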
Nik Everett 99b98fab18
Core: Pick inner most parse exception as root cause (#30270)
Just like `ElasticsearchException`, the inner most
`XContentParseException` tends to contain the root cause of the
exception and should be shown to the user in the `root_cause` field.

This effectively undoes most of the changes that #29373 made to the
`root_cause` for parsing exceptions. The `type` field still changes from
`parse_exception` to `x_content_parse_exception`, but this seems like a
fairly safe change.

`ElasticsearchWrapperException` *looks* tempting to implement this but
the behavior isn't quite right. `ElasticsearchWrapperExceptions` are
entirely unwrapped until the cause no longer
`implements ElasticsearchWrapperException` but `XContentParseException`
should be unwrapped until its cause is no longer an
`XContentParseException` but no further. In other words,
`ElasticsearchWrapperException`s are unwrapped one step too far.

Closes #30261
2018-05-01 07:44:58 -04:00
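The unwrapping rule described in the commit above (99b98fab18) can be sketched as a small loop. `ParseException` here is a hypothetical stand-in for `XContentParseException`; the point is that unwrapping stops at the innermost exception of that type and does not continue into its non-parse cause.

```java
final class RootCauseSketch {
    /** Hypothetical stand-in for XContentParseException. */
    static class ParseException extends RuntimeException {
        ParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Walks down while the cause is still a ParseException, but no further. */
    static ParseException innermostParseException(ParseException e) {
        ParseException current = e;
        while (current.getCause() instanceof ParseException) {
            current = (ParseException) current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        IllegalArgumentException cause = new IllegalArgumentException("bad value");
        ParseException inner = new ParseException("failed to parse field [size]", cause);
        ParseException outer = new ParseException("failed to parse request", inner);
        System.out.println(innermostParseException(outer).getMessage());
    }
}
```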
Luca Cavanna acdf330a0e
Minor DocWriteResponse changes (#29675)
Remove double if depending on the Result value. It makes little sense to
pass in a boolean flag based on a Result value that we already have,
if that internally is represented again as a `Result` value.

Also changed the `Result` `lowercase` instance member to be computed
based on `name()` instead of `toString()` which is safer and to use
`Locale.ROOT` instead of `Locale.ENGLISH`
2018-05-01 09:35:09 +02:00
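The `lowercase` change described in the commit above (acdf330a0e) can be sketched with a small enum. The enum name and its constants are illustrative; the sketch shows the two points from the message: derive the value from `name()`, which cannot be overridden the way `toString()` can, and lowercase it with `Locale.ROOT` so the result does not depend on language-specific casing rules.

```java
import java.util.Locale;

enum ResultSketch {
    CREATED, UPDATED, DELETED, NOT_FOUND, NOOP;

    private final String lowercase;

    ResultSketch() {
        // name() is final in Enum, so this cannot be affected by a toString() override.
        this.lowercase = name().toLowerCase(Locale.ROOT);
    }

    String getLowercase() {
        return lowercase;
    }

    public static void main(String[] args) {
        for (ResultSketch result : values()) {
            System.out.println(result + " -> " + result.getLowercase());
        }
    }
}
```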
Boaz Leskes 4a537ef03c
Bulk operation fail to replicate operations when a mapping update times out (#30244)
Starting with the refactoring in https://github.com/elastic/elasticsearch/pull/22778 (released in 5.3) we may fail to properly replicate operations when a mapping update on master fails. If a bulk
operation needs a mapping update halfway, it will send a request to the master before continuing
to index the operations. If that request times out or isn't acked (i.e., even one node in the cluster
didn't process it within 30s), we end up throwing the exception and aborting the entire bulk. This is
a problem because all operations that were processed so far are not replicated any more to the
replicas. Although these operations were never "acked" to the user (we threw an error), it causes the
local checkpoint on the replicas to lag (on 6.x) and the primary and replica to diverge.

This PR does a couple of things:
1) Most importantly, treat *any* mapping update failure as a document level failure, meaning only 
    the relevant indexing operation will fail.
2) Removes the mapping update callbacks from `IndexShard.applyIndexOperationOnPrimary` and 
    similar methods for simpler execution. We don't use exceptions any more when a mapping 
    update was successful.

I think we need to do more work here (the fact that a single slow node can prevent those mapping
updates from being acked and thus fail operations is bad), but I want to keep this as small as I can
(it is already too big).
2018-05-01 08:15:02 +02:00
Chris Earle 725a5af2c6
_cluster/state should always return cluster_uuid (#30143)
Currently, the only way to get the REST response for the `/_cluster/state`
call to return the `cluster_uuid` is to request the `metadata` metrics,
which is one of the most expensive response structures. However, external
monitoring agents will likely want the `cluster_uuid` to correlate the
response with other API responses whether or not they want cluster
metadata.
2018-04-30 10:16:11 -04:00
Jason Tedor 811f5b4efc
Do not ignore request analysis/similarity on resize (#30216)
Today when a resize operation is performed, we copy the analysis,
similarity, and sort settings from the source index. It is possible for
the resize request to include additional index settings including
analysis, similarity, and sort settings. We reject sort settings when
validating the request. However, we silently ignore analysis and
similarity settings on the request that are already set on the source
index. Since it is possible to change the analysis and similarity
settings on an existing index, this should be considered a bug and the
sort of leniency that we abhor. This commit addresses this bug by
allowing the request analysis/similarity settings to override the
existing analysis/similarity settings on the target.
2018-04-30 07:31:36 -04:00
Tanguy Leroux a6624bb742
[Test] Update test in SharedClusterSnapshotRestoreIT (#30200)
The `testDeleteSnapshotWithMissingIndexAndShardMetadata` test uses an
obsolete repository directory structure based on index names instead of
UUIDs. Because it swallows exceptions when deleting test files the test
never failed when the directory structure changed.

This commit fixes the test to use the right directory structure and file
 names and to not swallow exceptions anymore.
2018-04-30 09:48:03 +02:00
Jason Tedor 0a6312a5e6
Collapse REST resize handlers (#30229)
The REST resize handlers for shrink/split operations are effectively the
same code with a minor difference. This commit collapses these handlers
into a single base class.
2018-04-29 08:58:11 -04:00
Jason Tedor bdde2b9824
Rename request variables in shrink/split handlers (#30207)
This is a code-tidying PR, a little side adventure while working on
another change. Previously only shrink request existed but when the
ability to split indices was added, shrink and split were done together
under a single request object: the resize request object. However, the
code inherited the legacy name in the naming of some variables. This
commit cleans this up.
2018-04-28 01:09:44 -04:00
Julie Tibshirani f5978d6d33
In the field capabilities API, remove support for providing fields in the request body. (#30185) 2018-04-27 16:14:11 -07:00
Nhat Nguyen 9c586a2f07
Do not log warn shard not-available exception in replication (#30205)
Since #28049, only fully initialized shards receive write requests.
This enhancement allows us to handle all exceptions. In #28571, we
started strictly handling shard-not-available exceptions and tried to
keep the way we report replication errors to users by only reporting if
the error is not a shard-not-available exception. However, since then we
have unintentionally logged a warning for every exception. This change
restores the previous behavior, which logs a warning only if an exception
is not a shard-not-available exception.

Relates #28049
Relates #28571
2018-04-27 16:45:42 -04:00
Nik Everett f4ed902698
CCS: Drop http address from remote cluster info (#29568)
They are expensive to fetch and no longer needed by Kibana so they
*shouldn't* be needed by anyone else either.

Closes #29207
2018-04-27 14:19:00 -04:00
Julie Tibshirani d633130e1b
Convert FieldCapabilitiesResponse to a ToXContentObject. (#30182) 2018-04-27 09:47:11 -07:00
Tanguy Leroux 63148dd9ba
Fail snapshot operations early on repository corruption (#30140)
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been written to by an older
Elasticsearch version after writing to it with a newer Elasticsearch version.

This is because the way snapshots are formatted in the repository 
snapshots index file changed in #24477.

This commit changes the parsing of the repository index file so that
it now detects a corrupted index file and fails the snapshot
operation early.

closes #29052
2018-04-27 16:29:59 +02:00
Jim Ferenczi c08daf2589
Build global ordinals terms bucket from matching ordinals (#30166)
The global ordinals terms aggregator has an option to remap global ordinals to
dense ordinals that match the request. This mode is automatically picked when the terms
aggregator is a child of another bucket aggregator or when it needs to defer buckets to an
aggregation that is used in the ordering of the terms.
Though when building the final buckets, this aggregator loops over all possible global ordinals
rather than using the hash map that was built to remap the ordinals.
For fields with high cardinality this is highly inefficient and can lead to slow responses even
when the number of terms that match the query is low.
This change fixes this performance issue by using the hash table of matching ordinals to perform
the pruning of the final buckets for the terms and significant_terms aggregation.
I ran a simple benchmark with 1M documents containing 0 to 10 keywords randomly selected among 1M unique terms.
This field is used to perform a multi-level terms aggregation using rally to collect the response times.
The aggregation below is an example of a two-level terms aggregation that was used to perform the benchmark:

```
"aggregations":{
   "1":{
      "terms":{
         "field":"keyword"
      },
      "aggregations":{
         "2":{
            "terms":{
               "field":"keyword"
            }
         }
      }
   }
}
```

| Levels of aggregation | 50th percentile ms (master) | 50th percentile ms (patch) |
| --- | --- | --- |
| 2 | 640.41ms | 577.499ms |
| 3 | 2239.66ms | 600.154ms |
| 4 | 14141.2ms | 703.512ms |

Closes #30117
2018-04-27 15:26:46 +02:00
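The idea behind the ordinals commit above (c08daf2589) can be sketched with a plain hash map. This is a heavily simplified illustration, not the aggregator code: the class, method names, and the use of `Map<Long, Integer>` for per-ordinal document counts are assumptions. It contrasts scanning every possible global ordinal with visiting only the ordinals that actually matched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MatchingOrdinalsSketch {
    /** Slow path: cost grows with field cardinality, even when few terms matched. */
    static List<Long> bucketsByScanningAll(long maxOrd, Map<Long, Integer> docCounts) {
        List<Long> buckets = new ArrayList<>();
        for (long ord = 0; ord < maxOrd; ord++) {
            if (docCounts.getOrDefault(ord, 0) > 0) {
                buckets.add(ord);
            }
        }
        return buckets;
    }

    /** Fast path: cost grows only with the number of matching ordinals. */
    static List<Long> bucketsFromMatching(Map<Long, Integer> docCounts) {
        List<Long> buckets = new ArrayList<>();
        for (Map.Entry<Long, Integer> entry : docCounts.entrySet()) {
            if (entry.getValue() > 0) {
                buckets.add(entry.getKey());
            }
        }
        Collections.sort(buckets); // keep ordinal order so both paths agree
        return buckets;
    }

    public static void main(String[] args) {
        Map<Long, Integer> docCounts = new HashMap<>();
        docCounts.put(3L, 2);
        docCounts.put(900_000L, 1);
        System.out.println(bucketsByScanningAll(1_000_000L, docCounts));
        System.out.println(bucketsFromMatching(docCounts));
    }
}
```

Both methods return the same buckets; the second one never touches ordinals that no document matched, which is where the benchmark table's improvement at deeper aggregation levels would come from under this reading.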
Alexander Reelsen e1a16a6018
REST: Remove GET support for clear cache indices (#29525)
Clearing the cache indices can be done via GET and POST. As GET should
only support read only operations, this removes the support for using
GET for clearing the indices caches.
2018-04-27 08:41:36 +02:00
Julie Tibshirani 0d8aed8c2b
Fix a bug in FieldCapabilitiesRequest#equals and hashCode. (#30181)
Also update its unit test to AbstractStreamableTestCase for better coverage.
2018-04-26 16:09:27 -07:00
Jim Ferenczi 80e0e64bfe Fix SliceBuilderTests#testRandom failures
Add missing shard context creation in a random test.
2018-04-26 22:18:39 +02:00