Commit Graph

8972 Commits

Author SHA1 Message Date
Simon Willnauer cc78b24867 Bump BWC version to 6.1.0 for #27469 2017-11-21 16:16:31 +01:00
Adrien Grand 6ac799074e
Fix dynamic mapping update generation. (#27467)
When a field is not mapped, Elasticsearch tries to generate a mapping update
from the parsed document. Some documents can introduce corner-cases, for
instance in the event of a multi-valued field whose values would be mapped to
different field types if they were supplied on their own, see for instance:

```
PUT index/doc/1
{
  "foo": ["2017-11-10T02:00:01.247Z","bar"]
}
```

In that case, dynamic mappings want to map the first value as a `date` field
and the second one as a `text` field. This currently throws an exception,
which is expected, but the wrong one since it throws a `class_cast_exception`
(which triggers a HTTP 5xx code) when it should throw an
`illegal_argument_exception` (HTTP 4xx).
2017-11-21 15:31:18 +01:00
Simon Willnauer 5a0b6d1977
Use the primary_term field to identify parent documents (#27469)
This change stops indexing the `_primary_term` field for nested documents
to allow fast retrieval of parent documents. Today we create a docvalues
field for children to ensure we have a dense datastructure on disk. Yet,
since we only use the primary term to tie-break on when we see the same
seqID on indexing having a dense datastructure is less important. We can
use this now to improve the nested docs performance and it's memory footprint.

Relates to #24362
2017-11-21 15:14:03 +01:00
Jim Ferenczi 6319424e4a
Move composite aggregation to core (#27474)
This change removes the module named aggs-composite and adds the `composite` aggs
as a core aggregation. This allows other plugins to use this new aggregation
and simplifies the integration in the HL rest client.
2017-11-21 13:31:01 +01:00
Simon Willnauer ea35abca28
Protect shard splitting from illegal target shards (#27468)
While we have an assertion that checks if the number of routing shards is a multiple
of the number of shards we need a real hard exception that checks this way earlier.
This change adds a check and test that is executed before we create the index.

Relates to #26931
2017-11-21 12:09:45 +01:00
Luca Cavanna 29450de7b5
Cross Cluster Search: make remote clusters optional (#27182)
Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that.

This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory.

Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile.

The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped:

"_clusters" : {
    "total" : 3,
    "successful" : 2,
    "skipped" : 1
}
Such section won't be part of the response if no clusters have been skipped.

The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 11:41:47 +01:00
Jason Tedor 190da14bfe Move resync request serialization assertion
This commit moves an assertion that some guard code that will eventually
be dead code in the resync replication request read serialization is
removed when the master branch is bumped to version 8.0.0.
2017-11-20 20:59:41 -05:00
Jason Tedor 28660be40a
Fix resync request serialization
This commit addresses a subtle bug in the serialization routine for
resync requests. The problem here is that Translog.Operation#readType is
not compatible with the implementations of
Translog.Operation#writeTo. Unfortunately, this issue prevents
primary-replica from succeeding, issues which we will address in
follow-ups.

Relates #27418
2017-11-20 20:56:48 -05:00
Nicholas Knize 093218e052 [TEST] Fix `GeoShapeQueryTests#testPointsOnly` failure
Changes unnecessary geoIntersection query to a matchAll query.

closes #27454
2017-11-20 12:11:18 -06:00
Tim Brooks 0a8f48d592
Transition transport apis to use void listeners (#27440)
Currently we use ActionListener<TcpChannel> for connect, close, and send
message listeners in TcpTransport. However, all of the listeners have to
capture a reference to a channel in the case of the exception api being
called. This commit changes these listeners to be type <Void> as passing
the channel to onResponse is not necessary. Additionally, this change
makes it easier to integrate with low level transports (which use
different implementations of TcpChannel).
2017-11-20 10:47:47 -07:00
Simon Willnauer d02f45f694 AwaitsFix GeoShapeQueryTests#testPointsOnly #27454 2017-11-20 17:16:36 +01:00
Simon Willnauer 720e96e288
Ensure nested documents have consistent version and seq_ids (#27455)
Today we index dummy values for seq_ids and version on nested documents.
This is on the one hand trappy since users can request these values via
inner hits and on the other hand not necessarily good for compression since
the dummy value will likely not compress well when seqIDs are lowish.

This change ensures that we share the same field values for all documents in a
nested block. This won't have any overhead, in-fact it might be more efficient since
we even reduce the work needed slightly.
2017-11-20 16:50:08 +01:00
Christoph Büscher 682a85b2c1
Delete some seemingly unused exceptions (#27439) 2017-11-20 09:05:03 +01:00
Michael Basnight 2949c53174
Remove config prompting for secrets and text (#27216)
This commit removes the ability to use ${prompt.secret} and
${prompt.text} as valid config settings. Secure settings has obsoleted
the need for this, and it cleans up some of the code in Bootstrap.
2017-11-19 22:33:17 -06:00
Michael Basnight cb3e8f4763
Move the CLI into its own subproject (#27114)
Projects the depend on the CLI currently depend on core. This should not
always be the case. The EnvironmentAwareCommand will remain in :core,
but the rest of the CLI components have been moved into their own
subproject of :core, :core:cli.
2017-11-18 21:42:57 -06:00
Jason Tedor 56540281a8
Avoid NPE when getting build information
When the Elasticsearch code is loaded in an unusual classloading
environment (e.g., when using the high-level REST client) in Jetty, the
code source can be null and we trip with an NPE. This commit addresses
this.

Relates #27442
2017-11-18 07:19:22 -05:00
Nhat Nguyen 4f711a828b
Removes BWC snapshot status handler used in 6.x (#27443)
We introduced a new snapshot status update handler in 6.1.0. We will
keep the old handler along with this new one in all 6.x. This commit
removes the old handler from 7.0.

Relates #27151
2017-11-17 20:13:56 -05:00
Tim Brooks cc3be6ddda
Remove parameters on HandshakeResponseHandler (#27444)
This is a followup to #27407. That commit removed the channel type
parameter from TcpTransport. This commit removes the parameter from the
handshake response handler.
2017-11-17 14:53:15 -07:00
Nicholas Knize 075c77fc81 [GEO] fix pointsOnly bug for MULTIPOINT
This commit fixes a bug where geo_shape indexes configured for "points_only" : "true" reject documents containing multipoint shape types.
2017-11-17 14:43:36 -06:00
Mayya Sharipova 858b2c7cb8
Standardize underscore requirements in parameters (#27414)
Stardardize underscore requirements in parameters across different type of
requests:
_index, _type, _source, _id keep their underscores
params like version and retry_on_conflict will be without underscores
Throw an error if older versions of parameters are used

BulkRequest, MultiGetRequest, TermVectorcRequest, MoreLikeThisQuery
were changed

Closes #26886
2017-11-17 15:31:52 -05:00
Jason Tedor da115151a5
Log primary-replica resync failures
Today we do not fail a replica shard if the primary-replica resync to
that replica fails. Yet, we should at least log the failure
messages. This commit causes this to be the case.

Relates #27421
2017-11-17 13:33:58 -05:00
Nhat Nguyen db688e1a17
Uses TransportMasterNodeAction to update shard snapshot status (#27165)
Currently, we are using a plain TransportRequestHandler to post snapshot
status messages to the master. However, it doesn't have a robust retry
mechanism as TransportMasterNodeAction. This change migrates from
TransportRequestHandler to TransportMasterNodeAction for the new
versions and keeps the current implementation for the old versions.

Closes #27151
2017-11-17 11:54:44 -05:00
Lee Hinman d92afa1e0a Enforce a minimum task execution and service time of 1 nanosecond
Resolves #27371
2017-11-17 09:39:55 -07:00
Yannick Welsch 76203e72bd
Fix place-holder in allocation decider messages (#27436)
Allocation decider messages were using the wrong place-holder, which resulted in output of the form "no allocations are allowed due to {}" when showing diagnostics information in the explain API.
2017-11-17 17:27:19 +01:00
Jim Ferenczi c91b7cad83 [#27380] Adjust bwc for multi_match lenient option 2017-11-17 15:45:45 +01:00
Jim Ferenczi 53462f6499
Make fields optional in multi_match query and rely on index.query.default_field by default (#27380)
* Make fields optional in multi_match query and rely on index.query.default_field by default

This commit adds the ability to send `multi_match` query without providing any `fields`.
When no fields are provided the `multi_match` query will use the fields defined in the index setting `index.query.default_field`
(which in turns defaults to `*`).
The same behavior is already implemented in `query_string` and `simple_query_string` so this change just applies
the heuristic to `multi_match` queries.
Relying on `index.query.default_field` rather than `*` is safer for big mappings that break the 1024 field expansion limit added in 7.0 for all
text queries. For these kind of mappings the admin can change the `index.query.default_field` in order to make sure that exploratory queries using
`multi_match`, `query_string` or `simple_query_string` do not throw an exception.
2017-11-17 10:25:21 +01:00
David Turner 492edb91b9 Bump version to 6.0.1 2017-11-16 18:39:20 +00:00
David Turner 9766b858d0
Prepare for bump to 6.0.1 on the master branch (#27391)
An assortment of fixes, particularly to version number calculations, in preparation for the bump to 6.0.1.
2017-11-16 18:38:54 +00:00
Tim Brooks 80ef9bbdb1
Remove parameterization from TcpTransport (#27407)
This commit is a follow up to the work completed in #27132. Essentially
it transitions two more methods (sendMessage and getLocalAddress) from
Transport to TcpChannel. With this change, there is no longer a need for
TcpTransport to be aware of the specific type of channel a transport
returns. So that class is no longer parameterized by channel type.
2017-11-16 11:19:36 -07:00
kel 6b817489f3 Fix default value of ignore_unavailable for snapshot REST API (#27056)
The default value for ignore_unavailable did not match what was documented when using the REST APIs for snapshot creation and restore. This commit sets the default value of ignore_unavailable to false, the way it is documented and ensures it's the same when using either REST API or transport client.

Closes #25359
2017-11-16 16:03:09 +01:00
Jim Ferenczi 623367d793
Add composite aggregator (#26800)
* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-buckets aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
  * A `terms` source, values are extracted from a field or a script.
  * A `date_histogram` source, values are extracted from a date field and rounded to the provided interval.
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation in composite buckets.
A composite buckets is composed of one value per source and is built for each document as the combinations of values in the provided sources.
For instance the following aggregation:

````
"test_agg": {
  "terms": {
    "field": "field1"
  },
  "aggs": {
    "nested_test_agg":
      "terms": {
        "field": "field2"
      }
  }
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:

````
"composite_agg": {
  "composite": {
    "sources": [
      {
	"field1": {
          "terms": {
              "field": "field1"
            }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
        }
      },
    }
  }
````

The response of the aggregation looks like this:

````
"aggregations": {
  "composite_agg": {
    "buckets": [
      {
        "key": {
          "field1": "alabama",
          "field2": "almanach"
        },
        "doc_count": 100
      },
      {
        "key": {
          "field1": "alabama",
          "field2": "calendar"
        },
        "doc_count": 1
      },
      {
        "key": {
          "field1": "arizona",
          "field2": "calendar"
        },
        "doc_count": 1
      }
    ]
  }
}
````

By default this aggregation returns 10 buckets sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values, the values of the composite key to aggregate after.
For instance the following aggregation will aggregate all composite keys that sorts after `arizona, calendar`:

````
"composite_agg": {
  "composite": {
    "after": {"field1": "alabama", "field2": "calendar"},
    "size": 100,
    "sources": [
      {
	"field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
	}
      }
    }
  }
````

This aggregation is optimized for indices that set an index sorting that match the composite source definition.
For instance the aggregation above could run faster on indices that defines an index sorting like this:

````
"settings": {
  "index.sort.field": ["field1", "field2"]
}
````

In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued field but disables early termination for these fields even if index sorting matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
2017-11-16 15:13:36 +01:00
Simon Willnauer 303e0c0e86
Fix `ShardSplittingQuery` to respect nested documents. (#27398)
Today if nested docs are used in an index that is split the operation
will only work correctly if the index is not routing partitioned or
unless routing is used. This change fixes the query that selectes the docs
to delete to also select all parents nested docs as well.

Closes #27378
2017-11-16 11:35:42 +01:00
Tim Brooks ca11085bb6
Add TcpChannel to unify Transport implementations (#27132)
Right now our different transport implementations must duplicate
functionality in order to stay compliant with the requirements of
TcpTransport. They must all implement common logic to open channels,
close channels, keep track of channels for eventual shutdown, etc.

Additionally, there is a weird and complicated relationship between
Transport and TransportService. We eventually want to start merging
some of the functionality between these classes.

This commit starts moving towards a world where TransportService retains
all the application logic and channel state. Transport implementations
in this world will only be tasked with returning a channel when one is
requested, calling transport service when a channel is accepted from
a server, and starting / stopping itself.

Specifically this commit changes how channels are opened and closed. All
Transport implementations now return a channel type that must comply with
the new TcpChannel interface. This interface has the methods necessary
for TcpTransport to completely manage the lifecycle of a channel. This
includes setting the channel up, waiting for connection, adding close
listeners, and eventually closing.
2017-11-15 12:38:39 -07:00
Tim Brooks a8f916911a
Remove implementations of `TransportChannel` (#27388)
Right now we have unnecessary implementations of `TransportChannel`.
Additionally, there are methods on the interface that are not used. This
commit removes unnecessary implementations and methods.
2017-11-15 09:48:07 -07:00
olcbean 5ce407e26f wildcard query on _index (#27334) 2017-11-14 10:22:21 -07:00
Jason Tedor be399965e3 Revert "Reduce synchronization on field data cache"
This reverts commit 2e863572f4.

Relates #27365
2017-11-14 05:57:51 -05:00
tinder-xli 2e863572f4 Reduce synchronization on field data cache
The field data cache can come under heavy contention in cases when lots
of search threads are hitting it for doc values. This commit reduces the
amount of contention here by using a double-checked locking strategy to
only lock when the cache needs to be initialized.

Relates #27365
2017-11-13 23:35:46 -05:00
Yannick Welsch 6d30fd5ac0
Properly format IndexGraveyard deletion date as date (#27362)
The toXContent method for IndexGraveYard (which is a collection of tombstones for explicitly marking indices as deleted in the cluster state) confused timeValue with dateField, resulting in output of the form "delete_date" : "23424.3d" instead of "delete_date":"2017-11-13T15:50:51.614Z".
2017-11-13 18:05:58 +01:00
Yannick Welsch c83f112b1a
Stop responding to ping requests before master abdication (#27329)
When the current master node is shutting down, it sends a leave request to the other nodes so that they can eagerly start a fresh master election. Unfortunately, it was still possible for the master node that was shutting down to respond to ping requests, possibly influencing the election decision as it still appeared as an active master in the ping responses. This commit ensures that UnicastZenPing does not respond to ping requests once it's been closed. ZenDiscovery.doStop() continues to ensure that the pinging component is first closed before it triggers a master election.

Closes #27328
2017-11-13 15:18:59 +01:00
Simon Willnauer 2299c70371
Allow affix settings to specify dependencies (#27161)
We use affix settings to group settings / values under a certain namespace.
In some cases like login information for instance a setting is only valid if
one or more other settings are present. For instance `x.test.user` is only valid
if there is an `x.test.passwd` present and vice versa. This change allows to specify
such a dependency to prevent settings updates that leave settings in an inconsistent
state.
2017-11-13 12:06:36 +01:00
tinder-xli 1e99195743 Remove unnecessary logger creation for doc values field data
This commit removes an unnecessary logger instance creation from the
constructor for doc values field data. This construction is expensive
for this oft-created class because of a synchronized block in the
constructor for the logger.

Relates #27349
2017-11-10 22:28:58 -05:00
Nicholas Knize 8904fc8210 [Geo] Decouple geojson parse logic from ShapeBuilders
This is the first step to supporting WKT (and other future) format(s). The ShapeBuilders are quite messy and can be simplified by decoupling the parse logic from the build logic. This commit refactors the parsing logic into its own package separate from the Shape builders. It also decouples the GeoShapeType into a standalone enumerator that is responsible for validating the parsed data and providing the appropriate builder. This future-proofs the code making it easier to maintain and add new shape types.
2017-11-10 14:37:58 -06:00
Ryan Ernst 8b9e23de93
Plugins: Add versionless alias to all security policy codebase properties (#26756)
This is a followup to #26521. This commit expands the alias added for
the elasticsearch client codebase to all codebases. The original full
jar name property is left intact. This only adds an alias without the
version, which should help ease the pain in updating any versions (ES
itself or dependencies).
2017-11-10 11:00:09 -08:00
Jim Ferenczi bec5d43228 [Test] #27342 Fix SearchRequests#testValidate 2017-11-10 18:51:58 +01:00
Jim Ferenczi 29331f1127
Fail queries with scroll that explicitely set request_cache (#27342)
Queries that create a scroll context cannot use the cache.
They modify the search context during their execution so using the cache
can lead to duplicate result for the next scroll query.

This change fails the entire request if the request_cache option is explictely set
on a query that creates a scroll context (`scroll=1m`) and make sure internally that we never
use the cache for these queries when the option is not explicitely used.
For 6.x a deprecation log will be printed instead of failing the entire request and the request_cache hint
will be ignored (forced to false).
2017-11-10 16:02:06 +01:00
Christoph Büscher 4fa33e7111
[Tests] Relax allowed delta in extended_stats aggregation (#27171)
The order in which double values are added in java can give different results
for the sum, so we need to allow a certain delta in the test assertions. The
current value was still a bit too low, which manifested itself in occasional
test failures.
2017-11-10 14:37:26 +01:00
Nicholas Knize 06ff92d237 Add ignore_malformed to geo_shape fields
This commit adds ignore_malformed support to geo_shape field types to skip malformed geoJson fields.

closes #23747
2017-11-09 17:59:05 -06:00
Dimitris Athanasiou 66bef26495
Aggregations: bucket_sort pipeline aggregation (#27152)
This commit adds a parent pipeline aggregation that allows
sorting the buckets of a parent multi-bucket aggregation.

The aggregation also offers [from] and [size] parameters
in order to truncate the result as desired.

Closes #14928
2017-11-09 17:59:57 +00:00
David Turner 1c6f5ce9cb
Improve error message for parse failures of completion fields (#27297)
Fix spacing/grammar/punctuation, and include the field name and location in the source document.
2017-11-09 10:45:44 +00:00
Simon Willnauer a34c2f0b8d
Ensure external refreshes will also refresh internal searcher to minimize segment creation (#27253)
We cut over to internal and external IndexReader/IndexSearcher in #26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.

This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
2017-11-09 08:40:22 +00:00