1530 Commits

Author SHA1 Message Date
kel f5e0932c8d Add version support for inner hits in field collapsing (#27822) (#27833)
Add version support for inner hits in field collapsing
2017-12-15 18:00:40 +01:00
Christoph Büscher 52cb6c8ef2 Merge branch 'master' into rankeval 2017-12-07 14:22:46 +01:00
Jim Ferenczi caea6b70fa
Add a new cluster setting to limit the total number of buckets returned by a request (#27581)
This commit adds a new dynamic cluster setting named `search.max_buckets` that can be used to limit the number of buckets created per shard or by the reduce phase. Each multi bucket aggregator can consume buckets during the final build of the aggregation at the shard level or during the reduce phase (final or not) in the coordinating node. When an aggregator consumes a bucket, a global count for the request is incremented and if this number is greater than the limit an exception is thrown (TooManyBuckets exception).
This change adds the ability for multi-bucket aggregators to "consume" buckets against the global limit, which defaults to 10,000. It's an opt-in consumer, so each multi-bucket aggregator must explicitly call the consumer when a bucket is added to the response.

Closes #27452 #26012
2017-12-06 09:15:28 +01:00
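The counting scheme described in caea6b70fa can be sketched as follows. This is only an illustration of the idea (a per-request counter that aggregators call explicitly), not the Elasticsearch implementation; the class names `BucketConsumer` and `TooManyBucketsError` and the error text are hypothetical.

```python
class TooManyBucketsError(Exception):
    """Hypothetical stand-in for the TooManyBuckets exception named in the commit."""


class BucketConsumer:
    """Counts buckets for one request and fails once the limit is exceeded."""

    def __init__(self, limit=10_000):
        self.limit = limit  # plays the role of search.max_buckets
        self.count = 0

    def accept(self, n):
        # Opt-in: an aggregator must call this itself whenever it adds buckets.
        self.count += n
        if self.count > self.limit:
            raise TooManyBucketsError(
                f"too many buckets: {self.count} exceeds limit {self.limit}"
            )


consumer = BucketConsumer(limit=5)
consumer.accept(3)      # buckets built at the shard level
consumer.accept(2)      # buckets built in the reduce phase; total is 5, still allowed
try:
    consumer.accept(1)  # total is 6, greater than the limit
    exceeded = False
except TooManyBucketsError:
    exceeded = True
```

Because the consumer is opt-in, an aggregator that never calls `accept` is simply not counted, which is why the commit stresses that each multi-bucket aggregator must call it.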
Christoph Büscher bbec33d35c Merge branch 'master' into rankeval 2017-12-04 12:57:19 +01:00
Mayya Sharipova c6b73239ae
Limit the number of tokens produced by _analyze (#27529)
Add an index level setting `index.analyze.max_token_count` to control
the number of generated tokens in the  _analyze endpoint.
Defaults to 10000.

Throw an error if the number of generated tokens exceeds this limit.

Closes #27038
2017-11-30 11:54:39 -05:00
Tanguy Leroux 41f73e0acf Fix version for include_global_state in Snapshot Status API
It also adds a Rest test.

Related #26853
2017-11-30 11:33:01 +01:00
Christoph Büscher 35688f6441 Merge branch 'master' into rankeval 2017-11-29 15:24:06 +01:00
Martijn van Groningen cb1204774b
Include the _index, _type and _id to nested search hits in the top_hits and inner_hits response.
Also include _type and _id for parent/child hits inside inner hits.

In the case of top_hits aggregation the nested search hits are
directly returned and are not grouped by a root or parent document, so
it is important to include the _id and _index attributes in order to know
which documents these nested search hits belong to.

Closes #27053
2017-11-28 14:05:29 +01:00
Nhat Nguyen 8d6bfe53bb
Remove workaround in translog rest test (#27530)
Relates #25623 and a6db0ea908
2017-11-27 09:41:30 -05:00
Christoph Büscher 5661b1c3df Merge branch 'master' into rankeval 2017-11-24 16:25:05 +01:00
Nhat Nguyen 06d35f4f01 Backport wait_for_no_initializing_shards to cluster health API
Relates #27489
2017-11-24 09:56:16 -05:00
Nhat Nguyen 46b508d6c9
Add wait_for_no_initializing_shards to cluster health API (#27489)
This adds a new option to the cluster health request that allows waiting
until there are no initializing shards.

Closes #25623
2017-11-23 15:09:58 -05:00
Simon Willnauer fadbe0de08
Automatically prepare indices for splitting (#27451)
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.

This change automatically sets the number of routing shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.

NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards but since
we are artificially changing the number of buckets in the consistent hashing space
documents might be hashed into different shards compared to previous versions.

This is a 7.0 only change.
2017-11-23 09:48:54 +01:00
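One way to read the default described in fadbe0de08 is sketched below. The function name `default_routing_shards` and the exact rounding are assumptions chosen to match the examples in the message (1024 as the ceiling, at least one split always possible); this is not taken from the Elasticsearch source.

```python
def default_routing_shards(num_shards, max_shards=1024):
    """Pick a routing-shard count that allows repeated splitting by two."""
    log2_max = (max_shards - 1).bit_length()   # 10 for a ceiling of 1024
    log2_num = (num_shards - 1).bit_length()   # ceil(log2(num_shards))
    # Always leave room for at least one split into twice as many shards.
    num_splits = max(1, log2_max - log2_num)
    return num_shards * (1 << num_splits)


for shards in (1, 2, 5, 128, 1024):
    print(shards, default_routing_shards(shards))
```

Under these assumptions an index created with 1 shard can double ten times before reaching 1024, an index with 128 shards can double three times, and an index that already has 1024 shards still gets one split.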
Mayya Sharipova 57e4d10007
Limit the number of nested documents (#27405)
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceeds this
limit during the parsing of a document.

Closes #26962
2017-11-22 10:16:28 -05:00
Jim Ferenczi 90d2ead14a Adapt rest test BWC version after backport
Relates #26800
2017-11-21 15:45:02 +01:00
Christoph Büscher d979ccace9 Merge branch 'master' into rankeval 2017-11-21 14:11:02 +01:00
Jim Ferenczi 6319424e4a
Move composite aggregation to core (#27474)
This change removes the module named aggs-composite and adds the `composite` aggs
as a core aggregation. This allows other plugins to use this new aggregation
and simplifies the integration in the HL rest client.
2017-11-21 13:31:01 +01:00
Simon Willnauer 8aba7c8bbe Fix test BWC version after backport
Relates to #27468
2017-11-21 12:31:04 +01:00
Simon Willnauer ea35abca28
Protect shard splitting from illegal target shards (#27468)
While we have an assertion that checks if the number of routing shards is a multiple
of the number of shards we need a real hard exception that checks this way earlier.
This change adds a check and test that is executed before we create the index.

Relates to #26931
2017-11-21 12:09:45 +01:00
Luca Cavanna 29450de7b5
Cross Cluster Search: make remote clusters optional (#27182)
Today Cross Cluster Search requires at least one node in each remote cluster to be up when the cross cluster search is run. Otherwise the whole search request fails even though some of the data (local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, e.g. when remote shards are missing, which does not fail the whole request but rather yields partial results, and the _shards section in the response will indicate that.

This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_unavailable, set to false by default, which allows certain clusters to be skipped if they are down when a cross cluster search request tries to reach them. By default all clusters are mandatory.

Scroll requests support this setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down in the meantime.

The search API response now contains a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped:

"_clusters" : {
    "total" : 3,
    "successful" : 2,
    "skipped" : 1
}
This section won't be part of the response if no clusters have been skipped.

The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 11:41:47 +01:00
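The skipping behaviour described in 29450de7b5 can be sketched roughly as below. The function `clusters_section` and its input shape are hypothetical, and whether `total` counts the local cluster is not stated in the message, so here it simply counts the clusters passed in.

```python
def clusters_section(clusters):
    """Build the _clusters section, or None when nothing was skipped.

    clusters maps a cluster alias to (connected, skip_unavailable).
    """
    skipped = 0
    for alias, (connected, skip_unavailable) in clusters.items():
        if connected:
            continue
        if not skip_unavailable:
            # Default behaviour: every cluster is mandatory.
            raise ConnectionError(f"cluster [{alias}] is not reachable")
        skipped += 1
    if skipped == 0:
        return None  # the section is omitted from the response
    total = len(clusters)
    return {"total": total, "successful": total - skipped, "skipped": skipped}


section = clusters_section({
    "local": (True, False),
    "eu": (True, False),
    "us": (False, True),   # down, but marked as skippable
})
print(section)
```

With one skippable cluster down out of three, the result matches the example response in the commit message; a disconnected cluster left at the default of false fails the whole request instead.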
Zachary Tong 196dbf3357
Add YAML REST tests for filters bucket agg (#27128)
Related to #26220
2017-11-20 16:44:30 -05:00
Simon Willnauer 28e5cf933f Bump test version after backport
Relates to #27455
2017-11-20 16:54:59 +01:00
Simon Willnauer 720e96e288
Ensure nested documents have consistent version and seq_ids (#27455)
Today we index dummy values for seq_ids and version on nested documents.
This is on the one hand trappy since users can request these values via
inner hits and on the other hand not necessarily good for compression since
the dummy value will likely not compress well when seqIDs are lowish.

This change ensures that we share the same field values for all documents in a
nested block. This won't have any overhead; in fact it might be more efficient since
we even reduce the work needed slightly.
2017-11-20 16:50:08 +01:00
Mayya Sharipova 858b2c7cb8
Standardize underscore requirements in parameters (#27414)
Standardize underscore requirements in parameters across different types of
requests:
_index, _type, _source, _id keep their underscores
params like version and retry_on_conflict will be without underscores
Throw an error if older versions of parameters are used

BulkRequest, MultiGetRequest, TermVectorsRequest, MoreLikeThisQuery
were changed

Closes #26886
2017-11-17 15:31:52 -05:00
Yannick Welsch 3b963dcfe5 Stop skipping REST test after backport of #27056 2017-11-16 16:08:10 +01:00
kel 6b817489f3 Fix default value of ignore_unavailable for snapshot REST API (#27056)
The default value for ignore_unavailable did not match what was documented when using the REST APIs for snapshot creation and restore. This commit sets the default value of ignore_unavailable to false, the way it is documented and ensures it's the same when using either REST API or transport client.

Closes #25359
2017-11-16 16:03:09 +01:00
Clinton Gormley 1caa5c8e32 Rest test fixes (#27354)
* REST: Rename ingest.processor.grok to ingest.processor_grok
* REST: Rename remote.info to cluster.remote_info
* REST: Fixed bad YAML comments
* REST: Force dummy scripts to be strings, not numbers
* REST: Fix bad YAML in search/110_field_collapsing.yml
* REST: Adjust percentile tests to work with Perl number handling
2017-11-14 11:14:14 +01:00
Jim Ferenczi 29331f1127
Fail queries with scroll that explicitly set request_cache (#27342)
Queries that create a scroll context cannot use the cache.
They modify the search context during their execution so using the cache
can lead to duplicate results for the next scroll query.

This change fails the entire request if the request_cache option is explicitly set
on a query that creates a scroll context (`scroll=1m`) and makes sure internally that we never
use the cache for these queries when the option is not explicitly used.
For 6.x a deprecation log will be printed instead of failing the entire request and the request_cache hint
will be ignored (forced to false).
2017-11-10 16:02:06 +01:00
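The decision described in 29331f1127 can be sketched as a small function. The name `effective_request_cache`, the message text, and treating any explicit value (true or false) as "set" are assumptions made for illustration; the commit message does not spell these out.

```python
def effective_request_cache(scroll, request_cache, major_version=7):
    """Return (use_cache, deprecation_message).

    scroll: keep-alive such as "1m", or None when no scroll context is created.
    request_cache: True, False, or None when the option was not set.
    """
    if scroll is None:
        # Not a scroll request: the explicit value (or the index default) applies.
        return request_cache, None
    if request_cache is None:
        # Scroll requests never use the cache, even when the option is absent.
        return False, None
    message = "request_cache cannot be used with scroll"  # hypothetical wording
    if major_version >= 7:
        # 7.0 behaviour per the commit: fail the entire request.
        raise ValueError(message)
    # 6.x behaviour per the commit: deprecation log, hint forced to false.
    return False, message


print(effective_request_cache("1m", None))
print(effective_request_cache("1m", True, major_version=6))
```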
Boaz Leskes ace446f335 Update shrink's bwc version to 6.1.0 and enabled bwc tests 2017-11-07 15:35:46 +01:00
olcbean 7f593a26a3 Setting url parts as required to reflect the code base (#27263) 2017-11-06 09:58:27 -07:00
Nick Lang 09294a9b9a keys in aggs percentiles need to be in quotes. (#26905)
Languages that are more strongly typed will fail when comparing these results
2017-11-06 17:45:04 +01:00
Russ Cam a0bdedb143 Align routing param type with search.json (#26958)
Relates https://github.com/elastic/elasticsearch-net/issues/2869
2017-11-06 17:34:22 +01:00
Simon Willnauer bd7efa908a Add ability to split shards (#26931)
This change adds a new `_split` API that allows to split indices into a new
index with a power of two more shards than the source index.  This API works
alongside the `_shrink` API but doesn't require any shard relocation before
indices can be split.

The split operation is conceptually an inverse `_shrink` operation since we
initialize the index with a _synthetic_ number of routing shards that are used
for the consistent hashing at index time. Compared to indices created with
earlier versions this might produce slightly different shard distributions but
has no impact on the per-index backwards compatibility.  For now, the user is
required to prepare an index to be splittable by setting the
`index.number_of_routing_shards` at index creation time.  The setting allows the
user to prepare the index to be splittable in factors of
`index.number_of_routing_shards` ie. if the index is created with
`index.number_of_routing_shards: 16` and `index.number_of_shards: 2` it can be
split into `4, 8, 16` shards. This is an intermediate step until we can make
this the default. This also allows us to safely backport this change to 6.x.

The `_split` operation is implemented internally as a DeleteByQuery on the
lucene level that is executed while the primary shards execute their initial
recovery. Subsequent merges that are triggered due to this operation will not be
executed immediately. All merges will be deferred until the shards are started
and will then be throttled accordingly.

This change is intended for the 6.1 feature release but will not support pre-6.1
indices to be split unless these indices have been shrunk before. In that case
these indices can be split backwards into their original number of shards.
2017-11-06 11:37:55 +01:00
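The relationship between routing shards and split targets in bd7efa908a can be illustrated with a short sketch. The helper names are hypothetical, and the rule that a target must be a multiple of the source shard count and a divisor of the routing shard count is an assumption consistent with the `4, 8, 16` example in the message.

```python
def routing_factor(routing_shards, num_shards):
    if routing_shards % num_shards != 0:
        raise ValueError("routing shards must be a multiple of the shard count")
    return routing_shards // num_shards


def shard_id(doc_hash, routing_shards, num_shards):
    # Hash into the fixed routing space first, then fold it onto real shards.
    return (doc_hash % routing_shards) // routing_factor(routing_shards, num_shards)


def valid_split_targets(routing_shards, num_shards):
    return [
        target
        for target in range(2 * num_shards, routing_shards + 1, num_shards)
        if routing_shards % target == 0
    ]


print(valid_split_targets(16, 2))

# Each document in a 4-shard target maps back to the source shard it came from,
# so a split only has to drop documents that now belong to a sibling shard.
for h in range(64):
    assert shard_id(h, 16, 4) // 2 == shard_id(h, 16, 2)
```

Because the routing space stays fixed at index creation time, changing the number of shards only changes how that space is folded, which is why a target shard can be built from a single source shard followed by a delete of the documents it no longer owns.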
olcbean e440e23ad1 Fix inconsistencies in the rest api specs for `tasks` (#27163)
modify parameters names to reflect the changes done in the code base
2017-11-06 10:11:25 +01:00
Nhat fd3fac9565 Backport the size-based index rollover to v6.1.0
Relates #27004
2017-11-04 20:14:59 -04:00
Nhat c7ce5a07f2
Add size-based condition to the index rollover API (#27160)
This is to add a max_size condition to the index rollover API. We use
a totalSizeInBytes from DocsStats to evaluate this condition.

Closes #27004
2017-11-04 19:51:48 -04:00
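A rough sketch of how the max_size condition from c7ce5a07f2 might be evaluated is shown below. The helpers `parse_size` and `max_size_met`, the binary unit multipliers, and the use of "greater than or equal" are assumptions for illustration only.

```python
UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "b": 1}


def parse_size(text):
    """Convert a size string such as '5gb' into a number of bytes."""
    text = text.strip().lower()
    # Check two-letter suffixes before the bare 'b' suffix.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNITS[suffix])
    raise ValueError(f"unknown size unit in {text!r}")


def max_size_met(total_size_in_bytes, max_size):
    # total_size_in_bytes stands in for the value read from the index stats.
    return total_size_in_bytes >= parse_size(max_size)


print(max_size_met(6 * 1024**3, "5gb"))
print(max_size_met(1024, "1mb"))
```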
Colin Goodheart-Smithe 2f65f3aaa7
Adjust bwc version for exists query tests 2017-11-02 08:40:34 +00:00
Colin Goodheart-Smithe 99aca9cdfc
Enhances exists queries to reduce need for `_field_names` (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the names of all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
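The selection logic described in 99aca9cdfc can be sketched as follows. The tuple-based query representation and the function names are hypothetical; only `DocValuesFieldExistsQuery` and `_field_names` come from the commit message, and the "for most field types" caveat is ignored here.

```python
def exists_query(field, has_doc_values):
    """Choose how an exists query on `field` would be executed."""
    if has_doc_values:
        return ("DocValuesFieldExistsQuery", field)
    # Fallback: a term query against the _field_names metadata field.
    return ("TermQuery", "_field_names", field)


def fields_written_to_field_names(mapping):
    """Only fields without doc values still need an entry in _field_names."""
    return [field for field, has_doc_values in mapping.items() if not has_doc_values]


mapping = {"user": True, "message": False, "timestamp": True}
print(exists_query("user", mapping["user"]))
print(exists_query("message", mapping["message"]))
print(fields_written_to_field_names(mapping))
```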
olcbean 354862c26e Set request body to required to reflect the code base (#27188)
Update API, Cluster Update Settings API and Put Index Template API didn't have the request body set to required in their spec, hence this commit updates the spec to align them with reality.
2017-11-01 10:54:43 +01:00
kel c3e2bdf20c Raise IllegalArgumentException if query validation failed (#26811)
Closes #26799
2017-10-31 12:17:27 +01:00
Honza Král 1dd6aeeb8d Exists template needs a template name (#25988) 2017-10-27 16:14:41 +02:00
Jason Tedor 0174d13ca2 Fix BWC for discovery stats
The new discovery stats were pushed to the 6.x branch (currently
versioned at 6.1.0) but master was not updated to reflect this. This
impacts the mixed-cluster BWC tests because a 6.1.0 node will be trying
to send a 7.0.0 node the new discovery stats but the 7.0.0 did not yet
understand that it should be reading these when talking to a 6.1.0
node. This commit addresses this, and changes the skip version on the
discovery stats REST tests.
2017-10-26 07:53:18 -04:00
Nhat b31028e8c4 bwc: do not check errmsg for put template request
Relates #27100
2017-10-25 21:43:20 -04:00
Nhat adc195e30c Fix error message for a put index template request without index_patterns (#27102)
Just correct the error message from "Validation Failed: 1: pattern is
missing;" to "Validation Failed: 1: index_patterns is missing;".

Closes #27100
2017-10-25 18:54:40 -04:00
David Turner cc3364e4f8 Stats to record how often the ClusterState diff mechanism is used successfully (#26973)
It's believed that using diffs obsoletes the other mechanism for reusing the
bits of the ClusterState that didn't change between updates, but in fact we
don't know for sure how often the diff mechanism works successfully. The stats
collected here will tell us.
2017-10-25 07:35:25 +01:00
olcbean bb013c60b5 Fix inconsistencies in the rest api specs for *_script (#26971) 2017-10-13 11:20:34 -07:00
olcbean 2310c8e2e2 fix inconsistencies in the rest api specs for cat.snapshots (#26996) 2017-10-13 10:57:23 -07:00
Glen Smith f3942799bb Cat shards bytes (#26952)
* Add `bytes` to cat.shards API spec

* add bytes param to rest-api-spec of cat.segments
2017-10-11 11:03:59 -06:00
Nhat bf4c3642b2 remove _primary and _replica shard preferences (#26791)
The shard preference _primary, _replica and its variants were useful
for the asynchronous replication. However, with the current impl, they
are no longer useful and should be removed.

Closes #26335
2017-10-08 11:03:06 -04:00
Karel Minarik 6825cafaa6 [API] Added the `terminate_after` parameter to the REST spec for "Count" API
Closes #26895
2017-10-08 14:13:20 +02:00