Commit Graph

23 Commits

Author SHA1 Message Date
Simon Willnauer e81804cfa4 Add a shard filter search phase to pre-filter shards based on query rewriting (#25658)
Today if we search across a large amount of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices but filtering will exclude
all results outside a certain time range ie. `now-3d`. While the search can potentially hit
hundreds of shards the majority of the shards might yield 0 results since there is not document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results.

This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
2017-07-12 22:19:20 +02:00
Nik Everett 21b1db2965 Remove assemble from build task when assemble removed
Removes the `assemble` task from the `build` task when we have
removed `assemble` from the project. We removed `assemble` from
projects that aren't published so our releases will be faster. But
That broke CI because CI builds with `gradle precommit build` and,
it turns out, that `build` includes `check` and `assemble`. With
this change CI will only run `check` for projects without an
`assemble`.
2017-06-16 17:19:14 -04:00
Nik Everett 7b358190d6 Remove assemble task when not used for publishing (#25228)
Removes the `assemble` task from projects that are not published.
This should speed up `gradle assemble` by skipping projects that
don't need to be built. Which is useful because `gradle assemble`
is how we cut releases.
2017-06-16 11:46:34 -04:00
Alex Benusovich 5463294ec4 Fixed NPEs caused by requests without content. (#23497)
REST handlers that require a body will throw an an ElasticsearchParseException "request body required".
REST handlers that require a body OR source param will throw an ElasticsearchParseException "request body or source param required".
Replaced asserts in BulkRequest parsing code with a more descriptive IllegalArgumentException if the line contains an empty object.
Updated bulk REST test to verify an empty action line is rejected properly.
Updated BulkRequestTests with randomized testing for an empty action line.
Used try-with-resouces for XContentParser in AbstractBulkByQueryRestHandler.
2017-06-05 09:08:14 -06:00
Colin Goodheart-Smithe 779fb9a1c0 Adds nodes usage API to monitor usages of actions (#24169)
* Adds nodes usage API to monitor usages of actions

The nodes usage API has 2 main endpoints

/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.

At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.

Still to do:

* [x] Create usage service to store usage statistics
* [x] Record usage in REST layer
* [x] Add Transport Actions
* [x] Add REST Actions
* [x] Tests
* [x] Documentation

* Rafactors UsageService so counts are done by the handlers

* Fixing up docs tests

* Adds a name to all rest actions

* Addresses review comments
2017-06-02 08:46:38 +01:00
Simon Willnauer ce625ebdcc Expose `batched_reduce_size` via `_search` (#23288)
In #23253 we added an the ability to incrementally reduce search results.
This change exposes the parameter to control the batch since and therefore
the memory consumption of a large search request.
2017-02-21 18:36:59 +01:00
Simon Willnauer ecb01c15b9 Fold InternalSearchHits and friends into their interfaces (#23042)
We have a bunch of interfaces that have only a single implementation
for 6 years now. These interfaces are pretty useless from a SW development
perspective and only add unnecessary abstractions. They also require
lots of casting in many places where we expect that there is only one
concrete implementation. This change removes the interfaces, makes
all of the classes final and removes the duplicate `foo` `getFoo` accessors
in favor of `getFoo` from these classes.
2017-02-08 14:40:08 +01:00
Jason Tedor 9a0b216c36 Upgrade checkstyle to version 7.5
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.

Relates #22960
2017-02-03 09:46:44 -05:00
Jay Modi 7520a107be Optionally require a valid content type for all rest requests with content (#22691)
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.

In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.

See #19388
2017-02-02 14:07:13 -05:00
Nik Everett 6265ef1c1b Deguice rest handlers (#22575)
There are presently 7 ctor args used in any rest handlers:
* `Settings`: Every handler uses it to initialize a logger and
  some other strange things.
* `RestController`: Every handler registers itself with it.
* `ClusterSettings`: Used by `RestClusterGetSettingsAction` to
  render the default values for cluster settings.
* `IndexScopedSettings`: Used by `RestGetSettingsAction` to get
  the default values for index settings.
* `SettingsFilter`: Used by a few handlers to filter returned
  settings so we don't expose stuff like passwords.
* `IndexNameExpressionResolver`: Used by `_cat/indices` to
  filter the list of indices.
* `Supplier<DiscoveryNodes>`: Used to fill enrich the response
  by handlers that list tasks.

We probably want to reduce these arguments over time but
switching construction away from guice gives us tighter
control over the list of available arguments.

These parameters are passed to plugins using
`ActionPlugin#initRestHandlers` which is expected to build and
return that handlers immediately. This felt simpler than
returning an reference to the ctors given all the different
possible args.

Breaks java plugins by moving rest handlers off of guice.
2017-01-20 11:48:51 -05:00
javanna ded694fc83 Make StatusToXContent extend ToXContentObject and rename it to StatusToXContentObject
This also allows to make RestToXContentListener require ToXContentObject rather than ToXContent
2017-01-06 23:31:48 +01:00
Luca Cavanna 5b8bdba12e Remove subrequests method from CompositeIndicesRequest (#21873) 2016-11-30 15:03:58 +01:00
Ryan Ernst d14c470b89 Remove generics from ActionRequest
closes #21368
2016-11-14 15:32:01 -08:00
Areek Zillur 0e8b6532ec rename DocumentRequest to DocWriteRequest 2016-10-11 16:00:10 -04:00
Areek Zillur 396f80c963 Revert "rename DocumentRequest to DocumentWriteRequest"
This reverts commit b5079ce009.
2016-10-07 17:50:07 -04:00
Areek Zillur b5079ce009 rename DocumentRequest to DocumentWriteRequest 2016-10-06 05:05:59 -04:00
Areek Zillur bd4a03a426 Merge branch 'master' into cleanup/transport_bulk 2016-10-04 14:06:17 -04:00
Jason Tedor 51d53791fe Remove lenient URL parameter parsing
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).

This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.

Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.

Relates #20722
2016-10-04 12:45:29 -04:00
Areek Zillur 248ac240ed Merge branch 'master' into cleanup/transport_bulk 2016-10-03 16:12:11 -04:00
Ryan Ernst 85b8f29415 Build: Remove old maven deploy support (#20403)
* Build: Remove old maven deploy support

This change removes the old maven deploy that we have in parallel to
maven-publish, and makes maven-publish fully work with publishing to
maven local. Using `gradle publishToMavenLocal` should be used to
publish to .m2.

Note that there is an unfortunate hack that means for
zip artifacts we must first create/publish a dummy pom file, and then
follow that with the real pom file. It would be nice to have the pom
file contains packaging=zip, but maven central then requires sources and
javadocs. But our zips are really just attached artifacts, so we already
set the packaging type to pom for our zip files. This change just works
around a limitation of the underlying maven publishing library which
silently skips attached artifacts when the packaging type is set to pom.

relates #20164
closes #20375

* Remove unnecessary extra spacing
2016-09-19 15:10:41 -07:00
Jim Ferenczi 1764ec56b3 Fixed naming inconsistency for fields/stored_fields in the APIs (#20166)
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.

The following list of endpoint has been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain

The documentation and the rest API spec has been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain

The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.

Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.

Fixes #20155
2016-09-13 20:54:41 +02:00
Daniel Mitterdorfer 4460998ff8 Remove obsolete NoopSearchRequestBuilder#setNoStoredFields() 2016-08-26 09:58:53 +02:00
Daniel Mitterdorfer 7b81c4ca59 Add client-benchmark-noop-api-plugin to stress clients even more in benchmarks (#20103) 2016-08-26 09:05:47 +02:00