A request that relates to indices (`IndicesRequest` or `CompositeIndicesRequest`) might be converted to some other internal request(s) (e.g. shard level request) that get distributed over the cluster. Those requests contain the concrete index they refer to, but it is not known which indices (or aliases or expressions) the original request related to.
This commit makes sure that the original indices are available as part of the shard level requests and makes them implement `IndicesRequest` as well.
Also, every internal request should be created by passing in the original request, so that the original headers, together with any original indices and options, get copied to it. Corrected some places where this information was lost.
NOTE: For the bulk api and other multi-item apis (e.g. multi_get), the shard level requests won't keep the whole set of original indices, but only the ones related to the items sent to each shard. The important bit is that we keep the original names, not only the concrete ones.
Closes #7319
This change fixes the creation of circle shapes so that the radius is calculated correctly, instead of essentially using the diameter as the radius. The radius has to be converted into degrees by calculating the ratio of the desired radius to the circumference of the earth and then multiplying it by 360 (the number of degrees around the earth's circumference). The issue here was that it was only multiplied by 180, making the result out by a factor of 2. Also made the test for circles actually check that the circle has the correct centre and radius.
Closes #7301
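A minimal sketch of the corrected conversion. It assumes the WGS84 equatorial circumference (2 * PI * 6378137 m) as the earth's circumference; the constant used by the real geo code may differ, and `CircleRadiusSketch` is a hypothetical name. It contrasts the fixed formula (times 360) with the buggy one (times 180), which yields half the angle.

```java
/** Hypothetical helper illustrating the radius-to-degrees conversion. */
final class CircleRadiusSketch {
    // Assumption: WGS84 equatorial circumference in meters; not necessarily
    // the constant used by the real geo code.
    static final double EARTH_CIRCUMFERENCE_METERS = 2 * Math.PI * 6_378_137.0;

    /** Fixed conversion: fraction of the circumference, times 360 degrees. */
    static double radiusToDegrees(double radiusMeters) {
        return radiusMeters / EARTH_CIRCUMFERENCE_METERS * 360.0;
    }

    /** Buggy conversion described above: multiplying by 180 halves the angle. */
    static double buggyRadiusToDegrees(double radiusMeters) {
        return radiusMeters / EARTH_CIRCUMFERENCE_METERS * 180.0;
    }

    private CircleRadiusSketch() {}
}
```

A radius of a quarter of the circumference maps to 90 degrees with the fixed formula, but only to 45 degrees with the buggy one.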
Recent test failures triggered by #7289 were caused by this, simply because internal node settings (transport type key) that are not supported by the external older nodes were copied to them by mistake.
* Removed & refactored unused module code
* Allowed transports to be set programmatically
* Allowed setting the source of the changed transport
Note: The current implementation breaks BWC, as you now need to specify a concrete
transport instead of a module if you want to use a different
Transport or HttpServerTransport.
Closes #7289
TransportShardSingleOperationAction is currently subclassed by different transport actions. Some of them are internal only, meaning that they are executed only on the same node where their parent action was executed. Their main transport handler therefore doesn't need to be registered; the only transport handler needed is the shard level one.
Added an `isSubAction` method (defaults to false) to the parent class that tells whether the action is a main one or a sub-action; it is used to decide whether the main transport handler needs to be registered.
Closes #7285
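The registration decision can be sketched as follows. Only the name `isSubAction` comes from the change; the class names, the `handlersToRegister` method and the `[shard]` suffix are hypothetical and not the real transport API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class; only isSubAction() mirrors the change described above. */
abstract class ShardSingleOperationActionSketch {
    private final String actionName;

    ShardSingleOperationActionSketch(String actionName) {
        this.actionName = actionName;
    }

    /** Defaults to false: most subclasses are main, client-facing actions. */
    protected boolean isSubAction() {
        return false;
    }

    /** Names of the transport handlers this action would register. */
    List<String> handlersToRegister() {
        List<String> handlers = new ArrayList<>();
        if (!isSubAction()) {
            // Only main actions receive requests from clients or other nodes.
            handlers.add(actionName);
        }
        // The shard-level handler is always needed (hypothetical "[shard]" suffix).
        handlers.add(actionName + "[shard]");
        return handlers;
    }
}

final class MainActionSketch extends ShardSingleOperationActionSketch {
    MainActionSketch() { super("main_action"); }
}

final class SubActionSketch extends ShardSingleOperationActionSketch {
    SubActionSketch() { super("sub_action"); }

    @Override
    protected boolean isSubAction() {
        return true; // executed only on the node where the parent action runs
    }
}
```

Keeping the default in the parent class means only the internal subclasses need to override anything.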
These two aggregators basically do exactly the same thing; they just interpret
bytes differently. This refactoring found an (unreleased) bug in the long terms
aggregator, which didn't work correctly with duplicate values.
Close #7279
This was causing too much work, e.g. when pulling node stats or when
opening a new reader, because the least_used distributor would
unnecessarily check free disk space on all path.data entries every
time we tried to open a file for reading or check its length.
Closes #7306
Closes #7323
This issue has been fixed in the commons-cli:1.3 project, which unfortunately has not been released yet.
See https://issues.apache.org/jira/browse/CLI-183
This patch builds another list of options with no selected groups by default.
Once commons-cli:1.3 is released, this patch needs to be removed.
Closes #7282.
Switch management threads to a fixed thread pool with up to 5 threads and a queue size of 100 by default, after which excess incoming requests are rejected.
Closes #7318
Closes #7320
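The pool's behaviour can be approximated with the JDK's `ThreadPoolExecutor`. This is an analogy, not Elasticsearch's own executor, and `ManagementPoolSketch` is a hypothetical name. Setting core and maximum size equal gives a fixed pool, and a bounded `ArrayBlockingQueue` combined with `AbortPolicy` rejects tasks once the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical JDK analogy of a fixed pool with a bounded, rejecting queue. */
final class ManagementPoolSketch {
    static ThreadPoolExecutor create(int maxThreads, int queueSize) {
        // core == max: the pool never grows beyond maxThreads.
        // Bounded queue + AbortPolicy: once queueSize tasks are waiting,
        // execute() throws RejectedExecutionException instead of queuing more.
        return new ThreadPoolExecutor(
                maxThreads, maxThreads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
    }

    private ManagementPoolSketch() {}
}
```

`ManagementPoolSketch.create(5, 100)` would mirror the defaults quoted above.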
The score is already explained; it should not be explained again per function.
Also, remove the explanation from the parameter list of ScoreFunction#explainScore()
and leave only the score.
This also removes ExplainableSearchScript, which is not used anywhere and
was the only reason to have the Explanation in the parameter anyway.
closes #7245
The terms and histogram aggregations always have an order, so returning the
buckets as a list instead of a collection makes the response easier to consume,
e.g. when getting the first or last buckets.
Close #7275
This change means that the default settings for expand_wildcards are only applied if the expand_wildcards parameter is not specified, rather than being set upfront. It also adds the `none` and `all` options to the parameter, allowing the user to specify no expansion or expansion to all indexes (equivalent to 'open,closed').
Closes #7258
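The parsing rule can be sketched like this, with hypothetical names (`ExpandWildcardsSketch`, `parse`). The real option parsing may differ, for example in how `none` combines with other values or in its error message.

```java
import java.util.EnumSet;

/** Hypothetical parser for the expand_wildcards parameter. */
final class ExpandWildcardsSketch {
    enum Expand { OPEN, CLOSED }

    /**
     * Defaults apply only when the parameter is absent (null); an explicit
     * value fully replaces them instead of being merged with them.
     */
    static EnumSet<Expand> parse(String param, EnumSet<Expand> defaults) {
        if (param == null) {
            return EnumSet.copyOf(defaults);
        }
        EnumSet<Expand> result = EnumSet.noneOf(Expand.class);
        for (String token : param.split(",")) {
            switch (token.trim()) {
                case "open":
                    result.add(Expand.OPEN);
                    break;
                case "closed":
                    result.add(Expand.CLOSED);
                    break;
                case "all": // equivalent to "open,closed"
                    result.addAll(EnumSet.allOf(Expand.class));
                    break;
                case "none": // expand to nothing
                    break;
                default:
                    throw new IllegalArgumentException("unknown expand_wildcards value [" + token + "]");
            }
        }
        return result;
    }

    private ExpandWildcardsSketch() {}
}
```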
Added @Nullable to:
- IndicesService.indexService
- IndexService.shard
- IndexService.shardInjector
This change doesn't try to do anything smart but just makes sure that a
*MissingException is thrown instead of a NullPointerException when the requested
object doesn't exist.
Close #7251
Also replaced int,String pair with ShardId that holds the same info and serializes it the same way.
Replaced shardId and index getters in BroadcastOperationRequest with a single ShardId getter.
Closes #7255
The geohash grid is 8 cells wide and 4 cells tall. GeoHashUtils.neighbor(String,int,int,int) set the limit on the number of cells in y to < 3 rather than <= 3, resulting in it either not finding all neighbours or incorrectly searching for a neighbour in a different parent cell.
Closes #7226
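A simplified, hypothetical bounds check (not the actual `GeoHashUtils.neighbor` code) shows why the upper limit must be inclusive: with 8 x 4 cells inside a parent, valid row indices are 0 to 3, so a strict `< 3` wrongly treats the last row as belonging to another parent cell. Geohash levels alternate between this layout and its transpose; the sketch only covers the 8 x 4 case.

```java
/** Hypothetical check for the 8 x 4 layout of cells within a parent geohash cell. */
final class GeoHashGridSketch {
    static final int WIDTH = 8;  // cells along x
    static final int HEIGHT = 4; // cells along y

    /** True if the cell at (x + dx, y + dy) is still inside the same parent cell. */
    static boolean inSameParent(int x, int y, int dx, int dy) {
        int nx = x + dx;
        int ny = y + dy;
        // Inclusive upper bounds: ny <= HEIGHT - 1 (i.e. <= 3), not ny < 3.
        return nx >= 0 && nx <= WIDTH - 1 && ny >= 0 && ny <= HEIGHT - 1;
    }

    private GeoHashGridSketch() {}
}
```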
This change stores the index creation time in the index metadata when an index is created. The creation time cannot be changed but can be set as part of the create index request to allow for correct creation times for historical data.
Closes #7119
In order to allow debugging on the command line, the user
can now set the es.cli.debug system property,
which results in stack traces being written to the terminal.
Closes #7222
A range filter on a date field with a numeric `from`/`to` value is **not** cached by default:
DELETE /test
PUT /test/t/1
{
"date": "2014-01-01"
}
GET /_validate/query?explain
{
"query": {
"filtered": {
"filter": {
"range": {
"date": {
"from": 0
}
}
}
}
}
}
Returns:
"explanation": "ConstantScore(no_cache(date:[0 TO *]))"
This patch also makes sure that `from`/`to` are not cached when an unrounded `now` value is used.
Previously, a query like:
GET /_validate/query?explain
{
"query": {
"filtered": {
"filter": {
"range": {
"date": {
"from": "now",
"to": "now/d+1"
}
}
}
}
}
}
was cached.
Also, with this patch `now` is no longer cached even if the user asks for caching,
since by definition it should not be cached at all.
Also added tests for all possible combinations.
Closes #7114.
When a bin-only plugin is installed, it is identified as a site plugin.
A current workaround is to add another, empty directory to the zip file. So if you have:
* `bin/myfile.sh`
* `empty/empty.txt`
the `bin` content will be extracted as expected.
Closes #7152.
indexRandom could try to delete bogus documents multiple times, since they are tracked by indexOrAlias/id; after the actual deletion, any further attempt throws an error and fails the test.
An anti-pattern in our code, noticeable to Java API users, is that we modify incoming requests by replacing the index or alias with the concrete index. This way, not only does the request change, but all subsequent communication that uses that request loses the information on whether the original request was performed against an alias or an index.
Refactored the following base classes: `TransportShardReplicationOperationAction`, `TransportShardSingleOperationAction`, `TransportSingleCustomOperationAction`, `TransportInstanceSingleOperationAction` and all subclasses by introducing an InternalRequest object that contains the original request plus additional info (e.g. the concrete index). This internal request doesn't get sent over the transport but is rebuilt on each node on demand (no different from what currently happens anyway, as the concrete index gets set on each node). When the request becomes a shard level request, instead of using only the int shardId, we serialize the ShardId, which contains both the concrete index name (which might differ from the original one within the request) and the shard id.
Using this pattern, we can change get, multi_get, explain, analyze, term_vector, multi_term_vector, index, delete, update and bulk so that they no longer replace the index name with the concrete one within the request. The index name within the original request stays the same.
Also made it clearer within the different transport actions when the index needs to be resolved and when that's not needed (e.g. shard level requests), by exposing a `resolveIndex` method. Moved the check block methods to the parent classes, as their content was the same in every subclass.
Improved existing tests by randomly introducing the use of an alias, and verifying that the responses always contain the concrete index name and not the original one, as that's the expected behaviour.
Added backwards compatibility tests to make sure that the change is applied in a backwards compatible manner.
Closes #7223
If a geo_shape had edges that either ran vertically along the dateline or touched the dateline without crossing it, it would fail to parse. This is because the code that splits a polygon along the dateline did not take into account the case where the polygon touched but did not cross the dateline. This PR fixes those issues and provides tests for them.
Close #7016
The `.percolator` type is a hidden type, and therefore the types from the delete mapping request should be passed down to the delete by query request; otherwise the percolator type gets ignored and the percolator queries don't get deleted from disk (they are only unregistered).
Closes #7087
internal cluster communication.
See CorruptedCompressorTests for details on how this bug can be hit.
This change also removes the ability to use the unsafe variant of
ChunkedEncoder, removing support for the compress.lzf.decoder setting.
If a dynamic mapping for a geo_point field is defined and the first document specifies the value of the field as a geo_point array, the dynamic mapping throws an error, because the array is broken into individual numbers before the dynamic mapping configuration is consulted. This change adds a check of the dynamic mapping before the array is split into individual numbers.
Closes #6939
This test assumed that the `num` field was mapped as an integer on all shards
and thus that all of them should fail when providing a timezone. However, since
it used dynamic mappings, some shards might not have this field mapped and, as a
consequence, didn't fail.
TransportSingleCustomOperationAction is subclassed by two similar, yet different transport actions: TransportAnalyzeAction and TransportGetFieldMappingsAction. Made their differences and similarities more explicit by sharing common code and moving specific code to subclasses:
- moved the index field to the parent SingleCustomOperationAction class
- moved the common check blocks code to the parent transport action class
- moved the main transport handler to the TransportAnalyzeAction subclass, as it is only used to receive external requests through clients. In the case of TransportGetFieldMappingsIndexAction, instead, the action is internal and executed only locally as part of the user-facing TransportGetFieldMappingsAction. The corresponding request does get sent over the transport, though, as part of the related shard request.
- removed the get field mappings index action from the action names mapping, as it is not a transport handler anymore. It was one before, although it was never used.
Closes #7214
TransportIndexReplicationAction is always executed locally, as an internal action that is part of either delete by query or delete (when routing is required but not specified). Only the corresponding shard level requests get sent over the transport, hence no transport endpoint is needed for the index-level version, and the index request itself is not supposed to be sent over the transport either.
Moved classes from org.elasticsearch.action.delete.index to org.elasticsearch.action.delete and adjusted visibility so that internal requests are not public anymore.
Also removed serialization code from IndexDeleteResponse, as it never gets sent over the transport either.
Closes #7211
Index, type and id were returned as part of the REST explain api response, but not through the Java API. That info was read out of the request, relying on the fact that the index would get overridden with the concrete one within that same request.
Closes #7201
Until now, IP addresses were only checked for four dot-separated parts, which
allowed invalid values like 127.0.0.111111.
This adds an additional check for validation.
Closes #7131
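The message doesn't detail the added check. A per-octet validation along these lines, in a hypothetical `Ipv4Sketch` class, rejects values such as `127.0.0.111111` by requiring exactly four parts of one to three digits, each at most 255.

```java
/** Hypothetical dotted-quad IPv4 validator. */
final class Ipv4Sketch {
    /** Returns true if s has exactly four decimal octets, each in 0..255. */
    static boolean isValidIpv4(String s) {
        String[] parts = s.split("\\.", -1); // -1 keeps trailing empty parts, e.g. "1.2.3.4."
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            // 1 to 3 digits: rejects empty parts and overlong values like "111111".
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    private Ipv4Sketch() {}
}
```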
- The context enables setting arbitrary transient data on the message (this data is not serialized with the request)
- Changed header accessors/mutators so that header manipulation is done directly on the request (to avoid NPEs with transport message headers when dealing with maps that can potentially be null)
It also didn't follow the setter convention that we adopted for request builders.
Also fixed javadoc warnings caused by missing descriptions for tags.
Closes #7186
The second internal node, when present, wasn't able to join the existing cluster due to misconfigured unicast hosts, so it would form its own cluster.
We need this in order for refresh to be effective and actually refresh in the second round of indexing; otherwise, a shard with 0 docs gets cached and a refresh won't expire anything there.
A request level flag, unset by default, to control the query cache. When not set, the index level setting applies; when explicitly set, the flag overrides the index level setting.
closes #7167
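The precedence rule can be expressed as a tiny sketch; `QueryCacheFlagSketch` and `useQueryCache` are hypothetical names, with a nullable `Boolean` standing in for "not set".

```java
/** Hypothetical resolution of the request-level query cache flag. */
final class QueryCacheFlagSketch {
    /**
     * @param requestFlag  null when the request does not set the flag
     * @param indexSetting the index level setting
     */
    static boolean useQueryCache(Boolean requestFlag, boolean indexSetting) {
        // An explicit request value overrides the index setting; otherwise inherit it.
        return requestFlag != null ? requestFlag : indexSetting;
    }

    private QueryCacheFlagSketch() {}
}
```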
In the case of inserts, the UpdateHelper class will now allow the script used to apply updates to run on the upsert doc provided by clients. This allows the internal state of the data item to be managed by the script, without relying on clients to initialise the data structures managed by the script.
Closes #7143
The query cache allows caching the (binary serialized) response of the shard level query phase execution, using the actual request as the key. The cache is fully coherent with the semantics of NRT: a refresh (that actually ended up refreshing) causes previously cached entries on the relevant shard to be invalidated and eventually evicted.
This change enables query caching as an opt-in index level setting, called `index.cache.query.enable`, which defaults to `false`. The setting can be changed dynamically on an index. The cache is only enabled for search requests with search_type count.
The indices query cache is a node level query cache. The `indices.cache.query.size` setting controls the size (in bytes) the cache will take, and defaults to `1%` of the heap. Note that this cache is already very effective with small values in it. There is also the advanced `indices.cache.query.expire` setting, which controls after how long without access cache entries are evicted.
Note that the cache takes the search "body" as is (bytes) and uses it as the key. This means that the same JSON with a different key order constitutes a different cache entry.
This change includes basic stats (shard level, index/indices level, and node level) for the query cache, showing how much is used and eviction rates.
While this is a good first step, and the goal is to get it in, there are a few things that would be great additions to this work, but they can be done as additional pull requests:
- More stats, specifically cache hit and cache miss, per shard.
- Request level flag, defaults to "not set" (inheriting what the setting is).
- Allowing to change the cache size using the cluster update settings API
- Consider enabling the cache for the query phase also when hits are requested; note that this would only include the "top docs", not the actual hits.
- See if there is a performant manner to solve the "out of order" of keys in the JSON case.
- Maybe introduce a filter element, that is outside of the request, that is checked, and if it matches all docs in a shard, will not be used as part of the key. This will help with time based indices and moving windows for shards that fall "inside" the window to be more effective caching wise.
- Add a more infra level support in search context that allows for any element to mark the search as non deterministic (on top of the support for "now"), and use it to not cache search responses.
closes #7161
This adds support to return the "Access-Control-Allow-Credentials" header
if needed, so CORS will work flawlessly with authenticated applications.
Closes #6380
It's now possible to define additional custom settings for transport clients by overriding the `transportClientSettings` callback method in `ElasticsearchIntegrationTest`.
Filters and queries now support a `time_zone` parameter, which defines the time zone that should be applied to the query or filter to convert it to a UTC-based time value.
When applied to `date` fields, the `range` filter and query also accept a `time_zone` parameter.
The `time_zone` parameter will be applied to your input lower and upper bounds and will move them to a UTC-based date:
[source,js]
--------------------------------------------------
{
"constant_score": {
"filter": {
"range" : {
"born" : {
"gte": "2012-01-01",
"lte": "now",
"time_zone": "+1:00"
}
}
}
}
}
{
"range" : {
"born" : {
"gte": "2012-01-01",
"lte": "now",
"time_zone": "+1:00"
}
}
}
--------------------------------------------------
In the above examples, `gte` will actually be moved to the UTC date `2011-12-31T23:00:00`.
NOTE: if you give a date with an explicitly defined timezone and also use the `time_zone` parameter, `time_zone` will be
ignored. For example, setting `from` to `2012-01-01T00:00:00+01:00` with `"time_zone":"+10:00"` will still use the `+01:00` time zone.
Closes #3729.
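The bound conversion can be reproduced with `java.time` as a sketch under assumptions: `TimeZoneBoundSketch` is hypothetical, only plain dates and full ISO-8601 date-times with an offset are handled (no `now` or date math), and the offset is passed as a `ZoneOffset` rather than parsed from the `+1:00` string.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;

/** Hypothetical helper: converts a range bound to UTC as described above. */
final class TimeZoneBoundSketch {
    static Instant boundToUtc(String value, ZoneOffset timeZone) {
        try {
            // A value with an explicit offset keeps it; time_zone is ignored.
            return OffsetDateTime.parse(value).toInstant();
        } catch (DateTimeParseException noOffset) {
            // A plain date is interpreted as midnight in time_zone.
            return LocalDate.parse(value).atStartOfDay(timeZone).toInstant();
        }
    }

    private TimeZoneBoundSketch() {}
}
```

With `ZoneOffset.ofHours(1)`, the bound `2012-01-01` becomes `2011-12-31T23:00:00Z`, matching the example above.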
Our transport relies on action names that tell what we need to do with each message received and sent on any node, together with the content of the request itself.
The action names could use better categorization and more consistent naming, though. The following categories are introduced with this commit:
- indices: for all the apis that execute against indices
  - admin: for the apis that allow performing administration tasks against indices
  - data: for the apis that are about data
    - read: apis that read data
    - write: apis that write data
    - benchmark: apis that run benchmarks
- cluster: for all the cluster apis
  - admin: for the cluster apis that allow performing administration tasks
  - monitor: for the cluster apis that allow monitoring the system
- internal: for all the internal actions that are used from node to node but not directly exposed to users
The change is applied in a backwards compatible manner: we keep the mapping old-to-new action name around, and when receiving a message, depending on the version of the node we receive it from, we use the received action name or we convert it to the previous version (old to new if version < 1.4). When sending a message, depending on the version of the node we talk to, we use the updated action or we convert it to the previous version (new to old if version < 1.4).
For the cases where we don't know the version of the node we talk to, namely unicast ping, transport client nodes info and transport client sniff mode (which calls cluster state), we just use a lower bound for the version, thus we will always use the old action name, which can be understood by both old nodes and new nodes.
Added a test that enforces the known updated categories for transport action names, and a test that verifies all action names have a pre-1.4 version for backwards compatibility.
Added backwards compatibility tests for unicast and for the transport client in sniff mode; the one for the ordinary transport client (which calls nodes info) is implicit, as it's used all the time in our bw comp tests.
Also added a backwards compatibility test that sends an empty message to each of the registered transport handlers exposed by older nodes and verifies that what comes back is not ActionNotFoundTransportException, which would mean that there is a problem in the action mappings.
Added the TestCluster#getClusterName abstract method and allowed retrieving externalTransportAddress and internalCluster from CompositeTestCluster.
Closes #7105
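The version-dependent translation can be sketched with two maps. Everything here is hypothetical: the class name, the `major * 100 + minor` version encoding and the placeholder action names used in the usage example are for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of version-dependent action name translation. */
final class ActionNameTranslatorSketch {
    // Hypothetical version encoding: major * 100 + minor, so 1.4 -> 104.
    static final int V_1_4 = 104;
    // Lower bound used when the remote version is unknown (e.g. unicast ping).
    static final int UNKNOWN_VERSION_LOWER_BOUND = 100;

    private final Map<String, String> newToOld = new HashMap<>();
    private final Map<String, String> oldToNew = new HashMap<>();

    void register(String oldName, String newName) {
        newToOld.put(newName, oldName);
        oldToNew.put(oldName, newName);
    }

    /** Name to send to a node of the given version: new to old if version < 1.4. */
    String outgoing(String action, int remoteVersion) {
        return remoteVersion < V_1_4 ? newToOld.getOrDefault(action, action) : action;
    }

    /** Name to dispatch for a message from a node of the given version: old to new if version < 1.4. */
    String incoming(String action, int remoteVersion) {
        return remoteVersion < V_1_4 ? oldToNew.getOrDefault(action, action) : action;
    }
}
```

Sending with `UNKNOWN_VERSION_LOWER_BOUND` always yields the old name, which both old and new nodes understand.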
If simultaneous create & delete operations arrive against the same id,
it's possible that primary and replica see those operations in
different orders, which may result in replica throwing
DocumentAlreadyExistsException when the primary didn't which would
lead to replica being inconsistent (missing a document that primary
had indexed).
This change fixes the issue by never throwing DAEE from the replica on
create.
Closes #7146
Closes #7142
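The fix can be illustrated with a toy shard that ignores versioning entirely. `CreateOnShardSketch` is hypothetical, and `IllegalStateException` stands in for `DocumentAlreadyExistsException`: only the primary enforces "already exists", while the replica applies a create it receives even if its local state differs because of reordering.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy shard; IllegalStateException stands in for DocumentAlreadyExistsException. */
final class CreateOnShardSketch {
    final Map<String, String> docs = new HashMap<>();
    private final boolean replica;

    CreateOnShardSketch(boolean replica) {
        this.replica = replica;
    }

    void create(String id, String source) {
        if (docs.containsKey(id) && !replica) {
            // Only the primary enforces create semantics.
            throw new IllegalStateException("document already exists [" + id + "]");
        }
        // On the replica the primary has already accepted this create, so it is
        // applied even if operations arrived here in a different order.
        docs.put(id, source);
    }
}
```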
Made sure that the routing required check is performed against the concrete index, and added the use of aliases to existing routing tests.
Also took the chance to unify the failure message to this form: routing is required for [" + index + "]/[" + type + "]/[" + id + "]
Closes #7145
Previously, the index reader used by the percolator didn't allow registering a CoreCloseListener; now it does, which makes it safe to cache index field data cache entries.
Creating field data structures is relatively expensive, and caching them can save a lot of work if many queries are evaluated in a percolator call.
Closes #6806
Closes #7081
Fields of type `token_count`, `murmur3`, `_all` and `_field_names` are generated only when indexing.
If a GET request accesses the transaction log (because there was no refresh
between indexing and the GET request), then these fields cannot be retrieved at all.
Previously, the behavior was as follows:
`_all, _field_names`: The field was silently ignored
`murmur3, token_count`: `NumberFormatException`, because GET tried to parse the values from the source.
In addition, if these fields were not stored, the same behavior occurred if the fields were
retrieved with GET after a `refresh()`, because here too the source was used to get the fields.
Now, GET accepts a parameter `ignore_errors_on_generated_fields` which has
the following effect:
- Throw exception with meaningful error message explaining the problem if set to false (default)
- Ignore the field if set to true
- Always ignore the field if it was not set to stored
This changes the behavior for `_all` and `_field_names` as now an Exception is thrown if a user
tries to GET them before a `refresh()`.
closes #6676
closes #6973
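The three rules above form a small decision table. This sketch uses hypothetical names (`GeneratedFieldGetSketch`, `Outcome`) and only models the case where a generated field cannot be retrieved from the index.

```java
/** Hypothetical decision table for GET on index-time generated fields. */
final class GeneratedFieldGetSketch {
    enum Outcome { IGNORE_FIELD, THROW_ERROR }

    /**
     * Applies when a generated field (token_count, murmur3, _all, _field_names)
     * would have to be read from the translog or _source, where it is unavailable.
     */
    static Outcome onUnavailableGeneratedField(boolean stored, boolean ignoreErrorsOnGeneratedFields) {
        if (!stored) {
            return Outcome.IGNORE_FIELD; // not stored: always ignored
        }
        // Stored: fail with a meaningful error unless the flag is set (default false).
        return ignoreErrorsOnGeneratedFields ? Outcome.IGNORE_FIELD : Outcome.THROW_ERROR;
    }

    private GeneratedFieldGetSketch() {}
}
```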
Allow setting network.tcp.no_delay and network.tcp.keep_alive to the value `default`, so that they won't be set at all, since on Solaris setting tcpNoDelay can actually cause failures.
relates to #7115
CliTool is a base class for command-line interface tools (such as the plugin manager and potentially others). It supports the following:
- single or multi command tool
- help printing infrastructure (based on help files)
- consistent mechanism of parsing arguments (based on commons-cli lib)
- separation of argument parsing and command execution (for easier unit testing)
- terminal abstraction (will use System.console() when available)
A multi-bucket aggregation where multiple filters can be defined (each filter defines a bucket). The buckets will collect all the documents that match their associated filter.
This aggregation can be very useful when one wants to compare analytics between different criteria. It can also be accomplished using multiple definitions of the single filter aggregation, but here the user only needs to define the sub-aggregations once.
Closes #6118
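The bucketing semantics can be sketched over an in-memory list, with each named predicate standing in for a filter. `FiltersAggregationSketch` is hypothetical and only computes doc counts (no sub-aggregations). Note that a document can fall into several buckets.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory model of a multi-filter bucket aggregation. */
final class FiltersAggregationSketch {
    /** One bucket per named filter; each bucket counts the documents matching its filter. */
    static <D> Map<String, Long> docCounts(List<D> docs, Map<String, Predicate<D>> filters) {
        Map<String, Long> counts = new LinkedHashMap<>(); // keeps the filters' order
        for (Map.Entry<String, Predicate<D>> filter : filters.entrySet()) {
            long count = docs.stream().filter(filter.getValue()).count();
            counts.put(filter.getKey(), count);
        }
        return counts;
    }

    private FiltersAggregationSketch() {}
}
```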
Now that we have explicit support for aliases when creating indices and as part of index templates, we may remove support for aliases (only names) as part of index settings. This is partially breaking as the following calls:
curl -XPUT localhost:9200/index -d '{
"settings" : {
"aliases" : [ "alias1"]
}
}
and
curl -XPUT localhost:9200/index -d '{
"settings" : {
"index.aliases" : [ "alias1"]
}
}
were previously supported and will need to be replaced with
curl -XPUT localhost:9200/index -d '{
"aliases" : {
"alias1": {}
}
}
Closes #5545
The histogram reduce method can run into an infinite loop if the
Rounding.nextRoundingValue value is buggy, which happened to be the case for
DayTimeZoneRoundingFloor.
DayTimeZoneRoundingFloor is fixed, and the histogram reduce method has been
changed to fail instead of running into an infinite loop in case of a buggy
nextRoundingValue impl.
Close #6965
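The message doesn't show how the reduce method detects the problem. One way to fail fast, sketched with a hypothetical `HistogramKeysSketch` and a `LongUnaryOperator` standing in for `nextRoundingValue`, is to throw as soon as the next key does not advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

/** Hypothetical enumeration of histogram bucket keys with a progress guard. */
final class HistogramKeysSketch {
    /** Keys from min to max (inclusive), stepping with nextRoundingValue. */
    static List<Long> keys(long min, long max, LongUnaryOperator nextRoundingValue) {
        List<Long> keys = new ArrayList<>();
        long key = min;
        while (key <= max) {
            keys.add(key);
            long next = nextRoundingValue.applyAsLong(key);
            if (next <= key) {
                // A buggy rounding that does not advance would otherwise loop forever.
                throw new IllegalStateException(
                        "nextRoundingValue(" + key + ") returned " + next + ", which does not advance");
            }
            key = next;
        }
        return keys;
    }

    private HistogramKeysSketch() {}
}
```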
Today, `copy_to` always copies a field to the current document, which is often
wrong in the case of nested documents. For example, if you have a nested field
called `n` which has a sub-field `n.source` whose content should be copied to
`target`, then the latter field should be created in the root document instead
of the nested one, since it doesn't have `n.` as a prefix. On the contrary, if
you configure the destination field to be `n.target`, then it should go to the
nested document.
Close #6701
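The routing rule can be sketched as a prefix check. The target goes to the deepest nested object whose path, followed by a dot, prefixes the target field name, and to the root document otherwise. Requiring the trailing dot keeps a field like `nx.target` out of a nested object `n`. `CopyToTargetSketch` and the `"<root>"` label are hypothetical.

```java
import java.util.List;

/** Hypothetical resolution of which document a copy_to target belongs to. */
final class CopyToTargetSketch {
    static final String ROOT = "<root>"; // placeholder label, not an Elasticsearch identifier

    static String targetDocument(String targetField, List<String> nestedPaths) {
        String best = null;
        for (String path : nestedPaths) {
            // The field must sit under the nested path, i.e. start with "path.".
            boolean under = targetField.startsWith(path + ".");
            if (under && (best == null || path.length() > best.length())) {
                best = path; // prefer the deepest matching nested object
            }
        }
        return best == null ? ROOT : best;
    }

    private CopyToTargetSketch() {}
}
```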
Implements a new Exists API allowing users to quickly check whether any documents match a given query.
This API should be faster than using the Count API, as it will:
- terminate the search execution early once any matching document is found
- return the response as soon as the first shard reports matched documents
closes #6995
Indexing fails when `_timestamp` is enabled and the `path` option is set.
It fails with a `TimestampParsingException[failed to parse timestamp [null]]` message.
Reproduction:
```
DELETE test
PUT test
{
"mappings": {
"test": {
"_timestamp" : {
"enabled" : "yes",
"path" : "post_date"
}
}
}
}
PUT test/test/1
{
"foo": "bar"
}
```
You can define a default value for when timestamp is not provided
within the index request or in the `_source` document.
By default, the default value is `now`, which means the date the document was processed by the indexing chain.
You can disable that default value by setting `default` to `null`, which makes `timestamp` mandatory:
```
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"default" : null
}
}
}
```
If you don't provide any timestamp value, indexing will fail.
You can also set the default value to any date respecting the timestamp format:
```
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"format" : "YYYY-MM-dd",
"default" : "1970-01-01"
}
}
}
```
If you don't provide any timestamp value, `1970-01-01` will be used.
Closes #4718.
Closes #7036.
When there is a cluster block (like no master discovered yet), the bulk action doesn't properly catch the exception thrown by the inner execute call to notify the listener, causing the bulk operation to hang.
closes #7086
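The shape of the fix is a try/catch that routes the exception to the listener. This is a sketch with hypothetical names: a `Runnable` stands in for the inner execute call and a `Consumer<Exception>` for the listener's failure callback, rather than the real `ActionListener` API.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: make sure a failure in the inner execution reaches the listener. */
final class BulkExecuteSketch {
    static void execute(Runnable innerExecute, Consumer<Exception> onFailure) {
        try {
            innerExecute.run();
        } catch (Exception e) {
            // Without this catch, an exception such as a cluster block error would
            // escape here, the listener would never be notified, and the bulk
            // request would hang.
            onFailure.accept(e);
        }
    }

    private BulkExecuteSketch() {}
}
```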