This commit modifies the initial value of the transport client
round-robin index to a random value so that initial requests are more
likely to not all hit the same node.
Relates #14143
Due to some optimization on the netty layer we had quite some code / cruft
added to the TcpTransport to allow for those optimizations. After cleaning
up BytesReference we can now move this optimization into TcpTransport and
have a simple send method on the implementation layer instead. This commit
adds a CompositeBytesReference that also allows message headers to be written
separately which simplify the header code as well since no skips are needed
anymore.
The top-level class Throwable represents all errors and exceptions in
Java. This hierarchy is divided into Error and Exception, the former
being serious problems that applications should not try to catch and the
latter representing exceptional conditions that an application might
want to catch and handle. This commit renames
org.elasticsearch.cli.UserError to org.elasticsearch.UserException to
make its name consistent with where it falls in this hierarchy.
Relates #19254
Today when reading a malformed operation from the translog, we throw an
assertion error that is immediately caught and wrapped into a translog
corrupted exception. This commit replaces this by electing to directly
throw a translog corrupted exception instead.
Additionally, this cleanup also addressed a double-wrapped translog
corrupted exception. Namely, verifying the checksum can throw a translog
corrupted exception which the existing code would catch and wrap again
in a translog corrupted exception.
Relates #19256
We've been slowly improving batch support in `ClusterService` so service won't need to implement this tricky logic themselves. These good changes are blessed but our logging infra didn't catch up and we now log things like:
```
[2016-07-04 21:51:22,318][DEBUG][cluster.service ] [node_sm0] processing [put-mapping [type1],put-mapping [type1]]:
```
Depending on the `source` string this can get quite ugly (mostly in the ZenDiscovery area).
This PR adds some infra to improve logging, keeping the non-batched task the same. As result the above line looks like:
```
[2016-07-04 21:44:45,047][DEBUG][cluster.service ] [node_s0] processing [put-mapping[type0, type0, type0]]: execute
```
ZenDiscovery waiting on join moved from:
```
[2016-07-04 17:09:45,111][DEBUG][cluster.service ] [node_t0] processing [elected_as_master, [1] nodes joined),elected_as_master, [1] nodes joined)]: execute
```
To
```
[2016-07-04 22:03:30,142][DEBUG][cluster.service ] [node_t3] processing [elected_as_master ([3] nodes joined)[{node_t2}{R3hu3uoSQee0B6bkuw8pjw}{p9n28HDJQdiDMdh3tjxA5g}{127.0.0.1}{127.0.0.1:30107}, {node_t1}{ynYQfk7uR8qR5wKIysFlQg}{wa_OKuJHSl-Oyl9Gis-GXg}{127.0.0.1}{127.0.0.1:30106}, {node_t0}{pweq-2T4TlKPrEVAVW6bJw}{NPBSLXSTTguT1So0JsZY8g}{127.0.0.1}{127.0.0.1:30105}]]: execute
```
As a bonus, I removed all `zen-disco` prefixes to sources from that area.
Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API results in creating new fields due to node ID being used as keys.
The first approach I considered was to use the node's published address as the base for the id. We already [treat nodes with the same address as the same](https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java#L387) so this is a simple change (see [here](https://github.com/elastic/elasticsearch/compare/master...bleskes:node_persistent_id_based_on_address)). While this is simple and it works for probably most cases, it is not perfect. For example, if after a node restart, the node is not able to bind to the same port (because it's not yet freed by the OS), it will cause the node to still change identity. Also in environments where the host IP can change due to a host restart, identity will not be the same.
Due to those limitation, I opted to go with a different approach where the node id will be persisted in the node's data folder. This has the upside of connecting the id to the nodes data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM). I
It does however also have downsides - we now run the risk of two nodes having the same id, if someone copies clones a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join.
Last, and most importantly, this change requires that *all* nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking.
Other less important notes:
- DummyTransportAddress is removed as we need a unique network address per node. Use `LocalTransportAddress.buildUnique()` instead.
- I renamed `node.add_lid_to_custom_path` to `node.add_lock_id_to_custom_path` to avoid confusion with the node ID which is now part of the `NodeEnvironment` logic.
- I removed the `version` paramater from `MetaDataStateFormat#write` , it wasn't really used and was just in the way :)
- TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.
Once all of these are migrated we'll be able to remove aggregation's
custom "streams" which function that same as NamedWriteable. It also
allows us to make most of the fields on aggregations final which is
rather nice.
Also starts to migrate MultiBucketAggregation.Bucket to Writeable,
allowing the buckets to have immutable parts.
Once all of these are migrated we'll be able to remove aggregation's
custom "streams" which function that same as NamedWriteable. It also
allows us to make most of the fields on aggregations final which is
rather nice.
Today throughout the codebase, catch throwable is used with reckless
abandon. This is dangerous because the throwable could be a fatal
virtual machine error resulting from an internal error in the JVM, or an
out of memory error or a stack overflow error that leaves the virtual
machine in an unstable and unpredictable state. This commit removes
catch throwable from the codebase and removes the temptation to use it
by modifying listener APIs to receive instances of Exception instead of
the top-level Throwable.
Relates #19231
Rename `fields` to `stored_fields` and add `docvalue_fields`
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
With this commit we also propagate the `canTripCircuitBreaker`
setting for the main action in TransportBroadcastByNodeAction.
Previously, we set it only on the additional action added by
this handler.
This change removes a handful of classes and methods that were simply
unused. Some of the classes were intermediate abstract classes that
added nothing to the base class they extended.
Primary relocation and indexing concurrently can currently lead to a deadlock situation as indexing operations are blocked on a (bounded) thread pool during the hand-off phase between old and new primary. This change replaces blocking of indexing operations by putting operations that cannot be executed during relocation hand-off in a queue to be executed once relocation completes.
Closes#18553.
The only reason for LifecycleComponent taking a generic type was so that
it could return that type on its start and stop methods. However, this
chaining has no practical necessity. Instead, start and stop can be
void, and a whole bunch of confusing generics disappear.
Before, a repository would maintain an index file (named 'index') per
repository, that contained the current snapshots in the repository.
This file was not atomically written, so repositories had to depend on
listing the blobs in the repository to determine what the current
snapshots are, and only rely on the index file if the repository does
not support the listBlobs operation. This could cause an incorrect view
of the current snapshots in the repository if any prior snapshot delete
operations failed to delete snapshot metadata files.
This commit introduces the atomic writing of the index file, and because
atomic writes are not guaranteed if the file already exists, we write to
a generational index file (index-N, where N is the current generation).
We also maintain an index-latest file that contains the current
generation, for those repositories that cannot list blobs.
Closes#19002
Relates #18156
These are the first aggregations with multiple `InternalAggregation`s
backing the same `AggregationBuilder`. This required a change in the
register method's signature.
NodeService has an "service attributes" map, which is only
set by HttpServer on start/stop. But the only thing it puts in this map
is already available as part of the HttpServer info which is added to
node info requests. This change removes the attributes map and removes
the dependency in HttpServer on NodeService.
BytesReference should be a really simple interface, yet it has a gazillion
ways to achieve the same this. Methods like `#hasArray`, `#toBytesArray`, `#copyBytesArray`
`#toBytesRef` `#bytes` are all really duplicates. This change simplifies the interface
dramatically and makes implementations of it much simpler. All array access has been removed
and is streamlined through a single `#toBytesRef` method. Utility methods to materialize a
compact byte array has been added too for convenience.
Migrates the `stats` and `extended_stats` aggregations and pipeline
aggregations from the special purpose aggregations streams to
`NamedWriteable`. These are the first pipeline aggregations so this
adds the infrastructure to support both streams and `NamedWriteable`s
for pipeline aggregations.
Raise IOException on deleteBlob if the blob doesn't exist
This commit raises an IOException on BlobContainer#deleteBlob
if the blob does not exist, in conformance with the BlobContainer
interface contract. Each implementation of BlobContainer now
conforms to this contract (file system, S3, Azure, HDFS). This
commit also contains blob container tests for each of the
repository implementations.
Closes#18530
We'll migrate to NamedWriteable so we can share code with the rest
of the system. So we can work on this in multiple pull requests without
breaking Elasticsearch in between the commits this change supports
*both* old style `InternalAggregations.stream` serialization and
`NamedWriteable` style serialization. As such it creates about a
half dozen `// NORELEASE` comments that will have to be removed
once the migration is complete.
This also introduces a boolean `transportClient` flag to `SearchModule`
which is used to skip inappropriate registrations for for the
transport client while still registering the things it needs. In
this case that means that the `InternalAggregation` subclasses are
registered with the `NamedWriteableRegistry` but the `AggregationBuilder`
subclasses are not.
Finally, this moves aggregation registration from guice configuration
time to `SearchModule` construction time. This will make it simpler to
work with in the future as we further clean up Elasticsearch's
extension points.
We have long worked to capture different partitioning scenarios in our testing infra. This PR adds a new variant, inspired by the Jepsen blogs, which was forgotten far - namely a partition where one node can still see and be seen by all other nodes. It also updates the resiliency page to better reflect all the work that was done in this area.
Currently there are cases when using TimeIntervalRounding#round() and date1 <
date2 that round(date2) < round(date1). These errors can happen when using a
non-fixed time zone and the values to be rounded are slightly after a time zone
offset change (e.g. DST transition).
Here is an example for the "CET" time zone with a 45 minute rounding interval.
The dates to be rounded are on the left (with utc time stamp), the rounded
values on the right. The error case is marked:
2011-10-30T01:40:00.000+02:00 1319931600000 | 2011-10-30T01:30:00.000+02:00 1319931000000
2011-10-30T02:02:30.000+02:00 1319932950000 | 2011-10-30T01:30:00.000+02:00 1319931000000
2011-10-30T02:25:00.000+02:00 1319934300000 | 2011-10-30T02:15:00.000+02:00 1319933700000
2011-10-30T02:47:30.000+02:00 1319935650000 | 2011-10-30T02:15:00.000+02:00 1319933700000
2011-10-30T02:10:00.000+01:00 1319937000000 | 2011-10-30T01:30:00.000+02:00 1319931000000 *
2011-10-30T02:32:30.000+01:00 1319938350000 | 2011-10-30T02:15:00.000+01:00 1319937300000
2011-10-30T02:55:00.000+01:00 1319939700000 | 2011-10-30T02:15:00.000+01:00 1319937300000
2011-10-30T03:17:30.000+01:00 1319941050000 | 2011-10-30T03:00:00.000+01:00 1319940000000
We should correct this by detecting that we are crossing a transition when
rounding, and in that case pick the largest valid rounded value before the
transition.
This change adds this correction logic to the rounding function and adds this
invariant to the randomized TimeIntervalRounding tests. Also adding the example
test case from above (with corrected behaviour) for illustrative purposes.
Today we have a ton of logic inside the NettyTransport* codebase. The footprint
of the code that has a direct netty dependency is large and alternative implementations
are pretty hard today since they need to know all about our proticol etc.
This change moves most of the code into TCPTransport* baseclasses and moves all
the protocol send code together. The base classes now contain the majority of the logic
while NettyTransport* classes remain to implement the glue code, configuration and optimization.
The factory for ingest processor is generic, but that is only for the
return type of the create mehtod. However, the actual consumer of the
factories only cares about Processor, so generics are not needed.
This change removes the generic type from the factory. It also removes
AbstractProcessorFactory which only existed in order pull the optional
tag from config. This functionality is moved to the caller of the
factories in ConfigurationUtil, and the create method now takes the tag.
This allows the covariant return of the implementation to work with
tests not needing casts.
Previously all rest handlers would take Client in their injected ctor.
However, it was only to hold the client around for runtime. Instead,
this can be done just once in the HttpService which handles rest
requests, and passed along through the handleRequest method. It also
should always be a NodeClient, and other types of Clients (eg a
TransportClient) would not work anyways (and some handlers can be
simplified in follow ups like reindex by taking NodeClient).
`RestHandler`s are highly tied to actions so registering them in the
same place makes sense.
Removes the need to for plugins to check if they are in transport client
mode before registering a RestHandler - `getRestHandlers` isn't called
at all in transport client mode.
This caused guice to throw a massive fit about the circular dependency
between NodeClient and the allocation deciders. I broke the circular
dependency by registering the actions map with the node client after
instantiation.
In 2f638b5a23, support for fractional time
values was removed. While this change is documented, the error message
presented does not give an indication that fractional inputs are not
supported. This commit fixes this by detecting when the input is a time
value that would successfully parse as a double but will not parse as a
long and presenting a clear error message that fractional time values
are not supported.
Relates #19158
When doing a `date_histogram` aggregation with `"format":"epoch_millis"` or
`"format" : "epoch_second"` and using a time zone other than UTC, the
`key_as_string` ouput in the response does not reflect the UTC timestamp that is
used as the key. This happens because when applying the `time_zone` in
DocValueFormat.DateTime to an epoch-based formatter, this adds the time zone
offset to the value being formated. Instead we should adjust the added display
offset to get back the utc instance in EpochTimePrinter.
Closes#19038
As some plugins are becoming big now, it is hard for the user to know, if the plugin
is being downloaded or just nothing happens.
This commit adds a progress bar during download, which can be disabled by using the `-q`
parameter.
In addition this updates to jimfs 1.1, which allows us to test the batch mode, as adding
security policies are now supported due to having jimfs:// protocol support in URL stream
handlers.
We have a ton of tests for PagedBytesReference but not really many for the other
implementation of BytesReference. This change factors out a basic AbstractBytesReferenceTestCase
that simplifies testing other implementations. It also caught a couple of bug here and there like
a missing mask when reading bytes as ints in PagedBytesReference.
The ChannelBuffer interface today leaks into the BytesReference abstraction
which causes a hard dependency on Netty across the board. This chance moves
this dependency and all BytesReference -> ChannelBuffer conversion into
NettyUtlis and removes the abstraction leak on BytesReference.
This change also removes unused methods on the BytesReference interface
and simplifies access to internal pages.
This change allows Plugin implementions to implement Closeable when they
have resources that should be released. As a first example of how this
can be used, I switched over ingest plugins, which just had the geoip
processor. The ingest framework had chains of closeable to support this,
which is now removed.
Stored scripts are pulled from the cluster state, and the current api
requires passing the ClusterState on each call to compile. However, this
means every user of the ScriptService needs to depend on the
ClusterService. Instead, this change makes the ScriptService a
ClusterStateListener. It also simplifies tests a lot, as they no longer
need to create fake cluster states (except when testing stored scripts).
Today we have several deprecated methods, leaking netty interfaces, support for
multiple compressors on the compressor interface. The netty interface can simply
be replaced by BytesReference which we already have an implementation for, all the
others are not used and are removed in this commit.
The plan for persistent node ids ( #17811 ) is to tie the node identity to a file stored in it's data folders. As such it becomes important that nodes in our testing infra have better affinity with their data folders and that their data folders are not cleaned underneath them. The first is important because we fix the random seed used for node id generation (for reproducibility) and allowing the same node to use two different data folders causes two separate nodes to have the same id, which prevents the cluster from forming. The second is important, for example, where a full cluster restart / single node restart need to maintain node identity and wiping the data folders at the wrong moment prevents this.
Concretely this commit does the following:
1) Remove previous attempts to have data folder per role using a prefix. This wasn't effective as it was using the data paths settings which are only used for part of the runs. An attempt to completely separate the paths via the home dir failed due to assumptions made by index custom path about node data folder ordinal uniqueness (see #19076)
2) Change full cluster restarts to start up nodes in the same order their were first created in, only randomly swapping nodes with the same roles.
3) Change test cluster reset methods to first shutdown the unneeded nodes and then re-start the shared nodes that were shut down, so they'll reclaim their data folders.
4) Improve data folder wiping logic and make sure it wipes only folders of "offline" nodes.
5) Add some very basic tests
Thanks to https://github.com/elastic/elasticsearch/pull/19088 the settings are now validated against dynamic updaters on the master.
Though only the new settings are applied to the IndexService created for the validation.
Because of this we cannot check the transition from one value to another in a dynamic updaters.
This change creates the IndexService from the current settings and validates that the new dynamic settings
can replace the current settings.
This change also removes the validation of dynamic settings when an index is opened.
The validation should have occurred when the settings have been updated.
Instead of implementing onModule(ActionModule) to register actions,
this has plugins implement ActionPlugin to declare actions. This is
yet another step in cleaning up the plugin infrastructure.
While I was in there I switched AutoCreateIndex and DestructiveOperations
to be eagerly constructed which makes them easier to use when
de-guice-ing the code base.
This commit modifies TimeValue parsing to keep the input time unit. This
enables round-trip parsing from instances of String to instances of
TimeValue and vice-versa. With this, this commit removes support for the
unit "w" representing weeks, and also removes support for fractional
values of units (e.g., 0.5s).
Relates #19102
Today all settings are only validated against their validators
that are available when settings are registered. Yet, some settings updaters
have validators that are dynamic ie. their validation depends on other variables
that are only available at runtime. We do not run those validators when settings
are updated causing index updates to fail on the data nodes instead of on the master.
Relates to #19046
Previous to this change the unresolved extended bounds was passed into the histogram aggregator which meant extendedbounds.min and extendedbounds.max was passed through as null. This had two effects on the histogram aggregator:
1. If the histogram aggregator was unmapped across all shards, the reduce phase would not add buckets for the extended bounds and the response would contain zero buckets
2. If the histogram aggregator was not unmapped in some shards, the reduce phase might sometimes chose to reduce based on the unmapped shard response and therefore the extended bounds would be ignored.
This change resolves the extended bounds in the unmapped case and solves the above two issues.
Closes#19009
#18938 has changed the timing in which we send out to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting of the node's join was published, #18938 made it be sent concurrently to the publishing processes. This revealed a couple of points where the shard store fetching is dependent of the current state of affairs of the cluster state, both on the master and the data nodes. The problem discovered were already present without #18938 but required a failure/extreme situations to make them happen.This PR tries to remove as much as possible of these dependencies making shard store fetching simpler and make the way to re-introduce #18938 which was reverted.
These are the notable changes:
1) Allow TransportNodesAction (of which shard store fetching is derived) callers to supply concrete disco nodes, so it won't need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet made available through ClusterService. Note that long term we can expect the rest layer to resolve node ids to concrete nodes, making this mode the only one needed.
2) The data node relied on the cluster state to have the relevant index meta data so it can find data when custom paths are used. We now fall back to read the meta data from disk if needed.
3) The data node was relying on it's own IndexService state to indicate whether the data it has corresponds to an existing allocation. This is of course something it can not know until it got (and processed) the new cluster state from the master. This flag in the response is now removed. This is not a problem because we used that flag to protect against double assigning of a shard to the same node, but we are already protected from it by the allocation deciders.
4) I removed the redundant filterNodeIds method in TransportNodesAction - if people want to filter they can override resolveRequest.
This commit fixes several NPEs caused by implicitly performing a get request for a document that exists with its _source disabled and then trying to access the source. Instead of causing an NPE the following queries will throw an exception with a "source disabled" message (similar behavior as if the document does not exist).:
- GeoShape query for pre-indexed shape (throws IllegalArgumentException)
- Percolate query for an existing document (throws IllegalArgumentException)
A Terms query with a lookup will ignore the document if the source does not exist (same as if the document does not exist).
GET and HEAD requests for the document _source will return a 404 if the source is disabled (even if the document exists).
Instead of plugins calling `registerTokenizer` to extend the analyzer
they now instead have to implement `AnalysisPlugin` and override
`getTokenizer`. This lines up extending plugins in with extending
scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry`
immediately as part of its constructor which makes testing anslysis
much simpler.
This also moves the default analysis configuration into `AnalysisModule`
which is how search is setup.
Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`.
Instead it is only responsible for building `AnslysisRegistry`. We still
bind `AnalysisRegistry` but we only do so in `Node`. This is means it
is available at module construction time so we slowly remove the need to
bind it in guice.
Today when parsing the timeout field in a query body, if time units are
supplied the parser throws a NumberFormatException. Addtionally, the
parsing allows the timeout field to not specify units (it assumes
milliseconds). This commit fixes this behavior by not only allowing time
units to be specified but requires time units to be specified. This is
consistent with the documented behavior and the behavior in 2.x.
Relates #19077
This commit fixes several NPEs caused by implicitly performing a get request for a document that exists with its _source disabled and then trying to access the source. Instead of causing an NPE the following queries will throw an exception with a "source disabled" message (similar behavior as if the document does not exist).:
- GeoShape query for pre-indexed shape (throws IllegalArgumentException)
- Percolate query for an existing document (throws IllegalArgumentException)
A Terms query with a lookup will ignore the document if the source does not exist (same as if the document does not exist).
GET and HEAD requests for the document _source will return a 404 if the source is disabled (even if the document exists).
If we don't care about scoring then for certain candidate matches we can be certain, that if they are a candidate match,
then they will always match. So verifying these queries with the MemoryIndex can be skipped.
Currently we don't throw an error when there is more than one query clause
specified in a must/must_not/should/filter object of the bool query without
using array notation, e.g.:
{ "bool" : { "must" : { "match" : { ... }, "match": { ... }}}}
In these cases, only the first query will be parsed and further behaviour is
unspecified, possibly leading to silently ignoring the rest of the query.
Instead we should throw a ParsingException if we don't encounter an END_OBJECT
token after having parsed the query clause.
Global cluster blocks were not checked if an empty set of indices were passed as argument to the block checking method. This would lead to issues where some operations are already executed on a cluster before it has recovered its cluster state.
This commit upgrades JNA from version 4.1.0 to 4.2.2. Additionally, this
dependency is now non-optional as JNA is dual-licensed with Apache
License 2.0 since JNA 4.0.0.
Relates #19045
This is the same as what Lucene does for its analysis factories, and we hawe
tests that make sure that the elasticsearch factories are in sync with
Lucene's. This is a first step to move forward on #9978 and #18064.
I suspect recent failures are due to the fact that the cache disables itself
when there is contention. This runs assertions in an assertBusy block since
they should eventually succeed.
This commit moves template support out of the Search API to its own dedicated Search Template API in the lang-mustache module. It provides a new SearchTemplateAction that can be used to render templates before it gets delegated to the usual Search API. The current REST endpoint are identical, but the Render Search Template endpoint now uses the same Search Template API with a new "simulate" option. When this option is enabled, the Search Template API only renders template and returns immediatly, without executing the search.
Closes#17906
There are secondary issues with async shard fetch going out to nodes before they have a cluster state published to them that need to be solved first. For example:
- async fetch uses transport node action that resolves nodes based on the cluster state (but it's not yet exposed by ClusterService since we inline the reroute)
- after disruption nodes will respond with an allocated shard (they didn't clean up their shards yet) which throws of decisions master side.
- nodes deed the index meta data in question but they may not have if they didn't recieve the latest CS
If a plugin declares `onModule(SomethingThatIsntAModule)` then refuse
to start. Before this commit we just logged a warning that flies by in
the console and is easy to miss. You can't miss refusing to start!
This commit removes duplicated methods for reading byte arrays in
StreamInput. One method would read a byte array by repeatedly calling
StreamInput#readByte in a loop, and the other would just call
StreamInput#readBytes. In this commit, we remove the former.
Relates #19023
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
This changes adds a MapperPlugin interface which allows pull style
retrieval of mappers and metadata mappers added by plugins. For now, I
have kept the MapperRegistry, but this should be removed in the future
as it is just a silly container for 2 maps which could themselves be
passed around.
This commit refactors InternalEngine#innerIndex and
InternalEngine#innerDelete to collapse some common logic into a single
method. This has the advantage that it shrinks the bytecode size of
InternalEngine#innerIndex so that it can be inlined.
Makes ScriptModule just a plain class that manages building the
ScriptSettings and ScriptService from plugins. When we *need*
to bind ScriptService with guice we bind it in a lambda.
- fix it so that processors with the `ignore_failure` option do not
record their exception in the response
- add more tests to make empty `on_failure`. This now throws an
exception
Today we only throw random exceptions on the translog writer. This commit
extends it to also throw exceptions during checkpoint writing etc to test
if the correct flags are provided to open method etc.
This makes this sequence:
```
curl -XDELETE localhost:9200/source,dest?pretty
for i in $( seq 1 100 ); do
curl -XPOST localhost:9200/source/test -d'{"test": "test"}'; echo
done
curl localhost:9200/_refresh?pretty
curl -XPOST 'localhost:9200/_reindex?pretty&wait_for_completion=false' -d'{
"source": {
"index": "source"
},
"dest": {
"index": "dest"
}
}'
curl 'localhost:9200/_tasks/Jsyd6d9wSRW-O-NiiKbPcQ:237?wait_for_completion&pretty'
```
Return task *AND* the response to the user.
This also renames "result" to "response" in the persisted task info
to line it up with how we name the objects in Elasticsearch.
This commit modifies reading the Elasticsearch jar manifest via the URL
instead of converting the URL to an NIO path for increased portability.
Relates #18999
The default similarity was set to `classic` which refers to TFIDF and has not been moved after the upgrade to Lucene 6.
Though moving to BM25 could have some downside for queries that relies on coordination factor (match_query, multi_match_query) ?
relates #18944
```
> Throwable #1: java.lang.RuntimeException: file handle leaks: [SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1), SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1)]
> at __randomizedtesting.SeedInfo.seed([518545A20D129C8C]:0)
> at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
> at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
> at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception
> at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
> at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
> at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:271)
> at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212)
> at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:240)
> at java.nio.file.Files.newByteChannel(Files.java:361)
> at java.nio.file.Files.newByteChannel(Files.java:407)
> at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77)
> at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
> at org.apache.lucene.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:2695)
> at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:737)
> at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
> at org.elasticsearch.common.lucene.Lucene$1.doBody(Lucene.java:237)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:685)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:637)
> at org.elasticsearch.common.lucene.Lucene.checkSegmentInfoIntegrity(Lucene.java:242)
> at org.elasticsearch.index.store.Store$MetadataSnapshot.loadMetadata(Store.java:847)
> at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:740)
> at org.elasticsearch.index.store.Store.getMetadata(Store.java:260)
> at org.elasticsearch.index.store.Store.getMetadata(Store.java:240)
> at org.elasticsearch.index.shard.IndexShard.doCheckIndex(IndexShard.java:1310)
> at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:102)
> at org.elasticsearch.index.shard.IndexShard.checkIndex(IndexShard.java:1288)
> at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:921)
> at org.elasticsearch.index.shard.IndexShard.skipTranslogRecovery(IndexShard.java:964)
> at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:297)
> at
```
The MMapDirectory has a switch that allows the content of files to be loaded
into the filesystem cache upon opening. This commit exposes it with the new
`index.store.pre_load` setting.
Today we only emit that the setting wasn't found unless we have
some DYM suggestions. Yet, if a setting is not found at all and there
are no suggestions due to typos it's likely a removed setting or the plugin
that is supposed to be configured is not installed.
This commit adds some info text to the exception to help the user debugging
the problem before opening bugreports.
Instead of emitting:
`unknown setting [foo.bar]`
we now emit:
`unknown setting [foo.bar] please check the migration guide for removed settings and ensure that the plugin you are configuring is installed`
Relates to #18663
If someone sets `index.shard.check_on_startup`, indexing start up time can be slow (by design, it diligently goes and checks all data). If for some reason the shard is closed in that time, the store ref is kept around and prevents a new shard copy to be allocated to this node via the shard level locks. This is especially tricky if the shard is close due to a cancelled recovery which may re-restart soon.
This commit adds a cancellable threads instance to each IndexShard and perform index checking underneath it, so it can be cancelled on close.
We pretended to be able to ackt like a different version node for so long it's
time to be honest and remove this ability. It's just confusing and where needed
and tested we should build dedicated extension points.
This commit introduce unit testing infrastructure to test replication operations using real index shards. This is infra is complementary to the full integration tests and unit testing of ReplicationOperation we already have. The new ESIndexLevelReplicationTestCase base makes it easier to test and simulate failure mode that require real shards and but do not need the full blow stack of a complete node.
The commit also add a simple "nothing is wrong" test plus a test that checks we don't drop docs during the various stages of recovery.
For now, only single doc indexing is supported but this can be easily extended in the future.
This class was forked in 0.20 to remove a volatile keyword. While there
is no issue attached to the commit, no evidence of the criticality of the
change nor does it seem to be correct since we set this value internally as well
I think this class should be used as is from joda time even if we have to pay
the price of volatile reads. We can't do 3rd party optimization in our codebase that
way it just not maintainable.
This was added in 2280915d3c
This commit removes the search preference _only_node as the same
functionality can be obtained by using the search preference
_only_nodes. This commit also adds a test that ensures that _only_nodes
will continue to support specifying node IDs.
Relates #18875
In the past, we had the semantics where the very first cluster state a node processed after joining could not contain shard assignment to it. This was to make sure the node cleans up local / stale shard copies before receiving new ones that might confuse it. Since then a lot of work in this area, most notably the introduction of allocation ids and #17270 . This means we don't have to be careful and just reroute in the same cluster state change where we process the join, keeping things simple and following the same pattern we have in other places.
This change removes some unnecessary dependencies from ClusterService
and cleans up ClusterName creation. ClusterService is now not created
by guice anymore.
The NodeJoinController is responsible for processing joins from nodes, both normally and during master election. For both use cases, the class processes incoming joins in batches in order to be efficient and to accumulated enough joins (i.e., >= min_master_nodes) to seal an election and ensure the new cluster state can be committed. Since the class was written, we introduced a new infrastructure to support batch changes to the cluster state at the `ClusterService` level. This commit rewrites NodeJoinController to use that infra and be simpler.
The PR also introduces a new concept to ClusterService allowing to submit tasks in batches, guaranteeing that all tasks submitted in a batch will be processed together (potentially with more tasks). On top of that I added some extra safety checks to the ClusterService, around potential double submission of task objects into the queue.
This is done in preparation to revive #17811
Today we have a push model for registering basically anything. All our extension points
are defined on modules which we pass in to plugins. This is harder to maintain and adds
unnecessary dependencies on the modules itself. This change moves towards a pull model
where the plugin offers a getter kind of method to get the extensions. This will also
help in the future if we need to pass dependencies to the extension points which can
easily be defined on the method as arguments if a pull model is used.
Currently the error messages for failing tests in the TimeZoneRoundingTests test
suite are hard to read because they usually report the actual end expected date
in milliseconds utc (e.g. "Expected: <1414270860000L> but: was <1414270800000L>".
This makes failing tests hard to read.
This change introduces a new Matcher that can be used for equality checks for
long dates but reports the error both as a formated date string according to
some time zone and also as the actual long values, so you get messages like
"Expected: 2014-10-26T00:01:00.000+03:00 [1414270860000] but: was
"2014-10-26T00:00:00.000+03:00 [1414270800000]".
Also clean cleaning up some helper methods and generally simplifying a few test
cases. Otherwise this change shouldn't affect either the scope of the test or
anything about the rounding implementation itself.
They have been implemented in https://issues.apache.org/jira/browse/LUCENE-7289.
Ranges are implemented so that the accuracy loss only occurs at index time,
which means that if you are searching for values between A and B, the query will
match exactly all documents whose value rounded to the closest half-float point
is between A and B.
Registering a script engine or native scripts still uses Guice today
and is much more complicated than needed. This change moves to a pull
based model where script plugins have to implement a dedicated interface
`ScriptPlugin` and defines simple getter returning instances rather than
classes.
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overriden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
This interface used to have dedicated methods to prevent calling execute
methods. These methods are unnecessary as the checks can simply be
done inside the execute methods itself. This simplifies the interface
as well as its usage.
With this commit we exclude certain HTTP requests that are needed to inspect the cluster
from HTTP request limiting to ensure these commands are processed even in critical
memory conditions.
Relates #17951, relates #18145, closes#18833
When installing plugins, we first try the elastic download service for
official plugins, then try maven coordinates, and finally try the
argument as a url. This can lead to confusing error messages about
unknown protocols when eg an official plugin name is mispelled. This
change adds a heuristic for determining if the argument in the final
case is in fact a url that we should try, and gives a simplified error
message in the case it is definitely not a url.
closes#17226
The search preference _prefer_node allows specifying a single node to
prefer when routing a request. This functionality can be enhanced by
permitting multiple nodes to be preferred. This commit replaces the
search preference _prefer_node with the search preference _prefer_nodes
which supplants the former by specifying a single node and otherwise
adds functionality.
Relates #18872
This caches FieldStats at the field level. For one off requests or for
few indicies this doesn't save anything, but when there are 30 indices,
5 shards, 1 replica, 100 parallel requests this is about twice as fast
as not caching. I expect lots of usage won't see much benefit from this
but pointing kibana to a cluster with many indexes and shards, will be
faster.
Closes#18717
This adds a get task API that supports GET /_tasks/${taskId} and
removes that responsibility from the list tasks API. The get task
API supports wait_for_complation just as the list tasks API does
but doesn't support any of the list task API's filters. In exchange,
it supports falling back to the .results index when the task isn't
running any more. Like any good GET API it 404s when it doesn't
find the task.
Then we change reindex, update-by-query, and delete-by-query to
persist the task result when wait_for_completion=false. The leads
to the neat behavior that, once you start a reindex with
wait_for_completion=false, you can fetch the result of the task by
using the get task API and see the result when it has finished.
Also rename the .results index to .tasks.
There are edge cases where rounding a date to a certain interval using a time
zone with DST shifts can currently cause the rounded date to be bigger than the
original date. This happens when rounding a date closely after a DST start and
the rounded date falls into the DST gap.
Here is an example for CET time zone, where local time is set forward by one
hour at 2016-03-27T02:00:00+01:00 to 2016-03-27T03:00:00.000+02:00:
The date 2016-03-27T03:01:00.000+02:00 (1459040460000) which is just after the
DST change is first converted to local time (1459047660000). If we then apply
interval rounding for a 14m interval in local time, this takes us to
1459047240000, which unfortunately falls into the DST gap. When converting
this back to UTC, joda provides options to throw exceptions on illegal dates
like this, or correct this by adjusting the date to the new time zone offset.
We currently do the later, but this leads to converting this illegal date back
to 2016-03-27T03:54:00.000+02:00 (1459043640000), giving us a date that is
larger than the original date we wanted to round.
This change fixes this by using the "strict" option of 'convertLocalToUTC()'
to detect rounded dates that fall into the DST gap. If this happens, we can use
the time of the DST change instead as the interval start.
Even before this change, intervals around DST shifts like this can be shorter
than the desired interval. This, for example, happens when the requested
interval width doesn't completely fit into the remaining time span when the DST
shift happens. For example, using a 14m interval in UTC+1 (CET before DST
starts) leads to the following valid rounding values around the time where DST
happens:
2016-03-27T01:30:00+01:00
2016-03-27T01:44:00+01:00
2016-03-27T01:58:00+01:00
2016-03-27T02:12:00+01:00
2016-03-27T02:26:00+01:00
...
while the rounding values in UTC+2 (CET after DST start) are placed like this
around the same time:
2016-03-27T02:40:00+02:00
2016-03-27T02:54:00+02:00
2016-03-27T03:08:00+02:00
2016-03-27T03:22:00+02:00
...
From this we can see then when we switch from UTC+1 to UTC+2 at 02:00 the last
rounding value in UTC+1 is at 01:58 and the first valid one in UTC+2 is at
03:08, so even if we decide to put all the dates in between into one rounding
interval, it will only cover 10 minutes. With this change we choose to use the
moment of DST shift as an aditional interval separator, leaving us with a 2min
interval from [01:58,02:00) before the shift and an 8min interval from
[03:00,03:08) after the shift.
This change also adds tests for the above example and adds randomization to the
existing TimeIntervalRounding tests.
By default the number of searches msearch executes is capped by the number of
nodes multiplied with the default size of the search threadpool. This default can be
overwritten by using the newly added `max_concurrent_searches` parameter.
Before the msearch api would concurrently execute all searches concurrently. If many large
msearch requests would be executed this could lead to some searches being rejected
while other searches in the msearch request would succeed.
The goal of this change is to avoid this exhausting of the search TP.
Closes#17926
Writeable is better for immutable objects like TimeValue.
Switch to writeZLong which takes up less space than the original
writeLong in the majority of cases. Since we expect negative
TimeValues we shouldn't use
writeVLong.
Today we use a random source of UUIDs for assigning allocation IDs,
cluster IDs, etc. Yet, the source of randomness for this is not
reproducible in tests. Since allocation IDs end up as keys in hash maps,
this means allocation decisions and not reproducible in tests and this
leads to non-reproducible test failures. This commit modifies the
behavior of random UUIDs so that they are reproducible under tests. The
behavior for production code is not changed, we still use a true source
of secure randomness but under tests we just use a reproducible source
of non-secure randomness.
It is important to note that there is a test,
UUIDTests#testThreadedRandomUUID that relies on the UUIDs being truly
random. Thus, we have to modify the setup for this test to use a true
source of randomness. Thus, this is one test that will never be
reproducible but it is intentionally so.
Relates #18808
When trying to restore a snapshot of an index created in a previous
version of Elasticsearch, it is possible that empty shards in the
snapshot have a segments_N file that has an unsupported Lucene version
and a missing checksum. This leads to issues with restoring the
snapshot. This commit handles this special case by avoiding a restore
of a shard that has no data, since there is nothing to restore anyway.
Closes#18707
Testability of ICSS is achieved by introducing interfaces for IndicesService, IndexService and IndexShard. These interfaces extract all relevant methods used by ICSS (which do not deal directly with store) and give the possibility to easily mock all the store behavior away in the tests (and cuts down on dependencies).
Add Aggregation profiling initially only be for the shard phases (i.e. the reduce phase will not be profiled in this change)
This change refactors the query profiling class to extract abstract classes where it is useful for other profiler types to share code.
It presented as listeners never being called if you refresh at the same
time as the listener is added. It was caught rarely by
testConcurrentRefresh. mostly this is removing code and adding a comment:
```
Note that it is not safe for us to abort early if we haven't advanced the
position here because we set and read lastRefreshedLocation outside of a
synchronized block. We do that so that waiting for a refresh that has
already passed is just a volatile read but the cost is that any check
whether or not we've advanced the position will introduce a race between
adding the listener and the position check. We could work around this by
moving this assignment into the synchronized block below and double
checking lastRefreshedLocation in addOrNotify's synchronized block but
that doesn't seem worth it given that we already skip this process early
if there aren't any listeners to iterate.
```
This commit addresses a performance issue in
IndicesClusterStateService#applyDeletedShards. Namely, the current
implementation is O(number of indices * number of shards). This is
because of an outer loop over the indices and an inner loop over the
assigned shards, all to check if a shard is in the outer index. Instead,
we can group the shards by index, and then just do a map lookup for each
index.
Testing this on a single-node with 2500 indices, each with 2 shards,
creating an index before this optimization takes 0.90s and after this
optimization takes 0.19s.
Relates #18788
You declare them like
```
static {
PARSER.declareInt(optionalConstructorArg(), new ParseField("animal"));
}
```
Other than being optional they follow all of the rules of regular
`constructorArg()`s. Parsing an object with optional constructor args
is going to be slightly less efficient than parsing an object with
all required args if some of the optional args aren't specified because
ConstructingObjectParser isn't able to build the target before the
end of the json object.
Due to an error in our current TimeIntervalRounding, two dates can
round to the same key, even when they are 1h apart when using
short interval roundings (e.g. 20m) and a time zone with DST change.
Here is an example for the CET time zone:
On 25 October 2015, 03:00:00 clocks are turned backward 1 hour to
02:00:00 local standard time. The dates
"2015-10-25T02:15:00+02:00" (1445732100000) (before DST end) and
"2015-10-25T02:15:00+01:00" (1445735700000) (after DST end)
are thus 1h apart, but currently they round to the same value
"2015-10-25T02:00:00.000+01:00" (1445734800000).
This violates an important invariant of rounding, namely that the
rounded value must be less or equal to the value that is rounded.
It also leads to wrong histogram bucket counts because documents in
[02:00:00+02:00, 02:20:00+02:00) go to the same bucket as documents
from [02:00:00+01:00, 02:20:00+01:00).
The problem happens because in TimeIntervalRounding#roundKey() we
need to perform the rounding operation in local time, but on
converting back to UTC we don't honor the original values time zone
offset. This fix changes that and adds tests both for DST start and
DST end as well as a test that demonstrates what happens to bucket
sizes when the dst change is not evently divisibly by the interval.
Previously Elasticsearch used $DATA_DIR/$CLUSTER_NAME/nodes for the path
where data is stored, this commit changes that to be $DATA_DIR/nodes.
On startup, if the old folder structure is detected it will be used.
This behavior will be removed in Elasticsearch 6.0
Resolves#17810
Folded grok processor into ingest-common module.
The rest tests have been moved to ingest-common module as well, because these tests don't run in the rest-api-spec module but in the distribution:integ-test-zip module
and adding a test plugin there felt just wrong to me. I think this is ok. I left a tiny ingest rest test behind in that tests with an empty pipeline.
Removed messy tests, these tests were already covered in the rest tests
Added ingest test plugin in test infra so that each module testing integration with ingest doesn't need write its own plugin
Moved reindex ingest tests to qa module
Closes#18490
API:
```
curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '{
"slice": {
"field": "_uid", <1>
"id": 0, <2>
"max": 10 <3>
},
"query": {
"match" : {
"title" : "elasticsearch"
}
}
}
```
<1> (optional) The field name used to do the slicing (_uid by default)
<2> The id of the slice
By default the splitting is done on the shards first and then locally on each shard using the _uid field
with the following formula:
`slice(doc) = floorMod(hashCode(doc._uid), max)`
For instance if the number of shards is equal to 2 and the user requested 4 slices then the slices 0 and 2 are assigned
to the first shard and the slices 1 and 3 are assigned to the second shard.
Each scroll is independent and can be processed in parallel like any scroll request.
Closes#13494
This commit modifies the bootstrap check invocations in the might fork
tests to use the underlying test name when setting up the logging prefix
when invoking the bootstrap checks. This is done to give clear logs in
case of failure.
Today we allow to shrink to 1 shard but that might not be possible due to
too many document or a single shard doesn't meet the requirements for the index.
The logic can be expanded to N shards if the source index shards is a multiple of N.
This guarantees that there are not hotspots created due to different number of shards
being shrunk into one.
This commit fixes a compilation issue in RefreshListenersTests that
arose from code being integrated into master, and then a large pull
request refactoring the handling of thread pools was later merged into
master.
This commit adds a bootstrap check for the JVM option OnError being in
use and seccomp being enabled. These two options are incompatible
because OnError allows the user to specify an arbitrary program to fork
when the JVM encounters an fatal error, and seccomp enables system call
filters that prevents forking.
This commit refactors the handling of thread pool settings so that the
individual settings can be registered rather than registering the top
level group. With this refactoring, individual plugins must now register
their own settings for custom thread pools that they need, but a
dedicated API is provided for this in the thread pool module. This
commit also renames the prefix on the thread pool settings from
"threadpool" to "thread_pool". This enables a hard break on the settings
so that:
- some of the settings can be given more sensible names (e.g., the max
number of threads in a scaling thread pool is now named "max" instead
of "size")
- change the soft limit on the number of threads in the bulk and
indexing thread pools to a hard limit
- the settings names for custom plugins for thread pools can be
prefixed (e.g., "xpack.watcher.thread_pool.size")
- remove dynamic thread pool settings
Relates #18674