This change removes all guice interaction from Transport, HttpServerTransport,
HttpServer and TransportService. All these classes as well as their subclasses
or extended version configured via plugins are now created by using plain old
bloody java constructors. YAY!
Since #19975 we are aggressively failing with AssertionError when we catch an ACE
inside the InternalEngine. We treat everything that is neither a tragic even on
the IndexWriter or the Translog as a bug and throw an AssertionError. Yet, if the
engine hits an IOException on refresh of some sort and the IW doesn't realize it since
it's not fully under it's control we fail he engine but neither IW nor Translog are marked
as failed by tragic event while they are already closed.
This change takes the `failedEngine` exception into account and if it's set we know
that the engine failed by some other even than a tragic one and can continue.
This change also uses the `ReferenceManager#RefreshListener` interface in the engine rather
than it's concrete implementation.
Relates to #19975
Currently all the reroute-like methods of `AllocationService` return a result object of type `RoutingAllocation.Result`. The result object contains the new `RoutingTable` and `MetaData` plus an indication whether those were changed. The caller is then responsible of updating a cluster state with these. These means that things can easily go wrong and one can take one of these but not the other causing inconsistencies. We already have a utility method on the `ClusterState` builder that does but no one forces you to do so. Also 99% of the callers do the same thing: i.e., check if the result was changed and if so update the very same cluster state that was passed to `AllocationService`. This PR folds this pattern into `AllocationService` and changes almost all it's methods to return a new cluster state (potentially the original one). This saves some 500 lines of code.
The one exception here is the reroute API which executes allocation commands and potentially returns an explanation as well (next to the routing table and metadata). That API now returns a `CommandsResult` object which encapsulate a cluster state and the explanation.
A few of our unit tests generate a random search request body nd run tests against it. The source can optionally contain ext elements under the ext sections, which can be parsed by plugins. With this commit we introduce a plugin so that the tests don't use the one from FetchSubPhasePluginIT anymore. They rather generate multiple search ext elements. The plugin can parse and deal with all those. This extends the test coverage as we may have multiple elements with random names.
Took the chance to introduce a common test base class for search requests, called AbstractSearchTestCase, given that the setup phase is the same for all three tests around search source. Then we can have the setup isolated to the base class and the subclasses relying on it.
Closes#17685
* Throw error if query element doesn't end with END_OBJECT
Followup to #20515 where we added validation that after we parse a query within a query element, we should not get a field name. Truth is that the only token allowed at that point is END_OBJECT, as our DSL allows only one single query within the query object:
```
{
"query" : {
"term" : { "field" : "value" }
}
}
```
We can then check that after parsing of the query we have an end_object that closes the query itself (which we already do). Following that we can check that the query object is immediately closed, as there are no other tokens that can be present in that position.
Relates to #20515
I'd made some mistakes that hadn't caused the test to fail but did
slow it down and partially invalidate some of the assertions. This
fixes those mistakes.
* Fix FieldStats deserialization of `ip` field
Add missing readBytes in `ip` field deserialization
Add (de)serialization tests for all types
This change also removes the ability to set FieldStats.minValue or FieldStats.maxValue to null.
This is not required anymore since the stats are built on fields with values only.
Fixes#20516
If an index was created with pre 2.0 we should not treat it as supported
even if all segments have been upgraded to a supported lucene version.
Closes#20512
TransportService is such a central part of the core server, replacing
it's implementation is risky and can cause serious issues. This change removes the ability to
plug in TransportService but allows registering a TransportInterceptor that enables
plugins to intercept requests on both the sender and the receiver ends. This is a commonly used
and overwritten functionality but encapsulates the custom code in a contained manner.
During a networking partition, cluster states updates (like mapping changes or shard assignments)
are committed if a majority of the masters node received the update correctly. This means that the current master has access to enough nodes in the cluster to continue to operate correctly. When the network partition heals, the isolated nodes catch up with the current state and get the changes they couldn't receive before. However, if a second partition happens while the cluster
is still recovering from the previous one *and* the old master is put in the minority side, it may be that a new master is elected which did not yet catch up. If that happens, cluster state updates can be lost.
This commit fixed 95% of this rare problem by adding the current cluster state version to `PingResponse` and use them when deciding which master to join (and thus casting the node's vote).
Note: this doesn't fully mitigate the problem as a cluster state update which is issued concurrently with a network partition can be lost if the partition prevents the commit message (part of the two phased commit of cluster state updates) from reaching any single node in the majority side *and* the partition does allow for the master to acknowledge the change. We are working on a more comprehensive fix but that requires considerate work and is targeted at 6.0.
Currently, we silently accept malformed query where more
than one key is defined at the top-level for query object.
If all the keys have a valid query body, only the last query
is executed, besides throwing off parsing for additional suggest,
aggregation or highlighting defined in the search request.
This commit throws a parsing exception when we encounter a query
with multiple keys.
closes#20500
The default of 30s causes some tests to timeout when running ensureGreen and similar. This is because network delays simulation blocks connect until either the connect timeout expires or the disruption configured time stops. We do *not* immediately connect when the disruption is stopped.
Today when starting Elasticsearch without a Log4j 2 configuration file,
we end up throwing an array index out of bounds exception. This is
because we are passing no configuration files to Log4j. Instead, we
should throw a useful error message to the user. This commit modifies
the Log4j configuration setup to throw a user exception if no Log4j
configuration files are present in the config directory.
Relates #20493
The test uses a NetworkDelay that drops requests and slows down connecting. Next to that it disable node fault detection to make sure nodes are not removed before we check our publishing. Sadly that can lead to huge slow downs if the disruption hits while a node is still pinging (and tries to connect, which is slowed down). Instead we can start the disruption on the cluster state thread, making sure the result of fault detection won't be processed before we publish
This was actually a byproduct of trying to remove a //norelease for
index shard setting validation in MetaDataIndexService. This //norelease
is now removed. Previously this check was *only* used by the template
service, so we validated twice, once in the Settings infrastructure and
once when actually creating the index. We now instead use the Settings
infrastructure to validate the settings for shard count.
`TransportService#registerRequestHandler` allowed to register
handlers more than once and issues an annoying warn log message when
this happens. This change simple throws an exception to prevent regsitering
the same handler more than once. This commit also removes the ability
to remove request handlers.
Relates to #20468
We still use some crazy poor mans compression in InternalSearchHit that
uses a thread local and an unordered map as a lookup table if requested.
Stuff like this should be handled by compression on the transport layer
rather than in-line in the serialization code. This code is complex enough.
This utility class is used in 3 places while we only need to register
the handlers once per node. Otherwise we will see nasty `WARN` logs like:
`registered two transport handlers for action indices:data/read/search[phase/fetch/id/scroll]...`
This change will only register handlers inside the main TransportSearchAction.
After this change SearchModule doesn't subclass AbstractModule anymore and all wiring
happens in `Node.java`. As a side-effect several tests don't need a guice injector anymore.
This commit modifies the logger names within Elasticsearch to be the
fully-qualified class name as opposed removing the org.elasticsearch
prefix and dropping the class name. This change separates the root
logger from the Elasticsearch loggers (they were equated from the
removal of the org.elasticsearch prefix) and enables log levels to be
set at the class level (instead of the package level).
Relates #20457
Today when setting the logging level via the command-line or an API
call, the expectation is that the logging level should trickle down the
hiearchy to descendant loggers. However, this is not necessarily the
case. For example, if loggers x and x.y are already configured then
setting the logging level on x will not descend to x.y. This is because
the logging config for x.y has already been forked from the logging
config for x. Therefore, we must explicitly descend the hierarchy when
setting the logging level and that is what this commit does.
Relates #20463
This commit introduces a new plugin for file-based unicast hosts
discovery. This allows specifying the unicast hosts participating
in discovery through a `unicast_hosts.txt` file located in the
`config/discovery-file` directory. The plugin will use the hosts
specified in this file as the set of hosts to ping during discovery.
The format of the `unicast_hosts.txt` file is to have one host/port
entry per line. The hosts file is read and parsed every time
discovery makes ping requests, thus a new version of the file that
is published to the config directory will automatically be picked
up.
Closes#20323
This commit fixes the following geo_point bwc tests:
* GeoDistanceIT to test deprecated GeoDistanceRangeQuery on legacy indexes only.
* ExternalFieldMapperTests to correctly handle LatLonPoint type
* GeoPointFieldMapperTests to correctly test stored geo_point fields
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.
The following list of endpoint has been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain
The documentation and the rest API spec has been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain
The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.
Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.
Fixes#20155
Today we add a prefix when logging within Elasticsearch. This prefix
contains the node name, and index and shard-level components if
appropriate.
Due to some implementation details with Log4j 2 , this does not work for
integration tests; instead what we see is the node name for the last
node to startup. The implementation detail here is that Log4j 2 there is
only one logger for a name, message factory pair, and the key derived
from the message factory is the class name of the message factory. So,
when the last node starts up and starts setting prefixes on its message
factories, it will impact the loggers for the other nodes.
Additionally, the prefixes are lost when logging an exception. This is
due to another implementation detail in Log4j 2. Namely, since we log
exceptions using a parameterized message, Log4j 2 decides that that
means that we do not want to use the message factory that we have
provided (the prefix message factory) and so logs the exception without
the prefix.
This commit fixes both of these issues.
Relates #20429
This commit cuts over geo_point fields to use Lucene's new point-based LatLonPoint type for indexes created in 5.0. Indexes created prior to 5.0 continue to use their respective encoding type. Below is a description of the changes made to support the new encoding type:
* New indexes use a new LatLonPointFieldMapper which provides a parse method for the new type
* The new LatLonPoint parse method removes support for lat_lon and geohash parameters
* Backcompat testing for deprecated lat_lon and geohash parameters is added to all unit and integration tests
* LatLonPointFieldMapper provides DocValues support (enabled by default) which uses Lucene's new LatLonDocValuesField type
* New LatLonPoint field data classes are added for aggregation support (wraps LatLonPoint's Numeric Doc Values)
* MultiFields use the geohash as the string value instead of the lat,lon string making it easier to perform geo string queries on the geohash instead of a lat,lon comma delimited string.
Removed Features:
* With the removal of geohash indexing, GeoHashCellQuery support is removed for all new indexes (still supported on existing indexes)
* LatLonPoint does not support a Distance Range query because it is super inefficient. Instead, the geo_distance_range query should be accomplished using either the geo_distance aggregation, sorting by descending distance on a geo_distance query, or a boolean must not of the excluded distance (which is what the distance_range query did anyway).
TODO:
* fix/finish yaml changes for plugin and rest integration tests
* update documentation
When generating random bogus documents, it could happen that they contain both the terms "the" and "ultimate", which would match the query "the ultimate" better than all the other non bogus documents, which would cause testCrossFieldMode to fail. "the" is a term that's relatively likely to be randomly generated given its length; we can simply increase the minimum length of randomly generated terms to 5, so that there are no collisions, as "the" cannot be generated anymore (nor can "ultimate" as the lenght doesn't go up to 8).
Also made some assertions more accurate to check how many hits match a query rather than checking only that the first or second hits are there.
Closes#18873