The refresh description should indicate that the affected shards are
refreshed as opposed to the entire index.
This was raised as a discrepancy on
discuss (https://discuss.elastic.co/t/refresh-parameter-of-index-api/59008/2)
on the .NET client that originates from code generated from the rest api
spec. The description has been updated in master but should be updated
for the 2.4.0 release.
BATS upgrade tests fails on master branch because it tries to install 2.x versions to upgrade from instead of 5.x versions. And since #18554 we should only test upgrades from 5.0.0-alpha4 versions.
This commit changes the vagrant tests so that it tries to list all the previous releases from version N-1. If nothing is found, it will fetch the current version and will run the upgrade tests with it. It works nicely with the current master 6.0.0-alpha1-SNAPSHOT. Once 5.0.0 is released it should run the test with it.
When uninstalling or upgrading elasticsearch using the RPM package some empty directories remain on the filesystem:
/usr/share/elasticsearch/bin
/usr/share/elasticsearch/lib
/usr/share/elasticsearch/modules
/usr/share/elasticsearch/modules/foo
Having empty directories in modules can prevent elasticsearch to start after an upgrade: the plugins service expects to find a plugin-descriptor.properties file in every sub directory of modules.
This PR cleans things a bit so that these empty directories are removed on upgrade/removal like it was in 2.x.
* Update rescoring docs in respect to sort
If sort is present in a query the rescore query is not executed. As long as this feature is neither implemented (see discussion in #6788) nor the combination of sort and rescoring raises an error, we should warn the user in the documentation about this.
* Missed a dot
The Log4j shutdown hack test tests that a hack we have in place to
workaround a bug in Log4j during shutdown is effective. Log4j can use
JMX to control logging levels, but we disable this through the use of a
system property log4j2.disable.jmx (mainly because there is no need for
this feature, but it also means granting additional security
permissions). The bug in Log4j is that during shutdown, it neglects to
check whether or not its usage of JMX is disable and so it attempts to
unregister management beans, leading to a permissions violation. The
test works by attempting to shutdown Log4j and thus triggering the bad
code path. With the Log4j hack in place, we have introduced jar hell so
that its our code running instead of code from the Log4j jar. Our code
correctly checks that the usage of JMX is disabled and thus does not
trip on a permissions violation. The test was a little complicated in
that it attempted to just grant the minimal permissions needed for Log4j
to do its thing, but this can sometimes lead to other unwanted
permissions violations because the permissions put in place are more
restrictive necessary. This commit simplifies this situation by
rewriting the test to only deny Log4j the sole permission needed to
trigger the bug.
Relates #20476
We still use some crazy poor mans compression in InternalSearchHit that
uses a thread local and an unordered map as a lookup table if requested.
Stuff like this should be handled by compression on the transport layer
rather than in-line in the serialization code. This code is complex enough.
In 7560101ec7, the Elasticsearch logger
names were modified to be their fully-qualified class name (with some
exceptions for special loggers like the slow logs and the transport
tracer). This commit updates the docs accordingly.
Relates #20475
This utility class is used in 3 places while we only need to register
the handlers once per node. Otherwise we will see nasty `WARN` logs like:
`registered two transport handlers for action indices:data/read/search[phase/fetch/id/scroll]...`
This change will only register handlers inside the main TransportSearchAction.
When upgrading elasticsearch using the RPM package, the scripts directory is removed if it's empty but it won't be recreated by the upgraded package. But after that the service won't start because the scripts dir is missing.
After this change SearchModule doesn't subclass AbstractModule anymore and all wiring
happens in `Node.java`. As a side-effect several tests don't need a guice injector anymore.
This commit modifies the logger names within Elasticsearch to be the
fully-qualified class name as opposed removing the org.elasticsearch
prefix and dropping the class name. This change separates the root
logger from the Elasticsearch loggers (they were equated from the
removal of the org.elasticsearch prefix) and enables log levels to be
set at the class level (instead of the package level).
Relates #20457
Today when setting the logging level via the command-line or an API
call, the expectation is that the logging level should trickle down the
hiearchy to descendant loggers. However, this is not necessarily the
case. For example, if loggers x and x.y are already configured then
setting the logging level on x will not descend to x.y. This is because
the logging config for x.y has already been forked from the logging
config for x. Therefore, we must explicitly descend the hierarchy when
setting the logging level and that is what this commit does.
Relates #20463
update geoip to not include null-valued results from database
Originally, the plugin would still insert all the requested fields, but
assign null to each one. This fixes that by not writing the fields at
all. Makes for a better experience when the null fields conflict with
the typical geo_point field mapping.
This commit introduces a new plugin for file-based unicast hosts
discovery. This allows specifying the unicast hosts participating
in discovery through a `unicast_hosts.txt` file located in the
`config/discovery-file` directory. The plugin will use the hosts
specified in this file as the set of hosts to ping during discovery.
The format of the `unicast_hosts.txt` file is to have one host/port
entry per line. The hosts file is read and parsed every time
discovery makes ping requests, thus a new version of the file that
is published to the config directory will automatically be picked
up.
Closes#20323
With the cut over to LatLonPoint the geohash, geohash_precision, lat_lon, and geohash_prefix parameters have been removed. This commit fixes the doc build by removing the remaining dangling references to these removed parameters.
This commit fixes the following geo_point bwc tests:
* GeoDistanceIT to test deprecated GeoDistanceRangeQuery on legacy indexes only.
* ExternalFieldMapperTests to correctly handle LatLonPoint type
* GeoPointFieldMapperTests to correctly test stored geo_point fields
We put the rest api spec into a jar for upload to maven, so that we can
use within external rest tests. This change adds making a pom for maven
(as well as producing sources and javadoc jars, even though they will be
empty, because maven central requires them).
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.
The following list of endpoint has been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain
The documentation and the rest API spec has been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain
The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.
Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.
Fixes#20155
Today we add a prefix when logging within Elasticsearch. This prefix
contains the node name, and index and shard-level components if
appropriate.
Due to some implementation details with Log4j 2 , this does not work for
integration tests; instead what we see is the node name for the last
node to startup. The implementation detail here is that Log4j 2 there is
only one logger for a name, message factory pair, and the key derived
from the message factory is the class name of the message factory. So,
when the last node starts up and starts setting prefixes on its message
factories, it will impact the loggers for the other nodes.
Additionally, the prefixes are lost when logging an exception. This is
due to another implementation detail in Log4j 2. Namely, since we log
exceptions using a parameterized message, Log4j 2 decides that that
means that we do not want to use the message factory that we have
provided (the prefix message factory) and so logs the exception without
the prefix.
This commit fixes both of these issues.
Relates #20429
This commit cuts over geo_point fields to use Lucene's new point-based LatLonPoint type for indexes created in 5.0. Indexes created prior to 5.0 continue to use their respective encoding type. Below is a description of the changes made to support the new encoding type:
* New indexes use a new LatLonPointFieldMapper which provides a parse method for the new type
* The new LatLonPoint parse method removes support for lat_lon and geohash parameters
* Backcompat testing for deprecated lat_lon and geohash parameters is added to all unit and integration tests
* LatLonPointFieldMapper provides DocValues support (enabled by default) which uses Lucene's new LatLonDocValuesField type
* New LatLonPoint field data classes are added for aggregation support (wraps LatLonPoint's Numeric Doc Values)
* MultiFields use the geohash as the string value instead of the lat,lon string making it easier to perform geo string queries on the geohash instead of a lat,lon comma delimited string.
Removed Features:
* With the removal of geohash indexing, GeoHashCellQuery support is removed for all new indexes (still supported on existing indexes)
* LatLonPoint does not support a Distance Range query because it is super inefficient. Instead, the geo_distance_range query should be accomplished using either the geo_distance aggregation, sorting by descending distance on a geo_distance query, or a boolean must not of the excluded distance (which is what the distance_range query did anyway).
TODO:
* fix/finish yaml changes for plugin and rest integration tests
* update documentation
When generating random bogus documents, it could happen that they contain both the terms "the" and "ultimate", which would match the query "the ultimate" better than all the other non bogus documents, which would cause testCrossFieldMode to fail. "the" is a term that's relatively likely to be randomly generated given its length; we can simply increase the minimum length of randomly generated terms to 5, so that there are no collisions, as "the" cannot be generated anymore (nor can "ultimate" as the lenght doesn't go up to 8).
Also made some assertions more accurate to check how many hits match a query rather than checking only that the first or second hits are there.
Closes#18873
This commit adds a -q/--quiet option to Elasticsearch so that it does not log anything in the console and closes stdout & stderr streams. This is useful for SystemD to avoid duplicate logs in both journalctl and /var/log/elasticsearch/elasticsearch.log while still allows the JVM to print error messages in stdout/stderr if needed.
closes#17220
This commit adds a health status parameter to the cat indices API for
filtering on indices that match the specified status (green|yellow|red).
Relates #20393
Previously when trying to listen on virtual interfaces during
bootstrap the application would stop working - the interface
couldn't be found by the NetworkUtils class.
The NetworkUtils utilize the underlying JDK NetworkInterface
class which, when asked to lookup by name only takes physical
interfaces into account, failing at virtual (or subinterfaces)
ones (returning null).
Note that when interating over all interfaces, both physical and
virtual ones are taken into account.
This changeset asks for all known interfaces, iterates over them
and matches on the given name as part of the loop, allowing it
to catch both physical and virtual interfaces.
As a result, elasticsearch can now also serve on virtual
interfaces.
A test case has been added which makes sure that all
iterable interfaces can be found by their respective name.
Note that this PR is a second iteration over the previously
merged but later reverted #19537 because it causes tests
to fail when interfaces are down. The test has been modified
to take this into account now.
Closes#17473Closes#19568
Relates #19537
Add docs to template support for _msearch
Relates to #10885
Relates to #15674
* Reference those docs from the rest api spec for _msearch/template support.
Match query throws parsing errors when an array of terms is provided, we should test that to make sure this behaviour doesn't change.
Relates to #15741
Once a primary is marked as relocated, we can not safely move it back to started (as we have no way of waiting on inflight operations that are performed on the target primary). If the master cancels the relocation in that state, we fail the primary. Sadly, there is a racing condition between the `updateRoutingEntry` method (which is called when the relocation is cancelled by the master) and the `relocated` method. That racing condition can leave the shard as marked "relocated" but have the routing entry not reflect the target relocation. This in turns causes NPEs in TransportReplicationAction:
```
java.util.Objects requireNonNull Objects.java 203
org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteShardRequest <init> TransportReplicationAction.java 982
```
Sadly, once we end up in this state, we will never recover.
This commit fixes that race condition by making sure `updateRoutingEntry` acquires the mutex when checking for the relocated status. While at it, I also tightened up the code and added lots of assertions/hard checks.
Adds the entire DiscoveryNode object to the trace log in AllocationDeciders.
The allocation decider logging at TRACE level can sometimes be helpful to determine why a shard is not getting allocated on specific nodes. Currently, we only log the node id for these messages. It will be helpful to also include the node name (esp. when dealing with a lot of nodes in the cluster).