This is one of our esoteric metadata mappers so I think we should distribute
it in a plugin rather than in elasticsearch core.
This introduces one limitation: the value of the `_size` parameter is not
retrievable for documents that are only in the transaction log.
The `multi_match` query groups terms that have the same analyzer together and
then applies the boost of the first query in each group. This is not necessary
given that boosts for each term are already applied another way.
Versions can be tricky with linux vendors and such. To help debug any possible issues, we should output a better version.
Today:
```
[elasticsearch] java.lang.RuntimeException: Java version: 1.7.0_55 suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-8024830 which can cause data corruption.
[elasticsearch] Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.
[elasticsearch] If you absolutely cannot upgrade, please add -XX:-UseSuperWord to the JAVA_OPTS environment variable.
[elasticsearch] Upgrading is preferred, this workaround will result in degraded performance.
[elasticsearch] at org.elasticsearch.bootstrap.JVMCheck.check(JVMCheck.java:121)
[elasticsearch] at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:270)
[elasticsearch] at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:28)
```
With patch:
```
java.lang.RuntimeException: Java version: Oracle Corporation 1.7.0_40 [Java HotSpot(TM) 64-Bit Server VM 24.0-b56] suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-8024830 which can cause data corruption.
Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.
If you absolutely cannot upgrade, please add -XX:-UseSuperWord to the JAVA_OPTS environment variable.
Upgrading is preferred, this workaround will result in degraded performance.
at org.elasticsearch.bootstrap.JVMCheck.check(JVMCheck.java:121)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:270)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:28)
```
This commit adds a new API to allow scripts to say whether they need scores.
In practice, only the `expression` script engine makes use of it correctly,
other engines just return `true` since they can't predict whether they'll
need scores. This should make scripted aggregations and `function_query`
faster as we'll now be able to pass needsScores=false to Query.createWeight.
Adding a named query that is null can lead to a NullPointerException
when copying the named queries. This is due to an implementation detail
in QueryParseContent.copyNamedQueries. In particular, this method uses
com.google.common.collect.ImmutableMap.copyOf. A documented requirement
of ImmutableMap is that none of the entries have a null key nor null
value. Therefore, we should not add such queries to the namedQueries
map. This will not change any behavior since Map.get returns null if no
entry with the given key exists anyway.
Closes#12683
This change improves the `function_score` query to not compute scores at all
when they are not needed, and to not compute scores on the underlying query
when the combine function is to replace the score with the scores of the
functions.
This commit makes it possible to serialize arbitrary objects by having them extend Writeable. When reading them though, we need to be able to identify which object we have to create, based on its name. This is useful for queries once we move to parsing on the coordinating node, as well as with aggregations and so on.
Introduced a new abstraction called NamedWriteable, which is supported by StreamOutput and StreamInput through writeNamedWriteable and readNamedWriteable methods. A new NamedWriteableRegistry is introduced also where named writeable prototypes need to be registered so that we are able to retrieve the proper instance of the writeable given its name and then de-serialize it calling readFrom against it.
Closes#12393
In order to avoid extra reroutes, `RoutingService` should avoid
scheduling a reroute of any shards where the delay is negative. To make
sure that we don't encounter a race condition between the
GatewayAllocator thinking a shard is delayed and RoutingService thinking
it is not, the GatewayAllocator will update the RoutingService with the
last time it checked in order to use a consistent "view" of the delay.
Resolves#12456
Relates to #12515 and #12456
When postFilter generates a token that is identical to the input term
DirectCandidateGenerator should not preFilter this token. If postFilter
and preFilter are the same analyzer instance it would fail with :
"TokenStream contract violation: close() call missing"
This is a forward port of @nomoa's #12670
Today we try to detect if there is an index existing in the directory
and if not we create one. This can be tricky and errof prone since we
rely on the filesystem without taking the context into account when the
engine gets created. We know in all situations if the index should be created
so we can just use this infromation and rely on the lucene index writer to barf
if we hit a situations where we can't append to an index while we should.
Today we miss to throw / rethrow an recovery exception if it happens during
the finalization of phase 1 if the source files are not affected. Even worse
this can cause some dataloss if the reason for this exception is a failure of
deleting a corruption marker or similar pre-existing corruptions since we continue
with the recovery and mark the target shared as started which will in-turn open
an engine with an empty index.
This makes the output of EsThreadPoolExecutor#toString less pretty but
we no longer have funky hacky that rely on the specific format of the
toString produced by ThreadPoolExecutor which isn't part of its API and
could change with any JVM version and break the output.
Improving the toString allows for nicer error reporting. Also cleaned up
the way that EsRejectedExecutionException notices that it was rejected
from a shutdown thread pool. I left javadocs about how its not 100% correct
but good enough for most uses.
The improved toString on EsThreadPoolExecutor mean every one of them needs
a name. In most cases the name to use is obvious. In tests I use the name
of the test method and in real thread pools I use the name of the thread
pool. In non-ThreadPool executors I use the thread's name.
Closes#9732
In order to support the URL notation including a user/pass combination
(like http://user:pass@host/plugin.zip) the auth info needs to be added
manually.
Today this is "unofficial" as conf/scripts, but some people
want to share scripts across different nodes and so on. Because
they cannot configure it, they are forced to use dirty hacks
like symbolic links, which isnt going to work: we aren't going
to recursively scan conf/ and add permissions to all link targets
underneath it, thats crazy.
I really hate adding yet another configuration knob here, but
users resorting to using symlinks are going to be frustrated,
and do things in a more insecure way.
The URL to download the main elasticsearch plugins did not match
what the S3 wageon is supposed to write to.
In addition this PR adds support for snapshots to access the
snapshot S3 bucket, so we can possibly download snapshot versions
of plugins.
The format of the URLs stems from #12270Closes#12632
Conflicting mappings that were allowed before v2.0 can cause runaway shard failures on upgrade. This commit adds a check that prevents a cluster from starting if it contains such indices as well as restoring such indices from a snapshot into already running cluster.
Closes#11857
Currently when we delete files belonging to deleted snapshots we issue one delete command to underlying snapshot store at a time. Some repositories can benefit from bulk deletes of multiple files.
Closes#12533
This method syncs the translog unless it's already synced. If the engine
is alreayd closed we are guaranteed to be synced already such that we can just
ignore this exception.
Closes#12603
this change was added recently which uses default timezone for the creation
date on CAT endpoints. We should be consistent and use UTC across the board.
This commit adds #getDefaultTimzone() to forbidden API and fixes the REST tests.
Relates to #11688
The Streams.copyTo(String|Bytes)FromClasspath() methods resolve resources using org.elasticsearch.io.Streams classloader. This is fine in elasticsearch core and when running tests but if used in a plugin this can lead to FileNotFoundExceptions at runtime because plugin are loaded in a dedicated classloader.
Most of the abstract base test classes we have were previously @Ignored.
However, there were also some other tests ignored. Having two ways to
quiet tests is confusing, and clearly it has caused some tests
to get lost in the fold.
This change moves all base test classes to use the "TestCase" suffix,
which is not picked up by the test class name pattern. It also removes
@Ignore from (almost) all tests, and adds it to forbidden apis.
And since we were renaming, I shorted base test class names to use
"ES" instead of "Elasticsearch". I type this a lot of types a day,
and I have heard others express a similar desire for a shorter name.
closes#10659
Now that integ tests are moved into `mvn verify`, we don't really have
a need for @Slow, and especially not @Integration. This removes
uses of the first, and completely removes uses of the latter.
Tasks can be registered with a timeout, which runs as a task in a separate
threadpool. The idea is that the timeout runner cancels the main task when
the time is out, and the timeout runner is cancelled when the main task
starts executing. However, the following statement:
```java
timeoutFuture = timer.schedule(new Runnable() {
@Override
public void run() {
if (remove(TieBreakingPrioritizedRunnable.this)) {
runAndClean(timeoutCallback);
}
}
}, timeValue.nanos(), TimeUnit.NANOSECONDS);
```
is not atomic: the removal task is first started, and then the (volatile)
variable is assigned. As a consequence, there is a short window that allows
a timeout task to wait until the time is out even if the task is already
completed.
See http://build-us-00.elastic.co/job/es_core_17_centos/496/ for an example of
such a failure.
Previously only the first aggregation in a buckets_path was check to make sure the aggregation existed. Now the whole path is checked to ensure an aggregation exists at each element in the buckets_path
Closes#12360
Its not going to work: its blocked by security policy
and will just add a confusing SecurityException to the mix, and
bogusly give an exit status of 0 when in fact something bad happened.
Finally, if ES can't startup, it is a serious problem, there is
no sense in hiding the reason why: deliver the full stack trace.
The repository verification process should create a subdirectory to make sure we check permission of newly created directories in case elasticsearch processes on different nodes are running using different uids and creating blobs with incompatible permissions.
Closes#11611
Previously we issued a reroute when a node went over the high watermark
in order to move shards away from the node. This change tracks nodes
that have previously been over the high or low watermarks and issues a
reroute when the node goes back underneath the watermark.
This allows shards that may be unassigned to be assigned back to a node
that was previously over the low watermark but no longer is.
Resolves#12422
As windows has different line endings and this has
already been fixed by another test, the method has been
moved into CliToolTestCase.
In addition one test has been removed, as it was redundant.
In order to ensure, we have the same experience across operating systems
and shells, this commit uses the java CLI parser instead of the shell
getopt parsing to parse arguments.
This also allows for support for paths, which contain spaces.
Also commons-cli depdency was upgraded to 1.3.1 and tests have been added.
Changes
* new exit code, OK_AND_EXIT, allowing to tell the caller to exit, as everything
went as expected (e.g. when running a version output)
BWC breaking:
* execute() returns an ExitStatus instead of an integer, otherwise there is no
possibility to signal by a command, if the JVM should be exited after a run.
This affects plugins, that have command line tools
* -v used to be version, but is a verbose flag by default in the current CLI infra,
must be -V or --version now
* -X has been removed - the current implementation was useless anyway, as
it prefixed those properties with "es.". You should use
ES_JAVA_OPTS/JAVA_OPTS for JVM configuration
Date math index name resolution enables you to search a range of time-series indices, rather than searching all of your time-series indices and filtering the the results or maintaining aliases. Limiting the number of indices that are searched reduces the load on the cluster and improves execution performance. For example, if you are searching for errors in your daily logs, you can use a date math name template to restrict the search to the past two days.
The added `ExpressionResolver` implementation that is responsible for resolving date math expressions in index names. This resolver is evaluated before wildcard expressions are evaluated.
The supported format: `<static_name{date_math_expr{date_format|timezone_id}}>` and the date math expressions must be enclosed within angle brackets. The `date_format` is optional and defaults to `YYYY.MM.dd`. The `timezone_id` id is optional too and defaults to `utc`.
The `{` character can be escaped by places `\\` before it.
Closes#12059