This behavior was recently changed to throw an IAE if
the translog we try to read from is already outdated. This is not
the expected behavior, and this commit restores the `old` way of returning
`null` instead. The InternalEngine implementation will then go and ask the
Lucene index for the document instead.
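A minimal sketch of the fallback, not the actual InternalEngine code. `TranslogSketch`, `EngineSketch`, the generation counter and the map that stands in for the Lucene index are all hypothetical names used for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the translog: a location is only readable
// while its generation is still the current one.
final class TranslogSketch {
    private final Map<Long, String> operations = new HashMap<>();
    private long currentGeneration = 1;

    // Stores an operation and returns the generation it was written to.
    long add(long offset, String source) {
        operations.put(offset, source);
        return currentGeneration;
    }

    // Simulates a flush that makes all earlier locations outdated.
    void rollGeneration() {
        operations.clear();
        currentGeneration++;
    }

    // Returns null for an outdated location instead of throwing.
    String read(long generation, long offset) {
        if (generation != currentGeneration) {
            return null;
        }
        return operations.get(offset);
    }
}

// Hypothetical engine: a plain map stands in for the Lucene index.
final class EngineSketch {
    private final TranslogSketch translog;
    private final Map<String, String> luceneIndex;

    EngineSketch(TranslogSketch translog, Map<String, String> luceneIndex) {
        this.translog = translog;
        this.luceneIndex = luceneIndex;
    }

    // Tries the translog first and falls back to the index on null.
    String get(String id, long generation, long offset) {
        String fromTranslog = translog.read(generation, offset);
        return fromTranslog != null ? fromTranslog : luceneIndex.get(id);
    }
}
```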
In some cases, depending on a `rarely()` check, the `indexRandom()` method
can flush, which creates flush requests that lack a certain
header in this test and cause the test to fail.
In addition unused configuration code for this test has been removed.
We used to double write the translog operation, which is not needed except
for recovery. This commit cuts over to a big-array based temporary serialization
and removes the crazy double writing.
Now that mapping updates are synchronous, it is not necessary to send mappings
to the master node during the recovery process anymore: they will already be on
the master node since we ensure mappings are on the master node before indexing.
Mappings conflicts should not be ignored. If I read the history correctly, this
option was added when a mapping update to an existing field was considered a
conflict, even if the new mapping was exactly the same. Now that mapping updates
are smart enough to detect conflicting options, we don't need an option to
ignore conflicts.
Whenever a query parser (or any other component) issues another
request as part of a request, the headers and the context have to
be supplied as well.
In order to do this, the `SearchContext` has to have those headers
available, which in turn means the shard level request needs to
copy those from the original `SearchRequest`.
This commit introduces two new interfaces to supply the needed methods
to work with context and headers.
Closes #10979
This is a leftover from the times where we failed a flush when
recoveries are ongoing. This code is really not needed anymore and
we can luckily flush the translog all the time as well.
ShardOperationFailureException implementations already provide structured
exception support, but it's not yet exposed on the interface. This change
allows nice rendering of structured REST exceptions even if searches fail on
only a subset of the shards etc.
Closes #11017
This commit fixes the name of the update_snapshot task from something like "update_snapshot [org.elasticsearch.cluster.metadata.SnapshotMetaData$Entry@de00bc50]" to a more readable "update_snapshot [test-snap]"
When recovering the primary to any node in a shared filesystem
environment, we should respect the allocation deciders if possible (still
force-allocating to another node if there aren't any "YES" decisions).
The AllocationDeciders should take precedence over the shard state
version when force-allocating an unassigned primary shard.
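The precedence rule can be sketched roughly as follows. This is an illustration under assumptions, not the allocator code itself: `Candidate`, `Decision` and `chooseNode` are hypothetical names, and the decider outcome is reduced to YES or NO.

```java
import java.util.List;

final class PrimaryAllocationSketch {
    enum Decision { YES, NO }

    // Hypothetical view of a node that holds a copy of the shard.
    static final class Candidate {
        final String nodeId;
        final long shardStateVersion;
        final Decision decision;

        Candidate(String nodeId, long shardStateVersion, Decision decision) {
            this.nodeId = nodeId;
            this.shardStateVersion = shardStateVersion;
            this.decision = decision;
        }
    }

    // Prefers nodes the deciders accept, even when a rejected node has a
    // higher shard state version. Only when no node is accepted does it
    // fall back to force-allocating to the highest version overall.
    static String chooseNode(List<Candidate> candidates) {
        Candidate bestYes = null;
        Candidate bestAny = null;
        for (Candidate c : candidates) {
            if (bestAny == null || c.shardStateVersion > bestAny.shardStateVersion) {
                bestAny = c;
            }
            if (c.decision == Decision.YES
                    && (bestYes == null || c.shardStateVersion > bestYes.shardStateVersion)) {
                bestYes = c;
            }
        }
        Candidate chosen = bestYes != null ? bestYes : bestAny;
        return chosen == null ? null : chosen.nodeId;
    }
}
```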
Closes #11192, which I accidentally already closed.
Squashed commit of the following:
commit f23faccddc2a77a880841da4c89c494edaa2aa46
Author: Robert Muir <rmuir@apache.org>
Date: Fri May 15 16:04:55 2015 -0400
Simplify this FileUtils even more: it's either from the filesystem, or the classpath,
not both. It's already trying 4 different combinations of crazy paths for either of these anyway.
commit c7016c8a2b5a6043e2ded4b48b160821ba196974
Author: Robert Muir <rmuir@apache.org>
Date: Fri May 15 14:21:37 2015 -0400
include rest tests in test-framework jar
Now that lucene provides a way to identify if the warming reader is
the first initial opened reader we can detach this class from the
enclosing and make it static. This is important since it might access
not fully initialized members of the enclosing class since it's initialized
and used during constructor invocation.
The default `false` for `require_field_match` is a bit odd and confusing for users, given that field names get ignored by default and every field gets highlighted if it contains terms extracted out of the query, regardless of which fields were queried. Changed the default to `true`; it can still be changed per request.
Closes #10627
Closes #11067
Our own fork of the lucene PostingsHighlighter is not easy to maintain and doesn't give us any added value at this point. In particular, it was introduced to support the require_field_match option and discrete per value highlighting, used in case one wants to highlight the whole content of a field, but get back one snippet per value. These two features won't
make it into lucene as they slow things down and shouldn't have been supported from day one on our end probably.
One other customization we had was support for a wider range of queries via custom rewrite etc. (yet another way to slow
things down), which got added to lucene and works much much better than what we used to do (instead of or rewrite,
terms are pulled out of the automata for multi term queries).
Removing our fork means the following in terms of features:
- dropped support for require_field_match: the postings highlighter will only highlight fields that were queried
- some custom es queries won't be supported anymore, meaning they won't be highlighted. The only one I found up until now is the phrase_prefix. Postings highlighter rewrites against an empty reader to avoid slow operations (like the ones that we were performing with the fork that we are removing here), thus the prefix will not be expanded to any term. What the postings highlighter does instead is pulling the automata out of multi term queries, but this is not supported at the moment with our MultiPhrasePrefixQuery.
Closes #10625
Closes #11077
The underlying automaton-backed implementation throws an error if there are too many states.
This fix changes to using an implementation based on Set lookups for lists of excluded terms.
If the global-ordinals execution mode is in effect this implementation also addresses the slowness identified in issue 11181 which is caused by traversing the TermsEnum - instead the excluded terms’ global ordinals are looked up individually and unset the bits of acceptable terms. This is significantly faster.
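A rough sketch of both ideas, assuming a sorted term array stands in for the global ordinals (index equals ordinal). `ExcludeTermsSketch` and its methods are hypothetical and are not the aggregation's real classes.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Set;

final class ExcludeTermsSketch {
    // Per-term check: a hash Set lookup has no limit on the number of
    // excluded terms, unlike an automaton built from the whole list.
    static boolean accept(String term, Set<String> excluded) {
        return !excluded.contains(term);
    }

    // Global-ordinals mode: sortedTerms[i] is the term with ordinal i.
    // Start with every ordinal accepted, then look up each excluded term
    // individually and clear its bit, instead of walking all terms.
    static BitSet acceptedOrdinals(String[] sortedTerms, Set<String> excluded) {
        BitSet accepted = new BitSet(sortedTerms.length);
        accepted.set(0, sortedTerms.length);
        for (String term : excluded) {
            int ord = Arrays.binarySearch(sortedTerms, term);
            if (ord >= 0) {
                accepted.clear(ord);
            }
        }
        return accepted;
    }
}
```

The cost of building the accepted set then depends on the number of excluded terms rather than on the number of terms in the field.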
Closes #11176
When scrolling, SCAN previously collected documents until it reached where it
had stopped on the previous iteration. This makes pagination slower and slower
as you request deep pages. With this change, SCAN now directly jumps to the
doc ID where it had previously stopped.
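The difference can be illustrated with a sorted array of matching doc IDs. This is only a sketch of the idea of jumping straight to the last position; `ScanPagingSketch` and `nextPage` are hypothetical, and the real change works on Lucene readers rather than arrays.

```java
import java.util.Arrays;

final class ScanPagingSketch {
    // Returns up to `size` doc IDs strictly greater than lastDocId.
    // The start position is found by binary search, so the cost of a page
    // does not grow with how deep into the result set it is.
    // Pass -1 as lastDocId for the first page.
    static int[] nextPage(int[] sortedDocIds, int lastDocId, int size) {
        int pos = Arrays.binarySearch(sortedDocIds, lastDocId + 1);
        int start = pos >= 0 ? pos : -(pos + 1);
        int end = Math.min(sortedDocIds.length, start + size);
        return Arrays.copyOfRange(sortedDocIds, start, end);
    }
}
```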
Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new `missing` option which defines the value to consider when a
field does not have a value. This can be handy if you e.g. want a terms
aggregation to treat documents that have "N/A" or no value
for a `tag` field the same way.
This works in a very similar way to the `missing` option on the `sort`
element.
One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the `missing` value
but might not know what kind of values source should be produced (numerics,
strings, geo points?). For this reason, we might want to add an `unmapped_type`
option in the future like we did for sorting.
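The effect of `missing` on a terms aggregation can be sketched as follows. This is a simplified model, assuming one optional string value per document; `MissingValueSketch` and `termsCounts` are hypothetical names and not part of the aggregation framework.

```java
import java.util.Map;
import java.util.TreeMap;

final class MissingValueSketch {
    // Counts documents per term. A null entry models a document without a
    // value for the field: it is counted under `missing` when that option
    // is set, and skipped when `missing` is null.
    static Map<String, Integer> termsCounts(String[] tagPerDoc, String missing) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : tagPerDoc) {
            String value = tag != null ? tag : missing;
            if (value == null) {
                continue;
            }
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }
}
```

With `missing` set to "N/A", documents without a `tag` end up in the same bucket as documents that explicitly carry "N/A".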
Related to #5324
When specifying relative paths on startup, handling plugin
paths failed due to a recently added security fix. This fix
ensures normalization of the plugin path as well.
In addition a new matcher has been added to easily check for a
status code of an HTTP response like this:
assertThat(response, hasStatus(OK));
Closes #10958
When an index setting is invalid and fails to be set, a WARN statement
is logged but it doesn't contain the index name, making tracking down
and fixing the problem more difficult. This commit adds the index name
to the log statement.
Previously, the collate feature would be executed on all shards of an index using the client.
This leads to a deadlock when concurrent collate requests are run from the _search API,
due to the fact that both the external request and internal collate requests use the
same search threadpool.
As phrase suggestions are generated from the terms of the local shard, in most cases a
generated suggestion that does not yield a hit for the collate query on the local shard
would not yield a hit for the collate query on non-local shards either.
Instead of using the client for collating suggestions, the collate query is executed against
the ContextIndexSearcher. This PR removes the ability to specify a preference for a collate
query, as the collate query is only run on the local shard.
closes #9377
This adds back the ability to disable _source, as well as set includes
and excludes. However, it also restricts these settings to not be
updateable. `enabled` was actually already not modifiable, but no
conflict was previously reported if an attempt was made to change it.
This also adds a check that can be made on the source mapper to
know if the source is "complete" and can be used for
purposes other than returning in search or get requests. There is
one example use here in highlighting, but more need to be added
in a follow up issue (e.g. in the update API).
closes #11116
Add methods to operate on multi-valued fields in the expressions language.
Note that users will still not be able to access individual values
within a multi-valued field.
The following methods will be included:
* min
* max
* avg
* median
* count
* sum
Additionally, changes have been made to MultiValueMode to support the
new median method.
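As an illustration of what the six methods compute over the values of one multi-valued field, here is a plain Java sketch. `MultiValueSketch` is a hypothetical helper, not the expressions or MultiValueMode code, and taking the mean of the two middle values for an even count is an assumption made here.

```java
import java.util.Arrays;

final class MultiValueSketch {
    // All methods expect at least one value.
    static double min(double[] values) {
        return Arrays.stream(values).min().getAsDouble();
    }

    static double max(double[] values) {
        return Arrays.stream(values).max().getAsDouble();
    }

    static double sum(double[] values) {
        return Arrays.stream(values).sum();
    }

    static int count(double[] values) {
        return values.length;
    }

    static double avg(double[] values) {
        return sum(values) / values.length;
    }

    // Sorts a copy so the caller's array is left untouched.
    // Assumption: for an even count, the mean of the two middle values.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```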
closes #11105
- Renamed TranslogSnapshot to MultiSnapshot
- moved legacy logic for truncation into LegacyTranslogReaderBase
- made several methods private and pkg private where applicable
- renamed arguments for consistency
Today we barf if repositories are unregistered with a `*` pattern. This
happens on almost every test and adds weird log messages. I don't think
we should barf in that case.
Closes #11113
We have some builders, specifically query builders, `SearchSourceBuilder`, `QuerySourceBuilder` and `SuggestBuilder`, that implement `ToXContent` and also allow to build their content as bytes by simply creating a `BytesReference` that holds their json (or yaml etc.) content (`buildAsBytes` methods). They can also print out their content through `toString`. Made sure that those common methods are in one single place and reused where needed.
Also, merged `QueryBuilder` and `BaseQueryBuilder` and made `QueryBuilder` an abstract class instead of an interface.
Closes #11063
We parse the rewrite field in FuzzyQueryParser but we don't allow to set it via FuzzyQueryBuilder for our java api users. Added missing field and setter.
Closes #11130
Closes #11139
The esoteric classifier contains in particular maps that take bytes or doubles
as keys. In the byte case, we can just use integer, and in the double case we
can use their long bits instead.
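The double case can be sketched like this. `java.util.HashMap` is used here only as a stand-in for the primitive-keyed maps the commit refers to, and `DoubleKeyedCounts` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

final class DoubleKeyedCounts {
    // Keys are the long bits of the double, so no double-keyed map is needed.
    private final Map<Long, Integer> counts = new HashMap<>();

    void increment(double key) {
        counts.merge(Double.doubleToLongBits(key), 1, Integer::sum);
    }

    int get(double key) {
        return counts.getOrDefault(Double.doubleToLongBits(key), 0);
    }

    // The byte case is simpler: every byte widens to an int without loss.
    static int byteKey(byte key) {
        return key;
    }
}
```

One thing to keep in mind with this encoding is that `0.0` and `-0.0` have different long bits and therefore map to different keys.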
Today we almost intentionally corrupt the translog if we lose
a node due to power loss or similar disasters. In the translog reading
code we simply read until we hit an EOF exception, ignoring the rest of the
translog file once hit. There is no information stored about how many records
we are expecting or what the last written offset was.
This commit restructures the translog to add checkpoints that are written
with every sync operation recording the number of synced operations as well
as the last synced offset. These checkpoints are also used to identify the actual
transaction log file to open instead of relying on directory traversal.
This change adds a significant amount of additional checks and pickiness to the translog
code. For instance, the translog is now associated with a specific engine via a UUID that is
written to each translog file as part of its header. If an engine opens a translog file it
was not associated with, the operation will fail.
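To illustrate the checkpoint idea, here is a minimal sketch. The field layout, the class name `CheckpointSketch` and its methods are assumptions made for this example and do not describe the on-disk format the commit introduces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class CheckpointSketch {
    final int numOps;   // number of operations covered by the last sync
    final long offset;  // last synced offset in the translog file

    CheckpointSketch(int numOps, long offset) {
        this.numOps = numOps;
        this.offset = offset;
    }

    // Written on every sync (hypothetical layout: int followed by long).
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(numOps);
            out.writeLong(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static CheckpointSketch fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new CheckpointSketch(in.readInt(), in.readLong());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads exactly the synced region and fails loudly if the file is
    // shorter than recorded, instead of reading until EOF and guessing.
    byte[] syncedRegion(byte[] translogFile) {
        if (offset > translogFile.length) {
            throw new IllegalStateException("translog is shorter than checkpoint offset " + offset);
        }
        return Arrays.copyOf(translogFile, (int) offset);
    }
}
```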
Closes #10933
Relates to #11011
These clauses filter the document space without affecting scoring and map to
Lucene's BooleanClause.Occur.FILTER. The `filtered` query is now deprecated and
```json
{
"filtered": {
"query": { //query },
"filter": { //filter }
}
}
```
should be replaced with
```json
{
"bool": {
"must": { //query },
"filter": { //filter }
}
}
```
In case FieldNameAnalyzer does not find an explicit analyzer for a given
field name, it returns the default analyzer. This behaviour can hide bugs
where the analyzer fails to be propagated to FieldNameAnalyzer or an
analyzer is requested for a field which is not mapped.
We are using a VLong to serialize PercolateResponse#tookInMillis. Due to
several `System.currentTimeMillis()` implementation details, this value can be negative.
We should prevent the negative value from being serialized as a VLong and make sure
we use a valid value for this in the first place.
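A small sketch of why a negative value is a problem and how it can be guarded against. `VLongSketch` is a hypothetical class: `writeVLong` is a generic variable-length encoder written for this example, not the stream implementation used in the codebase, and clamping to zero is an assumption about what a "valid value" means.

```java
import java.io.ByteArrayOutputStream;

final class VLongSketch {
    // Variable-length encoding: 7 bits per byte, high bit set on every
    // byte except the last. Negative values are rejected up front.
    static byte[] writeVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("cannot write negative value as VLong: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // The wall clock can go backwards, so the difference is clamped
    // (assumption: zero is an acceptable lower bound).
    static long tookInMillis(long startMillis, long nowMillis) {
        return Math.max(0, nowMillis - startMillis);
    }
}
```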
Closes #11138
A few meta fields can currently be set within a document's source.
However, the recommended way to set meta fields like this is through
the api, and setting within the document can be a performance trap
(e.g. needing to find _id in order to route the document).
This change removes the ability to set meta fields within
a document source for 2.0+ indexes.
closes #11051
closes #11074
Several plugins (e.g. elasticsearch-cloud-aws, elasticsearch-cloud-azure, elasticsearch-cloud-gce)
have integration tests that run with actual credentials to a remote service, so test runs
need access to this file.
These all require the tester (or jenkins) to supply the file with -Dtests.config.
In `NodeEnvironment.deleteShardDirectoryUnderLock`, we will now attempt
to acquire, then release, the `write.lock` file for the Lucene index in
question to ensure that no other `IndexWriter` has the directory open
before deleting the data.
Note that the `write.lock` file must be released before the actual
deletion in order to allow the directory to be deleted.
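The sequence can be sketched with plain NIO file locks. This is only an approximation: the real code goes through Lucene's locking, and `ShardDeleteSketch` and `deleteIndexDirectory` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.stream.Stream;

final class ShardDeleteSketch {
    static void deleteIndexDirectory(Path indexDir) {
        Path lockFile = indexDir.resolve("write.lock");
        try {
            // Step 1: acquire the lock to prove no other writer holds it,
            // then release it again before touching any files.
            try (FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                FileLock lock = channel.tryLock();
                if (lock == null) {
                    throw new IllegalStateException("directory is locked by another writer: " + indexDir);
                }
                lock.release();
            }
            // Step 2: delete children before parents, lock file included.
            try (Stream<Path> paths = Files.walk(indexDir)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```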
Fixes #11097
Today we enforce blocking, which doesn't really fit the elasticsearch model.
This commit adds async execution to the synced flush service by passing an
ActionListener to the service and returning immediately.
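The listener-based pattern looks roughly like this. The `Listener` interface below is a hypothetical two-method callback defined for the example, and `AsyncFlushSketch` with its placeholder `flush` method is not the synced flush service.

```java
import java.util.concurrent.ExecutorService;

final class AsyncFlushSketch {
    // Hypothetical callback with a success and a failure path.
    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Throwable failure);
    }

    private final ExecutorService executor;

    AsyncFlushSketch(ExecutorService executor) {
        this.executor = executor;
    }

    // Returns immediately; the outcome is delivered to the listener on an
    // executor thread instead of blocking the caller.
    void attemptSyncedFlush(String shardId, Listener<String> listener) {
        executor.execute(() -> {
            String result;
            try {
                result = flush(shardId);
            } catch (RuntimeException e) {
                listener.onFailure(e);
                return;
            }
            listener.onResponse(result);
        });
    }

    // Placeholder for the actual work.
    private String flush(String shardId) {
        return "synced:" + shardId;
    }
}
```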
If the user has set a shard_min_doc_count setting then avoid looking up background frequencies if the term fails to meet the foreground threshold on a shard.
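A sketch of the short-circuit, with hypothetical names (`SignificantTermsSketch`, `backgroundFrequency`) and a function argument standing in for the expensive background frequency lookup.

```java
import java.util.OptionalLong;
import java.util.function.ToLongFunction;

final class SignificantTermsSketch {
    // Skips the background lookup entirely when the term cannot pass the
    // shard-level foreground threshold anyway.
    static OptionalLong backgroundFrequency(String term,
                                            long foregroundDocCount,
                                            long shardMinDocCount,
                                            ToLongFunction<String> backgroundLookup) {
        if (foregroundDocCount < shardMinDocCount) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(backgroundLookup.applyAsLong(term));
    }
}
```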
Closes #11093
* Properly support symlinks (e.g. /tmp -> /mnt/tmp)
* Check all configured paths up front and deliver the best exception we can when things are wrong
* Initialize securitymanager earlier
* Fix too-loud error logging of Natives root check
As a follow up to #10870, this removes support for
index templates on disk. It also removes a missed
place still allowing disk based mappings.
closes #11052
When the longitude is zero for a document, the left and right bounds do not get updated in the geo bounds aggregation, which can cause the bounds to be returned with Infinite values for longitude.
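One way such a bug can arise is sketched below, under the assumption (not stated above) that bounds are tracked separately for non-negative and negative longitudes and start out infinite. `GeoBoundsSketch` is a hypothetical class; the point is only that a longitude of exactly zero must fall into one of the two branches.

```java
final class GeoBoundsSketch {
    double posLeft = Double.POSITIVE_INFINITY;
    double posRight = Double.NEGATIVE_INFINITY;
    double negLeft = Double.POSITIVE_INFINITY;
    double negRight = Double.NEGATIVE_INFINITY;

    // With `lon > 0` and `lon < 0` as the two conditions, a longitude of
    // exactly zero would update nothing. Using `>= 0` covers that case.
    void collect(double lon) {
        if (lon >= 0) {
            posLeft = Math.min(posLeft, lon);
            posRight = Math.max(posRight, lon);
        } else {
            negLeft = Math.min(negLeft, lon);
            negRight = Math.max(negRight, lon);
        }
    }
}
```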
Closes #11085
There currently are small differences between search api and count, exists, validate query, explain api when it comes to reading query_string parameters. `analyze_wildcard`, `lowercase_expanded_terms` and `lenient` are only read by the search api and ignored by all other mentioned apis. Unified code to fix this and make sure it doesn't happen again. Also shared some code when it comes to printing out the query as part of SearchSourceBuilder conversion to ToXContent.
Extended REST spec to include all the supported params (some that were already supported weren't listed), and added REST tests (also some basic tests for count and search_exists which weren't tested at all).
Closes #11057
BytesQueryBuilder was introduced to be used internally by the phrase suggester and its collate feature. It ended up being exposed via Java api but the existing WrapperQueryBuilder could be used instead. Added WrapperQueryBuilder constructor that accepts a BytesReference as argument.
One other reason why this query builder should be removed is that it gets in the way of the query parsers refactoring, given that it's the only query builder that allows to build a query through java api without having a respective query parser.
Closes #10919
Different responses hold the shards header: search, count, flush etc. The code was duplicated in two different places and is now centralized in RestActions.
It turns out that only the search response printed out the status field before the reason; this has now been added to all other broadcast responses too.
Closes #11064
Dynamic settings have to be injected into the constructor with either @ClusterDynamicSettings or @IndexDynamicSettings. If the annotations are not specified, an empty instance of DynamicSettings is injected, which can lead to difficult-to-discover errors such as #10614. This commit makes any attempt to inject unannotated dynamic settings generate a Guice error.
Our ThreadPool constructor creates a couple of threads (scheduler and timer) which might not get shut down if the initialization of a node fails. A guice error might occur for example, which causes the InternalNode constructor to throw an exception. In this case the two threads are left behind, which is not a big problem when running es standalone as the error will be intercepted and the jvm will be stopped as a whole. It can become more of a problem though when running es in embedded mode, as we'll end up with lingering threads, or when testing the handling of initialization failures.
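The cleanup pattern can be sketched as follows. `NodeSketch` is a hypothetical class: the supplier stands in for creating the thread pool and the runnable for the rest of node initialization (for example dependency injection).

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.function.Supplier;

final class NodeSketch {
    private final ScheduledExecutorService scheduler;

    NodeSketch(Supplier<ScheduledExecutorService> threadPoolFactory,
               Runnable remainingInitialization) {
        ScheduledExecutorService created = threadPoolFactory.get();
        boolean success = false;
        try {
            // Anything in here may throw after the pool already exists.
            remainingInitialization.run();
            success = true;
        } finally {
            if (!success) {
                // Do not leave the pool's threads behind on failure.
                created.shutdownNow();
            }
        }
        this.scheduler = created;
    }

    void close() {
        scheduler.shutdownNow();
    }
}
```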
Closes #9107