Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new `missing` option which defines the value to consider when a
field does not have a value. This can be handy if you eg. want a terms
aggregation to handle the same way documents that have "N/A" or no value
for a `tag` field.
This works in a very similar way to the `missing` option on the `sort`
element.
One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the `missing` value
but might not know what kind of values source should be produced (numerics,
strings, geo points?). For this reason, we might want to add an `unmapped_type`
option in the future like we did for sorting.
Related to #5324
When specifying relative paths on startup, handling plugin
paths failed due to recently added security fix. This fix
ensures normalization of the plugin path as well.
In addition a new matcher has been added to easily check for a
status code of an HTTP response likes this
assertThat(response, hasStatus(OK));
Closes#10958
When an index setting is invalid and fails to be set, a WARN statement
is logged but it doesn't contain the index name, making tracking down
and fixing the problem more difficult. This commit adds the index name
to the log statement.
Previously, collate feature would be executed on all shards of an index using the client,
this leads to a deadlock when concurrent collate requests are run from the _search API,
due to the fact that both the external request and internal collate requests use the
same search threadpool.
As phrase suggestions are generated from the terms of the local shard, in most cases the
generated suggestion, which does not yield a hit for the collate query on the local shard
would not yield a hit for collate query on non-local shards.
Instead of using the client for collating suggestions, collate query is executed against
the ContextIndexSearcher. This PR removes the ability to specify a preference for a collate
query, as the collate query is only run on the local shard.
closes#9377
This adds back the ability to disable _source, as well as set includes
and excludes. However, it also restricts these settings to not be
updateable. enabled was actually already not modifiable, but no
conflict was previously given if an attempt was made to change it.
This also adds a check that can be made on the source mapper to
know if the the source is "complete" and can be used for
purposes other than returning in search or get requests. There is
one example use here in highlighting, but more need to be added
in a follow up issue (eg in the update API).
closes#11116
Add methods to operate on multi-valued fields in the expressions language.
Note that users will still not be able to access individual values
within a multi-valued field.
The following methods will be included:
* min
* max
* avg
* median
* count
* sum
Additionally, changes have been made to MultiValueMode to support the
new median method.
closes#11105
- Renamed TranslogSnapshot to MultiSnapshot
- moved legacy logic for trucation into LegacyTranslogReaderBase
- made several methods private and pkg private where applicable
- renamed arguments for consistency
Today we barf if repositories are unregistered with a `*` pattern. This
happens on almost every test and adds weird log messages. I dont' think
we should barf in that case.
Closes#11113
We have some builders, specifically query builders, `SearchSourceBuilder`, `QuerySourceBuilder` and `SuggestBuilder`, that implement `ToXContent` and also allow to build their content as bytes by simply creating a `BytesReference` that holds their json (or yaml etc.) content (`buildAsBytes` methods). They can also print out their content through `toString`. Made sure that those common methods are in one single place and reused where needed.
Also, merged `QueryBuilder` and `BaseQueryBuilder` and made `QueryBuilder` an abstract class instead of an interface.
Closes#11063
We parse the rewrite field in FuzzyQueryParser but we don't allow to set it via FuzzyQueryBuilder for our java api users. Added missing field and setter.
Closes#11130Closes#11139
The esoteric classifier contains in particular maps that take bytes or doubles
as keys. In the byte case, we can just use integer, and in the double case we
can use their long bits instead.
Today we are almost intentionally corrupt the translog if we loose
a node due to powerloss or similary disasters. In the translog reading
code we simply read until we hit an EOF exception ignoring the rest of the
translog file once hit. There is no information stored how many records
we are expecting or what the last written offset was.
This commit restructures the translog to add checkpoints that are written
with every sync operation recording the number of synced operations as well
as the last synced offset. These checkpoints are also used to identify the actual
transaction log file to open instead of relying on directory traversal.
This change adds a significant amount of additional checks and pickyness to the translog
code. For instance is the translog now associated with a specific engine via a UUID that is
written to each translog file as part of it's header. If an engine opens a translog file it
was not associated with the operation will fail.
Closes to #10933
Relates to #11011
These clauses filter the document space without affecting scoring and map to
Lucene's BooleanClause.Occur.FILTER. The `filtered` query is now deprecated and
```json
{
"filtered": {
"query": { //query },
"filter": { //filter }
}
}
```
should be replaced with
```json
{
"bool": {
"must": { //query },
"filter": { //filter }
}
}
```
In case FieldNameAnalyzer does not find an explicit analyzer for a given
field name, it returns the default analyzer. This behaviour can hide bugs
where the analyzer fails to be propagated to FieldNameAnalyzer or an
analyzer is requested for a field which is not mapped.
We are using a a VLong to serialize the PercolateResponse#tookInMillis. This
can due to several `System.currentTimeMillis()` implemenation details be negative.
We should prevent the negavite value for being serialized as a VLong and make sure
we use a valid value for this in the first place
Closes#11138
A few meta fields can currently be set within a document's source.
However, the recommended way to set meta fields like this is through
the api, and setting within the document can be a performance trap
(e.g. needing to find _id in order to route the document).
This change removes the ability to set meta fields within
a document source for 2.0+ indexes.
closes#11051closes#11074
Several plugins (e.g. elasticsearch-cloud-aws, elasticsearch-cloud-azure, elasticsearch-cloud-gce)
have integration tests that run with actual credentials to a remote service, so test runs
need access to this file.
These all require the tester (or jenkins) to supply the file with -Dtests.config.
In `NodeEnvironment.deleteShardDirectoryUnderLock`, we will now attempt
to acquire, then release, the `write.lock` file for the Lucene index in
question to ensure that no other `IndexWriter` has the directory open
before deleting the data.
Note that the `write.lock` file must be released before the actual
deletion in order to allow the directory to be deleted.
Fixes#11097
Today we enforce blocking which doesnt' really fit in the elasticsearch model
this commit adds async execution to the synced flush service by passing a
ActinListener to the service returing immediately.
If the user has set a shard_min_doc_count setting then avoid looking up background frequencies if the term fails to meet the foreground threshold on a shard.
Closes#11093
* Properly support symlinks (e.g. /tmp -> /mnt/tmp)
* Check all configured paths up front and deliver the best exception we can when things are wrong
* Initialize securitymanager earlier
* Fix too-loud error logging of Natives root check
As a follow up to #10870, this removes support for
index templates on disk. It also removes a missed
place still allowing disk based mappings.
closes#11052
When the longitude is zero for a document, the left and right bounds do not get updated in the geo bounds aggregation which can cause the bounds to be returned with Infinite values for longitude
Closes#11085
There currently are small differences between search api and count, exists, validate query, explain api when it comes to reading query_string parameters. `analyze_wildcard`, `lowercase_expanded_terms` and `lenient` are only read by the search api and ignored by all other mentioned apis. Unified code to fix this and make sure it doesn't happen again. Also shared some code when it comes to printing out the query as part of SearchSourceBuilder conversion to ToXContent.
Extended REST spec to include all the supported params (some that were already supported weren't listed), and added REST tests (also some basic tests for count and search_exists which weren't tested at all).
Closes#11057
BytesQueryBuilder was introduced to be used internally by the phrase suggester and its collate feature. It ended up being exposed via Java api but the existing WrapperQueryBuilder could be used instead. Added WrapperQueryBuilder constructor that accepts a BytesReference as argument.
One other reason why this filter builder should be removed is that it gets on the way of the query parsers refactoring, given that it's the only query builder that allows to build a query through java api without having a respective query parser.
Closes#10919
Different responses hold the shards header, search, count, flush etc. The code was duplicated in two different places, centralized in RestActions.
It turns out that only the search response printed out the status field before the reason, which was added to all other broadcast responses too.
Closes#11064
Dynamic settings has to be injected into constructor with either @ClusterDynamicSettings or @IndexDynamicSettings. If annotations are not specified an empty instance of Dynamic Settings is injected that can lead to difficult to discover errors such as #10614. This commit will make any attempt to inject unannotated dynamic settings to generate a giuce error.
Our ThreadPool constructor creates a couple of threads (scheduler and timer) which might not get shut down if the initialization of a node fails. A guice error might occur for example, which causes the InternalNode constructor to throw an exception. In this case the two threads are left behind, which is not a big problem when running es standalone as the error will be intercepted and the jvm will be stopped as a whole. It can become more of a problem though when running es in embedded mode, as we'll end up with lingering threads or testing an handling of initialization failures.
Closes#9107
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
The mapper listener concept is only now used as a callback to the
MapperService when new fields are added. This change removes the
listeners, instead storing a link to the mapper service in
each doc mapper.
This commit makes create, update and delete operations on an index durable
by default. The user has the option to opt out to use async translog flushes
on a per-index basis by settings `index.translog.durability=request`.
Initial benchmarks running on SSDs have show that indexing is about 7% - 10% slower
with bulk indexing compared to async translog flushes. This change is orthogonal to
the transaction log sync interval and will only sync the transaction log if the operation
has not yet been concurrently synced. Ie. if multiple indexing requests are submitted and
one operations sync call already persists the operations of others only one sync call is executed.
Relates to #10933
The mapper listener abstractions for object and field mappers are used
to notify the mapper service of new fields, as well as collect
all object and field mappers through a set of traversal functions.
This change removes the traversal functions in favor of simple
iteration over subfields of a mapper.
This is pretty much a workaround for the fact that we simply
close the downstream resources once the shard is closed. This means
the document parser will barf with NPE or something similar while
AlreadyClosedException would be approriate.
Removes the More Like This API, users should now use the More Like This query.
The MLT API tests were converted to their query equivalent. Also some clean
ups in MLT tests.
Closes#10736Closes#11003
In some cases it might happen that a mapping which is already available on the
master node is not available yet on the node that holds the primary shard.
This commit changes indexing on the primary shard so that if a dynamic update
is triggered then the index operation is re-tried until required mappings are
available locally (using cluster state observing).
Previously, we were using the "statistical", technically accurate name. Instead, we
should probably use the name that people are familiar with, e.g. "Holt Winters" instead
of "triple exponential". To that end:
- `single_exp` becomes `ewma` (exponentially weighted moving average)
- `double_exp` becomes `holt`
When the `triple_exp` is added, it will be called `holt_winters`.
Current features (eg. update API) and future features (eg. reindex API)
depend on _source. This change locks down the field so that
it can no longer be disabled. It also removes legacy settings
compress/compress_threshold.
closes#8142closes#10915
If we have to do the one-time loading of fieldata, it requires
more permissions than groovy scripts currently have (zero). This
is because of RamUsageEstimator reflection and so on in PagedBytes.
GroovySecurityTests only test a numeric field, so add a string field
to the test (so pagedbytes fielddata gets created etc).
LUCENE-6422 - PackedQuadTree enhancement - was committed in Lucene 5.2 which is now integrated w/ ES 2.0. This eliminates the need to carry our own local lucene.spatial package. This commit removes the now unnecessary files.
When settings sync to 0, we benefit from using the buffered type, no need to change to simple, since we get a chance to fsync multiple operations (for that single operation) and not have to sync for the other ones before returning each one
This commit adds the path of the PID file to the Environment. It also add it to the Security Manager since the PID file is deleted by a shutdown hook when the JVM is exited.
Double.MIN_VALUE does not follow the same semantics as Integer.MIN_VALUE. Namely, it
represents the smallest positive, non-zero value a double can hold. Since the test uses negative
doubles, this can incorrectly find the min/max metric for a set of values.
Instead, Double.NEGATIVE_INFINITY needs to be used, which represents the smallest value possible.
Not strictly necessary, but MAX_VALUE was switched to POSITIVE_INFINITY just to be 100% correct
we already fail the shard in the `onFailure` method if the replica
operation barfs. This additional check has been added lately that
bypasses the clusterstate observer which causes replicas to fail
if the mappings are not yet present.
When a node was a data node only then the index state was not written.
In case this node connected to a master that did not have the index
in the cluster state, for example because a master was restarted and
the data folder was lost, then the indices were not imported as dangling
but instead deleted.
This commit makes sure that index state for data nodes is also written
if they have at least one shard of this index allocated.
closes#8823closes#9952
This commit moves the translog creation into the InternalEngine
to ensure the transactino log is created after we acquired the write
lock on the index. This also prevents races when ShadowEngines are shutting
down due to node restarts where another node already takes over the not yet
fully synced transaction log.
If the collect method was called with a bucketOrd of > 0 the arrays holding the state for the aggregation would be grown but the initial values for the bucketOrds > 0 were all set to Double.NEGATIVE_INFINITY meaning that for the bottom, posLeft and negLeft values no collected document would change the value since NEGATIVE_INFINITY is always less than every other value.
Closes#10804
This change removes the multiple implementations of different admin interfaces and centralizes it with AbstractClient. It also makes sure *all* executions of actions now go through a single AbstractClient#execute method, taking care of copying headers and wrapping listener.
This also has the side benefit of removing all the code around differnet possible clients, and removes quite a bit of code (most of the + code is actually removal of generics and such).
This change also changes how TransportClient is constructed, requiring a Builder to create it, its a breaking change and its noted in the migration guide.
Yea another step towards simplifying the action infra and making it simpler...
Currently, when all copies of a shard are lost, we reach out to all
other nodes to see whether they have a copy of the data. For a shared
filesystem, though, we can assume that each node has a copy of the data
available, so return a state version of at least 0 for each node.
This feature is set using the dynamic index setting
`index.shared_filesystem.recover_on_any_node`, which defaults to
`false`.
Fixes#10932
It appears the previous failure (-Dtests.seed=D9EF60095522804F) is just accumulation of
floating point error differences between expected and actual results. Making the tests less
stringent by requiring closeTo(0.1) instead of the previous 0.00001
This commit adds a counter for IndexShard that keeps track of how many write operations
are currently in flight on a shard. The counter is incremented whenever a write request is
submitted in TransportShardReplicationOperationAction and decremented when it is finished.
On a primary it stays incremented while replicas are being processed.
The counter is an instance of AbstractRefCounted. Once this counter reaches 0
each write operation will be rejected with an IndexClosedException.
closes#10610
Today we wait 30 sec for shards to flush and close and then simply exit the process.
This is often not desired and we should by default wait long enough for shards to
close etc. This commit adds a default timeout of one day which simplifies the code
and gives us _enough_ time to shut down.
Closes#10680
If the initial recovery is skipped all uncommitted changes are lost. This
must be enforced otherwise primary and replica will go out of sync once the
primary is started after restore and a replica recovers from it applying the
still referenced transaction logs.
Instead of failing the Engine for a shared filesystem, this change
allows a "soft close" of the Engine, where only the IndexWriter is
closed so that the replica can open an IndexWriter using the same
filesystem directory/mount.
Fixes#10469
This refcounting doesn't work for shadow replicas since we open
the same translog file from more than one node while running a rolling
restart. This functionality is also superseeded by our filesystem abstraction
which detects file leaks under the hood.
Today, we rely on the user to set request listener threads to true when they are on the client side in order not to block the IO threads on heavy operations. This proves to be very trappy for users, and end up creating problems that are very hard to debug.
Instead, we can do the right thing, and automatically thread listeners that are used from the client when the client is a node client or a transport client.
This change also removes the ability to set request level listener threading, in the effort of simplifying the code path and reasoning around when something is threaded and when it is not.
closes#10940