This change just changes the default for index.codec.bloom.load to
false: with recent performance improvements to ID lookup, such as
#6298, bloom filters don't give much of a performance gain anymore,
and they can consume non-trivial RAM when there are many tiny
documents.
For now, we still index the bloom filters, so if a given app wants
them back, it can just update the index.codec.bloom.load to true.
Closes#6959
GET only returned null even when stored if requested with GET like this:
`curl -XGET "http://localhost:9200/test/test/1?fields=_all"`
Instead, it should simply behave like a String field and return the
concatenated fields as String.
closes#6924
A new option `prune` has been added to allow users to control phrase suggestion pruning when `collate`
is set. If the new option is set, the phrase suggestion option will contain a boolean `collate_match`
indicating whether the respective result had hits in collation.
CLoses#6927
Today we only do count searches to ensure sane results are returned
after upgrading etc. This change adds sorting to the picture asserting
on simple numeric sorting that uses field data etc. after upgrading.
Relates to #6967
In unicast discovery, we try to reuse existing discovery nodes based on the node address they have. If we find an existing node based on its address, and for some reason its not connected, don't add it to the list of nodes to disconnect from, as that (full) connection is useful down the road
closes#6966
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes#6964
This commits removes BytesValues/LongValues/DoubleValues/... and tries to use
Lucene's APIs such as NumericDocValues or RandomAccessOrds instead whenever
possible.
The next step would be to take advantage of the fact that APIs are the same in
Lucene and Elasticsearch in order to remove our custom comparators and use
Lucene's.
There are a few side-effects to this change:
- GeoDistanceComparator has been removed, DoubleValuesComparator is used instead
on top of dynamically computed values (was easier than migrating
GeoDistanceComparator).
- SortedNumericDocValues doesn't guarantee uniqueness so long/double terms
aggregators have been updated to make sure a document cannot fall twice in
the same bucket.
- Sorting by maximum value of a field or running a `max` aggregation is
potentially significantly faster thanks to the random-access API.
Our aggs and p/c aggregations benchmarks don't report differences with this
change on uninverted field data. However the fact that doc values don't need
to be wrapped anymore seems to help a lot. For example
TermsAggregationSearchBenchmark reports ~30% faster terms aggregations on doc
values on string fields with this change, which are now only ~18% slower than
uninverted field data although stored on disk.
Close#6908
The assertion on binary equality for streamable serialization
sometimes fails due to the usage of identify hashmaps inside
the InternalSearchHits serialization. This only happens if
the number of shards the result set is composed of is very high.
This commit makes the serialziation deterministic and removes
the need to serialize the ordinal due to in-order serialization.
The change introduced in #6949 (do not serialize the cluster state) also means master now responds with an empty response rather then a JoinResponse. However, sendJoinRequestBlocking still expected a JoinRequest.
At the moment we serialize the cluster state in JoinResponse and ValidateJoinRequest. However this state is not used anywhere and can be removed to save on network overhead
Closes#6949
This test deletes and updates using upserts documents over several threads in a
tight loop. It counts the number of responses and verifies that the versions at
the end are correct.
today if a snapshot is corrupted the restore operation
never terminates. Yet, if the snapshot is corrupted there
is no way to restore it anyway. If such a snapshot is restored
today the only way to cancle it is to delete the entire index which
might cause dataloss. This commit also fixes an issue in InternalEngine
where a deadlock can occur if a corruption is detected during flush
since the InternalEngine#snapshotIndex aqcuires a topLevel read lock
which prevents closing the engine.
Closes#6938
The `index.fail_on_corruption` was not updateable via the index settings
API. This commit also fixed the setting prefix to be consistent with other
setting on the engine. Yet, this feature is unreleased so this won't break anything.
Closes#6941
Small refactorings to make the MessageChannelHandler more extensible.
Also allowed access to the different netty pipelines
This is the fix after the first version had problems with the HTTP
transport due to wrong reusing channel handlers, which is the reason
why tests failed.
Relates #6889Closes#6915
It's now possible to inject action filters from plugins via `ActionModule#registerFilter` through the following code:
```
public void onModule(ActionModule actionModule) {
actionModule.registerFilter(MyFilter.class);
}
```
Also made `TransportAction#execute` methods final to enforce the execution of the filter chain. By default the chain is empty though.
Note that the action filter chain is executed right after the request validation, as the filters might rely on a valid request to do their work.
Closes#6921
Today when we start a `TransportClient` we use the given transport
addresses and create a `DiscoveryNode` from it without knowing the
actual nodes version. We just use the `Version.CURRENT` which is an
upper bound. Yet, the other node might be a version less than the
currently running and serialisation of the nodes info might break. We
should rather use a lower bound here which is the version of the first
release with the same major version as `Version.CURRENT` since this is
what we officially support.
This commit moves to use the minimum major version or an RC / Snapshot
if the current version is a snapshot.
Closes#6894
Adds the ability to the Term Vector API to generate term vectors for some
chosen fields, even though they haven't been explicitely stored in the index.
Relates to #5184Closes#6567
The bulk processor tries to acquire all leases for the semaphore to wait
for all pending requests. Yet, we should release them afterwards again to
ensure we don't ever deadlock if there is a bug in the processor.
This commit also adds a testcase for this method
It's now possible to register watchers along with a specified check frequency. There are three frequencies: low, medium, high. Each one is associated with a check interval that determines how frequent the watchers will check for changes and notify listeners if needed. By default, the intervals are 5s, 30s and 60s respectively, but they can also be customized in the settings. also:
- Added the WatcherHandle construct by which one can stop it (remove it) and resume it (re add it). Also provices access to the watchers itself and the frequency by which it's checked
- Change the default frequency to 30 seconds interval (used to be 60 seconds). The only watcher that is currently effected by this is the script watcher (now auto-loading scripts will auto-load every 30 seconds if changed)
This is to prevent a rare racing condition where the very same shard gets allocated to the node after our sanity check that the cluster state didn't check and the actual deletion of the files.
Closes#6902
This results in unstable tests, most likely due to Channels being mixed
up by wrongly creating the pipelines. Needs investigation and a test.
This reverts commit db7f0d36af.
The IndexingMemoryController determines the amount of indexing buffer size and translog buffer size each shard should have. It takes memory from inactive shards (indexing wise) and assigns it to other shards. To do so it needs to know about the addition and closing of shards. The current implementation hooks into the indicesService.indicesLifecycle() mechanism to receive call backs, such shard entered the POST_RECOVERY state. Those call backs are typically run on the thread that actually made the change. A mutex was used to synchronize those callbacks with IndexingMemoryController's background thread, which updates the internal engines memory usage on a regular interval. This introduced a dependency between those threads and the locks of the internal engines hosted on the node. In a *very* rare situation (two tests runs locally) this can cause recovery time outs where two nodes are recovering replicas from each other.
This commit introduces a a lock free approach that updates the internal data structures during iterations in the background thread.
Closes#6892
FieldMapper has two methods
`Filter termsFilter(List values, @Nullable QueryParseContext)` which is supposed
to work on the inverted index and
`Filter termsFilter(QueryParseContext, List, QueryParseContext)` which is
supposed to work on field data. Let's rename the second one to
`fieldDataTermsFilter` and remove the unused `QueryParseContext`.
Close#6888
If you call `bin/plugin --remove es-plugin` the plugin got removed but the file `bin/plugin` itself was also deleted.
We now don't allow the following plugin names:
* elasticsearch
* plugin
* elasticsearch.bat
* plugin.bat
* elasticsearch.in.sh
* service.bat
Closes#6745
With the removal of setNextScore in #6864, script engines must use
the Scorer to find the score of a document. The DocLookup is updated
appropriately to do this, but most script engines require a Number to be
bound for numeric variables. Groovy already had an encapsulation for
this funtionality, and this moves it out to be shared with other script
engines.
closes#6898
While it would be nice to do this all the way up the chain (into
score functions), this at least removes the weird dual
setNextScore/setScorer for SearchScripts.
closes#6864
This fixes an issue introduced by the serialization changes in #6486
which are not needed at all. Node that the serialization itself is not broken
but the TransportClient uses its own version on initial connect and getting
the NodeInfos.
Today we synchronize when updating the IndexFieldDataService
datastructures. This might unnecessarily block progress if multiple
request need different fielddata instance for different fields.
This commit also fixes clear calls to actually consistently clear
the caches in the case of an exception.
Closes#6855
As a SizeValue is used for serializing the thread pool size, a negative number
resulted in throwing an exception when deserializing (using -ea an assertionerror
was thrown).
This fixes a check for changing the serialization logic, so that negative numbers are read correctly, by adding an internal UNBOUNDED value.
Closes#6325Closes#5357
Due to change introduced in #6825, we now start a local gateway recovery for replicas, if the source node can not be found. The recovery then fails because we never recover replicas from disk.
Closes#6879
In rare cases we may fail to send a shard failure event to the master, or there is no known master when the shard has failed (ex. a couple of node leave the cluster canceling recoveries and causing a master to step down at the same time). When that happens and a cluster state arrives from the (new) master we should resend the shard failure in order for the master to remove the shard from this node.
Closes#6881
make the builder releasable (auto closeable), and use it in shards state
also make XContentParser releasable (AutoCloseable) and not closeable since it doesn't throw an IOException
closes#6869
use similar mechanism that shares numeric analyzers for long/double/... for dates as well. This has nice memory save properties with many date fields mapping case, as well as analysis saves (thread local resources)
closes#6843
Enviorment variables might override the tests settings even if
they are explicitly set. Other base classes like InternalTestCluster
also specify `config.ignore_system_properties: true` to ensure `what
we set is what we get`
These are javascript expressions, which can only access numeric
fielddata, parameters, and _score. They can only be used for searches (not document updates).
closes#6818
This test makes it easy to create a lightweight node (no http, indices stored
in RAM, ...) whose main purpose is to get an instance of the Guice injector
for unit tests.
This should help not have to update lots of unit tests when we add a new
Guice dependency.
Throwables thrown on update retries are now caught and handled via
the provided callback. This commit also contains an integration test
demonstrating the bug and validating the fix.
Closes#6355Closes#6724
Each transport action is associated with at least an action name, which is the action name that gets serialized together with the request and identifies what to do with the request itself. Also, the action name is the name of the registered transport handler that handles incoming request for the transport action.
This commit makes the action name available in a generic manner in the TransportAction base class, so that it can be used when needed by subclasses, or in the base class for instance for action filtering.
Closes#6860
Today we close/reopen IW when we change the RAM buffer but that's
costly because it means the next NRT reader is a full reopen. The RAM
buffer size setting is a live one in IndexWriter, even if there are no
buffered docs in RAM when you call it.
Separately it would be nice if Lucene let you manage a "reader pool"
that could outlive individual IW instances ...
Closes#6856
This commit cleans up some code around the indexed script/templates feature.
Remove dead code in ScriptService.
Remove setXScript methods for UpdateRequestBuilder and use setScript(script,type) instead
CorruptFileTest sometimes hits conditions where lots of rebalancing
happens. In such a case the default timeout is just not enough - this
timeout just makes sure that the cluster has enough time to balance
itself.
The newly added collate option will let the user provide a template query/filter which will be executed for every phrase suggestions generated to ensure that the suggestion matches at least one document for the filter/query.
The user can also add routing preference `preference` to route the collate query/filter and additional `params` to inject into the collate template.
Closes#3482
- names() to return the direct settings names
- getAsSettings(String) to return the settings mapped to the given name (like getByPrefix(...) except no need to provide a tailing '.')
This change allow elasticsearch users to store scripts and templates in an index for use at search time.
Scripts/Templates are stored in the .scripts index. The type of the events is set to the script language.
Templates use the mustache language so their type is be "mustache".
Adds the concept of a script type to calls to the ScriptService types are INDEXED,INLINE,FILE.
If a script type of INDEXED is supplied the script will be attempted to be loaded from the indexed, FILE will
look in the file cache and INLINE will treat the supplied script argument as the literal script.
REST endpoints are provided to do CRUD operations as is a java client library.
All query dsl points have been upgraded to allow passing in of explicit script ids and script file names.
Backwards compatible behavior has been preserved so this shouldn't break any existing querys that expect to
pass in a filename as the script/template name. The ScriptService will check the disk cache before parsing the
script.
Closes#5921#5637#5484
For unicast ping the DiscoveryNode identity is based on its id, which in that stage is a dummy value, this breaks any rule in the mock tran
However the TransportAddress is a valid value in unicast ping and all other places, so that is a better alternative.
Closes#6836
The Hunspell service would throw a confusing error message if more than
one affix file was present. This commit distinguishes between the two
error cases: where there are no affix files and when there are too many
affix files.
Also implements lazy dictionary loading, which was used in the tests
but not implemented.
Closes#6850
This commit adds the infrastructure to allow pluging in different
measures for computing the significance of a term.
Significance measures can be provided externally by overriding
- SignificanceHeuristic
- SignificanceHeuristicBuilder
- SignificanceHeuristicParser
closes#6561
We have to mark a shard as corrupted if necessary before the
shard failed event is fired ie. before we call the corresponding
listener in the engine. Otherwise the shard might be re-allocated
on the same node and just started up without being marked as corrupted.
Relates to #5924
Now both TransportRequest and TransportResponse inherit from a base TransportMessage that holds the message headers and also now added the remote transport address (where this message came from).
This started out as a simple correction to a missing setting problem, but go bigger into more general work on the ZenUnicastDiscoveryTets suite. It now works with both network and local mode. I also merge the different ZenUnicast test suites into a single place.
Closes#6835
The `recovery_after_time` tells the gateway to wait before starting recovery from disk. The goal here is to allow for more nodes to join the cluster and thus not start potentially unneeded replications. The `expectedNodes` setting (and friends) tells the gateway when it can start recovering even if the `recover_after_time` has not yet elapsed. However, `expectedNodes` is useless if one doesn't set `recovery_after_time`. This commit changes that by setting a sensible default of 5m for `recover_after_time` *if* a `expectedNodes` setting is present.
Closes#6742
Previously the size of the priority queue was wrongly set to the total number
of terms. Instead, it should be set to 'maxQueryTerms'. This makes the
selection of best terms O(n), instead of O(n*log(n)).
Jira patch: https://issues.apache.org/jira/browse/LUCENE-5795Closes#6657
Out of #6808, we improved the handling of a primary failing to make sure replicas that are initializing are properly failed as well. After double checking it, it has 2 problems, the first, if the same shard routing is failed again, there is no protection that we don't apply the failure (which we do in failed shard cases), and the other was that we already tried to handle it (wrongly) in the elect primary method.
This change fixes the handling to work correctly in the elect primary method, and adds unit tests to verify the behavior
The change also expose a problem in our handling of replica shards that stay initializing during primary failure and electing another replica shard as primary, where we need to cancel its ongoing recovery to make sure it re-starts from the new elected primary
closes#6825
Out of #6808, we improved the handling of a primary failing to make sure replicas that are initializing are properly failed as well. After double checking it, it has 2 problems, the first, if the same shard routing is failed again, there is no protection that we don't apply the failure (which we do in failed shard cases), and the other was that we already tried to handle it (wrongly) in the elect primary method.
This change fixes the handling to work correctly in the elect primary method, and adds unit tests to verify the behavior
closes#6816
This is what DateHistogramParser expects so will enable the builder to build valid requests using these variables.
Also added tests for preOffset and postOffset since these tests did not exist
Closes#5586
Since Lucene version 4.8 each file has a checksum written as it's
footer. We used to calculate the checksums for all files transparently
on the filesystem layer (Directory / Store) which is now not necessary
anymore. This commit makes use of the new checksums in a backwards
compatible way such that files written with the old checksum mechanism
are still compared against the corresponding Alder32 checksum while
newer files are compared against the Lucene build in CRC32 checksum.
Since now every written file is checksummed by default this commit
also verifies the checksum for files during recovery and restore if
applicable.
Closes#5924
This commit also has a fix for #6808 since the added tests in
`CorruptedFileTest.java` exposed the issue.
Closes#6808
Today, the tribe node needs the local node so it adds it when it starts, but other APIs would benefit from adding the local node, also, adding the local node should be done in a cleaner manner, where it belongs, which is right after the discovery service starts in the cluster service
closes#6811
This commit ensures that all primaries are allocated before we
restart the node. If one primary is in post recovery when we
restart it will not be allocated otherwise.
If the node initialisation fails, make sure the
node environment is closed correctly and thus
all locks (on data directories) being properly released.
Closes#6715
`mmapfs` is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the `dvd` and `tim` file for fast random access
on docvalues and term dictionaries.
Closes#6636
especially the tests that check for update of mapping, we need to make sure that the cluster is green so mappings won't get override, also, put mapping during index creation when possible
- adds support for multiple logging configurations under the config dir (will pick up any logging.xxx in the config folder tree)
- plugins can now define a top level config directory that will be copied under es config dir and will be renamed after the plugin name (same as the support we have the plugin "bin" dirs)
Closes#6802
There is a special type of request that tries to not allocate another buffer when sending bytes request (used by the public cluster state action). With the new pages bytes reference support, the content can already be a composite channel buffer, take that into account when building the actual composite buffer that will be sent over the network
closes#6756
Today, if we miss on a get on setting, we try its camel case version. The assumption is that in our code, we never use camel case to lookup a setting, but we want to support camel case if the user provided one.
This can be expensive (#toCamelCase) when the get on the setting is done in a tight call, which is evident when running the allocation deciders as part of the reroute logic.
Instead of doing the camel case check on get, prepare an additional map that includes all the settings that are provided as came case, and try and lookup from it if needed.
closes#6765
At the moment one can iterate the MapperService to go through all document mappers. This includes the document mapper of DEFAULT_MAPPING, which may be surprising and lead to unintended results. This commit removes the Iterable implementation and add a docMappers method that asks the caller to make an explicit choice
Closes#6793
We check if the version map needs to be refreshed after we released
the readlock which can cause the the engine being closed before we
read the value from the volatile `indexWriter` field which can cause an
NPE on the indexing thread. This commit also fixes a potential uncaught
exception if the refresh failed due to the engine being already closed.
Relates to #6443Closes#6786
remove internal callas on FieldMappers#Names, we properly reuse FieldMapper, so there is no need to try and call intern in order to reuse the names. This can be heavy with many fields and continuous mapping parsing.
closes#6747
We use awaitBusy in our tests, the problem is that we have to check if it awaited or not, and then try and keep around somehow more info around why the predicate failed and a timeout happened.
The idea of assertBusy is to allow to simply write "regular" test code, and if the test code trips, it will busy wait till a timeout. This allows us to keep around the assertion information and properly throw it for information that is inherently kept in the failure itself.
If you are indexing tiny documents then the previous default (5000)
was too low, causing excessive fsyncs with high indexing rates. With
this change we now only flush by byte size (200 MB) by default for
better indexing performance for tiny documents.
Closes#6783
When refresh_interval is long or disabled, and indexing rate is high,
it's possible for live version map to use non-trivial amounts of RAM.
With this change we now trigger a refresh in such cases to clear the
version map so we don't use unbounded RAM.
Closes#6443
Aggregation name are now able to use any character except '[', ']', and '>". Dot syntax may still be used to reference values (e.g. in the order field) but may only defence the value directly beneath the last aggregation in the list. more complex structures will need to be accessed using the aggname[value] syntax
Closes#6702
When a master receives a cluster state from another node, it compares the local cluster state with the one it got. If the local one has a higher version, it sends a JoinClusterRequest to the other master to tell it step down. Because our network layer is asymmetric, we need to make sure we're connected before sending.
Closes#6779
Concrete indices is now called multiple times when needed instead of changing what's inside the incoming request with the concrete indices. Ideally we want to keep the original aliases or indices or wildcard expressions in the request.
Also made sure that the check blocks is done against the concrete indices, which wasn't the case for delete index, delete mapping, open index, close index, types exists and indices exists.
Closes#6694Closes#6777
In #6706 we change the master validation to start pining immediately after a new master as ellected or a node joined. The idea is to have a quicker response to failures. This does however create a problem if the new master has yet fully processed it's ellection and responds to the ping with a NoLongerMasterException. This causes the source node to remove the current master and ellect another, only to find out it's not a master either and so forth. We are moving this change to the feature/improve_zen branch, where the improvements we made will cause the situation to be handled properly.
This reverts commit ae16956e07.
This reverts:
"Test: Temporarily change delete/put_mapping to wait for green": commit e408f8f638c2dd97a3ec86c8a9ac940f43ab37a0.
"[TEST] wait for green to update mapping": commit b3641a2ee6eb23318d49f5f04b39149e70c2b65b.
The change added in #6762 helps making sure the pending mapping updates are processed on all nodes to prevent moving shards to nodes which are not yet fully aware of the new mapping. However it introduced a racing condition delete_mapping operations, potentially causing a type to be added after it's deletion. This commit solves this by only sending a mapping update if the mapping source has actually changed.
Closes#6772
the default mapping is not merged, but updated in place, and only put mapping API can change it, no need to make sure it has been properly updated on the master. This can cause conflicts when a put mapping for it happens at the same time.
closes#6771
When we know the ID for the document we are about to index was
auto-generated, we don't need to acquire/release the per-ID lock,
which might provide small speedups during highly concurrent indexing.
Closes#6584
The match_phrase_prefix provided the same explanation as the match_phrase
query. There was no indication that the last term was run as a prefix
query.
This change marks the last term (or terms if there are multiple terms
in the same position) with a *
Closes#2449
During phase1 we copy over all lucene segments. These may refer to mapping updates that are still queued up to be sent to master. We must make sure those pending updates are processed before completing the relocation.
Relates to #6648Closes#6762
we need to make sure we wait for all threads to finish executing, since there might still be a thread around even post await (i.e. just starting) performing updates
the pending tasks api will now include the current executing tasks (with a proper marker boolean flag)
this will also help in tests that wait for no pending tasks, to also wait till the current executing task is done
closes#6744