It creates the following challenges:
- It automatically loads all the norms for all fields. This should be an opt-in feature, similar to the new loading feature in field data. Will open a separate issue for it.
- It automatically loads all doc values for all fields (if they have them), effectively overriding the loading option of field data when it is backed by doc values.
Closes #4078
We now return acknowledged true when no wait is needed (mustAck always returns false). We do wait for the master node to complete its actions though. Previously it would try to time out and hang due to a CountDown#fastForward call when the internal counter is set to 0.
We now return acknowledged false without starting the timeout thread when the timeout is set to 0, as starting the wait and immediately stopping the thread seems pointless.
Added coverage for ack in ClusterServiceTests
This map is a "true" immutable map, encapsulating an open impl, and has a builder that allows it to be built easily.
The builder has the optimization of using clone if it's being built based on an existing immutable map.
We need to move from post recovery to started on cluster level changes, and make sure we handle a global state change to relocating, which can happen (without passing through started).
Improve the introduction of new fields into the concrete parsed mappings by not relying on immutable maps and copying over entries, but instead using open maps (which will also use less memory), and using clone to perform the copy-on-write logic.
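A rough sketch of the clone-based copy-on-write idea (hypothetical names, with a plain HashMap standing in for the encapsulated open impl):

```java
import java.util.HashMap;

// Hypothetical immutable map with a builder that clones instead of
// copying entries one by one.
final class CowMap<K, V> {
    private final HashMap<K, V> map;

    private CowMap(HashMap<K, V> map) {
        this.map = map;
    }

    V get(K key) {
        return map.get(key);
    }

    Builder<K, V> builder() {
        // clone() performs the copy-on-write copy in one shot
        @SuppressWarnings("unchecked")
        HashMap<K, V> copy = (HashMap<K, V>) map.clone();
        return new Builder<>(copy);
    }

    static final class Builder<K, V> {
        private final HashMap<K, V> map;

        Builder(HashMap<K, V> map) {
            this.map = map;
        }

        Builder<K, V> put(K key, V value) {
            map.put(key, value);
            return this;
        }

        CowMap<K, V> build() {
            return new CowMap<>(map);
        }
    }
}
```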
This allows the RegexpQueryBuilder to be used in span queries
Added tests for all span multi term queries.
Also updated the documentation and removed mention of numeric range
queries for span queries (they have to be terms).
Closes #3392
This new API allows getting the mapping for a specific set of fields rather than getting the whole index mapping and traversing it.
The fields to be retrieved can be specified by their full path, index name or field name, and will be resolved in this order.
In case multiple fields match, the first one will be returned.
Since we are now generating the output (rather than falling back to the stored mapping), you can specify `include_defaults=true` on the request to have default values returned.
Closes #3941
It's preferable to execute the indices cluster state service as quickly as possible, as one of the first listeners, so it will apply the cluster state to the local state.
Potentially, it should even execute before its state is "visible" (through the state call), but that's another change...
Make sure to throw the already exists exception, so that when indexing into an alias that has not yet propagated through the cluster state, the creation will end up being ignored if it already exists.
Some FieldData consumers require hash values per bytes value. We provide an optimization
that allows caching the hashes internally if the consumer knows that they are needed.
This optimization got lost in a previous commit. This commit adds it back and folds
the dedicated method into AtomicFieldData#getBytesValues(true|false).
As a side note, the internal reroute call is now part of the ack mechanism. That means that if the response contains acknowledged flag, the internal reroute that was eventually issued was acknowledged too. Also, even if the request is not acknowledged, the reroute is issued before returning, which means that there is no need to manually call reroute afterwards to make sure the new settings are immediately applied.
Closes #3995
Added support for serialization based on version to AcknowledgedResponse. Useful in APIs that don't yet support the acknowledged flag in the response.
Also moved warmer ack tests to the more specific AckTests class.
Closes #3983
This adds the _cat/recovery/{index} API endpoint, which displays
information about the status of recovering shards. An example of the
output:
index shard node target recovered %
test2 0 Fwo7c_6MSdWM0uM1Ho4t-g 147304414 19236101 13.1%
test 0 Fwo7c_6MSdWM0uM1Ho4t-g 145891423 119640535 82.0%
Fixes #3969
This patch takes the version of the created index into account when a
prebuilt analyzer is created.
So, if an index was created with 0.90.4, then the prebuilt analyzers
will be the same as in the 0.90.4 release.
One reason for this feature is the possibility to change prebuilt
analyzers like the standard one.
The patch tries to reuse analyzers as much as possible. So even if
version X.Y.Z and X.Y.A use the same Lucene analyzers, the same instance
is reused in order to prevent creating an excessive number of Lucene analyzer instances.
Closes #3790
The setting causes the upper bound for a range query/filter to be rounded up,
therefore the name `round_ceil` seems to make more sense.
Also this commit removes the redundant fourth parameter to DateMathParser.parse(..)
which was never used.
was: parse(String text, long now, boolean roundUp, boolean upperInclusive)
is now: parse(String text, long now, boolean roundCeil)
Closes #3914
When running tests, Engine.searcher() is going to be an AssertingIndexSearcher
so we definitely don't want to discard it. This commit fixes it as well as the
bugs it found.
Closes #3987
In order to make sure that people do not get confused, if they
index a float as weight, it makes more sense to reject it instead of
silently parsing it to an integer and using it.
The CompletionFieldMapper now checks the type of the number which
is being read and throws an exception if the number is something other
than an int or long.
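A sketch of the kind of check involved, assuming the parser is positioned on the weight value (the exception type and message here are assumptions, not the mapper's actual ones):

```java
import java.io.IOException;

import org.elasticsearch.common.xcontent.XContentParser;

// Sketch: reject any numeric weight that is not an int or a long.
static long parseWeight(XContentParser parser) throws IOException {
    XContentParser.NumberType numberType = parser.numberType();
    if (numberType != XContentParser.NumberType.INT
            && numberType != XContentParser.NumberType.LONG) {
        // hypothetical message; the real mapper throws its own exception type
        throw new IllegalArgumentException(
                "weight must be an int or a long, but was [" + parser.text() + "]");
    }
    return parser.longValue();
}
```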
Closes #3977
Requires field index_options set to "offsets" in order to store positions and offsets in the postings list.
Considerably faster than the plain highlighter since it doesn't require reanalyzing the text to be highlighted: the larger the documents, the better the performance gain should be.
Requires less disk space than term_vectors, needed for the fast_vector_highlighter.
Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Plays really well with natural text, though not quite as well if the text contains HTML markup, for instance.
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm.
Uses a forked version of the Lucene postings highlighter to support:
- per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value
- manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same
The Lucene postings highlighter API is quite different compared to the existing highlighters API, the main difference being that it allows highlighting multiple fields in multiple docs with a single call, ensuring sequential IO.
The way it is introduced in elasticsearch in this first round is a compromise trying not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents.
Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding.
Closes #3704
You can configure the highlighting api to return an excerpt of a field
even if there wasn't a match on the field.
The FVH makes excerpts from the beginning of the string to the first
boundary character after the requested length or the boundary_max_scan,
whichever comes first. The Plain highlighter makes excerpts from the
beginning of the string to the end of the last token before the requested
length.
Closes #1171
Currently we have a marker interface for Acknowledged[Request|Response];
this doesn't make much sense since we duplicate the code in each subclass
or class that implements the interface. We can simply use abstract
classes and have it implemented only once.
This commit primarily folds [Double|Bytes|Long|GeoPoint]Values.Iter
into [Double|Bytes|Long|GeoPoint]Values. Iterations now don't require
an auxiliary class (Iter) but are instead driven by native for loops. All
[Double|Bytes|Long|GeoPoint]Values are stateful and provide `setDocId`
and `nextValue` methods to iterate over all values in a document.
This has several advantages:
* The amount of specialized classes is reduced.
* Iteration is clearly stateful, i.e. Iters can't be confused as being local.
* All iterations are size bounded, which prevents runtime checks and
allows JIT optimizations / loop un-rolling, and most iterations are
branch free.
* Due to the bounded iteration the need for a `hasNext` method call
is removed.
* Value iteration feels more native.
This commit also adds consistent documentation and unifies the calculation
when SortMode is involved.
This commit also changes the runtime behavior of BytesValues#getValue() such that it
will never return `null` anymore. If a document has no value in a field
this method still returns a `BytesRef` with a `length` of 0. To identify
documents with no values, #hasValue() or #setDocument(int) should be used.
The latter should be preferred if the value will be consumed in the case
the document has a value.
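A sketch of the bounded iteration pattern this enables, assuming (per the description above) that setDocument(int) returns the number of values for the document:

```java
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.index.fielddata.AtomicFieldData;
import org.elasticsearch.index.fielddata.BytesValues;

// Hypothetical consumer of the stateful values API: the count returned by
// setDocument bounds the loop, so no hasNext call is needed.
void collect(AtomicFieldData<?> fieldData, int docId) {
    BytesValues values = fieldData.getBytesValues(false); // no hash caching
    final int numValues = values.setDocument(docId);
    for (int i = 0; i < numValues; i++) {
        BytesRef value = values.nextValue();
        // ... consume value ...
    }
}
```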
Have a separate channel for recovery, so it won't overflow the "low" channel which is also used for bulk indexing.
Also, rename the channel names to be more descriptive. Change low to bulk (for bulk based operations, currently just bulk indexing), med to reg (for "regular" operations), and high to state (for state based communication). The new channel for recovery will be named recovery, and the ping channel will remain the same.
Closes #3954
When setting track scores, the scan search type will return the scores for each document. The Java API builder does not properly set this value (it only sets it if a sort is in place, which is not relevant for the scan search type).
Closes #3949
Currently we don't allow resetting the awareness
attribute via the API since it requires at least one
non-empty string to update the setting. This commit
allows resetting this using an empty string.
Closes #3931
The #3526 fix was not complete; it handled cases of on-node execution, but didn't properly handle cases where the operation was executed over the network. We now force the execution of the replica operation when done over the wire.
Relates to #3854. Closes #3929
Added a new AckedClusterStateUpdateTask interface that can be used to submit cluster state update tasks and allows actions to be notified back when a set of (configurable) nodes have acknowledged the cluster state update. Supports a configurable timeout, so that we wait for acknowledgement for a limited amount of time (provided in the request as it currently happens; default 10s).
Internally, a low level AckListener is created (InternalClusterService) and passed to the publish method, so that it can be notified whenever each node responds to the publish request. Once all the expected nodes have responded or the timeout has expired, the AckListener notifies the action, which will return after adding the proper acknowledged flag to the response.
Ideally, this new mechanism will gradually replace the existing ones based on custom endpoints and notifications (per api).
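A sketch of what such a task can look like, based purely on the description above (the interface and method names here are assumptions, not the exact committed shape):

```java
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.unit.TimeValue;

// Hypothetical acked task: per-node ack filtering, an overall timeout,
// and callbacks for completion and timeout.
interface AckedTask {
    /** Must this node acknowledge the new cluster state? */
    boolean mustAck(DiscoveryNode discoveryNode);

    /** How long to wait for acks before giving up (e.g. 10s by default). */
    TimeValue ackTimeout();

    /** All relevant nodes acked (t carries the first failure, if any). */
    void onAllNodesAcked(Throwable t);

    /** The timeout expired before all relevant nodes responded. */
    void onAckTimeout();
}
```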
Closes #3786
The suggest stop filter is an improved version of the stop filter, which
takes stopwords into account only if the last char of a query is a
whitespace. This allows you to keep stopwords while still suggesting for
"a".
Example: Index document content "a word". You are now able to suggest for
"a" and get back results in the completion suggester, if the suggest stop
filter is used on the query side, but will not get back any results for
"a " as this is identified as a stopword.
The implementation allows setting the `remove_trailing` parameter for a
custom stop filter and thus using the suggest stop filter instead of the
standard stop filter.
- "boost" should be "boost_factor"
- "mult" should be "multiply"
Also, store combine function names in an ImmutableMap instead of iterating
over all possible names each time.
Closes #3872 for master
SynonymFilter produces token streams with stacked tokens, such that
conjunction queries need to be parsed in a special way so that the
stacked tokens are added as an inner disjunction.
Closes #3881
The array holding the payloads (TermVectorFields.payloads) is reused for each token. If the
previous token had payloads but the current token does not, then the payloads of the previous
token were returned, because they were never invalidated.
For example, for a field that contained only two tokens, each occurring once, the first having a
payload and the second not, the payload of the first was returned for the second token.
Closes #3873
Extended the specialized simplified mapping source method to support metadata mapping fields. These fields can just be specified as normal fields, but will automatically be placed as top-level mapping fields.
Closes #3818
Return a merge_id element in each segment of the segments API, allowing grouping of segments that are being merged as part of a single merge, and indicating which ones are being merged now.
Closes #3904
Some tests use AbstractIntegrationTest#indexRandom which sometimes uses async
indexing. This can easily run into queue size based rejections on a slow
box. In that case we should retry blocked indexing.
Now that we properly fixed the ability to set the queue size on the index / bulk thread pool, we should actually set them to somewhat reasonable values to protect from users potentially overflowing our system.
I suggest defaults to be 50 for bulk, and 200 for indexing.
Also, set the queue size for the get thread pool (to a value similar to a "read" queue size we have today).
Closes #3888
This commit adds support for random merge policies set for every
index created in an AbstractIntegrationTest. It will either set
'logbyte', 'logdoc' or 'tiered' merge policy as well as a random
value for compound files.
Added checks to IndexRequest#source(Object...) to ensure an even number
of parameters. This method otherwise throws an AIOOBException which is
confusing to users and doesn't report the root cause of the problem.
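A minimal sketch of such a guard (the exception type and wording are assumptions):

```java
// Hypothetical validation: source(Object...) style APIs expect alternating
// field/value pairs, so the argument count must be even.
static void ensureEvenSource(Object... source) {
    if (source.length % 2 != 0) {
        throw new IllegalArgumentException(
                "the number of objects passed must be even but was [" + source.length + "]");
    }
}
```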
This commit allows for using Lucene doc values as a backend for field data,
moving the cost of building field data from the refresh operation to indexing.
In addition, Lucene doc values can be stored on disk (partially, or even
entirely), so that memory management is done at the operating system level
(file-system cache) instead of the JVM, avoiding long pauses during major
collections due to large heaps.
So far doc values are supported on numeric types and non-analyzed strings
(index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values
which is the only type to support multi-valued fields. Since the field data API
set is a bit wider than the doc values API set, some operations are not
supported:
- field data filtering: this will fail if doc values are enabled,
- field data cache clearing, even for memory-based doc values formats,
- getting the memory usage for a specific field,
- knowing whether a field is actually multi-valued.
This commit also allows for configuring doc-values formats on a per-field basis
similarly to postings formats. In particular the doc values format of the
_version field can be configured through its own field mapper (it was previously
handled in UidFieldMapper).
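For illustration, a sketch of a mapping that opts a not_analyzed string field into a doc-values-backed field data format (the `fielddata.format` hook and the format name used here are assumptions):

```java
import java.io.IOException;

import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Hypothetical mapping: back the "tag" field's field data by doc values.
static XContentBuilder docValuesMapping() throws IOException {
    return XContentFactory.jsonBuilder()
            .startObject()
                .startObject("type")
                    .startObject("properties")
                        .startObject("tag")
                            .field("type", "string")
                            .field("index", "not_analyzed")
                            .startObject("fielddata")
                                .field("format", "doc_values") // assumed format name
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
}
```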
Closes #3806
Migrate SearchContext.Rewrite to Releasable. All the parent/child queries are now implemented as Lucene queries and the state is kept in the Weight.
Completely disable caching for `has_child` and `has_parent` filters; this has never worked and it can also never work. The matching docIds are cached per segment while the collection of parent ids is top level.
Closes #3822
* Merged segments are now warmed-up at the end of the merge operation instead
of _refresh, so that _refresh doesn't pay the price for the warm-up of merged
segments, which is often higher than for flushed segments because of their size.
* Even when no _warmer is registered, some basic warm-up of the segments is
performed: norms, doc values (_version). This should help people who
forget to register warmers.
* Eager loading support for the parent id cache and field data: when one
can't predict what terms will be present in the index, it is tempting to use
a match_all query in a warmer, but in that case, query execution might not be
much faster than field data loading so having a warmer that only loads field
data without running a query can be useful.
Closes #3819
I tried to install a plugin directly from disk, but used the wrong arguments. Totally my fault for not RTFM properly.
However, by following the instructions printed out by `bin/plugin`, I ended up deleting my $ES/bin directory.
```
$ pwd
/tmp/elasticsearch-0.90.5
$ ls
LICENSE.txt NOTICE.txt README.textile bin config lib
$ bin/plugin --install file:///tmp/foo.zip
-> Installing file:///tmp/foo.zip...
Failed to install file:///tmp/foo.zip, reason: plugin directory /tmp/elasticsearch-0.90.5/plugins already exists. To update the plugin, uninstall it first using -remove file:///tmp/foo.zip command
$ bin/plugin -remove file:///tmp/foo.zip
-> Removing file:///tmp/foo.zip
Removed file:///tmp/foo.zip
$ ls
LICENSE.txt NOTICE.txt README.textile config lib
```
I reproduced the problem in 0.90.5 and the latest master.
Closes #3847.
Adding the mappers first could cause the wrong FieldMappers to be used during object parsing
Also:
Removed MergeContext.newFieldMappers, MergeContext.newObjectMappers and relatives as they are not used anymore.
Similarly removed MergeContext.addFieldDataChange & fieldMapperChanges as they were not used.
Closes #3844
Due to a change in 0.90 some exceptions may show up when starting a river,
even though this is not a problem as the shard has not been initialized
yet and a retry is scheduled.
Only valid fields (input, output, weight and payload) are now allowed inside a completion field.
This makes sure that typos like "inputs" are caught on indexing.
When parsing a filter, we effectively have a case where we need to ignore a filter. For example, when an "and" filter has no elements. Ignoring a filter is different compared to matching on all or matching on none, because an "and" filter with no elements should simply be ignored in the context of a bool filter, for example, regardless of whether it's used within a must or a must_not clause.
We should be consistent in our codebase when handling these use cases. See also #3809. Closes #3838
Added a load_average_format=array|hash param to OsStats to allow serializing load averages into a hash (with keys 1m, 5m, 15m).
Added node_info_format to NodeStats, allowing to suppress node info output.
Also:
We always return valid JSON back.
If there are no templates defined we return 200 OK (as opposed to a request for a specific template id, which returns 404).
Closes #3812
If the 'pattern_capture' token filter is created / mapped without a 'patterns'
setting we now throw an exception, since this is a misconfiguration and
likely due to the similar settings on related token filters.
Closes #3808
If nodes are shutting down we close thread pools and throw
'EsRejectedExecutionException'. This commit handles these exceptions
gracefully if thrown during Ping execution.
This can go wrong if indices with the same name are repeatedly created and deleted.
UUIDs can not be null anymore. If a UUID is not available, `_na_` will be used as a value.
Also - some minor clean up in ShardStateAction, where shard started events could be added twice to the to-be-applied list, in which case the second instance will be ignored.
Closes #3783
Instead of relying on the current cluster state from the cluster service, make sure to rely on the cluster state we get from the change event. This will allow us to move processing of the cluster event around, potentially to before the local cluster state has been updated.
Introduce a new internal, index shard level, post recovery state, which the shard moves to when it's done with recovery. The shard will now move to started only once the cluster state with its respective shard-level started state is applied.
This change allows more fine grained control over when to allow reads on a shard, resolving potential refresh temporal visibility aspects while indexing and issuing a refresh, by only allowing reads on started shards and making sure we refresh right before we move to started.
Added node restart capabilities to TestCluster
Trigger retry mechanism (onFailure method) instead of invoking transport service with null DiscoveryNode when no DiscoveryNode can be found for a ShardRouting.
Catching Throwable instead of Exception in TransportClient
and TransportClientNodesService, and restore the interrupted flag
if an interrupt exception is caught and ignored.
TransportClient doesn't add the initial nodes to the nodes list
if it doesn't retrieve any nodes from the listeners, which can cause
the transport client to throw a 'NoNodeAvailableException' if the
'sniff' response didn't return any nodes. This situation can occur
if the client tries to get the listener nodes' cluster state while that
node is not yet connected to any other nodes.
This assertion module also injects an AssertingIndexSearcher that
checks if our queries are all compliant with the Lucene specification,
which is important for future updates and changes in the upstream project.
There was a small window of time where the transport response handler's handException method was invoked twice. As far as I can tell this happened when a node disconnect event was processed just after the request was registered and just before a "Node not connected" error was thrown. The TransportService#sendRequest method would invoke the transport response handler's handException method regardless of whether it was already invoked. The result was that for one request failure, two retries were executed.
The mpercolate api has an assert that tripped when more than the expected shard level responses were returned. This was caused by the issue described above. For a single shard level request we had multiple responses, and this broke the total expected response count. Also, the reduce could be started prematurely, which resulted in an incorrect final response (e.g. total count being incorrect). For example: two shards in total, shard 0 gets reduced twice. The second shard 0 response gets in just before the shard 1 response does, and the reduce starts without the shard 1 response.
The master node processes changes to the cluster state, and part of the processing is publishing the cluster state to other nodes. It does not wait for the cluster state to be processed on the other nodes before it moves on to the next cluster state processing job.
This is fine; we support out of order cluster state events using versioning, and nodes can handle those cases. It does lead though to non-optimal API semantics. For example, when issuing cluster health and waiting for green state, the master node will report back once the cluster is green based on its cluster state, but that "green" state might not have been received by all other nodes yet.
Add a discovery.zen.publish_timeout setting, and default it to 5s. This is a best effort at making sure all nodes will process a cluster state within a window of time.
Closes #3736
The average and sum comparators basically share the same code, which is
copy-pasted today. We can simplify this into a base class which reduces
code duplication and prevents copy-paste bugs.
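A sketch of the shared base (hypothetical names; DoubleValues is assumed to follow the setDocument/nextValue iteration API described earlier):

```java
import org.elasticsearch.index.fielddata.DoubleValues;

// The per-document loop lives in one place; subclasses only differ
// in the final reduction step.
abstract class DocValueReducer {
    final double reduce(DoubleValues values, int doc) {
        final int numValues = values.setDocument(doc);
        double sum = 0;
        for (int i = 0; i < numValues; i++) {
            sum += values.nextValue();
        }
        return result(sum, numValues);
    }

    abstract double result(double sum, int numValues);
}

final class SumReducer extends DocValueReducer {
    @Override
    double result(double sum, int numValues) {
        return sum;
    }
}

final class AvgReducer extends DocValueReducer {
    @Override
    double result(double sum, int numValues) {
        return numValues == 0 ? 0d : sum / numValues;
    }
}
```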
use service id for pid name
disable filtering on *.exe (caused corruption)
rename exe names and add more options to .bat
start/stop operations are now supported (and expected to be called) by service.bat
add more variables from the env to customize default behavior prior to installing the service
add manager option
fixes regarding batch flow
specify service id in description
minor readability improvement
include .exe only in ZIP archive
rename x64 service id to make it work out of the box
add elasticsearch as a service for Windows platforms
based on Apache Commons Daemon
supports both x64 and x86
Compared to setting node.local to true, it would be nicer to support node.mode with values of local or network.
Note, node.local is still supported.
Closes #3713
Stats can be retrieved on a per-feature / per-component basis, including the fields
they apply to. This commit adds support for a 'completion' flag to include statistics
for the completion feature, as well as 'completion_fields' to only
include certain fields in the returned statistics.
To disambiguate between 'fielddata' and 'completion' fields, this commit
uses 'fields' as the default inclusion filter for stats fields, only used
if no dedicated '[completion|fielddata]_fields' parameter is provided.
Relates to #3522
Add a dedicated suggest thread pool for the suggest API. With the new completion suggest type, which is purely CPU bound, it makes more sense to have a dedicated thread pool for suggest, compared to having it share the search thread pool and "compete" against other search operations.
Closes #3698
When a dynamic type is introduced during indexing, the node that introduces it sends it to the master node to be added to the index metadata. The master node then adds it to the index metadata and republishes that fact.
In order not to delete the mapping while the new type is introduced on the node that introduced it, we keep a map of seen mappings, and remove a mapping type once we have already processed it.
The map is not properly cleared though in all places where an actual index service is being removed on a node.
Closes #3697
The refresh flag in optimize is problematic, since the set of shards that refresh is allowed to execute on can differ from the set of shards being optimized. In order to do optimize and then refresh, they should be executed as separate APIs when needed.
Closes #3690
The refresh flag in flush is problematic, since the set of shards that refresh is allowed to execute on can differ from the set of shards being flushed. In order to do flush and then refresh, they should be executed as separate APIs when needed.
Closes #3689
In case of retries, we update the clusterState and shardsIt; make sure they are visible using volatile (even though updates will probably go through a memory barrier, this might explain rare failures we see when a retry happens).
In 0.90.4, we deprecated some code:
* `GetIndexTemplatesRequest#GetIndexTemplatesRequest(String)` moved to `GetIndexTemplatesRequest#GetIndexTemplatesRequest(String...)`
* `GetIndexTemplatesRequest#name(String)` moved to `GetIndexTemplatesRequest#names(String...)`
* `GetIndexTemplatesRequest#name()` moved to `GetIndexTemplatesRequest#names()`
* `GetIndexTemplatesRequestBuilder#GetIndexTemplatesRequestBuilder(IndicesAdminClient, String)` moved to `GetIndexTemplatesRequestBuilder#GetIndexTemplatesRequestBuilder(IndicesAdminClient, String...)`
* `IndicesAdminClient#prepareGetTemplates(String)` moved to `IndicesAdminClient#prepareGetTemplates(String...)`
* `AbstractIndicesAdminClient#prepareGetTemplates(String)` moved to `AbstractIndicesAdminClient#prepareGetTemplates(String...)`
We can now remove those old methods in 1.0.
**Note**: it breaks the Java API.
Relates to #2532.
Closes #3681.
/_template shows: No handler found for uri [/_template] and method [GET]
It would make sense to list the templates as they are listed in the /_cluster/state call.
Closes #2532.
This commit adds a test that checks that we are closing
all files even if there are random IOExceptions happening. The test
found several places where, due to IOExceptions, streams were not
closed because of insufficient exception handling.
MockDirectoryWrapper adds asserting logic to the low level directory
implementation that helps to track and catch resource leaks like
unclosed index inputs caused by dangling IndexReader or IndexSearcher
instances. It prevents double writes to files and allows low level
random exceptions to be thrown for testing index consistency etc.
Closes #3654
Dynamic mapping allows dynamically introducing new fields into an existing mapping. There is a (pretty rare) race condition, where a new field/object being introduced will not be immediately visible for another document that introduces it at the same time.
Closes #3667, closes #3544
Improved LoggingListener (which reads and applies the @TestLogging annotation) to take into account the logger prefix. We can now use the @TestLogging annotation and specify either the whole package name (e.g. o.e.action.metadata) or only the package name without the logger prefix (action.metadata); the custom log level will be properly applied in both cases.
Enabled trace logging for RecoveryPercolatorTests#testMultiPercolator_recovery test
Cleaned up and removed unnecessary usage of concurrent collections.
The clear scroll api allows clearing all resources associated with a `scroll_id` by deleting the `scroll_id` and its associated SearchContext.
Closes #3657
Now that we have proper exit codes for the elasticsearch plugin manager (see #3463), we can add a silent mode to the plugin manager.
```sh
bin/plugin --install karmi/elasticsearch-paramedic --silent
```
Closes #3628.
Based on recent bugs (#3652) where searchers were acquired multiple times
but never released, 'IndexShard#searcher()' now has a more accurate name.
Closes #3653
When a node shuts down, thread pools might throw
EsRejectedExecutionException, and our test framework fails tests if
such exceptions are not caught and hit the uncaught exception handler.
When installing a plugin, we use:
```sh
bin/plugin --install groupid/artifactid/version
```
But when removing the plugin, we only support:
```sh
bin/plugin --remove dirname
```
where `dirname` is the directory name of the plugin under `/plugins` dir.
Closes #3421.
The inner call to the completion stats pulled a second searcher
that never got released causing the underlying readers to never
get closed unless the node is shut down. This was triggered
with literally each stats call including the completion stats
even if no completion service was used on the index.
Closes #3652
This commit adds two main pieces, the first is a ClusterInfoService
that provides a service running on the master nodes that fetches the
total/free bytes for each data node in the cluster as well as the
sizes of all shards in the cluster. This information is gathered by
default every 30 seconds, and can be changed dynamically by setting
the `cluster.info.update.interval` setting. This ClusterInfoService
can hopefully be used in the future to weight nodes for allocation
based on their disk usage, if desired.
The second main piece is the DiskThresholdDecider, which can disallow
a shard from being allocated to a node, or from remaining on the node
depending on configuration parameters. There are three main
configuration parameters for the DiskThresholdDecider:
`cluster.routing.allocation.disk.threshold_enabled` controls whether
the decider is enabled. It defaults to false (disabled). Note that the
decider is also disabled for clusters with only a single data node.
`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 0.70, meaning ES will not
allocate new shards to nodes once they have more than 70% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.
`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 0.85, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 85%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.
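For illustration, a sketch of enabling the decider with the settings described above, assuming the usual settings builder (watermarks accept either a used-disk ratio or an absolute byte value):

```java
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;

// Hypothetical configuration matching the defaults described above.
static Settings diskThresholdSettings() {
    return ImmutableSettings.settingsBuilder()
            .put("cluster.routing.allocation.disk.threshold_enabled", true)
            .put("cluster.routing.allocation.disk.watermark.low", "0.70")  // or e.g. "500mb"
            .put("cluster.routing.allocation.disk.watermark.high", "0.85")
            .put("cluster.info.update.interval", "30s")                    // default
            .build();
}
```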
Closes #3480
It supports multiple logger:level comma separated key value pairs.
Use the _root keyword to set the root logger level.
e.g. @TestLogging("_root:DEBUG,org.elasticsearch.cluster.metadata:TRACE")
or just @TestLogging("_root:DEBUG,cluster.metadata:TRACE") since we start the test with -Des.logger.prefix=
The completion suggester reserves 0x00 and 0xFF for internal use.
If those chars are used in the input string an IAE is thrown and the
input is rejected.
Closes #3648
SuggestSearchTests had tons of duplicate code and didn't use all of the
fancy new integration test helper methods. I've removed a ton of duplicate
code and used as many of the nice test helper methods as I could think of.
Closes #3611
The reason behind this is that the `_name` support has been extended to queries as well, and the name `namedQueries` better suggests that it is applicable to the whole query DSL.
Relates to #3581
Sometimes one wants to just control the number of processors that our different size-based calculations for thread pools and network workers are based on, and not use the available processors reported by the OS. Add a processors setting, through which it can be controlled.
Closes #3643
Currently, in NettyTransport the locks for connecting and disconnecting channels are stored in a ConcurrentMap that has 500 entries. A thread can acquire a lock for an id, and the lock returned is chosen via a hash function computed from the id.
Unfortunately, a collision of two ids can cause a deadlock as follows:
Scenario: one master (no data), one datanode (only data node)
DiscoveryNode id of master is X
DiscoveryNode id of datanode is Y
Both X and Y cause the same lock to be returned by NettyTransport#connectLock()
Both are up and running, all is fine until the master stops.
Thread 1: The master fault detection of the datanode is notified (onNodeDisconnected()), which in turn leads the node to try and reconnect to the master via the callstack titled "Thread 1" below.
-> connectToNode() is called and the lock for X is acquired. The method waits 45s for the channels to reconnect.
Furthermore, Thread 1 holds the NettyTransport#masterNodeMutex.
Thread 2: The connection fails with an exception (connection refused, see callstack below), because the master shut down already. The exception is handled in NettyTransport#exceptionCaught which calls NettyTransport#disconnectFromNodeChannel. This method acquires the lock for Y (see Thread 2 below).
Now, if Y and X had two different locks, thread 2 would get the lock, disconnect the channels and notify thread 1. But since X and Y have the same lock, thread 2 is deadlocked with thread 1, which waits for 45s.
In this time, no thread can acquire the masterNodeMutex (held by thread 1), so the node can, for example, not stop.
This commit introduces a mechanism that assures unique locks for unique ids. This lock is not reentrant and therefore assures that threads can not end up in an infinite recursion (see Thread 3 below for an example of how a thread can acquire a lock twice).
While this is not a problem right now, it is potentially dangerous to have it that way, because the callstacks are complex as is and slight changes might cause unexpected recursions.
Thread 1
----
owns: Object (id=114)
owns: Object (id=118)
waiting for: DefaultChannelFuture (id=140)
Object.wait(long) line: not available [native method]
DefaultChannelFuture(Object).wait(long, int) line: 461
DefaultChannelFuture.await0(long, boolean) line: 311
DefaultChannelFuture.awaitUninterruptibly(long) line: 285
NettyTransport.connectToChannels(NettyTransport$NodeChannels, DiscoveryNode) line: 672
NettyTransport.connectToNode(DiscoveryNode, boolean) line: 609
NettyTransport.connectToNode(DiscoveryNode) line: 579
TransportService.connectToNode(DiscoveryNode) line: 129
MasterFaultDetection.handleTransportDisconnect(DiscoveryNode) line: 195
MasterFaultDetection.access$0(MasterFaultDetection, DiscoveryNode) line: 188
MasterFaultDetection$FDConnectionListener.onNodeDisconnected(DiscoveryNode) line: 245
TransportService$Adapter$2.run() line: 298
EsThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 724
Thread 2
-------
waiting for: Object (id=114)
NettyTransport.disconnectFromNodeChannel(Channel, Throwable) line: 790
NettyTransport.exceptionCaught(ChannelHandlerContext, ExceptionEvent) line: 495
MessageChannelHandler.exceptionCaught(ChannelHandlerContext, ExceptionEvent) line: 228
MessageChannelHandler(SimpleChannelUpstreamHandler).handleUpstream(ChannelHandlerContext, ChannelEvent) line: 112
DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent) line: 564
DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent) line: 791
SizeHeaderFrameDecoder(FrameDecoder).exceptionCaught(ChannelHandlerContext, ExceptionEvent) line: 377
SizeHeaderFrameDecoder(SimpleChannelUpstreamHandler).handleUpstream(ChannelHandlerContext, ChannelEvent) line: 112
DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent) line: 564
DefaultChannelPipeline.sendUpstream(ChannelEvent) line: 559
Channels.fireExceptionCaught(Channel, Throwable) line: 525
NioClientBoss.processSelectedKeys(Set<SelectionKey>) line: 110
NioClientBoss.process(Selector) line: 79
NioClientBoss(AbstractNioSelector).run() line: 312
NioClientBoss.run() line: 42
ThreadRenamingRunnable.run() line: 108
DeadLockProofWorker$1.run() line: 42
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 724
Thread 3
---------
org.elasticsearch.transport.netty.NettyTransport.disconnectFromNode(NettyTransport.java:772)
org.elasticsearch.transport.netty.NettyTransport.access$1200(NettyTransport.java:92)
org.elasticsearch.transport.netty.NettyTransport$ChannelCloseListener.operationComplete(NettyTransport.java:830)
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:362)
org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:355)
org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:185)
org.jboss.netty.channel.socket.nio.AbstractNioChannel.setClosed(AbstractNioChannel.java:197)
org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:84)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:357)
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:58)
org.jboss.netty.channel.Channels.close(Channels.java:812)
org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
org.elasticsearch.transport.netty.NettyTransport$NodeChannels.closeChannelsAndWait(NettyTransport.java:892)
org.elasticsearch.transport.netty.NettyTransport$NodeChannels.close(NettyTransport.java:879)
org.elasticsearch.transport.netty.NettyTransport.disconnectFromNode(NettyTransport.java:778)
org.elasticsearch.transport.netty.NettyTransport.access$1200(NettyTransport.java:92)
org.elasticsearch.transport.netty.NettyTransport$ChannelCloseListener.operationComplete(NettyTransport.java:830)
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:362)
org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:355)
org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:185)
org.jboss.netty.channel.socket.nio.AbstractNioChannel.setClosed(AbstractNioChannel.java:197)
org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:84)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:357)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)
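The unique-lock-per-id mechanism can be pictured as a reference-counted lock per id; a minimal sketch with hypothetical names (the real lock is additionally non-reentrant, a check this sketch omits):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Every id gets its own lock object, created on demand and removed once the
// last holder releases it, so two distinct ids can never collide on a
// shared hash bucket the way the fixed-size lock array allowed.
final class PerIdLock {

    private static final class RefCountedLock extends ReentrantLock {
        final AtomicInteger refs = new AtomicInteger(1);
    }

    private final ConcurrentHashMap<String, RefCountedLock> locks = new ConcurrentHashMap<>();

    void acquire(String id) {
        for (;;) {
            RefCountedLock lock = locks.get(id);
            if (lock == null) {
                RefCountedLock newLock = new RefCountedLock();
                if (locks.putIfAbsent(id, newLock) == null) {
                    newLock.lock();
                    return;
                }
                // lost the race to install a lock for this id, retry
            } else {
                int refs = lock.refs.get();
                // only join if the lock has not been concurrently removed
                if (refs > 0 && lock.refs.compareAndSet(refs, refs + 1)) {
                    lock.lock();
                    return;
                }
            }
        }
    }

    void release(String id) {
        RefCountedLock lock = locks.get(id);
        lock.unlock();
        if (lock.refs.decrementAndGet() == 0) {
            locks.remove(id, lock); // last holder cleans up the entry
        }
    }
}
```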
We changed the default of BytesStreamOutput (used in various places in ES) to 32k from 1k with the assumption that most streams tend to be large. This doesn't hold, for example, when indexing small documents and adding them using XContentBuilder (which will have a large overhead).
Default the buffer size to 2k now, but be relatively aggressive in expanding the buffer when below 256k (double it), and just use oversize (1/8th) when larger, to try and minimize garbage and buffer copies.
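A sketch of the described growth policy (constants taken from the text above; names are hypothetical):

```java
// Grow a buffer capacity until it covers minCapacity: double while under
// 256k, then oversize by ~1/8th to limit garbage and copies.
static int newCapacity(int minCapacity, int currentCapacity) {
    int newSize = Math.max(currentCapacity, 2 * 1024); // new 2k default
    while (newSize < minCapacity) {
        if (newSize < 256 * 1024) {
            newSize <<= 1;              // aggressive below 256k: double
        } else {
            newSize += newSize >>> 3;   // above 256k: grow by 1/8th
        }
    }
    return newSize;
}
```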
Relates to #3624. Closes #3638
We can use the index in the node ids list as the index into the array when we set the response or the exception, removing the need for an index AtomicInteger.
This reverts commit e943cc81a5.
The more complex queries support causes StackOverflowErrors that
can influence the cluster performance and stability dramatically.
This commit backs out this change to reduce the risk for deep
stacks.
Reverts #3357
If the index shard is not started yet, the underlying engine can
throw an exception since not everything is initialized. This can
happen if a shard is just about to start up or is recovering / initializing
and stats are requested.
Closes #3619
The unassigned list is used to make allocation decisions but is currently
modified during allocation runs, which causes primaries to be throttled
during allocation. If this happens, newly allocated indices can be stalled
for a long time, turning a cluster into a RED state if concurrent relocations
and / or recoveries are happening.
Closes #3610
Whether we remove the top-level folder from the archive depends now on the zip itself and not on where it was downloaded from. That makes it work installing local files too.
Closes #3582
Restrict the size of the input length to a reasonable size, otherwise very
long strings can cause StackOverflowErrors deep down in Lucene land.
Yet, this is simply a safety limit, set to `50` UTF-16 code points by default.
This limit is only present at index time and not at query time. If prefix
completions > 50 UTF-16 code points are expected / desired, this limit should be raised.
Critical string sizes are beyond the 1k UTF-16 code point limit.
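A sketch of the safety check (names assumed; note String.length() counts UTF-16 code units):

```java
// Hypothetical guard: reject completion inputs longer than the configured
// limit, 50 UTF-16 code points by default.
static void checkInputLength(String input, int maxInputLength) {
    if (input.length() > maxInputLength) {
        throw new IllegalArgumentException(
                "input [" + input + "] is longer than the limit of ["
                        + maxInputLength + "] UTF-16 code points");
    }
}
```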
Closes #3596
Added more debug logging to TransportShardReplicationOperationAction
*Changed* node naming in tests from GUIDs to node# based naming as they are much easier to read
The goal is to throw out suggestions that only meet the cutoff in some
shards. This will happen if your input phrase is only contained in a
few shards. If your shards are unbalanced, this rechecking can throw out
good suggestions.
Closes #3547.
In a special case, allocations from the "heaviest" to the "lighter" nodes are not possible
due to some allocation decider restrictions, like zone awareness: if one zone has, for instance,
fewer nodes than another zone, that zone is horribly overloaded from a balance perspective, but we
can't move the shards to the "lighter" nodes since otherwise the zone would go over capacity.
Closes #3580