TestCluster can currently only be used in a globally shared scope.
This commit adds the ability to use the TestCluster in three different
scopes per test suite. The scopes are 'Global', 'Suite' and 'Test',
where the cluster is shared across all tests, across all test methods
of a single suite, or not at all, respectively.
Subclasses of AbstractIntegrationTest (formerly AbstractSharedClusterTest)
can add an annotation if they need a scope other than Global (the default):
```
@ClusterScope(scope=Scope.Suite, numNodes=1)
```
The annotation also allows specifying the number of shared nodes in that TestCluster
that are available when a test starts.
The cleanups in this commit include:
- s/ElasticSearch/Elasticsearch/g on test classes
- Move test classes into org.elasticsearch.test
This assertion module also injects an AssertingIndexSearcher that
checks that our queries are all compliant with the Lucene specification,
which is important for future updates and changes in the upstream project.
There was a small window of time where the transport response handler's handleException method was invoked twice. As far as I can tell, this happened when a node disconnect event was processed just after the request was registered and a "Node not connected" error was thrown in between. The TransportService#sendRequest method would then invoke the transport response handler's handleException method regardless of whether it had already been invoked. This resulted in two retries being executed for a single request failure.
The mpercolate api has an assert that tripped when more than the expected number of shard level responses were returned. This was caused by the issue described above: for a single shard level request we had multiple responses, which broke the total number of expected responses. Also, the reduce could be started prematurely, which resulted in an incorrect final response (e.g. the total count being incorrect). For example: with two shards in total, shard 0 gets reduced twice. The second shard 0 response gets in just before the shard 1 response, and the reduce starts without the shard 1 response.
The master node processes changes to the cluster state, and part of that processing is publishing the cluster state to the other nodes. It does not wait for the cluster state to be processed on the other nodes before it moves on to the next cluster state processing job.
This is fine: we support out-of-order cluster state events using versioning, and nodes can handle those cases. It does, however, lead to non-optimal API semantics. For example, when issuing a cluster health request and waiting for a green state, the master node will report back once the cluster is green based on its own cluster state, but that "green" state might not have been received by all other nodes yet.
Add a discovery.zen.publish_timeout setting, and default it to 5s. This makes a best effort at ensuring all nodes will process a cluster state within a window of time.
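For illustration, a sketch of overriding the default at node startup, assuming the setting is passed like any other node setting via the es. system property prefix:
```sh
# Illustrative only: widen the publish window from the 5s default to 10s
bin/elasticsearch -Des.discovery.zen.publish_timeout=10s
```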
closes#3736
If a request hasn't been acknowledged, there's no guarantee that any node holds the up-to-date cluster state (not even the master yet, as the execution is asynchronous).
The average and sum comparators basically share the same code, which is
copy-pasted today. We can simplify this into a base class, which reduces
code duplication and prevents copy-paste bugs.
While testing an async system, providing reproducible tests that
use randomized components is a hard task; we should at least try to
re-establish the environment of a failing test as much as possible.
This commit allows re-establishing the shared 'TestCluster' by
resetting the cluster to a predefined shared state before each test.
Before this commit, a test executed in isolation was likely
using an entirely different node environment than the failing test, since
the 'TestCluster' kept intermediate nodes started by other tests around.
use service id for pid name
disable filtering on *.exe (caused corruption)
rename exe names and add more options to .bat
start/stop operations are now supported (and expected to be called) by service.bat
add more variables from the env to customize default behavior prior to installing the service
add manager option
fixes regarding batch flow
specify service id in description
minor readability improvement
include .exe only in ZIP archive
rename x64 service id to make it work out of the box
add elasticsearch as a service for Windows platforms
based on Apache Commons Daemon
supports both x64 and x86
Compared to setting node.local to true, it would be nicer to support node.mode with values of local or network.
Note, node.local is still supported.
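A minimal sketch of both styles at startup (assuming the settings are passed via the es. system property prefix):
```sh
# New style: node.mode with local or network
bin/elasticsearch -Des.node.mode=local

# Old style, still supported
bin/elasticsearch -Des.node.local=true
```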
closes#3713
The SearchWithRandomExceptionTests aim to catch problems when resources
are not closed due to exceptions etc. Yet in some cases the random seeds
cause the index to never be fully allocated, so we basically go into a
ping-pong state where we try to allocate shards back and forth between nodes.
This causes all docs to time out, which in turn causes the tests to run
for a very long time (hours or days).
If we cannot allocate the index and get to a yellow state, we simply
index only one doc and expect all searches to fail.
This commit also beefs up the assertions in this test to check that
documents are actually present if they were indexed and the refresh was
successful.
Closes#3694
Stats can be retrieved on a per-feature / per-component basis, including the fields
they apply to. This commit adds support for a 'completion' flag to include statistics
for the completion feature, as well as 'completion_fields' to only
include certain fields in the returned statistics.
To disambiguate between 'fielddata' and 'completion' fields, this commit
uses 'fields' as the default inclusion filter for stats fields, used only
if no dedicated '[completion|fielddata]_fields' parameter is provided.
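A hedged example of what such a request could look like over REST; the index and field names here are made up:
```sh
# Include completion stats, limited to one (hypothetical) field
curl -XGET 'localhost:9200/my_index/_stats?completion=true&completion_fields=title_suggest'

# 'fields' acts as the default filter when no dedicated *_fields parameter is given
curl -XGET 'localhost:9200/my_index/_stats?completion=true&fielddata=true&fields=title_suggest'
```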
Relates to #3522
Add a dedicated suggest thread pool for the suggest API. With the new completion suggest type, which is purely CPU bound, it makes more sense to have a dedicated thread pool for suggest than to have it share the search thread pool and "compete" against other search operations.
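Assuming the new pool is tunable like the other fixed thread pools (the key and value below are assumptions, not taken from this commit):
```sh
# Hypothetical: size the dedicated suggest pool at startup
bin/elasticsearch -Des.threadpool.suggest.size=16
```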
closes#3698
When a dynamic type is introduced during indexing, the node that introduces it sends the new mapping to the master node, to be added to the index metadata. The master node then adds it to the index metadata and republishes that fact.
In order not to delete the mapping while the new type is being introduced on the node that introduced it, we keep a map of seen mappings, and remove a mapping type once it has been processed.
The map was not properly cleared, though, in all places where an actual index service is removed on a node.
closes#3697
The refresh flag in optimize is problematic, since the set of shards the refresh is allowed to execute on may differ from the optimize shards. In order to do an optimize and then a refresh, they should be executed as separate APIs when needed.
closes#3690
The refresh flag in flush is problematic, since the set of shards the refresh is allowed to execute on may differ from the flush shards. In order to do a flush and then a refresh, they should be executed as separate APIs when needed.
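With the flag gone, here as in the optimize change above, the operations are simply issued as separate calls when both are wanted (the index name is a placeholder):
```sh
curl -XPOST 'localhost:9200/my_index/_optimize'
curl -XPOST 'localhost:9200/my_index/_flush'
curl -XPOST 'localhost:9200/my_index/_refresh'
```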
closes#3689
Both package types, RPM and deb, now contain an option to not restart on upgrade.
This option can be configured in /etc/default/elasticsearch for dpkg-based systems
and /etc/sysconfig/elasticsearch for rpm-based systems.
By default the setting is as before: a restart is executed on upgrade.
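A sketch of what the defaults file could contain; the variable name is an assumption, not quoted from the packaging scripts:
```sh
# /etc/default/elasticsearch (deb) or /etc/sysconfig/elasticsearch (rpm)
# Hypothetical variable name; set to false to skip restarting on package upgrade
RESTART_ON_UPGRADE=true
```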
Closes#3685
In case of retries, we update the clusterState and shardsIt; make sure they are visible using volatile (even though updates will probably go through a memory barrier, this might explain a rare failure we see when a retry happens).
Introduced ElasticsearchAssertions that check for failures in all responses.
Added a comment to flush requests, as we don't check for failures there.
Attempted to remove Thread.sleep in testChangeInitialShardsRecovery in favour of an awaitBusy block.
Added an awaitBusy block in testQuorumRecovery waiting for YELLOW. We could have GREEN after the node close, but we want to wait for the YELLOW state.
As we are within an awaitBusy block, it doesn't make sense to have an assertion, since it would fail the test instead of waiting until the condition is verified (or the timeout expires).
In 0.90.4, we deprecated some code:
* `GetIndexTemplatesRequest#GetIndexTemplatesRequest(String)` moved to `GetIndexTemplatesRequest#GetIndexTemplatesRequest(String...)`
* `GetIndexTemplatesRequest#name(String)` moved to `GetIndexTemplatesRequest#names(String...)`
* `GetIndexTemplatesRequest#name()` moved to `GetIndexTemplatesRequest#names()`
* `GetIndexTemplatesRequestBuilder#GetIndexTemplatesRequestBuilder(IndicesAdminClient, String)` moved to `GetIndexTemplatesRequestBuilder#GetIndexTemplatesRequestBuilder(IndicesAdminClient, String...)`
* `IndicesAdminClient#prepareGetTemplates(String)` moved to `IndicesAdminClient#prepareGetTemplates(String...)`
* `AbstractIndicesAdminClient#prepareGetTemplates(String)` moved to `AbstractIndicesAdminClient#prepareGetTemplates(String...)`
We can now remove those old methods in 1.0.
**Note**: it breaks the Java API
Related to #2532.
Closes#3681.
/_template shows: No handler found for uri [/_template] and method [GET]
It would make sense to list the templates as they are listed in the /_cluster/state call.
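Listing would then work like fetching a single template (the template name below is a placeholder):
```sh
# List all index templates (previously: "No handler found")
curl -XGET 'localhost:9200/_template'

# Fetch a single template by name, as before
curl -XGET 'localhost:9200/_template/my_template'
```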
Closes#2532.
This commit adds a test that checks that we are closing
all files even if random IOExceptions are happening. The test
found several places where, due to IOExceptions, streams were not
closed because of insufficient exception handling.
MockDirectoryWrapper adds asserting logic to the low level directory
implementation that helps to track and catch resource leaks like
unclosed index inputs caused by dangling IndexReader or IndexSearcher
instances. It prevents double writes to files and allows low level
random exceptions to be thrown for testing index consistency etc.
Closes#3654
Tests now support ignoring integration tests by passing
'-Dtests.integration=false' to the test runner, either via the IDE or
maven, to only run fast unit tests.
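For example, from maven:
```sh
# Run only the fast unit tests, skipping integration tests
mvn test -Dtests.integration=false
```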
Dynamic mapping allows new fields to be dynamically introduced into an existing mapping. There is a (pretty rare) race condition where a new field/object being introduced will not be immediately visible for another document that introduces it at the same time.
closes#3667, closes#3544
Improved LoggingListener (which reads and applies the @TestLogging annotation) to take into account the logger prefix. We can now use the @TestLogging annotation and specify either the whole package name (e.g. o.e.action.metadata) or only the package name without the logger prefix (action.metadata); the custom log level will be properly applied in both cases.
Enabled trace logging for RecoveryPercolatorTests#testMultiPercolator_recovery test
Cleaned up and removed unnecessary usage of concurrent collections.
The clear scroll api allows clearing all resources associated with a `scroll_id` by deleting the `scroll_id` and its associated SearchContext.
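A sketch of the call, with the scroll id elided:
```sh
# Free the server-side resources held for a scroll
curl -XDELETE 'localhost:9200/_search/scroll/<scroll_id>'
```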
Closes#3657
Now that we have proper exit codes for the elasticsearch plugin manager (see #3463), we can add a silent mode to the plugin manager.
```sh
bin/plugin --install karmi/elasticsearch-paramedic --silent
```
Closes#3628.
Based on recent bugs ( #3652 ) where searchers were acquired multiple times
but never released, 'IndexShard#searcher()' now has a more accurate name.
Closes#3653
When a node shuts down, thread pools might throw
EsRejectedExecutionException, and our test framework fails tests whose
uncaught exceptions hit the uncaught exception handler.
When installing a plugin, we use:
```sh
bin/plugin --install groupid/artifactid/version
```
But when removing the plugin, we only support:
```sh
bin/plugin --remove dirname
```
where `dirname` is the directory name of the plugin under `/plugins` dir.
Closes#3421.
The inner call to the completion stats pulled a second searcher
that never got released, causing the underlying readers to never
get closed unless the node is shut down. This was triggered by
literally every stats call that included the completion stats,
even if no completion service was used on the index.
Closes#3652
This commit adds two main pieces, the first is a ClusterInfoService
that provides a service running on the master nodes that fetches the
total/free bytes for each data node in the cluster as well as the
sizes of all shards in the cluster. This information is gathered by
default every 30 seconds, and can be changed dynamically by setting
the `cluster.info.update.interval` setting. This ClusterInfoService
can hopefully be used in the future to weight nodes for allocation
based on their disk usage, if desired.
The second main piece is the DiskThresholdDecider, which can disallow
a shard from being allocated to a node, or from remaining on the node
depending on configuration parameters. There are three main
configuration parameters for the DiskThresholdDecider:
`cluster.routing.allocation.disk.threshold_enabled` controls whether
the decider is enabled. It defaults to false (disabled). Note that the
decider is also disabled for clusters with only a single data node.
`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 0.70, meaning ES will not
allocate new shards to nodes once they have more than 70% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.
`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 0.85, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 85%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.
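Putting the settings together, a sketch assuming they can be supplied at startup like other node settings (the values are illustrative):
```sh
bin/elasticsearch \
  -Des.cluster.routing.allocation.disk.threshold_enabled=true \
  -Des.cluster.routing.allocation.disk.watermark.low=0.75 \
  -Des.cluster.routing.allocation.disk.watermark.high=0.90 \
  -Des.cluster.info.update.interval=60s
```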
Closes#3480
It supports multiple logger:level comma-separated key-value pairs.
Use the _root keyword to set the root logger level,
e.g. @TestLogging("_root:DEBUG,org.elasticsearch.cluster.metadata:TRACE")
or just @TestLogging("_root:DEBUG,cluster.metadata:TRACE"), since we start the tests with -Des.logger.prefix=
The completion suggester reserves 0x00 and 0xFF for internal use.
If those chars are used in the input string, an IAE is thrown and the
input is rejected.
Closes#3648
SuggestSearchTests had tons of duplicate code and didn't use all of the
fancy new integration test helper methods. I've removed a ton of duplicate
code and used as many of the nice test helper methods as I could think of.
Closes#3611
The reason behind this is that the `_name` support has been extended to queries as well, and the name `namedQueries` better suggests that it is applicable to the whole query dsl.
Relates to #3581
Sometimes one wants to just control the number of processors that our different size calculations for thread pools and network workers are based on, rather than using the available processors reported by the OS. Add a processors setting where it can be controlled.
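For example (a sketch, assuming the setting is passed like any other node setting):
```sh
# Base thread pool and network worker sizing on 8 processors,
# regardless of what the OS reports
bin/elasticsearch -Des.processors=8
```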
closes#3643
Currently, in NettyTransport the locks for connecting and disconnecting channels are stored in a ConcurrentMap that has 500 entries. A thread can acquire a lock for an id, and the lock returned is chosen by a hash function computed from the id.
Unfortunately, a collision of two ids can cause a deadlock as follows:
Scenario: one master (no data), one datanode (only data node)
DiscoveryNode id of master is X
DiscoveryNode id of datanode is Y
Both X and Y cause the same lock to be returned by NettyTransport#connectLock()
Both are up and running, all is fine until master stops.
Thread 1: The master fault detection of the datanode is notified (onNodeDisconnected()), which in turn leads the node to try and reconnect to master via the callstack titled "Thread 1" below.
-> connectToNode() is called and the lock for X is acquired. The method waits 45s for the channels to reconnect.
Furthermore, Thread 1 holds the NettyTransport#masterNodeMutex.
Thread 2: The connection fails with an exception (connection refused, see callstack below), because the master shut down already. The exception is handled in NettyTransport#exceptionCaught which calls NettyTransport#disconnectFromNodeChannel. This method acquires the lock for Y (see Thread 2 below).
Now, if Y and X have two different locks, this would get the lock, disconnect the channels and notify thread 1. But since X and Y have the same locks, thread 2 is deadlocked with thread 1 which waits for 45s.
In this time, no thread can acquire the masterNodeMutex (held by thread 1), so the node can, for example, not stop.
This commit introduces a mechanism that assures unique locks for unique ids. This lock is not reentrant and therefore assures that threads cannot end up in an infinite recursion (see Thread 3 below for an example of how a thread can acquire a lock twice).
While this is not a problem right now, it is potentially dangerous to have it that way, because the callstacks are complex as is and slight changes might cause unexpected recursions.
Thread 1
----
owns: Object (id=114)
owns: Object (id=118)
waiting for: DefaultChannelFuture (id=140)
Object.wait(long) line: not available [native method]
DefaultChannelFuture(Object).wait(long, int) line: 461
DefaultChannelFuture.await0(long, boolean) line: 311
DefaultChannelFuture.awaitUninterruptibly(long) line: 285
NettyTransport.connectToChannels(NettyTransport$NodeChannels, DiscoveryNode) line: 672
NettyTransport.connectToNode(DiscoveryNode, boolean) line: 609
NettyTransport.connectToNode(DiscoveryNode) line: 579
TransportService.connectToNode(DiscoveryNode) line: 129
MasterFaultDetection.handleTransportDisconnect(DiscoveryNode) line: 195
MasterFaultDetection.access$0(MasterFaultDetection, DiscoveryNode) line: 188
MasterFaultDetection$FDConnectionListener.onNodeDisconnected(DiscoveryNode) line: 245
TransportService$Adapter$2.run() line: 298
EsThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 724
Thread 2
-------
waiting for: Object (id=114)
NettyTransport.disconnectFromNodeChannel(Channel, Throwable) line: 790
NettyTransport.exceptionCaught(ChannelHandlerContext, ExceptionEvent) line: 495
MessageChannelHandler.exceptionCaught(ChannelHandlerContext, ExceptionEvent) line: 228
MessageChannelHandler(SimpleChannelUpstreamHandler).handleUpstream(ChannelHandlerContext, ChannelEvent) line: 112
DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent) line: 564
DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent) line: 791
SizeHeaderFrameDecoder(FrameDecoder).exceptionCaught(ChannelHandlerContext, ExceptionEvent) line: 377
SizeHeaderFrameDecoder(SimpleChannelUpstreamHandler).handleUpstream(ChannelHandlerContext, ChannelEvent) line: 112
DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent) line: 564
DefaultChannelPipeline.sendUpstream(ChannelEvent) line: 559
Channels.fireExceptionCaught(Channel, Throwable) line: 525
NioClientBoss.processSelectedKeys(Set<SelectionKey>) line: 110
NioClientBoss.process(Selector) line: 79
NioClientBoss(AbstractNioSelector).run() line: 312
NioClientBoss.run() line: 42
ThreadRenamingRunnable.run() line: 108
DeadLockProofWorker$1.run() line: 42
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 724
Thread 3
---------
org.elasticsearch.transport.netty.NettyTransport.disconnectFromNode(NettyTransport.java:772)
org.elasticsearch.transport.netty.NettyTransport.access$1200(NettyTransport.java:92)
org.elasticsearch.transport.netty.NettyTransport$ChannelCloseListener.operationComplete(NettyTransport.java:830)
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:362)
org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:355)
org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:185)
org.jboss.netty.channel.socket.nio.AbstractNioChannel.setClosed(AbstractNioChannel.java:197)
org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:84)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:357)
org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:58)
org.jboss.netty.channel.Channels.close(Channels.java:812)
org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
org.elasticsearch.transport.netty.NettyTransport$NodeChannels.closeChannelsAndWait(NettyTransport.java:892)
org.elasticsearch.transport.netty.NettyTransport$NodeChannels.close(NettyTransport.java:879)
org.elasticsearch.transport.netty.NettyTransport.disconnectFromNode(NettyTransport.java:778)
org.elasticsearch.transport.netty.NettyTransport.access$1200(NettyTransport.java:92)
org.elasticsearch.transport.netty.NettyTransport$ChannelCloseListener.operationComplete(NettyTransport.java:830)
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:362)
org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:355)
org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:185)
org.jboss.netty.channel.socket.nio.AbstractNioChannel.setClosed(AbstractNioChannel.java:197)
org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:84)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:357)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)
We changed the default of BytesStreamOutput (used in various places in ES) to 32k from 1k with the assumption that most streams tend to be large. This doesn't hold, for example, when indexing small documents and adding them using XContentBuilder (which will then have a large overhead).
Default the buffer size to 2k now, but be relatively aggressive in expanding the buffer when below 256k (double it), and just oversize (by 1/8th) when larger, to try and minimize garbage and buffer copies.
relates to #3624, closes #3638
We can use the index in the node ids list as the index into the array when we set the response or the exception, removing the need for an index AtomicInteger.
This reverts commit e943cc81a5.
The more complex queries support causes StackOverflowErrors that
can influence cluster performance and stability dramatically.
This commit backs out this change to reduce the risk of deep
stacks.
Reverts #3357
If the index shard is not started yet, the underlying engine can
throw an exception, since not everything is initialized. This can
happen if a shard is just starting up or recovering / initializing
and stats are requested.
Closes#3619
The unassigned list is used to make allocation decisions but is currently
modified during allocation runs, which causes primaries to be throttled
during allocation. If this happens, newly allocated indices can be stalled
for a long time, turning a cluster into a RED state if concurrent relocations
and / or recoveries are happening.
Closes#3610
Whether we remove the top-level folder from the archive now depends on the zip itself and not on where it was downloaded from. That makes installing local files work too.
Closes#3582
Restrict the input length to a reasonable size, otherwise very
long strings can cause StackOverflowErrors deep down in Lucene land.
Yet, this is simply a safety limit, set to `50` UTF-16 code points by default.
This limit only applies at index time and not at query time. If prefix
completions > 50 UTF-16 code points are expected / desired, this limit should be raised.
Critical string sizes are beyond the 1k UTF-16 code points limit.
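For illustration only: if the limit is exposed as a mapping parameter on the completion field, raising it might look like the sketch below. The parameter name `max_input_length` and all index/type/field names are assumptions, not quoted from this commit.
```sh
# Hypothetical mapping change raising the safety limit for one field
curl -XPUT 'localhost:9200/my_index/my_type/_mapping' -d '{
  "my_type": {
    "properties": {
      "title_suggest": {
        "type": "completion",
        "max_input_length": 100
      }
    }
  }
}'
```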
Closes#3596
Added more debug logging to TransportShardReplicationOperationAction
*Changed* node naming in tests from GUIDs to node# based naming as they are much easier to read
This test makes random allocation decisions on a growing and
shrinking cluster, leading to a random distribution of the shards.
After a certain number of iterations, the test allows allocation unless
the same shard is already allocated on a node, and balances the cluster
to gain an optimal balance.
The goal is to throw out suggestions that only meet the cutoff in some
shards. This will happen if your input phrase is only contained in a
few shards. If your shards are unbalanced, this rechecking can throw out
good suggestions.
Closes#3547.
If the path to the test classes contains spaces, Hunspell tests failed due
to URL-encoded paths. This happens on CI builds if you give the build
a name containing a space. This is fixed by first converting to a URI
and creating a File object from the URI directly.
In a special case, allocation from the "heaviest" to the "lighter" nodes is not possible
due to some allocation decider restrictions like zone awareness: if one zone has, for instance,
fewer nodes than another zone, that zone is horribly overloaded from a balance perspective, but we
can't move shards to the "lighter" nodes since otherwise the zone would go over capacity.
Closes#3580
When recovery of a shard fails on a node, the node sends a notification about the failure to the master. During the period between the shard failure and the time when the notification about the failure reaches the master, any changes in shard allocations can trigger the node with the failed shard to allocate this shard again. This allocation (especially if successful) creates a ripple effect of the shard going through failure/started states in order to match the delayed states processed by the master. Under certain conditions, a node involved in this process might generate warning messages: "marked shard as started, but shard has not been created, mark shard as failed".
This fix makes sure that nodes keep track of failed shard allocations and will not try to allocate such shards repeatedly while waiting for the failure notification to be processed by the master.
Shard relocation caused queries to fail in the fetch phase with either an
`AlreadyClosedException` or a `SearchContextMissingException`.
This was caused by the `SearchService` releasing the `SearchContext`s via the
superfluous call of `releaseContextsForShard()` when the
This commit allows for configuring the sort order of missing values in BytesRef
comparators (used for strings) with the following options:
- _first: missing values will appear in the first positions,
- _last: missing values will appear in the last positions (default),
- <any value>: documents with missing sort value will use the given value when
sorting.
Since the default is _last, sorting by string value will behave differently
than in previous versions of elasticsearch, which used to insert missing
values in the first positions when sorting in ascending order.
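For example, using the `missing` parameter of a field sort (index and field names are made up):
```sh
# Ascending sort with documents missing 'name' placed first;
# use "_last" (the default) or a literal value instead
curl -XGET 'localhost:9200/my_index/_search' -d '{
  "sort": [
    { "name": { "order": "asc", "missing": "_first" } }
  ]
}'
```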
Implementation notes:
- Nested sorting is supported through the implementation of
NestedWrappableComparator,
- BytesRefValComparator was mostly broken since no field data implementation
was using it, it is now tested through NoOrdinalsStringFieldDataTests,
- Specialized BytesRefOrdValComparators have been removed now that the ordinals
rely on packed arrays instead of raw arrays,
- Field data tests hierarchy has been changed so that the numeric tests don't
inherit from the string tests anymore,
- When _first or _last is used, internally the comparators are told to use
null or BytesRefFieldComparatorSource.MAX_TERM to replace missing values
(depending on the sort order),
- BytesRefValComparator just replaces missing values with the provided value
and uses them for comparisons,
- BytesRefOrdValComparator multiplies ordinals by 4 so that it can find
ordinals for the missing value and the bottom value which are directly
comparable with the segment ordinals. For example, if the segment values and
ordinals are (a,1) and (b,2), they will be stored internally as (a,4) and
(b,8) and if the missing value is 'ab', it will be assigned 6 as an ordinal,
since it is between 'a' and 'b'. Then if the bottom value is 'abc', it will
be assigned 7 as an ordinal, since it is between 'ab' and 'b'.
Closes#896
count_percolate -> count
percolate_existing_doc -> percolate
count_percolate_existing_doc -> count
If the header contains an `id` field, then it will automatically percolate an existing document.
The highlighter in the percolate api highlights snippets in the document being percolated. If highlighting is enabled, then for each matching query highlight snippets will be generated.
All highlight options that are supported via the search api are also supported in the percolate api, since the percolate api embeds the same highlighter infrastructure as the search api.
The `size` option is required if highlighting is specified in the percolate api; other than that, the `highlight` request part can just be placed in the percolate api request body.
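A sketch of such a request; the document and field names are made up:
```sh
# Percolate a document and highlight snippets for each matching query;
# 'size' is required when 'highlight' is present
curl -XGET 'localhost:9200/my_index/my_type/_percolate' -d '{
  "doc": { "body": "the quick brown fox" },
  "size": 10,
  "highlight": {
    "fields": { "body": {} }
  }
}'
```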
Closes#3574
- improve the test to be more re-creatable
- have tests for various number of replica counts, to check if failures are caused by searching on replicas that might not have been refreshed yet
- improve test to test explicit index creation, and index creation caused by index operation
- have an initial search go to _primary, to check if the failure occurs when searching on a replica because it missed a refresh
--------------------------
This feature allows retrieving [term vectors](https://github.com/elasticsearch/elasticsearch/issues/3114) for a list of documents. The json request has exactly the same [format](https://github.com/elasticsearch/elasticsearch/issues/3484) as the `_termvectors` endpoint.
To use it, call
```
curl -XGET 'http://localhost:9200/index/type/_mtermvectors' -d '{
"fields": [
"field1",
"field2",
...
],
"ids": [
"docId1",
"docId2",
...
],
"offsets": false|true,
"payloads": false|true,
"positions": false|true,
"term_statistics": false|true,
"field_statistics": false|true
}'
```
The return format is an array, each entry of which contains the term vector response for one document:
```
{
"docs": [
{
"_index": "index",
"_type": "type",
"_id": "docId1",
"_version": 1,
"exists": true,
"term_vectors": {
...
}
},
{
"_index": "index",
"_type": "type",
"_id": "docId2",
"_version": 1,
"exists": true,
"term_vectors": {
...
}
}
]
}
```
Note that, like term vectors, the multi term vectors request will silently skip over documents that have no term vectors stored in the index and will simply return an empty response in this case.
Closes#3536
Currently we run unit tests with clusters running in the background
that can potentially spawn threads, causing the thread leak control
to fire in tests that don't use the test cluster. This commit
introduces some base classes for that purpose, shadowing Lucene test
framework classes and adding the appropriate ThreadScope.
Always multiply the query score into the function score. For script score
functions, this means that boost_mode has to be set to `plain` if
'function_score' should behave like 'custom_score'.
This commit fixes inconsistencies in `function_score` and `filters_function_score`
using scripts, see issue #3464.
The method 'ScoreFunction.factor(docId)' is removed completely, since the name
suggests that this method actually computes a factor, which was not the case.
Multiplying the computed score is now handled by 'FiltersFunctionScoreQuery'
and 'FunctionScoreQuery' and not implicitly performed in
'ScoreFunction.factor(docId, subQueryScore)', as was the case for 'BoostScoreFunction'
and 'DecayScoreFunctions'.
This commit also fixes the explain output for FiltersFunctionScoreQuery. Here,
the influence of the maxBoost was never printed. Furthermore, the queryBoost was
printed as being multiplied into the filter score.
Closes#3464
This looks like a copy/paste issue where onCached was being called
rather than onRemoval. This should fix the ID cache stats not being
correct after a call to /_cache/clear?id_cache=true
During segment merges, FieldConsumers for all fields from the source
segments are created. Yet, if all documents that have a value in a
certain field are deleted / pruned during the merge, the FieldConsumer
will not be called at all, causing the internally used FST Builder to
return `null` from its `build()` method. This means we can't store
it and potentially run into an NPE. This commit adds handling for
this corner case both during merge and during the suggest phase, since
we also don't have a Lookup instance for this segment's field.
Closes#3555