In several places in the code we need to notify a node (typically the master) that it needs to do something. When that node is the local node, several places have an optimization that runs the execution code immediately instead of sending the request over the wire to itself. This is a shame, as we need to implement the same pattern again and again. On top of that, we may forget to do so (see the note below), and we might have to write some cruft if the code needs to run under another thread pool.
This commit folds the optimization into the TransportService, shortcutting wire serialization if the target node is local.
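The idea can be sketched roughly as follows. This is a toy illustration in Python, not the actual TransportService code: the class name `TransportSketch`, its methods and the `wire_sends` counter are all hypothetical, and thread pool handling is left out entirely.

```python
class TransportSketch:
    """Toy transport (hypothetical): dispatches locally when the target is the local node."""

    def __init__(self, local_node):
        self.local_node = local_node
        self.handlers = {}    # action name -> handler callable
        self.wire_sends = 0   # number of requests that went over the (fake) wire

    def register_handler(self, action, handler):
        self.handlers[action] = handler

    def send_request(self, node, action, request):
        if node == self.local_node:
            # Shortcut: no serialization, invoke the registered handler directly.
            return self.handlers[action](request)
        return self._send_over_wire(node, action, request)

    def _send_over_wire(self, node, action, request):
        # Stand-in for serialization and network I/O.
        self.wire_sends += 1
        return ("sent", node, action)
```

With the check living in one place, callers no longer need their own "is the target local?" branch.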
Note: this was discovered by #10247, which tries to import a dangling index quickly after the cluster forms. When sending an import dangling request to the master, the code didn't take into account the fact that the local node may be the master. If this happens quickly enough, one would get a NodeNotConnected exception, causing the dangling indices not to be imported. The import will succeed after 10s, when InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.
Closes #10350
We still have a lot of APIs that use setNextReader in order to change the
current segment that should be considered. This commit moves such APIs to
getLeafXXX() instead to be more in-line with Lucene 5's collector API.
I also renamed setDocId to setDocument to be more in-line with the doc values
APIs.
Close #10389
These tests create artificial hash collisions in order to make sure that they
can be resolved correctly. But this also makes the tests very slow if there
are too many collisions because insertions/deletions become linear in such
cases. The tests have been modified to not do too many iterations when
collisions are likely.
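To see why collisions make these tests slow, here is a small sketch of a chained hash table that counts key comparisons on insert. It is an illustration only: `count_comparisons` and its parameters are hypothetical and unrelated to the hash implementations under test.

```python
def count_comparisons(keys, hash_fn, num_buckets=16):
    """Insert keys into a chained hash table and count key comparisons."""
    buckets = [[] for _ in range(num_buckets)]
    comparisons = 0
    for key in keys:
        bucket = buckets[hash_fn(key) % num_buckets]
        found = False
        for existing in bucket:
            # Every entry already in the bucket costs one comparison.
            comparisons += 1
            if existing == key:
                found = True
                break
        if not found:
            bucket.append(key)
    return comparisons


# All keys collide: each insert scans everything inserted so far.
colliding = count_comparisons(range(100), lambda k: 0)
# Keys spread over the buckets: far fewer comparisons.
spread = count_comparisons(range(100), lambda k: k)
```

With 100 keys the colliding case needs 4950 comparisons versus 264 for the spread case, which is why the iteration count is worth lowering when collisions are forced.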
Close #10442
Closes #10435.
Squashed commit of the following:
commit aa1935c790b2731fc2bbc7de6142b09e3fe8bd4a
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Apr 6 13:44:40 2015 -0700
fix index lookup
commit bb6373595ff62ffc56fdf0cba3ac9c0ebe679946
Merge: 916962b eb3a170
Author: Robert Muir <rmuir@apache.org>
Date: Mon Apr 6 14:24:38 2015 -0400
Merge branch 'lucene_r1671277' of github.com:elasticsearch/elasticsearch into lucene_r1671277
commit 916962b82d192a53add471b4cc4a1396bc30eb0e
Merge: 197b3a2 21f72fe
Author: Robert Muir <rmuir@apache.org>
Date: Mon Apr 6 07:09:41 2015 -0400
Merge branch 'master' into lucene_r1671277
commit eb3a1703f7932ddd0cf3e83bec0e86131d255407
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Apr 4 11:06:03 2015 -0700
re-enable index lookup tests
commit 80d65d5eab39062dd8364687da74ddbb87ebcb76
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Apr 4 10:39:52 2015 -0700
update pom to point to new snapshot repo
commit 197b3a21ac2c2d70c9f740fe53e58632a22d1aad
Author: Robert Muir <rmuir@apache.org>
Date: Sat Apr 4 12:51:22 2015 -0400
fix postingsenum usage
commit 0e2b7a00cd07d068f755c51185ac521aa1eb0326
Author: Robert Muir <rmuir@apache.org>
Date: Sat Apr 4 12:21:23 2015 -0400
upgrade to lucene r1671277 (have not yet run tests or looked at postings changes)
The current implementation of AbstractBlobContainer.deleteByPrefix() calls AbstractBlobContainer.deleteBlobsByFilter(), which calls BlobContainer.listBlobs() for deleting files, resulting in loading all files in order to delete a few of them. This can be improved by calling BlobContainer.listBlobsByPrefix() directly.
This problem happened in #10344 when the repository verification process tries to delete a blob prefixed by "tests-" to ensure that the repository is accessible for the node. When doing so we have the following call graph: BlobStoreRepository.endVerification() -> BlobContainer.deleteByPrefix() -> AbstractBlobContainer.deleteByPrefix() -> AbstractBlobContainer.deleteBlobsByFilter() -> BlobContainer.listBlobs()... and boom.
Also, AbstractBlobContainer.listBlobsByPrefix() and BlobContainer.deleteBlobsByFilter() can be removed because they have the same drawbacks as AbstractBlobContainer.deleteByPrefix() and also list all blobs. Listing blobs by prefix can be done at the FsBlobContainer level.
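A minimal sketch of the intended behavior, assuming the backing store can filter by prefix itself. `BlobContainerSketch` and the `names_listed` counter are hypothetical names used only for illustration; they are not the real BlobContainer API.

```python
class BlobContainerSketch:
    """Toy blob container (hypothetical) that tracks how many names get listed."""

    def __init__(self, blobs):
        self.blobs = set(blobs)
        self.names_listed = 0  # total blob names returned by listing calls

    def list_blobs(self):
        names = sorted(self.blobs)
        self.names_listed += len(names)
        return names

    def list_blobs_by_prefix(self, prefix):
        # Assumption: the store filters by prefix, so only matches are loaded.
        names = sorted(n for n in self.blobs if n.startswith(prefix))
        self.names_listed += len(names)
        return names

    def delete_by_prefix(self, prefix):
        # Go through the prefix listing directly instead of list_blobs().
        for name in self.list_blobs_by_prefix(prefix):
            self.blobs.remove(name)
```

Deleting the single "tests-" blob then touches one name rather than every blob in the container.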
Related to #10344
The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes. This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.
The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.
closes #10247
When doc values are explicitly set to the default value, serialization
is skipped. This means the alternate way of specifying doc values,
through `fielddata.format: doc_values`, will take precedence if
present.
This change fixes doc values to always be serialized when an explicit value
was passed, so that it continues to take precedence over
`fielddata.format`.
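The precedence rule can be illustrated with a small sketch. The function names, the dictionary layout and the assumed default of `False` are hypothetical; the real mapper serialization differs.

```python
def serialize_mapping(doc_values=None, fielddata_format=None):
    """doc_values is None when the user did not set it explicitly."""
    out = {}
    if doc_values is not None:
        # Serialize any explicit value, even if it equals the default.
        out["doc_values"] = doc_values
    if fielddata_format is not None:
        out["fielddata"] = {"format": fielddata_format}
    return out


def uses_doc_values(mapping, default=False):
    # An explicit doc_values entry wins over fielddata.format.
    if "doc_values" in mapping:
        return mapping["doc_values"]
    fmt = mapping.get("fielddata", {}).get("format")
    if fmt is not None:
        return fmt == "doc_values"
    return default
```

If the explicit `doc_values: false` were dropped during serialization, the second branch would turn doc values on after a round trip, which is the behavior this change avoids.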
closes #10297
closes #10302
The current version is normally a snapshot while in development.
However, when the release process changes the snapshot flag to false,
this causes the static bwc tests to fail because they cannot
find an index for the current version. Instead, this change
skips the current version, because there is no need to test
a version's bwc against itself.
closes #10292
closes #10293
We had an undocumented parameter called `numeric_resolution` which configures
how to deal with dates when they are provided as a number. The default is to
handle them as milliseconds, but you can also opt in to e.g. seconds.
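As a rough illustration of what the parameter means, the conversion can be sketched like this. The helper name and the two unit names shown are assumptions for the example, not the mapper's actual code.

```python
# Assumed unit names for illustration; only two units are modelled here.
_MILLIS_PER_UNIT = {"milliseconds": 1, "seconds": 1000}


def numeric_date_to_millis(value, numeric_resolution="milliseconds"):
    """Interpret a numeric date in the given resolution, return epoch millis."""
    return value * _MILLIS_PER_UNIT[numeric_resolution]
```

For example, `numeric_date_to_millis(1428350680, "seconds")` yields `1428350680000`, while the same number with the default resolution is left unchanged.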
Close #10072
We recently increased the size of bw indexes and backward compatibility tests
are now taking more time so it makes sense to ask them to do a bit less. This
commit changes the number of replicas we try to copy primaries to from (2 or 3)
to (1 or 2).
Separate repository registration to make sure that failure in registering one repository doesn't cause failures to register other repositories.
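The isolation can be sketched as below, assuming a hypothetical `create` factory that may raise for a broken repository. None of these names come from the actual repositories service.

```python
def register_repositories(configs, create):
    """Register each repository independently; collect failures instead of aborting."""
    registered, failures = {}, {}
    for name, config in configs.items():
        try:
            registered[name] = create(name, config)
        except Exception as exc:
            # One bad repository must not prevent the others from registering.
            failures[name] = exc
    return registered, failures
```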
Closes #10351
1.1.0 is affected by #5817 which prevents merges from keeping up with the
indexing rate. As a consequence it generates lots of segments and makes bw
compat tests slow. So I added a special case for this version to index fewer
documents.
This pull request makes booleans handled like dates and ipv4 addresses: values
are stored as numerics under the hood, and aggregations add some special
formatting logic in order to return true/false in addition to 1/0.
For example, here is an output of a terms aggregation on a boolean field:
```
"aggregations": {
  "top_f": {
    "doc_count_error_upper_bound": 0,
    "buckets": [
      {
        "key": 0,
        "key_as_string": "false",
        "doc_count": 2
      },
      {
        "key": 1,
        "key_as_string": "true",
        "doc_count": 1
      }
    ]
  }
}
```
Sorted numeric doc values are used under the hood.
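The formatting step can be sketched as follows; `format_boolean_buckets` is a hypothetical helper, and the real aggregation code is structured differently.

```python
def format_boolean_buckets(counts):
    """counts maps the numeric key (0 or 1) to its doc_count."""
    return [
        {
            "key": key,
            # Numeric key stays as-is; the string form is added alongside it.
            "key_as_string": "true" if key == 1 else "false",
            "doc_count": counts[key],
        }
        for key in sorted(counts)
    ]
```

Calling it with `{1: 1, 0: 2}` reproduces the two buckets from the output above.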
Close #4678
Close #7851
A shard recovery response might serialize a shard state at the same time that it is
modified by the recovery process. The test
RelocationTests.testMoveShardsWhileRelocation
failed because of this with a ConcurrentModificationException.
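One common way to avoid this kind of race is to take a snapshot of the state under a lock before serializing it. The sketch below shows that general technique only; it is not necessarily how this commit fixes the problem, and `RecoveryStateSketch` is a hypothetical class.

```python
import threading


class RecoveryStateSketch:
    """Toy recovery state (hypothetical) that is safe to serialize while being modified."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}  # file name -> length

    def add_file(self, name, length):
        with self._lock:
            self._files[name] = length

    def serialize(self):
        # Copy under the lock, then iterate over the private copy only.
        with self._lock:
            snapshot = dict(self._files)
        return sorted(snapshot.items())
```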
closes #10381
RoutingTable's activePrimaryShardsGrouped(), allActiveShardsGrouped() and
allAssignedShardsGrouped() methods treated empty index array input
parameters as meaning "all" indices and expanded to the routing maps
keyset. However, the expansion of index names is now already done in
MetaData#concreteIndices(). Returning an empty index name list here
when a wildcard pattern didn't match any index name could lead to
problems like #9081 because the RoutingTable still expanded this
list of names to "_all". In case of e.g. the recovery endpoint this
could lead to problems.
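The difference between the old and the new handling of an empty input can be sketched like this. Both functions and the dictionary-based routing table are hypothetical simplifications of the real methods.

```python
def grouped_old(routing_table, indices):
    # Old behavior: an empty list was expanded to every index in the table.
    names = indices if indices else list(routing_table)
    return {name: routing_table[name] for name in names}


def grouped_new(routing_table, indices):
    # New behavior: names are assumed to be resolved already, so an empty
    # list simply yields no shards.
    return {name: routing_table[name] for name in indices}
```

With the old variant, a wildcard that matched nothing would silently act on all indices.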
Closes #9081
Closes #10148
This fixes an issue where this was logged:
```
[node_t1] [test][0] flush with org.elasticsearch.action.admin.indices.flush.FlushRequest@65f6f1e
```
by adding a .toString() method to FlushRequest.
It also changes:
```
creating Index [test], shards [1]/[2]
```
to:
```
creating Index [test], shards [1]/[2s]
```
if shadow replicas are being used.
Today we reuse the UUID of the source index on restore. This can create conflicts
with existing shard state on disk. This also causes multiple indices with the same
UUID. This commit preserves the UUID of an existing index or creates a new UUID for
a newly created index.
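A minimal sketch of the rule, assuming a hypothetical mapping from index name to the UUID of the index currently in the cluster. The random hex string is only a stand-in for however index UUIDs are really generated.

```python
import uuid


def uuid_for_restore(existing_indices, index_name):
    """Pick the UUID for a restored index without reusing the snapshot's UUID."""
    existing = existing_indices.get(index_name)
    if existing is not None:
        # Restoring into an existing index: keep its UUID.
        return existing
    # Newly created index: generate a fresh UUID (stand-in generator).
    return uuid.uuid4().hex
```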
For quite some time now, our networking layer has made sure to create safe messages, i.e. messages that do not use the shared buffers. This is great, and we should remove the old support for the "unsafe" notion from our codebase.
closes #10360
Prevents an edge case when resolving concrete aliases or index names in cluster MetaData
that could potentially lead to a NullPointerException when the IndicesOptions don't allow
wildcard expansion and the method is called with a null or empty aliasesOrIndices argument.
This change adds a check for that and introduces a randomized test that catches this.
Closes #10342
Closes #10339
For optimization purposes, a function score query with an empty function
will just result in the original sub query. However, sometimes one might
want to use a function_score query to actually filter out docs, for example within
bool clauses, by using the min_score functionality.
Therefore the sub query should only be used without wrapping inside
a function_score query if min_score was also not set.
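The resulting rule can be sketched as follows. `build_function_score` and the dictionary representation of the query are hypothetical; the real parser builds Lucene queries rather than dictionaries.

```python
def build_function_score(sub_query, functions, min_score=None):
    """Return the bare sub query only when wrapping would have no effect."""
    if not functions and min_score is None:
        # No functions and no min_score: the sub query alone is equivalent.
        return sub_query
    wrapped = {"query": sub_query, "functions": list(functions)}
    if min_score is not None:
        # min_score filters documents, so the wrapper must be kept.
        wrapped["min_score"] = min_score
    return {"function_score": wrapped}
```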
closes #10253
closes #10326
When deleting a shard, the node that deletes the shard first checks whether all shard copies are
started on other nodes. A message is sent to each node, and each node checks locally for
STARTED or RELOCATED.
However, it might happen that the shard is still in state POST_RECOVERY, like this:
shard is relocating from node1 to node2
1. relocated shard on node2 goes in POST_RECOVERY and node2 sends shard started to master
2. master updates routing table and sends new cluster state to node1 and node2
3. node1 processes the cluster state and asks node2 if it has the active shard
before node2 processes the new cluster state (which would cause it to set the shard to started)
4. node2 replies that it does not have the shard started, and so node1 does not delete it
This can be avoided by waiting until the cluster state that sets the shard to started is actually processed.
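The waiting behavior on node2 can be sketched with callbacks, under the assumption that the answer is simply deferred until the next cluster state is applied. `Node2Sketch` is hypothetical, single-threaded, and omits any timeout handling.

```python
class Node2Sketch:
    """Toy model (hypothetical) of the node answering 'is the shard active?'."""

    def __init__(self):
        self.shard_state = "POST_RECOVERY"
        self._pending = []  # callbacks waiting for a newer cluster state

    def is_shard_active(self, respond):
        if self.shard_state in ("STARTED", "RELOCATED"):
            respond(True)
        else:
            # Defer the answer instead of replying "not started" right away.
            self._pending.append(respond)

    def apply_cluster_state(self, shard_started):
        if shard_started:
            self.shard_state = "STARTED"
        # Re-evaluate every deferred request against the new state.
        pending, self._pending = self._pending, []
        for respond in pending:
            self.is_shard_active(respond)
```

In the scenario above, node1's request in step 3 would stay pending until node2 has processed the cluster state from step 2, and only then receive a positive answer.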
closes #10018
Today there is a chance that the state version for shard, index or cluster
state goes backwards or is reset on a full restart etc. depending on
several factors not related to the state. To prevent any collisions
with already existing state files and to maintain write-once properties
this change introduces an incremental state ID instead of using the plain
state version. This also fixes a bug when the previous legacy state had a
greater version than the current state which causes an exception on node
startup or if left-over files are present.
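A sketch of how an incremental ID can be derived from the files already on disk, independent of the state version. The `state-<id>.st` naming pattern and the helper name are assumptions made for this example.

```python
import re

# Assumed file naming pattern for illustration.
_STATE_FILE = re.compile(r"^state-(\d+)\.st$")


def next_state_id(existing_files):
    """Return an ID one greater than any state file already present."""
    ids = []
    for name in existing_files:
        match = _STATE_FILE.match(name)
        if match:
            ids.append(int(match.group(1)))
    # Start at 0 when no state file exists yet.
    return max(ids, default=-1) + 1
```

Because the ID only ever grows, a new write never collides with a left-over file, even when the state version itself was reset.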
Closes #10316
Now that fine-grained script settings are supported (#10116) we can remove support for the script.disable_dynamic setting.
The same result as `script.disable_dynamic: false` can be obtained as follows:
```
script.inline: on
script.indexed: on
```
An exception is thrown at startup when the old setting is set, so we make sure we tell users they have to change it rather than ignoring the setting.
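The startup check amounts to something like the sketch below. The function name, the exception type and the message text are hypothetical, not the actual implementation.

```python
def validate_script_settings(settings):
    """Fail fast when the removed setting is still present."""
    if "script.disable_dynamic" in settings:
        # Refuse to start rather than silently ignoring the old setting.
        raise ValueError(
            "script.disable_dynamic is no longer supported; "
            "use fine-grained settings such as script.inline and script.indexed"
        )
    return settings
```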
Closes #10286