Relates to #10154 and #10150
Adds link to additional information on how document frequencies are treated across shards to the cutoff_frequency parameter documentation.
Closes#10451
These tests create artificial hash collisions in order to make sure that they
can be resolved correctly. But this also makes the tests very slow if there
are too many collisions because insertions/deletions become linear in such
cases. The tests have been modified to not do too many iterations when
collisions are likely.
Close#10442
Closes#10435.
Squashed commit of the following:
commit aa1935c790b2731fc2bbc7de6142b09e3fe8bd4a
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Apr 6 13:44:40 2015 -0700
fix index lookup
commit bb6373595ff62ffc56fdf0cba3ac9c0ebe679946
Merge: 916962b eb3a170
Author: Robert Muir <rmuir@apache.org>
Date: Mon Apr 6 14:24:38 2015 -0400
Merge branch 'lucene_r1671277' of github.com:elasticsearch/elasticsearch into lucene_r1671277
commit 916962b82d192a53add471b4cc4a1396bc30eb0e
Merge: 197b3a2 21f72fe
Author: Robert Muir <rmuir@apache.org>
Date: Mon Apr 6 07:09:41 2015 -0400
Merge branch 'master' into lucene_r1671277
commit eb3a1703f7932ddd0cf3e83bec0e86131d255407
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Apr 4 11:06:03 2015 -0700
re-enable index lookup tests
commit 80d65d5eab39062dd8364687da74ddbb87ebcb76
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Apr 4 10:39:52 2015 -0700
update pom to point to new snapshot repo
commit 197b3a21ac2c2d70c9f740fe53e58632a22d1aad
Author: Robert Muir <rmuir@apache.org>
Date: Sat Apr 4 12:51:22 2015 -0400
fix postingsenum usage
commit 0e2b7a00cd07d068f755c51185ac521aa1eb0326
Author: Robert Muir <rmuir@apache.org>
Date: Sat Apr 4 12:21:23 2015 -0400
upgrade to lucene r1671277 (have not yet run tests or looked at postings changes)
The current implementation of AbstractBlobContainer.deleteByPrefix() calls AbstractBlobContainer.deleteBlobsByFilter() which calls BlobContainer.listBlobs() for deleting files, resulting in loading all files in order to delete few of them. This can be improved by calling BlobContainer.listBlobsByPrefix() directly.
This problem happened in #10344 when the repository verification process tries to delete a blob prefixed by "tests-" to ensure that the repository is accessible for the node. When doing so we have the following calling graph: BlobStoreRepository.endVerification() -> BlobContainer.deleteByPrefix() -> AbstractBlobContainer.deleteByPrefix() -> AbstractBlobContainer.deleteBlobsByFilter() -> BlobContainer.listBlobs()... and boom.
Also, AbstractBlobContainer.listBlobsByPrefix() and BlobContainer.deleteBlobsByFilter() can be removed because it has the same drawbacks as AbstractBlobContainer.deleteByPrefix() and also lists all blobs. Listing blobs by prefix can be done at the FsBlobContainer level.
Related to #10344
I do not know normally whether this section is the right place or not, please let me know.
Here is the pes plugin's GitHub site :https://github.com/kodcu/pesCloses#10398
The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes. This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.
The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.
closes#10247
When doc values are explicitly set to the default value serialization
is skipped. This means the alternate way of specifying doc values,
through `fielddata.format: doc_values`, will take precedense if
present.
This change fixes doc values to always be serialized when an explicit value
was passed, so that it continues to take precedence over
`fielddata.format`.
closes#10297closes#10302
The current version is normally a snapshot while in development.
However, when the release process changes the snapshot flag to false,
this causes the static bwc tests to fail because they cannot
find an index for the current version. Instead, this change
skips the current version, because there is no need to test
a verion's bwc against itself.
closes#10292closes#10293
We had an undocumented parameter called `numeric_resolution` which allows to
configure how to deal with dates when provided as a number. The default is to
handle them as milliseconds, but you can also opt-on for eg. seconds.
Close#10072
We recently increased the size of bw indexes and backward compatibility tests
are now taking more time so it makes sense to ask them to do a bit less. This
commit changes the number of replicas we try to copy primaries to from (2 or 3)
to (1 or 2).
Separate repository registration to make sure that failure in registering one repository doesn't cause failures to register other repositories.
Closes#10351
1.1.0 is affected by #5817 which prevents merges from keeping up with the
indexing rate. As a consequence it generates lots of segments and makes bw
compat tests slow. So I added a special case for this version to index fewer
documents.
Many scripts are used to start/stop and install/uninstall elasticsearch. These scripts share a lot of configuration properties like directory paths, max value for a setting, default user etc. Most of the values are identical but some of them are different depending of the platform (Debian-based or Redhat-based OS), depending of the way elasticsearch is started (shell script, systemd, sysv-init...) or the way it is installed (zip, rpm, deb...). Today the values are duplicated in multiple places, making it difficult to maintain the scripts or to update a value.
This pull request make this more uniform: values used in scripts must be defined in a common packaging.properties file. Each value can be overridden in another specific packaging.properties file for Debian or Redhat. All startup and installation scripts are filtered with the common then the custom packaging.properties files before being packaged as a zip/tar.gz/rpm/dpkf archive.
Follow-up of #9135. We initially decreased the stack size because it would end
up a lot of memory when there are many threads. But we also have some thread
pools that may be oversized, in particular the search thread pool.
This commit proposes to decrease the default search thread pool size from
`3 * num_procs` to `3 * num_procs / 2 + 1`. This is large enough to be sure
that we can use all the machine resources even with a search-only work load
but not too large in order to not consume too much memory because of the stack
size and thread locals.
This pull request makes boolean handled like dates and ipv4 addresses: things
are stored as as numerics under the hood and aggregations add some special
formatting logic in order to return true/false in addition to 1/0.
For example, here is an output of a terms aggregation on a boolean field:
```
"aggregations": {
"top_f": {
"doc_count_error_upper_bound": 0,
"buckets": [
{
"key": 0,
"key_as_string": "false",
"doc_count": 2
},
{
"key": 1,
"key_as_string": "true",
"doc_count": 1
}
]
}
}
```
Sorted numeric doc values are used under the hood.
Close#4678Close#7851
A shard recovery response might serialize a shard state at the same time that it is
modified by the recovery process. The test
RelocationTests.testMoveShardsWhileRelocation
failed because of this with a ConcurrentModificationException.
closes#10381
RoutingTables activePrimaryShardsGrouped(), allActiveShardsGrouped() and
allAssignedShardsGrouped() methods treated empty index array input
parameters as meaning "all" indices and expanded to the routing maps
keyset. However, the expansion of index names is now already done in
MetaData#concreteIndices(). Returning an empty index name list here
when a wildcard pattern didn't match any index name could lead to
problems like #9081 because the RoutingTable still expanded this
list of names to "_all". In case of e.g. the recovery endpoint this
could lead to problems.
Closes#9081Closes#10148