Fixed behaviour where two representations of the default index analyzer weren't being treated as equivalent. Added REST test to confirm fix.
Closes#2716
In addition to `_source`, the following variables are available through
the `ctx` map: `_index`, `_type`, `_id`, `_version`, `_routing`,
`_parent`, `_timestamp`, `_ttl`.
Some of these fields are more useful still within the context of an
Update By Query, see #1607, #2230, #2231.
While this commit is primariy a fix for issue/8433 it adds more rigor to ShapeBuilder for parsing against the GeoJSON specification. Specifically, this adds LinearRing and LineString validity checks as defined in http://geojson.org/geojson-spec.html to ensure valid polygons are specified. The benefit of this fix is to provide a gate check at parse time to avoid any further processing if an invalid GeoJSON is provided. More parse checks like these will be necessary going forward to ensure full compliance with the GeoJSON specification.
Closes#8433
Based on some test failures, this commit fixes two minor things
* Bind ports only on so called ephemeral ports to prevent try to
bind to ports where elasticsearch already runs on
* Remove @Network annotation as it was used in a wrong scope
Previous to this change all features (_alias,_mapping,_settings,_warmer) are run regardless of which features are actually requested. This change fixes the request object to resolve this bug
The compression bug fixed in #7210 can still strike us since we are
running BWC test against these version. This commit disables compression
forcefully if the compatibility version is < 1.3.2 to prevent debugging
already known issues.
The query_string query has an option for analyzing wildcard/prefix (#787) by a best effort approach.
This adds `analyze_wildcard` option also to simple_query_string.
The default is set to `false` so the existing behavior of simple_query_string is unchanged.
When data nodes receive mapping updates from the master, the parse it and merge it into their own in memory representation (if there). If this results in different bytes then the master sent, the nodes will send a refresh-mapping command to indicate to the master that it's byte level storage of the mapping should be refreshed via the document mappers. This comes handy when the mapping format has changed, in a backwards compatible manner, and we want to make sure we can still rely on the bytes to identify changes. An example of such a change can be seen at #4760.
This commit extends the logic to include the `_default_` type, which was never refreshed before. In some unlucky scenarios, this caused the _default_ mapping to be parsed with every cluster state update.
Closes#8413
This prevents too-difficult regular expressions from consuming
excessive RAM/CPU; the default max_determinized_states is 10,000 (same
as Lucene) but query_string and regepx query/filter can override
per-request.
The also upgrades to a new Lucene 5.0.0 snapshot.
Closes#8386Closes#8357
DocIdSets.isFast(DocIdSet) has two issues:
- it works on the DocIdSet interface while some doc sets can generate either
slow or fast sets depending on their options (eg. whether an OrDocIdSet is
fast or not depends on the wrapped clauses).
- it only works because the result of this method is only taken into account
when a DocIdSet has non-null `bits()`.
This commit changes this method to work on top of a DocIdSetIterator and to use
a black-list rather than a white list: slow iterators should really be the
exception rather than the rule.
Close#8380
The rename(String, String) method doesn't allow this implementation to use a simple
concurrent map. There is a race during a rename operation where files are not fully
renamed but already visible via #listAll(). This inconsistency can lead to problems
when opening commit points since the pending_segments_N as well as segments_N are visible
but not yet atomically renamed.
Yet, non of the methods that are synced are long running such that adding sychronization
doesn't introduce bottlenecks here. The Direcotry#sync(...) method is not synchronized since
it doesn't change any mapping nor does it depend on the mapping.
Previously we didn't calculate this checksums even though we have a checksum
to compare. Since we now also verify checksums for legacy files #checkIntegrity
should also calculate the legacy checksums.
Closes#8407
When a lucene 4.8+ file is transferred, Store returns a VerifyingIndexOutput
that verifies both the CRC32 integrity and the length of the file.
However, for older files, problems can make it to the lucene level. This is not great
since older lucene files aren't especially strong as far as detecting issues here.
For example, if a network transfer is closed on the remote side, we might write a
truncated file... which old lucene formats may or may not detect.
The idea here is to verify old files with their legacy Adler32 checksum, plus expected
length. If they don't have an Adler32 (segments_N, jurassic elasticsearch?, its optional
as far as the protocol goes), then at least check the length.
We could improve it for segments_N, its had an embedded CRC32 forever in lucene, but this
gets trickier. Long term, we should also try to also improve tests around here, especially
backwards compat testing, we should test that detected corruptions are handled properly.
Closes#8399
Conflicts:
src/main/java/org/elasticsearch/index/store/Store.java
src/test/java/org/elasticsearch/index/store/StoreTest.java
Fixes an issue where only absolute bytes were taken into account when
kicking off an automatic reroute due to disk usage. Also randomized the
tests to use either an absolute value or a percentage so this is tested.
Also adds logging for each node over the high and low watermark every
time a new cluster info usage is gathered (defaults to every 30
seconds).
Related to #8368Fixes#8367
Today we use busy waiting and sampling when we execute HealthReqeusts
on the master. This is tricky sicne we might sample a not yet fully applied
cluster state and make a decsions base on the partial cluster state. This can
lead to ugly problems since requests might be routed to nodes where shards are
already marked as relocated but on the actual cluster state they are still started.
Yet, this window is very small usually it can lead to ugly test failures.
This commit moves the health request over to a listener pattern that gets the actual
applied cluster state.
Closes#8350
When a node stops, we cancel any ongoing join process. With #8327, we improved this logic and wait for it to complete before shutting down the node. However, the joining thread is part of a thread pool and will not stop until the thread pool is shutdown.
Another issue raised by the unneeded wait is that when we shutdown, we may ping ourselves - which results in an ugly warn level log. We now log all remote exception during pings at a debug level.
Closes#8359
Previously the bulk of our shard recovery code was in a 300-line
anonymous class in `RecoverySource`. This made it difficult to find and
more difficult to read.
This factors out that code into a `ShardRecoveryHandler` class, adding
javadocs for each function and phase of the recovery, as well as
comments explaining some of the more esoteric functions performed during
recovery.
It's hoped that this will help more people understand Elasticsearch's
recovery procedure.
No *major* functionality has changed, only typo corrections, some minor
allocation improvements and logging clarification changes.
If the translog file is corrupted or truncated the stream is never closed
and the filehandle leaks. This commit closes the stream in the case of an
exception.
Today we use the File API for file deletion as well as recursive
directory deletions. This API returns a boolean if operations
are successful while hiding the actual reason why they failed.
The Path API throws and actual exception that might provide better
insights and debug information.
Closes#8366
When a node stops, we cancel any ongoing join process. With #8327, we improved this logic and wait for it to complete before shutting down the node. In our tests we typically shutdown an entire cluster at once, which makes it very likely for nodes to be joining while shutting down. This introduces a race condition where the joinThread.interrupt can happen before the thread starts waiting on pings which causes shutdown logic to be slow. This commits improves by repeatedly trying to stop the thread in smaller waits.
Another side effect of the change is that we are now more likely to ping ourselves while shutting down, we results in an ugly warn level log. We now log all remote exception during pings at a debug level.
Closes#8359
The tests were failing because there was a shard which didn't get any documents and the tests assumed all shards had documents. This commit fixes this assumption
This has a lot of improvements in lucene, particularly around memory usage, merging, safety, compressed bitsets, etc.
On the elasticsearch side, summary of the larger changes:
API changes: postings API became a "pull" rather than "push", collector API became per-segment, etc.
packaging changes: add lucene-backwards-codecs.jar as a dependency.
improvements to boolean filtering: especially ensuring it will not be slow for SparseBitSet.
use generic BitSet api in plumbing so that concrete bitset type is an implementation detail.
use generic BitDocIdSetFilter api for dedicated bitset cache, so there is type safety.
changes to support atomic commits
implement Accountable.getChildResources (detailed memory usage API) for fielddata, etc
change handling of IndexFormatTooOld/New, since they no longer extends CorruptIndexException
Closes#8347.
Squashed commit of the following:
commit d90d53f5f21b876efc1e09cbd6d63c538a16cd89
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Nov 5 21:35:28 2014 +0100
Make default codec/postings/docvalues format constants
commit cb66c22c71cd304a36e7371b199a8c279908ae37
Merge: d4e2f6d ad4ff43
Author: Robert Muir <rmuir@apache.org>
Date: Wed Nov 5 11:41:13 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit d4e2f6dfe767a5128c9b9ae9e75036378de08f47
Merge: 4e5445c 4111d93
Author: Robert Muir <rmuir@apache.org>
Date: Wed Nov 5 06:26:32 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 4e5445c775f580730eb01360244e9330c0dc3958
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 16:19:19 2014 -0500
FixedBitSet -> BitSet
commit 9887ea73e8b857eeda7f851ef3722ef580c92acf
Merge: 1bf8894 fc84666
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 15:26:25 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 1bf8894430de3e566d0dc5623b0cc28b0d674ebb
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 15:22:51 2014 -0500
remove nocommit
commit a9c2a2259ff79c69bae7806b64e92d5f472c18c8
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:48:43 2014 -0500
turn jenkins red again
commit 067baaaa4d52fce772c81654dcdb5051ea79139f
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:18:21 2014 -0500
unzip from stream
commit 82b6fba33d362aca2313cc0ca495f28f5ebb9260
Merge: b2214bb 6523cd9
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:10:59 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit b2214bb093ec2f759003c488c3c403c8931db914
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:09:53 2014 -0500
go back to my URL until we can figure out what is up with jenkins
commit e7d614172240175a51f580aeaefb6460d21cede9
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 10:52:54 2014 -0500
try this jenkins
commit 337a3c7704efa7c9809bf373152d711ee55f876c
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 16:17:49 2014 +0100
Rename temp-files under lock to prevent metadata reads while renaming
commit 77d5ba80d0a76efa549dd753b9f114b2f2d2d29c
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 10:07:11 2014 -0500
continue to treat too-old/too-new as corruption for now
commit 98d0fd2f4851bc50e505a94ca592a694d502c51c
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 09:24:21 2014 -0500
fix last nocommit
commit 643fceed66c8caf22b97fc489d67b4a2a90a1a1c
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 14:46:17 2014 +0100
remove NoSuchDirectoryException
commit 2e43c4feba05cfaf451df70f946c0930cbcc4557
Merge: 93826e4 8163107
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 14:38:00 2014 +0100
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 93826e4d56a6a97c2074669014af77ff519bde63
Merge: 7f10129 44e24d3
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 12:54:27 2014 +0100
Merge branch 'master' into enhancement/lucene_5_0_upgrade
Conflicts:
src/main/java/org/elasticsearch/index/store/DistributorDirectory.java
src/main/java/org/elasticsearch/index/store/Store.java
src/main/java/org/elasticsearch/indices/recovery/RecoveryStatus.java
src/test/java/org/elasticsearch/index/store/DistributorDirectoryTest.java
src/test/java/org/elasticsearch/index/store/StoreTest.java
src/test/java/org/elasticsearch/indices/recovery/RecoveryStatusTests.java
commit 7f10129364623620575c109df725cf54488b3abb
Author: Adrien Grand <jpountz@gmail.com>
Date: Tue Nov 4 11:32:24 2014 +0100
Fix TopHitsAggregator to not ignore the top-level/leaf collector split.
commit 042fadc8603b997bdfdc45ca44fec70dc86774a6
Author: Adrien Grand <jpountz@gmail.com>
Date: Tue Nov 4 11:31:20 2014 +0100
Remove MatchDocIdSet in favor of DocValuesDocIdSet.
commit 7d877581ff5db585a674c95ac391ac78a0282826
Author: Adrien Grand <jpountz@gmail.com>
Date: Tue Nov 4 11:10:08 2014 +0100
Make the and filter use the cost API.
Lucene 5 ensured that cost() can safely be used, and this will have the benefit
that the order in which filters are specified is not important anymore (only
for slow random-access filters in practice).
commit 78f1718aa2cd82184db7c3a8393e6215f43eb4a8
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 23:55:17 2014 -0500
fix previous eclipse import braindamage
commit 186c40e9258ce32f22a9a714ab442a310b6376e0
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 22:32:34 2014 -0500
allow child queries to exhaust iterators again
commit b0b1271305e1b6d0c4c4da51a3c54df1aa5c0605
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 14:50:44 2014 -0800
Fix nocommit for mapping output. index_options will not be printed if
the field is not indexed.
commit ba223eb85e399c9620a347a983e29bf703953e7a
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 14:07:26 2014 -0800
Remove no commit for chinese analyzer provider. We should have a
separate issue to address not using this provider on new indexes.
commit ca554b03c4471797682b2fb724f25205cf040c4a
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 13:41:59 2014 -0800
Fix stop tests
commit de67c4653ec47dee9c671390536110749d2bb05f
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 12:51:17 2014 -0800
Remove analysis nocommits, switching over to Lucene43*Filters for
backcompat
commit 50cae9bec72c25c33a1ab8a8931bccb3355171e2
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 15:32:25 2014 -0500
add ram accounting and TODO lazy-loading (its no worse than master, can be a followup improvement) for suggesters
commit 7a7f0122f138684b312d0f0b03dc2a9c16c15f9c
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 15:11:26 2014 -0500
bump lucene version
commit cd0cae5c35e7a9e049f49ae45431f658fb86676b
Merge: 446bc09 3c72073
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 14:49:05 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 446bc09b4e8bf4602d3c252b53ddaa0da65cce2f
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 14:46:30 2014 -0500
remove hack
commit a19d85a968d82e6d00292b49630ef6ff2dbf2f32
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 12:53:11 2014 -0500
dont create exceptions with circular references on corruption (will open a PR for this)
commit 0beefb9e821d97c37e90ec556d81ac7b00369b8a
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 11:47:14 2014 -0500
temporarily add craptastic detector for this horrible bug
commit e9f2d298bff75f3d1591f8622441e459c3ce7ac3
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 10:56:01 2014 -0500
add nocommit
commit e97f1d50a91a7129650b8effc7a9ecf74ca0569a
Merge: c57a3c8 f1f50ac
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 10:12:12 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit c57a3c8341ed61dca62eaf77fad6b8b48aeb6940
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 10:11:46 2014 -0500
fix nocommit
commit dd0e77e4ec07c7011ab5f6b60b2ead33dc2333d2
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 09:54:09 2014 -0500
nocommit -> TODO, this is in much more places in the codebase, bigger issue
commit 3cc3bf56d72d642059f8fe220d6f2fed608363e9
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Nov 1 23:59:17 2014 -0700
Remove nocommit and awaitsfix for edge ngram filter test.
commit 89f115245155511c0fbc0d5ee62e63141c3700c1
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Nov 1 23:57:44 2014 -0700
Fix EdgeNGramTokenFilter logic for version <= 4.3, and fixed instanceof
checks in corresponding tests to correctly check for reverse filter when
applicable.
commit 112df869cd199e36aab0e1a7a288bb1fdb2ebf1c
Author: Robert Muir <rmuir@apache.org>
Date: Sun Nov 2 00:08:30 2014 -0400
execute geo disjoint query/filter as intersects
commit e5061273cc685f1252e9a3a9ae4877ec9bce7752
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:58:59 2014 -0400
remove chinese analyzer from docs
commit ea1af11b8978fcc551f198e24fe21d52806993ef
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:29:00 2014 -0400
fix ram accounting bug
commit 53c0a42c6aa81aa6bf81d3aa77b95efd513e0f81
Merge: e3bcd3c 6011a18
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:16:29 2014 -0400
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit e3bcd3cc07a4957e12c7b3affc462c31290a9186
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:15:01 2014 -0400
fix url-email back compat (thanks ryan)
commit 91d6b096a96c357755abee167098607223be1aad
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:11:26 2014 -0400
bump lucene version
commit d2bb9568df72b37ec7050d25940160b8517394bc
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 20:33:07 2014 -0400
remove nocommit
commit 1d049c471e19e5c457262c7399c5bad9e023b2e3
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 20:28:58 2014 -0400
fix eclipse to group org/com imports together: without this, its madness
commit 09d8c1585ee99b6e63be032732c04ef6fed84ed2
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 14:27:41 2014 -0400
remove nocommit, if you dont liek it, print assembly and tell me how it can be better
commit 8a6a294313fdf33b50c7126ec20c07867ecd637c
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 20:01:55 2014 +0100
Remove deprecated usage of DocIdSets.newDocIDSet.
commit 601bee60543610558403298124a84b1b3bbd1045
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 14:13:18 2014 -0400
maybe one of these zillions of annotations will stop thread leaks
commit 9d3f69abc7267c5e455aefa26db95cb554b02d62
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 14:05:39 2014 -0400
fix some analysis nocommits
commit 312e3a29c77214b8142d21c33a6b2c2b151acf9a
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 18:28:45 2014 +0100
Remove XConstantScoreQuery/XFilteredQuery/ApplyAcceptedDocsFilter.
commit 5a0cb9f8e167215df7f1b1fad11eec6e6c74940f
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 17:06:45 2014 +0100
Fix misleading documentation of DocIdSets.toCacheable.
commit 8b4ef2b5b476fff4c79c0c2a0e4769ead26cf82b
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 17:05:59 2014 +0100
Fix CustomRandomAccessFilterStrategy to override the right method.
commit d7a9a407a615987cfffc651f724fbd8795c9c671
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 16:21:35 2014 +0100
Better handle the special case when there is a single SHOULD clause.
commit 648ad389f07e92dfc451f345549c9841ba5e4c9a
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 15:53:38 2014 +0100
Cut over XBooleanFilter to BitDocIdSet.Builder.
The idea is similar to what happened to Lucene's BooleanFilter.
Yet XBooleanFilter is a bit more sophisticated and I had to slightly
change the way it is implemented in order to make it work. The main difference
with before is that slow filters are now applied lazily, so eg. if you have 3
MUST clauses, two with a fast iterator and the third with a slow iterator, the
previous implementation used to apply the fast iterators first and then only
check the slow filter for bits which were set in the bit set. Now we are
computing a bit set based on the fast must clauses and then basically returning
a BitsFilteredDocIdSet.wrap(bitset, slowClause).
Other than that, BooleanFilter still uses the bitset optimizations when or-ing
and and-ind filters.
Another improvement is that BooleanFilter is now aware of the cost API.
commit b2dad312b4bc9f931dc3a25415dd81c0d9deee08
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 10:18:53 2014 -0400
clear nocommit
commit 4851d2091e744294336dfade33906c75fbe695cd
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 15:15:16 2014 +0100
cut over to RoaringDocIdSet
commit ca6aec24a901073e65ce4dd6b70964fd3612409e
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:57:30 2014 +0100
make nocommit more explicit
commit d0742ee2cb7a6c48b0bbb31580b7fbcebdb6ec40
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 09:55:24 2014 -0400
fix standardtokenizer nocommit
commit 7d6faccafff22a86af62af0384838391d46695ca
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:54:08 2014 +0100
fix compilation
commit a038a405c1ff6458ad294e6b5bc469e622f699d0
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:53:43 2014 +0100
fix compilation
commit 30c9e307b1f5d80e2deca3392c0298682241207f
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:52:35 2014 +0100
fix compilation
commit e5139bc5a0a9abd2bdc6ba0dfbcb7e3c2e7b8481
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 09:52:16 2014 -0400
clear nocommit here
commit 85dd2cedf7a7994bed871ac421cfda06aaf5c0a5
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:46:17 2014 +0100
fix CompletionPostingsFormatTest
commit c0f3781f616c9b0ee3b5c4d0998810f595868649
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 09:38:00 2014 -0400
add tests for these analyzers
commit 51f9999b4ad079c283ae762c862fd0e22d00445f
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:10:26 2014 +0100
remove nocommit - this is not an issue
commit fd1388fa03e622b0738601c8aeb2dbf7949a6dd2
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Fri Oct 31 14:07:01 2014 +0100
Remove redundant null check
commit 3d6dd51b0927337ba941a235446b22e8cd500dc3
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Fri Oct 31 14:01:37 2014 +0100
Removed the work around to prevent p/c error when invoking #iterator() twice, because the custom query filter wrapper now doesn't transform the result to a cache doc id set any more.
I think the transforming to a cachable doc id set in CustomQueryWrappingFilter isn't needed at all, because we use the DocIdSet only once and because of that is just slowed things down.
commit 821832a537e00cd1216064b379df3e01d2911d3a
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:54:33 2014 +0100
one more nocommit
commit 77eb9ea4c4ea50afb2680c29682ddcb3851a9d4f
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Fri Oct 31 13:52:29 2014 +0100
Remove cast
commit a400573c034ed602221f801b20a58a9186a06eae
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:49:24 2014 +0100
fix stop filter
commit 51746087cf8ec34c4d20aa05ba8dbff7b3b43eec
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:21:36 2014 +0100
fix changed semantics of FBS.nextSetBit to check for NO_MORE_DOCS
commit 8d0a4e2511310f1293860823fe3ba80ac771bbe3
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 08:13:44 2014 -0400
do the bogus cast differently
commit 46a5cc5732dea096c0c80ae5ce42911c9c51e44e
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:00:16 2014 +0100
I hate it but P/C now passes
commit 580c0c2f82bbeacf217e594f22312b11d1bdb839
Merge: a9d3c00 1645434
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 06:54:31 2014 -0400
fix nocommit/classcast
commit a9d3c004d62fe04989f49a897e6ff84973c06eb9
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 08:49:31 2014 +0100
Update TODO.
commit aa75af0b407792aeef32017f03a6f442ed970baa
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 19:18:25 2014 -0400
clear obselete nocommits from lucene bump
commit d438534cf41fcbe2d88070e2f27c994625e082c2
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 18:53:20 2014 -0400
throw classcastexception when ES abuses regular filtercache for nested docs
commit 2c751f3a8feda43ec127c34769b069de21f3d16f
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 18:31:34 2014 -0400
bump lucene revision, fix tests
commit d6ef7f6304ae262bf6228a7d661b2a452df332be
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 22:37:58 2014 +0100
fix merge problems
commit de9d361f88a9ce6bb3fba85285de41f223c95767
Merge: 41f6aab f6b37a3
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 22:28:59 2014 +0100
Merge branch 'master' into enhancement/lucene_5_0_upgrade
Conflicts:
pom.xml
src/main/java/org/elasticsearch/Version.java
src/main/java/org/elasticsearch/gateway/local/state/meta/MetaDataStateFormat.java
commit 41f6aab388aa80c40b08a2facab2617576203a0d
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 17:48:46 2014 +0100
fix potiential NPE
commit c4428b12e1ae838b91e847df8b4a8be7f49e10f4
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 17:38:46 2014 +0100
don't advance iterator in a match(doc) method
commit 28ab948e99e3ea4497c9b1e468384806ba7e1790
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 17:34:58 2014 +0100
don't advance iterator in a match(doc) method
commit eb0f33f6634fadfcf4b2bf7327400e568f0427bb
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 16:55:54 2014 +0100
fix GeoUtilsTest
commit 7f711fe3eaf73b6c2268cf42d5a41132a61ad831
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 16:43:16 2014 +0100
Use a dedicated default index option if field type is not indexed by default
commit 78e3f37ab779e3e1b25b45a742cc86ab5f975149
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 10:56:14 2014 -0400
disable this test with AwaitsFix to reduce noise
commit 9a590f563c8e03a99ecf0505c92d12d7ab20d11d
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 09:38:49 2014 +0100
fix lucene version
commit abe3ca1d8bb6b5101b545198f59aec44bacfa741
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 09:35:05 2014 +0100
fix AnalyzingCompletionLookupProvider to wrok with new codec API
commit 464293b245852d60bde050c6d3feb5907dcfbf5f
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 00:26:00 2014 -0400
don't try to write stuff to tests class directory
commit 031cc6c19f4fe4423a034b515f77e5a0e282a124
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 00:12:36 2014 -0400
AwaitsFix these known issues to reduce noise
commit 4600d51891e35847f2d344247d6f915a0605c0d1
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 00:06:53 2014 -0400
openbitset lives on
commit 8492bae056249e2555d24acd55f1046b66a667c4
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 23:42:54 2014 -0400
fixes for filter tests
commit 31f24ce4efeda31f97eafdb122346c7047a53bf2
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 23:12:38 2014 -0400
don't use fieldcache
commit 8480789942fdff14a6d2b2cd8134502fe62f20c8
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 23:04:29 2014 -0400
ancient index no longer supported
commit 02e78dc7ebdd827533009f542582e8db44309c57
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 23:37:02 2014 +0100
fix more tests
commit ff746c6df23c50b3f3ec24922413b962c8983080
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 23:08:19 2014 +0100
fix all mapper
commit e4fb84b517107b25cb064c66f83c9aa814a311b2
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 22:55:54 2014 +0100
fix distributor tests and cut over to FileStore API
commit 20c850e2cfe3210cd1fb9e232afed8d4ac045857
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 22:42:18 2014 +0100
use DOCS_ONLY if index=true and current options == null
commit 44169c108418413cfe51f5ce23ab82047463e4c2
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 22:33:36 2014 +0100
Fix index=yes|no settings in mappers
commit a3c5f77987461a18121156ed345d42ded301c566
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 21:51:41 2014 +0100
fix several field mappers conversion from setIndexed to indexOptions
commit df84d736908e88a031d710f98e222be68ae96af1
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 21:33:35 2014 +0100
fix SourceFieldMapper to be not indexed
commit b2bf01d12a8271a31fb2df601162d0e89924c8f5
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 21:23:08 2014 +0100
Cut over to .liv files in store and corruption tests
commit 619004df436f9ef05d24bef1b6a7f084c6b0ad75
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 17:05:52 2014 +0100
fix more tests
commit b7ed653a8b464de446e00456bce0a89e47627c38
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 16:19:08 2014 +0100
[STORE] Add dedicated method to write temporary files
Recovery writes temporary files which might not end up in the
right distributor directories today. This commit adds a dedicated
API that allows specifying the target file name in order to create the
tempoary file in the correct directory.
commit 7d574659f6ae04adc2b857146ad0d8d56ca66f12
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 10:28:49 2014 -0400
add some leniency to temporary bogus method
commit f97022ea7c2259f7a5cf97d924c59ed75ab65b32
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 10:24:17 2014 -0400
fix MultiCollector bug
commit b760533128c2b4eb10ad76e9689ef714293dd819
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:56:08 2014 +0100
CheckIndex is now closeable we need to close it
commit 9dae9fb6d63546a6c2427be2a2d5c8358f5b1934
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:45:11 2014 +0100
s/Lucene51/Lucene50
commit 7aea9b86856a8c1b06a08e7c312ede1168af1287
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:42:30 2014 +0100
fix BloomFilterPostingsFormat
commit 16fea6fe842e88665d59cc091e8224e8dc6ce08c
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:41:16 2014 +0100
fix some codec format issues
commit 3d77aa97dd2c4012b63befef3f2ba2525965e8a6
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:30:43 2014 +0100
fix CodecTests
commit 6ef823b1fde25657438ace1aabd9d552d6ae215e
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:26:47 2014 +0100
make it compile
commit 9991eee1fe99435118d4dd42b297ffc83fce5ec5
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 09:12:43 2014 -0400
add an ugly hack for TopHitsAggregator for now
commit 03e768a01fcae6b1f4cb50bcceec7d42977ac3e6
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:01:02 2014 +0100
cut over ES090PostingsFormat
commit 463d281faadb794fdde3b469326bdaada25af048
Merge: 0f8740a 8eac79c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 08:30:36 2014 -0400
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 0f8740a782455a63524a5a82169f6bbbfc613518
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 01:00:15 2014 -0400
fix/hack remaining filter and analysis issues
commit df534488569da13b31d66e581456dfd4b55156b9
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 23:11:47 2014 -0400
fix ngrams / openbitset usage
commit 11f5dc3b9887f4da80a0fa1818e1350b30599329
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 22:42:44 2014 -0400
hack over sort comparators
commit 4ebdc754350f512596f6a02770d223e9f5f7975a
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 21:27:07 2014 -0400
compiler errors < 100
commit 2d60c9e29de48ccb0347dd87f7201f47b67b83a0
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 03:13:08 2014 -0400
clear some nocommits around ram usage
commit aaf47fe6c0aabcfb2581dd456fc50edf871da758
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 12:27:34 2014 -0400
migrate fieldinfo handling
commit ef6ed6d15d8def71cd880d97249678136cd29fe3
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 12:07:13 2014 -0400
more simple fixes
commit f475e1048ae697dd9da5bd9da445102b0b7bc5b3
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 11:58:21 2014 -0400
more fielddata ram accounting fixes
commit 16b4239eaa9b4262df258257df4f31d39f28a3a2
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 16:47:32 2014 +0100
add missing file
commit 5b542fa2a6da81e36a0c35b8e891a1d8bc58f663
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 16:43:29 2014 +0100
cut over completion posting formats - still some nocommits
commit ecdea49404c4ec4e1b78fb54575825f21b4e096e
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 11:21:09 2014 -0400
fielddata accountable fixes
commit d43da265718917e20c8264abd43342069198fe9c
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 16:19:53 2014 +0100
cut over BloomFilterPostings to new API
commit 29b192ba621c14820175775d01242162b88bd364
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 10:22:51 2014 -0400
fix more analyzers
commit 74b4a0c5283e323a7d02490df469497c722780d2
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 09:54:25 2014 -0400
fix tests
commit 554084ccb4779dd6b1c65fa7212ad1f64f3a6968
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:51:48 2014 +0100
maintain supressed exceptions on CorruptIndexException
commit cf882d9112c5e8ef1e9f2b0f800f7aa59001a4f2
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:47:17 2014 +0100
commitOnClose=false
commit ebb2a9189ab2f459b7c6c9985be610fd90dfe410
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:46:06 2014 +0100
cut over indexwriter closeing in InternalEngine
commit cd21b3d4706f0b562bd37792d077d60832aff65f
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:38:10 2014 +0100
fix constant
commit f93f900c4a1c90af3a21a4af5735a7536423fe28
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 09:50:49 2014 -0400
fix test
commit a9a752940b1ab4699a6a08ba8b34afca82b843fe
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Mon Oct 27 09:26:18 2014 +0100
Be explicit about the index options
commit d9ee815babd030fa2ceaec9f467c105ee755bf6b
Author: Simon Willnauer <simonw@apache.org>
Date: Sun Oct 26 20:03:44 2014 +0100
cut over store and directory
commit b3f5c8e39039dd8f5caac0c4dd1fc3b1116e64ca
Author: Robert Muir <rmuir@apache.org>
Date: Sun Oct 26 13:08:39 2014 -0400
more test fixes
commit 8842f2684e3606aae0860c27f7a4c53e273d47fb
Author: Robert Muir <rmuir@apache.org>
Date: Sun Oct 26 12:14:52 2014 -0400
tests manual labor
commit c43de5aec337919a3fdc3638406dff17fc80bc98
Author: Robert Muir <rmuir@apache.org>
Date: Sun Oct 26 11:04:13 2014 -0400
BytesRef -> BytesRefBuilder
commit 020c0d087a2f37566a1db390b0e044ebab030138
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 15:53:37 2014 +0100
Moved over to BitSetFilter
commit 48dd1b909e6c52cef733961c9ecebfe4f67109fe
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 15:53:11 2014 +0100
Left over Collector api change in ScanContext
commit 6ec248ef63f262bcda400181b838fd9244752625
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 15:47:40 2014 +0100
Moved indexed() over to indexOptions != null or indexOptions == null
commit 9937aebfd8546ae4bb652cd976b3b43ac5ab7a63
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 13:26:31 2014 +0100
Fixed many compile errors. Mainly around the breaking Collector api change in 5.0.
commit fec32c4abc0e3309cf34260c8816305a6f820c9e
Author: Robert Muir <rmuir@apache.org>
Date: Sat Oct 25 11:22:17 2014 -0400
more easy fixes
commit dab22531d801800d17a65dc7c9464148ce8ebffd
Author: Robert Muir <rmuir@apache.org>
Date: Sat Oct 25 09:33:41 2014 -0400
more progress
commit 414767e9a955010076b0497cc4f6d0c1850b48d3
Author: Robert Muir <rmuir@apache.org>
Date: Sat Oct 25 06:33:17 2014 -0400
more progress
commit ad9d969fddf139a8830254d3eb36a908ba87cc12
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 24 14:28:01 2014 -0400
current state of fun
commit 464475eecb0be15d7d084135ed16051f76a7e521
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 24 11:42:41 2014 -0400
bump to 5.0 snapshot
Currently MetaDataStateFormat loads the first available state file that has
the latest version. In case several files are available and some of them use
the new format while other ones use the legacy format, it should also prefer
the new format. This is typically useful when we upgrade the metadata when
recovering from the gateway: we might write the upgraded state with the new
format while the previous state used the legacy format, so we end up with
two files having the same version but using different formats.
Close#8343
This will help the exists/missing filters behave as expected in presence of
empty strings, as well as when using a default analyzer that would generate
tokens for an empty string (uncommon).
Close#8198
The transport client created within ExternalTestCluster needs a name that follows our naming convention otherwise the thread leak filter barfs when running tests against an external cluster. Used "transport_client_external_{n}" where n gets incremented every time a new external cluster gets created. Updated thread leak filters rules to ignore threads created by such transport client.
We currently use the djb2 hash function in order to compute the shard a
document should go to. Unfortunately this hash function is not very
sophisticated and you can sometimes hit adversarial cases, such as numeric ids
on 33 shards.
Murmur3 generates hashes with a better distribution, which should avoid the
adversarial cases.
Here are some examples of how 100000 incremental ids are distributed to shards
using either djb2 or murmur3.
5 shards:
Murmur3: [19933, 19964, 19940, 20030, 20133]
DJB: [20000, 20000, 20000, 20000, 20000]
3 shards:
Murmur3: [33185, 33347, 33468]
DJB: [30100, 30000, 39900]
33 shards:
Murmur3: [2999, 3096, 2930, 2986, 3070, 3093, 3023, 3052, 3112, 2940, 3036, 2985, 3031, 3048, 3127, 2961, 2901, 3105, 3041, 3130, 3013, 3035, 3031, 3019, 3008, 3022, 3111, 3086, 3016, 2996, 3075, 2945, 2977]
DJB: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 900, 900, 900, 900, 1000, 1000, 10000, 10000, 10000, 10000, 9100, 9100, 9100, 9100, 9000, 9000, 0, 0, 0, 0, 0, 0]
Even if djb2 looks ideal in some cases (5 shards), the fact that the
distribution of its hashes has some patterns can raise issues with some shard
counts (eg. 3, or even worse 33).
Some tests have been modified because they relied on implementation details of
the routing hash function.
Close#7954
ClusterDiscoveryConfiguration is part of the test infra and should get exported as part of the test jar. This is achieved by moving the class to org.elasticsearch.test.discovery
Closes#8337
NettyTransport*Tests were previously in org.elasticsearch.test.transport and ended up being exported with the test jar. org.elasticsearch.transport.netty should be a better place for them together with exising tests.
The test verifies the correct behavior of a listener but we only call the listener after publishing a new cluster state. Only checking on the publishing of the state introduces a racing condition.
This commit removes all special file handling from DistributorDirectory
that assigned certain files to the primary directory. This special handling
was added to ensure that files that are written more than once are essentially
overwritten. Yet this implementation is consistent all the time and doesn't need
this special handling for files that are written through this directory. Writes
to the underlying directory not going through the distributor directory are not
and have never been supported.
Note: this commit also fixes the problem of adding directories to the distributor
during restart where the primary can suddenly change and file mappings are by-passed.
Closes#8276
Also added:
* Better exception handling in UnicastZenPing#ping and MulticastZenPing#ping
* In the join thread that runs the innerJoinCluster loop, remember the last known exception and throw that when assertions are enabled. We loop until inner join has completed and if continues exceptions are thrown we should fail the test, because the exception shouldn't occur in production (at least not too often).
Applied feedback 3
Closes#8327
This commit adds the ability to associate a bit of state with each
individual aggregation.
The aggregation response can be hard to stitch back together without
having a reference to the aggregation request. In many cases this is not
available, many json serializer frameworks cache types globally or have a
static deserialisation override mechanism. In these cases making the
original request available, if at all possible, would be a hack.
The old facets returned `_type` which was just enough metadata to know
what the originating facet type in the request was.
This PR takes `_type` one step further by introducing ANY arbitrary meta
data. This could be further <strike>ab</strike>used for instance by
generic/automated aggregations that include UI state (color information,
thresholds, user input states, etc) per aggregation.
The discovery.zen.minimum_master_nodes setting can be updated dynamically. Settings it to a value higher then the current number of master nodes will cause the current master to step down. This is dangerous because if done by mistake (typo) there is no way to restore the settings (this requires an active master).
Closes#8321
version.
An early version of #7966 had the ability to choose a bwc version
automatically, but this was removed before the change was committed.
However, the change was not removed from the ongoing work in #7922
and it made it in unknowningly.
If includes or excludes are set
XContentFactory.xcontentBuilder() allocates a new
BytesStreamOutput using the default page size which is 16kb.
Can be optimized to use the length of the sourceRef because
that is the maximum possible size that the streamOutput will
use.
This redcues the amount of memory allocated for a request
that is fetching 200.000 small documents (~150 bytes each)
by about 300 MB
Close#8138
This adds HTTP pipelining support to netty. Previously pipelining was not
supported due to the asynchronous nature of elasticsearch. The first request
that was returned by Elasticsearch, was returned as first response,
regardless of the correct order.
The solution to this problem is to add a handler to the netty pipeline
that maintains an ordered list and thus orders the responses before
returning them to the client. This means, we will always have some state
on the server side and also requires some memory in order to keep the
responses there.
Pipelining is enabled by default, but can be configured by setting the
http.pipelining property to true|false. In addition the maximum size of
the event queue can be configured.
The initial netty handler is copied from this repo
https://github.com/typesafehub/netty-http-pipeliningCloses#2665
This changes the weighing function for the filter cache to use a
configurable minimum weight for each filter cached. This value defaults
to 1kb and can be configured with the
`indices.cache.filter.minimum_entry_weight` setting.
This also fixes an issue with the filter cache where the concurrency
level of the cache was exposed as a setting, but not used in cache
construction.
Relates to #8268
This adds a Listener interface to the ClusterInfoService, this is used
by the DiskThresholdDecider, which adds a listener to check for nodes
passing the high watermark. If a node is past the high watermark an
empty reroute is issued so shards can be reallocated if desired.
A reroute will only be issued once every
`cluster.routing.allocation.disk.reroute_interval`, which is "60s" by
default.
Refactors InternalClusterInfoService to delegate the nodes stats and
indices stats gathering into separate methods so they have be overriden
by extending classes. Each stat gathering method returns a
CountDownLatch that can be used to wait until processing for that part
is successful before calling the listeners.
Fixes#8146
This change corrects the location information gathered by the loggers so that when printing class name, method name, and line numbers in the log pattern, the information from the class calling the logger is used rather than a location within the logger itself.
A reset method has also been added to the LogConfigurator class which allows the logging configuration to be reset. This is needed because if the LoggingConfigurationTests and Log4jESLoggerTests are run in the same JVM the second one to run needs to be able to override the log configuration set by the first
Closes#5130, #8052
we current check that the recovery is not finished when people access the status local variables. This is wrong and we should check for the refcount being > 0 as it is OK to use the status after it has marked as finished but there are still on going, in-flight reference to it.
Relates to #8092Closes#8271
This change corrects the location information gathered by the loggers so that when printing class name, method name, and line numbers in the log pattern, the information from the class calling the logger is used rather than a location within the logger itself.
Closes#5130
The MLT field query is simply replaced by a MLT query set to specififc field.
To simplify code maintenance we should deprecate it in 1.4 and remove it in
2.0.
Closes#8238
Today any call to the current randomized context modifies the random
sequence such that cluster initialization is context dependent. If due
to an error for instance a static util method is used like `randomLong`
inside the TestCluster instead of the provided Random instance all
reproducibility guarantees are gone. This commit adds a safe mechanism
to initialize these clusters even if a static helper is used. All none
test scope clusters are now initialized in a private randomized context.
Today we simply close the IW when we have to open a new one if ie.
we change merge settings. Yet, the API is deprecated and under the hood
it commits the IW. For safety we should add the current translog ID and
use rollback instead to not wait for merges.
One can set the headers sent with request by the clients by setting the `request.headers` setting. This commit enables overriding any such set headers directly on the requests.
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates #8225
(cherry picked from commit f819cff77a0ef95b340afc2f22e3464283803960)
Break-out common functionality between elasticsearch.bat and service.bat
Relates #8237
(cherry picked from commit 87095e79534bcef97b2b275322c924bd96b34e3b)
Fixes a bug where alias creation would allow `null` for index name, which thereby
applied the alias to _all_ indices. This patch makes the validator throw an
exception if the index is null.
```bash
POST /_aliases
{
"actions": [
{
"add": {
"alias": "empty-alias",
"index": null
}
}
]
}
```
```json
{
"error": "ActionRequestValidationException[Validation Failed: 1: Alias action [add]: [index] may not be null;]",
"status": 400
}
```
The reason this bug wasn't caught by the existing tests is because
the old test for nullness only validated against a cluster which had
zero indices. The null index is translated into "_all", and since
there are no indices, this fails because the index doesn't exist.
So the test passes.
However, as soon as you add an index, "_all" resolves and you get the
situation described in the original bug report: null index is
accepted by the alias, resolves to "_all" and gets applied to everything.
The REST tests, otoh, explicitly tested this bug as a real feature and therefore
passed. The REST tests were modified to change this behavior.
Fixes#7863
If the filter producing the FBS were to be cached then it would flip bits each time the NonNestedDocsFilter was executed.
In our case we cache the NonNestedDocsFilter itself so that didn't happen each time this filter was used.
However the hashcode of NonNestedDocsFilter and NestedDocsFilter was the same, since NonNestedDocsFilter directly used NestedDocsFilter#hashCode()
Closes#8227Closes#8232
The test validates that minimum master node is honored after shutting down nodes. When nodes are restarted we may go through a couple of master election of a master is elected and discovers some of the old nodes left before all new node have joined. Processing that node failure the master de-elects it self, which is fine because we will start a new master election. Test however runs a clusterHealth call with a wait for event. This is implemented using a cluster state update task which fails when the master steps down. Longer term fix requires a rewrite of the cluster health API code but a quick fix is not check for events (not needed for this test).
This commit adds a new field to the response of the terms aggregation called
`sum_other_doc_count` which is equal to the sum of the doc counts of the buckets
that did not make it to the list of top buckets. It is typically useful to have
a sector called eg. `other` when using terms aggregations to build pie charts.
Example query and response:
```json
GET test/_search?search_type=count
{
"aggs": {
"colors": {
"terms": {
"field": "color",
"size": 3
}
}
}
}
```
```json
{
[...],
"aggregations": {
"colors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 4,
"buckets": [
{
"key": "blue",
"doc_count": 65
},
{
"key": "red",
"doc_count": 14
},
{
"key": "brown",
"doc_count": 3
}
]
}
}
}
```
Close#8213
Add source_node and target_node fields to the recovery cat API. Also fixed and updated the documentation which was not complete concerning fields names.
Closes#8041
- Added parseBooleanExact in booleans which throws exception in case of
parse failure
- Used ParseExact in static's to make it consistent
+ code review fixes
- Added unit test
- Changed Exception Type to ElasticSearchIllegalArg from Parse
- used isExplicit*
Closes#8097
initialize ports.
JUnit uses an instance per test which caused the prot range to be
initialized twice since suite level tests are not configured in a
different context.
The cluster for `Scope.SUITE` tests must be initialize in a static manner
before the first test runs otherwise the random context used to initialize
the cluster is taken from tests randomness rather than the suites randomness.
This means test clusters will have different setups if only a single test is
executed or even the test might have a entirely different random sequence.
This is due to a bug in older version causing refreshes to potentially be missed due to relocations #6545
Also:
- Changed test to keep track of ids and report missing ones.
- Removed total count check from assertSearchHits in order to enable per id checks in cased of a mismatch
- Added a printable unique id part the ids of dummy documents added by indexRandom. The current random unicode id
sometimes prints as ???? to the logs making them hard to trace
The MLT query has a lot of parameters. For example, a set of documents is
specified with either `like_text`, `ids` or `docs`, with at least one
parameter required. This commit groups all the document specification
parameters under one called `like`. The syntax is described below and could
easily be extended to allow for new means of specifying document input. The
`like_text`, `ids` and `docs` parameters are deprecated.
As a single piece text:
{
"query": {
"more_like_this": {
"like": "some text here"
}
}
}
As a single item:
{
"query": {
"more_like_this": {
"like": {
"_index": "imdb",
"_type": "movies",
"_id": "88247"
}
}
}
}
Or as a mixture of all:
{
"query": {
"more_like_this": {
"like": [
"Some random text ...",
{
"_index": "imdb",
"_type": "movies",
"_id": "88247"
},
{
"_index": "imdb",
"_type": "movies",
"doc": {
"title": "Document with an artificial title!"
}
}
]
}
}
}
Closes#8039
If a bulk request contains a mix of indexing requests for an existing index and one that needs to be auto-created but a cluster configuration prevents the auto-create of the new index the ingest process hangs. The exception for the failure to create an index was not caught or reported back properly. Added a Junit test to recreate the issue and the associated fix is in TransportBulkAction.
Closes#8125
When reading metadata we do catch FileNotFound and NoSuchFileExceptions
today, log the even and return an empty metadata object. Yet, in some cases
this might be the wrong thing todo ie. if a commit point is provided these
situations are actually an error and should be rethrown. This commit
pushes the responsiblity to the caller to handle this exception.
Closes#8207
The concurrency level allows to configure the cache internal segments
used to cache data. This can have direct impact on evicition rates since
memory bound caches are equally divided into segments which can cause
early evictions if cache entries are not well balanced.
Relates to #7836
We already have two places duplicating this rather hairy logic, this
commit intorduces a new RefCoutned interace and an abstract implementation
that can be used for delegation. It factors out all the reference counting
and adds single and multithreaded test for it.
Closes#8210
Windows can throw NoSuchFileException when using File.walkFileTree and deleting files concurrently. This commit changes IO exceptions into assertion error so that assertBusy will wait for them as well.
This commit rewrites the state controls in the RecoveryTarget family classes to make it easier to guarantee that:
- recovery resources are only cleared once there are no ongoing requests
- recovery is automatically canceled when the target shard is closed/removed
- canceled recoveries do not leave temp files behind when canceled.
Highlights of the change:
1) All temporary files are cleared upon failure/cancel (see #7315 )
2) All newly created files are always temporary
3) Doesn't list local files on the cluster state update thread (which throw unwanted exception)
4) Recoveries are canceled by a listener to IndicesLifecycle.beforeIndexShardClosed, so we don't need to explicitly call it.
5) Simplifies RecoveryListener to only notify when a recovery is done or failed. Removed subtleties like ignore and retry (they are dealt with internally)
Closes#8092 , Closes#7315
This commit adds the ability to enable / disable relocations
on an entire cluster or on individual indices for either:
* `primaries` - only primaries can rebalance
* `replica` - only replicas can rebalance
* `all` - everything can rebalance (default)
* `none` - all rebalances are disabled
similar to the allocation enable / disable functionality.
Relates to #7288
If `fielddata_fields` are passed as a simple value instead of an array
we end up in an infinite loop createing parsed elements with null
values.
This commit validates the incoming token
Closes#8203
Since we enabled the disk threshold decider by default, we need to
enable the cluster info service so that disk usages and shard sizes can
be gathered also.
Adds a test that checks that we are gathering information by default.
This commit adds throttle stats to the indexing stats and uses a call back from InternalEngine to manage the stats.
Also includes updates the IndexStatsTests to test for these new stats.
Stats added :
```
throttle_time_in_millis
is_throttled
```
Closes#7861
We used to handle FNF exceptions in the store when reading a snapshot.
For instance if we can't open a segments file for a given commit point
we just return an empty metadata object and tracelog the even. This can
cause shards to be false marked as corrupted if a shard is forcefully
removed while a recovery started at the same time. We should in general
bubble up these exceptions and let the caller decided how to handle the
IOExceptions.
The issue with making it dynamic is that in the event a cluster is
switched from a noop to a concrete implementation, there may be
in-flight requests, once these requests complete we adjust the breaker
with a negative number and trip an assertion.
This also rarely uses noop breakers in InternalTestCluster
Previously, the leniency was on a per-query basis, with each query being
parsed into multiple queries, one for each field. If any one of these
queries failed, the entire query was discarded in the name of being
lenient.
Now query parts will only be discarded if they fail for a particular
field, the entire query is not discarded. This helps when performing a
query over a numeric and string field, as only the sub-queries that are
invalid due to format exceptions will be discarded.
Also moves the `simple_query_string` queries out of SimpleQueryTests and
into a dedicated SimpleQueryStringTests class.
Fixes#7967
The live docs that is passed down was ignored by the filter impl. Now the children filter gets wrapped with ApplyAcceptedDocsFilter, so live docs are actually applied.
Closes#8180
The IndicesWarmer gets set before the InternalIndexService gets set, which can lead to a small time window were InternalIndexService isn't set
Closes#8140Closes#8168
Also added a bwc test that runs a delete by query with a has_child query and verifies that only that operation is ignored when recovering from disk during a upgrade.
Closes#8031Closes#8177
Query String query now supports a new `time_zone` option based on JODA time zones.
When using a range on date field, the time zone is applied.
```json
{
"query": {
"query_string": {
"text": "date:[2012 TO 2014]",
"timezone": "Europe/Paris"
}
}
}
```
Closes#7880.
Storing `_timestamp` by default means that under the default configuration, you
would have all the information you need in order to reindex into a different
index.
Close#8139
This adds a NoopCircuitBreaker, and then adds the settings
`indices.breaker.fielddata.type` and `indices.breaker.request.type`,
which can be set to "noop" in order to use a breaker that will never
break, and incurs no overhead during computation.
This also refactors the tests for the CircuitBreakerService to use
@Before and @After functions as well as adding settings in
ElasticsearchIntegrationTest to occasionally use NOOP breakers for all
tests.
This is functionally equivalent to before, so there should be no
user-visible impact, except I added a NOTE in the docs warning about
the interaction of pagination and rescoring.
Closes#6232Closes#7707
This patch allows to create several netty bootstrap, each of which
listening on different ports. This will potentially allow for features
to listen to different network interfaces for node-to-node or node-to-client
communication and is also the base to listen to several interfaces, so that those
can be used to speed up cluster communication in the future.
Closes#8098
This change means that buckets can now be serialised to JSON and serialized and deserialized to the transport API outside of the aggregation that contains them. This is a required change for #8110 (Reducers framework) but should make sense on it's own since object should really take care of their own serialization rather than relying on their parent object.
cat/nodes currently does not report any details related to file descriptors. This adds the current number in use, the maximum number available as well as their ratio (percentage) to cat/nodes as hidden-by-default metrics. In addition, this also adds current heap usage (as a non-percentage of ts max) and ram usage (as a non-percerntage of its max) to allow tools to provide more granularity.
Closes#7652
Dangling indices are indexes found on disk which are not part of the cluster state. By default, we don't delete them but rather import them into the cluster state in order to not accidentally delete data and also allow for the ease of copying index data folders from one cluster to another. Currently, the import logic doesn't check for existing aliases of the same name as the imported dangling index, causing both an index and an alias with the same name.
This commit add a protection against this. Note that the index is still kept as dangling and is only deleted from disk after `gateway.local.dangling_timeout` has passed (2 hours).
We also change the log message indicating deletion of dangling indices to a `WARN` level.
Closes#8059
This commit adds checksumming for cluster and index states. It moves
from a plain XContent based on-disk format to a more structured binary
format including a header and footer as well as a CRC32 checksum for
each of these files. Since previous versions didn't write any format
identifier etc. this commit adds a file extension `.st` for states
that have header/footer and checksum.
This commit also moves over to write temporary files and moves them into
place with an atomic move operation. This commit also serializes and
deserializes the states without materilazing them into opaque memory.
Closes#7586
Though we check that internet connection is fine and that the service is available, it could happen that some services are not available.
```java
assumeTrue(isDownloadServiceWorking("search.maven.org", 80, "/"));
singlePluginInstallAndRemove("org.elasticsearch/elasticsearch-transport-thrift/2.4.0", null);
```
In that case, the first check was successful but downloading thrift plugin from maven central download services was not successful.
This is not an issue with the plugin manager itself but sometimes the test could fail.
Adding a check that actual download service in Maven central works:
```java
assumeTrue(isDownloadServiceWorking("repo1.maven.org", 443, "/maven2/org/elasticsearch/elasticsearch-transport-thrift/2.4.0/elasticsearch-transport-thrift-2.4.0.pom"));
```
If refresh was already running when the refresh_interval is
dynamically disabled (set to a non-positive value like 0, -1, etc.),
then it's possible for the refresh thread to go into while (true)
refresh() loop.
Closes#8085Closes#8087
Today we use the current version which might enable features that are
not supported. We should throw an exception if this setting is not
present for any index.
Closes#8018
Since we don't use the cache, it's okay to clear it entirely if needed,
Elasticsearch maintains its own cache for compiled scripts.
Adds loader.clearCache() into a listener, the listener is called when a
script is removed from the Guava cache.
This also lowers the amount of cached scripts to 100, since 500 is
around the limit some users have run into before hitting an out of
memory error in permgem.
Fixes#7658
Today it's easy to use the wrong version when initializing DiscoveryNode
instances. This commit adds javadocs and a utility constant to
initialize DiscoveryNode instances if the the remotes node version is
unknown.
Closes#8051
Fixes a bug where alias creation would allow `null` for index name, which thereby
applied the alias to _all_ indices. This patch makes the validator throw an
exception if the index is null.
```bash
POST /_aliases
{
"actions": [
{
"add": {
"alias": "empty-alias",
"index": null
}
}
]
}
```
```json
{
"error": "ActionRequestValidationException[Validation Failed: 1: Alias action [add]: [index] may not be null;]",
"status": 400
}
```
The reason this bug wasn't caught by the existing tests is because
the old test for nullness only validated against a cluster which had
zero indices. The null index is translated into "_all", and since
there are no indices, this fails because the index doesn't exist.
So the test passes.
However, as soon as you add an index, "_all" resolves and you get the
situation described in the original bug report: null index is
accepted by the alias, resolves to "_all" and gets applied to everything.
Fixes#7976
Original indices were introduced in 1.4.0.Beta1 as optional, preceded by a boolean flag that told whether they were empty or not. We have to keep serializing as optional only when reading and writing from/to 1.4.0.Beta1, although the original indices are not optional whenever we go and serialize the request they belong to (which is why the boolean got removed in 1.4.0).
The scripted metric aggregation is now a PER_BUCKET aggregation so that parent buckets are evaluated independently. Also the params and reduceParams are copied for each instance of the aggregator (each parent bucket) so modifications to the values are kept only within the scope of its parent bucket
Closes#8036
short[] were mistakenly encoded as a float[]. This is not an issue for the
text-based xcontents that we have (yaml, json) since floats can represent any
short value and are serialized as strings. However, this will make the binary
xcontents serialize shorts as int instead of floats.
Close#7845
By letting the fetch phase understand the nested docs structure we can serve nested docs as hits.
The `top_hits` aggregation can because of this commit be placed in a `nested` or `reverse_nested` aggregation.
Closes#7164
In some cases a shard search request gets created on a node to be only used there and never sent over the transport. This commit clarifies that and creates a new base class called `ShardSearchLocalRequest` that can and will be only used locally. `ShardSearchTransportRequest` on the other hand delegates to the local version but extends `TransportRequest` and is `Streamable`, which means that it is supposed to be sent over the transport.
This way we can make the `OriginalIndices` only required (and mandatory now) in the transport variant.
Took the chance to remove an unused InternalScrollSearchRequest constructor and an empty else branch in `TransportSearchScrollQueryAndFetchAction`.
Closes#7855
This patch adds to `_cat/indices` information about memory usage per index by adding memory used by FieldData, IdCache, Percolate, Segments (memory, index writer, version map).
```
% curl 'localhost:9200/_cat/indices?v&h=i,tm'
i tm
wiki 8.1gb
test 30.5kb
user 1.9mb
```
Closes#7008
* Run flush in beforeIndexShardClosed to prevent an empty shard.
* Only run check index if the shard state before closing was: started, relocated or post_recovery
This commit does the following:
* Add the new API at the rest layer, being backed by the optimize API
with upgrade flag, and segments api to find upgrade status.
* Add `upgrade` flag to optimize API, and deprecate `force` flag (will
remove in master)
* Add test for both synchronous and async upgrade
closes#7884closes#7922