We're concerned that unmodifiableMap uses significantly more memory than
ImmutableMap did - especially in cluster state - so we ban it there outright
and move to ImmutableOpenMap.
Removes ClusterState$Builder#routingTable(RoutingTable$Builder) because that
method had the side effect of building the routing table which can only be
done once per RoutingTable$Builder now that it uses ImmutableOpenMap.
Closes#13880
Squashed commit of the following:
commit 316a328e5032e580ba840db993d907631334aac0
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:57:47 2015 -0400
windows is terrible
commit 0406b560c58bf833f8d77af9c7cf3386771dd9c5
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:43:09 2015 -0400
Nuke ES_CLASSPATH appending
Out of box, ES expects its stuff to be in particular places. We should not be appending to ES_CLASSPATH, allowing users to specify stuff there, like we do in elasticsearch.bin.sh
If the user sets it, its not going to work out of box.
Closes#13812
commit 415d8972df28eddec322bb6d70100a1993fa95f6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:26:35 2015 -0400
Fail hard on empty classpath elements.
This can happen easily, if somehow old 1.x shellscripts survive and try to launch 2.x code.
I have the feeling this happens maybe because of packaging upgrades or something.
Either way: we can just fail hard and clear in this situation, rather than the current situation
where CWD might be /, and we might traverse the entire filesystem until we hit an error...
Relates to #13864
When the plugin manager does not find in `plugin-descriptor.properties` the exact same elasticsearch version it was built on
as the current elasticsearch version, it fails with a message like:
```
ERROR: Elasticsearch version [2.0.0-beta1] is too old for plugin [elasticsearch-mapper-attachments]
```
Actually, the message should be:
```
Plugin [elasticsearch-mapper-attachments] is incompatible with Elasticsearch [2.0.0.beta2]. Was designed for version [2.0.0.beta1].
```
The opposite is true. If you try to install a version of a plugin which was built with a newer version of elasticsearch, it will fail the same way:
```
Plugin [elasticsearch-mapper-attachments] is incompatible with Elasticsearch [2.0.0.beta1]. Was designed for version [2.0.0.beta2].
```
This commit adds supports for expiration after writes to Cache. This
enables entries to expire after they were initially placed in the cache
without prolonging their life on retrieval. Replacements are considered
new writes.
This commit removes and now forbids all uses of
com.google.common.cache.Cache, com.google.common.cache.CacheBuilder,
com.google.common.cache.RemovalListener,
com.google.common.cache.RemovalNotification,
com.google.common.cache.Weigher across the codebase. This is a major
step in the eventual removal of Guava as a dependency.
Relates #13224
We try to get the index shard instance again from the index service on a different
threads while that shard might have already been closed or removed which can cause a NPE
instead of another expected expecption.
This commit shuffels and rewrites some code in RecoverySourceHandler to make it
simpler and more unittestable. This commit doesn't change all parts of this class
neither is it fully tested yet. It's an important part of the infrastrucutre so I started
to make it better tested but I don't want to change everything in one go since it makes
review simpler and more detailed. Future commits will continue cleaning up the class and
add more tests.
If a document is indexed into ES with no id, ES will generate one for it. We used to have an optimization for this case where the engine will not try to resolve the ids of these request in the existing index but immediately try to index them. This optimization has proven to be the source of brittle bugs (solved!) and we disabled it in 1.5, preparing for it to be removed if no performance degradation was found. Since we haven't seen any such degradation we can remove it.
Along with the removal of the optmization, we can remove the autogenerate id flag on indexing requests and the can have duplicate flag. The only downside of the removal of the canHaveDuplicate flag is that we can't make sure any more that when we retry an autogenerated id create operation we will ignore any document already exists exception (See #9125 for background and discussion). To work around this, we don't set the operation to CREATE any more when we generate an id, so the resulting request will never fail when it finds an existing doc but do return a version of 2. I think that's acceptable.
Closes#13857
We have two unneded heavy dependencies on IndexShard that are unneeded and only cause
trouble if you try to mock index shard. This commit removes IndexSettingsService as well as
ClusterSerivce from IndexShard to simplify future mocking and construction.
While refactoring has_child and has_parent query we lost an important detail around types. The types that the inner query gets executed against shouldn't be the main types of the search request but the parent or child type set to the parent query. We used to use QueryParseContext#setTypesWithPrevious as part of XContentStructure class which has been deleted, without taking care though of setting the types and restoring them as part of the innerQuery#toQuery call.
Meanwhile also we make sure that the original context types are restored in PercolatorQueriesRegistry
Closes#13863
Closes#13854
Squashed commit of the following:
commit 42c1166efc55adda0d13fed77de583c0973e44b3
Author: Robert Muir <rmuir@apache.org>
Date: Tue Sep 29 11:59:43 2015 -0400
Add paranoia
Groovy holds on to a classloader, so check it before compilation too.
I have not reviewed yet what Rhino is doing, but just be safe.
commit b58668a81428e964dd5ffa712872c0a34897fc91
Author: Robert Muir <rmuir@apache.org>
Date: Tue Sep 29 11:46:06 2015 -0400
Add SpecialPermission to guard exceptions to security policy.
In some cases (e.g. buggy cloud libraries, scripting engines), we must
grant dangerous permissions to contained cases. Those AccessController blocks
are dangerous, since they truncate the stack, and can allow privilege escalation.
This PR adds a simple permission to check before each one, so that unprivileged code
like groovy scripts, can't do anything they shouldn't be allowed to do otherwise.
StoreRecoveryService used to be a pretty heavy class with lots of dependencies.
This class was basically not testable in isolation and had an async API with a listener.
This commit refactors this class to be a simple utility classs with a sync API hidden behind
the IndexShard interface. It includes single node tests and moves all the async properities to
the caller side.
Note, this change also removes the mapping update on master from the store recovery code since
it's not needed anymore in 3.0 because all stores have been subject to sync mapping updates such
that the master already has all the mappings for documents that made it into the transaction log.
Closes#13766
Now that groovy is factored out, we contain this dangerous stuff there.
TODO: look into those test hacks inspecting class protection domains, maybe we can
clean that one up too.
TODO: generalize the GroovyCodeSourcePermission to something all script engines check,
before entering accesscontrollerblocks. this way e.g. groovy script cannot coerce
python engine into creating something with more privs if it gets ahold of it... we
should probably protect the aws/gce hacks in the same way.
Previous to this change if to equal geohash cell query builders were created and then toQuery was called on one, they would no longer be equal.
This change also adds a test to AbstractQueryTestCase to make sure calling toQuery on any query builder does not affect the query builder's equality
* Add OS X support via "seatbelt" mechanism. This gives consistency across dev and prod, since many devs use OS X.
* block execveat system call: it may be new, but we should not allow it.
When a shard becomes in active we trigger a sync flush in order to speed up future recoveries. The sync flush causes a new translog generation to be made, which in turn confuses the IndexingMemoryController making it think that the shard is active. If no documents comes along in the next 5m, the shard is made inactive again , triggering a sync flush and so forth.
To avoid this, the IndexingMemoryController is changed to ignore empty translogs when checking if a shard became active. This comes with the price of potentially missing indexing operations which are followed by a flush. This is acceptable as if no more index operation come in, it's OK to leave the shard in active.
A new unit test is introduced and comparable integration tests are removed.
Closes#13802
Instead just log the same thing we print to the startup console for that case (magic logic),
it sucks to do this, but guice exceptions are too much.
All other non-guice exceptions will still be fully logged.
Closes#13782
This commit adds explicit ids for managing ElasticsearchException
serialization. By adding explicit ids and unit tests for them, the ids
are less brittle and breakage can be more clearly detected.
Today we use reflection where it's not needed anymore since java8 can
pass ctors around. This commit replaces runtime checks with compile time
checks which is always preferrable.
These classes are really duplicates and are just here for historical reasons.
We don't need these anymore since the same classes exist in lucene today.
This also removes the guice injection for DeletionPolicy and make them shard private.
TermsLookupQueryBuilder was left around only for bw comp reasons, but TermsQueryBuilder is its replacement. We can remove it now that it is clear query refactoring goes in master (3.0).
There is no need to have term vectors service on the shard level where it's
created for every shard. This commit moves it to a higher level which makes
shard creation slightly simpler and reduces the number of long living objects.
Before the refactoring we didn't check any invalid settings for strategy and relation
in the GeoShapeQueryBuilder. However, using SpatialStrategy.TERM and ShapeRelation.INTERSECTS
together is invalid and we tried to protect against that in the validate() method.
This PR moves these checks to setter for strategy and relation and adds tests for the new
behaviour.
Relates to #10217
Block execve(), fork(), and vfork() system calls, returning EACCES instead,
on kernels that support seccomp-bpf: either via seccomp() or falling back
to prctl().
Only linux/amd64 is supported. This feature can be disabled (in case
of problems) with bootstrap.seccomp=false.
Closes#13753
Squashed commit of the following:
commit 92cee05c72b49e532d41be7b16709e1c9f919fa9
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 10:12:51 2015 -0400
Add a note about why we don't parse uname() or anything
commit b427971f45cbda4d0b964ddc4a55fae638880335
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 09:44:31 2015 -0400
style only: we already pull errno into a local, use it for catch-all case
commit ddf93305525ed1546baf91f7902148a8f5b1ad06
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 08:36:01 2015 -0400
add TODO
commit f29d1b7b809a9d4c1fcf15f6064e43f7d1b24696
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 08:33:28 2015 -0400
Add full stacktrace at debug level always
commit a3c991ff8b0b16dc5e128af8fb3dfa6346c6d6f1
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 00:08:19 2015 -0400
Add missing check just in case.
commit 628ed9c77603699aa9c67890fe7632b0e662a911
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 22:47:16 2015 -0400
Add public getter, for stats or whatever if they need to know this
commit 3e2265b5f89d42043d9a07d4525ce42e2cb1c727
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 22:43:06 2015 -0400
Enable use of seccomp(2) on Linux 3.17+ which provides more protection.
Add nice errors.
Add all kinds of checks and paranoia.
Add documentation.
Add boolean switch.
commit 0e421f7fa2d5236c8fa2cd073bcb616f5bcd2d23
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 21:36:32 2015 -0400
Add defensive checks and nice error messages
commit 6231c3b7c96a81af8460cde30135e077f60a3f39
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 20:52:40 2015 -0400
clean up JNA and BPF. block fork and vfork too.
commit bb31e8a6ef03ceeb1d5137c84d50378c270af85a
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 19:00:32 2015 -0400
order is LE already for the JNA buffer, but be explicit about it
commit 10456d2f08f12ddc3d60989acb86b37be6a4b12b
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 17:47:07 2015 -0400
block process execution with seccomp on linux/amd64
Some methods had default implementation while queries were going to be refactored, now that they are all refactored all those methods can be made abstract.
Remove ParseFieldMatcher duplication query in both contexts. QueryParseContext is still contained in QueryShardContext, as parsing still happens in the shards here and there. Most of the norelease comments have been removed simply because the scope of the refactoring has become smaller. Some could only be removed once everything, the whole search request, gets parsed on the coordinating node. We will get there eventually.
SimpleIndexQueryParserTests was the main responsible: deleted lots of duplicated tests, moved the ones that made sense to keep to their corresponding unit tests (note they were ESSingleNode tests before while are now converted to unit tests).
Closes#13750
After all queries now have a `toQuery` method and the parsers all
support `fromXContent` it is possible to remove the following
workarounds and deprecated methods we kept around while doing the
refactoring:
* remove the BaseQueryParser and BaseQueryParserTemp. All parsers
implement QueryParser directly now
* remove deprecated methods in QueryParseContext that either returned
a Query or a Filter.
* remove the temporary QueryWrapperQueryBuilder
Relates to #10217
The IndexingMemoryController checks periodically if there is any indexing activity on the shard. If no activity is sean for 5m (default) the shard is marked as inactive allowing it's indexing buffer quota to given to other active shards.
Sadly the current check is bad as it checks for 0 translog operation. This makes the inactive wait for a flush to happen - which used to take 30m and since #13707 doesn't happen at all (as we rely on the synced flush triggered by inactivity). This commit fixes the check so it will work with any translog size.
Closes#13759
This commit fixes ping timeout settings inconsistencies in
ZenDiscovery. In particular, the documentation refers to the ping
timeout setting as discovery.zen.ping_timeout but the code was
ultimately using discovery.zen.ping.timeout if this was set.
This commit also changes all instances of the raw string
“discovery.zen.ping_timeout” to the constant
o.e.d.z.ZenDiscovery.SETTING_PING_TIMEOUT.
Finally, this commit removes the legacy setting
"discovery.zen.initial_ping_timeout".
Closes#6579, #9581, #9908
The current MoreLikeThisQueryBuilder validation checks for existence of at
least one `like` text or item. This is hard to check in setters, so this PR
tries to change the construction of the query so that we can do these checks
already at construction time.
Changing to using arrays for fieldnames, likeTexts, likeItems, unlikeTexts
and unlikeItems. `likeTexts` and/or `likeItems` need to be specified at
construction time to validate we have at least one item there.
Relates to #10217
Banning `ImmutableSet` outright is too much to do all at once - this starts
the process by banning `ImmutableMap#entrySet` - one of the more common ways
that `ImmutableSet`s come up. It then starts to remove calls to
`ImmutableMap#entrySet` by changing declarations from `ImmutableMap` to `Map`.
Unfortunately this process is like pulling on a long, windy string and one
declaration change requires another which requires 5 more which in turn
require another few. So this change is rather large.
As such, to keep the changes manageable they only remove `ImmutableMap` from
the signatures that are needed for `entrySet` and make little effort to stop
using `ImmutableMap` internally. Removing the usages of `ImmutableMap`
complicates immutability guarantees and will be done separately.
In #12942, the NettyTransport and NettyHttpServerTransport were updated to allow for binding
to multiple addresses. However, the BoundTransportAddress holder only exposed the first address
that the transport was bound to and this object is used to populate the values returned to the user
via our APIs.
This change exposes all of the bound addresses in the BoundTransportAddress holder, which allows
for an accurate representation of all interfaces that elasticsearch is bound to and listening on.
Not the neatest implementation in the world. Maybe we should consider changing the builders so it is a single builder for sort and a single builder for rescore instead of a list of builders for each?
This commit addresses a confusing error message that arises when a
property parameter (e.g. -D) is after a double-dash parameter. The
current error message reports to the user that the parameter does not
start with “--". Adding the second dash as the error message suggests
causes the parameter to be silently ignored. This is confusing for the
user. With this commit, the user is now informed that the parameter
order is violated.
Relates e27ede48ce
These exceptions are useless and unused, since we are on a major verison we should remove
them. This commit also makes it easier to remove excepitons in the future.
In the past ClusterStateUpdateTask was an interface and we had various derived marker interfaces to control behavior. Since then we moved ClusterStateUpdateTask to be an abstract class but we kept the old hierarchy of implementations. All of those (but the AckedClusterStateUpdateTask) can be folded into ClusterStateUpdateTask, adding correct default behavior.
Closes#13735
This commit moves the size and ops based flush into a synchronous API into
IndexShard and removes the time-based flush alltogether since it' basically
covered by the inactive async flush API we have today. The functionality doesn't
need to be covered by scheduled task and async APIs while we can actually make all
the decisions in a sync manner which is way easier to control and to test.
Closes#13707
This commit removes all the opaque bytes for extra_source and template_source.
Instead source and extra_source etc. are represented as SearchSourceBuilder which can
in-place be modified and is updated with the content of the request parameters.
Template Source is parsed and evaluated which in-turn replaces the actual source.
Refactor the function_score query so it can be parsed on the coordinating node, split parse into fromXContent and toQuery, make FunctionScoreQueryBuilder Writeable.
Closes#13653
Given that we are moving to parsing queries on the coordinating node, the index name is not relevant anymore in QueryParseContext, as the parsing phase cannot be related to any specific index. On the contrary, the QueryShardContext is the one that holds mappings etc. and the index name too, as the lucene query creation happens on the data node and can still be related with the index that it happens against.
Changes are mainly around tests that were expecting the index name, moved to using QueryShardException in some of them, removed the index name elsewhere.
Closes#13631
Moving validation from validate() to constructors and setters for the
following query builders:
* GeoDistanceQueryBuilder
* GeoDistanceRangeQueryBuilder
* GeoPolygonQueryBuilder
* GeoShapeQueryBuilder
* GeohashCellQuery
* TermsQueryBuilder
Relates to #10217
This parser prototype allows to decleratively define parsers for XContent
instead of writing messy and error prone while loops. It encapsulates all the error handling logic
and only even tries to parse if the token types match the declaration.
I want to refactor scripting engines so we can contain dangerous "God-like" permissions
like createClassloader/sun.reflect. These are used for dynamic class generation (scripts, mocks).
This will mean some refactoring to ES core.
But first lets get the plugins in order first. I removed those permissions globally, and
fixed grants for lang-javascript, lang-python, securemock so that everything works.
lang-javascript needs no code changes, because rhino is properly written :)
lang-python needs accesscontroller blocks. securemock was already working as of 1.1
This is just a baby step, to try to do some of this incrementally! It doesn't yet provide
us anything.
Currently the tribe node version always stays 0, which can cause issues for the services that rely on cluster state version. For example, ClusterStateObserver doesn't revalidate the cluster state after change, which leads to cluster health check with wait flags to take much longer then actually needed.
Until now we had a cloud-azure plugin which is providing 3 distinct features:
* discovery on Azure
* snapshot/restore on Aure
* SMB store
This commit splits the plugin by feature so people can use either one or the other or both features.
Doc is updated accordingly.
This add equals, hashcode, read/write methods, validation, separates toQuery
and JSON parsing and adds serialization and query generation tests.
Deprecates two types of initializing the bounding box: In our documentation we
speak about specifying top/left and bottom/right corner of a bounding box. Here
we also allow for top/right and bottom/left. This adds not only to the amount
of code but also testing needed w/o too much benefit for the user other than
more chances to confuse top/right/bottom/left/latitude/longitude IMHO.
Missing: The toQuery method with type set to "indexed" is not tested at the
moment.
Cleanup changes unrelated to base refactoring:
* Switched from type String to enum for types in GeoBoundingBoxQueryBuilder.
* Switched to using type GeoPoint for storing the bounding box coordinates
instead of array of double values.
Relates to #10217 for the query refactoring part.
Relates to #12016 for how missing mappings are handled.
Adds a utility class for generating random geo data.
Adds some missing documentation.
Extend test to MEMORY type config
Fix final review comments and rebase
We moved a lot of repositories into elasticsearch, but in their new
location they retained their LICENSE.txt and NOTICE.txt files. These are
all the same, and having the license and notice and the root of the
repository should be sufficient.
This commit removes and now forbids all uses of
com.google.common.primitives.Ints across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
graduate this from a hack for insecure plugins to something we can
live with for per-module/plugin permissions, it now works reasonably
in unit tests and with Intellij and Eclipse IDEs.
remove security warnings: we will deal with these issues in a secure
way, if we cannot, then the plugin shouldn't be in our core codebase.
This PR is the second batch in moving the query validation we started
to collect in the validate() method to the corresponding setters
and constructors.
This is the more sheisty business along the same lines as
https://github.com/elastic/elasticsearch/pull/13638
1 hour total adding the real functionality, days of wasted time
on simulated fake functionality to satisfy our crazy test framework...
I debugged on the problematic jenkins machine and I think issues are
from parsing the classpath and URL normalization etc (trailing slashes
vs not, etc in URLs). So I simplifed the code, to remove this completely,
inverting the logic so we just use an exclusion list instead of inclusion one.
I also allow tests for these plugins to run from the IDE (works at least for eclipse) too.
At least for eclipse this is even less realistic as it piles all the code (src and test)
into a single codebase, but it means you can *use it* and you just have to run mvn verify
before pushing as always. And as always... best effort.
A JTS bug causes a misinterpretation of polygon coordinates leading to an unhelpful "geom" AssertionError. While this assertion occurs approx 0.02% of the time it can lead to a misleading test failure. This patch catches the geom assertion and retries randomShapeCreation. For safety a threshold is set to prevent unlimited retrying - though 1 retry is typically sufficient for correcting the invalid shape.
closes#13551
A JTS bug causes a misinterpretation of polygon coordinates leading to an unhelpful "geom" AssertionError. While this assertion occurs approx 0.02% of the time it can lead to a misleading test failure. This patch catches the geom assertion and retries randomShapeCreation. For safety a threshold is set to prevent unlimited retrying - though 1 retry is typically sufficient for correcting the invalid shape.
closes#13551
We don't have a plugin .zip for unit tests, so we can't do it
correctly. But we can approximate it better, so that if code
is simply missing an AccessController block at least tests will fail.
Classnames change quickly due to refactorings etc. If that happens in a minor release
we loose the ability to deserialize the exceptoin coming from another node sicne we today
look it up by classname. This change uses a dedicated static id instead of the classname
to lookup the actual class.
Especially the worst of the worst with thread permissions: for example,
this prevents some code from starting daemon thread that will outlive
the elasticsearch process and hang around doing evil shit.
This commit removes unnecesssary use of ExceptionHelpers where we actually
should serialize / deserialize the actual exception. This commit also
fixes one of the oddest problems where the actual exception was never
rendered / printed if `all shards failed` due to a missing cause.
This commit unfortunately doesn't fix Snapshot/Restore which is almost
unfixable since it has to serialize XContent and read from it which can't
transport exceptions.
Weighted centroid, morton hash, and geohash can be imprecise (computation error) to 1e-5. The previous compareTo set this tolerance too strict (1e-6) causing a reproducible comparison error on weighted centroid (#13558). This change relaxes the tolerance to the acceptable computation error of 1e-5
closes#13558
Improve IndexingMemoryController a bit:
- promptly push indexing buffer changes to IndexWriter, instead of waiting for next refresh/flush
- don't wait for merges to finish before dropping a shards's indexing buffer to 512 KB once it's inactive
- fix NPE if indices.memory.index_buffer_size is in node's settings with a bytes (not %) unit
- add some more logger.debug
This commit removes and now forbids all uses of
com.google.common.base.Joiner across the codebase. This is one of many
steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.math.LongMath across the codebase. This is one step
of many in the eventual removal of Guava as a dependency.
This commit removes and now forbids all uses of
com.google.common.collect.Iterables across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.base.Preconditions across the codebase. This is one
of many steps in the eventual removal of Guava as a dependency.
Relates #13224
https://github.com/elastic/elasticsearch/pull/12908
Moved output to verbose level and additionally changed the plugin info output format
Before:
```shell
PluginInfo{name='cloud-aws', description='The Amazon Web Service (AWS) Cloud plugin allows to use AWS API for the unicast discovery mechanism and add S3 repositories.', site=false, jvm=true, classname=org.elasticsearch.plugin.cloud.aws.CloudAwsPlugin, isolated=true, version='2.1.0-SNAPSHOT'}
```
After:
```shell
- Plugin information:
Name: cloud-aws
Description: The Amazon Web Service (AWS) Cloud plugin allows to use AWS API for the unicast discovery mechanism and add S3 repositories.
Site: false
Version: 2.1.0-SNAPSHOT
JVM: true
* Classname: org.elasticsearch.plugin.cloud.aws.CloudAwsPlugin
* Isolated: true
```
Fixes#12907
If we don't wait we might retrieve the cluster state before second node
was added when we try to add the delegate in which the discovery node for
node_2 is null.
When a single node starts up it will first elect itself as master and then tries to
recover the cluster state or, if there is none, initialize an empty one and publish it.
Until it has done that, the cluster state will contain a global block and
requests might fail with
SERVICE_UNAVAILABLE/1/state not recovered / initialized
We need to wait for green.
This commit fixes a test bug in
o.e.a.s.b.n.TransportBroadcastByNodeActionTests. Namely, the randomized
test allowed for the creation of cluster states that allocated indices
having zero shards. This ultimately surfaced in a
NoSuchElementException when attempting to iterate over the nonexistent
shards. The fix is merely to draw the random number of shards from 1 to
10 instead of 0 to 10.
This commit replaces the usage of LoadedCache with a simple CHM and calls
to computeIfAbsent and adds LoadingCache and CacheLoader to forbidden APIs
Relates to #13224
* Added BWC indices
* Added snapshot version to Version.java
* Fixed create_bwc_index to use localhost instead of localhost and 127.0.0.1 (problem with ipv4/6 setup)
This commit removes and now forbids all uses of
Function, Charsets, Collections2 across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
When publishing a new cluster state, the master will send it to all the node of the cluster, noting down how many *master* nodes responded successfully. The nodes do not yet process the new cluster state, but rather park it in memory. As soon as at least minimum master nodes have ack-ed the cluster state change, it is committed and a commit request is sent to all the node that responded so far (and will respond in the future). Once receiving the commit requests the nodes continue to process the cluster state change as they did before this change.
A few notable comments:
1. For this change to have effect, min master nodes must be configured.
2. All basic cluster state validation is done in the first phase of publish and is thus now part of `ShardOperationResult`
3. A new `COMMIT_TIMEOUT` settings is introduced, dictating how long a master should wait for nodes to ack the first phase. Unlike `PUBLISH_TIMEOUT`, if waiting for a commit times out, the cluster state change will be rejected.
4. Failing to achieve a min master node of acks, will cause the master to step down as it clearly doesn't have enough active followers.
5. Previously there was a short window between the moment a master lost it's followers and it stepping down because of node fault detection failures. In this short window, the master could process any change (but fail to publish it). This PR closes this gap to 0.
6. A dedicated pending cluster states queue was added to keep pending non-comitted cluster states and manage the logic around processing committed cluster states. See #13303 for details.
Closes#13062 , Closes#13303
This commit allows to refresh the info service in a blocking fashion
which allows tests to prevent installing listeners alltogether and
makes the class easier to test. Installing a listnener is always subject
to concurrent modifications where the listener might be called mulitple times
or with stale information which causes tests to fail.
Java 8 allows for method references which in-turn will cause
compile errors if a method is not visible while reflection fails late
and maybe too late. We can now register Request instances via FooRequest::new
instead of passing FooRequest.class and call it's ctor via reflection.
This test sometimes fails because the first node is elected as master and waits 30s for incoming joins but in the meanwhile the 3 other nodes form a cluster on their side. The index will be created and its shards allocated on these 3 nodes, then the test checks for the number of shards on each node (it should be 2 or 3) but because the first node has not fully join the cluster yet one node will have 5 shards.
closes#13305
1. FileSystem wrapping code is broken, thats why you get providermismatch exception!
Instead of fixing this, it SuppressesForbidden!!!!
2. Because it uses SuppressForbidden on the test, the whole thing is lenient, it uses java.io.File for example!
3. Of course it fails consistently on windows because it can't remove files, because it leaks file handles (locks)
like a sieve since it does not close node environment. With correct wrapping this is always detected by e.g.
our leak detection FS. Instead of fixing the leak, it assumesFalse(WINDOWS) !!!!!
I do not know how this snuck past me, but I need this fixed to remove setAccessible.
- promptly push indexing buffer changes to IndexWriter, instead of waiting for next refresh/flush
- don't wait for merges to finish before dropping a shards's indexing buffer to 512 KB
- fix NPE if indices.memory.index_buffer_size is in node's settings with a bytes (not %) unit
- add some more logger.debug
During the second phase of recovery, replayed transaction log entries may need to wait on mapping changes that have not yet propagated to the target node. Currently we correctly replay the operation at a later stage, but we acknowledge the replay request before actually performing the work.
Example failure: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/859/Closes#13535
If the machine is very slow this test fails if the delta of the unallocaiton
timestamp and the last scheduled delay is greater than the scheduled delay time.
Today this is really horrible, and we have a PR sent to fix it, but nobody
does anything: https://github.com/aws/aws-sdk-java/pull/432
With java 9, we cannot even grant the permission, this kind of sheistiness is not allowed,
and s3 repository is completely broken.
The problem is their code is still broken, and won't handle neither SecurityException (our PR)
nor the new InaccessibleObjectException they will get from java 9.
We use a really hacky hack to deliver an exception that their code catches (IllegalAccessException) instead.
This means s3 repository is working on java 9, and we close off access to sun.security.ssl completely
Don't worry, I will fix the rest. But some of those remaining will need a lucene upgrade,
we need to add a getter or two for tests to do things cleanly.
In addition to being a big security problem, setAccessible is a risk
for java 9 migration. We need to clean up our code so we can ban it
and eventually enforce this with security manager for third-party code, too,
or we may have problems.
Instead of using setAccessible, use the correct modifier (e.g. public).
TODO: ban in tests
TODO: ban in security manager at runtime
This commit removes and now forbids all uses of
com.google.common.collect.ImmutableSortedMap across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
Instead of asking blob store to create output for posting blob content, this change provides that content of the blob to the blob store for writing. This will significantly simplify the interface for S3 and Azure plugins.
This commit replaces:
* com.google.common.util.concurrent.ListenableFuture
* com.google.common.util.concurrent.SettableFuture
* com.google.common.util.concurrent.Futures
* com.google.common.util.concurrent.MoreExecutors
And forbits its usage via forbidden APIs. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates to #13224
We have a gazillion ways to specify the source of the search request.
There is no need to so many we can reduce them dramatically and also remove
some deprecated API.
The public org.elasticsearch.common.settings.Settings#getAsMap method
leaks the dependency on Guava by returning a
com.google.common.collect.ImmutableMap. The leaking of this dependency
should be removed in preparation for the eventual complete removal of
Guava as a dependency.
Relates #13224