We're concerned that unmodifiableMap uses significantly more memory than
ImmutableMap did - especially in cluster state - so we ban it there outright
and move to ImmutableOpenMap.
Removes ClusterState$Builder#routingTable(RoutingTable$Builder) because that
method had the side effect of building the routing table which can only be
done once per RoutingTable$Builder now that it uses ImmutableOpenMap.
Closes#13880
Squashed commit of the following:
commit 316a328e5032e580ba840db993d907631334aac0
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:57:47 2015 -0400
windows is terrible
commit 0406b560c58bf833f8d77af9c7cf3386771dd9c5
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:43:09 2015 -0400
Nuke ES_CLASSPATH appending
Out of box, ES expects its stuff to be in particular places. We should not be appending to ES_CLASSPATH, allowing users to specify stuff there, like we do in elasticsearch.bin.sh
If the user sets it, its not going to work out of box.
Closes#13812
commit 415d8972df28eddec322bb6d70100a1993fa95f6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:26:35 2015 -0400
Fail hard on empty classpath elements.
This can happen easily, if somehow old 1.x shellscripts survive and try to launch 2.x code.
I have the feeling this happens maybe because of packaging upgrades or something.
Either way: we can just fail hard and clear in this situation, rather than the current situation
where CWD might be /, and we might traverse the entire filesystem until we hit an error...
Relates to #13864
When the plugin manager does not find in `plugin-descriptor.properties` the exact same elasticsearch version it was built on
as the current elasticsearch version, it fails with a message like:
```
ERROR: Elasticsearch version [2.0.0-beta1] is too old for plugin [elasticsearch-mapper-attachments]
```
Actually, the message should be:
```
Plugin [elasticsearch-mapper-attachments] is incompatible with Elasticsearch [2.0.0.beta2]. Was designed for version [2.0.0.beta1].
```
The opposite is true. If you try to install a version of a plugin which was built with a newer version of elasticsearch, it will fail the same way:
```
Plugin [elasticsearch-mapper-attachments] is incompatible with Elasticsearch [2.0.0.beta1]. Was designed for version [2.0.0.beta2].
```
This commit adds supports for expiration after writes to Cache. This
enables entries to expire after they were initially placed in the cache
without prolonging their life on retrieval. Replacements are considered
new writes.
This commit removes and now forbids all uses of
com.google.common.cache.Cache, com.google.common.cache.CacheBuilder,
com.google.common.cache.RemovalListener,
com.google.common.cache.RemovalNotification,
com.google.common.cache.Weigher across the codebase. This is a major
step in the eventual removal of Guava as a dependency.
Relates #13224
We try to get the index shard instance again from the index service on a different
threads while that shard might have already been closed or removed which can cause a NPE
instead of another expected expecption.
This commit shuffels and rewrites some code in RecoverySourceHandler to make it
simpler and more unittestable. This commit doesn't change all parts of this class
neither is it fully tested yet. It's an important part of the infrastrucutre so I started
to make it better tested but I don't want to change everything in one go since it makes
review simpler and more detailed. Future commits will continue cleaning up the class and
add more tests.
If a document is indexed into ES with no id, ES will generate one for it. We used to have an optimization for this case where the engine will not try to resolve the ids of these request in the existing index but immediately try to index them. This optimization has proven to be the source of brittle bugs (solved!) and we disabled it in 1.5, preparing for it to be removed if no performance degradation was found. Since we haven't seen any such degradation we can remove it.
Along with the removal of the optmization, we can remove the autogenerate id flag on indexing requests and the can have duplicate flag. The only downside of the removal of the canHaveDuplicate flag is that we can't make sure any more that when we retry an autogenerated id create operation we will ignore any document already exists exception (See #9125 for background and discussion). To work around this, we don't set the operation to CREATE any more when we generate an id, so the resulting request will never fail when it finds an existing doc but do return a version of 2. I think that's acceptable.
Closes#13857
We have two unneded heavy dependencies on IndexShard that are unneeded and only cause
trouble if you try to mock index shard. This commit removes IndexSettingsService as well as
ClusterSerivce from IndexShard to simplify future mocking and construction.
While refactoring has_child and has_parent query we lost an important detail around types. The types that the inner query gets executed against shouldn't be the main types of the search request but the parent or child type set to the parent query. We used to use QueryParseContext#setTypesWithPrevious as part of XContentStructure class which has been deleted, without taking care though of setting the types and restoring them as part of the innerQuery#toQuery call.
Meanwhile also we make sure that the original context types are restored in PercolatorQueriesRegistry
Closes#13863
Closes#13854
Squashed commit of the following:
commit 42c1166efc55adda0d13fed77de583c0973e44b3
Author: Robert Muir <rmuir@apache.org>
Date: Tue Sep 29 11:59:43 2015 -0400
Add paranoia
Groovy holds on to a classloader, so check it before compilation too.
I have not reviewed yet what Rhino is doing, but just be safe.
commit b58668a81428e964dd5ffa712872c0a34897fc91
Author: Robert Muir <rmuir@apache.org>
Date: Tue Sep 29 11:46:06 2015 -0400
Add SpecialPermission to guard exceptions to security policy.
In some cases (e.g. buggy cloud libraries, scripting engines), we must
grant dangerous permissions to contained cases. Those AccessController blocks
are dangerous, since they truncate the stack, and can allow privilege escalation.
This PR adds a simple permission to check before each one, so that unprivileged code
like groovy scripts, can't do anything they shouldn't be allowed to do otherwise.
StoreRecoveryService used to be a pretty heavy class with lots of dependencies.
This class was basically not testable in isolation and had an async API with a listener.
This commit refactors this class to be a simple utility classs with a sync API hidden behind
the IndexShard interface. It includes single node tests and moves all the async properities to
the caller side.
Note, this change also removes the mapping update on master from the store recovery code since
it's not needed anymore in 3.0 because all stores have been subject to sync mapping updates such
that the master already has all the mappings for documents that made it into the transaction log.
Closes#13766
Now that groovy is factored out, we contain this dangerous stuff there.
TODO: look into those test hacks inspecting class protection domains, maybe we can
clean that one up too.
TODO: generalize the GroovyCodeSourcePermission to something all script engines check,
before entering accesscontrollerblocks. this way e.g. groovy script cannot coerce
python engine into creating something with more privs if it gets ahold of it... we
should probably protect the aws/gce hacks in the same way.
Previous to this change if to equal geohash cell query builders were created and then toQuery was called on one, they would no longer be equal.
This change also adds a test to AbstractQueryTestCase to make sure calling toQuery on any query builder does not affect the query builder's equality
* Add OS X support via "seatbelt" mechanism. This gives consistency across dev and prod, since many devs use OS X.
* block execveat system call: it may be new, but we should not allow it.
When a shard becomes in active we trigger a sync flush in order to speed up future recoveries. The sync flush causes a new translog generation to be made, which in turn confuses the IndexingMemoryController making it think that the shard is active. If no documents comes along in the next 5m, the shard is made inactive again , triggering a sync flush and so forth.
To avoid this, the IndexingMemoryController is changed to ignore empty translogs when checking if a shard became active. This comes with the price of potentially missing indexing operations which are followed by a flush. This is acceptable as if no more index operation come in, it's OK to leave the shard in active.
A new unit test is introduced and comparable integration tests are removed.
Closes#13802
Instead just log the same thing we print to the startup console for that case (magic logic),
it sucks to do this, but guice exceptions are too much.
All other non-guice exceptions will still be fully logged.
Closes#13782
This commit adds explicit ids for managing ElasticsearchException
serialization. By adding explicit ids and unit tests for them, the ids
are less brittle and breakage can be more clearly detected.
Today we use reflection where it's not needed anymore since java8 can
pass ctors around. This commit replaces runtime checks with compile time
checks which is always preferrable.
These classes are really duplicates and are just here for historical reasons.
We don't need these anymore since the same classes exist in lucene today.
This also removes the guice injection for DeletionPolicy and make them shard private.
TermsLookupQueryBuilder was left around only for bw comp reasons, but TermsQueryBuilder is its replacement. We can remove it now that it is clear query refactoring goes in master (3.0).
There is no need to have term vectors service on the shard level where it's
created for every shard. This commit moves it to a higher level which makes
shard creation slightly simpler and reduces the number of long living objects.
Before the refactoring we didn't check any invalid settings for strategy and relation
in the GeoShapeQueryBuilder. However, using SpatialStrategy.TERM and ShapeRelation.INTERSECTS
together is invalid and we tried to protect against that in the validate() method.
This PR moves these checks to setter for strategy and relation and adds tests for the new
behaviour.
Relates to #10217
Block execve(), fork(), and vfork() system calls, returning EACCES instead,
on kernels that support seccomp-bpf: either via seccomp() or falling back
to prctl().
Only linux/amd64 is supported. This feature can be disabled (in case
of problems) with bootstrap.seccomp=false.
Closes#13753
Squashed commit of the following:
commit 92cee05c72b49e532d41be7b16709e1c9f919fa9
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 10:12:51 2015 -0400
Add a note about why we don't parse uname() or anything
commit b427971f45cbda4d0b964ddc4a55fae638880335
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 09:44:31 2015 -0400
style only: we already pull errno into a local, use it for catch-all case
commit ddf93305525ed1546baf91f7902148a8f5b1ad06
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 08:36:01 2015 -0400
add TODO
commit f29d1b7b809a9d4c1fcf15f6064e43f7d1b24696
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 08:33:28 2015 -0400
Add full stacktrace at debug level always
commit a3c991ff8b0b16dc5e128af8fb3dfa6346c6d6f1
Author: Robert Muir <rmuir@apache.org>
Date: Thu Sep 24 00:08:19 2015 -0400
Add missing check just in case.
commit 628ed9c77603699aa9c67890fe7632b0e662a911
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 22:47:16 2015 -0400
Add public getter, for stats or whatever if they need to know this
commit 3e2265b5f89d42043d9a07d4525ce42e2cb1c727
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 22:43:06 2015 -0400
Enable use of seccomp(2) on Linux 3.17+ which provides more protection.
Add nice errors.
Add all kinds of checks and paranoia.
Add documentation.
Add boolean switch.
commit 0e421f7fa2d5236c8fa2cd073bcb616f5bcd2d23
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 21:36:32 2015 -0400
Add defensive checks and nice error messages
commit 6231c3b7c96a81af8460cde30135e077f60a3f39
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 20:52:40 2015 -0400
clean up JNA and BPF. block fork and vfork too.
commit bb31e8a6ef03ceeb1d5137c84d50378c270af85a
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 19:00:32 2015 -0400
order is LE already for the JNA buffer, but be explicit about it
commit 10456d2f08f12ddc3d60989acb86b37be6a4b12b
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 23 17:47:07 2015 -0400
block process execution with seccomp on linux/amd64
Some methods had default implementation while queries were going to be refactored, now that they are all refactored all those methods can be made abstract.
Remove ParseFieldMatcher duplication query in both contexts. QueryParseContext is still contained in QueryShardContext, as parsing still happens in the shards here and there. Most of the norelease comments have been removed simply because the scope of the refactoring has become smaller. Some could only be removed once everything, the whole search request, gets parsed on the coordinating node. We will get there eventually.
SimpleIndexQueryParserTests was the main responsible: deleted lots of duplicated tests, moved the ones that made sense to keep to their corresponding unit tests (note they were ESSingleNode tests before while are now converted to unit tests).
Closes#13750
After all queries now have a `toQuery` method and the parsers all
support `fromXContent` it is possible to remove the following
workarounds and deprecated methods we kept around while doing the
refactoring:
* remove the BaseQueryParser and BaseQueryParserTemp. All parsers
implement QueryParser directly now
* remove deprecated methods in QueryParseContext that either returned
a Query or a Filter.
* remove the temporary QueryWrapperQueryBuilder
Relates to #10217
The IndexingMemoryController checks periodically if there is any indexing activity on the shard. If no activity is sean for 5m (default) the shard is marked as inactive allowing it's indexing buffer quota to given to other active shards.
Sadly the current check is bad as it checks for 0 translog operation. This makes the inactive wait for a flush to happen - which used to take 30m and since #13707 doesn't happen at all (as we rely on the synced flush triggered by inactivity). This commit fixes the check so it will work with any translog size.
Closes#13759
This commit fixes ping timeout settings inconsistencies in
ZenDiscovery. In particular, the documentation refers to the ping
timeout setting as discovery.zen.ping_timeout but the code was
ultimately using discovery.zen.ping.timeout if this was set.
This commit also changes all instances of the raw string
“discovery.zen.ping_timeout” to the constant
o.e.d.z.ZenDiscovery.SETTING_PING_TIMEOUT.
Finally, this commit removes the legacy setting
"discovery.zen.initial_ping_timeout".
Closes#6579, #9581, #9908
The current MoreLikeThisQueryBuilder validation checks for existence of at
least one `like` text or item. This is hard to check in setters, so this PR
tries to change the construction of the query so that we can do these checks
already at construction time.
Changing to using arrays for fieldnames, likeTexts, likeItems, unlikeTexts
and unlikeItems. `likeTexts` and/or `likeItems` need to be specified at
construction time to validate we have at least one item there.
Relates to #10217
Banning `ImmutableSet` outright is too much to do all at once - this starts
the process by banning `ImmutableMap#entrySet` - one of the more common ways
that `ImmutableSet`s come up. It then starts to remove calls to
`ImmutableMap#entrySet` by changing declarations from `ImmutableMap` to `Map`.
Unfortunately this process is like pulling on a long, windy string and one
declaration change requires another which requires 5 more which in turn
require another few. So this change is rather large.
As such, to keep the changes manageable they only remove `ImmutableMap` from
the signatures that are needed for `entrySet` and make little effort to stop
using `ImmutableMap` internally. Removing the usages of `ImmutableMap`
complicates immutability guarantees and will be done separately.
In #12942, the NettyTransport and NettyHttpServerTransport were updated to allow for binding
to multiple addresses. However, the BoundTransportAddress holder only exposed the first address
that the transport was bound to and this object is used to populate the values returned to the user
via our APIs.
This change exposes all of the bound addresses in the BoundTransportAddress holder, which allows
for an accurate representation of all interfaces that elasticsearch is bound to and listening on.
Not the neatest implementation in the world. Maybe we should consider changing the builders so it is a single builder for sort and a single builder for rescore instead of a list of builders for each?
This commit addresses a confusing error message that arises when a
property parameter (e.g. -D) is after a double-dash parameter. The
current error message reports to the user that the parameter does not
start with “--". Adding the second dash as the error message suggests
causes the parameter to be silently ignored. This is confusing for the
user. With this commit, the user is now informed that the parameter
order is violated.
Relates e27ede48ce
These exceptions are useless and unused, since we are on a major verison we should remove
them. This commit also makes it easier to remove excepitons in the future.
In the past ClusterStateUpdateTask was an interface and we had various derived marker interfaces to control behavior. Since then we moved ClusterStateUpdateTask to be an abstract class but we kept the old hierarchy of implementations. All of those (but the AckedClusterStateUpdateTask) can be folded into ClusterStateUpdateTask, adding correct default behavior.
Closes#13735
This commit moves the size and ops based flush into a synchronous API into
IndexShard and removes the time-based flush alltogether since it' basically
covered by the inactive async flush API we have today. The functionality doesn't
need to be covered by scheduled task and async APIs while we can actually make all
the decisions in a sync manner which is way easier to control and to test.
Closes#13707
This commit removes all the opaque bytes for extra_source and template_source.
Instead source and extra_source etc. are represented as SearchSourceBuilder which can
in-place be modified and is updated with the content of the request parameters.
Template Source is parsed and evaluated which in-turn replaces the actual source.
Refactor the function_score query so it can be parsed on the coordinating node, split parse into fromXContent and toQuery, make FunctionScoreQueryBuilder Writeable.
Closes#13653
Given that we are moving to parsing queries on the coordinating node, the index name is not relevant anymore in QueryParseContext, as the parsing phase cannot be related to any specific index. On the contrary, the QueryShardContext is the one that holds mappings etc. and the index name too, as the lucene query creation happens on the data node and can still be related with the index that it happens against.
Changes are mainly around tests that were expecting the index name, moved to using QueryShardException in some of them, removed the index name elsewhere.
Closes#13631
Moving validation from validate() to constructors and setters for the
following query builders:
* GeoDistanceQueryBuilder
* GeoDistanceRangeQueryBuilder
* GeoPolygonQueryBuilder
* GeoShapeQueryBuilder
* GeohashCellQuery
* TermsQueryBuilder
Relates to #10217
This parser prototype allows to decleratively define parsers for XContent
instead of writing messy and error prone while loops. It encapsulates all the error handling logic
and only even tries to parse if the token types match the declaration.
I want to refactor scripting engines so we can contain dangerous "God-like" permissions
like createClassloader/sun.reflect. These are used for dynamic class generation (scripts, mocks).
This will mean some refactoring to ES core.
But first lets get the plugins in order first. I removed those permissions globally, and
fixed grants for lang-javascript, lang-python, securemock so that everything works.
lang-javascript needs no code changes, because rhino is properly written :)
lang-python needs accesscontroller blocks. securemock was already working as of 1.1
This is just a baby step, to try to do some of this incrementally! It doesn't yet provide
us anything.
Currently the tribe node version always stays 0, which can cause issues for the services that rely on cluster state version. For example, ClusterStateObserver doesn't revalidate the cluster state after change, which leads to cluster health check with wait flags to take much longer then actually needed.
Until now we had a cloud-azure plugin which is providing 3 distinct features:
* discovery on Azure
* snapshot/restore on Aure
* SMB store
This commit splits the plugin by feature so people can use either one or the other or both features.
Doc is updated accordingly.
This add equals, hashcode, read/write methods, validation, separates toQuery
and JSON parsing and adds serialization and query generation tests.
Deprecates two types of initializing the bounding box: In our documentation we
speak about specifying top/left and bottom/right corner of a bounding box. Here
we also allow for top/right and bottom/left. This adds not only to the amount
of code but also testing needed w/o too much benefit for the user other than
more chances to confuse top/right/bottom/left/latitude/longitude IMHO.
Missing: The toQuery method with type set to "indexed" is not tested at the
moment.
Cleanup changes unrelated to base refactoring:
* Switched from type String to enum for types in GeoBoundingBoxQueryBuilder.
* Switched to using type GeoPoint for storing the bounding box coordinates
instead of array of double values.
Relates to #10217 for the query refactoring part.
Relates to #12016 for how missing mappings are handled.
Adds a utility class for generating random geo data.
Adds some missing documentation.
Extend test to MEMORY type config
Fix final review comments and rebase
We moved a lot of repositories into elasticsearch, but in their new
location they retained their LICENSE.txt and NOTICE.txt files. These are
all the same, and having the license and notice and the root of the
repository should be sufficient.
This commit removes and now forbids all uses of
com.google.common.primitives.Ints across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
graduate this from a hack for insecure plugins to something we can
live with for per-module/plugin permissions, it now works reasonably
in unit tests and with Intellij and Eclipse IDEs.
remove security warnings: we will deal with these issues in a secure
way, if we cannot, then the plugin shouldn't be in our core codebase.
This PR is the second batch in moving the query validation we started
to collect in the validate() method to the corresponding setters
and constructors.
This is the more sheisty business along the same lines as
https://github.com/elastic/elasticsearch/pull/13638
1 hour total adding the real functionality, days of wasted time
on simulated fake functionality to satisfy our crazy test framework...
I debugged on the problematic jenkins machine and I think issues are
from parsing the classpath and URL normalization etc (trailing slashes
vs not, etc in URLs). So I simplifed the code, to remove this completely,
inverting the logic so we just use an exclusion list instead of inclusion one.
I also allow tests for these plugins to run from the IDE (works at least for eclipse) too.
At least for eclipse this is even less realistic as it piles all the code (src and test)
into a single codebase, but it means you can *use it* and you just have to run mvn verify
before pushing as always. And as always... best effort.
A JTS bug causes a misinterpretation of polygon coordinates leading to an unhelpful "geom" AssertionError. While this assertion occurs approx 0.02% of the time it can lead to a misleading test failure. This patch catches the geom assertion and retries randomShapeCreation. For safety a threshold is set to prevent unlimited retrying - though 1 retry is typically sufficient for correcting the invalid shape.
closes#13551
A JTS bug causes a misinterpretation of polygon coordinates leading to an unhelpful "geom" AssertionError. While this assertion occurs approx 0.02% of the time it can lead to a misleading test failure. This patch catches the geom assertion and retries randomShapeCreation. For safety a threshold is set to prevent unlimited retrying - though 1 retry is typically sufficient for correcting the invalid shape.
closes#13551
We don't have a plugin .zip for unit tests, so we can't do it
correctly. But we can approximate it better, so that if code
is simply missing an AccessController block at least tests will fail.
Classnames change quickly due to refactorings etc. If that happens in a minor release
we loose the ability to deserialize the exceptoin coming from another node sicne we today
look it up by classname. This change uses a dedicated static id instead of the classname
to lookup the actual class.
Especially the worst of the worst with thread permissions: for example,
this prevents some code from starting daemon thread that will outlive
the elasticsearch process and hang around doing evil shit.
This commit removes unnecesssary use of ExceptionHelpers where we actually
should serialize / deserialize the actual exception. This commit also
fixes one of the oddest problems where the actual exception was never
rendered / printed if `all shards failed` due to a missing cause.
This commit unfortunately doesn't fix Snapshot/Restore which is almost
unfixable since it has to serialize XContent and read from it which can't
transport exceptions.
Weighted centroid, morton hash, and geohash can be imprecise (computation error) to 1e-5. The previous compareTo set this tolerance too strict (1e-6) causing a reproducible comparison error on weighted centroid (#13558). This change relaxes the tolerance to the acceptable computation error of 1e-5
closes#13558
Improve IndexingMemoryController a bit:
- promptly push indexing buffer changes to IndexWriter, instead of waiting for next refresh/flush
- don't wait for merges to finish before dropping a shards's indexing buffer to 512 KB once it's inactive
- fix NPE if indices.memory.index_buffer_size is in node's settings with a bytes (not %) unit
- add some more logger.debug
This commit removes and now forbids all uses of
com.google.common.base.Joiner across the codebase. This is one of many
steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.math.LongMath across the codebase. This is one step
of many in the eventual removal of Guava as a dependency.
This commit removes and now forbids all uses of
com.google.common.collect.Iterables across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.base.Preconditions across the codebase. This is one
of many steps in the eventual removal of Guava as a dependency.
Relates #13224
https://github.com/elastic/elasticsearch/pull/12908
Moved output to verbose level and additionally changed the plugin info output format
Before:
```shell
PluginInfo{name='cloud-aws', description='The Amazon Web Service (AWS) Cloud plugin allows to use AWS API for the unicast discovery mechanism and add S3 repositories.', site=false, jvm=true, classname=org.elasticsearch.plugin.cloud.aws.CloudAwsPlugin, isolated=true, version='2.1.0-SNAPSHOT'}
```
After:
```shell
- Plugin information:
Name: cloud-aws
Description: The Amazon Web Service (AWS) Cloud plugin allows to use AWS API for the unicast discovery mechanism and add S3 repositories.
Site: false
Version: 2.1.0-SNAPSHOT
JVM: true
* Classname: org.elasticsearch.plugin.cloud.aws.CloudAwsPlugin
* Isolated: true
```
Fixes#12907
If we don't wait we might retrieve the cluster state before second node
was added when we try to add the delegate in which the discovery node for
node_2 is null.
When a single node starts up it will first elect itself as master and then tries to
recover the cluster state or, if there is none, initialize an empty one and publish it.
Until it has done that, the cluster state will contain a global block and
requests might fail with
SERVICE_UNAVAILABLE/1/state not recovered / initialized
We need to wait for green.
This commit fixes a test bug in
o.e.a.s.b.n.TransportBroadcastByNodeActionTests. Namely, the randomized
test allowed for the creation of cluster states that allocated indices
having zero shards. This ultimately surfaced in a
NoSuchElementException when attempting to iterate over the nonexistent
shards. The fix is merely to draw the random number of shards from 1 to
10 instead of 0 to 10.
This commit replaces the usage of LoadedCache with a simple CHM and calls
to computeIfAbsent and adds LoadingCache and CacheLoader to forbidden APIs
Relates to #13224
* Added BWC indices
* Added snapshot version to Version.java
* Fixed create_bwc_index to use localhost instead of localhost and 127.0.0.1 (problem with ipv4/6 setup)
This commit removes and now forbids all uses of
Function, Charsets, Collections2 across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
When publishing a new cluster state, the master will send it to all the node of the cluster, noting down how many *master* nodes responded successfully. The nodes do not yet process the new cluster state, but rather park it in memory. As soon as at least minimum master nodes have ack-ed the cluster state change, it is committed and a commit request is sent to all the node that responded so far (and will respond in the future). Once receiving the commit requests the nodes continue to process the cluster state change as they did before this change.
A few notable comments:
1. For this change to have effect, min master nodes must be configured.
2. All basic cluster state validation is done in the first phase of publish and is thus now part of `ShardOperationResult`
3. A new `COMMIT_TIMEOUT` settings is introduced, dictating how long a master should wait for nodes to ack the first phase. Unlike `PUBLISH_TIMEOUT`, if waiting for a commit times out, the cluster state change will be rejected.
4. Failing to achieve a min master node of acks, will cause the master to step down as it clearly doesn't have enough active followers.
5. Previously there was a short window between the moment a master lost it's followers and it stepping down because of node fault detection failures. In this short window, the master could process any change (but fail to publish it). This PR closes this gap to 0.
6. A dedicated pending cluster states queue was added to keep pending non-comitted cluster states and manage the logic around processing committed cluster states. See #13303 for details.
Closes#13062 , Closes#13303
This commit allows to refresh the info service in a blocking fashion
which allows tests to prevent installing listeners alltogether and
makes the class easier to test. Installing a listnener is always subject
to concurrent modifications where the listener might be called mulitple times
or with stale information which causes tests to fail.
Java 8 allows for method references which in-turn will cause
compile errors if a method is not visible while reflection fails late
and maybe too late. We can now register Request instances via FooRequest::new
instead of passing FooRequest.class and call it's ctor via reflection.
This test sometimes fails because the first node is elected as master and waits 30s for incoming joins but in the meanwhile the 3 other nodes form a cluster on their side. The index will be created and its shards allocated on these 3 nodes, then the test checks for the number of shards on each node (it should be 2 or 3) but because the first node has not fully join the cluster yet one node will have 5 shards.
closes#13305
1. FileSystem wrapping code is broken, thats why you get providermismatch exception!
Instead of fixing this, it SuppressesForbidden!!!!
2. Because it uses SuppressForbidden on the test, the whole thing is lenient, it uses java.io.File for example!
3. Of course it fails consistently on windows because it can't remove files, because it leaks file handles (locks)
like a sieve since it does not close node environment. With correct wrapping this is always detected by e.g.
our leak detection FS. Instead of fixing the leak, it assumesFalse(WINDOWS) !!!!!
I do not know how this snuck past me, but I need this fixed to remove setAccessible.
- promptly push indexing buffer changes to IndexWriter, instead of waiting for next refresh/flush
- don't wait for merges to finish before dropping a shards's indexing buffer to 512 KB
- fix NPE if indices.memory.index_buffer_size is in node's settings with a bytes (not %) unit
- add some more logger.debug
During the second phase of recovery, replayed transaction log entries may need to wait on mapping changes that have not yet propagated to the target node. Currently we correctly replay the operation at a later stage, but we acknowledge the replay request before actually performing the work.
Example failure: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/859/Closes#13535
If the machine is very slow this test fails if the delta of the unallocaiton
timestamp and the last scheduled delay is greater than the scheduled delay time.
Today this is really horrible, and we have a PR sent to fix it, but nobody
does anything: https://github.com/aws/aws-sdk-java/pull/432
With java 9, we cannot even grant the permission, this kind of sheistiness is not allowed,
and s3 repository is completely broken.
The problem is their code is still broken, and won't handle neither SecurityException (our PR)
nor the new InaccessibleObjectException they will get from java 9.
We use a really hacky hack to deliver an exception that their code catches (IllegalAccessException) instead.
This means s3 repository is working on java 9, and we close off access to sun.security.ssl completely
Don't worry, I will fix the rest. But some of those remaining will need a lucene upgrade,
we need to add a getter or two for tests to do things cleanly.
In addition to being a big security problem, setAccessible is a risk
for java 9 migration. We need to clean up our code so we can ban it
and eventually enforce this with security manager for third-party code, too,
or we may have problems.
Instead of using setAccessible, use the correct modifier (e.g. public).
TODO: ban in tests
TODO: ban in security manager at runtime
This commit removes and now forbids all uses of
com.google.common.collect.ImmutableSortedMap across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
Instead of asking blob store to create output for posting blob content, this change provides that content of the blob to the blob store for writing. This will significantly simplify the interface for S3 and Azure plugins.
This commit replaces:
* com.google.common.util.concurrent.ListenableFuture
* com.google.common.util.concurrent.SettableFuture
* com.google.common.util.concurrent.Futures
* com.google.common.util.concurrent.MoreExecutors
And forbits its usage via forbidden APIs. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates to #13224
We have a gazillion ways to specify the source of the search request.
There is no need to so many we can reduce them dramatically and also remove
some deprecated API.
The public org.elasticsearch.common.settings.Settings#getAsMap method
leaks the dependency on Guava by returning a
com.google.common.collect.ImmutableMap. The leaking of this dependency
should be removed in preparation for the eventual complete removal of
Guava as a dependency.
Relates #13224
The filter element has been deprecated in the function_score query parser. Whenever a filter is found it gets wrapped into a query automatically. The filter in the java api builder is always null, there is no way to set its value, just a leftover.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
PR goes against the query-refactoring branch
The initial implementation of two phase commit based cluster state publishing (#13062) relied on a single in memory "pending" cluster state that is only processed by ZenDiscovery once committed by the master. While this is fine on it's own, it resulted in an issue with acknowledged APIs, such as the open index API, in the extreme case where a node falls behind and receives a commit message after a new cluster state has been published. Specifically:
1) Master receives and acked-API call and publishes cluster state CS1
2) Master waits for a min-master nodes to receives CS1 and commits it.
3) All nodes that have responded to CS1 are sent a commit message, however, node N didn't respond yet
4) Master waits for publish timeout (defaults to 30s) for all nodes to process the commit. Node N fails to do so.
5) Master publishes a cluster state CS2. Node N responds to cluster state CS1's publishing but receives cluster state CS2 before the commit for CS1 arrives.
6) The commit message for cluster CS1 is processed on node N, but fails because CS2 is pending. This caused the acked API in step 1 to return (but CS2 , is not yet processed).
In this case, the action indicated by CS1 is not yet executed on node N and therefore the acked API calls return pre-maturely. Note that once CS2 is processed but the change in CS1 takes effect (cluster state operations are safe to batch and we do so all the time).
An example failure can be found on: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/314/
This commit extracts the already existing pending cluster state queue (processNewClusterStates) from ZenDiscovery into it's own class, which serves as a temporary container for in-flight cluster states. Once committed the cluster states are transferred to ZenDiscovery as they used to before. This allows "lagging" cluster states to still be successfully committed and processed (and likely to be ignored as a newer cluster state has already been processed).
As a side effect, all batching logic is now extracted from ZenDiscovery and is unit tested.
This commit removes all the optional injects etc. for the FetchServices and
provides a Client via IndexQueryParserService / Context. This allows direct
injection instead of optional injection. It also allows to remove all the
unnecessary services and use the fetch code where it belongs.
This commit also adds testing infrastructure for intercepting client calls
to AbstractQueryTestCase to support GET calls in query tests.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
PR goes against the query-refactoring branch
This commit removes and now forbids all uses of
com.google.common.collect.Queues across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.collect.ImmutableSortedSet across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
The MockInternalClusterInfoService depends on a constant and helper
method from actual test cases. This moves the constant and helper method
into the mock itself. Without this change, the classes put into the test
jar are not completely useable on their own (since this mock is in the
test jar, but the test cases are not).
This commit removes and now forbids all uses of
com.google.common.base.Preconditions#checkNotNull across the codebase.
This is one of many steps in the eventual removal of Guava as a
dependency.
Relates #13224
The semantics of the `boost` parameter for `function_score` changed. This is
due to the fact that Lucene now requires that query boosts and top-level boosts
are applied the same way.
Requesting a million hits, or page 100,000 is always a bad idea, but users
may not be aware of this. This adds a per-index limit on the maximum size +
from that can be requested which defaults to 10,000.
This should not interfere with deep-scrolling.
Closes#9311
This commit removes and now forbids all uses of
com.google.common.collect.Sets across the codebase. This is one of many
steps in the eventual removal of Guava as a dependency.
Relates #13224
* Dropped ScoreType in favour of Lucene's ScoreMode
* Removed `score_type` option from `has_child` and `has_parent` queries in favour for the already existing `score_mode` option.
* Removed the score mode `sum` in favour for the already existing `total` score mode. (`sum` doesn't exist in Lucene's ScoreMode class)
* If `max_children` is set to `0` it now really means that zero children are allowed to match.
Adds a node attribute to all test runs and uses the attribute to test
`_cat/nodeattrs`.
Note that its quite possible create an impressively slow regex while doing
this and you have to be careful. See comment in commit for more if curious.
Closes#12558
Previously PipelineAggregatorFactory's at the root to the agg tree (top-level aggs) were not validated. This commit adds a call to PipelineAggregatoFactory.validate() for that case.
Closes#13179
This add equals, hashcode, read/write methods, separates toQuery and JSON parsing and adds tests.
Also moving MatchQueryBuilder.Type to MatchQuery to MatchQuery, adding serialization and hashcode,
equals there.
Relates to #10217
This commit contains:
* renaming BaseQueryTestCase to AbstractQueryTestCase
* uses always STRICT parsing when parsing from builders XContent
* ensures that SearchContext is only but always available during toQuery but never during parse phase
* adds a way to override the default ParseFieldMatcher to allow queries to set deprecated API by default
There are a few tests that currently use the statically generated
backcompat indexes. This change moves them to a shared location, so they
no longer have to build a path based on the package name of the old
index tests.
This commit removes and now forbids all uses of
com.google.common.collect.Maps across the codebase. This is one of many
steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit addresses several bugs that prevented the Windows
service from being started or stopped:
- Extra white space in the concatenation of java options in
elasticsearch.in.bat which tripped up Apache Commons Daemon
and caused ES to startup without any params, eventually leading
to the "path.home is not configured" exception.
- service.bat was not passing the start argument to ES
- The service could not be stopped gracefully via the stop command
because there wasn't a method for procrun to call.
Closes#13247Closes#13401
Allocation filtering by IP only works today using the node host address. But in some cases, you might want to filter using the publish address which could be different.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Also this PR removes the check that the index is created before 2.0 for the normalize parameter. The parameter is now always parsed but as a deprecated parameter. We cannot and should not access the index version during parsing.
Relates to #10217
PR goes against the query-refactoring branch
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
PR goes against the query-refactoring branch
This commit splits HasParentQueryParser into toQuery and fromXContent.
This change also deprecates several keys in favor of simplified settings
and adds basic unittests for HasParentQueryParser.
Relates to #10217
Previously the parser could take any Term Vectors request, but this would be
not the case of the builder which would still use MultiGetRequest.Item. This
introduces a new Item class which is used by both the builder and parser.
Beyond that the rest is mostly cleanups such as:
1) Deprecating the ignoreLike methods, in favor to using unlike.
2) Deprecating and renaming MoreLikeThisBuilder#addItem to addLikeItem.
3) Ordering the methods of MoreLikeThisBuilder more logically.
This change is needed for the upcoming query refactoring of MLT.
Closes#13372
1) A shared immutable fieldtype for the _parent field (used for direct access to that field in the dsl). This field type is stored and indexed.
2) A per type field type for the child join field. The field type has doc values enabled if index is created on or post 2.0 and field data type is allowed to be changed.
3) A per type field type for the parent join field. The field type has doc values enabled if index is created on or post 2.0.
This resolves the issue that a mapping is not compatible if parent and child types have different field data loading settings.
Closes#13169
When we commit the translog, documents that were in it before cannot be retrieved from
it anymore via get and have to be retrieved from the index instead. But they will only
be visible if between index and get a refresh is called. Therfore we have to call
first refresh and then translog.commit() because otherwise there is a small gap
in which we cannot read from the translog anymore but also not from the index.
closes#13379
We have a handful of compiler warnings, mostly because of passing an
array to varargs methods. This change fixes these warnings and adds
-Werror so we don't get anymore of these warnings.
Note this does *not* enable deprecation or unchecked type warnings, so
these remain "hidden". We should work towards removing those as well,
but this is a first step.
This commit removes and now forbids all uses of
com.google.common.base.Throwables across the codebase.
For uses of com.google.common.base.Throwables#getStackTraceAsString,
use org.elasticsearch.ExceptionsHelper#stackTrace.
Relates #13224
This is an intial commit that splits HasChildQueryParser / Builder into
the two seperate steps. This one is particularly nasty since it transports
a pretty wild InnerHits object that needs heavy refactoring. Yet, this commit
has still some nocommits and needs more tests and maybe another cleanup but
it's a start to get the code out there.
Whe we call optimize we ignore Exceptions that indicate a closed shard.
However, when a shard is closed while an optimize request is in flight it
might also trigger an AlreadyClosedException from the IndexWriter when we
get the config or ForceMergeFailedEngineException with the EngineClosedException
wrapped inside. Because these are not identified as exceptions that indicate
a closed shard (TransportActions.isShardNotAvailableException(..)) optimize
would sometimes report failures when shards were relocating while optimize was called
and sometimes not. This caused weird test failures, see #13266 .
Instead, we should let EngineClosedException bubble up and also recognize
AlreadyClosedException as an indicator for a closed shard.
Today we try to allocate primaries first and then replicas
but don't take the index creation date and priority into account
as we do in the GatewayAlloactor.
Closes#13249
Adds a listeners to each of the caches that allows us to remove the dependency on IndexService which is cyclic since
the IndexService depends on both of these caches. This cyclic dependency makes
testing the indiviual parts very hard and is only added for the sake of
incrementing some stats.
Transport clients run embedded within external applications, so
elasticsearch should not be doing anything with the filesystem, as there
is not elasticsearch home.
This change makes a number of cleanups to the internal API for loading
settings and creating an environment. The loadFromConfig option was
removed, since it was always true except for tests. We now always
attempt to load settings from config a file when an environment is
created. The prepare methods were also simplified so there is now
prepareSettingsAndEnvironment which nodes use, and prepareSettings which
the transport client uses. I also attempted to improve the tests, but
there is a still a lot of follow up work to do there.
closes#13155
This commit removes and now forbids all uses of
com.google.common.base.Strings across the codebase.
For uses of com.google.common.base.Strings.isNullOrEmpty, use
org.elasticsearch.common.Strings.isNullOrEmpty.
For uses of com.google.common.base.Strings.padStart use
org.elasticsearch.common.Strings.padStart.
For uses of com.google.common.base.Strings.nullToEmpty use
org.elasticsearch.common.Strings.coalesceToEmpty.
Relates #13224
Before #13068 refresh and flush ignored all exceptions that matched
TransportActions.isShardNotAvailableException(e) and this should not change.
In addition, refresh and flush which are based on broadcast replication
might now get UnavailableShardsException from TransportReplicationAction if a shard
is unavailable and this is not caught by TransportActions.isShardNotAvailableException(e).
This must be ignored as well.
This commit removes and now forbids all uses of
com.google.common.base.Predicate and com.google.common.base.Predicates
across the codebase. This is one of the many steps in the eventual
removal of Guava as a dependency. This was enabled by #13314.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.base.Objects across the codebase. This is a small
step in the eventual removal of Guava as a dependency.
Relates #13224
Previously we skip deleting the index store for indices on a shared
filesystem, because we don't want to delete the data when the shard is
relocating around the cluster. This adds a flag to the
`deleteIndexStore` method signifying that the index is closed and that
we should allow deleting the contents even if it is on a shared
filesystem.
Includes a unit test for the IndicesService.canDeleteIndexContents and
integration tests ensure a closed shadow replica index deletes files
correctly.
Resolves#13297
As a refinement to Project Coin (JEP-213, JDK-8042880), Java 9 is going
to disallow the use of ‘_’ as a one-character identifier. This will be
done by adding ‘_’ as a keyword to the Java language (JDK-8065599).
Currently, uses of ‘_’ as a one-character identifier are warnings in
the Java 8 compiler. This commit removes all uses of ‘_’ as a
one-character identifier from the codebase.
SpanContainingQueryParser and SpanWithinQueryParser always set the boost to the parsed lucene query, even if it is the default one. The default boost of the main query though is the boost coming from the inner little query, value that we end up overriding all the time. We should instead set the boost to the main query only if it differs from the default, to mimic lucene's behaviour.
Relates to #13272Closes#13339
SimpleQueryStringParser applies whatever boost the query holds, even if the default 1, to the query obtained from parsing of the query string. that might contain its boost, for instance if it resolved to a simple query like term (single term query against a single field). We should rather multiply the existing boost with the boost set to the query, same as we do in query_string
Relates to #13272Closes#13331
doParse() was supposed to allow aggs to perform extra parsing. Unfortunately, this forced the
parser to carry instance-level state, which would carry-over and "corrupt" any other aggs of the
same type in the same query.
Instead, we are now collecting all unknown params and pasing them as a Map<String, Object>
to buildFactory(). The agg may then parse them and instantiate a factory. Each param the
agg uses, it should unset from the unusedParams object.
After building the factory, the parser verifies that unusedParams is empty. If it is not empty,
an exception is raised so the user knows they provided unknown params.
Fixes#13337
This changes construction of Phrase and Boolean queries to use the builder,
and replaces BitDocIdSetFilter with BitSetProducer for nested and parent/child
queries. I had to remove the ParentIdsFilter for the case when there was a
single parent as it was using the source of BitSets for parents as a regular
Filter, which is not possible anymore now. I don't think this is an issue since
this case rarely occurs, and the alternative logic for when there are several
matching parent ids should not be much worse.
This pipeline will calculate percentiles over a set of sibling buckets. This is an exact
implementation, meaning it needs to cache a copy of the series in memory and sort it to determine
the percentiles.
This comes with a few limitations: to prevent serializing data around, only the requested percentiles
are calculated (unlike the TDigest version, which allows the java API to ask for any percentile).
It also needs to store the data in-memory, resulting in some overhead if the requested series is
very large.
This commit moves ignore_malformed and coerce options from the GeoPointFieldType to the Builder in GeoPointFieldMapper. This makes these options consistent with other types in 2.0.
This was supposed to just help the user, in case they misconfigured something.
Broadcast is an ipv4 only thing, the only way you can really detect its a broadcast
address, is to look and see if an interface has that address as its broadcast address.
But we cannot trust that container interfaces won't have a crazy setup...
Closes#13327
The match_phrase_prefix query properly parses the boost etc. but it loses it in its rewrite method. Fixed that by setting the orginal boost to the rewritten query before returning it. Also cleaned up some warning in MultiPhrasePrefixQuery.
Closes#13129Closes#13142
Target-type inference has been improved in Java 8. This leads to these
lines now being interpreted as invoking String#valueOf(char[]) whereas
they previously were interpreted as invoking String#valueOf(Object).
This change leads to ClassCastExceptions during test execution. Simply
casting the parameter to Object restores the old invocation.
Closes#13315
We have some optimization in FilteredQueryParser that tries to mimic what the rewrite method in lucene does, based on what gets parsed we return the simplest query possible. That might cause issues with boost values though, if specified in both the main query and the inner query that we shortcut to. We should rather rely on lucene's rewrite method to simplify the lucene representation of the query, and always build a filtered query instead.
relates to #13272Closes#13312
We currently optimize scroll when sort=_doc because docs are returned in order.
But documents are also returned in order when sorting by score and the query
gives constant scores. This optimization has the nice side-effect of also
optimizing scrolls with the default `match_all` query.
Until now we had a cloud-aws plugin which is providing 2 disctinct features:
* discovery on EC2
* snapshot/restore on S3
This commit splits the plugin by feature so people can use either one or the other or both features.
Doc is updated accordingly.
Before this change the check would check that all test classes end in Tests but the message would say they need to end in Test or Tests which was confusing.
Today we always collect in order to compute counts, but some of them can be
easily optimized by using pre-computed index statistics. This is especially
true in the case that there are no deletions, which should be common for the
time-based data use-case.
Counts on match_all queries can always be optimized, so requests like
```
GET index/_search?size=0
GET index/_search
{
"size": 0,
"query" : {
"match_all": {}
}
}
```
should now return almost instantly. Additionally, when there are no deletions,
term queries are also optimized, so the below queries which all boil down to a
single term query would also return almost immediately:
```
GET index/type/_search?size=0
GET index/_search
{
"size": 0,
"query" : {
"match": {
"foo": "bar"
}
}
}
GET index/_search
{
"size": 0,
"query" : {
"constant_score": {
"filter": {
"exists": {
"field": "foo"
}
}
}
}
}
```
Users might specify something like -Des.network.host=0.0.0.0, as that
was the old default with previous versions of elasticsearch. This means
to bind to all interfaces, but it makes no sense as a publish address.
Pick a good one in this case, just like we do in other cases where
publish isn't explicitly specified and we are bound to multiple (e.g.
when configured by interface, or dns hostname with multiple addresses).
However, in this case warn the user about it: since its arbitrarily
picking the first non-loopback address like the old versions
did, thats a little too heuristical, but lets make the cutover easy.
Separately, fail hard if things like multicast or broadcast addresses are
configured as bind or publish addresses, as that is simply invalid.
Closes#13274
The number and distribution of errors in some restore test may cause restore process to continue to fail for a prolong time. This test caps the total number of simulated failures to make sure that restore is guaranteed to eventually succeed after a limited number of retries.
We currently have a small number of test classes with the suffix "Test",
yet most use the suffix "Tests". This change renames all the "Test"
classes, so that we have a simple rule: "Non-inner classes ending with
Tests".
These are not actually tests, but command line applications that must be
run manually. This change removes the entire stresstest package. We can
add back individual tests that we find necessary, and make them real
tests (whether integ or not).
While the list of having exclusions is small, it shouldn't be necessary
at all. Base test cases should be suffixed with TestCase so they are not
picked up by the test class name pattern. This same rule works for
abstract classes as well.
This change renames abstract tests to use the TestCase suffix, adds a
check in naming convention tests, and removes the exclusion from our
test runner configuration. It also excludes inner classes (the only
exclude we should have IMO), so that we have no need to @Ignore the
inner test classes for naming convention tests.
In this test we assume that after waitForRelocation() has returned shards
are no more relocated and optimize will therefore succeed always.
However, because the test does not wait for green status, relocations can
still start after waitForRelocation() has returned successfully.
see #13266 for a detailed explanation
At the moment if an index script is used in a request, the spawned request to get the indexed script from the `.scripts` index does not get the headers and context copied to it from the original request. This change makes the calls to the `ScriptService` pass in a `HasContextAndHeaders` object that can provide the headers and context. For the `search()` method the context and headers are retrieved from `SearchContext.current()`.
Closes#12891
The shaded version of elasticsearch was built at the very beginning to avoid dependency conflicts in a specific case where:
* People use elasticsearch from Java
* People needs to embed elasticsearch jar within their own application (as it's today the only way to get a `TransportClient`)
* People also embed in their application another (most of the time older) version of dependency we are using for elasticsearch, such as: Guava, Joda, Jackson...
This conflict issue can be solved within the projects themselves by either upgrade the dependency version and use the one provided by elasticsearch or by shading elasticsearch project and relocating some conflicting packages.
Example
-------
As an example, let's say you want to use within your project `Joda 2.1` but elasticsearch `2.0.0-beta1` provides `Joda 2.8`.
Let's say you also want to run all that with shield plugin.
Create a new maven project or module with:
```xml
<groupId>fr.pilato.elasticsearch.test</groupId>
<artifactId>es-shaded</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<elasticsearch.version>2.0.0-beta1</elasticsearch.version>
</properties>
<dependencies>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.plugin</groupId>
<artifactId>shield</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
</dependencies>
```
And now shade and relocate all packages which conflicts with your own application:
```xml
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<relocations>
<relocation>
<pattern>org.joda</pattern>
<shadedPattern>fr.pilato.thirdparty.joda</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
```
You can create now a shaded version of elasticsearch + shield by running `mvn clean install`.
In your project, you can now depend on:
```xml
<dependency>
<groupId>fr.pilato.elasticsearch.test</groupId>
<artifactId>es-shaded</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.1</version>
</dependency>
```
Build then your TransportClient as usual:
```java
TransportClient client = TransportClient.builder()
.settings(Settings.builder()
.put("path.home", ".")
.put("shield.user", "username:password")
.put("plugin.types", "org.elasticsearch.shield.ShieldPlugin")
)
.build();
client.addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("localhost", 9300)));
// Index some data
client.prepareIndex("test", "doc", "1").setSource("foo", "bar").setRefresh(true).get();
SearchResponse searchResponse = client.prepareSearch("test").get();
```
If you want to use your own version of Joda, then import for example `org.joda.time.DateTime`. If you want to access to the shaded version (not recommended though), import `fr.pilato.thirdparty.joda.time.DateTime`.
You can run a simple test to make sure that both classes can live together within the same JVM:
```java
CodeSource codeSource = new org.joda.time.DateTime().getClass().getProtectionDomain().getCodeSource();
System.out.println("unshaded = " + codeSource);
codeSource = new fr.pilato.thirdparty.joda.time.DateTime().getClass().getProtectionDomain().getCodeSource();
System.out.println("shaded = " + codeSource);
```
It will print:
```
unshaded = (file:/path/to/joda-time-2.1.jar <no signer certificates>)
shaded = (file:/path/to/es-shaded-1.0-SNAPSHOT.jar <no signer certificates>)
```
This PR also removes fully-loaded module.
By the way, the project can now build with Maven 3.3.3 so we can relax a bit our maven policy.
Currently, we do not allow reads on shards which are in POST_RECOVERY which
unfortunately can cause search failures on shards which just recovered if there no replicas (#9421).
The reason why we did not allow reads on shards that are in POST_RECOVERY is
that after relocating a shard might miss a refresh if the node that executed the
refresh is behind with cluster state processing. If that happens, a user might execute
index/refresh/search but still not find the document that was indexed.
We changed how refresh works now in #13068 to make sure that shards cannot miss a refresh this
way by sending refresh requests the same way that we send write requests.
This commit changes IndexShard to allow reads on POST_RECOVERY now.
In addition it adds two test:
- test for issue #9421 (After relocation shards might temporarily not be searchable if still in POST_RECOVERY)
- test for visibility issue with relocation and refresh if reads allowed when shard is in POST_RECOVERY
closes#9421
DateHistogramIT has suite scope and therefore must take care of cleaning up after
each test. Otherwise we cannot run tests in several times with tests.iters.
The implementation this commit replaces was almost k-NN regression with
k=2, but had two bugs: (a) it depends on the empirical raw estimates
being in strictly non-decreasing order for the binary search (which they
are not); and (b) it weights the biases positively with increased
distance from the corresponding raw estimate.
“HyperLogLog in Practice” leaves the choice of exact algorithm here
fairly vague, just noting: “We use k-nearest neighbor interpolation to
get the bias for a given raw estimate (for k = 6).” The majority of
other open source HyperLogLog++ implementations appear to use k-NN
regression with uniform weights (and generally k = 6). Uniform
weighting does decrease variance, but also introduces bias at the domain
extrema. This problem, plus the use of the word “interpolation” in the
original paper, suggests (inverse) distance-weighted k-NN, as
implemented here.
Since #13068 refresh and flush requests go to the primary first and are then replicated.
One difference to before is though that if a shard is not available (INITIALIZING for example)
we wait a little for an indexing request but for refresh we don't and just give up immediately.
Before, refresh requests were just send to the shards regardless of what their state is.
In tests we sometimes create an index, issue an indexing request, refresh and
then get the document. But we do not wait until all nodes know that all primaries have ben assigned.
Now potentially one node can be one cluster state behind and not know yet that
the shards have ben started. If the refresh is executed through this node then the
refresh request will silently fail on shards that are started already because from
the nodes perspective they are still initializing. As a consequence, documents
that expected to be available in the test are now not.
Example test failures are here: http://build-us-00.elastic.co/job/elasticsearch-20-oracle-jdk7/395/
This commit changes the timeout to 1m (default) to make sure we don't miss shards
when we refresh. This will trigger the same retry mechanism as for indexing requests.
We still have to make a decision if this change of behavior is acceptable.
see #13238
From a user perspective, the main benefit from this upgrade is that the new
Lucene53Codec has disk-based norms. The elasticsearch directory has been fixed
to load these norms through mmap instead of nio.
Other changes include the removal of `max_thread_states`, the fact that
PhraseQuery and BooleanQuery are now immutable, and that deleted docs are now
applied on top of the Scorer API.
This change introduces a couple of `AwaitsFix`s but I don't think it should
hold us from merging.
This allows reducing privileges with doPrivileged to work,
otherwise it will fail with NPE.
In general, if some code wants to do that, let it. The null
check is needed, even though ProtectionDomain(CodeSource, PermissionCollection)
is more than a bit misleading: "the current Policy will not be consulted".
Additionally add a defensive check for location, since the docs
there are even more confusing: https://bugs.openjdk.java.net/browse/JDK-8129972
The jdk policy impl has both these checks.
We used to mutate the map of fields and weights while transforming to lucene query (toQuery method), by adding the default field if the map is empty. That is an anti-pattern, we should leave the original query as is instead, but still generate the proper lucene query. Also took the chance to fix a couple of issues in SimpleQueryStringBuilderTest#doAssertLuceneQuery.
Several shard-level operations that previously broadcasted a request
per shard were converted to broadcast a request per node. This commit
converts upgrade action to this new model as well.
Closes#13204