This commit moves the size and ops based flush into a synchronous API into
IndexShard and removes the time-based flush alltogether since it' basically
covered by the inactive async flush API we have today. The functionality doesn't
need to be covered by scheduled task and async APIs while we can actually make all
the decisions in a sync manner which is way easier to control and to test.
Closes#13707
Refactor the function_score query so it can be parsed on the coordinating node, split parse into fromXContent and toQuery, make FunctionScoreQueryBuilder Writeable.
Closes#13653
Given that we are moving to parsing queries on the coordinating node, the index name is not relevant anymore in QueryParseContext, as the parsing phase cannot be related to any specific index. On the contrary, the QueryShardContext is the one that holds mappings etc. and the index name too, as the lucene query creation happens on the data node and can still be related with the index that it happens against.
Changes are mainly around tests that were expecting the index name, moved to using QueryShardException in some of them, removed the index name elsewhere.
Closes#13631
Moving validation from validate() to constructors and setters for the
following query builders:
* GeoDistanceQueryBuilder
* GeoDistanceRangeQueryBuilder
* GeoPolygonQueryBuilder
* GeoShapeQueryBuilder
* GeohashCellQuery
* TermsQueryBuilder
Relates to #10217
This parser prototype allows to decleratively define parsers for XContent
instead of writing messy and error prone while loops. It encapsulates all the error handling logic
and only even tries to parse if the token types match the declaration.
I want to refactor scripting engines so we can contain dangerous "God-like" permissions
like createClassloader/sun.reflect. These are used for dynamic class generation (scripts, mocks).
This will mean some refactoring to ES core.
But first lets get the plugins in order first. I removed those permissions globally, and
fixed grants for lang-javascript, lang-python, securemock so that everything works.
lang-javascript needs no code changes, because rhino is properly written :)
lang-python needs accesscontroller blocks. securemock was already working as of 1.1
This is just a baby step, to try to do some of this incrementally! It doesn't yet provide
us anything.
Currently the tribe node version always stays 0, which can cause issues for the services that rely on cluster state version. For example, ClusterStateObserver doesn't revalidate the cluster state after change, which leads to cluster health check with wait flags to take much longer then actually needed.
Until now we had a cloud-azure plugin which is providing 3 distinct features:
* discovery on Azure
* snapshot/restore on Aure
* SMB store
This commit splits the plugin by feature so people can use either one or the other or both features.
Doc is updated accordingly.
This add equals, hashcode, read/write methods, validation, separates toQuery
and JSON parsing and adds serialization and query generation tests.
Deprecates two types of initializing the bounding box: In our documentation we
speak about specifying top/left and bottom/right corner of a bounding box. Here
we also allow for top/right and bottom/left. This adds not only to the amount
of code but also testing needed w/o too much benefit for the user other than
more chances to confuse top/right/bottom/left/latitude/longitude IMHO.
Missing: The toQuery method with type set to "indexed" is not tested at the
moment.
Cleanup changes unrelated to base refactoring:
* Switched from type String to enum for types in GeoBoundingBoxQueryBuilder.
* Switched to using type GeoPoint for storing the bounding box coordinates
instead of array of double values.
Relates to #10217 for the query refactoring part.
Relates to #12016 for how missing mappings are handled.
Adds a utility class for generating random geo data.
Adds some missing documentation.
Extend test to MEMORY type config
Fix final review comments and rebase
We moved a lot of repositories into elasticsearch, but in their new
location they retained their LICENSE.txt and NOTICE.txt files. These are
all the same, and having the license and notice and the root of the
repository should be sufficient.
This commit removes and now forbids all uses of
com.google.common.primitives.Ints across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
graduate this from a hack for insecure plugins to something we can
live with for per-module/plugin permissions, it now works reasonably
in unit tests and with Intellij and Eclipse IDEs.
remove security warnings: we will deal with these issues in a secure
way, if we cannot, then the plugin shouldn't be in our core codebase.
This PR is the second batch in moving the query validation we started
to collect in the validate() method to the corresponding setters
and constructors.
This is the more sheisty business along the same lines as
https://github.com/elastic/elasticsearch/pull/13638
1 hour total adding the real functionality, days of wasted time
on simulated fake functionality to satisfy our crazy test framework...
I debugged on the problematic jenkins machine and I think issues are
from parsing the classpath and URL normalization etc (trailing slashes
vs not, etc in URLs). So I simplifed the code, to remove this completely,
inverting the logic so we just use an exclusion list instead of inclusion one.
I also allow tests for these plugins to run from the IDE (works at least for eclipse) too.
At least for eclipse this is even less realistic as it piles all the code (src and test)
into a single codebase, but it means you can *use it* and you just have to run mvn verify
before pushing as always. And as always... best effort.
A JTS bug causes a misinterpretation of polygon coordinates leading to an unhelpful "geom" AssertionError. While this assertion occurs approx 0.02% of the time it can lead to a misleading test failure. This patch catches the geom assertion and retries randomShapeCreation. For safety a threshold is set to prevent unlimited retrying - though 1 retry is typically sufficient for correcting the invalid shape.
closes#13551
A JTS bug causes a misinterpretation of polygon coordinates leading to an unhelpful "geom" AssertionError. While this assertion occurs approx 0.02% of the time it can lead to a misleading test failure. This patch catches the geom assertion and retries randomShapeCreation. For safety a threshold is set to prevent unlimited retrying - though 1 retry is typically sufficient for correcting the invalid shape.
closes#13551
We don't have a plugin .zip for unit tests, so we can't do it
correctly. But we can approximate it better, so that if code
is simply missing an AccessController block at least tests will fail.
Classnames change quickly due to refactorings etc. If that happens in a minor release
we loose the ability to deserialize the exceptoin coming from another node sicne we today
look it up by classname. This change uses a dedicated static id instead of the classname
to lookup the actual class.
Especially the worst of the worst with thread permissions: for example,
this prevents some code from starting daemon thread that will outlive
the elasticsearch process and hang around doing evil shit.
This commit removes unnecesssary use of ExceptionHelpers where we actually
should serialize / deserialize the actual exception. This commit also
fixes one of the oddest problems where the actual exception was never
rendered / printed if `all shards failed` due to a missing cause.
This commit unfortunately doesn't fix Snapshot/Restore which is almost
unfixable since it has to serialize XContent and read from it which can't
transport exceptions.
Weighted centroid, morton hash, and geohash can be imprecise (computation error) to 1e-5. The previous compareTo set this tolerance too strict (1e-6) causing a reproducible comparison error on weighted centroid (#13558). This change relaxes the tolerance to the acceptable computation error of 1e-5
closes#13558
Improve IndexingMemoryController a bit:
- promptly push indexing buffer changes to IndexWriter, instead of waiting for next refresh/flush
- don't wait for merges to finish before dropping a shards's indexing buffer to 512 KB once it's inactive
- fix NPE if indices.memory.index_buffer_size is in node's settings with a bytes (not %) unit
- add some more logger.debug
This commit removes and now forbids all uses of
com.google.common.base.Joiner across the codebase. This is one of many
steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.math.LongMath across the codebase. This is one step
of many in the eventual removal of Guava as a dependency.
This commit removes and now forbids all uses of
com.google.common.collect.Iterables across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.base.Preconditions across the codebase. This is one
of many steps in the eventual removal of Guava as a dependency.
Relates #13224
https://github.com/elastic/elasticsearch/pull/12908
Moved output to verbose level and additionally changed the plugin info output format
Before:
```shell
PluginInfo{name='cloud-aws', description='The Amazon Web Service (AWS) Cloud plugin allows to use AWS API for the unicast discovery mechanism and add S3 repositories.', site=false, jvm=true, classname=org.elasticsearch.plugin.cloud.aws.CloudAwsPlugin, isolated=true, version='2.1.0-SNAPSHOT'}
```
After:
```shell
- Plugin information:
Name: cloud-aws
Description: The Amazon Web Service (AWS) Cloud plugin allows to use AWS API for the unicast discovery mechanism and add S3 repositories.
Site: false
Version: 2.1.0-SNAPSHOT
JVM: true
* Classname: org.elasticsearch.plugin.cloud.aws.CloudAwsPlugin
* Isolated: true
```
Fixes#12907
If we don't wait we might retrieve the cluster state before second node
was added when we try to add the delegate in which the discovery node for
node_2 is null.
When a single node starts up it will first elect itself as master and then tries to
recover the cluster state or, if there is none, initialize an empty one and publish it.
Until it has done that, the cluster state will contain a global block and
requests might fail with
SERVICE_UNAVAILABLE/1/state not recovered / initialized
We need to wait for green.
This commit fixes a test bug in
o.e.a.s.b.n.TransportBroadcastByNodeActionTests. Namely, the randomized
test allowed for the creation of cluster states that allocated indices
having zero shards. This ultimately surfaced in a
NoSuchElementException when attempting to iterate over the nonexistent
shards. The fix is merely to draw the random number of shards from 1 to
10 instead of 0 to 10.
This commit replaces the usage of LoadedCache with a simple CHM and calls
to computeIfAbsent and adds LoadingCache and CacheLoader to forbidden APIs
Relates to #13224
* Added BWC indices
* Added snapshot version to Version.java
* Fixed create_bwc_index to use localhost instead of localhost and 127.0.0.1 (problem with ipv4/6 setup)
This commit removes and now forbids all uses of
Function, Charsets, Collections2 across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
When publishing a new cluster state, the master will send it to all the node of the cluster, noting down how many *master* nodes responded successfully. The nodes do not yet process the new cluster state, but rather park it in memory. As soon as at least minimum master nodes have ack-ed the cluster state change, it is committed and a commit request is sent to all the node that responded so far (and will respond in the future). Once receiving the commit requests the nodes continue to process the cluster state change as they did before this change.
A few notable comments:
1. For this change to have effect, min master nodes must be configured.
2. All basic cluster state validation is done in the first phase of publish and is thus now part of `ShardOperationResult`
3. A new `COMMIT_TIMEOUT` settings is introduced, dictating how long a master should wait for nodes to ack the first phase. Unlike `PUBLISH_TIMEOUT`, if waiting for a commit times out, the cluster state change will be rejected.
4. Failing to achieve a min master node of acks, will cause the master to step down as it clearly doesn't have enough active followers.
5. Previously there was a short window between the moment a master lost it's followers and it stepping down because of node fault detection failures. In this short window, the master could process any change (but fail to publish it). This PR closes this gap to 0.
6. A dedicated pending cluster states queue was added to keep pending non-comitted cluster states and manage the logic around processing committed cluster states. See #13303 for details.
Closes#13062 , Closes#13303
This commit allows to refresh the info service in a blocking fashion
which allows tests to prevent installing listeners alltogether and
makes the class easier to test. Installing a listnener is always subject
to concurrent modifications where the listener might be called mulitple times
or with stale information which causes tests to fail.
Java 8 allows for method references which in-turn will cause
compile errors if a method is not visible while reflection fails late
and maybe too late. We can now register Request instances via FooRequest::new
instead of passing FooRequest.class and call it's ctor via reflection.
This test sometimes fails because the first node is elected as master and waits 30s for incoming joins but in the meanwhile the 3 other nodes form a cluster on their side. The index will be created and its shards allocated on these 3 nodes, then the test checks for the number of shards on each node (it should be 2 or 3) but because the first node has not fully join the cluster yet one node will have 5 shards.
closes#13305
1. FileSystem wrapping code is broken, thats why you get providermismatch exception!
Instead of fixing this, it SuppressesForbidden!!!!
2. Because it uses SuppressForbidden on the test, the whole thing is lenient, it uses java.io.File for example!
3. Of course it fails consistently on windows because it can't remove files, because it leaks file handles (locks)
like a sieve since it does not close node environment. With correct wrapping this is always detected by e.g.
our leak detection FS. Instead of fixing the leak, it assumesFalse(WINDOWS) !!!!!
I do not know how this snuck past me, but I need this fixed to remove setAccessible.
- promptly push indexing buffer changes to IndexWriter, instead of waiting for next refresh/flush
- don't wait for merges to finish before dropping a shards's indexing buffer to 512 KB
- fix NPE if indices.memory.index_buffer_size is in node's settings with a bytes (not %) unit
- add some more logger.debug
During the second phase of recovery, replayed transaction log entries may need to wait on mapping changes that have not yet propagated to the target node. Currently we correctly replay the operation at a later stage, but we acknowledge the replay request before actually performing the work.
Example failure: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/859/Closes#13535
If the machine is very slow this test fails if the delta of the unallocaiton
timestamp and the last scheduled delay is greater than the scheduled delay time.
Today this is really horrible, and we have a PR sent to fix it, but nobody
does anything: https://github.com/aws/aws-sdk-java/pull/432
With java 9, we cannot even grant the permission, this kind of sheistiness is not allowed,
and s3 repository is completely broken.
The problem is their code is still broken, and won't handle neither SecurityException (our PR)
nor the new InaccessibleObjectException they will get from java 9.
We use a really hacky hack to deliver an exception that their code catches (IllegalAccessException) instead.
This means s3 repository is working on java 9, and we close off access to sun.security.ssl completely
Don't worry, I will fix the rest. But some of those remaining will need a lucene upgrade,
we need to add a getter or two for tests to do things cleanly.
In addition to being a big security problem, setAccessible is a risk
for java 9 migration. We need to clean up our code so we can ban it
and eventually enforce this with security manager for third-party code, too,
or we may have problems.
Instead of using setAccessible, use the correct modifier (e.g. public).
TODO: ban in tests
TODO: ban in security manager at runtime
This commit removes and now forbids all uses of
com.google.common.collect.ImmutableSortedMap across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
Instead of asking blob store to create output for posting blob content, this change provides that content of the blob to the blob store for writing. This will significantly simplify the interface for S3 and Azure plugins.
This commit replaces:
* com.google.common.util.concurrent.ListenableFuture
* com.google.common.util.concurrent.SettableFuture
* com.google.common.util.concurrent.Futures
* com.google.common.util.concurrent.MoreExecutors
And forbits its usage via forbidden APIs. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates to #13224
We have a gazillion ways to specify the source of the search request.
There is no need to so many we can reduce them dramatically and also remove
some deprecated API.
The public org.elasticsearch.common.settings.Settings#getAsMap method
leaks the dependency on Guava by returning a
com.google.common.collect.ImmutableMap. The leaking of this dependency
should be removed in preparation for the eventual complete removal of
Guava as a dependency.
Relates #13224
The filter element has been deprecated in the function_score query parser. Whenever a filter is found it gets wrapped into a query automatically. The filter in the java api builder is always null, there is no way to set its value, just a leftover.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
PR goes against the query-refactoring branch
The initial implementation of two phase commit based cluster state publishing (#13062) relied on a single in memory "pending" cluster state that is only processed by ZenDiscovery once committed by the master. While this is fine on it's own, it resulted in an issue with acknowledged APIs, such as the open index API, in the extreme case where a node falls behind and receives a commit message after a new cluster state has been published. Specifically:
1) Master receives and acked-API call and publishes cluster state CS1
2) Master waits for a min-master nodes to receives CS1 and commits it.
3) All nodes that have responded to CS1 are sent a commit message, however, node N didn't respond yet
4) Master waits for publish timeout (defaults to 30s) for all nodes to process the commit. Node N fails to do so.
5) Master publishes a cluster state CS2. Node N responds to cluster state CS1's publishing but receives cluster state CS2 before the commit for CS1 arrives.
6) The commit message for cluster CS1 is processed on node N, but fails because CS2 is pending. This caused the acked API in step 1 to return (but CS2 , is not yet processed).
In this case, the action indicated by CS1 is not yet executed on node N and therefore the acked API calls return pre-maturely. Note that once CS2 is processed but the change in CS1 takes effect (cluster state operations are safe to batch and we do so all the time).
An example failure can be found on: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/314/
This commit extracts the already existing pending cluster state queue (processNewClusterStates) from ZenDiscovery into it's own class, which serves as a temporary container for in-flight cluster states. Once committed the cluster states are transferred to ZenDiscovery as they used to before. This allows "lagging" cluster states to still be successfully committed and processed (and likely to be ignored as a newer cluster state has already been processed).
As a side effect, all batching logic is now extracted from ZenDiscovery and is unit tested.
This commit removes all the optional injects etc. for the FetchServices and
provides a Client via IndexQueryParserService / Context. This allows direct
injection instead of optional injection. It also allows to remove all the
unnecessary services and use the fetch code where it belongs.
This commit also adds testing infrastructure for intercepting client calls
to AbstractQueryTestCase to support GET calls in query tests.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
PR goes against the query-refactoring branch
This commit removes and now forbids all uses of
com.google.common.collect.Queues across the codebase. This is one of
many steps in the eventual removal of Guava as a dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.collect.ImmutableSortedSet across the codebase. This
is one of many steps in the eventual removal of Guava as a dependency.
Relates #13224
The MockInternalClusterInfoService depends on a constant and helper
method from actual test cases. This moves the constant and helper method
into the mock itself. Without this change, the classes put into the test
jar are not completely useable on their own (since this mock is in the
test jar, but the test cases are not).
This commit removes and now forbids all uses of
com.google.common.base.Preconditions#checkNotNull across the codebase.
This is one of many steps in the eventual removal of Guava as a
dependency.
Relates #13224
The semantics of the `boost` parameter for `function_score` changed. This is
due to the fact that Lucene now requires that query boosts and top-level boosts
are applied the same way.