ClusterRebalanceAllocationDecider did not take unassigned shards into account
that are temporarily marked as ingored. This can cause unexpected behavior
when gateway allocator is still fetching shards or has marked shareds as ignored
since their quorum is not met yet.
Closes#14670Closes#14678
Currently the next delay is calculated based on System.currentTimeMillis() but the actual shards to delay based on the last time the GatewayAllocator tried to assign/delay the shard.
This introduces an inconsistency for the case where shards should have been delay-allocated between the GatewayAllocator-based timestamp and System.currentTimeMillis().
Closes#14765
DateHistogramTests had some dependency on groovy scripts and
were moved to the lang-groovy module. This PR moves it back
and replaces use of groovy scripts by a mock script engine.
Removing three test cases that were testing doing some date
manipulation using script, since these are more groovy script
tests than testing the DateHistogram aggregation.
This commit changes TransportRequestOptions and TransportResponseOptions
to be immutable. This is to address an issue where the empty options
were being mutated permanently altering their state. Making these
objects immutable is just good, clean coding.
This fixes an issue where if the field for the aggregation was unmapped the extended bounds would get dropped and the resulting buckets would not cover the extended bounds requested.
Closes#14735
This fixes an issue where if the field for the aggregation was unmapped the extended bounds would get dropped and the resulting buckets would not cover the extended bounds requested.
Closes#14735
closes#14726
Squashed commit of the following:
commit 5b591e98570e3fa481b2816a44063b98bff36ddf
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 13 00:54:08 2015 -0500
add assumption for self-signing in PluginManagerTests
commit ed11e5371b6f71591dc41c6f60d033502cfcf029
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 13 00:20:59 2015 -0500
show error output from integ test startup
commit d8b187a10e95d89a0e775333dcbe1aaa903fb376
Author: Robert Muir <rmuir@apache.org>
Date: Thu Nov 12 22:14:11 2015 -0500
fix gradle check under jigsaw
This commit removes all noreleases and cuts over to Lucene 5.4 GeoPointField type. Included are randomized testing updates to unit and integration test suites for ensuring full backward compatability with existing geo_point indexes.
Just suck in the system policy, so its compatible with any version of java.
It means it also respects configuration (e.g. for monitoring agents)
Closes#14704
The disruption rules are changed to work on all transport addresses that are bound by a node (not only publish address).
This is important as UnicastZenPing creates fake DiscoveryNode instances which match one of the bound addresses and not necessarily the publish address.
Closes#14625Closes#14653
After a delayed reroute of a shard, RoutingService misses to schedule a new delayed reroute of other delayed shards.
Closes#14494Closes#14010Closes#14445
Transitive dependencies can be confusing and hard to deal with when
conflicts arise between them. This change removes transitive
dependencies from elasticsearch, and forces any dependency conflicts to
be resolved manually, instead of automatically by gradle.
closes#14627
This allows different circuit breakers to have different logging levels.
It's useful when diagnosing problems (say for instance with the
fielddata breaker) and not seeing the enormous amount of logging from
the request breaker.
The log messages use the breaker name for logging, so example logging
will look like:
```
[2015-11-10 09:51:52,993][TRACE][indices.breaker.fielddata] [fielddata] Adding [27b][body] to used bytes [new used: [27b], limit: 623326003 [594.4mb], estimate: 27 [27b]]
[2015-11-10 09:51:53,000][TRACE][indices.breaker.fielddata] [fielddata] Adjusted breaker by [453] bytes, now [480]
[2015-11-10 09:51:53,016][TRACE][indices.breaker.request ] [request] Adjusted breaker by [16440] bytes, now [16440]
[2015-11-10 09:51:53,018][TRACE][indices.breaker.request ] [request] Adjusted breaker by [-16440] bytes, now [0]
```
`AbstractLegacyBlobContainer` was kept for historical reasons (see #13434).
We can migrate Azure and S3 repositories to use the new methods added in #13434 so we can remove `AbstractLegacyBlobContainer` class.
We only passed the engine config since we had no chance to get the cache
etc. from the IndexSearcher. Now that we have these getters we can just pass the
searcher instead.
Some dependencies must be specified in a couple places in the build.
e.g. randomized runner is specified both in buildSrc (for the gradle
wrapper plugin), as well as in the test-framework.
This change creates buildSrc/versions.properties which acts similar to
the set of shared version properties we used to have in the maven parent
pom.
This adds the `cluster.routing.allocation.total_shards_per_node`
setting, which limits the total number of shards across all indices on
each node. It defaults to -1 and can be dynamically configured.
Resolves#14456
GeoDistanceRangeQueryTest was using an inconsistent tolerance when validating from=0 with include_lower/upper set to false. This commit changes the assertion such that validation is consistent with builder logic.
- Same as for TransportInstanceSingleOperationAction and TransportReplicationAction, onClusterServiceClose consistently throws a NodeClosedException now.
- Added retry logic if master could not publish cluster state or stepped down before publishing (ZenDiscovery). The test IndexingMasterFailoverIT shows the issue.
- Simplified retry logic by moving bits from different places into shared retry method.
- Removed boolean flag retrying that aborted retrying after a single master node change (now we retry until timeout).
- Two existing predicates that deal with master node changes unified in a single predicate masterNodeChangedPredicate
Closes#14222
We used to test only json parsing as we relied on QueryBuilder#toString which uses the json format. This commit makes sure that we now output the randomly generated queries using a random format, and that we are always able to parse them correctly.
This revealed a couple of issues with binary objects that haven't been migrated yet to be structured Writeable objects. We used to keep them in the format they were sent while parsing, which led to problems when printing them out as we expected them to always be in json format. Also we can't compare different BytesReference objects that hold the same content but in different formats (unless we want to parse them as part of equal and hashcode, doesn't seem like a good idea) and verify that we have parsed the right objects if they can be different formats. The fix is to always keep binary objects in json format. Best fix would be not to have binary objects, which we'll get to once we are done with the search refactoring.
Closes#14415
Latest version of lucene deprecated Query#setBoost and Query#getBoost which made queries effectively immutable. Those methods need to be replaced with `BoostQuery` that wraps any query that needs boosting.
This commit replaces usages of setBoost with BoostQuery and adds it to forbidden-apis for prod code.
Usages of `getBoost` are only partially removed, as some will have to stay for backwards compatibility.
Closes#14264
If you run tests under a 32-bit jvm, you will get a test failure in IndexStoreTests,
the logic there is wrong in the case of 32-bit (its NIOFSDirectory on linux).
Also if mlockall fails, you'll see huge bogus values (because of use of `long` instead of `NativeLong`)
finally add seccomp support for 32 bit too, and clean up all its `long` usage as well.
run.sh and run.bat were calling out to the old maven build system.
This is no longer in place, so we've created new gradle tasks to
start an elasticsearch node from the current codebase.
fixed#14423
The completion suggester provides auto-complete/search-as-you-type functionality.
This is a navigational feature to guide users to relevant results as they are typing, improving search precision.
It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.
The completions are indexed as a weighted FST (finite state transducer) to provide fast Top N prefix-based
searches suitable for serving relevant results as a user types.
closes#10746
Random code shouldn't be listening on sockets elsewhere.
Today its the wild west, but we only need to grant access to what the user configured.
This means e.g. multicast plugin has to declare its intentions in its security.policy
Closes#14549
Closes#14595
Squashed commit of the following:
commit d0b2b262e9dcdbc2aee163b9a84db082c8b5b96b
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 6 22:36:54 2015 -0500
Switch to JarInputStream, to contain suppressforbidden. Also add a test that fails if path is not accessible (regardless of whether its a jar)
commit f99c1d240db23ceb2a06987b3bd69eae0229550b
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 6 22:16:16 2015 -0500
remove leniency in i/o here
commit b160d4303ee81a8c9298729596ecbc893f5f8894
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 6 21:58:21 2015 -0500
Fix Build to correctly treat URLs and to not leak a file handle
Eclipse does not have the ability to differentiate test dependencies
from main dependencies. This causes what looks like a circular
dependency through test-framework. This change sets up an additional
core-tests project for eclipse only, which removes this problem.
This commit prevents running rebalance operations if the store allocator is
still fetching async shard / store data to prevent pre-mature rebalance decisions
which need to be reverted once shard store data is available. This is typically happening
on rolling restarts which can make those restarts extremely painful.
Closes#14387
This commit adds the abstraction layer to GeoPointFieldMapper needed to cut over to Lucene 5.4's new GeoPointField type while maintaining backward compatibility with 'legacy' geo_point indexes.
This commit addresses an issue in TransportBroadcastByNodeAction.
Namely, if in between a master leaving the cluster and a new one being
assigned, a request that relies on TransportBroadcastByNodeAction
(e.g., an indices stats request) is issued,
TransportBroadcastByNodeAction might attempt to send a request to a
node that is no longer in the local node’s cluster state.
The exact circumstances that lead to this are as follows. When the
master leaves the cluster and another node’s master fault detection
detects this, the local node will update its local cluster state to no
longer include the master node. However, the routing table will not be
updated. This means that in the preparation for sending the requests in
TransportBroadcastByNodeAction, we need to check that not only is a
shard assigned, but also that it is assigned to a node that is still in
the local node’s cluster state.
This commit adds such a check to the constructor of
TransportBroadcastByNodeAction. A new unit test is added that checks
that no request is sent to the master node in such a situation; this
test fails with a NullPointerException without the fix. Additionally,
the unit test TransportBroadcastByNodeActionTests#testResultAggregation
is updated to also simulate a master failure. This updated test also
fails prior to the fix.
Closes#14584
if we set the inactive time for the shard via API the entire test if fully time
dependent and might fail if we concurrently check if the shard is inactive
while the document we are indexing is in-flight.
This commit restores the build properties provided in
org.elasticsearch.Build. This class previously obtained the build hash
and timestamp from a resource es-build.properties that was included in
the jar and produced by the Maven resources plugin. After the switch to
Gradle, the production of this file was lost and these build properties
defaulted to “NA” in all instances.
The most important place that the build hash is used is in the plugin
manager to determine the URL of staging artifacts for plugins.
The build hash is also used in several responses including the /_nodes
response and the response to HTTP GET requests on the root path.
These properties can now be obtained from the jar manifest as they are
currently placed there by the gradle-info plugin. However, only the
short hash is provided. We now read the manifest for these properties
and no longer provide the full hash in responses to HTTP GET requests
on the root path.
Originally, only numeric values were allowed for parameters of the
'geohash_grid' aggregation in contrast to other places in the REST
API.
With this commit we also allow that parameters are enclosed in quotes (i.e.
as JSON strings). Additionally, with this commit the valid range for
'precision' is enforced for the Java API and the REST API (the latter was
previously missing the check).
Closes#13132
This commit fixes a compilation issue due to modified type inference in
the latest JDK 9 early access builds. We just have to lend a helping
hand to type inference by being explicit about the type.
Closes#14496
Current processors setting is not reflected in nodes info API
("os.available_processors"). Add os.allocated_processors to shows
actual number of processors that we are using.
the current jar is over 3 years old, we should upgrade it for bugfixes.
the current integration could be more secure: set a global policy and enforce additional (compile-time) checks
closes#14466
query_binary and filter_binary are unused at this point, as we only parse on the coordinating node and the java api only holds structured java objects for queries and filters, meaning they all implement Writeable and get natively serialized.
Relates to #14308Closes#14433
This commit fixes a bug in cat thread pool. This bug resulted from a
refactoring of the handling of thread pool types. To get the previously
displayed thread pool type from the ThreadPoolType object,
ThreadPoolType#getType needs to be called.
Reverting fix for #13884 because it was discussed to be too
fragile with respect to future changes in lucene simple query
string parsing. Undoes fix and removes test.
If we have a shard failure on SearchPhaseExecutionException
we can deduplicate the original cause and use the more informative
ShardSearchFailure containing the shard ID etc. but we should deduplicate
the actual cause to prevent stack trace duplication.
This commit forbids the changing of thread pool types for any thread
pool. The motivation here is that these are expert settings with
little practical advantage.
Closes#14294, relates #2509, relates #2858, relates #5152
IndexQueryParserService is only a factory for QueryShardContext instances
which are not even bound to a shard. The service only forwards dependencies and even
references node level service directly which makes dependency seperation on shard,
index and node level hard. This commit removes the service entirely, folds the creation
of QueryShardContext into IndexShard which is it's logical place and detaches the
ClusterService needed for index name matching during query parsing with a simple predicate
interface on IndexSettings.
This PR adds a randomized test to the query test base class
that mutates an otherwise correct query by adding an additional
object into the query hierarchy. Doing so makes the query illegal
and should trigger some kind of exception. The new test revelead
that some query parsers quietly return queries when called with
such an illegal query. Those are also fixed here.
Relates to #10974
Similarly to what we did with the search api, we can now also move query parsing on the coordinating node for the validate query api. Given that the explain api is a single shard operation (compared to search which is instead a broadcast operation), this doesn't change a lot in how the api works internally. The main benefit is that we can simplify the java api by requiring a structured query object to be provided rather than a bytes array that will get parsed on the data node. Previously if you specified a QueryBuilder it would be serialized in json format and would get reparsed on the data node, while now it doesn't go through parsing anymore (as expected), given that after the query-refactoring we are able to properly stream queries natively. Note that the WrapperQueryBuilder can be used from the java api to provide a query as a string, in that case the actual parsing of the inner query will happen on the data node.
Relates to #10217Closes#14384
The RR gradle plugin is at
https://github.com/randomizedtesting/gradle-randomized-testing-plugin.
However, we currently have a copy of this, since the plugin is still in
heavy development. This change moves the files around so they can be
copied directly from the elasticsearch fork to that repo, for ease of
syncing.
Removes the mapping transform feature which when used made debugging very
difficult. Users should transform their documents on the way into
Elasticsearch rather than having Elasticsearch do it.
Closes#12674
Most query parsers throw a ParsingException when they trying
to parse a field with an unknown name. This adds a generic
check for this to the AbstractQueryTestCase so the behaviour
gets tested for all query parsers. The test works by first
making sure the test query has a `boost` field and then
changing this to an unknown field name and checking for an
error.
There are exceptions to this for WrapperQueryBuilder
and QueryFilterBuilder, because here the parser only expects
the wrapped `query` element. MatchNoneQueryBuilder and
MatchAllQueryBuilder so far had setters for boost() and
queryName() but didn't render them, which is added here for
consistency.
GeoDistance, GeoDistanceRange and GeoHashCellQuery so far
treat unknown field names in the json as the target field name
of the center point of the query, which so far was handled by
overwriting points previously specified in the query. This
is changed here so that an attempt to use two different field names
to specify the central point of the query throws a
ParsingException
Relates to #10974
This change moves all the analysis component registration to the node level
and removes the significant API overhead to register tokenfilter, tokenizer,
charfilter and analyzer. All registration is done without guice interaction such
that real factories via functional interfaces are passed instead of class objects
that are instantiated at runtime.
This change also hides the internal analyzer caching that was done previously in the
IndicesAnalysisService entirely and decouples all analysis registration and creation
from dependency injection.
This change removes the leftover pom files. A couple files were left for
reference, namely in qa tests that have not yet been migrated (vagrant
and multinode). The deb and rpm assemblies also still exist for
reference when finishing their setup in gradle.
See #13930
The test jar was previously built in maven by copying class files. With
gradle we now have a proper test framework artifact. This change moves
the classes used by the test framework into the test-framework module.
See #13930
Closes#14353
Squashed commit of the following:
commit edae0729f71ea3d3f9fa9c0d27c9effc042eb5a9
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 29 14:13:42 2015 -0400
update sha1 and simplify test
commit 635c4f245d66ad353a16267c810e02b725553fad
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 29 07:01:26 2015 -0400
Add threadgroup isolation.
Code with `modifyThread` and `modifyThreadGroup` may only modify
its own threadgroup (or an ancestor of that). This enforces
what is intended by the ThreadGroup class.
This has two immediate implications:
1. Code without these permissions (scripts) may not create or mess with threads
2. ES application threads cannot mess with Java system threads
ES puts all application threads in one single group today, but in the future
this can be organized better, and we will have more isolation in the system.
This commit fixes two issues that could arise when a loader throws an
exception during a load in Cache#computeIfAbsent.
The underlying issue is that if the loader throws an exception,
Cache#computeIfAbsent would attempt to remove the polluted entry from
the cache. However, this cleanup was performed outside of the segment
lock. This means another thread could race and expire the polluted
entry (leading to NPEs) or get a polluted entry out of the cache before
the loading thread had a chance to cleanup (leading to ISEs).
The solution to the initial problem of correctly handling failed cached
loads is to check for failed loads in all places where entries are
retrieved from the map backing the segment. In such cases, we treat it
as if there was no entry in the cache, and we clean up the cache on a
best-effort basis. All of this is done outside of the segment lock to
avoid reintroducing the deadlock that was initially a problem when
loads were executed under a segment lock.
This commit adds a unit test for a deadlock issue that existed prior to
commit 1d0b93f766. While commit
1d0b93f766 seems to have addressed the
deadlock issue it would be more robust to have a unit test for it and a
unit test will reduce the risk that future maintenance on Cache will
reintroduce the deadlock issue. This test reliably fails prior to but
passes after commit 1d0b93f766.
Currently a `simple_query_string` query with one term and multiple fields
gets parsed to a BooleanQuery where the number of clauses is determined
by the number of fields, which lead to wrong calculation of `minimum_should_match`.
This PR adds checks to detect this case and wrap the resulting BooleanQuery into
another BooleanQuery with just one should-clause, so `minimum_should_match`
calculation is corrected.
In order to differentiate between the case where one term is queried across
multiple fields and the case where multiple terms are queried on one field,
we override a simplification step in Lucenes SimpleQueryParser that reduces
a one-clause BooleanQuery to the clause itself.
Closes#13884
This commit adds a listener mechanism for executing callbacks when
exceptional situations occur sending a shard failure message to the
master. The two types of exceptional situations that can occur are if
the master is not known and if the transport request exception handler
is invoked for any reason after sending the shard failed request to the
master. This commit only adds the infrastructure for executing
callbacks when one of these exceptional situations occur; no effort is
made to properly handle the exceptional situations. Some unit tests are
added for ShardStateAction to test that the listener infrastructure is
correct.
Relates #14252
The only way to refer to the plain highlighter is now `plain`, the only way to refer to the fast vector highlighter is `fvh` and the only way to refer to the postings highlighter is `postings`. The name variants like `highlighter`, `postings-highlighter` and `fast-vector-highlighter` have been removed.
Similarly to what we did with the search api, we can now also move query parsing on the coordinating node for the explain api. Given that the explain api is a single shard operation (compared to search which is instead a broadcast operation), this doesn't change a lot in how the api works internally. The main benefit is that we can simplify the java api by requiring a structured query object to be provided rather than a bytes array that will get parsed on the data node. Previously if you specified a QueryBuilder it would be serialized in json format and would get reparsed on the data node, while now it doesn't go through parsing anymore (as expected), given that after the query-refactoring we are able to properly stream queries natively.
Closes#14270
This is not needed: full mvn verify passes.
Furthermore, there are all kinds of checks for this case
(rejected while shutting down) in the actual code, so there
is no need to have it here. If its supposed to be non-fatal,
then we add the missing places to the actual code, not globally to all threads.
Some jenkins servers have this, but our codebase normalization doesn't
follow symlinks. Add this so that its correct.
Only really impacts tests, i suppose it helps if someone has a symlinked plugins/
but that is not recommended :)
* plugin authors can use full policy syntax, including codebase substitution
properties like core syntax.
* simplify test logic.
* move out test-framework permissions to separate file.
Closes#14311
On _lastWriteNanos_ we use System.nanoTime() to initialize this since:
* we use the value for figuring out if the shard / engine is active so if we startup and no write has happened yet we still consider it active
for the duration of the configured active to inactive period. If we initialize to 0 or Long.MAX_VALUE we either immediately or never mark it
inactive if no writes at all happen to the shard.
* we also use this to flush big-ass merges on an inactive engine / shard but if we we initialize 0 or Long.MAX_VALUE we either immediately or never
commit merges even though we shouldn't from a user perspective (this can also have funky sideeffects in tests when we open indices with lots of segments
and suddenly merges kick in.