This commit introduces SearchOperationListeneres which allow to hook
into search operation lifecycle and execute operations like slow-logs
and statistic collection in a transparent way. SearchOperationListenrs
can be registered on the IndexModule just like IndexingOperationListeners.
The main consumers (slow log) have already been moved out of IndexService
into IndexModule which reduces the dependency on IndexService as well as
IndexShard and makes slowlogging transparent.
Closes#17398
Today the basic node settings like `node.data` and `node.master` can't really be fully validated
since we allow to specify custom user attributes on the node level. We have to, in order to
support that, add a wildcard setting for `node.*` to let these setting pass validation.
Instead we should require a more contraint prefix like `node.attr.` that defines a namespace
that is reserved for user attributes.
This commit adds a new namespace for attributes in `node.attr`.
Closes#17280
When we test we add `-Djna.nosys=true` to the system properties but
we don't add it to system properties when running the naming conventions
test. This was causing the build to fail on a newly minted Ubuntu 15.10
machine, presumably because I made the mistake of installing maven using
the system package manager.
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.
It looks like this:
```
GET /_cluster/allocation/explain?pretty
{
"index": "only-foo",
"shard": 0,
"primary": false
}
```
Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".
The output when a shard is unassigned looks like this:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : false
},
"assigned" : false,
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2016-03-22T20:04:23.620Z"
},
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 0.06666675,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "NO",
"weight" : -1.3833332,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 2.3166666,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
And when the shard *is* assigned, the output looks like:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : true
},
"assigned" : true,
"assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 1.4499999,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "CURRENTLY_ASSIGNED",
"weight" : 0.0,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 3.6999998,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.
Resolves#14593
Node roles are now serialized as well, they are not part of the node attributes anymore. DiscoveryNodeService takes care of dividing settings into attributes and roles. DiscoveryNode always requires to pass in attributes and roles separately.
readFrom is confusing because it requires an instance of the type that it
is reading but it doesn't modify it. But we also have (deprecated) methods
named readFrom that *do* modify the instance. The "right" way to implement
the non-modifying readFrom is to delegate to a constructor that takes a
StreamInput so that the read object can be immutable. Now that we have
`@FunctionalInterface`s it is fairly easy to register things by referring
directly to the constructor.
This change modifying NamedWriteableRegistry so that it does that. It keeps
supporting `registerPrototype` which registers objects to be read by
readFrom but deprecates it and delegates it to a new `register` method
that allows passing a simple functional interface. It also cuts Task.Status
subclasses over to using that method.
The start of #17085
This commit fixes a line-length checkstyle violation in
YamlSettingsLoaderTests.java and removes this file from the checkstyle
line-length suppressions.
This commit fixes a line-length checkstyle violation in
JsonSettingsLoaderTests.java and removes this file from the checkstyle
line-length suppressions.
This commit fixes a line-length checkstyle violation in
PropertiesSettingsLoader.java and removes this file from the checkstyle
line-length suppressions.
Change version, required a minor fix in the RPM building.
In case of a alpha/beta version, the release will contain alpha/beta
as the RPM version cannot contains dashes/tildes.
We already archive index level settings if we find an unknown or invalid/broken
value for a setting on node startup. The same could potentially happen for persistent
cluster level settings if we remove a setting or if we add validation to a setting that
didn't exist in the past. To ensure that only valid settings are recovered into the cluster
state we archive them (prefix them with `archive.` and log a warning. Tools that check the
cluster settings can then warn users that they have broken settings in their clusterstate that
got archived.
This commit fixes string formatting issues in the error handling and
provides a bettter error message if malformed input is detected.
This commit also adds tests for both situations.
Relates to #17212
this is the last step to remove node level service from IndexShard.
This means that tests can now more easily create an IndexShard instance
without starting a node and removes the dependency between IndexShard and Client/ScriptService
Today if something is wrong with the IndexMetaData we detect it very
late and most of the time if that happens we already allocated the index
and get endless loops and full log files on data-nodes. This change tries
to verify IndexService creattion during initial state recovery on the master
and if the recovery fails the index is imported as `closed` and won't be allocated
at all.
Closes#17187
We current have a ClusterService interface, implemented by InternalClusterService and a couple of test classes. Since the decoupling of the transport service and the cluster service, one can construct a ClusterService fairly easily, so we don't need this extra indirection.
Closes#17183
Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache.
The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontext
in a binary doc values field. This make sure that the query rewrite happens only during
indexing (some queries for example fetch shapes, terms in remote indices) and
the speed up the loading of the queries in the percolator query cache.
Because the percolator now works inside the search infrastructure a number of features
(sorting fields, pagination, fetch features) are available out of the box.
The following feature requests are automatically implemented via this refactoring:
Closes#10741Closes#7297Closes#13176Closes#13978Closes#11264Closes#10741Closes#4317
This commit fixes the line-length checkstyle violations in
InstallPluginCommand.java and removes this from the list of files for
which the line-length check is suppressed.
Today we allow to set all kinds of index level settings on the node level which
is error prone and difficult to get right in a consistent manner.
For instance if some analyzers are setup in a yaml config file some nodes might
not have these analyzers and then index creation fails.
Nevertheless, this change allows some selected settings to be specified on a node level
for instance:
* `index.codec` which is used in a hot/cold node architecture and it's value is really per node or per index
* `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses
All other index level setting must be specified on the index level. For existing clusters the index must be closed
and all settings must be updated via the API on each of the indices.
Closes#16799
Adds support for scheduling commands to run at a later time on another
thread pool in the current thread's context:
```java
Runnable someCommand = () -> {System.err.println("Demo");};
someCommand = threadPool.getThreadContext().preserveContext(someCommand);
threadPool.schedule(timeValueMinutes(1), Names.GENERAL, someCommand);
```
This happens automatically for calls to `threadPool.execute` but `schedule`
and `scheduleWithFixedDelay` don't do that, presumably because scheduled
tasks are usually context-less. Rather than preserve the current context
on all scheduled tasks this just makes it possible to preserve it using
the syntax above.
To make this all go it moves the Runnables that wrap the commands from
EsThreadPoolExecutor into ThreadContext.
This, or something like it, is required to support reindex throttling.
The build currently uses the old maven support in gradle. This commit
switches to use the newer maven-publish plugin. This will allow future
changes, for example, easily publishing to artifactory.
An additional part of this change makes publishing of build-tools part
of the normal publishing, instead of requiring a separate upload step
from within buildSrc. That also sets us up for a follow up to enable
precomit checks on the buildSrc code itself.
Today, certain bootstrap properties are set and read via system
properties. This action-at-distance way of managing these properties is
rather confusing, and completely unnecessary. But another problem exists
with setting these as system properties. Namely, these system properties
are interpreted as Elasticsearch settings, not all of which are
registered. This leads to Elasticsearch failing to startup if any of
these special properties are set. Instead, these properties should be
kept as local as possible, and passed around as method parameters where
needed. This eliminates the action-at-distance way of handling these
properties, and eliminates the need to register these non-setting
properties. This commit does exactly that.
Additionally, today we use the "-D" command line flag to set the
properties, but this is confusing because "-D" is a special flag to the
JVM for setting system properties. This creates confusion because some
"-D" properties should be passed via arguments to the JVM (so via
ES_JAVA_OPTS), and some should be passed as arguments to
Elasticsearch. This commit changes the "-D" flag for Elasticsearch
settings to "-E".
This change adds the infrastructure to run the rest tests on a multi-node
cluster that users 2 different minor versions of elasticsearch. It doesn't implement
any dedicated BWC tests but rather leverages the existing REST tests.
Since we don't have a real version to test against, the tests uses the current version
until the first minor / RC is released to ensure the infrastructure works.
Relates to #14406Closes#17072
Today we use hardcoded ports to form a cluster in the mulit-node case.
The hardcoded URIs are passed to the unicast host list which is error prone and
might cause problems if those ports are exhausted etc. This commit moves to a
less error prone way of forming the cluster where all nodes are started with port `0`
and all but the first node wait for the first node to write it's ports file to form a
cluster. This seed node is enough to form a cluster.
We removed leniencey from version parsing which caught problems with
-SNAPSHOT suffixes on plugin properies. This commit removes the -SNAPSHOT
from both es and the extension version and adds tests to ensure we can
parse older versions that allowed -SNAPSHOT in BWC way.
Closes#16964
Squashed commit of the following:
commit a23f9d2d29220991aa498214530753d7a5a148c6
Merge: eec9c4e 0b0a251
Author: Robert Muir <rmuir@apache.org>
Date: Mon Mar 7 04:12:02 2016 -0500
Merge branch 'master' into lucene6
commit eec9c4e5cd11e9c3e0b426f04894bb2a6dae4f21
Merge: bc67205 675d940
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 13:45:00 2016 -0500
Merge branch 'master' into lucene6
commit bc67205bdfe1526eae277ab7856fc050ecbdb7b2
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 09:56:31 2016 -0500
fix test bug
commit a60723b007ff12d97b1810cef473bd7b553a0327
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:35:35 2016 +0100
Fix SimpleValidateQueryIT to put braces around boosted terms
commit ae3a49d7ba7ced448d2a5262e5d8ec98671a9090
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:27:25 2016 +0100
fix multimatchquery
commit ae23fdb88a8f6d3fb7ba60fd1aaf3fd72d899aa5
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:20:49 2016 +0100
Rewrite DecayFunctionScoreIT to be independent of the similarity used
This test relied a lot on the term scoring and compared scores
that are dependent on the similarity. This commit changes the base query
to be a predictable constant score query.
commit 366c2d518c35d31251033f1b6f6a93f6e2ae327d
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 14:06:14 2016 +0100
Fix scoring in tests due to changes to idf calculation.
Lucene 6 uses a different default similarity as well as a different
way to calculate IDF. In contrast to older version lucene 6 uses docCount per field
to calculate the IDF not the # of docs in the index to overcome the sparse field
cases.
commit dac99fd64ac2fa71b8d8d106fe68825e574c49f8
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:21:57 2016 -0500
don't hardcoded expected termquery score
commit 6e9f340ba49ab10eed512df86d52a121aa775b0f
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:04:45 2016 -0500
suppress deprecation warning until migrated to points
commit 3ac8908424b3fdad44a90a4f7bdb3eff7efd077d
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:21:43 2016 -0500
Remove invalid test: all commits have IDs, and its illegal to do this.
commit c12976288124ad1a26467e7e848fb810548e7eab
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:06:14 2016 -0500
don't test with unsupported back compat
commit 18bbfe76128570bc70883bf91ff4c44c82d27817
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:02:18 2016 -0500
remove now invalid lucene 4 backcompat test
commit 7e730e572886f0ef2d3faba712e4256216ff01ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:58:52 2016 -0500
remove now invalid lucene 4 backwards test
commit 244d2ab6868ba5ac9e0bcde3c2833743751a25ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:47:23 2016 -0500
use 6.0 codec
commit 5f64d4a431a6fdaa1234adca23f154c2a1de8284
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:43:08 2016 -0500
compile, javadocs, forbidden-apis, etc
commit 1f273cd62a7fe9ca8f8944acbbfc5cbdd3d81ccb
Merge: cd33921 29e3443
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 10:45:29 2016 +0100
Merge branch 'master' into lucene6
commit cd33921ac742ef9fb351012eff35f3c7dbda7264
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:58:37 2016 -0500
fix hunspell dictionary loading
commit c7fdbd837b01f7defe9cb1c24e2ec65604b0dc96
Merge: 4d4190f d8948ba
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:41:53 2016 -0500
Merge branch 'master' into lucene6
commit 4d4190fd82601aaafac6b8254ccb3edf218faa34
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:39:14 2016 -0500
remove nocommit
commit 77ca69e288b1a41aa9595c921ed166c272a00ea8
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:38:24 2016 -0500
clean up numericutils vs legacynumericutils
commit a466d696fbaad04b647ffbc0857a9439b583d0bf
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:32:43 2016 -0500
upgrade spatial4j
commit 5412c747a8cfe638bacedbc8233163cb75cc3dc5
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:19:28 2016 -0500
move to 6.0.0-snapshot-8eada27
commit b32bfe924626b87e540692375ece09e7c2edb189
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:30:09 2016 +0100
Fix some test compile errors.
commit 6ccde35e9840b03c68d1a2cd47c7923a06edf64a
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:25:51 2016 +0100
Current Lucene version is 6.0.0.
commit f62e1015d931b4cc04c778298a8fa1ba65e97ad9
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:20:48 2016 +0100
Fix compile errors in NGramTokenFilterFactory.
commit 6837c6eabf96075f743649da9b9b52dd39611c58
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:50:59 2016 +0100
Fix the edge ngram tokenizer/filter.
commit ccd7f070de5efcdfbeb34b9555c65c4990bf1ba6
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:42:44 2016 +0100
The missing value is now accessible through a getter.
commit bd3b77f9b28e5b05daa3d49683a9922a6baf2963
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:41:51 2016 +0100
Remove IndexCacheableQuery.
commit 05f3091c347aeae80eeb16349ac51d2b53cf86f7
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:39:43 2016 +0100
Fix compilation of function_score queries.
commit 81cda79a2431ac78f56b0cc5a5765387f662d801
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:35:02 2016 +0100
Fix compile errors in BlendedTermQuery.
commit 70994ce8dd1eca0b995870974a38e20f26f96a7b
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 23:33:03 2016 -0500
add bug ID
commit 29d4f1a71f36f646b5a6060bed3db019564a279d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 21:02:32 2016 -0500
easy .store changes
commit 5e1a1e6fd665fa455e88d3a8987362fad5f44bb1
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:47:24 2016 -0500
cleanups mostly around boosting
commit 333a669ec6c305ada5645d13ed1da0e19ec1d053
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:27:56 2016 -0500
more simple fixes
commit bd5cd98a1e089c866b6b4a5e159400b110140ce6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:49:38 2016 -0500
more easy fixes and removal of ancient cruft
commit a68f419ee47da5f9c9ce5b372f01d707e902474c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:35:02 2016 -0500
cutover numerics
commit 4ca5dc1fa47dd5892db00899032133318fff3116
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:34:18 2016 -0500
fix some constants
commit 88710a17817086e477c6c021ec346d0534b7fb88
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:14:25 2016 -0500
Add spatial-extras jar as a core dependency
commit c8cd6726583e5ce3f546ed355d4eca037164a30d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:03:33 2016 -0500
update to lucene 6 jars
The suffix TransportAction is misleading as it may make think that it extends TransportAction, but it does not. This class makes accessible the different search operations exposed by SearchService through the transport layer. Also resolved few compiler warnings in the class itself.
The big win here is catching tests that are incorrectly named and will
be skipped by gradle, providing a false sense of security.
The whole thing takes about 10 seconds on my Macbook Air, not counting
compiling the test classes, which seems worth it. Because this runs as
a gradle task with propery UP-TO-DATE handling it can be skipped if the
tests haven't been changed which should save some time.
I chose to keep this in test:framework rather than a new subproject of
buildSrc because ESIntegTestCase and doesn't inroduce any additional
dependencies.
TransportSearchTypeAction and subclasses are not actually transport actions, but just support classes useful for their inner async actions that can easily be extracted out so that we get rid of one too many level of abstraction.
Same pattern can be applied to TransportSearchScrollQueryAndFetchAction & TransportSearchScrollQueryThenFetchAction which we could remove in favour of keeping only their inner classes named SearchScrollQueryAndFetchAsyncAction and SearchScrollQueryThenFetchAsyncAction.
Remove org.elasticsearch.action.search.type package, collapsed remaining classes into existing org.elasticsearch.action.search package
Make also ParsedScrollId ScrollIdForNode and TransportSearchHelper classes and their methods package private.
Closes#11710
Removes all our logger wrappers except the wrapper for log4j1.2. If you
depend on Elasticsearch's jar in your application you'll need to declare
log4j 1.2 and/or some bridge to your favorite logger.
We did this to simplify our builds and code. No more commons-logging like
log implementation sniffing. No more optional dependency hacks in gradle.
We might one day want to use j.u.l instead of log4j. If we do want that
we can recover its wrapper by studying this commit. We didn't go directly
to j.u.l in this commit because that is a bigger change. Our logging
configuration is based on log4j1.2 and people are used to it. So it'd
be a much more fraught breaking change to do that conversion.
This commit works around an issue with hostname verification in HttpClient when using IPv6
addresses in URLs. When an IPv6 address is used in a URL it is typically wrapped with square
brackets. The hostname verifier for HttpClient does not recognize these as valid IPv6 addresses
and instead treats them as a DNS name. We wrap the strict hostname verifier for this version
of HttpClient and strip brackets if we need to.
The corresponding issue in HttpClient is https://issues.apache.org/jira/browse/HTTPCLIENT-1698
but the fix has not been released yet in a stable version.
The extra config copy task for integ tests was previously overwriting
the source and destination for each file to be copied, leaving only the
last. This fixes it to create sub copy specs (which is what was
originally intended) for each file.
Today we have the notion of a snapshot inside Version.java which makes
releasing complicated since to do a release Version.java must be changed.
This commit removes all notions of snapshot from the code and allows to
switch between snapshot and release build by specifying a system property on
the build. For instance running:
```
gradle run -Dbuild.snapshot=false
```
will build and package a release build while the default always
builds snapshots. Calls to the main rest action will still get the snapshot
information rendered out with the response.
The current logic for doing recovery from a source to a target shourd is tightly coupled with the underlying network pipes. This changes decouple the two, making it easier to add unit tests for shard recovery that doesn't involve the node and network environment.
On top that, RecoveryTarget is renamed to RecoveryTargetService leaving space to renaming RecoveryStatus to RecoveryTarget (and thus avoid the confusion we have today with RecoveryState).
Correspondingly RecoverySource is renamed to RecoverySourceService.
Closes#16605
This commit moves IndicesRequestCache into o.e.indics and makes all API in this
class package private. All references to SearchReqeust, SearchContext etc. have been factored
out and relevant glue code has been added to IndicesService. The IndicesRequestCache is not a
simple class without any hard dependencies on ThreadPool nor SearchService or IndexShard. This now
allows to add unittests.
This commit also removes two settings `indices.requests.cache.clean_interval` and `indices.fielddata.cache.clean_interval`
in favor of `indices.cache.clean_interval` which cleans both caches.
That is like some kind of cardinal sin or something, right?
We had two violations though they weren't super likely to be keys in a hashmap
any time soon.
This commit forbids the usage of java.security.MessageDigest#clone() in
favor of getting a thread local instance using
org.elasticsearch.common.hash.MessageDigests. This is to prevent running
into java.lang.CloneNotSupportedExceptions for message digest providers
that do not support clone.
Closes#16543
This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.
Fields can be selected in the processor as well.
Close#16303
For the ongoing search refactoring (#10217) the PhraseSuggestionBuilder
gets a way of parsing from xContent that will eventually replace the
current SuggestParseElement. This PR adds the fromXContent method
to the PhraseSuggestionBuilder and also adds parsing code for the
common suggestion parameters to SuggestionBuilder.
Also adding links from the Suggester implementations registeres in the
Suggesters registry to the corresponding prototype that is going to
be used for parsing once the refactoring is done and we switch from
parsing on shard to parsing on coordinating node.
140 characters is our official line length from CONTRIBUTING.md. It is a
little longer than fits well in github but you can use a userstyle to fix
that. It is perfect for Eclipse and unified diffs but too wide for side by
side diffs. Regardless, its the official limit we've had for years. We just
haven't enforced it automatically and we haven't enforced it using github
because line limits are hard to notice there.
This only hits about 2/3 of the java files - those that didn't have failures
already. If they did have a failure it suppresses them. We should pick files
off that list as time goes on.
For posterity I generated the suppressions by running checkstyle with
ignoreFailures = true, piping the output to a file, and then running it
through this:
perl -ne 'print if s{.*\[ant:checkstyle\] /.+/elasticsearch/}{ <suppress files="} && s{\.java.+}{\.java" checks="LineLength" />}' | uniq
Early versions of JDK 8 have a compiler bug that prevents assignment to
a definitely unassigned final variable inside the body of a lambda. This
commit adds an early-out to the build process which also gives a useful
error message.
Closes#16418
This commit removes and forbids the use of lowercase ells ('l') in long
literals because they are often hard to distinguish from the digit
representing one ('1').
Closes#16329
This adds a small pile of checkstyle checks that all pass without any changes.
It moves some checks from ForbiddenPatterns that were java only into
checkstyle.
Don't duplicate forbiddenPatterns in checkstyle
* `test.cluster.node.seed` was only used in one place where it wasn't adding value and was now replaced with a constant
* `tests.portsfile` is now an official setting and has been renamed to `node.portsfile`
this commit also convert `search.default_keep_alive` and `search.keep_alive_interval` to the new settings infrastrucutre
This is consistent with what happens in elasticsearch.sh/.bat, and it means
tests will work even if there is a crazy "system" JNA installed on the machine.
The rest test framework, because it used to be tightly integrated with
ESIntegTestCase, currently expects the addresses for the test cluster to
be passed using the transport protocol port. However, it only uses this
to then find the http address.
This change makes ESRestTestCase extend from ESTestCase instead of
ESIntegTestCase, and changes the sysprop used to tests.rest.cluster,
which now takes the http address.
closes#15459
Site plugins used to be used for things like kibana and marvel, but
there is no longer a need since kibana (and marvel as a kibana plugin)
uses node.js. This change removes site plugins, as well as the flag for
jvm plugins. Now all plugins are jvm plugins.
With this commit we do not check only if an endpoint is up but
we also check that the cluster status is green. Previously,
builds sporadically failed to pass this condition.
1. Uses forbidden patterns to prevent things from referencing
java.io.Serializable or from mentioning serialVersionUID.
2. Uses -Xlint:-serial so we don't have to hear from javac that we aren't
declaring serialVersionUID on any classes that we make that happen to extend
Serializable.
3. Remove Serializable and serialVersionUID declarations.
I didn't use forbidden apis because it doesn't look like it has a way to ban
explicitly implementing Serializable. If you try to ban Serializable with
forbidden apis you end up banning all Exceptions and all Strings.
Closes#15847
This commit removes and now forbids use of
org.apache.lucene.index.IndexWriter#isLocked as this method was
deprecated in LUCENE-6508. The deprecation is due to the fact that
checking if a lock is held before acquiring that lock is subject to a
time-of-check-to-time-of-use race condition. There were three uses of
IndexWriter#isLocked in the code base:
- a logging statement in o.e.i.e.InternalEngine where we are already in
an exceptional condition that the lock was held; in this case,
logging whether or not the directory is locked is superfluous
- in o.e.c.l.u.VersionsTests where we were verifying that a write lock
is released upon closing an IndexWriter; in this case, the check is
not needed as successfully closing an IndexWriter releases its
write lock
- in o.e.t.s.MockFSDirectoryService where we were verifying that a
directory is not write-locked before (implicitly) trying to obtain
such a write lock in org.apache.lucene.index.CheckIndex#<init> (this
is the exact type of a situation that is subject to a race
condition); in this case we can proceed by just (implicitly) trying
to obtain the write lock and failing if we encounter a
LockObtainFailedException
This commit removes and now forbids all uses of
java.util.concurrent.ThreadLocalRandom across the codebase. The
underlying issue with ThreadLocalRandom is that it can not be
seeded. This means that if ThreadLocalRandom is used in production code,
then tests that cover any code path containing ThreadLocalRandom will be
prevented from being reproducible by use of ThreadLocalRandom. Instead,
using org.elasticsearch.common.random.Randomness#get will give
reproducible sources of random when running under tests and otherwise
still give an instance of ThreadLocalRandom when running as production
code.
This was originally intended to be general purpose in #15555, but
that still had problems. Instead, this change fixes the issue explicitly
for slf4j-api, since that is the problematic dep that is not actually
included in the distributions.
This change adds a Fixture class for use by gradle. A Fixture is an
external process that integration tests will use. It can be added as a
dependsOn for integTest, and will automatically be shutdown upon success
or failure, as well as relevant information dumped on failure. There is
also an example fixture in this change.
This new task allows setting code, similar to a doLast or doFirst,
except it is specifically geared at running ant (and thus called doAnt).
It adjusts the ant logging while running the ant so that the log
level/behavior can be tweaked, and automatically buffers based on gradle
logging level, and dumps the ant output upon failure.
This fixes the `lenient` parameter to be `missingClasses`. I will remove this boolean and we can handle them via the normal whitelist.
It also adds a check for sheisty classes (jar hell with the jdk).
This is inspired by the lucene "sheisty" classes check, but it has false positives. This check is more evil, it validates every class file against the extension classloader as a resource, to see if it exists there. If so: jar hell.
This jar hell is a problem for several reasons:
1. causes insanely-hard-to-debug problems (like bugs in forbidden-apis)
2. hides problems (like internal api access)
3. the code you think is executing, is not really executing
4. security permissions are not what you think they are
5. brings in unnecessary dependencies
6. its jar hell
The more difficult problems are stuff like jython, where these classes are simply 'uberjared' directly in, so you cant just fix them by removing a bogus dependency. And there is a legit reason for them to do that, they want to support java 1.4.
This change removes hardcoded ports from cluster formation. It passes
port 0 for http and transport, and then uses a special property to have
the node log the ports used for http and transport (just for tests).
This does not yet work for multi node tests. This brings us one step
closer to working with --parallel.
This commit removes and now forbids all uses of
Collections#shuffle(List) and Random#<init>() across the codebase. The
rationale for removing and forbidding these methods is to increase test
reproducibility. As these methods use non-reproducible seeds, production
code and tests that rely on these methods contribute to
non-reproducbility of tests.
Instead of Collections#shuffle(List) the method
Collections#shuffle(List, Random) can be used. All that is required then
is a reproducible source of randomness. Consequently, the utility class
Randomness has been added to assist in creating reproducible sources of
randomness.
Instead of Random#<init>(), Random#<init>(long) with a reproducible seed
or the aforementioned Randomess class can be used.
Closes#15287
Wildcard imports are terrible, they cause ambiguity in the code,
make it not compile with the future versions of java in many cases.
We should simply fail the build on this, it is messiness, caused by
messy Intellij configuration
We have some tests which have crazy dependencies, like on other plugins.
This change adds a "messy-test" gradle plugin which can be used for qa
projects that these types of tests can run in. What this adds over
regular standalone tests is the plugin properties and metadata on the
classpath, so that the plugins are properly initialized.
We have eclipse settings added to all projects when running gradle
eclipse, but buildSrc is its own special project that is not
encapsulated by allprojects blocks. This adds eclipse settings to
buildSrc.
This is a relic from shading where it was trickier to implement.
Third party signatures are already in e.g. the test list, there
is no reason to separate them out.
Instead, we could have a third party signatures that does
something different... like keep tabs on third party libraries.
This commit removes and now forbids all uses of the type-unsafe empty
Collections fields Collections#EMPTY_LIST, Collections#EMPTY_MAP, and
Collections#EMPTY_SET. The type-safe methods Collections#emptyList,
Collections#emptyMap, and Collections#emptySet should be used instead.
Typical failure:
```
:test-framework:dependencyLicenses (Thread[main,5,main]) started.
:test-framework:dependencyLicenses
Executing task ':test-framework:dependencyLicenses' (up-to-date check took 0.0 secs) due to:
Task has not declared any outputs.
:test-framework:dependencyLicenses FAILED
:test-framework:dependencyLicenses (Thread[main,5,main]) completed. Took 0.023 secs.
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':test-framework:dependencyLicenses'.
> Licences dir /mnt/jenkins/workspace/es_core_master_strong/test-framework/licenses does not exist, but there are dependencies
```
Related to #15168
This change attempts to simplify the gradle tasks for precommit. One
major part of that is using a "less groovy style", as well as being more
consistent about how tasks are created and where they are configured. It
also allows the things creating the tasks to set up inter task
dependencies, instead of assuming them (ie decoupling from tasks
eleswhere in the build).
We had increased this in maven, but it was lost in the transition to
gradle. This change adds it as a configurable setting the the logger for
randomized testing and bumps it to 25.
The current delay waits until later after normal configuration, but
still just after resolution (ie when paths would be known for
dependencies but not actual execution). This delays the checks further
to be done right before we actually execute the copy task.
closes#15068
Currently if running with --info, the command line for ES, along with
env vars, are logged before they may be ammended to add debug options.
This moves the adding JAVA_OPTS to before we print the command.
This adds the standalone tests so they will compile (and thus can be
modified with import completion) within IntelliJ. It also explicitly
sets up buildSrc as a module.
Note that this does *not* mean eg evil-tests can be run from intellij.
These are special tests that require special settings (eg disabling
security manager). They need to be run from the command line.
closes#15075
The Exec task outputs stdout/stderr to the standard streams by default.
However, to keep output short, we currently capture this, and only
output if the task failed. This change makes a small wrapper around Exec
to facilitate this behavior anywhere we use Exec.
This change delays the lookup for whatever is passed to extra config as
the source file to happen at execution time. This allows using eg a task
which generates a file, but maintains the checks that the file is not a
dir and that it exists at runtime.
This change allows copy extra files into the integ test cluster before
it runs. However, it explicitly forbids overwriting elasticsearch.yml,
since that is generated.
The current mechanism for adding plugins to the integTest cluster is to
have a FileCollection. This works well for the integTests for a single
plugin, which automatically adds itself to be installed. However, for qa
tests where many plugins may be installed, and from other projects, it
is cumbersome to add configurations, dependencies and dependsOn
statements over and over. This simplifies installing a plugin from
another project by moving this common setup into the cluster
configuration code.
The current wait condition for an integ test cluster being up is a
simple http get on the root path for elasticsearch. However, it is
useful to allow having arbitrary wait conditions. This change reworks
the wait task to first check that each node process started successfully
and has a socket up, followed by an arbitrary wait condition which
defaults to the current http get.
Also, cluster settings are allowed to be added, and overriden. Finally,
custom setup commands are made relative to the elasticsearch home dir
for each node.
This change removes the subdirectory support for extra-plugins, and
replaces it with an iteration of sibling directories with the prefix
"extra-plugin-".
This change adds back the last of the missing test options to the juni4
wrapper. leaveTemporary is important in that setting it to true (which
is how we had it set in maven) removes the warnings we currently get
about a leftover file that cannot be deleted (from jna).
This change adds back the multi node smoke test, as well as making the
cluster formation for any test allow multiple nodes. The main changes in
cluster formation are abstracting out the node specific configuration to
a helper struct, as well as making a single wait task that waits for all
nodes after their start tasks have run. The output on failure was also
improved to log which node's info is being printed.
* Forbid System.setProperties & co in forbidden APIs.
* Ban property write access at runtime with security manager.
Plugins that need to modify system properties will need to request permission in their plugin-security.policy
As part of the refactoring to allow --debug-jvm with gradle run, the way
java options are passed for integ tests was changed. However, we need to
make sure the jvm argline passed goes to ES_GC_OPTS because this
allows overriding things like which garbage collector we run, which we
do for testing from jenkins. This change adds back ES_GC_OPTS.
Because jar hell checks run during static initialization of tests, a
failure will result in all tests failing. However, only the first test
within each jvm shows the jarhell failure, and later tests get a class
not found exception for the class that failed to load when static init
failed.
This change adds a task to run as part of precommit, which checks the
test runtime classpath for jarhell.
closes#14721
If there is a failure in the elasticsearch start script, we currently
completely lose the failure. This is due to how spawning works with ant.
This change avoids the issue by introducing an intermediate script,
built dynamically before running ant exec, which runs elasticsearch and
redirects the output to a log file. This essentially causes us to run
elasticsearch in the foreground and capture the output, but at the same
time keep a running script which ant can pump streams from (which will
always be empty).
There were a number of subtle issues with the existing logging that
wraps events from Junit4 and ant. This change:
* Tweaks at what level certain events are logged
* Fixes -Dtests.output=always to force everything to be logged
* Makes -Dtests.class imply -Dtests.output=always
* Captures ant logging from junit4, so that direct jvm output will be
logged on failure when not using gradle info logging
When the build tries to start an elasticsearch instance but the start fails
if it fails to find the log file then we log a line about how we can't find
the file.
Sometimes when running elasticsearch, it is useful to attach a remote
debugger. This change adds a --debug-jvm option (the same name gradle
uses for its tests debug option), which adds java agent config for a
remote debugger. The configuration is set to hava java suspend until the
remove debugger is attached.
closes#14772
If you build elasticsearch without a git repository it was creating a null
shortHash which was causing Elasticsearch not to be able to form transport
connections.
Closes#14748
This makes the rest tests **tons** more responsive.
Also stop test progress output from jumping by using formating. The progess
now looks like:
Suites [004/549], Tests [0019|0|0], in 1.58s J2 completed UpdateNumberOfReplicasTests
The changes included are:
1. The suites, total tests, and JVM id are now padded based on their maximum
size. The maximum number of tests is just a guess because that data isn't
easily available when the suite starts. JVM id rarely matters because only
the most crazy individuals use more than 10 JVMs.
2. The suite information is reordered. Now its runtime, jvm id, suite name,
and, optionally, method name. This reordering is useful because the thing
that varies in length, the suite and method name, are on the right hand
side. This means that nothing jumps around during the test run.
We recently got a run command with gradle, but it is sometimes useful to
run ES with a specific plugin. This is a start, by making each esplugin
have a run command which installs the plugin and runs elasticsearch in
the foreground.
This makes forbidden patterns a little smarter, so it does not need to
run on every build. It works because the marker file timestamp will be
compared against the source files (which are the inputs to forbidden
patterns).
closes#14788
We currently enforce JAVA_HOME is set, but use gradle's java version to
compile, and JAVA_HOME version to test. This change makes the compile
and test both use JAVA_HOME, and clarifies which java version is used by
gradle and that used for the compile/test.
With gradle, deploying to maven means first generating poms. These are
filled in based on dependencies of the project. Recently, we started
disallowing transitive dependencies. However, this configuration does
not translate to maven poms because maven has no concept of excluding
all transitive dependencies.
This change adds exclusions for each of the transitive deps of each
dependency being added to the maven pom. It does so by creating dummy
configurations for each direct dependency (which does not have
transitive deps excluded), so that we can iterate the transitive deps
when building the pom.
Note, this should be simpler (just modifying maven's pom model), but
gradle tries to hide that from their api, causing us to need to
manipulate the xml directly.
https://discuss.gradle.org/t/modifying-maven-pom-generation-to-add-excludes/12744
Currently elasticsearch in integ tests is started using an ant task on
windows, or gradle exec on everything else. However, gradle exec has
some flaws, one being Ctrl-C does not run finalizedBy tasks, which means
interrupting integ tests will leak a jvm. This change makes all systems
use ant exec. One caveat is, if there is any output by the jvm, we lose
it in ant bit heaven. But this is no different than what we had with
gradle. In the future, we should look at using a separate thread to
pump streams from the elasticsearch process.
closes#14701
closes#14726
Squashed commit of the following:
commit 5b591e98570e3fa481b2816a44063b98bff36ddf
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 13 00:54:08 2015 -0500
add assumption for self-signing in PluginManagerTests
commit ed11e5371b6f71591dc41c6f60d033502cfcf029
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 13 00:20:59 2015 -0500
show error output from integ test startup
commit d8b187a10e95d89a0e775333dcbe1aaa903fb376
Author: Robert Muir <rmuir@apache.org>
Date: Thu Nov 12 22:14:11 2015 -0500
fix gradle check under jigsaw
If we use JAVA_HOME consistently for tests, we can run tests with a
different version of java than gradle runs with. For example, this
enables running tests with jigsaw, but building with java 8. The only
caveat is intellij does not set JAVA_HOME. This change enforces
JAVA_HOME is set, but ignores for intellij.
This moves the min java version used by elasticsearch to one place, a
constant in BuildPlugin. For me on java 9, this fixed my jar to have the
correct target/source versions.
closes#14702
Transitive dependencies can be confusing and hard to deal with when
conflicts arise between them. This change removes transitive
dependencies from elasticsearch, and forces any dependency conflicts to
be resolved manually, instead of automatically by gradle.
closes#14627
In gradle 2.7 (or groovy 2.3.10, not sure which), there appears to be a
bug on linux where using a fully qualified class (without an import
statement) does not work. This change forces gradle 2.8 or above. It
also moves the logic around a little for the version check so the build
info is printed before checks against that info.