The current rest backcompat tests, which run against a mixed cluster of
5.x and 6.0 nodes, depend on snapshot builds of 5.x. However, this has
the potential for inconsistency that results in CI failures, and happens
quite often, whenever some backcompat logic is added to 5.x, but the bwc
test on master fails because the 5.x code has not yet been published as
a snapshot.
This change creates a git clone of the 5.x branch,
builds the zip distribution, and ties that into gradle substitutions for
the 5.x version.
After the removal of the joda time hack we used to have, we can cleanup
the codebase handling in security, jarhell and plugins to be more picky
about uniqueness. This was originally in #18959 which was never merged.
closes#18959
Gradle's finalizedBy on tasks only ensures one task runs after another,
but not immediately after. This is problematic for our integration tests
since it allows multiple project's integ test clusters to be
simultaneously. While this has not been a problem thus far (gradle 2.13
happened to keep the finalizedBy tasks close enough that no clusters
were running in parallel), with gradle 3.3 the task graph generation has
changed, and numerous clusters may be running simultaneously, causing
memory pressure, and thus generally slower tests, or even failure if the
system has a limited amount of memory (eg in a vagrant host).
This commit reworks how integ tests are configured. It adds an
`integTestCluster` extension to gradle which is equivalent to the current
`integTest.cluster` and moves the rest test runner task to
`integTestRunner`. The `integTest` task is then just a dummy task,
which depends on the cluster runner task, as well as the cluster stop
task. This means running `integTest` in one project will both run the
rest tests, and shut down the cluster, before running `integTest` in
another project.
This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.
While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.
Relates #19388
Today when users start Elasticsearch with their Java configuration
pointing to a pre-Java 8 install, they encounter a cryptic message:
Exception in thread "main" java.lang.UnsupportedClassVersionError:
org/elasticsearch/bootstrap/Elasticsearch : Unsupported major.minor
version 52.0
They often think that they have Java 8 installed but if their JAVA_HOME
or other configuration is causing them to start with a pre-Java 8
install, this error message does not help them.
We introduce a Java version checker that runs on Java 6 as part of the
startup scripts. If the Java version is pre-Java 8, we can display a
helpful error message to the user informing them of the Java version
that the runtime was started with. Otherwise, Elasticsearch starts as it
does today.
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.
The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.
As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.
In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.
See #19388
By default, the JVM GC log file grows without
limitation. This is inconvenient for a long running
process like Elasticsearch.
With this commit we add an example configuration
for a rotating GC log in `conig/jvm.options`.
For certain situations, end-users need the base path for Elasticsearch
logs. Exposing this as a property is better than hard-coding the path
into the logging configuration file as otherwise the logging
configuration file could easily diverge from the Elasticsearch
configuration file. Additionally, Elasticsearch will only have
permissions to write to the log directory configured in the
Elasticsearch configuration file. This commit adds a property that
exposes this base path.
One use-case for this is configuring a rollover strategy to retain logs
for a certain period of time. As such, we add an example of this to the
documentation.
Additionally, we expose the property es.logs.cluster_name as this is
used as the name of the log files in the default configuration.
Finally, we expose es.logs.node_name in cases where node.name is
explicitly set in case users want to include the node name as part of
the name of the log files.
Relates #22625
The config template that ships with Elasticsearch distributions contains
links to various pieces of documentation. Links go out of date and get
broken. This commit removes such links from the config template.
Relates #22553
This commit reverts switching to the unpooled allocator (for now) to let
some benchmarks run to see if this is the source of an increase in GC
times.
Relates #22452
Right now closing a shard looks like it strands refresh listeners,
causing tests like
`delete/50_refresh/refresh=wait_for waits until changes are visible in search`
to fail. Here is a build that fails:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+multi_cluster_search+multijob-darwin-compatibility/4/console
This attempts to fix the problem by implements `Closeable` on
`RefreshListeners` and rejecting listeners when closed. More importantly
the act of closing the instance flushes all pending listeners
so we shouldn't have any stranded listeners on close.
Because it was needed for testing, this also adds the number of
pending listeners to the `CommonStats` object and all API to which
that flows: `_cat/nodes`, `_cat/indices`, `_cat/shards`, and
`_nodes/stats`.
Netty plays a lot of games with recycling byte buffers in thread local
caches, and using a pooled byte buffer allocator to reduce pressure on
the garbage collector.
The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.
The pooled byte buffer allocator has problems itself. It sizes the pool
based on the number of runtime processors and can indeed grab a very
large percentage of the heap (in some cases 50% or more). Additionally,
the Netty project continues to struggle with leaks here.
We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.
This change proposes to disable the recycler, and to disable the pooled
byte buffer allocator. I think that disabling these features will return
some of the stablity that these features appear to be losing us.
I have done performance testing on my workstation with disabling these
and I do not see a difference in performance. I propose that we make
this change in master and let some nightly benchmarks run to confirm
that there is not a difference in performance. If we are comfortable
with the performance changes, I propose backporting this to all active
branches.
Relates #22452
This change is the first towards providing the ability to store
sensitive settings in elasticsearch. It adds the
`elasticsearch-keystore` tool, which allows managing a java keystore.
The keystore is loaded upon node startup in Elasticsearch, and used by
the Setting infrastructure when a setting is configured as secure.
There are a lot of caveats to this PR. The most important is it only
provides the tool and setting infrastructure for secure strings. It does
not yet provide for keystore passwords, keypairs, certificates, or even
convert any existing string settings to secure string settings. Those
will all come in follow up PRs. But this PR was already too big, so this
at least gets a basic version of the infrastructure in.
The two main things to look at. The first is the `SecureSetting` class,
which extends `Setting`, but removes the assumption for the raw value of the
setting to be a string. SecureSetting provides, for now, a single
helper, `stringSetting()` to create a SecureSetting which will return a
SecureString (which is like String, but is closeable, so that the
underlying character array can be cleared). The second is the
`KeyStoreWrapper` class, which wraps the java `KeyStore` to provide a
simpler api (we do not need the entire keystore api) and also extend
the serialized format to add metadata needed for loading the keystore
with no assumptions about keystore type (so that we can change this in
the future) as well as whether the keystore has a password (so that we
can know whether prompting is necessary when we add support for keystore
passwords).
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.
I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
A previous fix for the handling of paths on Windows related to paths
containing multiple spaces introduced a issue where if JAVA_HOME ends
with a backslash, then Elasticsearch will refuse to start. This is not a
critical bug as a workaround exists (remove the trailing backslash), but
should be fixed nevertheless. This commit addresses this situation while
not regressing the previous fix.
Relates #22132
This commit fixes the handling of spaces in Windows paths. The current
mechanism works fine in a path that contains a single space, but fails
on a path that contains multiple spaces. With this commit, that is no
longer the case.
Relates #21921
Elasticsearch can be run in a few different ways:
- from the command line on Linux and Windows
- as a service on Linux and Windows
on both 32-bit client and 64-bit server VMs. We strive for a great
out-of-the-box experience any of these combinations but today it is
lacking on 32-bit client JVMs and on the Windows service. There are two
deficiencies that arise:
- on any 32-bit client JVM we fail to start out of the box because we
force the server JVM in jvm.options
- when installing the Windows service, the thread stack size must be
specified in jvm.options
This commit attempts to address these deficiencies.
We should continue to force the server JVM because there are systems
where the server JVM is not active by default (e.g., the 32-bit JDK on
Windows). This does mean that if a user tries to run with a client JVM
they will see a failure message at startup but this is the best that we
can do if we want to continue to force the server JVM. Thus, this commit
at least documents this situation.
To improve the situation with installing the Windows service, this
commit adds a default setting for the thread stack size. This default is
chosen based on the default thread stack size across all 64-bit server
JVMs. This means that if a user tries to run with a 32-bit JVM they
could otherwise see significantly higher memory usage (this situation is
complicated, it's really only on Windows where the extra memory usage is
egregious, but cutting into the 32-bit address space on any system is
bad). So this commit makes it so that the out-of-the-box experience is
improved for the Windows service on 64-bit server JVMs and we document
the need to adjust this setting on 32-bit JVMs.
Again, we are focusing on the out-of-the-box experience here and this
means optimizing for the best experience on any 64-bit server JVM as
this covers the vast majority of the user base. The users that are on
32-bit JVMs will suffer a little bit but at least now any user on any
64-bit server JVM can start Elasticsearch out of the box.
Finally, we fix some references to the jvm.options documentation.
Relates #21920
During package install on systemd-based systems, we try to set
vm.max_map_count. On some systems (e.g., containers), users do not have
the ability to tune these parameters from within the container. This
commit provides an option for these users to skip setting such kernel
parameters.
Relates #21899
Our default pattern layout truncates log messages. This is to avoid
blowing disk space from excessively log messages, which can happen if a
message contains a mapping or an large query. Yet, we trunacte from the
beginning which is probably where the most germane information is. This
commit modifies the default pattern layout to trunacte from the end.
Relates #21609
Adds a version constant for it, bwc indices, and a vagrant upgrade-from
version. Also bumps the "upgrade from" version for the backwards-5.0
test and adds `skip`s for tests that don't fail against 5.0 so we skip
them during the backwards testing.
Finally, this skips the "Shrink index via API" test because it fails
consistently for me. Inconsistently for CI, but consistently for me.
I'll work on making it consistent tomorrow.
On some systems these utilities are in /usr/lib/systemd/systemd-sysctl
and /usr/sbin/sysctl, and on others the /usr is dropped. This commit
accounts for that fact.
Our docs claim that we set vm.max_map_count automatically. This is not
quite the case. The story is that on SysV init we set vm.max_map_count
each time the service starts, which is good. On systemd, we create a
sysctl.d conf file that sets vm.map_max_count, but this is only
meaningful if the system is rebooted after package install. This commit
modifies the post-install script so that we run systemd-sysctl so that
the vm.max_map_count change occurs after package install without a
reboot.
Relates #21507
Given that the default is now 1, the comment in the config file was outdated. Also considering that the default value is production ready, we shouldn't list it among the values that need attention when going to production.
Relates to #19964
The environment variable ES_JVM_OPTIONS allows end-users to specify a
custom location for the jvm.options file. Unfortunately, this
environment variable is not exported from the SysV init scripts. This
commit addresses this issue, and includes a test that ES_JVM_OPTIONS and
ES_JAVA_OPTS work for the SysV init packages.
Relates #21445
All plugins currently have their own licenses dir for the
dependencyLicenses task, but core disables this and has the check inside
distribution. This may have been better for maven, but for
gradle it makes more sense to just use the dependencyLicenses task that
automatically exists inside :core, and remove the hacked up version that
is inside distribution.
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproduceability in tests. This change
removes the remnants of that multiplexing support.
On ubuntu 14.04, which uses upstart, where as our debian package uses
sysvinit, there is no stdout/stderr message printed when starting up,
because the start-stop-daemon swallows it.
As Elasticsearch is started to daemonize, we can remove the background
flag from the start-stop-daemon and thus see, if the system does not have
enough memory for starting up - something that happens often on VMs, since
Elasticsearch 5.0 uses 2gb by default instead of one.
Relates #21300
Relates #12716
Today when running gradle clean
:distribution:(integ-test-zip|tar|zip):assemble, the created archive
distribution will be missing the empty plugins directory. This is
because the empty plugins folder created in the build folder for the
copy spec task is created during configuration and then is later wiped
away by the clean task. This commit addresses this issue, by pushing
creation of the directory out of the configuration phase.
Relates #21271
Today when installing Elasticsearch from an archive distribution (tar.gz
or zip), an empty plugins folder is not included. This means that if you
install Elasticsearch and immediately run elasticsearch-plugin list, you
will receive an error message about the plugins directory missing. While
the plugins directory would be created when starting Elasticsearch for
the first time, it would be better to just include an empty plugins
directory in the archive distributions. This commit makes this the
case. Note that the package distributions already include an empty
plugins folder.
Relates #21204
When installing the Windows service, certain settings like the minimum
heap, maximum heap and thread stack size setting must be set. While
there is an error message making mention of this fact, the error message
is not explicit exactly what setting needs to be set. This commit makes
these settings explicit.
Relates #21200
Lucene 6.3 is expected to be released in the next weeks so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
This commit fixes responses to HEAD requests so that the value of the
Content-Length is correct per the HTTP spec. Namely, the value of this
header should be equal to the Content-Length if the request were not a
HEAD request.
This commit also fixes a memory leak on HEAD requests to the main action
that arose from the bytes on a builder not being released due to them
being dropped on the floor to ensure that the response to the main
action did not have a body.
Relates #21123
Before this commit `curl -XHEAD localhost:9200?pretty` would return
`Content-Length: 1` and a body which is fairly upsetting to standards
compliant tools. Now it'll return `Content-Length: 0` with an empty
body like every other `HEAD` request.
Relates to #21075
This commit upgrades the Log4j 2 dependency to version 2.7 and removes
some hacks that we had in place to work around bugs in Log4j 2 version
2.6.2.
Relates #20805
We have a "HUGE HACK" that allows us to publish zip artifacts to
Sonatype's OSS repository without javadoc and source jars. We don't
include those jars because the zip is just a repackaging of the
core and module jars for which we already publish the javadoc and
source jars. So we have a hack to publish the zip artifact when the
pom says the project is of type 'pom'.
I'm not sure why we need this pom instead of the pom generated by
nebula, but if we are going to have it then we need to populate it
with appropriate stuff like project name, description, and url.