Commit Graph

294 Commits

Author SHA1 Message Date
Rory Hunter 8fb9bed078 Fix compilation error 2020-02-20 14:39:44 +00:00
Maria Ralli ba8d6d1fb5 Remove Xlint exclusions from gradle files
Backport of #52542.

This commit is part of issue #40366 to remove disabled Xlint warnings
from gradle files. In particular, it removes the Xlint exclusions from
the following files:

- benchmarks/build.gradle
- client/client-benchmark-noop-api-plugin/build.gradle
- x-pack/qa/rolling-upgrade/build.gradle
- x-pack/qa/third-party/active-directory/build.gradle
- modules/transport-netty4/build.gradle

For the first three files no code adjustments were needed. For
x-pack/qa/third-party/active-directory move the suppression at the code
level. For transport-netty4 replace the variable arguments with
ArrayLists and remove any redundant casts.
2020-02-20 14:12:05 +00:00
Armin Braun 91e938ead8
Add Trace Logging of REST Requests (#51684) (#52015)
Being able to trace log all REST requests to a node would make debugging
a number of issues a lot easier.
2020-02-07 09:03:20 +01:00
Maria Ralli 8d3e73b3a0 Add host address to BindTransportException message (#51269)
When bind fails, show the host address in addition to the port. This
helps debugging cases with wrong "network.host" values.

Closes #48001
2020-02-04 17:13:19 +00:00
Ioannis Kakavas ba3051a50f
Mute Netty4ClientYamlTestSuiteIT in FIPS 140 (#51536)
rest-api-spec/test/10_basic.yml would check that transport_types is
`netty4` but we run FIPS 140 tests with default distribution and
transport_types is `security4`
2020-01-29 08:16:47 +02:00
Armin Braun 91ac87d75b
Stop Allocating Buffers in CopyBytesSocketChannel (#49825) (#49832)
* Stop Allocating Buffers in CopyBytesSocketChannel (#49825)

The way things currently work, we read up to 1M from the channel
and then potentially force all of it into the `ByteBuf` passed
by Netty. Since that `ByteBuf` tends to by default be `64k` in size,
large reads will force the buffer to grow, completely circumventing
the logic of `allocHandle`.

This seems like it could break
`io.netty.channel.RecvByteBufAllocator.Handle#continueReading`
since that method for the fixed-size allocator does check
whether the last read was equal to the attempted read size.
So if we set `64k` because that's what the buffer size is,
then wirte `1M` to the buffer we will stop reading on the IO loop,
even though the channel may still have bytes that we can read right away.

More imporatantly though, this can lead to running OOM quite easily
under IO pressure as we are forcing the heap buffers passed to the read
to `reallocate`.

Closes #49699
2019-12-04 19:36:52 +01:00
Armin Braun 996cddd98b
Stop Copying Every Http Request in Message Handler (#44564) (#49809)
* Copying the request is not necessary here. We can simply release it once the response has been generated and a lot of `Unpooled` allocations that way
* Relates #32228
   * I think the issue that preventet that PR  that PR from being merged was solved by #39634 that moved the bulk index marker search to ByteBuf bulk access so the composite buffer shouldn't require many additional bounds checks  (I'd argue the bounds checks we add, we save when copying the composite buffer)
* I couldn't neccessarily reproduce much of a speedup from this change, but I could reproduce a very measureable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)
2019-12-04 08:41:42 +01:00
Yannick Welsch bd007271cf Avoid double-wrapping allocator (#49534)
When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.
2019-11-27 09:25:32 +01:00
Henning Andersen 49bb5fb642 Netty4: switch to composite cumulator (#49478)
The default merge cumulator used in netty transport leads to additional
GC pressure and memory copying when a message that exceeds the chunk
size is handled. This is especially a problem on G1 GC, since we get
many "humongous" allocations and that can in theory cause real memory
circuit breaker to break unnecessarily.
2019-11-22 18:14:10 +01:00
Yogesh Gaikwad 9ed7352a12
Add Sysprop to Adjust IO Buffer Size (#48267) (#48667)
The 1MB IO-buffer size per transport thread is causing trouble in
some tests, albeit at a low rate. Reducing the number of transport
threads was not enough to fully fix this situation.
Allowing to configure the size of the buffer and reducing it by
more than an order of magnitude should fix these tests.

Closes #46803
2019-10-30 14:19:54 +11:00
Tim Brooks 45e42f4e18
Upgrade to Netty 4.1.43 (#48484)
With this update we can remove the mitigation in our custom allocator
which forces heap buffer allocations.
2019-10-25 10:17:25 -06:00
Tim Brooks c0b545f325
Make BytesReference an interface (#48486)
BytesReference is currently an abstract class which is extended by
various implementations. This makes it very difficult to use the
delegation pattern. The implication of this is that our releasable
BytesReference is a PagedBytesReference type and cannot be used as a
generic releasable bytes reference that delegates to any reference type.
This commit makes BytesReference an interface and introduces an
AbstractBytesReference for common functionality.
2019-10-24 15:39:30 -06:00
Tim Brooks c1f6aff5bb
Remove default netty allocator empty assertions (#48356)
This commit removes a problematic assertion that the netty default
allocator is not used. This assertion is problematic because any other
test can cause this task to fail by touching the default allocator. We
assert that we are using heap buffers in the channel.
2019-10-22 20:22:32 -06:00
Tim Brooks 547e399dbf
Remove option to enable direct buffer pooling (#48310)
This commit removes the option to change the netty system properties to
reenable the direct buffer pooling. It also removes the need for us to
disable the buffer pooling in the system properties file. Instead, we
programmatically craete an allocator that is used by our networking
layer.

This commit does introduce an Elasticsearch property which allows the
user to fallback on the netty default allocator. If they choose this
option, they can configure the default allocator how they wish using the
standard netty properties.
2019-10-21 19:15:50 -06:00
Tim Brooks ad233e3e38
Add test for CopyBytesSocketChannel (#46031)
Currently we use a custom CopyBytesSocketChannel for interfacing with
netty. We have integration tests that use this channel, however we never
verify the read and write behavior in the face of potential partial
writes. This commit adds a test for this behavior.
2019-08-27 11:25:22 -05:00
Jason Tedor de6b6fd338
Add node.processors setting in favor of processors (#45885)
This commit namespaces the existing processors setting under the "node"
namespace. In doing so, we deprecate the existing processors setting in
favor of node.processors.
2019-08-22 22:18:37 -04:00
Tim Brooks ae06a9399a
Fix bug in copying bytes for socket write (#45463)
Currently we take the array of nio buffers from the netty channel
outbound buffer and copy their bytes to a direct buffer. In the process
we mutate the nio buffer positions. It seems like netty will continue to
reuse these buffers. This means than any data that is not flushed in a
call is lost. This commit fixes this by incrementing the positions after
the flush has completed. This is similar to the behavior that
SocketChannel would have provided and netty relied upon.

Fixes #45444.
2019-08-12 15:59:26 -06:00
Tim Brooks af908efa41
Disable netty direct buffer pooling by default (#44837)
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.

This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
2019-08-08 15:10:31 -06:00
Yannick Welsch 7aeb2fe73c Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
2019-08-06 10:45:44 +02:00
Tim Brooks 984ba82251
Move nio channel initialization to event loop (#45155)
Currently in the transport-nio work we connect and bind channels on the
a thread before the channel is registered with a selector. Additionally,
it is at this point that we set all the socket options. This commit
moves these operations onto the event-loop after the channel has been
registered with a selector. It attempts to set the socket options for a
non-server channel at registration time. If that fails, it will attempt
to set the options after the channel is connected. This should fix
#41071.
2019-08-02 17:31:31 -04:00
Armin Braun 9450505d5b
Stop Passing Around REST Request in Multiple Spots (#44949) (#45109)
* Stop Passing Around REST Request in Multiple Spots

* Motivated by #44564
  * We are currently passing the REST request object around to a large number of places. This works fine since we simply copy the full request content before we handle the rest itself which is needlessly hard on GC and heap.
  * This PR removes a number of spots where the request is passed around needlessly. There are many more spots to optimize in follow-ups to this, but this one would already enable bypassing the request copying for some error paths in a follow up.
2019-08-02 07:31:38 +02:00
Tim Brooks aff66e3ac5
Add Cors integration tests (#44361)
This commit adds integration tests to ensure that the basic cors
functionality works for the netty and nio transports.
2019-07-31 14:24:23 -06:00
Armin Braun ac11073183
Optimize Netty Frame Decoding (#44664) (#45001)
* We should not create a new wrapper object if there's no bytes in the `ByteBuf`
* We should not create a new wrapped `ByteBuf` if it can't contain a message anyway because it doesn't even have enough bytes for a header left
2019-07-30 15:25:52 +02:00
Armin Braun 4495140d1f
Release Pooled Buffers Earlier for HTTP Requests (#44952) (#44991)
* We should release the buffers right after copying and not only do so after we did all the request handling on the copy
* Relates #44564
2019-07-30 10:30:01 +02:00
Jason Tedor 39c5f98de7
Introduce test issue logging (#44477)
Today we have an annotation for controlling logging levels in
tests. This annotation serves two purposes, one is to control the
logging level used in tests, when such control is needed to impact and
assert the behavior of loggers in tests. The other use is when a test is
failing and additional logging is needed. This commit separates these
two concerns into separate annotations.

The primary motivation for this is that we have a history of leaving
behind the annotation for the purpose of investigating test failures
long after the test failure is resolved. The accumulation of these stale
logging annotations has led to excessive disk consumption. Having
recently cleaned this up, we would like to avoid falling into this state
again. To do this, we are adding a link to the test failure under
investigation to the annotation when used for the purpose of
investigating test failures. We will add tooling to inspect these
annotations, in the same way that we have tooling on awaits fix
annotations. This will enable us to report on the use of these
annotations, and report when stale uses of the annotation exist.
2019-07-18 05:33:33 +09:00
Tim Brooks 6b1a769638
Move CORS Config into :server package (#43779)
This commit moves the config that stores Cors options into the server
package. Currently both nio and netty modules must have a copy of this
config. Moving it into server allows one copy and the tests to be in a
common location.
2019-07-16 17:50:42 -06:00
Yannick Welsch 2ee07f1ff4 Simplify port usage in transport tests (#44157)
Simplifies AbstractSimpleTransportTestCase to use JVM-local ports  and also adds an assertion so
that cases like #44134 can be more easily debugged. The likely reason for that one is that a test,
which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle
daemon) kept increasing Gradle worker IDs, causing an overflow at some point.
2019-07-11 13:35:37 +02:00
David Turner df0f0b3d40
Rename autoMinMasterNodes to autoManageMasterNodes (#42789)
Renames the `ClusterScope` attribute `autoMinMasterNodes` to reflect its
broader meaning since 7.0.

Backport of the relevant part of #42700 to `7.x`.
2019-06-03 12:12:07 +01:00
Jason Tedor 371cb9a8ce
Remove Log4j 1.2 API as a dependency (#42702)
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
2019-05-30 16:08:07 -04:00
Armin Braun 47d50c6774
Fix Class Load Order in Netty4Plugin (#42591) (#42703)
* Don't force the logger in the Netty4Plugin class already, at this point log4j might not be fully initialized.
   * The call was redundant anyway since we do the same thing in the Netty4Transport and Netty4HttpServerTransport classes already and there we do it properly after setting up log4j by initilizing the loggers
* Relates #42532
2019-05-30 14:55:55 +02:00
Armin Braun 1beed9e71f
Adjust use of Deprecated Netty API (#42613) (#42657)
* With the recent upgrade to Netty 4.1.36 this method became deprecated and I made the advised change to fix the deprecation
2019-05-30 12:42:39 +02:00
Alan Woodward 4cca1e8fff Correct spelling of MockLogAppender.PatternSeenEventExpectation (#41893)
The class was called PatternSeenEventExcpectation. This commit
is a straight class rename to correct the spelling.
2019-05-07 17:28:51 +01:00
Tim Brooks b4bcbf9f64
Support http read timeouts for transport-nio (#41466)
This is related to #27260. Currently there is a setting
http.read_timeout that allows users to define a read timeout for the
http transport. This commit implements support for this functionality
with the transport-nio plugin. The behavior here is that a repeating
task will be scheduled for the interval defined. If there have been
no requests received since the last run and there are no inflight
requests, the channel will be closed.
2019-05-02 09:48:52 -06:00
Michael Morello 75283294f5 Fix multi-node parsing in voting config exclusions REST API (#41588)
Fixes an issue where multiple nodes where not properly parsed in the voting config exclusions REST API.

Closes #41587
2019-04-27 12:20:03 +02:00
Armin Braun ebcb925afb
Cleanup Duplication in Netty4 Module (#40148) (#40563)
* Just drying up the listener/promise handling a little
2019-03-28 00:57:58 +01:00
Tim Brooks ab44f5fd5d
Add InboundHandler for inbound message handling (#40430)
This commit adds an InboundHandler to handle inbound message processing.
With this commit, this code is moved out of the TcpTransport.
Additionally, finer grained unit tests are added to ensure that the
inbound processing works as expected
2019-03-27 12:33:26 -06:00
Tim Brooks 3860ddd1a4
Move outbound message handling to OutboundHandler (#40336)
Currently there are some components of message serializer and sending
that still occur in TcpTransport. This commit makes it possible to
send a message without the TcpTransport by moving all of the remaining
application logic to the OutboundHandler. Additionally, it adds unit
tests to ensure that this logic works as expected.
2019-03-27 11:47:36 -06:00
Armin Braun 13d76239a0
Use Netty ByteBuf Bulk Operations for Faster Deserialization (#40158) (#40339)
* Use bulk methods to read numbers faster from byte buffers
2019-03-24 19:08:51 +01:00
Tim Brooks 0b50a670a4
Remove transport name from tcp channel (#40074)
Currently, we maintain a transport name ("mock-nio", "nio", "netty")
that is passed to a `TcpTransportChannel` when a request is received.
The value of this name is to associate with the task when we register a
task with the task manager. However, it is only possible to run ES with
one transport, so having an implementation specific name is unnecessary.
This commit removes the name and replaces it with the generic
"transport".
2019-03-15 12:04:13 -06:00
Armin Braun f5da028a3d
Chunk + Throttle Netty Writes (#39286) (#39778)
* Chunk large writes and throttle on a non-writable channel to reduce direct memory usage by Netty
2019-03-07 07:24:08 +01:00
Armin Braun aaecaf59a4
Optimize Bulk Message Parsing and Message Length Parsing (#39634) (#39730)
* Optimize Bulk Message Parsing and Message Length Parsing

* findNextMarker took almost 1ms per invocation during the PMC rally track
  * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search
* It is unnecessary to instantiate an object (the input stream wrapper) and throw it away, just to read the `int` length from the message bytes
  * Fixed by adding bulk `int` read to BytesReference
2019-03-06 08:13:15 +01:00
Marios Trivyzas 11fe8cd16f
[Tests] Fix flakiness by ensuring stable cluster (#39300) (#39356)
In integration tests where `setBootstrapMasterNodeIndex()` is used in
combination with `autoMinMasterNodes = false` the cluster can start
bootstrapping once the number of nodes set with the
`setBootstrapMasterNodeIndex` have been started but it's not ensured
that all nodes have successfully joined to form the cluster.

This behaviour was introduced with 5db7ed22a0
and in order to ensure that the cluster is properly formed before proceeding
with the integration test, use `ensureStableCluster()` with the
appropriate number of expected nodes.

Fixes: #39220
2019-02-25 17:26:15 +01:00
Luca Cavanna a7046e001c
Remove support for maxRetryTimeout from low-level REST client (#38085)
We have had various reports of problems caused by the maxRetryTimeout
setting in the low-level REST client. Such setting was initially added
in the attempts to not have requests go through retries if the request
already took longer than the provided timeout.

The implementation was problematic though as such timeout would also
expire in the first request attempt (see #31834), would leave the
request executing after expiration causing memory leaks (see #33342),
and would not take into account the http client internal queuing (see #25951).

Given all these issues, it seems that this custom timeout mechanism 
gives little benefits while causing a lot of harm. We should rather rely 
on connect and socket timeout exposed by the underlying http client 
and accept that a request can overall take longer than the configured 
timeout, which is the case even with a single retry anyways.

This commit removes the `maxRetryTimeout` setting and all of its usages.
2019-02-06 08:43:47 +01:00
David Turner f2dd5dd6eb
Remove DiscoveryPlugin#getDiscoveryTypes (#38414)
With this change we no longer support pluggable discovery implementations. No
known implementations of `DiscoveryPlugin` actually override this method, so in
practice this should have no effect on the wider world. However, we were using
this rather extensively in tests to provide the `test-zen` discovery type. We
no longer need a separate discovery type for tests as we no longer need to
customise its behaviour.

Relates #38410
2019-02-05 17:42:24 +00:00
Andrey Ershov bfd618cf83
Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038)
Currently, there are a few tests that use autoMinMasterNodes=false and
hence override addExtraClusterBootstrapSettings, mostly this is 10-30
lines of codes that are copy-pasted from class to class.

This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex`
which is suitable for all classes and copy-paste could be removed.

Removing code is always a good thing!
2019-02-01 11:34:31 +01:00
David Turner 81c443c9de
Deprecate minimum_master_nodes (#37868)
Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in
tests, but for 7.x nodes this setting is not required as it has no effect.
This commit removes this setting so that nodes are started with more realistic
configurations, and deprecates it.
2019-01-30 20:09:15 +00:00
Nik Everett e97718245d
Test: Enable strict deprecation on all tests (#36558)
This drops the option for tests to disable strict deprecation mode in
the low level rest client in favor of configuring expected warnings on
any calls that should expect warnings. This behavior is
paranoid-by-default which is generally the right way to handle
deprecations and tests in general.
2019-01-30 11:48:34 -05:00
Tim Brooks 21838d73b5
Extract message serialization from `TcpTransport` (#37034)
This commit introduces a NetworkMessage class. This class has two
subclasses - InboundMessage and OutboundMessage. These messages can
be serialized and deserialized independent of the transport. This allows
more granular testing. Additionally, the serialization mechanism is now
a simple Supplier. This builds the framework to eventually move the
serialization of transport messages to the network thread. This is the
one serialization component that is not currently performed on the
network thread (transport deserialization and http serialization and
deserialization are all on the network thread).
2019-01-21 14:14:18 -07:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Andrey Ershov ca92d74e7e
[Zen2] Change unsafe bootstrap nodes count to nodes list in tests (#36559)
This commit modifies ESSingleNodeTestCase and ESIntegTestCase and
several concrete test classes to use node names when bootstrapping the
cluster.

Today ClusterBootstrapService.INITIAL_MASTER_NODE_COUNT_SETTING
setting is used to bootstrap clusters in tests. Instead, we want to use
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING and get rid of
the former setting eventually.

There were two main problems when refactoring InternalTestCluster:

1. Nodes are created one-by-one in buildNode method. And node.name
is created in this method as well. It's not suitable for bootstrapping,
because we need to have the names of all master eligible nodes in
advance, before creating the node with bootstrapping configuration set.
We address this issue by separating buildNode into two methods:
getNodeSettings and buildNode. We first iterate over all nodes to
get nodes settings, then change the setting for the bootstrapping node
and then proceed with building the node.
2. If autoManageMinMasterNodes = false, there is no way for the test to
set the list of bootstrapping nodes because node names are not known in
advance. This problem is solved by adding updateNodesSettings method
to NodeConfigurationSource and ESIntegTestCase (which could be
overridden by concrete integration test class). Once we have the list
of settings for all nodes, the integration test class is allowed to
update it. In our case, we update the
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING setting.
2018-12-20 15:20:33 +01:00