Currently we create a new netty event loop group for client connections
and all server profiles. Each new group creates new threads for io
processing. This means 2 * num of processors new threads for each group.
A single group should be able to handle all io processing (for the
transports). This also brings the netty module inline with what we do
for nio.
Additionally, this PR renames the worker threads to be the same for
netty and nio.
With this commit we differentiate between permanent circuit breaking
exceptions (which require intervention from an operator and should not
be automatically retried) and transient ones (which may heal themselves
eventually and should be retried). Furthermore, the parent circuit
breaker will categorize a circuit breaking exception as either transient
or permanent based on the categorization of memory usage of its child
circuit breakers.
Closes#31986
Relates #34460
In order to remove Streamable from the codebase, Response objects need
to be read using the Writeable.Reader interface which this change
enables. This change enables the use of Writeable.Reader by adding the
`Action#getResponseReader` method. The default implementation simply
uses the existing `newResponse` method and the readFrom method. As
responses are migrated to the Writeable.Reader interface, Action
classes can be updated to throw an UnsupportedOperationException when
`newResponse` is called and override the `getResponseReader` method.
Relates #34389
This is related to #30876. The AbstractSimpleTransportTestCase initiates
many tcp connections. There are normally over 1,000 connections in
TIME_WAIT at the end of the test. This is because every test opens at
least two different transports that connect to each other with 13
channel connection profiles. This commit modifies the default
connection profile used by this test to 6. One connection for each
type, except for REG which gets 2 connections.
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
Currently a bad regex in CORS settings throws a PatternSyntaxException, which
then bubbles up through the bootstrap code, meaning users have to parse a
stack trace to work out where the problem is. We should instead catch this
exception and rethrow with a more useful error message.
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
Historically we have had a ESLoggingHandler in the netty module that
logs low-level connection operations. This class just extends the netty
logging handler with some (broken) message deserialization. This commit
fixes this message serialization and moves the class to server.
This new logger logs inbound and outbound messages. Eventually, we
should move other event logging to this class (connect, close, flush).
That way we will have consistent logging regards of which transport is
loaded.
Resolves#27306 on master. Older branches will need a different fix.
This commit is related to #32517. It allows an "server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. This functionality is only implemented for
the netty security transport.
- third party audit detects jar hell with JDK so we disable it
- jdk non portable in forbiddenapis detects classes being used from the
JDK ( for fips ) that are not portable, this is intended so we don't
scan for it on fips.
- different exclusion rules for third party audit on fips
Closes#33179
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. In a
long series of PRs I've changed all of the old style requests that I
could find with `grep`. In this PR I change all requests that I could
find by *removing* the deprecated methods. Since this is a non-trivial
change I do not include actually removing the deprecated requests. I'll
do that in a follow up. But this should be the last set of usage
removals before the actual deprecated method removal. Yay!
In our Netty layer we have had to take extra precautions against Netty
catching throwables which prevents them from reaching the uncaught
exception handler. This code has taken on additional uses in NIO layer
and now in the scheduler engine because there are other components in
stack traces that could catch throwables and suppress them from reaching
the uncaught exception handler. This commit is a simple cleanup of the
iterative evolution of this code to refactor all uses into a single
method in ExceptionsHelper.
This is related to #32517. This commit passes the DiscoveryNode to the
initiateChannel method for different Transport implementation. This
will allow additional attributes (besides just the socket address) to be
used when opening channels.
This is a followup to #31886. After that commit the
TransportConnectionListener had to be propogated to both the
Transport and the ConnectionManager. This commit moves that listener
to completely live in the ConnectionManager. The request and response
related methods are moved to a TransportMessageListener. That listener
continues to live in the Transport class.
This is related to #31835. It moves the default connection profile into
the ConnectionManager class. The will allow us to have different
connection managers with different profiles.
This is related to #31835. This commit adds a connection manager that
manages client connections to other nodes. This means that the
TcpTransport no longer maintains a map of nodes that it is connected
to.
* Upgrade to `4.1.28` since the problem reported in #32487 is a bug in Netty itself (see https://github.com/netty/netty/issues/7337)
* Fixed other leaks in test code that now showed up due to fixes improvements in leak reporting in the newer version
* Needed to extend permissions for netty common package because it now sets a classloader at runtime after changes in 63bae0956a
* Adjusted forbidden APIs check accordingly
* Closes#32487
There are two scenarios where a http request could terminate in the cors
handler. If that occurs, the requests need to be released. This commit
releases those requests.
* Test `handler` must release buffer the same way the replaced `org.elasticsearch.http.netty4.Netty4HttpRequestHandler#channelRead0` releases it
* Closes#32289
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. This
changes most of the calls not in X-Pack to their new versions.
This is related to #27260. It adds the SecurityNioHttpServerTransport
to the security plugin. It randomly uses the nio http transport in
security integration tests.
Today TransportService is tightly coupled with Transport since it
requires an instance of TransportService in order to receive responses
and send requests. This is mainly due to the Request and Response handlers
being maintained in TransportService but also because of the lack of a proper
callback interface.
This change moves request handler registry and response handler registration into
Transport and adds all necessary methods to `TransportConnectionListener` in order
to remove the `TransportService` dependency from `Transport`
Transport now accepts one or more `TransportConnectionListener` instances that are
executed sequentially in a blocking fashion.
* remove left-over comment
* make sure of the property for plugins
* skip installing modules if these exist in the distribution
* Log the distrbution being ran
* Don't allow running with integ-tests-zip passed externally
* top level x-pack/qa can't run with oss distro
* Add support for matching objects in lists
Makes it possible to have a key that points to a list and assert that a
certain object is present in the list. All keys have to be present and
values have to match. The objects in the source list may have additional
fields.
example:
```
match: { 'nodes.$master.plugins': { name: ingest-attachment } }
```
* Update plugin and module tests to work with other distributions
Some of the tests expected that the integration tests will always be ran
with the `integ-test-zip` distribution so that there will be no other
plugins loaded.
With this change, we check for the presence of the plugin without
assuming exclusivity.
* Allow modules to run on other distros as well
To match the behavior of tets.distributions
* Add and use a new `contains` assertion
Replaces the previus changes that caused `match` to do a partial match.
* Implement PR review comments
TransportRequestHandler currently contains 2 messageReceived methods,
one which takes a Task, and one that does not. The first just delegates
to the second. This commit changes all existing implementors of
TransportRequestHandler to implement the version which takes Task, thus
allowing the class to be a functional interface, and eliminating the
need to throw exceptions when a task needs to be ensured.
Historically in TcpTransport server channels were represented by the
same channel interface as socket channels. This was necessary as
TcpTransport was parameterized by the channel type. This commit
introduces TcpServerChannel and HttpServerChannel classes. Additionally,
it adds the implementations for the various transports. This allows
server channels to have unique functionality and not implement the
methods they do not support (such as send and getRemoteAddress).
Additionally, with the introduction of HttpServerChannel this commit
extracts some of the storing and closing channel work to the abstract
http server transport.
This is a general cleanup of channels and exception handling in http.
This commit introduces a CloseableChannel that is a superclass of
TcpChannel and HttpChannel. This allows us to unify the closing logic
between tcp and http transports. Additionally, the normal http channels
are extracted to the abstract server transport.
Finally, this commit (mostly) unifies the exception handling between nio
and netty4 http server transports.
This is related to #28898. This PR implements pooling of bytes arrays
when reading from the wire in the http server transport. In order to do
this, we must integrate with netty reference counting. That manner in
which this PR implements this is making Pages in InboundChannelBuffer
reference counted. When we accessing the underlying page to pass to
netty, we retain the page. When netty releases its bytebuf, it releases
the underlying pages we have passed to it.
Currently we maintain a compatibility map of http status codes in both
the netty4 and nio modules. These maps convert a RestStatus to a netty
HttpResponseStatus. However, as these fundamentally represent integers,
we can just use the netty valueOf method to convert a RestStatus to a
HttpResponseStatus.
This is related to #28898. With the addition of the http nio transport,
we now have two different modules that provide http transports.
Currently most of the http logic lives at the module level. However,
some of this logic can live in server. In particular, some of the
setting of headers, cors, and pipelining. This commit begins this moving
in that direction by introducing lower level abstraction (HttpChannel,
HttpRequest, and HttpResonse) that is implemented by the modules. The
higher level rest request and rest channel work can live entirely in
server.
Currently the http pipelining handlers seem to support chunked http
content. However, this does not make sense. There is a content
aggregator in the pipeline before the pipelining handler. This means the
pipelining handler should only see full http messages. Additionally, the
request handler immediately after the pipelining handler only supports
full messages.
This commit modifies both nio and netty4 pipelining handlers to assert
that an inbound message is a full http message. Additionally it removes
the tests for chunked content.
This commit upgrades us to Netty 4.1.25. This upgrade is more
challenging than past upgrades, all because of a new object cleaner
thread that they have added. This thread requires an additional security
permission (set context class loader, needed to avoid leaks in certain
scenarios). Additionally, there is not a clean way to shutdown this
thread which means that the thread can fail thread leak control during
tests. As such, we have to filter this thread from thread leak control.
This is related to #27260 and #28898. This commit adds the transport-nio
plugin as a random option when running the http smoke tests. As part of
this PR, I identified an issue where cors support was not properly
enabled causing these tests to fail when using transport-nio. This
commit also fixes that issue.
This is related to #28898. This commit adds cors support to the nio http
transport. Most of the work is copied directly from the netty module
implementation. Additionally, this commit adds tests for the nio http
channel.
This is related to #31017. That issue identified that these three http
methods were treated like GET requests. This commit adds them to
RestRequest. This means that these methods will be handled properly and
generate 405s.
This commit introduces the ability for a client to communicate to the
server features that it can support and for these features to be used in
influencing the decisions that the server makes when communicating with
the client. To this end we carry the features from the client to the
underlying stream as we carry the version of the client today. This
enables us to enhance the logic where we make protocol decisions on the
basis of the version on the stream to also make protocol decisions on
the basis of the features on the stream. With such functionality, the
client can communicate to the server if it is a transport client, or if
it has, for example, X-Pack installed. This enables us to support
rolling upgrades from the OSS distribution to the default distribution
without breaking client connectivity as we can now elect to serialize
customs in the cluster state depending on whether or not the client
reports to us using the feature capabilities that it can under these
customs. This means that we would avoid sending a client pieces of the
cluster state that it can not understand. However, we want to take care
and always send the full cluster state during node-to-node communication
as otherwise we would end up with different understanding of what is in
the cluster state across nodes depending on which features they reported
to have. This is why when deciding whether or not to write out a custom
we always send the custom if the client is not a transport client and
otherwise do not send the custom if the client is transport client that
does not report to have the feature required by the custom.
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Currently AbstractHttpServerTransport is in a netty4 module. This is the
incorrect location. This commit moves it out of netty4 module.
Additionally, it moves unit tests that test AbstractHttpServerTransport
logic to server.
Currently nio and netty modules use the CompletableFuture class for
managing listeners. This is unfortunate as that class accepts
Throwable. This commit adds a class CompletableContext that wraps
the CompletableFuture but does not accept Throwable. This allows the
modification of netty and nio logic to no longer handle Throwable.
This commit reintroduces 31251c9 and 63a5799. These commits introduced a
memory leak and were reverted. This commit brings those commits back
and fixes the memory leak by removing unnecessary retain method calls.
This reverts commit 31251c9 introduced in #30695.
We suspect this commit is causing the OOME's reported in #30811 and we will use this PR to test this assertion.
This is related to #29500 and #28898. This commit removes the abilitiy
to disable http pipelining. After this commit, any elasticsearch node
will support pipelined requests from a client. Additionally, it extracts
some of the http pipelining work to the server module. This extracted
work is used to implement pipelining for the nio plugin.
This commit is related to #28898. It adds an nio driven http server
transport. Currently it only supports basic http features. Cors,
pipeling, and read timeouts will need to be added in future PRs.
With this commit we determine the maximum number of buffers that Netty keeps
while accumulating one HTTP request based on the maximum content length (default
1500 bytes, overridable with the system property `es.net.mtu`). Previously, we
kept the default value of 1024 which is too small for bulk requests which leads
to unnecessary copies of byte buffers internally.
Relates #29448
This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase.
closes#12792
I am not sure why we have this leniency for HTTP max content length, it
has been there since the beginning
(5ac51ee93f) with no explanation of its
source. That said, our philosophy today is different than the philosophy
of the past where Elasticsearch would be quite lenient in its handling
of settings and today we aim for predictability for both users and
us. This commit removes leniency in the parsing of
http.max_content_length.
We use a latch when sending requests during tests so that we do not hang
forever waiting for replies on those requests. This commit increases the
timeout on that latch to 30 seconds because sometimes 10 seconds is just
not enough.
Today we have a few problems with how we handle bad requests:
- handling requests with bad encoding
- handling requests with invalid value for filter_path/pretty/human
- handling requests with a garbage Content-Type header
There are two problems:
- in every case, we give an empty response to the client
- in most cases, we leak the byte buffer backing the request!
These problems are caused by a broader problem: poor handling preparing
the request for handling, or the channel to write to when the response
is ready. This commit addresses these issues by taking a unified
approach to all of them that ensures that:
- we respond to the client with the exception that blew us up
- we do not leak the byte buffer backing the request
* Decouple XContentBuilder from BytesReference
This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.
While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).
Relates to #28504
* Remove log4j dependency from elasticsearch-core
This removes the log4j dependency from our elasticsearch-core project. It was
originally necessary only for our jar classpath checking. It is now replaced by
a `Consumer<String>` so that the es-core dependency doesn't have external
dependencies.
The parts of #28191 which were moved in conjunction (like `ESLoggerFactory` and
`Loggers`) have been moved back where appropriate, since they are not required
in the core jar.
This is tangentially related to #28504
* Add javadocs for `output` parameter
* Change @code to @link
We have code used in the networking layer to search for errors buried in
other exceptions. This code will be useful in other locations so with
this commit we move it to our exceptions helpers.
Relates #28691
The is a follow up to #28567 changing the method used to capture stack traces, as requested
during the review. Instead of creating a throwable, we explicitly capture the stack trace of the
current thread. This should Make Jason Happy Again ™️ .
This commit updates netty to 4.1.16.Final. This is the latest version that we can have work without
extra permissions. This updated version of netty fixes issues seen with Java 9 and some data
not being sent, which results in timeouts.
The method `initiateChannel` on `TcpTransport` is explicit in that
channels can be connect asynchronously. All production implementations
do connect asynchronously. Only the blocking `MockTcpTransport`
connects in a synchronous manner. This avoids testing some of the
blocking code in `TcpTransport` that waits on connections to complete.
Additionally, it requires a more extensive method signature than
required for other transports.
This commit modifies the `MockTcpTransport` to make these connections
asynchronously on a different thread. Additionally, it simplifies that
`initiateChannel` method signature.
This is related to #27933. It introduces a jar named elasticsearch-core
in the lib directory. This commit moves the JarHell class from server to
elasticsearch-core. Additionally, PathUtils and some of Loggers are
moved as JarHell depends on them.
This is related to #27260. This commit moves the NioTransport from
:test:framework to a new nio-transport plugin. Additionally, supporting
tcp decoding classes are moved to this plugin. Generic byte reading and
writing contexts are moved to the nio library.
Additionally, this commit adds a basic MockNioTransport to
:test:framework that is a TcpTransport implementation for testing that
is driven by nio.
This commit attempts to continue unifying the logic between different
transport implementations. As transports call a `TcpTransport` callback
when a new channel is accepted, there is no need to internally track
channels accepted. Instead there is a set of accepted channels in
`TcpTransport`. This set is used for metrics and shutting down channels.
This is related to #27563. This commit modifies the
InboundChannelBuffer to support releasable byte pages. These byte
pages are provided by the PageCacheRecycler. The PageCacheRecycler
must be passed to the Transport with this change.
We currently do not have any server-side read timeouts implemented in
elasticsearch. This commit adds a read timeout setting that defaults to
30 seconds. If after 30 seconds a read has not occurred, the channel
will be closed. A timeout of value of 0 will disable the timeout.
This is related to #27422. Right now when we send a write to the netty
transport, we attach a listener to the future. When you submit a write
on the netty event loop and the event loop is shutdown, the onFailure
method is called. Unfortunately, netty then tries to notify the listener
which cannot be done without dispatching to the event loop. In this
case, the dispatch fails and netty logs and error and does not tell us.
This commit checks that netty is still not shutdown after sending a
message. If netty is shutdown, we complete the listener.
Currently we use ActionListener<TcpChannel> for connect, close, and send
message listeners in TcpTransport. However, all of the listeners have to
capture a reference to a channel in the case of the exception api being
called. This commit changes these listeners to be type <Void> as passing
the channel to onResponse is not necessary. Additionally, this change
makes it easier to integrate with low level transports (which use
different implementations of TcpChannel).
This commit is a follow up to the work completed in #27132. Essentially
it transitions two more methods (sendMessage and getLocalAddress) from
Transport to TcpChannel. With this change, there is no longer a need for
TcpTransport to be aware of the specific type of channel a transport
returns. So that class is no longer parameterized by channel type.
Right now our different transport implementations must duplicate
functionality in order to stay compliant with the requirements of
TcpTransport. They must all implement common logic to open channels,
close channels, keep track of channels for eventual shutdown, etc.
Additionally, there is a weird and complicated relationship between
Transport and TransportService. We eventually want to start merging
some of the functionality between these classes.
This commit starts moving towards a world where TransportService retains
all the application logic and channel state. Transport implementations
in this world will only be tasked with returning a channel when one is
requested, calling transport service when a channel is accepted from
a server, and starting / stopping itself.
Specifically this commit changes how channels are opened and closed. All
Transport implementations now return a channel type that must comply with
the new TcpChannel interface. This interface has the methods necessary
for TcpTransport to completely manage the lifecycle of a channel. This
includes setting the channel up, waiting for connection, adding close
listeners, and eventually closing.
This is a followup to #26521. This commit expands the alias added for
the elasticsearch client codebase to all codebases. The original full
jar name property is left intact. This only adds an alias without the
version, which should help ease the pain in updating any versions (ES
itself or dependencies).
If creating the REST request throws an exception (for example, because
of invalid headers), we leak the request due to failure to release the
buffer (which would otherwise happen after replying on the
channel). This commit addresses this leak by handling the failure case.
Relates #27222
This commit removes the `ByteBufStreamInput` `readBytesReference` and
`readBytesRef` methods. These methods are zero-copy which means that
they retain a reference to the underlying netty buffer. The problem is
that our `TcpTransport` is not designed to handle zero-copy. The netty
implementation sets the read index past the current message once it has
been deserialized, handled, and mostly likely dispatched to another
thread. This means that netty is free to release this buffer. So it is
unsafe to retain a reference to it without calling `retain`. And we
cannot call `retain` because we are not currently designed to handle
reference counting past the transport level.
This should not currently impact us as we wrap the `ByteBufStreamInput`
in `NamedWriteableAwareStreamInput` in the `TcpTransport`. This stream
essentially delegates to the underling stream. However, in the case of
`readBytesReference` and `readBytesRef` it leaves thw implementations
to the standard `StreamInput` methods. These methods call the read byte
array method which delegates to `ByteBufStreamInput`. The read byte
array method on `ByteBufStreamInput` copies so it is safe. The only
impact of this commit should be removing methods that could be dangerous
if they were eventually called due to some refactoring.
Right now we are attempting to set SO_LINGER to 0 on server channels
when we are stopping the tcp transport. This is not a supported socket
option and throws an exception. This also prevents the channels from
being closed.
This commit 1. doesn't set SO_LINGER for server channges, 2. checks
that it is a supported option in nio, and 3. changes the log message
to warn for server channel close exceptions.
While opening a connection to a node, a channel can subsequently
close. If this happens, a future callback whose purpose is to close all
other channels and disconnect from the node will fire. However, this
future will not be ready to close all the channels because the
connection will not be exposed to the future callback yet. Since this
callback is run once, we will never try to disconnect from this node
again and we will be left with a closed channel. This commit adds a
check that all channels are open before exposing the channel and throws
a general connection exception. In this case, the usual connection retry
logic will take over.
Relates #26932
With this commit we simplify our network layer by only allowing to define a
fixed receive predictor size instead of a minimum and maximum value. This also
means that the following (previously undocumented) settings are removed:
* http.netty.receive_predictor_min
* http.netty.receive_predictor_max
Using an adaptive sizing policy in the receive predictor is a very low-level
optimization. The implications on allocation behavior are extremely hard to grasp
(see our previous work in #23185) and adaptive sizing does not provide a lot of
benefits (see benchmarks in #26165 for more details).
* Add additional low-level logging handler
We have the trace handler which is useful for recording sent messages
but there are times where it would be useful to have more low-level
logging about the events occurring on a channel. This commit adds a
logging handler that can be enabled by setting a certain log level
(org.elasticsearch.transport.netty4.ESLoggingHandler) to trace that
provides trace logging on low-level channel events and includes some
information about the request/response read/write events on the channel
as well.
* Remove imports
* License header
* Remove redundant
* Add test
* More assertions
We should unwrap the cause looking for any suppressed errors or root
causes that are errors when checking if we should maybe die. This commit
causes that to be the case.
Relates #26884
This commit changes the log level on a write and flush failure to warn
as this is not necessarily an Elasticsearch problem but more likely
indicative of an infrastructure problem.
This commit fixes a #26855. Right now we set SO_LINGER to 0 if we are
stopping the transport. This can throw a ChannelClosedException if the
raw channel is already closed. We have a number of scenarios where it is
possible this could be called with a channel that is already closed.
This commit fixes the issue be checking that the channel is not closed
before attempting to set the socket option.
This commit reorders a maybe die check and a logging statement for the
following reasons:
- we should die as quickly as possible if the cause is fatal
- we do not want the JVM to be so broken that when we try to log
another exception is thrown (maybe another out of memory exception)
and then the maybe die is never invoked
- maybe die will log the cause anyway if the cause is fatal so we only
need to log if the cause is not fatal
At current, we do not feel there is enough of a reason to shade the low
level rest client. It caused problems with commons logging and IDE's
during the brief time it was used. We did not know exactly how many
users will need this, and decided that leaving shading out until we
gather more information is best. Users can still shade the jar
themselves. For information and feeback, see issue #26366.
Closes#26328
This reverts commit 3a20922046.
This reverts commit 2c271f0f22.
This reverts commit 9d10dbea39.
This reverts commit e816ef89a2.
There is a group of five settings relating to raw tcp configurations
(no_delay, buffer sizes, etc) that we have for the http transport. These
currently live in the netty module. As they are unrelated to netty
specifically, this commit moves these settings to the
`HttpTransportSettings` class in core.
Right now it is possible for the `HttpPipeliningHandler` to queue
pipelined responses. On channel close, we do not clear and release these
responses. This commit releases the responses and completes the promise.