This commit refactors the following packages:
* o.e.common.logging
* o.e.common.lucene
to the o.opensearch.common parent package. References throughout the codebase
have also been refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* [Rename] o.e.common subpackages round 1
This commit refactors the following subpackages of o.e.common:
* o.e.common.joda
* o.e.common.lease
* o.e.common.metrics
* o.e.common.network
* o.e.common.path
* o.e.common.recycling
* o.e.common.regex
* o.e.common.rounding
* o.e.common.text
* o.e.common.time
* o.e.common.transport
to the o.opensearch namespace. All references throughout the codebase have been
refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
* fix imports 1
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors the following packages:
* o.e.common.geo
* o.e.common.hash
* o.e.common.io
into the o.opensearch.common namespace. All references throughout the codebase
have been refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors the following:
* o.e.common.cache
* o.e.common.collect
* o.e.common.component
* o.e.common.compress
* o.e.common.document
to the o.opensearch namespace. All references throughout the codebase are also
refactored
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors the following packages:
* o.e.common.blobstore
* o.e.common.breaker
* o.e.common.bytes
to the o.opensearch.common namespace. All references throughout the codebase
have been refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors classes under o.e.common to o.opensearch.common. All
references throughout the codebase have also been refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors the remaining o.e.cluster packages to
o.opensearch.cluster. All references throughout the codebase are also
refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
Refactor the transport package in the server module to rename the package from `org.elasticsearch.transport` to `org.opensearch.transport`
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit refactors o.e.Version to o.opensearch.Version. This is retained in a
single commit to serve as a reference for re-versioning the opensearch codebase
from legacy 7.10 to 1.0.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors all OpenSearch classes in the root server package to
o.opensearch. All references throughout the codebase are also refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors the o.e.cli and o.e.client packages from elasticsearch to
o.opensearch.cli and o.opensearch.client packages in the server module,
respectively.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors the following subpackages:
* o.e.cluster.health
* o.e.cluster.metadata
* o.e.cluster.node
to o.opensearch.cluster.*. All other references throughout the codebase are
updated.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
Refactor the server/tasks package to rename the package names from`org.elasticsearch.tasks` to `org.opensearch.tasks`.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
Refactor the server/threadpool package to rename the package names from`org.elasticsearch.threadpool` to `org.opensearch.threadpool`.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit refactors all classes in o.e.action.admin.cluster to
org.opensearch.action.admin.cluster. References are updated
throughout the codebase.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors top level classes in o.e.action to o.opensearch.action.
References throughout the rest of the codebase have been updated.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors the ElasticsearchException class located in the server module
to OpenSearchException. References and usages throughout the rest of the
codebase are fully refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
Currently we read in 64KB blocks from the network. When TLS is not
enabled, these bytes are normally passed all the way to the application
layer (some exceptions: compression). For the HTTP layer this means that
these bytes can live throughout the entire lifecycle of an indexing
request.
The problem is that if the reads from the socket are small, this means
that 64KB buffers can be consumed by 1KB or smaller reads. If the socket
buffer or TCP buffer sizes are small, the leads to massive memory
waste. It has been identified as a major source of OOMs on coordinating
nodes as Elasticsearch easily exhausts the heap for these network bytes.
This commit resolves the problem by placing a handler after the TLS
handler to copy these bytes to a more appropriate buffer size as
necessary. This comes after TLS, because TLS is a framing layer which
often resolves this problem for us (the 64KB buffer will be decoded
into a more appropriate buffer size). However, this extra handler will
solve it for the non-TLS pipelines.
Currently Netty will batch compression an entire HTTP response
regardless of its content size. It allocates a byte array at least of
the same size as the uncompressed content. This causes issues with our
attempts to remove humungous G1GC allocations. This commit resolves the
issue by split responses into 128KB chunks.
This has the side-effect of making large outbound HTTP responses that
are compressed be send as chunked transfer-encoding.
Currently we duplicate our specialized cors logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
but where we are incorrectly closing http channels on early Cors
responses.
Currently we log the NettyAllocator description when the netty plugin is
created. Unfortunately, this hits certain static fields in Netty which
triggers the settings of the number of CPU processors. This conflicts
with out Elasticsearch behavior to override this based on a setting.
This commit resolves the issue by logging after the processors have been
set.
Currently the netty pool chunk size defaults to 16MB. The number does
not play well with the G1GC which causes this to consume entire regions.
Additionally, we normally allocated arrays of size 64KB or less. This
means that Elasticsearch could handle a smaller pool chunk size to play
nicer with the G1GC.
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
In #60297 we added some tests related to logging from the transport
layer, but these tests failed occasionally since the cluster
was kept alive between test invocations but the logging framework
expected it only to be used for a single test. With this commit we
reduce the scope of the internal test cluster to `TEST` to solve this
problem.
Closes#60321.
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.
This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
Introduce a javaRestTest source set and task to compliment the yamlRestTest.
javaRestTest differs such that the code is sourced from Java and may have
different dependencies and setup requirements for the test clusters. This also
allows the tests to run in parallel in different cluster instances to prevent any
cross test contamination between the two types of tests.
Included in this PR is all :modules no longer use the integTest task. The tests
are now driven by test, yamlRestTest, javaRestTest, and internalClusterTest.
Since only :modules (and :rest-api-spec) have been converted to yamlRestTest
we can now disable the integTest task if either yamlRestTest or javaRestTest have
been applied. Once all projects are converted, we can delete the integTest task.
related: #56841
related: #59444
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.
This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been re-named to include the correct
IT suffix. Without changing the names, the testing
conventions task would fail since now that the YAML
tests are no longer present pacify the convention.
These tests have moved to the internalClusterTest
source set.
related: #56841
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.
The remaining cases in modules, plugins, and x-pack will be handled in followups.
This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.
The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.
Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).
As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.
Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
Currently we are leaving the settings to default port range in the nio
and netty4 http server test. This has recently led to tests failing due
to what appears to be a port conflict with other processes. This commit
modifies these tests to use the test case helper method to generate port
ranges.
Fixes#58433 and #58296.
* Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build
* Fix compile usages in 7.x branch
Netty4HttpServerTransportTests has started to fail intermittently. It
seems like unexpected successful responses are being received when the
test is simulating errors. This commit adds logging to the test to
provide additional information when there is an unexpected success. It
also adds the logging to the nio http test.
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
Follow up to #56961:
We can be a little more efficient than just serializing at the IO loop by serializing
only when we flush to a channel. This has the advantage that we don't serialize a long
queue of messages for a channel that isn't writable for a longer period of time (unstable network,
actually writing large volumes of data, etc.).
Also, this further reduces the time for which we hold on to the write buffer for a message,
making allocations because of an empty page cache recycler pool less likely.
Almost every outbound message is serialized to buffers of 16k pagesize.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the
channel is ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically.
With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC.
This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain).
I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once.
Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.
We don't need to hold on to the request body past the beginning of sending
the response. There is no need to keep a reference to it until after the response
has been sent fully and we can eagerly release it here.
Note, this can be optimized further to release the contents even earlier but for now
this is an easy increment to saving some memory on the IO pool.
Elasticsearch requires that a HttpRequest abstraction be implemented
by http modules before server processing. This abstraction controls when
underlying resources are released. This commit moves this abstraction to
be created immediately after content aggregation. This change will
enable follow-up work including moving Cors logic into the server
package and tracking bytes as they are aggregated from the network
level.
In most cases we are seeing a `PooledHeapByteBuf` here now. No need to
redundantly create an new `ByteBuffer` and single element array for it
here when we can just directly unwrap its internal `byte[]`.
We never do any file IO or other blocking work on the transport threads
so no tangible benefit can be derived from using more threads than CPUs
for IO.
There are however significant downsides to using more threads than necessary
with Netty in particular. Since we use the default setting for
`io.netty.allocator.useCacheForAllThreads` which is `true` we end up
using up to `16MB` of thread local buffer cache for each transport thread.
Meaning we potentially waste CPUs * 16MB of heap for unnecessary IO threads in addition to obvious inefficiencies of artificially adding extra context switches.
Currently Elasticsearch creates independent event loop groups for each
transport (http and internal) transport type. This is unnecessary and
can lead to contention when different threads access shared resources
(ex: allocators). This commit moves to a model where, by default, the
event loops are shared between the transports. The previous behavior can
be attained by specifically setting the http worker count.