Commit Graph

380 Commits

Author SHA1 Message Date
Nick Knize f216f2e556 [Rename] o.e.common.logging,lucene (#335)
This commit refactors the following packages:

* o.e.common.logging
* o.e.common.lucene

to the o.opensearch.common parent package. References throughout the codebase
have also been refactored.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-03-21 20:56:34 -05:00
Nick Knize 946c7bb2dc [Rename] o.e.common subpackages round 1 (#332)
* [Rename] o.e.common subpackages round 1

This commit refactors the following subpackages of o.e.common:

* o.e.common.joda
* o.e.common.lease
* o.e.common.metrics
* o.e.common.network
* o.e.common.path
* o.e.common.recycling
* o.e.common.regex
* o.e.common.rounding
* o.e.common.text
* o.e.common.time
* o.e.common.transport

to the o.opensearch namespace. All references throughout the codebase have been
refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>

* fix imports 1

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize c4565adc9d [Rename] o.e.common.geo, hash, io (#317)
This commit refactors the following packages:

* o.e.common.geo
* o.e.common.hash
* o.e.common.io

into the o.opensearch.common namespace. All references throughout the codebase
have been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize 73441879f1 [Rename] o.e.common.cache,collect,component,compress,document (#309)
This commit refactors the following:

* o.e.common.cache
* o.e.common.collect
* o.e.common.component
* o.e.common.compress
* o.e.common.document

to the o.opensearch namespace. All references throughout the codebase are also
refactored

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda 43c9b2425e [Rename] refactor client/rest module. (#310)
Renames `org.elasticsearch.client` to `org.opensearch.client` in package names and references.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Nick Knize 0f74cbed1c [Rename] o.e.common.blobstore,breaker,bytes (#307)
This commit refactors the following packages:

* o.e.common.blobstore
* o.e.common.breaker
* o.e.common.bytes

to the o.opensearch.common namespace. All references throughout the codebase
have been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize dafc0510ea [Rename] o.e.common classes (#305)
This commit refactors classes under o.e.common to o.opensearch.common. All
references throughout the codebase have also been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize 5ad13ca2f7 [Rename] o.e.cluster (#297)
This commit refactors the remaining o.e.cluster packages to
o.opensearch.cluster. All references throughout the codebase are also
refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda a687087b44 [Rename] refactor o.e.transport in the server module. (#284)
Refactor the transport package in the server module to rename the package from `org.elasticsearch.transport` to `org.opensearch.transport`

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Nick Knize fe2b5d6d39 [Rename] o.e.version (#296)
This commit refactors o.e.Version to o.opensearch.Version. This is retained in a
single commit to serve as a reference for re-versioning the opensearch codebase
from legacy 7.10 to 1.0.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize 0deb25590d [Rename] server OpenSearch classes (#290)
This commit refactors all OpenSearch classes in the root server package to
o.opensearch. All references throughout the codebase are also refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize 452f6e1b81 [Rename] server cli and client (#254)
This commit refactors the o.e.cli and o.e.client packages from elasticsearch to
o.opensearch.cli and o.opensearch.client packages in the server module,
respectively.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize fe7f29f549 [Rename] o.e.cluster.health,metadata,node (#283)
This commit refactors the following subpackages:

* o.e.cluster.health
* o.e.cluster.metadata
* o.e.cluster.node

to o.opensearch.cluster.*. All other references throughout the codebase are
updated.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda a7d8245a47 [Rename] refactor server/tasks package. (#265)
Refactor the server/tasks package to rename the package names from`org.elasticsearch.tasks` to `org.opensearch.tasks`.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Rabi Panda dc4736dca1 [Rename] refactor server/threadpool package. (#267)
Refactor the server/threadpool package to rename the package names from`org.elasticsearch.threadpool` to `org.opensearch.threadpool`.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Rabi Panda 3eee5183d1 [Rename] server/rest (#229)
This commit refactors the `server/rest` package as part of the Elasticsearch to OpenSearch renaming.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Nick Knize 8aa818e93e [Rename] refactor o.e.action.admin.cluster (#207)
This commit refactors all classes in o.e.action.admin.cluster to 
org.opensearch.action.admin.cluster. References are updated 
throughout the codebase.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize 1203aa7302 [Rename] refactor o.e.action classes (#203)
This commit refactors top level classes in o.e.action to o.opensearch.action.
References throughout the rest of the codebase have been updated.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda 95f5997433 [Rename] modules/transport-netty4 (#225)
This commit refactors the transport-netty4 module as part of the Elasticsearch to OpenSearch renaming.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Nick Knize ccceb381db [Rename] ElasticsearchException class in server module (#165)
This commit refactors the ElasticsearchException class located in the server module
to OpenSearchException. References and usages throughout the rest of the
codebase are fully refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Armin Braun 51e9d6f227
Revert Serializing Outbound Transport Messages on IO Threads (#64632) (#64654)
Serializing outbound transport message on the IO loop was introduced in https://github.com/elastic/elasticsearch/pull/56961. Unfortunately it turns out that this is incompatible with assumptions made by CCR code here: f22ddf822e/x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/repositories/GetCcrRestoreFileChunkAction.java (L60-L61) and that are not easy to work around on short notice.

Raising reverting this move (as a temporary solution, it's still a valuable change long-term) as a blocker therefore as this seriously affects the stability of the initial phase of the CCR following by causing corrupted bytes to be send to the follower.
2020-11-05 16:29:12 +01:00
Tim Brooks 7f6d1981a1
Transfer network bytes to smaller buffer (#62673)
Currently we read in 64KB blocks from the network. When TLS is not
enabled, these bytes are normally passed all the way to the application
layer (some exceptions: compression). For the HTTP layer this means that
these bytes can live throughout the entire lifecycle of an indexing
request.

The problem is that if the reads from the socket are small, this means
that 64KB buffers can be consumed by 1KB or smaller reads. If the socket
buffer or TCP buffer sizes are small, the leads to massive memory
waste. It has been identified as a major source of OOMs on coordinating
nodes as Elasticsearch easily exhausts the heap for these network bytes.

This commit resolves the problem by placing a handler after the TLS
handler to copy these bytes to a more appropriate buffer size as
necessary. This comes after TLS, because TLS is a framing layer which
often resolves this problem for us (the 64KB buffer will be decoded
into a more appropriate buffer size). However, this extra handler will
solve it for the non-TLS pipelines.
2020-10-01 10:39:24 -06:00
Tim Brooks 59dd889c10
Split up large HTTP responses in outbound pipeline (#62666)
Currently Netty will batch compression an entire HTTP response
regardless of its content size. It allocates a byte array at least of
the same size as the uncompressed content. This causes issues with our
attempts to remove humungous G1GC allocations. This commit resolves the
issue by split responses into 128KB chunks.

This has the side-effect of making large outbound HTTP responses that
are compressed be send as chunked transfer-encoding.
2020-09-24 16:35:52 -06:00
Tim Brooks 43a4882951
Move CorsHandler to server (#62007)
Currently we duplicate our specialized cors logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
but where we are incorrectly closing http channels on early Cors
responses.
2020-09-24 16:32:59 -06:00
Tim Brooks fae2f5f8e1
Log alloc description after netty processors set (#62741)
Currently we log the NettyAllocator description when the netty plugin is
created. Unfortunately, this hits certain static fields in Netty which
triggers the settings of the number of CPU processors. This conflicts
with out Elasticsearch behavior to override this based on a setting.

This commit resolves the issue by logging after the processors have been
set.
2020-09-21 19:52:51 -06:00
Tim Brooks 9bf0d9105a
Change netty pool chunk size to respect G1 region (#62410)
Currently the netty pool chunk size defaults to 16MB. The number does
not play well with the G1GC which causes this to consume entire regions.
Additionally, we normally allocated arrays of size 64KB or less. This
means that Elasticsearch could handle a smaller pool chunk size to play
nicer with the G1GC.
2020-09-21 16:45:09 -06:00
Rene Groeschke bdd7347bbf
Merge test runner task into RestIntegTest (7.x backport) (#60600)
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
2020-08-04 14:46:32 +02:00
David Turner bbacad648a Fix network logging test failures (#60334)
In #60297 we added some tests related to logging from the transport
layer, but these tests failed occasionally since the cluster
was kept alive between test invocations but the logging framework
expected it only to be used for a single test. With this commit we
reduce the scope of the internal test cluster to `TEST` to solve this
problem.

Closes #60321.
2020-07-29 08:29:09 +01:00
David Turner 9c62b5cb96 Mute tests for #60321 2020-07-28 18:12:54 +01:00
David Turner 9450ea08b4 Log and track open/close of transport connections (#60297)
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.

This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
2020-07-28 17:08:04 +01:00
Jake Landis 92ce41cfaf
[7.x] Introduce javaRestTest source set/task and convert modules (#59939) (#60026)
Introduce a javaRestTest source set and task to compliment the yamlRestTest.
javaRestTest differs such that the code is sourced from Java and may have
different dependencies and setup requirements for the test clusters. This also
allows the tests to run in parallel in different cluster instances to prevent any
cross test contamination between the two types of tests.

Included in this PR is all :modules no longer use the integTest task. The tests
are now driven by test, yamlRestTest, javaRestTest, and internalClusterTest.
Since only :modules (and :rest-api-spec) have been converted to yamlRestTest
we can now disable the integTest task if either yamlRestTest or javaRestTest have
been applied. Once all projects are converted, we can delete the integTest task.

related: #56841
related: #59444
2020-07-28 08:39:11 -05:00
Yannick Welsch ffe114b890 Set specific keepalive options by default on supported platforms (#59278)
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.

This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
2020-07-28 11:10:04 +02:00
Jake Landis 665b7b7bd8
Convert modules to use yamlRestTest (#59089) (#59446)
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been re-named to include the correct
IT suffix. Without changing the names, the testing
conventions task would fail since now that the YAML
tests are no longer present pacify the convention.
These tests have moved to the internalClusterTest
source set.

related: #56841
2020-07-13 13:53:05 -05:00
Jake Landis 604c6dd528
7.x - Create plugin for yamlTest task (#56841) (#59090)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
2020-07-06 14:16:26 -05:00
Tim Brooks 605e24ed7c
Use `getPortRange` in http server tests (#58794)
Currently we are leaving the settings to default port range in the nio
and netty4 http server test. This has recently led to tests failing due
to what appears to be a port conflict with other processes. This commit
modifies these tests to use the test case helper method to generate port
ranges.

Fixes #58433 and #58296.
2020-07-02 13:21:45 -06:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Tim Brooks 5efec3a517
Add error logging when http test fails (#58505)
Netty4HttpServerTransportTests has started to fail intermittently. It
seems like unexpected successful responses are being received when the
test is simulating errors. This commit adds logging to the test to
provide additional information when there is an unexpected success. It
also adds the logging to the nio http test.
2020-06-24 11:02:20 -06:00
Luca Cavanna 7e2bb8d6a2 Mute Netty4HttpServerTransportTests#testCorsRequest (#58480)
Relates to #58433
2020-06-24 14:31:38 +02:00
Rene Groeschke abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api accross the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
Yannick Welsch 80f221e920
Use clean thread context for transport and applier service (#57792) (#57914)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-10 10:30:28 +02:00
Yannick Welsch 9eec819c5b Revert "Use clean thread context for transport and applier service (#57792)"
This reverts commit 259be236cf.
2020-06-09 22:24:54 +02:00
Yannick Welsch 259be236cf Use clean thread context for transport and applier service (#57792)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-09 12:32:28 +02:00
Armin Braun 24779c80f9
Serialize Outbound Message on Flush (#57084) (#57682)
Follow up to #56961:

We can be a little more efficient than just serializing at the IO loop by serializing
only when we flush to a channel. This has the advantage that we don't serialize a long
queue of messages for a channel that isn't writable for a longer period of time (unstable network,
actually writing large volumes of data, etc.).
Also, this further reduces the time for which we hold on to the write buffer for a message,
making allocations because of an empty page cache recycler pool less likely.
2020-06-04 18:06:13 +02:00
Armin Braun ba2d70d8eb
Serialize Outbound Messages on IO Threads (#56961) (#57080)
Almost every outbound message is serialized to buffers of 16k pagesize.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the
channel is ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically.

With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC.

This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain).
I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once.

Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.
2020-06-02 16:15:18 +02:00
Armin Braun 56401d3f66
Release HTTP Request Body Earlier (#57094) (#57110)
We don't need to hold on to the request body past the beginning of sending
the response. There is no need to keep a reference to it until after the response
has been sent fully and we can eagerly release it here.
Note, this can be optimized further to release the contents even earlier but for now
this is an easy increment to saving some memory on the IO pool.
2020-05-25 13:00:19 +02:00
Tim Brooks 57c3a61535
Create HttpRequest earlier in pipeline (#56393)
Elasticsearch requires that a HttpRequest abstraction be implemented
by http modules before server processing. This abstraction controls when
underlying resources are released. This commit moves this abstraction to
be created immediately after content aggregation. This change will
enable follow-up work including moving Cors logic into the server
package and tracking bytes as they are aggregated from the network
level.
2020-05-18 14:54:01 -06:00
Armin Braun cac85a6f18
Shorter Path in Netty ByteBuf Unwrap (#56740) (#56857)
In most cases we are seeing a `PooledHeapByteBuf` here now. No need to
redundantly create an new `ByteBuffer` and single element array for it
here when we can just directly unwrap its internal `byte[]`.
2020-05-16 11:54:36 +02:00
Armin Braun 14a042fbe5
Make No. of Transport Threads == Available CPUs (#56488) (#56780)
We never do any file IO or other blocking work on the transport threads
so no tangible benefit can be derived from using more threads than CPUs
for IO.
There are however significant downsides to using more threads than necessary
with Netty in particular. Since we use the default setting for
`io.netty.allocator.useCacheForAllThreads` which is `true` we end up
using up to `16MB` of thread local buffer cache for each transport thread.
Meaning we potentially waste CPUs * 16MB of heap for unnecessary IO threads in addition to obvious inefficiencies of artificially adding extra context switches.
2020-05-14 21:33:46 +02:00
Armin Braun b449661b8f
Remove Unused ByteBufStreamInput (#56567) (#56601)
We're not using this one any more.
2020-05-12 16:04:58 +02:00
Tim Brooks 760ab726c2
Share netty event loops between transports (#56553)
Currently Elasticsearch creates independent event loop groups for each
transport (http and internal) transport type. This is unnecessary and
can lead to contention when different threads access shared resources
(ex: allocators). This commit moves to a model where, by default, the
event loops are shared between the transports. The previous behavior can
be attained by specifically setting the http worker count.
2020-05-11 15:43:43 -06:00