It was 10mb and that was causing trouble when folks reindex-from-remoted
with large documents.
We also improve the error reporting so it tells folks to use a smaller
batch size if they hit a buffer size exception. Finally, adds some docs
to reindex-from-remote mentioning the buffer and giving an example of
lowering the size.
Closes#21185
Lucene 6.3 is expected to be released in the next weeks so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
This change proposes the removal of all non-tcp transport implementations. The
mock transport can be used by default to run tests instead of local transport that has
roughly the same performance compared to TCP or at least not noticeably slower.
This is a master only change, deprecation notice in 5.x will be committed as a
separate change.
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).
This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.
Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.
Relates #20722
* Build: Remove old maven deploy support
This change removes the old maven deploy that we have in parallel to
maven-publish, and makes maven-publish fully work with publishing to
maven local. Using `gradle publishToMavenLocal` should be used to
publish to .m2.
Note that there is an unfortunate hack that means for
zip artifacts we must first create/publish a dummy pom file, and then
follow that with the real pom file. It would be nice to have the pom
file contains packaging=zip, but maven central then requires sources and
javadocs. But our zips are really just attached artifacts, so we already
set the packaging type to pom for our zip files. This change just works
around a limitation of the underlying maven publishing library which
silently skips attached artifacts when the packaging type is set to pom.
relates #20164closes#20375
* Remove unnecessary extra spacing
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.
The following list of endpoint has been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain
The documentation and the rest API spec has been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain
The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.
Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.
Fixes#20155
* master:
Increase visibility of deprecation logger
Skip transport client plugin installed on JDK 9
Explicitly disable Netty key set replacement
percolator: Fail indexing percolator queries containing either a has_child or has_parent query.
Make it possible for Ingest Processors to access AnalysisRegistry
Allow RestClient to send array-based headers
Silence rest util tests until the bogusness can be simplified
Remove unknown HttpContext-based test as it fails unpredictably on different JVMs
Tests: Improve rest suite names and generated test names for docs tests
Add support for a RestClient base path
This commit adds an assumption to
PreBuiltTransportClientTests#testPluginInstalled on JDK 9. The
underlying issue is that Netty attempts to access sun.nio.ch but this
package is not exported from java.base on JDK 9. This throws an uncaught
InaccessibleObjectException causing the test to fail. This assumption
can be removed when Netty 4.1.6 is released as it will include a fix for
this scenario.
Relates #20251
This enables the RestClient to send array-based (multi-valued) header values, rather than only sending whatever happened to be the _last_ value of the header.
This enables simple support for proxies (beyond proxy host and proxy port, which is done via the RequestConfig)) to provide a base path in front of all requests performed by the RestClient.
This removes final from the RestClient, Response, and Sniffer classes so that outside code can mock them. Their constructors are already package private, so there's not much that can go wrong.
Today when we load the Netty plugins, we indirectly cause several Netty
classes to initialize. This is because we attempt to load some classes
by name, and loading these classes is done in a way that triggers a long
chain of class initializers within Netty. We should not do this, this
can lead to log messages before the logger is loader, and it leads to
initialization in cases when the classes would never be needed (for
example, Netty 3 class initialization is never needed if Netty 4 is
used, and vice versa). This commit avoids this early initialization of
these classes by removing the need for the early loading.
Relates #19819
This commit updates Jackson to the 2.8.1 version, which is more strict when it comes to build objects. It also adds the snakeyaml dependency that was previously shaded in jackson libs.
It also closes#18076
When closing a transport client that depends on Netty 4, interrupted
exceptions can be thrown while shutting down some Netty threads. This
commit refactors the handling of these exceptions to finish shutting
down and then just restore the interrupted status.
Today if the PreBuiltTransportClient is using Netty 4 transport, on
shutdown some Netty 4 threads could linger. This commit causes the
client to wait for these threads to shutdown upon termination.
This change does three things:
1. Makes PreBuiltTransportClientTests run since it was silently
failing on a missing dependency
2. Makes PreBuiltTransportClientTests pass
3. Removes the http.type and transport.type from being set in the
transport clients additional settings since these are set to `netty4` by
default anyway.
* Allow to run client benchmark as an uberjar
* Busy wait to avoid accidental skew on low target throughput rates
* Trigger and wait for full GC to happen between trials
* Add missing SuppressForbidden to allow System.gc in client benchmark
Consuming the response body to make it part of the exception message means that it may not be readable anymore later, depending on whether the entity is repeatable or not. Turns out that the response body tells a lot about the error itself, and considering that we don't expect bodies to be incredibly big for errors, we can wrap the entity into a BufferedHttpEntity to make it repeatable.
Closes#19622
Simplify Sniffer initialization and automatically create the default HostsSniffer
Take Sniffer.Builder out to its own top level class. Remove HostsSniffer.Builder and let SnifferBuilder create the default HostsSniffer. This simplifies the Sniffer initialization as the HostsSniffer is not mandatory anymore. It can still be specified though in case the configuration needs to be changed or a different impl has to be used. Also make HostsSniffer an interface.
It can happen that the list of healthy hosts is empty, then we get one from the blacklist. but some other operation might have sneaked in and emptied the blacklist in the meantime, so we have to retry till we manage to get some host, either from the healthy list or from the blacklist.
Throw explicit IllegalStateException in unexpected situations, like where both response and exception are set, or when both are unset. Add unit test for SyncResponseListener.
We throw IOException, which is the exception that is going to be thrown in 99% of the cases. A more generic exception can happen, and if it is a runtime one we just let it bubble up as is, otherwise we wrap it into runtime one so that we don't require to catch Exception everywhere, which seems odd.
Also adjusted javadocs for all performRequest methods
We keep the default async client behaviour like in BasicAsyncResponseConsumer, but we lower the maximum size of the buffer from Integer.MAX_VALUE (2GB) to 10 MB. This way users will realize they are buffering big responses in heap hence they'll know they have to do something about it, either write their own response consumer or increase the buffer size limit by providing their manually creeted instance of HeapBufferedAsyncResponseConsumer (constructor accept a bufferLimit int argument).
Also delayed call to HttpAsyncClient#start so that if something goes wrong while creating the RestClient, the http client threads don't linger. In fact, if the constructor fails it is not possible to call close against the RestClient.
HttpClientConfigCallback#customizeHttpClient now also returns the HttpClientBuilder so it can be completely replaced
RequestConfigCallback#customizeRequestConfig now also returns the HttpClientBuilder so it can be completely replaced
The new method accepts the usual parameters (method, endpoint, params, entity and headers) plus a response listener and an async response consumer. Shortcut methods are also added that don't require params, entity and the async response consumer optional.
There are a few relevant api changes as a consequence of the move to async client that affect sync methods:
- Response doesn't implement Closeable anymore, responses don't need to be closed
- performRequest throws Exception rather than just IOException, as that is the the exception that we get from the FutureCallback#failed method in the async http client
- ssl configuration is a bit simpler, one only needs to call setSSLStrategy from a custom HttpClientConfigCallback, that doesn't end up overridng any other default around connection pooling (it used to happen with the sync client and make ssl configuration more complex)
Relates to #19055
The `client/transport` project adds a new jar build project that
pulls in all dependencies and configures all required modules.
Preinstalled modules are:
* transport-netty
* lang-mustache
* reindex
* percolator
The `TransportClient` classes are still in core
while `TransportClient.Builder` has only a protected construcutor
such that users are redirected to use the new `TransportClientBuilder`
from the new jar.
Closes#19412
The callback replaces the ability to fully replace the http client instance. By doing that, one used to lose any default that the RestClient had set for the underlying http client. Given that you'd usually override one or two things only, like a couple of timeout values, the ssl factory or the default credentials providers, it is not uder friendly if by doing that users end up replacing the whole http client instance and lose any default set by us.
Users wanting to send a request by providing only its method and endpoint, effectively the only two required arguments, shouldn't need to pass in an empty map and a null entity for the body. While at it we can also add a variant to send requests by specifying only method, endpoint and params, but not body. Headers remain a vararg as last argument, so they can always optionally be provided.
Closes#19312
The assumption is HostsSniffer is that all of the arguments have been properly provided and validated through HostsSniffer.Builder, except they weren't, as the scheme didn't have a default value and when not set would cause NPEs down the road. Improved tests to catch this also.
Some Rest tests use the Sun HTTP server which has lingering threads after shutdown. Similar to ESTestCase, this adds an
option to wait 5 seconds for these threads to terminate.
:client ---------> :client:rest
:client-sniffer -> :client:sniffer
:client-test ----> :client:test
This lines the client up with how we do things like modules and
plugins.
The lucene-test dependency caused issues with IDEs as they would always load the lucene 5 jar although they shouldn't have, which caused jarhell in es core tests.
If we depend directly on randomized runner we don't have this problem. It is luckily still compatible with java 1.7. This requires though adding a thin module that includes the base test class which can be shared between client and client-sniffer.
Projects that don't depend on elasticsearch-test fail otherwise because org.elasticsearch.test.EsIntegTestCase (default integ test class) is not in the classpath. They should provide their onw integ test base class, but having integration tests should not be mandatory. One can simply set skipIntegTestsInDisguise to true to prevent loading of integ test class.
Unit tests rely on mockito to mock the internal HttpClient instance. No http request is performed, we only simulate interaction between RestClient and its internal HttpClient.
Store default headers ourselves instead, otherwise default ones cannot be replaced. Don't allow for multiple headers with same key, last one wins and replaces previous ones with same key.
Also fail with null params or headers.
Although elasticsearch doesn't support these methods (RestController doesn't even allow to register handler for them), the RestClient should allow to send requests using them.
Use sun HttpServer instead and disable forbidden-apis for test classes. It turns out to be more flexible than okhttp as it allows get & delete with body.
Create a new subproject called client-sniffer that contains the o.e.client.sniff package. Since it is going to go to a separate jar, due to its additional functionalities and dependency on jackson, it makes sense to have it as a separate project that depends on client. This way we make sure that client doesn't depend on it etc.
ElasticsearchResponseException, as well as ElasticsearchResponse, should only be created from o.e.client package.
RequestLogger should only be used from this package too.
Instead of having a Connection mutable object that holds the state of the connection to each host, we now have immutable objects only. We keep two sets, one with all the hosts, one with the blacklisted ones. Once we blacklist a host we associate it with a DeadHostState which keeps track of the number of failed attempts and when the host should be retried. A new state object is created each and every time the state of the host needs to be updated.
We still have a wrapper called RestTestClient that is very specific to Rest tests, as well as RestTestResponse etc. but all the low level bits around http connections etc. are now handled by RestClient.
The only small problem is that the response gets closed straightaway and its body read immediately into a string. Should be ok to load it all into memory eagerly though in case of errors. Otherwise it becomes cumbersome to have an exception implement Closeable...
The connection class can be greatly simplified now that we don't ping anymore. Pings required a special initial state (UNKNOWN) for connections, to indicate that they require pinging although they are not dead. At this point we don't need the State enum anymore, as connections can only be dead or alive based on the number of failed attempts. markResurrected is also not needed, as it was again a way to make pings required. RestClient can simply pick a dead connection now and use it, no need to change its state when picking the connection.
Given that we don't use streams anymore, we can check straightaway if the connection iterator is empty before returning it and resurrect a connection when needed directly in the connection pool, no lastResortConnection method required.
There are two implementations of connection pool, a static one that allows to enable/disable pings, and a sniffing one that sniffs nodes from the nodes info api.
Transport retrieves a stream of connections from the connection for each request and calls onSuccess or onFailure depending on the result of the request.
Transport also supports a max retry timeout to control the timeout for the request retries overall.