Update by query is a shortcut to search + index. UpdateByQueryRequest gets serialized on the transport layer only when the transport client is used. Given that the request supports wildcards and allows setting its indices, it should implement IndicesRequest.Replaceable. Implementing CompositeIndicesRequest makes little sense, as the indices that the request works against depend entirely on the inner search request.
Delete by query is a shortcut to search + delete. DeleteByQueryRequest gets serialized on the transport layer only when the transport client is used. Given that the request supports wildcards and allows setting its indices, it should implement IndicesRequest.Replaceable.
Fixes two issues:
1. lang-javascript doesn't support `executable` with a `null` `vars`
parameter. The parameter is quite nullable.
2. reindex didn't support script engines whose `unwrap` method wasn't
a noop. This didn't come up for lang-groovy or lang-painless because
both of those `unwrap`s were noops. lang-javascript copies all maps that
it `unwrap`s.
This adds fairly low level unit tests for these fixes but doesn't add
an integration test that makes sure that reindex and lang-javascript
play well together. That'd make backporting this difficult and would
add a fairly significant amount of time to the build for a fairly rare
interaction. Hopefully the unit tests will be enough.
I also reduced the visibility of a couple classes and renamed/consolidated some
test classes for consistency, e.g. removing the `Simple` prefix or using the
`<Type>FieldMapperTests` convention for testing field mappers.
Today when we load the Netty plugins, we indirectly cause several Netty
classes to initialize. This is because we attempt to load some classes
by name, and loading these classes is done in a way that triggers a long
chain of class initializers within Netty. We should not do this: it
can lead to log messages before the logger is loaded, and it leads to
initialization in cases when the classes would never be needed (for
example, Netty 3 class initialization is never needed if Netty 4 is
used, and vice versa). This commit avoids this early initialization of
these classes by removing the need for the early loading.
Relates #19819
* Rename operation to result and rework responses
* Rename DocWriteResponse.Operation enum to DocWriteResponse.Result
These are just easier to interpret names.
Closes #19664
In an effort to reduce the number of tiny packages we have in the
code base this moves all the files that were in subdirectories of
`org.elasticsearch.rest.action.admin.cluster` into
`org.elasticsearch.rest.action.admin.cluster`.
Also fixes line length in these packages.
`_reindex` only needs the `_version` if the `dest` has
`"version_type": "external"`. So it shouldn't ask for it unless it does.
`_update_by_query` and `_delete_by_query` always need the `_version`.
Closes #19135
This is cleanup work from #19566, where @nik9000 suggested trying to nuke the isCreated and isFound methods. I've combined nuking the two methods with removing UpdateHelper.Operation in favor of DocWriteResponse.Operation here.
Closes #19631.
Reindex from remote uses the Elasticsearch client which uses apache
httpasyncclient which spins up 5 threads by default, 1 as a dispatcher
and 4 more to handle IO. This changes Reindex's usage so it only spins
up two threads: 1 dispatcher and 1 to handle IO. It also renames the
threads to "es-client-$taskid-$thread_number". That way if we see any
thread sticking around we can trace it back to the task.
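The thread naming scheme above can be sketched with a plain `ThreadFactory`. This is only an illustration, not the code from this change: the class name `TaskThreadNaming`, the `forTask` helper, and the choice to start numbering at 0 are all assumptions.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: names threads "es-client-<taskId>-<n>" so a leaked thread can be traced to its task. */
final class TaskThreadNaming {
    /** Returns a factory whose threads carry the task id and a per-factory counter (starting at 0, an assumption). */
    static ThreadFactory forTask(long taskId) {
        AtomicInteger threadNumber = new AtomicInteger();
        return runnable -> {
            Thread thread = new Thread(runnable, "es-client-" + taskId + "-" + threadNumber.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        };
    }

    public static void main(String[] args) {
        ThreadFactory factory = forTask(237);
        // The threads are never started; we only look at the names they would get.
        System.out.println(factory.newThread(() -> {}).getName());
        System.out.println(factory.newThread(() -> {}).getName());
    }
}
```

A thread dump that shows a name built this way identifies the owning task directly, which is the point of the rename.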
The tests for authentication extend ESIntegTestCase and use a mock
authentication plugin. This way the clients don't have to worry about
running it. Sadly, that means we don't really have good coverage on the
REST portion of the authentication.
This also adds ElasticsearchStatusException, an exception on which you
can set an explicit status. The nice thing about it is that you can
set the RestStatus that it returns to whatever arbitrary status you like
based on the status that comes back from the remote system.
reindex-from-remote then uses it to wrap all remote failures, preserving
the status from the remote Elasticsearch or whatever proxy is between us
and the remote Elasticsearch.
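The idea of an exception that carries an explicit status can be sketched as follows. `StatusCarryingException` and `wrapRemoteFailure` are hypothetical stand-ins written for this illustration; they use a bare `int` status and are not the actual ElasticsearchStatusException or RestStatus API.

```java
/** Hypothetical stand-in for an exception that carries an explicit HTTP status. */
final class StatusCarryingException extends RuntimeException {
    private final int status;

    StatusCarryingException(String message, int status, Throwable cause) {
        super(message, cause);
        this.status = status;
    }

    /** The status to report to our own caller. */
    int status() {
        return status;
    }

    /** Wraps a remote failure, preserving whatever status the remote side (or a proxy) returned. */
    static StatusCarryingException wrapRemoteFailure(int remoteStatus, Throwable cause) {
        return new StatusCarryingException("remote request failed with status " + remoteStatus, remoteStatus, cause);
    }
}
```

With this shape, a 429 from the remote side surfaces as a 429 locally rather than being flattened into a generic server error.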
This makes it obvious that these tests are for running the client yaml
suites. Now that there are other ways of running tests using the REST
client against a running cluster we can't go on calling the shared
client yaml tests "REST tests". They are rest tests, but they aren't
**the** rest tests.
Performing the bulk request shown in #19267 now results in the following:
```
{"_index":"test","_type":"test","_id":"1","_version":1,"_operation":"create","forced_refresh":false,"_shards":{"total":2,"successful":1,"failed":0},"status":201}
{"_index":"test","_type":"test","_id":"1","_version":1,"_operation":"noop","forced_refresh":false,"_shards":{"total":2,"successful":1,"failed":0},"status":200}
```
We disable transitive dependencies in our build plugin
for all dependencies except for the group `org.elasticsearch`.
However, in the reindex plugin we depend on the REST client
and declare its dependencies again which is not necessary
(and led to problems with conflicting versions in #19281).
With this PR we remove the duplicate declaration.
This adds a header that looks like `Location: /test/test/1` to the
response for the index/create/update API. The requirement for the header
comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
https://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative
URIs are OK. So we use an absolute path which should resolve to the
appropriate location.
Closes #19079
This makes large changes to our rest test infrastructure, allowing us
to write junit tests that test a running cluster via the rest client.
It does this by splitting ESRestTestCase into two classes:
* ESRestTestCase is the superclass of all tests that use the rest client
to interact with a running cluster.
* ESClientYamlSuiteTestCase is the superclass of all tests that use the
rest client to run the yaml tests. These tests are shared across all
official clients, thus the `ClientYamlSuite` part of the name.
We had better read the header, but who knows what can happen: maybe headers are filtered out for some reason, and we don't want to run into an NPE, so in that case we fall back to auto-detection.
The new method accepts the usual parameters (method, endpoint, params, entity and headers) plus a response listener and an async response consumer. Shortcut methods are also added that make params, entity and the async response consumer optional.
There are a few relevant api changes as a consequence of the move to async client that affect sync methods:
- Response doesn't implement Closeable anymore, responses don't need to be closed
- performRequest throws Exception rather than just IOException, as that is the exception that we get from the FutureCallback#failed method in the async http client
- ssl configuration is a bit simpler, one only needs to call setSSLStrategy from a custom HttpClientConfigCallback, which doesn't end up overriding any other default around connection pooling (that used to happen with the sync client and made ssl configuration more complex)
Relates to #19055
* Removed `Template` class and unified script & template parsing logic. Templates are scripts, so they should be defined as a script. Unless there will be separate template infrastructure, templates should share as much code as possible with scripts.
* Removed ScriptParseException in favor of ElasticsearchParseException
* Moved TemplateQueryBuilder to lang-mustache module because this query is hard coded to work with mustache only
creation in the REST tests, as we no longer need it due
to index creation now waiting for active shard copies
before returning (by default, it waits for the primary of
each shard, which is the same as ensuring yellow health).
Relates #19450
This commit renames the Netty 3 transport module from transport-netty to
transport-netty3. This is to make room for a Netty 4 transport module,
transport-netty4.
Relates #19439
This change adds a flag which can be set in the esplugin closure in
build.gradle for plugins and modules which contain pieces that must be
published to maven, for use in the transport client. The jar/pom and
source/javadoc jars are moved to a new name that has the suffix
"-client".
I enabled this for the two modules that I know definitely need this;
there may be more. One open question is which groupId to use for the
generated pom.
closes #19411
Some tests still start http implicitly or miss configuring the transport clients correctly.
This commit fixes all remaining tests and adds a dependency to `transport-netty` from
`qa/smoke-test-http` and `modules/reindex` since they need an http server running on the nodes.
This also moves all required permissions for netty into its module and out of core.
This exposes a method to start an action and return a task from
`NodeClient`. This allows reindex to use the injected `Client` rather
than require injecting `TransportAction`s
* master: (192 commits)
[TEST] Fix rare OBOE in AbstractBytesReferenceTestCase
Reindex from remote
Rename writeThrowable to writeException
Start transport client round-robin randomly
Reword Refresh API reference (#19270)
Update fielddata.asciidoc
Fix stored_fields message
Add missing footer notes in mapper size docs
Remote BucketStreams
Add doc values support to the _size field in the mapper-size plugin
Bump version to 5.0.0-alpha5.
Update refresh.asciidoc
Update shrink-index.asciidoc
Change Debian repository for Vagrant debian-8 box
[TEST] fix test to account for internal empyt reference optimization
Upgrade to netty 3.10.6.Final (#19235)
[TEST] fix histogram test when extended bounds overlaps data
Remove redundant modifier
Simplify TcpTransport interface by reducing send code to a single send method (#19223)
Fix style violation in InstallPluginCommand.java
...
This adds a remote option to reindex that looks like
```
curl -XPOST 'localhost:9200/_reindex?pretty' -d'{
"source": {
"remote": {
"host": "http://otherhost:9200"
},
"index": "target",
"query": {
"match": {
"foo": "bar"
}
}
},
"dest": {
"index": "target"
}
}'
```
This reindex has all of the features of local reindex:
* Using queries to filter what is copied
* Retry on rejection
* Throttle/rethrottle
The big advantage of this version is that it goes over the HTTP API
which can be made backwards compatible.
Some things are different:
The query field is sent directly to the other node rather than parsed
on the coordinating node. This should allow it to support constructs
that are invalid on the coordinating node but are valid on the target
node. Mostly, that means old syntax.
Today throughout the codebase, catch throwable is used with reckless
abandon. This is dangerous because the throwable could be a fatal
virtual machine error resulting from an internal error in the JVM, or an
out of memory error or a stack overflow error that leaves the virtual
machine in an unstable and unpredictable state. This commit removes
catch throwable from the codebase and removes the temptation to use it
by modifying listener APIs to receive instances of Exception instead of
the top-level Throwable.
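A minimal sketch of the listener shape described above, assuming a hypothetical `Listener` interface and `run` helper (neither is the real Elasticsearch API): failures are delivered as `Exception`, and nothing catches `Throwable`, so a fatal `Error` keeps propagating.

```java
import java.util.function.Supplier;

final class ListenerSketch {
    /** Hypothetical listener: onFailure takes Exception, not Throwable. */
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    /** Runs the action and reports ordinary failures to the listener; Errors are deliberately not caught. */
    static <T> void run(Supplier<T> action, Listener<T> listener) {
        T result;
        try {
            result = action.get();
        } catch (Exception e) { // not Throwable: OutOfMemoryError, StackOverflowError etc. escape
            listener.onFailure(e);
            return;
        }
        listener.onResponse(result);
    }
}
```

Because the listener signature cannot accept an `Error` at all, callers are not tempted to swallow one and carry on in an unstable JVM.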
Relates #19231
Rename `fields` to `stored_fields` and add `docvalue_fields`
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fall back to fielddata cache if docvalues are not enabled on that field.
Closes #18943
We have long worked to capture different partitioning scenarios in our testing infra. This PR adds a new variant, inspired by the Jepsen blogs, which was forgotten so far: a partition where one node can still see and be seen by all other nodes. It also updates the resiliency page to better reflect all the work that was done in this area.
Update-By-Query and Delete-By-Query use internal versioning to update/delete documents. But documents can have a version number equal to zero using the external versioning... making the UBQ/DBQ request fail because zero is not a valid version number and they only support internal versioning for now. Sequence numbers might help to solve this issue in the future.
Previously all rest handlers would take Client in their injected ctor.
However, it was only to hold the client around for runtime. Instead,
this can be done just once in the HttpService which handles rest
requests, and passed along through the handleRequest method. It also
should always be a NodeClient, and other types of Clients (eg a
TransportClient) would not work anyways (and some handlers can be
simplified in follow ups like reindex by taking NodeClient).
`RestHandler`s are highly tied to actions so registering them in the
same place makes sense.
Removes the need for plugins to check if they are in transport client
mode before registering a RestHandler - `getRestHandlers` isn't called
at all in transport client mode.
This caused guice to throw a massive fit about the circular dependency
between NodeClient and the allocation deciders. I broke the circular
dependency by registering the actions map with the node client after
instantiation.
Stored scripts are pulled from the cluster state, and the current api
requires passing the ClusterState on each call to compile. However, this
means every user of the ScriptService needs to depend on the
ClusterService. Instead, this change makes the ScriptService a
ClusterStateListener. It also simplifies tests a lot, as they no longer
need to create fake cluster states (except when testing stored scripts).
Instead of implementing onModule(ActionModule) to register actions,
this has plugins implement ActionPlugin to declare actions. This is
yet another step in cleaning up the plugin infrastructure.
While I was in there I switched AutoCreateIndex and DestructiveOperations
to be eagerly constructed which makes them easier to use when
de-guice-ing the code base.
* master: (416 commits)
docs: removed obsolete information, percolator queries are not longer loaded into jvm heap memory.
Upgrade JNA to 4.2.2 and remove optionality
[TEST] Increase timeouts for Rest test client (#19042)
Update migrate_5_0.asciidoc
Add ThreadLeakLingering option to Rest client tests
Add a MultiTermAwareComponent marker interface to analysis factories. #19028
Attempt at fixing IndexStatsIT.testFilterCacheStats.
Fix docs build.
Move templates out of the Search API, into lang-mustache module
revert - Inline reroute with process of node join/master election (#18938)
Build valid slices in SearchSourceBuilderTests
Docs: Convert aggs/misc to CONSOLE
Docs: migration notes for _timestamp and _ttl
Group client projects under :client
[TEST] Add client-test module and make client tests use randomized runner directly
Move upgrade test to upgrade from version 2.3.3
Tasks: Add completed to the mapping
Fail to start if plugin tries broken onModule
Remove duplicated read byte array methods
Rename `fields` to `stored_fields` and add `docvalue_fields`
...
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fall back to fielddata cache if docvalues are not enabled on that field.
Closes #18943
This makes this sequence:
```
curl -XDELETE localhost:9200/source,dest?pretty
for i in $( seq 1 100 ); do
curl -XPOST localhost:9200/source/test -d'{"test": "test"}'; echo
done
curl localhost:9200/_refresh?pretty
curl -XPOST 'localhost:9200/_reindex?pretty&wait_for_completion=false' -d'{
"source": {
"index": "source"
},
"dest": {
"index": "dest"
}
}'
curl 'localhost:9200/_tasks/Jsyd6d9wSRW-O-NiiKbPcQ:237?wait_for_completion&pretty'
```
Return task *AND* the response to the user.
This also renames "result" to "response" in the persisted task info
to line it up with how we name the objects in Elasticsearch.
This change removes some unnecessary dependencies from ClusterService
and cleans up ClusterName creation. ClusterService is now not created
by guice anymore.
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overridden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
This adds a get task API that supports GET /_tasks/${taskId} and
removes that responsibility from the list tasks API. The get task
API supports wait_for_completion just as the list tasks API does
but doesn't support any of the list task API's filters. In exchange,
it supports falling back to the .results index when the task isn't
running any more. Like any good GET API it 404s when it doesn't
find the task.
Then we change reindex, update-by-query, and delete-by-query to
persist the task result when wait_for_completion=false. This leads
to the neat behavior that, once you start a reindex with
wait_for_completion=false, you can fetch the result of the task by
using the get task API and see the result when it has finished.
Also rename the .results index to .tasks.
Writeable is better for immutable objects like TimeValue.
Switch to writeZLong which takes up less space than the original
writeLong in the majority of cases. Since we expect negative
TimeValues we shouldn't use writeVLong.
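The size argument can be illustrated with a generic zig-zag plus 7-bits-per-byte variable-length scheme. This is a sketch of the general technique, not Elasticsearch's stream implementation; `ZigZagSketch`, `zigZag` and `varLength` are names made up for the example.

```java
final class ZigZagSketch {
    /** Maps signed longs to unsigned so small magnitudes get small codes: 0, -1, 1, -2 become 0, 1, 2, 3. */
    static long zigZag(long value) {
        return (value << 1) ^ (value >> 63);
    }

    /** Number of bytes a 7-bits-per-byte variable-length encoding needs for the given bits, read as unsigned. */
    static int varLength(long unsigned) {
        int bytes = 1;
        while ((unsigned & ~0x7FL) != 0) {
            unsigned >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // A fixed-width long always takes 8 bytes.
        System.out.println(varLength(zigZag(-1L)));   // zig-zag first: -1 maps to 1, which fits in 1 byte
        System.out.println(varLength(-1L));           // raw two's-complement bits: 10 bytes
        System.out.println(varLength(zigZag(1000L))); // 1000 maps to 2000, which fits in 2 bytes
    }
}
```

Small values of either sign stay short with zig-zag, while a plain variable-length encoding of the raw bits is at its worst for negative numbers.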
Today we use a random source of UUIDs for assigning allocation IDs,
cluster IDs, etc. Yet, the source of randomness for this is not
reproducible in tests. Since allocation IDs end up as keys in hash maps,
this means allocation decisions are not reproducible in tests and this
leads to non-reproducible test failures. This commit modifies the
behavior of random UUIDs so that they are reproducible under tests. The
behavior for production code is not changed, we still use a true source
of secure randomness but under tests we just use a reproducible source
of non-secure randomness.
It is important to note that there is a test,
UUIDTests#testThreadedRandomUUID that relies on the UUIDs being truly
random. Thus, we have to modify the setup for this test to use a true
source of randomness. Thus, this is one test that will never be
reproducible but it is intentionally so.
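The switch between a secure and a reproducible source can be sketched by passing the `Random` in. This is an illustration under assumptions, not the UUID generator in the codebase: `ReproducibleUuids` and `randomUuid` are hypothetical names, and the output is a standard version 4 UUID rather than whatever format Elasticsearch actually emits.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.Random;
import java.util.UUID;

final class ReproducibleUuids {
    /** Builds a version 4 UUID from whichever source of randomness the caller supplies. */
    static UUID randomUuid(Random random) {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        bytes[6] = (byte) ((bytes[6] & 0x0f) | 0x40); // set version to 4
        bytes[8] = (byte) ((bytes[8] & 0x3f) | 0x80); // set IETF variant
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        // Production-style: secure and not reproducible.
        System.out.println(randomUuid(new SecureRandom()));
        // Test-style: the same seed always yields the same UUID.
        System.out.println(randomUuid(new Random(42)).equals(randomUuid(new Random(42))));
    }
}
```

A test that needs true randomness, like the threaded one mentioned above, would simply keep passing a `SecureRandom`.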
Relates #18808
* master: (51 commits)
Switch QueryBuilders to new MatchPhraseQueryBuilder
Added method to allow creation of new methods on-the-fly.
more cleanups
Remove cluster name from data path
Remove explicit parallel new GC flag
rehash the docvalues in DocValuesSliceQuery using BitMixer.mix instead of the naive Long.hashCode.
switch FunctionRef over to methodhandles
ingest: Move processors from core to ingest-common module.
Fix some typos (#18746)
Fix ut
convert FunctionRef/Def usage to methodhandles.
Add the ability to partition a scroll in multiple slices. API:
use painless types in FunctionRef
Update ingest-node.asciidoc
compute functional interface stuff in Definition
Use method name in bootstrap check might fork test
Make checkstyle happy (add Lookup import, line length)
Don't hide LambdaConversionException and behave like real javac compiled code when a conversion fails. This works anyways, because fallback is allowed to throw any Throwable
Pass through the lookup given by invokedynamic to the LambdaMetaFactory. Without it real lambdas won't work, as their implementations are private to script class
checkstyle have your upper L
...
Folded grok processor into ingest-common module.
The rest tests have been moved to ingest-common module as well, because these tests don't run in the rest-api-spec module but in the distribution:integ-test-zip module
and adding a test plugin there felt just wrong to me. I think this is ok. I left a tiny ingest rest test behind that tests with an empty pipeline.
Removed messy tests, these tests were already covered in the rest tests
Added ingest test plugin in test infra so that each module testing integration with ingest doesn't need to write its own plugin
Moved reindex ingest tests to qa module
Closes #18490
This commit refactors the handling of thread pool settings so that the
individual settings can be registered rather than registering the top
level group. With this refactoring, individual plugins must now register
their own settings for custom thread pools that they need, but a
dedicated API is provided for this in the thread pool module. This
commit also renames the prefix on the thread pool settings from
"threadpool" to "thread_pool". This enables a hard break on the settings
so that:
- some of the settings can be given more sensible names (e.g., the max
number of threads in a scaling thread pool is now named "max" instead
of "size")
- the soft limit on the number of threads in the bulk and
indexing thread pools can be changed to a hard limit
- the settings names for custom plugins for thread pools can be
prefixed (e.g., "xpack.watcher.thread_pool.size")
- dynamic thread pool settings can be removed
Relates #18674
* master: (184 commits)
Add back pending deletes (#18698)
refactor matrix agg documentation from modules to main agg section
Implement ctx.op = "delete" on _update_by_query and _reindex
Close SearchContext if query rewrite failed
Wrap lines at 140 characters (:qa projects)
Remove log file
painless: Add support for the new Java 9 MethodHandles#arrayLength() factory (see https://bugs.openjdk.java.net/browse/JDK-8156915)
More complete exception message in settings tests
Use java from path if JAVA_HOME is not set
Fix uncaught checked exception in AzureTestUtils
[TEST] wait for yellow after setup doc tests (#18726)
Fix recovery throttling to properly handle relocating non-primary shards (#18701)
Fix merge stats rendering in RestIndicesAction (#18720)
[TEST] mute RandomAllocationDeciderTests.testRandomDecisions
Reworked docs for index-shrink API (#18705)
Improve painless compile-time exceptions
Adds UUIDs to snapshots
Add test rethrottle test case for delete-by-query
Do not start scheduled pings until transport start
Adressing review comments
...
The assertBusy method currently has both a Runnable and Callable
version. This has caused confusion with type inference and lambdas
sometimes, in particular with java 9. This change removes the callable
version as nothing was actually using it.
The retry test has failed a couple of times in CI because it wasn't able
to cause any retries. Putting it in a bash `while` loop shows that it
eventually does fail that way. The seed "4F6477A9C999CA20" seems especially
good at failing to get retries. It doesn't fail all the time, but more
than most.
This adds a retry to each test case, retrying a maximum of 10 times or
until it causes the retries. I've seen it fail to get retries 7 times
in a row but not go beyond that. Retrying doesn't seem to really hurt
the test runtime all that much. Most of the time is in the startup
cost.
Failing CI build that triggered this:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/852/console
* master: (158 commits)
Document the hack
Refactor property placeholder use of env. vars
Force java9 log4j hack in testing
Fix log4j buggy java version detection
Make java9 work again
Don't mkdir directly in deb init script
Fix env. var placeholder test so it's reproducible
Remove ScriptMode class in favor of boolean true/false
[rest api spec] fix doc urls
Netty request/response tracer should wait for send
Filter client/server VM options from jvm.options
[rest api spec] fix url for reindex api docs
Remove use of a Fields class in snapshot responses that contains x-content keys, in favor of declaring/using the keys directly.
Limit retries of failed allocations per index (#18467)
Proxy box method to use valueOf.
Use the build-in valueOf method instead of the custom one.
Fixed tests and added a comment to the box method.
Fix boxing.
Do not decode path when sending error
Fix race condition in snapshot initialization
...
This uses the same backoff policy we use for bulk and just retries until
the request isn't rejected.
Instead of `{"retries": 12}` in the response to count retries this now
looks like `{"retries": {"bulk": 12, "search": 1}}`.
Closes #18059
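Retrying with backoff until a request is no longer rejected can be sketched like this. The doubling delay, the `maxRetries` cap and the names `RetrySketch` and `retryUntilAccepted` are assumptions made for the example; the actual backoff policy shared with bulk is not reproduced here.

```java
import java.util.function.BooleanSupplier;

final class RetrySketch {
    /**
     * Calls attempt until it returns true (accepted) and returns how many retries that took.
     * The delay doubles after every rejection; maxRetries is a hypothetical safety cap.
     */
    static int retryUntilAccepted(BooleanSupplier attempt, long initialDelayMillis, int maxRetries) {
        long delay = initialDelayMillis;
        for (int retries = 0; ; retries++) {
            if (attempt.getAsBoolean()) {
                return retries;
            }
            if (retries == maxRetries) {
                throw new IllegalStateException("still rejected after " + maxRetries + " retries");
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", e);
            }
            delay *= 2;
        }
    }
}
```

Keeping one such counter per phase is what would produce separate `bulk` and `search` retry counts in a response.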
* master: (904 commits)
Removes unused methods in the o/e/common/Strings class.
Add note regarding thread stack size on Windows
painless: restore accidentally removed test
Documented fuzzy_transpositions in match query
Add not-null precondition check in BulkRequest
Build: Make run task you full zip distribution
Build: More pom generation improvements
Add test for wrong array index
Take return type from "after" field.
painless: build descriptor of array and field load/store in code; fix array index to adapt type not DEF
Build: Add developer info to generated pom files
painless: improve exception stacktraces
painless: Rename the dynamic call site factory to DefBootstrap and make the inner class very short (PIC = Polymorphic Inline Cache)
Remove dead code.
Avoid race while retiring executors
Allow only a single extension for a scripting engine
Adding REST tests to ensure key_as_string behavior stays consistent
[test] Set logging to 11 on reindex test
[TEST] increase logger level until we know what is going on
Don't allow `fuzziness` for `multi_match` types cross_fields, phrase and phrase_prefix
...
QueryBuilder has generics, but those are never used: all call sites use
`QueryBuilder<?>`. Only `AbstractQueryBuilder` needs generics so that the base
class can contain a default implementation for setters that returns `this`.
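The reason only the abstract base class needs a type parameter can be shown with a self-typed builder. The classes below are simplified, hypothetical stand-ins (`AbstractBuilderSketch`, `TermBuilderSketch`), not the real query builders.

```java
/** Hypothetical base class: the type parameter exists only so shared setters can return the concrete type. */
abstract class AbstractBuilderSketch<QB extends AbstractBuilderSketch<QB>> {
    private float boost = 1.0f;

    /** Shared setter implemented once; returns `this` typed as the subclass so calls can be chained. */
    @SuppressWarnings("unchecked")
    final QB boost(float boost) {
        this.boost = boost;
        return (QB) this;
    }

    final float boost() {
        return boost;
    }
}

/** Concrete builder: binds the type parameter to itself and needs no generics of its own. */
final class TermBuilderSketch extends AbstractBuilderSketch<TermBuilderSketch> {
    private String value;

    TermBuilderSketch value(String value) {
        this.value = value;
        return this;
    }

    String value() {
        return value;
    }
}
```

Chaining such as `new TermBuilderSketch().boost(2.0f).value("x")` compiles because `boost` returns `TermBuilderSketch`, while callers that only hold the interface never need to mention a type argument.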
All other values are errors.
Add java test for throttling. We had a REST test but it only ran against
one node so it didn't catch serialization errors.
Add Simple round trip test for rethrottle request
Do this by creating a Client subclass that automatically assigns the
parentTask to all requests that come through it. Code that doesn't want
to set the parentTask can call `unwrap` on the Client to get the inner
client instance that doesn't set the parentTask. Reindex uses this for
its ClearScrollRequest so that the request will run properly after the
reindex request has been canceled.
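The wrapping client can be sketched with a tiny decorator. Everything here is a hypothetical simplification (`ParentTaskSketch`, its `Client` and `Request` types, a `String` task id); the real Client and request classes are much richer.

```java
final class ParentTaskSketch {
    /** Hypothetical request: parentTask stays null unless something assigns it. */
    static final class Request {
        String parentTask;
    }

    /** Hypothetical client: returns the request as it was sent, so the example can inspect it. */
    interface Client {
        Request execute(Request request);

        /** A plain client has nothing to unwrap. */
        default Client unwrap() {
            return this;
        }
    }

    /** Decorator that stamps every request with the parent task before delegating. */
    static final class ParentTaskAssigningClient implements Client {
        private final Client inner;
        private final String parentTask;

        ParentTaskAssigningClient(Client inner, String parentTask) {
            this.inner = inner;
            this.parentTask = parentTask;
        }

        @Override
        public Request execute(Request request) {
            request.parentTask = parentTask;
            return inner.execute(request);
        }

        /** Gives access to the inner client, which does not set the parent task. */
        @Override
        public Client unwrap() {
            return inner;
        }
    }
}
```

A cleanup request sent through `unwrap()` carries no parent, so it is not tied to the fate of a canceled parent task.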
Now that the current uses of magical camelCase support have been
deprecated, we can remove these in master (sans remaining issues like
BulkRequest). This change removes camel case support from ParseField,
query types, analysis, and settings lookup.
see #8988