This is related to #22116. Core no longer needs SocketPermission
accept. This permission is relegated to the transport-netty4 module
and (for tests) to the mocksocket jar.
This commit adds a MessyRestTestPlugin to the gradle build. It extends
StandaloneRestTestPlugin. The main piece of functionality that it adds
is to copy plugin-metadata from dependencies into the
generated-resources for the current test source. This is necessary to
ensure that permissions for dependencies are applied when running the
tests.
A current limitation is that the permissions are applied differently
than in the distribution sources. When permissions are granted to all
depedencies for a module or plugin, the permissions are granted to all
dependencies on the classpath for tests besides a few hardcoded
exclusions:
- es core
- es test framework
- lucene test framework
- randomized runner
- junit library
We don't want to expose `String#getBytes` which is required for
`Base64.getEncoder.encode` to work because we're worried about
character sets. This adds `encodeBase64` and `decodeBase64`
methods to `String` in Painless that are duals of one another
such that:
`someString == someString.encodeBase64().decodeBase64()`.
Both methods work with the UTF-8 encoding of the string.
Closes#22648
Everything that extended `AbstractAsyncBulkByScrollAction` also
extended `AbstractAsyncBulkIndexByScrollAction` so this removes
`AbstractAsyncBulkIndexByScrollAction`, merging it into
`AbstractAsyncBulkByScrollAction`.
Changes the error message when `action.auto_create_index` or
`index.mapper.dynamic` forbids automatic creation of an index
from `no such index` to one of:
* `no such index and [action.auto_create_index] is [false]`
* `no such index and [index.mapper.dynamic] is [false]`
* `no such index and [action.auto_create_index] contains [-<pattern>] which forbids automatic creation of the index`
* `no such index and [action.auto_create_index] ([all patterns]) doesn't match`
This should make it more clear *why* there is `no such index`.
Closes#22435
Today we do not preserve response headers if they are present on a transport protocol
response. While preserving these headers is not always desired, in the most cases we
should pass on these headers to have consistent results for depreciation headers etc.
yet, this hasn't been much of a problem since most of the deprecations are detected early
ie. on the coordinating node such that this bug wasn't uncovered until #22647
This commit allow to optionally preserve headers when a context is restored and also streamlines
the context restore since it leaked frequently into the callers thread context when the callers
context wasn't restored again.
Instead of forcing each task to register all nodes where its children are running, this commit runs cancellation on all nodes. The task cancellation operation doesn't run too frequently, so this optimization doesn't seem to be worth additional complexity of the interface.
Previously, certain settings that could take multiple comma delimited
values would pick up incorrect values for all entries but the first if
each comma separated value was followed by a whitespace character. For
example, the multi-value "A,B,C" would be correctly parsed as
["A", "B", "C"] but the multi-value "A, B, C" would be incorrectly parsed
as ["A", " B", " C"].
This commit allows a comma separated list to have whitespace characters
after each entry. The specific settings that were affected by this are:
cluster.routing.allocation.awareness.attributes
index.routing.allocation.require.*
index.routing.allocation.include.*
index.routing.allocation.exclude.*
cluster.routing.allocation.require.*
cluster.routing.allocation.include.*
cluster.routing.allocation.exclude.*
http.cors.allow-methods
http.cors.allow-headers
For the allocation filtering related settings, this commit also provides
validation of each specified entry if the filtering is done by _ip,
_host_ip, or _publish_ip, to ensure that each entry is a valid IP
address.
Closes#22297
This commit tries to simplify the way ElasticsearchException are rendered to xcontent. It adds some documentation and renames and merges some methods. Current behavior is preserved, the goal is to be more readable and centralize everything in the ElasticsearchException class.
Today we have quite some abstractions that are essentially providing a simple
dispatch method to the plugins defining a `HttpServerTransport`. This commit
removes `HttpServer` and `HttpServerAdaptor` and introduces a simple `Dispatcher` functional
interface that delegate to `RestController` by default.
Relates to #18482
The IndexingOperationListener interface did not provide any
information about the shard id when a document was indexed.
This commit adds the shard id as the first parameter to all methods
in the IndexingOperationListener.
This is related to #22116. netty channels require socket `connect` and
`accept` privileges. Netty does not currently wrap these operations
with `doPrivileged` blocks. These changes extend the netty channels
and wrap calls to the relevant super methods in doPrivileged blocks.
Adds a message about how the remote is unlikely to be Elasticsearch.
This isn't as good as including the whole message from the remote but
we can't do that because we are stream parsing it and we don't want
to mark the whole request.
Closes#22330
Moves fetching the local node id into `NodeClient` which is a
fairly useful place to put it so you can generate task ids from
`NodeClient#executeLocally`.
It is no longer needed. It used to contain a lot of strings
used by serialization but those have since been removed. Now
it is just another thing to pass around that we don't really
need.
Reindex-from-remote had a race when it tried to clear the scroll. It
first starts the request to clear the scroll and then submits a task
to the generic threadpool to shutdown the client. These two things
race and, in my experience, closing the scroll generally loses. That
means that most of the time reindex-from-remote isn't clearing the
scrolls that it uses. This isn't the end of the world because we
flush old scroll contexts after a while but this isn't great.
Noticed while experimenting with #22514.
Reindex-from-remote was accepting source filtering in the request
but ignoring it and setting `_source=true` on the search URI. This
fixes the filtering so it is piped through to the remote node and
adds tests for that.
Closes#22507
Removes `AggregatorParsers`, replacing all of its functionality with
`XContentParser#namedObject`.
This is the third bit of payoff from #22003, one less thing to pass
around the entire application.
If the remote doesn't return a content type then reindex
tried to guess the content-type. This didn't work most
of the time and produced a rather useless error message.
Given that Elasticsearch always returns the content-type
we are dropping content-type detection in favor of just
failing the request if the remote didn't return a content-type.
Closes#22329
1. Escape sequences we're working. For example `\\` is now correctly
interpreted as `\` instead of `\\`. Same with `\'` being `'` and
`\"` being `"`.
2. `'` delimited strings weren't allowed to contain `"`s but it looked
like they were intended to support it. Now they do.
3. Improves the error message when the script contains an invalid
escape sequence inside a string to include a list of the valid
escape sequences.
Closes#22372
This integrates the mocksocket jar with elasticsearch tests. Mocksocket wraps actions requiring SocketPermissions in doPrivilege blocks. This will eventually allow SocketPermissions to be assigned to the mocksocket jar opposed to the entire elasticsearch codebase.
We previously named the thread using a frame from the stack trace, but
this was removed to simplify the code here. However, the comment
explaining this was left behind and this commit cleans that up.
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.
I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.
Relates #22291
Introduces `XContentParser#namedObject which works a little like
`StreamInput#readNamedWriteable`: on startup components register
parsers under names and a superclass. At runtime we look up the
parser and call it to parse the object.
Right now the parsers take a context object they use to help with
the parsing but I hope to be able to eliminate the need for this
context as most what it is used for at this point is to move
around parser registries which should be replaced by this method
eventually. I make no effort to do so in this PR because it is
big enough already. This is meant to the a start down a road that
allows us to remove classes like `QueryParseContext`,
`AggregatorParsers`, `IndicesQueriesRegistry`, and
`ParseFieldRegistry`.
The goal here is to reduce the amount of plumbing required to
allow parsing pluggable things. With this you don't have to pass
registries all over the place. Instead you must pass a super
registry to fewer places and use it to wrap the reader. This is
the same tradeoff that we use for NamedWriteable and it allows
much, much simpler binary serialization. We think we want that
same thing for xcontent serialization.
The only parsing actually converted to this method is parsing
`ScoreFunctions` inside of `FunctionScoreQuery`. I chose this
because it is relatively self contained.
It looks like the exception reason can differ in different default
locales, so the build would fail in any non-English locale. This
switches the catch to the name of the exception which shouldn't
vary.
We are currenlty checking that no deprecation warnings are emitted in our query tests. That can be moved to ESTestCase (disabled in ESIntegTestCase) as it allows us to easily catch where our tests use deprecated features and assert on the expected warnings.
We return deprecation warnings as response headers, besides logging them. Strict parsing mode stayed around, but was only used in query tests, though we also introduced checks for deprecation warnings there that don't need strict parsing anymore (see #20993).
We can then safely remove support for strict parsing mode. The final goal is to remove the ParseFieldMatcher class, but there are many many users of it. This commit prepares the field for the removal, by deprecating ParseFieldMatcher and making it effectively not needed. Strict parsing is removed from ParseFieldMatcher, and strict parsing is replaced in tests where needed with deprecation warnings checks.
Note that the setting to enable strict parsing was never ported to the new settings infra hance it cannot be set in production. It is really only used in our own tests.
Relates to #19552
This commit makes mapping updates atomic when multiple types in an index are updated. Mappings for an index are now applied in a single atomic operation, which also allows to optimize some of the cross-type updates and checks.
In #22094 we introduce a test-only setting to simulate transport
impls that don't support handshakes. This commit implements the same logic
without a setting.
The JSON processor has an optional field called "target_field".
If you don't specify target_field then target_field becomes what you specified as "field".
There isn't anyway to add the fields to the root of a document. By
setting `add_to_root`, now serialized fields will be inserted into the
top-level fields of the ingest document.
Closes#21898.
Today we initialize Netty in a static initializer. We trigger this
method via static initializers from Netty-related classes, but we can
trigger this method earlier than we do to ensure that Netty is
initialized how we want it to be.
Inline scripts defined in Ingest Pipelines are now compiled at creation time to preemptively catch errors on initialization of the pipeline.
Fixes#21842.
Low level handshake code doesn't handle situations gracefully if the connection
is concurrently closed or reset by peer. This commit adds the relevant code to
fail the handshake if the connection is closed.
Moves the last of the "easy" parser construction into
`RestRequest`, this time with a new method
`RestRequest#contentParser`. The rest of the production
code that builds `XContentParser` isn't "easy" because it is
exposed in the Transport Client API (a Builder) object.
The creation of the `ValuesSource` used to pass `DateTimeZone.UTC` as a time
zone all the time in case of empty fields in spite of the fact that all doc
value formats but the date one reject this parameter.
This commit centralizes the creation of the `ValuesSource` and adds unit tests
to it.
Closes#22009
With this commit we enable the Jackson feature 'STRICT_DUPLICATE_DETECTION'
by default. This ensures that JSON keys are always unique. While this has
a performance impact, benchmarking has indicated that the typical drop in
indexing throughput is around 1 - 2%.
As a last resort, we allow users to still disable strict duplicate checks
by setting `-Des.json.strict_duplicate_detection=false` which is
intentionally undocumented.
Closes#19614
Grok was originally ignoring potential matches to named-capture groups
larger than one. For example, If you had two patterns containing the
same named field, but only the second pattern matched, it would fail to
pick this up.
This PR fixes this by exploring all potential places where a
named-capture was used and chooses the first one that matched.
Fixes#22117.
Today we rely on the version that the API user passes in together with the DiscoveryNode. This commit introduces a low level handshake where nodes exchange their version to be used with the transport protocol that is executed every time a connection to a node is established. This, on the one hand allows to change the wire protocol based on the version we are talking to even without a full cluster restart. Today we would need to carry on a BWC layer across major versions but with a handshake we can rely on the fact that the latest version of the previous minor executes a handshake and uses the latest protocol version across all communication with the N+1 version nodes.
This change is yet fully backwards compatible, a followup PR will remove the BWC in 6.0 once this has been back-ported to the 5.x branch
This class is just a wrapper around `SearchContext`, so let's use
`SearchContext` directly. The change is mechanical, except the
`ValuesSourceConfig` class, where I moved the logic to get a `ValuesSource`
given a config.
Our query DSL supports empty queries (`{}`), which have a different meaning depending on the query that holds it, either ignored, match_all or match_none. We deprecated the support for empty queries in 5.0, where we log a deprecation warning wherever they are used.
The way we supported it once we moved query parsing to the coordinating node was having an Optional<QueryBuilder> return type in all of our parse methods (called fromXContent). See #17624. The central place for this was QueryParseContext#parseInnerQueryBuilder. We can now remove all the optional return types and simply throw an exception whenever an empty query is found.
Today we connect and publish the nodes connection before we execute a
handshake with the node we connect to. In the case of connecting to a node
that won't pass the handshake this connection is already `published` and other
code paths can use it. This commit detaches the connection and the publish of the
connection such that `TransportService` can do a handshake before actually connect
and publish the connection.
To get #22003 in cleanly we need to centralize as much `XContentParser` creation as possible into `RestRequest`. That'll mean we have to plumb the `NamedXContentRegistry` into fewer places.
This removes `RestAction.hasBody`, `RestAction.guessBodyContentType`, and `RestActions.getRestContent`, moving callers over to `RestRequest.hasContentOrSourceParam`, `RestRequest.contentOrSourceParam`, and `RestRequest.contentOrSourceParamParser` and `RestRequest.withContentOrSourceParamParserOrNull`. The idea is to use `withContentOrSourceParamParserOrNull` if you need to handle requests without any sort of body content and to use `contentOrSourceParamParser` otherwise.
I believe the vast majority of this PR to be purely mechanical but I know I've made the following behavioral change (I'll add more if I think of more):
* If you make a request to an endpoint that requires a request body and has cut over to the new APIs instead of getting `Failed to derive xcontent` you'll get `Body required`.
* Template parsing is now non-strict by default. This is important because we need to be able to deprecate things without requests failing.
If you try to close the rest client inside one of its callbacks then
it blocks itself. The thread pool switches the status to one that
requests a shutdown and then waits for the pool to shutdown. When
another thread attempts to honor the shutdown request it waits
for all the threads in the pool to finish what they are working on.
Thus thread a is waiting on thread b while thread b is waiting
on thread a. It isn't quite that simple, but it is close.
Relates to #22027
This is an attempt to start moving aggs parsing to `ObjectParser`. There is
still A LOT to do, but ObjectParser is way better than the way aggregations
parsing works today. For instance in most cases, we reject numbers that are
provided as strings, which we are supposed to accept since some client languages
(looking at you Perl) cannot make sure to use the appropriate types.
Relates to #22009
* Remove 2.0 prerelease version constants
This is a start to addressing #21887. This removes:
* pre 2.0 snapshot format support
* automatic units addition to cluster settings
* bwc check for delete by query in pre 2.0 indexes
This adds the `_primary_term` field internally to the mappings. This field is
populated with the current shard's primary term.
It is intended to be used for collision resolution when two document copies have
the same sequence id, therefore, doc_values for the field are stored but the
filed itself is not indexed.
This also fixes the `_seq_no` field so that doc_values are retrievable (they
were previously stored but irretrievable) and changes the `stats` implementation
to more efficiently use the points API to retrieve the min/max instead of
iterating on each doc_value value. Additionally, even though we intend to be
able to search on the field, it was previously not searchable. This commit makes
it searchable.
There is no user-visible `_primary_term` field. Instead, the fields are
updated by calling:
```java
index.parsedDoc().updateSeqID(seqNum, primaryTerm);
```
This includes example methods in `Versions` and `Engine` for retrieving the
sequence id values from the index (see `Engine.getSequenceID`) that are only
used in unit tests. These will be extended/replaced by actual implementations
once we make use of sequence numbers as a conflict resolution measure.
Relates to #10708
Supercedes #21480
P.S. As a side effect of this commit, `SlowCompositeReaderWrapper` cannot be
used for documents that contain `_seq_no` because it is a Point value and SCRW
cannot wrap documents with points, so the tests have been updated to loop
through the `LeafReaderContext`s now instead.
This adds a fromXContent method and unit test to the HighlightField class so we
can parse it as part of a serch response. This is part of the preparation for
parsing search responses on the client side.
`_update_by_query` supports specifying the `pipeline` to process the
documents as a url parameter but `_reindex` doesn't. It doesn't because
everything about the `_reindex` request that has to do with writing
the documents is grouped under the `dest` object in the request body.
This changes the response parameter from
`request [_reindex] contains unrecognized parameter: [pipeline]` to
`_reindex doesn't support [pipeline] as a query parmaeter. Specify it in the [dest] object instead.`
Action filters currently have the ability to filter both the request and
response. But the response side was not actually used. This change
removes support for filtering responses with action filters.
Changes the default socket and connection timeouts for the rest
client from 10 seconds to the more generous 30 seconds.
Defaults reindex-from-remote to those timeouts and make the
timeouts configurable like so:
```
POST _reindex
{
"source": {
"remote": {
"host": "http://otherhost:9200",
"socket_timeout": "1m",
"connect_timeout": "10s"
},
"index": "source",
"query": {
"match": {
"test": "data"
}
}
},
"dest": {
"index": "dest"
}
}
```
Closes#21707
I added an assertion to Netty4/Netty3Transport in 5.x that is not in
master yet. This commit port the assert to ensure we consumed all connection
in `connectToChannels`
We don't use the test infra nor do we run the tests. They might all be
entirely out of date. We also have a different BWC test infra in-place.
This change removes all of the legacy infra.
Timeouts are global today across all connections this commit allows to specify
a connection timeout per node such that depending on the context connections can
be established with different timeouts.
Relates to #19719