As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.
Relates #22291
Introduces `XContentParser#namedObject which works a little like
`StreamInput#readNamedWriteable`: on startup components register
parsers under names and a superclass. At runtime we look up the
parser and call it to parse the object.
Right now the parsers take a context object they use to help with
the parsing but I hope to be able to eliminate the need for this
context as most what it is used for at this point is to move
around parser registries which should be replaced by this method
eventually. I make no effort to do so in this PR because it is
big enough already. This is meant to the a start down a road that
allows us to remove classes like `QueryParseContext`,
`AggregatorParsers`, `IndicesQueriesRegistry`, and
`ParseFieldRegistry`.
The goal here is to reduce the amount of plumbing required to
allow parsing pluggable things. With this you don't have to pass
registries all over the place. Instead you must pass a super
registry to fewer places and use it to wrap the reader. This is
the same tradeoff that we use for NamedWriteable and it allows
much, much simpler binary serialization. We think we want that
same thing for xcontent serialization.
The only parsing actually converted to this method is parsing
`ScoreFunctions` inside of `FunctionScoreQuery`. I chose this
because it is relatively self contained.
It looks like the exception reason can differ in different default
locales, so the build would fail in any non-English locale. This
switches the catch to the name of the exception which shouldn't
vary.
We are currenlty checking that no deprecation warnings are emitted in our query tests. That can be moved to ESTestCase (disabled in ESIntegTestCase) as it allows us to easily catch where our tests use deprecated features and assert on the expected warnings.
We return deprecation warnings as response headers, besides logging them. Strict parsing mode stayed around, but was only used in query tests, though we also introduced checks for deprecation warnings there that don't need strict parsing anymore (see #20993).
We can then safely remove support for strict parsing mode. The final goal is to remove the ParseFieldMatcher class, but there are many many users of it. This commit prepares the field for the removal, by deprecating ParseFieldMatcher and making it effectively not needed. Strict parsing is removed from ParseFieldMatcher, and strict parsing is replaced in tests where needed with deprecation warnings checks.
Note that the setting to enable strict parsing was never ported to the new settings infra hance it cannot be set in production. It is really only used in our own tests.
Relates to #19552
This commit makes mapping updates atomic when multiple types in an index are updated. Mappings for an index are now applied in a single atomic operation, which also allows to optimize some of the cross-type updates and checks.
In #22094 we introduce a test-only setting to simulate transport
impls that don't support handshakes. This commit implements the same logic
without a setting.
The JSON processor has an optional field called "target_field".
If you don't specify target_field then target_field becomes what you specified as "field".
There isn't anyway to add the fields to the root of a document. By
setting `add_to_root`, now serialized fields will be inserted into the
top-level fields of the ingest document.
Closes#21898.
Today we initialize Netty in a static initializer. We trigger this
method via static initializers from Netty-related classes, but we can
trigger this method earlier than we do to ensure that Netty is
initialized how we want it to be.
Inline scripts defined in Ingest Pipelines are now compiled at creation time to preemptively catch errors on initialization of the pipeline.
Fixes#21842.
Low level handshake code doesn't handle situations gracefully if the connection
is concurrently closed or reset by peer. This commit adds the relevant code to
fail the handshake if the connection is closed.
Moves the last of the "easy" parser construction into
`RestRequest`, this time with a new method
`RestRequest#contentParser`. The rest of the production
code that builds `XContentParser` isn't "easy" because it is
exposed in the Transport Client API (a Builder) object.
The creation of the `ValuesSource` used to pass `DateTimeZone.UTC` as a time
zone all the time in case of empty fields in spite of the fact that all doc
value formats but the date one reject this parameter.
This commit centralizes the creation of the `ValuesSource` and adds unit tests
to it.
Closes#22009
With this commit we enable the Jackson feature 'STRICT_DUPLICATE_DETECTION'
by default. This ensures that JSON keys are always unique. While this has
a performance impact, benchmarking has indicated that the typical drop in
indexing throughput is around 1 - 2%.
As a last resort, we allow users to still disable strict duplicate checks
by setting `-Des.json.strict_duplicate_detection=false` which is
intentionally undocumented.
Closes#19614
Grok was originally ignoring potential matches to named-capture groups
larger than one. For example, If you had two patterns containing the
same named field, but only the second pattern matched, it would fail to
pick this up.
This PR fixes this by exploring all potential places where a
named-capture was used and chooses the first one that matched.
Fixes#22117.
Today we rely on the version that the API user passes in together with the DiscoveryNode. This commit introduces a low level handshake where nodes exchange their version to be used with the transport protocol that is executed every time a connection to a node is established. This, on the one hand allows to change the wire protocol based on the version we are talking to even without a full cluster restart. Today we would need to carry on a BWC layer across major versions but with a handshake we can rely on the fact that the latest version of the previous minor executes a handshake and uses the latest protocol version across all communication with the N+1 version nodes.
This change is yet fully backwards compatible, a followup PR will remove the BWC in 6.0 once this has been back-ported to the 5.x branch
This class is just a wrapper around `SearchContext`, so let's use
`SearchContext` directly. The change is mechanical, except the
`ValuesSourceConfig` class, where I moved the logic to get a `ValuesSource`
given a config.
Our query DSL supports empty queries (`{}`), which have a different meaning depending on the query that holds it, either ignored, match_all or match_none. We deprecated the support for empty queries in 5.0, where we log a deprecation warning wherever they are used.
The way we supported it once we moved query parsing to the coordinating node was having an Optional<QueryBuilder> return type in all of our parse methods (called fromXContent). See #17624. The central place for this was QueryParseContext#parseInnerQueryBuilder. We can now remove all the optional return types and simply throw an exception whenever an empty query is found.
Today we connect and publish the nodes connection before we execute a
handshake with the node we connect to. In the case of connecting to a node
that won't pass the handshake this connection is already `published` and other
code paths can use it. This commit detaches the connection and the publish of the
connection such that `TransportService` can do a handshake before actually connect
and publish the connection.
To get #22003 in cleanly we need to centralize as much `XContentParser` creation as possible into `RestRequest`. That'll mean we have to plumb the `NamedXContentRegistry` into fewer places.
This removes `RestAction.hasBody`, `RestAction.guessBodyContentType`, and `RestActions.getRestContent`, moving callers over to `RestRequest.hasContentOrSourceParam`, `RestRequest.contentOrSourceParam`, and `RestRequest.contentOrSourceParamParser` and `RestRequest.withContentOrSourceParamParserOrNull`. The idea is to use `withContentOrSourceParamParserOrNull` if you need to handle requests without any sort of body content and to use `contentOrSourceParamParser` otherwise.
I believe the vast majority of this PR to be purely mechanical but I know I've made the following behavioral change (I'll add more if I think of more):
* If you make a request to an endpoint that requires a request body and has cut over to the new APIs instead of getting `Failed to derive xcontent` you'll get `Body required`.
* Template parsing is now non-strict by default. This is important because we need to be able to deprecate things without requests failing.
If you try to close the rest client inside one of its callbacks then
it blocks itself. The thread pool switches the status to one that
requests a shutdown and then waits for the pool to shutdown. When
another thread attempts to honor the shutdown request it waits
for all the threads in the pool to finish what they are working on.
Thus thread a is waiting on thread b while thread b is waiting
on thread a. It isn't quite that simple, but it is close.
Relates to #22027
This is an attempt to start moving aggs parsing to `ObjectParser`. There is
still A LOT to do, but ObjectParser is way better than the way aggregations
parsing works today. For instance in most cases, we reject numbers that are
provided as strings, which we are supposed to accept since some client languages
(looking at you Perl) cannot make sure to use the appropriate types.
Relates to #22009
* Remove 2.0 prerelease version constants
This is a start to addressing #21887. This removes:
* pre 2.0 snapshot format support
* automatic units addition to cluster settings
* bwc check for delete by query in pre 2.0 indexes
This adds the `_primary_term` field internally to the mappings. This field is
populated with the current shard's primary term.
It is intended to be used for collision resolution when two document copies have
the same sequence id, therefore, doc_values for the field are stored but the
filed itself is not indexed.
This also fixes the `_seq_no` field so that doc_values are retrievable (they
were previously stored but irretrievable) and changes the `stats` implementation
to more efficiently use the points API to retrieve the min/max instead of
iterating on each doc_value value. Additionally, even though we intend to be
able to search on the field, it was previously not searchable. This commit makes
it searchable.
There is no user-visible `_primary_term` field. Instead, the fields are
updated by calling:
```java
index.parsedDoc().updateSeqID(seqNum, primaryTerm);
```
This includes example methods in `Versions` and `Engine` for retrieving the
sequence id values from the index (see `Engine.getSequenceID`) that are only
used in unit tests. These will be extended/replaced by actual implementations
once we make use of sequence numbers as a conflict resolution measure.
Relates to #10708
Supercedes #21480
P.S. As a side effect of this commit, `SlowCompositeReaderWrapper` cannot be
used for documents that contain `_seq_no` because it is a Point value and SCRW
cannot wrap documents with points, so the tests have been updated to loop
through the `LeafReaderContext`s now instead.
This adds a fromXContent method and unit test to the HighlightField class so we
can parse it as part of a serch response. This is part of the preparation for
parsing search responses on the client side.
`_update_by_query` supports specifying the `pipeline` to process the
documents as a url parameter but `_reindex` doesn't. It doesn't because
everything about the `_reindex` request that has to do with writing
the documents is grouped under the `dest` object in the request body.
This changes the response parameter from
`request [_reindex] contains unrecognized parameter: [pipeline]` to
`_reindex doesn't support [pipeline] as a query parmaeter. Specify it in the [dest] object instead.`
Action filters currently have the ability to filter both the request and
response. But the response side was not actually used. This change
removes support for filtering responses with action filters.
Changes the default socket and connection timeouts for the rest
client from 10 seconds to the more generous 30 seconds.
Defaults reindex-from-remote to those timeouts and make the
timeouts configurable like so:
```
POST _reindex
{
"source": {
"remote": {
"host": "http://otherhost:9200",
"socket_timeout": "1m",
"connect_timeout": "10s"
},
"index": "source",
"query": {
"match": {
"test": "data"
}
}
},
"dest": {
"index": "dest"
}
}
```
Closes#21707
I added an assertion to Netty4/Netty3Transport in 5.x that is not in
master yet. This commit port the assert to ensure we consumed all connection
in `connectToChannels`
We don't use the test infra nor do we run the tests. They might all be
entirely out of date. We also have a different BWC test infra in-place.
This change removes all of the legacy infra.
Timeouts are global today across all connections this commit allows to specify
a connection timeout per node such that depending on the context connections can
be established with different timeouts.
Relates to #19719
We currently treat every node equally when we establish connections to a node.
Yet, if we are not master eligible or can't hold any data there is no point in creating
a dedicated connection for sending the cluster state or running remote recoveries respectively.
The usage of STATE and RECOVERY connections on non-master and/or non-data nodes will result in an IllegalStateException.
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
SearchTemplateRequest to implement CompositeIndicesRequest
Given that SearchTemplateRequest effectively delegates to search when a search is being executed, it should implement the CompositeIndicesRequest interface. The subrequests method should return a single search request. When a search is not going to be executed, because we are in simulate mode, there are no inner requests, and there are no corresponding indices to that request either.
Closes#21747
Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license
Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true
Adapt the expectations of some tests to the new format of the Lucene explain output