Currently, when adding a new mapping, we attempt to parse + merge it before
checking whether its top-level document type matches the existing type. So when
a user attempts to introduce a new mapping type, we may give a confusing error
message around merging instead of complaining that it's not possible to add
more than one type ("Rejecting mapping update to [my-index] as the final
mapping would have more than 1 type...").
This PR moves the type validation to the start of
`MetaDataMappingService#applyRequest` so that we make sure the type matches
before performing any mapper merging.
We already partially addressed this issue in #29316, but the tests there
focused on `MapperService` and did not catch this problem with end-to-end
mapping updates.
Addresses #43012.
We accidentally introduced this bug when adding a typeless version of the
rollover request. The bug is not present if include_type_name is set to true.
The previous hasProcessors method would validate if a processor was
present within a pipeline, but would not return the contents of the
processors. This does not allow a consumer to inspect the processor for
specific metadata. The method now returns the list of processors based
on the class of the processor passed in.
Because auto-date-histo can perform multiple reductions while
merging buckets, we need to ensure that the intermediate reductions
are done with a `finalReduce` set to false to prevent Pipeline aggs
from generating their output.
Once all the buckets have been merged and the output is stable,
a mostly-noop reduction can be performed which will allow pipelines
to generate their output.
* SQL: ODBC: document newest conn string parameters
This commit adds the documentation for two newly added connection string
parameters: AutoEscapePVA and IndexIncludeFrozen.
It also removes the recommended OSes from the prerequisites list and
places the recommendation distinctively: the unmet prerequisites will
fail the installation, while the driver would install on other OSes than
those recommended.
* address review suggestions.
- adjust phrasing for clearer message.
(cherry picked from commit e18ac10c6e163a04f5b7cf7fa72f262882ffb711)
This commit makes sure that mapping parameters to `CreateIndex` and
`PutIndexTemplate` are keyed by the type name.
`IndexCreationTask` expects mappings to be keyed by the type name.
It asserts this for template mappings but not for the mappings in the request.
The `CreateIndexRequest` and `RestCreateIndexAction` mostly make it sure
that the mapping is keyed by a type name, but not always.
When building the create-index request outside of the REST handler, there are
a few methods to set the mapping for the request. Some of them add the type
name some of them do not.
For example, `CreateIndexRequest#mapping(String type, Map<String, ?> source)`
adds the type name, but
`CreateIndexRequest#mapping(String type, XContentBuilder source)` does not.
This PR asserts the type name in the request mapping inside `IndexCreationTask`
and makes all `CreateIndexRequest#mapping` methods add the type name.
testShouldFlushAfterPeerRecovery was added #28350 to make sure the
flushing loop triggered by afterWriteOperation eventually terminates.
This test relies on the fact that we call afterWriteOperation after
making changes in translog. In #44756, we roll a new generation in
RecoveryTarget#finalizeRecovery but do not call afterWriteOperation.
Relates #28350
Relates #45073
Today, if an operation-based peer recovery occurs, we won't trim
translog but leave it as is. Some unacknowledged operations existing in
translog of that replica might suddenly reappear when it gets promoted.
With this change, we ensure trimming translog above the starting
sequence number of phase 2. This change can allow us to read translog
forward.
* Follow up to #44949
* Stop using a special code path for multi-line JSON and instead handle its detection like that of other XContent types when creating the request
* Only leave a single path that holds a reference to the full REST request
* In the next step we can move the copying of request content to happen before the actual request handling and make it conditional on the handler in question to stop copying bulk requests as suggested in #44564
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.
This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
* Reduces complicated callback relations in `testSuccessfulSnapshotAndRestore` to flat steps of sequential actions
* Will refactor the other tests in this suit as a follow up
* This format certainly makes it easier to create more complicated tests that involve multiple subsequent snapshots as it would allow adding loops
When having a cluster state from 6.x, display the metadata version as the cluster state version.
Avoids confusion where a cluster state from 6.x is displayed as version 0 even if has some actual
content.
* Restrict which tasks can use testclusters
This PR fixes a problem between the interaction of test-clusters and
build cache.
Before this any task could have used a cluster without tracking it as
input.
With this change a new interface is introduced to track the tasks that
can use clusters and we do consider the cluster as input for all of
them.
This commit makes the gitRevision property a lazy loaded value by
returning an Object implementing toString(). The Dockerfile template is
also changed to use groovy templates instead of the mavenfilter hack, so
converting to String will not happen until runtime.
Today if a shard is not fully allocated we maintain a retention lease for a
lost peer for up to 12 hours, retaining all operations that occur in that time
period so that we can recover this replica using an operations-based recovery
if it returns. However it is not always reasonable to perform an
operations-based recovery on such a replica: if the replica is a very long way
behind the rest of the replication group then it can be much quicker to perform
a file-based recovery instead.
This commit introduces a notion of "reasonable" recoveries. If an
operations-based recovery would involve copying only a small number of
operations, but the index is large, then an operations-based recovery is
reasonable; on the other hand if there are many operations to copy across and
the index itself is relatively small then it makes more sense to perform a
file-based recovery. We measure the size of the index by computing its number
of documents (including deleted documents) in all segments belonging to the
current safe commit, and compare this to the number of operations a lease is
retaining below the local checkpoint of the safe commit. We consider an
operations-based recovery to be reasonable iff it would involve replaying at
most 10% of the documents in the index.
The mechanism for this feature is to expire peer-recovery retention leases
early if they are retaining so much history that an operations-based recovery
using that lease would be unreasonable.
Relates #41536
CellIdSource is a helper ValuesSource that encodes GeoPoint
into a long-encoded representation of the grid bucket the point
is associated with. This complicates thing as usage evolves to
support shapes that are associated with more than one bucket ordinal.
* Name each inner_hits section of nested queries differently and extract and combine the multiple values it generates into a single list.
This also introduces a limitation (its origin it's with Elasticsearch
though) on the sorting capabilities when the sorting is based on the
nested fields filtered: only one of the conditions applied to nested
documents will be used in the nested sorting.
(cherry picked from commit cfc5cf68f6e83b07bb9006986d0903d6be418ec6)
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.
This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
Today the test waits for one of the shards to be blocked, but this does not
mean that the block has been applied on all nodes, so a subsequent indexing
operation may still go through.
Fixes#45338