When creating a target index from a source index, we don't allow for target
mappings to be specified. This PR simplifies the check that the target mappings
are empty.
This refactor will help when implementing composable template merging, since we
no longer need to resolve + check the target mappings when creating an index
from a template.
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
Relates to #53175
`terminate_after` is ignored on search requests that don't return top hits (`size` set to 0)
and do not tracked the number of hits accurately (`track_total_hits`).
We use early termination when the number of hits to track is reached during collection
but this breaks the hard termination of `terminate_after` if it happens before we reached
the `terminate_after` value.
This change ensures that we continue to check `terminate_after` even if the tracking of total
hits has reached the provided value.
Closes#57624
Users are perennially confused by the message they get when writing to
an index is blocked due to excessive disk usage:
TOO_MANY_REQUESTS/12/index read-only / allow delete (api)
Of course this is technically accurate but it is hard to join the dots
from this message to "your disk was too full" without some searching of
forums and documentation. Additionally in #50166 we changed the status
code to today's `429` from the previous `403` which changed the message
from the one that's widely documented elsewhere:
FORBIDDEN/12/index read-only / allow delete (api)
Since #42559 we've considered this block to be under the sole control of
the disk-based shard allocator, and we have seen no evidence to suggest
that anyone is applying this block manually. Therefore this commit
adjusts this block's message to indicate that it's caused by a lack of
disk space.
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.
It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
This commit creates a shared withCustomConfig method that may be used by
any packaging test. The method will copy the config directory and
override the conf path appropriately depending on the distribution type.
Very rarely this test can fail if we draw a random TimeZone id that we cannot
parse with the legacy joda DateMathParser and get an IllegalArgumentException.
In addition to a "SystemV/*" time zone we also need an index "versionCreated"
before V_7_0_0 and no "format" setting in the query builder. Given how unlikely
this combination is, we should simply dissallow those time zone ids when
generating the random query builder for RangeQueryBuilderTests.
Closes#58431
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.
This allows us to remove the MapperService.getLuceneFieldType() shim method.
This merges the aggregator for `significant_text` into
`significant_terms`, applying the optimization built in #55873 to save
memory when the aggregation is not on top. The `significant_text`
aggregation is pretty memory intensive all on its own and this doesn't
particularly help with that, but it'll help with the memory usage of any
sub-aggregations.
Just like #56094 but for the request side.
Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id.
Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:
* 8 bytes for the `ShardId` object (itself + one field)
* + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
* + 30 bytes for the `Index` uuid string
* + all the bytes in the index name string
=> 60+ bytes per bulk request item saved on heap and over the wire
Today the `PublicationContext` interface has a single anonymous
implementation, and `PublicationTransportHandler` has various methods
that take the variables that this anonymous class captures. This commit
refactors this into a proper class with proper fields and moves the
relevant methods onto this class.
Backport of #58405 to 7.x.
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
Backporting #58096 to 7.x branch.
Relates to #53100
* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
* Add support for snapshot and restore to data streams (#57675)
This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.
Closes#57127
Relates to #53100
* version fix
* compilation fix
* compilation fix
* remove unused changes
* compilation fix
* test fix
This adds validation to make sure alias operations (add, remove, remove index)
don't target data streams or the backing indices.
(cherry picked from commit 816448990e464a02f3960f12f6f6644a8cce36a4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.
Fixes#58188
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
When a numeric value in e.g. a `term` query doesn't fit into a long, it
curerently gets parsed to a BigInteger object, that the various term query
builders store untouched. This leads to serialization errors when these queries
are sent across the wire. Instead we can convert to a string representation
early on, since that is what we store e.g. when indexing big integers into
`keyword` fields anyway.
Closes#57917
This change allows to use an `index_filter` in the
field capabilities API. Indices are filtered from
the response if the provided query rewrites to `match_none`
on every shard:
````
GET metrics-*
{
"index_filter": {
"bool": {
"must": [
"range": {
"@timestamp": {
"gt": "2019"
}
}
}
}
}
````
The filtering is done on a best-effort basis, it uses the can match phase
to rewrite queries to `match_none` instead of fully executing the request.
The first shard that can match the filter is used to create the field
capabilities response for the entire index.
Closes#56195
This allows doing true CAS operations on aliases, making sure that an alias is actually properly
moved from a given source index onto a given target index. This is useful to ensure that an
alias is actually moved from a given index to another one, and not just added to another index.
Currently a failed replication action will fail an entire replica. This
includes when replication fails due to potentially short lived transient
issues such as network distruptions or circuit breaking errors.
This commit implements retries using the retryable action.
Forgot the brackets here in #58214 so in the rare case where the
first update seen by the listener doesn't match it will still remove
itself and never be invoked again -> timeout.
This builds an `auto_date_histogram` aggregator that natively aggregates
from many buckets and uses it when the `auto_date_histogram` used to use
`asMultiBucketAggregator` which should save a significant amount of
memory in those cases. In particular, this happens when
`auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator
like `terms` or `histogram` or `filters`. For the most part we preserve
the original implementation when `auto_date_histogram` only collects from
a single bucket.
It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how to delay sub-aggs
we'd end up rewriting the delay list zillions of time if there are many
buckets.
The multi-bucket version of the aggregator has a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be *terrible* when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.
All of that does buy us some speed improvements when the aggregator is
a child of multi-bucket aggregator:
Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster
Relates to #56487
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
This was a really subtle bug that we introduced a long time ago.
If a shard snapshot is in aborted state but hasn't started snapshotting on a node
we can only send the failed notification for it if the shard was actually supposed
to execute on the local node.
Without this fix, if shard snapshots were spread out across at least two data nodes
(so that each data node does not have all the primaries) the abort would actually
never wait on the data nodes. This isn't a big deal with uuid shard generations
but could lead to potential corruption on S3 when using numeric shard generations
(albeit very unlikely now that we have the 3 minute wait there).
Another negative side-effect of this bug was that master would receive a lot more
shard status update messages for aborted shards since each data node not assigned
a primary would send one message for that primary.
The dangling indices action is not a proper master node action so it does not
retry when executed while the cluster hasn't fully formed yet.
Since we use node restarts when setting up the dangling indices state we need
to manually ensure a fully formed cluster before moving on with the tests to avoid
failures.
Backport of #50920. Part of #48366. Implement an API for listing,
importing and deleting dangling indices.
Co-authored-by: David Turner <david.turner@elastic.co>
MappedFieldType is a combination of two concerns:
* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.
Relates to #56814
* Normalized prefix for rollover API (#57271)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Lee Hinman <lee@writequit.org>
It fixes the issue #53388
by normalizing prefix at index creation request itself
* Fix compilation for backport
Co-authored-by: Gaurav Chandani <chngau@amazon.com>