We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
Because sigma is also used at reduce time, it should be passed to empty aggs.
Otherwise it causes bugs when an empty aggregation is used to perform reduction
is it would assume a sigma of zero.
Closes#17362
This removes PROTOTYPEs from ScoreFunctionsBuilders. To do so we rework
registration so it doesn't need PROTOTYPEs and lines up with the recent
changes to query registration.
FieldStatsProvider had to perform instanceof calls to properly handle dates or
ip addresses. By moving the logic to MappedFieldType, each field type can check
whether all values are within bounds its way.
Note that this commit only keeps rewriting support for dates, which are the only
field for which the rewriting mechanism is likely to help (because of time-based
indices).
BulkByScrollTaskTest#testDelayAndRethrottle was getting rejected exceptions
every once in a while. This was reproducible ~20% of the time for me. I
added a CyclicBarrier to prevent the test from shutting down the thread pool
before the threads get finished.
This allows the user to update the reindex throttle on the fly, with changes
that speed up the throttling being applied immediately and changes that
slow down the throttling being applied during the next batch. This means
that if a user throttles reindex in such a way that it tries to sleep for
16 years and then realizes that they've done something wrong then they
can change the throttle and reindex will wake up again. We don't apply
slow downs immediately so we never get in danger of losing the scan context.
Also, if reindex is canceled while it is sleeping (how it honor throttling)
then it'll immediately wake up and cancel itself.
`text` fields will have fielddata disabled by default. Fielddata can still be
enabled on an existing index by setting `fielddata=true` in the mappings.
If a terms aggregation was ordered by a metric nested in a single bucket aggregator which did not collect any documents (e.g. a filters aggregation which did not match in that term bucket) an ArrayOutOfBoundsException would be thrown when the ordering code tried to retrieve the value for the metric. This fix fixes all numeric metric aggregators so they return their default value when a bucket ordinal is requested which was not collected.
Closes#17225
readFrom is confusing because it requires an instance of the type that it
is reading but it doesn't modify it. But we also have (deprecated) methods
named readFrom that *do* modify the instance. The "right" way to implement
the non-modifying readFrom is to delegate to a constructor that takes a
StreamInput so that the read object can be immutable. Now that we have
`@FunctionalInterface`s it is fairly easy to register things by referring
directly to the constructor.
This change modifying NamedWriteableRegistry so that it does that. It keeps
supporting `registerPrototype` which registers objects to be read by
readFrom but deprecates it and delegates it to a new `register` method
that allows passing a simple functional interface. It also cuts Task.Status
subclasses over to using that method.
The start of #17085
The test was checking that we'd set the headers properly but in some cases
the request had yet to come in because it was running on another thread.
Now we wait for the headers to show up before failing the test.
Closes#17299
If the user asks for a refresh but their reindex or update-by-query
operation touched no indexes we should just skip the resfresh call
entirely. Without this commit we refresh *all* indexes which is totally
wrong.
Closes#17296
The fielddata settings in mappings have been refatored so that:
- text and string have a `fielddata` (boolean) setting that tells whether it
is ok to load in-memory fielddata. It is true by default for now but the
plan is to make it default to false for text fields.
- text and string have a `fielddata_frequency_filter` which contains the same
thing as `fielddata.filter.frequency` used to (but validated at parsing time
instead of being unchecked settings)
- regex fielddata filtering is not supported anymore and will be dropped from
mappings automatically on upgrade.
- text, string and _parent fields have an `eager_global_ordinals` (boolean)
setting that tells whether to load global ordinals eagerly on refresh.
- in-memory fielddata is not supported on keyword fields anymore at all.
- the `fielddata` setting is not supported on other fields that text and string
and will be dropped when upgrading if specified.
Currently, both Gsub and Grok parse regex strings during
Pipeline creation. Thrown parsing exceptions were leaking out, this
commit wraps those exceptions in ElasticsearchParseExceptions.
We current have a ClusterService interface, implemented by InternalClusterService and a couple of test classes. Since the decoupling of the transport service and the cluster service, one can construct a ClusterService fairly easily, so we don't need this extra indirection.
Closes#17183
Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache.
The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontext
in a binary doc values field. This make sure that the query rewrite happens only during
indexing (some queries for example fetch shapes, terms in remote indices) and
the speed up the loading of the queries in the percolator query cache.
Because the percolator now works inside the search infrastructure a number of features
(sorting fields, pagination, fetch features) are available out of the box.
The following feature requests are automatically implemented via this refactoring:
Closes#10741Closes#7297Closes#13176Closes#13978Closes#11264Closes#10741Closes#4317
Today we allow to set all kinds of index level settings on the node level which
is error prone and difficult to get right in a consistent manner.
For instance if some analyzers are setup in a yaml config file some nodes might
not have these analyzers and then index creation fails.
Nevertheless, this change allows some selected settings to be specified on a node level
for instance:
* `index.codec` which is used in a hot/cold node architecture and it's value is really per node or per index
* `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses
All other index level setting must be specified on the index level. For existing clusters the index must be closed
and all settings must be updated via the API on each of the indices.
Closes#16799