Fixes an issue where the value for the `script.engine.<lang>.inline`
settings would be _set_ properly, but would not accurately be reflected
in the `include_defaults` output. Adds a test to ensure the default raw
setting is now correct.
Resolves #20159
The class Setting holds a static reference to a deprecation logger
instance. When the class initializer for Setting runs, it starts
triggering log4j initialization. There is a chain of initializations
from InternalSettingsPreparer to Environment to Setting that triggers
this initialization before log4j configuration has occurred. This commit
modifies this initialization so that initialization is not done eagerly.
Relates #20170
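One common way to defer this kind of initialization is the initialization-on-demand holder idiom. The sketch below only illustrates the general idea and is not taken from the commit: `LazyLoggerDemo`, `FakeLogger`, and `Holder` are hypothetical names, and a counter stands in for log4j initialization.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: none of these names come from Elasticsearch.
class LazyLoggerDemo {
    // Counts how often the stand-in logging system has been initialized.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Stand-in for a logger whose construction triggers logging initialization.
    static final class FakeLogger {
        FakeLogger() {
            INIT_COUNT.incrementAndGet();
        }

        String deprecated(String message) {
            return "[deprecation] " + message;
        }
    }

    // The JVM initializes Holder only on the first access to Holder.LOGGER,
    // so initializing LazyLoggerDemo itself does not create the logger.
    private static final class Holder {
        static final FakeLogger LOGGER = new FakeLogger();
    }

    static FakeLogger deprecationLogger() {
        return Holder.LOGGER;
    }

    public static void main(String[] args) {
        System.out.println("initializations before first use: " + INIT_COUNT.get());
        System.out.println(deprecationLogger().deprecated("setting [foo] is deprecated"));
        System.out.println("initializations after first use: " + INIT_COUNT.get());
    }
}
```

With this shape, code that merely loads the outer class does not touch the logging system; only the first call to `deprecationLogger()` does.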
This makes GET operations more consistent with `_search` operations which expect
`(stored_)fields` to work on stored fields and source filtering to work on the
`_source` field. This is now possible because GET operations no longer
read from the translog (#20102), which also allows us to get rid of
`FieldMapper#isGenerated`.
The `_termvectors` API (and thus more_like_this too) relied on the fact
that GET operations would extract fields from either stored fields or the source,
so the logic that used to do this in `ShardGetService` has been moved
to `TermVectorsService`. It would be nice if term vectors did not rely on this,
but changing that does not seem to be low-hanging fruit.
Non-stale shard copies are currently tracked using their allocation ids in the cluster state. When a node leaves the cluster, shard copies of that node are marked as stale by removing their allocation ids from the active set in the cluster. For full cluster restarts, this can have the unwanted effect that only the last node holding a copy of the shard will be seen as non-stale. The other shard copies are not really stale though as long as no writes have happened on this shard copy. Shard copies should thus only be marked as stale (by the master in the cluster state) if other active shards have received writes.
This commit implements the above logic and also renames the persistent structure used to track non-stale shard copies from "active_allocations" to "in_sync_allocations" as we now also support tracking non-stale shard copies that have no active routing entries in the cluster state.
We need to get the string representation of numbers in order to include in
`_all`. However this has a cost and disabling `_all` is rather common so we
should look into skipping it.
The network types in use on a cluster can be useful information to have,
so this commit adds aggregate metrics for the network types in use in a
cluster to the cluster stats.
Relates #20144
This moves the Writer interface from StreamOutput into Writeable, as a peer of its inner Reader interface. This should hopefully help to avoid random functional interfaces being created for the same purpose.
It also makes use of the moved class by updating writeMapOfLists and readMapOfLists.
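As a rough illustration of the shape this gives, the sketch below pairs a `Writer` functional interface with a `Reader` and uses both in map-of-lists helpers. It is a simplified stand-in, not the actual Elasticsearch code: plain `java.io` data streams replace `StreamOutput` and `StreamInput`, and `WriteableSketch` and `roundTrip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in: java.io streams replace StreamInput/StreamOutput.
class WriteableSketch {
    @FunctionalInterface
    interface Writer<V> {
        void write(DataOutputStream out, V value) throws IOException;
    }

    @FunctionalInterface
    interface Reader<V> {
        V read(DataInputStream in) throws IOException;
    }

    // Writes the map size, then each key followed by its list of values.
    static <K, V> void writeMapOfLists(DataOutputStream out, Map<K, List<V>> map,
                                       Writer<K> keyWriter, Writer<V> valueWriter) throws IOException {
        out.writeInt(map.size());
        for (Map.Entry<K, List<V>> entry : map.entrySet()) {
            keyWriter.write(out, entry.getKey());
            out.writeInt(entry.getValue().size());
            for (V value : entry.getValue()) {
                valueWriter.write(out, value);
            }
        }
    }

    // Mirrors writeMapOfLists: reads sizes, keys and values in the same order.
    static <K, V> Map<K, List<V>> readMapOfLists(DataInputStream in,
                                                 Reader<K> keyReader, Reader<V> valueReader) throws IOException {
        int size = in.readInt();
        Map<K, List<V>> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            K key = keyReader.read(in);
            int listSize = in.readInt();
            List<V> values = new ArrayList<>(listSize);
            for (int j = 0; j < listSize; j++) {
                values.add(valueReader.read(in));
            }
            map.put(key, values);
        }
        return map;
    }

    // Serializes and deserializes a map to show both helpers working together.
    static Map<String, List<Integer>> roundTrip(Map<String, List<Integer>> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeMapOfLists(out, map, (o, key) -> o.writeUTF(key), (o, value) -> o.writeInt(value));
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readMapOfLists(in, i -> i.readUTF(), i -> i.readInt());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> map = new HashMap<>();
        map.put("a", Arrays.asList(1, 2, 3));
        System.out.println(roundTrip(map).equals(map));
    }
}
```

Keeping a single `Writer` type next to `Reader` means helpers like these can share one callback signature instead of each declaring its own.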
Prior to this change the nesting of aggregation profiling results
would be incorrect when the request contains a terms aggregation and the
collect mode is (implicitly or explicitly) set to `breadth_first`. This
was because the aggregation profiling has to make the assumption that
the `preCollection()` method of children aggregations is always called in
the `preCollection()` method of their parent aggregation. When the collect
mode is `breadth_first` the `preCollection` of the children aggregations
was delayed until the documents were replayed.
This change moves the `preCollection()` of deferred aggregations to run
during the `preCollection()` of the parent aggregation. This should have
no adverse impact on the `breadth_first` mode as there is no allocation of
memory in any of the aggregations.
We also apply the same logic to the diversified sampler aggregation as
we did to the terms aggregation, so that the `preCollection()` of its
child aggregations is called during the `preCollection()` of
the parent aggregation.
This commit also includes a fix so that the `ProfilingLeafBucketCollector`
propagates the scorer to its delegate so the diversified sampler agg works
when profiling is enabled.
If they are specified by a mapping update, these properties are currently
ignored. This commit also fixes the handling of `dynamic_templates` so that it
is possible to remove templates (and so that it works more similarly to all
other mapping properties).
Closes #20111
This is a house cleaning commit that refactors GeoPointFieldMapperLegacy to LegacyGeoPointFieldMapper for consistency with Legacy Numerics and IP field mappers.
IndexedGeoBoundingBoxQuery and InMemoryGeoBoundingBoxQuery are also deprecated and refactored as Legacy classes.
This change adds a special field named `_none_` that makes it possible to disable the retrieval of stored fields in a search request or in a TopHitsAggregation.
To completely disable stored fields retrieval (including metadata fields such as `_id` or `_type`), use `_none_` like this:
````
POST _search
{
"stored_fields": "_none_"
}
````
Today we do a lot of accounting inside the engine to maintain locations
of documents inside the transaction log. This is only needed to ensure
we can return a document's source from the engine if it hasn't been refreshed.
Aside from the added complexity of reading from the translog that is currently being written
and maintaining pointers into the translog, this also caused inconsistencies such as different values
of the `_ttl` field depending on whether it was read from the translog or not. TermVectors are totally different if
the document is fetched from the translog, since copy fields are ignored, etc.
This change will simply call `refresh` if the document's latest version is not in the index. This
streamlines the semantics of the `_get` API and allows for more optimizations inside the engine
and on the transaction log. Note: `_refresh` is only called if the requested document has not been refreshed
yet but has recently been updated or added.
Relates to #19787
I was writing tests for RAM usage estimation of LiveVersionMap and found a
couple issues:
- The BytesRef objects used as uids were oversized since they were created
via `new BytesRef(CharSequence)` which creates a `byte[]` whose size is 3x
the length of the provided char sequence. Given that our uids are ASCII
sequences most of the time, this is a waste of memory.
- `VersionValue` was using `translogLocation.size` instead of
`translogLocation.ramBytesUsed()` for RAM estimation, which is completely
unrelated to the memory footprint of the `Translog.Location` object.
In particular, the latter issue could cause RAM usage to be
significantly overestimated, especially on large documents.
I also added tests for ram accounting.
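To make the first point concrete, the sketch below compares a worst-case buffer size of three bytes per char with the exact UTF-8 length of an ASCII uid. It uses only the JDK rather than Lucene's `BytesRef`, and `UidSizingSketch` and the sample uid are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// JDK-only illustration; this is not the Lucene BytesRef implementation.
class UidSizingSketch {
    // Worst case: a single UTF-16 char needs at most 3 bytes in UTF-8.
    static int worstCaseUtf8Length(CharSequence chars) {
        return 3 * chars.length();
    }

    // Exact size: encode the string and measure the result.
    static int exactUtf8Length(String chars) {
        return chars.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String uid = "type#AVx1y2z3"; // hypothetical ASCII uid
        System.out.println("worst-case bytes: " + worstCaseUtf8Length(uid));
        System.out.println("exact bytes:      " + exactUtf8Length(uid));
    }
}
```

For pure ASCII input the exact length equals the number of chars, so a worst-case buffer is three times larger than needed.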
Deprecates the `optimize_bbox` parameter on geodistance queries. It has not been needed since version 2.2 because Lucene geo distance queries (postings and LatLonPoint) already optimize by bounding box.
Catching and suppressing AlreadyClosedException from Lucene is dangerous because it can mean there is a bug in ES since ES should normally guard against invoking Lucene classes after they were closed.
I reviewed the cases where we catch AlreadyClosedException from Lucene and removed the ones that I believe are not needed, or improved comments explaining why ACE is OK in that case.
I think (@s1monw can you confirm?) that holding the engine's readLock means IW will not be closed, except if disaster strikes (failEngine) at which point I think it's fine to see the original ACE in the logs?
Closes #19861
When a SearchContext is closed, its reader / searcher reference is closed too.
If this happens while a search is accessing its reader reference, it can lead
to an unexpected `AlreadyClosedException` or, in the worst case, to an already closed MMapDirectory
being accessed, causing a `SIGSEGV` like in #20008 (even though the window for this is very small).
SearchContext can be closed concurrently if:
* an index is deleted / removed from the node
* a search context is idle for too long and is cleaned by the reaper
* an explicit freeContext message is received
This change adds reference counting to the SearchContext base class, which is used
inside SearchService each time the context is accessed.
Closes #20008
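The general technique can be sketched as follows. This is a minimal reference-counting example under stated assumptions, not the actual SearchContext code: `RefCountedResource`, `tryIncRef`, `decRef`, and `isReleased` are illustrative names, and a boolean flag stands in for closing the reader / searcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; not the Elasticsearch SearchContext implementation.
class RefCountedResource {
    // Starts at 1: the reference held by whoever owns the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean released = false;

    // Acquires a reference unless the resource has already been released.
    boolean tryIncRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Drops a reference; the last one out releases the underlying resource.
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            released = true; // stand-in for closing the reader / searcher
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called more often than incRef");
        }
    }

    boolean isReleased() {
        return released;
    }

    public static void main(String[] args) {
        RefCountedResource context = new RefCountedResource();
        boolean acquired = context.tryIncRef(); // a search starts using the context
        context.decRef();                       // a concurrent close drops the owner's reference
        System.out.println("acquired=" + acquired + " released=" + context.isReleased());
        context.decRef();                       // the search finishes
        System.out.println("released=" + context.isReleased());
    }
}
```

With this pattern a concurrent close only drops the owner's reference, and the underlying resource is released once the last in-flight user is done.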
This includes:
- All regular numeric types such as int, long, scaled-float, double, etc
- IP addresses
- Dates
- Geopoints and Geoshapes
Relates to #19784
Adds ignoreUnavailable to the snapshot status API to be consistent
with the get snapshots API which has a similar parameter. If
ignoreUnavailable is set to true, then the snapshot status request
will ignore any snapshots that were not found in the repository,
instead of throwing a SnapshotMissingException.
Closes #18522
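The behavior described above can be sketched roughly as below. This is a simplified model, not the snapshot status implementation: `SnapshotStatusSketch` and `resolve` are hypothetical names, snapshots are reduced to plain strings, and `IllegalArgumentException` stands in for `SnapshotMissingException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the described behavior; names are hypothetical.
class SnapshotStatusSketch {
    // Returns the requested snapshots that exist in the repository. A missing
    // snapshot is skipped when ignoreUnavailable is true and rejected otherwise.
    static List<String> resolve(List<String> requested, Set<String> inRepository, boolean ignoreUnavailable) {
        List<String> found = new ArrayList<>();
        for (String name : requested) {
            if (inRepository.contains(name)) {
                found.add(name);
            } else if (!ignoreUnavailable) {
                // Stand-in for SnapshotMissingException.
                throw new IllegalArgumentException("snapshot [" + name + "] is missing");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> repository = new HashSet<>(Arrays.asList("snap1", "snap2"));
        List<String> requested = Arrays.asList("snap1", "snap3");
        System.out.println(resolve(requested, repository, true));
    }
}
```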
StartupException overrides Throwable#printStackTrace(PrintStream) but
not Throwable#printStackTrace(PrintWriter). The former override is used
when the JVM terminates with an exception, but the latter override can
be used in some logging frameworks when rendering an exception (e.g.,
log4j). This commit adds an override for the latter, with the behavior
for the two overrides being the same.
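A minimal sketch of keeping the two overrides in step is shown below. It is not the real StartupException: `StartupExceptionSketch`, `render`, `viaStream`, and `viaWriter` are hypothetical names, and the rendered text is a placeholder for whatever the real class prints.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch; the real class renders its stack trace differently.
class StartupExceptionSketch extends RuntimeException {
    private static final long serialVersionUID = 1L;

    StartupExceptionSketch(Throwable cause) {
        super(cause);
    }

    // Single rendering routine shared by both overrides so they cannot diverge.
    private String render() {
        return "startup failed: " + getCause();
    }

    @Override
    public void printStackTrace(PrintStream s) {
        s.println(render());
    }

    @Override
    public void printStackTrace(PrintWriter s) {
        s.println(render());
    }

    // Captures what the PrintStream overload prints.
    static String viaStream(Throwable t) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream stream = new PrintStream(bytes, true);
        t.printStackTrace(stream);
        stream.flush();
        return bytes.toString();
    }

    // Captures what the PrintWriter overload prints.
    static String viaWriter(Throwable t) {
        StringWriter text = new StringWriter();
        PrintWriter writer = new PrintWriter(text);
        t.printStackTrace(writer);
        writer.flush();
        return text.toString();
    }

    public static void main(String[] args) {
        Throwable failure = new StartupExceptionSketch(new IllegalStateException("boom"));
        System.out.println(viaStream(failure).equals(viaWriter(failure)));
    }
}
```

Routing both overloads through one private method is a design choice in this sketch; it makes it hard for a logging framework and the JVM to see different output.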
This commit renames StartupError to StartupException. This rename is due
to the fact that this class inherits from Exception not Error in the
Throwable class hierarchy.
This commit removes the minimum master nodes bootstrap check. The
motivation for this check was to raise awareness of the minimum master
nodes setting but this check gives a false sense of security because
it's too easy to set the setting to one when first standing up a cluster
and never update it when adding master-eligible nodes, or have it out of
sync on various nodes and still pass this check. Since this check does
not have the security that other bootstrap checks provide, it should be
removed in favor of a stronger guarantee in the future. We do log a
warning if an election occurs with minimum master nodes less than a
quorum of master-eligible nodes that participated in an election and
this is the best that we can do right now.
Relates #20082
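The condition behind that warning can be sketched as below, assuming a quorum means a strict majority of the master-eligible nodes. `QuorumSketch`, `quorum`, and `belowQuorum` are hypothetical names and this is not the election code itself.

```java
// Hypothetical sketch of the quorum comparison; not the election code.
class QuorumSketch {
    // A strict majority of the master-eligible nodes.
    static int quorum(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    // True when the configured minimum master nodes is too low to be a quorum.
    static boolean belowQuorum(int minimumMasterNodes, int masterEligibleNodes) {
        return minimumMasterNodes < quorum(masterEligibleNodes);
    }

    public static void main(String[] args) {
        // A setting left at one after the cluster grew to three master-eligible nodes.
        System.out.println(belowQuorum(1, 3));
        // The same cluster with the setting raised to two.
        System.out.println(belowQuorum(2, 3));
    }
}
```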
Some time ago, AllocationService.reroute was changed to not only return updates to the routing table but also to the metadata (which contain primary terms and in-sync allocation ids). A lot of test code still only updates the routing table though, which is fixed by this PR.
How index templates match is currently controlled by the
IndexTemplateFilter interface. It is pluggable, to add additional
filter implementations to the default glob matcher.
This change removes the IndexTemplateFilter interface completely. This
is a very esoteric extension point, and not worth maintaining. Instead,
any improvements should be made to all of our glob matching.