We have some methods Strings#splitStringByCommaToArray and
Strings#splitStringByCommaToSet. It is not obvious that the former
leaves whitespace and the latter trims it. We also have
Strings#tokenizeToStringArray which tokenizes a string to an array, and
trims whitespace. It seems the right thing to do here is to rename
Strings#splitStringByCommaToSet to Strings#tokenizeByCommaToSet so that
its name is aligned with another method that tokenizes by a delimiter
and trims whitespace. We also cleanup the code here, removing an
unneeded splitting by delimiter to set method.
Relates #27715
We currently do not have any server-side read timeouts implemented in
elasticsearch. This commit adds a read timeout setting that defaults to
30 seconds. If after 30 seconds a read has not occurred, the channel
will be closed. A timeout of value of 0 will disable the timeout.
The problem here is that splitting was using a method that intentionally
trims whitespace (the method is really meant to be used for splitting
parameters where whitespace should be trimmed like list
settings). However, for routing values whitespace should not be trimmed
because we allow routing with leading and trailing spaces. This commit
switches the parsing of these routing values to a method that does not
trim whitespace.
Relates #27712
It's possible that a merge may be ongoing when we check the breaker and segment
stats' memory usage, this causes the test to fail. Instead, we should wait for
merging to complete.
Resolves#27651
This test periodically fails if the nodes that apply the cluster state fail to ack the change within 100ms. This commit changes the checks on the test so that
it still checks that the open command has taken effect, but that the wait for active shards has actually failed.
This is related to #27563. In order to interface with java nio, we must
have buffers that are compatible with ByteBuffer. This commit introduces
a basic ByteBufferReference to easily allow transferring bytes off the
wire to usage in the application.
Additionally it introduces an InboundChannelBuffer. This is a buffer
that can internally expand as more space is needed. It is designed to
be integrated with a page recycler so that it can internally reuse pages.
The final piece is moving all of the index work for writing bytes to a
channel into the WriteOperation.
This commit adds a new dynamic cluster setting named `search.max_buckets` that can be used to limit the number of buckets created per shard or by the reduce phase. Each multi bucket aggregator can consume buckets during the final build of the aggregation at the shard level or during the reduce phase (final or not) in the coordinating node. When an aggregator consumes a bucket, a global count for the request is incremented and if this number is greater than the limit an exception is thrown (TooManyBuckets exception).
This change adds the ability for multi bucket aggregator to "consume" buckets in the global limit, the default is 10,000. It's an opt-in consumer so each multi-bucket aggregator must explicitly call the consumer when a bucket is added in the response.
Closes#27452#26012
Index settings didn't support reset by wildcard which also causes
issues like #27537 where archived settings can't be reset. This change
adds support for wildcards like `archived.*` to be used to reset setting to their
defaults or remove them from an index.
Closes#27537
The mappings can be submitted wrapped in a type object or not. They need to be returned in the same way as they were submitted. When applying field filters, we need to make sure that the format is preserved. MappingMetaData#getSourceAsMap removes the root level if it's the type object, which would make us overwrite the original mappings with filtered mappings but without the original root object.
Closes#27678
This commit restricts settings added to the keystore to have a lowercase
ascii name. The java Keystore javadocs state that case sensitivity of
key alias names are implementation dependent. This ensures regardless of
case sensitivity in a jvm implementation, the keys will be stored as we
expect.
Today, we prevent the system from storing a broken index template in the
transport layer, however we don't prevent this in XContent. A broken
index template can break the whole cluster state.
This commit attempts to prevent the system from constructing an index
template without a proper index patterns.
Add support for filtering fields returned as part of mappings in get index, get mappings, get field mappings and field capabilities API.
Plugins can plug in their own function, which receives the index as argument, and return a predicate which controls whether each field is included or not in the returned output.
This commit adds the node name to the names of thread pool executors so
that the node name is visible in rejected execution exception messages.
Relates #27663
The main constructor for rejected execution exception its executor
shutdown constructor parameter to the super constructor where it would
be used as a formatting parameter. This is a mistake so this commit
fixes this issue.
In the global checkpoint sync action, we fsync the translog. However,
the last synced global checkpoint might already be equal to the current
global checkpoint in which case the fsyncing the translog is unnecessary
as either the sync needed guard in the translog will skip the translog,
or the translog needs an fsync for another reason that will be picked up
elsewhere (e.g., at the end of a bulk request).
Relates #27652
The hashCode contract states that equal objects must have equal hash
codes, however the unequal objects are not required to have unequal
hashCodes.
This commit rewrites GeoPointParsingTests#testEqualsHashCodeContract
using#checkEqualsAndHashCode helper.
Closes#27633
* Fix highlighting on a keyword field that defines a normalizer
The `plain` and sometimes the `unified` highlighters need to re-analyze the content to highlight a field
This change makes sure that we don't ignore the normalizer defined on the keyword field for this analysis.
After write operations in some situations we fire a post-operation
global checkpoint sync. The global checkpoint sync unconditionally
fsyncs the translog and this can then look like an fsync
per-request. This violates the translog durability settings on the index
if this durability is set to async. This commit changes the global
checkpoint sync to observe the translog durability.
Relates #27641
Today we exclude internal refreshes in the refresh stats. Yet, it's very much
confusing to not take these into account. This change includes internal refreshes
into the stats until we have a dedicated stats for this.
This new snapshot mostly brings a change to TopFieldCollector which can now
early terminate collection when trackTotalHits is `false`.
As a follow-up, we should replace our usage of
`EarlyTerminatingSortingCollector` with this new option.
Today, we maintain two sets in a SeqNoSet: ongoing sets and completed
sets. We can remove the completed sets and use only the ongoing sets by
releasing the internal bitset of a CountedBitSet when all its bits are
set. This behaves like two sets but simpler. This commit also makes
CountedBitSet as a drop-in replacement for BitSet.
Relates #27268
* Add accounting circuit breaker and track segment memory usage
This commit adds a new circuit breaker "accounting" that is used for tracking
the memory usage of non-request-tied memory users. It also adds tracking for the
amount of Lucene segment memory used by a shard as a user of the new circuit
breaker.
The Lucene segment memory is updated when the shard refreshes, and removed when
the shard relocates away from a node or is deleted. It should also be noted that
all tracking for segment memory uses `addWithoutBreaking` so as not to fail the
shard if a limit is reached.
The `accounting` breaker has a default limit of 100% and will contribute to the
parent breaker limit.
Resolves#27044